This project implements a Retrieval-Augmented Generation (RAG) system using Cohere's language models and the LangChain framework. The system answers questions about HR policies by retrieving relevant passages from a knowledge base.
- Python 3.10 or higher
- Cohere API key
- Required Python packages (listed in the install step)
- Clone the repository:
git clone <repository-url>
cd Chains
- Install required packages:
pip install langchain-cohere langchain-community chromadb pydantic python-dotenv
- Set up environment variables:
Create a .env file in the project root:
COHERE_API_KEY=your_api_key_here
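A script can fail early with a clear message when the key is missing. The sketch below uses only the standard library and assumes the variable is already in the process environment (for example after calling `load_dotenv()` from python-dotenv). `get_api_key` is a hypothetical helper for illustration, not a function taken from src/CohereQnA.py.

```python
import os


def get_api_key(env=os.environ):
    """Return the Cohere API key, or raise a clear error if it is missing.

    `env` is any mapping; it defaults to the process environment.
    """
    key = env.get("COHERE_API_KEY", "").strip()
    # Treat the placeholder from the .env template as "not configured".
    if not key or key == "your_api_key_here":
        raise RuntimeError("COHERE_API_KEY is not set; add it to the .env file")
    return key
```

Passing the environment as a parameter keeps the check easy to exercise without touching real credentials.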
Chains/
├── data/
│ ├── globalcorp_hr_policy.txt
│ ├── local_vectorstore/
│ └── local_docstore/
├── src/
│ └── CohereQnA.py
└── README.md
- Document Loading: Automatic processing of HR policy documents
- Vector Embeddings: Powered by Cohere's embedding models
- Local Storage: Using Chroma for vector storage
- Smart Retrieval: Parent-child document architecture
- Interactive QA: Question answering using RAG
- Place your HR policy document in data/globalcorp_hr_policy.txt
- Run the QnA system:
python src/CohereQnA.py
- The system will:
- Load and process the document
- Create embeddings
- Store them in a local vector store
- Answer questions about the HR policy
Parameter | Value | Description |
---|---|---|
Parent chunk size | 1000 | Characters per parent chunk |
Child chunk size | 200 | Characters per child chunk |
Overlap | 20 | Characters overlapping between chunks |
Model | command-light | Cohere model used |
Temperature | 0 | Deterministic output setting |
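One way to keep these settings in a single place is a small configuration object. This is a sketch only: `RagConfig` and its field names are hypothetical and may not match how src/CohereQnA.py stores its settings. The default values are the ones from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RagConfig:
    """Hypothetical container for the settings in the table above."""

    parent_chunk_size: int = 1000  # characters per parent chunk
    child_chunk_size: int = 200    # characters per child chunk
    chunk_overlap: int = 20        # characters shared by neighbouring chunks
    model: str = "command-light"   # Cohere model name from the table
    temperature: float = 0.0       # value from the table

    def validate(self) -> None:
        """Reject combinations that would make the two-tier split meaningless."""
        if self.child_chunk_size >= self.parent_chunk_size:
            raise ValueError("child chunks must be smaller than parent chunks")
        if not 0 <= self.chunk_overlap < self.child_chunk_size:
            raise ValueError("overlap must be smaller than the child chunk size")
```

Calling `RagConfig().validate()` checks the defaults; overriding a field, as in `RagConfig(child_chunk_size=400)`, gives a quick way to experiment with other sizes.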
The system uses a two-tier document splitting approach:
- Child Splitter: Creates small, precise chunks for accurate matching
- Parent Splitter: Maintains larger chunks for context preservation
Example scenario:
Document Structure:
├── Parent Chunk (Chapter-sized, 1000 chars)
│ └── Child Chunks (Paragraph-sized, 200 chars)
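The two-tier split can be illustrated with a plain character-window splitter. This is a simplified sketch, not the splitter LangChain provides: it cuts at fixed character offsets rather than at paragraph or sentence boundaries, and it assumes the same 20-character overlap applies to both tiers. The function names are hypothetical.

```python
def split_text(text, chunk_size, overlap=20):
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def two_tier_split(text, parent_size=1000, child_size=200, overlap=20):
    """Return a list of (parent_chunk, [child_chunks]) pairs."""
    return [
        (parent, split_text(parent, child_size, overlap))
        for parent in split_text(text, parent_size, overlap)
    ]
```

Each child is a slice of exactly one parent, which is what lets a match on a small chunk be traced back to its larger surrounding context.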
The system uses a key-value document store to maintain relationships between document chunks and their sources. This is implemented using LangChain's storage system:
- Relationship Preservation: Maintains links between parent and child document chunks
- Source Tracking: Associates chunks with their original source documents
- Efficient Retrieval: Enables quick lookup of parent documents when child chunks are retrieved
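The parent lookup described above can be sketched with an in-memory key-value store. This is a toy model under stated assumptions: `ParentChildStore` is a hypothetical class, not LangChain's storage API, and a case-insensitive substring match stands in for the vector similarity search that Chroma would perform on child chunks.

```python
import uuid


class ParentChildStore:
    """Toy key-value store linking child chunks to their parent chunk."""

    def __init__(self):
        self.parents = {}   # parent_id -> {"text": ..., "source": ...}
        self.children = []  # list of (child_text, parent_id)

    def add(self, parent_text, child_texts, source):
        """Store one parent with its children and return the parent's key."""
        parent_id = str(uuid.uuid4())
        self.parents[parent_id] = {"text": parent_text, "source": source}
        self.children.extend((child, parent_id) for child in child_texts)
        return parent_id

    def retrieve(self, query):
        """Return each parent that has a matching child, without duplicates."""
        seen, results = set(), []
        for child, parent_id in self.children:
            if query.lower() in child.lower() and parent_id not in seen:
                seen.add(parent_id)
                results.append(self.parents[parent_id])
        return results
```

Matching happens on the small child chunks, while the caller receives the full parent text together with its source, so the language model sees the broader context.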
- Precise Answers: Find exact relevant passages
- Context Preservation: Access broader context when needed
- Balanced Retrieval: Optimal mix of specificity and context