RAG Implementation with Cohere and LangChain

This project implements a Retrieval-Augmented Generation (RAG) system using Cohere's language models and the LangChain framework. The system answers questions about HR policies by retrieving relevant passages from a knowledge base.

Prerequisites

  • Python 3.10 or higher
  • Cohere API key
  • Required Python packages

Installation

  1. Clone the repository:
git clone <repository-url>
cd Chains
  2. Install required packages:
pip install langchain-cohere langchain-community chromadb pydantic python-dotenv
  3. Set up environment variables: Create a .env file in the project root:
COHERE_API_KEY=your_api_key_here
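
The package list includes python-dotenv, whose load_dotenv() function reads a .env file into the process environment. As a minimal sketch of failing fast when the key is missing, the helper below validates an environment mapping using only the standard library. The name get_cohere_api_key is hypothetical and is not taken from src/CohereQnA.py.

```python
import os
from typing import Mapping, Optional

PLACEHOLDER = "your_api_key_here"  # the sample value from the .env template


def get_cohere_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return COHERE_API_KEY, or raise if it is missing or still the placeholder.

    Hypothetical helper for illustration. In a real script you would call
    dotenv.load_dotenv() first so that values from .env are in os.environ.
    """
    source = os.environ if env is None else env
    key = source.get("COHERE_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError("COHERE_API_KEY is not set; add it to the .env file")
    return key
```

Passing a plain dict instead of os.environ keeps the check easy to exercise without touching the real environment.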

Project Structure

Chains/
├── data/
│   ├── globalcorp_hr_policy.txt
│   ├── local_vectorstore/
│   └── local_docstore/
├── src/
│   └── CohereQnA.py
└── README.md

Features

  • Document Loading: Automatic processing of HR policy documents
  • Vector Embeddings: Powered by Cohere's embedding models
  • Local Storage: Using Chroma for vector storage
  • Smart Retrieval: Parent-child document architecture
  • Interactive QA: Question answering using RAG

Usage

  1. Place your HR policy document in data/globalcorp_hr_policy.txt

  2. Run the QnA system:

python src/CohereQnA.py
  3. The system will:
    • Load and process the document
    • Create embeddings
    • Store them in a local vector store
    • Answer questions about the HR policy
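
The four steps above can be sketched end to end with the standard library. This is an illustration under loud assumptions, not the code in src/CohereQnA.py: word counts stand in for Cohere embeddings, a Python list stands in for the Chroma store, and the resulting prompt is what would be sent to the Cohere model. All function names are hypothetical, and the sample sentences are invented rather than taken from globalcorp_hr_policy.txt.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a Cohere embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_store(chunks: list[str]) -> list[tuple[Counter, str]]:
    """Embed every chunk and keep (vector, text) pairs in memory."""
    return [(embed(chunk), chunk) for chunk in chunks]


def retrieve(store: list[tuple[Counter, str]], question: str, k: int = 1) -> list[str]:
    """Return the k chunks most similar to the question."""
    query = embed(question)
    ranked = sorted(store, key=lambda item: cosine(query, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the text that a language model would receive."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Invented sample chunks, for illustration only.
chunks = [
    "Employees accrue 20 days of paid annual leave per year.",
    "Remote work requires written approval from a manager.",
    "Expense reports must be submitted within 30 days.",
]
question = "How many days of annual leave do employees get?"
store = build_store(chunks)
top = retrieve(store, question)
prompt = build_prompt(question, top)
```

Swapping embed for a real embedding call and the list for a vector database changes the quality of the ranking, not the shape of the flow.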

Configuration

Parameter          Value          Description
Parent chunk size  1000           Characters per parent chunk
Child chunk size   200            Characters per child chunk
Overlap            20             Characters overlapping between chunks
Model              command-light  Cohere model used
Temperature        0              Deterministic output setting
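
One way to keep these settings together is a small validated config object. The sketch below is an assumption about how the values could be organised, not a description of src/CohereQnA.py; the class name RagConfig and its field names are hypothetical, while the default values are the documented ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RagConfig:
    """Documented settings in one place; the class itself is hypothetical."""

    parent_chunk_size: int = 1000   # characters per parent chunk
    child_chunk_size: int = 200     # characters per child chunk
    chunk_overlap: int = 20         # characters shared by adjacent chunks
    model: str = "command-light"    # Cohere model name
    temperature: float = 0.0        # 0 minimises sampling randomness

    def __post_init__(self) -> None:
        # Reject combinations that would make the two-tier split meaningless.
        if not 0 <= self.chunk_overlap < self.child_chunk_size:
            raise ValueError("overlap must be >= 0 and smaller than the child chunk size")
        if self.child_chunk_size >= self.parent_chunk_size:
            raise ValueError("child chunks must be smaller than parent chunks")
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")


config = RagConfig()
```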

How It Works

Parent-Child Document Retrieval

The system uses a two-tier document splitting approach:

  • Child Splitter: Creates small, precise chunks for accurate matching
  • Parent Splitter: Maintains larger chunks for context preservation

Example scenario:

Document Structure:
├── Parent Chunk (up to 1000 chars, returned as context)
│   └── Child Chunks (up to 200 chars, used for matching)
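
The two-tier split can be sketched with fixed-width character windows. This is a simplification: LangChain's text splitters prefer to break at separators such as newlines, whereas this sketch cuts purely by character count. Applying the same 20-character overlap to both tiers is also an assumption, and the function names are hypothetical.

```python
def split_chars(text: str, size: int, overlap: int) -> list[str]:
    """Cut text into windows of at most `size` characters that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


def split_parent_child(
    text: str,
    parent_size: int = 1000,
    child_size: int = 200,
    overlap: int = 20,
) -> list[tuple[str, list[str]]]:
    """Return (parent, children) pairs: large chunks, each split again into small ones."""
    return [
        (parent, split_chars(parent, child_size, overlap))
        for parent in split_chars(text, parent_size, overlap)
    ]
```

With these defaults, a 2,500-character document yields three parent chunks containing six, six, and three child chunks respectively.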

Key-Value Document Store

The system uses a key-value document store to maintain relationships between document chunks and their sources. This is implemented using LangChain's storage system.

Features

  • Relationship Preservation: Maintains links between parent and child document chunks
  • Source Tracking: Associates chunks with their original source documents
  • Efficient Retrieval: Enables quick lookup of parent documents when child chunks are retrieved
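
A minimal sketch of this lookup, assuming each child chunk records its parent's ID in its metadata. LangChain's ParentDocumentRetriever links chunks this way under a "doc_id" key, but whether src/CohereQnA.py uses that class is an assumption. Here a plain dict stands in for the docstore and a substring match stands in for vector search. All names and sample sentences are hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def build_index(pairs: list[tuple[str, list[str]]], source: str):
    """Store parents by ID and tag every child with its parent's ID and source."""
    docstore: dict[str, Chunk] = {}
    children: list[Chunk] = []
    for parent_text, child_texts in pairs:
        parent_id = str(uuid.uuid4())
        docstore[parent_id] = Chunk(parent_text, {"source": source})
        for child_text in child_texts:
            children.append(Chunk(child_text, {"doc_id": parent_id, "source": source}))
    return docstore, children


def lookup_parents(docstore: dict[str, Chunk], children: list[Chunk], term: str) -> list[Chunk]:
    """Return the parents of all children containing `term`, deduplicated, in hit order."""
    parent_ids: list[str] = []
    for child in children:
        parent_id = child.metadata["doc_id"]
        if term.lower() in child.text.lower() and parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [docstore[parent_id] for parent_id in parent_ids]


# Invented sample text, for illustration only.
pairs = [
    (
        "Leave policy. Annual leave is 20 days. Sick leave is 10 days.",
        ["Annual leave is 20 days.", "Sick leave is 10 days."],
    ),
    ("Remote work policy. Approval is required.", ["Approval is required."]),
]
docstore, children = build_index(pairs, "data/globalcorp_hr_policy.txt")
hits = lookup_parents(docstore, children, "leave")
```

Both child chunks of the first parent match "leave", yet the parent is returned once: matching happens on the small chunks while the model receives the larger one.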

Benefits

  • Precise Answers: Find exact relevant passages
  • Context Preservation: Access broader context when needed
  • Balanced Retrieval: Optimal mix of specificity and context
