A production-ready Retrieval-Augmented Generation (RAG) system built with modern AI technologies.
- Extract text from PDF
- Create chunks with a given size/overlap (sketched after this list)
- Build vector store with batch processing
- Generate synthetic questions for chunks using an LLM, parallelized with a ThreadPoolExecutor (sketched after this list)
- Evaluate each configuration using retrieval metrics (also sketched after this list):
  - Recall@K
  - Precision@K
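The extraction and chunking steps can be pictured with a short sketch. The function names and default values below (`extract_pdf_text`, `chunk_text`, `chunk_size`, `overlap`) are illustrative assumptions, not the project's actual code; only the use of pdfplumber and the size/overlap strategy come from this README.

```python
# Illustrative sketch of PDF extraction and size/overlap chunking.
# Names and defaults are hypothetical; assumes overlap < chunk_size.
import pdfplumber


def extract_pdf_text(path: str) -> str:
    """Concatenate the text of every page in the PDF."""
    with pdfplumber.open(path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks; consecutive chunks share `overlap` characters."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start : start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Character-based chunking is the simplest variant; a token-based version works the same way with token counts in place of character offsets.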
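Question generation fans out over chunks with a ThreadPoolExecutor. A minimal sketch, assuming the `ollama` Python client and the `mistral:latest` model pulled during setup; the prompt and helper names are illustrative, not the pipeline's actual implementation.

```python
# Hypothetical sketch: generate one synthetic question per chunk, in parallel.
from concurrent.futures import ThreadPoolExecutor

import ollama


def question_for_chunk(chunk: str) -> str:
    """Ask the local Ollama model to write a question answerable from this chunk."""
    response = ollama.chat(
        model="mistral:latest",
        messages=[{
            "role": "user",
            "content": f"Write one question that can be answered using only this text:\n\n{chunk}",
        }],
    )
    return response["message"]["content"].strip()


def generate_questions(chunks: list[str], workers: int = 8) -> list[str]:
    """Run question generation concurrently; results keep the order of the input chunks."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(question_for_chunk, chunks))
```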
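For the evaluation step, if each synthetic question is tied to exactly one source chunk, Recall@K and Precision@K reduce to checking whether that chunk appears in the top-K retrieved results. The single-relevant-chunk assumption and the function names below are mine, shown only to make the metrics concrete.

```python
# Hypothetical sketch of Recall@K / Precision@K with one relevant chunk per question.
def recall_at_k(retrieved_ids: list[str], relevant_id: str, k: int) -> float:
    """1.0 if the source chunk appears in the top-k results, else 0.0."""
    return 1.0 if relevant_id in retrieved_ids[:k] else 0.0


def precision_at_k(retrieved_ids: list[str], relevant_id: str, k: int) -> float:
    """Fraction of the top-k results that are relevant (at most 1/k in this setting)."""
    hits = sum(1 for rid in retrieved_ids[:k] if rid == relevant_id)
    return hits / k


# Averaging these over all synthetic questions gives the per-configuration score.
```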
- Vector Database: Pinecone for scalable similarity search
- Embeddings: free, local Ollama embedding model (see the sketch after this list)
- LLM: local Ollama model for question generation
- PDF Processing: pdfplumber for document extraction
- Python: Modern async/await patterns with Poetry dependency management
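The embedding and storage layers combine the local Ollama `all-minilm` model with a Pinecone serverless index. A minimal sketch, assuming the `ollama` and `pinecone` Python clients; the index name, region, and sample chunks are placeholders, not values from the project.

```python
# Hypothetical sketch: embed chunks locally with Ollama and upsert them to Pinecone.
import os

import ollama
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index_name = "rag-chunks"  # placeholder index name

# all-minilm produces 384-dimensional embeddings.
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=384,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # placeholder region
    )
index = pc.Index(index_name)

chunks = ["first chunk of text", "second chunk of text"]
vectors = [
    {
        "id": f"chunk-{i}",
        "values": ollama.embeddings(model="all-minilm", prompt=chunk)["embedding"],
        "metadata": {"text": chunk},
    }
    for i, chunk in enumerate(chunks)
]
index.upsert(vectors=vectors)
```

In the real pipeline the upserts would be batched rather than sent in a single call, which is what the batch-processing step above refers to.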
End-to-End RAG Pipeline:
- Document Processing: PDF text extraction and intelligent chunking
- Vector Generation: embeddings with batch processing
- Vector Storage: Pinecone serverless for production scalability
- Question Generation: AI-powered question creation from content
- Retrieval Testing: Automated similarity search validation
- Multi-chunk Strategy: Configurable chunk sizes and overlap
- Batch Processing: Optimized for large document sets
- Type Safety: Pydantic models for data validation
- Production Ready: step support so the pipeline can resume from any step, error recovery, and monitoring with Logfire (a sketch of the step registry follows this list)
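Step support can be pictured as an ordered registry of step functions that a runner filters by name. This is only a sketch of the idea; the step names `questions`, `retrievers`, and `evaluate` come from the CLI examples below, while the rest are hypothetical.

```python
# Hypothetical sketch of resumable step support: run all steps, or only the ones requested.
from collections.abc import Callable

STEPS: dict[str, Callable[[], None]] = {
    "extract": lambda: print("extracting PDF text"),
    "chunks": lambda: print("creating chunks"),
    "vectors": lambda: print("building the vector store"),
    "questions": lambda: print("generating synthetic questions"),
    "retrievers": lambda: print("running retrieval tests"),
    "evaluate": lambda: print("computing Recall@K / Precision@K"),
}


def run_pipeline(steps: list[str] | None = None) -> None:
    """Run the requested steps in registry order; with no selection, run everything."""
    selected = steps or list(STEPS)
    for name, step in STEPS.items():
        if name in selected:
            step()


# Mirrors the CLI usage: everything, or only the tail of the pipeline.
run_pipeline()
run_pipeline(["retrievers", "evaluate"])
```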
- Set OPENAI_API_KEY, PINECONE_API_KEY, and LOGFIRE_API_KEY in your environment
- Download Ollama from https://ollama.com/
- Pull the all-minilm and mistral:latest models with Ollama
- poetry install && poetry run pipeline
You can run the pipeline against the baby book or the fy10 report. You can specify individual steps with --steps, or omit it to run all steps. Here are examples:
poetry run pipeline baby --steps questions
poetry run pipeline baby
poetry run pipeline fy10 --steps retrievers evaluate
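The commands above suggest a console entry point that takes a document name plus an optional list of steps. Below is a minimal sketch of what such an entry point could look like, assuming argparse; the project's actual CLI wiring may differ.

```python
# Hypothetical sketch of the `pipeline` console entry point.
import argparse


def main() -> None:
    parser = argparse.ArgumentParser(prog="pipeline")
    parser.add_argument("document", choices=["baby", "fy10"], help="which document to process")
    parser.add_argument("--steps", nargs="+", default=None, help="steps to run; omit to run all")
    args = parser.parse_args()
    # A real entry point would dispatch to the step registry here.
    print(f"running {args.steps or 'all steps'} on {args.document}")


if __name__ == "__main__":
    main()
```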
