A production-ready Retrieval-Augmented Generation (RAG) system built with modern AI technologies.
- Extract text from PDF
- Create chunks with a given size/overlap (sketched after this list)
- Build vector store with batch processing
- Generate synthetic questions for chunks using an LLM, parallelized with a ThreadPoolExecutor (sketched after this list)
- Evaluate each configuration using retrieval metrics (also sketched after this list):
  - Recall@K
  - Precision@K
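The extraction and chunking steps can be pictured with a short sketch. The function names and default values below (`extract_pdf_text`, `chunk_text`, `chunk_size`, `overlap`) are illustrative assumptions, not the project's actual code; only the use of pdfplumber and the size/overlap strategy come from this README.

```python
# Illustrative sketch of PDF extraction and size/overlap chunking.
# Names and defaults are hypothetical; assumes overlap < chunk_size.
import pdfplumber


def extract_pdf_text(path: str) -> str:
    """Concatenate the text of every page in the PDF."""
    with pdfplumber.open(path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks; consecutive chunks share `overlap` characters."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start : start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Character-based chunking is the simplest variant; a token-based version works the same way with token counts in place of character offsets.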
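Question generation fans out over chunks with a ThreadPoolExecutor. A minimal sketch, assuming the `ollama` Python client and the `mistral:latest` model pulled during setup; the prompt and helper names are illustrative, not the pipeline's actual implementation.

```python
# Hypothetical sketch: generate one synthetic question per chunk, in parallel.
from concurrent.futures import ThreadPoolExecutor

import ollama


def question_for_chunk(chunk: str) -> str:
    """Ask the local Ollama model to write a question answerable from this chunk."""
    response = ollama.chat(
        model="mistral:latest",
        messages=[{
            "role": "user",
            "content": f"Write one question that can be answered using only this text:\n\n{chunk}",
        }],
    )
    return response["message"]["content"].strip()


def generate_questions(chunks: list[str], workers: int = 8) -> list[str]:
    """Run question generation concurrently; results keep the order of the input chunks."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(question_for_chunk, chunks))
```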
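For the evaluation step, if each synthetic question is tied to exactly one source chunk, Recall@K and Precision@K reduce to checking whether that chunk appears in the top-K retrieved results. The single-relevant-chunk assumption and the function names below are mine, shown only to make the metrics concrete.

```python
# Hypothetical sketch of Recall@K / Precision@K with one relevant chunk per question.
def recall_at_k(retrieved_ids: list[str], relevant_id: str, k: int) -> float:
    """1.0 if the source chunk appears in the top-k results, else 0.0."""
    return 1.0 if relevant_id in retrieved_ids[:k] else 0.0


def precision_at_k(retrieved_ids: list[str], relevant_id: str, k: int) -> float:
    """Fraction of the top-k results that are relevant (at most 1/k in this setting)."""
    hits = sum(1 for rid in retrieved_ids[:k] if rid == relevant_id)
    return hits / k


# Averaging these over all synthetic questions gives the per-configuration score.
```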
- Vector Database: Pinecone for scalable similarity search
- Embeddings: free, local Ollama embedding model (see the sketch after this list)
- LLM: local Ollama model for question generation
- PDF Processing: pdfplumber for document extraction
- Python: Modern async/await patterns with Poetry dependency management
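The embedding and storage layers combine the local Ollama `all-minilm` model with a Pinecone serverless index. A minimal sketch, assuming the `ollama` and `pinecone` Python clients; the index name, region, and sample chunks are placeholders, not values from the project.

```python
# Hypothetical sketch: embed chunks locally with Ollama and upsert them to Pinecone.
import os

import ollama
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index_name = "rag-chunks"  # placeholder index name

# all-minilm produces 384-dimensional embeddings.
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=384,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # placeholder region
    )
index = pc.Index(index_name)

chunks = ["first chunk of text", "second chunk of text"]
vectors = [
    {
        "id": f"chunk-{i}",
        "values": ollama.embeddings(model="all-minilm", prompt=chunk)["embedding"],
        "metadata": {"text": chunk},
    }
    for i, chunk in enumerate(chunks)
]
index.upsert(vectors=vectors)
```

In the real pipeline the upserts would be batched rather than sent in a single call, which is what the batch-processing step above refers to.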
End-to-End RAG Pipeline:
- Document Processing: PDF text extraction and intelligent chunking
- Vector Generation: embeddings with batch processing
- Vector Storage: Pinecone serverless for production scalability
- Question Generation: AI-powered question creation from content
- Retrieval Testing: Automated similarity search validation
- Multi-chunk Strategy: Configurable chunk sizes and overlap
- Batch Processing: Optimized for large document sets
- Type Safety: Pydantic models for data validation
- Production Ready: step support so the pipeline can resume from any step, error recovery, and monitoring with Logfire (a sketch of the step registry follows this list)
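Step support can be pictured as an ordered registry of step functions that a runner filters by name. This is only a sketch of the idea; the step names `questions`, `retrievers`, and `evaluate` come from the CLI examples below, while the rest are hypothetical.

```python
# Hypothetical sketch of resumable step support: run all steps, or only the ones requested.
from collections.abc import Callable

STEPS: dict[str, Callable[[], None]] = {
    "extract": lambda: print("extracting PDF text"),
    "chunks": lambda: print("creating chunks"),
    "vectors": lambda: print("building the vector store"),
    "questions": lambda: print("generating synthetic questions"),
    "retrievers": lambda: print("running retrieval tests"),
    "evaluate": lambda: print("computing Recall@K / Precision@K"),
}


def run_pipeline(steps: list[str] | None = None) -> None:
    """Run the requested steps in registry order; with no selection, run everything."""
    selected = steps or list(STEPS)
    for name, step in STEPS.items():
        if name in selected:
            step()


# Mirrors the CLI usage: everything, or only the tail of the pipeline.
run_pipeline()
run_pipeline(["retrievers", "evaluate"])
```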
- Set OPENAI_API_KEY, PINECONE_API_KEY, and LOGFIRE_API_KEY in your environment
- Download Ollama from https://ollama.com/
- Pull the all-minilm and mistral:latest models with Ollama
- poetry install && poetry run pipeline
You can run the pipeline against the baby book or the fy10 report. You can specify individual steps with --steps, or omit it to run all steps. Here are examples:
poetry run pipeline baby --steps questions
poetry run pipeline baby
poetry run pipeline fy10 --steps retrievers evaluate
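The commands above suggest a console entry point that takes a document name plus an optional list of steps. Below is a minimal sketch of what such an entry point could look like, assuming argparse; the project's actual CLI wiring may differ.

```python
# Hypothetical sketch of the `pipeline` console entry point.
import argparse


def main() -> None:
    parser = argparse.ArgumentParser(prog="pipeline")
    parser.add_argument("document", choices=["baby", "fy10"], help="which document to process")
    parser.add_argument("--steps", nargs="+", default=None, help="steps to run; omit to run all")
    args = parser.parse_args()
    # A real entry point would dispatch to the step registry here.
    print(f"running {args.steps or 'all steps'} on {args.document}")


if __name__ == "__main__":
    main()
```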
