This application lets users upload PDF files and ask questions about their contents in real time. It uses an LLM (large language model) to extract relevant information from the uploaded PDFs and a WebSocket connection to provide a chat-like interface.
│   .dockerignore # Specifies files and directories to ignore in Docker build.
│   .gitignore # Lists files and directories ignored by Git.
│   app.py # Main application entry (runs FastAPI app).
│   Dockerfile # Dockerfile to containerize the application.
│   LICENSE # License for the project.
│   main.py # Application setup, including router and configuration loading.
│   poetry.lock # Lock file for dependencies (generated by Poetry).
│   pyproject.toml # Project metadata and dependencies (for Poetry).
│   README.md # Documentation for the project.
│   requirements.txt # List of dependencies (if using pip instead of Poetry).
│   template.py # Template for setting up certain configurations.
│
├───.github
│   └───workflows
│           action.yml # CI/CD GitHub Actions workflow for automated testing and deployment.
│
├───notebooks
│       01_experiment.ipynb # Jupyter notebook for experimentation and prototyping.
│
├───pdf_data
│       aml_industry_perspective.pdf # Example PDF file for testing.
│
├───pdf_question_answering
│   │   __init__.py # Initialization file for the main package.
│   │
│   ├───constants
│   │       __init__.py # Directory for storing constants.
│   │
│   ├───db
│   │       config.py # Database configuration settings.
│   │       database.py # Database setup and session management.
│   │       models.py # Database models.
│   │       schemas.py # Pydantic schemas for data validation.
│   │       storage.py # Storage utilities, e.g., for file uploads.
│   │       __init__.py # Initialization file for the db module.
│   │
│   ├───exception
│   │       __init__.py # Module for handling custom exceptions.
│   │
│   ├───llm
│   │       embeddings.py # Module to manage embeddings for PDFs.
│   │       llm.py # Main class for interacting with the Language Model.
│   │       llm_service.py # Service layer for Language Model operations.
│   │       vector_db.py # Vector database for storing and retrieving embeddings.
│   │       __init__.py # Initialization file for the LLM module.
│   │
│   ├───logger
│   │       __init__.py # Logger setup for tracking application logs.
│   │
│   ├───routers
│   │       routes.py # FastAPI route definitions for PDF upload and WebSocket handling.
│   │       __init__.py # Initialization file for routers module.
│   │
│   ├───utils
│   │       read_pdf.py # Utility functions for reading and processing PDF files.
│   │       __init__.py # Initialization file for utils module.
│   │
│   └───websocket
│           websockets.py # WebSocket handling logic.
│           __init__.py # Initialization file for websocket module.
│
└───tests
        test_app.py # Tests for application routes and functionalities.
        __init__.py # Initialization file for tests module.
- app.py: Runs the FastAPI application and handles startup configurations.
- Dockerfile: Defines the Docker container setup.
- main.py: Configures routes, WebSocket setup, and other app components.
- config.py: Defines database configuration.
- database.py: Database session management.
- models.py: Defines database schema for PDFs.
- schemas.py: Pydantic models for request/response validation.
- embeddings.py: Manages embeddings for document data.
- llm.py: Interface for interacting with the Language Model.
- llm_service.py: Service layer to handle LLM operations.
- vector_db.py: Manages the vector database for quick similarity searches.
- routes.py: Contains FastAPI routes for PDF upload and WebSocket endpoint.
- websockets.py: Logic for handling WebSocket events.
- read_pdf.py: Functions to read and split PDF text.
- logger/__init__.py: Setup for logging.
- constants/__init__.py: Placeholder for application-wide constants.
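
The roles of read_pdf.py, embeddings.py, and vector_db.py listed above (split the PDF text, embed it, search by similarity) can be illustrated with a minimal, dependency-free sketch. This is not the project's actual code: the function names `split_text`, `embed`, `cosine`, and `top_k` are hypothetical, and a simple word-count vector stands in for a real embedding model and vector database.

```python
import math
from collections import Counter


def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts, not a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query, embed(c)), reverse=True)
    return ranked[:k]
```

In a pipeline of this shape, the chunks returned by `top_k` would typically be passed to the language model as context for answering the user's question. The overlap between chunks reduces the chance that a relevant sentence is cut in half at a chunk boundary.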
- Python 3.11.2
- Docker (Optional)
- Poetry (for dependency management)
- Clone the repository:
    git clone https://github.com/yourusername/pdf-question-answering.git
    cd pdf-question-answering
- Create the Conda environment:
    conda create --prefix .venv python=3.11.2 -y
- Activate the Conda environment.
  Using a terminal on Windows:
    conda activate ./.venv
  Using Git Bash on Windows:
    source activate ./.venv
- Install dependencies:
    pip install -r requirements.txt
- Run the application locally:
    python app.py
- Upload a PDF: Use the /upload-pdf route to upload your PDF file. This route saves the PDF and processes its content.
- Ask questions in real time (chat): Use the /chat route to obtain a URL for chatting about the uploaded PDF.
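
As a rough illustration of the upload step, the sketch below posts a PDF to /upload-pdf using only the Python standard library. Several details are assumptions not confirmed by this README: that the route accepts a POST with a multipart/form-data body, that the form field is named `file`, and the helper names `build_multipart` and `upload_pdf` themselves. Check routes.py for the actual contract.

```python
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, payload: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def upload_pdf(base_url: str, pdf_path: str, field: str = "file") -> bytes:
    """POST the PDF to /upload-pdf and return the raw response body.

    The HTTP method and the field name "file" are assumptions.
    """
    path = Path(pdf_path)
    body, content_type = build_multipart(field, path.name, path.read_bytes())
    request = urllib.request.Request(
        f"{base_url}/upload-pdf",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With the server running, a call such as `upload_pdf("http://localhost:8000", "pdf_data/aml_industry_perspective.pdf")` would send the bundled example PDF; the host and port here are also assumptions and depend on how app.py starts the server.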
Run tests with:
    pytest tests/test_app.py
- GitHub Actions: This repository includes an automated workflow (.github/workflows/action.yml) that runs tests on each push or pull request to the main or development branch.
