A comprehensive PDF processing system with Ollama integration for document Q&A, built with FastAPI, SQLModel, PostgreSQL, and Docker.
- PDF Upload & Processing: Upload PDF files and extract text content automatically
- Document Management: Store, retrieve, and manage PDF documents with metadata
- Semantic Search: Search through document content using intelligent text matching
- Document Q&A: Chat with your documents using Ollama language models
- Background Processing: Asynchronous PDF processing with status tracking
- REST API: Complete RESTful API with OpenAPI documentation
- Docker Support: Containerized deployment with PostgreSQL and pgAdmin
- FastAPI: Modern Python web framework for building APIs
- SQLModel: Type-safe database models with Pydantic integration
- PostgreSQL: Robust relational database for document storage
- Ollama: Local LLM integration for document Q&A
- Docker: Containerized deployment and development
- Python 3.11+
- Docker & Docker Compose
- Ollama (running locally)
First, install and start Ollama on your system:
```bash
# Install Ollama (visit https://ollama.ai for platform-specific instructions)

# Pull a model (e.g., llama2)
ollama pull llama2
```
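To confirm Ollama is reachable before going further, you can hit its HTTP API directly. The sketch below uses the `requests` package and assumes the default base URL (the same value used for OLLAMA_BASE_URL in the config later on); Ollama's `/api/tags` endpoint lists the models you have pulled locally.

```python
import requests

# Default Ollama base URL; matches OLLAMA_BASE_URL in the sample .env below
OLLAMA_BASE_URL = "http://localhost:11434"

# /api/tags returns the models available to the local Ollama instance
resp = requests.get(f"{OLLAMA_BASE_URL}/api/tags", timeout=5)
resp.raise_for_status()

models = [m["name"] for m in resp.json().get("models", [])]
print("Available Ollama models:", models)
```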
```bash
# Clone the repository
git clone <your-repo-url>
cd ollama-pdf-processor

# Create uploads directory
mkdir uploads
```
Copy and modify the .env file if needed:
```env
# The default configuration should work for most setups
# Modify OLLAMA_MODEL if you want to use a different model

# Sample Config:

# Database Configuration
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/postgres
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=postgres

# pgAdmin Configuration
PGADMIN_DEFAULT_EMAIL=admin@admin.com
PGADMIN_DEFAULT_PASSWORD=admin123

# Application Configuration
APP_NAME=Ollama PDF Processor
APP_VERSION=1.0.0
DEBUG=True
HOST=0.0.0.0
PORT=8000

# Ollama Configuration
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2-vision

# File Upload Configuration
MAX_FILE_SIZE_MB=50
UPLOAD_DIR=./uploads
ALLOWED_FILE_TYPES=pdf
```
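For reference, here is a minimal sketch of how these variables could be loaded into typed settings with `pydantic-settings`. The class and field names are only illustrative; the project's actual config module may look different.

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Illustrative settings class mirroring the sample .env above."""

    # Read values from .env; ignore any variables not declared here
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    database_url: str = "postgresql://postgres:postgres@localhost:5432/postgres"
    ollama_base_url: str = "http://localhost:11434"
    ollama_model: str = "llama3.2-vision"
    max_file_size_mb: int = 50
    upload_dir: str = "./uploads"


settings = Settings()
print(settings.ollama_model)
```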
```bash
# Start all services
docker-compose up -d

# Check service status
docker-compose ps
```
This will start:
- PostgreSQL on port 5432
- pgAdmin on port 5050 (admin@admin.com / admin123)
- FastAPI on port 8000
Visit these URLs to verify everything is working:
- API Documentation: http://localhost:8000/docs
- Health Check: http://localhost:8000/health
- pgAdmin: http://localhost:5050
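You can also script this check. The snippet below calls the documented `/health` endpoint and simply prints the returned JSON, since the exact response fields are not spelled out here.

```python
import requests

# Quick smoke test against the running API
resp = requests.get("http://localhost:8000/health", timeout=10)
print("HTTP status:", resp.status_code)
print("Body:", resp.json())
```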
📋 For detailed PDF upload instructions, see PDF_UPLOAD_GUIDE.md
Method 1: Web Interface (Easiest)
- Visit http://localhost:8000/docs
- Find `POST /documents/upload`
- Click "Try it out" and choose your PDF
- Click "Execute"
Method 2: Command Line
curl -X POST "http://localhost:8000/documents/upload" \\
-H "accept: application/json" \\
-H "Content-Type: multipart/form-data" \\
-F "[email protected]"Method 3: PowerShell (Windows)
$uri = "http://localhost:8000/documents/upload"
$filePath = "C:\\path\\to\\your\\document.pdf"
Invoke-RestMethod -Uri $uri -Method Post -Form @{ file = Get-Item -Path $filePath }Method 4: Upload Scripts (Windows)
For convenience, we've included upload scripts:
```bash
# Simple upload
upload-pdf.bat "path\to\your\document.pdf"

# Upload and wait for processing
upload-pdf.bat "path\to\your\document.pdf" -wait
```
Or use the PowerShell script directly:
```powershell
.\upload-pdf.ps1 -PdfPath "path\to\your\document.pdf" -WaitForProcessing
```
List all documents:
```bash
curl -X GET "http://localhost:8000/documents"
```
Search document content:
```bash
curl -X POST "http://localhost:8000/search" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "your search query",
    "limit": 10
  }'
```
Chat with your documents:
```bash
curl -X POST "http://localhost:8000/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What is this document about?",
    "context_limit": 5
  }'
```
Document endpoints:
- `POST /documents/upload` - Upload a PDF file
- `GET /documents` - List all documents
- `GET /documents/{id}` - Get a specific document
- `DELETE /documents/{id}` - Delete a document
- `GET /documents/{id}/chunks` - Get document chunks
Search and chat endpoints:
- `POST /search` - Semantic search across documents
- `POST /chat` - Chat with documents using Ollama
System endpoints:
- `GET /health` - Health check and system status
- `GET /ollama/models` - List available Ollama models
- `POST /ollama/pull/{model}` - Pull a new Ollama model
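If you prefer Python over curl, the sketch below exercises the same endpoints end to end: upload a PDF, optionally wait for background processing, list documents, search, and chat. The response field names (such as `id` and `status`) and the polling loop are assumptions, since the response schemas are not documented in this README.

```python
import time

import requests

BASE_URL = "http://localhost:8000"

# 1. Upload a PDF (the multipart form field is named "file", as in the curl example)
with open("document.pdf", "rb") as f:
    upload = requests.post(f"{BASE_URL}/documents/upload", files={"file": f})
upload.raise_for_status()
doc = upload.json()
print("Uploaded:", doc)

# 2. Optionally poll until background processing finishes
#    (assumes the document response exposes "id" and "status" fields)
doc_id = doc.get("id")
if doc_id is not None:
    for _ in range(30):
        current = requests.get(f"{BASE_URL}/documents/{doc_id}").json()
        if current.get("status") not in ("pending", "processing"):
            break
        time.sleep(2)

# 3. List all documents
print(requests.get(f"{BASE_URL}/documents").json())

# 4. Semantic search across documents
search = requests.post(
    f"{BASE_URL}/search",
    json={"query": "your search query", "limit": 10},
)
print(search.json())

# 5. Chat with the documents via Ollama
chat = requests.post(
    f"{BASE_URL}/chat",
    json={"message": "What is this document about?", "context_limit": 5},
)
print(chat.json())
```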
- Install dependencies:
```bash
pip install -r requirements.txt
```
- Set up PostgreSQL locally or use Docker:
```bash
docker run --name postgres -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:15
```
- Run the application:
```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
The application uses SQLModel with PostgreSQL. Database tables are created automatically on startup.
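The automatic table creation typically boils down to a single SQLModel call at startup. Here is a minimal sketch of that pattern; the real wiring in `app/main.py` may differ.

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI
from sqlmodel import SQLModel, create_engine

# Illustrative only: the real app builds its engine from DATABASE_URL in .env
engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Create any tables defined by SQLModel models before serving requests
    SQLModel.metadata.create_all(engine)
    yield


app = FastAPI(lifespan=lifespan)
```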
To access the database:
- pgAdmin: http://localhost:5050
- Direct connection: postgresql://postgres:postgres@localhost:5432/postgres
Application logs are available in the container:
```bash
docker-compose logs api
```
Key environment variables in .env:
- `DATABASE_URL`: PostgreSQL connection string
- `OLLAMA_BASE_URL`: Ollama service URL
- `OLLAMA_MODEL`: Default model for document Q&A
- `MAX_FILE_SIZE_MB`: Maximum PDF file size
- `CHUNK_SIZE`: Text chunk size for processing
- `CHUNK_OVERLAP`: Overlap between text chunks
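`CHUNK_SIZE` and `CHUNK_OVERLAP` control how extracted text is split before it is stored and searched. The sketch below shows how overlapping chunking typically works; the default values and the character-based splitting are only illustrative, and the app's actual splitter may differ.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks where each chunk overlaps the previous one."""
    chunks = []
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks


# Example: 2,500 characters with chunk_size=1000 and chunk_overlap=200
# produces chunks starting at offsets 0, 800, 1600, and 2400
print(len(chunk_text("x" * 2500)))  # -> 4
```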
Ollama connection failed:
- Ensure Ollama is running: `ollama serve`
- Check if the model is available: `ollama list`
- Pull the required model: `ollama pull llama2`

Database connection error:
- Verify PostgreSQL is running: `docker-compose ps`
- Check database logs: `docker-compose logs postgres`

PDF processing fails:
- Check file size limits
- Verify the PDF is not password-protected
- Check application logs: `docker-compose logs api`
Use the health endpoint to diagnose issues:
```bash
curl http://localhost:8000/health
```
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
For issues and questions:
- Check the troubleshooting section
- Review the API documentation at /docs
- Check application logs
- Open an issue on the repository