A minimal Flask-based microservice that performs OCR on uploaded images using Tesseract via pytesseract. It supports French and English, with Basic Auth protection and a modern web interface.
- Upload an image and get extracted text as JSON
- Modern, responsive UI at /, /fr, or /en with drag-and-drop support
- Language selection via URL: en (default) and fr
- File size validation (10MB max)
- MIME type validation for security
- Health check endpoint for monitoring
- Structured logging
- Python 3.12+
- Tesseract OCR installed locally (or use pre_build_script.sh)
- Clone the repository
- Install dependencies with uv (recommended)
pip install uv
uv sync
- Set up Tesseract (if not already installed)
On macOS:
brew install tesseract tesseract-lang
On Ubuntu/Debian:
sudo apt-get install tesseract-ocr tesseract-ocr-eng tesseract-ocr-fra
Or run the pre-build script:
bash pre_build_script.sh
export PATH=$PATH:$(pwd)
export TESSDATA_DIR=$(pwd)/tessdata
- Set environment variables
export BASIC_AUTH_USERNAME="admin"
export BASIC_AUTH_PASSWORD="secret"
export FLASK_DEBUG="True"  # Optional: enable debug mode
export TESSDATA_DIR="/path/to/tessdata"  # Optional: custom tessdata location
- Run the application
uv run main.py
- Access the application
  - Open browser: http://localhost:9000
  - Login with credentials: admin / secret
Test the OCR endpoint with curl:
Note: Replace localhost:9000 with your Clever Cloud application URL when testing a deployment. Use https only if HTTPS is enabled or forced in Clever Cloud; for a local run, use http://localhost:9000.
# English (default)
curl -u "admin:secret" \
-F file=@test_en.jpg \
https://localhost:9000/ocr
# French
curl -u "admin:secret" \
-F file=@test_fr.jpg \
https://localhost:9000/ocr/fr
# Health check (no auth required)
curl https://localhost:9000/health
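The same requests can be made from Python. The following is a minimal sketch using only the standard library; the function names (basic_auth_header, build_multipart, ocr_image) are hypothetical helpers introduced for illustration and are not part of this project.

```python
import base64
import json
import mimetypes
import os
import urllib.request
import uuid


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Auth Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data and return (body, content_type)."""
    boundary = uuid.uuid4().hex
    # Guess the part's MIME type from the file name; fall back to a generic type.
    part_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {part_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def ocr_image(base_url: str, path: str, username: str, password: str, lang: str = "en") -> str:
    """POST an image to /ocr (or /ocr/fr) and return the extracted text."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart("file", os.path.basename(path), fh.read())
    endpoint = f"{base_url}/ocr" if lang == "en" else f"{base_url}/ocr/{lang}"
    request = urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Authorization": basic_auth_header(username, password),
            "Content-Type": content_type,
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["content"]
```

For a local run, a call such as ocr_image("http://localhost:9000", "test_en.jpg", "admin", "secret") would be the counterpart of the first curl command.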
All endpoints except /health require Basic Auth.
GET /health: Health check endpoint for monitoring.
Response:
{
"status": "healthy",
"languages": {
"eng": true,
"fra": true
}
}
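A response of this shape could be assembled as in the sketch below. The helper name health_payload and the "unhealthy" status value are assumptions made for illustration; only the "healthy" case is documented above.

```python
REQUIRED_LANGUAGES = ("eng", "fra")


def health_payload(installed_languages) -> dict:
    """Build a health response from the list of installed Tesseract languages."""
    languages = {code: code in installed_languages for code in REQUIRED_LANGUAGES}
    # Assumption: the service counts as healthy only when every required language is present.
    status = "healthy" if all(languages.values()) else "unhealthy"
    return {"status": status, "languages": languages}


# In a running service, the list might come from pytesseract.get_languages();
# a literal list is used here so the sketch runs without Tesseract installed.
print(health_payload(["eng", "fra", "osd"]))
```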
GET /, /fr, or /en: Web interface with file upload and language selector.
POST /ocr or /ocr/<lang_short>: Process an uploaded image and extract its text.
Parameters:
- lang_short: en or fr (defaults to en when omitted)
- Form-data: field name file with the image file
Supported formats: PNG, JPG, JPEG, GIF, BMP, TIFF, WEBP (max 10MB)
Response:
{
"content": "Extracted text from the image..."
}
Error responses:
{
"error": "No file provided"
}
Variable | Required | Default | Description |
---|---|---|---|
BASIC_AUTH_USERNAME | Yes | - | Username for Basic Auth |
BASIC_AUTH_PASSWORD | Yes | - | Password for Basic Auth |
TESSDATA_DIR | No | ./tessdata | Path to Tesseract language data |
PORT | No | 9000 | Server port |
HOST | No | 0.0.0.0 | Server host |
FLASK_DEBUG | No | False | Enable debug mode (True/False) |
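As an illustration of how these variables and defaults fit together, the sketch below reads them into a small settings object. The Settings class and load_settings function are hypothetical and do not describe how main.py actually loads its configuration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    username: str
    password: str
    tessdata_dir: str
    port: int
    host: str
    debug: bool


def load_settings(env=None) -> Settings:
    """Read the service configuration from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    required = ("BASIC_AUTH_USERNAME", "BASIC_AUTH_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required variables: {', '.join(missing)}")
    return Settings(
        username=env["BASIC_AUTH_USERNAME"],
        password=env["BASIC_AUTH_PASSWORD"],
        tessdata_dir=env.get("TESSDATA_DIR", "./tessdata"),
        port=int(env.get("PORT", "9000")),
        host=env.get("HOST", "0.0.0.0"),
        # Assumption: any casing of "true" enables debug mode.
        debug=env.get("FLASK_DEBUG", "False").strip().lower() == "true",
    )
```

Failing fast on the two required variables keeps the service from starting without authentication configured.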
- Login (if not already logged in):
clever login
- Create a Python application:
clever create mini_ocr --type python --region par --org [ORGANISATION ID]
- Set up environment variables:
For uv projects on Clever Cloud, you must configure the following:
clever env set CC_PYTHON_VERSION "3.13"
clever env set CC_RUN_COMMAND "uv run gunicorn --bind 0.0.0.0:9000 --workers 2 --timeout 120 main:app"
We also need a variable that runs the pre-build script through a pre-build hook:
clever env set CC_PRE_BUILD_HOOK "source pre_build_script.sh"
To protect the application, we use Basic Auth:
clever env set BASIC_AUTH_USERNAME "admin"
clever env set BASIC_AUTH_PASSWORD "secret"
Set up a dedicated health check endpoint:
clever env set CC_HEALTH_CHECK_PATH "/health"
- Deploy:
clever deploy