A minimal Flask-based microservice that performs OCR on uploaded images using Tesseract via `pytesseract`. It supports French and English with Basic Auth protection and a web interface.

CleverCloud/demo-python-uv-mini-ocr

OCR Mini SaaS

A minimal Flask-based microservice that performs OCR on uploaded images using Tesseract via pytesseract. It supports French and English with Basic Auth protection and a modern web interface.

Features

  • Upload an image and get extracted text as JSON
  • Modern, responsive UI at /, /fr, or /en with drag-and-drop support
  • Language selection via URL: en (default) and fr
  • File size validation (10MB max)
  • MIME type validation for security
  • Health check endpoint for monitoring
  • Structured logging

Local deployment

Prerequisites

  • Python 3.12+
  • Tesseract OCR installed locally (or use pre_build_script.sh)

Setup

  1. Clone the repository

  2. Install dependencies with uv (recommended)

    pip install uv
    uv sync
  3. Set up Tesseract (if not already installed)

    On macOS:

    brew install tesseract tesseract-lang

    On Ubuntu/Debian:

    sudo apt-get install tesseract-ocr tesseract-ocr-eng tesseract-ocr-fra

    Or run the pre-build script:

    bash pre_build_script.sh
    export PATH=$PATH:$(pwd)
    export TESSDATA_DIR=$(pwd)/tessdata
  4. Set environment variables

    export BASIC_AUTH_USERNAME="admin"
    export BASIC_AUTH_PASSWORD="secret"
    export FLASK_DEBUG="True"  # Optional: enable debug mode
    export TESSDATA_DIR="/path/to/tessdata"  # Optional: custom tessdata location
  5. Run the application

    uv run main.py
  6. Access the application at http://localhost:9000 (the default port)

Testing

Test the OCR endpoint with curl:

Note: The examples below target a local instance over plain HTTP. To test a deployed instance, replace localhost:9000 with your Clever Cloud application URL, and use https if you force HTTPS on Clever Cloud.

# English (default)
curl -u "admin:secret" \
  -F file=@test_en.jpg \
  http://localhost:9000/ocr

# French
curl -u "admin:secret" \
  -F file=@test_fr.jpg \
  http://localhost:9000/ocr/fr

# Health check (no auth required)
curl http://localhost:9000/health
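
The same upload can be prepared from Python. The sketch below uses only the standard library and builds the multipart body by hand; the helper name `build_ocr_request` and its parameters are illustrative and not part of the project. It only constructs the request, so it can be inspected before anything is sent.

```python
import base64
import mimetypes
import urllib.request
import uuid


def build_ocr_request(base_url, username, password, filename, data, lang=None):
    """Build (but do not send) a POST request for /ocr or /ocr/<lang>."""
    boundary = uuid.uuid4().hex
    # Guess the part's content type from the file name; fall back to a generic type.
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    # A single multipart part named "file", matching the documented form field.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    path = "/ocr" if lang is None else f"/ocr/{lang}"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Example (requires a running instance and a local image file):
# with open("test_fr.jpg", "rb") as fh:
#     req = build_ocr_request("http://localhost:9000", "admin", "secret",
#                             "test_fr.jpg", fh.read(), lang="fr")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```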

API

All endpoints except /health require Basic Auth.
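
How the service verifies credentials is not shown in this README. As a rough sketch of one possible approach, the hypothetical helper `check_basic_auth` below decodes an `Authorization` header and compares it against the expected username and password (in the service these would come from `BASIC_AUTH_USERNAME` and `BASIC_AUTH_PASSWORD`).

```python
import base64
import binascii
import hmac


def check_basic_auth(header, expected_user, expected_password):
    """Return True if an Authorization header carries the expected credentials."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):], validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    user_ok = hmac.compare_digest(user.encode(), expected_user.encode())
    pass_ok = hmac.compare_digest(password.encode(), expected_password.encode())
    return user_ok and pass_ok
```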

GET /health

Health check endpoint for monitoring.

Response:

{
  "status": "healthy",
  "languages": {
    "eng": true,
    "fra": true
  }
}
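
The README does not say how the `languages` flags are computed. One plausible approach, sketched here as an assumption, is to check whether the Tesseract trained data file for each language (`<lang>.traineddata`) exists in the tessdata directory. The function names are hypothetical, and the real endpoint may report other states than "healthy".

```python
from pathlib import Path


def language_status(tessdata_dir, languages=("eng", "fra")):
    """Map each Tesseract language code to whether its trained data file exists."""
    root = Path(tessdata_dir)
    return {lang: (root / f"{lang}.traineddata").is_file() for lang in languages}


def health_payload(tessdata_dir):
    """Build a response body shaped like the documented /health output."""
    return {"status": "healthy", "languages": language_status(tessdata_dir)}
```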

GET /, /fr, /en

Web interface with file upload and language selector.

POST /ocr or /ocr/<lang_short>

Process uploaded image and extract text.

Parameters:

  • lang_short: en or fr (defaults to en when omitted)
  • Form-data: field name file with the image file
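
The short codes in the URL differ from the Tesseract language codes shown in the health check (`eng`, `fra`), so some mapping is needed. A minimal sketch of such a mapping follows; the names `LANGUAGES` and `resolve_language` are illustrative, and how the project handles unknown codes is not documented here.

```python
# URL short code -> Tesseract language code.
LANGUAGES = {"en": "eng", "fr": "fra"}


def resolve_language(lang_short=None):
    """Return the Tesseract code for a URL short code, defaulting to English."""
    if lang_short is None:
        return LANGUAGES["en"]
    try:
        return LANGUAGES[lang_short.lower()]
    except KeyError:
        raise ValueError(f"Unsupported language: {lang_short!r}") from None


# The resolved code could then be passed to pytesseract, for example:
# text = pytesseract.image_to_string(image, lang=resolve_language("fr"))
```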

Supported formats: PNG, JPG, JPEG, GIF, BMP, TIFF, WEBP (max 10MB)

Response:

{
  "content": "Extracted text from the image..."
}

Error responses:

{
  "error": "No file provided"
}
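
To illustrate how the documented limits could map to error payloads, here is a hedged sketch. It checks the file extension rather than the MIME type, assumes 10MB means 10 × 1024 × 1024 bytes, and invents every error message except "No file provided"; the function name `validate_upload` is hypothetical.

```python
MAX_BYTES = 10 * 1024 * 1024  # assumed interpretation of the 10MB limit
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg", "gif", "bmp", "tiff", "webp"}


def validate_upload(filename, data):
    """Return an error payload dict, or None when the upload looks acceptable."""
    if not filename or data is None:
        return {"error": "No file provided"}
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if extension not in ALLOWED_EXTENSIONS:
        return {"error": "Unsupported file type"}  # hypothetical message
    if len(data) > MAX_BYTES:
        return {"error": "File too large"}  # hypothetical message
    return None
```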

Environment Variables

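| Variable            | Required | Default    | Description                     |
|---------------------|----------|------------|---------------------------------|
| BASIC_AUTH_USERNAME | Yes      | -          | Username for Basic Auth         |
| BASIC_AUTH_PASSWORD | Yes      | -          | Password for Basic Auth         |
| TESSDATA_DIR        | No       | ./tessdata | Path to Tesseract language data |
| PORT                | No       | 9000       | Server port                     |
| HOST                | No       | 0.0.0.0    | Server host                     |
| FLASK_DEBUG         | No       | False      | Enable debug mode (True/False)  |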
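
As an illustration of how these variables and defaults might be read at startup, the sketch below loads them into a dictionary and fails early when a required one is missing. The function name `load_settings`, the returned keys, and the case-insensitive parsing of `FLASK_DEBUG` are assumptions, not the project's actual code.

```python
import os


def load_settings(env=None):
    """Read the documented variables, applying the documented defaults."""
    env = os.environ if env is None else env
    required = ("BASIC_AUTH_USERNAME", "BASIC_AUTH_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return {
        "username": env["BASIC_AUTH_USERNAME"],
        "password": env["BASIC_AUTH_PASSWORD"],
        "tessdata_dir": env.get("TESSDATA_DIR", "./tessdata"),
        "port": int(env.get("PORT", "9000")),
        "host": env.get("HOST", "0.0.0.0"),
        # Assumed parsing: only the string "true" (any case) enables debug mode.
        "debug": env.get("FLASK_DEBUG", "False").strip().lower() == "true",
    }
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to exercise in tests.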

Deployment on Clever Cloud

  1. Log in (if not already logged in):

    clever login
  2. Create a Python application:

    clever create mini_ocr --type python --region par --org [ORGANISATION ID]
  3. Set up environment variables:

    For uv projects on Clever Cloud, you must configure the following:

    clever env set CC_PYTHON_VERSION "3.13"
    clever env set CC_RUN_COMMAND "uv run gunicorn --bind 0.0.0.0:9000 --workers 2 --timeout 120 main:app"

    We also need a variable that runs the pre-build script through a pre-build hook:

    clever env set CC_PRE_BUILD_HOOK "source pre_build_script.sh"

    To protect the application, we use Basic Auth:

    clever env set BASIC_AUTH_USERNAME "admin"
    clever env set BASIC_AUTH_PASSWORD "secret"

    Set up a dedicated health check endpoint:

    clever env set CC_HEALTH_CHECK_PATH "/health"
  4. Deploy:

    clever deploy
