greenygh0st/ocr-micro

OCR Microservice

A lightweight OCR microservice built with FastAPI and Tesseract (pytesseract).
Accepts base64-encoded images and returns extracted text with word-level bounding boxes.

Docker Image

greenygh0st/ocr-micro

Features

  • GET /health → liveness check (returns Tesseract version)
  • GET /ready → readiness check (service ready & concurrency info)
  • POST /ocr → perform OCR on a base64-encoded image
  • Input validation (payload size, language, OEM/PSM modes)
  • Resource limits & concurrency guard
  • Dockerized with HEALTHCHECK
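
The concurrency guard and per-call timeout listed above could look roughly like the sketch below. It is an illustration, not the service's actual code: the `ConcurrencyGuard` and `Busy` names are hypothetical, and the reject-when-full behaviour (rather than queueing) is an assumption.

```python
import asyncio

# Hypothetical values mirroring the MAX_CONCURRENCY and OCR_TIMEOUT_SEC defaults.
MAX_CONCURRENCY = 4
OCR_TIMEOUT_SEC = 15.0


class Busy(Exception):
    """Raised when every OCR slot is already in use."""


class ConcurrencyGuard:
    """Caps the number of in-flight blocking calls and bounds each one by a timeout."""

    def __init__(self, limit: int, timeout_sec: float) -> None:
        self._limit = limit
        self._active = 0
        self._timeout_sec = timeout_sec

    @property
    def available(self) -> int:
        """Number of free slots right now."""
        return self._limit - self._active

    async def run(self, func, *args):
        # Assumption: reject immediately instead of queueing when no slot is free.
        if self._active >= self._limit:
            raise Busy("too many concurrent OCR requests")
        self._active += 1
        try:
            # Run the blocking call in a worker thread so the event loop stays responsive.
            return await asyncio.wait_for(
                asyncio.to_thread(func, *args), timeout=self._timeout_sec
            )
        finally:
            self._active -= 1


guard = ConcurrencyGuard(MAX_CONCURRENCY, OCR_TIMEOUT_SEC)
```

Note that `asyncio.wait_for` stops waiting after the timeout but cannot kill the worker thread, so a real service would also need a timeout on the OCR call itself.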

Requirements

  • Docker (recommended)
  • Or: Python 3.11+, Tesseract OCR installed locally

Quick Start

Build & Run

# Build
docker build -t ocr-micro .

# Run
docker run --rm -p 5001:5001 ocr-micro

Health

curl http://localhost:5001/health

Expected response:

{"ok": true, "tesseract_version": "5.4.0"}
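
When the container has just started, the health endpoint may not answer yet. A minimal polling sketch, assuming the response shape shown above (`ok` set to `true`); the helper names `fetch_health` and `wait_until_healthy` are hypothetical:

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_health(url: str = "http://localhost:5001/health") -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 10,
    delay_sec: float = 0.5,
) -> dict:
    """Poll until the service reports ok=true, or raise after `attempts` tries."""
    last_error = None
    for _ in range(attempts):
        try:
            body = fetch()
            if body.get("ok") is True:
                return body
        except (urllib.error.URLError, OSError, ValueError) as exc:
            # Service not reachable yet, or the body was not valid JSON.
            last_error = exc
        time.sleep(delay_sec)
    raise TimeoutError(f"service not healthy after {attempts} attempts") from last_error
```

Calling `wait_until_healthy()` with no arguments polls `http://localhost:5001/health`; passing a custom `fetch` callable makes the helper usable against other hosts or in tests.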

OCR Request

curl -X POST http://localhost:5001/ocr \
  -H "Content-Type: application/json" \
  -d '{"image_b64":"<BASE64_STRING>","lang":"eng","psm":6}'

Example response:

{
  "text": "Hello World\n",
  "words": [
    {"text": "Hello", "conf": 95.3, "left": 34, "top": 20, "width": 60, "height": 18},
    {"text": "World", "conf": 91.1, "left": 100, "top": 20, "width": 70, "height": 18}
  ],
  "lang": "eng",
  "duration_ms": 124
}
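
The same request can be made from Python. The sketch below uses only the standard library and the field names from the examples above (`image_b64`, `lang`, `psm`, `words`, `conf`); the function names and the confidence threshold are hypothetical.

```python
import base64
import json
import urllib.request


def build_ocr_payload(image_bytes: bytes, lang: str = "eng", psm: int = 6) -> dict:
    """Base64-encode raw image bytes into the JSON body expected by POST /ocr."""
    return {
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
        "lang": lang,
        "psm": psm,
    }


def post_ocr(payload: dict, url: str = "http://localhost:5001/ocr") -> dict:
    """Send the payload and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def confident_words(response: dict, min_conf: float = 90.0) -> list[str]:
    """Keep only the words whose confidence is at least `min_conf`."""
    return [w["text"] for w in response.get("words", []) if w["conf"] >= min_conf]
```

A typical call would read a file with `open(path, "rb").read()`, pass the bytes to `build_ocr_payload`, send the result with `post_ocr`, and filter the response with `confident_words`.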

Configuration

Environment variables:

Variable           Default     Description
MAX_B64_BYTES      10000000    Max size of base64 payload (bytes)
MAX_IMAGE_PIXELS   40000000    Max image pixels (defense-in-depth)
OCR_TIMEOUT_SEC    15          Max seconds per OCR call
MAX_CONCURRENCY    4           Max simultaneous OCR requests
ALLOWED_LANGS      eng         Allowed languages (e.g. eng+spa+deu)
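
As an illustration of how these variables might be consumed, the sketch below reads them with the documented defaults and checks a request against them. It is not the service's actual code: the `Settings`, `load_settings`, and `validate_request` names are hypothetical, and treating `ALLOWED_LANGS` and the request's `lang` as `+`-separated lists is an assumption based on the `eng+spa+deu` example.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    max_b64_bytes: int
    max_image_pixels: int
    ocr_timeout_sec: float
    max_concurrency: int
    allowed_langs: frozenset


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read limits from the environment, falling back to the documented defaults."""
    return Settings(
        max_b64_bytes=int(env.get("MAX_B64_BYTES", "10000000")),
        max_image_pixels=int(env.get("MAX_IMAGE_PIXELS", "40000000")),
        ocr_timeout_sec=float(env.get("OCR_TIMEOUT_SEC", "15")),
        max_concurrency=int(env.get("MAX_CONCURRENCY", "4")),
        # Assumption: languages are joined with "+", as in "eng+spa+deu".
        allowed_langs=frozenset(env.get("ALLOWED_LANGS", "eng").split("+")),
    )


def validate_request(image_b64: str, lang: str, settings: Settings) -> None:
    """Raise ValueError if the payload is too large or a language is not allowed."""
    if len(image_b64) > settings.max_b64_bytes:
        raise ValueError("image_b64 exceeds MAX_B64_BYTES")
    requested = set(lang.split("+"))
    if not requested <= settings.allowed_langs:
        raise ValueError(f"language not allowed: {sorted(requested - settings.allowed_langs)}")
```

With Docker, the variables themselves can be set through the standard `-e` flag of `docker run`, for example `-e MAX_CONCURRENCY=8`.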

Development

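Install dependencies and start a development server (uvicorn listens on port 8000 by default; add --port 5001 to match the Docker examples):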

pip install -r requirements.txt
uvicorn app:app --reload

Security Notes

  • Runs as a non-root user in Docker
  • Input validation rejects oversized payloads and images
  • Use behind a firewall or service mesh (no built-in auth)

License

MIT
