Extract data from business registration documents using OCR.
- Clone this repository
- Create a virtual environment:

      python3 -m venv venv
      source venv/bin/activate

- Install dependencies:

      brew install tesseract
      pip install -r requirements.txt

- Place document images in data/sample_documents/
- Run:

      python src/main.py

- Check results in the output/ folder
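
Before running the main script, it can help to confirm that the setup steps above took effect. The following is a minimal pre-flight sketch using only the standard library. The helper names `tesseract_available` and `list_input_images` and the list of image extensions are assumptions made for illustration; they are not part of this repository, and the formats the project actually accepts may differ.

```python
import shutil
from pathlib import Path

# Assumed set of image extensions; the README does not state which
# formats the project supports.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def tesseract_available() -> bool:
    """Return True if a `tesseract` executable is found on PATH."""
    return shutil.which("tesseract") is not None


def list_input_images(folder="data/sample_documents"):
    """Return image files in `folder`, sorted by path.

    Returns an empty list if the folder does not exist.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


if __name__ == "__main__":
    print("tesseract on PATH:", tesseract_available())
    print("input images found:", len(list_input_images()))
```

If the first line prints `False`, the Tesseract install step probably did not complete; if the count is zero, no images with the assumed extensions were found in the input folder.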
business-reg-ocr/
├── src/
│   ├── main.py              # Main application
│   ├── image_processor.py   # Image preprocessing
│   ├── ocr_engine.py        # OCR engine
│   └── parser.py            # Data extraction
├── tests/
├── data/sample_documents/   # Input images
└── output/                  # Results
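
To illustrate the kind of work the data extraction stage does, here is a rough sketch of pulling labelled fields out of OCR text with regular expressions. It is not the code in `parser.py`: the function name `parse_fields`, the field labels ("Business Name", "Registration No", "Date of Registration"), and the sample text are all hypothetical, and real registration documents will need patterns adapted to their wording.

```python
import re

# Hypothetical field labels, chosen for illustration only.
FIELD_PATTERNS = {
    "business_name": re.compile(
        r"^\s*Business Name\s*[:\-]\s*(.+?)\s*$", re.I | re.M
    ),
    "registration_number": re.compile(
        r"^\s*Registration (?:No\.?|Number)\s*[:\-]\s*(\S+)", re.I | re.M
    ),
    "registration_date": re.compile(
        r"^\s*Date of Registration\s*[:\-]\s*(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})",
        re.I | re.M,
    ),
}


def parse_fields(ocr_text):
    """Return a dict of field name -> extracted value, or None if not found."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        result[name] = match.group(1).strip() if match else None
    return result


if __name__ == "__main__":
    # Made-up sample text standing in for OCR output.
    sample = (
        "Business Name: Acme Trading Ltd\n"
        "Registration No: BR-2021-00417\n"
        "Date of Registration: 12/03/2021\n"
    )
    for field, value in parse_fields(sample).items():
        print(f"{field}: {value}")
```

Returning `None` for missing fields, rather than raising, lets a caller distinguish "label not found" from an empty value, which is useful when OCR output is noisy.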