mammubarak/receipt-data-extractor

Receipt-Data-Extractor

Description

This CLI application processes receipt images, detects text areas, extracts text (items, prices, and totals), and compares multiple receipts by visualizing the totals in a chart.

Features

  • Image preprocessing (resizing, orientation correction, deskewing)
  • Text area detection
  • Optical Character Recognition (OCR) using Tesseract
  • Summarization of receipt data (item name, quantity, price, subtotal, cash, change)
  • Comparison of multiple receipts with a visual bar chart
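The summarization step can be illustrated with a minimal sketch that turns raw OCR text into structured fields. This is not the repository's implementation: the function name `summarize_receipt`, the line layout ("name quantity price"), and the two regular expressions are assumptions made for illustration, and real receipts will need patterns tuned to their layout.

```python
import re
from decimal import Decimal

# Assumed item layout (hypothetical): "<item name> <quantity> <price>", e.g. "Milk 2 3.50".
ITEM_RE = re.compile(r"^(?P<name>.+?)\s+(?P<qty>\d+)\s+(?P<price>\d+\.\d{2})$")
# Assumed summary layout (hypothetical): "Subtotal 9.25", "Cash: 10.00", "Change 0.75".
SUMMARY_RE = re.compile(
    r"^(?P<key>subtotal|cash|change)\s*:?\s*(?P<amount>\d+\.\d{2})$",
    re.IGNORECASE,
)


def summarize_receipt(ocr_text):
    """Split raw OCR text into item rows and summary amounts."""
    items = []
    summary = {}
    for raw_line in ocr_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        summary_match = SUMMARY_RE.match(line)
        if summary_match:
            summary[summary_match["key"].lower()] = Decimal(summary_match["amount"])
            continue
        item_match = ITEM_RE.match(line)
        if item_match:
            items.append({
                "name": item_match["name"],
                "quantity": int(item_match["qty"]),
                "price": Decimal(item_match["price"]),
            })
        # Lines matching neither pattern (store header, OCR noise) are skipped.
    return {"items": items, **summary}
```

`Decimal` is used instead of `float` so that currency amounts compare exactly. Lines that match neither pattern are dropped silently, which keeps the sketch short but would hide OCR errors in practice.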

Prerequisites

  1. Install Tesseract

    • Windows: Download and install Tesseract, then uncomment line 7 in ocr_recognition.py and set it to the path where the Tesseract OCR engine is installed.
    • Linux: Install via terminal:
      sudo apt install tesseract-ocr
    • macOS: Install via Homebrew:
      brew install tesseract
  2. Install Python dependencies:

    pip install -r requirements.txt
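
Before running the application, it can help to confirm that the Tesseract binary is reachable. The following is a minimal sketch using only the Python standard library. The helper names `find_tesseract`, `tesseract_version`, and `describe_tesseract` are hypothetical and not part of this repository.

```python
import shutil
import subprocess


def find_tesseract(executable="tesseract"):
    """Return the full path of the Tesseract binary, or None if it is not on PATH."""
    return shutil.which(executable)


def tesseract_version(path):
    """Return the first line printed by `tesseract --version`."""
    completed = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    # Some builds print the version on stderr rather than stdout, so check both.
    output = completed.stdout or completed.stderr
    return output.splitlines()[0] if output else ""


def describe_tesseract(executable="tesseract"):
    """Return a one-line status message about the Tesseract installation."""
    path = find_tesseract(executable)
    if path is None:
        return f"{executable} was not found on PATH"
    return f"{path}: {tesseract_version(path)}"
```

Calling `print(describe_tesseract())` reports either the resolved path and version or a not-found message. On Windows, a not-found result usually means the install directory is missing from PATH, which is the case the path setting in ocr_recognition.py is meant to cover.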
    

Running the Application

Process a single receipt

python main.py path/to/receipt_image.jpg

Example:

python main.py images/input/r1.png

Compare multiple receipts

python main.py --compare
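
The two invocations above could be handled by a command-line parser along these lines. This is a sketch under assumptions, not the actual contents of main.py: the function names, help strings, and the rule that an image path is required unless `--compare` is given are all illustrative.

```python
import argparse


def build_parser():
    """Build a parser that accepts an image path or the --compare flag."""
    parser = argparse.ArgumentParser(
        prog="main.py", description="Extract and compare receipt data."
    )
    # Optional positional argument: the receipt image for single-receipt mode.
    parser.add_argument("image", nargs="?", help="path to a receipt image")
    parser.add_argument(
        "--compare", action="store_true", help="compare multiple receipts"
    )
    return parser


def parse_args(argv=None):
    """Parse arguments and reject calls that select neither mode (assumed rule)."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.compare and not args.image:
        parser.error("an image path is required unless --compare is given")
    return args
```

With this sketch, `parse_args(["images/input/r1.png"])` selects single-receipt mode and `parse_args(["--compare"])` selects comparison mode. Calling it with no arguments prints a usage message and exits.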

About

Sales Receipt Text Extractor using Computer Graphics and Visualization techniques
