acsenrafilho
🌎
Working from Latin America

Organizations

@CSIM-Toolkits @LOAMRI

Stars

OCR & Document Analysis

12 repositories
Python 71 27 Updated May 15, 2026

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Python 6,093 644 Updated May 22, 2026

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.

Python 29,505 3,574 Updated Dec 5, 2025

Document binarization using deep learning

Jupyter Notebook 10 1 Updated Dec 31, 2020

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

Python 9,784 727 Updated May 21, 2026

Handwritten Text Synthesis and Recognition

Python 300 93 Updated May 15, 2026

Python tool for converting files and office documents to Markdown.

Python 124,844 8,493 Updated May 22, 2026

RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs

Python 81,100 9,291 Updated May 22, 2026

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

Python 9,002 765 Updated Mar 25, 2026

A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.

Python 36,535 2,517 Updated May 21, 2026

Hallucination-prevention RAG system with verbatim span extraction. Ensures all generated content is grounded in source documents with exact citations.

Python 175 23 Updated May 22, 2026

A Comprehensive Toolkit for High-Quality PDF Content Extraction

Python 9,677 729 Updated Jan 3, 2025