This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
- `npm run dev` — Start Vite dev server on port 5173
- `npm run build` — Type-check with `tsc`, then build with Vite
- `npm run preview` — Preview the production build
No test framework is configured. No linter is configured.
A React PWA for batch OCR processing. Users open a local folder via the File System Access API, select image/PDF files, and run OCR through one of three vision-language model providers. Results are saved as `.ocr.json` sidecar files in an `ocr_outputs/` subdirectory next to the source images.
Deployed to GitHub Pages at `/ocr_batch_processor/` (configured via the `base` option in `vite.config.ts`).
- Image → Provider Client — `App.tsx:processOneOcr` reads the file as a data URL and dispatches to the selected provider client
- Provider Clients (`lmStudioClient.ts`, `geminiClient.ts`, `ollamaClient.ts`) — Each has its own API format, but all accept the same inputs (image data URL + system prompt) and return raw text
- Response Parsing (`ocr/parseOcrResponse.ts`) — Normalizes raw model output. Handles multiple formats: GLM-OCR JSON arrays (with `label`/`bbox`/`content` fields, converted to HTML divs with `data-bbox` and `data-label` attributes), structured JSON objects with `html`/`content`/`text` keys, or raw HTML strings
- Post-processing — HTML is converted to markdown (`ocr/htmlToMarkdown.ts`), and bounding boxes are rendered as a canvas overlay (`ocr/renderBboxes.ts`)
- Storage — The result is saved as an `OcrStoredResult` to the filesystem via `storage/ocrFileSystem.ts`. IndexedDB storage exists in `storage/ocrStore.ts`, but the filesystem approach is primary
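The GLM-OCR branch of the parsing step can be sketched as follows. The block shape (`label`/`bbox`/`content`) and the `data-bbox`/`data-label` attributes come from the description above; the exact markup emitted by the real `parseOcrResponse.ts` may differ:

```typescript
// Assumed shape of one GLM-OCR output block (field names per the pipeline notes).
interface GlmOcrBlock {
  label: string;                              // e.g. "text", "table", "figure"
  bbox: [number, number, number, number];     // [x1, y1, x2, y2] on the model's scale
  content: string;                            // recognized text/HTML for the region
}

// Convert a GLM-OCR JSON array into HTML divs carrying bbox/label metadata.
function glmBlocksToHtml(blocks: GlmOcrBlock[]): string {
  return blocks
    .map(
      (b) =>
        `<div data-label="${b.label}" data-bbox="${b.bbox.join(",")}">${b.content}</div>`
    )
    .join("\n");
}
```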
Three providers share the same interface shape but use different APIs:
- LM Studio — Native REST API v1 (`/api/v1/chat`), uses an `input` array with `{type: "text"/"image"}` items
- Google Gemini — REST API with `inline_data` for images; has built-in 12-second rate limiting for batch processing (5 RPM free tier)
- Ollama — `/api/generate` endpoint with an `images` array (raw base64, no data URL prefix), `stream: false`
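A sketch of how the same (prompt + image data URL) inputs map onto two of the request bodies. The top-level fields follow the formats listed above; the inner item field names for LM Studio (`text`, `image`) are assumptions, not confirmed from the clients:

```typescript
// Ollama's /api/generate expects raw base64 images with no data URL prefix.
function toOllamaBody(model: string, prompt: string, imageDataUrl: string) {
  const base64 = imageDataUrl.replace(/^data:image\/\w+;base64,/, "");
  return { model, prompt, images: [base64], stream: false };
}

// LM Studio's native v1 API takes an `input` array of typed items
// ({type: "text"/"image"}); the `text`/`image` keys here are illustrative.
function toLmStudioBody(model: string, prompt: string, imageDataUrl: string) {
  return {
    model,
    input: [
      { type: "text", text: prompt },
      { type: "image", image: imageDataUrl },
    ],
  };
}
```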
`ocr/prompts.ts` defines built-in prompt profiles (Chandra-OCR HTML/Layout, GLM-OCR Markdown/Layout) and supports custom profiles stored in `localStorage`. The `PromptProfile` type is the union of built-in keys; custom profiles use string IDs like `custom_<timestamp>`.
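The ID scheme can be sketched like this. The built-in key names below are hypothetical stand-ins (the real union lives in `ocr/prompts.ts`); only the `custom_<timestamp>` pattern is taken from the description above:

```typescript
// Hypothetical built-in keys; the actual union is defined in ocr/prompts.ts.
type BuiltInProfile = "chandra-html" | "chandra-layout" | "glm-markdown" | "glm-layout";
type PromptProfileId = BuiltInProfile | string; // custom profiles widen to string IDs

// Custom profiles get IDs of the form custom_<timestamp>.
function newCustomProfileId(now: number = Date.now()): string {
  return `custom_${now}`;
}

function isCustomProfile(id: PromptProfileId): boolean {
  return id.startsWith("custom_");
}
```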
Two coordinate scales exist: Chandra-OCR uses 0–1024, GLM-OCR uses 0–1000. `renderBboxes.ts` auto-detects the scale by checking whether any coordinate exceeds 1000.
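The detection rule above can be sketched as a pure function, plus the scale-to-pixel mapping the overlay needs. This is a minimal sketch of the logic described for `renderBboxes.ts`, not its actual implementation:

```typescript
// If any coordinate exceeds 1000, the boxes cannot be on the 0–1000 (GLM-OCR)
// scale, so they must be 0–1024 (Chandra-OCR).
function detectBboxScale(bboxes: number[][]): 1000 | 1024 {
  const max = Math.max(...bboxes.flat());
  return max > 1000 ? 1024 : 1000;
}

// Map one [x1, y1, x2, y2] box from model coordinates to image pixels.
function toPixels(
  bbox: number[],
  scale: number,
  imgWidth: number,
  imgHeight: number
): number[] {
  const [x1, y1, x2, y2] = bbox;
  return [
    (x1 / scale) * imgWidth,
    (y1 / scale) * imgHeight,
    (x2 / scale) * imgWidth,
    (y2 / scale) * imgHeight,
  ];
}
```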
All state lives in `App.tsx` via `useState` hooks — no external state library. `App.tsx` is the orchestrator: it holds provider config, workspace state, file selection, and OCR results, passing them down as props to layout components.
- `WorkspaceLayout` — Shell layout with sidebar/toolbar/content slots
- `FileSidebar` — File list with multi-select (shift/cmd+click)
- `ActionToolbar` — Run OCR, split pages, and convert-PDF actions
- `DocumentViewer` — Split view (original + annotated) or text view
- `SettingsDialog` — Provider config, prompt profile selection, connection testing
- `components/ui/` — Minimal shadcn-style primitives (button, card, input, label, textarea)
The app uses the browser File System Access API (`showDirectoryPicker`, `FileSystemDirectoryHandle`) for reading source files and writing results. This requires a Chromium-based browser and a secure context (localhost or HTTPS). An HTTPS warning banner appears when local providers are used on a deployed (non-localhost) HTTPS site.
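The write path can be sketched as below. The `ocr_outputs/` directory and `.ocr.json` extension come from the overview above; the `<source>.ocr.json` naming and the handle interfaces (narrowed here so the sketch compiles without DOM types) are assumptions about what `storage/ocrFileSystem.ts` does:

```typescript
// Assumed sidecar naming: "<source file name>.ocr.json".
function sidecarName(sourceFileName: string): string {
  return `${sourceFileName}.ocr.json`;
}

// Minimal structural stand-ins for the File System Access API handle types.
interface WritableLike { write(data: string): Promise<void>; close(): Promise<void>; }
interface FileHandleLike { createWritable(): Promise<WritableLike>; }
interface DirHandleLike {
  getDirectoryHandle(name: string, opts?: { create?: boolean }): Promise<DirHandleLike>;
  getFileHandle(name: string, opts?: { create?: boolean }): Promise<FileHandleLike>;
}

// Write one OCR result as a sidecar file under ocr_outputs/ next to the source.
async function writeSidecar(
  dir: DirHandleLike,
  sourceFileName: string,
  result: unknown
): Promise<void> {
  const outDir = await dir.getDirectoryHandle("ocr_outputs", { create: true });
  const file = await outDir.getFileHandle(sidecarName(sourceFileName), { create: true });
  const writable = await file.createWritable();
  await writable.write(JSON.stringify(result, null, 2));
  await writable.close();
}
```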
`lib/pdfTools.ts` uses `pdfjs-dist` for:
- PDF to Images — Renders each page to a canvas at 2x scale and exports it as a JPEG to a `converted_jpegs/` subdirectory
- Split Pages — Splits double-page scans into left/right halves, supports LR or RL reading order, outputs to `split_jpegs/`
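The split-pages geometry can be sketched as a pure function: one double-page scan yields two crop rectangles, ordered by reading direction. The real `pdfTools.ts` draws these regions to canvases; this sketch only computes them:

```typescript
interface CropRect { x: number; y: number; width: number; height: number; }

// Split a double-page scan into two halves. "LR" returns [left, right];
// "RL" returns [right, left] so output order matches reading order.
function splitPageRects(
  width: number,
  height: number,
  order: "LR" | "RL"
): [CropRect, CropRect] {
  const half = Math.floor(width / 2);
  const left: CropRect = { x: 0, y: 0, width: half, height };
  const right: CropRect = { x: half, y: 0, width: width - half, height };
  return order === "LR" ? [left, right] : [right, left];
}
```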
- React 18 + TypeScript (strict mode)
- Vite 5 + SWC (via `@vitejs/plugin-react-swc`)
- Tailwind CSS 3
- PWA via `vite-plugin-pwa` (autoUpdate registration)
- `pdfjs-dist` for PDF rendering
- `jszip` for export bundling
- `clsx` + `tailwind-merge` (via the `cn` helper in `lib/utils.ts`)