# General Development Rules

## Code Quality and Style
- Always run `make lint` before committing to ensure code passes ruff and mypy checks
- Use `make format` to automatically format code with ruff (includes import sorting and code formatting)
- Follow Google-style docstrings as configured in pyproject.toml
- Maintain type hints for all function parameters and return values
- When creating new functions in the trustcall package, first search existing modules (`_base.py`, `_validation_node.py`) to avoid duplication

## Development Workflow
- Use `uv` for all dependency management; never use pip directly
- Run `make tests` for unit tests before pushing changes
- Use `make tests_watch` for continuous testing during development
- For evaluation testing, use `make evals` (requires API keys)
- Always check that new code doesn't break existing functionality

## Import and Module Organization
- Public API should only expose necessary functions through `trustcall/__init__.py`
- Internal modules use underscore prefix (`_base.py`, `_validation_node.py`)
- Follow existing import patterns: langchain-core for LLM integration, langgraph for state management
- When adding new dependencies, update pyproject.toml and run `uv sync`

# Repository Structure

## Core Package (trustcall/)
- `__init__.py`: Public API exposing `create_extractor`, `ExtractionInputs`, `ExtractionOutputs`
- `_base.py`: Main extraction logic, tool handling, JSON patch operations, and core extractor functionality
- `_validation_node.py`: `ValidationNode` class for tool call validation in LangGraph workflows
- `py.typed`: Indicates the package supports type checking

## Testing Structure (tests/)
- `unit_tests/`: Core functionality tests (`test_extraction.py`, `test_strict_existing.py`, `test_utils.py`)
- `evals/`: Evaluation benchmarks using LangSmith for model comparison (`test_evals.py`)
- `cassettes/`: VCR cassettes for mocking API responses in tests
- `conftest.py`: Pytest configuration with asyncio backend setup

## Configuration and Build
- `pyproject.toml`: Project metadata, dependencies, tool configuration (ruff, mypy, pytest)
- `Makefile`: Common development commands (tests, lint, format, build, publish)
- `uv.lock`: Locked dependency versions managed by uv
- `.github/workflows/`: CI/CD with unit tests (`test.yml`) and daily evaluations (`eval.yml`)

## Documentation and Assets
- `README.md`: Comprehensive usage examples and API documentation
- `_static/`: Static assets (cover image)
- `LICENSE`: MIT license

# Dependencies and Installation

## Package Manager
- Uses `uv` for fast, reliable dependency management
- Never use pip directly; always use `uv run`, `uv sync`, or `uv add`
- Dependencies are defined in pyproject.toml with version constraints

## Core Dependencies
- `langgraph>=0.2.25`: State graph management for LLM workflows
- `dydantic<1.0.0,>=0.0.8`: Dynamic Pydantic model creation
- `jsonpatch<2.0,>=1.33`: JSON patch operations for efficient updates
- `langchain-core`: LLM integration and tool calling

## Development Dependencies
- Code quality: `ruff` (linting/formatting), `mypy` (type checking)
- Testing: `pytest`, `pytest-asyncio`, `pytest-socket`, `vcrpy`
- LLM providers: `langchain-openai`, `langchain-anthropic`, `langchain-fireworks`

## Installation Commands
- `uv sync --all-extras --dev`: Install all dependencies including dev tools
- `uv sync --no-dev`: Install only production dependencies (plain `uv sync` also installs the dev dependencies by default)
- `uv add`: Add a new dependency
- `uv build`: Build distribution packages

# Testing Instructions

## Test Framework and Structure
- Uses pytest with asyncio support for async/await testing patterns
- Socket access is disabled by default (`--disable-socket --allow-unix-socket`) to prevent external calls
- VCR cassettes in `tests/cassettes/` and `tests/evals/cassettes/` mock API responses

## Running Tests
- `make tests`: Run unit tests with socket restrictions and detailed output
- `make tests_watch`: Continuous testing during development (uses ptw)
- `make evals`: Run evaluation benchmarks (requires `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `LANGSMITH_API_KEY`)
- `make doctest`: Run doctests in the trustcall module

## Test Categories
- **Unit Tests**: Core functionality testing without external API calls
  - `test_extraction.py`: Main extractor functionality and retry logic
  - `test_strict_existing.py`: Schema validation and existing data handling
  - `test_utils.py`: Utility functions like patch application and type conversion
- **Evaluation Tests**: LangSmith-integrated benchmarks comparing model performance
  - `test_evals.py`: Comparative evaluation across different LLM providers

## Writing Tests
- Use `FakeExtractionModel` for mocking LLM responses in unit tests
- Async tests should use pytest-asyncio decorators
- Mock external API calls using VCR cassettes or custom fake models
- Follow existing patterns for tool validation and schema testing
- Test both success and error scenarios, especially for validation failures
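
The testing guidance above (scripted fake models, async code paths, and covering both successful and failing validation) can be illustrated with a standard-library-only sketch. This is not trustcall code: `ScriptedToolCallModel`, `validate_user`, and `extract_with_retry` are hypothetical names invented for this example, and the repository's real `FakeExtractionModel` and extractor API may look quite different. The sketch drives the async path with `asyncio.run` so it runs without pytest; inside the actual suite you would more likely reuse the existing fake model and mark async tests with pytest-asyncio (for example `@pytest.mark.asyncio`).

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ScriptedToolCallModel:
    """Hypothetical fake model that replays pre-scripted tool-call payloads in order.

    Illustrates the idea behind a fake extraction model; it is not trustcall's class.
    """

    responses: list[dict[str, Any]]
    calls: list[str] = field(default_factory=list)

    def invoke(self, prompt: str) -> dict[str, Any]:
        # Record every prompt so tests can assert on what the model was sent.
        self.calls.append(prompt)
        # Return the next scripted response (IndexError if the script runs out).
        return self.responses[len(self.calls) - 1]

    async def ainvoke(self, prompt: str) -> dict[str, Any]:
        # Async variant so async code paths can be exercised without network access.
        return self.invoke(prompt)


def validate_user(args: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical schema check: requires a str 'name' and an int 'age'."""
    if not isinstance(args.get("name"), str):
        raise ValueError("name must be a string")
    if not isinstance(args.get("age"), int):
        raise ValueError("age must be an integer")
    return args


async def extract_with_retry(
    model: ScriptedToolCallModel, prompt: str, max_attempts: int = 3
) -> dict[str, Any]:
    """Hypothetical retry loop: feed each validation error back into the next prompt."""
    for _ in range(max_attempts):
        response = await model.ainvoke(prompt)
        try:
            return validate_user(response["args"])
        except ValueError as exc:
            prompt = f"{prompt}\nFix this error: {exc}"
    raise RuntimeError(f"no valid tool call after {max_attempts} attempts")


def test_retry_recovers_from_validation_error() -> None:
    # Success scenario: the first call fails validation, the second is valid.
    model = ScriptedToolCallModel(
        responses=[
            {"name": "User", "args": {"name": "Ada", "age": "thirty-six"}},
            {"name": "User", "args": {"name": "Ada", "age": 36}},
        ]
    )
    result = asyncio.run(extract_with_retry(model, "Extract the user."))
    assert result == {"name": "Ada", "age": 36}
    assert len(model.calls) == 2
    assert "age must be an integer" in model.calls[1]


def test_retry_gives_up_after_max_attempts() -> None:
    # Error scenario: every scripted response fails validation.
    bad = {"name": "User", "args": {"name": 42, "age": 1}}
    model = ScriptedToolCallModel(responses=[bad, bad, bad])
    try:
        asyncio.run(extract_with_retry(model, "Extract the user.", max_attempts=3))
    except RuntimeError:
        pass
    else:
        raise AssertionError("expected RuntimeError")
    assert len(model.calls) == 3
```

The first test checks that a validation error is fed back into the next prompt and that the retry then succeeds; the second checks that the loop gives up after `max_attempts`. Asserting on `model.calls` verifies what the model was actually sent, not only the final result.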