# General Development Rules
## Code Quality and Style
- Always run `make lint` before committing to ensure code passes ruff and mypy checks
- Use `make format` to automatically format code with ruff (includes import sorting and code formatting)
- Follow Google-style docstrings as configured in pyproject.toml
- Maintain type hints for all function parameters and return values
- When creating new functions in the trustcall package, first search existing modules (`_base.py`, `_validation_node.py`) to avoid duplication
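The docstring and typing rules above can be sketched with a small, hypothetical function (not from the package) that combines Google-style sections with full parameter and return annotations:

```python
def apply_discount(price: float, percent: float = 10.0) -> float:
    """Apply a percentage discount to a price.

    Args:
        price: Original price in the caller's currency.
        percent: Discount percentage between 0 and 100.

    Returns:
        The discounted price.

    Raises:
        ValueError: If ``percent`` is outside [0, 100].
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (1 - percent / 100)


print(apply_discount(200.0, 25.0))  # 150.0
```

`make lint` runs both ruff (which checks the docstring convention) and mypy (which checks the annotations), so a function in this shape passes both gates.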
## Development Workflow
- Use `uv` for all dependency management - never use pip directly
- Run `make tests` for unit tests before pushing changes
- Use `make tests_watch` for continuous testing during development
- For evaluation testing, use `make evals` (requires API keys)
- Always check that new code doesn't break existing functionality
## Import and Module Organization
- Public API should only expose necessary functions through trustcall/__init__.py
- Internal modules use an underscore prefix (`_base.py`, `_validation_node.py`)
- Follow existing import patterns: langchain-core for LLM integration, langgraph for state management
- When adding new dependencies, update pyproject.toml and run `uv sync`
# Repository Structure
## Core Package (trustcall/)
- `__init__.py`: Public API exposing create_extractor, ExtractionInputs, ExtractionOutputs
- `_base.py`: Main extraction logic, tool handling, JSON patch operations, and core extractor functionality
- `_validation_node.py`: ValidationNode class for tool call validation in LangGraph workflows
- `py.typed`: Indicates package supports type checking
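The public-API rule above typically reduces to a re-export stub in `trustcall/__init__.py`. The following is a sketch of that pattern only; the exact internal import locations are assumptions, so check the actual file before copying it:

```python
# trustcall/__init__.py — sketch of the re-export pattern.
# Internal modules keep their underscore prefix; only names listed in
# __all__ are part of the public API. (Import paths here are assumed,
# not verified against the real module layout.)
from trustcall._base import ExtractionInputs, ExtractionOutputs, create_extractor

__all__ = ["create_extractor", "ExtractionInputs", "ExtractionOutputs"]
```

Keeping `__all__` in sync with the documented API makes accidental exposure of internal helpers easy to catch in review.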
## Testing Structure (tests/)
- `unit_tests/`: Core functionality tests (`test_extraction.py`, `test_strict_existing.py`, `test_utils.py`)
- `evals/`: Evaluation benchmarks using LangSmith for model comparison (`test_evals.py`)
- `cassettes/`: VCR cassettes for mocking API responses in tests
- `conftest.py`: Pytest configuration with asyncio backend setup
## Configuration and Build
- `pyproject.toml`: Project metadata, dependencies, tool configuration (ruff, mypy, pytest)
- `Makefile`: Common development commands (tests, lint, format, build, publish)
- `uv.lock`: Locked dependency versions managed by uv
- `.github/workflows/`: CI/CD with unit tests (test.yml) and daily evaluations (eval.yml)
## Documentation and Assets
- `README.md`: Comprehensive usage examples and API documentation
- `_static/`: Static assets (cover image)
- `LICENSE`: MIT license
# Dependencies and Installation
## Package Manager
- Uses `uv` for fast, reliable dependency management
- Never use pip directly - always use `uv run`, `uv sync`, or `uv add`
- Dependencies are defined in pyproject.toml with version constraints
## Core Dependencies
- `langgraph>=0.2.25`: State graph management for LLM workflows
- `dydantic<1.0.0,>=0.0.8`: Dynamic Pydantic model creation
- `jsonpatch<2.0,>=1.33`: JSON patch operations for efficient updates
- `langchain-core`: LLM integration and tool calling
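The `jsonpatch` dependency implements RFC 6902 patch documents. As a rough illustration of the idea only (not the library's API, and limited to flat, single-level paths), applying `replace`/`add`/`remove` operations can be sketched in plain Python:

```python
# Minimal sketch of RFC 6902 "replace"/"add"/"remove" semantics on a flat
# document. This illustrates what jsonpatch does conceptually; the real
# library handles nested paths, arrays, "move", "copy", and "test" ops.
def apply_patch(doc: dict, patch: list[dict]) -> dict:
    result = dict(doc)  # operate on a copy, leaving the input unchanged
    for op in patch:
        key = op["path"].lstrip("/")  # flat paths only, e.g. "/age" -> "age"
        if op["op"] in ("replace", "add"):
            result[key] = op["value"]
        elif op["op"] == "remove":
            del result[key]
        else:
            raise NotImplementedError(op["op"])
    return result


user = {"name": "Ada", "age": 36}
patched = apply_patch(user, [{"op": "replace", "path": "/age", "value": 37}])
print(patched)  # {'name': 'Ada', 'age': 37}
```

Patching only the fields that changed, rather than regenerating a whole document, is what makes patch-based updates efficient for large extraction outputs.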
## Development Dependencies
- Code quality: `ruff` (linting/formatting), `mypy` (type checking)
- Testing: `pytest`, `pytest-asyncio`, `pytest-socket`, `vcrpy`
- LLM providers: `langchain-openai`, `langchain-anthropic`, `langchain-fireworks`
## Installation Commands
- `uv sync --all-extras --dev`: Install all dependencies including dev tools
- `uv sync`: Install only production dependencies
- `uv add <package>`: Add a new dependency
- `uv build`: Build distribution packages
# Testing Instructions
## Test Framework and Structure
- Uses pytest with asyncio support for async/await testing patterns
- Socket access is disabled by default (`--disable-socket --allow-unix-socket`) to prevent external calls
- VCR cassettes in tests/cassettes/ and tests/evals/cassettes/ mock API responses
## Running Tests
- `make tests`: Run unit tests with socket restrictions and detailed output
- `make tests_watch`: Continuous testing during development (uses ptw)
- `make evals`: Run evaluation benchmarks (requires OPENAI_API_KEY, ANTHROPIC_API_KEY, LANGSMITH_API_KEY)
- `make doctest`: Run doctests in the trustcall module
## Test Categories
- **Unit Tests**: Core functionality testing without external API calls
  - `test_extraction.py`: Main extractor functionality and retry logic
  - `test_strict_existing.py`: Schema validation and existing data handling
  - `test_utils.py`: Utility functions like patch application and type conversion
- **Evaluation Tests**: LangSmith-integrated benchmarks comparing model performance
  - `test_evals.py`: Comparative evaluation across different LLM providers
## Writing Tests
- Use FakeExtractionModel for mocking LLM responses in unit tests
- Async tests should use pytest-asyncio decorators
- Mock external API calls using VCR cassettes or custom fake models
- Follow existing patterns for tool validation and schema testing
- Test both success and error scenarios, especially for validation failures
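The real suite uses `FakeExtractionModel` with pytest-asyncio markers; as a stripped-down, stdlib-only analogue of the pattern (all names here are hypothetical, not the suite's actual classes), a scripted fake model lets a retry path be tested deterministically:

```python
import asyncio


class FakeModel:
    """Stdlib stand-in for a fake LLM that returns scripted replies in order.

    The test suite's FakeExtractionModel plays this role for real; this
    analogue only illustrates the pattern of scripting responses.
    """

    def __init__(self, responses: list[dict]):
        self._responses = list(responses)

    async def ainvoke(self, messages: list[dict]) -> dict:
        # Each call consumes the next canned reply, so a retry scenario
        # can be scripted as [bad_reply, good_reply].
        return self._responses.pop(0)


async def extract_with_retry(model: FakeModel, attempts: int = 2) -> dict:
    """Retry until the model produces a reply without a validation error."""
    for _ in range(attempts):
        reply = await model.ainvoke([{"role": "user", "content": "extract"}])
        if "error" not in reply:
            return reply
    raise RuntimeError("all attempts failed")


# Success scenario: the first reply fails validation, the second succeeds.
model = FakeModel([{"error": "invalid schema"}, {"args": {"age": 30}}])
result = asyncio.run(extract_with_retry(model))
print(result)  # {'args': {'age': 30}}
```

In a real test the same structure becomes an `async def test_...` with the pytest-asyncio decorator; the error scenario (all scripted replies invalid) should get its own test asserting the raised exception.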