
<general_rules>

General Development Rules

Code Quality and Style

Always run make lint before committing to ensure code passes ruff and mypy checks
Use make format to automatically format code with ruff (includes import sorting and code formatting)
Follow Google-style docstrings as configured in pyproject.toml
Maintain type hints for all function parameters and return values
When creating new functions in the trustcall package, first search existing modules (_base.py, _validation_node.py) to avoid duplication
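For reference, a function that follows these conventions (type hints on every parameter and the return value, Google-style Args/Returns/Raises sections) might look like the sketch below. The function dedupe_tool_names is hypothetical and exists only to illustrate the style. The exact ruff and mypy settings are whatever pyproject.toml configures.

```python
from collections.abc import Sequence


def dedupe_tool_names(names: Sequence[str], *, case_sensitive: bool = True) -> list[str]:
    """Return tool names with duplicates removed, preserving first-seen order.

    Hypothetical helper used only to illustrate the docstring and typing style.

    Args:
        names: Tool names, possibly containing duplicates.
        case_sensitive: If False, names that differ only in case are treated
            as duplicates and the first spelling is kept.

    Returns:
        A new list in which each distinct name appears once.

    Raises:
        ValueError: If any name is empty.
    """
    seen: set[str] = set()
    result: list[str] = []
    for name in names:
        if not name:
            raise ValueError("tool names must be non-empty")
        # Normalize the comparison key only; the original spelling is kept.
        key = name if case_sensitive else name.lower()
        if key not in seen:
            seen.add(key)
            result.append(name)
    return result
```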

Development Workflow

Use uv for all dependency management - never use pip directly
Run make tests for unit tests before pushing changes
Use make tests_watch for continuous testing during development
For evaluation testing, use make evals (requires API keys)
Always confirm that new code doesn't break existing functionality: make lint and make tests should both pass

Import and Module Organization

Public API should only expose necessary functions through trustcall/__init__.py
Internal modules use underscore prefix (_base.py, _validation_node.py)
Follow existing import patterns: langchain-core for LLM integration, langgraph for state management
When adding new dependencies, use uv add <package> (which updates pyproject.toml and uv.lock), or edit pyproject.toml by hand and then run uv sync </general_rules>

<repository_structure>

Repository Structure

Core Package (trustcall/)

__init__.py: Public API exposing create_extractor, ExtractionInputs, ExtractionOutputs
_base.py: Main extraction logic, tool handling, JSON patch operations, and core extractor functionality
_validation_node.py: ValidationNode class for tool call validation in LangGraph workflows
py.typed: Indicates package supports type checking
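The validate-then-retry idea behind _validation_node.py and the retry logic in _base.py can be sketched in plain Python as below. This is not trustcall's implementation: the real code validates tool calls against schemas inside a LangGraph workflow. Here validate_args, extract_with_retries, the dict-of-types schema format, and the error-message wording are all hypothetical stand-ins chosen to show the control flow.

```python
from collections.abc import Callable
from typing import Any

# Hypothetical schema format: required field name -> expected Python type.
Schema = dict[str, type]


def validate_args(args: dict[str, Any], schema: Schema) -> list[str]:
    """Return human-readable validation errors (an empty list means valid)."""
    errors: list[str] = []
    for field, expected in schema.items():
        if field not in args:
            errors.append(f"missing field '{field}'")
        elif not isinstance(args[field], expected):
            errors.append(
                f"field '{field}' should be {expected.__name__}, "
                f"got {type(args[field]).__name__}"
            )
    return errors


def extract_with_retries(
    call_model: Callable[[list[str]], dict[str, Any]],
    schema: Schema,
    max_attempts: int = 3,
) -> dict[str, Any]:
    """Call the model, validate its tool-call args, and retry with feedback.

    call_model receives the accumulated error messages from earlier attempts,
    mimicking how validation errors are fed back to the LLM on a retry.
    """
    feedback: list[str] = []
    for _ in range(max_attempts):
        args = call_model(feedback)
        errors = validate_args(args, schema)
        if not errors:
            return args
        feedback.extend(errors)
    raise RuntimeError(f"validation failed after {max_attempts} attempts: {feedback}")
```

In a test, call_model can be a scripted function that returns an invalid payload first and a corrected one later, so both the error feedback and the eventual success can be asserted.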

Testing Structure (tests/)

unit_tests/: Core functionality tests (test_extraction.py, test_strict_existing.py, test_utils.py)
evals/: Evaluation benchmarks using LangSmith for model comparison (test_evals.py)
cassettes/: VCR cassettes for mocking API responses in tests
conftest.py: Pytest configuration with asyncio backend setup

Configuration and Build

pyproject.toml: Project metadata, dependencies, tool configuration (ruff, mypy, pytest)
Makefile: Common development commands (tests, lint, format, build, publish)
uv.lock: Locked dependency versions managed by uv
.github/workflows/: CI/CD with unit tests (test.yml) and daily evaluations (eval.yml)

Documentation and Assets

README.md: Comprehensive usage examples and API documentation
_static/: Static assets (cover image)
LICENSE: MIT license </repository_structure>

<dependencies_and_installation>

Dependencies and Installation

Package Manager

Uses uv for fast, reliable dependency management
Never use pip directly - always use uv run, uv sync, or uv add
Dependencies are defined in pyproject.toml with version constraints

Core Dependencies

langgraph>=0.2.25: State graph management for LLM workflows
dydantic<1.0.0,>=0.0.8: Dynamic Pydantic model creation
jsonpatch<2.0,>=1.33: JSON patch operations for efficient updates
langchain-core: LLM integration and tool calling
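To make "JSON patch operations for efficient updates" concrete: instead of re-emitting a whole document, a change can be described as a short list of RFC 6902 operations. The sketch below applies only a subset (add, replace, remove on object keys and list indices, with "-" meaning append) in pure Python to show the idea. The project itself depends on the jsonpatch library, which implements the full specification (move, copy, test, JSON Pointer escaping); apply_ops and _parent_and_key are hypothetical names.

```python
import copy
from typing import Any


def _parent_and_key(doc: Any, path: str) -> tuple[Any, str]:
    """Walk a JSON Pointer (no ~0/~1 escaping) to the parent container."""
    if not path.startswith("/"):
        raise ValueError(f"unsupported path: {path!r}")
    parts = path.split("/")[1:]
    parent = doc
    for part in parts[:-1]:
        parent = parent[int(part)] if isinstance(parent, list) else parent[part]
    return parent, parts[-1]


def apply_ops(doc: Any, ops: list[dict[str, Any]]) -> Any:
    """Apply add/replace/remove operations to a deep copy of doc (hypothetical)."""
    result = copy.deepcopy(doc)  # never mutate the caller's existing data
    for op in ops:
        kind = op["op"]
        if kind not in {"add", "replace", "remove"}:
            raise ValueError(f"unsupported op: {kind!r}")
        parent, key = _parent_and_key(result, op["path"])
        if isinstance(parent, list):
            # "-" addresses the position just past the last element.
            index = len(parent) if key == "-" else int(key)
            if kind == "add":
                parent.insert(index, op["value"])
            elif kind == "replace":
                parent[index] = op["value"]
            else:
                del parent[index]
        else:
            if kind == "replace" and key not in parent:
                raise KeyError(f"cannot replace missing key {key!r}")
            if kind == "remove":
                del parent[key]
            else:
                parent[key] = op["value"]
    return result
```

Because the input is deep-copied first, an operation that fails partway through raises before anything is returned, and the caller's original document is left untouched.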

Development Dependencies

Code quality: ruff (linting/formatting), mypy (type checking)
Testing: pytest, pytest-asyncio, pytest-socket, vcrpy
LLM providers: langchain-openai, langchain-anthropic, langchain-fireworks

Installation Commands

uv sync --all-extras --dev: Install all dependencies including dev tools
uv sync: Install only production dependencies
uv add <package>: Add new dependency
uv build: Build distribution packages </dependencies_and_installation>

<testing_instructions>

Testing Instructions

Test Framework and Structure

Uses pytest with asyncio support for async/await testing patterns
Socket access is disabled by default (--disable-socket --allow-unix-socket) to prevent external calls
VCR cassettes in tests/cassettes/ and tests/evals/cassettes/ mock API responses
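The record/replay idea behind cassettes can be sketched in a few lines: each request is reduced to a stable key, and in replay mode the stored response is returned instead of touching the network, so tests stay deterministic even with sockets disabled. This only illustrates the concept; it is not how vcrpy stores or matches requests (vcrpy has its own cassette format and configurable matchers). The ReplayCassette class and its JSON layout are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Any


class ReplayCassette:
    """Hypothetical record/replay store keyed by (method, url, body).

    Illustration only; this is not vcrpy's API or cassette format.
    """

    def __init__(self, path: Path) -> None:
        self.path = path
        self.entries: dict[str, Any] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    @staticmethod
    def key(method: str, url: str, body: dict[str, Any]) -> str:
        # sort_keys makes the key independent of dict insertion order.
        raw = json.dumps([method.upper(), url, body], sort_keys=True)
        return hashlib.sha256(raw.encode()).hexdigest()

    def record(self, method: str, url: str, body: dict[str, Any], response: Any) -> None:
        self.entries[self.key(method, url, body)] = response
        self.path.write_text(json.dumps(self.entries, indent=2))

    def replay(self, method: str, url: str, body: dict[str, Any]) -> Any:
        k = self.key(method, url, body)
        if k not in self.entries:
            # Fail loudly instead of falling through to a real API call.
            raise LookupError(f"no recorded response for {method} {url}")
        return self.entries[k]
```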

Running Tests

make tests: Run unit tests with socket restrictions and detailed output
make tests_watch: Continuous testing during development (uses ptw)
make evals: Run evaluation benchmarks (requires OPENAI_API_KEY, ANTHROPIC_API_KEY, LANGSMITH_API_KEY)
make doctest: Run doctests in the trustcall module
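make doctest checks the usage examples embedded in docstrings. The sketch below shows a hypothetical helper, missing_required, whose Google-style Examples section contains doctest prompts; the same examples can be run directly with the standard-library doctest module, as the __main__ guard does. How the Makefile target invokes doctest is not shown here, so check the Makefile for the exact command.

```python
import doctest
from typing import Any


def missing_required(data: dict[str, Any], required: list[str]) -> list[str]:
    """List required keys that are absent from data, in the order given.

    Hypothetical helper used only to illustrate docstring examples.

    Examples:
        >>> missing_required({"name": "Ada"}, ["name", "age"])
        ['age']
        >>> missing_required({}, [])
        []
    """
    return [key for key in required if key not in data]


if __name__ == "__main__":
    # Prints nothing when every example passes; pass verbose=True to see each check.
    doctest.testmod()
```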

Test Categories

Unit Tests: Core functionality testing without external API calls
- test_extraction.py: Main extractor functionality and retry logic
- test_strict_existing.py: Schema validation and existing data handling
- test_utils.py: Utility functions like patch application and type conversion
Evaluation Tests: LangSmith-integrated benchmarks comparing model performance
- test_evals.py: Comparative evaluation across different LLM providers
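Unit tests of this kind avoid external API calls by scripting the model's responses. As a pattern, a fake model returns canned tool-call arguments in order and records the prompts it receives, which makes both the success path and the validation-failure path easy to assert on. ScriptedFakeModel and extract_person below are hypothetical, dependency-free stand-ins written in that spirit; they are not the repository's FakeExtractionModel, whose interface may differ.

```python
from collections.abc import Callable
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ScriptedFakeModel:
    """Hypothetical fake LLM: returns canned tool-call args in order."""

    responses: list[dict[str, Any]]
    prompts: list[str] = field(default_factory=list)

    def __call__(self, prompt: str) -> dict[str, Any]:
        # Record every prompt so tests can inspect what the model was sent.
        self.prompts.append(prompt)
        if len(self.prompts) > len(self.responses):
            raise AssertionError("fake model called more times than scripted")
        return self.responses[len(self.prompts) - 1]


def extract_person(model: Callable[[str], dict[str, Any]], text: str) -> dict[str, Any]:
    """Tiny hypothetical system under test: reject incomplete tool calls."""
    args = model(f"Extract a person from: {text}")
    missing = sorted({"name", "age"} - set(args))
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return args


def test_extracts_valid_person() -> None:
    model = ScriptedFakeModel([{"name": "Ada", "age": 36}])
    assert extract_person(model, "Ada is 36") == {"name": "Ada", "age": 36}
    assert model.prompts == ["Extract a person from: Ada is 36"]


def test_rejects_incomplete_tool_call() -> None:
    model = ScriptedFakeModel([{"name": "Ada"}])
    try:
        extract_person(model, "Ada")
    except ValueError as exc:
        assert "age" in str(exc)
    else:
        raise AssertionError("expected ValueError for missing 'age'")
```

Under pytest, both test functions would be collected automatically; the second covers the error scenario that is easy to forget.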

Writing Tests

Use FakeExtractionModel for mocking LLM responses in unit tests
Async tests should use pytest-asyncio decorators
Mock external API calls using VCR cassettes or custom fake models
Follow existing patterns for tool validation and schema testing
Test both success and error scenarios, especially for validation failures </testing_instructions>