Pipeline to fetch San Francisco restaurant data from Socrata, dedupe in DuckDB, and generate newsletter drafts.
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env, then fill in secrets locally
python src/fetch.py && python src/transform.py && python src/newsletter.py
pytest
ruff check src tests
ruff format
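The run command above chains the three stages with &&, so a failing stage stops the pipeline. The same behavior can be sketched in Python, which may be handy for a scheduler or a test harness. This is only an illustration: run_pipeline and its runner parameter are hypothetical names that do not exist in src/, and the sketch assumes each script signals failure through a non-zero exit code.

```python
import subprocess
import sys

# Stage scripts, in the order the run command invokes them.
STAGES = ["src/fetch.py", "src/transform.py", "src/newsletter.py"]


def run_pipeline(stages=STAGES, runner=subprocess.run):
    """Run each stage in order and stop at the first non-zero exit code.

    `runner` is injectable (hypothetical parameter) so the control flow
    can be exercised without launching real processes.
    Returns the list of stages that completed successfully.
    """
    completed = []
    for script in stages:
        # Use the current interpreter so the active .venv is respected.
        result = runner([sys.executable, script])
        if result.returncode != 0:
            break  # mirror the short-circuit behavior of `&&`
        completed.append(script)
    return completed
```

Calling run_pipeline() from the repository root runs all three stages; comparing the returned list against STAGES tells you whether every stage finished.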
src/: production modules for fetching Socrata data, deduping in DuckDB, and newsletter prep.
notebooks/: exploratory analyses (prefix with date and topic).
data/raw/: immutable pulls (timestamped JSON/CSV).
data/processed/: derived tables ready for publishing.
docs/: output artifacts (draft newsletters, charts).
out/: generated outputs (kept out of git; see .gitkeep).
Generated data lives under data/ and out/ and is ignored by git. Keep the
.gitkeep files so directories exist in the repo, and share outputs via
docs/ when needed.