zinoo3000/restaurants_SF

restaurants_SF

Pipeline to fetch San Francisco restaurant data from Socrata, dedupe in DuckDB, and generate newsletter drafts.
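As a rough illustration of the fetch step, the sketch below builds a paged Socrata (SODA) request URL with the standard `$limit` and `$offset` parameters. The function name `soda_page_url`, the domain `data.sfgov.org`, and the dataset ID are assumptions made for this example; the actual values used by src/fetch.py are not stated here.

```python
from urllib.parse import urlencode


def soda_page_url(domain: str, dataset_id: str, limit: int, offset: int) -> str:
    """Build a SODA URL for one page of JSON rows (hypothetical helper)."""
    # safe="$" keeps the leading "$" of SODA parameters readable in the URL.
    query = urlencode({"$limit": limit, "$offset": offset}, safe="$")
    return f"https://{domain}/resource/{dataset_id}.json?{query}"


# "xxxx-xxxx" is a placeholder, not a real dataset ID; the domain is assumed.
print(soda_page_url("data.sfgov.org", "xxxx-xxxx", limit=1000, offset=0))
```

Increasing `offset` by `limit` on each request until a page comes back empty is one common way to walk a full dataset.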

Setup

  1. python -m venv .venv && source .venv/bin/activate
  2. pip install -r requirements.txt
  3. cp .env.example .env, then fill in the secrets in your local .env (do not commit it)

Run the pipeline

python src/fetch.py && python src/transform.py && python src/newsletter.py
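The command above chains the three steps with `&&`, so a failing step stops the run. The same behavior can be sketched in Python, for example on a shell without `&&`. The `run_pipeline` function and its injectable `run` parameter are hypothetical and not part of this repository.

```python
import subprocess
import sys
from typing import Callable, List, Optional, Sequence

# Script paths taken from the command above.
STEPS = ("src/fetch.py", "src/transform.py", "src/newsletter.py")


def run_pipeline(
    steps: Sequence[str] = STEPS,
    run: Optional[Callable[[List[str]], int]] = None,
) -> int:
    """Run each step in order; return the first non-zero exit code, else 0."""
    if run is None:
        # Default runner: execute the script with the current interpreter.
        run = lambda cmd: subprocess.run(cmd).returncode
    for step in steps:
        code = run([sys.executable, step])
        if code != 0:
            return code  # stop here, as `&&` would
    return 0
```

From the repository root, `sys.exit(run_pipeline())` would then behave like the one-line shell command.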

Tests and lint

pytest

ruff check src tests

ruff format

Project layout

  • src/: production modules for fetching Socrata data, deduping in DuckDB, and newsletter prep.
  • notebooks/: exploratory analyses (prefix with date and topic).
  • data/raw/: immutable pulls (timestamped JSON/CSV).
  • data/processed/: derived tables ready for publishing.
  • docs/: output artifacts (draft newsletters, charts).
  • out/: generated outputs (kept out of git; see .gitkeep).
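One way to honor the "immutable, timestamped" convention for data/raw/ is sketched below. The helper names, the `name_YYYYMMDDTHHMMSSZ.ext` filename pattern, and the use of UTC are assumptions for illustration; the repository may name its pulls differently.

```python
from datetime import datetime, timezone
from pathlib import Path


def raw_pull_path(
    name: str,
    pulled_at: datetime,
    ext: str = "json",
    root: Path = Path("data/raw"),
) -> Path:
    """Return a timestamped path for one raw pull (hypothetical naming scheme)."""
    # pulled_at should be timezone-aware; it is normalized to UTC for the stamp.
    stamp = pulled_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return root / f"{name}_{stamp}.{ext}"


def write_once(path: Path, payload: bytes) -> None:
    """Write a new file; mode "xb" raises FileExistsError rather than overwrite."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "xb") as fh:
        fh.write(payload)
```

Opening with exclusive-create mode means a second pull can never silently replace an earlier one, which keeps raw files effectively immutable.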

Data notes

Generated data lives under data/ and out/ and is ignored by git. Keep the .gitkeep files so directories exist in the repo, and share outputs via docs/ when needed.

About

SF Openings and Closures of businesses
