This is the source for an analysis that I rerun every so often when producing content for talks I give.
Dataset is from https://py-code.org/datasets#metadata
Original credit for the analysis goes to Seth Michael Larson, https://fosstodon.org/@sethmlarson/111382964885780823
scripts/ directory contains the source files for downloading and running the analysis.
output/ directory contains the results of analysis as CSV files:
projects.csv contains a list of projects uploaded each day to PyPI which contained native code
uploads.csv is an aggregation which counts the number of projects by language by month
new_projects.csv is a filtered view which counts each project only once ever
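To make the difference between the two aggregated files concrete, here is a minimal sketch of both counts in plain Python. The column names (`date`, `project`, `language`) and the sample rows are assumptions made for illustration, not the actual schema of projects.csv, and the real scripts may well do this differently. It also assumes "only once ever" means a project is counted in the month it first appears.

```python
from collections import defaultdict

# Hypothetical rows shaped like projects.csv; the real column names may differ.
rows = [
    {"date": "2023-01-05", "project": "alpha", "language": "C"},
    {"date": "2023-01-20", "project": "alpha", "language": "C"},
    {"date": "2023-01-11", "project": "beta", "language": "Rust"},
    {"date": "2023-02-02", "project": "alpha", "language": "C"},
    {"date": "2023-02-09", "project": "gamma", "language": "Rust"},
]


def count_uploads(rows):
    """Count distinct projects per (month, language), in the spirit of uploads.csv."""
    seen = defaultdict(set)
    for row in rows:
        month = row["date"][:7]  # "YYYY-MM"
        seen[(month, row["language"])].add(row["project"])
    return {key: len(projects) for key, projects in seen.items()}


def count_new_projects(rows):
    """Count each project once, in the month it first appears (new_projects.csv)."""
    first_seen = {}
    for row in sorted(rows, key=lambda r: r["date"]):
        # setdefault keeps the earliest (month, language) for each project.
        first_seen.setdefault(row["project"], (row["date"][:7], row["language"]))
    counts = defaultdict(int)
    for key in first_seen.values():
        counts[key] += 1
    return dict(counts)
```

In this sample, `alpha` is counted under C in both January and February by `count_uploads`, but only in January by `count_new_projects`.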
Run the scripts in the following order to complete the full pipeline:
scripts/download.sh # downloads the dataset
scripts/build_projects.py # initial costly analysis to aggregate the data to a per-project level
scripts/aggregate.py # builds the final content
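If you would rather run all three steps with one command, a small driver along these lines may help. It is a sketch, not part of the repository: it assumes the scripts take no arguments, that download.sh runs under `sh`, and that it is invoked from the repository root. `run_pipeline` and `PIPELINE` are hypothetical names.

```python
import subprocess
import sys

# Steps in the order listed above; each must finish before the next starts.
# Assumption: download.sh works under plain `sh` and no script needs arguments.
PIPELINE = [
    ["sh", "scripts/download.sh"],
    [sys.executable, "scripts/build_projects.py"],
    [sys.executable, "scripts/aggregate.py"],
]


def run_pipeline(steps=PIPELINE, runner=subprocess.run):
    """Run each step in order; check=True raises on the first non-zero exit."""
    for command in steps:
        runner(command, check=True)
```

Calling `run_pipeline()` from the repository root runs the full pipeline and stops at the first step that fails, so the costly build step never runs against a partial download.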