67 commits
c0f61f3
added dagster_defs and dg.Dockerfile
jkislin Oct 7, 2025
c5c46ea
simplified names
jkislin Oct 7, 2025
08b1939
moved dagster to its own folder
jkislin Oct 7, 2025
842d7a4
dagster_defs.py a la sandbox; containerfile updates to include dagster
jkislin Oct 8, 2025
8996067
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 9, 2025
066894b
Merge branch 'main' into jk-dagster-sandbox
jkislin Oct 9, 2025
5307f58
quick fix to dagster_defs.py
Oct 14, 2025
6746820
some simplification for dagster. pending debugging tomorrow.
jkislin Oct 14, 2025
f2df9ef
Merge branch 'jk-dagster-sandbox' of https://github.com/CDCgov/pyrene…
jkislin Oct 14, 2025
3472034
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 14, 2025
631f982
tons of incremental updates toward a working dagster pyrenew-h. check…
jkislin Oct 16, 2025
38cd584
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 16, 2025
500df4d
blobfuse mounts in local dagster docker executor!
jkislin Oct 17, 2025
01c2497
pyrenew-h-output for now
jkislin Oct 21, 2025
d0e6947
Merge branch 'jk-dagster-sandbox' of https://github.com/CDCgov/pyrene…
jkislin Oct 21, 2025
884bb44
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 21, 2025
dbcf189
Merge branch 'main' of https://github.com/CDCgov/pyrenew-hew into jk-…
jkislin Oct 21, 2025
c5d9de6
Merge branch 'jk-dagster-sandbox' of https://github.com/CDCgov/pyrene…
jkislin Oct 21, 2025
947d34d
output subdir back to ./
jkislin Oct 21, 2025
60e9f6c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 21, 2025
07aa576
fix asset execution context
jkislin Oct 21, 2025
abd1ed4
Merge branch 'jk-dagster-sandbox' of https://github.com/CDCgov/pyrene…
jkislin Oct 21, 2025
aa03b33
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 21, 2025
5264571
pyrenew asset builder
jkislin Oct 21, 2025
125c4ff
An initial working example for pyrenew-hew!
jkislin Oct 29, 2025
f10171a
Merge branch 'jk-dagster-sandbox' of https://github.com/CDCgov/pyrene…
jkislin Oct 29, 2025
282c869
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 29, 2025
40ec05a
readme update and testing for the timeseries and batch executor
jkislin Nov 3, 2025
9235f55
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 3, 2025
2609614
blobfuse refinements
jkislin Nov 3, 2025
1082032
switch to docker executor for testing
jkislin Nov 3, 2025
499a753
caj executor docker image tag
jkislin Nov 4, 2025
bf1116e
fixed batch executor source volumes
jkislin Nov 4, 2025
d23d306
updated docker executor to also user username versioned tags
jkislin Nov 4, 2025
8f57a82
reverting to batch executor for testing
jkislin Nov 4, 2025
5ba84d3
Success! Batch Executor works
jkislin Nov 4, 2025
82a762e
some mount cleanup
jkislin Nov 21, 2025
e981428
update definitions
jkislin Nov 25, 2025
e6c9b09
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 25, 2025
c3369d9
remove unnecessary commented and redundant code
jkislin Dec 2, 2025
bf6a20d
Merge branch 'main' of https://github.com/CDCgov/pyrenew-hew into jk-…
jkislin Dec 2, 2025
d615505
adding env variables for Azure Command Center
jkislin Dec 2, 2025
3da4c81
allow different models to have different partitions
jkislin Dec 2, 2025
40a72cb
an attempt at clean venv separation between pyrenew and dagster
jkislin Dec 2, 2025
b8d1f1a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 2, 2025
74cd8a7
Merge branch 'main' into jk-dagster-sandbox
jkislin Dec 4, 2025
a3d891d
simplified blobfuse
jkislin Dec 4, 2025
e3f1a2b
Merge branch 'jk-dagster-sandbox' of https://github.com/CDCgov/pyrene…
jkislin Dec 4, 2025
eaaf510
merged the dagster and project dependencies together
jkislin Dec 5, 2025
b2415d3
Merge branch 'main' of https://github.com/CDCgov/pyrenew-hew into jk-…
jkislin Dec 5, 2025
1a77853
new dagster_defs with precommit run
jkislin Dec 5, 2025
93f7e7b
Merge branch 'main' into jk-dagster-sandbox
jkislin Dec 9, 2025
df0d8ed
partition definitions are all the same; we exclude in runs themselves…
jkislin Dec 10, 2025
c592804
Merge branch 'jk-dagster-sandbox' of https://github.com/CDCgov/pyrene…
jkislin Dec 10, 2025
612cff9
Merge branch 'main' of https://github.com/CDCgov/pyrenew-hew into jk-…
jkislin Dec 10, 2025
21f1d9f
preparing for central code location usage; updated cfa dagster
jkislin Dec 11, 2025
59781de
readme, github actions, makefile
jkislin Dec 11, 2025
aa67bb3
Merge branch 'main' into jk-dagster-sandbox
jkislin Dec 11, 2025
7b49243
Update blobfuse/README.md
jkislin Dec 11, 2025
b7190ae
Update Makefile
jkislin Dec 11, 2025
dc1d8cb
replaced ignoring dagster_defs.py with ruff format off for a single c…
jkislin Dec 11, 2025
89e8a59
Merge branch 'jk-dagster-sandbox' of https://github.com/CDCgov/pyrene…
jkislin Dec 11, 2025
c28c60f
Apply suggestion from @Copilot
jkislin Dec 11, 2025
30ead07
images
jkislin Dec 11, 2025
1ad893f
Merge branch 'main' into jk-dagster-sandbox
jkislin Dec 12, 2025
09e0670
add context to run_pyrenew_model calls
jkislin Dec 12, 2025
5e8f858
Merge branch 'jk-dagster-sandbox' of https://github.com/CDCgov/pyrene…
jkislin Dec 12, 2025
1 change: 1 addition & 0 deletions .containerignore
@@ -1,3 +1,4 @@
Containerfile
nssp_demo/private_data
notebooks
mounts/
4 changes: 4 additions & 0 deletions .dockerignore
@@ -0,0 +1,4 @@
Containerfile
nssp_demo/private_data
notebooks
mounts
24 changes: 24 additions & 0 deletions .github/workflows/update-dagster-code.yaml
@@ -0,0 +1,24 @@
name: Deploy Dagster Code

on:
  push:
    branches:
      - main
  workflow_dispatch:

jobs:
  run:
    runs-on: ubuntu-latest
    environment: production
    name: Run Dagster job
    steps:
      - name: Run job
        uses: CDCgov/cfa-actions/[email protected]
        with:
          github_app_id: ${{ secrets.CDCENT_ACTOR_APP_ID }}
          github_app_pem: ${{ secrets.CDCENT_ACTOR_APP_PEM }}
          script: |
            echo "Running update script"
            uv run \
              https://raw.githubusercontent.com/CDCgov/cfa-dagster/refs/heads/main/scripts/update_code_location.py \
              --registry_image cfaprdbatchcr.azurecr.io/pyrenew-hew:dagster_latest
11 changes: 11 additions & 0 deletions .gitignore
@@ -29,6 +29,8 @@
*.gif
*.pdf
*.png
# except when in img/ folder
!img/*.png

# Compressed archives
*.gz
@@ -401,7 +403,16 @@ nssp-etl
output
params
nwss-vintages
mounts
prod-param-estimates
pyrenew-hew-config
pyrenew-hew-prod-output
pyrenew-test-output
test-output

# Azure configuration files
azureconfig.env
azureconfig.sh

# blobfuse
config.yaml
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -19,6 +19,7 @@ repos:
- id: ruff-check
# Run the formatter
- id: ruff-format
#####extra
- repo: https://github.com/astral-sh/uv-pre-commit
rev: 0.9.16
hooks:
34 changes: 27 additions & 7 deletions Containerfile
@@ -1,6 +1,6 @@
#syntax=docker/dockerfile:1-labs

FROM rocker/tidyverse
FROM rocker/tidyverse:4.5.1

ARG GIT_COMMIT_SHA
ENV GIT_COMMIT_SHA=$GIT_COMMIT_SHA
@@ -11,23 +11,43 @@ ENV GIT_BRANCH_NAME=$GIT_BRANCH_NAME
ENV XLA_FLAGS=--xla_force_host_platform_device_count=4

COPY ./hewr /pyrenew-hew/hewr

WORKDIR /pyrenew-hew

# install hewr dependencies
RUN Rscript -e "install.packages('pak')"
RUN Rscript -e "pak::pkg_install('cmu-delphi/epiprocess@main')"
RUN Rscript -e "pak::pkg_install('cmu-delphi/epipredict@main')"
RUN Rscript -e "pak::local_install('hewr', upgrade = FALSE)"

COPY --exclude=pipelines/priors . .
COPY pipelines/priors pipelines/priors

#
# Python from https://docs.astral.sh/uv/guides/integration/docker/
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
# Some handy uv environment variables
ENV UV_COMPILE_BYTECODE=1
ENV UV_LINK_MODE=copy
ENV UV_PYTHON_CACHE_DIR=/root/.cache/uv/python

RUN --mount=type=cache,target=/root/.cache/uv \
uv sync
# copy in the project files
COPY ./pyrenew_hew ./pyrenew_hew
COPY ./pipelines ./pipelines
COPY ./tests ./tests
COPY README.md ./
COPY ./pyproject.toml ./
COPY ./uv.lock ./
COPY ./.python-version ./

# VENV MANAGEMENT AND DEPENDENCY SYNCING

# Create the Pyrenew-Hew venv and sync dependencies from pyproject.toml
RUN uv venv .venv
# Set VIRTUAL_ENV variable at runtime to choose which venv to activate
# By default we'll do the main pyrenew-hew venv
ENV VIRTUAL_ENV=/pyrenew-hew/.venv
# Sync dependencies, reusing the uv cache across builds
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync

# Copy in the dagster defs
COPY ./dagster_defs.py ./

# Update PATH to use the selected venv
ENV PATH="${VIRTUAL_ENV}/bin:$PATH"
50 changes: 42 additions & 8 deletions Makefile
@@ -47,8 +47,16 @@ endif
help:
@echo "Usage: make [target] [ARGS]"
@echo ""

@echo "Blobfuse Mount Targets: "
@echo " mount : Mount blob storage containers using blobfuse2"
@echo " unmount : Unmount blob storage containers and clean up"
@echo ""
@echo "Container Build Targets: "
@echo " container_build : Build the container image"
@echo " dagster : Run dagster definitions locally"
@echo " dagster_build : Build the dagster container image"
@echo " dagster_push : Push the dagster container image to the Azure Container Registry and code location"
@echo " container_tag : Tag the container image"
@echo " ghcr_login : Log in to the Github Container Registry. Requires GH_USERNAME and GH_PAT env vars"
@echo " container_push : Push the container image to the Azure Container Registry"
@@ -79,13 +87,36 @@ help:
@echo ""
@echo "Passing a flag through ARGS will also override the flags set previously."

#------------------------ #
# Blobfuse Mount Targets
# ----------------------- #

mount:
sudo ./blobfuse/mount.sh

unmount:
sudo ./blobfuse/cleanup.sh

# ----------------------- #
# Container Build Targets
# ----------------------- #

container_build: ghcr_login
$(ENGINE) build . -t $(CONTAINER_NAME) -f $(CONTAINERFILE)

dagster_build:
docker build -t cfaprdbatchcr.azurecr.io/pyrenew-hew:dagster_latest -f Containerfile .

dagster_push: dagster_build
az login --identity && \
az acr login -n cfaprdbatchcr && \
docker push "cfaprdbatchcr.azurecr.io/pyrenew-hew:dagster_latest" && \
uv run https://raw.githubusercontent.com/CDCgov/cfa-dagster/refs/heads/main/scripts/update_code_location.py \
--registry_image "cfaprdbatchcr.azurecr.io/pyrenew-hew:dagster_latest"

dagster:
uv run dagster_defs.py

container_tag:
$(ENGINE) tag $(CONTAINER_NAME) $(CONTAINER_REMOTE_NAME)

@@ -95,11 +126,14 @@ ghcr_login:
container_push: container_tag ghcr_login
$(ENGINE) push $(CONTAINER_REMOTE_NAME)

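# NOTE: `source` runs inside a child bash process here, so variables set by
# azureconfig.sh are not visible to the recipes of targets that depend on `config`.
config:
	bash -c "source ./azureconfig.sh"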

# ---------------- #
# Model Fit Targets
# ---------------- #

run_timeseries:
run_timeseries: config
uv run python pipelines/batch/setup_job.py \
--model-family timeseries \
--output-subdir "${FORECAST_DATE}_forecasts" \
@@ -110,7 +144,7 @@ run_timeseries:
--dry-run "$(DRY_RUN)" \
$(ARGS)

run_e_model:
run_e_model: config
uv run python pipelines/batch/setup_job.py \
--model-family pyrenew \
--output-subdir "${FORECAST_DATE}_forecasts" \
@@ -122,7 +156,7 @@ run_e_model:
--dry-run "$(DRY_RUN)" \
$(ARGS)

run_h_model:
run_h_model: config
uv run python pipelines/batch/setup_job.py \
--model-family pyrenew \
--output-subdir "${FORECAST_DATE}_forecasts" \
@@ -134,7 +168,7 @@ run_h_model:
--dry-run "$(DRY_RUN)" \
$(ARGS)

run_he_model:
run_he_model: config
uv run python pipelines/batch/setup_job.py \
--model-family pyrenew \
--output-subdir "${FORECAST_DATE}_forecasts" \
@@ -146,7 +180,7 @@ run_he_model:
--dry-run "$(DRY_RUN)" \
$(ARGS)

run_hw_model:
run_hw_model: config
uv run python pipelines/batch/setup_job.py \
--model-family pyrenew \
--output-subdir "${FORECAST_DATE}_forecasts" \
@@ -158,19 +192,19 @@ run_hw_model:
--dry-run "$(DRY_RUN)" \
$(ARGS)

run_hew_model:
run_hew_model: config
uv run python pipelines/batch/setup_job.py \
--model-family pyrenew \
--output-subdir "${FORECAST_DATE}_forecasts" \
--model-letters "hew" \
--job-id "pyrenew-hew-${ENVIRONMENT}_${FORECAST_DATE}" \
--job-id "pyrenew-hew-${ENVIRONMENT}_${FORECAST_DATE}" \
--pool-id pyrenew-pool-32gb \
--rng-key "$(RNG_KEY)" \
--test "$(TEST)" \
--dry-run "$(DRY_RUN)" \
$(ARGS)

post_process:
post_process: config
uv run python pipelines/postprocess_forecast_batches.py \
--input "./blobfuse/mounts/pyrenew-hew-prod-output/${FORECAST_DATE}_forecasts" \
--output "./blobfuse/mounts/nssp-etl/gold/${FORECAST_DATE}_forecasts.parquet" \
46 changes: 46 additions & 0 deletions README.md
@@ -15,6 +15,7 @@ This repository contains code for the [PyRenew-HEW model](https://github.com/CDC

## Containers

### Standard Container
The project uses GitHub Actions for automatically building container images based on the project's [Containerfile](Containerfile). The images are currently hosted on Github Container Registry and are built and pushed via the [containers.yaml](.github/workflows/containers.yaml) GitHub Actions workflow.

Images can also be built locally. The [Makefile](Makefile) contains several targets for building and pushing images. Although the Makefile uses Docker as the default engine, the `ENGINE` environment variable can be set to `podman` to use Podman instead, for example:
@@ -42,6 +43,51 @@ Pipelines can be run interactively or non-interactively:
- The `Makefile` also provides targets that will run pipelines non-interactively. Run `make help` for more information.
- Pipelines are run through the command line python interface when scheduled using [Pyrenew-Cron](https://github.com/cdcent/pyrenew-cron).

## Experimental: Running Model Pipelines With Dagster

When mature, our dagster implementation is intended to replace the `Azure Command Center` and `PyRenew-Cron`. Development is ongoing - you can test an early version by following the steps below.

### Local Development

#### Setup Blobfuse (Optional)
> Note: This is only necessary if you want to use the local `Docker Executor` with dagster - which is for development and debugging only. Otherwise, we recommend skipping this.

A `blobfuse/` directory allows local monitoring of inputs and outputs on Azure Blob as if they were in your local filesystem. This automates and replaces the process in the [CFA Blobfuse Tutorial](https://github.com/cdcent/cfa-blobfuse-tutorial). If you'd like to mount the Pyrenew project ecosystem's blobs to your working directory, follow the instructions in the `blobfuse/README.md`. This is only necessary for some debugging operations and for local testing, which isn't recommended unless you have a specific use-case and know what you're doing.

#### Launching Dagster Locally
> Prerequisites: `uv`, `docker`, a VAP VM with a registered managed identity in Azure, and rights to the `cfaprdbatchcr` container registry. Contact [email protected] for assistance with the latter two.

The following instructions will set up Dagster on your VAP. With the current configuration, however, execution still runs in the cloud via Azure Batch. To test with the local Docker Executor instead, change the `executor` option in `dagster_defs.py`; this requires Blobfuse to be set up first.

1. Set up your `uv` virtual environment:
- `uv sync`
- `source .venv/bin/activate`
2. Login to Azure and the Batch Container Registry:
- `az login --identity && az acr login -n cfaprdbatchcr`
3. Build and push the `pyrenew-hew:dagster_latest` image:
- `docker build -t cfaprdbatchcr.azurecr.io/pyrenew-hew:dagster_latest -f Containerfile . --push`
4. Start the Dagster UI by running `uv run dagster_defs.py --configure` and opening the link printed in your terminal (usually http://127.0.0.1:3000/).
- Note: you only need the `--configure` flag the first time you run the Dagster UI.
5. You should now see the Dagster UI for PyRenew-HEW. This is a local server that only shows the PyRenew-HEW assets defined in your local git repository.
6. Try materializing an asset by navigating to "Lineage" on the left sidebar. By default, these assets will submit jobs to Azure Batch and write to the `pyrenew-test-output` blob.
- We recommend materializing a few partitions at a time for testing purposes.
![Dagster lineage view](img/dagster_lineage.png)
- You will get a pop-up directing you to your asset runs, which provide progress logs.
![Dagster runs view](img/dagster_runs.png)
7. Using the run ID Dagster provides, you can also find your jobs in Azure Batch Explorer.
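
The login, build, and push steps above can also be scripted. The following is a minimal sketch, not part of the repository: it only assembles the commands already shown in this README and in the `dagster_build` and `dagster_push` Makefile targets, and prints them as a dry run. The function name `dagster_image_commands` is hypothetical.

```python
import shlex

# Registry and image tag as they appear in the README and Makefile.
REGISTRY = "cfaprdbatchcr"
IMAGE = f"{REGISTRY}.azurecr.io/pyrenew-hew:dagster_latest"


def dagster_image_commands(
    image: str = IMAGE, registry: str = REGISTRY
) -> list[list[str]]:
    """Return the login, build, and push commands in the order they run.

    Hypothetical helper: it mirrors the documented steps and executes nothing.
    """
    return [
        ["az", "login", "--identity"],
        ["az", "acr", "login", "-n", registry],
        ["docker", "build", "-t", image, "-f", "Containerfile", "."],
        ["docker", "push", image],
    ]


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for cmd in dagster_image_commands():
        print(shlex.join(cmd))
```

Keeping each command as an argument list makes it straightforward to pass the same values to `subprocess.run` later if you do want to execute them.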

#### Publishing to the Central Code Server
> This section is under construction.

Pushes to main will automatically update the central Dagster Code Location for PyRenew-HEW via a Github Actions Workflow. From the central code server, you can run and schedule Pyrenew-HEW runs and see other projects' pipelines at CFA. You can also manually update the code server with a makefile recipe (see next section).

#### Makefile Recipes for Dagster
After you've familiarized yourself with the above instructions, feel free to use these convenient `make` recipes:
- `make dagster_build`: builds your dagster image.
- `make dagster_push`: builds your dagster image, pushes it, then updates the central Dagster code location for pyrenew-hew.
- `make dagster`: runs the dagster UI locally.
- `make mount`: mounts the pyrenew-relevant blobs using blobfuse.
- `make unmount`: gracefully unmounts the pyrenew-relevant blobs.
## General Disclaimer
This repository was created for use by CDC programs to collaborate on public health related projects in support of the [CDC mission](https://www.cdc.gov/about/organization/mission.htm). GitHub is not hosted by the CDC, but is a third party website used by CDC and its partners to share information and collaborate on software. CDC use of GitHub does not imply an endorsement of any one particular service, product, or enterprise.

17 changes: 17 additions & 0 deletions blobfuse/README.md
@@ -0,0 +1,17 @@
# Pyrenew Blobfuse Configuration

This directory serves as a project-specific fork of the [cfa-blobfuse-tutorial](https://github.com/cdcent).

> Make sure you have blobfuse2 installed before running this module.

This directory will mount pyrenew-hew blobs to `/mnt` and then symlink to a directory you specify (or the current directory if you don't supply an argument).

To run, make sure your working directory is the repository top level (`pyrenew-hew`, not `pyrenew-hew/blobfuse`).
1. Run `sudo chmod +x ./blobfuse/mount.sh`.
2. Run `sudo ./blobfuse/mount.sh`. This mounts the blobs and creates the symlinks in the top-level directory (`pyrenew-hew`).
3. Check that `/mnt` has the pyrenew blobs mounted and that symlinks have been created in your working directory (`pyrenew-hew/`).
4. Before attempting to remount, run the cleanup script: `sudo ./blobfuse/cleanup.sh`.

You can, for convenience, use make commands:
- `make mount`
- `make unmount`
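
The check in step 3 can be automated with a short script. This is a minimal sketch under stated assumptions, not part of the repository: the container names are copied from the list in `blobfuse/archive/unmount.sh`, and it assumes `mount.sh` creates one symlink per container, named after the container, in the directory you pass in. The function name `missing_symlinks` is hypothetical.

```python
from pathlib import Path

# Container names copied from blobfuse/archive/unmount.sh.
# Assumption: mount.sh creates one symlink per container with the same name.
EXPECTED_MOUNTS = [
    "nssp-etl",
    "nssp-archival-vintages",
    "prod-param-estimates",
    "pyrenew-hew-prod-output",
    "pyrenew-test-output",
    "nwss-vintages",
    "pyrenew-hew-config",
]


def missing_symlinks(workdir: Path, names: list[str] = EXPECTED_MOUNTS) -> list[str]:
    """Return the names that are not symlinks pointing at an existing directory."""
    missing = []
    for name in names:
        link = workdir / name
        # is_symlink() inspects the link itself; is_dir() follows it to the target.
        if not (link.is_symlink() and link.is_dir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_symlinks(Path.cwd())
    if problems:
        print("Not mounted or not symlinked:", ", ".join(problems))
    else:
        print("All expected blob containers are symlinked.")
```

Run it from the directory that holds the symlinks. A dangling symlink (for example, after an unmount without cleanup) is reported as missing because its target no longer exists.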
28 changes: 28 additions & 0 deletions blobfuse/archive/unmount.sh
@@ -0,0 +1,28 @@
#!/bin/bash

# ensure logged in via Azure CLI.
if ! ./blobfuse/verifylogin.sh; then
    exit 1
fi

echo "Unmounting containers specified in mounts.txt with blobfuse2..."

TO_UNMOUNT=(
"nssp-etl"
"nssp-archival-vintages"
"prod-param-estimates"
"pyrenew-hew-prod-output"
"pyrenew-test-output"
"nwss-vintages"
"pyrenew-hew-config"
)

for dir in "${TO_UNMOUNT[@]}"; do
    echo "Unmounting $dir"
    # Quote the variable so names are never word-split or glob-expanded
    blobfuse2 unmount "$dir"
    rmdir "$dir"
done

echo "Done."
27 changes: 27 additions & 0 deletions blobfuse/cleanup.sh
@@ -0,0 +1,27 @@
#!/bin/bash

# ensure logged in via Azure CLI.
if ! ./blobfuse/verifylogin.sh; then
    exit 1
fi

echo "Cleaning up blobfuse mounts"

echo "Unmounting any mounted blob storage containers"
blobfuse2 unmount all

echo "Removing all empty entries in /mnt/"
find /mnt/ -mindepth 1 -type d -empty -delete

echo "Clearing the cache"
rm -rf .blobfuse_cache/*

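# Caution: the two find commands below act on the whole working-directory tree,
# not only on entries created by mount.sh (for example, symlinks inside .venv).
echo "Removing empty directories"
find . -type d -empty -delete

echo "Removing symlinks"
find . -type l -delete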

echo "Done!"