Local Dagster Orchestration of Batch Pipelines #714
Open
jkislin wants to merge 67 commits into main from jk-dagster-sandbox
Changes from 58 commits
c0f61f3
added dagster_defs and dg.Dockerfile
jkislin c5c46ea
simplified names
jkislin 08b1939
moved dagster to its own folder
jkislin 842d7a4
dagster_defs.py a la sandbox; containerfile updates to include dagster
jkislin 8996067
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 066894b
Merge branch 'main' into jk-dagster-sandbox
jkislin 5307f58
quick fix to dagster_defs.py
6746820
some simplification for dagster. pending debugging tomorrow.
jkislin f2df9ef
Merge branch 'jk-dagster-sandbox' of https://github.com/CDCgov/pyrene…
jkislin 3472034
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 631f982
tons of incremental updates toward a working dagster pyrenew-h. check…
jkislin 38cd584
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 500df4d
blobfuse mounts in local dagster docker executor!
jkislin 01c2497
pyrenew-h-output for now
jkislin d0e6947
Merge branch 'jk-dagster-sandbox' of https://github.com/CDCgov/pyrene…
jkislin 884bb44
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] dbcf189
Merge branch 'main' of https://github.com/CDCgov/pyrenew-hew into jk-…
jkislin c5d9de6
Merge branch 'jk-dagster-sandbox' of https://github.com/CDCgov/pyrene…
jkislin 947d34d
output subdir back to ./
jkislin 60e9f6c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 07aa576
fix asset execution context
jkislin abd1ed4
Merge branch 'jk-dagster-sandbox' of https://github.com/CDCgov/pyrene…
jkislin aa03b33
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 5264571
pyrenew asset builder
jkislin 125c4ff
An initial working example for pyrenew-hew!
jkislin f10171a
Merge branch 'jk-dagster-sandbox' of https://github.com/CDCgov/pyrene…
jkislin 282c869
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 40ec05a
readme update and testing for the timeseries and batch executor
jkislin 9235f55
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 2609614
blobfuse refinements
jkislin 1082032
switch to docker executor for testing
jkislin 499a753
caj executor docker image tag
jkislin bf1116e
fixed batch executor source volumes
jkislin d23d306
updated docker executor to also user username versioned tags
jkislin 8f57a82
reverting to batch executor for testing
jkislin 5ba84d3
Success! Batch Executor works
jkislin 82a762e
some mount cleanup
jkislin e981428
update definitions
jkislin e6c9b09
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] c3369d9
remove unnecessary commented and redundant code
jkislin bf6a20d
Merge branch 'main' of https://github.com/CDCgov/pyrenew-hew into jk-…
jkislin d615505
adding env variables for Azure Command Center
jkislin 3da4c81
allow different models to have different partitions
jkislin 40a72cb
an attempt at clean venv separation between pyrenew and dagster
jkislin b8d1f1a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 74cd8a7
Merge branch 'main' into jk-dagster-sandbox
jkislin a3d891d
simplified blobfuse
jkislin e3f1a2b
Merge branch 'jk-dagster-sandbox' of https://github.com/CDCgov/pyrene…
jkislin eaaf510
merged the dagster and project dependencies together
jkislin b2415d3
Merge branch 'main' of https://github.com/CDCgov/pyrenew-hew into jk-…
jkislin 1a77853
new dagster_defs with precommit run
jkislin 93f7e7b
Merge branch 'main' into jk-dagster-sandbox
jkislin df0d8ed
partition definitions are all the same; we exclude in runs themselves…
jkislin c592804
Merge branch 'jk-dagster-sandbox' of https://github.com/CDCgov/pyrene…
jkislin 612cff9
Merge branch 'main' of https://github.com/CDCgov/pyrenew-hew into jk-…
jkislin 21f1d9f
preparing for central code location usage; updated cfa dagster
jkislin 59781de
readme, github actions, makefile
jkislin aa67bb3
Merge branch 'main' into jk-dagster-sandbox
jkislin 7b49243
Update blobfuse/README.md
jkislin b7190ae
Update Makefile
jkislin dc1d8cb
replaced ignoring dagster_defs.py with ruff format off for a single c…
jkislin 89e8a59
Merge branch 'jk-dagster-sandbox' of https://github.com/CDCgov/pyrene…
jkislin c28c60f
Apply suggestion from @Copilot
jkislin 30ead07
images
jkislin 1ad893f
Merge branch 'main' into jk-dagster-sandbox
jkislin 09e0670
add context to run_pyrenew_model calls
jkislin 5e8f858
Merge branch 'jk-dagster-sandbox' of https://github.com/CDCgov/pyrene…
jkislin
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,4 @@ | ||
| Containerfile | ||
| nssp_demo/private_data | ||
| notebooks | ||
| mounts/ |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| Containerfile | ||
| nssp_demo/private_data | ||
| notebooks | ||
| mounts |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,24 @@ | ||
| name: Deploy Dagster Code | ||
|
|
||
| on: | ||
| push: | ||
| branches: | ||
| - main | ||
| workflow_dispatch: | ||
|
|
||
| jobs: | ||
| run: | ||
| runs-on: ubuntu-latest | ||
| environment: production | ||
| name: Run Dagster job | ||
| steps: | ||
| - name: Run job | ||
| uses: CDCgov/cfa-actions/[email protected] | ||
| with: | ||
| github_app_id: ${{ secrets.CDCENT_ACTOR_APP_ID }} | ||
| github_app_pem: ${{ secrets.CDCENT_ACTOR_APP_PEM }} | ||
| script: | | ||
| echo "Running update script" | ||
| uv run \ | ||
| https://raw.githubusercontent.com/CDCgov/cfa-dagster/refs/heads/main/scripts/update_code_location.py \ | ||
| --registry_image cfaprdbatchcr.azurecr.io/pyrenew-hew:dagster_latest |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -15,6 +15,7 @@ This repository contains code for the [PyRenew-HEW model](https://github.com/CDC | |
|
|
||
| ## Containers | ||
|
|
||
| ### Standard Container | ||
| The project uses GitHub Actions to build container images automatically from the project's [Containerfile](Containerfile). The images are currently hosted on GitHub Container Registry and are built and pushed via the [containers.yaml](.github/workflows/containers.yaml) GitHub Actions workflow. | ||
|
|
||
| Images can also be built locally. The [Makefile](Makefile) contains several targets for building and pushing images. Although the Makefile uses Docker as the default engine, the `ENGINE` environment variable can be set to `podman` to use Podman instead, for example: | ||
|
|
@@ -42,6 +43,51 @@ Pipelines can be run interactively or non-interactively: | |
| - The `Makefile` also provides targets that will run pipelines non-interactively. Run `make help` for more information. | ||
| - Pipelines are run through the command-line Python interface when scheduled using [Pyrenew-Cron](https://github.com/cdcent/pyrenew-cron). | ||
|
|
||
| ## Experimental: Running Model Pipelines With Dagster | ||
|
|
||
| When mature, our Dagster implementation is intended to replace the `Azure Command Center` and `PyRenew-Cron`. Development is ongoing; you can test an early version by following the steps below. | ||
|
|
||
| ### Local Development | ||
|
|
||
| #### Set Up Blobfuse (Optional) | ||
| > Note: This is only necessary if you want to use the local `Docker Executor` with Dagster, which is for development and debugging only. Otherwise, we recommend skipping this step. | ||
|
|
||
| The `blobfuse/` directory lets you browse inputs and outputs in Azure Blob as if they were in your local filesystem. It automates and replaces the process in the [CFA Blobfuse Tutorial](https://github.com/cdcent/cfa-blobfuse-tutorial). To mount the Pyrenew project ecosystem's blobs in your working directory, follow the instructions in `blobfuse/README.md`. Mounting is only needed for some debugging operations and for local testing, which we don't recommend unless you have a specific use case and know what you're doing. | ||
|
|
||
| #### Launching Dagster Locally | ||
| > Prerequisites: `uv`, `docker`, a VAP VM with a registered managed identity in Azure, and rights to the `cfaprdbatchcr` container registry. Contact [email protected] for assistance with the latter two. | ||
|
|
||
| The following instructions set up Dagster on your VAP. With the current configuration, however, execution still runs in the cloud via Azure Batch. To test with the local Docker Executor instead, change the `executor` option in `dagster_defs.py`; this requires Blobfuse to be set up. | ||
|
|
||
| 1. Set up your `uv` virtual environment: | ||
| - `uv sync` | ||
| - `source .venv/bin/activate` | ||
| 2. Log in to Azure and the Batch Container Registry: | ||
| - `az login --identity && az acr login -n cfaprdbatchcr` | ||
| 3. Build and push the `pyrenew-hew:dagster_latest` image: | ||
| - `docker build -t cfaprdbatchcr.azurecr.io/pyrenew-hew:dagster_latest -f Containerfile . --push` | ||
| 4. Start the Dagster UI by running `uv run dagster_defs.py --configure` and clicking the link in your terminal (usually http://127.0.0.1:3000/). | ||
| - Note: you only need the `--configure` flag the first time you run the Dagster UI. | ||
| 5. You should now see the Dagster UI for PyRenew-HEW. This is a local server that only shows the PyRenew-HEW assets defined in your local git repository. | ||
| 6. Try materializing an asset by navigating to "Lineage" in the left sidebar. By default, these assets will submit jobs to Azure Batch and write to the `pyrenew-test-output` blob. | ||
| - We recommend materializing a few partitions at a time for testing purposes. | ||
|  | ||
| - You will get a pop-up directing you to your asset runs, which provide progress logs. | ||
|  | ||
| 7. Using the run ID Dagster provides, you can also find your jobs in Azure Batch Explorer. | ||
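
The launch steps above can be previewed as a single script. The following is a minimal sketch, not part of the repository: the `run` helper, the `launch_dagster_locally` function, and the `DRY_RUN` variable are hypothetical names, while the commands themselves are copied from the steps above. `source .venv/bin/activate` is left out on the assumption that `uv run` already uses the project environment.

```bash
#!/bin/bash
# Sketch only: wraps the README launch steps so they can be previewed.
# DRY_RUN, run, and launch_dagster_locally are illustrative names.

REGISTRY="cfaprdbatchcr"
IMAGE="${REGISTRY}.azurecr.io/pyrenew-hew:dagster_latest"

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

launch_dagster_locally() {
    run uv sync                                              # step 1
    run az login --identity                                  # step 2
    run az acr login -n "$REGISTRY"
    run docker build -t "$IMAGE" -f Containerfile . --push   # step 3
    run uv run dagster_defs.py --configure                   # start the UI
}
```

Calling `launch_dagster_locally` prints each command prefixed with `+`. Setting `DRY_RUN=0` executes the commands instead, which requires the prerequisites listed above.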
|
|
||
| #### Publishing to the Central Code Server | ||
| > This section is under construction. | ||
|
|
||
| Pushes to `main` automatically update the central Dagster code location for PyRenew-HEW via a GitHub Actions workflow. From the central code server, you can run and schedule PyRenew-HEW runs and see other CFA projects' pipelines. You can also update the code server manually with a Makefile recipe (see next section). | ||
|
|
||
| #### Makefile Recipes for Dagster | ||
| Once you're familiar with the instructions above, you can use these `make` recipes: | ||
| - `make dagster_build`: builds your Dagster image. | ||
| - `make dagster_push`: builds your Dagster image, pushes it, and then updates the central Dagster server's code for pyrenew-hew. | ||
| - `make dagster`: runs the Dagster UI locally. | ||
| - `make mount`: mounts the pyrenew-relevant blobs using blobfuse. | ||
| - `make unmount`: gracefully unmounts the pyrenew-relevant blobs. | ||
| ## General Disclaimer | ||
| This repository was created for use by CDC programs to collaborate on public health related projects in support of the [CDC mission](https://www.cdc.gov/about/organization/mission.htm). GitHub is not hosted by the CDC, but is a third party website used by CDC and its partners to share information and collaborate on software. CDC use of GitHub does not imply an endorsement of any one particular service, product, or enterprise. | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| # Pyrenew Blobfuse Configuration | ||
|
|
||
| This directory serves as a project-specific fork of the [cfa-blobfuse-tutorial](https://github.com/cdcent). | ||
|
||
|
|
||
| > Make sure you have blobfuse2 installed before running this module. | ||
| The scripts in this directory mount pyrenew-hew blobs to `/mnt` and then symlink them into a directory you specify (or the current directory if you don't supply an argument). | ||
|
|
||
| Run the scripts from the top level of the repository (`pyrenew-hew`, not `pyrenew-hew/blobfuse`). | ||
| 1. Run `sudo chmod +x ./blobfuse/mount.sh`. | ||
| 2. Run `sudo ./blobfuse/mount.sh`. This mounts to the top level (`pyrenew-hew`). | ||
| 3. Check that `/mnt` has the pyrenew blobs mounted and that symlinks have been created in your working directory (`pyrenew-hew/`). | ||
| 4. Before attempting to remount, run the cleanup script: `sudo ./blobfuse/cleanup.sh`. | ||
|
|
||
| For convenience, you can use the `make` commands instead: | ||
| - `make mount` | ||
| - `make unmount` | ||
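
The check in step 3 can be scripted. The following is a minimal sketch under one loud assumption: that each symlink in the working directory is named after its blob container. The container names are taken from the unmount script in this PR; `check_symlinks` and `EXPECTED_CONTAINERS` are hypothetical names, not part of the repository.

```bash
#!/bin/bash
# Sketch only: verify that the expected symlinks exist after mounting.
# Assumes each symlink is named after its container (not confirmed by mount.sh).
EXPECTED_CONTAINERS="nssp-etl nssp-archival-vintages prod-param-estimates pyrenew-hew-prod-output pyrenew-test-output nwss-vintages pyrenew-hew-config"

# Print one status line per expected container.
# Returns non-zero if any expected symlink is missing.
check_symlinks() {
    workdir="${1:-.}"
    missing=0
    for name in $EXPECTED_CONTAINERS; do
        if [ -L "$workdir/$name" ]; then
            echo "ok       $name -> $(readlink "$workdir/$name")"
        else
            echo "missing  $name"
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_symlinks .` from the top level of the repository lists each container as `ok` or `missing`, so a failed mount shows up before a Dagster run tries to read from it.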
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,29 @@ | ||
| #!/bin/bash | ||
|
|
||
| # ensure logged in via Azure CLI. | ||
| ./blobfuse/verifylogin.sh | ||
|
|
||
| if [[ "$?" -ne 0 ]]; then | ||
| exit 1 | ||
| fi | ||
|
|
||
| echo "Unmounting containers specified in mounts.txt with blobfuse2..." | ||
|
|
||
| TO_UNMOUNT=( | ||
| "nssp-etl" | ||
| "nssp-archival-vintages" | ||
| "prod-param-estimates" | ||
| "pyrenew-hew-prod-output" | ||
| "pyrenew-test-output" | ||
| "nwss-vintages" | ||
| "pyrenew-hew-config" | ||
|
||
| ) | ||
|
|
||
| for dir in "${TO_UNMOUNT[@]}"; do | ||
| echo "Unmounting $dir" | ||
| blobfuse2 unmount "$dir" | ||
| rmdir "$dir" | ||
| done | ||
|
|
||
| echo "Done." | ||
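
Each container should be unmounted only once, since a second `rmdir` on the same directory fails. The following is a minimal sketch of de-duplicating the list before the loop; `unique_names`, `unmount_all`, and `DRY_RUN` are hypothetical names and are not part of this PR.

```bash
#!/bin/bash
# Sketch only: de-duplicate container names before unmounting.

# Print each argument once, preserving first-seen order.
unique_names() {
    printf '%s\n' "$@" | awk '!seen[$0]++'
}

# Unmount every distinct directory; with DRY_RUN=1 (the default) only report.
unmount_all() {
    unique_names "$@" | while IFS= read -r dir; do
        echo "Unmounting $dir"
        if [ "${DRY_RUN:-1}" != "1" ]; then
            blobfuse2 unmount "$dir" && rmdir "$dir"
        fi
    done
}
```

For example, `unmount_all nssp-etl nwss-vintages nssp-etl` reports two unmounts rather than three.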
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,27 @@ | ||
| #!/bin/bash | ||
|
|
||
| # ensure logged in via Azure CLI. | ||
| ./blobfuse/verifylogin.sh | ||
|
|
||
| if [[ "$?" -ne 0 ]]; then | ||
| exit 1 | ||
| fi | ||
|
|
||
| echo "Cleaning up blobfuse mounts" | ||
|
|
||
| echo "Unmounting any mounted blob storage containers" | ||
| blobfuse2 unmount all | ||
|
|
||
| echo "Removing all empty entries in /mnt/" | ||
| find /mnt/ -mindepth 1 -type d -empty -delete | ||
|
|
||
| echo "Clearing the cache" | ||
| rm -rf .blobfuse_cache/* | ||
|
|
||
| echo "Removing empty directories" | ||
| find . -type d -empty -delete | ||
|
|
||
| echo "Removing symlinks" | ||
| find . -maxdepth 1 -type l -delete | ||
|
|
||
| echo "Done!" |