Local Dagster Orchestration of Batch Pipelines #714
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #714 +/- ##
=======================================
Coverage 39.43% 39.43%
=======================================
Files 30 30
Lines 2713 2713
=======================================
Hits 1070 1070
Misses 1643 1643
Pull request overview
This PR introduces Dagster orchestration for PyRenew-HEW, enabling scheduled and orchestrated pipeline execution through Azure Batch. The implementation provides a local development environment with asset-based workflow definitions while maintaining compatibility with existing workflows.
Key changes:
- Added Dagster dependencies and workflow definitions supporting multiple model variants (timeseries_e, pyrenew_e/h/he/hw/hew)
- Implemented blobfuse automation scripts for mounting Azure Blob storage containers locally
- Enhanced Makefile with convenience targets for Dagster operations and blob storage management
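The variant names in the first bullet above look like a family name followed by a suffix of data-stream letters. A minimal sketch of that naming convention, assuming the letters stand for hospital admissions (h), emergency department visits (e), and wastewater (w). The `SIGNALS` mapping, the `ModelVariant` class, and the `parse_variant` helper are hypothetical illustrations and are not taken from dagster_defs.py:

```python
from dataclasses import dataclass

# ASSUMPTION: meaning of each suffix letter; this PR does not spell it out.
SIGNALS = {"h": "hospital_admissions", "e": "ed_visits", "w": "wastewater"}

# Variant names as listed in the PR overview.
VARIANTS = [
    "timeseries_e",
    "pyrenew_e",
    "pyrenew_h",
    "pyrenew_he",
    "pyrenew_hw",
    "pyrenew_hew",
]


@dataclass(frozen=True)
class ModelVariant:
    """Hypothetical container: model family plus the data streams it fits."""

    family: str
    signals: tuple


def parse_variant(name: str) -> ModelVariant:
    """Split 'pyrenew_hew' into family 'pyrenew' and its signal letters."""
    family, _, letters = name.rpartition("_")
    if not family or not letters:
        raise ValueError(f"expected '<family>_<letters>', got {name!r}")
    unknown = set(letters) - set(SIGNALS)
    if unknown:
        raise ValueError(f"unknown signal letters in {name!r}: {sorted(unknown)}")
    # Preserve the order in which the letters appear in the name.
    return ModelVariant(family, tuple(SIGNALS[c] for c in letters))


if __name__ == "__main__":
    for variant in VARIANTS:
        print(variant, "->", parse_variant(variant))
```

A structure like this could feed whatever mechanism the workflow file uses to fan out one run per variant; how dagster_defs.py actually does that is not shown in this PR description.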
Reviewed changes
Copilot reviewed 15 out of 17 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| pyproject.toml | Added Dagster-related dependencies and new author |
| dagster_defs.py | Core Dagster implementation defining assets, partitions, executors, and schedules for PyRenew-HEW workflows |
| blobfuse/verifylogin.sh | Azure CLI login verification with managed identity fallback |
| blobfuse/pull_config.sh | Downloads Azure configuration and blobfuse config files from blob storage |
| blobfuse/mount.sh | Mounts blob storage containers and creates symlinks for local development |
| blobfuse/cleanup.sh | Unmounts blob storage and cleans up temporary files/symlinks |
| blobfuse/archive/unmount.sh | Legacy unmount script (archived) |
| blobfuse/README.md | Documentation for blobfuse setup and usage |
| README.md | Added comprehensive Dagster setup instructions and usage guide |
| Makefile | Added targets for Dagster operations, blobfuse mounting, and configuration loading |
| Containerfile | Updated base image version, reorganized file copying, and added Dagster definitions |
| .pre-commit-config.yaml | Excluded dagster_defs.py from ruff-format |
| .github/workflows/update-dagster-code.yaml | GitHub Actions workflow for automated Dagster code deployment |
| .dockerignore | Added ignore patterns for container builds |
| .containerignore | Added mounts/ to ignore list |
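The table above describes blobfuse/verifylogin.sh as an Azure CLI login check with a managed identity fallback. The script itself is not reproduced here, so the following is only a sketch of that control flow in Python, assuming the check uses `az account show` and the fallback uses `az login --identity`. The function names `run_quiet` and `verify_login` and the returned labels are hypothetical:

```python
import subprocess
from typing import Callable, Sequence

# A runner takes a command and returns its exit code; injectable for testing.
Runner = Callable[[Sequence[str]], int]


def run_quiet(cmd: Sequence[str]) -> int:
    """Run a command, discard its output, and return the exit code."""
    try:
        completed = subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        # The Azure CLI is not installed or not on PATH.
        return 127
    return completed.returncode


def verify_login(run: Runner = run_quiet) -> str:
    """Return how the session was established, or raise if neither path works."""
    # Step 1: an existing Azure CLI session is enough.
    if run(["az", "account", "show"]) == 0:
        return "existing-session"
    # Step 2 (ASSUMED fallback): try the machine's managed identity.
    if run(["az", "login", "--identity"]) == 0:
        return "managed-identity"
    raise RuntimeError("Not logged in to Azure; run `az login` and retry.")
```

Passing the runner as a parameter keeps the check testable without a real Azure session; calling `verify_login()` with no arguments shells out to the Azure CLI.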
Summary
We're working on getting scheduled and orchestrated versions of CFA Azure Batch Pipelines into Dagster. PyRenew-HEW is one of our pilot projects, alongside the CFA Epinow2 Pipeline.
This PR lets users run the Dagster user interface locally on their VAP while submitting jobs to Azure Batch.
It also provides a Makefile recipe that publishes the pyrenew-hew codebase to the central Dagster server. Central orchestration is not configured yet. More to come there soon!
Details
This is a large undertaking, and we are learning along the way. This PR allows users to demo dagster for PyRenew-HEW and report back on bugs and desired features before we move to a production-ready version. It includes changes to existing files used in other workflows, such as the pyproject.toml and Containerfile, so we are testing to make sure no cross-compatibility issues arise. Once fully tested and reviewed, we can merge this PR.
Currently ready for review (12/11/25):
- […] pyrenew-test-output. This can be changed.
- Makefile now includes helpful commands for dagster and blobfuse for PyRenew-HEW.
- […] dagster_defs.py workflow file and manages dependencies for it unified with the PyRenew-HEW project dependencies. It also takes a more explicit approach to bringing in project files.
Instructions for review:
To evaluate dagster, follow the steps in the README.