@jkislin jkislin commented Oct 9, 2025

Summary

We're working on getting scheduled and orchestrated versions of CFA Azure Batch Pipelines into Dagster. PyRenew-HEW is one of our pilot projects, alongside the CFA Epinow2 Pipeline.

This PR allows users to run Dagster's user interface locally on their VAP while submitting jobs to Azure Batch.

It also provides a Makefile recipe for publishing the PyRenew-HEW codebase to the central Dagster server. Central orchestration is not configured yet; more to come there soon.

Details

This is a large undertaking, and we are learning along the way. This PR lets users demo Dagster for PyRenew-HEW and report bugs and desired features before we move to a production-ready version. It changes existing files used in other workflows, such as pyproject.toml and the Containerfile, so we are testing to make sure no cross-compatibility issues arise. Once it is fully tested and reviewed, we can merge this PR.

Currently ready for review (12/11/25):

  1. You can now launch a local Dagster server that submits jobs to Azure Batch. Its features are currently roughly on par with the Azure Command Center. The workflows are currently hard-coded to write to pyrenew-test-output; this can be changed.
  2. The Makefile now includes helper commands for Dagster and blobfuse for PyRenew-HEW.
  3. There are instructions on how to set up Dagster locally for your user/VM and how to submit to our central Dagster code server.
  4. The Containerfile now pins to rocker/tidyverse:4.5.1 rather than latest.
  5. The Containerfile now includes the dagster_defs.py workflow file and manages its dependencies together with the PyRenew-HEW project dependencies. It also takes a more explicit approach to copying in project files.
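
Item 1 notes that the output container is hard-coded to pyrenew-test-output but can be changed. One way to make it overridable is sketched below. This is an illustration only: the environment variable name `PYRENEW_OUTPUT_CONTAINER` and the helper `resolve_output_container` are hypothetical and are not taken from dagster_defs.py in this PR.

```python
import os

# Default taken from the PR description; everything else here is illustrative.
DEFAULT_OUTPUT_CONTAINER = "pyrenew-test-output"


def resolve_output_container(env=None):
    """Return the blob container that workflow outputs should be written to.

    Reads the hypothetical PYRENEW_OUTPUT_CONTAINER variable and falls back
    to the currently hard-coded default when it is unset or blank.
    """
    env = os.environ if env is None else env
    value = env.get("PYRENEW_OUTPUT_CONTAINER", "").strip()
    return value or DEFAULT_OUTPUT_CONTAINER
```

With a helper like this, a reviewer could point a demo run at a scratch container by exporting the variable before launching the local server, without editing the workflow file.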

Instructions for review:

To evaluate Dagster, follow the steps in the README.

@jkislin jkislin marked this pull request as draft October 9, 2025 17:32
codecov bot commented Oct 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 39.43%. Comparing base (4f65dce) to head (5e8f858).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #714   +/-   ##
=======================================
  Coverage   39.43%   39.43%           
=======================================
  Files          30       30           
  Lines        2713     2713           
=======================================
  Hits         1070     1070           
  Misses       1643     1643           
| Flag | Coverage Δ |
| --- | --- |
| hewr | 29.93% <ø> (ø) |
| pipelines | 35.87% <ø> (ø) |
| pyrenew_hew | 62.29% <ø> (ø) |

jkislin and others added 19 commits October 14, 2025 13:41
@jkislin jkislin changed the title from "Sandboxing Dagster using Pyrenew-Hew" to "Minimal Dagster Example; Auto-blobfuse; Makefile convenience recipes" Dec 4, 2025
@jkislin jkislin marked this pull request as ready for review December 11, 2025 20:55
@jkislin jkislin requested a review from Copilot December 11, 2025 21:06
Copilot AI left a comment

Pull request overview

This PR introduces Dagster orchestration for PyRenew-HEW, enabling scheduled and orchestrated pipeline execution through Azure Batch. The implementation provides a local development environment with asset-based workflow definitions while maintaining compatibility with existing workflows.

Key changes:

  • Added Dagster dependencies and workflow definitions supporting multiple model variants (timeseries_e, pyrenew_e/h/he/hw/hew)
  • Implemented blobfuse automation scripts for mounting Azure Blob storage containers locally
  • Enhanced Makefile with convenience targets for Dagster operations and blob storage management
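
The shorthand pyrenew_e/h/he/hw/hew in the first bullet stands for five separate variant names. The expansion can be spelled out as below. This is a sketch for readers only: the names `PYRENEW_SUFFIXES` and `model_variants` are hypothetical, and how dagster_defs.py actually represents the variants is not shown in this conversation.

```python
# Suffix letters exactly as written in the summary: pyrenew_e/h/he/hw/hew.
PYRENEW_SUFFIXES = ["e", "h", "he", "hw", "hew"]


def model_variants():
    """Expand the shorthand into full variant names, with timeseries_e first."""
    return ["timeseries_e"] + [f"pyrenew_{suffix}" for suffix in PYRENEW_SUFFIXES]


print(model_variants())
# ['timeseries_e', 'pyrenew_e', 'pyrenew_h', 'pyrenew_he', 'pyrenew_hw', 'pyrenew_hew']
```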

Reviewed changes

Copilot reviewed 15 out of 17 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| pyproject.toml | Added Dagster-related dependencies and new author |
| dagster_defs.py | Core Dagster implementation defining assets, partitions, executors, and schedules for PyRenew-HEW workflows |
| blobfuse/verifylogin.sh | Azure CLI login verification with managed identity fallback |
| blobfuse/pull_config.sh | Downloads Azure configuration and blobfuse config files from blob storage |
| blobfuse/mount.sh | Mounts blob storage containers and creates symlinks for local development |
| blobfuse/cleanup.sh | Unmounts blob storage and cleans up temporary files/symlinks |
| blobfuse/archive/unmount.sh | Legacy unmount script (archived) |
| blobfuse/README.md | Documentation for blobfuse setup and usage |
| README.md | Added comprehensive Dagster setup instructions and usage guide |
| Makefile | Added targets for Dagster operations, blobfuse mounting, and configuration loading |
| Containerfile | Updated base image version, reorganized file copying, and added Dagster definitions |
| .pre-commit-config.yaml | Excluded dagster_defs.py from ruff-format |
| .github/workflows/update-dagster-code.yaml | GitHub Actions workflow for automated Dagster code deployment |
| .dockerignore | Added ignore patterns for container builds |
| .containerignore | Added mounts/ to ignore list |
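
The "login verification with managed identity fallback" described for blobfuse/verifylogin.sh follows a common pattern: check for an existing Azure CLI session, and only if none exists, log in with the machine's managed identity. The sketch below shows that pattern in Python rather than shell. It is not the script from this PR: the function name `ensure_az_login` and the injectable `run` parameter are hypothetical, and the sketch assumes the standard `az account show` and `az login --identity` commands.

```python
import subprocess


def ensure_az_login(run=subprocess.run):
    """Return how the session was authenticated: 'existing' or 'managed-identity'.

    `run` is injectable so the control flow can be exercised without the Azure CLI.
    """
    # `az account show` exits non-zero when no account is logged in.
    check = run(["az", "account", "show"], capture_output=True)
    if check.returncode == 0:
        return "existing"

    # No active session: fall back to the machine's managed identity.
    login = run(["az", "login", "--identity"], capture_output=True)
    if login.returncode != 0:
        raise RuntimeError("Azure CLI login failed, including the managed identity fallback")
    return "managed-identity"
```

Passing the runner as an argument keeps the fallback order easy to test in isolation, which matters here because the real commands only succeed on a machine with the Azure CLI and an assigned identity.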

@jkislin jkislin requested a review from giomrella December 11, 2025 21:47
@jkislin jkislin changed the title from "Minimal Dagster Example; Auto-blobfuse; Makefile convenience recipes" to "Using Dagster to Orchestrate Batch Pipelines" Dec 12, 2025
@jkislin jkislin changed the title from "Using Dagster to Orchestrate Batch Pipelines" to "Using Dagster to Orchestrate Batch Pipelines - Local Orchestration, Cloud Execution" Dec 12, 2025
@jkislin jkislin changed the title from "Using Dagster to Orchestrate Batch Pipelines - Local Orchestration, Cloud Execution" to "Local Dagster Orchestration of Batch Pipelines" Dec 12, 2025