Issue 780: Adds `forecast_epiautogp` #783

SamuelBrand1 · 2025-12-12T17:30:02Z

This PR closes #780.

This pull request introduces several enhancements and refactors to:

EpiAutoGP. These changes make the model flexible in doing daily or weekly forecast strides, and format its output to work more smoothly with post-processing.
pipelines/epiautogp. The Python module that integrates running EpiAutoGP with the pyrenew-hew pipeline. The entrypoint script is forecast_epiautogp.py, which is similar to forecast_pyrenew.py and forecast_timeseries.py.

`forecast_epiautogp.py`

Where possible I have reused functionality that exists in pipelines. However, to make development easier I have introduced some classes and functions that wrap multiple steps. forecast_epiautogp.py runs the following steps:

A preliminary step which generates a contextual model name and groups the parameters and exe flags into dicts.
A pipeline setup step, setup_forecast_pipeline, which does the credential step, runs the existing data wrangling code, and puts the pipeline information into a ForecastPipelineContext dataclass object. This reduces the number of parameters that need passing around.
A data setup step which calls a method on the pipeline context to set the data up for model usage and returns a ModelPaths dataclass object to hold the various paths that get passed around.
An EpiAutoGP-specific data setup step that creates a new JSON data file for EpiAutoGP to use.
A step that runs the EpiAutoGP model.
A post-processing step which calls a method on the pipeline context. This does the output formatting, plotting and hubverse table creation. For this I had to write some specific functions for EpiAutoGP, but it remains based on the current post-processing functions.
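
The context-object pattern behind these steps can be sketched roughly as follows. This is a minimal illustration only, not the code in this PR: the class and function names come from the description above, but every field name, the directory layout, and the method body are assumptions made for the example.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ModelPaths:
    """Holds the paths that later steps need (field names are hypothetical)."""

    model_output_dir: Path
    data_dir: Path


@dataclass
class ForecastPipelineContext:
    """Bundles run-level settings so they travel as one object.

    Only the class name comes from the PR description; the fields are assumed.
    """

    disease: str
    loc: str
    model_name: str
    output_root: Path

    def prepare_model_data(self) -> ModelPaths:
        # Hypothetical layout: <output_root>/<model_name>/<loc>/data
        model_output_dir = self.output_root / self.model_name / self.loc
        return ModelPaths(
            model_output_dir=model_output_dir,
            data_dir=model_output_dir / "data",
        )


def setup_forecast_pipeline(
    disease: str, loc: str, model_name: str, output_root: Path
) -> ForecastPipelineContext:
    # The real function also handles credentials and data wrangling;
    # this stand-in only builds the context object.
    return ForecastPipelineContext(disease, loc, model_name, output_root)


context = setup_forecast_pipeline(
    "COVID-19", "CA", "epiautogp_example", Path("output")
)
paths = context.prepare_model_data()
print(paths.data_dir)
```

Later steps then take `context` and `paths` rather than a long list of individual arguments, which is the reduction in parameter passing described above.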

End to end testing

I've added the integration test pipelines/tests/test_epiautogp_end_to_end.sh, which matches the structure of the existing end-to-end tests but currently only covers covid. Model options cover:

Weekly NHSN data (running and forecasting)
Weekly NSSP % ED visits
Daily ED visit counts
Daily other ED visit counts

Introduces forecast_utils.py with dataclasses and functions for setting up, preparing, and postprocessing forecast pipeline runs. Includes comprehensive unit tests for all major utilities, using mocking to isolate dependencies and verify correct logic and file structure handling.

Outputs to expected parquet format

Introduces DEFAULT_TARGET_LETTER mapping for target abbreviations and updates the Parquet output filename in create_forecast_output to use the appropriate target letter for hubverse compatibility. Also adds geo_value and disease columns to the forecast output for improved metadata.

Introduces a use_percentage boolean field to EpiAutoGPInput to distinguish between raw counts and percentage-based input for ED visits. Updates output logic to set the variable name and convert values to proportions when use_percentage is true for nssp targets. Test cases and input construction are updated accordingly.
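
As an illustration of the use_percentage behaviour described above, here is a minimal Python sketch. The actual logic lives in the Julia output code; the function name and the list-of-floats interface below are hypothetical, and only the divide-by-100 conversion is taken from the description.

```python
def to_output_values(values: list[float], use_percentage: bool) -> list[float]:
    """Return values on the scale that post-processing expects.

    Hypothetical helper: percentages (0-100) become proportions (0-1),
    while raw counts pass through unchanged.
    """
    if use_percentage:
        return [v / 100.0 for v in values]
    return list(values)


# Percentage input is rescaled; count input is left as it is.
print(to_output_values([2.5, 10.0], use_percentage=True))
print(to_output_values([120.0, 95.0], use_percentage=False))
```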

Renamed forecast_utils.py to epiautogp_forecast_utils.py and updated all imports accordingly. Refactored the EpiAutoGP pipeline to use a context object for configuration, streamlined argument passing, and improved modularity. Added a new R plotting script (plot_epiautogp_forecast.R) for EpiAutoGP outputs. Introduced end-to-end and fit test shell scripts for automated testing. Removed obsolete prep test scripts. Updated process_epiautogp_forecast.py to simplify output processing and match R plotting expectations.

Consolidated forecast post-processing steps (processing outputs, creating hubverse table, and plotting) into a single post_process_forecast utility in epiautogp_forecast_utils.py. Updated imports and usage in __init__.py and forecast_epiautogp.py for improved modularity and code reuse. Added param_data_dir to ForecastPipelineContext and setup_forecast_pipeline.

Moved prepare_model_data and post_process_forecast functions into ForecastPipelineContext as methods. Updated imports and usage in forecast_epiautogp.py and __init__.py to use the new class methods, improving encapsulation and code organization.

Introduces a 'frequency' field to EpiAutoGPInput to support both daily and epiweekly data. Refactors modelling and argument parsing to use a generic 'n_ahead' parameter (number of time steps) instead of 'n_forecast_weeks', and updates all related documentation, tests, and function signatures for consistency and flexibility.
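
To show why a generic n_ahead works for both frequencies, here is a small Python sketch of how forecast dates could be derived from a frequency and a number of steps. The real implementation is in the Julia model code; the frequency labels, the step lengths, and the function name here are assumptions for illustration.

```python
from datetime import date, timedelta

# Assumed mapping from frequency label to step length in days.
TIME_STEP_DAYS = {"daily": 1, "epiweekly": 7}


def forecast_dates(last_observed: date, frequency: str, n_ahead: int) -> list[date]:
    """Dates of the n_ahead forecast steps after the last observation."""
    if frequency not in TIME_STEP_DAYS:
        raise ValueError(f"Unknown frequency: {frequency!r}")
    step = timedelta(days=TIME_STEP_DAYS[frequency])
    return [last_observed + k * step for k in range(1, n_ahead + 1)]


# The same n_ahead means "steps", so its calendar span depends on frequency.
print(forecast_dates(date(2025, 12, 6), "epiweekly", 2))
print(forecast_dates(date(2025, 12, 6), "daily", 3))
```

With this framing, n_ahead counts time steps rather than weeks, so one parameter covers both the daily and the epiweekly case.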

Introduces the ed_visit_type field to EpiAutoGPInput for specifying the type of ED visits, updates output logic to use this field for column selection, and adjusts tests and documentation accordingly. Also updates output file naming to use the frequency prefix.

Introduces the 'ed_visit_type' parameter to allow selection between 'observed' and 'other' ED visits for NSSP targets throughout the EpiAutoGP pipeline. Updates parameter validation, data extraction, and model naming to support this distinction, and adjusts CLI and function signatures accordingly. Also ensures correct forecast sample file selection based on frequency.
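
A parameter check of the kind described above might look like the following sketch. It is not the validation code from this PR: the function name, the lowercase "nssp" target label, and the rule that non-NSSP targets only accept "observed" are all assumptions; only the two allowed values come from the description.

```python
VALID_ED_VISIT_TYPES = ("observed", "other")


def validate_ed_visit_type(target: str, ed_visit_type: str) -> str:
    """Check ed_visit_type against the allowed values (hypothetical helper)."""
    if ed_visit_type not in VALID_ED_VISIT_TYPES:
        raise ValueError(
            f"ed_visit_type must be one of {VALID_ED_VISIT_TYPES}, "
            f"got {ed_visit_type!r}"
        )
    # Assumed rule: the observed/other distinction only applies to NSSP targets.
    if target != "nssp" and ed_visit_type != "observed":
        raise ValueError("ed_visit_type='other' is only supported for NSSP targets")
    return ed_visit_type


print(validate_ed_visit_type("nssp", "other"))
```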

Expanded end-to-end and fit test scripts to include daily NSSP count and 'other ED visits' forecasts. Updated argument handling in test_epiautogp_fit.sh to support an optional ed_visit_type parameter and adjusted expected model counts accordingly.

Updated test_forecast_utils.py to use new ForecastPipelineContext interface, updated patch paths, and migrated to context methods for prepare_model_data and post_process_forecast. Removed test_prep_epiautogp_data.py as part of test suite cleanup.

codecov · 2025-12-12T17:51:19Z

Codecov Report

❌ Patch coverage is 68.77193% with 89 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (epiautogp@993f353).

Files with missing lines	Patch %	Lines
pipelines/epiautogp/forecast_epiautogp.py	0.00%	41 Missing ⚠️
pipelines/epiautogp/process_epiautogp_forecast.py	13.33%	26 Missing ⚠️
pipelines/epiautogp/prep_epiautogp_data.py	4.34%	22 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             epiautogp     #783   +/-   ##
============================================
  Coverage             ?   43.28%           
============================================
  Files                ?       36           
  Lines                ?     3061           
  Branches             ?        0           
============================================
  Hits                 ?     1325           
  Misses               ?     1736           
  Partials             ?        0

Flag	Coverage Δ
hewr	`29.93% <ø> (?)`
pipelines	`43.54% <68.77%> (?)`
pyrenew_hew	`62.29% <ø> (?)`

Copilot

Pull request overview

This PR adds a comprehensive forecasting pipeline for the EpiAutoGP model, enabling it to forecast both daily and weekly ED visits (NSSP) and hospital admissions (NHSN). The implementation follows the existing pipeline patterns while introducing new utilities for data conversion, model execution, and post-processing specific to EpiAutoGP's Julia-based workflow.

Key Changes

Introduced forecast_epiautogp.py as the main entry point for the EpiAutoGP forecasting pipeline
Added shared pipeline utilities (ForecastPipelineContext, ModelPaths, setup_forecast_pipeline) to reduce code duplication
Enhanced Julia EpiAutoGP model to support flexible daily/weekly forecast strides with improved parameter naming

Reviewed changes

Copilot reviewed 25 out of 26 changed files in this pull request and generated 5 comments.

File	Description
pipelines/tests/test_prep_epiautogp_data.py	Removed (tests moved or consolidated)
pipelines/tests/test_forecast_utils.py	New unit tests for forecast pipeline utilities using mocks
pipelines/tests/test_epiautogp_prep_script.py	Removed preparation script tests
pipelines/tests/test_epiautogp_prep.sh	Removed shell test for data preparation
pipelines/tests/test_epiautogp_fit.sh	New shell script to run single EpiAutoGP forecasts
pipelines/tests/test_epiautogp_end_to_end.sh	New comprehensive end-to-end integration test
pipelines/forecast_pyrenew.py	Fixed import organization (moved to absolute imports)
pipelines/epiautogp/process_epiautogp_forecast.py	New post-processing utilities for EpiAutoGP outputs
pipelines/epiautogp/prep_epiautogp_data.py	Enhanced data conversion with context/paths pattern and ed_visit_type support
pipelines/epiautogp/plot_epiautogp_forecast.R	New R plotting script for EpiAutoGP-specific visualizations
pipelines/epiautogp/forecast_epiautogp.py	Main pipeline entry point orchestrating all steps
pipelines/epiautogp/epiautogp_forecast_utils.py	Shared utilities and dataclasses for pipeline stages
pipelines/epiautogp/__init__.py	Updated exports for new utilities
pipelines/epiautogp/README.md	Comprehensive documentation of pipeline architecture
EpiAutoGP/test/test_parse_arguments.jl	Updated test for renamed parameter
EpiAutoGP/test/test_output.jl	Added new required fields to test data
EpiAutoGP/test/test_modelling.jl	Updated tests for renamed parameter and added daily frequency test
EpiAutoGP/test/test_input.jl	Updated all test inputs with new required fields
EpiAutoGP/src/parse_arguments.jl	Renamed `n-forecast-weeks` to `n-ahead` for flexibility
EpiAutoGP/src/output.jl	Added `PipelineOutput` type and refactored output creation
EpiAutoGP/src/modelling.jl	Updated to support daily/weekly frequencies with time_step calculation
EpiAutoGP/src/input.jl	Added frequency, use_percentage, and ed_visit_type fields
EpiAutoGP/src/EpiAutoGP.jl	Added Parquet dependency and new constants
EpiAutoGP/run.jl	Switched default output type to PipelineOutput
EpiAutoGP/Project.toml	Added Parquet dependency and reordered authors field

EpiAutoGP/Project.toml

Copilot · 2025-12-12T22:52:06Z

pipelines/epiautogp/prep_epiautogp_data.py

    output_json_path : Path
        Path where the EpiAutoGP JSON file will be saved


The parameter documentation mentions output_json_path, but this parameter was removed from the function signature. The docstring should be updated to remove this parameter and its description.

Suggested change

output_json_path : Path

Path where the EpiAutoGP JSON file will be saved

Copilot · 2025-12-12T22:52:07Z

pipelines/epiautogp/epiautogp_forecast_utils.py

+            first_training_date=self.first_training_date,
+            last_training_date=self.last_training_date,


The prepare_model_data method lacks test coverage for the case when NHSN data path is provided. The existing tests in test_forecast_utils.py should be extended to verify NHSN data handling.

Copilot · 2025-12-12T22:52:07Z

EpiAutoGP/src/output.jl

+        end
+    end
+
+    # Convert percentage to proportion if needed (R expects proportions for prop_ variables)


The comment on line 220 states 'R expects proportions for prop_ variables', but this transformation divides by 100, which converts percentages to proportions. Consider clarifying that the input data is in percentage format (0-100) and needs conversion to proportion format (0-1).

Suggested change

# Convert percentage to proportion if needed (R expects proportions for prop_ variables)

# Input is in percentage format (0-100); convert to proportion (0-1) as R expects proportions for prop_ variables

pipelines/tests/test_epiautogp_end_to_end.sh

SamuelBrand1 added 17 commits December 12, 2025 15:35

change to relative imports

1641e59

add Parquet dep

2e434ea

reduce docstring bloat

fca72b0

Add PipelineOutput support for pipeline forecasts

f89673e

Outputs to expected parquet format

move utils and rename paths dataclass

5156147

Update .gitignore

966f4d6

Update README.md

ada1b74

SamuelBrand1 requested review from damonbayer, dylanhmorris and sbidari as code owners December 12, 2025 17:30

damonbayer requested a review from Copilot December 12, 2025 22:51

Copilot AI reviewed Dec 12, 2025
