@SamuelBrand1
Collaborator

This PR closes #780.

This pull request introduces several enhancements and refactors to:

  • EpiAutoGP. These changes let the model forecast in daily or weekly strides and format its output to work more smoothly with post-processing.
  • pipelines/epiautogp. The Python module that integrates running EpiAutoGP with the pyrenew-hew pipeline. The entry point script is forecast_epiautogp.py, which is similar to forecast_pyrenew.py and forecast_timeseries.py.

forecast_epiautogp.py

Where possible I have reused functionality that already exists in pipelines. To make development easier, however, I have introduced some classes and functions that wrap multiple steps. forecast_epiautogp.py runs the following steps:

  • A preliminary step that generates a contextual model name, groups the parameters and exe flags into dicts, etc.
  • A pipeline setup step, setup_forecast_pipeline, which handles credentials, runs the existing data wrangling code, and puts the pipeline information into a ForecastPipelineContext dataclass object. This reduces the number of parameters that need to be passed around.
  • A data setup step that calls a method on the pipeline context to prepare the data for model use and returns a ModelPaths dataclass object holding the various paths that get passed around.
  • An EpiAutoGP-specific data setup step that creates a new JSON data file for EpiAutoGP to use.
  • A step that runs the EpiAutoGP model.
  • A post-processing step that calls a method on the pipeline context. This does the output formatting, plotting, and hubverse table creation. I had to write some EpiAutoGP-specific functions for this, but it remains based on the current post-processing functions.
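
The steps above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the dataclass fields, the method bodies, the directory layout, and the `run_pipeline` helper are all assumptions made for the example. Only the names `ForecastPipelineContext`, `ModelPaths`, `setup_forecast_pipeline`, `prepare_model_data`, and `post_process_forecast` come from the PR.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ModelPaths:
    """Paths passed between pipeline steps (fields are illustrative)."""

    model_output_dir: Path
    data_dir: Path


@dataclass
class ForecastPipelineContext:
    """Shared run information (fields are illustrative)."""

    disease: str
    loc: str
    output_dir: Path

    def prepare_model_data(self, model_name: str) -> ModelPaths:
        # Hypothetical layout: <output_dir>/<model_name>/data
        model_dir = self.output_dir / model_name
        return ModelPaths(model_output_dir=model_dir, data_dir=model_dir / "data")

    def post_process_forecast(self, paths: ModelPaths) -> list[str]:
        # Stand-in for output formatting, hubverse table creation, and plotting.
        return ["process_outputs", "create_hubverse_table", "plot"]


def setup_forecast_pipeline(
    disease: str, loc: str, output_dir: Path
) -> ForecastPipelineContext:
    # The real step also handles credentials and data wrangling.
    return ForecastPipelineContext(disease=disease, loc=loc, output_dir=output_dir)


def run_pipeline(disease: str, loc: str, output_dir: Path, model_name: str):
    context = setup_forecast_pipeline(disease, loc, output_dir)
    paths = context.prepare_model_data(model_name)
    # ... convert the data to EpiAutoGP JSON and run the model here ...
    steps = context.post_process_forecast(paths)
    return paths, steps
```

Bundling the run information into one context object means each later step takes the context (and, where needed, the paths object) instead of a long list of separate arguments.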

End to end testing

I've added the integration test pipelines/tests/test_epiautogp_end_to_end.sh, which matches the structure of the existing end-to-end tests but currently covers only COVID-19. The model options cover:

  • running and forecasting on weekly NHSN data
  • weekly NSSP % ED visits
  • daily ED visit counts
  • daily other ED visit counts
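
For reference, the four model options above can be written down as a small configuration matrix. This is a hedged sketch rather than the contents of the shell script: the `ForecastCase` class, the exact string values for `target` and `frequency`, and the `ed_visit_type` value shown for the NHSN case are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForecastCase:
    """One end-to-end test configuration (hypothetical container)."""

    target: str           # "nhsn" or "nssp" (assumed spellings)
    frequency: str        # "daily" or "epiweekly"
    use_percentage: bool  # True for % ED visits, False for counts
    ed_visit_type: str    # "observed" or "other"; only meaningful for NSSP


CASES = [
    ForecastCase("nhsn", "epiweekly", False, "observed"),  # weekly NHSN data
    ForecastCase("nssp", "epiweekly", True, "observed"),   # weekly NSSP % ED visits
    ForecastCase("nssp", "daily", False, "observed"),      # daily ED visit counts
    ForecastCase("nssp", "daily", False, "other"),         # daily other ED visit counts
]
```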

Introduces forecast_utils.py with dataclasses and functions for setting up, preparing, and postprocessing forecast pipeline runs. Includes comprehensive unit tests for all major utilities, using mocking to isolate dependencies and verify correct logic and file structure handling.
Outputs to expected parquet format
Introduces DEFAULT_TARGET_LETTER mapping for target abbreviations and updates the Parquet output filename in create_forecast_output to use the appropriate target letter for hubverse compatibility. Also adds geo_value and disease columns to the forecast output for improved metadata.
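
A minimal sketch of how a target-letter mapping might drive the output filename and metadata columns. The letter values, the filename pattern, and both helper functions are assumptions for illustration; this thread names `DEFAULT_TARGET_LETTER` but does not show its contents.

```python
# Assumed values: the actual mapping in the PR is not shown in this thread.
DEFAULT_TARGET_LETTER = {"nhsn": "h", "nssp": "e"}


def forecast_filename(target: str, prefix: str = "samples") -> str:
    """Build a Parquet filename ending in the target letter (pattern is hypothetical)."""
    try:
        letter = DEFAULT_TARGET_LETTER[target]
    except KeyError:
        known = sorted(DEFAULT_TARGET_LETTER)
        raise ValueError(f"Unknown target {target!r}; expected one of {known}") from None
    return f"{prefix}_{letter}.parquet"


def add_metadata(rows: list[dict], geo_value: str, disease: str) -> list[dict]:
    """Attach geo_value and disease to every forecast row."""
    return [{**row, "geo_value": geo_value, "disease": disease} for row in rows]
```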
Introduces a use_percentage boolean field to EpiAutoGPInput to distinguish between raw counts and percentage-based input for ED visits. Updates output logic to set the variable name and convert values to proportions when use_percentage is true for nssp targets. Test cases and input construction are updated accordingly.
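
The conversion rule can be illustrated with a short sketch. The PR implements this in the Julia package, so the Python below only illustrates the rule: the function names and the variable names they return are placeholders, and the only behavior taken from the PR is that percentage input (0-100) for NSSP targets is divided by 100 to give a proportion (0-1).

```python
def to_output_value(value: float, target: str, use_percentage: bool) -> float:
    """Return a proportion (0-1) for NSSP percentage input, else the value unchanged."""
    if target == "nssp" and use_percentage:
        return value / 100.0  # percentage (0-100) -> proportion (0-1)
    return value


def output_variable_name(target: str, use_percentage: bool) -> str:
    """Pick an output variable name; these names are placeholders."""
    if target == "nssp" and use_percentage:
        return "prop_ed_visits"
    return "ed_visits" if target == "nssp" else "hospital_admissions"
```

Count-based input passes through unchanged, so the flag only affects NSSP runs that supply percentages.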
Renamed forecast_utils.py to epiautogp_forecast_utils.py and updated all imports accordingly. Refactored the EpiAutoGP pipeline to use a context object for configuration, streamlined argument passing, and improved modularity. Added a new R plotting script (plot_epiautogp_forecast.R) for EpiAutoGP outputs. Introduced end-to-end and fit test shell scripts for automated testing. Removed obsolete prep test scripts. Updated process_epiautogp_forecast.py to simplify output processing and match R plotting expectations.
Consolidated forecast post-processing steps (processing outputs, creating hubverse table, and plotting) into a single post_process_forecast utility in epiautogp_forecast_utils.py. Updated imports and usage in __init__.py and forecast_epiautogp.py for improved modularity and code reuse. Added param_data_dir to ForecastPipelineContext and setup_forecast_pipeline.
Moved prepare_model_data and post_process_forecast functions into ForecastPipelineContext as methods. Updated imports and usage in forecast_epiautogp.py and __init__.py to use the new class methods, improving encapsulation and code organization.
Introduces a 'frequency' field to EpiAutoGPInput to support both daily and epiweekly data. Refactors modelling and argument parsing to use a generic 'n_ahead' parameter (number of time steps) instead of 'n_forecast_weeks', and updates all related documentation, tests, and function signatures for consistency and flexibility.
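
To illustrate why a generic `n_ahead` is more flexible than a week count, here is a hedged sketch that turns a frequency and a number of steps into forecast dates. The `STEP_DAYS` table and the `forecast_dates` function are hypothetical; the PR's modelling code is in Julia and may compute its time step differently.

```python
from datetime import date, timedelta

# Days per time step for each supported frequency (assumed spellings).
STEP_DAYS = {"daily": 1, "epiweekly": 7}


def forecast_dates(last_observed: date, frequency: str, n_ahead: int) -> list[date]:
    """Return the n_ahead dates that follow last_observed at the given frequency."""
    if frequency not in STEP_DAYS:
        raise ValueError(f"frequency must be one of {sorted(STEP_DAYS)}")
    if n_ahead < 1:
        raise ValueError("n_ahead must be a positive number of time steps")
    step = timedelta(days=STEP_DAYS[frequency])
    return [last_observed + k * step for k in range(1, n_ahead + 1)]
```

With this shape, `n_ahead=4` means four days for daily data and four weeks for epiweekly data, so the same parameter serves both cases.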
Introduces the ed_visit_type field to EpiAutoGPInput for specifying the type of ED visits, updates output logic to use this field for column selection, and adjusts tests and documentation accordingly. Also updates output file naming to use the frequency prefix.
Introduces the 'ed_visit_type' parameter to allow selection between 'observed' and 'other' ED visits for NSSP targets throughout the EpiAutoGP pipeline. Updates parameter validation, data extraction, and model naming to support this distinction, and adjusts CLI and function signatures accordingly. Also ensures correct forecast sample file selection based on frequency.
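
A small sketch of the kind of validation and column selection described here. The allowed values 'observed' and 'other' come from the PR; the function names and the column naming pattern are assumptions for illustration.

```python
VALID_ED_VISIT_TYPES = ("observed", "other")


def validate_ed_visit_type(ed_visit_type: str) -> str:
    """Return the value unchanged if it is allowed, otherwise raise."""
    if ed_visit_type not in VALID_ED_VISIT_TYPES:
        raise ValueError(
            f"ed_visit_type must be one of {VALID_ED_VISIT_TYPES}, got {ed_visit_type!r}"
        )
    return ed_visit_type


def ed_visit_column(ed_visit_type: str) -> str:
    """Map the visit type to a data column name (naming pattern is assumed)."""
    return f"{validate_ed_visit_type(ed_visit_type)}_ed_visits"
```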
Expanded end-to-end and fit test scripts to include daily NSSP count and 'other ED visits' forecasts. Updated argument handling in test_epiautogp_fit.sh to support an optional ed_visit_type parameter and adjusted expected model counts accordingly.
Updated test_forecast_utils.py to use new ForecastPipelineContext interface, updated patch paths, and migrated to context methods for prepare_model_data and post_process_forecast. Removed test_prep_epiautogp_data.py as part of test suite cleanup.
@codecov

codecov bot commented Dec 12, 2025

Codecov Report

❌ Patch coverage is 68.77193% with 89 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (epiautogp@993f353).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pipelines/epiautogp/forecast_epiautogp.py | 0.00% | 41 Missing ⚠️ |
| pipelines/epiautogp/process_epiautogp_forecast.py | 13.33% | 26 Missing ⚠️ |
| pipelines/epiautogp/prep_epiautogp_data.py | 4.34% | 22 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             epiautogp     #783   +/-   ##
============================================
  Coverage             ?   43.28%           
============================================
  Files                ?       36           
  Lines                ?     3061           
  Branches             ?        0           
============================================
  Hits                 ?     1325           
  Misses               ?     1736           
  Partials             ?        0           
| Flag | Coverage Δ |
|---|---|
| hewr | 29.93% <ø> (?) |
| pipelines | 43.54% <68.77%> (?) |
| pyrenew_hew | 62.29% <ø> (?) |


@damonbayer requested a review from Copilot on December 12, 2025, at 22:51
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a comprehensive forecasting pipeline for the EpiAutoGP model, enabling it to forecast both daily and weekly ED visits (NSSP) and hospital admissions (NHSN). The implementation follows the existing pipeline patterns while introducing new utilities for data conversion, model execution, and post-processing specific to EpiAutoGP's Julia-based workflow.

Key Changes

  • Introduced forecast_epiautogp.py as the main entry point for the EpiAutoGP forecasting pipeline
  • Added shared pipeline utilities (ForecastPipelineContext, ModelPaths, setup_forecast_pipeline) to reduce code duplication
  • Enhanced Julia EpiAutoGP model to support flexible daily/weekly forecast strides with improved parameter naming

Reviewed changes

Copilot reviewed 25 out of 26 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| pipelines/tests/test_prep_epiautogp_data.py | Removed (tests moved or consolidated) |
| pipelines/tests/test_forecast_utils.py | New unit tests for forecast pipeline utilities using mocks |
| pipelines/tests/test_epiautogp_prep_script.py | Removed preparation script tests |
| pipelines/tests/test_epiautogp_prep.sh | Removed shell test for data preparation |
| pipelines/tests/test_epiautogp_fit.sh | New shell script to run single EpiAutoGP forecasts |
| pipelines/tests/test_epiautogp_end_to_end.sh | New comprehensive end-to-end integration test |
| pipelines/forecast_pyrenew.py | Fixed import organization (moved to absolute imports) |
| pipelines/epiautogp/process_epiautogp_forecast.py | New post-processing utilities for EpiAutoGP outputs |
| pipelines/epiautogp/prep_epiautogp_data.py | Enhanced data conversion with context/paths pattern and ed_visit_type support |
| pipelines/epiautogp/plot_epiautogp_forecast.R | New R plotting script for EpiAutoGP-specific visualizations |
| pipelines/epiautogp/forecast_epiautogp.py | Main pipeline entry point orchestrating all steps |
| pipelines/epiautogp/epiautogp_forecast_utils.py | Shared utilities and dataclasses for pipeline stages |
| pipelines/epiautogp/__init__.py | Updated exports for new utilities |
| pipelines/epiautogp/README.md | Comprehensive documentation of pipeline architecture |
| EpiAutoGP/test/test_parse_arguments.jl | Updated test for renamed parameter |
| EpiAutoGP/test/test_output.jl | Added new required fields to test data |
| EpiAutoGP/test/test_modelling.jl | Updated tests for renamed parameter and added daily frequency test |
| EpiAutoGP/test/test_input.jl | Updated all test inputs with new required fields |
| EpiAutoGP/src/parse_arguments.jl | Renamed n-forecast-weeks to n-ahead for flexibility |
| EpiAutoGP/src/output.jl | Added PipelineOutput type and refactored output creation |
| EpiAutoGP/src/modelling.jl | Updated to support daily/weekly frequencies with time_step calculation |
| EpiAutoGP/src/input.jl | Added frequency, use_percentage, and ed_visit_type fields |
| EpiAutoGP/src/EpiAutoGP.jl | Added Parquet dependency and new constants |
| EpiAutoGP/run.jl | Switched default output type to PipelineOutput |
| EpiAutoGP/Project.toml | Added Parquet dependency and reordered authors field |

Comment on lines 86 to 87
output_json_path : Path
Path where the EpiAutoGP JSON file will be saved

Copilot AI Dec 12, 2025

The parameter documentation mentions output_json_path, but this parameter was removed from the function signature. The docstring should be updated to remove this parameter and its description.

Suggested change
- output_json_path : Path
- Path where the EpiAutoGP JSON file will be saved

Copilot uses AI. Check for mistakes.
Comment on lines +115 to +116
first_training_date=self.first_training_date,
last_training_date=self.last_training_date,

Copilot AI Dec 12, 2025

The prepare_model_data method lacks test coverage for the case when NHSN data path is provided. The existing tests in test_forecast_utils.py should be extended to verify NHSN data handling.

end
end

# Convert percentage to proportion if needed (R expects proportions for prop_ variables)

Copilot AI Dec 12, 2025

The comment on line 220 states 'R expects proportions for prop_ variables', but this transformation divides by 100, which converts percentages to proportions. Consider clarifying that the input data is in percentage format (0-100) and needs conversion to proportion format (0-1).

Suggested change
- # Convert percentage to proportion if needed (R expects proportions for prop_ variables)
+ # Input is in percentage format (0-100); convert to proportion (0-1) as R expects proportions for prop_ variables
