Issue 780: Adds forecast_epiautogp
#783
base: epiautogp
Conversation
Introduces forecast_utils.py with dataclasses and functions for setting up, preparing, and postprocessing forecast pipeline runs. Includes comprehensive unit tests for all major utilities, using mocking to isolate dependencies and verify correct logic and file structure handling.
Outputs to expected parquet format
Introduces DEFAULT_TARGET_LETTER mapping for target abbreviations and updates the Parquet output filename in create_forecast_output to use the appropriate target letter for hubverse compatibility. Also adds geo_value and disease columns to the forecast output for improved metadata.
Introduces a use_percentage boolean field to EpiAutoGPInput to distinguish between raw counts and percentage-based input for ED visits. Updates output logic to set the variable name and convert values to proportions when use_percentage is true for nssp targets. Test cases and input construction are updated accordingly.
Renamed forecast_utils.py to epiautogp_forecast_utils.py and updated all imports accordingly. Refactored the EpiAutoGP pipeline to use a context object for configuration, streamlined argument passing, and improved modularity. Added a new R plotting script (plot_epiautogp_forecast.R) for EpiAutoGP outputs. Introduced end-to-end and fit test shell scripts for automated testing. Removed obsolete prep test scripts. Updated process_epiautogp_forecast.py to simplify output processing and match R plotting expectations.
Consolidated forecast post-processing steps (processing outputs, creating hubverse table, and plotting) into a single post_process_forecast utility in epiautogp_forecast_utils.py. Updated imports and usage in __init__.py and forecast_epiautogp.py for improved modularity and code reuse. Added param_data_dir to ForecastPipelineContext and setup_forecast_pipeline.
Moved prepare_model_data and post_process_forecast functions into ForecastPipelineContext as methods. Updated imports and usage in forecast_epiautogp.py and __init__.py to use the new class methods, improving encapsulation and code organization.
Introduces a 'frequency' field to EpiAutoGPInput to support both daily and epiweekly data. Refactors modelling and argument parsing to use a generic 'n_ahead' parameter (number of time steps) instead of 'n_forecast_weeks', and updates all related documentation, tests, and function signatures for consistency and flexibility.
Introduces the ed_visit_type field to EpiAutoGPInput for specifying the type of ED visits, updates output logic to use this field for column selection, and adjusts tests and documentation accordingly. Also updates output file naming to use the frequency prefix.
Introduces the 'ed_visit_type' parameter to allow selection between 'observed' and 'other' ED visits for NSSP targets throughout the EpiAutoGP pipeline. Updates parameter validation, data extraction, and model naming to support this distinction, and adjusts CLI and function signatures accordingly. Also ensures correct forecast sample file selection based on frequency.
Expanded end-to-end and fit test scripts to include daily NSSP count and 'other ED visits' forecasts. Updated argument handling in test_epiautogp_fit.sh to support an optional ed_visit_type parameter and adjusted expected model counts accordingly.
Updated test_forecast_utils.py to use new ForecastPipelineContext interface, updated patch paths, and migrated to context methods for prepare_model_data and post_process_forecast. Removed test_prep_epiautogp_data.py as part of test suite cleanup.
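The switch from `n_forecast_weeks` to a generic `n_ahead` (number of time steps) can be illustrated with a small Python sketch. This is not code from the PR: the function name `forecast_dates` and the step sizes in `STEP_DAYS` are assumptions chosen for illustration, based only on the statement that daily and epiweekly data are both supported.

```python
from datetime import date, timedelta

# Assumed step sizes in days; the PR only states that daily and
# epiweekly frequencies are supported.
STEP_DAYS = {"daily": 1, "epiweekly": 7}


def forecast_dates(last_training_date: date, frequency: str, n_ahead: int) -> list:
    """Return the n_ahead dates following last_training_date.

    Hypothetical helper: n_ahead counts time steps, so the same value
    spans days for daily data and weeks for epiweekly data.
    """
    if frequency not in STEP_DAYS:
        raise ValueError(f"unknown frequency: {frequency!r}")
    if n_ahead < 1:
        raise ValueError("n_ahead must be a positive number of time steps")
    step = timedelta(days=STEP_DAYS[frequency])
    return [last_training_date + k * step for k in range(1, n_ahead + 1)]
```

With this framing, `n_ahead=28` on daily data and `n_ahead=4` on epiweekly data would cover the same four-week horizon.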
Codecov Report
❌ Patch coverage is
Additional details and impacted files

| Coverage Diff | epiautogp | #783 | +/- |
|---|---|---|---|
| Coverage | ? | 43.28% | |
| Files | ? | 36 | |
| Lines | ? | 3061 | |
| Branches | ? | 0 | |
| Hits | ? | 1325 | |
| Misses | ? | 1736 | |
| Partials | ? | 0 | |
Pull request overview
This PR adds a comprehensive forecasting pipeline for the EpiAutoGP model, enabling it to forecast both daily and weekly ED visits (NSSP) and hospital admissions (NHSN). The implementation follows the existing pipeline patterns while introducing new utilities for data conversion, model execution, and post-processing specific to EpiAutoGP's Julia-based workflow.
Key Changes
- Introduced `forecast_epiautogp.py` as the main entry point for the EpiAutoGP forecasting pipeline
- Added shared pipeline utilities (`ForecastPipelineContext`, `ModelPaths`, `setup_forecast_pipeline`) to reduce code duplication
- Enhanced Julia EpiAutoGP model to support flexible daily/weekly forecast strides with improved parameter naming
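As a rough illustration of the context/paths pattern named in the list above, the following Python sketch shows how a context dataclass can replace long argument lists. It is a simplified assumption, not the PR's implementation: only `first_training_date`, `last_training_date`, and `param_data_dir` are field names that appear in this conversation; every other field, the `build_model_paths` helper, and the directory layout are hypothetical, and the real `setup_forecast_pipeline` also handles credentials and data wrangling.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class ForecastPipelineContext:
    """Shared run configuration, passed as one object between steps."""
    disease: str                            # hypothetical field
    loc: str                                # hypothetical field
    first_training_date: date               # appears in the reviewed diff
    last_training_date: date                # appears in the reviewed diff
    param_data_dir: Optional[Path] = None   # mentioned in a commit message


@dataclass(frozen=True)
class ModelPaths:
    """Paths for one model run (both field names are hypothetical)."""
    model_output_dir: Path
    data_dir: Path


def setup_forecast_pipeline(disease, loc, first_training_date,
                            last_training_date, param_data_dir=None):
    # Sketch only: the real function also performs the credential step
    # and the existing data wrangling before building the context.
    return ForecastPipelineContext(disease, loc, first_training_date,
                                   last_training_date, param_data_dir)


def build_model_paths(ctx: ForecastPipelineContext, output_root: Path,
                      model_name: str) -> ModelPaths:
    # Hypothetical layout: <output_root>/<loc>/<model_name>/data
    model_output_dir = output_root / ctx.loc / model_name
    return ModelPaths(model_output_dir, model_output_dir / "data")
```

Downstream steps then take `ctx` and `paths` rather than a dozen separate arguments, which is the reduction in duplication the review summary refers to.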
Reviewed changes
Copilot reviewed 25 out of 26 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| pipelines/tests/test_prep_epiautogp_data.py | Removed (tests moved or consolidated) |
| pipelines/tests/test_forecast_utils.py | New unit tests for forecast pipeline utilities using mocks |
| pipelines/tests/test_epiautogp_prep_script.py | Removed preparation script tests |
| pipelines/tests/test_epiautogp_prep.sh | Removed shell test for data preparation |
| pipelines/tests/test_epiautogp_fit.sh | New shell script to run single EpiAutoGP forecasts |
| pipelines/tests/test_epiautogp_end_to_end.sh | New comprehensive end-to-end integration test |
| pipelines/forecast_pyrenew.py | Fixed import organization (moved to absolute imports) |
| pipelines/epiautogp/process_epiautogp_forecast.py | New post-processing utilities for EpiAutoGP outputs |
| pipelines/epiautogp/prep_epiautogp_data.py | Enhanced data conversion with context/paths pattern and ed_visit_type support |
| pipelines/epiautogp/plot_epiautogp_forecast.R | New R plotting script for EpiAutoGP-specific visualizations |
| pipelines/epiautogp/forecast_epiautogp.py | Main pipeline entry point orchestrating all steps |
| pipelines/epiautogp/epiautogp_forecast_utils.py | Shared utilities and dataclasses for pipeline stages |
| pipelines/epiautogp/__init__.py | Updated exports for new utilities |
| pipelines/epiautogp/README.md | Comprehensive documentation of pipeline architecture |
| EpiAutoGP/test/test_parse_arguments.jl | Updated test for renamed parameter |
| EpiAutoGP/test/test_output.jl | Added new required fields to test data |
| EpiAutoGP/test/test_modelling.jl | Updated tests for renamed parameter and added daily frequency test |
| EpiAutoGP/test/test_input.jl | Updated all test inputs with new required fields |
| EpiAutoGP/src/parse_arguments.jl | Renamed n-forecast-weeks to n-ahead for flexibility |
| EpiAutoGP/src/output.jl | Added PipelineOutput type and refactored output creation |
| EpiAutoGP/src/modelling.jl | Updated to support daily/weekly frequencies with time_step calculation |
| EpiAutoGP/src/input.jl | Added frequency, use_percentage, and ed_visit_type fields |
| EpiAutoGP/src/EpiAutoGP.jl | Added Parquet dependency and new constants |
| EpiAutoGP/run.jl | Switched default output type to PipelineOutput |
| EpiAutoGP/Project.toml | Added Parquet dependency and reordered authors field |
output_json_path : Path
    Path where the EpiAutoGP JSON file will be saved
Copilot
AI
Dec 12, 2025
The parameter documentation mentions output_json_path, but this parameter was removed from the function signature. The docstring should be updated to remove this parameter and its description.
Suggested change (delete these lines):
- output_json_path : Path
-     Path where the EpiAutoGP JSON file will be saved
first_training_date=self.first_training_date,
last_training_date=self.last_training_date,
Copilot
AI
Dec 12, 2025
The prepare_model_data method lacks test coverage for the case when NHSN data path is provided. The existing tests in test_forecast_utils.py should be extended to verify NHSN data handling.
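A test along the lines this comment asks for might look like the Python sketch below. Everything here is a stand-in: `ContextSketch`, its injected `load_nhsn` loader, and the `nhsn_data_path` field are hypothetical and do not reflect the actual `ForecastPipelineContext` interface; the sketch only shows the mocking pattern for covering both the "path provided" and "path absent" branches.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional
from unittest.mock import Mock


@dataclass
class ContextSketch:
    """Hypothetical stand-in for ForecastPipelineContext (not the PR's class)."""
    load_nhsn: Callable[[Path], list]       # injected loader, hypothetical
    nhsn_data_path: Optional[Path] = None   # hypothetical field

    def prepare_model_data(self) -> dict:
        data = {"nssp": []}
        # Only read NHSN data when a path was supplied.
        if self.nhsn_data_path is not None:
            data["nhsn"] = self.load_nhsn(self.nhsn_data_path)
        return data


def test_prepare_model_data_reads_nhsn_when_path_given():
    loader = Mock(return_value=[1, 2, 3])
    ctx = ContextSketch(load_nhsn=loader, nhsn_data_path=Path("nhsn.parquet"))
    out = ctx.prepare_model_data()
    loader.assert_called_once_with(Path("nhsn.parquet"))
    assert out["nhsn"] == [1, 2, 3]


def test_prepare_model_data_skips_nhsn_without_path():
    loader = Mock()
    ctx = ContextSketch(load_nhsn=loader)
    out = ctx.prepare_model_data()
    loader.assert_not_called()
    assert "nhsn" not in out
```

In the real test suite the loader would more likely be replaced with `unittest.mock.patch` at its import path, as the existing tests in `test_forecast_utils.py` are described as doing.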
end
end

# Convert percentage to proportion if needed (R expects proportions for prop_ variables)
Copilot
AI
Dec 12, 2025
The comment on line 220 states 'R expects proportions for prop_ variables', but this transformation divides by 100, which converts percentages to proportions. Consider clarifying that the input data is in percentage format (0-100) and needs conversion to proportion format (0-1).
Suggested change:
- # Convert percentage to proportion if needed (R expects proportions for prop_ variables)
+ # Input is in percentage format (0-100); convert to proportion (0-1) as R expects proportions for prop_ variables
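The conversion under discussion is a plain division by 100, applied only when `use_percentage` is true for NSSP targets. The reviewed code is Julia; the Python sketch below restates the same rule for clarity. The function name `to_output_value` and the `"nssp"` target string are assumptions for illustration.

```python
def to_output_value(value: float, target: str, use_percentage: bool) -> float:
    """Convert a percentage (0-100) to a proportion (0-1) when required.

    Hypothetical helper: per the PR description, the conversion applies
    to NSSP targets with use_percentage set; other values pass through.
    """
    if target == "nssp" and use_percentage:
        return value / 100.0
    return value
```

For example, an ED visit percentage of 2.5 becomes a proportion of 0.025, while raw counts are left unchanged.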
This PR closes #780.
This pull request introduces several enhancements and refactors to:
- `EpiAutoGP`. These make the model flexible in doing daily or weekly forecast strides, and format output to work more smoothly with post-processing.
- `pipelines/epiautogp`. The Python module that integrates running `EpiAutoGP` with the `pyrenew-hew` pipeline. The entrypoint script for this is `forecast_epiautogp.py`, which is similar to `forecast_pyrenew.py` and `forecast_timeseries.py`.
forecast_epiautogp.py
Where possible I have reused functionality that exists in `pipelines`; however, to make dev easier I have introduced some classes and functions that wrap multiple steps. In `forecast_epiautogp.py` there are the following steps:
- `setup_forecast_pipeline`, which does the credential step and the existing data wrangling code and puts the pipeline information into a `ForecastPipelineContext` dataclass object. This reduces the number of parameters that need passing around.
- A `ModelPaths` dataclass object to hold the various paths that get passed around.
- `EpiAutoGP` data set up that creates a new `JSON` data file for `EpiAutoGP` to use.
- Running the `EpiAutoGP` model.
- Post-processing for `EpiAutoGP`, which remains based on current post-processing functions.
End to end testing
I've added the integration test `pipelines/tests/test_epiautogp_end_to_end.sh`, which matches the structure of the existing end to end tests but currently only for covid. Model options cover: