|
575 | 575 | "outputs": [], |
576 | 576 | "source": [ |
577 | 577 | "remote_run.download_file(\"outputs/predictions.csv\", \"predictions.csv\")\n", |
578 | | - "df_all = pd.read_csv(\"predictions.csv\")" |
| 578 | + "fcst_df = pd.read_csv(\"predictions.csv\")" |
| 579 | + ] |
| 580 | + }, |
| 581 | + { |
| 582 | + "cell_type": "markdown", |
| 583 | + "metadata": {}, |
| 584 | + "source": [ |
| 585 | + "Note that the rolling forecast can contain multiple predictions for each date, each from a different forecast origin. For example, consider 2012-09-05:" |
| 586 | + ] |
| 587 | + }, |
| 588 | + { |
| 589 | + "cell_type": "code", |
| 590 | + "execution_count": null, |
| 591 | + "metadata": {}, |
| 592 | + "outputs": [], |
| 593 | + "source": [ |
| 594 | + "fcst_df[fcst_df.date == \"2012-09-05\"]" |
| 595 | + ] |
| 596 | + }, |
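|  |  | + { |
|  |  | + "cell_type": "markdown", |
|  |  | + "metadata": {}, |
|  |  | + "source": [ |
|  |  | + "As a quick check (a sketch that only uses the `date` column already present in the downloaded predictions), we can count how many forecast origins contribute a prediction to each date. Interior dates should receive up to 14 predictions, one per origin:" |
|  |  | + ] |
|  |  | + }, |
|  |  | + { |
|  |  | + "cell_type": "code", |
|  |  | + "execution_count": null, |
|  |  | + "metadata": {}, |
|  |  | + "outputs": [], |
|  |  | + "source": [ |
|  |  | + "# Number of rolling-forecast predictions per date (illustrative sketch)\n", |
|  |  | + "fcst_df.groupby(\"date\").size().head(20)" |
|  |  | + ] |
|  |  | + }, |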
| 597 | + { |
| 598 | + "cell_type": "markdown", |
| 599 | + "metadata": {}, |
| 600 | + "source": [ |
| 601 | + "Here, the forecast origin refers to the latest date of actuals available for a given forecast. The earliest origin in the rolling forecast, 2012-08-31, is the last day in the training data. For origin date 2012-09-01, the forecasts use actual recorded counts from the training data *and* the actual count recorded on 2012-09-01. Note that the model is not retrained for origin dates later than 2012-08-31, but the values for model features, such as lagged values of daily count, are updated.\n", |
| 602 | + "\n", |
| 603 | + "Let's calculate the metrics over all rolling forecasts:" |
579 | 604 | ] |
580 | 605 | }, |
581 | 606 | { |
|
587 | 612 | "from azureml.automl.core.shared import constants\n", |
588 | 613 | "from azureml.automl.runtime.shared.score import scoring\n", |
589 | 614 | "from sklearn.metrics import mean_absolute_error, mean_squared_error\n", |
590 | | - "from matplotlib import pyplot as plt\n", |
591 | 615 | "\n", |
592 | 616 | "# use automl metrics module\n", |
593 | 617 | "scores = scoring.score_regression(\n", |
594 | | - " y_test=df_all[target_column_name],\n", |
595 | | - " y_pred=df_all[\"predicted\"],\n", |
| 618 | + " y_test=fcst_df[target_column_name],\n", |
| 619 | + " y_pred=fcst_df[\"predicted\"],\n", |
596 | 620 | " metrics=list(constants.Metric.SCALAR_REGRESSION_SET),\n", |
597 | 621 | ")\n", |
598 | 622 | "\n", |
599 | 623 | "print(\"[Test data scores]\\n\")\n", |
600 | 624 | "for key, value in scores.items():\n", |
601 | | - " print(\"{}: {:.3f}\".format(key, value))\n", |
602 | | - "\n", |
603 | | - "# Plot outputs\n", |
604 | | - "%matplotlib inline\n", |
605 | | - "test_pred = plt.scatter(df_all[target_column_name], df_all[\"predicted\"], color=\"b\")\n", |
606 | | - "test_test = plt.scatter(\n", |
607 | | - " df_all[target_column_name], df_all[target_column_name], color=\"g\"\n", |
608 | | - ")\n", |
609 | | - "plt.legend(\n", |
610 | | - " (test_pred, test_test), (\"prediction\", \"truth\"), loc=\"upper left\", fontsize=8\n", |
611 | | - ")\n", |
612 | | - "plt.show()" |
| 625 | + " print(\"{}: {:.3f}\".format(key, value))" |
613 | 626 | ] |
614 | 627 | }, |
615 | 628 | { |
|
618 | 631 | "source": [ |
619 | 632 | "For more details on what metrics are included and how they are calculated, please refer to [supported metrics](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#regressionforecasting-metrics). You could also calculate residuals, as described [here](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#residuals).\n", |
620 | 633 | "\n", |
621 | | - "\n", |
622 | | - "Since we did a rolling evaluation on the test set, we can analyze the predictions by their forecast horizon relative to the rolling origin. The model was initially trained at a forecast horizon of 14, so each prediction from the model is associated with a horizon value from 1 to 14. The horizon values are in a column named, \"horizon_origin,\" in the prediction set. For example, we can calculate some of the error metrics grouped by the horizon:" |
|  | 634 | + "The rolling forecast error metrics are considerably worse than the validation metrics reported by the AutoML job. What's going on here? We will investigate in the following cells!" |
| 635 | + ] |
| 636 | + }, |
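|  |  | + { |
|  |  | + "cell_type": "markdown", |
|  |  | + "metadata": {}, |
|  |  | + "source": [ |
|  |  | + "As a quick illustration of the residuals mentioned above (a sketch using only the columns already loaded into `fcst_df`, not the AutoML residuals chart), we can compute and plot them directly:" |
|  |  | + ] |
|  |  | + }, |
|  |  | + { |
|  |  | + "cell_type": "code", |
|  |  | + "execution_count": null, |
|  |  | + "metadata": {}, |
|  |  | + "outputs": [], |
|  |  | + "source": [ |
|  |  | + "from matplotlib import pyplot as plt\n", |
|  |  | + "\n", |
|  |  | + "%matplotlib inline\n", |
|  |  | + "\n", |
|  |  | + "# Residuals = actual - predicted over all rolling forecasts (sketch)\n", |
|  |  | + "residuals = fcst_df[target_column_name] - fcst_df[\"predicted\"]\n", |
|  |  | + "plt.hist(residuals, bins=30)\n", |
|  |  | + "plt.xlabel(\"residual\")\n", |
|  |  | + "plt.ylabel(\"frequency\")\n", |
|  |  | + "plt.title(\"Rolling forecast residuals\")\n", |
|  |  | + "plt.show()" |
|  |  | + ] |
|  |  | + }, |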
| 637 | + { |
| 638 | + "cell_type": "markdown", |
| 639 | + "metadata": {}, |
| 640 | + "source": [ |
| 641 | + "### Forecast versus actuals plot\n", |
| 642 | + "We will plot predictions and actuals on a time series plot. Since there are many forecasts for each date, we select the 14-day-ahead forecast from each forecast origin for our comparison." |
623 | 643 | ] |
624 | 644 | }, |
625 | 645 | { |
|
628 | 648 | "metadata": {}, |
629 | 649 | "outputs": [], |
630 | 650 | "source": [ |
631 | | - "from metrics_helper import MAPE, APE\n", |
632 | | - "\n", |
633 | | - "df_all.groupby(\"horizon_origin\").apply(\n", |
634 | | - " lambda df: pd.Series(\n", |
635 | | - " {\n", |
636 | | - " \"MAPE\": MAPE(df[target_column_name], df[\"predicted\"]),\n", |
637 | | - " \"RMSE\": np.sqrt(\n", |
638 | | - " mean_squared_error(df[target_column_name], df[\"predicted\"])\n", |
639 | | - " ),\n", |
640 | | - " \"MAE\": mean_absolute_error(df[target_column_name], df[\"predicted\"]),\n", |
641 | | - " }\n", |
642 | | - " )\n", |
643 | | - ")" |
| 651 | + "from matplotlib import pyplot as plt\n", |
| 652 | + "\n", |
| 653 | + "%matplotlib inline\n", |
| 654 | + "\n", |
|  |  | + "# Keep the last (latest-date) row for each forecast origin - this is the\n", |
|  |  | + "# 14-day-ahead prediction, assuming rows within an origin are ordered by date.\n", |
|  | 655 | + "fcst_df_h14 = (\n", |
|  | 656 | + " fcst_df.groupby(\"forecast_origin\", as_index=False)\n", |
|  | 657 | + " .last()\n", |
|  | 658 | + " .drop(columns=[\"forecast_origin\"])\n", |
|  | 659 | + ")\n", |
|  | 660 | + "fcst_df_h14.set_index(time_column_name, inplace=True)\n", |
|  | 661 | + "plt.plot(fcst_df_h14[[target_column_name, \"predicted\"]])\n", |
|  | 662 | + "plt.xticks(rotation=45)\n", |
|  | 663 | + "plt.title(\"Predicted vs. Actuals\")\n", |
|  | 664 | + "plt.legend([\"actual\", \"14-day-ahead forecast\"])\n", |
| 665 | + "plt.show()" |
644 | 666 | ] |
645 | 667 | }, |
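|  |  | + { |
|  |  | + "cell_type": "markdown", |
|  |  | + "metadata": {}, |
|  |  | + "source": [ |
|  |  | + "The plot shows a sharp dip in the actuals in late October. To inspect it, we can tabulate actuals against the 14-day-ahead forecast for that window (a quick look; `fcst_df_h14` is the frame built above, indexed by date):" |
|  |  | + ] |
|  |  | + }, |
|  |  | + { |
|  |  | + "cell_type": "code", |
|  |  | + "execution_count": null, |
|  |  | + "metadata": {}, |
|  |  | + "outputs": [], |
|  |  | + "source": [ |
|  |  | + "# Actuals vs. 14-day-ahead forecasts around the late-October dip (sketch)\n", |
|  |  | + "fcst_df_h14.loc[\"2012-10-25\":\"2012-11-02\", [target_column_name, \"predicted\"]]" |
|  |  | + ] |
|  |  | + }, |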
646 | 668 | { |
647 | 669 | "cell_type": "markdown", |
648 | 670 | "metadata": {}, |
649 | 671 | "source": [ |
650 | | - "To drill down more, we can look at the distributions of APE (absolute percentage error) by horizon. From the chart, it is clear that the overall MAPE is being skewed by one particular point where the actual value is of small absolute value." |
| 672 | + "Looking at the plot, there are two clear issues:\n", |
| 673 | + "1. An anomalously low count value on October 29th, 2012.\n", |
|  | 674 | + "2. Large over-predictions around the end-of-year holidays (Thanksgiving and Christmas) in late November and late December.\n", |
| 675 | + "\n", |
| 676 | + "What happened on Oct. 29th, 2012? That day, Hurricane Sandy brought severe storm surge flooding to the east coast of the United States, particularly around New York City. This is certainly an anomalous event that the model did not account for!\n", |
| 677 | + "\n", |
|  | 678 | + "As for the late-year holidays, the model apparently did not learn to account for the full reduction of bike share rentals on these major holidays. The training data covers 2011 and 2012 through the end of August, so the model fit only had access to a single occurrence of these holidays. This makes it challenging to resolve holiday effects; however, a larger AutoML model search may result in a better model that is more holiday-aware.\n", |
| 679 | + "\n", |
|  | 680 | + "If we restrict the predictions to dates prior to the Thanksgiving holiday and remove the anomalous day of 2012-10-29, the metrics are closer to validation levels:" |
651 | 681 | ] |
652 | 682 | }, |
653 | 683 | { |
|
656 | 686 | "metadata": {}, |
657 | 687 | "outputs": [], |
658 | 688 | "source": [ |
659 | | - "df_all_APE = df_all.assign(APE=APE(df_all[target_column_name], df_all[\"predicted\"]))\n", |
660 | | - "APEs = [\n", |
661 | | - " df_all_APE[df_all[\"horizon_origin\"] == h].APE.values\n", |
662 | | - " for h in range(1, forecast_horizon + 1)\n", |
663 | | - "]\n", |
664 | | - "\n", |
665 | | - "%matplotlib inline\n", |
666 | | - "plt.boxplot(APEs)\n", |
667 | | - "plt.yscale(\"log\")\n", |
668 | | - "plt.xlabel(\"horizon\")\n", |
669 | | - "plt.ylabel(\"APE (%)\")\n", |
670 | | - "plt.title(\"Absolute Percentage Errors by Forecast Horizon\")\n", |
|  |  | + "# Exclude the Hurricane Sandy anomaly and all dates from Thanksgiving onward\n", |
|  | 689 | + "date_filter = (fcst_df.date != \"2012-10-29\") & (fcst_df.date < \"2012-11-22\")\n", |
| 690 | + "scores = scoring.score_regression(\n", |
| 691 | + " y_test=fcst_df[date_filter][target_column_name],\n", |
| 692 | + " y_pred=fcst_df[date_filter][\"predicted\"],\n", |
| 693 | + " metrics=list(constants.Metric.SCALAR_REGRESSION_SET),\n", |
| 694 | + ")\n", |
671 | 695 | "\n", |
672 | | - "plt.show()" |
| 696 | + "print(\"[Test data scores (filtered)]\\n\")\n", |
| 697 | + "for key, value in scores.items():\n", |
| 698 | + " print(\"{}: {:.3f}\".format(key, value))" |
673 | 699 | ] |
674 | 700 | } |
675 | 701 | ], |
|
711 | 737 | "name": "python", |
712 | 738 | "nbconvert_exporter": "python", |
713 | 739 | "pygments_lexer": "ipython3", |
714 | | - "version": "3.8.5" |
| 740 | + "version": "3.7.13" |
715 | 741 | }, |
716 | 742 | "mimetype": "text/x-python", |
717 | 743 | "name": "python", |
|