|  | 
| 451 | 451 |       "metadata": {}, | 
| 452 | 452 |       "source": [ | 
| 453 | 453 |         "### Create a dataset of training artifacts\n", | 
| 454 |  | -        "To evaluate a trained policy (a checkpoint) we need to make the checkpoint accessible to the rollout script. All the training artifacts are stored in workspace default datastore under **azureml/<run_id>** directory.\n", | 
| 455 |  | -        "\n", | 
| 456 |  | -        "Here we create a file dataset from the stored artifacts, and then use this dataset to feed these data to rollout estimator." | 
|  | 454 | +        "To evaluate a trained policy (a checkpoint) we need to make the checkpoint accessible to the rollout script.\n", | 
|  | 455 | +        "We can use the Run API to download policy training artifacts (saved model and checkpoints) to local compute." | 
| 457 | 456 |       ] | 
| 458 | 457 |     }, | 
| 459 | 458 |     { | 
|  | 
| 462 | 461 |       "metadata": {}, | 
| 463 | 462 |       "outputs": [], | 
| 464 | 463 |       "source": [ | 
| 465 |  | -        "from azureml.core import Dataset\n", | 
|  | 464 | +        "from os import path\n", | 
|  | 465 | +        "from distutils import dir_util\n", | 
| 466 | 466 |         "\n", | 
| 467 |  | -        "run_id = child_run_0.id # Or set to run id of a completed run (e.g. 'rl-cartpole-v0_1587572312_06e04ace_head')\n", | 
| 468 |  | -        "run_artifacts_path = os.path.join('azureml', run_id)\n", | 
| 469 |  | -        "print(\"Run artifacts path:\", run_artifacts_path)\n", | 
|  | 467 | +        "training_artifacts_path = path.join(\"logs\", training_algorithm)\n", | 
|  | 468 | +        "print(\"Training artifacts path:\", training_artifacts_path)\n", | 
| 470 | 469 |         "\n", | 
| 471 |  | -        "# Create a file dataset object from the files stored on default datastore\n", | 
| 472 |  | -        "datastore = ws.get_default_datastore()\n", | 
| 473 |  | -        "training_artifacts_ds = Dataset.File.from_files(datastore.path(os.path.join(run_artifacts_path, '**')))" | 
|  | 470 | +        "if path.exists(training_artifacts_path):\n", | 
|  | 471 | +        "    dir_util.remove_tree(training_artifacts_path)\n", | 
|  | 472 | +        "\n", | 
|  | 473 | +        "# Download run artifacts to local compute\n", | 
|  | 474 | +        "child_run_0.download_files(training_artifacts_path)" | 
| 474 | 475 |       ] | 
| 475 | 476 |     }, | 
| 476 | 477 |     { | 
| 477 | 478 |       "cell_type": "markdown", | 
| 478 | 479 |       "metadata": {}, | 
| 479 | 480 |       "source": [ | 
| 480 |  | -        "To verify, we can print out the number (and paths) of all the files in the dataset, as follows." | 
|  | 481 | +        "Now let's find the checkpoints and the last checkpoint number." | 
| 481 | 482 |       ] | 
| 482 | 483 |     }, | 
| 483 | 484 |     { | 
|  | 
| 486 | 487 |       "metadata": {}, | 
| 487 | 488 |       "outputs": [], | 
| 488 | 489 |       "source": [ | 
| 489 |  | -        "artifacts_paths = training_artifacts_ds.to_path()\n", | 
| 490 |  | -        "print(\"Number of files in dataset:\", len(artifacts_paths))\n", | 
| 491 |  | -        "\n", | 
| 492 |  | -        "# Uncomment line below to print all file paths\n", | 
| 493 |  | -        "#print(\"Artifacts dataset file paths: \", artifacts_paths)" | 
|  | 490 | +        "# A helper function to find checkpoint files in a directory\n", | 
|  | 491 | +        "def find_checkpoints(file_path):\n", | 
|  | 492 | +        "    print(\"Looking in path:\", file_path)\n", | 
|  | 493 | +        "    checkpoints = []\n", | 
|  | 494 | +        "    for root, _, files in os.walk(file_path):\n", | 
|  | 495 | +        "        for name in files:\n", | 
|  | 496 | +        "            if os.path.basename(root).startswith('checkpoint_'):\n", | 
|  | 497 | +        "                checkpoints.append(path.join(root, name))\n", | 
|  | 498 | +        "    return checkpoints" | 
| 494 | 499 |       ] | 
| 495 | 500 |     }, | 
| 496 | 501 |     { | 
| 497 |  | -      "cell_type": "markdown", | 
|  | 502 | +      "cell_type": "code", | 
|  | 503 | +      "execution_count": null, | 
| 498 | 504 |       "metadata": {}, | 
|  | 505 | +      "outputs": [], | 
| 499 | 506 |       "source": [ | 
| 500 |  | -        "### Evaluate a trained policy\n", | 
| 501 |  | -        "We need to configure another reinforcement learning estimator, `rollout_estimator`, and then use it to submit another run. Note that the entry script for this estimator now points to `cartpole-rollout.py` script.\n", | 
| 502 |  | -        "Also note how we pass the checkpoints dataset to this script using `inputs` parameter of the _ReinforcementLearningEstimator_.\n", | 
|  | 507 | +        "# Find checkpoints and last checkpoint number\n", | 
|  | 508 | +        "checkpoint_files = find_checkpoints(training_artifacts_path)\n", | 
| 503 | 509 |         "\n", | 
| 504 |  | -        "We are using script parameters to pass in the same algorithm and the same environment used during training. We also specify the checkpoint number of the checkpoint we wish to evaluate, `checkpoint-number`, and number of the steps we shall run the rollout, `steps`.\n", | 
|  | 510 | +        "checkpoint_numbers = []\n", | 
|  | 511 | +        "for file in checkpoint_files:\n", | 
|  | 512 | +        "    file = os.path.basename(file)\n", | 
|  | 513 | +        "    if file.startswith('checkpoint-') and not file.endswith('.tune_metadata'):\n", | 
|  | 514 | +        "        checkpoint_numbers.append(int(file.split('-')[1]))\n", | 
| 505 | 515 |         "\n", | 
| 506 |  | -        "The checkpoints dataset will be accessible to the rollout script as a mounted folder. The mounted folder and the checkpoint number, passed in via `checkpoint-number`, will be used to create a path to the checkpoint we are going to evaluate. The created checkpoint path then will be passed into RLlib rollout script for evaluation.\n", | 
|  | 516 | +        "print(\"Checkpoints:\", checkpoint_numbers)\n", | 
| 507 | 517 |         "\n", | 
| 508 |  | -        "Let's find the checkpoints and the last checkpoint number first." | 
|  | 518 | +        "last_checkpoint_number = max(checkpoint_numbers)\n", | 
|  | 519 | +        "print(\"Last checkpoint number:\", last_checkpoint_number)" | 
|  | 520 | +      ] | 
|  | 521 | +    }, | 
|  | 522 | +    { | 
|  | 523 | +      "cell_type": "markdown", | 
|  | 524 | +      "metadata": {}, | 
|  | 525 | +      "source": [ | 
|  | 526 | +        "Now we upload checkpoints to default datastore and create a file dataset. This dataset will be used to pass in the checkpoints to the rollout script." | 
| 509 | 527 |       ] | 
| 510 | 528 |     }, | 
| 511 | 529 |     { | 
|  | 
| 514 | 532 |       "metadata": {}, | 
| 515 | 533 |       "outputs": [], | 
| 516 | 534 |       "source": [ | 
| 517 |  | -        "# Find checkpoints and last checkpoint number\n", | 
| 518 |  | -        "checkpoint_files = [\n", | 
| 519 |  | -        "    os.path.basename(file) for file in training_artifacts_ds.to_path() \\\n", | 
| 520 |  | -        "        if os.path.basename(file).startswith('checkpoint-') and \\\n", | 
| 521 |  | -        "            not os.path.basename(file).endswith('tune_metadata')\n", | 
| 522 |  | -        "]\n", | 
| 523 |  | -        "\n", | 
| 524 |  | -        "checkpoint_numbers = []\n", | 
| 525 |  | -        "for file in checkpoint_files:\n", | 
| 526 |  | -        "    checkpoint_numbers.append(int(file.split('-')[1]))\n", | 
|  | 535 | +        "# Upload the checkpoint files and create a DataSet\n", | 
|  | 536 | +        "from azureml.core import Dataset\n", | 
| 527 | 537 |         "\n", | 
| 528 |  | -        "print(\"Checkpoints:\", checkpoint_numbers)\n", | 
|  | 538 | +        "datastore = ws.get_default_datastore()\n", | 
|  | 539 | +        "checkpoint_dataref = datastore.upload_files(checkpoint_files, target_path='cartpole_checkpoints_' + run_id, overwrite=True)\n", | 
|  | 540 | +        "checkpoint_ds = Dataset.File.from_files(checkpoint_dataref)" | 
|  | 541 | +      ] | 
|  | 542 | +    }, | 
|  | 543 | +    { | 
|  | 544 | +      "cell_type": "markdown", | 
|  | 545 | +      "metadata": {}, | 
|  | 546 | +      "source": [ | 
|  | 547 | +        "To verify, we can print out the number (and paths) of all the files in the dataset." | 
|  | 548 | +      ] | 
|  | 549 | +    }, | 
|  | 550 | +    { | 
|  | 551 | +      "cell_type": "code", | 
|  | 552 | +      "execution_count": null, | 
|  | 553 | +      "metadata": {}, | 
|  | 554 | +      "outputs": [], | 
|  | 555 | +      "source": [ | 
|  | 556 | +        "artifacts_paths = checkpoint_ds.to_path()\n", | 
|  | 557 | +        "print(\"Number of files in dataset:\", len(artifacts_paths))\n", | 
| 529 | 558 |         "\n", | 
| 530 |  | -        "last_checkpoint_number = max(checkpoint_numbers)\n", | 
| 531 |  | -        "print(\"Last checkpoint number:\", last_checkpoint_number)" | 
|  | 559 | +        "# Uncomment line below to print all file paths\n", | 
|  | 560 | +        "#print(\"Artifacts dataset file paths: \", artifacts_paths)" | 
| 532 | 561 |       ] | 
| 533 | 562 |     }, | 
| 534 | 563 |     { | 
| 535 | 564 |       "cell_type": "markdown", | 
| 536 | 565 |       "metadata": {}, | 
| 537 | 566 |       "source": [ | 
|  | 567 | +        "### Evaluate a trained policy\n", | 
|  | 568 | +        "We need to configure another reinforcement learning estimator, `rollout_estimator`, and then use it to submit another run. Note that the entry script for this estimator now points to `cartpole-rollout.py` script.\n", | 
|  | 569 | +        "Also note how we pass the checkpoints dataset to this script using `inputs` parameter of the _ReinforcementLearningEstimator_.\n", | 
|  | 570 | +        "\n", | 
|  | 571 | +        "We are using script parameters to pass in the same algorithm and the same environment used during training. We also specify the checkpoint number of the checkpoint we wish to evaluate, `checkpoint-number`, and number of the steps we shall run the rollout, `steps`.\n", | 
|  | 572 | +        "\n", | 
|  | 573 | +        "The checkpoints dataset will be accessible to the rollout script as a mounted folder. The mounted folder and the checkpoint number, passed in via `checkpoint-number`, will be used to create a path to the checkpoint we are going to evaluate. The created checkpoint path then will be passed into RLlib rollout script for evaluation.\n", | 
|  | 574 | +        "\n", | 
| 538 | 575 |         "Now let's configure rollout estimator. Note that we use the last checkpoint for evaluation. The assumption is that the last checkpoint points to our best trained agent. You may change this to any of the checkpoint numbers printed above and observe the effect." | 
| 539 | 576 |       ] | 
| 540 | 577 |     }, | 
|  | 
| 576 | 613 |         "    \n", | 
| 577 | 614 |         "    # Data inputs\n", | 
| 578 | 615 |         "    inputs=[\n", | 
| 579 |  | -        "        training_artifacts_ds.as_named_input('artifacts_dataset'),\n", | 
| 580 |  | -        "        training_artifacts_ds.as_named_input('artifacts_path').as_mount()],\n", | 
|  | 616 | +        "        checkpoint_ds.as_named_input('artifacts_dataset'),\n", | 
|  | 617 | +        "        checkpoint_ds.as_named_input('artifacts_path').as_mount()],\n", | 
| 581 | 618 |         "    \n", | 
| 582 | 619 |         "    # The Azure Machine Learning compute target\n", | 
| 583 | 620 |         "    compute_target=compute_target,\n", | 
|  | 