
Commit 4a9a1ff

lokijota and Joao Pedro Martins authored
Add details on the Batch scoring session of Getting Started (microsoft#389)
* Revision of getting started guide up to Batch scoring. Also new diagram and fix to ARM template to remove region restrictions.
* Detail on Batch scoring for Getting Started and additional debug message in the copy to ease diagnosing issues
* Tweaked text and added a NOQA for message

Co-authored-by: Joao Pedro Martins <[email protected]>
1 parent 4b2667e commit 4a9a1ff

3 files changed: +22 -12 lines changed


diabetes_regression/scoring/parallel_batchscore_copyoutput.py

Lines changed: 1 addition & 1 deletion
@@ -86,6 +86,6 @@ def copy_output(args):
         or args.output_path is None
         or args.output_path.strip() == ""
     ):
-        print("Missing parameters")
+        print("Missing parameters in parallel_batchscore_copyoutput.py -- Not going to copy inferences to an output datastore")  # NOQA E501
     else:
         copy_output(args)
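
The guard in this hunk checks each argument for `None` or a blank string before copying. A minimal sketch of that pattern as a reusable helper might look like the following. `missing_params` and the `required` tuple are hypothetical names introduced for illustration; only `output_path` is taken from the diff, since the other conditions lie outside the hunk.

```python
from argparse import Namespace


def missing_params(args, required=("output_path",)):
    """Return the names of required arguments that are None or blank.

    `required` is a hypothetical list; the real script may check more fields.
    """
    missing = []
    for name in required:
        value = getattr(args, name, None)
        # Treat both a missing value and a whitespace-only string as absent.
        if value is None or str(value).strip() == "":
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Illustrative arguments only; the real script builds these from the command line.
    args = Namespace(output_path="   ")
    missing = missing_params(args)
    if missing:
        print("Missing parameters: " + ", ".join(missing) + " -- skipping copy")
    else:
        print("All parameters present; the copy would run here")
```

Naming the missing arguments in the message serves the same goal as this commit's longer log line: making it easier to see from the step logs why nothing was copied.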

docs/getting_started.md

Lines changed: 21 additions & 11 deletions
@@ -286,39 +286,49 @@ The pipeline has the following stage:

 ### Set up the Batch Scoring pipeline

-In your Azure DevOps project, create and run a new build pipeline based on the [diabetes_regression-batchscoring-ci.yml](../.pipelines/diabetes_regression-batchscoring-ci.yml)
-pipeline definition in your forked repository.
+In your Azure DevOps project, create and run a new build pipeline based on the [.pipelines/diabetes_regression-batchscoring-ci.yml](../.pipelines/diabetes_regression-batchscoring-ci.yml)
+pipeline definition in your forked repository. Rename this pipeline to `Batch-Scoring`.

 Once the pipeline is finished, check the execution result:

 ![Build](./images/batchscoring-ci-result.png)

-Also check the published batch scoring pipeline in the **mlops-AML-WS** workspace in [Azure Portal](https://portal.azure.com/):
+Also check the published batch scoring pipeline in your AML workspace in the [Azure Portal](https://portal.azure.com/):

 ![Batch scoring pipeline](./images/batchscoring-pipeline.png)

 Great, you now have the build pipeline set up for batch scoring which automatically triggers every time there's a change in the master branch!

-The pipeline stages are summarized below:
+The pipeline stages are described below in detail -- and you must do further configuration to actually see the batch inferences:

 #### Batch Scoring CI

 - Linting (code quality analysis)
 - Unit tests and code coverage analysis
-- Build and publish *ML Batch Scoring Pipeline* in an *ML Workspace*
+- Build and publish *ML Batch Scoring Pipeline* in an *AML Workspace*

 #### Batch Score model

 - Determine the model to be used based on the model name (required), model version, model tag name and model tag value bound pipeline parameters.
   - If run via Azure DevOps pipeline, the batch scoring pipeline will take the model name and version from the `Model-Train-Register-CI` build used as input.
   - If run locally without the model version, the batch scoring pipeline will use the model's latest version.
-- Trigger the *ML Batch Scoring Pipeline* and waits for it to complete.
+- Trigger the *ML Batch Scoring Pipeline* and wait for it to complete.
   - This is an **agentless** job. The CI pipeline can wait for ML pipeline completion for hours or even days without using agent resources.
-- Use the scoring input data supplied via the SCORING_DATASTORE_INPUT_* configuration variables, or uses the default datastore and sample data.
-- Once scoring is completed, the scores are made available in the same blob storage at the locations specified via the SCORING_DATASTORE_OUTPUT_* configuration variables.
-
-To configure your own custom scoring data, see [Configure Custom Batch Scoring](custom_model.md#Configure-Custom-Batch-Scoring).
-
+- Create an Azure ML pipeline with two steps. The pipeline is created by the code in `ml_service\pipelines\diabetes_regression_build_parallel_batchscore_pipeline.py` and has two steps:
+  - `scoringstep` - this step is a **`ParallelRunStep`** that executes the code in `diabetes_regression\scoring\parallel_batchscore.py` with several different batches of the data to be scored.
+  - `scorecopystep` - this is a **`PythonScriptStep`** step that copies the output inferences from Azure ML's internal storage into a target location in another storage account.
+    - If you run the instructions as defined above with no changes to variables, this step will **not** be executed. You'll see a message in the logs for the corresponding step saying `Missing parameters`. In this case, you'll be able to find the file with the inferences in the same Storage Account associated with Azure ML, in a location similar to `azureml-blobstore-SomeGuid\azureml\SomeOtherGuid\defaultoutput\parallel_run_step.txt`. One way to find the right path is this:
+      - Open your experiment in Azure ML (by default called `mlopspython`).
+      - Open the run that you want to look at (named something like `neat_morning_qc10dzjy` or similar).
+      - In the graphical pipeline view with 2 steps, click the button to open the details tab: `Show run overview`.
+      - You'll see two steps (corresponding to `scoringstep` and `scorecopystep` as described above).
+      - Click the step with the older "Submitted time".
+      - Click "Output + logs" at the top, and you'll see something like the following:
+        ![Outputs of `scoringstep`](./images/batch-child-run-scoringstep.png)
+      - The `defaultoutput` file will have JSON content with the path to a file called `parallel_run_step.txt` containing the scoring.
+
+To properly configure this step for your own custom scoring data, you must follow the instructions in [Configure Custom Batch Scoring](custom_model.md#Configure-Custom-Batch-Scoring), which let you specify both the location of the files to score (via the `SCORING_DATASTORE_INPUT_*` configuration variables) and where to store the inferences (via the `SCORING_DATASTORE_OUTPUT_*` configuration variables).
+
 ## Further Exploration

 You should now have a working set of pipelines that can get you started with MLOpsPython. Below are some additional features offered that might suit your scenario.
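
The updated guide says the `defaultoutput` file holds JSON containing the path to `parallel_run_step.txt`. If you download that file, a small script could pull the path out instead of reading it by eye. The sketch below is only an illustration: it assumes the path appears as a string value somewhere in the JSON and contains the text `defaultoutput`. The key names in the sample (`Container`, `RelativePath`) are made up, and the real file's layout may differ.

```python
import json


def find_paths(node, marker="defaultoutput"):
    """Recursively collect string values containing `marker` from parsed JSON."""
    found = []
    if isinstance(node, str):
        if marker in node:
            found.append(node)
    elif isinstance(node, dict):
        for value in node.values():
            found.extend(find_paths(value, marker))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_paths(item, marker))
    return found


if __name__ == "__main__":
    # Made-up sample content; replace with the text of the downloaded file.
    sample_text = (
        '{"Container": "azureml-blobstore-SomeGuid", '
        '"RelativePath": "azureml/SomeOtherGuid/defaultoutput/parallel_run_step.txt"}'
    )
    for path in find_paths(json.loads(sample_text)):
        print(path)
```

Searching every string value rather than a fixed key keeps the helper usable even if the JSON nests the path differently than assumed here.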