@vpandiarajan20 vpandiarajan20 commented Nov 21, 2025

Adds the training-script test-local CLI command, which tests ML training scripts locally in Docker before they are submitted to the cloud.

  • Validates training script structure (setup.py, model/training.py) and dataset paths
  • Mounts local directories into Docker containers with proper working directory setup
  • Supports custom training arguments and both standard/custom container versions
  • Handles Docker availability checks, signal interrupts, and platform compatibility (linux/x86_64)
  • Adds a --force-linux-path flag to the dataset export command that forces Linux-style paths in dataset.jsonl. This is needed when exporting on Windows, so that the paths work inside the container.
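
The structure check in the first bullet can be sketched roughly as follows. This is a minimal illustration rather than the code added in this PR: the function name validateTrainingScriptDir and the error wording are hypothetical, and only the two required paths (setup.py and model/training.py) come from the description above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// validateTrainingScriptDir reports an error if dir lacks the files named in
// the PR description (setup.py and model/training.py).
// Hypothetical helper for illustration; not the function added by this PR.
func validateTrainingScriptDir(dir string) error {
	required := []string{"setup.py", filepath.Join("model", "training.py")}
	var missing []string
	for _, rel := range required {
		info, err := os.Stat(filepath.Join(dir, rel))
		// Treat stat failures and directories alike: a regular file is expected.
		if err != nil || info.IsDir() {
			missing = append(missing, rel)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("training script directory %q is missing: %s",
			dir, strings.Join(missing, ", "))
	}
	return nil
}

func main() {
	dir := "."
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	if err := validateTrainingScriptDir(dir); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("training script structure looks valid")
}
```

Collecting every missing file before returning lets the user fix the whole layout in one pass instead of rerunning the command once per missing file.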

Working on manual testing.

  • Apple Silicon Mac
  • Ubuntu
  • Intel Mac
  • Windows

Usage Example

Download the dataset

viam dataset export --destination=tmp/tmp --dataset-id=6515b637a420eb685e988e84

Clone the classification-tflite repo

Run the following command

viam training-script test-local \
  --dataset-root=/Users/[email protected]/viam/rdk \
  --dataset-file=tmp/tmp/dataset.jsonl \
  --training-script-directory=/Users/[email protected]/viam-modules/classification-tflite \
  --container-version=tf:2.16 \
  --model-output-directory=./output \
  --custom-args=num_epochs=1
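
For readers curious how flags like these could map onto a Docker invocation, here is a rough sketch of assembling docker run arguments with bind mounts, a working directory, and a fixed platform. Everything named here is an assumption for illustration: buildDockerRunArgs, the container-side mount points, and the example image name are hypothetical, and the real command may be structured differently. The Docker flags themselves (--rm, --platform, -v, -w) are standard, and linux/amd64 is Docker's name for x86_64. The sketch assumes absolute host paths.

```go
package main

import (
	"fmt"
	"strings"
)

// Container-side mount points. These paths are assumptions for illustration.
const (
	containerDatasetDir = "/dataset"
	containerScriptDir  = "/training_script"
	containerOutputDir  = "/output"
)

// buildDockerRunArgs returns the argument list for a `docker run` call that
// mounts the three host directories and starts in the script directory.
// Hypothetical helper; not the implementation from this PR.
func buildDockerRunArgs(image, datasetRoot, scriptDir, outputDir string) []string {
	return []string{
		"run", "--rm",
		"--platform", "linux/amd64",
		// The dataset is only read, so mount it read-only.
		"-v", datasetRoot + ":" + containerDatasetDir + ":ro",
		"-v", scriptDir + ":" + containerScriptDir,
		"-v", outputDir + ":" + containerOutputDir,
		"-w", containerScriptDir,
		image,
	}
}

func main() {
	args := buildDockerRunArgs(
		"example/training-image:tag", // placeholder image name
		"/home/user/dataset",
		"/home/user/classification-tflite",
		"/home/user/output",
	)
	fmt.Println("docker " + strings.Join(args, " "))
}
```

Passing the arguments as a slice to exec.Command, instead of building a shell string, avoids quoting problems with paths that contain spaces.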

@viambot viambot added the safe to test This pull request is marked safe to test from a trusted zone label Nov 21, 2025
// Validate that the key portion only contains safe characters
parts := strings.SplitN(arg, "=", 2)
key := parts[0]
if !isValidArgumentKey(key) {
Member Author

I tried to sanitize the input; let me know if there's a better way to do this.
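
One common way to do this kind of key check is an allow-list regular expression. The sketch below is only an illustration under assumptions: isValidArgumentKey appears in the diff, but its body is not shown in this thread, so the pattern here (letters, digits, underscores, and hyphens, not starting with a digit or hyphen) and the parseCustomArg wrapper are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// argKeyPattern is an assumed allow-list: a letter or underscore followed by
// letters, digits, underscores, or hyphens. The PR's real rule may differ.
var argKeyPattern = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_-]*$`)

// isValidArgumentKey reports whether key matches the assumed allow-list.
func isValidArgumentKey(key string) bool {
	return argKeyPattern.MatchString(key)
}

// parseCustomArg splits "key=value" on the first "=" and validates the key.
// Hypothetical wrapper for illustration.
func parseCustomArg(arg string) (key, value string, err error) {
	parts := strings.SplitN(arg, "=", 2)
	if len(parts) != 2 {
		return "", "", fmt.Errorf("custom argument %q must have the form key=value", arg)
	}
	if !isValidArgumentKey(parts[0]) {
		return "", "", fmt.Errorf("custom argument key %q contains unsupported characters", parts[0])
	}
	return parts[0], parts[1], nil
}

func main() {
	for _, arg := range []string{"num_epochs=1", "bad;key=1", "missing-value"} {
		key, value, err := parseCustomArg(arg)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("accepted: key=%s value=%s\n", key, value)
	}
}
```

An allow-list tends to be easier to reason about than a deny-list of shell metacharacters, since anything not explicitly permitted is rejected.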


// Provide additional context for platform-related errors
errMsg := err.Error()
if strings.Contains(errMsg, "platform") || strings.Contains(errMsg, "architecture") {
Member Author

This is kind of ugly, but I thought it was worth including. It's also in the command description, so it's potentially removable.
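
A self-contained sketch of the pattern in this snippet, wrapping an error with a hint when its message mentions the platform or architecture, might look like the following. The function name withPlatformHint, the hint wording, and the two sample errors are made up for illustration; only the two strings.Contains checks and the linux/x86_64 target come from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Sample errors with invented messages, used only to exercise the helper.
var (
	errSamplePlatform = errors.New("image platform does not match the host platform")
	errSampleOther    = errors.New("no such file or directory")
)

// withPlatformHint appends a hint to errors whose message mentions the
// platform or architecture and returns every other error unchanged.
// Hypothetical helper; the hint text is an assumption.
func withPlatformHint(err error) error {
	if err == nil {
		return nil
	}
	msg := err.Error()
	if strings.Contains(msg, "platform") || strings.Contains(msg, "architecture") {
		// %w keeps the original error reachable through errors.Is and errors.As.
		return fmt.Errorf("%w (hint: the training containers target linux/x86_64; "+
			"check that Docker on this machine can run that platform)", err)
	}
	return err
}

func main() {
	fmt.Println(withPlatformHint(errSamplePlatform))
	fmt.Println(withPlatformHint(errSampleOther))
}
```

Matching on message text is brittle because upstream wording can change, which is one argument for keeping the same note in the command description as a fallback.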

cli/app.go Outdated
Flags: []cli.Flag{
	&cli.StringFlag{
		Name:  trainFlagDatasetRoot,
		Usage: "path to the dataset root directory (where dataset.jsonl and image files are located). This is where you ran the 'viam dataset export' command from.",
Member Author

Let me know if this is too confusing.

└── images/
    └── cat.jpg
NOTES:
Member Author

Maybe this goes in documentation? I also want to add that if the container is really slow, it could be because the dataset root or training script directory has a bunch of extra files. Apparently mounting volumes can be expensive.
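
A quick way to check for this before mounting is to count the files under the directory. The sketch below assumes file count is a reasonable proxy for mount cost; the countFiles helper and the threshold of 10000 are arbitrary and hypothetical, not part of this PR.

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
)

// manyFilesThreshold is an arbitrary cutoff chosen for illustration.
const manyFilesThreshold = 10000

// countFiles walks root and returns the number of non-directory entries.
// Hypothetical helper; not part of this PR.
func countFiles(root string) (int, error) {
	count := 0
	err := filepath.WalkDir(root, func(_ string, d fs.DirEntry, err error) error {
		if err != nil {
			return err // stop on the first unreadable entry
		}
		if !d.IsDir() {
			count++
		}
		return nil
	})
	return count, err
}

func main() {
	n, err := countFiles(".")
	if err != nil {
		fmt.Println("could not scan directory:", err)
		return
	}
	if n > manyFilesThreshold {
		fmt.Printf("warning: %d files found; mounting this directory may be slow\n", n)
		return
	}
	fmt.Printf("%d files found\n", n)
}
```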

cmd.Stderr = c.App.ErrWriter

printf(c.App.Writer, "WARNING: If this is your first time running training, "+
"it may take a few minutes to download the container image. "+
Member

A few minutes is a little bit of an understatement :)

Member

@njooma njooma left a comment

Looks good from an SDK perspective; I did not review implementation details.

Member

@allisonschiang allisonschiang left a comment

LGTM

Member

@etai-shuchatowitz etai-shuchatowitz left a comment

LGTM!

@vpandiarajan20 vpandiarajan20 merged commit 909e878 into viamrobotics:main Dec 10, 2025
19 checks passed