@danielbotros (Member) commented Oct 16, 2025

Description

DATA-4707 Add CLI Support for Custom Index Operations

  • Adds CLI commands for custom indexes:
    • viam data index create --org-id="11111111-2222-3333-4444-555555555555" --collection-type=<"hot_store" | "pipeline_sink"> [OPTIONAL] --pipeline-name="pipeline1" --index-path=index_spec.json for creating custom indexes
    • viam data index delete --org-id="11111111-2222-3333-4444-555555555555" --collection-type=<"hot_store" | "pipeline_sink"> [OPTIONAL] --pipeline-name="pipeline1" --index-name="index_name" for deleting indexes
      • Has a confirmation step before deleting an index
    • viam data index list --org-id="11111111-2222-3333-4444-555555555555" --collection-type=<"hot_store" | "pipeline_sink"> [OPTIONAL] --pipeline-name="pipeline1" for listing all custom indexes
  • This does not line up exactly with the CLI usage in the scope document, but viam data create index <args>, for example, would require a data subcommand create with index as its only subcommand, which doesn't make sense structurally.
  • Validates that --collection-type is one of "hot_store" or "pipeline_sink", and that --pipeline-name is not empty ("") when it is "pipeline_sink"
  • Updates the RDK's API dependency to the minimum version that supports custom indexes
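
For illustration, here is a minimal Go sketch of the argument validation described above. It assumes local stand-ins for the enum: collectionType and validateCollectionArgs are hypothetical names (the PR itself maps the strings to pb.IndexableCollection values, as the constants quoted in the review suggest). Only the two accepted strings and the rule that pipeline_sink needs a non-empty pipeline name come from this description.

```go
package main

import (
	"errors"
	"fmt"
)

// collectionType is a hypothetical local stand-in for pb.IndexableCollection.
type collectionType int

const (
	unspecifiedCollectionType collectionType = iota
	hotStoreCollectionType
	pipelineSinkCollectionType
)

// Accepted values for --collection-type, as listed in the PR description.
const (
	hotStoreCollectionTypeStr     = "hot_store"
	pipelineSinkCollectionTypeStr = "pipeline_sink"
)

// validateCollectionArgs maps the --collection-type string to an enum value and
// requires a non-empty --pipeline-name when the type is pipeline_sink.
func validateCollectionArgs(collectionTypeStr, pipelineName string) (collectionType, error) {
	switch collectionTypeStr {
	case hotStoreCollectionTypeStr:
		return hotStoreCollectionType, nil
	case pipelineSinkCollectionTypeStr:
		if pipelineName == "" {
			return unspecifiedCollectionType, errors.New(`--pipeline-name is required when --collection-type is "pipeline_sink"`)
		}
		return pipelineSinkCollectionType, nil
	default:
		return unspecifiedCollectionType, fmt.Errorf("--collection-type must be %q or %q, got %q",
			hotStoreCollectionTypeStr, pipelineSinkCollectionTypeStr, collectionTypeStr)
	}
}

func main() {
	for _, tc := range []struct{ typ, pipeline string }{
		{"hot_store", ""},
		{"pipeline_sink", "pipeline1"},
		{"pipeline_sink", ""},
		{"cold_store", ""},
	} {
		ct, err := validateCollectionArgs(tc.typ, tc.pipeline)
		fmt.Printf("%s %q -> %d, %v\n", tc.typ, tc.pipeline, ct, err)
	}
}
```

Keeping the accepted strings in two constants means a later rename (the review thread discusses "hot storage" and hyphenated values) only touches those constants.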

Testing

I ran my local build of the CLI to confirm that usage behaves as I expect. Examples for each command:

  • viam data
NAME:
   viam data - work with data

USAGE:
   viam data <command> [command options]

COMMANDS:
   export    download data from Viam cloud
   delete    delete data from Viam cloud
   database  interact with a MongoDB Atlas Data Federation instance
   tag       tag binary data by filter or ids
   index     manage indexes for hot data stores and pipeline sinks

OPTIONS:
   --help, -h  show help (default: false)

  • viam data index
NAME:
   viam data index - manage indexes for hot data stores and pipeline sinks

USAGE:
   viam data index <command> [command options]

COMMANDS:
   create  create an index for a data collection
   delete  delete an index from a data collection
   list    list indexes for a data collection

OPTIONS:
   --help, -h  show help (default: false)

  • viam data index create
NAME:
   viam data index create - create an index for a data collection

USAGE:
   viam data index create --org-id=<org-id> --collection-type=<collection-type> --index-path=<index-path> [other options]

OPTIONS:
   --collection-type value  collection type. value(s) can be: [hot_store, pipeline_sink]
   --index-path value       path to index specification JSON file
   --org-id value           org ID of the data collection
   --pipeline-name value    name of the pipeline associated with the index when collection type is 'pipeline_sink'
   
Error: Required flags "org-id, collection-type, index-path" not set
exit status 1

  • viam data index delete
NAME:
   viam data index delete - delete an index from a data collection

USAGE:
   viam data index delete --org-id=<org-id> --collection-type=<collection-type> --index-name=<index-name> [other options]

OPTIONS:
   --collection-type value  collection type. value(s) can be: [hot_store, pipeline_sink]
   --index-name value       name of the index to delete
   --org-id value           org ID of the data collection
   --pipeline-name value    name of the pipeline associated with the index when collection type is 'pipeline_sink'
   
Error: Required flags "org-id, collection-type, index-name" not set
exit status 1

Confirmation step:

Are you sure you want to delete index (name: a)? This action cannot be undone. (y/N): 

  • viam data index list
NAME:
   viam data index list - list indexes for a data collection

USAGE:
   viam data index list --org-id=<org-id> --collection-type=<collection-type>

OPTIONS:
   --collection-type value  collection type. value(s) can be: [hot_store, pipeline_sink]
   --org-id value           org ID of the data collection
   
Error: Required flags "org-id, collection-type" not set
exit status 1

See the code for the flow and the errors returned when malformed commands are provided.

I also created a small test suite for helper functions for validating args and reading index specs from JSON files.

@viambot added the safe to test label Oct 16, 2025
@danielbotros changed the title from "Add CLI for custom indexes" to "[DATA-4707] Add CLI for Custom Indexes" Oct 16, 2025
cli/app.go (outdated)
},
{
Name: "index",
Usage: "manage indexes for hot data stores and pipeline sinks",

@danielbotros (Member Author), Oct 16, 2025:

Happy to hear opinions on better usage messages

Member:

For this one, let's use "manage indexes for hot data and pipeline sink collections".

unspecifiedCollectionType = pb.IndexableCollection_INDEXABLE_COLLECTION_UNSPECIFIED

hotStoreCollectionTypeStr = "hot_store"
pipelineSinkCollectionTypeStr = "pipeline_sink"

Member Author:

"pipeline" might be better

Member:

@gloriacai01 How do we refer to pipelines in customer-facing docs/features? Are customers familiar with pipeline sinks, or just with the general concept of pipelines and collections?

Member:

I think pipeline sink makes sense, especially since we reference "pipeline sink" rather than "pipeline" as a data source type users can query from. Also a nit, but I wonder if it should be "hot storage" instead of "hot store"? I know the API says hot store, but for the tabular data source type we reference it as hot storage.
@RobertXu we've talked about this, but could these be aligned, at least for customer-facing features? WDYT?

Member:

Yep, pipeline sink and hot storage both sound good to me to keep things consistent for customers!

Comment on lines +157 to +158
printf(c.App.Writer, "- Name: %s\n", index.IndexName)
printf(c.App.Writer, " Spec: %s\n", index.IndexSpec)

Member Author:

Not sure if we want to print more info here, like the collection type / pipeline name for each index.

Member:

Yeah, name and spec should be fine for now. The collection type / pipeline name will be the same for each index, and the customer will have just entered those in the request, so those fields would make the response longer without adding new info.

Member Author:

True

}

// DeleteCustomIndexConfirmation prompts the user for confirmation before deleting a custom index.
func DeleteCustomIndexConfirmation(c *cli.Context, args deleteCustomIndexArgs) error {

Member Author:

Might be okay to skip a confirmation step here

Member:

Ahh, I didn't know we had this pattern.

I think in this case, it's ok to skip the confirmation step. I don't think people will accidentally delete an index since the action requires quite specific params.

There are 2 risks with deleting an index:

  1. Worse query performance: we can easily add the index back if this proves to be a problem.
  2. They delete a unique index, which allows duplicate data to appear.

Case 1 is pretty manageable. Case 2 is not great; however, we don't perform any validation when they create a unique index, so it makes sense not to validate when they delete one either (note: I don't love the lack of validation, but this is what came up during the doc review process).

Member Author:

Yeah, sounds good; it doesn't seem too important to validate this.

@danielbotros marked this pull request as ready for review October 16, 2025 21:52
@danielbotros requested a review from a team as a code owner October 16, 2025 21:52
}

result := make([][]byte, len(rawMessages))
for i, raw := range rawMessages {

Member:

I believe we'll need to specifically parse out the key and options properties from the JSON file.

Here's an example index spec JSON from the scope document:

{
  "key": {
    "resource_name": 1,
    "method_name": 1
  },
  "options": {
    "sparse": true
  }
}

We have to do this weird conversion because people used to writing MongoDB index specs use JSON objects, but MongoDB's index creation API requires bson.D slices to preserve key ordering for compound indexes.

Member Author:

Oh yeah, you're right. I did this, then looked at the SDK bug, and now I'm realizing I did the same thing lol

pipelineSinkCollectionType = pb.IndexableCollection_INDEXABLE_COLLECTION_PIPELINE_SINK
unspecifiedCollectionType = pb.IndexableCollection_INDEXABLE_COLLECTION_UNSPECIFIED

hotStoreCollectionTypeStr = "hot_store"

Member:

Q: I don't remember what the CLI convention is for strings we accept, but it seems like our flags are normally hyphenated?

Member Author:

Yeah that seems right

@gloriacai01 (Member) left a comment:

Nice!! LGTM 🐐

@danielbotros reopened this Oct 28, 2025
@stuqdog (Member) left a comment:

Only skimmed the data side of things, but it looks good from an SDK perspective!

@danielbotros merged commit 1d47a76 into viamrobotics:main Oct 29, 2025
35 of 36 checks passed

Labels

safe to test This pull request is marked safe to test from a trusted zone

5 participants