Add dataset create command and dataset resource #32
- Add dataset create CLI command for creating datasets from image directories
- Add Dataset resource class with create and get methods
- Add DatasetResponse type for dataset operations
- Add utility functions for archive creation and image file listing
- Add tests for dataset operations
- Update mock client to support dataset operations

Co-Authored-By: Sudeep Pillai <[email protected]>
    """
    client = ctx.obj

    # Verify directory contains images
    if not Path(directory).is_dir():
        typer.echo("...")
        raise typer.Exit(1)
    client = ctx.obj

    # Verify directory contains images
    image_files = list_image_files(directory)
Avoid having to call another function, and inline the logic directly here.
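One way the inlined check could look is sketched below. The extension set is an assumption (the PR's actual list is not shown here), and the logic is wrapped in a hypothetical `find_image_files` function only so the snippet runs on its own; in the command body it would be the single comprehension.

```python
from pathlib import Path
from typing import List

# Assumed set of image extensions; not taken from this PR.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".webp"}


def find_image_files(directory: Path) -> List[Path]:
    """Return the image files directly inside `directory`, sorted by path."""
    # This comprehension is the part that would be inlined in the CLI command.
    return sorted(
        p
        for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

The suffix comparison is lowercased so that files such as `photo.PNG` are not skipped.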
    @retry(wait=wait_fixed(10), stop=stop_after_attempt(3), reraise=True)
    def create_archive(directory: Union[str, Path]) -> str:
Use create_archive(directory: Path) -> Path
    Raises:
        ValueError: If directory does not exist
    """
    if isinstance(directory, str):
Avoid this, since directory is assumed to be a Path.
    # Create archive in temp directory
    temp_dir = tempfile.gettempdir()
    archive_path = os.path.join(temp_dir, f"dataset_{int(time.time())}.tar.gz")
Use Path(temp_dir) / f"dataset_..."
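Taken together, the review suggestions on create_archive (accept a Path, return a Path, build the archive path with `/`) might look roughly like the sketch below. It is not the PR's final code: the `@retry` decorator is left out to keep the snippet standard-library only, and the archive layout (files stored under the directory's own name) is an assumption.

```python
import tarfile
import tempfile
import time
from pathlib import Path


def create_archive(directory: Path) -> Path:
    """Pack `directory` into a tar.gz in the system temp directory.

    Raises:
        ValueError: If directory does not exist
    """
    if not directory.is_dir():
        raise ValueError(f"Directory does not exist: {directory}")

    # Build the archive path with pathlib instead of os.path.join.
    archive_path = Path(tempfile.gettempdir()) / f"dataset_{int(time.time())}.tar.gz"

    # Assumed layout: contents are stored under the directory's own name.
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(directory, arcname=directory.name)

    return archive_path
```

With a Path-only signature, the `isinstance(directory, str)` branch is no longer needed; callers that hold a string can wrap it in `Path(...)` before calling.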
Add dataset create command and dataset resource
This PR adds a new vlmrun dataset create command that allows users to create datasets from directories of images. The command creates a tar.gz archive from the image directory, uploads it via client.upload_file, and creates a dataset using the uploaded file.
Changes
Testing
Implementation Details
The implementation follows the existing patterns in the codebase:
Link to Devin run: https://app.devin.ai/sessions/eb7f1b55162b4e5290253e45cb13ad86