@devin-ai-integration
Contributor

Add dataset create command and dataset resource

This PR adds a new vlmrun dataset create command that allows users to create datasets from directories of images. The command creates a tar.gz archive from the image directory, uploads it via client.upload_file, and creates a dataset using the uploaded file.

Changes

  • Add dataset create CLI command for creating datasets from image directories
  • Add Dataset resource class with create and get methods
  • Add DatasetResponse type for dataset operations
  • Add utility functions for archive creation and image file listing
  • Add tests for dataset operations
  • Update mock client to support dataset operations
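
As a rough illustration of the image-listing utility mentioned above, a minimal sketch might look like the following. The name list_image_files appears in the diff under review, but its body is not shown in this PR conversation, so the extension set and the non-recursive behaviour here are assumptions.

```python
from pathlib import Path
from typing import List

# Assumption: the PR does not state which image formats are accepted.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}


def list_image_files(directory: Path) -> List[Path]:
    """Return the image files directly inside `directory`, sorted by path."""
    if not directory.is_dir():
        raise ValueError(f"Not a directory: {directory}")
    # Match on the lower-cased suffix so that "photo.JPG" is also picked up.
    return sorted(
        p
        for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

An empty result can then be treated by the caller as "directory contains no images".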

Testing

  • Added unit tests for dataset operations
  • Tested dataset creation with invalid dataset types
  • Verified proper cleanup of temporary archive files
  • Added type checking for all new methods and classes
  • Added docstrings and type hints for all new functionality
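
The cleanup behaviour noted above could be sketched roughly as follows. This is not the code from the PR: upload_directory is a hypothetical helper, and the upload parameter is a stand-in callable for client.upload_file, whose exact signature is not shown in this conversation.

```python
import os
import tarfile
import tempfile
from pathlib import Path
from typing import Callable


def upload_directory(directory: Path, upload: Callable[[Path], str]) -> str:
    """Archive `directory`, hand the archive to `upload`, then delete it.

    `upload` is a placeholder for the real upload call (an assumption).
    """
    # mkstemp gives a unique file name, avoiding collisions between runs.
    fd, name = tempfile.mkstemp(prefix="dataset_", suffix=".tar.gz")
    os.close(fd)
    archive_path = Path(name)
    try:
        with tarfile.open(archive_path, "w:gz") as tar:
            tar.add(directory, arcname=directory.name)
        return upload(archive_path)
    finally:
        # Runs on success and on failure, so no archive is left behind.
        archive_path.unlink(missing_ok=True)
```

Putting the deletion in a finally block means the temporary archive is removed even when the upload raises.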

Implementation Details

The implementation follows the existing patterns in the codebase:

  • Uses typer for CLI implementation
  • Follows resource-based client architecture
  • Uses dataclasses for type-safe responses
  • Includes proper error handling and validation
  • Maintains backward compatibility
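
To illustrate the dataclass-based response pattern listed above, here is a hedged sketch. The actual fields of DatasetResponse are not shown in this PR conversation, so the class name ExampleDatasetResponse and every field on it are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class ExampleDatasetResponse:
    """Illustrative response type; all field names are assumptions."""

    dataset_id: str
    dataset_type: str
    file_id: str

    @classmethod
    def from_dict(cls, payload: Dict[str, Any]) -> "ExampleDatasetResponse":
        # A missing key raises KeyError early instead of producing a
        # half-filled object.
        return cls(
            dataset_id=str(payload["dataset_id"]),
            dataset_type=str(payload["dataset_type"]),
            file_id=str(payload["file_id"]),
        )
```

Marking the dataclass as frozen keeps parsed responses immutable, which is one common way to make such types safer to pass around.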

Link to Devin run: https://app.devin.ai/sessions/eb7f1b55162b4e5290253e45cb13ad86

Co-Authored-By: Sudeep Pillai <[email protected]>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add "(aside)" to your comment to have me ignore it.
  • Look at CI failures and help fix them


"""
client = ctx.obj

# Verify directory contains images
Contributor

if not Path(directory).is_dir():
    typer.echo("...")
    raise typer.Exit(1)

client = ctx.obj

# Verify directory contains images
image_files = list_image_files(directory)
Contributor

Avoid having to call another function, and inline the logic directly here.



@retry(wait=wait_fixed(10), stop=stop_after_attempt(3), reraise=True)
def create_archive(directory: Union[str, Path]) -> str:
Contributor

Use create_archive(directory: Path) -> Path

Raises:
ValueError: If directory does not exist
"""
if isinstance(directory, str):
Contributor

Avoid this check, since directory is assumed to be a Path.


# Create archive in temp directory
temp_dir = tempfile.gettempdir()
archive_path = os.path.join(temp_dir, f"dataset_{int(time.time())}.tar.gz")
Contributor

Use Path(temp_dir) / f"dataset_..."
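
Taken together, the review suggestions on create_archive (accept a Path, return a Path, build the archive path with the / operator) might be applied roughly as below. This is a sketch and not the merged code: the tenacity @retry decorator from the diff is left out to keep the example standard-library only, and the archive layout (arcname) is an assumption.

```python
import tarfile
import tempfile
import time
from pathlib import Path


def create_archive(directory: Path) -> Path:
    """Create a tar.gz archive of `directory` in the system temp directory.

    Raises:
        ValueError: If directory does not exist
    """
    if not directory.is_dir():
        raise ValueError(f"Directory does not exist: {directory}")

    # Path arithmetic replaces os.path.join, as the review suggests.
    temp_dir = Path(tempfile.gettempdir())
    archive_path = temp_dir / f"dataset_{int(time.time())}.tar.gz"

    with tarfile.open(archive_path, "w:gz") as tar:
        # Assumption: store files under the directory's own name.
        tar.add(directory, arcname=directory.name)
    return archive_path
```

Because the timestamp has one-second resolution, two calls within the same second would write to the same archive path; a unique temporary file name would avoid that if it matters.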

@spillai spillai merged commit 5f1514d into main Jan 16, 2025
3 checks passed
@spillai spillai deleted the devin/1737013552-add-dataset-create-command branch January 23, 2025 08:38