feat: umap by mcembalest · Pull Request #383 · nomic-ai/nomic

mcembalest · 2025-05-02T22:05:38Z

Important

Introduces ProjectionOptions for 2D projection configurations, replacing NomicProjectOptions, and updates related functions and tests.

Behavior:
- Introduces ProjectionOptions class in data_inference.py for 2D projection configurations, replacing NomicProjectOptions.
- Updates map_data() in atlas.py to use ProjectionOptions for projection parameter.
- Modifies create_index() in dataset.py to handle ProjectionOptions.
Tests:
- Adds integration tests in test_atlas_client.py for ProjectionOptions with different models like umap and nomic-project-v1.
- Updates existing tests to use ProjectionOptions instead of NomicProjectOptions.
Misc:
- Fixes error messages and logging in dataset.py related to projection creation.

^{This description was created by}^{for c34c168. You can customize this summary. It will automatically update as commits are pushed.}

nomic/data_inference.py

nomic/dataset.py

nomic/data_inference.py

nomic/dataset.py

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 7aa378f in 1 minute and 35 seconds. Click for details.

Reviewed 165 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 6 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. nomic/data_inference.py:60

Draft comment:
New ProjectionOptions class looks good. Consider adding validation (e.g., via a pydantic validator) to restrict allowed values for 'model' (e.g., 'umap', 'nomic-project-v1', 'nomic-project-v2').
Reason this comment was not posted:
Confidence changes required: 70% <= threshold 70% None

2. nomic/dataset.py:1070

Draft comment:
The refactoring to use ProjectionOptions is clean. In the branch handling the Dict type, ensure that if the dict contains both an 'algorithm' key and a 'model' key, the precedence is clearly defined for setting projection_options.model.
Reason this comment was not posted:
Confidence changes required: 60% <= threshold 70% None

3. nomic/data_inference.py:60

Draft comment:
Consider renaming the 'model' field to 'algorithm' (or clearly documenting its dual role) to better reflect its use (e.g., 'umap' vs 'nomic-project-v1/v2').
Reason this comment was not posted:
Comment looked like it was already resolved.

4. nomic/dataset.py:1110

Draft comment:
When converting a dict to ProjectionOptions, consider wrapping the instantiation in a try/except block to provide clearer error messages on invalid parameters.
Reason this comment was not posted:
Confidence changes required: 70% <= threshold 70% None

5. nomic/dataset.py:1125

Draft comment:
The assignment of common hyperparameters (like 'n_neighbors' and 'n_epochs') is duplicated in both the 'umap' and 'nomic-project' branches. Consider refactoring to reduce duplication.
Reason this comment was not posted:
Comment looked like it was already resolved.

6. nomic/dataset.py:1139

Draft comment:
Both branches add 'min_dist' to the hyperparameters. Confirm that including 'min_dist' for a non-UMAP (nomic-project) projection is intentional.
Reason this comment was not posted:
Confidence changes required: 70% <= threshold 70% None

Workflow ID: wflow_5mCcTiCwdF00bNgX

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 91b9363 in 49 seconds. Click for details.

Reviewed 34 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 9 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. nomic/dataset.py:1111

Draft comment:
Formatting change: merged dictionary parameters for ProjectionOptions into one line. No logic change.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 70% None

2. nomic/dataset.py:1122

Draft comment:
Condensed initialization of projection_hyperparameters into a single dictionary literal.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 70% None

3. nomic/dataset.py:1143

Draft comment:
Removed an extra blank line for improved readability.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 70% None

4. nomic/dataset.py:1109

Draft comment:
Nit: The combined inline parameters in ProjectionOptions are concise; ensure that this style remains consistent if additional parameters are added.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 70% None

5. nomic/dataset.py:1122

Draft comment:
Nit: The one-liner dictionary assignment for projection_hyperparameters is clear and more concise.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 70% None

6. nomic/dataset.py:1144

Draft comment:
Nit: Removal of extraneous whitespace; no changes needed.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 70% None

7. nomic/dataset.py:242

Draft comment:
Typographical error: 'identifer' should be corrected to 'identifier' in the function's docstring.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

8. nomic/dataset.py:214

Draft comment:
Typographical error: 'seperated' should be corrected to 'separated' in the parameter description of _get_dataset_by_slug_identifier.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

9. nomic/dataset.py:1458

Draft comment:
Typographical error: 'uploda batch of blobs' should be corrected to 'upload batch of blobs' in the comment.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_pYW9EXJ3L4uHPeBO

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

ellipsis-dev

Important

Looks good to me! 👍

Reviewed d023fe9 in 50 seconds. Click for details.

Reviewed 25 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. nomic/data_inference.py:64

Draft comment:
Docstring changes remove default/auto-inferred details. Consider mentioning that fields default to None (auto-inferred) to aid clarity.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 70% None

2. nomic/data_inference.py:62

Draft comment:
Docstring change: The detailed default (auto-inferred) values were removed. Consider including these details (or a reference to the field defaults) to help users understand expected behavior.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 70% None

Workflow ID: wflow_PrWs3xxAt7GOGrT8

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 7eedc3c in 1 minute and 13 seconds. Click for details.

Reviewed 70 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 6 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. nomic/atlas.py:14

Draft comment:
Combining the multi-line import into a single line is fine since it remains readable. Make sure it fits within PEP8 line length limits.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 70% None

2. nomic/data_inference.py:64

Draft comment:
Docstring formatting cleanup looks good. Only minor whitespace changes, no functionality issues.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 70% None

3. nomic/atlas.py:14

Draft comment:
Switching from a multiline to a single-line import is concise, but if the list grows further it may hurt readability. Ensure this style aligns with your team’s conventions.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 70% None

4. nomic/data_inference.py:62

Draft comment:
Docstring whitespace and formatting adjustments in projection, project, and UMAP options improve consistency without changing functionality.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 70% None

5. nomic/atlas.py:45

Draft comment:
Typographical issue: In the id_field parameter description, it reads "This field can be up 36 characters in length." It would be clearer to say "up to 36 characters in length." Please update for clarity.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

6. nomic/atlas.py:163

Draft comment:
Typographical error: The log message in line 163 ends with an extra backtick (`) after 'dataset'. This seems unintentional and should be corrected.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_PM7GOrSlUc3DAvya

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

ellipsis-dev

Caution

Changes requested ❌

Reviewed 6f426fb in 2 minutes and 22 seconds. Click for details.

Reviewed 90 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 8 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. tests/test_atlas_client.py:619

Draft comment:
Ensure consistent cleanup for datasets using try/finally blocks in test_map_text_explicit_projection_options.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 70% 1. The test file has many other similar tests that don't use try/finally blocks for cleanup 2. The delete() calls are already at the end of each test section 3. If an assertion fails, pytest will stop the test anyway 4. The comment is asking for defensive programming but isn't pointing out an actual issue 5. Looking at the rest of the file, there are only a few places where try/finally is used, and those are in specific error test cases The comment has a valid point about resource cleanup being important. If an assertion fails mid-test, the dataset might not get cleaned up. While resource cleanup is important, this test follows the established pattern in the codebase. The risk of leaked resources is low since pytest handles test failures, and the codebase deliberately uses try/finally only in specific error test cases. Delete the comment. It suggests a change that goes against the established patterns in the codebase and isn't addressing a real problem.

2. tests/test_atlas_client.py:582

Draft comment:
Duplicate test logic across similar projection tests; consider abstracting common setup and teardown to reduce duplication.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 70% None

3. tests/test_atlas_client.py:14

Draft comment:
Updated import to include ProjectionOptions. Ensure all explicit projection tests are using the proper class for consistency.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 70% None

4. tests/test_atlas_client.py:572

Draft comment:
In test_map_embeddings_explicit_projection_options, consider wrapping dataset.delete() in a try/finally block to ensure cleanup even if assertions fail.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 70% None

5. tests/test_atlas_client.py:611

Draft comment:
Similarly, in test_map_text_explicit_projection_options, wrapping dataset.delete() in a try/finally construct can help guarantee proper cleanup on test failure.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 70% None

6. tests/test_atlas_client.py:572

Draft comment:
The explicit projection tests have similar setup/teardown patterns. Consider refactoring common logic into helper functions to reduce duplication and improve maintainability.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 70% None

7. tests/test_atlas_client.py:67

Draft comment:
Typo: In the comment, 'test to long ids' should be corrected to 'test too long ids' for clarity.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

8. tests/test_atlas_client.py:376

Draft comment:
Typo: The function name 'test_contrastors_access' may be a misspelling. Consider renaming it to something like 'test_credentials_access' to more accurately reflect its purpose.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_NMR5aPXaF9ugiYGz

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

tests/test_atlas_client.py

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 2901d06 in 53 seconds. Click for details.

Reviewed 47 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 9 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. tests/test_atlas_client.py:586

Draft comment:
Whitespace adjustment: The inline ProjectionOptions now has an extra space at the comment. This is a minor cosmetic change, but consistency in formatting is appreciated.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 70% None

2. tests/test_atlas_client.py:600

Draft comment:
Reformatting ProjectionOptions arguments to multiple lines for the nomic-project test improves readability. Ensure consistency with similar tests.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 70% None

3. tests/test_atlas_client.py:623

Draft comment:
Minor whitespace change for UMAP ProjectionOptions in text mapping test; no functional impact.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 70% None

4. tests/test_atlas_client.py:638

Draft comment:
Splitting ProjectionOptions parameters across multiple lines for the nomic-project in text mapping test enhances clarity. Ensure consistent formatting across tests.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 70% None

5. tests/test_atlas_client.py:586

Draft comment:
Minor whitespace adjustment: extra space before the inline comment improves readability.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 70% None

6. tests/test_atlas_client.py:600

Draft comment:
Using multi-line formatting for ProjectionOptions (nomic-project) enhances readability.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 70% None

7. tests/test_atlas_client.py:625

Draft comment:
Consistent inline formatting: extra space before the comment for the UMAP projection improves clarity.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 70% None

8. tests/test_atlas_client.py:640

Draft comment:
Breaking ProjectionOptions parameters into multiple lines for the nomic-project in text mapping tests significantly improves readability.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 70% None

9. tests/test_atlas_client.py:66

Draft comment:
Typo detected in comment: '# test to long ids'. Consider changing it to '# test too long ids' to clearly convey that the test checks for IDs that are excessively long.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_SrRDTDeZNYNRkLhk

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

wilsonjr

I think we can simplify things and keep only ProjectionOptions

tests/test_atlas_client.py

nomic/dataset.py

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 8b964f1 in 2 minutes and 26 seconds. Click for details.

Reviewed 362 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 11 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. tests/test_atlas_client.py:447

Draft comment:
Consider centralizing the common assertions and mock setups in helper functions to avoid duplication across unit test functions.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 70% None

2. tests/test_atlas_client.py:528

Draft comment:
The naming convention between 'integration_test_' and 'unit_test_' is inconsistent. Consider a consistent pattern or documenting the difference.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 70% None

3. tests/test_atlas_client.py:602

Draft comment:
Verify that patching '_wait_for_projection_completion' using its full module path is correct. If this function might move, consider refactoring into a helper module.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 70% None

4. tests/test_atlas_client.py:757

Draft comment:
The test 'integration_test_map_embeddings_legacy_dict_with_explicit_algorithm' repeats similar code as other tests. Consider reusing shared test logic.
Reason this comment was not posted:
Confidence changes required: 30% <= threshold 70% None

5. tests/test_atlas_client.py:32

Draft comment:
Renamed test functions (e.g. from 'test_' to 'integration_test_') to differentiate integration tests. Ensure this naming convention is documented so that integration and unit tests are clearly distinguished.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 70% This comment is asking for documentation of a naming convention, which is not strictly about code changes. It's asking the author to "ensure" something, which violates our rules. The naming convention change is clear from the code itself. While documentation is good, this kind of meta-comment about documentation doesn't require immediate action and could be handled separately from the PR. Documentation of test conventions is important for maintainability. The change affects multiple files and new developers might be confused about when to use which prefix. While documentation is valuable, our rules explicitly state not to make comments that ask authors to "ensure" things or that are purely informative. The code changes themselves are clear and consistent. The comment should be deleted as it violates our rules about not making purely informative comments or asking authors to "ensure" things.

6. tests/test_atlas_client.py:447

Draft comment:
Several unit tests repeat similar setup and assertions with mock datasets. Consider refactoring common patterns into helper functions or fixtures to follow DRY principles and improve maintainability.
Reason this comment was not posted:
Comment was on unchanged code.

7. tests/test_atlas_client.py:447

Draft comment:
The patch decorators for unit tests (e.g. @patch on '_wait_for_projection_completion' and 'nomic.atlas.map_data') are applied in a stack. Adding a brief comment on the expected parameter order in the test functions would help future maintainers avoid confusion.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 70% While the parameter order for stacked patches can be confusing to newcomers, this is standard Python mock behavior that is well documented in the Python unittest.mock documentation. The code follows standard patterns and the parameter order is clear from the decorator order. Adding a comment about this would be documenting something that is part of the Python language rather than specific to this code. The comment raises a valid point about a potentially confusing aspect of Python mocking that could trip up developers unfamiliar with mock decorators. However, this is standard Python behavior that is already well documented in the Python documentation. Adding comments about language features in every file where they're used would be redundant and noisy. The comment should be deleted as it requests documentation of standard Python language behavior rather than anything specific to this codebase.

8. tests/test_atlas_client.py:387

Draft comment:
The _wait_for_projection_completion function uses a fixed 10‐second sleep interval, which in integration tests could slow down overall test execution. Consider reducing the sleep duration or mocking this function in integration tests to speed up feedback.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

9. tests/test_atlas_client.py:317

Draft comment:
Dataset cleanup via dataset.delete() is done at the end of tests. To ensure resources are always cleaned up even if an assertion fails, consider using try/finally blocks or test fixtures for teardown.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

10. tests/test_atlas_client.py:67

Draft comment:
Typographical error: In the comment '# test to long ids', 'to long' should be changed to 'too long'.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

11. tests/test_atlas_client.py:377

Draft comment:
Typographical suggestion: The function name 'test_contrastors_access' might be misspelled. Consider verifying if 'contrastors' is intended or if it should be something like 'test_credentials_access' or similar.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_d84i0qBXHICkRDfP

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 4553d5d in 1 minute and 17 seconds. Click for details.

Reviewed 468 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 13 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. nomic/dataset.py:1109

Draft comment:
Good clean-up: using model_dump() simplifies hyperparameter extraction. Ensure that ProjectionOptions.model always gets a default value so that the fallback 'nomic-project' is rarely used.
Reason this comment was not posted:
Confidence changes required: 25% <= threshold 70% None

2. nomic/dataset.py:1189

Draft comment:
Ensure consistent use of the 'model' field in build_template for both embedding and text/image modalities.
Reason this comment was not posted:
Confidence changes required: 20% <= threshold 70% None

3. tests/test_atlas_client.py:433

Draft comment:
Renamed test function for explicit UMAP. Naming is now clear and consistent with updated ProjectionOptions usage.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 70% None

4. tests/test_atlas_client.py:447

Draft comment:
Using patches and renaming mocks improves clarity. Confirm your mocks reflect the new ProjectionOptions and not legacy UMAPOptions.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 70% None

5. tests/test_atlas_client.py:740

Draft comment:
Legacy dict tests updated to pass 'model' key instead of 'algorithm'. Ensure backward compatibility if needed.
Reason this comment was not posted:
Confidence changes required: 30% <= threshold 70% None

6. tests/test_atlas_client.py:794

Draft comment:
Explicit projection options test now uses ProjectionOptions with model explicitly set. This is clear; just document the expected default behavior if model is empty.
Reason this comment was not posted:
Confidence changes required: 10% <= threshold 70% None

7. tests/test_atlas_client.py:68

Draft comment:
The assert checking for a datetime (assert isinstance(dataset.created_timestamp, datetime)) is placed inside a pytest.raises block and will never be reached if the exception is raised. Consider removing it to clearly indicate that an exception is expected.
Reason this comment was not posted:
Confidence changes required: 66% <= threshold 70% None

8. tests/test_atlas_client.py:417

Draft comment:
The _wait_for_projection_completion function uses a fixed sleep of 10 seconds in its polling loop. Consider parameterizing or reducing the sleep interval for test environments to speed up test execution.
Reason this comment was not posted:
Confidence changes required: 66% <= threshold 70% None

9. tests/test_atlas_client.py:121

Draft comment:
Instead of concatenating a datetime with a string to simulate an invalid ISO timestamp, consider explicitly setting an invalid timestamp string. This may improve clarity of the test's intent.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 70% None

10. nomic/dataset.py:214

Draft comment:
Typo found: In the _get_dataset_by_slug_identifier docstring, 'seperated' should be corrected to 'separated'.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

11. nomic/dataset.py:1429

Draft comment:
Typo found: In the comment in the _add_blobs method, 'uploda batch of blobs' should be corrected to 'upload batch of blobs'.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

12. tests/test_atlas_client.py:67

Draft comment:
Typo in comment: 'test to long ids' should be 'test too long ids' for clarity.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

13. tests/test_atlas_client.py:377

Draft comment:
Typo in function name: 'test_contrastors_access' appears to be misspelled. Consider renaming it (e.g. to 'test_credentials_access') for clarity.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_9JS7dzeQR2yKdR0h

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

ellipsis-dev

Caution

Changes requested ❌

Reviewed db1bbee in 2 minutes and 5 seconds. Click for details.

Reviewed 17 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 4 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. nomic/dataset.py:1109

Draft comment:
Bug: Removing the conditional check means that if projection is False (so that projection_options is left undefined), calling projection_options.model_dump() will raise an error. Restore the check to handle the False case.
Reason this comment was not posted:
Marked as duplicate.

2. nomic/dataset.py:212

Draft comment:
Typo found: The word 'seperated' should be changed to 'separated' in the docstring of _get_dataset_by_slug_identifier.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

3. nomic/dataset.py:240

Draft comment:
Typo found: The parameter name 'identifer' should be corrected to 'identifier' in the docstring of is_valid_dataset_identifier.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

4. nomic/dataset.py:1426

Draft comment:
Typo found: The comment 'uploda batch of blobs' should be corrected to 'upload batch of blobs'.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_718rUTCWZ6ZiXLaQ

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

nomic/dataset.py

ellipsis-dev

Important

Looks good to me! 👍

Reviewed f1c0246 in 1 minute and 4 seconds. Click for details.

Reviewed 15 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 4 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. nomic/dataset.py:1109

Draft comment:
Using a default empty dict with a check for projection_options is safer. However, it's worth ensuring that projection_options is truly None when projection is False, and not an unexpected falsy value. This change prevents potential attribute errors when calling model_dump().
Reason this comment was not posted:
Comment looked like it was already resolved.

2. nomic/dataset.py:1109

Draft comment:
Good defensive check: using a conditional to only call model_dump() if projection_options isn't None prevents a potential AttributeError. Ensure that an empty dict is the intended default downstream if no projection options are provided.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 70% The comment is asking the author to ensure that an empty dictionary is the intended default. This is a request for confirmation of intention, which is not allowed by the rules. The comment does not provide a specific code suggestion or ask for a test to be written, nor does it address any of the additional rules provided.

3. nomic/dataset.py:214

Draft comment:
Typo: In the docstring for _get_dataset_by_slug_identifier, the word 'seperated' should be corrected to 'separated'.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

4. nomic/dataset.py:243

Draft comment:
Typo: In the docstring for is_valid_dataset_identifier, 'identifer' should be corrected to 'identifier'.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_v8xYneGTIqfFdjte

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

ellipsis-dev

Caution

Changes requested ❌

Reviewed 1824306 in 2 minutes and 24 seconds. Click for details.

Reviewed 162 lines of code in 5 files
Skipped 0 files when reviewing.
Skipped posting 16 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. nomic/atlas.py:32

Draft comment:
Changing the type for 'projection' to only accept Dict or ProjectionOptions (default None) is cleaner. Make sure this change is communicated in the release notes.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 70% This comment is asking the PR author to ensure that a change is communicated in the release notes. This falls under the rule of not asking the PR author to update the PR description or similar documentation tasks. Therefore, this comment should be removed.

2. nomic/data_inference.py:89

Draft comment:
Removal of UMAPOptions and NomicProjectOptions classes simplifies the projection option space. Ensure client docs and examples are updated accordingly.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 70% The comment is asking the PR author to ensure that documentation and examples are updated, which falls under the rule of not asking the author to ensure things are updated or tested. This is not a specific code suggestion or a request for a test, so it should be removed.

3. nomic/dataset.py:1185

Draft comment:
Default projection algorithm updated to 'nomic-project-v2' now. This is good, but verify that backend supports it for all modalities consistently.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 70% This comment is asking the PR author to verify something, which violates the rule against asking for confirmation or verification. It doesn't provide a specific suggestion or point out a specific issue in the code.

4. nomic/settings.py:7

Draft comment:
Removal of DEFAULT_PROJECTION_MODEL (set previously to 'nomic-project-v1') seems intentional. Confirm that clients no longer rely on this constant.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

5. tests/test_atlas_client.py:767

Draft comment:
Test updates to use explicit projection options (e.g., model 'umap' and 'nomic-project-v2') are clear. Confirm that these tests cover legacy behavior and new projection defaults.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

6. nomic/dataset.py:1185

Draft comment:
The fallback projection algorithm now defaults to 'nomic-project-v2'. Confirm that this is intentional for backward compatibility, as users familiar with the old default ('nomic-project') might be affected.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 70% This comment is asking for confirmation of an intentional change rather than suggesting a specific code change. It's asking the author to "verify" or "confirm" something, which violates the rule about not asking authors to confirm their intentions. The change appears to be intentional since it's done consistently in two places. Additionally, the comment is speculative about backward compatibility concerns rather than pointing out a definite issue. The comment does raise a valid point about backward compatibility which could be important for users of the library. Perhaps there is a real issue here that deserves attention. While backward compatibility is important, the comment doesn't identify a specific issue that needs fixing - it just asks for confirmation. If there was a real backward compatibility problem, the comment should state it directly and suggest a specific solution. The comment should be deleted because it merely asks for confirmation rather than identifying a specific issue that needs to be fixed. If there are backward compatibility concerns, they should be raised as concrete issues with proposed solutions.

7. nomic/dataset.py:1180

Draft comment:
The logic handling projection options is duplicated across different modality branches. Consider refactoring this into a helper function to reduce repetition and enhance maintainability.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

8. nomic/dataset.py:1624

Draft comment:
API requests (e.g. in _add_data) lack retry logic and robust error handling. Consider adding retries (or using a requests adapter with retry) to handle transient network errors.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

9. tests/test_atlas_client.py:50

Draft comment:
Several test cleanup blocks catch broad exceptions (e.g. 'except BaseException'). It's better to catch more specific exceptions or use fixtures/teardown to ensure proper cleanup.
Reason this comment was not posted:
Confidence changes required: 70% <= threshold 70% None

10. nomic/atlas.py:163

Draft comment:
It looks like there is an extra backtick at the end of the log message on line 163 ({dataset.identifier}: Data upload succeeded to dataset`). Please remove the stray backtick to ensure the message is formatted correctly.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

11. nomic/atlas.py:250

Draft comment:
In the map_text docstring on line 250, the phrase 'the text your want to map' should be corrected to 'the text you want to map' for proper grammar and clarity.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

12. nomic/dataset.py:214

Draft comment:
Typo: In the docstring for _get_dataset_by_slug_identifier, the word 'seperated' should be corrected to 'separated'.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

13. nomic/dataset.py:1430

Draft comment:
Typo: In a comment within the _add_blobs function, 'uploda' should be corrected to 'upload'.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

14. nomic/dataset.py:243

Draft comment:
Typo: In the arguments section of the docstring for is_valid_dataset_identifier, 'identifer' should be corrected to 'identifier'.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

15. tests/test_atlas_client.py:67

Draft comment:
Typo in comment: 'test to long ids' should be 'test too long ids'.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

16. tests/test_atlas_client.py:377

Draft comment:
Possible typo in function name 'test_contrastors_access'. Consider renaming it to something like 'test_credentials_access' (if that's what was intended) to improve clarity.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_ESkDM0VK7bdZ6wUZ

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

nomic/atlas.py

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 3134612 in 3 minutes and 5 seconds. Click for details.

Reviewed 13 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 6 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. nomic/atlas.py:32

Draft comment:
Good update: marking 'projection' as Optional makes it explicit that None is allowed, ensuring consistency with similar parameters. Ensure that the docstring is updated accordingly if further clarification is needed.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 70% The comment is purely informative and suggests updating the docstring, which is against the rules. It doesn't provide a specific code suggestion or highlight a potential issue.

2. nomic/atlas.py:32

Draft comment:
Explicitly marking 'projection' as Optional improves clarity. Consider updating the corresponding docstring to indicate that passing None is allowed.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 70% The comment points out a real inconsistency between the type hint and documentation. However, this seems like a minor documentation issue that doesn't affect functionality. The Optional nature is already clear from the type hint and default value. The rules say not to make purely informative comments that don't require code changes. The comment identifies a real documentation gap that could help future developers understand the API better. Documentation improvements can be considered code changes. While documentation is important, this case is obvious from the type hint and default value. The current docstring is adequate even if not completely explicit. The comment should be deleted as it is purely informative and doesn't highlight a significant issue that requires changes.

3. nomic/atlas.py:45

Draft comment:
Typo: In the map_data docstring for 'id_field', please change 'up 36 characters in length' to 'up to 36 characters in length' for clarity.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

4. nomic/atlas.py:203

Draft comment:
Typo: In the map_embeddings docstring for 'id_field', please update 'up 36 characters in length' to 'up to 36 characters in length'.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

5. nomic/atlas.py:250

Draft comment:
Typo: In the map_text docstring for 'indexed_field', update the text from "The name the data field containing the text your want to map." to "The name of the data field containing the text you want to map."
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

6. nomic/atlas.py:163

Draft comment:
Typo: The log message on line 163 contains an unmatched backtick () at the end ("Data upload succeeded to dataset"); please remove the trailing backtick or balance the backticks correctly.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_fPDa4jPQKuWByCnO

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 56c9608 in 53 seconds. Click for details.

Reviewed 13 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 4 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. nomic/atlas.py:147

Draft comment:
Good conversion, but ensure that to_dict(orient='records') produces the expected data structure for all edge cases. Consider adding a comment or tests to document this conversion behavior.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 70% None

2. nomic/atlas.py:147

Draft comment:
Converting a DataFrame to a list of dicts in the try block is useful, but consider centralizing this conversion (e.g., in a helper) to ensure consistency across modalities and avoid duplicating logic.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 70% None

3. nomic/atlas.py:147

Draft comment:
Update the function documentation to clearly state that DataFrame inputs are automatically converted to a list of records.
Reason this comment was not posted:
Confidence changes required: 30% <= threshold 70% None

4. nomic/atlas.py:165

Draft comment:
The log message on line 165 ends with an extraneous backtick (`) after the dataset identifier. Consider removing the trailing backtick to improve clarity.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_ZREc5Hcuc5jioeyq

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

nomic/dataset.py

wilsonjr

LGTM, thanks

This comment was marked as spam.

Sign in to view

feat: umap

3f038bf

mcembalest force-pushed the umap branch from 26d5b0b to 3f038bf Compare May 7, 2025 01:35

This comment was marked as resolved.

Sign in to view

fix: bad code

6448f10

This comment was marked as resolved.

Sign in to view

chore: tests

a1730d0

This comment was marked as spam.

Sign in to view

chore: lint

07785c5

This comment was marked as spam.

Sign in to view

mcembalest changed the title ~~[draft] add umap to nomic project options~~ feat: umap May 7, 2025

chore: remove bad code

94ecc6d

This comment was marked as resolved.

Sign in to view

chore: remove bad code

721d22c

This comment was marked as resolved.

Sign in to view

chore: lint

9e91687

This comment was marked as spam.

Sign in to view

chore: refactor

2076138

This comment was marked as spam.

Sign in to view

wilsonjr suggested changes May 7, 2025

View reviewed changes

nomic/data_inference.py Outdated Show resolved Hide resolved

nomic/dataset.py Outdated Show resolved Hide resolved

wilsonjr reviewed May 7, 2025

View reviewed changes

nomic/dataset.py Outdated Show resolved Hide resolved

wilsonjr suggested changes May 7, 2025

View reviewed changes

nomic/data_inference.py Show resolved Hide resolved

nomic/dataset.py Outdated Show resolved Hide resolved

fix: projectionoptions

7aa378f

ellipsis-dev bot reviewed May 7, 2025

View reviewed changes

chore: lint

91b9363

ellipsis-dev bot reviewed May 7, 2025

View reviewed changes

fix: typing

bce2f57

ellipsis-dev bot reviewed May 8, 2025

View reviewed changes

chore: linting

7eedc3c

ellipsis-dev bot reviewed May 8, 2025

View reviewed changes

mcembalest added 2 commits May 8, 2025 01:50

chore: test projection options

6f426fb

chore: lint

2901d06

ellipsis-dev bot reviewed May 8, 2025

View reviewed changes

tests/test_atlas_client.py Show resolved Hide resolved

ellipsis-dev bot reviewed May 8, 2025

View reviewed changes

mcembalest requested a review from wilsonjr May 8, 2025 05:55

wilsonjr suggested changes May 8, 2025

View reviewed changes

chore: unit vs integration tests

8b964f1