feat: Add Support ChromaDB #622

Conversation
Walkthrough

This pull request adds support for the new vector database provider "chromadb." It updates configuration files and provider options in the settings, introduces a new adapter implementation for interacting with ChromaDB, and adjusts the engine creation logic to conditionally instantiate the new adapter. Additionally, tests for ChromaDB functionality are provided, along with a Docker service configuration and a new dependency declaration.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant C as Client
    participant E as create_vector_engine
    participant A as ChromaDBAdapter
    C->>E: Request vector engine (provider="chromadb")
    E->>E: Validate provider and import "chromadb" module
    E-->>C: Raise ImportError (if module missing) or proceed
    E->>A: Instantiate ChromaDBAdapter with config parameters
    A-->>E: Return adapter instance
    E-->>C: Return ChromaDBAdapter instance
```
✅ GitGuardian: There are no secrets present in this pull request anymore. If these secrets were true positives and are still valid, we highly recommend you revoke them.
Actionable comments posted: 0
🧹 Nitpick comments (9)
pyproject.toml (1)
60-60: Consider adding ChromaDB to extras section.

The dependency is correctly added, but unlike other vector database providers (qdrant, weaviate, etc.), chromadb hasn't been added to the `[tool.poetry.extras]` section. This would allow users to selectively install this dependency.

```diff
 [tool.poetry.extras]
 filesystem = ["s3fs", "botocore"]
 weaviate = ["weaviate-client"]
 qdrant = ["qdrant-client"]
 neo4j = ["neo4j"]
 postgres = ["psycopg2", "pgvector", "asyncpg"]
 notebook = ["notebook", "ipykernel", "overrides", "ipywidgets", "jupyterlab", "jupyterlab_widgets", "jupyterlab-server", "jupyterlab-git"]
 langchain = ["langsmith", "langchain_text_splitters"]
 llama-index = ["llama-index-core"]
 gemini = ["google-generativeai"]
 huggingface = ["transformers"]
 ollama = ["transformers"]
 mistral = ["mistral-common"]
 deepeval = ["deepeval"]
 posthog = ["posthog"]
 falkordb = ["falkordb"]
 groq = ["groq"]
 milvus = ["pymilvus"]
 docs = ["unstructured"]
 codegraph = ["fastembed", "transformers", "tree-sitter", "tree-sitter-python"]
 evals = ["plotly", "gdown"]
 gui = ["pyside6", "qasync"]
 graphiti = ["graphiti-core"]
+chromadb = ["chromadb"]
```

cognee/infrastructure/databases/vector/create_vector_engine.py (3)
84-99: ChromaDB implementation looks good but has extra whitespace.

The ChromaDB integration follows the established pattern of other vector database providers and includes proper error handling. However, there's an extra blank line on line 84 that should be removed for consistency with the rest of the file.

```diff
 elif vector_db_provider == "falkordb":
     if not (vector_db_url and vector_db_port):
         raise EnvironmentError("Missing requred FalkorDB credentials!")

     from ..hybrid.falkordb.FalkorDBAdapter import FalkorDBAdapter

     return FalkorDBAdapter(
         database_url=vector_db_url,
         database_port=vector_db_port,
         embedding_engine=embedding_engine,
     )
-
 elif vector_db_provider == "chromadb":
     try:
         import chromadb
     except ImportError:
         raise ImportError(
             "ChromaDB is not installed. Please install it with 'pip install chromadb'"
         )

     from .chromadb.ChromaDBAdapter import ChromaDBAdapter

     return ChromaDBAdapter(
         url=vector_db_url,
         api_key=vector_db_key,
         embedding_engine=embedding_engine,
     )
```
85-91: Consider adding credential validation.

Unlike other database providers like Weaviate and Qdrant, the ChromaDB implementation doesn't validate whether the required credentials are provided. Consider adding credential validation for consistency.

```diff
 elif vector_db_provider == "chromadb":
+    if not vector_db_url:
+        raise EnvironmentError("Missing required ChromaDB credentials!")
+
     try:
         import chromadb
     except ImportError:
         raise ImportError(
             "ChromaDB is not installed. Please install it with 'pip install chromadb'"
         )
```
89-91: Typo in error message.

Other error messages in this file contain a typo ("requred" instead of "required"). Your new error message is spelled correctly; just be aware of the inconsistency with the existing messages.

```python
try:
    import chromadb
except ImportError:
    raise ImportError(
        "ChromaDB is not installed. Please install it with 'pip install chromadb'"
    )
```

Note: If you prefer to fix all the typos in the file, it would be better to create a separate PR for that.
docker-compose.yml (3)
63-82: Commented-out `chromadb` service configuration looks fine.

However, note that if you plan to enable this service in the future, ensure the environment variables align with production settings and credentials (e.g., tokens, secrets).

🪛 YAMLlint (1.35.1) — docker-compose.yml: [error] 82-82: trailing spaces

82-82: Remove trailing spaces to comply with linting. Below is a proposed fix:

```diff
-  # - "3002:8000" 
+  # - "3002:8000"
```

103-103: Remove trailing spaces to comply with linting. Below is a proposed fix:

```diff
-# UNCOMMENT IF USING CHROMADB 
+# UNCOMMENT IF USING CHROMADB
```
cognee/tests/test_chromadb.py (1)
153-154: Use a constant for the expected history length.

Relying on a magic number `8` can make the test brittle if the search history length changes in the future. Consider referencing a descriptive constant or deriving the expected value.
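For instance, a minimal sketch (the constant name is hypothetical):

```python
# One entry is expected per cognee.search(...) call made earlier in the test.
EXPECTED_SEARCH_CALLS = 8

history = await cognee.get_search_history()
assert len(history) == EXPECTED_SEARCH_CALLS, "Search history is not correct."
```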
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)
236-238: Honor the `normalized` parameter or remove it.

Currently, normalization is always performed. If a user passes `normalized=False`, it is disregarded. You could wrap the normalization logic in a conditional or remove the parameter to avoid confusion.

```diff
     normalized: bool = True,
 ...
-    normalized_values = normalize_distances(vector_list)
+    if normalized:
+        normalized_values = normalize_distances(vector_list)
+    else:
+        normalized_values = [result["_distance"] for result in vector_list]
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`poetry.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (8)
- `.env.template` (1 hunks)
- `.gitignore` (1 hunks)
- `cognee/api/v1/settings/routers/get_settings_router.py` (1 hunks)
- `cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py` (1 hunks)
- `cognee/infrastructure/databases/vector/create_vector_engine.py` (1 hunks)
- `cognee/tests/test_chromadb.py` (1 hunks)
- `docker-compose.yml` (2 hunks)
- `pyproject.toml` (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.35.1) — docker-compose.yml: [error] 82-82: trailing spaces; [error] 103-103: trailing spaces
🔇 Additional comments (3)
.gitignore (1)
189-191: LGTM! Appropriate data directory exclusion.

The addition of `.chromadb_data/` to .gitignore is appropriate and follows the project's pattern of excluding database data directories from version control.

.env.template (1)
33-33: LGTM! Clear environment variable documentation.

The updated comment correctly lists all supported vector database providers, including the newly added "chromadb" option.
cognee/api/v1/settings/routers/get_settings_router.py (1)
30-30: Looks great!

Adding `"chromadb"` as a valid provider is consistent with the new adapter implementation.
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 1
🧹 Nitpick comments (11)
README.md (2)
11-12: Demo Link Positioning & Extra Punctuation
The repositioned demo link now appears before the "Learn more" and "Join Discord" links, which improves its visibility. However, there is an isolated period on line 12 that feels out of place. Consider removing it or incorporating it more naturally with the surrounding content.
92-92: Clarify Default Environment Variables Description
This line clearly explains that if no specific database is specified, the defaults (SQLite, LanceDB, and NetworkX) will be used. For enhanced readability, you might consider splitting this long sentence into two or formatting the options as a bulleted list, particularly as new providers like ChromaDB are now supported elsewhere in the project.

CONTRIBUTING.md (5)
1-6: Invitation Tone and Clarity Improvements
The revised welcome message and introductory paragraph set an inviting tone. One minor suggestion: in line 4, consider adding an article (e.g., “the cognee community”) for grammatical clarity.
32-36: Community Engagement Guidelines
The "Community Channels" section effectively emphasizes joining the Discord community and participating in events. A minor stylistic note: you might consider adding punctuation at the end of each bullet for consistency.
37-41: Direct Contact Information
The "Direct Contact" section provides clear methods for getting in touch. For improved usability, consider turning the email address into a clickable mailto link (e.g., `<mailto:[email protected]>`).
42-42: Response Time Clarity
The guideline on response times is informative. However, a small punctuation adjustment is recommended: inserting a comma after "Discord channel" (i.e., "...using our Discord channel, where the whole community can help!") can enhance readability.

🪛 LanguageTool: [uncategorized] ~42-~42: Possible missing comma found — "...consider using our Discord channel where the whole community can help!" (AI_HYDRA_LEO_MISSING_COMMA)
128-128: Appreciative Sign-off Tone
The closing line effectively conveys gratitude and enthusiasm. Consider moderating the number of exclamation marks to maintain a professional tone if consistency is desired across the document.

🪛 LanguageTool: [style] ~128-~128: Using many exclamation marks might seem excessive (5 exclamation marks for a text of 3,216 characters). (EN_EXCESSIVE_EXCLAMATION)
cognee/eval_framework/metrics_dashboard.py (1)
80-92: Improved HTML generation with dynamic table structure

The HTML generation is now more flexible, as it dynamically creates table headers and rows based on the available data. This makes the code more maintainable and adaptable to changes in the metrics structure.
One potential improvement would be to add HTML escaping for user-provided content to prevent potential XSS vulnerabilities.
Consider adding HTML escaping for user-provided values:
```diff
- {"".join(f"<td>{value}</td>" for value in item.values())}
+ {"".join(f"<td>{html.escape(str(value))}</td>" for value in item.values())}
```

And add the import at the top:

```diff
+ import html
```

cognee/modules/retrieval/completion_retriever.py (1)
29-29: Potential limitation with multiple retrieval results

While the `top_k` parameter allows retrieving multiple document chunks, the `get_context` method still only returns the first chunk's text. Consider enhancing this method to return a concatenated or otherwise processed collection of all retrieved chunks when `top_k > 1`.

```diff
- return found_chunks[0].payload["text"]
+ if self.top_k == 1:
+     return found_chunks[0].payload["text"]
+ else:
+     # Join multiple chunks with separators for better context
+     return "\n\n---\n\n".join([chunk.payload["text"] for chunk in found_chunks])
```

cognee/eval_framework/answer_generation/run_question_answering_module.py (1)
35-35: Add docstring to elaborate usage of `top_k`.

Consider explaining the effect of passing `None` vs. a positive integer for `top_k`, so that maintainers can easily understand how to configure the function.

cognee/eval_framework/evaluation/deep_eval_adapter.py (1)
37-37: Consider using an empty list instead of None

To maintain consistency, you may want to store an empty list if `golden_context` is absent, so that the metric is always dealing with list inputs.
🛑 Comments failed to post (1)
cognee/eval_framework/evaluation/metrics/context_coverage.py (1)
8-33: 💡 Verification agent
🧩 Analysis chain
Handle potential empty context or retrieval_context

If `test_case.context` or `test_case.retrieval_context` is empty, indexing `[0]` will raise an error. Please ensure upstream code always populates these lists, or consider adding a defensive check.
🏁 Script executed:

```bash
#!/bin/bash
# Attempt to locate usage of ContextCoverageMetric and see if context is guaranteed to be non-empty
rg -A 5 "ContextCoverageMetric"
```

Length of output: 1634
Attention: Defensive Check Required for Empty Context Lists

The current implementation in `ContextCoverageMetric.measure()` assumes that both `test_case.context` and `test_case.retrieval_context` contain at least one element. However, if either list is empty, accessing `[0]` will raise an `IndexError`. Please either:

- Ensure upstream guarantees non-empty lists: Confirm that all callers of this metric always provide non-empty `context` and `retrieval_context`, or
- Add a defensive guard in the metric: Introduce a validation check at the beginning of the `measure` method to handle cases when these lists are empty, for example:

```python
if not test_case.context or not test_case.retrieval_context:
    raise ValueError("Test case context or retrieval_context cannot be empty.")
```

Review the surrounding code to decide which approach best fits your error handling strategy.
hajdul88 left a comment
First of all, thanks for the contribution, it's a solid implementation. I've opened a new PR to run our tests; you can follow it here:
I'm fixing the Ruff issues there and will probably add your test to GA. I also added some minor comments—nothing crucial, just small things.
docker-compose.yml
Outdated

```yaml
# chromadb:
#   image: chromadb/chroma:0.6.3
#   container_name: chromadb
#   profiles:
```
You don't have to comment this out. The `profiles` config makes sure that the chromadb service starts only if the `chromadb` profile is added during docker compose: `docker compose --profile chromadb up`
```toml
langdetect = "1.0.9"
posthog = {version = "^3.5.0", optional = true}
lancedb = "0.16.0"
chromadb = "^0.6.0"
```
Please add chromadb as an optional dependency that is installed via the chromadb extra.
cognee/tests/test_chromadb.py
Outdated

```python
history = await cognee.get_search_history()
assert len(history) == 8, "Search history is not correct."

results = await brute_force_triplet_search("What is a quantum computer?")
```
You probably don't need this one, it is used in other searches.
cognee/tests/test_chromadb.py
Outdated

```python
results = await brute_force_triplet_search("What is a quantum computer?")
assert len(results) > 0

await test_local_file_deletion(text, explanation_file_path)
```
This is probably also not needed; it is tested in other tests.
Actionable comments posted: 1
🧹 Nitpick comments (4)
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (4)
31-47: Consider validated serialization or sanitized storage.

Storing lists and dictionaries as JSON strings is a practical approach, but be mindful of security implications (e.g., malicious JSON content placed in the dictionary or list). It may be prudent to validate or sanitize data before serialization, to mitigate downstream parsing issues.
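One possible guard, sketched here as an assumption rather than the adapter's actual code: reject values that don't survive JSON serialization before they reach storage.

```python
import json

def safe_json_dumps(value):
    # Fail fast on values that cannot be serialized, instead of storing
    # something that will break on restore.
    try:
        return json.dumps(value)
    except (TypeError, ValueError) as err:
        raise ValueError(f"Metadata value is not JSON-serializable: {err}") from err
```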
200-200: Offer option for more comprehensive search.

The code caps the results at 100 (`n_results=100`). Depending on the dataset size and use case, this could omit valuable global matches. Consider making this value configurable or removing the cap entirely if your dataset and performance profile permit it.
316-317: Restore metadata for consistent data structure.

This code returns raw metadata instead of running it through `restore_data_from_chroma`. Consider aligning it with the approach used in other methods for uniform data structures across the interface.

```diff
- payload=metadata,
+ payload=restore_data_from_chroma(metadata),
```
344-347: Rename method for clarity.

Based on previous review comments, you may rename `get_table_names` to `get_collection_names` to align with ChromaDB's domain model and improve clarity.

```diff
- async def get_table_names(self):
+ async def get_collection_names(self):
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
`cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py` (1 hunks)
🔇 Additional comments (1)
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)
66-77: Confirmed: Removing 6 characters is correct.

The suffix `__dict` or `__list` is exactly 6 characters. The slicing operation `key[:-6]` is appropriate and aligns with the previous feedback. No further changes needed.
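For reference, a hedged sketch of the restore path this implies (not the adapter's verbatim code): keys carrying the 6-character `__dict`/`__list` suffix are JSON-decoded back to their original type.

```python
import json

def restore_data_from_chroma_sketch(metadata: dict) -> dict:
    restored = {}
    for key, value in metadata.items():
        if key.endswith("__dict") or key.endswith("__list"):
            restored[key[:-6]] = json.loads(value)  # strip the 6-char suffix
        else:
            restored[key] = value
    return restored
```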
Actionable comments posted: 1
🧹 Nitpick comments (2)
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (2)
183-231: Potential performance concern with large collections.

When computing distances for all elements, the code retrieves the entire collection (`n_results=collection_count`). For very large collections, this could cause memory or performance issues. Consider setting a reasonable limit or batching the retrieval if you're working with potentially large datasets.

You might implement a paginated approach or a maximum threshold to limit the total number of items processed at once.
346-349: Rename method to reflect collections rather than tables.

The function is named `get_table_names` but returns collection names. The rest of the code also refers to "collections," so this can cause confusion.

```diff
- async def get_table_names(self):
+ async def get_collection_names(self):
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
`cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py` (1 hunks)
🔇 Additional comments (2)
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (2)
159-166: Handle potential missing 'index_fields' key in metadata.

Here, the code uses `data_point.metadata["index_fields"]` directly. If the `"index_fields"` key is missing or null in the metadata, this will cause a `KeyError`. Consider adding an existence check or a default value before accessing `"index_fields"`.

Can you confirm that every `DataPoint` in your application includes `"index_fields"` in the `metadata` attribute? If not, please ensure to handle or validate this field to avoid exceptions.
314-314: Unify distance normalization approach.

Here, distance is converted to similarity using `similarity = 1.0 - min(distance, 2.0) / 2.0`, which differs from the `normalize_distances(...)` approach used elsewhere (e.g., in `search`). For consistency and clarity, consider using a single normalization method across all queries.
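A single shared helper would make the choice explicit; for example, a sketch built from the formula quoted above:

```python
def distance_to_score(distance: float) -> float:
    # Map a cosine distance in [0, 2] to a similarity score in [1, 0],
    # matching the inline formula used in batch_search.
    return 1.0 - min(distance, 2.0) / 2.0
```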
Actionable comments posted: 0
🧹 Nitpick comments (2)
docker-compose.yml (1)
81-81: Remove trailing whitespace.

There is a trailing whitespace on this line that should be removed to fix the YAMLlint warning.

🪛 YAMLlint (1.35.1): [error] 81-81: trailing spaces
cognee/tests/test_chromadb.py (1)
69-162: Consider restructuring the main function for better test organization.

The `main` function contains numerous assertions and test scenarios mixed together. Consider refactoring this into separate, focused test functions for each aspect being tested:
- Data addition
- Document retrieval
- Different search types (INSIGHTS, CHUNKS, etc.)
- Search history
- Data cleanup
This would improve test readability, maintainability, and make failures easier to diagnose.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `cognee/tests/test_chromadb.py` (1 hunks)
- `docker-compose.yml` (2 hunks)
🧰 Additional context used
🪛 YAMLlint (1.35.1) — docker-compose.yml: [error] 81-81: trailing spaces
🔇 Additional comments (9)
docker-compose.yml (2)
64-80: Good implementation of the ChromaDB service.

The service configuration includes all necessary components:
- Specific versioning (0.6.3)
- Appropriate authentication setup with environment variables
- Proper network configuration
- Profile configuration for optional startup
- Volume mapping for data persistence
- Port exposure (3002:8000)
103-103: Good addition of the ChromaDB volume.

The volume declaration for `chromadb_data` has been properly added to support the new ChromaDB service.

cognee/tests/test_chromadb.py (7)
122-124: Remove print statements from test.

These print statements should be removed, as they're not necessary for automated tests and were noted in a previous review.

```diff
- print("\n\nExtracted sentences are:\n")
- for result in search_results:
-     print(f"{result}\n")
```
130-132: Remove print statements from test.

Print statements should be removed from automated tests.

```diff
- print("\n\nExtracted chunks are:\n")
- for result in search_results:
-     print(f"{result}\n")
```
140-141: Remove print statements from test.

Print statements should be removed from automated tests.

```diff
- print("Completion result is:")
- print(graph_completion)
```
147-149: Remove print statements from test.

Print statements should be removed from automated tests.

```diff
- print("\n\nExtracted summaries are:\n")
- for result in search_results:
-     print(f"{result}\n")
```
154-154: Consider removing redundant test call.

As noted in a previous review, this test call might be redundant, as it's likely covered by other tests.

```diff
- await test_local_file_deletion(text, explanation_file_path)
```
14-49: Good implementation of file deletion tests.

The test function thoroughly validates both cases:
- Files created by cognee are properly deleted
- Files not created by cognee are preserved
This ensures the ChromaDB adapter correctly handles file system operations during data deletion.
51-66: Good implementation of document retrieval tests.

The test function properly verifies document retrieval functionality:
- With specific dataset filtering
- Without dataset filtering (retrieving all documents)
This ensures the ChromaDB adapter correctly integrates with user permissions.
Actionable comments posted: 1
🧹 Nitpick comments (3)
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (3)
31-83: Ensure robust handling of nested or complex data structures.

The functions `process_data_for_chroma` and `restore_data_from_chroma` perform JSON serialization for dicts and lists. Deeply nested structures will still be serialized to strings without further recursion checks. If this is intended, make a note. Otherwise, consider recursively processing nested objects for more precise control.
183-231: Be mindful of performance with large collections.

`get_distance_from_collection_elements` sets `n_results` to the entire collection size (`collection_count`). This loads all items, which may be expensive for large collections. Consider pagination or specifying a practical limit to prevent potential performance bottlenecks.
359-362: Rename method to align with ChromaDB naming.

The method name `get_table_names` might be confusing in the context of ChromaDB, as collections are the relevant concept. Consider renaming it to `get_collection_names` for clarity, unless it must conform to an interface requirement.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
`cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py` (1 hunks)
🔇 Additional comments (2)
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (2)
21-29: Consider validating the `text` and `metadata` fields.

While `IndexSchema` explicitly expects a `text` field and sets a default `metadata` value with `index_fields`, there's no check to ensure the field in `index_fields` actually exists or is non-empty. If a data point lacks the required fields, it could trigger runtime errors. Consider adding validation or fallback behavior.
153-167: Guard against missing `index_fields` in `metadata`.

In `index_data_points`, you call `text=getattr(data_point, data_point.metadata["index_fields"][0])`. If `metadata["index_fields"]` is absent or has fewer elements, this code will raise `KeyError` or `IndexError`. Consider safely checking for the key or defaulting to a known attribute.
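One defensive shape, sketched under the assumption that falling back to a `text` attribute is acceptable:

```python
def get_index_text(data_point, default_field: str = "text") -> str:
    # Avoid KeyError/IndexError when 'index_fields' is missing or empty.
    index_fields = (data_point.metadata or {}).get("index_fields") or []
    field_name = index_fields[0] if index_fields else default_field
    return getattr(data_point, field_name)
```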
@borisarzentar @hajdul88 I have addressed the comments, please take another look.
OK from my side; I didn't check @borisarzentar's comments.
Actionable comments posted: 0
♻️ Duplicate comments (1)
cognee/tests/test_chromadb.py (1)
70-70: 🛠️ Refactor suggestion: Remove debug print statement

Remove the print statement that outputs the configuration.

```diff
- print(cognee.config)
```
🧹 Nitpick comments (7)
docker-compose.yml (2)
82-82: Remove trailing whitespace

There is trailing whitespace on this line that should be removed.

🪛 YAMLlint (1.35.1): [error] 82-82: trailing spaces
103-105: Remove trailing whitespace and uncomment volumes section

There is trailing whitespace on line 103. Also, consistent with the service configuration, you should uncomment the volumes section, since it will only be used when the ChromaDB profile is activated.

```diff
-# UNCOMMENT IF USING CHROMADB
-# volumes:
-#   chromadb_data:
+# UNCOMMENT IF USING CHROMADB
+volumes:
+  chromadb_data:
```

🪛 YAMLlint (1.35.1): [error] 103-103: trailing spaces
cognee/tests/test_chromadb.py (4)
128-130: Remove or disable debug print statements

These print statements are useful for debugging but should be removed or disabled in the test file. Consider using logging at the debug level instead.
Also applies to: 136-138, 146-147, 153-155
47-48: Fix assertion error message

The error message states "Data location doesn't exists" when it should be "Data location still exists".

```diff
-assert os.path.exists(data.raw_data_location), (
-    f"Data location doesn't exists: {data.raw_data_location}"
+assert os.path.exists(data.raw_data_location), (
+    f"Data location still exists: {data.raw_data_location}"
```
14-49: Test logic looks good, but consider breaking down into smaller test functions

The `test_local_file_deletion` function performs two distinct tests:

- Deleting data created by cognee (and verifying files are removed)
- Deleting data not created by cognee (and verifying files remain)

Consider separating these into two distinct test functions for better clarity and to follow the single responsibility principle; a sketch follows.
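A sketch of the split (names are hypothetical; the bodies would reuse the corresponding halves of the existing test):

```python
async def test_deletion_of_cognee_created_files(text: str):
    ...  # add text via cognee, delete it, assert its raw files are gone

async def test_preservation_of_external_files(file_path: str):
    ...  # add a pre-existing file, delete the data, assert the file remains
```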
69-171: Main test function is comprehensive but consider restructuring

The `main` function serves as a comprehensive integration test but mixes test setup, execution, assertions, and cleanup. Consider refactoring this to follow a more standard test structure:

- Setup (configure database, create test data)
- Test execution (various search operations)
- Assertions (verify results)
- Cleanup (prune data)

This would make the test more maintainable and easier to understand; see the skeleton sketched after this list.
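A rough skeleton of that structure, assuming pytest with pytest-asyncio (fixture and test names here are hypothetical):

```python
import pytest

@pytest.fixture
async def configured_cognee():
    # Setup: point cognee at ChromaDB and ingest the test documents.
    ...
    yield
    # Cleanup: prune test data and system metadata.
    ...

@pytest.mark.asyncio
async def test_chunk_search(configured_cognee):
    # Execution + assertion for a single search type.
    ...
```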
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)
225-233: Ensure the `normalized` parameter is honored.

Currently, the `search` method always normalizes distances without conditionally checking the `normalized` parameter. Consider introducing logic to skip normalization if `normalized` is False, ensuring users retain explicit control over how distances are handled.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`poetry.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (7)
- `.github/workflows/test_chromadb.yml` (1 hunks)
- `cognee/api/v1/settings/routers/get_settings_router.py` (1 hunks)
- `cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py` (1 hunks)
- `cognee/infrastructure/databases/vector/create_vector_engine.py` (1 hunks)
- `cognee/tests/test_chromadb.py` (1 hunks)
- `docker-compose.yml` (2 hunks)
- `pyproject.toml` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- pyproject.toml
- cognee/api/v1/settings/routers/get_settings_router.py
🧰 Additional context used
🪛 YAMLlint (1.35.1) — docker-compose.yml: [error] 82-82: trailing spaces; [error] 103-103: trailing spaces
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: run_simple_example_test
🔇 Additional comments (9)
cognee/infrastructure/databases/vector/create_vector_engine.py (1)
85-100: Implementation for ChromaDB looks good!

The implementation follows the same pattern used for other vector database providers, which makes it consistent and maintainable. The error message if ChromaDB is not installed is helpful and provides clear instructions for installation.
.github/workflows/test_chromadb.yml (1)
1-64: Well-structured CI workflow for ChromaDB testing

The workflow is properly configured with:
- Appropriate triggers (manual and pull request events)
- Service container for ChromaDB with the correct image version
- Required environment variables for testing
- Proper dependency setup with Poetry
This will provide good test coverage for the new ChromaDB integration.
docker-compose.yml (1)
64-81: Uncomment the ChromaDB service instead of using a comment directive

As mentioned in previous review comments, you don't need to comment out this ChromaDB service configuration. The `profiles` configuration ensures that it will only start when explicitly requested with `docker compose --profile chromadb up`.

cognee/tests/test_chromadb.py (3)
160-161: Remove redundant test

This test for `brute_force_triplet_search` is likely redundant, as it's used in other search tests.

163-163: Consider relocating test function call

This test is likely redundant, as it's tested elsewhere. If it's necessary to include it here, consider adding a comment explaining why it's being tested again in this context.

51-67: Test logic for document retrieval looks good

The `test_getting_of_documents` function properly tests document retrieval both with and without a dataset filter.

cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (3)
67-67: Likely off-by-one error when removing '__dict' / '__list' suffixes.

These lines remove 7 characters instead of 6, whereas the suffix (`__dict` or `__list`) is only 6 characters long. This may cause a truncated character and potentially break the restoration of original keys. Note that a similar comment has already been raised in past reviews.

Also applies to: 76-76
287-330: Unify distance normalization between `batch_search` and `search`.

`batch_search` transforms the raw distance into a similarity measure (`1.0 - min(distance, 2.0) / 2.0`), whereas `search` relies on `normalize_distances(...)`. This disparity in approaches may lead to inconsistent scoring across methods. A refactoring suggestion to unify or deduplicate distance handling logic has been raised previously.
347-347: Use a more descriptive method name.

As suggested in prior comments, consider renaming `get_table_names` to `get_collection_names` for clarity and consistency with ChromaDB terminology.
Actionable comments posted: 0
♻️ Duplicate comments (2)
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (2)
67-76: ⚠️ Potential issue: Suffix removal off by one.

For keys ending with `__dict` or `__list`, you should remove only the last 6 characters (the substring `__dict` or `__list`). Removing 7 characters will truncate the preceding character in the key, leading to incorrect key restoration.

Apply this fix:

```diff
- original_key = key[:-7]  # Remove '__dict' suffix
+ original_key = key[:-6]  # Remove '__dict' suffix

- original_key = key[:-7]  # Remove '__list' suffix
+ original_key = key[:-6]  # Remove '__list' suffix
```
319-319: 🛠️ Refactor suggestion: Restore metadata consistently.

Unlike in `search` and `get_distance_from_collection_elements`, this line directly assigns `metadata` to `payload` without calling `restore_data_from_chroma()`. This causes inconsistent payload formats across methods.

Apply this fix:

```diff
- payload=metadata,
+ payload=restore_data_from_chroma(metadata),
```
🧹 Nitpick comments (1)
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)
225-286: Maintain consistent distance normalization.

This method uses `normalize_distances`, but elsewhere (e.g., in `batch_search`) you calculate similarity differently with a fixed formula. Either unify the approach with a single normalization function or document the rationale for having two methods.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py` (1 hunks)
- `cognee/tests/test_chromadb.py` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- cognee/tests/test_chromadb.py
🔇 Additional comments (3)
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (3)
1-19: Solid foundational setup and imports.
The overall structure, including logging configuration and import statements, provides a clear starting point for this adapter. No immediate issues observed in this snippet.
178-224: Confirm 100 as the default result limit.
Here, `n_results=100` may be a safe default for typical searches, but large collections or pipeline usage could need more flexibility. Consider making this value configurable or ensuring it meets all use cases.
315-315: Use a shared normalization function for clarity.
Here, `similarity = 1.0 - min(distance, 2.0)/2.0` diverges from the main normalization strategy in `search`. Adopting a consistent approach for all distance calculations simplifies maintenance and reduces potential confusion about how scores are derived.
Actionable comments posted: 2
🧹 Nitpick comments (4)
cognee/tests/test_chromadb.py (4)
14-49: Fix grammatical issue in assertion message for file existence.
In the assertion at line 47, consider updating the message for clarity. The correct grammar would be "Data location doesn't exist" instead of "doesn't exists."

```diff
- f"Data location doesn't exists: {data.raw_data_location}"
+ f"Data location doesn't exist: {data.raw_data_location}"
```
51-67: Consider additional negative test scenarios.
Currently, the test only checks for the presence of documents (1 or 2). It might be beneficial to add negative tests where no documents exist or invalid dataset names are provided. This would help ensure robust error handling and guard against unexpected data states.
69-164: Refine the large integration test for maintainability.
The `main` function is quite extensive, testing multiple functionalities end-to-end. Consider splitting it into smaller, more focused test functions or employing parameterized tests to make debugging and maintenance easier.
166-169: Rely on your test runner instead of a manual entry point.
Including a `main()` entry point for an async test may not be necessary if you are already using an automated test runner like Pytest or unittest. Removing it helps standardize how tests are executed and reported across the project.

🪛 GitHub Actions: test | chromadb — [error] 169-169: KeyError: 'Could not automatically map to a tokeniser. Please use `tiktoken.get_encoding` to explicitly get the tokeniser you expect.'
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- `cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py` (1 hunks)
- `cognee/tests/test_chromadb.py` (1 hunks)
- `docker-compose.yml` (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- docker-compose.yml
🧰 Additional context used
🪛 GitHub Actions: test | chromadb
cognee/tests/test_chromadb.py
[error] 169-169: KeyError: 'Could not automatically map to a tokeniser. Please use tiktoken.get_encoding to explicitly get the tokeniser you expect.'
[warning] 1-1: Ontology file 'None' not found. Using fallback ontology at http://example.org/empty_ontology
[warning] 1-1: File /home/runner/work/cognee/cognee/cognee/tests/.cognee_system/test_chromadb/databases/cognee.graph not found. Initializing an empty graph.
⏰ Context from checks skipped due to timeout of 90000ms (17)
- GitHub Check: test
- GitHub Check: run_notebook_test / test
- GitHub Check: Test on macos-13
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: run_eval_framework_test / test
- GitHub Check: Test on macos-15
- GitHub Check: run_notebook_test / test
- GitHub Check: Test on macos-13
- GitHub Check: windows-latest
- GitHub Check: run_notebook_test / test
- GitHub Check: test
- GitHub Check: docker-compose-test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_simple_example_test / test
- GitHub Check: Test on macos-15
- GitHub Check: Test on macos-13
- GitHub Check: run_simple_example_test
```python
collection_count = await collection.count()

results = await collection.query(
    query_embeddings=[query_vector],
    include=["metadatas", "distances"],
    n_results=collection_count,
)
```
🛠️ Refactor suggestion
Investigate potential performance impact of retrieving all elements.
Using `collection.count()` to set `n_results=collection_count` can lead to very large queries for large collections. Consider adding a limit or a user-configurable parameter to avoid performance bottlenecks when only partial data is needed; one possible shape is sketched below.
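A capped variant, sketched with a hypothetical bound around the same query call used in this method:

```python
MAX_DISTANCE_RESULTS = 10_000  # hypothetical, user-configurable bound

async def query_with_cap(collection, query_vector):
    # Never request more than the cap, even if the collection is larger.
    collection_count = await collection.count()
    return await collection.query(
        query_embeddings=[query_vector],
        include=["metadatas", "distances"],
        n_results=min(collection_count, MAX_DISTANCE_RESULTS),
    )
```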
```python
        normalized: bool = True,
    ):
        """Search for similar items in a collection using text or vector query."""
        if query_text is None and query_vector is None:
            raise InvalidValueError(message="One of query_text or query_vector must be provided!")

        if query_text and not query_vector:
            query_vector = (await self.embedding_engine.embed_text([query_text]))[0]

        try:
            client = await self.get_connection()
            collection = await client.get_collection(collection_name)

            results = await collection.query(
                query_embeddings=[query_vector],
                include=["metadatas", "distances", "embeddings"]
                if with_vector
                else ["metadatas", "distances"],
                n_results=limit,
            )

            vector_list = []
            for i, (id, metadata, distance) in enumerate(
                zip(results["ids"][0], results["metadatas"][0], results["distances"][0])
            ):
                item = {
                    "id": parse_id(id),
                    "payload": restore_data_from_chroma(metadata),
                    "_distance": distance,
                }

                if with_vector and "embeddings" in results:
                    item["vector"] = results["embeddings"][0][i]

                vector_list.append(item)

            # Normalize vector distance
            normalized_values = normalize_distances(vector_list)
            for i in range(len(normalized_values)):
                vector_list[i]["score"] = normalized_values[i]

            # Create and return ScoredResult objects
            return [
                ScoredResult(
                    id=row["id"],
                    payload=row["payload"],
                    score=row["score"],
                    vector=row.get("vector") if with_vector else None,
                )
                for row in vector_list
            ]
        except Exception as e:
            logger.error(f"Error in search: {str(e)}")
            return []
```
🛠️ Refactor suggestion
Honor the normalized parameter conditionally.
Although the function signature includes a normalized flag, the code always normalizes distances (lines 272–275) regardless of the flag’s value. If you intend to allow unnormalized distances, wrap the normalization step in a conditional check.
```python
if normalized:
    normalized_values = normalize_distances(vector_list)
    for i in range(len(normalized_values)):
        vector_list[i]["score"] = normalized_values[i]
else:
    for i in range(len(vector_list)):
        # Use raw distance or compute an alternative measure
        vector_list[i]["score"] = 1.0 - vector_list[i]["_distance"]  # Example
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
        normalized: bool = True,
    ):
        """Search for similar items in a collection using text or vector query."""
        if query_text is None and query_vector is None:
            raise InvalidValueError(message="One of query_text or query_vector must be provided!")

        if query_text and not query_vector:
            query_vector = (await self.embedding_engine.embed_text([query_text]))[0]

        try:
            client = await self.get_connection()
            collection = await client.get_collection(collection_name)

            results = await collection.query(
                query_embeddings=[query_vector],
                include=["metadatas", "distances", "embeddings"]
                if with_vector
                else ["metadatas", "distances"],
                n_results=limit,
            )

            vector_list = []
            for i, (id, metadata, distance) in enumerate(
                zip(results["ids"][0], results["metadatas"][0], results["distances"][0])
            ):
                item = {
                    "id": parse_id(id),
                    "payload": restore_data_from_chroma(metadata),
                    "_distance": distance,
                }

                if with_vector and "embeddings" in results:
                    item["vector"] = results["embeddings"][0][i]

                vector_list.append(item)

            # Normalize vector distance
            if normalized:
                normalized_values = normalize_distances(vector_list)
                for i in range(len(normalized_values)):
                    vector_list[i]["score"] = normalized_values[i]
            else:
                for i in range(len(vector_list)):
                    # Use raw distance or compute an alternative measure
                    vector_list[i]["score"] = 1.0 - vector_list[i]["_distance"]  # Example

            # Create and return ScoredResult objects
            return [
                ScoredResult(
                    id=row["id"],
                    payload=row["payload"],
                    score=row["score"],
                    vector=row.get("vector") if with_vector else None,
                )
                for row in vector_list
            ]
        except Exception as e:
            logger.error(f"Error in search: {str(e)}")
            return []
```
Description
Add Support for ChromaDB
Summary
This PR adds support for ChromaDB as a vector database option in the Cognee application. ChromaDB is a modern, open-source embedding database designed for AI applications.
Changes
Technical Details
- `ChromaDBAdapter.py` (347 lines) with full CRUD operations for vector data
- Test suite (`test_chromadb.py`) with 171 lines of test coverage

Docker Changes
This PR enhances Cognee's flexibility by providing an alternative vector database option, allowing users to choose the most appropriate database for their specific use case.
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin
Tested with UI + tests.
Summary by CodeRabbit
Excluded the `.chromadb_data/` directory from version control.