Merge dev to main #827
Conversation
Commits merged from dev:

- Resolve issue with .venv being broken when using docker compose with Cognee (co-authored by Boris Arzentar)
- … 1947 (#760) (co-authored by Boris and Igor Ilic)
- Add support for UV and for Poetry package management
- Switch typing from str to UUID for NetworkX node_id
- Add both SSE and stdio support for Cognee MCP
- …83] (#782) Add log handling options for cognee exceptions
- Fix issue with failing versions GitHub Actions
- Add support for the Memgraph graph database following the [graph database integration guide](https://docs.cognee.ai/contributing/adding-providers/graph-db/graph-database-integration): implemented `MemgraphAdapter`, updated `get_graph_engine.py` to return MemgraphAdapter when appropriate, added a test script `test_memgraph.py`, and created a dedicated test workflow `.github/workflows/test_memgraph.yml` (co-authored by Vasilije and Boris)
- refactor: Handle boto3 s3fs dependencies better
- Update LanceDB and rewrite data points to run async (co-authored by Boris and Boris Arzentar)
- Add a short demo, as discussed with @hande-k and Lazar, illustrating how to get the PageRank rankings from the knowledge graph given the NetworkX engine; a POC and a first step towards solving #643 (co-authored by Boris, Hande, and Vasilije)
- Added tools to check current cognify and codify status
- Move CollectionNotFoundError logging to be at Debug level
- This reverts commit c058219.
- Several housekeeping merges with no individual descriptions (co-authored by Vasilije and Hande)

Every commit in this pull request affirms the Topoteretes Developer Certificate of Origin (DCO).
Walkthrough

This update introduces broad architectural, interface, and usability changes to the Cognee codebase. It adds Memgraph support, refactors pipeline and task execution with context propagation, modularizes observability, and standardizes dataset identification. Numerous adapters are updated for unified search and error handling. The MCP server is overhauled, and documentation and UI are improved.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant API
    participant Pipeline
    participant DB
    User->>Frontend: Add data / Click Cognify
    Frontend->>API: POST /add or /cognify (with dataset name)
    API->>Pipeline: cognee_pipeline(tasks, datasets, user, pipeline_name)
    Pipeline->>DB: Create or fetch dataset (get_unique_dataset_id)
    Pipeline->>Pipeline: Log pipeline run initiated (with status)
    Pipeline->>Pipeline: Check pipeline run status
    alt Not already processing/completed
        Pipeline->>Pipeline: Run tasks (with context)
        Pipeline->>DB: Update pipeline run status
    else Already processing/completed
        Pipeline-->>API: Return early
    end
    API-->>Frontend: Response (status, result)
    Frontend-->>User: Show notification / result
```
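The "Run tasks (with context)" step relies on each task declaring whether it accepts a shared context. Below is a minimal runnable sketch of that dispatch pattern; the task functions and context contents are hypothetical illustrations, not the project's actual `Task` class:

```python
import asyncio
import inspect


async def enrich_chunks(data, context=None):
    # Task that opts in to receiving the shared pipeline context.
    return [f"{item} (user={context['user']})" for item in data]


async def normalize(data):
    # Task that does not declare a context parameter.
    return [item.strip().lower() for item in data]


async def run(tasks, data, context):
    for task in tasks:
        # Pass context only to tasks whose signature declares it.
        has_context = "context" in inspect.signature(task).parameters
        data = await (task(data, context=context) if has_context else task(data))
    return data


print(asyncio.run(run([normalize, enrich_chunks], ["  Audi "], {"user": "default"})))
```

Signature inspection keeps context propagation opt-in, so existing tasks keep working unchanged.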
```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant API
    participant MCP Server
    User->>Frontend: Submit tool action (e.g., cognify)
    Frontend->>API: Call MCP endpoint
    API->>MCP Server: Invoke tool (e.g., @mcp.tool('cognify'))
    MCP Server->>MCP Server: Launch async task
    MCP Server-->>API: Immediate confirmation (background processing)
    API-->>Frontend: Show status / instructions
```
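The launch-and-confirm flow in the diagram can be approximated with plain `asyncio`; this sketch uses placeholder names (`run_cognify_pipeline`, `cognify_tool`) rather than the real MCP server API:

```python
import asyncio


async def run_cognify_pipeline(dataset: str) -> None:
    # Stand-in for the real pipeline; assume it may run for minutes.
    await asyncio.sleep(2)
    print(f"cognify finished for {dataset}")


async def cognify_tool(dataset: str) -> str:
    # Launch the long-running work without awaiting it, so the tool
    # call can return a confirmation immediately.
    asyncio.create_task(run_cognify_pipeline(dataset))
    return "Cognify started in the background; check status later."


async def main():
    print(await cognify_tool("main_dataset"))
    await asyncio.sleep(3)  # keep the loop alive so the background task can finish


asyncio.run(main())
```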
Actionable comments posted: 20
🔭 Outside diff range comments (5)
cognee/shared/logging_utils.py (1)
315-325: ⚠️ Potential issue

**Reassess broadened warning suppression threshold.**

The condition now silences `SAWarning` for all levels above `DEBUG` (including `INFO`), which may hide important warnings. Recommend reverting to suppress only levels above `WARNING` or refining to specific modules:

```diff
- if log_level > logging.DEBUG:
+ if log_level >= logging.WARNING:
      import warnings
      from sqlalchemy.exc import SAWarning

      warnings.filterwarnings(
          "ignore", category=SAWarning, module="dlt.destinations.impl.sqlalchemy.merge_job"
      )
      warnings.filterwarnings(
          "ignore", category=SAWarning, module="dlt.destinations.impl.sqlalchemy.load_jobs"
      )
```

This ensures you don't unintentionally swallow informative messages.
cognee/exceptions/exceptions.py (1)
38-44: ⚠️ Potential issue

**Child exception classes don't forward new logging parameters.**

ServiceError and other subclasses don't pass the new `log` and `log_level` parameters to the parent constructor, which means the new logging controls won't work for these exception types. Update the subclass constructors to forward the new parameters:

```diff
 def __init__(
     self,
     message: str = "Service is unavailable.",
     name: str = "ServiceError",
     status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
+    log=True,
+    log_level="ERROR",
 ):
-    super().__init__(message, name, status_code)
+    super().__init__(message, name, status_code, log, log_level)
```

This change should be applied to all child exception classes (ServiceError, InvalidValueError, InvalidAttributeError).
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)
210-237: ⚠️ Potential issue

**`closest_items` type confusion causes attribute errors**

`session.execute(query)` returns a SQLAlchemy `Result`. Iterating over `closest_items.all()` yields `RowMapping` objects whose columns are available via index or key access (e.g. `row[0]` or `row.PGVectorDataPoint`), not via attributes on the row itself. Therefore `vector.id` on line 244 will raise `AttributeError`.

Suggested fix:

```diff
- for vector in closest_items.all():
-     vector_list.append(
-         {
-             "id": parse_id(str(vector.id)),
-             "payload": vector.payload,
-             "_distance": vector.similarity,
-         }
-     )
+ for row in closest_items.all():
+     data_point = row[0]  # PGVectorDataPoint instance
+     similarity = row.similarity  # labelled column
+     vector_list.append(
+         {
+             "id": parse_id(str(data_point.id)),
+             "payload": data_point.payload,
+             "_distance": similarity,
+         }
+     )
```

cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py (1)
155-205: 🛠️ Refactor suggestion

**Double-closing the Qdrant client & missing `query_vector` fallback**

`await client.close()` is invoked inside the `try` block (line 183) and again in the `finally` block (line 204). While idempotent, this is redundant and obscures intent.

When `query_vector` is supplied but `query_text` is also given, the latter is ignored. Clarify precedence or raise `ValueError`.

Minimal patch:

```diff
@@
- await client.close()
+ # close handled in finally
```

Optionally document parameter precedence.
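For the precedence question, one option is to fail fast when both inputs are given; a small illustrative helper (hypothetical, not the adapter's actual signature):

```python
def resolve_query(query_text=None, query_vector=None):
    # Make parameter precedence explicit instead of silently ignoring one.
    if query_text is not None and query_vector is not None:
        raise ValueError("Provide either query_text or query_vector, not both.")
    if query_text is None and query_vector is None:
        raise ValueError("One of query_text or query_vector is required.")
    return ("text", query_text) if query_text is not None else ("vector", query_vector)
```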
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)
239-241: ⚠️ Potential issue

**Exceptions are swallowed, callers receive an empty list**

Catching all exceptions and returning `[]` hides real problems (network issues, authentication, malformed queries, etc.). Let the exception propagate or wrap it in a domain-specific error; otherwise upstream code cannot distinguish "no results" from "search failed".

```diff
- except Exception as e:
-     logger.error(f"Error in search: {str(e)}")
-     return []
+ except Exception as err:
+     logger.error("Search failed: %s", err, exc_info=True)
+     raise
```
🧹 Nitpick comments (27)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
332-334: **Schema handling changed to use a fixed list instead of dynamic retrieval.**

The code now uses a fixed schema list `["public", "public_staging"]` instead of dynamically retrieving all schemas with `get_schema_list()`. This makes the deletion process more predictable but might not handle custom schemas that aren't in this list.

This change aligns with the broader infrastructure updates in the PR to stabilize database schema handling across relational and graph database adapters.
Consider adding a comment explaining why a fixed list is preferred over dynamically retrieving schemas, to help future developers understand the design decision.
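A sketch of what that documented fixed-list deletion could look like (the `drop_schema` helper is hypothetical):

```python
# Deleting from a fixed allow-list keeps the operation predictable and avoids
# touching schemas created outside of cognee; the trade-off is that custom
# schemas are not cleaned up and must be handled explicitly.
SCHEMAS_TO_DELETE = ["public", "public_staging"]


def drop_schema(schema_name: str) -> None:
    # Hypothetical helper; a real implementation would execute this SQL.
    print(f"DROP SCHEMA IF EXISTS {schema_name} CASCADE")


for schema_name in SCHEMAS_TO_DELETE:
    drop_schema(schema_name)
```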
examples/data/car_and_tech_companies.txt (2)
17-17: **Fix grammatical number agreement in summary sentence.**

The phrase "Each of these car manufacturer contributes…" mismatches plural "these" with singular "manufacturer." Apply this diff:

```diff
- Each of these car manufacturer contributes to Germany's reputation as a leader in the global automotive industry...
+ Each of these car manufacturers contributes to Germany's reputation as a leader in the global automotive industry...
```
1-37: **Consider externalizing large static text for maintainability.**

Embedding lengthy multi-line strings directly in a `.txt` file can make updates cumbersome. You might move these into a structured JSON/YAML resource (e.g., `examples/data/car_and_tech_companies.json`) and load them at runtime.
This change will:
- Improve readability of code that ingests this data
- Enable syntax validation and diffing in structured formats
- Simplify future updates and translations
Would you like assistance converting this to a JSON/YAML resource?
cognee/tests/unit/modules/pipelines/run_tasks_test.py (1)
53-55: **Optional: Reevaluate the `__main__` guard for test consistency.**

Adding:

```python
if __name__ == "__main__":
    test_run_tasks()
```

enables standalone execution but is nonstandard when using frameworks like `pytest` or the `unittest` CLI. Consider relying solely on your test runner for consistency across CI and local runs.

cognee/modules/pipelines/operations/__init__.py (1)
1-1: **Update the import to follow package conventions**

The newly added import `log_pipeline_run_initiated` is not being used within this file and isn't exposed through an `__all__` declaration. This creates "dead imports" as flagged by the static analysis tool. Choose one of these approaches:

- Add an `__all__` list to explicitly define the public API:

```diff
 from .log_pipeline_run_initiated import log_pipeline_run_initiated
 from .log_pipeline_run_start import log_pipeline_run_start
 from .log_pipeline_run_complete import log_pipeline_run_complete
 from .log_pipeline_run_error import log_pipeline_run_error
 from .pipeline import cognee_pipeline
+
+__all__ = [
+    "log_pipeline_run_initiated",
+    "log_pipeline_run_start",
+    "log_pipeline_run_complete",
+    "log_pipeline_run_error",
+    "cognee_pipeline",
+]
```

- Or add a comment explaining the import pattern if this is an intentional design choice for implicit exports.
examples/python/graphiti_example.py (1)
31-33: **User context integration is correct but could use clarification**

Adding user initialization to align with updated pipeline requirements is good. Consider adding a brief comment explaining why user context is necessary for task execution in the updated system.

```diff
 # Initialize default user
 user = await get_default_user()
+# User context is required for pipeline task execution in the updated system
```

cognee/modules/retrieval/exceptions/__init__.py (1)
7-7: **Consider addressing the unused imports in the module**

The static analysis tool has identified that `SearchTypeNotSupported` and `CypherSearchError` are imported but not used within this module. If the intent is to re-export these exceptions at the package level, consider adding them to an `__all__` list.

```diff
 from .exceptions import SearchTypeNotSupported, CypherSearchError
+
+__all__ = ["SearchTypeNotSupported", "CypherSearchError"]
```
cognee/modules/observability/get_observe.py (1)
1-11: **Add docstring for better documentation**

The function lacks a docstring explaining its purpose, parameters, return values, and possible exceptions. Since this is a utility function that will be used throughout the codebase, proper documentation is essential. Add a docstring to clarify the function's purpose:

```diff
 def get_observe():
+    """
+    Get the appropriate observe decorator based on the configured monitoring tool.
+
+    Returns:
+        The observe decorator from the configured monitoring tool package.
+
+    Raises:
+        ValueError: If the configured monitoring tool is not supported.
+    """
     monitoring = get_base_config().monitoring_tool

     if monitoring == Observer.LANGFUSE:
         from langfuse.decorators import observe

         return observe
```

cognee/modules/observability/observers.py (1)
7-9: **Consider adding docstrings for individual enum values**

While the class has a brief docstring, adding documentation for each enum value would improve clarity about what each monitoring tool represents and potentially any specific requirements or behaviors associated with it.

```diff
 class Observer(str, Enum):
     """Monitoring tools"""

-    LANGFUSE = "langfuse"
-    LLMLITE = "llmlite"
-    LANGSMITH = "langsmith"
+    LANGFUSE = "langfuse"  # Langfuse observability platform
+    LLMLITE = "llmlite"  # LLMLite monitoring tool
+    LANGSMITH = "langsmith"  # LangSmith tracing and evaluation platform
```

cognee/tests/test_neo4j.py (1)
77-86: **Consider adding a TODO to revisit this test**

While commenting out the test with a clear explanation is reasonable, it would be helpful to add a TODO or FIXME to ensure this test is revisited in the future when the issue with weak LLM models is addressed.

```diff
-# NOTE: Due to the test failing often on weak LLM models we've removed this test for now
+# TODO: Revisit this test when more reliable LLM models are available or implement a more robust test
+# NOTE: Test removed temporarily as it fails frequently on weaker LLM models
```

cognee/modules/pipelines/operations/log_pipeline_run_initiated.py (2)
6-14: **Add docstring to improve function documentation**

The function lacks documentation explaining its purpose, parameters, and return value. Adding a docstring would improve code maintainability.

```diff
 async def log_pipeline_run_initiated(pipeline_id: str, pipeline_name: str, dataset_id: UUID):
+    """
+    Creates and persists a new pipeline run record indicating the initiation of dataset processing.
+
+    Args:
+        pipeline_id: The ID of the pipeline
+        pipeline_name: The name of the pipeline
+        dataset_id: The UUID of the dataset being processed
+
+    Returns:
+        PipelineRun: The created pipeline run object
+    """
     pipeline_run = PipelineRun(
         pipeline_run_id=uuid4(),
         pipeline_name=pipeline_name,
```
13-14: **Consider adding basic metadata to initial run_info**

The run_info is initialized as an empty dictionary. Consider adding basic metadata like timestamp or initiator to provide more context.

```diff
     pipeline_id=pipeline_id,
     status=PipelineRunStatus.DATASET_PROCESSING_INITIATED,
     dataset_id=dataset_id,
-    run_info={},
+    run_info={
+        "initiated_at": datetime.datetime.utcnow().isoformat(),
+        "environment": os.environ.get("ENVIRONMENT", "unknown")
+    },
 )
```

cognee/infrastructure/databases/graph/get_graph_engine.py (1)
71-72: **Consider more specific error messages for missing credentials**

The error message could be more specific about which Memgraph credential is missing to aid in debugging configuration issues.

```diff
 elif graph_database_provider == "memgraph":
-    if not (graph_database_url and graph_database_username and graph_database_password):
-        raise EnvironmentError("Missing required Memgraph credentials.")
+    missing_credentials = []
+    if not graph_database_url:
+        missing_credentials.append("graph_database_url")
+    if not graph_database_username:
+        missing_credentials.append("graph_database_username")
+    if not graph_database_password:
+        missing_credentials.append("graph_database_password")
+
+    if missing_credentials:
+        raise EnvironmentError(f"Missing required Memgraph credentials: {', '.join(missing_credentials)}")
```

cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (1)
175-190: **Clean up unused search type handling**

The component no longer includes "INSIGHTS" in the search options (as mentioned in the AI summary), but the `convertToSearchTypeOutput` function still handles cases for "INSIGHTS", "SUMMARIES", and "CHUNKS" which may no longer be used. Consider cleaning up the unused cases in this switch statement to match the available search options:

```diff
 switch (searchType) {
-  case 'INSIGHTS':
-    return systemMessages.map((message: InsightMessage) => {
-      const [node1, relationship, node2] = message;
-      if (node1.name && node2.name) {
-        return `${node1.name} ${relationship.relationship_name} ${node2.name}.`;
-      }
-      return '';
-    }).join('\n');
-  case 'SUMMARIES':
-    return systemMessages.map((message: { text: string }) => message.text).join('\n');
-  case 'CHUNKS':
-    return systemMessages.map((message: { text: string }) => message.text).join('\n');
   default:
     return "";
 }
```
23-30: Consider validating the log_level parameter.The code doesn't validate that log_level is one of the supported values. An invalid value would result in silent failure (no logging) without any error.
Consider adding validation:
def __init__( self, message: str = "Service is unavailable.", name: str = "Cognee", status_code=status.HTTP_418_IM_A_TEAPOT, log=True, - log_level="ERROR", + log_level: Literal["ERROR", "WARNING", "INFO", "DEBUG"] = "ERROR", ): self.message = message self.name = name self.status_code = status_code + valid_log_levels = ["ERROR", "WARNING", "INFO", "DEBUG"] + if log and log_level not in valid_log_levels: + log_level = "ERROR" + logger.warning(f"Invalid log_level '{log_level}'. Using 'ERROR' instead.") # Automatically log the exception details if log and (log_level == "ERROR"):You'll need to add
from typing import Literalat the top of the file.cognee-frontend/src/modules/ingestion/DataView/DataView.tsx (2)
60-70: Good implementation of button state management.The implementation correctly:
- Uses the useBoolean hook for managing button state
- Disables the button during the async operation
- Ensures the button is re-enabled in the finally block
Consider adding error handling to display feedback to the user if the cognify operation fails.
const handleCognify = () => { disableCognifyButton(); onCognify() + .catch(error => { + // Display error to user, e.g. with a toast notification + console.error('Cognify operation failed:', error); + }) .finally(() => enableCognifyButton()); };
36-36: Component name shadows global property.The component name 'DataView' shadows a global property, which could lead to confusion.
Consider renaming the component to avoid shadowing:
-export default function DataView({ datasetId, data, onClose, onDataAdd, onCognify }: DataViewProps) { +export default function DatasetView({ datasetId, data, onClose, onDataAdd, onCognify }: DataViewProps) {Remember to update the component name in any importing files as well.
🧰 Tools
🪛 Biome (1.9.4)
[error] 36-36: Do not shadow the global "DataView" property.
Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.
(lint/suspicious/noShadowRestrictedNames)
cognee/modules/pipelines/operations/run_tasks_base.py (1)
31-33: Optimize dictionary key lookup.The current implementation uses
.keys()which creates an unnecessary view object when checking for key existence.- has_context = any( - [key == "context" for key in inspect.signature(running_task.executable).parameters.keys()] - ) + has_context = "context" in inspect.signature(running_task.executable).parameters🧰 Tools
🪛 Ruff (0.8.2)
32-32: Use
key in dictinstead ofkey in dict.keys()Remove
.keys()(SIM118)
Dockerfile (1)
33-34: **Consider consolidating dependency installation.**

The dependency installation is performed twice with the same parameters. Consider consolidating into a single installation after copying all source code, or clearly documenting why two separate installations are necessary.
Also applies to: 43-44
cognee/tests/test_memgraph.py (2)
14-14: **Add docstring to the main function.**

Consider adding a docstring to explain the purpose and flow of this integration test for better clarity and documentation.
```diff
 async def main():
+    """
+    Integration test for Memgraph adapter in Cognee.
+
+    This test configures Cognee with Memgraph, loads test data, performs
+    cognification, executes various search queries, verifies history tracking,
+    and performs cleanup to ensure proper isolation.
+    """
     cognee.config.set_graph_database_provider("memgraph")
```
57-60: **Enhance search result assertions.**

The current assertions only check if results are non-empty. Consider adding more specific assertions to verify the quality and correctness of search results, such as checking for expected content or structural properties.
```diff
 search_results = await cognee.search(
     query_type=SearchType.INSIGHTS, query_text=random_node_name
 )
 assert len(search_results) != 0, "The search results list is empty."
+# Verify result quality
+assert any("quantum" in str(result).lower() for result in search_results), "Expected 'quantum' keyword not found in results"
```

Also applies to: 65-69, 71-77, 79-86
cognee/modules/pipelines/operations/pipeline.py (1)
173-175: **Mismatch between validation logic and error message**

The validation currently rejects dots `"."` and spaces `" "` in a dataset name, yet the error message claims that underscores are disallowed. This will certainly confuse users and may lead to unnecessary support questions.

```diff
 if "." in dataset_name or " " in dataset_name:
-    raise ValueError("Dataset name cannot contain spaces or underscores")
+    raise ValueError("Dataset name cannot contain spaces or dots (.)")
```

cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)
250-253: **Early return path is good but misses `with_vector` flag**

When no results are found the method returns an empty list, which is good. However, if `with_vector=True` the caller might still expect the vector field in each `ScoredResult`. Consider documenting this behaviour or returning a structure that makes the absence explicit.
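One way to make the contract explicit is a result type whose vector field is `None` unless requested; a sketch with invented fields, not the adapter's actual `ScoredResult` definition:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoredResult:
    id: str
    payload: dict
    score: float
    vector: Optional[list] = None  # explicitly absent unless requested


def search(rows, with_vector: bool = False) -> list:
    if not rows:
        # Empty list means "no matches"; vector is None by contract, so
        # callers never confuse a missing vector with a failed lookup.
        return []
    return [
        ScoredResult(
            id=r["id"],
            payload=r["payload"],
            score=r["score"],
            vector=r.get("vector") if with_vector else None,
        )
        for r in rows
    ]


print(search([]))  # -> []
```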
cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (1)

37-47: **Repeated `connect()` on every call – cache connection status**

`await self.client.connect()` is executed on each `get_client()` invocation. If the SDK treats `connect()` as idempotent this is only wasted latency; if not, it may raise `AlreadyConnected` or leak resources.

```diff
 async def get_client(self):
-    await self.client.connect()
-    return self.client
+    if not getattr(self.client, "_is_connected", False):
+        await self.client.connect()
+        # flag set by us; adjust to real SDK attribute if it exists
+        self.client._is_connected = True
+    return self.client
```

cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1)
205-211: **Serial deletes can be slow – batch predicate is simpler**

Looping one-by-one round-trips to delete rows may crawl on large lists. Since IDs are primary keys, a single predicate delete is supported:

```diff
-for data_point_id in data_point_ids:
-    await collection.delete(f"id = '{data_point_id}'")
+ids = "', '".join(data_point_ids)
+await collection.delete(f"id IN ('{ids}')")
```

This cuts network chatter and lock contention.
cognee-mcp/src/server.py (1)
79-86: **`all(results)` is an unreliable success criterion**

`run_code_graph_pipeline` appears to yield complex objects; relying on truthiness may mark partial failures as success (e.g., non-empty dicts are truthy). Consider the options below, with a sketch of the exception-based approach after this list:
- Inspect returned objects for explicit success flags, or
- Catch exceptions inside the loop instead of inferring success by truthiness, or
- Log and propagate the first exception.
Without this, the “Codify process failed” branch may never execute even when errors occur.
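A sketch of the exception-based variant; the pipeline function here is a stand-in for the real `run_code_graph_pipeline`, not its actual signature:

```python
import asyncio


async def run_code_graph_pipeline(repo: str):
    # Stand-in for the real pipeline generator; may raise on failure.
    yield {"repo": repo, "nodes": 42}


async def codify(repo: str) -> bool:
    success = True
    try:
        async for status in run_code_graph_pipeline(repo):
            print("pipeline step:", status)
    except Exception as err:
        # Failures are detected by exceptions, not by truthiness of results.
        print(f"Codify process failed: {err!r}")
        success = False
    return success


print(asyncio.run(codify("my-repo")))
```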
cognee/modules/retrieval/utils/brute_force_triplet_search.py (1)
66-73: **Added defensive error handling for entity not found.**

The code now gracefully handles cases where entities don't exist in the database by catching `EntityNotFoundError` and continuing execution rather than failing. This improves robustness when working with incomplete data. Consider using `contextlib.suppress(EntityNotFoundError)` for a more concise implementation:

```diff
- try:
-     await memory_fragment.project_graph_from_db(
-         graph_engine,
-         node_properties_to_project=properties_to_project,
-         edge_properties_to_project=["relationship_name"],
-     )
- except EntityNotFoundError:
-     pass
+ from contextlib import suppress
+ with suppress(EntityNotFoundError):
+     await memory_fragment.project_graph_from_db(
+         graph_engine,
+         node_properties_to_project=properties_to_project,
+         edge_properties_to_project=["relationship_name"],
+     )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (10)
- `.github/actions/cognee_setup/action.yml` is excluded by `!**/*.yml`
- `.github/workflows/python_version_tests.yml` is excluded by `!**/*.yml`
- `.github/workflows/test_memgraph.yml` is excluded by `!**/*.yml`
- `assets/cognee_benefits.png` is excluded by `!**/*.png`
- `assets/graph_visualization.png` is excluded by `!**/*.png`
- `cognee-mcp/pyproject.toml` is excluded by `!**/*.toml`
- `cognee-mcp/uv.lock` is excluded by `!**/*.lock`
- `poetry.lock` is excluded by `!**/*.lock`
- `pyproject.toml` is excluded by `!**/*.toml`
- `uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (78)
- `CONTRIBUTING.md` (1 hunk)
- `Dockerfile` (1 hunk)
- `README.md` (2 hunks)
- `assets/graph_visualization.html` (0 hunks)
- `cognee-frontend/src/app/page.tsx` (3 hunks)
- `cognee-frontend/src/app/wizard/CognifyStep/CognifyStep.tsx` (1 hunk)
- `cognee-frontend/src/app/wizard/ExploreStep/ExploreStep.tsx` (1 hunk)
- `cognee-frontend/src/app/wizard/WizardPage.tsx` (1 hunk)
- `cognee-frontend/src/modules/datasets/cognifyDataset.ts` (1 hunk)
- `cognee-frontend/src/modules/exploration/getExplorationGraphUrl.ts` (1 hunk)
- `cognee-frontend/src/modules/ingestion/DataView/DataView.tsx` (4 hunks)
- `cognee-frontend/src/ui/Partials/Explorer/Explorer.tsx` (1 hunk)
- `cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx` (2 hunks)
- `cognee-frontend/src/utils/fetch.ts` (1 hunk)
- `cognee-mcp/src/server.py` (3 hunks)
- `cognee/api/v1/add/add.py` (1 hunk)
- `cognee/api/v1/cognify/code_graph_pipeline.py` (3 hunks)
- `cognee/api/v1/cognify/cognify.py` (1 hunk)
- `cognee/api/v1/datasets/datasets.py` (1 hunk)
- `cognee/base_config.py` (1 hunk)
- `cognee/exceptions/exceptions.py` (1 hunk)
- `cognee/infrastructure/databases/graph/get_graph_engine.py` (1 hunk)
- `cognee/infrastructure/databases/graph/graph_db_interface.py` (2 hunks)
- `cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py` (1 hunk)
- `cognee/infrastructure/databases/graph/networkx/adapter.py` (7 hunks)
- `cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py` (2 hunks)
- `cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py` (8 hunks)
- `cognee/infrastructure/databases/vector/exceptions/exceptions.py` (1 hunk)
- `cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py` (8 hunks)
- `cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py` (9 hunks)
- `cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py` (5 hunks)
- `cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py` (6 hunks)
- `cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py` (9 hunks)
- `cognee/infrastructure/llm/gemini/adapter.py` (1 hunk)
- `cognee/infrastructure/llm/openai/adapter.py` (2 hunks)
- `cognee/modules/data/methods/__init__.py` (1 hunk)
- `cognee/modules/data/methods/create_dataset.py` (2 hunks)
- `cognee/modules/data/methods/get_unique_dataset_id.py` (1 hunk)
- `cognee/modules/graph/cognee_graph/CogneeGraph.py` (1 hunk)
- `cognee/modules/observability/get_observe.py` (1 hunk)
- `cognee/modules/observability/observers.py` (1 hunk)
- `cognee/modules/pipelines/models/PipelineRun.py` (1 hunk)
- `cognee/modules/pipelines/operations/__init__.py` (1 hunk)
- `cognee/modules/pipelines/operations/get_pipeline_status.py` (2 hunks)
- `cognee/modules/pipelines/operations/log_pipeline_run_initiated.py` (1 hunk)
- `cognee/modules/pipelines/operations/pipeline.py` (3 hunks)
- `cognee/modules/pipelines/operations/run_tasks.py` (4 hunks)
- `cognee/modules/pipelines/operations/run_tasks_base.py` (4 hunks)
- `cognee/modules/retrieval/exceptions/__init__.py` (1 hunk)
- `cognee/modules/retrieval/exceptions/exceptions.py` (0 hunks)
- `cognee/modules/retrieval/graph_completion_retriever.py` (1 hunk)
- `cognee/modules/retrieval/utils/brute_force_triplet_search.py` (4 hunks)
- `cognee/modules/settings/get_settings.py` (2 hunks)
- `cognee/shared/data_models.py` (0 hunks)
- `cognee/shared/logging_utils.py` (1 hunk)
- `cognee/tasks/ingestion/ingest_data.py` (1 hunk)
- `cognee/tasks/temporal_awareness/index_graphiti_objects.py` (2 hunks)
- `cognee/tests/integration/run_toy_tasks/conftest.py` (0 hunks)
- `cognee/tests/test_memgraph.py` (1 hunk)
- `cognee/tests/test_neo4j.py` (1 hunk)
- `cognee/tests/test_weaviate.py` (1 hunk)
- `cognee/tests/unit/modules/pipelines/run_tasks_test.py` (1 hunk)
- `cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py` (1 hunk)
- `cognee/tests/unit/modules/retrieval/chunks_retriever_test.py` (4 hunks)
- `cognee/tests/unit/modules/retrieval/graph_completion_retriever_test.py` (1 hunk)
- `cognee/tests/unit/modules/retrieval/summaries_retriever_test.py` (1 hunk)
- `entrypoint.sh` (3 hunks)
- `examples/data/car_and_tech_companies.txt` (1 hunk)
- `examples/python/graphiti_example.py` (2 hunks)
- `notebooks/cognee_demo.ipynb` (3 hunks)
- `notebooks/cognee_graphiti_demo.ipynb` (4 hunks)
- `notebooks/cognee_llama_index.ipynb` (2 hunks)
- `notebooks/cognee_simple_demo.ipynb` (7 hunks)
- `notebooks/github_graph_visualization.html` (0 hunks)
- `notebooks/graphrag_vs_rag.ipynb` (7 hunks)
- `notebooks/hr_demo.ipynb` (0 hunks)
- `notebooks/llama_index_cognee_integration.ipynb` (5 hunks)
💤 Files with no reviewable changes (7)
- cognee/modules/retrieval/exceptions/exceptions.py
- cognee/shared/data_models.py
- cognee/tests/unit/modules/retrieval/utils/brute_force_triplet_search_test.py
- cognee/tests/integration/run_toy_tasks/conftest.py
- notebooks/github_graph_visualization.html
- assets/graph_visualization.html
- notebooks/hr_demo.ipynb
🧰 Additional context used
🧬 Code Graph Analysis (26)
- cognee/tests/unit/modules/pipelines/run_tasks_test.py (1)
  - cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py (1): `test_run_tasks` (42-43)
- cognee/modules/data/methods/get_unique_dataset_id.py (1)
  - cognee/modules/users/models/User.py (1): `User` (12-39)
- cognee/tests/unit/modules/retrieval/summaries_retriever_test.py (1)
  - cognee/modules/retrieval/summaries_retriever.py (1): `SummariesRetriever` (9-33)
- cognee/api/v1/cognify/cognify.py (2)
  - cognee/modules/pipelines/operations/pipeline.py (1): `cognee_pipeline` (30-103)
  - cognee/api/v1/datasets/datasets.py (1): `datasets` (7-40)
- cognee/infrastructure/llm/openai/adapter.py (1)
  - cognee/modules/observability/get_observe.py (1): `get_observe` (5-11)
- cognee-frontend/src/app/wizard/ExploreStep/ExploreStep.tsx (1)
  - cognee-frontend/src/ui/Partials/Explorer/Explorer.tsx (1): `Explorer` (15-61)
- cognee/tests/test_neo4j.py (1)
  - cognee/modules/users/methods/get_default_user.py (1): `get_default_user` (12-37)
- cognee/modules/data/methods/__init__.py (1)
  - cognee/modules/data/methods/get_unique_dataset_id.py (1): `get_unique_dataset_id` (5-6)
- cognee/tasks/ingestion/ingest_data.py (1)
  - cognee/modules/data/methods/create_dataset.py (1): `create_dataset` (11-33)
- cognee/modules/pipelines/operations/get_pipeline_status.py (1)
  - cognee/modules/pipelines/models/PipelineRun.py (1): `PipelineRun` (15-27)
- cognee/modules/pipelines/operations/__init__.py (1)
  - cognee/modules/pipelines/operations/log_pipeline_run_initiated.py (1): `log_pipeline_run_initiated` (6-22)
- cognee/modules/observability/get_observe.py (2)
  - cognee/base_config.py (1): `get_base_config` (29-30)
  - cognee/modules/observability/observers.py (1): `Observer` (4-9)
- cognee/infrastructure/databases/graph/get_graph_engine.py (1)
  - cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (1): `MemgraphAdapter` (20-690)
- cognee/base_config.py (1)
  - cognee/modules/observability/observers.py (1): `Observer` (4-9)
- cognee-frontend/src/modules/datasets/cognifyDataset.ts (1)
  - cognee-frontend/src/utils/fetch.ts (1): `fetch` (3-12)
- cognee/api/v1/datasets/datasets.py (1)
  - cognee/modules/pipelines/operations/get_pipeline_status.py (1): `get_pipeline_status` (8-35)
- cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py (5)
  - cognee/modules/pipelines/tasks/task.py (1): `Task` (5-97)
  - cognee/modules/users/methods/get_default_user.py (1): `get_default_user` (12-37)
  - cognee/modules/pipelines/operations/run_tasks_base.py (1): `run_tasks_base` (66-82)
  - cognee/infrastructure/databases/relational/create_db_and_tables.py (1): `create_db_and_tables` (4-7)
  - cognee/tests/unit/modules/pipelines/run_tasks_test.py (1): `run_and_check_tasks` (10-46)
- cognee/tests/unit/modules/retrieval/chunks_retriever_test.py (1)
  - cognee/api/v1/config/config.py (2): `config` (15-194), `system_root_directory` (17-29)
- cognee/tests/test_weaviate.py (1)
  - cognee/infrastructure/databases/vector/get_vector_engine.py (1): `get_vector_engine` (5-6)
- cognee/tests/test_memgraph.py (7)
  - cognee/modules/users/methods/get_default_user.py (1): `get_default_user` (12-37)
  - cognee/shared/logging_utils.py (1): `get_logger` (137-158)
  - cognee/modules/search/types/SearchType.py (1): `SearchType` (4-13)
  - cognee/api/v1/config/config.py (4): `config` (15-194), `set_graph_database_provider` (57-59), `data_root_directory` (32-34), `system_root_directory` (17-29)
  - cognee/infrastructure/databases/vector/get_vector_engine.py (1): `get_vector_engine` (5-6)
  - cognee/infrastructure/databases/graph/get_graph_engine.py (1): `get_graph_engine` (9-21)
  - cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (1): `get_graph_data` (455-483)
- cognee/modules/data/methods/create_dataset.py (2)
  - cognee/modules/data/methods/get_unique_dataset_id.py (1): `get_unique_dataset_id` (5-6)
  - cognee/modules/users/models/User.py (1): `User` (12-39)
- cognee/exceptions/exceptions.py (1)
  - cognee/shared/logging_utils.py (4): `error` (127-128), `warning` (124-125), `info` (121-122), `debug` (133-134)
- cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (2)
  - cognee/infrastructure/databases/vector/exceptions/exceptions.py (1): `CollectionNotFoundError` (5-14)
  - cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1): `get_async_session` (34-40)
- cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (5)
  - cognee/shared/logging_utils.py (1): `get_logger` (137-158)
  - cognee/exceptions/exceptions.py (1): `InvalidValueError` (47-54)
  - cognee/infrastructure/engine/models/DataPoint.py (1): `DataPoint` (16-96)
  - cognee/infrastructure/engine/utils/parse_id.py (1): `parse_id` (4-10)
  - cognee/infrastructure/databases/vector/exceptions/exceptions.py (1): `CollectionNotFoundError` (5-14)
- cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (5)
  - cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (4): `get_collection` (121-126), `has_collection` (111-113), `get_connection` (99-106), `create_data_points` (128-144)
  - cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (3): `get_collection` (75-80), `has_collection` (51-53), `create_data_points` (82-132)
  - cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py (2): `has_collection` (74-78), `create_data_points` (99-130)
  - cognee/infrastructure/databases/vector/exceptions/exceptions.py (1): `CollectionNotFoundError` (5-14)
  - cognee/infrastructure/engine/models/DataPoint.py (1): `DataPoint` (16-96)
- cognee/infrastructure/databases/graph/networkx/adapter.py (4)
  - cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (7): `has_node` (66-75), `get_edges` (264-275), `extract_node` (121-124), `extract_nodes` (126-136), `get_neighbors` (381-383), `get_node` (385-392), `get_nodes` (394-402)
  - cognee/infrastructure/databases/graph/kuzu/adapter.py (7): `has_node` (167-171), `get_edges` (439-475), `extract_node` (284-304), `extract_nodes` (306-325), `get_neighbors` (479-481), `get_node` (483-502), `get_nodes` (504-521)
  - cognee/infrastructure/databases/graph/graph_db_interface.py (4): `get_edges` (177-179), `get_neighbors` (182-184), `get_node` (125-127), `get_nodes` (130-132)
  - cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py (2): `extract_node` (235-238), `extract_nodes` (240-241)
🪛 Ruff (0.8.2)
cognee/modules/retrieval/exceptions/__init__.py
7-7: .exceptions.SearchTypeNotSupported imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
7-7: .exceptions.CypherSearchError imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
cognee/modules/data/methods/__init__.py
10-10: .get_unique_dataset_id.get_unique_dataset_id imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
cognee/modules/pipelines/operations/__init__.py
1-1: .log_pipeline_run_initiated.log_pipeline_run_initiated imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
cognee/modules/retrieval/utils/brute_force_triplet_search.py
66-73: Use contextlib.suppress(EntityNotFoundError) instead of try-except-pass
Replace with contextlib.suppress(EntityNotFoundError)
(SIM105)
cognee/modules/pipelines/operations/run_tasks_base.py
32-32: Use `key in dict` instead of `key in dict.keys()`; remove `.keys()`

(SIM118)
cognee-mcp/src/server.py
48-48: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py
430-430: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
🪛 Biome (1.9.4)
cognee-frontend/src/modules/ingestion/DataView/DataView.tsx
[error] 36-36: Do not shadow the global "DataView" property.
Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.
(lint/suspicious/noShadowRestrictedNames)
🪛 LanguageTool
examples/data/car_and_tech_companies.txt
[duplication] ~2-~2: Possible typo: you repeated a word.
Context: text_1 = """ 1. Audi Audi is known for its modern designs and adv...
(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~5-~5: Possible typo: you repeated a word.
Context: ...ns to high-performance sports cars. 2. BMW BMW, short for Bayerische Motoren Werke, is...
(ENGLISH_WORD_REPEAT_RULE)
[style] ~6-~6: Consider using a more concise synonym.
Context: ... reflects that commitment. BMW produces a variety of cars that combine luxury with sporty pe...
(A_VARIETY_OF)
[duplication] ~8-~8: Possible typo: you repeated a word.
Context: ...ine luxury with sporty performance. 3. Mercedes-Benz Mercedes-Benz is synonymous with luxury and quality. ...
(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~11-~11: Possible typo: you repeated a word.
Context: ... catering to a wide range of needs. 4. Porsche Porsche is a name that stands for high-performa...
(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~14-~14: Possible typo: you repeated a word.
Context: ...o value both performance and style. 5. Volkswagen Volkswagen, which means "people's car" in German, ...
(ENGLISH_WORD_REPEAT_RULE)
[grammar] ~17-~17: The plural determiner ‘these’ does not agree with the singular noun ‘car’.
Context: ...nce practicality with quality. Each of these car manufacturer contributes to Germany's r...
(THIS_NNS)
[uncategorized] ~17-~17: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...cality with quality. Each of these car manufacturer contributes to Germany's reputation as ...
(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)
[duplication] ~21-~21: Possible typo: you repeated a word.
Context: ...design excellence. """ text_2 = """ 1. Apple Apple is renowned for its innovative consumer...
(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~27-~27: Possible typo: you repeated a word.
Context: ... in shaping the internet landscape. 3. Microsoft Microsoft Corporation has been a dominant force i...
(ENGLISH_WORD_REPEAT_RULE)
[style] ~28-~28: Consider using a synonym to be more concise.
Context: ...n both business and personal computing. In recent years, Microsoft has expanded into cloud comp...
(IN_RECENT_STYLE)
[uncategorized] ~31-~31: You might be missing the article “the” here.
Context: ...or innovation continues to reshape both retail and technology sectors. 5. Meta Meta, ...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
[duplication] ~33-~33: Possible typo: you repeated a word.
Context: ...both retail and technology sectors. 5. Meta Meta, originally known as Facebook, revoluti...
(ENGLISH_WORD_REPEAT_RULE)
🔇 Additional comments (119)
cognee/tests/unit/modules/retrieval/summaries_retriever_test.py (1)
130-130: **Good parameter naming alignment with the implementation.**

The change from `limit=20` to `top_k=20` correctly aligns the test with the actual implementation of `SummariesRetriever`, which accepts `top_k` as a constructor parameter and then uses it internally as the `limit` parameter when calling the vector engine. This change is part of the broader standardization of parameter naming conventions across retrieval modules.

cognee/tests/unit/modules/retrieval/graph_completion_retriever_test.py (1)
limit=20totop_k=20correctly aligns the test with the actual implementation ofSummariesRetriever, which acceptstop_kas a constructor parameter and then uses it internally as thelimitparameter when calling the vector engine. This change is part of the broader standardization of parameter naming conventions across retrieval modules.cognee/tests/unit/modules/retrieval/graph_completion_retriever_test.py (1)
157-162: Good refactoring of test execution pattern!Converting individual test calls to a sequential async main function improves test organization and efficiency. This approach:
- Ensures tests run in a predictable order
- Uses a single event loop for all tests rather than creating/destroying multiple loops
- Follows the async/await pattern consistently throughout the test file
- Matches the pattern used in other retrieval test modules
cognee/tests/unit/modules/retrieval/chunks_retriever_test.py (4)
19-19: Good test isolation improvementUsing a test-specific directory path (
test_chunks_context_simple) provides better isolation between test cases and prevents data leakage between tests.Also applies to: 23-23
76-77: Good test isolation improvementUsing a test-specific directory path (
test_chunk_context_complex) ensures that this test's data doesn't interfere with other tests. This also makes debugging easier when a specific test fails.Also applies to: 80-81
165-166: Good test isolation improvementUsing a dedicated directory path (
test_chunk_context_empty) for the empty graph test ensures accurate testing of the empty state behavior without interference from other test data.Also applies to: 169-170
193-198: Good refactoring of test execution pattern!Converting multiple individual test calls to a single async main function is an improvement that:
- Ensures consistent use of the async/await pattern
- Prevents multiple event loop creation/destruction cycles
- Makes test execution more predictable and maintainable
- Aligns with the pattern used in other retrieval test modules
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
72-72: Consistency fix in comment formatting.Fixed apostrophe style in comment from curly to straight, improving code consistency.
cognee/modules/pipelines/models/PipelineRun.py (1)
9-9: Verify database migration for new enum member.Adding
DATASET_PROCESSING_INITIATEDtoPipelineRunStatusmay require an Alembic migration if using a PostgreSQLENUMtype. Please confirm that your migration scripts include this new member, or update the type with anALTER TYPEstatement.cognee-frontend/src/utils/fetch.ts (1)
4-4: **LGTM on URL standardization**

Changing from IP address to hostname is a good standardization practice. The functionality remains the same as both "127.0.0.1" and "localhost" resolve to the local machine.
For future improvements, consider using environment variables for the API base URL:
```diff
- return global.fetch('http://localhost:8000/api' + url, {
+ return global.fetch((process.env.NEXT_PUBLIC_API_BASE_URL || 'http://localhost:8000/api') + url, {
```

This would make the application more configurable across different environments.
cognee/tests/test_weaviate.py (1)
85-85: **LGTM on adding await keyword**

The addition of the `await` keyword is necessary and correct since the `list_all()` method has been converted to an asynchronous function as part of the broader WeaviateAdapter refactoring.
cognee-frontend/src/app/wizard/WizardPage.tsx (1)
32-32: LGTM on standardizing dataset identificationChanging from
idtonamealigns with the broader standardization of dataset identification throughout the frontend codebase.This change is consistent with updates in related components like CognifyStep and ExploreStep, which now expect a dataset prop with a
nameproperty instead of anidproperty.cognee/api/v1/cognify/cognify.py (1)
37-39: Pipeline naming standardization looks goodThe addition of the
pipeline_name="cognify_pipeline"parameter helps standardize pipeline identification across the system, allowing for better tracking and filtering of pipeline executions. This aligns well with the changes in the datasets module where the same pipeline name is used for status retrieval.cognee-frontend/src/ui/Partials/Explorer/Explorer.tsx (1)
10-10: Dataset identification change is aligned with broader architecture updatesChanging the
datasetproperty from{ id: string }to{ name: string }reflects a shift in how datasets are identified throughout the application. This is consistent with similar changes in other frontend components. The Explorer component correctly handles this change since it only passes the dataset object to other functions without directly accessing specific properties.examples/python/graphiti_example.py (1)
41-41: Pipeline execution with user context looks goodThe updated
run_taskscall with the user parameter aligns with the broader updates to the pipeline task execution system.cognee-frontend/src/app/wizard/ExploreStep/ExploreStep.tsx (2)
5-5: LGTM! Dataset identification unified by name.The dataset prop type change from
{ id: string }to{ name: string }aligns with the broader frontend refactoring to standardize dataset identification by name rather than ID.
11-11: Compatible with corresponding Explorer component changes.Removed the non-null assertion operator (
!) when passing the dataset prop to the Explorer component, which is consistent with the updated type system and how Explorer handles the dataset prop.README.md (2)
61-61: LGTM! Updated Colab notebook link.The Google Colab notebook link has been updated to a new URL, ensuring users have access to the latest getting started materials.
128-135: Great addition highlighting the new UI capabilities. The new "cognee UI" section enhances the README by introducing the visual interface, which aligns with the broader frontend improvements in this PR. The screenshot and local setup link make it easy for users to get started with the UI.
cognee/api/v1/add/add.py (1)
2-6: Improved import organization. The reordering of imports provides better logical grouping of related modules, improving code readability:
- Pipeline imports grouped together
- User model below Task
- Task operations below pipeline definition
No functional changes were made to the code logic, making this a safe improvement.
cognee/infrastructure/llm/openai/adapter.py (2)
5-5: Simplified imports. Removed unused imports and kept only the essential `Type` import from the typing module, improving code cleanliness.
18-20: Centralized observability implementation. Replaced conditional import logic with a centralized approach using the new `get_observe()` function. This change:
- Improves code consistency across different LLM adapters
- Centralizes monitoring tool configuration logic
- Makes the code more maintainable by reducing duplicated conditional logic
This is a good architectural improvement that aligns with the broader observability refactoring in the codebase.
cognee/modules/retrieval/graph_completion_retriever.py (1)
78-78: Error handling pattern has changed.
✅ Verification successful
The try-except block catching `EntityNotFoundError` has been removed. Ensure that error handling for entity-not-found scenarios is now properly managed in the `get_triplets` method, or that this error condition is no longer relevant for this code path.
🏁 Script executed:
```bash
#!/bin/bash
# Search for EntityNotFoundError handling in the get_triplets method or brute_force_triplet_search
rg -A 5 "EntityNotFoundError" cognee/modules/retrieval/utils/brute_force_triplet_search.py
```
Length of output: 593
🏁 Script executed:
```bash
#!/bin/bash
echo "1. Locating get_triplets definition in graph_completion_retriever.py"
rg -n "def get_triplets" cognee/modules/retrieval/graph_completion_retriever.py

echo "2. Showing the first 200 lines of graph_completion_retriever.py for context"
sed -n '1,200p' cognee/modules/retrieval/graph_completion_retriever.py

echo "3. Searching for brute_force_triplet_search usage in graph_completion_retriever.py"
rg -n "brute_force_triplet_search" cognee/modules/retrieval/graph_completion_retriever.py
```
Length of output: 5994
Error handling for missing entities is covered upstream. The `EntityNotFoundError` is caught and suppressed within `brute_force_triplet_search`, so there is no need for an additional try-except in `get_triplets`.
cognee-frontend/src/modules/exploration/getExplorationGraphUrl.ts (1)
3-3: Breaking change: parameter type changed from `id` to `name`.
✅ Verification successful
The function parameter has been changed from `{ id: string }` to `{ name: string }`. This is part of a broader refactor to use dataset names instead of IDs. Ensure all callers of this function have been updated to provide a `name` property instead of an `id`.
🏁 Script executed:
```bash
#!/bin/bash
# Search for instances where getExplorationGraphUrl is called to verify they use name
rg -A 2 "getExplorationGraphUrl" cognee-frontend/src
```
Length of output: 1050
Verified: `getExplorationGraphUrl` callers updated
- The only invocation found, in cognee-frontend/src/ui/Partials/Explorer/Explorer.tsx, now passes the full `dataset` object (which includes `name`).
- No other call sites of `getExplorationGraphUrl` remain in the codebase.
No further changes required.
cognee/tasks/ingestion/ingest_data.py (1)
107-107: LGTM! The change correctly passes the full `user` object instead of just the user ID to the `create_dataset` function, which aligns with the updated function signature that now expects a `User` object as the second parameter.
cognee/infrastructure/databases/graph/graph_db_interface.py (3)
61-61: Logging level changed from error to debug. The change from `logger.error` to `logger.debug` for relationship errors reduces log noise for non-critical issues that are already handled with a rollback. This is good practice, as these exceptions are caught and handled properly within the code flow.
81-81: Consistent logging level adjustment for edge relationships. As with the previous change, the logging level for edge relationship errors has been appropriately reduced from error to debug, keeping the treatment of these handled exceptions consistent.
88-88: Session commit errors now logged at debug level. Session commit errors are now logged at debug level instead of error level, in line with the overall approach of reducing noise in error logs for handled exceptions.
cognee/modules/graph/cognee_graph/CogneeGraph.py (1)
131-135: API call updated to use the new unified search interface.
❓ Verification inconclusive
The code has been modernized to use the `search` method with explicit keyword arguments instead of the deprecated `get_distance_from_collection_elements` method. This change aligns with broader refactoring across vector database adapters toward a unified search interface.
The parameter `limit=0` might need verification: a limit of 0 means "no results" in some systems, while in others it means "no limit".
🏁 Script executed:
```bash
#!/bin/bash
# Check if the search method implementation treats limit=0 appropriately
rg -A 10 "async def search" --type py cognee/infrastructure/databases/vector/
```
Length of output: 7548
Inspecting how each adapter handles a `limit=0` parameter, to confirm that it either returns all results or is treated safely; scanning all `search` implementations now.
Confirm `limit=0` semantics across all adapters
The switch from the deprecated `get_distance_from_collection_elements` to the unified `search()` interface looks correct. However, passing `limit=0` can mean "no results" in some clients rather than "unlimited." Please verify that each adapter treats `limit=0` as intended (e.g., returns all matches or a sensible default) and adjust if needed:
• cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py
• cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py
• cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py
• cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py
• cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py
• cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py
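One low-risk way to make the semantics uniform is a small guard applied in each adapter before calling the underlying client; the function name and placement below are illustrative only:

```python
from typing import Optional

def normalize_limit(limit: int) -> Optional[int]:
    """Treat non-positive limits as 'no limit' so every adapter behaves the same."""
    return limit if limit > 0 else None

# Inside an adapter's search(), before delegating to the client:
# results = client.search(..., limit=normalize_limit(limit))
```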
cognee/modules/settings/get_settings.py (2)
2-2: Added import for the Optional type. The import of `Optional` from the typing module has been added to support the type changes in the `LLMConfig` model.
24-25: LLMConfig fields made optional.
✅ Verification successful
The `endpoint` and `api_version` fields in the `LLMConfig` model have been changed from required strings to optional strings. This provides more flexibility in configuration, especially for LLM providers that might not require these fields. Make sure that any code using these fields handles the case where they might be None.
🏁 Script executed:
```bash
#!/bin/bash
# Check usages of endpoint and api_version to ensure they handle None values
rg -A 5 -B 5 "endpoint|api_version" --type py cognee/infrastructure/llm/
```
Length of output: 15993
🏁 Script executed:
```bash
#!/bin/bash
# Display the first 200 lines of the settings file to understand context
sed -n '1,200p' cognee/modules/settings/get_settings.py
```
Length of output: 4448
Approved: optional `endpoint`/`api_version` in the settings LLMConfig are safe
The `endpoint` and `api_version` fields in cognee/modules/settings/get_settings.py are only used to shape the settings payload returned to the UI. Since they are now `Optional[str]`, Pydantic's `BaseModel` will accept None and emit null in the JSON; there is no dereferencing or other logic applied to these values elsewhere in the codebase. No further changes are required.
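As a rough sketch of the shape this implies — the field set is trimmed to the ones discussed, and the defaults are assumptions rather than the file's actual contents:

```python
from typing import Optional
from pydantic import BaseModel

class LLMConfig(BaseModel):
    model: str
    endpoint: Optional[str] = None      # omitted by providers with a fixed URL
    api_version: Optional[str] = None   # e.g. Azure-style APIs need it, others don't

# LLMConfig(model="gpt-4o-mini") now validates,
# serializing the missing fields as null in the JSON payload.
```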
cognee-frontend/src/app/wizard/CognifyStep/CognifyStep.tsx (1)
10-10: Dataset identification changed from id to name.
✅ Verification successful
The `dataset` prop type has been updated from `{ id: string }` to `{ name: string }`, which aligns with a broader standardization of dataset identification across the application. This change ensures consistent dataset referencing throughout the frontend and backend pipeline.
Verify that the `cognifyDataset` function properly handles this change in the dataset identification scheme:
🏁 Script executed:
```bash
#!/bin/bash
# Check the implementation of cognifyDataset to ensure it uses dataset.name
cat src/modules/datasets/cognifyDataset.ts
```
Length of output: 115
🏁 Script executed:
```bash
#!/usr/bin/env bash
# Locate the implementation of cognifyDataset to verify it uses dataset.name
rg -n "cognifyDataset" .
```
Length of output: 584
🏁 Script executed:
```bash
#!/usr/bin/env bash
# Display the implementation of cognifyDataset to confirm it references dataset.name
sed -n '1,200p' cognee-frontend/src/modules/datasets/cognifyDataset.ts
```
Length of output: 427
Dataset identification change verified in cognifyDataset
The `dataset` prop has been updated to `{ name: string }` in CognifyStep.tsx, and in src/modules/datasets/cognifyDataset.ts we see:
```ts
body: JSON.stringify({
  datasets: [dataset.id || dataset.name],
}),
```
This ensures that when `id` is absent, `name` is used. No further changes are needed.
cognee/modules/observability/observers.py (1)
4-9: LGTM! Good use of a string-based enum. The Observer enum inheriting from `str` and `Enum` is a good choice, allowing direct string comparisons without explicit conversions. The enumeration is clear and concise.
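A minimal sketch of the pattern — the member values below are assumptions, based on the observer names mentioned later in this review:

```python
from enum import Enum

class Observer(str, Enum):
    LANGFUSE = "langfuse"
    LLMLITE = "llmlite"
    LANGSMITH = "langsmith"

# Members compare equal to plain strings, no .value access needed:
assert Observer.LANGFUSE == "langfuse"
```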
cognee/api/v1/datasets/datasets.py (1)
31-31: LGTM! Correct pipeline name specification. The update correctly specifies the pipeline name parameter when calling `get_pipeline_status`, which aligns with the changes in the pipeline operations module. This maintains the same interface for callers while adapting to the enhanced pipeline management system.
cognee/tests/test_neo4j.py (1)
90-90: LGTM! Correctly updated assertion. The assertion has been properly updated to reflect the removal of the natural-language search test. The expected history length is now 6 instead of 8, which is consistent with the commented-out search operation.
cognee/base_config.py (2)
5-5: Consistent enum migration for observability tools. The import change to use the new `Observer` enum aligns with the broader refactoring of observability tooling across the codebase.
11-11: Type consistency maintained in the monitoring tool default. The change from `MonitoringTool.LANGFUSE` to `Observer.LANGFUSE` preserves the same functionality while using the newly defined enum from the observability module.
cognee/modules/pipelines/operations/get_pipeline_status.py (2)
8-8: Function signature updated to support pipeline-specific filtering. Adding the `pipeline_name` parameter allows callers to specify which pipeline's status to retrieve, improving the query's specificity.
23-23: Enhanced query filtering by pipeline name. This new filter restricts status retrieval to runs of the specified pipeline, improving the accuracy of pipeline status reporting.
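A hedged sketch of what the filtered query could look like with SQLAlchemy — the model, column names, and the `get_async_session` helper are assumptions, not the module's actual code:

```python
from sqlalchemy import select

async def get_pipeline_status(dataset_ids, pipeline_name: str):
    async with get_async_session() as session:  # assumed session factory
        result = await session.execute(
            select(PipelineRun)
            .where(PipelineRun.dataset_id.in_(dataset_ids))
            .where(PipelineRun.pipeline_name == pipeline_name)  # the new filter
            .order_by(PipelineRun.created_at.desc())
        )
        return {str(run.dataset_id): run.status for run in result.scalars()}
```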
cognee-frontend/src/app/page.tsx (2)
46-46: Improved user guidance in the notification message. The updated notification clearly informs users of the next step after adding data.
106-106: UI enhancement for direct cognification. Adding the `onCognify` prop to `DataView` enables the new UI workflow for triggering dataset cognification directly from the data view.
cognee/infrastructure/llm/gemini/adapter.py (2)
6-7: Well-structured imports for improved observability. The centralized import of the observability module is a good approach, aligning with the broader refactoring of observability tooling across the codebase.
17-17: Good centralization of observability configuration. Replacing conditional imports with a centralized function call improves maintainability and consistency, and makes it easier to switch monitoring tools in the future.
entrypoint.sh (4)
16-16: Direct Alembic usage aligned with dependency management changes. This change aligns with the shift from Poetry to uv for dependency management, simplifying the migration command.
25-25: Helpful migration completion message. Adding this message improves visibility into the startup process.
27-27: Simplified startup message. The more generic "Starting server..." message is appropriate, since it no longer refers specifically to Gunicorn.
36-37: Removal of `exec` might affect signal handling in containers
❓ Verification inconclusive
Removing `exec` from the server startup commands changes the process hierarchy and signal propagation. In containerized environments, this might affect how signals (like SIGTERM) are handled.
Also applies to: 38-39, 41-42
🏁 Script executed:
```bash
#!/bin/bash
# This script checks if the application is running in a container environment
# and warns about potential signal handling implications
if [ -f /.dockerenv ] || grep -q docker /proc/self/cgroup 2>/dev/null || grep -q kubernetes /proc/self/cgroup 2>/dev/null; then
  echo "WARNING: Application appears to be running in a container environment"
  echo "Without 'exec' prefix, signals may not propagate correctly to the application process."
  echo "This could affect graceful shutdown during container orchestration."
fi

# Check if container orchestration tools are being used
if [ -n "$KUBERNETES_SERVICE_HOST" ] || [ -n "$DOCKER_HOST" ]; then
  echo "Container orchestration detected. Signal handling is particularly important."
fi
```
Length of output: 130
Add `exec` to server startup for correct signal handling in containers
To ensure the startup process becomes PID 1 and properly receives SIGTERM/SIGINT in Docker/Kubernetes (allowing graceful shutdown), prepend `exec` to your debugpy/gunicorn commands:
• entrypoint.sh, lines 36–37, 38–39, 41–42
• Example diff for the debugpy+gunicorn invocation:
```diff
-debugpy --wait-for-client --listen 0.0.0.0:5678 \
-  -m gunicorn -w 3 -k uvicorn.workers.UvicornWorker \
-  -t 30000 --bind=0.0.0.0:8000 --log-level debug \
-  --reload cognee.api.client:app
+exec debugpy --wait-for-client --listen 0.0.0.0:5678 \
+  -m gunicorn -w 3 -k uvicorn.workers.UvicornWorker \
+  -t 30000 --bind=0.0.0.0:8000 --log-level debug \
+  --reload cognee.api.client:app
```
Please verify that graceful shutdown behavior in your container orchestration works as expected.
cognee/modules/pipelines/operations/log_pipeline_run_initiated.py (1)
16-21: LGTM: Database session handling follows best practices. Using an async context manager for the database session follows best practices for asynchronous database operations.
cognee/infrastructure/databases/graph/get_graph_engine.py (1)
70-80: Good implementation of Memgraph support. The addition of Memgraph support follows the established pattern of the factory function, maintaining consistency with other database providers.
notebooks/cognee_demo.ipynb (3)
470-470: Import path updated to match the updated module structure. The import path for the `Task` class has been updated to reflect the lowercase module naming convention (`task` instead of `Task`), aligning with Python's standard naming practices.
508-508: Parameter added to support user context in pipeline execution. The `run_tasks` function call has been updated to include the `user` parameter, which aligns with the broader architectural changes for context propagation in task execution. This change correctly implements the updated function signature.
532-536: Explicit user context retrieval implemented. The code now explicitly retrieves the user context in two steps:
- Getting the default user
- Fetching the complete user object with the default user's ID
This approach aligns with the codebase's architectural changes to standardize user context handling across the application.
cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (2)
81-82: Improved user experience with immediate input clearing. Clearing the input immediately after the search is submitted, rather than waiting for the response, gives users better feedback that their query has been registered.
108-110: Good error handling to preserve user input. Adding error handling to restore the input value if the fetch request fails is an excellent improvement, preventing users from losing their search query when errors occur.
cognee/api/v1/cognify/code_graph_pipeline.py (4)
5-7: Improved observability module organization. Reorganizing the imports for observability-related modules enhances code organization. The `get_observe` import from a dedicated module suggests a more centralized approach to observability management.
16-16: Added dataset ID generation capability. Adding the import for `get_unique_dataset_id` aligns with the asynchronous dataset ID generation approach and enables user-specific dataset IDs.
25-25: Centralized observability configuration. Replacing conditional imports with a centralized call to `get_observe()` improves code maintainability and provides a cleaner abstraction for observability tooling.
68-68: Enhanced dataset ID generation with user context. Switching to an asynchronous call to `get_unique_dataset_id` with user context is a good improvement that allows for user-specific dataset IDs and better integration with the user management system.
cognee/infrastructure/databases/vector/exceptions/exceptions.py (1)
9-14: Improved exception handling and logging configuration. The changes to the `CollectionNotFoundError` class enhance error handling by:
- Making the default exception name consistent with the class name
- Adding flexible logging control parameters
- Updating the superclass constructor appropriately
This approach aligns with best practices for exception handling and allows for more granular control over error reporting.
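A sketch of the pattern being described — the constructor parameters are paraphrased from this review, not copied from the source file:

```python
import logging

logger = logging.getLogger(__name__)

class CollectionNotFoundError(Exception):
    def __init__(self, message: str, name: str = "CollectionNotFoundError",
                 log: bool = True, log_level: str = "DEBUG"):
        self.message = message
        self.name = name
        if log:
            # Route to the requested level; fall back to error for unknown names.
            getattr(logger, log_level.lower(), logger.error)("%s: %s", name, message)
        super().__init__(message)
```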
notebooks/cognee_graphiti_demo.ipynb (4)
18-18: Fixed typo in markdown. A simple readability improvement in the notebook documentation.
27-29: Updated imports for required modules. Adding the `asyncio` import and updating the other imports to include `get_llm_client` and `get_default_user` properly prepares the notebook for the subsequent code changes.
Also applies to: 42-43
131-137: Integrated user context initialization. Simplifying the pruning call and adding the default user initialization is a good improvement that aligns with the broader changes to incorporate user context throughout the system.
143-148: Updated task execution with user context. Passing the user object to `run_tasks` properly integrates the user context into the pipeline execution, ensuring consistent handling of user-related data throughout the system.
cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py (2)
1-47: Well-structured test for the new context propagation feature. This test effectively validates that contextual information can be passed through pipeline tasks. The calculation (starting with 5, adding 7, multiplying by 2, then raising to the power of 7) correctly results in the expected value of 4,586,471,424.
14-21: Clear task definitions with proper context usage. Good implementation of the test tasks with varied parameter patterns:
- task_1 uses both input and context
- task_2 only uses input
- task_3 uses input and context again
This variety helps validate that the context is properly maintained throughout the pipeline execution regardless of whether individual tasks use it; the arithmetic is sketched below.
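The task names and the context shape here are illustrative, not the test's actual code — they just trace the arithmetic the review describes:

```python
context = {"add": 7, "power": 7}

def task_1(num, context):           # uses input and context
    return num + context["add"]     # 5 + 7 = 12

def task_2(num):                    # uses input only
    return num * 2                  # 12 * 2 = 24

def task_3(num, context):           # uses input and context again
    return num ** context["power"]  # 24 ** 7 = 4_586_471_424

assert task_3(task_2(task_1(5, context)), context) == 4_586_471_424
```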
cognee/modules/data/methods/create_dataset.py (3)
7-8: Appropriate imports for the new implementation. These imports correctly support the updated function signature, which now accepts a User object instead of just an ID.
11-12: Good refactoring to accept full User objects. This change aligns with the broader architectural improvements to propagate context throughout the application. Extracting the owner_id from the user object maintains compatibility with the existing database queries.
24-26: Improved dataset ID generation with proper modularization. Moving the ID generation logic to a separate async helper function improves code organization while maintaining the same uniqueness guarantees (based on dataset name and user ID).
cognee/exceptions/exceptions.py (2)
15-16: Good addition of logging flexibility parameters. These new parameters allow callers to control whether, and at what level, exceptions are logged, enhancing observability control.
23-30: Improved conditional logging based on log level. The implementation properly handles different log levels, providing greater flexibility in how exceptions are reported.
cognee-frontend/src/modules/ingestion/DataView/DataView.tsx (3)
9-10: Appropriate UI component imports. The added imports support the new Cognify functionality with proper UI components.
33-33: Well-defined prop interface for the new cognify functionality. The Promise return type allows proper async handling of the cognify operation.
80-84: Good UI integration of the Cognify button. The button is well positioned next to related actions and has appropriate disabled-state handling.
notebooks/graphrag_vs_rag.ipynb (5)
56-57: Updated cognee package version. Updated to the newer cognee package version (0.1.39), ensuring compatibility with the latest features and API changes.
152-153: Updated import path and API usage pattern. Changed the import of SearchType from `cognee.modules.search.types` to `cognee.api.v1.search` and updated the function call to use keyword arguments. This reflects a move to a more stable and explicit API.
173-173: Fixed search type constant usage. Correctly uses SearchType.RAG_COMPLETION instead of the previous SearchType.COMPLETION, which is more explicit and accurately describes the search functionality being used.
202-202: Updated search function call to use keyword arguments. Standardizing the search call on keyword arguments improves code clarity and alignment with the updated API.
246-248: Updated kernel metadata for consistency. The notebook's kernel metadata is now consistent and compatible with current Python environments, which is good practice for notebook portability.
cognee/modules/pipelines/operations/run_tasks_base.py (1)
17-17: Good implementation of context propagation. The addition of the optional `context` parameter, with inspection-based conditional passing, provides a flexible way to propagate context through the task execution pipeline. This approach maintains backward compatibility while enabling more sophisticated inter-task communication; a sketch follows below.
Also applies to: 31-36, 66-66, 80-81
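A minimal sketch of inspection-based conditional passing, simplified relative to the real pipeline code:

```python
import inspect

async def run_task(executable, args, context=None):
    """Pass `context` only to callables whose signature declares it."""
    params = inspect.signature(executable).parameters
    if context is not None and "context" in params:
        return await executable(*args, context=context)
    return await executable(*args)
```

This keeps older tasks working untouched: a task that never declares `context` is simply called without it.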
Dockerfile (1)
1-61: Excellent improvement with multi-stage build and modern dependency management. The refactored Dockerfile adopts several best practices:
- Multi-stage build to minimize the final image size
- Using `uv` for faster dependency management instead of Poetry
- Proper layer caching with separated dependency installation
- Single RUN command for system dependencies with apt cache cleanup
- Mount caching for `uv` operations to speed up builds
These changes should result in faster builds, smaller images, and better maintainability.
cognee/tests/test_memgraph.py (1)
14-108: Well-structured integration test for Memgraph. This test provides comprehensive coverage of the Memgraph adapter functionality, including configuration, data loading, cognification, search operations across multiple query types, and proper cleanup. The test isolation with custom paths is a good practice.
cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py (3)
122-125: Good error handling improvement. Adding specific exception handling for `CollectionNotExistException` and raising a custom `CollectionNotFoundError` provides clearer error information and aligns with error-handling standardization across adapters. This makes debugging and error recovery more straightforward.
Also applies to: 161-164, 211-214
148-148: Type annotation refinement from str to UUID. Changing the `data_point_ids` type from `list[str]` to `list[UUID]` improves type safety and makes the interface more precise, helping catch potential type errors earlier in development.
Also applies to: 236-236
176-176: Search behavior improvements. The increased default limit (5 → 15) provides more comprehensive search results, and handling non-positive limits by passing `None` is a robust approach that prevents unexpected behavior.
Also applies to: 196-196
cognee/modules/pipelines/operations/pipeline.py (1)
cognee/modules/pipelines/operations/pipeline.py (1)
142-155:get_pipeline_statusreturn-type assumption can break the status checks
task_statusis assumed to be a mapping (dict-like) whose keys arestr(dataset_id), yet the function call on line 142 provides a list of dataset IDs and—depending on its implementation—could plausibly return:
- a
list[PipelineRunStatus],- a
dict[str, PipelineRunStatus], or- something else entirely.
If it returns a non-mapping type, the subsequent containment check (
if str(dataset_id) in task_status:) will raiseTypeErroror always evaluate toFalse, allowing duplicate pipeline executions.Please double-check
get_pipeline_status’s signature and update the logic accordingly, or cast the result to the expected structure.cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)
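Until the return type is confirmed, a defensive normalization keeps the containment check safe. The `run.dataset_id` and `run.status` attribute names below are assumptions:

```python
task_status = await get_pipeline_status(dataset_ids, pipeline_name="cognify_pipeline")

if not isinstance(task_status, dict):
    # e.g. a list of runs -> index it by dataset id before membership checks
    task_status = {str(run.dataset_id): run.status for run in task_status}

if str(dataset_id) in task_status:
    ...  # skip re-running the pipeline for this dataset
```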
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)
186-188: Great! Raising `CollectionNotFoundError` improves diagnostics. Explicitly surfacing a typed exception with a controllable log level is a solid improvement over returning `None` or raising generic errors.
No further action required.
cognee/infrastructure/databases/graph/networkx/adapter.py (1)
139-142: Edge retrieval logic looks good. The switch to `UUID` propagates correctly into the `in_edges`/`out_edges` queries; behaviour is intact.
cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (2)
75-81: `collections.get` may require `await`; double-check the SDK. The async SDK exposes both sync and async variants. If `client.collections.get()` returns a coroutine, omitting `await` will hand back an un-awaited coroutine to downstream code. Please confirm with the SDK docs and apply `await` if necessary.
167-174: Limit parameter handled nicely. Raising the default search limit to 15 and forwarding `None` when `limit <= 0` is clean and keeps the API flexible.
cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1)
76-82: `await connection.open_table()`: confirm it really is a coroutine. `AsyncConnection.open_table` is async in current LanceDB (≥0.4), so awaiting is correct. If your pinned version is older, this will raise `TypeError: object is not awaitable`. Double-check the project's LanceDB version; downgrade compatibility might need:
```python
table = connection.open_table(collection_name)
if inspect.iscoroutine(table):
    table = await table
```
cognee/modules/retrieval/utils/brute_force_triplet_search.py (4)
4-6: Updated imports for improved error handling. Added imports for the `EntityNotFoundError` and `CollectionNotFoundError` exceptions, which are now used for more graceful error handling in the functions below. This aligns with the broader error-handling improvements across the codebase.
146-153: Extracted search function with improved error handling. Created a new helper function `search_in_collection` that encapsulates the vector search call and properly handles the case where a collection doesn't exist. This is a good refactoring that isolates error handling and makes the code more resilient.
156-157: Improved handling of empty search results. The code now uses the new helper function for searches and returns an empty list when no results are found, instead of potentially raising an exception. This provides more predictable behavior and a better API experience for callers.
Also applies to: 159-161
173-174: Added top-level error handling for collection not found. The function now catches `CollectionNotFoundError` at the top level and returns an empty list, further improving resilience. This ensures the function doesn't fail even if there are unexpected collection issues; a sketch of the helper's shape is below.
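Taken together, the helper described at 146-153 plus this catch can be sketched as follows — the exact signature is an assumption, not the file's actual code:

```python
async def search_in_collection(vector_engine, collection_name: str,
                               query_text: str, limit: int):
    try:
        return await vector_engine.search(
            collection_name=collection_name, query_text=query_text, limit=limit
        )
    except CollectionNotFoundError:
        # A missing collection simply contributes no candidate triplets.
        return []
```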
cognee/modules/pipelines/operations/run_tasks.py (4)
23-25: Context parameter added to enable task context propagation. The addition of the optional `context` parameter enhances the pipeline's capability by allowing contextual data to be passed through to tasks that need it.
41-41: Context propagation properly passed to run_tasks_base. The context is correctly forwarded to the underlying function, ensuring seamless propagation throughout the execution chain.
77-77: Context parameter added to the run_tasks function. Adding the optional context parameter here completes the context propagation path through the pipeline execution functions.
87-93: Updated function call with named parameters. The function call now uses named parameters, making the code more readable and explicitly showing the context parameter being passed. This is good practice when functions have multiple parameters.
cognee/tasks/temporal_awareness/index_graphiti_objects.py (6)
23-23: Empty params parameter added for consistency. Adding an explicit `params={}` ensures consistent parameter passing across all graph queries, which is important for standardization and compatibility with different graph database providers.
27-29: Query parameters standardized. The relationship name assignment and explicit empty params parameter follow the standardization pattern, ensuring uniform query interfaces across the codebase.
30-32: New query added with consistent parameter style. This new query ensures nodes have a text field by coalescing the summary and content fields, maintaining the standardized query pattern with explicit params.
34-34: Updated method for retrieving graph data. Changing from `get_model_independent_graph_data()` to `get_graph_data()` aligns with the new unified graph engine interface, which supports multiple graph database providers, including the new Memgraph support.
36-40: Simplified node data iteration. The iteration pattern has been improved to use key-value pairs from the graph data structure directly, making the code more readable and less prone to indexing errors.
64-65: Simplified edge relationship extraction. The code now directly accesses the relationship name at index 2 of each edge tuple, simplifying the previously more complex filtering and indexing logic. This makes the code more maintainable.
notebooks/cognee_llama_index.ipynb (2)
15-17: Notebook cell format standardization looks good. The source field for this code cell has been updated to use a list format instead of a single string, which aligns with standard Jupyter notebook JSON conventions. This is a purely structural change that doesn't affect code execution or functionality.
123-125: Structural field reordering approved. The metadata and outputs fields in this cell have been reordered, which is again just a format standardization that aligns with Jupyter notebook conventions. This change doesn't affect the code execution or functionality of the notebook.
notebooks/cognee_simple_demo.ipynb (9)
17-19: Package version upgrade to cognee 0.1.39. The cognee package has been updated from version 0.1.36 to 0.1.39. This appears to be part of the broader architectural changes mentioned in the PR summary.
13-16: Notebook metadata standardization looks good. The execution_count and outputs fields have been properly standardized for this cell. This change improves notebook compatibility without affecting functionality.
33-41: Cell metadata formatting standardized. The execution_count and outputs fields have been standardized for consistency with other notebook cells.
53-60: Cell metadata formatting standardized. The execution_count and outputs fields have been standardized for consistent notebook formatting.
72-80: Cell metadata formatting standardized. The execution_count and outputs fields have been standardized. These changes are structural only and don't affect the code's functionality.
92-98: Cell metadata formatting standardized. The execution_count and outputs fields have been standardized in this cell.
102-108: Cell metadata formatting standardized. The execution_count and outputs fields have been standardized in this cell.
112-118: Cell metadata formatting standardized. The execution_count and outputs fields have been standardized in this cell.
130-143: Cell metadata formatting standardized. The execution_count and outputs fields have been standardized in this visualization cell.
notebooks/llama_index_cognee_integration.ipynb (5)
6-8: Markdown cell formatting standardized. The source field for this markdown cell has been converted from a single string to a list containing that string, which is a standard Jupyter notebook format. This change doesn't affect functionality.
62-64: Code cell formatting standardized. The source field for this pip install cell has been converted to a list format, which aligns with Jupyter notebook standards. This is a formatting change only.
199-201: Markdown cell formatting standardized. The source field for this markdown cell has been converted to list format, following standard Jupyter notebook conventions.
220-221: Markdown cell formatting standardized. The source field for this concluding markdown cell has been converted to list format, aligning with Jupyter notebook conventions.
282-283: Python version updated in kernel metadata. The Python version in the language_info metadata has been updated to 3.12.9, ensuring the notebook is associated with the correct Python version.
```diff
 2. Create a Pull Request:
    - Go to the [**cognee** repository](https://github.com/topoteretes/cognee)
-   - Click "Compare & Pull Request" and make sure to open PR against dev branch
+   - Click "Compare & Pull Request"
```
Ensure PRs target the dev branch: The removal of the explicit instruction to open pull requests against the dev branch may lead contributors to accidentally target main. Please reintroduce guidance to verify that the base branch is set to dev when clicking "Compare & Pull Request".
🤖 Prompt for AI Agents
In CONTRIBUTING.md at line 100, reintroduce the instruction reminding
contributors to ensure their pull requests target the `dev` branch. Add a clear
note or step near the "Click 'Compare & Pull Request'" instruction that
explicitly tells users to verify the base branch is set to `dev` before
submitting the PR.
```python
from .get_datasets_by_name import get_datasets_by_name
from .get_dataset_data import get_dataset_data
from .get_data import get_data
from .get_unique_dataset_id import get_unique_dataset_id
```
🛠️ Refactor suggestion
Address unused import in __init__.py.
The import of get_unique_dataset_id is never referenced within this file, causing an F401 lint error. Since we want to expose it as part of the public API, add a `# noqa: F401` or define an `__all__` list.
For example, to suppress the warning (note the directive must sit on the import line itself):
```diff
-from .get_unique_dataset_id import get_unique_dataset_id
+from .get_unique_dataset_id import get_unique_dataset_id  # noqa: F401
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
from .get_unique_dataset_id import get_unique_dataset_id  # noqa: F401
```
🧰 Tools
🪛 Ruff (0.8.2)
10-10: .get_unique_dataset_id.get_unique_dataset_id imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
🤖 Prompt for AI Agents
In cognee/modules/data/methods/__init__.py at line 10, the import of
get_unique_dataset_id is unused and triggers an F401 lint error. To fix this,
either add a comment # noqa: F401 at the end of the import line to suppress the
warning or define an __all__ list including 'get_unique_dataset_id' to
explicitly declare it as part of the public API.
```python
async def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
    return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")
```
🛠️ Refactor suggestion
Unnecessary async declaration.
The function is declared as async but doesn't perform any asynchronous operations. Consider removing the async keyword unless you plan to add asynchronous operations later.
```diff
-async def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
+def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
     return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")
```
Additionally, consider adding input validation for empty dataset names, and check that the user and user.id are not None.
```diff
-async def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
-    return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")
+async def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
+    if not dataset_name:
+        raise ValueError("Dataset name cannot be empty")
+    if not user or not user.id:
+        raise ValueError("User or user ID cannot be None")
+    return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Without the async keyword:
```python
def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
    return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")
```
With input validation:
```python
async def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
    if not dataset_name:
        raise ValueError("Dataset name cannot be empty")
    if not user or not user.id:
        raise ValueError("User or user ID cannot be None")
    return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")
```
🤖 Prompt for AI Agents
In cognee/modules/data/methods/get_unique_dataset_id.py at lines 5 to 6, remove
the async keyword from the function declaration since it does not perform any
asynchronous operations. Additionally, add input validation to check that
dataset_name is not empty and that user and user.id are not None, raising
appropriate exceptions if these conditions are not met.
```python
def get_observe():
    monitoring = get_base_config().monitoring_tool

    if monitoring == Observer.LANGFUSE:
        from langfuse.decorators import observe

        return observe
```
Missing handling for other Observer types
The function only handles the case when monitoring == Observer.LANGFUSE, but doesn't handle the other enum values (LLMLITE, LANGSMITH) defined in the Observer enum. There's also no fallback implementation or error handling if an unsupported monitoring tool is configured.
I recommend adding handling for other observer types or at least a fallback:
```diff
 def get_observe():
     monitoring = get_base_config().monitoring_tool
     if monitoring == Observer.LANGFUSE:
         from langfuse.decorators import observe
         return observe
+    elif monitoring == Observer.LLMLITE:
+        # Import and return the LLMLITE observe decorator
+        pass
+    elif monitoring == Observer.LANGSMITH:
+        # Import and return the LANGSMITH observe decorator
+        pass
+    else:
+        # Provide a fallback or raise a meaningful error
+        raise ValueError(f"Unsupported monitoring tool: {monitoring}")
```
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee/modules/observability/get_observe.py around lines 5 to 11, the
get_observe function only handles the case when monitoring equals
Observer.LANGFUSE and lacks handling for other Observer enum values like LLMLITE
and LANGSMITH. To fix this, add conditional branches to handle these other
observer types appropriately, or provide a fallback implementation or raise a
clear error if the monitoring tool is unsupported, ensuring all enum cases are
covered.
```ts
const onCognify = useCallback(() => {
  const dataset = datasets.find((dataset) => dataset.id === selectedDataset);
  return onDatasetCognify({
    id: dataset!.id,
    name: dataset!.name,
  });
}, [datasets, onDatasetCognify, selectedDataset]);
```
🛠️ Refactor suggestion
Consider adding null check before accessing dataset properties.
The function doesn't validate that a dataset was found before accessing its properties with non-null assertions (!). This could cause runtime errors if selectedDataset doesn't match any dataset in the array.
```diff
 const onCognify = useCallback(() => {
   const dataset = datasets.find((dataset) => dataset.id === selectedDataset);
+  if (!dataset) {
+    showNotification("No dataset selected. Please select a dataset first.", 5000);
+    return Promise.reject(new Error("No dataset selected"));
+  }
   return onDatasetCognify({
-    id: dataset!.id,
-    name: dataset!.name,
+    id: dataset.id,
+    name: dataset.name,
   });
 }, [datasets, onDatasetCognify, selectedDataset]);
```
🤖 Prompt for AI Agents
In cognee-frontend/src/app/page.tsx around lines 63 to 69, the onCognify
function uses non-null assertions on dataset properties without checking if
dataset is null or undefined. To fix this, add a null check after finding the
dataset to ensure it exists before accessing its id and name. If dataset is not
found, handle the case appropriately, such as returning early or showing an
error, to prevent runtime errors.
```python
query = """
UNWIND $nodes AS node
MERGE (n {id: node.node_id})
ON CREATE SET n:node.label, n += node.properties, n.updated_at = timestamp()
ON MATCH SET n:node.label, n += node.properties, n.updated_at = timestamp()
RETURN ID(n) AS internal_id, n.id AS nodeId
"""
```
Dynamic labels cannot be parameterised – Cypher syntax error
n:node.label attempts to use a parameter for the label, which Cypher/Memgraph doesn’t allow. The query will raise a syntax error.
```diff
-MERGE (n {id: node.node_id})
-ON CREATE SET n:node.label, n += node.properties, n.updated_at = timestamp()
-ON MATCH SET n:node.label, n += node.properties, n.updated_at = timestamp()
+MERGE (n {id: node.node_id})
+WITH n, node
+CALL apoc.create.addLabels(n, [node.label]) YIELD node AS n2 // requires APOC; otherwise build separate queries per label
+SET n2 += node.properties,
+    n2.updated_at = timestamp()
```
+ n2.updated_at = timestamp()Alternatively, group nodes by label and execute one query per label.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 82 to 88, the Cypher query incorrectly uses a parameter for a node label
with syntax like n:node.label, which is not allowed. To fix this, refactor the
code to group nodes by their label and run separate queries for each label,
embedding the label directly in the query string instead of using a parameter.
This avoids dynamic label parameterization errors by executing one query per
label with the label hardcoded in the Cypher MERGE statement.
```python
predecessors, successors = await asyncio.gather(
    self.query(predecessors_query, dict(node_id=str(node_id))),
    self.query(successors_query, dict(node_id=str(node_id))),
)

connections = []

for neighbour in predecessors:
    neighbour = neighbour["relation"]
    connections.append((neighbour[0], {"relationship_name": neighbour[1]}, neighbour[2]))

for neighbour in successors:
    neighbour = neighbour["relation"]
    connections.append((neighbour[0], {"relationship_name": neighbour[1]}, neighbour[2]))
```
get_connections builds invalid tuples – relation is not subscriptable
neighbour = neighbour["relation"] returns a neo4j.graph.Relationship object; indexing it (neighbour[0]) will raise TypeError. Consider extracting properties explicitly:
```python
for rec in predecessors:
    rel = rec["relation"]
    connections.append((rel.start_node["id"],
                        {"relationship_name": rel.type},
                        rel.end_node["id"]))
```
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 380 to 394, the code incorrectly treats the 'relation' object as
subscriptable, causing a TypeError. To fix this, avoid indexing the 'relation'
object directly; instead, explicitly extract the start node ID, relationship
type, and end node ID using the appropriate properties or methods of the
Relationship object. Also, rename the inner loop variable to prevent shadowing
the outer 'neighbour' variable.
```python
sanitized_id = node_id.replace(":", "_")

query = "MATCH (node: {{id: $node_id}}) DETACH DELETE node"
params = {"node_id": sanitized_id}
```
delete_node Cypher syntax error (: before {)
MATCH (node: {{id: $node_id}}) results in MATCH (node: {id: $node_id}), which is invalid. The colon introduces a label, not properties. Replace with:
```diff
-query = "MATCH (node: {{id: $node_id}}) DETACH DELETE node"
+query = "MATCH (node {id: $node_id}) DETACH DELETE node"
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
sanitized_id = node_id.replace(":", "_")

query = "MATCH (node {id: $node_id}) DETACH DELETE node"
params = {"node_id": sanitized_id}
```
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 120 to 124, the Cypher query syntax is incorrect because it uses a colon
before a property map, which is invalid. Replace the query string to remove the
colon and use parentheses with a property map directly, like MATCH (node {id:
$node_id}), to correctly match nodes by property.
```cypher
UNWIND $node_ids AS id
MATCH (node:`{id}`)<-[r:{edge_label}]-(successor)
DELETE r;
```
Incorrect f-string placeholders introduce labels named after IDs
MATCH (node:`{id}`)<-[r:{edge_label}]-(successor)
produces labels like :123e4567-e89b... which is almost certainly unintended. Use a property match instead:
```diff
-MATCH (node:`{id}`)<-[r:{edge_label}]-(successor)
+MATCH (node {id: id})<-[r:`{edge_label}`]-(successor)
```
(The property value should come from the `UNWIND $node_ids AS id` alias, not be used as a label; again, consider validating edge_label and using parameterised Cypher.)
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 415 to 418, the Cypher query incorrectly uses f-string placeholders to
insert node IDs as labels, resulting in unintended labels like
:123e4567-e89b.... Replace the label match with a property match on the node,
such as matching nodes by an ID property instead of using the ID as a label.
Also, validate the edge_label input and use parameterized Cypher queries to
prevent injection risks.
```python
def serialize_properties(self, properties=dict()):
    serialized_properties = {}

    for property_key, property_value in properties.items():
        if isinstance(property_value, UUID):
            serialized_properties[property_key] = str(property_value)
            continue

        if isinstance(property_value, dict):
            serialized_properties[property_key] = json.dumps(property_value, cls=JSONEncoder)
            continue

        serialized_properties[property_key] = property_value

    return serialized_properties
```
🛠️ Refactor suggestion
Mutable default argument violates best practices
serialize_properties(self, properties=dict()) uses a mutable default; callers mutating the dict will affect subsequent calls.
```diff
-    def serialize_properties(self, properties=dict()):
+    def serialize_properties(self, properties: Optional[dict] = None):
+        if properties is None:
+            properties = {}
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def serialize_properties(self, properties: Optional[dict] = None):
    if properties is None:
        properties = {}
    serialized_properties = {}

    for property_key, property_value in properties.items():
        if isinstance(property_value, UUID):
            serialized_properties[property_key] = str(property_value)
            continue

        if isinstance(property_value, dict):
            serialized_properties[property_key] = json.dumps(property_value, cls=JSONEncoder)
            continue

        serialized_properties[property_key] = property_value

    return serialized_properties
```
🧰 Tools
🪛 Ruff (0.8.2)
430-430: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 430 to 444, the method serialize_properties uses a mutable default
argument (properties=dict()), which can lead to unexpected behavior if the
dictionary is modified. Change the default value to None and inside the method
initialize properties to an empty dictionary if it is None before proceeding.
Description
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.