@lxobr
Collaborator

@lxobr lxobr commented Dec 8, 2025

Description

Eliminates double vector search for edges by ensuring all edge lookups happen once in the retrieval layer.

  • brute_force_triplet_search: Always includes "EdgeType_relationship_name" in the collections it searches.
  • CogneeGraph.map_vector_distances_to_graph_edges: Removed the internal vector search fallback; the method now only maps the distances it is given.
  • Tests updated to reflect the new behavior.
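
The single-pass idea described above can be sketched as follows. This is a minimal illustration, not the project's code: CountingVectorEngine, search_once, and the "Entity_name" default are hypothetical placeholders, and only the "EdgeType_relationship_name" collection name comes from this PR.

```python
import asyncio

EDGE_COLLECTION = "EdgeType_relationship_name"


class CountingVectorEngine:
    """Hypothetical stand-in for a vector engine; it only records search calls."""

    def __init__(self):
        self.calls = []

    async def search(self, collection_name, query_text):
        self.calls.append(collection_name)
        return [f"{collection_name}:{query_text}"]


async def search_once(engine, query_text, collections=None):
    """Search every collection exactly once, always including the edge collection."""
    # "Entity_name" is a placeholder default for this sketch only.
    names = list(collections) if collections is not None else ["Entity_name"]
    if EDGE_COLLECTION not in names:
        names.append(EDGE_COLLECTION)
    results = await asyncio.gather(*(engine.search(name, query_text) for name in names))
    return dict(zip(names, results))


engine = CountingVectorEngine()
found = asyncio.run(search_once(engine, "who founded it?", ["Entity_name"]))
# The edge hits come from the same pass and would be handed to the graph mapping step.
edge_distances = found[EDGE_COLLECTION]
```

Because the edge collection is searched in the retrieval layer, the graph mapping step has no reason to query the vector store a second time.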

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Code refactoring
  • Performance improvement
  • Other (please specify):

Screenshots/Videos (if applicable)

Pre-submission Checklist

  • I have tested my changes thoroughly before submitting this PR
  • This PR contains minimal changes necessary to address the issue/feature
  • My code follows the project's coding standards and style guidelines
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if applicable)
  • All new and existing tests pass
  • I have searched existing PRs to ensure this change hasn't been submitted already
  • I have linked any relevant issues in the description
  • My commits have clear and descriptive messages

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

Summary by CodeRabbit

  • Bug Fixes

    • Ensured relationship edges are automatically included in search collections, improving search completeness and accuracy.
  • Refactor

    • Simplified graph edge distance mapping logic by removing unnecessary external dependencies, resulting in more efficient edge processing during retrieval operations.


@pull-checklist

pull-checklist bot commented Dec 8, 2025

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

@coderabbitai
Contributor

coderabbitai bot commented Dec 8, 2025

Walkthrough

The pull request refactors the map_vector_distances_to_graph_edges method in CogneeGraph by removing its vector_engine and query_vector parameters. The method now accepts only edge_distances, and the logic that computed distances through its own vector engine lookups is removed. Related call sites and tests are updated accordingly.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core method signature change (cognee/modules/graph/cognee_graph/CogneeGraph.py) | Method map_vector_distances_to_graph_edges signature simplified from (self, vector_engine, query_vector, edge_distances) to (self, edge_distances). Vector engine computation and query_vector validation logic removed; method now returns early if edge_distances is None. |
| Call site updates (cognee/modules/retrieval/utils/brute_force_triplet_search.py) | Updated method invocation to pass only edge_distances parameter; EdgeType_relationship_name ensured in default collections set. |
| Test updates (cognee/tests/unit/modules/graph/cognee_graph_test.py) | All test calls updated to new signature with only edge_distances parameter. Tests exercising vector_engine and query_vector paths removed; new early-return scenario for None edge_distances added. |
| Collection handling tests (cognee/tests/unit/modules/retrieval/test_brute_force_triplet_search.py) | Tests updated to verify EdgeType_relationship_name is always included in search collections. Custom collection tests adjusted to set-based comparisons accounting for automatic EdgeType_relationship_name inclusion. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Signature change verification: Confirm all call sites and implementations match the new edge_distances-only parameter pattern
  • Early-return behavior: Verify the None edge_distances early-return path is correctly tested and doesn't break existing workflows
  • Collection handling logic: Review the EdgeType_relationship_name inclusion in brute_force_triplet_search to ensure it doesn't cause unintended side effects or duplicate searches

Possibly related PRs

Suggested reviewers

  • Vasilije1990
  • dexters1
  • alekszievr
  • borisarzentar

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the primary change: avoiding double edge vector search in triplet search operations. |
| Description check | ✅ Passed | The description provides a clear explanation of changes, identifies the type of change (Performance improvement), and completes most of the pre-submission checklist with reasonable coverage. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 81.82% which is sufficient. The required threshold is 80.00%. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cognee/modules/graph/cognee_graph/CogneeGraph.py (1)

214-231: Add docstring to document the updated method behavior.

According to coding guidelines, undocumented function definitions are considered incomplete. This method underwent a significant signature change (removing vector_engine and query_vector parameters), so a docstring would help clarify its current behavior and parameters.

Apply this diff to add documentation:

 async def map_vector_distances_to_graph_edges(self, edge_distances) -> None:
+    """
+    Map pre-computed vector distances to graph edges.
+    
+    Args:
+        edge_distances: List of scored results with payload["text"] and score attributes,
+                       or None to skip mapping.
+    
+    Returns:
+        None. Updates edge.attributes["vector_distance"] in-place for matching edges.
+    
+    Note:
+        Edge matching uses edge.attributes["edge_text"] or edge.attributes["relationship_type"]
+        as the lookup key against payload["text"].
+    """
     try:

Based on coding guidelines, which require documentation for function definitions.
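
To make the documented behavior concrete, here is a minimal sketch of the mapping step as the suggested docstring describes it. It is not the method's actual body: ScoredResult, Edge, DEFAULT_DISTANCE, and map_distances_to_edges are hypothetical stand-ins, and the sketch is synchronous for brevity whereas the real method is async.

```python
from dataclasses import dataclass

DEFAULT_DISTANCE = 3.5  # arbitrary placeholder default for this sketch


@dataclass
class ScoredResult:
    """Hypothetical stand-in for a vector search hit."""

    payload: dict
    score: float


@dataclass
class Edge:
    """Hypothetical stand-in for a graph edge."""

    attributes: dict


def map_distances_to_edges(edges, edge_distances):
    """Copy pre-computed scores onto matching edges; performs no search of its own."""
    if edge_distances is None:
        return  # nothing to map, so edges keep their existing distances
    score_by_text = {hit.payload["text"]: hit.score for hit in edge_distances}
    for edge in edges:
        # Prefer edge_text as the lookup key, falling back to relationship_type.
        key = edge.attributes.get("edge_text") or edge.attributes.get("relationship_type")
        if key in score_by_text:
            edge.attributes["vector_distance"] = score_by_text[key]


edges = [
    Edge({"relationship_type": "works_for", "vector_distance": DEFAULT_DISTANCE}),
    Edge({"edge_text": "founded in 2019", "vector_distance": DEFAULT_DISTANCE}),
    Edge({"relationship_type": "unrelated", "vector_distance": DEFAULT_DISTANCE}),
]
hits = [
    ScoredResult({"text": "works_for"}, 0.12),
    ScoredResult({"text": "founded in 2019"}, 0.30),
]
map_distances_to_edges(edges, hits)
```

Edges without a matching hit keep their previous distance, and passing None leaves every edge untouched.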

🧹 Nitpick comments (1)
cognee/modules/retrieval/utils/brute_force_triplet_search.py (1)

140-142: Consider avoiding in-place mutation of the caller's collections list.

The current implementation mutates the collections parameter when it's provided by the caller, which could be unexpected. Creating a defensive copy would prevent side effects on the caller's data.

Apply this diff to avoid mutating the input parameter:

+    if collections is not None:
+        collections = collections.copy()
+
     if "EdgeType_relationship_name" not in collections:
         collections.append("EdgeType_relationship_name")

Alternatively, use a set union operation:

-    if "EdgeType_relationship_name" not in collections:
-        collections.append("EdgeType_relationship_name")
+    collections = list(set(collections) | {"EdgeType_relationship_name"})
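
A third option, sketched here under the assumption that callers may care about collection order, builds a new list instead of appending in place. The helper name with_edge_collection and the sample collection names are hypothetical; note that the set-union variant above does not preserve the caller's ordering, while this one does.

```python
EDGE_COLLECTION = "EdgeType_relationship_name"


def with_edge_collection(collections):
    """Return a new list that includes the edge collection; the input is not modified."""
    if EDGE_COLLECTION in collections:
        return list(collections)
    return [*collections, EDGE_COLLECTION]


caller_list = ["Entity_name", "TextSummary_text"]
resolved = with_edge_collection(caller_list)
# caller_list still has two entries; resolved has the edge collection appended last.
```

Calling the helper again on its own result returns an equal list, so it is safe to apply more than once.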
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7a3138e and c04d255.

📒 Files selected for processing (4)
  • cognee/modules/graph/cognee_graph/CogneeGraph.py (1 hunks)
  • cognee/modules/retrieval/utils/brute_force_triplet_search.py (2 hunks)
  • cognee/tests/unit/modules/graph/cognee_graph_test.py (5 hunks)
  • cognee/tests/unit/modules/retrieval/test_brute_force_triplet_search.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code

Files:

  • cognee/modules/graph/cognee_graph/CogneeGraph.py
  • cognee/tests/unit/modules/retrieval/test_brute_force_triplet_search.py
  • cognee/modules/retrieval/utils/brute_force_triplet_search.py
  • cognee/tests/unit/modules/graph/cognee_graph_test.py

⚙️ CodeRabbit configuration file

**/*.py: When reviewing Python code for this project:

  1. Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
  2. As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
  3. As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
  4. As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
  5. As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.

Files:

  • cognee/modules/graph/cognee_graph/CogneeGraph.py
  • cognee/tests/unit/modules/retrieval/test_brute_force_triplet_search.py
  • cognee/modules/retrieval/utils/brute_force_triplet_search.py
  • cognee/tests/unit/modules/graph/cognee_graph_test.py
cognee/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use shared logging utilities from cognee.shared.logging_utils in Python code

Files:

  • cognee/modules/graph/cognee_graph/CogneeGraph.py
  • cognee/tests/unit/modules/retrieval/test_brute_force_triplet_search.py
  • cognee/modules/retrieval/utils/brute_force_triplet_search.py
  • cognee/tests/unit/modules/graph/cognee_graph_test.py
cognee/{modules,infrastructure,tasks}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Co-locate feature-specific helpers under their respective package (modules/, infrastructure/, or tasks/)

Files:

  • cognee/modules/graph/cognee_graph/CogneeGraph.py
  • cognee/modules/retrieval/utils/brute_force_triplet_search.py
cognee/tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

cognee/tests/**/*.py: Place Python tests under cognee/tests/ organized by type (unit, integration, cli_tests)
Name Python test files test_*.py and use pytest.mark.asyncio for async tests

Files:

  • cognee/tests/unit/modules/retrieval/test_brute_force_triplet_search.py
  • cognee/tests/unit/modules/graph/cognee_graph_test.py
🧠 Learnings (3)
📚 Learning: 2024-11-13T16:06:32.576Z
Learnt from: hajdul88
Repo: topoteretes/cognee PR: 196
File: cognee/modules/graph/cognee_graph/CogneeGraph.py:32-38
Timestamp: 2024-11-13T16:06:32.576Z
Learning: In `CogneeGraph.py`, within the `CogneeGraph` class, it's intentional to add skeleton edges in both the `add_edge` method and the `project_graph_from_db` method to ensure that edges are added to the graph and to the nodes.

Applied to files:

  • cognee/modules/graph/cognee_graph/CogneeGraph.py
📚 Learning: 2024-11-13T16:17:17.646Z
Learnt from: hajdul88
Repo: topoteretes/cognee PR: 196
File: cognee/modules/graph/cognee_graph/CogneeGraphElements.py:82-90
Timestamp: 2024-11-13T16:17:17.646Z
Learning: In `cognee/modules/graph/cognee_graph/CogneeGraphElements.py`, within the `Edge` class, nodes and edges can have different dimensions, and it's acceptable for them not to match.

Applied to files:

  • cognee/modules/graph/cognee_graph/CogneeGraph.py
  • cognee/tests/unit/modules/graph/cognee_graph_test.py
📚 Learning: 2024-12-04T18:37:55.092Z
Learnt from: hajdul88
Repo: topoteretes/cognee PR: 251
File: cognee/tests/infrastructure/databases/test_index_graph_edges.py:0-0
Timestamp: 2024-12-04T18:37:55.092Z
Learning: In the `index_graph_edges` function, both graph engine and vector engine initialization failures are handled within the same try-except block, so a single test covers both cases.

Applied to files:

  • cognee/tests/unit/modules/retrieval/test_brute_force_triplet_search.py
  • cognee/tests/unit/modules/graph/cognee_graph_test.py
🧬 Code graph analysis (2)
cognee/modules/retrieval/utils/brute_force_triplet_search.py (1)
cognee/modules/graph/cognee_graph/CogneeGraph.py (1)
  • map_vector_distances_to_graph_edges (214-231)
cognee/tests/unit/modules/graph/cognee_graph_test.py (2)
cognee/modules/graph/cognee_graph/CogneeGraph.py (1)
  • map_vector_distances_to_graph_edges (214-231)
cognee/modules/graph/cognee_graph/CogneeGraphElements.py (2)
  • Node (6-86)
  • Edge (89-156)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
  • GitHub Check: End-to-End Tests / Test Entity Extraction
  • GitHub Check: End-to-End Tests / Conversation sessions test (Redis)
  • GitHub Check: End-to-End Tests / Concurrent Subprocess access test
  • GitHub Check: End-to-End Tests / Test graph edge ingestion
  • GitHub Check: End-to-End Tests / Test Feedback Enrichment
  • GitHub Check: End-to-End Tests / Test multi tenancy with different situations in Cognee
  • GitHub Check: End-to-End Tests / Conversation sessions test (FS)
  • GitHub Check: End-to-End Tests / S3 Bucket Test
  • GitHub Check: End-to-End Tests / Test permissions with different situations in Cognee
  • GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
  • GitHub Check: End-to-End Tests / Test using different async databases in parallel in Cognee
  • GitHub Check: End-to-End Tests / Deduplication Test
  • GitHub Check: Basic Tests / Run Integration Tests
  • GitHub Check: Basic Tests / Run Simple Examples
  • GitHub Check: Basic Tests / Run Linting
  • GitHub Check: Basic Tests / Run Simple Examples BAML
  • GitHub Check: Basic Tests / Run Unit Tests
  • GitHub Check: End-to-End Tests / Run Telemetry Test
  • GitHub Check: End-to-End Tests / Server Start Test
  • GitHub Check: CLI Tests / CLI Functionality Tests
  • GitHub Check: CLI Tests / CLI Integration Tests
🔇 Additional comments (5)
cognee/modules/retrieval/utils/brute_force_triplet_search.py (1)

203-203: LGTM!

The call correctly uses the updated API signature, passing only edge_distances as expected.

cognee/tests/unit/modules/retrieval/test_brute_force_triplet_search.py (2)

112-136: LGTM!

Test correctly validates that EdgeType_relationship_name is included in the default collections list, aligning with the implementation change.


139-183: LGTM!

Tests correctly validate that EdgeType_relationship_name is always included in the collections, both when custom collections are passed (using set comparison) and in a dedicated test of the always-included behavior. Good test coverage for the new functionality.
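
For illustration, a set-based check of this kind might look like the sketch below. It is not the PR's test code: search_collections is a hypothetical function under test, and the mock is a plain unittest.mock.AsyncMock rather than the project's vector engine.

```python
import asyncio
from unittest.mock import AsyncMock

EDGE_COLLECTION = "EdgeType_relationship_name"


async def search_collections(engine, query, collections):
    """Hypothetical function under test: searches each collection once."""
    names = list(collections)
    if EDGE_COLLECTION not in names:
        names.append(EDGE_COLLECTION)
    for name in names:
        await engine.search(name, query)


def searched_collections(mock_engine):
    """Collect the first positional argument of every recorded search call."""
    return {call.args[0] for call in mock_engine.search.call_args_list}


engine = AsyncMock()
asyncio.run(search_collections(engine, "query", ["CustomCollection1", "CustomCollection2"]))
searched = searched_collections(engine)
```

Comparing sets keeps the assertion independent of where in the list the edge collection ends up.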

cognee/tests/unit/modules/graph/cognee_graph_test.py (2)

308-410: LGTM!

All test functions correctly updated to use the new API signature with the edge_distances parameter. The use of keyword arguments enhances readability, and test coverage for various edge-mapping scenarios remains comprehensive.


413-422: LGTM!

New test effectively validates the early-return behavior when edge_distances is None, ensuring that edges retain their default vector distance and no errors are raised. This is a good defensive test for the None-handling path.

@lxobr lxobr requested review from hajdul88 and pazone December 8, 2025 16:41
Contributor

@pazone pazone left a comment

LGTM. Let's wait for @hajdul88

Collaborator

@hajdul88 hajdul88 left a comment

LGTM!

@Vasilije1990 Vasilije1990 merged commit 49f7c51 into dev Dec 9, 2025
325 of 336 checks passed
@Vasilije1990 Vasilije1990 deleted the feature/cog-3492-avoid-double-edge-vector-search-in-triplet-search branch December 9, 2025 12:24

5 participants