-
Notifications
You must be signed in to change notification settings - Fork 963
feat: avoid double edge vector search in triplet search #1877
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: avoid double edge vector search in triplet search #1877
Conversation
Please make sure all the checkboxes are checked:
|
WalkthroughThe pull request refactors the Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cognee/modules/graph/cognee_graph/CogneeGraph.py (1)
214-231: Add docstring to document the updated method behavior.According to coding guidelines, undocumented function definitions are considered incomplete. This method underwent a significant signature change (removing
vector_engineandquery_vectorparameters), so a docstring would help clarify its current behavior and parameters.Apply this diff to add documentation:
async def map_vector_distances_to_graph_edges(self, edge_distances) -> None: + """ + Map pre-computed vector distances to graph edges. + + Args: + edge_distances: List of scored results with payload["text"] and score attributes, + or None to skip mapping. + + Returns: + None. Updates edge.attributes["vector_distance"] in-place for matching edges. + + Note: + Edge matching uses edge.attributes["edge_text"] or edge.attributes["relationship_type"] + as the lookup key against payload["text"]. + """ try:Based on coding guidelines, which require documentation for function definitions.
🧹 Nitpick comments (1)
cognee/modules/retrieval/utils/brute_force_triplet_search.py (1)
140-142: Consider avoiding in-place mutation of the caller's collections list.The current implementation mutates the
collectionsparameter when it's provided by the caller, which could be unexpected. Creating a defensive copy would prevent side effects on the caller's data.Apply this diff to avoid mutating the input parameter:
+ if collections is not None: + collections = collections.copy() + if "EdgeType_relationship_name" not in collections: collections.append("EdgeType_relationship_name")Alternatively, use a set union operation:
- if "EdgeType_relationship_name" not in collections: - collections.append("EdgeType_relationship_name") + collections = list(set(collections) | {"EdgeType_relationship_name"})
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
cognee/modules/graph/cognee_graph/CogneeGraph.py(1 hunks)cognee/modules/retrieval/utils/brute_force_triplet_search.py(2 hunks)cognee/tests/unit/modules/graph/cognee_graph_test.py(5 hunks)cognee/tests/unit/modules/retrieval/test_brute_force_triplet_search.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code
Files:
cognee/modules/graph/cognee_graph/CogneeGraph.pycognee/tests/unit/modules/retrieval/test_brute_force_triplet_search.pycognee/modules/retrieval/utils/brute_force_triplet_search.pycognee/tests/unit/modules/graph/cognee_graph_test.py
⚙️ CodeRabbit configuration file
**/*.py: When reviewing Python code for this project:
- Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
- As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
- As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
- As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
- As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.
Files:
cognee/modules/graph/cognee_graph/CogneeGraph.pycognee/tests/unit/modules/retrieval/test_brute_force_triplet_search.pycognee/modules/retrieval/utils/brute_force_triplet_search.pycognee/tests/unit/modules/graph/cognee_graph_test.py
cognee/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use shared logging utilities from cognee.shared.logging_utils in Python code
Files:
cognee/modules/graph/cognee_graph/CogneeGraph.pycognee/tests/unit/modules/retrieval/test_brute_force_triplet_search.pycognee/modules/retrieval/utils/brute_force_triplet_search.pycognee/tests/unit/modules/graph/cognee_graph_test.py
cognee/{modules,infrastructure,tasks}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Co-locate feature-specific helpers under their respective package (modules/, infrastructure/, or tasks/)
Files:
cognee/modules/graph/cognee_graph/CogneeGraph.pycognee/modules/retrieval/utils/brute_force_triplet_search.py
cognee/tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
cognee/tests/**/*.py: Place Python tests under cognee/tests/ organized by type (unit, integration, cli_tests)
Name Python test files test_*.py and use pytest.mark.asyncio for async tests
Files:
cognee/tests/unit/modules/retrieval/test_brute_force_triplet_search.pycognee/tests/unit/modules/graph/cognee_graph_test.py
🧠 Learnings (3)
📚 Learning: 2024-11-13T16:06:32.576Z
Learnt from: hajdul88
Repo: topoteretes/cognee PR: 196
File: cognee/modules/graph/cognee_graph/CogneeGraph.py:32-38
Timestamp: 2024-11-13T16:06:32.576Z
Learning: In `CogneeGraph.py`, within the `CogneeGraph` class, it's intentional to add skeleton edges in both the `add_edge` method and the `project_graph_from_db` method to ensure that edges are added to the graph and to the nodes.
Applied to files:
cognee/modules/graph/cognee_graph/CogneeGraph.py
📚 Learning: 2024-11-13T16:17:17.646Z
Learnt from: hajdul88
Repo: topoteretes/cognee PR: 196
File: cognee/modules/graph/cognee_graph/CogneeGraphElements.py:82-90
Timestamp: 2024-11-13T16:17:17.646Z
Learning: In `cognee/modules/graph/cognee_graph/CogneeGraphElements.py`, within the `Edge` class, nodes and edges can have different dimensions, and it's acceptable for them not to match.
Applied to files:
cognee/modules/graph/cognee_graph/CogneeGraph.pycognee/tests/unit/modules/graph/cognee_graph_test.py
📚 Learning: 2024-12-04T18:37:55.092Z
Learnt from: hajdul88
Repo: topoteretes/cognee PR: 251
File: cognee/tests/infrastructure/databases/test_index_graph_edges.py:0-0
Timestamp: 2024-12-04T18:37:55.092Z
Learning: In the `index_graph_edges` function, both graph engine and vector engine initialization failures are handled within the same try-except block, so a single test covers both cases.
Applied to files:
cognee/tests/unit/modules/retrieval/test_brute_force_triplet_search.pycognee/tests/unit/modules/graph/cognee_graph_test.py
🧬 Code graph analysis (2)
cognee/modules/retrieval/utils/brute_force_triplet_search.py (1)
cognee/modules/graph/cognee_graph/CogneeGraph.py (1)
map_vector_distances_to_graph_edges(214-231)
cognee/tests/unit/modules/graph/cognee_graph_test.py (2)
cognee/modules/graph/cognee_graph/CogneeGraph.py (1)
map_vector_distances_to_graph_edges(214-231)cognee/modules/graph/cognee_graph/CogneeGraphElements.py (2)
Node(6-86)Edge(89-156)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
- GitHub Check: End-to-End Tests / Test Entity Extraction
- GitHub Check: End-to-End Tests / Conversation sessions test (Redis)
- GitHub Check: End-to-End Tests / Concurrent Subprocess access test
- GitHub Check: End-to-End Tests / Test graph edge ingestion
- GitHub Check: End-to-End Tests / Test Feedback Enrichment
- GitHub Check: End-to-End Tests / Test multi tenancy with different situations in Cognee
- GitHub Check: End-to-End Tests / Conversation sessions test (FS)
- GitHub Check: End-to-End Tests / S3 Bucket Test
- GitHub Check: End-to-End Tests / Test permissions with different situations in Cognee
- GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
- GitHub Check: End-to-End Tests / Test using different async databases in parallel in Cognee
- GitHub Check: End-to-End Tests / Deduplication Test
- GitHub Check: Basic Tests / Run Integration Tests
- GitHub Check: Basic Tests / Run Simple Examples
- GitHub Check: Basic Tests / Run Linting
- GitHub Check: Basic Tests / Run Simple Examples BAML
- GitHub Check: Basic Tests / Run Unit Tests
- GitHub Check: End-to-End Tests / Run Telemetry Test
- GitHub Check: End-to-End Tests / Server Start Test
- GitHub Check: CLI Tests / CLI Functionality Tests
- GitHub Check: CLI Tests / CLI Integration Tests
🔇 Additional comments (5)
cognee/modules/retrieval/utils/brute_force_triplet_search.py (1)
203-203: LGTM!The call correctly uses the updated API signature, passing only
edge_distancesas expected.cognee/tests/unit/modules/retrieval/test_brute_force_triplet_search.py (2)
112-136: LGTM!Test correctly validates that EdgeType_relationship_name is included in the default collections list, aligning with the implementation change.
139-183: LGTM!Tests correctly validate that EdgeType_relationship_name is always included in the collections, both for custom collections (using set comparison) and explicitly testing the always-included behavior. Good test coverage for the new functionality.
cognee/tests/unit/modules/graph/cognee_graph_test.py (2)
308-410: LGTM!All test functions correctly updated to use the new API signature with the
edge_distancesparameter. The use of keyword arguments enhances readability, and test coverage for various edge-mapping scenarios remains comprehensive.
413-422: LGTM!New test effectively validates the early-return behavior when
edge_distancesis None, ensuring that edges retain their default vector distance and no errors are raised. This is a good defensive test for the None-handling path.
pazone
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Let's wait for @hajdul88 ✅
hajdul88
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Description
Eliminates double vector search for edges by ensuring all edge lookups happen once in the retrieval layer.
brute_force_triplet_search: Always includes "EdgeType_relationship_name" in collectionsCogneeGraph.map_vector_distances_to_graph_edges: Removed internal vector search fallback; only maps provided distances.Type of Change
Screenshots/Videos (if applicable)
Pre-submission Checklist
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
Summary by CodeRabbit
Bug Fixes
Refactor
✏️ Tip: You can customize this high-level summary in your review settings.