feat: optimize repeated entity extraction #1682
Conversation
Important: Review skipped. Auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI.

Walkthrough

This PR introduces a comprehensive feedback enrichment pipeline that extracts negative user feedback, generates improved answers via Chain-of-Thought reasoning, creates educational enrichments, and links them within the knowledge graph. Supporting changes include expanded graph node payloads, structured LLM output handling, refined indexing strategies, and updated data models.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant Graph as Knowledge Graph
    participant Feedback as Feedback<br/>Extraction
    participant Retrieval as CoT Retriever
    participant LLM
    participant Enrichment as Enrichment<br/>Creation
    User->>Graph: Submit Feedback (Negative)
    Graph->>Feedback: extract_feedback_interactions()
    Feedback->>Graph: Query Interactions & Feedback Nodes
    Feedback->>LLM: Summarize Context
    LLM-->>Feedback: Context Summary
    Feedback-->>Graph: Emit FeedbackEnrichment Records
    Feedback->>Retrieval: generate_improved_answers(enrichments)
    Retrieval->>LLM: Render Reaction Prompt with<br/>Question, Answer, Feedback
    LLM-->>Retrieval: Structured ImprovedAnswerResponse
    Retrieval->>Graph: Fetch Related Context via CoT
    Graph-->>Retrieval: Context Triplets + Edges
    Retrieval-->>Feedback: Updated Enrichments<br/>(improved_answer, new_context)
    Feedback->>Enrichment: create_enrichments(enrichments)
    Enrichment->>LLM: Generate Report Prompt
    LLM-->>Enrichment: Educational Report Text
    Enrichment->>Graph: Create NodeSet & Link Enrichments
    Graph-->>Enrichment: Return Enriched Records
    Enrichment->>Graph: link_enrichments_to_feedback(enrichments)
    Graph->>Graph: Create enriches_feedback &<br/>improves_interaction Edges
    Graph-->>User: Feedback Loop Closed
```
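For orientation, a minimal sketch of how the four feedback tasks line up in the order shown above, assuming Task wraps an async callable as in cognee's other pipelines; the runner call is omitted because the exact pipeline entry point is not shown in this review.

```python
from cognee.modules.pipelines.tasks.task import Task
from cognee.tasks.feedback.create_enrichments import create_enrichments
from cognee.tasks.feedback.extract_feedback_interactions import extract_feedback_interactions
from cognee.tasks.feedback.generate_improved_answers import generate_improved_answers
from cognee.tasks.feedback.link_enrichments_to_feedback import link_enrichments_to_feedback

# Order mirrors the sequence diagram: extract -> improve -> enrich -> link.
feedback_enrichment_tasks = [
    Task(extract_feedback_interactions),  # pull negative feedback plus interaction context
    Task(generate_improved_answers),      # CoT retriever produces structured improved answers
    Task(create_enrichments),             # LLM report text and NodeSet creation
    Task(link_enrichments_to_feedback),   # enriches_feedback / improves_interaction edges
]
```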
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed)
Actionable comments posted: 11
Caution: Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
cognee/modules/retrieval/utils/completion.py (1)
18-23: Handle missing/None system prompt. read_query_prompt may return None; concatenation and the downstream call will fail silently. Guard and fail fast.

```diff
-system_prompt = system_prompt if system_prompt else read_query_prompt(system_prompt_path)
+system_prompt = system_prompt if system_prompt else read_query_prompt(system_prompt_path)
+if not system_prompt:
+    raise ValueError(f"System prompt not found: {system_prompt_path}")
```

cognee/infrastructure/databases/graph/kuzu/adapter.py (1)
1362-1382: Attribute names must be validated; unparameterized interpolation enables query injection. The code directly interpolates unquoted attribute names (f"n.{attr} IN $..."), allowing malformed or hostile attr values to break queries. Additionally, using where_clause.replace("n.", "n1.") risks mis-rewriting predicates if attr contains "n.". While current callers use safe hardcoded names, the method is public and accepts user-controllable filters.

- Validate attr against allowed columns or a strict name pattern (e.g., ^[A-Za-z_][A-Za-z0-9_]*$).
- Route top-level columns ("id", "name", "type") directly; route custom attributes through json_extract() (already used elsewhere in the adapter).
- Build predicates explicitly (as shown in the refactor suggestion) rather than relying on string replacement.

The suggested refactor is sound and necessary:

```diff
-        for i, filter_dict in enumerate(attribute_filters):
-            for attr, values in filter_dict.items():
-                param_name = f"values_{i}_{attr}"
-                where_clauses.append(f"n.{attr} IN ${param_name}")
-                params[param_name] = values
+        import re
+        safe_cols = {"id", "name", "type"}
+        name_re = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
+        node_preds = []
+        for i, filter_dict in enumerate(attribute_filters):
+            for attr, values in filter_dict.items():
+                if not name_re.match(attr):
+                    raise ValueError(f"Invalid attribute name: {attr}")
+                param_name = f"values_{i}_{attr}"
+                params[param_name] = values
+                if attr in safe_cols:
+                    node_preds.append(f"{{alias}}.{attr} IN ${param_name}")
+                else:
+                    node_preds.append(
+                        f"json_extract({{alias}}.properties, '$.{attr}') IN ${param_name}"
+                    )
+        def build_where(alias: str) -> str:
+            return " AND ".join(p.replace("{alias}", alias) for p in node_preds) or "true"
-        where_clause = " AND ".join(where_clauses)
-        nodes_query = f"""
-            MATCH (n:Node)
-            WHERE {where_clause}
+        nodes_query = f"""
+            MATCH (n:Node)
+            WHERE {build_where('n')}
             RETURN n.id, {{ name: n.name, type: n.type, properties: n.properties }}
         """
-        edges_query = f"""
-            MATCH (n1:Node)-[r:EDGE]->(n2:Node)
-            WHERE {where_clause.replace("n.", "n1.")} AND {where_clause.replace("n.", "n2.")}
+        edges_query = f"""
+            MATCH (n1:Node)-[r:EDGE]->(n2:Node)
+            WHERE {build_where('n1')} AND {build_where('n2')}
             RETURN n1.id, n2.id, r.relationship_name, r.properties
         """
```
🧹 Nitpick comments (19)
cognee/infrastructure/llm/prompts/feedback_user_context_prompt.txt (1)
4-5: Hyphenate compound adjectives for clarity. The phrases "one paragraph" and "human readable" function as compound adjectives modifying "summary" and should be hyphenated: "one-paragraph" and "human-readable".
Apply this diff:
```diff
-Provide a one paragraph human readable summary of this interaction context,
+Provide a one-paragraph human-readable summary of this interaction context,
```

cognee/infrastructure/llm/prompts/feedback_reaction_prompt.txt (1)
12-14: Avoid format instructions when using structured outputs. If this prompt is used with a structured response_model (e.g., TestAnswer with fields answer/explanation), the “Format your reply as: Answer: … / Explanation: …” instructions can conflict. Consider removing these lines or gating them only for plain-text flows.
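For reference, a minimal sketch of the structured-output path this comment assumes; the acreate_structured_output signature mirrors the mock used later in this review, and ImprovedAnswer is a hypothetical model, not the PR's actual response type.

```python
from pydantic import BaseModel

from cognee.infrastructure.llm.LLMGateway import LLMGateway


class ImprovedAnswer(BaseModel):  # hypothetical response model for illustration
    answer: str
    explanation: str


async def ask(question: str, system_prompt: str) -> ImprovedAnswer:
    # The response schema is enforced by response_model, so "Format your reply as: ..."
    # instructions in the prompt text are redundant for this flow.
    return await LLMGateway.acreate_structured_output(
        text_input=question,
        system_prompt=system_prompt,
        response_model=ImprovedAnswer,
    )
```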
cognee/modules/retrieval/utils/completion.py (1)
6-15: Minor typing/API polish. response_model: Type = str is loose. Prefer a Union that includes the str type for better type checking: response_model: type[str] | Type[BaseModel] = str.
cognee/modules/chunking/models/DocumentChunk.py (1)
35-35: Make contains Optional[List[…]] and import Optional. The default is None but the type isn’t Optional, which trips type checkers.
```diff
-from typing import List, Union
+from typing import List, Union, Optional
 …
-    contains: List[Union[Entity, Event, tuple[Edge, Entity]]] = None
+    contains: Optional[List[Union[Entity, Event, tuple[Edge, Entity]]]] = None
```

cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py (1)
178-221: Stabilize tests by mocking LLMGateway.acreate_structured_output. Current tests rely on the live LLM path; they can be flaky/slow. Mock to deterministic outputs.
Example:
```diff
+@pytest.fixture(autouse=True)
+def _mock_llm(monkeypatch):
+    async def _fake_create(text_input: str, system_prompt: str, response_model):
+        if response_model is str:
+            return "Alice"
+        return TestAnswer(answer="Alice", explanation="From graph context.")
+    monkeypatch.setattr(
+        "cognee.infrastructure.llm.LLMGateway.LLMGateway.acreate_structured_output",
+        staticmethod(_fake_create),
+    )
```

cognee/tasks/storage/index_data_points.py (2)
58-61: Bound parallelism to avoid creating thousands of tasks. Large corpora can spawn unbounded tasks. Use a semaphore.
```diff
-    tasks = [
-        asyncio.create_task(vector_engine.index_data_points(type_name, field_name, batch_points))
-        for type_name, field_name, batch_points in batches
-    ]
-    await asyncio.gather(*tasks)
+    sem = asyncio.Semaphore(8)
+    async def _run(type_name, field_name, batch_points):
+        async with sem:
+            await vector_engine.index_data_points(type_name, field_name, batch_points)
+    await asyncio.gather(*(_run(t, f, b) for t, f, b in batches))
```
49-51: Validate batch_size. A defensive check prevents division/empty-slice issues.
```diff
 batch_size = vector_engine.embedding_engine.get_batch_size()
+if not isinstance(batch_size, int) or batch_size <= 0:
+    raise ValueError(f"Invalid embedding batch_size: {batch_size}")
```

cognee/tests/test_feedback_enrichment.py (1)
36-45: Split setup into helpers to reduce locals (lint R0914). main() holds many locals. Extract directory prep and node/edge assertions into helpers to satisfy lint and readability.
cognee/tasks/storage/index_graph_edges.py (1)
67-69: Clarify deprecation message. Saying “edge embedding is deprecated” is ambiguous. Consider: “Auto-fetching edges inside index_graph_edges is deprecated; pass edges explicitly.”
cognee/tasks/feedback/extract_feedback_interactions.py (1)
87-95: Make recency sort robust to mixed timestamp formats. Compare numeric timestamps; parse ISO8601 (incl. trailing 'Z') with fallback to 0, to avoid type errors and misordering.
Apply this diff:
```diff
@@
-    def _recency_key(pair):
-        _, (_, interaction_props) = pair
-        created_at = interaction_props.get("created_at") or ""
-        updated_at = interaction_props.get("updated_at") or ""
-        return (created_at, updated_at)
+    from datetime import datetime
+
+    def _to_ts(value) -> float:
+        if isinstance(value, (int, float)):
+            return float(value)
+        if isinstance(value, str) and value:
+            val = value.replace("Z", "+00:00")
+            try:
+                return datetime.fromisoformat(val).timestamp()
+            except Exception:
+                return 0.0
+        return 0.0
+
+    def _recency_key(pair):
+        _, (_, interaction_props) = pair
+        return (
+            _to_ts(interaction_props.get("created_at")),
+            _to_ts(interaction_props.get("updated_at")),
+        )
```

examples/python/feedback_enrichment_minimal_example.py (1)
4-5: Unify SearchType import with internal usage. Elsewhere it’s from cognee.modules.search.types import SearchType. Use the same to avoid API surface drift.

Apply this diff:

```diff
-from cognee.api.v1.search import SearchType
+from cognee.modules.search.types import SearchType
```

If the API alias is required for public users, keep it and justify with a comment.
cognee/tasks/feedback/generate_improved_answers.py (3)
72-81: Remove unnecessary else after return. Simplify per pylint R1705:
```diff
-    if completion:
-        enrichment.improved_answer = completion.answer
-        enrichment.new_context = new_context_text
-        enrichment.explanation = completion.explanation
-        return enrichment
-    else:
-        logger.warning(
-            "Failed to get structured completion from retriever", question=enrichment.question
-        )
-        return None
+    if completion:
+        enrichment.improved_answer = completion.answer
+        enrichment.new_context = new_context_text
+        enrichment.explanation = completion.explanation
+        return enrichment
+    logger.warning(
+        "Failed to get structured completion from retriever | question=%s",
+        enrichment.question,
+    )
+    return None
```
115-121: Throughput: consider bounded concurrency for multiple enrichments. If rate limits allow, use asyncio.gather with a semaphore to parallelize per-item processing.
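A minimal sketch of the bounded-concurrency shape suggested here (and in the similar comment on create_enrichments.py below); the limit of 5 and the worker name in the usage note are illustrative, not values from the PR.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_bounded(
    items: Iterable[T], worker: Callable[[T], Awaitable[R]], limit: int = 5
) -> list[R]:
    # Cap in-flight LLM calls while still overlapping per-item processing.
    semaphore = asyncio.Semaphore(limit)

    async def _run(item: T) -> R:
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(_run(item) for item in items))


# Usage (hypothetical helper name):
# improved = await gather_bounded(enrichments, _process_single_enrichment, limit=5)
```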
6-9: Unused import detected. resolve_edges_to_text is imported but not used. Remove to keep import hygiene.
cognee/tasks/feedback/create_enrichments.py (2)
35-45: Optional: pre-check prompt_template to avoid raising exceptions. You already catch exceptions, but you can avoid the try/except by checking for None and falling back early.
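A minimal sketch of the early-fallback shape this suggests, assuming read_query_prompt is called with a prompt file name as elsewhere in this PR; the fallback text and helper name are hypothetical.

```python
from cognee.infrastructure.llm.prompts.read_query_prompt import read_query_prompt

DEFAULT_REPORT_PROMPT = "Write a short educational report summarizing the feedback below."


def load_report_prompt(prompt_file: str = "feedback_report_prompt.txt") -> str:
    # Check for a missing template up front instead of wrapping the call in try/except.
    prompt_template = read_query_prompt(prompt_file)
    if not prompt_template:
        return DEFAULT_REPORT_PROMPT
    return prompt_template
```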
70-81: Throughput: optional bounded concurrency for report generation. Use asyncio.gather with a semaphore to parallelize _generate_enrichment_report across items if allowed.
cognee/modules/retrieval/graph_completion_cot_retriever.py (3)
146-161: Missing None checks for validation/follow-up prompt files. read_query_prompt may return None; pass-through to the LLM will likely fail. Add guards with fallbacks or raise with context.
84-92: API ergonomics: parameter counts are high; consider grouping configuration. To address R0913/R0917, introduce small config dataclasses or reuse self.* defaults to reduce arg counts.
Also applies to: 168-176
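One way to read the “group configuration” suggestion, sketched with a hypothetical options dataclass; the class and field names are illustrative, not the retriever's actual parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CotRetrievalOptions:  # hypothetical grouping of per-call knobs
    top_k: int = 5
    max_iterations: int = 3
    system_prompt_path: Optional[str] = None
    save_interaction: bool = False


class RetrieverSketch:
    def __init__(self, options: Optional[CotRetrievalOptions] = None):
        self.options = options or CotRetrievalOptions()

    async def get_completion(self, query: str, options: Optional[CotRetrievalOptions] = None):
        # One options object replaces several positional arguments (pylint R0913/R0917)
        # and lets callers fall back to the instance-level defaults.
        opts = options or self.options
        return query, opts  # placeholder body for the sketch
```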
25-36: Minor: typing polish. response_model is a type; prefer Type[Any] for annotations; the return type tuple[Any, str, List[Edge]] is already correct.

```diff
-def _as_answer_text(completion: Any) -> str:
+def _as_answer_text(completion: Any) -> str: ...
```

And:

```diff
-    response_model: Type = str,
+    response_model: Type[Any] = str,
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
.github/workflows/e2e_tests.yml is excluded by !**/*.yml
📒 Files selected for processing (23)
- cognee/infrastructure/databases/graph/kuzu/adapter.py (1 hunks)
- cognee/infrastructure/engine/models/Edge.py (2 hunks)
- cognee/infrastructure/llm/prompts/feedback_reaction_prompt.txt (1 hunks)
- cognee/infrastructure/llm/prompts/feedback_report_prompt.txt (1 hunks)
- cognee/infrastructure/llm/prompts/feedback_user_context_prompt.txt (1 hunks)
- cognee/modules/chunking/models/DocumentChunk.py (2 hunks)
- cognee/modules/graph/cognee_graph/CogneeGraph.py (1 hunks)
- cognee/modules/graph/utils/expand_with_nodes_and_edges.py (2 hunks)
- cognee/modules/retrieval/graph_completion_cot_retriever.py (7 hunks)
- cognee/modules/retrieval/utils/brute_force_triplet_search.py (1 hunks)
- cognee/modules/retrieval/utils/completion.py (2 hunks)
- cognee/tasks/feedback/__init__.py (1 hunks)
- cognee/tasks/feedback/create_enrichments.py (1 hunks)
- cognee/tasks/feedback/extract_feedback_interactions.py (1 hunks)
- cognee/tasks/feedback/generate_improved_answers.py (1 hunks)
- cognee/tasks/feedback/link_enrichments_to_feedback.py (1 hunks)
- cognee/tasks/feedback/models.py (1 hunks)
- cognee/tasks/storage/index_data_points.py (1 hunks)
- cognee/tasks/storage/index_graph_edges.py (3 hunks)
- cognee/tests/test_edge_ingestion.py (1 hunks)
- cognee/tests/test_feedback_enrichment.py (1 hunks)
- cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py (3 hunks)
- examples/python/feedback_enrichment_minimal_example.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
{cognee,cognee-mcp,distributed,examples,alembic}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
{cognee,cognee-mcp,distributed,examples,alembic}/**/*.py: Use 4-space indentation; name modules and functions in snake_case; name classes in PascalCase (Python)
Adhere to ruff rules, including import hygiene and configured line length (100)
Keep Python lines ≤ 100 characters
Files:
- cognee/tests/test_feedback_enrichment.py
- cognee/modules/chunking/models/DocumentChunk.py
- examples/python/feedback_enrichment_minimal_example.py
- cognee/tasks/feedback/extract_feedback_interactions.py
- cognee/infrastructure/engine/models/Edge.py
- cognee/modules/retrieval/utils/brute_force_triplet_search.py
- cognee/modules/retrieval/utils/completion.py
- cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py
- cognee/tasks/feedback/link_enrichments_to_feedback.py
- cognee/tasks/feedback/models.py
- cognee/modules/graph/utils/expand_with_nodes_and_edges.py
- cognee/modules/graph/cognee_graph/CogneeGraph.py
- cognee/infrastructure/databases/graph/kuzu/adapter.py
- cognee/tasks/feedback/create_enrichments.py
- cognee/tests/test_edge_ingestion.py
- cognee/tasks/storage/index_data_points.py
- cognee/tasks/feedback/generate_improved_answers.py
- cognee/tasks/storage/index_graph_edges.py
- cognee/tasks/feedback/__init__.py
- cognee/modules/retrieval/graph_completion_cot_retriever.py
cognee/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
cognee/**/*.py: Public APIs in the core library should be type-annotated where practical
Prefer explicit, structured error handling and use shared logging utilities from cognee.shared.logging_utils
Files:
- cognee/tests/test_feedback_enrichment.py
- cognee/modules/chunking/models/DocumentChunk.py
- cognee/tasks/feedback/extract_feedback_interactions.py
- cognee/infrastructure/engine/models/Edge.py
- cognee/modules/retrieval/utils/brute_force_triplet_search.py
- cognee/modules/retrieval/utils/completion.py
- cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py
- cognee/tasks/feedback/link_enrichments_to_feedback.py
- cognee/tasks/feedback/models.py
- cognee/modules/graph/utils/expand_with_nodes_and_edges.py
- cognee/modules/graph/cognee_graph/CogneeGraph.py
- cognee/infrastructure/databases/graph/kuzu/adapter.py
- cognee/tasks/feedback/create_enrichments.py
- cognee/tests/test_edge_ingestion.py
- cognee/tasks/storage/index_data_points.py
- cognee/tasks/feedback/generate_improved_answers.py
- cognee/tasks/storage/index_graph_edges.py
- cognee/tasks/feedback/__init__.py
- cognee/modules/retrieval/graph_completion_cot_retriever.py
cognee/tests/**/test_*.py
📄 CodeRabbit inference engine (AGENTS.md)
cognee/tests/**/test_*.py: Name test files as test_*.py
Use pytest.mark.asyncio for async tests
Tests should avoid external state; rely on fixtures and CI-provided env vars when providers are required
Files:
- cognee/tests/test_feedback_enrichment.py
- cognee/tests/test_edge_ingestion.py
examples/python/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
When adding public APIs, provide or update targeted examples under examples/python/
Files:
examples/python/feedback_enrichment_minimal_example.py
cognee/tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Place unit tests under cognee/tests/unit/
Files:
cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py
🧬 Code graph analysis (15)
cognee/tests/test_feedback_enrichment.py (9)
- cognee/infrastructure/databases/graph/get_graph_engine.py (1): get_graph_engine (10-24)
- cognee/modules/pipelines/tasks/task.py (1): Task (5-97)
- cognee/modules/search/types/SearchType.py (1): SearchType (4-19)
- cognee/shared/logging_utils.py (2): get_logger (182-194), info (175-175)
- cognee/tasks/feedback/create_enrichments.py (1): create_enrichments (51-84)
- cognee/tasks/feedback/extract_feedback_interactions.py (1): extract_feedback_interactions (180-230)
- cognee/tasks/feedback/generate_improved_answers.py (1): generate_improved_answers (92-130)
- cognee/tasks/feedback/link_enrichments_to_feedback.py (1): link_enrichments_to_feedback (33-67)
- cognee/api/v1/config/config.py (2): data_root_directory (36-38), system_root_directory (18-33)
cognee/modules/chunking/models/DocumentChunk.py (3)
- cognee/infrastructure/engine/models/Edge.py (1): Edge (5-38)
- cognee/modules/engine/models/Entity.py (1): Entity (6-11)
- cognee/modules/engine/models/Event.py (1): Event (8-16)
examples/python/feedback_enrichment_minimal_example.py (6)
- cognee/modules/search/types/SearchType.py (1): SearchType (4-19)
- cognee/modules/pipelines/tasks/task.py (1): Task (5-97)
- cognee/tasks/feedback/extract_feedback_interactions.py (1): extract_feedback_interactions (180-230)
- cognee/tasks/feedback/generate_improved_answers.py (1): generate_improved_answers (92-130)
- cognee/tasks/feedback/create_enrichments.py (1): create_enrichments (51-84)
- cognee/tasks/feedback/link_enrichments_to_feedback.py (1): link_enrichments_to_feedback (33-67)
cognee/tasks/feedback/extract_feedback_interactions.py (5)
- cognee/infrastructure/llm/LLMGateway.py (1): LLMGateway (6-66)
- cognee/infrastructure/llm/prompts/read_query_prompt.py (1): read_query_prompt (6-43)
- cognee/shared/logging_utils.py (2): get_logger (182-194), info (175-175)
- cognee/infrastructure/databases/graph/get_graph_engine.py (1): get_graph_engine (10-24)
- cognee/tasks/feedback/models.py (1): FeedbackEnrichment (9-26)
cognee/modules/retrieval/utils/completion.py (3)
- cognee/infrastructure/llm/LLMGateway.py (1): LLMGateway (6-66)
- cognee/infrastructure/llm/prompts/render_prompt.py (1): render_prompt (5-42)
- cognee/infrastructure/llm/prompts/read_query_prompt.py (1): read_query_prompt (6-43)
cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py (3)
- cognee/api/v1/config/config.py (2): system_root_directory (18-33), data_root_directory (36-38)
- cognee/infrastructure/engine/models/DataPoint.py (1): DataPoint (20-220)
- cognee/modules/retrieval/graph_completion_cot_retriever.py (2): GraphCompletionCotRetriever (39-272), get_structured_completion (168-231)
cognee/tasks/feedback/link_enrichments_to_feedback.py (4)
- cognee/infrastructure/databases/graph/get_graph_engine.py (1): get_graph_engine (10-24)
- cognee/tasks/storage/index_graph_edges.py (1): index_graph_edges (42-77)
- cognee/shared/logging_utils.py (2): get_logger (182-194), info (175-175)
- cognee/tasks/feedback/models.py (1): FeedbackEnrichment (9-26)
cognee/tasks/feedback/models.py (2)
- cognee/infrastructure/engine/models/DataPoint.py (1): DataPoint (20-220)
- cognee/modules/engine/models/node_set.py (1): NodeSet (4-7)
cognee/modules/graph/utils/expand_with_nodes_and_edges.py (1)
- cognee/infrastructure/engine/models/Edge.py (1): Edge (5-38)
cognee/tasks/feedback/create_enrichments.py (5)
- cognee/infrastructure/llm/LLMGateway.py (1): LLMGateway (6-66)
- cognee/infrastructure/llm/prompts/read_query_prompt.py (1): read_query_prompt (6-43)
- cognee/shared/logging_utils.py (2): get_logger (182-194), info (175-175)
- cognee/modules/engine/models/node_set.py (1): NodeSet (4-7)
- cognee/tasks/feedback/models.py (1): FeedbackEnrichment (9-26)
cognee/tasks/storage/index_data_points.py (5)
- cognee/infrastructure/databases/vector/get_vector_engine.py (1): get_vector_engine (5-7)
- cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (2): create_vector_index (292-295), index_data_points (297-309)
- cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (2): create_vector_index (248-249), index_data_points (251-263)
- cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (2): create_vector_index (285-295), index_data_points (297-319)
- cognee/infrastructure/databases/vector/embeddings/EmbeddingEngine.py (1): get_batch_size (38-45)
cognee/tasks/feedback/generate_improved_answers.py (5)
- cognee/infrastructure/llm/LLMGateway.py (1): LLMGateway (6-66)
- cognee/infrastructure/llm/prompts/read_query_prompt.py (1): read_query_prompt (6-43)
- cognee/shared/logging_utils.py (2): get_logger (182-194), info (175-175)
- cognee/modules/retrieval/graph_completion_cot_retriever.py (2): GraphCompletionCotRetriever (39-272), get_structured_completion (168-231)
- cognee/tasks/feedback/models.py (1): FeedbackEnrichment (9-26)
cognee/tasks/storage/index_graph_edges.py (7)
- cognee/modules/engine/utils/generate_edge_id.py (1): generate_edge_id (4-5)
- cognee/infrastructure/databases/graph/get_graph_engine.py (1): get_graph_engine (10-24)
- cognee/modules/graph/models/EdgeType.py (1): EdgeType (4-8)
- cognee/tasks/storage/index_data_points.py (1): index_data_points (10-65)
- cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1): index_data_points (297-309)
- cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1): index_data_points (251-263)
- cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1): index_data_points (297-319)
cognee/tasks/feedback/__init__.py (1)
- cognee/tasks/feedback/models.py (1): FeedbackEnrichment (9-26)
cognee/modules/retrieval/graph_completion_cot_retriever.py (4)
- cognee/modules/retrieval/graph_completion_retriever.py (3): GraphCompletionRetriever (28-281), save_qa (217-281), get_completion (144-215)
- cognee/modules/retrieval/utils/completion.py (2): generate_structured_completion (6-28), summarize_text (51-63)
- cognee/infrastructure/databases/cache/config.py (1): CacheConfig (6-39)
- cognee/modules/retrieval/utils/session_cache.py (2): get_conversation_history (78-156), save_conversation_history (10-75)
🪛 LanguageTool
cognee/infrastructure/llm/prompts/feedback_user_context_prompt.txt
[grammar] ~4-~4: Use a hyphen to join words.
Context: ...stion} Context: {context} Provide a one paragraph human readable summary of this...
(QB_NEW_EN_HYPHEN)
[grammar] ~4-~4: Use a hyphen to join words.
Context: ...{context} Provide a one paragraph human readable summary of this interaction con...
(QB_NEW_EN_HYPHEN)
🪛 Pylint (4.0.1)
cognee/tests/test_feedback_enrichment.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
[refactor] 36-36: Too many local variables (23/15)
(R0914)
examples/python/feedback_enrichment_minimal_example.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
cognee/tasks/feedback/extract_feedback_interactions.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
[refactor] 153-157: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
cognee/infrastructure/engine/models/Edge.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
cognee/modules/retrieval/utils/completion.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
[refactor] 6-6: Too many arguments (7/5)
(R0913)
[refactor] 6-6: Too many positional arguments (7/5)
(R0917)
[refactor] 31-31: Too many arguments (6/5)
(R0913)
[refactor] 31-31: Too many positional arguments (6/5)
(R0917)
cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py
[refactor] 193-193: Too few public methods (0/2)
(R0903)
[refactor] 196-196: Too few public methods (0/2)
(R0903)
cognee/tasks/feedback/link_enrichments_to_feedback.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
cognee/tasks/feedback/models.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
[refactor] 9-9: Too few public methods (0/2)
(R0903)
cognee/tasks/feedback/create_enrichments.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
cognee/tasks/feedback/generate_improved_answers.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
[refactor] 72-81: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
cognee/tasks/storage/index_graph_edges.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
cognee/modules/retrieval/graph_completion_cot_retriever.py
[refactor] 84-84: Too many arguments (6/5)
(R0913)
[refactor] 84-84: Too many positional arguments (6/5)
(R0917)
[refactor] 84-84: Too many local variables (19/15)
(R0914)
[refactor] 168-168: Too many arguments (6/5)
(R0913)
[refactor] 168-168: Too many positional arguments (6/5)
(R0917)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
- GitHub Check: End-to-End Tests / Test Feedback Enrichment
- GitHub Check: End-to-End Tests / Test permissions with different situations in Cognee
- GitHub Check: End-to-End Tests / Test Entity Extraction
- GitHub Check: End-to-End Tests / Concurrent Subprocess access test
- GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
- GitHub Check: End-to-End Tests / Conversation sessions test
- GitHub Check: End-to-End Tests / Deduplication Test
- GitHub Check: End-to-End Tests / S3 Bucket Test
- GitHub Check: End-to-End Tests / Test graph edge ingestion
- GitHub Check: CLI Tests / CLI Functionality Tests
- GitHub Check: CLI Tests / CLI Integration Tests
- GitHub Check: Basic Tests / Run Simple Examples
- GitHub Check: Basic Tests / Run Basic Graph Tests
- GitHub Check: End-to-End Tests / Server Start Test
- GitHub Check: Basic Tests / Run Unit Tests
- GitHub Check: Basic Tests / Run Integration Tests
- GitHub Check: Basic Tests / Run Simple Examples BAML
- GitHub Check: End-to-End Tests / Test graph edge ingestion
- GitHub Check: End-to-End Tests / Run Telemetry Test
- GitHub Check: Test Weighted Edges Examples
- GitHub Check: Test Weighted Edges with Different Graph Databases (neo4j)
🔇 Additional comments (13)
cognee/tasks/feedback/models.py (1)
9-26: LGTM! The FeedbackEnrichment data model is well-structured with proper type annotations, sensible defaults for optional fields, and clear field semantics. The metadata configuration for indexing the text field aligns with the DataPoint pattern.
cognee/tasks/feedback/link_enrichments_to_feedback.py (1)
33-67: LGTM with a minor note on defensive checks. The implementation correctly creates edges from enrichments to feedback and interaction nodes, with proper logging, indexing, and error handling. The conditionals at lines 48 and 54 checking for ID presence are defensive but acceptable, even though feedback_id and interaction_id are required fields in the FeedbackEnrichment model and enrichment.id is auto-generated in the DataPoint base class.
cognee/modules/graph/cognee_graph/CogneeGraph.py (1)
173-179: LGTM! Graceful fallback for edge_text. The updated logic correctly prioritizes edge_text for distance lookups while falling back to relationship_type when edge_text is unavailable. This provides backward compatibility during migration and aligns with the PR's edge_text enrichment strategy.
cognee/modules/retrieval/utils/brute_force_triplet_search.py (1)
74-74: LGTM! Enables edge_text projection for downstream use. Adding "edge_text" to the edge properties projection correctly supports the enhanced edge metadata strategy introduced in this PR.
cognee/tests/test_edge_ingestion.py (1)
55-66: LGTM! Comprehensive validation of edge_text format. The new assertions correctly verify that contains edges include edge_text with the expected format (relationship_name and entity information). This provides good test coverage for the edge_text enrichment feature.
cognee/modules/graph/utils/expand_with_nodes_and_edges.py (2)
3-3: LGTM! Import supports Edge-enriched contains relationships. The new Edge import enables wrapping entity relationships with structured edge metadata.
247-266: LGTM! Enriches contains relationships with descriptive edge_text. The change correctly constructs Edge instances with semantic edge_text that includes relationship_name, entity_name, and entity_description. This aligns with the PR's edge enrichment strategy and provides richer metadata for embeddings and graph operations. The format is consistent with test expectations in test_edge_ingestion.py.
cognee/infrastructure/llm/prompts/feedback_report_prompt.txt (1)
1-13: LGTM! Clear and well-structured prompt template. The prompt provides explicit formatting instructions and placeholder definitions, ensuring consistent output from the LLM for feedback enrichment reports.
cognee/infrastructure/engine/models/Edge.py (1)
32-38: LGTM on auto-populating edge_text. The validator is concise and matches Pydantic v2 patterns. Consider annotating the validator signature for clarity:

- def ensure_edge_text(cls, v: Optional[str], info) -> Optional[str]
Please confirm runtime with pydantic==2.x in CI.
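For reference, a minimal Pydantic v2 sketch of the auto-fill pattern this comment refers to; EdgeSketch is a stand-in, not the PR's actual Edge model, and validate_default=True is an assumption needed to make the fallback fire when edge_text is omitted.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationInfo, field_validator


class EdgeSketch(BaseModel):  # stand-in model; cognee's Edge has more fields
    relationship_type: str
    edge_text: Optional[str] = Field(default=None, validate_default=True)

    @field_validator("edge_text", mode="before")
    @classmethod
    def ensure_edge_text(cls, v: Optional[str], info: ValidationInfo) -> Optional[str]:
        # Fall back to relationship_type when edge_text is not provided.
        return v or info.data.get("relationship_type")


assert EdgeSketch(relationship_type="contains").edge_text == "contains"
```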
cognee/tests/test_feedback_enrichment.py (1)
108-114: Logger kwargs may raise TypeError without structlog configured. logger.info("…", feedback=…, sentiment=…, score=…) will fail with stdlib logging (unknown kwargs). Ensure structlog setup runs in tests or switch to extra={} or a structured logger. Run locally without calling setup_logging() to confirm.
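A minimal sketch of the failure mode and the stdlib-safe alternatives this comment suggests; the field names are illustrative, not the PR's test values.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# With plain stdlib logging, unknown keyword arguments are rejected
# (only when the level is enabled, which is why the bug can hide in quiet configs).
try:
    logger.info("feedback processed", feedback="too vague", sentiment="negative")
except TypeError as err:
    print(f"stdlib logging rejects kwargs: {err}")

# Portable alternatives that work without structlog:
logger.info("feedback processed | feedback=%s sentiment=%s", "too vague", "negative")
logger.info("feedback processed", extra={"feedback": "too vague", "sentiment": "negative"})
```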
cognee/tasks/feedback/__init__.py (1)
1-13: Public surface looks good. Re-exports match implementations; __all__ is complete.
cognee/tasks/feedback/extract_feedback_interactions.py (1)
185-193: Logging kwargs compatibility. Multiple logger.info/warning calls use key=value kwargs. Confirm structlog is initialized in this task context, or switch to extra={} for stdlib logging.

cognee/modules/retrieval/graph_completion_cot_retriever.py (1)
121-123: Edge is fully hashable; current code is correct. Edge has both __hash__() and __eq__() implementations. The __hash__() method handles both directed and undirected edges properly, and since Node is also hashable (based on hash(self.id)), calling set(triplets) on a list of Edge objects will not raise a TypeError. The code at lines 121-123 is correct as-is.

Likely an incorrect or invalid review comment.
hajdul88 left a comment
LGTM, we agreed to have a follow-up PR to cover the weighted edges in test suites, as this PR started to use them as part of our main cognify pipeline.
The PR on the other hand: nice job!
Description
- edge_text field to edges that auto-fills from relationship_type if not provided.

Type of Change
Screenshots/Videos (if applicable)
Pre-submission Checklist
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.