@lxobr
Collaborator

@lxobr lxobr commented Oct 28, 2025

Description

  • Added an edge_text field to edges that auto-fills from relationship_type if not provided.
  • Contains edges now store descriptions for better embedding.
  • Updated and refactored indexing so that edge_text gets embedded and exposed
  • Updated retrieval to use the new embeddings
  • Added a test to verify edge_text exists in the graph with the correct format.
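
The auto-fill behavior in the first bullet can be sketched as follows. This is a minimal illustration using a standard-library dataclass, not the PR's actual Edge model; the class name `EdgeSketch` and the sample description text are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EdgeSketch:
    """Hypothetical stand-in for the Edge model; not the PR's real class."""

    relationship_type: Optional[str] = None
    edge_text: Optional[str] = None

    def __post_init__(self) -> None:
        # Fall back to relationship_type only when no edge_text was supplied.
        if not self.edge_text and self.relationship_type:
            self.edge_text = self.relationship_type


# No edge_text given: it is filled from relationship_type.
plain = EdgeSketch(relationship_type="contains")

# An explicit edge_text (made-up format) is kept as-is.
described = EdgeSketch(
    relationship_type="contains",
    edge_text="contains: Alice, a researcher",
)
```

With this shape, callers that never set edge_text still get a non-empty string to embed, while callers that pass a richer description keep it.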

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Code refactoring
  • Performance improvement
  • Other (please specify):

Screenshots/Videos (if applicable)

Pre-submission Checklist

  • I have tested my changes thoroughly before submitting this PR
  • This PR contains minimal changes necessary to address the issue/feature
  • My code follows the project's coding standards and style guidelines
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if applicable)
  • All new and existing tests pass
  • I have searched existing PRs to ensure this change hasn't been submitted already
  • I have linked any relevant issues in the description
  • My commits have clear and descriptive messages

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

@pull-checklist

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

@coderabbitai
Contributor

coderabbitai bot commented Oct 28, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR introduces a comprehensive feedback enrichment pipeline that extracts negative user feedback, generates improved answers via Chain-of-Thought reasoning, creates educational enrichments, and links them within the knowledge graph. Supporting changes include expanded graph node payloads, structured LLM output handling, refined indexing strategies, and updated data models.

Changes

Cohort / File(s) | Summary
Data Models & Exports: cognee/infrastructure/engine/models/Edge.py, cognee/tasks/feedback/models.py | Added edge_text field to Edge with validator; introduced FeedbackEnrichment model with fields for text, question, answers, feedback/interaction IDs, and enrichment metadata.
LLM Prompt Templates: cognee/infrastructure/llm/prompts/feedback_*.txt | Added three new prompt templates: feedback_reaction_prompt.txt (improved answer generation), feedback_report_prompt.txt (explanation generation), feedback_user_context_prompt.txt (context summarization).
Graph Data Retrieval & Adaptation: cognee/infrastructure/databases/graph/kuzu/adapter.py, cognee/modules/graph/cognee_graph/CogneeGraph.py, cognee/modules/retrieval/utils/brute_force_triplet_search.py | Expanded Kuzu node query to return name, type, and properties; updated edge-distance mapping to use edge_text with fallback to relationship_type; extended edge projection to include edge_text.
Structured LLM Completion: cognee/modules/retrieval/utils/completion.py | Introduced generate_structured_completion function with response_model support; refactored generate_completion to delegate to structured path.
Chain-of-Thought Retriever Refactoring: cognee/modules/retrieval/graph_completion_cot_retriever.py | Converted to structured-output pipeline; added get_structured_completion method; refactored _run_cot_completion to return (completion, context_text, triplets) tuple; updated get_completion to wrap structured results.
Feedback Extraction & Processing Tasks: cognee/tasks/feedback/extract_feedback_interactions.py, cognee/tasks/feedback/generate_improved_answers.py, cognee/tasks/feedback/create_enrichments.py, cognee/tasks/feedback/link_enrichments_to_feedback.py | New modules implementing feedback extraction from graph, improved answer generation via LLM, enrichment report creation, and edge linking between enrichments and feedback/interaction nodes.
Data Chunking & Graph Expansion: cognee/modules/chunking/models/DocumentChunk.py, cognee/modules/graph/utils/expand_with_nodes_and_edges.py | Extended DocumentChunk.contains type to include tuple[Edge, Entity]; updated expand_with_nodes_and_edges to create Edge objects with relationship descriptions and append (Edge, Entity) tuples.
Storage & Indexing Refactoring: cognee/tasks/storage/index_data_points.py, cognee/tasks/storage/index_graph_edges.py | Reorganized index_data_points to batch by type/field; simplified index_graph_edges to use EdgeType datapoints and index_data_points directly instead of manual vector indexing.
Package Initialization: cognee/tasks/feedback/__init__.py | Added public exports: extract_feedback_interactions, generate_improved_answers, create_enrichments, link_enrichments_to_feedback, FeedbackEnrichment.
Tests & Examples: cognee/tests/test_edge_ingestion.py, cognee/tests/test_feedback_enrichment.py, cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py, examples/python/feedback_enrichment_minimal_example.py | Added assertions for edge_text in contains edges; introduced end-to-end integration test for feedback enrichment pipeline; added structured-completion unit test; provided minimal example workflow.
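
The "edge_text with fallback to relationship_type" mapping summarized above might look roughly like the sketch below. The helper names `edge_lookup_key` and `map_distances`, the dictionary-shaped edge attributes, and the default distance are all assumptions for illustration; the real logic lives in CogneeGraph and may differ.

```python
from typing import Iterable, List, Mapping, Optional, Tuple


def edge_lookup_key(attributes: Mapping[str, object]) -> Optional[str]:
    """Prefer edge_text; fall back to relationship_type when it is missing or empty."""
    key = attributes.get("edge_text") or attributes.get("relationship_type")
    return str(key) if key else None


def map_distances(
    edges: Iterable[Mapping[str, object]],
    distances: Mapping[str, float],
    default: float = 1.0,  # hypothetical "no match" distance
) -> List[Tuple[str, float]]:
    """Pair each edge's lookup key with its vector distance, skipping keyless edges."""
    mapped = []
    for attributes in edges:
        key = edge_lookup_key(attributes)
        if key is None:
            continue
        mapped.append((key, distances.get(key, default)))
    return mapped


edges = [
    {"relationship_type": "contains", "edge_text": "contains: Alice"},
    {"relationship_type": "works_at"},  # older edge without edge_text
    {},  # malformed edge, skipped
]
distances = {"contains: Alice": 0.12, "works_at": 0.4}
mapped = map_distances(edges, distances)
```

The fallback keeps edges that were ingested before edge_text existed searchable under their relationship_type.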

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Graph as Knowledge Graph
    participant Feedback as Feedback<br/>Extraction
    participant Retrieval as CoT Retriever
    participant LLM
    participant Enrichment as Enrichment<br/>Creation
    
    User->>Graph: Submit Feedback (Negative)
    Graph->>Feedback: extract_feedback_interactions()
    Feedback->>Graph: Query Interactions & Feedback Nodes
    Feedback->>LLM: Summarize Context
    LLM-->>Feedback: Context Summary
    Feedback-->>Graph: Emit FeedbackEnrichment Records
    
    Feedback->>Retrieval: generate_improved_answers(enrichments)
    Retrieval->>LLM: Render Reaction Prompt with<br/>Question, Answer, Feedback
    LLM-->>Retrieval: Structured ImprovedAnswerResponse
    Retrieval->>Graph: Fetch Related Context via CoT
    Graph-->>Retrieval: Context Triplets + Edges
    Retrieval-->>Feedback: Updated Enrichments<br/>(improved_answer, new_context)
    
    Feedback->>Enrichment: create_enrichments(enrichments)
    Enrichment->>LLM: Generate Report Prompt
    LLM-->>Enrichment: Educational Report Text
    Enrichment->>Graph: Create NodeSet & Link Enrichments
    Graph-->>Enrichment: Return Enriched Records
    
    Enrichment->>Graph: link_enrichments_to_feedback(enrichments)
    Graph->>Graph: Create enriches_feedback &<br/>improves_interaction Edges
    Graph-->>User: Feedback Loop Closed
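
The four-stage flow in the diagram can be approximated with a small runnable sketch. The stage functions here are stubs with invented record shapes (plain dicts with made-up field names), and `run_stages` is a hypothetical helper; the PR wires the real tasks through its own pipeline machinery, which is not reproduced here.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Sequence

Stage = Callable[[Any], Awaitable[Any]]


async def run_stages(stages: Sequence[Stage], data: Any = None) -> Any:
    """Await each stage in order, feeding its output to the next one."""
    for stage in stages:
        data = await stage(data)
    return data


async def extract_stub(_: Any) -> List[dict]:
    # Stands in for extracting negative feedback and its interaction.
    return [{"question": "Who wrote the report?", "original_answer": "Bob"}]


async def improve_stub(records: List[dict]) -> List[dict]:
    # Stands in for the LLM-backed improved-answer step.
    return [{**r, "improved_answer": "Alice"} for r in records]


async def report_stub(records: List[dict]) -> List[dict]:
    # Stands in for generating the enrichment report text.
    return [{**r, "text": f"{r['question']} -> {r['improved_answer']}"} for r in records]


async def link_stub(records: List[dict]) -> List[dict]:
    # Edge names taken from the diagram above.
    return [{**r, "edges": ["enriches_feedback", "improves_interaction"]} for r in records]


result = asyncio.run(run_stages([extract_stub, improve_stub, report_stub, link_stub]))
```

Each stage only depends on the output of the previous one, which is what lets the stages be tested in isolation with stubbed inputs.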

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Priority areas for review:
    • graph_completion_cot_retriever.py: Significant refactoring of core retrieval logic; verify return-type consistency and async/await handling in structured-completion path
    • extract_feedback_interactions.py: Complex graph querying, filtering, and record construction; validate error handling and edge-case coverage for interaction matching
    • index_data_points.py and index_graph_edges.py: Substantial reorganization of indexing flow; ensure batch processing logic and edge-text handling don't break downstream consumers
    • completion.py and LLM integration: New structured-output pathway; verify response_model handling and conversation_history threading
    • Test coverage (test_feedback_enrichment.py): End-to-end integration; ensure all pipeline stages are exercised and assertions are comprehensive

Possibly related PRs

Suggested reviewers

  • borisarzentar
  • hajdul88

🐰 A feedback loop now gleams so bright,
With edges bearing text and light,
Enrichments bloom from answers true,
The graph learns what users knew,
And better answers come into view!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Title Check ⚠️ Warning The pull request title "feat: optimize repeated entity extraction" refers to a real aspect of the changeset—specifically the edge_text field improvements and indexing refactoring—but it significantly misses the major scope of the PR. The changes include a substantial new feedback enrichment feature (with new prompt templates, feedback tasks, FeedbackEnrichment model, and related tests) that is not reflected in the title. Additionally, the phrase "repeated entity extraction" is vague and doesn't clearly communicate what optimization is being performed. A teammate scanning the commit history would not fully understand that this PR adds a complete feedback enrichment pipeline alongside edge optimization.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed The pull request description is present and follows the provided template structure with all major sections included. It provides a human-written (not AI-generated) explanation of the changes, including the addition of the edge_text field, improvements to contains edges, indexing refactoring, retrieval updates, and test coverage. The Type of Change section is appropriately filled with "New feature," "Code refactoring," and "Performance improvement" checkboxes. The pre-submission checklist is substantially completed with most relevant items checked, and the DCO affirmation is included.
Docstring Coverage ✅ Passed Docstring coverage is 82.98% which is sufficient. The required threshold is 80.00%.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
cognee/modules/retrieval/utils/completion.py (1)

18-23: Handle missing/None system prompt

read_query_prompt may return None; concatenation and downstream call will fail silently. Guard and fail fast.

     system_prompt = system_prompt if system_prompt else read_query_prompt(system_prompt_path)
+    if not system_prompt:
+        raise ValueError(f"System prompt not found: {system_prompt_path}")
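
A standalone version of this guard, written so it can be run without the cognee package, might look like the sketch below. The function name `resolve_system_prompt` and the injected `read_prompt` callable are assumptions; in the PR the reader is read_query_prompt.

```python
from typing import Callable, Optional


def resolve_system_prompt(
    system_prompt: Optional[str],
    system_prompt_path: str,
    read_prompt: Callable[[str], Optional[str]],
) -> str:
    """Return an explicit prompt, else the file-based one, else fail fast."""
    prompt = system_prompt or read_prompt(system_prompt_path)
    if not prompt:
        # Raise here so the failure names the missing file instead of
        # surfacing later as an opaque error inside the LLM call.
        raise ValueError(f"System prompt not found: {system_prompt_path}")
    return prompt
```

Passing the reader in as a parameter keeps the example self-contained and makes the None path easy to exercise in a unit test.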
cognee/infrastructure/databases/graph/kuzu/adapter.py (1)

1362-1382: Attribute names must be validated; unparameterized interpolation enables query injection

The code directly interpolates unquoted attribute names (f"n.{attr} IN $..."), allowing malformed or hostile attr values to break queries. Additionally, using where_clause.replace("n.", "n1.") risks mis-rewriting predicates if attr contains "n.". While current callers use safe hardcoded names, the method is public and accepts user-controllable filters.

  • Validate attr against allowed columns or a strict name pattern (e.g., ^[A-Za-z_][A-Za-z0-9_]*$).
  • Route top-level columns ("id", "name", "type") directly; route custom attributes through json_extract() (already used elsewhere in the adapter).
  • Build predicates explicitly (as shown in the refactor suggestion) rather than relying on string replacement.
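
The three points above can be exercised in isolation with a sketch like the following. It only builds the predicate string and parameter dict and does not run anything against Kuzu; the function name `build_predicates` is hypothetical, and the json_extract form is copied from the suggestion in this comment rather than verified here.

```python
import re
from typing import Any, Dict, List, Tuple

SAFE_COLUMNS = {"id", "name", "type"}
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_predicates(
    attribute_filters: List[Dict[str, List[Any]]], alias: str
) -> Tuple[str, Dict[str, List[Any]]]:
    """Build a WHERE clause for one node alias plus its query parameters."""
    predicates = []
    params: Dict[str, List[Any]] = {}
    for i, filter_dict in enumerate(attribute_filters):
        for attr, values in filter_dict.items():
            # fullmatch rejects anything outside a plain identifier,
            # including a trailing newline that "$" alone would allow.
            if not NAME_RE.fullmatch(attr):
                raise ValueError(f"Invalid attribute name: {attr!r}")
            param_name = f"values_{i}_{attr}"
            params[param_name] = values
            if attr in SAFE_COLUMNS:
                predicates.append(f"{alias}.{attr} IN ${param_name}")
            else:
                predicates.append(
                    f"json_extract({alias}.properties, '$.{attr}') IN ${param_name}"
                )
    return " AND ".join(predicates) or "true", params


clause, params = build_predicates([{"type": ["Entity"]}, {"color": ["red"]}], "n1")
```

Because the alias is passed in rather than substituted afterwards, no string replacement over the finished clause is needed for the n1/n2 edge query.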

The suggested refactor is sound and necessary:

-        for i, filter_dict in enumerate(attribute_filters):
-            for attr, values in filter_dict.items():
-                param_name = f"values_{i}_{attr}"
-                where_clauses.append(f"n.{attr} IN ${param_name}")
-                params[param_name] = values
+        import re
+        safe_cols = {"id", "name", "type"}
+        name_re = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
+        node_preds = []
+        for i, filter_dict in enumerate(attribute_filters):
+            for attr, values in filter_dict.items():
+                if not name_re.match(attr):
+                    raise ValueError(f"Invalid attribute name: {attr}")
+                param_name = f"values_{i}_{attr}"
+                params[param_name] = values
+                if attr in safe_cols:
+                    node_preds.append(f"{{alias}}.{attr} IN ${param_name}")
+                else:
+                    node_preds.append(
+                        f"json_extract({{alias}}.properties, '$.{attr}') IN ${param_name}"
+                    )
+        def build_where(alias: str) -> str:
+            return " AND ".join(p.replace("{alias}", alias) for p in node_preds) or "true"
-        where_clause = " AND ".join(where_clauses)
-        nodes_query = f"""
-        MATCH (n:Node)
-        WHERE {where_clause}
+        nodes_query = f"""
+        MATCH (n:Node)
+        WHERE {build_where('n')}
         RETURN n.id, {{
             name: n.name,
             type: n.type,
             properties: n.properties
         }}
         """
-        edges_query = f"""
-        MATCH (n1:Node)-[r:EDGE]->(n2:Node)
-        WHERE {where_clause.replace("n.", "n1.")} AND {where_clause.replace("n.", "n2.")}
+        edges_query = f"""
+        MATCH (n1:Node)-[r:EDGE]->(n2:Node)
+        WHERE {build_where('n1')} AND {build_where('n2')}
         RETURN n1.id, n2.id, r.relationship_name, r.properties
         """
🧹 Nitpick comments (19)
cognee/infrastructure/llm/prompts/feedback_user_context_prompt.txt (1)

4-5: Hyphenate compound adjectives for clarity.

The phrases "one paragraph" and "human readable" function as compound adjectives modifying "summary" and should be hyphenated: "one-paragraph" and "human-readable".

Apply this diff:

-Provide a one paragraph human readable summary of this interaction context,
+Provide a one-paragraph human-readable summary of this interaction context,
cognee/infrastructure/llm/prompts/feedback_reaction_prompt.txt (1)

12-14: Avoid format instructions when using structured outputs

If this prompt is used with a structured response_model (e.g., TestAnswer with fields answer/explanation), the “Format your reply as: Answer: … / Explanation: …” can conflict. Consider removing these lines or gating them only for plain-text flows.

cognee/modules/retrieval/utils/completion.py (1)

6-15: Minor typing/API polish

response_model: Type = str is loose. Prefer a union that names both accepted kinds for better type checking: response_model: type[str] | Type[BaseModel] = str.
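
As an illustration of how a typed response_model can flow through the delegation, here is a minimal sketch. It uses a TypeVar instead of the union suggested above, a dataclass instead of a Pydantic model, and a fake LLM callable; all names ending in `_sketch`, `AnswerSketch`, and `fake_llm` are assumptions, not the PR's API.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Type, TypeVar

T = TypeVar("T")
# Hypothetical shape of the call that reaches the LLM.
LLMCall = Callable[[str, str, Type[Any]], Awaitable[Any]]


@dataclass
class AnswerSketch:
    answer: str
    explanation: str


async def structured_completion_sketch(
    query: str, context: str, system_prompt: str, llm_call: LLMCall, response_model: Type[T]
) -> T:
    """Structured path: the caller decides the return type via response_model."""
    text_input = f"Question: {query}\nContext: {context}"
    return await llm_call(text_input, system_prompt, response_model)


async def completion_sketch(
    query: str, context: str, system_prompt: str, llm_call: LLMCall
) -> str:
    """Plain-text path: delegates to the structured path with response_model=str."""
    return await structured_completion_sketch(query, context, system_prompt, llm_call, str)


async def fake_llm(text_input: str, system_prompt: str, response_model: Type[Any]) -> Any:
    # Deterministic stand-in for a real LLM call.
    if response_model is str:
        return "Alice"
    return response_model(answer="Alice", explanation="From graph context.")


args = ("Who leads the team?", "Alice leads the team.", "Answer briefly.", fake_llm)
text = asyncio.run(completion_sketch(*args))
structured = asyncio.run(structured_completion_sketch(*args, AnswerSketch))
```

With the TypeVar, a type checker can infer that the second call returns an AnswerSketch, which the bare Type annotation cannot express.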

cognee/modules/chunking/models/DocumentChunk.py (1)

35-35: Make contains Optional[List[…]] and import Optional

Default is None but type isn’t Optional, which trips type checkers.

-from typing import List, Union
+from typing import List, Union, Optional

-    contains: List[Union[Entity, Event, tuple[Edge, Entity]]] = None
+    contains: Optional[List[Union[Entity, Event, tuple[Edge, Entity]]]] = None
cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py (1)

178-221: Stabilize tests by mocking LLMGateway.acreate_structured_output

Current tests rely on the live LLM path; they can be flaky/slow. Mock to deterministic outputs.

Example:

+    @pytest.fixture(autouse=True)
+    def _mock_llm(monkeypatch):
+        async def _fake_create(text_input: str, system_prompt: str, response_model):
+            if response_model is str:
+                return "Alice"
+            return TestAnswer(answer="Alice", explanation="From graph context.")
+        monkeypatch.setattr(
+            "cognee.infrastructure.llm.LLMGateway.LLMGateway.acreate_structured_output",
+            staticmethod(_fake_create),
+        )
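
The same idea can be tried without pytest or the cognee package using unittest.mock from the standard library. In this sketch `GatewaySketch` and `AnswerSketch` are hypothetical stand-ins for the real gateway class and test model; only the patching technique is the point.

```python
import asyncio
from dataclasses import dataclass
from unittest.mock import patch


@dataclass
class AnswerSketch:
    answer: str
    explanation: str


class GatewaySketch:
    """Hypothetical stand-in for an LLM gateway with a static async method."""

    @staticmethod
    async def acreate_structured_output(text_input, system_prompt, response_model):
        raise RuntimeError("live LLM call; tests should patch this")


async def _fake_create(text_input, system_prompt, response_model):
    # Deterministic replacement used only while the patch is active.
    if response_model is str:
        return "Alice"
    return response_model(answer="Alice", explanation="From graph context.")


async def ask(response_model):
    return await GatewaySketch.acreate_structured_output("Who?", "system", response_model)


def run_with_fake(response_model):
    """Run ask() with the gateway method swapped for the fake, then restore it."""
    with patch.object(
        GatewaySketch, "acreate_structured_output", staticmethod(_fake_create)
    ):
        return asyncio.run(ask(response_model))
```

The patch is scoped to the with block, so code outside it still sees the original method.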
cognee/tasks/storage/index_data_points.py (2)

58-61: Bound parallelism to avoid creating thousands of tasks

Large corpora can spawn unbounded tasks. Use a semaphore.

-    tasks = [
-        asyncio.create_task(vector_engine.index_data_points(type_name, field_name, batch_points))
-        for type_name, field_name, batch_points in batches
-    ]
-    await asyncio.gather(*tasks)
+    sem = asyncio.Semaphore(8)
+    async def _run(type_name, field_name, batch_points):
+        async with sem:
+            await vector_engine.index_data_points(type_name, field_name, batch_points)
+    await asyncio.gather(*(_run(t, f, b) for t, f, b in batches))
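
To see the bound in action outside the indexing code, the pattern can be wrapped in a small helper and measured. `gather_bounded` and the demo worker are hypothetical names; the limit of 3 is arbitrary for the demonstration.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_bounded(
    items: Iterable[T], worker: Callable[[T], Awaitable[R]], limit: int = 8
) -> List[R]:
    """Run worker over items with at most `limit` calls in flight at once."""
    if limit <= 0:
        raise ValueError(f"limit must be positive, got {limit}")
    sem = asyncio.Semaphore(limit)

    async def _run(item: T) -> R:
        async with sem:
            return await worker(item)

    # gather preserves input order in its result list.
    return list(await asyncio.gather(*(_run(item) for item in items)))


async def _demo():
    active = 0
    peak = 0

    async def worker(n: int) -> int:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)  # record the highest concurrency seen
        await asyncio.sleep(0.01)
        active -= 1
        return n * 2

    results = await gather_bounded(range(20), worker, limit=3)
    return results, peak


results, peak = asyncio.run(_demo())
```

Note that gather still schedules one task per item; the semaphore limits how many of them are inside the worker at the same time, which is what protects the vector engine from a burst of concurrent calls.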

49-51: Validate batch_size

Defensive check prevents division/empty-slice issues.

 batch_size = vector_engine.embedding_engine.get_batch_size()
+if not isinstance(batch_size, int) or batch_size <= 0:
+    raise ValueError(f"Invalid embedding batch_size: {batch_size}")
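
A runnable sketch of the check combined with per-(type, field) batching is shown below. The helper names and the sample group key are assumptions; the bool exclusion is an extra not present in the diff above, added because isinstance(True, int) is true in Python.

```python
from typing import Any, Dict, Iterator, List, Tuple


def validate_batch_size(batch_size: Any) -> int:
    """Accept only positive, non-bool integers."""
    if isinstance(batch_size, bool) or not isinstance(batch_size, int) or batch_size <= 0:
        raise ValueError(f"Invalid embedding batch_size: {batch_size}")
    return batch_size


def iter_batches(
    groups: Dict[Tuple[str, str], List[Any]], batch_size: Any
) -> Iterator[Tuple[str, str, List[Any]]]:
    """Yield (type_name, field_name, batch) triples of at most batch_size points."""
    size = validate_batch_size(batch_size)
    for (type_name, field_name), points in groups.items():
        for start in range(0, len(points), size):
            yield type_name, field_name, points[start : start + size]


# Hypothetical grouping: five data points of one type, indexed on one field.
batches = list(iter_batches({("Entity", "name"): [1, 2, 3, 4, 5]}, 2))
```

Validating once up front means a zero or negative size fails with a clear message instead of an error from range() or an endless stream of empty slices.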
cognee/tests/test_feedback_enrichment.py (1)

36-45: Split setup into helpers to reduce locals (lint R0914)

main() holds many locals. Extract directory prep and node/edge assertions into helpers to satisfy lint and readability.

cognee/tasks/storage/index_graph_edges.py (1)

67-69: Clarify deprecation message

Saying “edge embedding is deprecated” is ambiguous. Consider: “Auto-fetching edges inside index_graph_edges is deprecated; pass edges explicitly.”

cognee/tasks/feedback/extract_feedback_interactions.py (1)

87-95: Make recency sort robust to mixed timestamp formats

Compare numeric timestamps; parse ISO8601 (incl. trailing 'Z') with fallback to 0, to avoid type errors and misordering.

Apply this diff:

@@
-    def _recency_key(pair):
-        _, (_, interaction_props) = pair
-        created_at = interaction_props.get("created_at") or ""
-        updated_at = interaction_props.get("updated_at") or ""
-        return (created_at, updated_at)
+    from datetime import datetime
+
+    def _to_ts(value) -> float:
+        if isinstance(value, (int, float)):
+            return float(value)
+        if isinstance(value, str) and value:
+            val = value.replace("Z", "+00:00")
+            try:
+                return datetime.fromisoformat(val).timestamp()
+            except Exception:
+                return 0.0
+        return 0.0
+
+    def _recency_key(pair):
+        _, (_, interaction_props) = pair
+        return (
+            _to_ts(interaction_props.get("created_at")),
+            _to_ts(interaction_props.get("updated_at")),
+        )
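
The effect of the proposed key can be checked with mixed inputs. This sketch mirrors the helper from the diff under the hypothetical name `to_timestamp` and catches ValueError specifically; the sample interactions are invented.

```python
from datetime import datetime
from typing import Any


def to_timestamp(value: Any) -> float:
    """Convert epoch numbers or ISO 8601 strings to a float timestamp; 0.0 on failure."""
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str) and value:
        try:
            # Older Python versions do not parse a trailing "Z", so map it to an offset.
            return datetime.fromisoformat(value.replace("Z", "+00:00")).timestamp()
        except ValueError:
            return 0.0
    return 0.0


interactions = [
    ("a", {"created_at": "2025-10-28T12:00:00Z"}),
    ("b", {"created_at": 1700000000}),  # numeric epoch seconds, older than "a"
    ("c", {"created_at": "2025-10-29T08:30:00+00:00"}),
    ("d", {"created_at": "not a date"}),  # unparseable, sorts last
    ("e", {}),  # missing, sorts last
]

newest_first = sorted(
    interactions,
    key=lambda pair: to_timestamp(pair[1].get("created_at")),
    reverse=True,
)
order = [name for name, _ in newest_first]
```

With the original string-based key, the integer in "b" would raise a TypeError when compared against the strings; converting everything to floats avoids that.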
examples/python/feedback_enrichment_minimal_example.py (1)

4-5: Unify SearchType import with internal usage

Elsewhere it’s from cognee.modules.search.types import SearchType. Use the same to avoid API surface drift.

Apply this diff:

-from cognee.api.v1.search import SearchType
+from cognee.modules.search.types import SearchType

If the API alias is required for public users, keep it and justify with a comment.

cognee/tasks/feedback/generate_improved_answers.py (3)

72-81: Remove unnecessary else after return.

Simplify per pylint R1705:

-        if completion:
-            enrichment.improved_answer = completion.answer
-            enrichment.new_context = new_context_text
-            enrichment.explanation = completion.explanation
-            return enrichment
-        else:
-            logger.warning(
-                "Failed to get structured completion from retriever", question=enrichment.question
-            )
-            return None
+        if completion:
+            enrichment.improved_answer = completion.answer
+            enrichment.new_context = new_context_text
+            enrichment.explanation = completion.explanation
+            return enrichment
+        logger.warning(
+            "Failed to get structured completion from retriever | question=%s",
+            enrichment.question,
+        )
+        return None

115-121: Throughput: consider bounded concurrency for multiple enrichments.

If rate limits allow, use asyncio.gather with a semaphore to parallelize per-item processing.


6-9: Unused import detected.

resolve_edges_to_text is imported but not used. Remove to keep import hygiene.

cognee/tasks/feedback/create_enrichments.py (2)

35-45: Optional: pre-check prompt_template to avoid raising exceptions.

You already catch exceptions, but you can avoid the try/except by checking for None and falling back early.


70-81: Throughput: optional bounded concurrency for report generation.

Use asyncio.gather with a semaphore to parallelize _generate_enrichment_report across items if allowed.

cognee/modules/retrieval/graph_completion_cot_retriever.py (3)

146-161: Missing None checks for validation/follow-up prompt files.

read_query_prompt may return None; pass-through to LLM will likely fail. Add guards with fallbacks or raise with context.


84-92: API ergonomics: parameter counts are high; consider grouping configuration.

To address R0913/R0917, introduce small config dataclasses or reuse self.* defaults to reduce arg counts.

Also applies to: 168-176


25-36: Minor: typing polish.

response_model is a type; prefer Type[Any] for its annotation. The return type tuple[Any, str, List[Edge]] is already correct.

 def _as_answer_text(completion: Any) -> str:
     ...

And:

-        response_model: Type = str,
+        response_model: Type[Any] = str,
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c92a7b1 and be7d315.

⛔ Files ignored due to path filters (1)
  • .github/workflows/e2e_tests.yml is excluded by !**/*.yml
📒 Files selected for processing (23)
  • cognee/infrastructure/databases/graph/kuzu/adapter.py (1 hunks)
  • cognee/infrastructure/engine/models/Edge.py (2 hunks)
  • cognee/infrastructure/llm/prompts/feedback_reaction_prompt.txt (1 hunks)
  • cognee/infrastructure/llm/prompts/feedback_report_prompt.txt (1 hunks)
  • cognee/infrastructure/llm/prompts/feedback_user_context_prompt.txt (1 hunks)
  • cognee/modules/chunking/models/DocumentChunk.py (2 hunks)
  • cognee/modules/graph/cognee_graph/CogneeGraph.py (1 hunks)
  • cognee/modules/graph/utils/expand_with_nodes_and_edges.py (2 hunks)
  • cognee/modules/retrieval/graph_completion_cot_retriever.py (7 hunks)
  • cognee/modules/retrieval/utils/brute_force_triplet_search.py (1 hunks)
  • cognee/modules/retrieval/utils/completion.py (2 hunks)
  • cognee/tasks/feedback/__init__.py (1 hunks)
  • cognee/tasks/feedback/create_enrichments.py (1 hunks)
  • cognee/tasks/feedback/extract_feedback_interactions.py (1 hunks)
  • cognee/tasks/feedback/generate_improved_answers.py (1 hunks)
  • cognee/tasks/feedback/link_enrichments_to_feedback.py (1 hunks)
  • cognee/tasks/feedback/models.py (1 hunks)
  • cognee/tasks/storage/index_data_points.py (1 hunks)
  • cognee/tasks/storage/index_graph_edges.py (3 hunks)
  • cognee/tests/test_edge_ingestion.py (1 hunks)
  • cognee/tests/test_feedback_enrichment.py (1 hunks)
  • cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py (3 hunks)
  • examples/python/feedback_enrichment_minimal_example.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
{cognee,cognee-mcp,distributed,examples,alembic}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

{cognee,cognee-mcp,distributed,examples,alembic}/**/*.py: Use 4-space indentation; name modules and functions in snake_case; name classes in PascalCase (Python)
Adhere to ruff rules, including import hygiene and configured line length (100)
Keep Python lines ≤ 100 characters

Files:

  • cognee/tests/test_feedback_enrichment.py
  • cognee/modules/chunking/models/DocumentChunk.py
  • examples/python/feedback_enrichment_minimal_example.py
  • cognee/tasks/feedback/extract_feedback_interactions.py
  • cognee/infrastructure/engine/models/Edge.py
  • cognee/modules/retrieval/utils/brute_force_triplet_search.py
  • cognee/modules/retrieval/utils/completion.py
  • cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py
  • cognee/tasks/feedback/link_enrichments_to_feedback.py
  • cognee/tasks/feedback/models.py
  • cognee/modules/graph/utils/expand_with_nodes_and_edges.py
  • cognee/modules/graph/cognee_graph/CogneeGraph.py
  • cognee/infrastructure/databases/graph/kuzu/adapter.py
  • cognee/tasks/feedback/create_enrichments.py
  • cognee/tests/test_edge_ingestion.py
  • cognee/tasks/storage/index_data_points.py
  • cognee/tasks/feedback/generate_improved_answers.py
  • cognee/tasks/storage/index_graph_edges.py
  • cognee/tasks/feedback/__init__.py
  • cognee/modules/retrieval/graph_completion_cot_retriever.py
cognee/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

cognee/**/*.py: Public APIs in the core library should be type-annotated where practical
Prefer explicit, structured error handling and use shared logging utilities from cognee.shared.logging_utils

Files:

  • cognee/tests/test_feedback_enrichment.py
  • cognee/modules/chunking/models/DocumentChunk.py
  • cognee/tasks/feedback/extract_feedback_interactions.py
  • cognee/infrastructure/engine/models/Edge.py
  • cognee/modules/retrieval/utils/brute_force_triplet_search.py
  • cognee/modules/retrieval/utils/completion.py
  • cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py
  • cognee/tasks/feedback/link_enrichments_to_feedback.py
  • cognee/tasks/feedback/models.py
  • cognee/modules/graph/utils/expand_with_nodes_and_edges.py
  • cognee/modules/graph/cognee_graph/CogneeGraph.py
  • cognee/infrastructure/databases/graph/kuzu/adapter.py
  • cognee/tasks/feedback/create_enrichments.py
  • cognee/tests/test_edge_ingestion.py
  • cognee/tasks/storage/index_data_points.py
  • cognee/tasks/feedback/generate_improved_answers.py
  • cognee/tasks/storage/index_graph_edges.py
  • cognee/tasks/feedback/__init__.py
  • cognee/modules/retrieval/graph_completion_cot_retriever.py
cognee/tests/**/test_*.py

📄 CodeRabbit inference engine (AGENTS.md)

cognee/tests/**/test_*.py: Name test files as test_*.py
Use pytest.mark.asyncio for async tests
Tests should avoid external state; rely on fixtures and CI-provided env vars when providers are required

Files:

  • cognee/tests/test_feedback_enrichment.py
  • cognee/tests/test_edge_ingestion.py
examples/python/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

When adding public APIs, provide or update targeted examples under examples/python/

Files:

  • examples/python/feedback_enrichment_minimal_example.py
cognee/tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Place unit tests under cognee/tests/unit/

Files:

  • cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py
🧬 Code graph analysis (15)
cognee/tests/test_feedback_enrichment.py (9)
cognee/infrastructure/databases/graph/get_graph_engine.py (1)
  • get_graph_engine (10-24)
cognee/modules/pipelines/tasks/task.py (1)
  • Task (5-97)
cognee/modules/search/types/SearchType.py (1)
  • SearchType (4-19)
cognee/shared/logging_utils.py (2)
  • get_logger (182-194)
  • info (175-175)
cognee/tasks/feedback/create_enrichments.py (1)
  • create_enrichments (51-84)
cognee/tasks/feedback/extract_feedback_interactions.py (1)
  • extract_feedback_interactions (180-230)
cognee/tasks/feedback/generate_improved_answers.py (1)
  • generate_improved_answers (92-130)
cognee/tasks/feedback/link_enrichments_to_feedback.py (1)
  • link_enrichments_to_feedback (33-67)
cognee/api/v1/config/config.py (2)
  • data_root_directory (36-38)
  • system_root_directory (18-33)
cognee/modules/chunking/models/DocumentChunk.py (3)
cognee/infrastructure/engine/models/Edge.py (1)
  • Edge (5-38)
cognee/modules/engine/models/Entity.py (1)
  • Entity (6-11)
cognee/modules/engine/models/Event.py (1)
  • Event (8-16)
examples/python/feedback_enrichment_minimal_example.py (6)
cognee/modules/search/types/SearchType.py (1)
  • SearchType (4-19)
cognee/modules/pipelines/tasks/task.py (1)
  • Task (5-97)
cognee/tasks/feedback/extract_feedback_interactions.py (1)
  • extract_feedback_interactions (180-230)
cognee/tasks/feedback/generate_improved_answers.py (1)
  • generate_improved_answers (92-130)
cognee/tasks/feedback/create_enrichments.py (1)
  • create_enrichments (51-84)
cognee/tasks/feedback/link_enrichments_to_feedback.py (1)
  • link_enrichments_to_feedback (33-67)
cognee/tasks/feedback/extract_feedback_interactions.py (5)
cognee/infrastructure/llm/LLMGateway.py (1)
  • LLMGateway (6-66)
cognee/infrastructure/llm/prompts/read_query_prompt.py (1)
  • read_query_prompt (6-43)
cognee/shared/logging_utils.py (2)
  • get_logger (182-194)
  • info (175-175)
cognee/infrastructure/databases/graph/get_graph_engine.py (1)
  • get_graph_engine (10-24)
cognee/tasks/feedback/models.py (1)
  • FeedbackEnrichment (9-26)
cognee/modules/retrieval/utils/completion.py (3)
cognee/infrastructure/llm/LLMGateway.py (1)
  • LLMGateway (6-66)
cognee/infrastructure/llm/prompts/render_prompt.py (1)
  • render_prompt (5-42)
cognee/infrastructure/llm/prompts/read_query_prompt.py (1)
  • read_query_prompt (6-43)
cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py (3)
cognee/api/v1/config/config.py (2)
  • system_root_directory (18-33)
  • data_root_directory (36-38)
cognee/infrastructure/engine/models/DataPoint.py (1)
  • DataPoint (20-220)
cognee/modules/retrieval/graph_completion_cot_retriever.py (2)
  • GraphCompletionCotRetriever (39-272)
  • get_structured_completion (168-231)
cognee/tasks/feedback/link_enrichments_to_feedback.py (4)
cognee/infrastructure/databases/graph/get_graph_engine.py (1)
  • get_graph_engine (10-24)
cognee/tasks/storage/index_graph_edges.py (1)
  • index_graph_edges (42-77)
cognee/shared/logging_utils.py (2)
  • get_logger (182-194)
  • info (175-175)
cognee/tasks/feedback/models.py (1)
  • FeedbackEnrichment (9-26)
cognee/tasks/feedback/models.py (2)
cognee/infrastructure/engine/models/DataPoint.py (1)
  • DataPoint (20-220)
cognee/modules/engine/models/node_set.py (1)
  • NodeSet (4-7)
cognee/modules/graph/utils/expand_with_nodes_and_edges.py (1)
cognee/infrastructure/engine/models/Edge.py (1)
  • Edge (5-38)
cognee/tasks/feedback/create_enrichments.py (5)
cognee/infrastructure/llm/LLMGateway.py (1)
  • LLMGateway (6-66)
cognee/infrastructure/llm/prompts/read_query_prompt.py (1)
  • read_query_prompt (6-43)
cognee/shared/logging_utils.py (2)
  • get_logger (182-194)
  • info (175-175)
cognee/modules/engine/models/node_set.py (1)
  • NodeSet (4-7)
cognee/tasks/feedback/models.py (1)
  • FeedbackEnrichment (9-26)
cognee/tasks/storage/index_data_points.py (5)
cognee/infrastructure/databases/vector/get_vector_engine.py (1)
  • get_vector_engine (5-7)
cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (2)
  • create_vector_index (292-295)
  • index_data_points (297-309)
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (2)
  • create_vector_index (248-249)
  • index_data_points (251-263)
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (2)
  • create_vector_index (285-295)
  • index_data_points (297-319)
cognee/infrastructure/databases/vector/embeddings/EmbeddingEngine.py (1)
  • get_batch_size (38-45)
cognee/tasks/feedback/generate_improved_answers.py (5)
cognee/infrastructure/llm/LLMGateway.py (1)
  • LLMGateway (6-66)
cognee/infrastructure/llm/prompts/read_query_prompt.py (1)
  • read_query_prompt (6-43)
cognee/shared/logging_utils.py (2)
  • get_logger (182-194)
  • info (175-175)
cognee/modules/retrieval/graph_completion_cot_retriever.py (2)
  • GraphCompletionCotRetriever (39-272)
  • get_structured_completion (168-231)
cognee/tasks/feedback/models.py (1)
  • FeedbackEnrichment (9-26)
cognee/tasks/storage/index_graph_edges.py (7)
cognee/modules/engine/utils/generate_edge_id.py (1)
  • generate_edge_id (4-5)
cognee/infrastructure/databases/graph/get_graph_engine.py (1)
  • get_graph_engine (10-24)
cognee/modules/graph/models/EdgeType.py (1)
  • EdgeType (4-8)
cognee/tasks/storage/index_data_points.py (1)
  • index_data_points (10-65)
cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1)
  • index_data_points (297-309)
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)
  • index_data_points (251-263)
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)
  • index_data_points (297-319)
cognee/tasks/feedback/__init__.py (1)
cognee/tasks/feedback/models.py (1)
  • FeedbackEnrichment (9-26)
cognee/modules/retrieval/graph_completion_cot_retriever.py (4)
cognee/modules/retrieval/graph_completion_retriever.py (3)
  • GraphCompletionRetriever (28-281)
  • save_qa (217-281)
  • get_completion (144-215)
cognee/modules/retrieval/utils/completion.py (2)
  • generate_structured_completion (6-28)
  • summarize_text (51-63)
cognee/infrastructure/databases/cache/config.py (1)
  • CacheConfig (6-39)
cognee/modules/retrieval/utils/session_cache.py (2)
  • get_conversation_history (78-156)
  • save_conversation_history (10-75)
🪛 LanguageTool
cognee/infrastructure/llm/prompts/feedback_user_context_prompt.txt

[grammar] ~4-~4: Use a hyphen to join words.
Context: ...stion} Context: {context} Provide a one paragraph human readable summary of this...

(QB_NEW_EN_HYPHEN)


[grammar] ~4-~4: Use a hyphen to join words.
Context: ...{context} Provide a one paragraph human readable summary of this interaction con...

(QB_NEW_EN_HYPHEN)

🪛 Pylint (4.0.1)
cognee/tests/test_feedback_enrichment.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)


[refactor] 36-36: Too many local variables (23/15)

(R0914)

examples/python/feedback_enrichment_minimal_example.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

cognee/tasks/feedback/extract_feedback_interactions.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)


[refactor] 153-157: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)

cognee/infrastructure/engine/models/Edge.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

cognee/modules/retrieval/utils/completion.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)


[refactor] 6-6: Too many arguments (7/5)

(R0913)


[refactor] 6-6: Too many positional arguments (7/5)

(R0917)


[refactor] 31-31: Too many arguments (6/5)

(R0913)


[refactor] 31-31: Too many positional arguments (6/5)

(R0917)

cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py

[refactor] 193-193: Too few public methods (0/2)

(R0903)


[refactor] 196-196: Too few public methods (0/2)

(R0903)

cognee/tasks/feedback/link_enrichments_to_feedback.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

cognee/tasks/feedback/models.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)


[refactor] 9-9: Too few public methods (0/2)

(R0903)

cognee/tasks/feedback/create_enrichments.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

cognee/tasks/feedback/generate_improved_answers.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)


[refactor] 72-81: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)

cognee/tasks/storage/index_graph_edges.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

cognee/modules/retrieval/graph_completion_cot_retriever.py

[refactor] 84-84: Too many arguments (6/5)

(R0913)


[refactor] 84-84: Too many positional arguments (6/5)

(R0917)


[refactor] 84-84: Too many local variables (19/15)

(R0914)


[refactor] 168-168: Too many arguments (6/5)

(R0913)


[refactor] 168-168: Too many positional arguments (6/5)

(R0917)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
  • GitHub Check: End-to-End Tests / Test Feedback Enrichment
  • GitHub Check: End-to-End Tests / Test permissions with different situations in Cognee
  • GitHub Check: End-to-End Tests / Test Entity Extraction
  • GitHub Check: End-to-End Tests / Concurrent Subprocess access test
  • GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
  • GitHub Check: End-to-End Tests / Conversation sessions test
  • GitHub Check: End-to-End Tests / Deduplication Test
  • GitHub Check: End-to-End Tests / S3 Bucket Test
  • GitHub Check: End-to-End Tests / Test graph edge ingestion
  • GitHub Check: CLI Tests / CLI Functionality Tests
  • GitHub Check: CLI Tests / CLI Integration Tests
  • GitHub Check: Basic Tests / Run Simple Examples
  • GitHub Check: Basic Tests / Run Basic Graph Tests
  • GitHub Check: End-to-End Tests / Server Start Test
  • GitHub Check: Basic Tests / Run Unit Tests
  • GitHub Check: Basic Tests / Run Integration Tests
  • GitHub Check: Basic Tests / Run Simple Examples BAML
  • GitHub Check: End-to-End Tests / Test graph edge ingestion
  • GitHub Check: End-to-End Tests / Run Telemetry Test
  • GitHub Check: Test Weighted Edges Examples
  • GitHub Check: Test Weighted Edges with Different Graph Databases (neo4j)
🔇 Additional comments (13)
cognee/tasks/feedback/models.py (1)

9-26: LGTM!

The FeedbackEnrichment data model is well-structured with proper type annotations, sensible defaults for optional fields, and clear field semantics. The metadata configuration for indexing the text field aligns with the DataPoint pattern.

cognee/tasks/feedback/link_enrichments_to_feedback.py (1)

33-67: LGTM with a minor note on defensive checks.

The implementation correctly creates edges from enrichments to feedback and interaction nodes, with proper logging, indexing, and error handling. The conditionals at lines 48 and 54 checking for ID presence are defensive but acceptable, even though feedback_id and interaction_id are required fields in the FeedbackEnrichment model and enrichment.id is auto-generated in the DataPoint base class.

cognee/modules/graph/cognee_graph/CogneeGraph.py (1)

173-179: LGTM! Graceful fallback for edge_text.

The updated logic correctly prioritizes edge_text for distance lookups while falling back to relationship_type when edge_text is unavailable. This provides backward compatibility during migration and aligns with the PR's edge_text enrichment strategy.
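
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in CogneeGraph.py: the helper names distance_lookup_key and map_distance, the dict-shaped edge attributes, and the distances_by_text mapping are all assumptions made for the example.

```python
from typing import Mapping, Optional


def distance_lookup_key(edge_attributes: Mapping[str, object]) -> Optional[str]:
    """Pick the text used to look up a vector distance for an edge.

    Hypothetical helper: prefers edge_text and falls back to
    relationship_type when edge_text is missing or empty.
    """
    edge_text = edge_attributes.get("edge_text")
    if edge_text:
        return str(edge_text)
    relationship_type = edge_attributes.get("relationship_type")
    return str(relationship_type) if relationship_type else None


def map_distance(
    edge_attributes: Mapping[str, object],
    distances_by_text: Mapping[str, float],
    default: float = float("inf"),
) -> float:
    """Return the distance for an edge, or a default when nothing matches."""
    key = distance_lookup_key(edge_attributes)
    if key is None:
        return default
    return distances_by_text.get(key, default)


# Hypothetical distances keyed by the embedded edge text.
distances = {"contains: Alice; a researcher": 0.12, "works_at": 0.40}

new_edge = {"relationship_type": "contains", "edge_text": "contains: Alice; a researcher"}
old_edge = {"relationship_type": "works_at"}  # created before edge_text existed
```

With this shape, edges written before the migration still resolve through relationship_type, while newer edges resolve through the richer edge_text.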

cognee/modules/retrieval/utils/brute_force_triplet_search.py (1)

74-74: LGTM! Enables edge_text projection for downstream use.

Adding "edge_text" to the edge properties projection correctly supports the enhanced edge metadata strategy introduced in this PR.

cognee/tests/test_edge_ingestion.py (1)

55-66: LGTM! Comprehensive validation of edge_text format.

The new assertions correctly verify that contains edges include edge_text with the expected format (relationship_name and entity information). This provides good test coverage for the edge_text enrichment feature.

cognee/modules/graph/utils/expand_with_nodes_and_edges.py (2)

3-3: LGTM! Import supports Edge-enriched contains relationships.

The new Edge import enables wrapping entity relationships with structured edge metadata.


247-266: LGTM! Enriches contains relationships with descriptive edge_text.

The change correctly constructs Edge instances with semantic edge_text that includes relationship_name, entity_name, and entity_description. This aligns with the PR's edge enrichment strategy and provides richer metadata for embeddings and graph operations. The format is consistent with test expectations in test_edge_ingestion.py.

cognee/infrastructure/llm/prompts/feedback_report_prompt.txt (1)

1-13: LGTM! Clear and well-structured prompt template.

The prompt provides explicit formatting instructions and placeholder definitions, ensuring consistent output from the LLM for feedback enrichment reports.

cognee/infrastructure/engine/models/Edge.py (1)

32-38: LGTM on auto-populating edge_text

Validator is concise and matches Pydantic v2 patterns. Consider annotating the validator signature for clarity:

  • def ensure_edge_text(cls, v: Optional[str], info) -> Optional[str]

Please confirm that CI runs against pydantic 2.x.
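
For illustration, an annotated validator of this kind might look like the sketch below. It assumes pydantic 2.x is installed; the class name EdgeSketch is hypothetical, and the use of validate_default=True (so the validator also runs when edge_text is omitted) is an assumption of this example rather than something confirmed from Edge.py.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationInfo, field_validator


class EdgeSketch(BaseModel):
    """Hypothetical stand-in for the Edge model, reduced to two fields."""

    # relationship_type is declared first so that it is already validated
    # and available in info.data when edge_text is processed.
    relationship_type: Optional[str] = None
    # validate_default=True makes the validator run for the default value too.
    edge_text: Optional[str] = Field(default=None, validate_default=True)

    @field_validator("edge_text", mode="before")
    @classmethod
    def ensure_edge_text(cls, v: Optional[str], info: ValidationInfo) -> Optional[str]:
        # Keep an explicit, non-empty edge_text as provided.
        if v:
            return v
        # Otherwise fall back to the relationship type, if there is one.
        return info.data.get("relationship_type")
```

Typing the info parameter as ValidationInfo documents where info.data comes from and lets type checkers verify the access.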

cognee/tests/test_feedback_enrichment.py (1)

108-114: Logger kwargs may raise TypeError without structlog configured

logger.info("…", feedback=…, sentiment=…, score=…) will fail with stdlib logging, which rejects unknown keyword arguments. Ensure the structlog setup runs in tests, or switch to extra={} or a structured logger.

Run locally without calling setup_logging() to confirm.
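
The concern can be reproduced with the standard library alone. The sketch below uses a hypothetical logger name and an in-memory handler. Note that the TypeError only appears when the logger is enabled for the level in question, because a disabled level returns before the keyword arguments are inspected.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in memory so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("feedback_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)  # INFO must be enabled for the error to surface
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)


def log_with_kwargs() -> bool:
    """Return True if stdlib logging accepts arbitrary keyword arguments."""
    try:
        logger.info("feedback processed", sentiment="negative")
    except TypeError:
        return False
    return True


def log_with_extra() -> str:
    """Pass structured fields through extra= and read them off the record."""
    logger.info("feedback processed", extra={"sentiment": "negative"})
    return handler.records[-1].sentiment
```

Keys passed through extra become attributes on the LogRecord, so they must not collide with built-in attributes such as message or asctime.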

cognee/tasks/feedback/__init__.py (1)

1-13: Public surface looks good

Re-exports match implementations; __all__ is complete.

cognee/tasks/feedback/extract_feedback_interactions.py (1)

185-193: Logging kwargs compatibility

Multiple logger.info/warning calls use key=value kwargs. Confirm structlog is initialized in this task context, or switch to extra={} for stdlib logging.

cognee/modules/retrieval/graph_completion_cot_retriever.py (1)

121-123: Edge is fully hashable; current code is correct.

Edge has both __hash__() and __eq__() implementations. The __hash__() method handles both directed and undirected edges properly, and since Node is also hashable (based on hash(self.id)), calling set(triplets) on a list of Edge objects will not raise a TypeError. The code at lines 121–123 is correct as-is.

Likely an incorrect or invalid review comment.
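
To make the hashability argument concrete, here is a small sketch with hypothetical NodeSketch and GraphEdgeSketch classes. They only mirror the behaviour described above (nodes hashed by id, edges hashed with or without regard to direction) and are not the project's actual classes.

```python
class NodeSketch:
    """Hypothetical node that is hashable by its id."""

    def __init__(self, node_id: str) -> None:
        self.id = node_id

    def __eq__(self, other: object) -> bool:
        return isinstance(other, NodeSketch) and self.id == other.id

    def __hash__(self) -> int:
        return hash(self.id)


class GraphEdgeSketch:
    """Hypothetical edge whose equality and hash respect direction."""

    def __init__(self, node1: NodeSketch, node2: NodeSketch, directed: bool = True) -> None:
        self.node1 = node1
        self.node2 = node2
        self.directed = directed

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, GraphEdgeSketch):
            return NotImplemented
        if self.directed != other.directed:
            return False
        if self.directed:
            return (self.node1, self.node2) == (other.node1, other.node2)
        # Undirected: endpoint order does not matter.
        return {self.node1, self.node2} == {other.node1, other.node2}

    def __hash__(self) -> int:
        if self.directed:
            return hash((self.node1, self.node2))
        return hash(frozenset((self.node1, self.node2)))


a, b = NodeSketch("a"), NodeSketch("b")

# Directed: a->b appears twice and collapses, b->a stays distinct.
directed_unique = set([GraphEdgeSketch(a, b), GraphEdgeSketch(a, b), GraphEdgeSketch(b, a)])

# Undirected: a-b and b-a are the same edge.
undirected_unique = set([GraphEdgeSketch(a, b, directed=False), GraphEdgeSketch(b, a, directed=False)])
```

Because equal edges always produce equal hashes in this sketch, building a set from a list of them deduplicates without raising a TypeError.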

@lxobr lxobr changed the base branch from main to dev October 28, 2025 14:00
Collaborator

@hajdul88 hajdul88 left a comment


LGTM. We agreed to have a follow-up PR to cover weighted edges in the test suites, since this PR starts using them as part of our main cognify pipeline.

As for the PR itself: nice job!

@lxobr lxobr merged commit 6223ecf into dev Oct 30, 2025
132 of 137 checks passed
@lxobr lxobr deleted the feature/cog-3256-optimize-repeated-entity-extraction branch October 30, 2025 12:56
3 participants