feat: optimize repeated entity extraction #1682
Conversation
Important: Review skipped. Auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI.

Walkthrough

This PR introduces a comprehensive feedback enrichment pipeline that extracts negative user feedback, generates improved answers via Chain-of-Thought reasoning, creates educational enrichments, and links them within the knowledge graph. Supporting changes include expanded graph node payloads, structured LLM output handling, refined indexing strategies, and updated data models.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant Graph as Knowledge Graph
    participant Feedback as Feedback<br/>Extraction
    participant Retrieval as CoT Retriever
    participant LLM
    participant Enrichment as Enrichment<br/>Creation
    User->>Graph: Submit Feedback (Negative)
    Graph->>Feedback: extract_feedback_interactions()
    Feedback->>Graph: Query Interactions & Feedback Nodes
    Feedback->>LLM: Summarize Context
    LLM-->>Feedback: Context Summary
    Feedback-->>Graph: Emit FeedbackEnrichment Records
    Feedback->>Retrieval: generate_improved_answers(enrichments)
    Retrieval->>LLM: Render Reaction Prompt with<br/>Question, Answer, Feedback
    LLM-->>Retrieval: Structured ImprovedAnswerResponse
    Retrieval->>Graph: Fetch Related Context via CoT
    Graph-->>Retrieval: Context Triplets + Edges
    Retrieval-->>Feedback: Updated Enrichments<br/>(improved_answer, new_context)
    Feedback->>Enrichment: create_enrichments(enrichments)
    Enrichment->>LLM: Generate Report Prompt
    LLM-->>Enrichment: Educational Report Text
    Enrichment->>Graph: Create NodeSet & Link Enrichments
    Graph-->>Enrichment: Return Enriched Records
    Enrichment->>Graph: link_enrichments_to_feedback(enrichments)
    Graph->>Graph: Create enriches_feedback &<br/>improves_interaction Edges
    Graph-->>User: Feedback Loop Closed
```
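For orientation, a minimal sketch of how the four feedback tasks line up in the order shown above, assuming Task wraps an async callable as in cognee's other pipelines; the runner call is omitted because the exact pipeline entry point is not shown in this review.

```python
from cognee.modules.pipelines.tasks.task import Task
from cognee.tasks.feedback.create_enrichments import create_enrichments
from cognee.tasks.feedback.extract_feedback_interactions import extract_feedback_interactions
from cognee.tasks.feedback.generate_improved_answers import generate_improved_answers
from cognee.tasks.feedback.link_enrichments_to_feedback import link_enrichments_to_feedback

# Order mirrors the sequence diagram: extract -> improve -> enrich -> link.
feedback_enrichment_tasks = [
    Task(extract_feedback_interactions),  # pull negative feedback plus interaction context
    Task(generate_improved_answers),      # CoT retriever produces structured improved answers
    Task(create_enrichments),             # LLM report text and NodeSet creation
    Task(link_enrichments_to_feedback),   # enriches_feedback / improves_interaction edges
]
```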
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed)
Actionable comments posted: 11
Caution: Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
cognee/modules/retrieval/utils/completion.py (1)
18-23: Handle missing/None system prompt. read_query_prompt may return None; concatenation and the downstream call will fail silently. Guard and fail fast.

```diff
-system_prompt = system_prompt if system_prompt else read_query_prompt(system_prompt_path)
+system_prompt = system_prompt if system_prompt else read_query_prompt(system_prompt_path)
+if not system_prompt:
+    raise ValueError(f"System prompt not found: {system_prompt_path}")
```

cognee/infrastructure/databases/graph/kuzu/adapter.py (1)
1362-1382: Attribute names must be validated; unparameterized interpolation enables query injection. The code directly interpolates unquoted attribute names (f"n.{attr} IN $..."), allowing malformed or hostile attr values to break queries. Additionally, using where_clause.replace("n.", "n1.") risks mis-rewriting predicates if attr contains "n.". While current callers use safe hardcoded names, the method is public and accepts user-controllable filters.

- Validate attr against allowed columns or a strict name pattern (e.g., ^[A-Za-z_][A-Za-z0-9_]*$).
- Route top-level columns ("id", "name", "type") directly; route custom attributes through json_extract() (already used elsewhere in the adapter).
- Build predicates explicitly (as shown in the refactor suggestion) rather than relying on string replacement.

The suggested refactor is sound and necessary:

```diff
-        for i, filter_dict in enumerate(attribute_filters):
-            for attr, values in filter_dict.items():
-                param_name = f"values_{i}_{attr}"
-                where_clauses.append(f"n.{attr} IN ${param_name}")
-                params[param_name] = values
+        import re
+        safe_cols = {"id", "name", "type"}
+        name_re = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
+        node_preds = []
+        for i, filter_dict in enumerate(attribute_filters):
+            for attr, values in filter_dict.items():
+                if not name_re.match(attr):
+                    raise ValueError(f"Invalid attribute name: {attr}")
+                param_name = f"values_{i}_{attr}"
+                params[param_name] = values
+                if attr in safe_cols:
+                    node_preds.append(f"{{alias}}.{attr} IN ${param_name}")
+                else:
+                    node_preds.append(
+                        f"json_extract({{alias}}.properties, '$.{attr}') IN ${param_name}"
+                    )
+        def build_where(alias: str) -> str:
+            return " AND ".join(p.replace("{alias}", alias) for p in node_preds) or "true"
-        where_clause = " AND ".join(where_clauses)
-        nodes_query = f"""
-            MATCH (n:Node)
-            WHERE {where_clause}
+        nodes_query = f"""
+            MATCH (n:Node)
+            WHERE {build_where('n')}
             RETURN n.id, {{ name: n.name, type: n.type, properties: n.properties }}
         """
-        edges_query = f"""
-            MATCH (n1:Node)-[r:EDGE]->(n2:Node)
-            WHERE {where_clause.replace("n.", "n1.")} AND {where_clause.replace("n.", "n2.")}
+        edges_query = f"""
+            MATCH (n1:Node)-[r:EDGE]->(n2:Node)
+            WHERE {build_where('n1')} AND {build_where('n2')}
             RETURN n1.id, n2.id, r.relationship_name, r.properties
         """
```
🧹 Nitpick comments (19)
cognee/infrastructure/llm/prompts/feedback_user_context_prompt.txt (1)
4-5: Hyphenate compound adjectives for clarity. The phrases "one paragraph" and "human readable" function as compound adjectives modifying "summary" and should be hyphenated: "one-paragraph" and "human-readable".
Apply this diff:
```diff
-Provide a one paragraph human readable summary of this interaction context,
+Provide a one-paragraph human-readable summary of this interaction context,
```

cognee/infrastructure/llm/prompts/feedback_reaction_prompt.txt (1)
12-14: Avoid format instructions when using structured outputs. If this prompt is used with a structured response_model (e.g., TestAnswer with fields answer/explanation), the “Format your reply as: Answer: … / Explanation: …” instructions can conflict. Consider removing these lines or gating them only for plain-text flows.
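For reference, a minimal sketch of the structured-output path this comment assumes; the acreate_structured_output signature mirrors the mock used later in this review, and ImprovedAnswer is a hypothetical model, not the PR's actual response type.

```python
from pydantic import BaseModel

from cognee.infrastructure.llm.LLMGateway import LLMGateway


class ImprovedAnswer(BaseModel):  # hypothetical response model for illustration
    answer: str
    explanation: str


async def ask(question: str, system_prompt: str) -> ImprovedAnswer:
    # The response schema is enforced by response_model, so "Format your reply as: ..."
    # instructions in the prompt text are redundant for this flow.
    return await LLMGateway.acreate_structured_output(
        text_input=question,
        system_prompt=system_prompt,
        response_model=ImprovedAnswer,
    )
```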
cognee/modules/retrieval/utils/completion.py (1)
6-15: Minor typing/API polish. response_model: Type = str is loose. Prefer a Union that includes the str type for better type checking: response_model: type[str] | Type[BaseModel] = str.
cognee/modules/chunking/models/DocumentChunk.py (1)
35-35: Make contains Optional[List[…]] and import Optional. The default is None but the type isn’t Optional, which trips type checkers.
```diff
-from typing import List, Union
+from typing import List, Union, Optional
 …
-    contains: List[Union[Entity, Event, tuple[Edge, Entity]]] = None
+    contains: Optional[List[Union[Entity, Event, tuple[Edge, Entity]]]] = None
```

cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py (1)
178-221: Stabilize tests by mocking LLMGateway.acreate_structured_output. Current tests rely on the live LLM path; they can be flaky/slow. Mock to deterministic outputs.
Example:
```diff
+@pytest.fixture(autouse=True)
+def _mock_llm(monkeypatch):
+    async def _fake_create(text_input: str, system_prompt: str, response_model):
+        if response_model is str:
+            return "Alice"
+        return TestAnswer(answer="Alice", explanation="From graph context.")
+    monkeypatch.setattr(
+        "cognee.infrastructure.llm.LLMGateway.LLMGateway.acreate_structured_output",
+        staticmethod(_fake_create),
+    )
```

cognee/tasks/storage/index_data_points.py (2)
58-61: Bound parallelism to avoid creating thousands of tasks. Large corpora can spawn unbounded tasks. Use a semaphore.
```diff
-    tasks = [
-        asyncio.create_task(vector_engine.index_data_points(type_name, field_name, batch_points))
-        for type_name, field_name, batch_points in batches
-    ]
-    await asyncio.gather(*tasks)
+    sem = asyncio.Semaphore(8)
+    async def _run(type_name, field_name, batch_points):
+        async with sem:
+            await vector_engine.index_data_points(type_name, field_name, batch_points)
+    await asyncio.gather(*(_run(t, f, b) for t, f, b in batches))
```
49-51: Validate batch_size. A defensive check prevents division/empty-slice issues.
```diff
 batch_size = vector_engine.embedding_engine.get_batch_size()
+if not isinstance(batch_size, int) or batch_size <= 0:
+    raise ValueError(f"Invalid embedding batch_size: {batch_size}")
```

cognee/tests/test_feedback_enrichment.py (1)
36-45: Split setup into helpers to reduce locals (lint R0914). main() holds many locals. Extract directory prep and node/edge assertions into helpers to satisfy lint and readability.
cognee/tasks/storage/index_graph_edges.py (1)
67-69: Clarify deprecation message. Saying “edge embedding is deprecated” is ambiguous. Consider: “Auto-fetching edges inside index_graph_edges is deprecated; pass edges explicitly.”
cognee/tasks/feedback/extract_feedback_interactions.py (1)
87-95: Make recency sort robust to mixed timestamp formats. Compare numeric timestamps; parse ISO8601 (incl. trailing 'Z') with fallback to 0, to avoid type errors and misordering.
Apply this diff:
```diff
@@
-    def _recency_key(pair):
-        _, (_, interaction_props) = pair
-        created_at = interaction_props.get("created_at") or ""
-        updated_at = interaction_props.get("updated_at") or ""
-        return (created_at, updated_at)
+    from datetime import datetime
+
+    def _to_ts(value) -> float:
+        if isinstance(value, (int, float)):
+            return float(value)
+        if isinstance(value, str) and value:
+            val = value.replace("Z", "+00:00")
+            try:
+                return datetime.fromisoformat(val).timestamp()
+            except Exception:
+                return 0.0
+        return 0.0
+
+    def _recency_key(pair):
+        _, (_, interaction_props) = pair
+        return (
+            _to_ts(interaction_props.get("created_at")),
+            _to_ts(interaction_props.get("updated_at")),
+        )
```

examples/python/feedback_enrichment_minimal_example.py (1)
4-5: Unify SearchType import with internal usage. Elsewhere it’s from cognee.modules.search.types import SearchType. Use the same to avoid API surface drift.

Apply this diff:

```diff
-from cognee.api.v1.search import SearchType
+from cognee.modules.search.types import SearchType
```

If the API alias is required for public users, keep it and justify with a comment.
cognee/tasks/feedback/generate_improved_answers.py (3)
72-81: Remove unnecessary else after return. Simplify per pylint R1705:
```diff
-    if completion:
-        enrichment.improved_answer = completion.answer
-        enrichment.new_context = new_context_text
-        enrichment.explanation = completion.explanation
-        return enrichment
-    else:
-        logger.warning(
-            "Failed to get structured completion from retriever", question=enrichment.question
-        )
-        return None
+    if completion:
+        enrichment.improved_answer = completion.answer
+        enrichment.new_context = new_context_text
+        enrichment.explanation = completion.explanation
+        return enrichment
+    logger.warning(
+        "Failed to get structured completion from retriever | question=%s",
+        enrichment.question,
+    )
+    return None
```
115-121: Throughput: consider bounded concurrency for multiple enrichments. If rate limits allow, use asyncio.gather with a semaphore to parallelize per-item processing.
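A minimal sketch of the bounded-concurrency shape suggested here (and in the similar comment on create_enrichments.py below); the limit of 5 and the worker name in the usage note are illustrative, not values from the PR.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_bounded(
    items: Iterable[T], worker: Callable[[T], Awaitable[R]], limit: int = 5
) -> list[R]:
    # Cap in-flight LLM calls while still overlapping per-item processing.
    semaphore = asyncio.Semaphore(limit)

    async def _run(item: T) -> R:
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(_run(item) for item in items))


# Usage (hypothetical helper name):
# improved = await gather_bounded(enrichments, _process_single_enrichment, limit=5)
```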
6-9: Unused import detected. resolve_edges_to_text is imported but not used. Remove to keep import hygiene.
cognee/tasks/feedback/create_enrichments.py (2)
35-45: Optional: pre-check prompt_template to avoid raising exceptions. You already catch exceptions, but you can avoid the try/except by checking for None and falling back early.
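A minimal sketch of the early-fallback shape this suggests, assuming read_query_prompt is called with a prompt file name as elsewhere in this PR; the fallback text and helper name are hypothetical.

```python
from cognee.infrastructure.llm.prompts.read_query_prompt import read_query_prompt

DEFAULT_REPORT_PROMPT = "Write a short educational report summarizing the feedback below."


def load_report_prompt(prompt_file: str = "feedback_report_prompt.txt") -> str:
    # Check for a missing template up front instead of wrapping the call in try/except.
    prompt_template = read_query_prompt(prompt_file)
    if not prompt_template:
        return DEFAULT_REPORT_PROMPT
    return prompt_template
```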
70-81: Throughput: optional bounded concurrency for report generation. Use asyncio.gather with a semaphore to parallelize _generate_enrichment_report across items if allowed.
cognee/modules/retrieval/graph_completion_cot_retriever.py (3)
146-161: Missing None checks for validation/follow-up prompt files. read_query_prompt may return None; pass-through to the LLM will likely fail. Add guards with fallbacks or raise with context.
84-92: API ergonomics: parameter counts are high; consider grouping configuration. To address R0913/R0917, introduce small config dataclasses or reuse self.* defaults to reduce arg counts.
Also applies to: 168-176
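One way to read the “group configuration” suggestion, sketched with a hypothetical options dataclass; the class and field names are illustrative, not the retriever's actual parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CotRetrievalOptions:  # hypothetical grouping of per-call knobs
    top_k: int = 5
    max_iterations: int = 3
    system_prompt_path: Optional[str] = None
    save_interaction: bool = False


class RetrieverSketch:
    def __init__(self, options: Optional[CotRetrievalOptions] = None):
        self.options = options or CotRetrievalOptions()

    async def get_completion(self, query: str, options: Optional[CotRetrievalOptions] = None):
        # One options object replaces several positional arguments (pylint R0913/R0917)
        # and lets callers fall back to the instance-level defaults.
        opts = options or self.options
        return query, opts  # placeholder body for the sketch
```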
25-36: Minor: typing polish. response_model is a type; prefer Type[Any] for annotations; the return type tuple[Any, str, List[Edge]] is already correct.

```diff
-def _as_answer_text(completion: Any) -> str:
+def _as_answer_text(completion: Any) -> str: ...
```

And:

```diff
-    response_model: Type = str,
+    response_model: Type[Any] = str,
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
.github/workflows/e2e_tests.yml is excluded by !**/*.yml
📒 Files selected for processing (23)
- cognee/infrastructure/databases/graph/kuzu/adapter.py (1 hunks)
- cognee/infrastructure/engine/models/Edge.py (2 hunks)
- cognee/infrastructure/llm/prompts/feedback_reaction_prompt.txt (1 hunks)
- cognee/infrastructure/llm/prompts/feedback_report_prompt.txt (1 hunks)
- cognee/infrastructure/llm/prompts/feedback_user_context_prompt.txt (1 hunks)
- cognee/modules/chunking/models/DocumentChunk.py (2 hunks)
- cognee/modules/graph/cognee_graph/CogneeGraph.py (1 hunks)
- cognee/modules/graph/utils/expand_with_nodes_and_edges.py (2 hunks)
- cognee/modules/retrieval/graph_completion_cot_retriever.py (7 hunks)
- cognee/modules/retrieval/utils/brute_force_triplet_search.py (1 hunks)
- cognee/modules/retrieval/utils/completion.py (2 hunks)
- cognee/tasks/feedback/__init__.py (1 hunks)
- cognee/tasks/feedback/create_enrichments.py (1 hunks)
- cognee/tasks/feedback/extract_feedback_interactions.py (1 hunks)
- cognee/tasks/feedback/generate_improved_answers.py (1 hunks)
- cognee/tasks/feedback/link_enrichments_to_feedback.py (1 hunks)
- cognee/tasks/feedback/models.py (1 hunks)
- cognee/tasks/storage/index_data_points.py (1 hunks)
- cognee/tasks/storage/index_graph_edges.py (3 hunks)
- cognee/tests/test_edge_ingestion.py (1 hunks)
- cognee/tests/test_feedback_enrichment.py (1 hunks)
- cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py (3 hunks)
- examples/python/feedback_enrichment_minimal_example.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
{cognee,cognee-mcp,distributed,examples,alembic}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
{cognee,cognee-mcp,distributed,examples,alembic}/**/*.py: Use 4-space indentation; name modules and functions in snake_case; name classes in PascalCase (Python)
Adhere to ruff rules, including import hygiene and configured line length (100)
Keep Python lines ≤ 100 characters
Files:
- cognee/tests/test_feedback_enrichment.py
- cognee/modules/chunking/models/DocumentChunk.py
- examples/python/feedback_enrichment_minimal_example.py
- cognee/tasks/feedback/extract_feedback_interactions.py
- cognee/infrastructure/engine/models/Edge.py
- cognee/modules/retrieval/utils/brute_force_triplet_search.py
- cognee/modules/retrieval/utils/completion.py
- cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py
- cognee/tasks/feedback/link_enrichments_to_feedback.py
- cognee/tasks/feedback/models.py
- cognee/modules/graph/utils/expand_with_nodes_and_edges.py
- cognee/modules/graph/cognee_graph/CogneeGraph.py
- cognee/infrastructure/databases/graph/kuzu/adapter.py
- cognee/tasks/feedback/create_enrichments.py
- cognee/tests/test_edge_ingestion.py
- cognee/tasks/storage/index_data_points.py
- cognee/tasks/feedback/generate_improved_answers.py
- cognee/tasks/storage/index_graph_edges.py
- cognee/tasks/feedback/__init__.py
- cognee/modules/retrieval/graph_completion_cot_retriever.py
cognee/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
cognee/**/*.py: Public APIs in the core library should be type-annotated where practical
Prefer explicit, structured error handling and use shared logging utilities from cognee.shared.logging_utils
Files:
- cognee/tests/test_feedback_enrichment.py
- cognee/modules/chunking/models/DocumentChunk.py
- cognee/tasks/feedback/extract_feedback_interactions.py
- cognee/infrastructure/engine/models/Edge.py
- cognee/modules/retrieval/utils/brute_force_triplet_search.py
- cognee/modules/retrieval/utils/completion.py
- cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py
- cognee/tasks/feedback/link_enrichments_to_feedback.py
- cognee/tasks/feedback/models.py
- cognee/modules/graph/utils/expand_with_nodes_and_edges.py
- cognee/modules/graph/cognee_graph/CogneeGraph.py
- cognee/infrastructure/databases/graph/kuzu/adapter.py
- cognee/tasks/feedback/create_enrichments.py
- cognee/tests/test_edge_ingestion.py
- cognee/tasks/storage/index_data_points.py
- cognee/tasks/feedback/generate_improved_answers.py
- cognee/tasks/storage/index_graph_edges.py
- cognee/tasks/feedback/__init__.py
- cognee/modules/retrieval/graph_completion_cot_retriever.py
cognee/tests/**/test_*.py
📄 CodeRabbit inference engine (AGENTS.md)
cognee/tests/**/test_*.py: Name test files as test_*.py
Use pytest.mark.asyncio for async tests
Tests should avoid external state; rely on fixtures and CI-provided env vars when providers are required
Files:
- cognee/tests/test_feedback_enrichment.py
- cognee/tests/test_edge_ingestion.py
examples/python/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
When adding public APIs, provide or update targeted examples under examples/python/
Files:
examples/python/feedback_enrichment_minimal_example.py
cognee/tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Place unit tests under cognee/tests/unit/
Files:
cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py
🧬 Code graph analysis (15)
cognee/tests/test_feedback_enrichment.py (9)
- cognee/infrastructure/databases/graph/get_graph_engine.py (1): get_graph_engine (10-24)
- cognee/modules/pipelines/tasks/task.py (1): Task (5-97)
- cognee/modules/search/types/SearchType.py (1): SearchType (4-19)
- cognee/shared/logging_utils.py (2): get_logger (182-194), info (175-175)
- cognee/tasks/feedback/create_enrichments.py (1): create_enrichments (51-84)
- cognee/tasks/feedback/extract_feedback_interactions.py (1): extract_feedback_interactions (180-230)
- cognee/tasks/feedback/generate_improved_answers.py (1): generate_improved_answers (92-130)
- cognee/tasks/feedback/link_enrichments_to_feedback.py (1): link_enrichments_to_feedback (33-67)
- cognee/api/v1/config/config.py (2): data_root_directory (36-38), system_root_directory (18-33)
cognee/modules/chunking/models/DocumentChunk.py (3)
- cognee/infrastructure/engine/models/Edge.py (1): Edge (5-38)
- cognee/modules/engine/models/Entity.py (1): Entity (6-11)
- cognee/modules/engine/models/Event.py (1): Event (8-16)
examples/python/feedback_enrichment_minimal_example.py (6)
- cognee/modules/search/types/SearchType.py (1): SearchType (4-19)
- cognee/modules/pipelines/tasks/task.py (1): Task (5-97)
- cognee/tasks/feedback/extract_feedback_interactions.py (1): extract_feedback_interactions (180-230)
- cognee/tasks/feedback/generate_improved_answers.py (1): generate_improved_answers (92-130)
- cognee/tasks/feedback/create_enrichments.py (1): create_enrichments (51-84)
- cognee/tasks/feedback/link_enrichments_to_feedback.py (1): link_enrichments_to_feedback (33-67)
cognee/tasks/feedback/extract_feedback_interactions.py (5)
- cognee/infrastructure/llm/LLMGateway.py (1): LLMGateway (6-66)
- cognee/infrastructure/llm/prompts/read_query_prompt.py (1): read_query_prompt (6-43)
- cognee/shared/logging_utils.py (2): get_logger (182-194), info (175-175)
- cognee/infrastructure/databases/graph/get_graph_engine.py (1): get_graph_engine (10-24)
- cognee/tasks/feedback/models.py (1): FeedbackEnrichment (9-26)
cognee/modules/retrieval/utils/completion.py (3)
- cognee/infrastructure/llm/LLMGateway.py (1): LLMGateway (6-66)
- cognee/infrastructure/llm/prompts/render_prompt.py (1): render_prompt (5-42)
- cognee/infrastructure/llm/prompts/read_query_prompt.py (1): read_query_prompt (6-43)
cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py (3)
- cognee/api/v1/config/config.py (2): system_root_directory (18-33), data_root_directory (36-38)
- cognee/infrastructure/engine/models/DataPoint.py (1): DataPoint (20-220)
- cognee/modules/retrieval/graph_completion_cot_retriever.py (2): GraphCompletionCotRetriever (39-272), get_structured_completion (168-231)
cognee/tasks/feedback/link_enrichments_to_feedback.py (4)
- cognee/infrastructure/databases/graph/get_graph_engine.py (1): get_graph_engine (10-24)
- cognee/tasks/storage/index_graph_edges.py (1): index_graph_edges (42-77)
- cognee/shared/logging_utils.py (2): get_logger (182-194), info (175-175)
- cognee/tasks/feedback/models.py (1): FeedbackEnrichment (9-26)
cognee/tasks/feedback/models.py (2)
- cognee/infrastructure/engine/models/DataPoint.py (1): DataPoint (20-220)
- cognee/modules/engine/models/node_set.py (1): NodeSet (4-7)
cognee/modules/graph/utils/expand_with_nodes_and_edges.py (1)
- cognee/infrastructure/engine/models/Edge.py (1): Edge (5-38)
cognee/tasks/feedback/create_enrichments.py (5)
- cognee/infrastructure/llm/LLMGateway.py (1): LLMGateway (6-66)
- cognee/infrastructure/llm/prompts/read_query_prompt.py (1): read_query_prompt (6-43)
- cognee/shared/logging_utils.py (2): get_logger (182-194), info (175-175)
- cognee/modules/engine/models/node_set.py (1): NodeSet (4-7)
- cognee/tasks/feedback/models.py (1): FeedbackEnrichment (9-26)
cognee/tasks/storage/index_data_points.py (5)
- cognee/infrastructure/databases/vector/get_vector_engine.py (1): get_vector_engine (5-7)
- cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (2): create_vector_index (292-295), index_data_points (297-309)
- cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (2): create_vector_index (248-249), index_data_points (251-263)
- cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (2): create_vector_index (285-295), index_data_points (297-319)
- cognee/infrastructure/databases/vector/embeddings/EmbeddingEngine.py (1): get_batch_size (38-45)
cognee/tasks/feedback/generate_improved_answers.py (5)
- cognee/infrastructure/llm/LLMGateway.py (1): LLMGateway (6-66)
- cognee/infrastructure/llm/prompts/read_query_prompt.py (1): read_query_prompt (6-43)
- cognee/shared/logging_utils.py (2): get_logger (182-194), info (175-175)
- cognee/modules/retrieval/graph_completion_cot_retriever.py (2): GraphCompletionCotRetriever (39-272), get_structured_completion (168-231)
- cognee/tasks/feedback/models.py (1): FeedbackEnrichment (9-26)
cognee/tasks/storage/index_graph_edges.py (7)
- cognee/modules/engine/utils/generate_edge_id.py (1): generate_edge_id (4-5)
- cognee/infrastructure/databases/graph/get_graph_engine.py (1): get_graph_engine (10-24)
- cognee/modules/graph/models/EdgeType.py (1): EdgeType (4-8)
- cognee/tasks/storage/index_data_points.py (1): index_data_points (10-65)
- cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1): index_data_points (297-309)
- cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1): index_data_points (251-263)
- cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1): index_data_points (297-319)
cognee/tasks/feedback/__init__.py (1)
- cognee/tasks/feedback/models.py (1): FeedbackEnrichment (9-26)
cognee/modules/retrieval/graph_completion_cot_retriever.py (4)
- cognee/modules/retrieval/graph_completion_retriever.py (3): GraphCompletionRetriever (28-281), save_qa (217-281), get_completion (144-215)
- cognee/modules/retrieval/utils/completion.py (2): generate_structured_completion (6-28), summarize_text (51-63)
- cognee/infrastructure/databases/cache/config.py (1): CacheConfig (6-39)
- cognee/modules/retrieval/utils/session_cache.py (2): get_conversation_history (78-156), save_conversation_history (10-75)
🪛 LanguageTool
cognee/infrastructure/llm/prompts/feedback_user_context_prompt.txt
[grammar] ~4-~4: Use a hyphen to join words.
Context: ...stion} Context: {context} Provide a one paragraph human readable summary of this...
(QB_NEW_EN_HYPHEN)
[grammar] ~4-~4: Use a hyphen to join words.
Context: ...{context} Provide a one paragraph human readable summary of this interaction con...
(QB_NEW_EN_HYPHEN)
🪛 Pylint (4.0.1)
cognee/tests/test_feedback_enrichment.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
[refactor] 36-36: Too many local variables (23/15)
(R0914)
examples/python/feedback_enrichment_minimal_example.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
cognee/tasks/feedback/extract_feedback_interactions.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
[refactor] 153-157: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
cognee/infrastructure/engine/models/Edge.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
cognee/modules/retrieval/utils/completion.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
[refactor] 6-6: Too many arguments (7/5)
(R0913)
[refactor] 6-6: Too many positional arguments (7/5)
(R0917)
[refactor] 31-31: Too many arguments (6/5)
(R0913)
[refactor] 31-31: Too many positional arguments (6/5)
(R0917)
cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py
[refactor] 193-193: Too few public methods (0/2)
(R0903)
[refactor] 196-196: Too few public methods (0/2)
(R0903)
cognee/tasks/feedback/link_enrichments_to_feedback.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
cognee/tasks/feedback/models.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
[refactor] 9-9: Too few public methods (0/2)
(R0903)
cognee/tasks/feedback/create_enrichments.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
cognee/tasks/feedback/generate_improved_answers.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
[refactor] 72-81: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
cognee/tasks/storage/index_graph_edges.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
cognee/modules/retrieval/graph_completion_cot_retriever.py
[refactor] 84-84: Too many arguments (6/5)
(R0913)
[refactor] 84-84: Too many positional arguments (6/5)
(R0917)
[refactor] 84-84: Too many local variables (19/15)
(R0914)
[refactor] 168-168: Too many arguments (6/5)
(R0913)
[refactor] 168-168: Too many positional arguments (6/5)
(R0917)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
- GitHub Check: End-to-End Tests / Test Feedback Enrichment
- GitHub Check: End-to-End Tests / Test permissions with different situations in Cognee
- GitHub Check: End-to-End Tests / Test Entity Extraction
- GitHub Check: End-to-End Tests / Concurrent Subprocess access test
- GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
- GitHub Check: End-to-End Tests / Conversation sessions test
- GitHub Check: End-to-End Tests / Deduplication Test
- GitHub Check: End-to-End Tests / S3 Bucket Test
- GitHub Check: End-to-End Tests / Test graph edge ingestion
- GitHub Check: CLI Tests / CLI Functionality Tests
- GitHub Check: CLI Tests / CLI Integration Tests
- GitHub Check: Basic Tests / Run Simple Examples
- GitHub Check: Basic Tests / Run Basic Graph Tests
- GitHub Check: End-to-End Tests / Server Start Test
- GitHub Check: Basic Tests / Run Unit Tests
- GitHub Check: Basic Tests / Run Integration Tests
- GitHub Check: Basic Tests / Run Simple Examples BAML
- GitHub Check: End-to-End Tests / Test graph edge ingestion
- GitHub Check: End-to-End Tests / Run Telemetry Test
- GitHub Check: Test Weighted Edges Examples
- GitHub Check: Test Weighted Edges with Different Graph Databases (neo4j)
🔇 Additional comments (13)
cognee/tasks/feedback/models.py (1)
9-26: LGTM! The FeedbackEnrichment data model is well-structured with proper type annotations, sensible defaults for optional fields, and clear field semantics. The metadata configuration for indexing the text field aligns with the DataPoint pattern.
cognee/tasks/feedback/link_enrichments_to_feedback.py (1)
33-67: LGTM with a minor note on defensive checks. The implementation correctly creates edges from enrichments to feedback and interaction nodes, with proper logging, indexing, and error handling. The conditionals at lines 48 and 54 checking for ID presence are defensive but acceptable, even though feedback_id and interaction_id are required fields in the FeedbackEnrichment model and enrichment.id is auto-generated in the DataPoint base class.
cognee/modules/graph/cognee_graph/CogneeGraph.py (1)
173-179: LGTM! Graceful fallback for edge_text. The updated logic correctly prioritizes edge_text for distance lookups while falling back to relationship_type when edge_text is unavailable. This provides backward compatibility during migration and aligns with the PR's edge_text enrichment strategy.
cognee/modules/retrieval/utils/brute_force_triplet_search.py (1)
74-74: LGTM! Enables edge_text projection for downstream use. Adding "edge_text" to the edge properties projection correctly supports the enhanced edge metadata strategy introduced in this PR.
cognee/tests/test_edge_ingestion.py (1)
55-66: LGTM! Comprehensive validation of edge_text format. The new assertions correctly verify that contains edges include edge_text with the expected format (relationship_name and entity information). This provides good test coverage for the edge_text enrichment feature.
cognee/modules/graph/utils/expand_with_nodes_and_edges.py (2)
3-3: LGTM! Import supports Edge-enriched contains relationships. The new Edge import enables wrapping entity relationships with structured edge metadata.
247-266: LGTM! Enriches contains relationships with descriptive edge_text. The change correctly constructs Edge instances with semantic edge_text that includes relationship_name, entity_name, and entity_description. This aligns with the PR's edge enrichment strategy and provides richer metadata for embeddings and graph operations. The format is consistent with test expectations in test_edge_ingestion.py.
cognee/infrastructure/llm/prompts/feedback_report_prompt.txt (1)
1-13: LGTM! Clear and well-structured prompt template. The prompt provides explicit formatting instructions and placeholder definitions, ensuring consistent output from the LLM for feedback enrichment reports.
cognee/infrastructure/engine/models/Edge.py (1)
32-38: LGTM on auto-populating edge_text. The validator is concise and matches Pydantic v2 patterns. Consider annotating the validator signature for clarity:

- def ensure_edge_text(cls, v: Optional[str], info) -> Optional[str]
Please confirm runtime with pydantic==2.x in CI.
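For reference, a minimal Pydantic v2 sketch of the auto-fill pattern this comment refers to; EdgeSketch is a stand-in, not the PR's actual Edge model, and validate_default=True is an assumption needed to make the fallback fire when edge_text is omitted.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationInfo, field_validator


class EdgeSketch(BaseModel):  # stand-in model; cognee's Edge has more fields
    relationship_type: str
    edge_text: Optional[str] = Field(default=None, validate_default=True)

    @field_validator("edge_text", mode="before")
    @classmethod
    def ensure_edge_text(cls, v: Optional[str], info: ValidationInfo) -> Optional[str]:
        # Fall back to relationship_type when edge_text is not provided.
        return v or info.data.get("relationship_type")


assert EdgeSketch(relationship_type="contains").edge_text == "contains"
```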
cognee/tests/test_feedback_enrichment.py (1)
108-114: Logger kwargs may raise TypeError without structlog configured. logger.info("…", feedback=…, sentiment=…, score=…) will fail with stdlib logging (unknown kwargs). Ensure structlog setup runs in tests or switch to extra={} or a structured logger. Run locally without calling setup_logging() to confirm.
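A minimal sketch of the failure mode and the stdlib-safe alternatives this comment suggests; the field names are illustrative, not the PR's test values.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# With plain stdlib logging, unknown keyword arguments are rejected
# (only when the level is enabled, which is why the bug can hide in quiet configs).
try:
    logger.info("feedback processed", feedback="too vague", sentiment="negative")
except TypeError as err:
    print(f"stdlib logging rejects kwargs: {err}")

# Portable alternatives that work without structlog:
logger.info("feedback processed | feedback=%s sentiment=%s", "too vague", "negative")
logger.info("feedback processed", extra={"feedback": "too vague", "sentiment": "negative"})
```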
cognee/tasks/feedback/__init__.py (1)
1-13: Public surface looks good. Re-exports match implementations; __all__ is complete.
cognee/tasks/feedback/extract_feedback_interactions.py (1)
185-193: Logging kwargs compatibility. Multiple logger.info/warning calls use key=value kwargs. Confirm structlog is initialized in this task context, or switch to extra={} for stdlib logging.

cognee/modules/retrieval/graph_completion_cot_retriever.py (1)
121-123: Edge is fully hashable; current code is correct. Edge has both __hash__() and __eq__() implementations. The __hash__() method handles both directed and undirected edges properly, and since Node is also hashable (based on hash(self.id)), calling set(triplets) on a list of Edge objects will not raise a TypeError. The code at lines 121-123 is correct as-is.

Likely an incorrect or invalid review comment.
hajdul88 left a comment
LGTM, we agreed to have a follow-up PR to cover the weighted edges in test suites, as this PR started to use them as part of our main cognify pipeline.
The PR on the other hand: nice job!
Description
- edge_text field to edges that auto-fills from relationship_type if not provided.

Type of Change
Screenshots/Videos (if applicable)
Pre-submission Checklist
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.