Conversation

@lxobr
Collaborator

@lxobr lxobr commented Oct 20, 2025

Description

  • Automatically finds negative user feedback and generates improved answers
  • All tasks operate on a shared FeedbackEnrichment DataPoint that is progressively filled in as it moves through the memify pipeline (a rough wiring sketch is shown below)
  • Creates new nodes and edges in the knowledge graph, linking improved answers back to the original feedback and interactions
  • Includes a complete example showing how to set up a conversation, ask questions, submit feedback, and run the enrichment pipeline when answers are wrong
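
As a rough illustration (this is not the example shipped in the PR; the task names come from the new cognee.tasks.feedback module, while the Task import path is an assumption):

from cognee.modules.pipelines.tasks.task import Task  # import path assumed
from cognee.tasks.feedback import (
    extract_feedback_interactions,
    generate_improved_answers,
    create_enrichments,
    link_enrichments_to_feedback,
)

# Each task reads and fills in the same FeedbackEnrichment DataPoint in turn.
feedback_tasks = [
    Task(extract_feedback_interactions),  # find negative feedback and the interactions it refers to
    Task(generate_improved_answers),      # re-answer the question via CoT retrieval
    Task(create_enrichments),             # turn improved answers into enrichment reports
    Task(link_enrichments_to_feedback),   # add edges back to the original feedback and interactions
]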

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Code refactoring
  • Performance improvement
  • Other (please specify):

Screenshots/Videos (if applicable)

Pre-submission Checklist

  • I have tested my changes thoroughly before submitting this PR
  • This PR contains minimal changes necessary to address the issue/feature
  • My code follows the project's coding standards and style guidelines
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if applicable)
  • All new and existing tests pass
  • I have searched existing PRs to ensure this change hasn't been submitted already
  • I have linked any relevant issues in the description
  • My commits have clear and descriptive messages

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

@pull-checklist

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

@lxobr lxobr self-assigned this Oct 20, 2025
@lxobr lxobr requested a review from hajdul88 October 20, 2025 23:46
@coderabbitai
Contributor

coderabbitai bot commented Oct 20, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR introduces comprehensive enhancements to Cognee including web scraping capabilities (BeautifulSoup and Tavily integration), a new feedback enrichment system, API mode support for MCP with a dual-mode CogneeClient, distributed Kuzu locking via Redis, Mistral LLM provider support, pipeline batching with configurable batch sizes, removal of INSIGHTS search type, complete removal of MemgraphAdapter, and systematic exception chaining improvements throughout the codebase.

Changes

Each entry below lists the cohort, its file(s), and a summary of the changes.
Web Scraping Tasks
cognee/tasks/web_scraper/* (bs4_crawler.py, config.py, models.py, utils.py, web_scraper_task.py, __init__.py)
New module providing BeautifulSoup and Tavily-based web scraping with extraction rules, robots.txt handling, async fetching, optional Playwright support, and end-to-end scraping workflows with graph storage integration.
Feedback System
cognee/tasks/feedback/* (models.py, extract_feedback_interactions.py, generate_improved_answers.py, create_enrichments.py, link_enrichments_to_feedback.py, __init__.py)
New feedback enrichment pipeline extracting negative feedback, generating improved answers via CoT retrieval, creating enrichment reports, and linking enrichments to graph data with LLM-powered context generation.
Cache/Locking Infrastructure
cognee/infrastructure/databases/cache/* (__init__.py, cache_db_interface.py, config.py, get_cache_engine.py, redis/RedisAdapter.py)
New distributed cache coordination layer with Redis-backed locking interface, configuration management, and factory functions for per-context lock acquisition and release.
API Mode Support
cognee-mcp/src/cognee_client.py, cognee-mcp/src/server.py, cognee-mcp/Dockerfile, cognee-mcp/entrypoint.sh, cognee-mcp/src/__init__.py
New CogneeClient class supporting both direct in-process and HTTP API modes, with conditional logic in MCP server for API routing, Docker setup for API mode, and entrypoint migration skipping.
Kuzu Adapter Enhancements
cognee/infrastructure/databases/graph/kuzu/adapter.py
Added optional Redis-based shared locking, connection lifecycle management (close/reopen), open_connections tracking, and conditional async lock serialization for concurrent access. Removed clear_database method.
Search Type Removal
cognee/modules/search/types/SearchType.py, cognee/modules/search/methods/get_search_type_tools.py, cognee/modules/retrieval/insights_retriever.py, cognee/cli/config.py, cognee/api/v1/responses/*, cognee/infrastructure/llm/prompts/search_type_selector_prompt.txt
Removed INSIGHTS search type enum value, deleted InsightsRetriever class, removed INSIGHTS from CLI choices, tool definitions, prompt templates, and dispatch logic. Updated SearchType handling throughout.
Mistral LLM Provider
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py, cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py, cognee/modules/settings/get_settings.py, cognee/api/v1/settings/routers/get_settings_router.py
Added Mistral to LLMProvider enum, implemented MistralAdapter for structured output via litellm with retries and error handling, added Mistral models to settings and LLM provider list.
Data Ingestion & Add Enhancements
cognee/api/v1/add/add.py, cognee/api/v1/add/routers/get_add_router.py, cognee/tasks/ingestion/save_data_item_to_storage.py
Added extraction_rules, tavily_config, soup_crawler_config, and data_per_batch parameters to add function; introduced HTTP URL detection and web content ingestion via fetch_page_content; added HTMLContent validation class; integrated optional web scraper imports.
Pipeline Batching
cognee/modules/pipelines/operations/pipeline.py, cognee/modules/pipelines/operations/run_tasks.py, cognee/modules/pipelines/operations/run_tasks_data_item.py, cognee/modules/pipelines/operations/run_tasks_distributed.py
Added data_per_batch parameter (default 20) throughout pipeline execution; replaced incremental per-item processing with batch-based concurrent execution; introduced new run_tasks_data_item module for incremental/regular item processing with telemetry and status tracking.
Graph Edge Indexing
cognee/tasks/storage/add_data_points.py, cognee/tasks/storage/index_graph_edges.py, cognee/tasks/graph/extract_graph_from_data.py, cognee-mcp/src/codingagents/coding_rule_associations.py, cognee/modules/retrieval/user_qa_feedback.py
Made edge indexing explicit and always-on by removing update_edge_collection parameter; added optional edges_data parameter to index_graph_edges for direct edge input; updated callers to pass edges explicitly.
Update Function Changes
cognee/api/v1/update/update.py, cognee/api/v1/update/routers/get_update_router.py
Made dataset_id a required non-optional parameter in update() function signature; updated router to pass node_set as None when falsy.
Cognify Enhancements
cognee/api/v1/cognify/cognify.py
Added data_per_batch parameter to cognify function; changed example search from SearchType.INSIGHTS to SearchType.GRAPH_COMPLETION; propagated data_per_batch through pipeline execution.
Exception Chaining
cognee/cli/commands/add_command.py, cognee/cli/commands/cognify_command.py, cognee/cli/commands/config_command.py, cognee/cli/commands/delete_command.py, cognee/cli/commands/search_command.py, cognee/modules/users/methods/get_authenticated_user.py, cognee/modules/users/roles/methods/create_role.py, cognee/modules/users/tenants/methods/create_tenant.py, cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py, cognee/infrastructure/databases/vector/embeddings/FastembedEmbeddingEngine.py, cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py, cognee/infrastructure/databases/graph/neptune_driver/neptune_utils.py
Added "from e" exception chaining to preserve original tracebacks across multiple exception handling sites.
UI & CLI Docker Management
cognee/api/v1/ui/ui.py, cognee/cli/_cognee.py
Added dynamic Docker container management for MCP startup with port mapping and environment configuration; enhanced signal handlers to stop/force-remove containers; updated pid_callback to handle (PID, container_name) tuples for Docker container tracking.
Context Variables
cognee/context_global_variables.py
Added two new ContextVar globals: soup_crawler_config and tavily_config (default=None) for per-context web scraper configuration.
File Type & Loader Updates
cognee/infrastructure/files/utils/guess_file_type.py, cognee/infrastructure/loaders/LoaderEngine.py
Added fallback to text/plain type when guess() returns None; removed file_stream parameter from load_file signature; reordered default_loader_priority moving advanced_pdf_loader to end.
Data Retrieval & Storage
cognee/modules/data/methods/get_dataset_data.py, cognee/modules/engine/models/TableRow.py, cognee/tasks/ingestion/save_data_item_to_storage.py
Added order_by(data_size desc) to dataset data query; removed is_a field from TableRow model; added DoclingDocument handling via export_to_text().
Retrieval & Graph Logic
cognee/modules/retrieval/graph_completion_cot_retriever.py, cognee/modules/retrieval/graph_completion_retriever.py, cognee/modules/retrieval/utils/description_to_codepart_search.py
Added _run_cot_completion and get_structured_completion for structured output retrieval; removed update_edge_collection parameter from add_data_points call; changed INSIGHTS to GRAPH_COMPLETION in doc-inclusive searches.
Neo4j & Graph Adapters
cognee/infrastructure/databases/graph/neo4j_driver/adapter.py, cognee/infrastructure/databases/graph/neo4j_driver/deadlock_retry.py, cognee/infrastructure/databases/graph/get_graph_engine.py
Added keep_alive parameter to Neo4j driver; switched properties serialization to dict(node); updated filtered graph data query to use n.id; increased deadlock_retry max_retries default from 5 to 10; removed memgraph from supported providers list.
Health & Settings
cognee/api/health.py, cognee/modules/settings/get_settings.py, cognee/__init__.py, cognee-frontend/src/app/(graph)/GraphVisualization.tsx
Changed health check to use critical_checks names list; added Mistral models to settings; exported update from cognee package; wrapped zoomToFit in GraphVisualizationAPI with guarded forwardRef.
Chat Hook Updates
cognee-frontend/src/modules/chat/hooks/useChat.ts
Removed INSIGHTS branch from convertToSearchTypeOutput; now handles only SUMMARIES and CHUNKS with fallback.
MCP & Frontend
cognee-mcp/README.md
Expanded API Mode documentation with Docker setup, host/localhost handling, explicit environment variables, CLI arguments, connectivity notes, and limitations.
LLM Prompt Templates
cognee/infrastructure/llm/prompts/feedback_reaction_prompt.txt, cognee/infrastructure/llm/prompts/feedback_report_prompt.txt, cognee/infrastructure/llm/prompts/feedback_user_context_prompt.txt
Added three new prompt templates for feedback workflow: reaction (rewriting with feedback), report (generating explanatory paragraphs), and user context (summarizing facts).
Test Files
cognee/tests/test_*.py, cognee/tests/tasks/web_scraping/web_scraping_test.py, cognee/tests/subprocesses/*, cognee/tests/test_concurrent_subprocess_access.py, cognee/tests/cli_tests/cli_unit_tests/test_cli_utils.py
Updated tests to use GRAPH_COMPLETION instead of INSIGHTS; renamed explanation file variables to explanation_file_path_nlp/_quantum; removed INSIGHTS from expected types; added new web scraping tests; added concurrent subprocess access test; removed test_memgraph.py.
Cache/Logging Exports
cognee/infrastructure/databases/cache/__init__.py
Re-exported get_cache_engine and get_cache_config to cache package public API.
README Updates
README.md
Updated Get Started section with new Google Colab notebook URL.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant UI as Frontend/CLI
    participant API as Cognee API
    participant MCP as MCP Server
    participant Direct as Direct Mode<br/>(In-Process)
    participant APIClient as CogneeClient<br/>(API Mode)
    
    User->>UI: Invoke action<br/>(add/cognify/search)
    alt API Mode Enabled
        UI->>APIClient: Create with api_url
        APIClient->>MCP: POST /api/v1/{operation}
        MCP->>API: Route to operation
        API-->>MCP: Response (JSON)
        MCP-->>APIClient: HTTP Response
        APIClient-->>UI: Parsed result
    else Direct Mode
        UI->>Direct: Invoke cognee.{operation}
        Direct->>API: Execute operation
        API-->>Direct: Result
        Direct-->>UI: Direct result
    end
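
For illustration only, a hedged sketch of the dual-mode client described above (the import path and constructor arguments are assumptions based on the review comments, not the PR's exact API):

import asyncio
from src.cognee_client import CogneeClient  # lives in cognee-mcp/src/cognee_client.py

async def demo():
    # API mode: requests are routed over HTTP to a running Cognee API
    client = CogneeClient(api_url="http://localhost:8000", api_token="secret")  # arguments assumed
    await client.add("Some text to remember", dataset_name="main_dataset")

asyncio.run(demo())
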
sequenceDiagram
    participant Task as Pipeline Task
    participant Batching as Batch Processor<br/>(data_per_batch)
    participant DataItem as Data Item<br/>Processor
    participant Telemetry as Telemetry
    participant DB as Database
    
    Task->>Batching: run_tasks(data[], batch_size=20)
    loop For each batch of 20 items
        Batching->>DataItem: process_batch(items)
        alt Incremental Mode
            DataItem->>DB: Check prior status
            alt Not completed
                DataItem->>Telemetry: run_with_telemetry
                Telemetry->>DB: Update status
            else Already completed
                DataItem-->>Batching: Skip (already done)
            end
        else Regular Mode
            DataItem->>Telemetry: run_with_telemetry
        end
        DataItem-->>Batching: Results
    end
    Batching-->>Task: All results collected
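
A minimal caller-side sketch of the new batching knob (dataset name is a placeholder; 20 is the stated default):

import asyncio
import cognee

async def main():
    # data_per_batch controls how many data items each pipeline batch processes concurrently
    await cognee.cognify(datasets=["my_dataset"], data_per_batch=20)

asyncio.run(main())
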
sequenceDiagram
    participant User
    participant FeedbackTask as Feedback Task
    participant LLM as LLM Service
    participant Retriever as GraphCompletion<br/>Retriever
    participant Graph as Graph DB
    
    User->>FeedbackTask: Extract feedback
    FeedbackTask->>Graph: Fetch negative feedback + interactions
    FeedbackTask-->>Graph: Build enrichments
    
    FeedbackTask->>FeedbackTask: For each enrichment
    FeedbackTask->>Retriever: Generate improved answer<br/>(CoT)
    Retriever->>LLM: Reaction prompt
    LLM-->>Retriever: Improved answer + explanation
    Retriever-->>FeedbackTask: Result
    
    FeedbackTask->>LLM: Create report<br/>(feedback_report_prompt)
    LLM-->>FeedbackTask: Report text
    
    FeedbackTask->>Graph: Add enrichment edges<br/>(enriches_feedback,<br/>improves_interaction)
    Graph->>Graph: Index new edges
    FeedbackTask-->>User: Enrichments complete
sequenceDiagram
    participant User
    participant WebScraper as Web Scraper Task
    participant Crawler as Crawler<br/>(BS4/Tavily)
    participant LLM as LLM
    participant Graph as Graph DB
    
    User->>WebScraper: web_scraper_task(url, extraction_rules)
    WebScraper->>Crawler: fetch_page_content(urls)
    
    alt Using BeautifulSoup
        Crawler->>Crawler: Check robots.txt
        Crawler->>Crawler: Fetch HTML (+ Playwright if needed)
        Crawler->>Crawler: Extract via selectors/XPath
    else Using Tavily
        Crawler->>Crawler: TavilyClient.search
    end
    
    Crawler-->>WebScraper: {url: content}
    
    WebScraper->>LLM: Generate descriptions<br/>(WebPage, WebSite)
    LLM-->>WebScraper: Descriptions
    
    WebScraper->>Graph: Add WebPage/WebSite nodes
    WebScraper->>Graph: Add edges (is_part_of, is_scraping)
    Graph->>Graph: Index data points & edges
    WebScraper-->>User: Result with graph data
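
A hedged sketch of the caller-facing flow above (the extraction_rules shape is a guess, not the PR's exact schema):

import asyncio
import cognee

async def main():
    # HTTP(S) URLs passed to add() are detected and fetched via the web scraper path
    await cognee.add(
        "https://example.com/article",
        dataset_name="scraped_pages",
        extraction_rules={"title": "h1"},  # hypothetical selector rule
    )
    await cognee.cognify(datasets=["scraped_pages"])

asyncio.run(main())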

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Rationale: This PR exhibits high complexity across multiple dimensions:

  • Scope: Substantial additions (web scraping module, feedback system, caching layer) alongside removals (Memgraph, INSIGHTS) affecting 100+ files.
  • Heterogeneity: Highly varied changes—new web scraper integration, feedback enrichment pipeline, API mode routing, distributed Kuzu locking, Mistral LLM support, pipeline batching—each requiring separate reasoning.
  • Logic Density: Non-trivial logic in batching orchestration, lock management, API client routing, web scraping with extraction rules, and feedback generation.
  • Repetition: Exception chaining and INSIGHTS removal are repetitive patterns, offsetting complexity slightly.
  • Risk Areas: API mode routing, distributed locking, pipeline refactoring, and new feedback system warrant careful scrutiny for correctness and side effects.

Possibly related PRs

Suggested labels

run-checks

Suggested reviewers

  • borisarzentar
  • dexters1

Poem

🐰 Whiskers twitch with delight—
Web scrapers now crawl through the night,
Feedback enriched, batches aligned,
Mistral's wisdom refined,
Exception chains hold the light!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 64.00% which is insufficient. The required threshold is 80.00%. Resolution: You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title Check (✅ Passed): The PR title "feat: feedback enrichment" directly corresponds to the primary feature addition in this changeset. The title uses conventional commit format and clearly communicates that a new feedback enrichment feature is being introduced. The raw_summary confirms extensive new code across multiple feedback-related modules (extract_feedback_interactions, generate_improved_answers, create_enrichments, link_enrichments_to_feedback), data models, and task integrations, all centered on feedback enrichment functionality. The title is concise, specific, and provides sufficient clarity for someone reviewing the commit history.
  • Description Check (✅ Passed): The PR description follows the repository template and provides a comprehensive, human-written explanation of the changes. It clearly outlines the feature's purpose: automatically finding negative feedback and generating improved answers while creating graph connections. The description demonstrates understanding of the implementation approach (using FeedbackEnrichment DataPoint through the pipeline). The author has correctly selected "New feature" as the type of change and completed most of the pre-submission checklist items (testing, minimal changes, code standards, added tests, and DCO affirmation). While some items remain unchecked (documentation, all tests pass, linked issues), these are not mandatory for PR completeness and may not apply to this specific contribution.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 49

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (16)
cognee/infrastructure/loaders/LoaderEngine.py (3)

80-97: Add None check for file_info to prevent AttributeError.

The filetype.guess() function can return None when it cannot determine the file type from the magic bytes. Accessing file_info.extension and file_info.mime on lines 87 and 96 without checking for None will cause an AttributeError, leading to unclear error messages instead of properly handling unsupported file types.

Apply this diff to handle the case when file type cannot be determined:

     file_info = filetype.guess(file_path)
+    
+    # Handle case where file type cannot be determined
+    if file_info is None:
+        logger.warning(f"Could not determine file type for: {file_path}")
+        # Try to use fallback loaders that might handle unknown types
+        for loader_name in self.default_loader_priority:
+            if loader_name in self._loaders:
+                loader = self._loaders[loader_name]
+                # Try loader without type info
+                if loader.can_handle(extension=None, mime_type=None):
+                    return loader
+        return None

     # Try preferred loaders first
     if preferred_loaders:

140-140: Correct the type hint from any to Any.

The return type annotation uses lowercase any instead of Any from the typing module (which is already imported on line 2). Python's built-in any is a function, not a type annotation.

Apply this diff:

-    def get_loader_info(self, loader_name: str) -> Dict[str, any]:
+    def get_loader_info(self, loader_name: str) -> Dict[str, Any]:

105-130: All callers have been correctly updated; fix stale documentation in LoaderInterface.py.

Verification confirms all three call sites in cognee/tasks/ingestion/data_item_to_text_file.py (lines 51, 59, 72) correctly use the new signature with file_path instead of file_stream.

However, the abstract load method docstring in cognee/infrastructure/loaders/LoaderInterface.py (lines 62-69) still references the removed file_stream parameter: "file_stream: If file stream is provided it will be used to process file instead". Remove this stale documentation line to keep the interface contract accurate.

cognee/infrastructure/databases/graph/neo4j_driver/deadlock_retry.py (3)

43-60: Critical: Inconsistent retry logic between exception handlers.

The two exception handlers use different comparison operators, causing different retry behavior:

  • Line 48 (Neo4jError): if attempt > max_retries: allows the final attempt when attempt == max_retries
  • Line 57 (DatabaseUnavailable): if attempt >= max_retries: prevents the final attempt when attempt == max_retries

With max_retries=10, Neo4jError gets 11 attempts while DatabaseUnavailable gets only 9.

Apply this diff to make retry behavior consistent:

                 except DatabaseUnavailable:
-                    if attempt >= max_retries:
+                    if attempt > max_retries:
                         raise  # Re-raise the original error
 
                     await wait()

12-26: Remove stale parameters from docstring.

The docstring lists initial_backoff, backoff_factor, and jitter parameters that don't exist in the function signature. These parameters likely belong to the calculate_backoff function instead.

Apply this diff:

     """
     Decorator that automatically retries an asynchronous function when rate limit errors occur.
 
     This decorator implements an exponential backoff strategy with jitter
     to handle rate limit errors efficiently.
 
     Args:
         max_retries: Maximum number of retry attempts.
-        initial_backoff: Initial backoff time in seconds.
-        backoff_factor: Multiplier for exponential backoff.
-        jitter: Jitter factor to avoid the thundering herd problem.
 
     Returns:
         The decorated async function.
     """

37-39: Update misleading log message.

The log message states "Neo4j rate limit hit" but this decorator handles deadlocks (DeadlockDetected), transient errors (Neo.TransientError), and database unavailability, not just rate limits.

Apply this diff:

                 backoff_time = calculate_backoff(attempt)
                 logger.warning(
-                    f"Neo4j rate limit hit, retrying in {backoff_time:.2f}s "
+                    f"Neo4j transient error, retrying in {backoff_time:.2f}s "
                     f"Attempt {attempt}/{max_retries}"
                 )
cognee/infrastructure/databases/graph/kuzu/adapter.py (3)

224-256: Use the cache lock context manager and ensure DB close on exceptions.

Manual acquire/release is error-prone; prefer a with-context once CacheDBInterface.hold_lock is fixed. Also make sure self.close() happens even on exceptions.

Apply this diff:

-        def blocking_query():
-            lock_acquired = False
-            try:
-                if cache_config.shared_kuzu_lock:
-                    self.redis_lock.acquire_lock()
-                    lock_acquired = True
-                if not self.connection:
-                    logger.info("Reconnecting to Kuzu database...")
-                    self._initialize_connection()
-
-                result = self.connection.execute(query, params)
-                rows = []
-
-                while result.has_next():
-                    row = result.get_next()
-                    processed_rows = []
-                    for val in row:
-                        if hasattr(val, "as_py"):
-                            val = val.as_py()
-                        processed_rows.append(val)
-                    rows.append(tuple(processed_rows))
-
-                return rows
-            except Exception as e:
-                logger.error(f"Query execution failed: {str(e)}")
-                raise
-            finally:
-                if cache_config.shared_kuzu_lock and lock_acquired:
-                    try:
-                        self.close()
-                    finally:
-                        self.redis_lock.release_lock()
+        def blocking_query():
+            def _exec_once() -> list[tuple]:
+                if not self.connection:
+                    logger.info("Reconnecting to Kuzu database...")
+                    self._initialize_connection()
+                result = self.connection.execute(query, params)
+                rows: list[tuple] = []
+                while result.has_next():
+                    row = result.get_next()
+                    processed_rows = []
+                    for val in row:
+                        if hasattr(val, "as_py"):
+                            val = val.as_py()
+                        processed_rows.append(val)
+                    rows.append(tuple(processed_rows))
+                return rows
+
+            try:
+                if cache_config.shared_kuzu_lock:
+                    with self.redis_lock.hold_lock():
+                        try:
+                            return _exec_once()
+                        finally:
+                            self.close()
+                else:
+                    return _exec_once()
+            except Exception as e:
+                logger.error(f"Query execution failed: {str(e)}")
+                raise

This assumes CacheDBInterface.hold_lock calls acquire_lock()/release_lock as corrected. Based on learnings.


1427-1461: Bug: get_graph_metrics unpacks a dict and indexes wrong shape; function will fail.

get_model_independent_graph_data returns a dict, not (nodes, edges). The current code will throw and returns meaningless metrics.

Apply this minimal, correct implementation:

-            # Get basic graph data
-            nodes, edges = await self.get_model_independent_graph_data()
-            num_nodes = len(nodes[0]["nodes"]) if nodes else 0
-            num_edges = len(edges[0]["elements"]) if edges else 0
+            # Get basic counts with dedicated queries
+            node_count_rows = await self.query("MATCH (n:Node) RETURN COUNT(n)")
+            edge_count_rows = await self.query("MATCH ()-[r:EDGE]->() RETURN COUNT(r)")
+            num_nodes = int(node_count_rows[0][0]) if node_count_rows else 0
+            num_edges = int(edge_count_rows[0][0]) if edge_count_rows else 0

The rest of the method can remain as-is for optional metrics computations.


1789-1799: Bug: UNWIND list formatting in collect_events produces a nested list.

event_collection_cypher expects a comma-separated string of quoted IDs, but a raw Python list is formatted as "['a','b']" leading to UNWIND [['a','b']] AS uid. Quote and join explicitly.

Apply this diff:

-        query = event_collection_cypher.format(quoted=ids)
+        quoted_ids = ", ".join(f"'{uid}'" for uid in ids)
+        query = event_collection_cypher.format(quoted=quoted_ids)
cognee/tests/test_search_db.py (1)

226-226: Critical: Undefined variable 'text'.

Line 226 references text which is not defined in scope. This appears to be a copy-paste error from the earlier refactoring. Should this be explanation_file_path_quantum?

Apply this diff:

-    await cognee.add([text], dataset_name)
+    await cognee.add([explanation_file_path_quantum], dataset_name)
cognee/api/v1/update/update.py (1)

66-66: Fix inconsistent docstring.

The docstring at line 66 still says "Optional specific dataset UUID" but dataset_id is now required (line 13).

Apply this diff:

-        dataset_id: Optional specific dataset UUID to use instead of dataset_name.
+        dataset_id: UUID of the dataset containing the data to update.
cognee/modules/retrieval/graph_completion_retriever.py (1)

235-237: Add missing index_graph_edges() call after add_edges().

The codebase establishes a consistent pattern of calling index_graph_edges() immediately after add_edges(). At cognee/modules/retrieval/graph_completion_retriever.py:237, this call is missing, despite being present in the analogous code at cognee/modules/retrieval/user_qa_feedback.py:78-79 and throughout task modules (add_data_points.py, extract_graph_from_data.py, link_enrichments_to_feedback.py, etc.). Add the missing indexing call after line 237.

cognee/api/v1/cognify/cognify.py (1)

171-175: Docstring references unsupported parameter ontology_file_path

The example passes ontology_file_path, but cognify() does not accept it. Update example or expose a supported way (e.g., via config/ontology_config).

Suggested fix:

-        await cognee.cognify(
-            datasets=["research_papers"],
-            graph_model=ScientificPaper,
-            ontology_file_path="scientific_ontology.owl"
-        )
+        await cognee.cognify(
+            datasets=["research_papers"],
+            graph_model=ScientificPaper,
+            config={"ontology_config": {"ontology_resolver": my_resolver}}
+        )
cognee/modules/pipelines/operations/run_tasks.py (1)

162-164: Error propagation contradicts comment; likely inverted.

Comment says “don’t raise” on incremental loading, but code re-raises unless PipelineRunFailedError. Make re-raise conditional on not incremental.

-        # In case of error during incremental loading of data just let the user know the pipeline Errored, don't raise error
-        if not isinstance(error, PipelineRunFailedError):
-            raise error
+        # If not incremental, re-raise to fail the run; otherwise, yield the error and continue
+        if not incremental_loading:
+            raise
cognee-mcp/src/server.py (1)

116-127: Fix runtime error: LLMGateway.read_query_prompt() method does not exist.

The mistral adapter at cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py:121 calls LLMGateway.read_query_prompt(system_prompt), but read_query_prompt is a standalone function in cognee/infrastructure/llm/prompts/read_query_prompt.py, not a class method. Add the import and call it directly:

from cognee.infrastructure.llm.prompts.read_query_prompt import read_query_prompt

# Then at line 121:
system_prompt = read_query_prompt(system_prompt)
cognee/tasks/storage/index_graph_edges.py (1)

52-57: Bug: edge type counting ignores tuple’s type (breaks on Neo4j path).

You only count relationship_name found inside a props dict. Many adapters return edges as (src, dst, type, props) and don’t duplicate the type in props, so no edge types are counted and nothing gets indexed.

Fix by extracting from the 3rd tuple element first, then falling back to props or model attributes.

Apply:

-    edge_types = Counter(
-        item.get("relationship_name")
-        for edge in edges_data
-        for item in edge
-        if isinstance(item, dict) and "relationship_name" in item
-    )
+    edge_types: Counter[str] = Counter()
+    for edge in edges_data:
+        rel = None
+        # Common tuple shape: (source_id, target_id, rel_type, props)
+        if isinstance(edge, (list, tuple)) and len(edge) >= 3 and isinstance(edge[2], str):
+            rel = edge[2]
+        # Fallback: search props for 'relationship_name'
+        elif isinstance(edge, (list, tuple)):
+            rel = next(
+                (it.get("relationship_name") for it in edge if isinstance(it, dict) and "relationship_name" in it),
+                None,
+            )
+        else:
+            # If it's a model-like EdgeData object
+            rel = getattr(edge, "relationship_name", None) or getattr(edge, "type", None)
+        if rel:
+            edge_types[rel] += 1

Based on learnings.

🧹 Nitpick comments (79)
cognee/api/health.py (1)

289-294: LGTM! Good refactor for maintainability.

Deriving critical_comps dynamically from critical_checks eliminates the risk of keeping a hard-coded list in sync. The logic correctly identifies when critical components are unhealthy.

Minor observation: The llm_provider and embedding_service checks are commented as "non-critical" (lines 188, 217) but reside in critical_checks. They effectively behave as non-critical because they return DEGRADED rather than UNHEALTHY on failure. Moving them to non_critical_checks would align the code structure with the comments, though the current approach works correctly.

cognee/modules/data/methods/get_dataset_data.py (1)

15-15: Add index to data_size column for ORDER BY performance.

The data_size column in cognee/modules/data/models/Data.py (line 36) lacks an index but is used in an ORDER BY clause in the query. For datasets with large numbers of records, this can cause performance degradation. Add index=True to the column definition or document why sorting without an index is acceptable for this use case.
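
For illustration, a minimal sketch of the suggested change (an illustrative stand-in model, not the actual Data definition):

from sqlalchemy import BigInteger, Column, Integer
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Data(Base):  # stand-in for cognee/modules/data/models/Data.py; the real model has more columns
    __tablename__ = "data"
    id = Column(Integer, primary_key=True)
    # index=True lets the ORDER BY data_size DESC in get_dataset_data use an index scan
    data_size = Column(BigInteger, index=True, nullable=True)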

cognee/api/v1/responses/dispatch_function.py (2)

59-63: Consider deriving valid_search_types from the SearchType enum.

The hardcoded fallback list on line 62 duplicates knowledge about valid search types that should come from a single source of truth. If the SearchType enum changes in the future, this fallback list must be manually updated, creating a maintenance burden and risk of inconsistency.

Apply this diff to derive the list programmatically from the enum:

+from enum import Enum
+
 async def handle_search(arguments: Dict[str, Any], user) -> list:
     """Handle search function call"""
     search_tool = next((tool for tool in DEFAULT_TOOLS if tool["name"] == "search"), None)
     required_params = (
         search_tool["parameters"].get("required", []) if search_tool else ["search_query"]
     )
 
     query = arguments.get("search_query")
     if not query and "search_query" in required_params:
         return "Error: Missing required 'search_query' parameter"
 
     search_type_str = arguments.get("search_type", "GRAPH_COMPLETION")
     valid_search_types = (
         search_tool["parameters"]["properties"]["search_type"]["enum"]
         if search_tool
-        else ["CODE", "GRAPH_COMPLETION", "NATURAL_LANGUAGE"]
+        else [member.name for member in SearchType]
     )

62-62: INSIGHTS removal is consistent—enum and fallback list properly synchronized.

Verification confirms the SearchType enum has been updated to remove the INSIGHTS member, and the fallback list on line 62 now correctly contains only valid types: ["CODE", "GRAPH_COMPLETION", "NATURAL_LANGUAGE"]. The enum consistency concern is resolved.

Optional: Consider deriving the fallback list programmatically from the enum's valid members to avoid duplication and reduce maintenance burden if new search types are added in the future. The current hardcoded approach works correctly but increases risk of divergence between enum and fallback definitions.

cognee/infrastructure/llm/prompts/feedback_user_context_prompt.txt (1)

4-5: Consider hyphenating compound adjectives for formal style.

For more formal grammar, consider "one-paragraph" and "human-readable" with hyphens. However, the current phrasing is clear and functional for an LLM prompt.

-Provide a one paragraph human readable summary of this interaction context,
+Provide a one-paragraph human-readable summary of this interaction context,
cognee-frontend/src/app/(graph)/GraphVisualization.tsx (1)

220-232: Make zoomToFit wrapper return void; drop undefined as any.

Returning a value from a void-typed method is unnecessary and the as any escape hatch is avoidable. No‑op early and call through.

Apply:

-  const zoomToFit: ForceGraphMethods["zoomToFit"] = (
+  const zoomToFit: ForceGraphMethods["zoomToFit"] = (
     durationMs?: number,
     padding?: number,
     nodeFilter?: (node: NodeObject) => boolean
   ) => {
     if (!graphRef.current) {
       console.warn("GraphVisualization: graphRef not ready yet");
-      return undefined as any;
+      return;
     }
 
-    return graphRef.current.zoomToFit?.(durationMs, padding, nodeFilter);
+    graphRef.current.zoomToFit?.(durationMs, padding, nodeFilter);
   };

Optional: set sensible defaults matching resize behavior.

-  const zoomToFit: ForceGraphMethods["zoomToFit"] = (
-    durationMs?: number,
-    padding?: number,
-    nodeFilter?: (node: NodeObject) => boolean
-  ) => {
+  const zoomToFit: ForceGraphMethods["zoomToFit"] = (
+    durationMs = 1000,
+    padding = 50,
+    nodeFilter?: (node: NodeObject) => boolean
+  ) => {
cognee/tests/subprocesses/writer.py (1)

8-16: Use string UUIDs for DB parameters.

Kuzu parameter binding may not accept uuid.UUID directly. Emit str to avoid driver/type issues.

Apply:

-    document = PdfDocument(
-        id=uuid.uuid4(),
+    document = PdfDocument(
+        id=str(uuid.uuid4()),
         name=name,
         raw_data_location=name,
         external_metadata="test_external_metadata",
         mime_type="test_mime",
     )

Optional: if this test should intentionally keep the DB handle open, keep as is; otherwise consider closing the adapter when done.

cognee/api/v1/ui/ui.py (1)

556-561: pid_callback now receives a tuple; widen its type hint.

start_ui declares pid_callback: Callable[[int], None], but here you pass (pid, container_name). Update the annotation (and docstring) to reflect both forms.

Outside this hunk, change the signature to:

def start_ui(
    pid_callback: Callable[[int | tuple[int, str]], None],
    ...
) -> Optional[subprocess.Popen]:
    ...

This matches usage in cognee/cli/_cognee.py and avoids type-checker noise.

cognee/tasks/web_scraper/models.py (2)

1-3: Avoid shared mutable defaults; use Field(default_factory=...) for metadata.

Using a bare dict literal as a default creates a shared mutable default across instances. Switch to Field(default_factory=...) and keep typing consistent.

Apply this diff:

-from cognee.infrastructure.engine import DataPoint
+from cognee.infrastructure.engine import DataPoint
+from pydantic import Field
@@
-    metadata: dict = {"index_fields": ["name", "description", "content"]}
+    metadata: dict = Field(default_factory=lambda: {"index_fields": ["name", "description", "content"]})
@@
-    metadata: dict = {"index_fields": ["name", "description"]}
+    metadata: dict = Field(default_factory=lambda: {"index_fields": ["name", "description"]})
@@
-    metadata: dict = {"index_fields": ["name", "description"]}
+    metadata: dict = Field(default_factory=lambda: {"index_fields": ["name", "description"]})

If DataPoint.metadata uses a dedicated alias/type (e.g., MetaData), consider aligning the annotation to that alias for consistency. Based on learnings.

Also applies to: 19-20, 33-34, 46-46


42-42: Constrain ScrapingJob.status to a finite set (Enum or Literal).

Prevent invalid states by using an Enum or Literal["active", "paused", "completed", "failed"].

Example:

+from enum import Enum
+
+class ScrapingStatus(str, Enum):
+    active = "active"
+    paused = "paused"
+    completed = "completed"
+    failed = "failed"
@@
-    status: str  # "active", "paused", "completed", "failed"
+    status: ScrapingStatus
cognee/modules/pipelines/operations/run_tasks_data_item.py (2)

100-105: Telemetry naming: pass the human-friendly pipeline_name instead of pipeline_id.

run_tasks_with_telemetry emits events keyed by pipeline_name. Passing IDs degrades observability.

Apply this diff:

-            pipeline_name=pipeline_id,
+            pipeline_name=pipeline_name,

Repeat in the regular path:

-        pipeline_name=pipeline_id,
+        pipeline_name=pipeline_name,

Alternatively rename the parameter in run_tasks_data_item_regular to accept pipeline_name explicitly for clarity.

Also applies to: 179-185


106-111: Unify generator yield shape with the docstring/type hint.

Function annotations/docstring say the generators yield dicts; currently they yield PipelineRunYield objects. Wrap them for consistency (and include data_id in incremental).

Incremental:

-            yield PipelineRunYield(
-                pipeline_run_id=pipeline_run_id,
-                dataset_id=dataset.id,
-                dataset_name=dataset.name,
-                payload=result,
-            )
+            yield {
+                "run_info": PipelineRunYield(
+                    pipeline_run_id=pipeline_run_id,
+                    dataset_id=dataset.id,
+                    dataset_name=dataset.name,
+                    payload=result,
+                ),
+                "data_id": data_id,
+            }

Regular:

-        yield PipelineRunYield(
-            pipeline_run_id=pipeline_run_id,
-            dataset_id=dataset.id,
-            dataset_name=dataset.name,
-            payload=result,
-        )
+        yield {
+            "run_info": PipelineRunYield(
+                pipeline_run_id=pipeline_run_id,
+                dataset_id=dataset.id,
+                dataset_name=dataset.name,
+                payload=result,
+            )
+        }

Also applies to: 186-191

cognee-mcp/src/cognee_client.py (3)

78-89: Use _get_headers() for consistency and fix hardcoded filename.

Two issues here:

  1. Headers are constructed inline instead of using the _get_headers() helper, creating inconsistency with other methods.
  2. The hardcoded filename "data.txt" may be misleading when uploading non-text data.

Apply this diff:

-            files = {"data": ("data.txt", str(data), "text/plain")}
+            files = {"data": ("data", str(data), "text/plain")}
             form_data = {
                 "datasetName": dataset_name,
             }
             if node_set is not None:
                 form_data["node_set"] = json.dumps(node_set)
 
             response = await self.client.post(
                 endpoint,
                 files=files,
                 data=form_data,
-                headers={"Authorization": f"Bearer {self.api_token}"} if self.api_token else {},
+                headers={"Authorization": f"Bearer {self.api_token}"} if self.api_token else None,
             )

Note: Using None instead of {} for empty headers is more idiomatic with httpx.


94-96: Document the redirect_stdout pattern.

The redirect_stdout(sys.stderr) pattern is used throughout but not explained. Consider adding a comment explaining why stdout is redirected to stderr in direct mode.


85-92: Consider wrapping HTTP exceptions for better error messages.

HTTP errors from httpx (e.g., HTTPStatusError, RequestError) will propagate directly to callers. For better user experience, consider catching these and raising custom exceptions with more context about what operation failed.

Example:

try:
    response = await self.client.post(...)
    response.raise_for_status()
    return response.json()
except httpx.HTTPStatusError as e:
    raise CogneeAPIError(f"Failed to add data: {e}") from e
cognee-mcp/README.md (3)

132-189: API Mode: add explicit security and networking cautions.

  • Note that API_TOKEN will end up in container env and shell history; advise using Docker secrets or env files with least privilege.
  • Caution that --network host exposes container ports to host namespace; recommend only for dev or document risks.
  • Mention Linux case where host.docker.internal may not exist unless configured; you already show alternatives—link to Docker docs here.

123-131: Clarify transport config parity (env vs args).

Add a one‑liner mapping table for SSE/HTTP paths (e.g., SSE at /sse, HTTP at /mcp) to avoid ambiguity when switching between Docker and direct modes.


317-386: API Mode limitations: cross‑link exact tool behavior.

For each limited tool (codify, prune, status, list_data by dataset), add a quick pointer to the equivalent API endpoint or note “not supported via API.” Helps users decide when to use Direct vs API.

cognee/infrastructure/llm/prompts/feedback_reaction_prompt.txt (1)

9-14: Harden output format and non‑speculative guardrails.

Add: “Do not fabricate facts; if information is missing, state the limitation briefly.” Also require single‑line “Answer:” and “Explanation:” to simplify parsing.

cognee/cli/commands/cognify_command.py (1)

125-128: Include docs_url for richer CLI errors.

When raising CliCommandException, pass docs_url=self.docs_url to improve UX.

-                raise CliCommandException(str(e), error_code=1) from e
+                raise CliCommandException(str(e), error_code=1, docs_url=self.docs_url) from e
-            raise CliCommandException(f"Error during cognification: {str(e)}", error_code=1) from e
+            raise CliCommandException(
+                f"Error during cognification: {str(e)}",
+                error_code=1,
+                docs_url=self.docs_url,
+            ) from e
cognee/tests/test_kuzu.py (2)

41-45: Validate test data paths and prefer Pathlib.

Add existence assertions for both files; use Path objects for clarity and OS safety.

-        explanation_file_path_nlp = os.path.join(
-            pathlib.Path(__file__).parent, "test_data/Natural_language_processing.txt"
-        )
+        base = pathlib.Path(__file__).parent
+        explanation_file_path_nlp = base / "test_data" / "Natural_language_processing.txt"
+        assert explanation_file_path_nlp.exists()
 ...
-        explanation_file_path_quantum = os.path.join(
-            pathlib.Path(__file__).parent, "test_data/Quantum_computers.txt"
-        )
+        explanation_file_path_quantum = base / "test_data" / "Quantum_computers.txt"
+        assert explanation_file_path_quantum.exists()

Also applies to: 46-51


85-87: Relax brittle history count.

Exact 6 may fluctuate with pipeline changes; assert “>= expected minimum” or derive from executed calls.

-        assert len(history) == 6, "Search history is not correct."
+        assert len(history) >= 6, f"Expected at least 6 history entries, got {len(history)}"
cognee/tests/test_add_docling_document.py (2)

18-20: Ensure cleanup even on failure.

Wrap prune calls in try/finally or add a final prune to avoid cross‑test contamination.

-    await cognee.prune.prune_data()
-    await cognee.prune.prune_system(metadata=True)
+    try:
+        await cognee.prune.prune_data()
+        await cognee.prune.prune_system(metadata=True)
+        ...
+    finally:
+        await cognee.prune.prune_data()
+        await cognee.prune.prune_system(metadata=True)

35-53: Reduce brittleness of assertions.

  • Color assertion may fail on minor extraction variance; prefer set inclusion against tokenized words.
  • For the “light bulbs” check, allow synonyms (e.g., “zero,” “don’t”) or use regex.
-    lowercase_answer = answer[0].lower()
-    assert ("no" in lowercase_answer) or ("none" in lowercase_answer)
+    import re
+    assert re.search(r"\b(no|none|zero|don'?t)\b", answer[0].lower())
cognee/tests/subprocesses/simple_cognify_2.py (1)

24-31: Graceful event loop teardown.

Add loop.close() after shutdown_asyncgens() to release resources.

     finally:
         loop.run_until_complete(loop.shutdown_asyncgens())
+        loop.close()
cognee-mcp/src/__init__.py (1)

1-4: Constrain fallback import to avoid shadowing unrelated top-level modules and to preserve real import errors.

Current try/except will also catch ImportError raised inside .server and then import a possibly unrelated server on sys.path. Gate the fallback to only fire when running as a script (__package__ is None) and re-raise otherwise.

Apply:

-try:
-    from .server import main as server_main
-except ImportError:
-    from server import main as server_main
+try:
+    from .server import main as server_main
+except ImportError as e:
+    # Only fall back when executed as a script where relative imports don't work.
+    if __package__ is None:
+        from server import main as server_main  # local dev/script mode
+    else:
+        raise
cognee/tests/test_library.py (2)

105-109: Avoid writing visualization output to the user’s home in CI.

visualize_graph() writes to HOME when no path is provided. Provide a temp path to keep tests hermetic.

-    await visualize_graph()
+    import tempfile, os
+    with tempfile.TemporaryDirectory() as tmp:
+        out = os.path.join(tmp, "graph.html")
+        await visualize_graph(destination_file_path=out)

92-103: Scope all three unscoped search calls with dataset_ids to prevent cross-test flakiness.

The test includes three unscoped cognee.search() calls (lines 54, 68, and 92) that can pull results from other datasets/tests, causing flakiness. The cognee.search() function accepts dataset_ids parameter; use pipeline_run_obj.dataset_id which is available at line 83.

     search_results = await cognee.search(
-        query_type=SearchType.GRAPH_COMPLETION, query_text=random_node_name
+        query_type=SearchType.GRAPH_COMPLETION,
+        query_text=random_node_name,
+        dataset_ids=[pipeline_run_obj.dataset_id],
     )

     search_results = await cognee.search(
-        query_type=SearchType.SUMMARIES, query_text=random_node_name
+        query_type=SearchType.SUMMARIES,
+        query_text=random_node_name,
+        dataset_ids=[pipeline_run_obj.dataset_id],
     )

     search_results = await cognee.search(
         query_type=SearchType.GRAPH_COMPLETION,
         query_text="What information do you contain?",
+        dataset_ids=[pipeline_run_obj.dataset_id],
     )
cognee/tasks/feedback/models.py (1)

21-26: Minor: align with base typing and doc clarity.

Consider keeping belongs_to_set: Optional[List[DataPoint]] (NodeSet is a DataPoint) to match base expectations, and add brief field docstrings if used in public APIs. Optional only.
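
For illustration, the suggested annotation might look like this (other fields omitted; the default value is an assumption):

from typing import List, Optional

from cognee.infrastructure.engine import DataPoint

class FeedbackEnrichment(DataPoint):  # sketch only; the PR's model defines additional fields
    belongs_to_set: Optional[List[DataPoint]] = None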

cognee/tasks/storage/add_data_points.py (2)

31-37: Docstring drift: edge indexing is now unconditional.

Update the comment to match behavior.

-        - Optionally updates the edge index via `index_graph_edges`.
+        - Updates the edge index via `index_graph_edges`.

71-76: Add failure boundaries to avoid partial writes or full‑pipeline failure on indexing errors.

A failure in indexing (nodes or edges) after successful graph writes currently fails the entire call with no rollback. Log and continue, or surface a structured warning.

-    await graph_engine.add_nodes(nodes)
-    await index_data_points(nodes)
+    await graph_engine.add_nodes(nodes)
+    try:
+        await index_data_points(nodes)
+    except Exception as e:
+        # Do not fail persistence; surface degraded retrieval explicitly.
+        # Consider metrics/telemetry hook here.
+        print(f"Warning: node indexing failed: {e}")
@@
-    await graph_engine.add_edges(edges)
-    await index_graph_edges(edges)
+    await graph_engine.add_edges(edges)
+    try:
+        if edges:
+            await index_graph_edges(edges)
+    except Exception as e:
+        print(f"Warning: edge indexing failed: {e}")

If you need strict atomicity, confirm whether your graph/vector backends support transactions to implement a real rollback instead.

cognee/tests/test_concurrent_subprocess_access.py (1)

21-42: Ensure the Redis‑based lock is actually enabled for this test.

Defaults in CacheConfig set caching=False and shared_kuzu_lock=False. Explicitly enable in the test or skip if Redis isn’t available.

 async def concurrent_subprocess_access():
+    # Ensure shared Redis lock is enabled for Kùzu
+    os.environ.setdefault("CACHE_CACHING", "true")
+    os.environ.setdefault("CACHE_SHARED_KUZU_LOCK", "true")
+    os.environ.setdefault("CACHE_HOST", os.environ.get("CACHE_HOST", "127.0.0.1"))
+    os.environ.setdefault("CACHE_PORT", os.environ.get("CACHE_PORT", "6379"))

If CI doesn’t provide Redis, guard with a skip:

# at top of file
+# import socket, pytest
+# def _redis_available(host, port): 
+#     try: s=socket.create_connection((host, int(port)), timeout=1); s.close(); return True
+#     except OSError: return False
+# if not _redis_available(os.environ.get("CACHE_HOST","127.0.0.1"), os.environ.get("CACHE_PORT","6379")):
+#     pytest.skip("Redis not available; skipping lock test", allow_module_level=True)
cognee/api/v1/cognify/cognify.py (2)

54-55: Document and define semantics for data_per_batch

Add data_per_batch to the Args section and clarify precedence vs per-Task task_config batch_size (currently hardcoded to 10 below). Consider threading data_per_batch into task_config defaults to avoid divergence.

Apply:

@@
-    Args:
+    Args:
@@
         temporal_cognify: bool = False,
-        data_per_batch: int = 20,
+        data_per_batch: int = 20,
+            Number of data points processed per batch across the pipeline. If a Task
+            specifies task_config["batch_size"], that value takes precedence unless
+            overridden by this argument.

38-38: Unused lock

update_status_lock is unused; remove to avoid dead code.

- update_status_lock = asyncio.Lock()
+ # removed unused update_status_lock
cognee-mcp/entrypoint.sh (1)

50-52: DB readiness sleep may flake

Replace fixed sleep with a small wait loop against DB (or HTTP health) to reduce startup races.

-# Add startup delay to ensure DB is ready
-sleep 2
+# Wait for DB/HTTP health (example: HTTP on $HTTP_PORT if applicable)
+for i in {1..30}; do
+  curl -sf "http://127.0.0.1:${HTTP_PORT}/health" && break
+  sleep 1
+done || echo "Warning: health check did not pass; continuing..."
cognee/tasks/feedback/__init__.py (1)

1-13: Export ImprovedAnswerResponse for convenience

Expose ImprovedAnswerResponse to avoid reaching into submodule for types.

-from .generate_improved_answers import generate_improved_answers
+from .generate_improved_answers import generate_improved_answers, ImprovedAnswerResponse
@@
     "link_enrichments_to_feedback",
     "FeedbackEnrichment",
+    "ImprovedAnswerResponse",
 ]
cognee/tasks/feedback/generate_improved_answers.py (3)

6-9: Remove unused imports

LLMGateway and resolve_edges_to_text are not used here.

-from cognee.infrastructure.llm import LLMGateway
@@
-from cognee.modules.graph.utils import resolve_edges_to_text

72-81: Minor: remove else after return (pylint R1705)

Simplify control flow.

-        if completion:
-            enrichment.improved_answer = completion.answer
-            enrichment.new_context = new_context_text
-            enrichment.explanation = completion.explanation
-            return enrichment
-        else:
-            logger.warning(
-                "Failed to get structured completion from retriever", question=enrichment.question
-            )
-            return None
+        if completion:
+            enrichment.improved_answer = completion.answer
+            enrichment.new_context = new_context_text
+            enrichment.explanation = completion.explanation
+            return enrichment
+        logger.warning(
+            "Failed to get structured completion from retriever", question=enrichment.question
+        )
+        return None

92-101: Nit: top_k default is 20 but not documented elsewhere

If this should match a global setting, consider centralizing or documenting.

cognee/cli/_cognee.py (3)

184-219: Harden shutdown: handle missing Docker, add kill fallback, and dedupe safe-printing.

  • If Docker CLI is missing (FileNotFoundError), emit a clear warning instead of a silent pass.
  • After sending SIGTERM (or docker stop), consider a short wait and fallback to SIGKILL/docker rm -f if still alive.
  • Repeated try/except around fmt.echo/success/warning can be centralized via a small safe_echo wrapper.
@@
-            # First, stop Docker container if running
+            # First, stop Docker container if running
             if docker_container:
                 try:
                     result = subprocess.run(
                         ["docker", "stop", docker_container],
                         capture_output=True,
                         timeout=10,
                         check=False,
                     )
@@
                 except subprocess.TimeoutExpired:
@@
-                except Exception:
-                    pass
+                except FileNotFoundError:
+                    try:
+                        fmt.warning("Docker CLI not found; skipping container shutdown.")
+                    except (BrokenPipeError, OSError):
+                        pass
+                except Exception:
+                    pass

Optionally, after process termination below, poll briefly and escalate to SIGKILL if still alive.

Please confirm whether you want me to provide a concrete SIGKILL escalation patch and a minimal safe_echo helper.
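
A minimal sketch of such a helper, assuming fmt exposes echo/success/warning callables as used above:

def safe_echo(emit, message: str) -> None:
    """Call an fmt.* emitter, ignoring pipe errors during shutdown."""
    try:
        emit(message)
    except (BrokenPipeError, OSError):
        pass


# Example usage in the shutdown path:
# safe_echo(fmt.warning, "Docker CLI not found; skipping container shutdown.")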


220-245: Terminate robustness: add wait-and-escalate path for stubborn processes.

Currently we send SIGTERM/taskkill once without confirming exit. Add a short wait and, if needed, escalate (SIGKILL on Unix, second taskkill on Windows with error check). This prevents orphans in CI.

@@
-            for pid in spawned_pids:
+            for pid in spawned_pids:
                 try:
                     if hasattr(os, "killpg"):
@@
-                        os.killpg(pgid, signal.SIGTERM)
+                        os.killpg(pgid, signal.SIGTERM)
+                        try:
+                            os.waitpid(-pgid, os.WNOHANG)  # non-blocking check
+                        except Exception:
+                            pass
+                        # Optional: small sleep + SIGKILL if still running
@@
                     else:
                         # Windows: Use taskkill to terminate process and its children
                         subprocess.run(
                             ["taskkill", "/F", "/T", "/PID", str(pid)],
                             capture_output=True,
                             check=False,
                         )
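
One possible wait-and-escalate helper for the Unix branch; a sketch under the assumption that the process-group id is already known, not the CLI's current code:

import os
import signal
import time


def terminate_process_group(pgid: int, grace_seconds: float = 3.0) -> None:
    """Send SIGTERM to the group, then SIGKILL if it is still alive after a grace period."""
    os.killpg(pgid, signal.SIGTERM)
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        try:
            os.killpg(pgid, 0)  # signal 0 only probes for existence
        except ProcessLookupError:
            return  # group has exited
        time.sleep(0.1)
    try:
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass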

257-266: Document tuple-aware pid_callback.

Annotating the callback clarifies tuple support and prevents misuse.

-            def pid_callback(pid_or_tuple):
+            from typing import Union, Tuple
+            def pid_callback(pid_or_tuple: Union[int, Tuple[int, str]]) -> None:
                 nonlocal spawned_pids, docker_container
cognee/tasks/feedback/link_enrichments_to_feedback.py (1)

16-30: Optional: include feedback_weight and widen metadata typing.

Edges elsewhere carry feedback_weight (see GraphCompletionRetriever.save_qa). Aligning metadata helps downstream analysis.

-def _create_edge_tuple(
-    source_id: UUID, target_id: UUID, relationship_name: str
-) -> Tuple[UUID, UUID, str, dict]:
+from typing import Dict, Any
+
+def _create_edge_tuple(
+    source_id: UUID, target_id: UUID, relationship_name: str
+) -> Tuple[UUID, UUID, str, Dict[str, Any]]:
@@
         {
             "relationship_name": relationship_name,
             "source_node_id": source_id,
             "target_node_id": target_id,
             "ontology_valid": False,
+            "feedback_weight": 0,
         },
cognee/tasks/ingestion/save_data_item_to_storage.py (2)

21-26: HTMLContent validation is too naive.

Checking only “<” and “>” yields many false positives/negatives. Consider a minimal parse attempt (e.g., strip and require at least one tag-like pattern) or defer to caller.

 class HTMLContent(str):
     def __new__(cls, value: str):
-        if not ("<" in value and ">" in value):
+        import re
+        if not re.search(r"<[a-zA-Z][^>]*>", value or ""):
             raise ValueError("Not valid HTML-like content")
         return super().__new__(cls, value)

38-43: Avoid string-based type detection for Docling.

"docling" in str(type(...)) is brittle. Prefer a guarded import and isinstance check; fall back gracefully if Docling isn’t installed.

-    if "docling" in str(type(data_item)):
-        from docling_core.types import DoclingDocument
-
-        if isinstance(data_item, DoclingDocument):
-            data_item = data_item.export_to_text()
+    try:
+        from docling_core.types import DoclingDocument  # type: ignore
+        if isinstance(data_item, DoclingDocument):
+            data_item = data_item.export_to_text()
+    except ImportError:
+        pass
cognee/tests/test_neptune_analytics_vector.py (1)

55-60: Avoid IndexError if search returns no results.

Add a precondition check and a helpful failure message.

-    random_node = (await vector_engine.search("Entity_name", "Quantum computer"))[0]
+    results = await vector_engine.search("Entity_name", "Quantum computer")
+    assert results, "Vector engine returned no nodes for 'Quantum computer'"
+    random_node = results[0]
cognee/modules/retrieval/graph_completion_cot_retriever.py (1)

73-80: Too many locals in _run_cot_completion; consider extracting helpers.

Pylint flags high local var count. Extract prompt-building and validation blocks into small private helpers to reduce complexity.

Would you like me to propose a small split into _build_user_and_system_prompts(...) and _validate_and_followup(...)?
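
For illustration, the split could take roughly this shape (helper names are the ones floated above; the bodies are placeholders, not the retriever's actual logic):

class GraphCompletionCoTRetrieverSketch:
    """Illustrative skeleton of the proposed extraction, not the real retriever."""

    def _build_user_and_system_prompts(self, query: str, context: str) -> tuple:
        # keep all prompt assembly in one place so _run_cot_completion stays short
        user_prompt = f"Question: {query}\nContext: {context}"
        system_prompt = "Answer strictly from the provided context."
        return user_prompt, system_prompt

    def _validate_and_followup(self, answer: str) -> bool:
        # hold the validation / follow-up round logic here
        return bool(answer.strip())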

cognee/infrastructure/databases/cache/redis/RedisAdapter.py (1)

7-11: Constructor has too many positional args; prefer a config object and timeouts.

Pass a small config/dataclass (host, port, timeouts) and keep args keyword-only to avoid misuse. Consider adding socket_connect_timeout/socket_timeout on the client for resilience.

cognee/modules/pipelines/operations/run_tasks.py (1)

92-115: Safer concurrency: capture per-task exceptions without aborting the batch.

gather() without return_exceptions=True cancels siblings on first failure and bypasses your later “errored_results” check. Consider collecting exceptions and converting them to PipelineRunErrored entries for uniform handling.

-            results.extend(await asyncio.gather(*data_item_tasks))
+            batch_results = await asyncio.gather(*data_item_tasks, return_exceptions=True)
+            # Normalize exceptions into error entries to be handled downstream
+            for br in batch_results:
+                if isinstance(br, Exception):
+                    results.append({"run_info": PipelineRunErrored(payload=repr(br))})
+                else:
+                    results.append(br)
cognee/tasks/feedback/extract_feedback_interactions.py (4)

138-151: Be robust to non-UUID node ids.

Graph ids may not be UUIDs. Fall back to a stable uuid5 when parsing fails; reduces noisy error logging.

-        enrichment = FeedbackEnrichment(
+        # Normalize IDs to UUIDs
+        try:
+            feedback_uuid = UUID(str(feedback_node_id))
+        except ValueError:
+            feedback_uuid = uuid5(NAMESPACE_OID, str(feedback_node_id))
+        try:
+            interaction_uuid = UUID(str(interaction_node_id))
+        except ValueError:
+            interaction_uuid = uuid5(NAMESPACE_OID, str(interaction_node_id))
+
+        enrichment = FeedbackEnrichment(
             id=str(uuid5(NAMESPACE_OID, f"{question_text}_{interaction_node_id}")),
             text="",
             question=question_text,
             original_answer=original_answer_text,
             improved_answer="",
-            feedback_id=UUID(str(feedback_node_id)),
-            interaction_id=UUID(str(interaction_node_id)),
+            feedback_id=feedback_uuid,
+            interaction_id=interaction_uuid,
             belongs_to_set=None,
             context=context_summary_text,
             feedback_text=feedback_text,
             new_context="",
             explanation="",
         )

153-157: Remove unnecessary else-after-return.

-        if _has_required_feedback_fields(enrichment):
-            return enrichment
-        else:
-            logger.warning("Skipping invalid feedback item", interaction=str(interaction_node_id))
-            return None
+        if _has_required_feedback_fields(enrichment):
+            return enrichment
+        logger.warning("Skipping invalid feedback item", interaction=str(interaction_node_id))
+        return None

180-186: Parameter ‘subgraphs’ is unused.

Either consume it (prefer: allow passing pre-fetched (nodes, edges)) or drop it from the signature to avoid confusion.

Would you like me to wire subgraphs as an optional (nodes, edges) tuple and fall back to fetching when None?
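
A sketch of that wiring, with helper and field names assumed for illustration only:

from typing import List, Optional, Tuple


async def _load_graph_data(
    subgraphs: Optional[Tuple[List[dict], List[dict]]] = None,
) -> Tuple[List[dict], List[dict]]:
    """Use pre-fetched (nodes, edges) when provided, otherwise fall back to fetching."""
    if subgraphs is not None:
        nodes, edges = subgraphs
        return nodes, edges
    return await _fetch_nodes_and_edges()  # placeholder for the task's existing graph query


async def _fetch_nodes_and_edges() -> Tuple[List[dict], List[dict]]:
    return [], []  # stand-in; the real task queries the graph engine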


82-95: Date sorting should parse timestamps.

String sort is fragile unless guaranteed ISO 8601. Parse to datetime for correctness.

-    def _recency_key(pair):
+    from datetime import datetime, timezone
+
+    def _to_dt(v):
+        try:
+            parsed = datetime.fromisoformat(v.replace("Z", "+00:00"))
+        except (TypeError, ValueError):
+            return datetime.min.replace(tzinfo=timezone.utc)
+        return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)
+    def _recency_key(pair):
         _, (_, interaction_props) = pair
-        created_at = interaction_props.get("created_at") or ""
-        updated_at = interaction_props.get("updated_at") or ""
-        return (created_at, updated_at)
+        created_at = _to_dt(interaction_props.get("created_at") or "")
+        updated_at = _to_dt(interaction_props.get("updated_at") or "")
+        return (created_at, updated_at)
cognee/tests/test_neo4j.py (2)

50-56: Avoid IndexError before asserting results exist.

Assert non-emptiness before indexing the first result.

-    vector_engine = get_vector_engine()
-    random_node = (await vector_engine.search("Entity_name", "Quantum computer"))[0]
+    vector_engine = get_vector_engine()
+    search_hits = await vector_engine.search("Entity_name", "Quantum computer")
+    assert search_hits, "No vector hits for 'Quantum computer'."
+    random_node = search_hits[0]

86-90: Brittle history count.

Hard-coding len(history) == 6 will drift as flows evolve. Prefer >= expected minimum or assert specific recent entries.

If history semantics are strict, point me to the spec and I’ll align the assertion precisely.

cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py (2)

98-101: Catch Pydantic validation as well.

Structured parsing can raise pydantic.ValidationError. Consider adding:

-        except JSONSchemaValidationError as e:
+        except (JSONSchemaValidationError, ValidationError) as e:

Import pydantic's ValidationError explicitly (from pydantic import ValidationError) so the handler stays narrow while still surfacing clearer error messages.


1-6: Remove unused imports.

acompletion is unused. Drop it to avoid confusion.

cognee/tasks/web_scraper/utils.py (1)

39-41: Docstring return type mismatch.

Function returns Dict[str, str] for both paths; comment says "dict for Tavily". Clarify it’s string content for Tavily too.

cognee-mcp/src/server.py (6)

13-13: Import style nit.

Prefer from mcp import types to avoid star-import-like module path usage.


154-159: Remove unused import KnowledgeGraph.

KnowledgeGraph is imported but not used after loading a custom model. Drop it.


458-459: Typo: "succesfully".

-                logger.info("Codify process finished succesfully.")
+                logger.info("Codify process finished successfully.")

598-627: Tighten control flow to reduce nested elifs.

A few elif blocks follow a return path (Pylint R1705). Refactor to early returns for readability. No behavior change.


701-714: Early return inside tool on API mode is fine; consider mirroring in docstring.

Docstring for list_data mentions both modes; note the API-mode limitation for dataset_id in docs to avoid surprises.


1029-1093: Global init is OK; add a null-guard for defensive safety.

If tools are invoked programmatically without main(), cognee_client would be None. Consider asserting in each tool or initializing a default client when cognee_client is None.
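
A small guard along these lines would make the failure mode explicit (sketch; the global name matches the one discussed above):

cognee_client = None  # module-level global, assigned in main() in the real server


def _require_client():
    """Return the initialized client or fail with an actionable message."""
    if cognee_client is None:
        raise RuntimeError("cognee_client is not initialized; run main() or set it explicitly first.")
    return cognee_client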

cognee/api/v1/add/add.py (1)

90-101: Docs/examples: clarify URL ingestion path and prerequisites.

  • Note that Tavily requires TAVILY_API_KEY and BeautifulSoup requires beautifulsoup4 (and optionally lxml/html5lib).
  • In examples, consider showing SoupCrawlerConfig(extraction_rules=...) explicitly.

Also applies to: 169-181

cognee/tasks/web_scraper/config.py (1)

13-24: Config models LGTM; minor: consider stricter header typing.

If practical, type headers as Dict[Literal["User-Agent"], str] or a Mapping[str, str] to match httpx. Optional.

cognee/tasks/storage/index_graph_edges.py (2)

83-88: Defensive default for batch size.

Avoid AttributeError if embedding_engine/get_batch_size is absent.

-        batch_size = vector_engine.embedding_engine.get_batch_size()
+        # Be defensive: not all engines expose get_batch_size()
+        batch_size_getter = getattr(getattr(vector_engine, "embedding_engine", None), "get_batch_size", None)
+        batch_size = batch_size_getter() if callable(batch_size_getter) else 20

45-47: Clearer deprecation message.

Message is confusing. Suggest rewording.

-            logger.warning(
-                "Your graph edge embedding is deprecated, please pass edges to the index_graph_edges directly."
-            )
+            logger.warning(
+                "Implicit edge loading in index_graph_edges() is deprecated; pass edges_data explicitly."
+            )
cognee/tasks/web_scraper/bs4_crawler.py (2)

353-368: Close Playwright page/context explicitly to avoid leaks.

Ensure page and context are closed even on exceptions.

-                async with async_playwright() as p:
-                    browser = await p.chromium.launch(headless=True)
-                    try:
-                        context = await browser.new_context()
-                        page = await context.new_page()
-                        await page.goto(
-                            url,
-                            wait_until="networkidle",
-                            timeout=int((timeout or self.timeout) * 1000),
-                        )
-                        if js_wait:
-                            await asyncio.sleep(js_wait)
-                        return await page.content()
-                    finally:
-                        await browser.close()
+                async with async_playwright() as p:
+                    browser = await p.chromium.launch(headless=True)
+                    context = await browser.new_context()
+                    page = await context.new_page()
+                    try:
+                        await page.goto(
+                            url,
+                            wait_until="networkidle",
+                            timeout=int((timeout or self.timeout) * 1000),
+                        )
+                        if js_wait:
+                            await asyncio.sleep(js_wait)
+                        return await page.content()
+                    finally:
+                        try:
+                            await page.close()
+                        finally:
+                            await context.close()
+                            await browser.close()

23-25: Minor: clarify install hint.

Playwright usually needs both package install and browser install.

-        "Failed to import playwright, make sure to install using pip install playwright>=1.9.0"
+        "Playwright not installed. Run: pip install playwright && playwright install"
cognee/tests/tasks/web_scraping/web_scraping_test.py (1)

120-137: Mark cron job test async for pytest.

-async def test_cron_web_scraper():
+@pytest.mark.asyncio
+async def test_cron_web_scraper():
cognee/tasks/web_scraper/web_scraper_task.py (8)

11-11: Don’t freeze env vars at import; defer TAVILY_API_KEY to runtime.

Defaulting params to os.getenv(...) is evaluated at import time. Use Optional and resolve inside check_arguments.

@@
-from typing import Union, List
+from typing import Union, List, Optional
@@ async def cron_web_scraper_task(
-    tavily_api_key: str = os.getenv("TAVILY_API_KEY"),
+    tavily_api_key: Optional[str] = None,
@@ async def web_scraper_task(
-    tavily_api_key: str = os.getenv("TAVILY_API_KEY"),
+    tavily_api_key: Optional[str] = None,
@@ def check_arguments(tavily_api_key, extraction_rules, tavily_config, soup_crawler_config):
-    preferred_tool = "beautifulsoup"
+    preferred_tool = "beautifulsoup"
+    # fallback to env only at runtime
+    tavily_api_key = tavily_api_key or os.getenv("TAVILY_API_KEY")

Also applies to: 49-53, 123-127, 350-366


260-263: Plumb real HTTP metadata; avoid hard-coded 200/text/html.

Status code, content type, and last_modified are hard-coded. Prefer returning these from fetch_page_content (and underlying fetchers) and set WebPage fields/description accordingly.

Short-term: default when unknown, but don’t claim 200/text/html if not verified.

Also applies to: 271-274


155-157: Validate URLs before fetching to reduce SSRF and bad input.

Filter to http/https and optionally block private/loopback ranges before calling fetch_page_content.

@@ async def web_scraper_task(
-    if isinstance(url, str):
-        url = [url]
+    if isinstance(url, str):
+        url = [url]
+    # basic scheme guard
+    url = [u for u in url if urlparse(u).scheme in ("http", "https")]
+    if not url:
+        raise ValueError("No valid http(s) URLs provided")
@@
-    results = await fetch_page_content(
+    results = await fetch_page_content(
         urls=url,

For stronger defense, add CIDR checks to exclude RFC1918, loopback, link-local.
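
A sketch of such a guard using the standard library; the function name and placement are assumptions, and it also assumes from urllib.parse import urlparse, which the scheme check above needs as well:

import ipaddress
import socket
from urllib.parse import urlparse


def is_public_http_url(url: str) -> bool:
    """Reject non-http(s) URLs and hosts resolving to private, loopback, or link-local ranges."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        resolved = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in resolved:
        address = ipaddress.ip_address(info[4][0].split("%")[0])
        if address.is_private or address.is_loopback or address.is_link_local:
            return False
    return True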

Also applies to: 194-201


3-6: Generalize docstring (not Kuzu-specific).

Module works with the configured graph engine; remove Kuzu mention to avoid misleading docs.

- and ScrapingJob data points, and store them in a Kuzu graph database. It supports
+ and ScrapingJob data points, and store them in the configured graph database. It supports

393-396: Remove unnecessary else after return (R1705).

-    if full_path.startswith(base_path):
-        return full_path[len(base_path) :].lstrip("/")
-    else:
-        return full_path.lstrip("/")
+    if full_path.startswith(base_path):
+        return full_path[len(base_path) :].lstrip("/")
+    return full_path.lstrip("/")

44-53: Too many args; consider an options object.

Both functions take 7 params and web_scraper_task has many locals/statements. Consider a WebScraperOptions dataclass to group configs and reduce churn in call sites.
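
A sketch of the options object; only the parameter names visible in this PR's signatures are real, the rest are illustrative placeholders:

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WebScraperOptions:
    """Hypothetical options bag grouping the current per-call parameters."""
    tavily_api_key: Optional[str] = None
    preferred_tool: str = "beautifulsoup"
    extraction_rules: Optional[dict] = None
    tavily_config: Optional[dict] = None
    soup_crawler_config: Optional[dict] = None


async def web_scraper_task_sketch(urls: List[str], options: Optional[WebScraperOptions] = None) -> None:
    options = options or WebScraperOptions()
    # call sites then pass a single object instead of seven positional arguments
    print(f"scraping {len(urls)} urls with {options.preferred_tool}")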

Also applies to: 118-127


106-106: Minor: rely on logger timestamps.

The logger already timestamps; you can drop datetime.now() in the message for cleaner logs.


104-115: Optional: return the scheduled job id.

When scheduling, consider returning the job id for observability/control.

-        if not scheduler.running:
-            scheduler.start()
-        return
+        if not scheduler.running:
+            scheduler.start()
+        return job_name

Comment on lines 26 to 28
except ImportError:
    logger.debug(f"Unable to import {str(ImportError)}")
    pass

⚠️ Potential issue | 🟡 Minor

Import error logging uses the exception type, not the instance.

logger.debug(f"Unable to import {str(ImportError)}") prints the class. Capture the exception.

-except ImportError:
-    logger.debug(f"Unable to import {str(ImportError)}")
-    pass
+except ImportError as e:
+    logger.debug(f"Unable to import web scraping config/context: {e}")
🤖 Prompt for AI Agents
In cognee/api/v1/add/add.py around lines 26-28, the except block logs the
ImportError class instead of the caught exception instance; change the handler
to capture the exception (e.g., except ImportError as e) and log the actual
exception details (use e or repr(e), or logger.exception/traceback if stack is
needed) instead of str(ImportError); keep or remove the pass as appropriate.

Comment on lines 33 to 42
    @contextmanager
    def hold_lock(self):
        """
        Context manager for safely acquiring and releasing the lock.
        """
        self.acquire()
        try:
            yield
        finally:
            self.release()

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Fix method name mismatch in hold_lock (calls undefined acquire/release).

hold_lock calls self.acquire()/self.release(), but the abstract API exposes acquire_lock()/release_lock(). This will raise at runtime and is flagged by static analysis. Use the declared methods.

Apply this diff:

     @contextmanager
     def hold_lock(self):
         """
         Context manager for safely acquiring and releasing the lock.
         """
-        self.acquire()
+        self.acquire_lock()
         try:
             yield
         finally:
-            self.release()
+            self.release_lock()

Optionally yield the lock if callers need it: yield self.lock.

Run to find other mismatches (e.g., RedisAdapter.hold_lock uses acquire/release):


🏁 Script executed:

#!/bin/bash
rg -nC2 -e '\bhold_lock\s*\(' -e '\bacquire_lock\s*\(' -e '\brelease_lock\s*\(' -e '\bacquire\s*\(' -e '\brelease\s*\(' cognee/infrastructure/databases/cache

Length of output: 4573


Fix method name mismatch in hold_lock—calls undefined acquire/release methods.

The review comment is accurate. Both cache_db_interface.py (lines 38, 42) and RedisAdapter.py (lines 45, 49) call self.acquire() and self.release(), which don't exist. The abstract interface and all implementations define acquire_lock() and release_lock() instead.

Apply the provided diff to cache_db_interface.py. Additionally, the same issue exists in RedisAdapter.py and should be fixed identically:

  • Line 45: self.acquire() → self.acquire_lock()
  • Line 49: self.release() → self.release_lock()
🧰 Tools
🪛 Pylint (4.0.1)

[error] 38-38: Instance of 'CacheDBInterface' has no 'acquire' member

(E1101)


[error] 42-42: Instance of 'CacheDBInterface' has no 'release' member

(E1101)

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/cache/cache_db_interface.py around lines
33-42 and in RedisAdapter.py at the occurrences on lines 45 and 49, the
hold_lock context manager calls undefined methods self.acquire() and
self.release(); change those calls to the existing method names
self.acquire_lock() and self.release_lock() respectively so the interface and
implementations match (update the two calls in cache_db_interface.py and the two
calls in RedisAdapter.py).

@Vasilije1990 Vasilije1990 changed the base branch from main to dev October 21, 2025 05:04
Collaborator

@hajdul88 hajdul88 left a comment


I left some comments. In general, I think the biggest issue is that it sometimes breaks, and CoT rounds are not created when access control is ON.

I believe it would have been better to create a reasoning enrichment task/module instead of extending and reusing the CoT retriever, since this is a different problem: with the retriever, the prompt is searched against the graph in the first round, and the retriever is designed for QA-style use.

Some tests are missing (unit tests plus an end-to-end memify enrichment loop).

@Vasilije1990 Vasilije1990 merged commit d682f2e into dev Oct 24, 2025
268 of 274 checks passed
@Vasilije1990 Vasilije1990 deleted the feature/cog-3187-feedback-enrichment branch October 24, 2025 07:30
