Conversation

@borisarzentar
Member

Description

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin

@borisarzentar borisarzentar self-assigned this Mar 24, 2025
@coderabbitai
Contributor

coderabbitai bot commented Mar 24, 2025

Walkthrough

This pull request updates several modules across the codebase. The Dockerfile now adds a new “api” extra dependency for Poetry. In the API router and embedding engine, asynchronous HTTP handling has been replaced with synchronous requests and adjusted error handling. Multiple files have been refactored to clean up imports and improve formatting, including the relocation of some imports into local scopes. Additionally, exception handling in the extract summary has been tightened, and an entire module for topic extraction has been removed.

Changes

  • Dockerfile: Updated ARG POETRY_EXTRAS to include the api extra dependency, with an accompanying comment.
  • cognee/api/v1/add/routers/get_add_router.py, cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py: Changed HTTP request handling from asynchronous (aiohttp/httpx) to synchronous (requests), updated error handling, and adjusted a method signature in the embedding engine.
  • cognee/api/v1/cognify/code_graph_pipeline.py, cognee/api/v1/users/routers/get_visualize_router.py, cognee/eval_framework/benchmark_adapters/twowikimultihop_adapter.py, cognee/infrastructure/llm/anthropic/adapter.py, cognee/infrastructure/llm/gemini/adapter.py, cognee/infrastructure/llm/openai/adapter.py, cognee/shared/utils.py: Refactored imports and formatting (moved some imports into local scopes, removed unused dependencies, added blank lines for readability, and removed the graph rendering function).
  • cognee/modules/data/extraction/extract_summary.py: Modified exception handling by removing the RetryError catch and its import; only InstructorRetryException is now caught.
  • cognee/modules/data/extraction/extract_topics_naive.py: Removed the entire module, which contained functions for topic extraction, text cleaning, and demonstration code.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant APIRouter
    participant ExternalAPI
    participant CogneeAdd
    Client->>APIRouter: Send add request with data URL
    APIRouter->>ExternalAPI: Synchronous requests.get(data)
    ExternalAPI-->>APIRouter: HTTP Response
    alt HTTP 200 OK
        APIRouter->>CogneeAdd: Call cognee_add(file_data)
        CogneeAdd-->>APIRouter: Processed result
        APIRouter-->>Client: Return processed result
    else HTTP Error
        APIRouter-->>Client: Return 400 error JSON
    end
sequenceDiagram
    participant Caller
    participant Engine as OllamaEmbeddingEngine
    participant EmbeddingService
    Caller->>Engine: Request embedding for prompt
    Engine->>EmbeddingService: Synchronous requests.post(prompt)
    EmbeddingService-->>Engine: HTTP Response (embedding data)
    Engine-->>Caller: Return embedding list

Possibly related PRs

  • chore: Add ollama optional depdendency #584: Related because both PRs manage optional dependencies; that PR adds an optional ollama entry to pyproject.toml, while this PR adds an entry to POETRY_EXTRAS in the Dockerfile.
  • chore: Be explicit on extras to install in Docker #598: Related because both PRs modify the POETRY_EXTRAS argument in the Dockerfile; that PR makes the installed extras explicitly configurable, while this PR adds a new entry to the list.

Suggested reviewers

  • lxobr
  • hajdul88

Poem

I'm a rabbit hopping in the code,
Carrots and changes lighten my load.
Extra dependencies now make me grin,
With cleaner imports, I hop right in.
Synchronous calls keep the bugs at bay,
In this garden of code, I joyfully play.
Happy hops and tidy stops every day!

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🔭 Outside diff range comments (2)
cognee/shared/utils.py (1)

45-47: 🛠️ Refactor suggestion

Repeated NLTK downloads can hurt performance.
Downloading corpus data in every call may slow execution if invoked multiple times. Consider caching or verifying availability before each download.
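
For illustration, a minimal sketch of the check-before-download pattern this suggestion describes; the helper name and the corpus name "punkt" are placeholders, not necessarily what cognee/shared/utils.py actually uses:

import nltk


def ensure_nltk_corpus(corpus: str = "punkt") -> None:
    # nltk.data.find raises LookupError when the resource is not on disk yet,
    # so the download only happens the first time the corpus is missing.
    try:
        nltk.data.find(f"tokenizers/{corpus}")
    except LookupError:
        nltk.download(corpus, quiet=True)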

cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)

54-70: 🛠️ Refactor suggestion

Avoid blocking the event loop by using requests in an async method.
Since requests is synchronous, the event loop will be blocked during network calls. Consider removing the async keyword or switching to an asynchronous HTTP library.

Example refactor using httpx:

-async def _get_embedding(self, prompt: str) -> List[float]:
-    ...
-    response = requests.post(self.endpoint, json=payload, headers=headers, timeout=60.0)
+import httpx
+
+async def _get_embedding(self, prompt: str) -> List[float]:
+    async with httpx.AsyncClient(timeout=60.0) as client:
+        response = await client.post(self.endpoint, json=payload, headers=headers)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7bf30f7 and 3a9240c.

⛔ Files ignored due to path filters (2)
  • poetry.lock is excluded by !**/*.lock, !**/*.lock
  • pyproject.toml is excluded by !**/*.toml
📒 Files selected for processing (12)
  • Dockerfile (1 hunks)
  • cognee/api/v1/add/routers/get_add_router.py (2 hunks)
  • cognee/api/v1/cognify/code_graph_pipeline.py (1 hunks)
  • cognee/api/v1/users/routers/get_visualize_router.py (1 hunks)
  • cognee/eval_framework/benchmark_adapters/twowikimultihop_adapter.py (1 hunks)
  • cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (3 hunks)
  • cognee/infrastructure/llm/anthropic/adapter.py (1 hunks)
  • cognee/infrastructure/llm/gemini/adapter.py (1 hunks)
  • cognee/infrastructure/llm/openai/adapter.py (1 hunks)
  • cognee/modules/data/extraction/extract_summary.py (1 hunks)
  • cognee/modules/data/extraction/extract_topics_naive.py (0 hunks)
  • cognee/shared/utils.py (1 hunks)
💤 Files with no reviewable changes (1)
  • cognee/modules/data/extraction/extract_topics_naive.py
🧰 Additional context used
🪛 Ruff (0.8.2)
cognee/api/v1/add/routers/get_add_router.py

47-47: Undefined name 'error'

(F821)

🪛 GitHub Actions: lint | ruff lint
cognee/api/v1/add/routers/get_add_router.py

[error] 47-47: Ruff: Undefined name 'error'.

⏰ Context from checks skipped due to timeout of 90000ms (25)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_eval_framework_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on macos-13
  • GitHub Check: test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: Test on macos-13
  • GitHub Check: test
  • GitHub Check: windows-latest
  • GitHub Check: test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: run_simple_example_test
  • GitHub Check: docker-compose-test
🔇 Additional comments (13)
cognee/infrastructure/llm/gemini/adapter.py (1)

15-15: No issues with the monitoring assignment.
The new line monitoring = get_base_config().monitoring_tool looks good and aligns with removing unneeded asynchronous references.

cognee/api/v1/users/routers/get_visualize_router.py (1)

1-2: Reduced imports improve cleanliness.
Switching to just import logging and Depends is consistent with the PR goal of dropping unused dependencies.

cognee/infrastructure/llm/anthropic/adapter.py (2)

17-18: Deferred import is a good optimization.
Moving import anthropic inside __init__ helps minimize overhead if the module isn't always needed.
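
As a rough sketch of the deferred-import pattern being praised here (the class name, constructor shape, and client class are illustrative, not the adapter's actual code):

class AnthropicAdapterSketch:
    def __init__(self, api_key: str):
        # Imported only when an adapter instance is created, so the package is
        # not required just to import the module that defines this class.
        import anthropic

        self.client = anthropic.Anthropic(api_key=api_key)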


22-22: Minor formatting tweak.
This additional blank line likely has no functional impact.

cognee/shared/utils.py (1)

42-42: Local import aligns with on-demand usage.
Deferring the NLTK import inside the function supports the goal of reducing unneeded global dependencies.

cognee/infrastructure/llm/openai/adapter.py (1)

15-16: LGTM: Minor whitespace addition.

This simple whitespace change improves code readability by adding consistent spacing around variable assignments.

Dockerfile (1)

5-6: LGTM: Properly added API as an optional dependency.

This addition aligns with the PR objective of making some dependencies optional. The API dependency is now explicitly listed as a Poetry extra, which allows users to install it only when needed.
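
For reference, a Poetry extra declared this way is typically selected at install time with something like poetry install --extras "api" (or -E api); the exact invocation baked into the Dockerfile may differ.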

cognee/api/v1/cognify/code_graph_pipeline.py (1)

27-28: LGTM: Minor whitespace addition.

This simple whitespace change improves code readability by adding consistent spacing around variable assignments, matching the same pattern used in the OpenAI adapter file.

cognee/eval_framework/benchmark_adapters/twowikimultihop_adapter.py (1)

1-1: LGTM: Removed unused imports.

Good job cleaning up the imports by removing unused ones (requests, os, json, random) and simplifying the typing imports to only include what's needed (Any). This aligns with the PR objective of removing unused dependencies.

cognee/modules/data/extraction/extract_summary.py (1)

38-38:


✅ Verification successful

Verify that RetryError handling is no longer needed.

You've removed the RetryError from both imports and exception handling. Please verify that the code no longer needs to handle RetryError exceptions, which might occur from the underlying LLM client operations.


🏁 Script executed:

#!/bin/bash
# Check if RetryError is still used elsewhere in the codebase
echo "Checking for RetryError usage in the codebase..."
rg "RetryError" --type py

# Check if tenacity is still a dependency
echo -e "\nChecking if tenacity is still a dependency..."
grep -r "tenacity" pyproject.toml

# Check for other places where extract_code_summary is called
echo -e "\nChecking for places where extract_code_summary is called..."
rg "extract_code_summary" --type py

Length of output: 837


Removal of RetryError Handling Verified

The verification shows that no occurrences of RetryError remain in the codebase—the grep search returned no results. The exception handling now solely relies on InstructorRetryException, and given that calls to extract_code_summary are unaffected, the change appears safe. Although the dependency check for tenacity didn’t yield results due to missing pyproject.toml, there’s no indication that RetryError is still needed.
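
A hypothetical sketch of the narrowed handler described above; the summarize helper is a stand-in for the real structured-output call in extract_summary.py, not the actual implementation:

from instructor.exceptions import InstructorRetryException


async def summarize(content: str) -> str:
    # Placeholder for the real LLM structured-output call.
    return f"summary of {len(content)} characters"


async def extract_code_summary_sketch(content: str):
    try:
        return await summarize(content)
    except InstructorRetryException:
        # tenacity.RetryError is no longer caught here; only instructor's
        # retry failure is handled, matching the verified change.
        return None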

cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (2)

2-2: No concerns with the new requests import.


74-74: Switched exception handling to match the requests library.
Catching requests.exceptions.HTTPError is correct now that the code uses requests.

cognee/api/v1/add/routers/get_add_router.py (1)

8-8: requests import is fine, but remember it is synchronous.
Using it in an async context may block other tasks unless carefully managed.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🔭 Outside diff range comments (1)
cognee/api/v1/add/routers/get_add_router.py (1)

1-58: 💡 Verification agent

🧩 Analysis chain

Fix formatting issues flagged by the linter.

The pipeline failure indicates that Ruff formatting check failed. Please run 'ruff format' to fix code style issues in this file.


🏁 Script executed:

#!/bin/bash
# Check if ruff is installed and format the file
which ruff && echo "Ruff is installed, you can run: ruff format cognee/api/v1/add/routers/get_add_router.py" || echo "Ruff is not installed"

Length of output: 161


Action Required: Install Ruff and Format the File
Our verification shows that Ruff is not installed on the CI environment, which is why the formatting check failed. Please ensure that Ruff is added to your project’s dependencies or installed in your CI environment, then run the following command to fix the styles:

  • Install Ruff (if not already available)
  • Execute:
    ruff format cognee/api/v1/add/routers/get_add_router.py
🧰 Tools
🪛 Ruff (0.8.2)

22-22: Do not perform function call 'Depends' in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

🪛 GitHub Actions: lint | ruff format

[error] 1-1: Ruff formatting check failed. 1 file would be reformatted. Please run 'ruff format' to fix code style issues in this file.

♻️ Duplicate comments (1)
cognee/api/v1/add/routers/get_add_router.py (1)

42-45: ⚠️ Potential issue

Remove await from response.content() - requests is synchronous.

The requests library is synchronous and doesn't support awaiting. Replace await response.content() with response.content.

if response.status_code == 200:
-    file_data = await response.content()
+    file_data = response.content

    return await cognee_add(file_data)
🧹 Nitpick comments (1)
cognee/api/v1/add/routers/get_add_router.py (1)

39-41: Consider using an async HTTP client in an async function.

Using the synchronous requests library inside an async function can block the event loop. Consider using an asynchronous HTTP client like aiohttp or httpx to maintain non-blocking behavior.

-import requests
+import httpx

# And then in the function:
-response = requests.get(data)
-response.raise_for_status()
+async with httpx.AsyncClient() as client:
+    response = await client.get(data)
+    response.raise_for_status()
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a9240c and 06c1e96.

📒 Files selected for processing (1)
  • cognee/api/v1/add/routers/get_add_router.py (2 hunks)
🧰 Additional context used
🪛 GitHub Actions: lint | ruff format
cognee/api/v1/add/routers/get_add_router.py

[error] 1-1: Ruff formatting check failed. 1 file would be reformatted. Please run 'ruff format' to fix code style issues in this file.

⏰ Context from checks skipped due to timeout of 90000ms (32)
  • GitHub Check: run_networkx_metrics_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_eval_framework_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on macos-15
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: Test on macos-13
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: windows-latest
  • GitHub Check: test
  • GitHub Check: docker-compose-test
  • GitHub Check: Build Cognee Backend Docker App Image
  • GitHub Check: run_simple_example_test
🔇 Additional comments (1)
cognee/api/v1/add/routers/get_add_router.py (1)

7-8: Import change aligns with switching from async to sync HTTP client.

The addition of the requests library import indicates a deliberate change from an asynchronous HTTP client to a synchronous one. This change should be consistent with the overall architecture decision.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
cognee/api/v1/add/routers/get_add_router.py (2)

39-45: ⚠️ Potential issue

Fix synchronous/asynchronous code mismatch.

The requests library is synchronous and doesn't support await. The await response.content() call will cause a runtime error.

-response = requests.get(data)
-response.raise_for_status()
-if response.status_code == 200:
-    file_data = await response.content()
-    return await cognee_add(file_data)
+response = requests.get(data)
+response.raise_for_status()
+file_data = response.content
+return await cognee_add(file_data)

42-50: 🛠️ Refactor suggestion

Remove redundant status code check.

The conditional check for status code 200 is redundant because response.raise_for_status() on line 40 already raises an exception for non-successful status codes, so the fallback JSONResponse return after the if block can never be reached.

response = requests.get(data)
response.raise_for_status()
-if response.status_code == 200:
-    file_data = await response.content()
-    return await cognee_add(file_data)
-
-return JSONResponse(
-    status_code=400,
-    content={"error": f"Invalid content sent to cognee: {data}"},
-)
+file_data = response.content
+return await cognee_add(file_data)
🧹 Nitpick comments (1)
cognee/api/v1/add/routers/get_add_router.py (1)

8-8: Confirm dependency removal is intentional.

The import of requests (synchronous HTTP library) appears to be replacing an asynchronous HTTP client, which aligns with the PR objective of dependency cleanup. However, using synchronous HTTP calls within an async function can block the event loop and reduce performance.

Consider using an asynchronous HTTP client like httpx or properly isolating synchronous calls with asyncio.to_thread().
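
A minimal sketch of the asyncio.to_thread option mentioned above, keeping requests as the HTTP client while moving the blocking call off the event loop (the function name and timeout value are assumptions):

import asyncio

import requests


async def fetch_file_data(url: str) -> bytes:
    def _get() -> bytes:
        response = requests.get(url, timeout=60)
        response.raise_for_status()
        return response.content

    # The blocking call runs in a worker thread, so other requests keep being served.
    return await asyncio.to_thread(_get)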

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 06c1e96 and f5d7120.

⛔ Files ignored due to path filters (2)
  • .github/workflows/test_chromadb.yml is excluded by !**/*.yml
  • .github/workflows/test_cognee_server_start.yml is excluded by !**/*.yml
📒 Files selected for processing (1)
  • cognee/api/v1/add/routers/get_add_router.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (32)
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: windows-latest
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_networkx_metrics_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: run_eval_framework_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: test
  • GitHub Check: Test on macos-13
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: Build Cognee Backend Docker App Image
  • GitHub Check: docker-compose-test
  • GitHub Check: run_simple_example_test

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (2)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)

51-51: ⚠️ Potential issue

Missing await for asyncio.gather.

The result of asyncio.gather must be awaited so that all tasks complete before returning; otherwise the method returns an un-awaited future instead of the actual embeddings.

-embeddings = asyncio.gather(*[self._get_embedding(prompt) for prompt in text])
+embeddings = await asyncio.gather(*[self._get_embedding(prompt) for prompt in text])
cognee/api/v1/add/routers/get_add_router.py (1)

39-44: ⚠️ Potential issue

Invalid usage of await with synchronous requests library.

The code attempts to await response.content() which is invalid since requests is a synchronous library and its content is a property, not an awaitable coroutine.

response = requests.get(data)
response.raise_for_status()

-file_data = await response.content()
+file_data = response.content

return await cognee_add(file_data)
🧹 Nitpick comments (1)
cognee/api/v1/add/routers/get_add_router.py (1)

39-44: Consider using a fully synchronous or fully asynchronous approach.

The code mixes synchronous HTTP requests with asynchronous processing, which can lead to inefficient resource usage. Consider either:

  1. Using an async HTTP client library throughout, or
  2. Making the entire request handling synchronous.

Option 1 (fully async):

-import requests
+import httpx

-response = requests.get(data)
-response.raise_for_status()
-file_data = await response.content()
+async with httpx.AsyncClient() as client:
+    response = await client.get(data)
+    response.raise_for_status()
+    file_data = response.content

return await cognee_add(file_data)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f5d7120 and cbbf1db.

⛔ Files ignored due to path filters (2)
  • .github/workflows/test_chromadb.yml is excluded by !**/*.yml
  • .github/workflows/test_cognee_server_start.yml is excluded by !**/*.yml
📒 Files selected for processing (2)
  • cognee/api/v1/add/routers/get_add_router.py (2 hunks)
  • cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (32)
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on macos-13
  • GitHub Check: run_notebook_test / test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: run_notebook_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: run_notebook_test / test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: run_eval_framework_test / test
  • GitHub Check: test
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: windows-latest
  • GitHub Check: test
  • GitHub Check: run_networkx_metrics_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: chromadb test
  • GitHub Check: docker-compose-test
  • GitHub Check: run_simple_example_test
🔇 Additional comments (2)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)

74-74: Exception type updated to match requests library.

The exception handling has been appropriately updated to catch requests.exceptions.HTTPError.

cognee/api/v1/add/routers/get_add_router.py (1)

8-8: Added synchronous requests library.

The synchronous requests library has been added, replacing the previously used asynchronous HTTP client.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
cognee/tasks/repo_processor/get_local_dependencies.py (1)

26-28: Consider performance implications of per-call initialization.

Moving PY_LANGUAGE and source_code_parser from module level to method level means these objects are recreated on every call to parse_file(). While this supports the goal of on-demand dependency initialization, it could impact performance if this method is called frequently.

Consider using lazy initialization at the class level instead:

class FileParser:
    def __init__(self):
        self.parsed_files = {}
+        self._language = None
+        self._parser = None

+    def _ensure_parser_initialized(self):
+        if self._parser is None:
+            self._language = Language(tspython.language())
+            self._parser = Parser(self._language)

    async def parse_file(self, file_path: str) -> tuple[str, Tree]:
-        PY_LANGUAGE = Language(tspython.language())
-        source_code_parser = Parser(PY_LANGUAGE)
+        self._ensure_parser_initialized()

        if file_path not in self.parsed_files:
            source_code = await get_source_code(file_path)
-            source_code_tree = source_code_parser.parse(bytes(source_code, "utf-8"))
+            source_code_tree = self._parser.parse(bytes(source_code, "utf-8"))
            self.parsed_files[file_path] = (source_code, source_code_tree)

        return self.parsed_files[file_path]
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cbbf1db and 1dc7091.

📒 Files selected for processing (4)
  • cognee/api/client.py (1 hunks)
  • cognee/api/v1/cognify/routers/get_code_pipeline_router.py (1 hunks)
  • cognee/tasks/repo_processor/get_local_dependencies.py (2 hunks)
  • cognee/tasks/repo_processor/get_repo_file_dependencies.py (1 hunks)
🧰 Additional context used
🧬 Code Definitions (3)
cognee/api/v1/cognify/routers/get_code_pipeline_router.py (1)
cognee/api/v1/cognify/code_graph_pipeline.py (1)
  • run_code_graph_pipeline (38-84)
cognee/tasks/repo_processor/get_repo_file_dependencies.py (1)
cognee/tasks/repo_processor/get_local_dependencies.py (1)
  • get_local_script_dependencies (78-112)
cognee/api/client.py (1)
cognee/api/v1/cognify/routers/get_code_pipeline_router.py (1)
  • get_code_pipeline_router (24-59)
⏰ Context from checks skipped due to timeout of 90000ms (33)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: run_networkx_metrics_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on macos-15
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: run_eval_framework_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on macos-13
  • GitHub Check: test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: windows-latest
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: chromadb test
  • GitHub Check: run_simple_example_test
  • GitHub Check: docker-compose-test
  • GitHub Check: Build Cognee Backend Docker App Image
🔇 Additional comments (3)
cognee/tasks/repo_processor/get_repo_file_dependencies.py (1)

77-78: LGTM! Good practice for optional dependencies.

Moving the import statement from the module level to function level is a good practice for making dependencies optional. This ensures the dependency is only required when the function is actually used.

cognee/api/client.py (1)

172-174: LGTM! Good defensive programming.

This change properly handles the case where the code pipeline router is not available (when tree-sitter is not installed), preventing potential errors when including routers in the FastAPI application.
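
A self-contained sketch of the defensive pattern this comment refers to; the stand-in helper below mimics get_code_pipeline_router() returning None when optional dependencies are missing, and is not cognee's actual client.py code:

from typing import Optional

from fastapi import APIRouter, FastAPI


def get_code_pipeline_router_or_none() -> Optional[APIRouter]:
    # Stand-in: the real get_code_pipeline_router() returns None when the
    # optional code-graph dependencies (e.g. tree-sitter) are not installed.
    return None


app = FastAPI()

code_pipeline_router = get_code_pipeline_router_or_none()
if code_pipeline_router is not None:
    # Only registered when the optional dependencies are available.
    app.include_router(code_pipeline_router)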

cognee/tasks/repo_processor/get_local_dependencies.py (1)

2-3: LGTM! Proper organization of imports.

Moving these imports to the top of the file follows standard Python style guidelines.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
cognee/api/v1/cognify/routers/get_code_pipeline_router.py (1)

24-29: ⚠️ Potential issue

Incorrect module import for dependency check.

The current implementation attempts to import run_code_graph_pipeline as a module, but this is actually a function that's imported from cognee.api.v1.cognify.code_graph_pipeline at line 35. This approach doesn't correctly verify if the tree-sitter dependency is available.

Apply this diff to correctly check for the tree-sitter dependency:

def get_code_pipeline_router() -> APIRouter:
    try:
-        import run_code_graph_pipeline
+        from tree_sitter import Language
+        # Verify tree-sitter is available
    except ModuleNotFoundError:
        logger.error("codegraph dependencies not found. Skipping codegraph API routes.")
        return None
🧰 Tools
🪛 Ruff (0.8.2)

25-25: run_code_graph_pipeline imported but unused; consider using importlib.util.find_spec to test for availability

(F401)
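
A sketch of the importlib.util.find_spec alternative that Ruff points at; checking for tree_sitter specifically is an assumption about which optional dependency gates these routes:

import importlib.util
import logging

from fastapi import APIRouter

logger = logging.getLogger(__name__)


def get_code_pipeline_router_sketch():
    # Test availability without importing a name purely for its side effects.
    if importlib.util.find_spec("tree_sitter") is None:
        logger.error("codegraph dependencies not found. Skipping codegraph API routes.")
        return None

    router = APIRouter()
    # ... the code-pipeline endpoints would be registered on router here ...
    return router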

cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)

51-52: ⚠️ Potential issue

Missing await for asyncio.gather.

The result of asyncio.gather must be awaited to ensure all tasks complete before returning. Without it, the function returns an un-awaited future instead of the actual embedding results.

-embeddings = asyncio.gather(*[self._get_embedding(prompt) for prompt in text])
+embeddings = await asyncio.gather(*[self._get_embedding(prompt) for prompt in text])
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 45fbd1f and a271fb3.

📒 Files selected for processing (2)
  • cognee/api/v1/cognify/routers/get_code_pipeline_router.py (1 hunks)
  • cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (3 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
cognee/api/v1/cognify/routers/get_code_pipeline_router.py

25-25: run_code_graph_pipeline imported but unused; consider using importlib.util.find_spec to test for availability

(F401)

⏰ Context from checks skipped due to timeout of 90000ms (33)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_eval_framework_test / test
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: run_networkx_metrics_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on macos-15
  • GitHub Check: run_notebook_test / test
  • GitHub Check: Test on macos-13
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: Test on macos-13
  • GitHub Check: chromadb test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: windows-latest
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: run_simple_example_test
  • GitHub Check: Build Cognee Backend Docker App Image
  • GitHub Check: docker-compose-test
🔇 Additional comments (4)
cognee/api/v1/cognify/routers/get_code_pipeline_router.py (1)

35-36: LGTM! Moving the import to function scope is a good practice.

Moving the import statement inside the function is appropriate for this use case, as it ensures the dependency is only loaded when the endpoint is actually called.

cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (3)

2-2: Use consistent HTTP client library.

The code imports the synchronous requests library while the class maintains an async structure with async methods, creating an architectural inconsistency.


54-54: Method remains async but uses synchronous code.

The method signature is still async, but the body now uses synchronous requests.post(), which does not take advantage of the async/await paradigm.


70-70: Blocking synchronous HTTP call in async method.

Using requests.post() in an async method blocks the event loop, defeating the purpose of asynchronous programming and potentially causing performance issues.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)

53-53: ⚠️ Potential issue

Missing await keyword for asyncio.gather

The asyncio.gather call is missing the await keyword, which will return a coroutine object instead of the actual embedding results.

-        embeddings = asyncio.gather(*[self._get_embedding(prompt) for prompt in text])
+        embeddings = await asyncio.gather(*[self._get_embedding(prompt) for prompt in text])
🧹 Nitpick comments (1)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)

72-75: Simplify nested context managers

You can simplify the nested with statements for better readability.

-                async with aiohttp.ClientSession() as session:
-                    async with session.post(
-                        self.endpoint, json=payload, headers=headers, timeout=60.0
-                    ) as response:
+                async with aiohttp.ClientSession() as session, session.post(
+                    self.endpoint, json=payload, headers=headers, timeout=60.0
+                ) as response:
🧰 Tools
🪛 Ruff (0.8.2)

72-75: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a271fb3 and 07272a3.

⛔ Files ignored due to path filters (2)
  • poetry.lock is excluded by !**/*.lock, !**/*.lock
  • pyproject.toml is excluded by !**/*.toml
📒 Files selected for processing (1)
  • cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (3 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py

72-75: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

⏰ Context from checks skipped due to timeout of 90000ms (34)
  • GitHub Check: run_eval_framework_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_networkx_metrics_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on macos-15
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on macos-13
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: windows-latest
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: chromadb test
  • GitHub Check: test
  • GitHub Check: lint (ubuntu-latest, 3.10.x)
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: run_simple_example_test
  • GitHub Check: docker-compose-test
  • GitHub Check: Build Cognee Backend Docker App Image

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)

53-54: ⚠️ Potential issue

Missing await for asyncio.gather.

The current implementation doesn't await the result of asyncio.gather, so it returns an un-awaited future rather than the actual embeddings. This will cause unexpected behavior when callers try to use the returned value.

-embeddings = asyncio.gather(*[self._get_embedding(prompt) for prompt in text])
+embeddings = await asyncio.gather(*[self._get_embedding(prompt) for prompt in text])
🧹 Nitpick comments (2)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (2)

72-75: Simplify nested with statements.

The nested with statements can be combined for better readability as suggested by the static analysis tool.

-async with aiohttp.ClientSession() as session:
-    async with session.post(
-        self.endpoint, json=payload, headers=headers, timeout=60.0
-    ) as response:
+async with aiohttp.ClientSession() as session, session.post(
+    self.endpoint, json=payload, headers=headers, timeout=60.0
+) as response:
🧰 Tools
🪛 Ruff (0.8.2)

72-75: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


56-77: Consider reusing the client session for better performance.

Creating a new aiohttp.ClientSession for each embedding request is inefficient, especially when embedding multiple texts. Consider creating a single session for all requests in the embed_text method and passing it to _get_embedding.

async def embed_text(self, text: List[str]) -> List[List[float]]:
    """
    Given a list of text prompts, returns a list of embedding vectors.
    """
    if self.mock:
        return [[0.0] * self.dimensions for _ in text]

+   async with aiohttp.ClientSession() as session:
-       embeddings = asyncio.gather(*[self._get_embedding(prompt) for prompt in text])
+       embeddings = await asyncio.gather(*[self._get_embedding(prompt, session) for prompt in text])
    return embeddings

-async def _get_embedding(self, prompt: str) -> List[float]:
+async def _get_embedding(self, prompt: str, session: aiohttp.ClientSession) -> List[float]:
    """
    Internal method to call the Ollama embeddings endpoint for a single prompt.
    """
    # ... existing code ...
    
    retries = 0
    while retries < self.MAX_RETRIES:
        try:
-           async with aiohttp.ClientSession() as session:
            async with session.post(
                self.endpoint, json=payload, headers=headers, timeout=60.0
            ) as response:
                data = await response.json()
                return data["embedding"]
🧰 Tools
🪛 Ruff (0.8.2)

72-75: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 07272a3 and f65ef25.

📒 Files selected for processing (1)
  • cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (3 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py

72-75: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

⏰ Context from checks skipped due to timeout of 90000ms (33)
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_eval_framework_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: run_networkx_metrics_test / test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: Test on macos-13
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: chromadb test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: windows-latest
  • GitHub Check: Test on macos-13
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: Build Cognee Backend Docker App Image
  • GitHub Check: run_simple_example_test
  • GitHub Check: docker-compose-test

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)

72-77: Simplify nested context managers

The nested with statements can be combined for better readability.

-                async with aiohttp.ClientSession() as session:
-                    async with session.post(
-                        self.endpoint, json=payload, headers=headers, timeout=60.0
-                    ) as response:
-                        data = await response.json()
-                        return data["embedding"]
+                async with aiohttp.ClientSession() as session, session.post(
+                    self.endpoint, json=payload, headers=headers, timeout=60.0
+                ) as response:
+                    data = await response.json()
+                    return data["embedding"]
🧰 Tools
🪛 Ruff (0.8.2)

72-75: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f65ef25 and ad8e75f.

📒 Files selected for processing (1)
  • cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (3 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py

72-75: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

⏰ Context from checks skipped due to timeout of 90000ms (33)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_networkx_metrics_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on macos-15
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: run_eval_framework_test / test
  • GitHub Check: test
  • GitHub Check: Test on macos-13
  • GitHub Check: test
  • GitHub Check: windows-latest
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: Test on macos-13
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: test
  • GitHub Check: chromadb test
  • GitHub Check: test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: Build Cognee Backend Docker App Image
  • GitHub Check: run_simple_example_test
  • GitHub Check: docker-compose-test
🔇 Additional comments (3)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (3)

53-53: Improved implementation of asyncio.gather

The updated implementation properly uses * instead of ** with the list comprehension and correctly awaits the result of asyncio.gather. This is now handling concurrent embedding requests properly.


56-56: Appropriate method signature update

The method signature has been simplified by removing the client parameter, which is now created internally. This makes the method more self-contained while maintaining its asynchronous nature.


78-78: Appropriate exception handling for aiohttp

The exception type has been correctly updated to match the aiohttp library (aiohttp.http_exceptions.HttpBadRequest), which is consistent with the library transition.

@borisarzentar borisarzentar merged commit d192d1f into dev Mar 25, 2025
41 checks passed
@borisarzentar borisarzentar deleted the feature/cog-1621-make-api-dependencies-optional branch March 25, 2025 09:19
@coderabbitai coderabbitai bot mentioned this pull request Apr 8, 2025
@coderabbitai coderabbitai bot mentioned this pull request Sep 11, 2025
