-
Notifications
You must be signed in to change notification settings - Fork 958
chore: remove unused dependencies and make some optional #661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: remove unused dependencies and make some optional #661
Conversation
WalkthroughThis pull request updates several modules across the codebase. The Dockerfile now adds a new “api” extra dependency for Poetry. In the API router and embedding engine, asynchronous HTTP handling has been replaced with synchronous requests and adjusted error handling. Multiple files have been refactored to clean up imports and improve formatting, including the relocation of some imports into local scopes. Additionally, exception handling in the extract summary has been tightened, and an entire module for topic extraction has been removed. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant APIRouter
participant ExternalAPI
participant CogneeAdd
Client->>APIRouter: Send add request with data URL
APIRouter->>ExternalAPI: Synchronous requests.get(data)
ExternalAPI-->>APIRouter: HTTP Response
alt HTTP 200 OK
APIRouter->>CogneeAdd: Call cognee_add(file_data)
CogneeAdd-->>APIRouter: Processed result
APIRouter-->>Client: Return processed result
else HTTP Error
APIRouter-->>Client: Return 400 error JSON
end
sequenceDiagram
participant Caller
participant Engine as OllamaEmbeddingEngine
participant EmbeddingService
Caller->>Engine: Request embedding for prompt
Engine->>EmbeddingService: Synchronous requests.post(prompt)
EmbeddingService-->>Engine: HTTP Response (embedding data)
Engine-->>Caller: Return embedding list
Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🔭 Outside diff range comments (2)
cognee/shared/utils.py (1)
45-47: 🛠️ Refactor suggestionRepeated NLTK downloads can hurt performance.
Downloading corpus data in every call may slow execution if invoked multiple times. Consider caching or verifying availability before each download.cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)
54-70: 🛠️ Refactor suggestionAvoid blocking the event loop by using
requestsin an async method.
Sincerequestsis synchronous, the event loop will be blocked during network calls. Consider removing theasynckeyword or switching to an asynchronous HTTP library.Example refactor using
httpx:-async def _get_embedding(self, prompt: str) -> List[float]: - ... - response = requests.post(self.endpoint, json=payload, headers=headers, timeout=60.0) +import httpx + +async def _get_embedding(self, prompt: str) -> List[float]: + async with httpx.AsyncClient(timeout=60.0) as client: + response = await client.post(self.endpoint, json=payload, headers=headers)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
poetry.lockis excluded by!**/*.lock,!**/*.lockpyproject.tomlis excluded by!**/*.toml
📒 Files selected for processing (12)
Dockerfile(1 hunks)cognee/api/v1/add/routers/get_add_router.py(2 hunks)cognee/api/v1/cognify/code_graph_pipeline.py(1 hunks)cognee/api/v1/users/routers/get_visualize_router.py(1 hunks)cognee/eval_framework/benchmark_adapters/twowikimultihop_adapter.py(1 hunks)cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py(3 hunks)cognee/infrastructure/llm/anthropic/adapter.py(1 hunks)cognee/infrastructure/llm/gemini/adapter.py(1 hunks)cognee/infrastructure/llm/openai/adapter.py(1 hunks)cognee/modules/data/extraction/extract_summary.py(1 hunks)cognee/modules/data/extraction/extract_topics_naive.py(0 hunks)cognee/shared/utils.py(1 hunks)
💤 Files with no reviewable changes (1)
- cognee/modules/data/extraction/extract_topics_naive.py
🧰 Additional context used
🪛 Ruff (0.8.2)
cognee/api/v1/add/routers/get_add_router.py
47-47: Undefined name error
(F821)
🪛 GitHub Actions: lint | ruff lint
cognee/api/v1/add/routers/get_add_router.py
[error] 47-47: Ruff: Undefined name error.
⏰ Context from checks skipped due to timeout of 90000ms (25)
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_simple_example_test / test
- GitHub Check: run_multimedia_example_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_dynamic_steps_example_test / test
- GitHub Check: run_eval_framework_test / test
- GitHub Check: Test on macos-15
- GitHub Check: Test on macos-15
- GitHub Check: Test on macos-13
- GitHub Check: Test on macos-15
- GitHub Check: Test on macos-13
- GitHub Check: test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: Test on macos-13
- GitHub Check: test
- GitHub Check: windows-latest
- GitHub Check: test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: run_simple_example_test
- GitHub Check: docker-compose-test
🔇 Additional comments (13)
cognee/infrastructure/llm/gemini/adapter.py (1)
15-15: No issues with the monitoring assignment.
The new linemonitoring = get_base_config().monitoring_toollooks good and aligns with removing unneeded asynchronous references.cognee/api/v1/users/routers/get_visualize_router.py (1)
1-2: Reduced imports improve cleanliness.
Switching to justimport loggingandDependsis consistent with the PR goal of dropping unused dependencies.cognee/infrastructure/llm/anthropic/adapter.py (2)
17-18: Deferred import is a good optimization.
Movingimport anthropicinside__init__helps minimize overhead if the module isn't always needed.
22-22: Minor formatting tweak.
This additional blank line likely has no functional impact.cognee/shared/utils.py (1)
42-42: Local import aligns with on-demand usage.
Deferring the NLTK import inside the function supports the goal of reducing unneeded global dependencies.cognee/infrastructure/llm/openai/adapter.py (1)
15-16: LGTM: Minor whitespace addition.This simple whitespace change improves code readability by adding consistent spacing around variable assignments.
Dockerfile (1)
5-6: LGTM: Properly added API as an optional dependency.This addition aligns with the PR objective of making some dependencies optional. The API dependency is now explicitly listed as a Poetry extra, which allows users to install it only when needed.
cognee/api/v1/cognify/code_graph_pipeline.py (1)
27-28: LGTM: Minor whitespace addition.This simple whitespace change improves code readability by adding consistent spacing around variable assignments, matching the same pattern used in the OpenAI adapter file.
cognee/eval_framework/benchmark_adapters/twowikimultihop_adapter.py (1)
1-1: LGTM: Removed unused imports.Good job cleaning up the imports by removing unused ones (requests, os, json, random) and simplifying the typing imports to only include what's needed (Any). This aligns with the PR objective of removing unused dependencies.
cognee/modules/data/extraction/extract_summary.py (1)
38-38:Details
✅ Verification successful
Verify that RetryError handling is no longer needed.
You've removed the RetryError from both imports and exception handling. Please verify that the code no longer needs to handle RetryError exceptions, which might occur from the underlying LLM client operations.
🏁 Script executed:
#!/bin/bash # Check if RetryError is still used elsewhere in the codebase echo "Checking for RetryError usage in the codebase..." rg "RetryError" --type py # Check if tenacity is still a dependency echo -e "\nChecking if tenacity is still a dependency..." grep -r "tenacity" pyproject.toml # Check for other places where extract_code_summary is called echo -e "\nChecking for places where extract_code_summary is called..." rg "extract_code_summary" --type pyLength of output: 837
Removal of RetryError Handling Verified
The verification shows that no occurrences of
RetryErrorremain in the codebase—the grep search returned no results. The exception handling now solely relies onInstructorRetryException, and given that calls toextract_code_summaryare unaffected, the change appears safe. Although the dependency check for tenacity didn’t yield results due to missingpyproject.toml, there’s no indication thatRetryErroris still needed.cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (2)
2-2: No concerns with the newrequestsimport.
74-74: Switched exception handling to match therequestslibrary.
Catchingrequests.exceptions.HTTPErroris correct now that the code usesrequests.cognee/api/v1/add/routers/get_add_router.py (1)
8-8:requestsimport is fine, but remember it is synchronous.
Using it in an async context may block other tasks unless carefully managed.
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
cognee/api/v1/add/routers/get_add_router.py (1)
1-58: 💡 Verification agent🧩 Analysis chain
Fix formatting issues flagged by the linter.
The pipeline failure indicates that Ruff formatting check failed. Please run 'ruff format' to fix code style issues in this file.
🏁 Script executed:
#!/bin/bash # Check if ruff is installed and format the file which ruff && echo "Ruff is installed, you can run: ruff format cognee/api/v1/add/routers/get_add_router.py" || echo "Ruff is not installed"Length of output: 161
Action Required: Install Ruff and Format the File
Our verification shows that Ruff is not installed on the CI environment, which is why the formatting check failed. Please ensure that Ruff is added to your project’s dependencies or installed in your CI environment, then run the following command to fix the styles:
- Install Ruff (if not already available)
- Execute:
ruff format cognee/api/v1/add/routers/get_add_router.py🧰 Tools
🪛 Ruff (0.8.2)
22-22: Do not perform function call
Dependsin argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable(B008)
🪛 GitHub Actions: lint | ruff format
[error] 1-1: Ruff formatting check failed. 1 file would be reformatted. Please run 'ruff format' to fix code style issues in this file.
♻️ Duplicate comments (1)
cognee/api/v1/add/routers/get_add_router.py (1)
42-45:⚠️ Potential issueRemove
awaitfromresponse.content()-requestsis synchronous.The
requestslibrary is synchronous and doesn't support awaiting. Replaceawait response.content()withresponse.content.if response.status_code == 200: - file_data = await response.content() + file_data = response.content return await cognee_add(file_data)
🧹 Nitpick comments (1)
cognee/api/v1/add/routers/get_add_router.py (1)
39-41: Consider using an async HTTP client in an async function.Using the synchronous
requestslibrary inside an async function can block the event loop. Consider using an asynchronous HTTP client likeaiohttporhttpxto maintain non-blocking behavior.-import requests +import httpx # And then in the function: -response = requests.get(data) -response.raise_for_status() +async with httpx.AsyncClient() as client: + response = await client.get(data) + response.raise_for_status()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cognee/api/v1/add/routers/get_add_router.py(2 hunks)
🧰 Additional context used
🪛 GitHub Actions: lint | ruff format
cognee/api/v1/add/routers/get_add_router.py
[error] 1-1: Ruff formatting check failed. 1 file would be reformatted. Please run 'ruff format' to fix code style issues in this file.
⏰ Context from checks skipped due to timeout of 90000ms (32)
- GitHub Check: run_networkx_metrics_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_eval_framework_test / test
- GitHub Check: Test on macos-15
- GitHub Check: run_dynamic_steps_example_test / test
- GitHub Check: Test on macos-15
- GitHub Check: Test on macos-13
- GitHub Check: Test on macos-13
- GitHub Check: Test on macos-15
- GitHub Check: run_simple_example_test / test
- GitHub Check: run_multimedia_example_test / test
- GitHub Check: run_simple_example_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: Test on macos-13
- GitHub Check: test
- GitHub Check: test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: windows-latest
- GitHub Check: test
- GitHub Check: docker-compose-test
- GitHub Check: Build Cognee Backend Docker App Image
- GitHub Check: run_simple_example_test
🔇 Additional comments (1)
cognee/api/v1/add/routers/get_add_router.py (1)
7-8: Import change aligns with switching from async to sync HTTP client.The addition of the
requestslibrary import indicates a deliberate change from an asynchronous HTTP client to a synchronous one. This change should be consistent with the overall architecture decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
cognee/api/v1/add/routers/get_add_router.py (2)
39-45:⚠️ Potential issueFix synchronous/asynchronous code mismatch.
The
requestslibrary is synchronous and doesn't supportawait. Theawait response.content()call will cause a runtime error.-response = requests.get(data) -response.raise_for_status() -if response.status_code == 200: - file_data = await response.content() - return await cognee_add(file_data) +response = requests.get(data) +response.raise_for_status() +file_data = response.content +return await cognee_add(file_data)
42-50: 🛠️ Refactor suggestionRemove redundant status code check.
The conditional check for status code 200 is redundant because
response.raise_for_status()on line 40 already raises an exception for non-successful status codes. The code after the else block will never be reached.response = requests.get(data) response.raise_for_status() -if response.status_code == 200: - file_data = await response.content() - return await cognee_add(file_data) - -return JSONResponse( - status_code=400, - content={"error": f"Invalid content sent to cognee: {data}"}, -) +file_data = response.content +return await cognee_add(file_data)
🧹 Nitpick comments (1)
cognee/api/v1/add/routers/get_add_router.py (1)
8-8: Confirm dependency removal is intentional.The import of
requests(synchronous HTTP library) appears to be replacing an asynchronous HTTP client, which aligns with the PR objective of dependency cleanup. However, using synchronous HTTP calls within an async function can block the event loop and reduce performance.Consider using an asynchronous HTTP client like
httpxor properly isolating synchronous calls withasyncio.to_thread().
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
.github/workflows/test_chromadb.ymlis excluded by!**/*.yml.github/workflows/test_cognee_server_start.ymlis excluded by!**/*.yml
📒 Files selected for processing (1)
cognee/api/v1/add/routers/get_add_router.py(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (32)
- GitHub Check: Test on macos-15
- GitHub Check: Test on macos-13
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: Test on macos-15
- GitHub Check: Test on macos-13
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: windows-latest
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_dynamic_steps_example_test / test
- GitHub Check: run_networkx_metrics_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: test
- GitHub Check: run_eval_framework_test / test
- GitHub Check: Test on macos-15
- GitHub Check: run_simple_example_test / test
- GitHub Check: run_multimedia_example_test / test
- GitHub Check: run_simple_example_test / test
- GitHub Check: test
- GitHub Check: Test on macos-13
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: test
- GitHub Check: Build Cognee Backend Docker App Image
- GitHub Check: docker-compose-test
- GitHub Check: run_simple_example_test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (2)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)
51-51:⚠️ Potential issueMissing
awaitforasyncio.gather.The result of
asyncio.gathermust be awaited to ensure all tasks complete before returning, otherwise it returns a coroutine object instead of the actual embeddings.-embeddings = asyncio.gather(*[self._get_embedding(prompt) for prompt in text]) +embeddings = await asyncio.gather(*[self._get_embedding(prompt) for prompt in text])cognee/api/v1/add/routers/get_add_router.py (1)
39-44:⚠️ Potential issueInvalid usage of
awaitwith synchronousrequestslibrary.The code attempts to await
response.content()which is invalid sincerequestsis a synchronous library and itscontentis a property, not an awaitable coroutine.response = requests.get(data) response.raise_for_status() -file_data = await response.content() +file_data = response.content return await cognee_add(file_data)
🧹 Nitpick comments (1)
cognee/api/v1/add/routers/get_add_router.py (1)
39-44: Consider using a fully synchronous or fully asynchronous approach.The code mixes synchronous HTTP requests with asynchronous processing, which can lead to inefficient resource usage. Consider either:
- Using an async HTTP client library throughout, or
- Making the entire request handling synchronous.
Option 1 (fully async):
-import requests +import httpx -response = requests.get(data) -response.raise_for_status() -file_data = await response.content() +async with httpx.AsyncClient() as client: + response = await client.get(data) + response.raise_for_status() + file_data = response.content return await cognee_add(file_data)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
.github/workflows/test_chromadb.ymlis excluded by!**/*.yml.github/workflows/test_cognee_server_start.ymlis excluded by!**/*.yml
📒 Files selected for processing (2)
cognee/api/v1/add/routers/get_add_router.py(2 hunks)cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (32)
- GitHub Check: Test on macos-15
- GitHub Check: Test on macos-13
- GitHub Check: run_notebook_test / test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: run_notebook_test / test
- GitHub Check: Test on macos-15
- GitHub Check: run_notebook_test / test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: run_dynamic_steps_example_test / test
- GitHub Check: Test on macos-13
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: run_eval_framework_test / test
- GitHub Check: test
- GitHub Check: Test on macos-15
- GitHub Check: Test on macos-13
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: run_notebook_test / test
- GitHub Check: windows-latest
- GitHub Check: test
- GitHub Check: run_networkx_metrics_test / test
- GitHub Check: run_simple_example_test / test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: run_multimedia_example_test / test
- GitHub Check: test
- GitHub Check: run_simple_example_test / test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: chromadb test
- GitHub Check: docker-compose-test
- GitHub Check: run_simple_example_test
🔇 Additional comments (2)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)
74-74: Exception type updated to match requests library.The exception handling has been appropriately updated to catch
requests.exceptions.HTTPError.cognee/api/v1/add/routers/get_add_router.py (1)
8-8: Added synchronousrequestslibrary.The synchronous
requestslibrary has been added, replacing the previously used asynchronous HTTP client.
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py
Outdated
Show resolved
Hide resolved
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py
Show resolved
Hide resolved
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py
Outdated
Show resolved
Hide resolved
…api-dependencies-optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
cognee/tasks/repo_processor/get_local_dependencies.py (1)
26-28: Consider performance implications of per-call initialization.Moving
PY_LANGUAGEandsource_code_parserfrom module level to method level means these objects are recreated on every call toparse_file(). While this supports the goal of on-demand dependency initialization, it could impact performance if this method is called frequently.Consider using lazy initialization at the class level instead:
class FileParser: def __init__(self): self.parsed_files = {} + self._language = None + self._parser = None + def _ensure_parser_initialized(self): + if self._parser is None: + self._language = Language(tspython.language()) + self._parser = Parser(self._language) async def parse_file(self, file_path: str) -> tuple[str, Tree]: - PY_LANGUAGE = Language(tspython.language()) - source_code_parser = Parser(PY_LANGUAGE) + self._ensure_parser_initialized() if file_path not in self.parsed_files: source_code = await get_source_code(file_path) - source_code_tree = source_code_parser.parse(bytes(source_code, "utf-8")) + source_code_tree = self._parser.parse(bytes(source_code, "utf-8")) self.parsed_files[file_path] = (source_code, source_code_tree) return self.parsed_files[file_path]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
cognee/api/client.py(1 hunks)cognee/api/v1/cognify/routers/get_code_pipeline_router.py(1 hunks)cognee/tasks/repo_processor/get_local_dependencies.py(2 hunks)cognee/tasks/repo_processor/get_repo_file_dependencies.py(1 hunks)
🧰 Additional context used
🧬 Code Definitions (3)
cognee/api/v1/cognify/routers/get_code_pipeline_router.py (1)
cognee/api/v1/cognify/code_graph_pipeline.py (1)
run_code_graph_pipeline(38-84)
cognee/tasks/repo_processor/get_repo_file_dependencies.py (1)
cognee/tasks/repo_processor/get_local_dependencies.py (1)
get_local_script_dependencies(78-112)
cognee/api/client.py (1)
cognee/api/v1/cognify/routers/get_code_pipeline_router.py (1)
get_code_pipeline_router(24-59)
⏰ Context from checks skipped due to timeout of 90000ms (33)
- GitHub Check: run_notebook_test / test
- GitHub Check: run_dynamic_steps_example_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_simple_example_test / test
- GitHub Check: run_networkx_metrics_test / test
- GitHub Check: Test on macos-15
- GitHub Check: Test on macos-15
- GitHub Check: run_multimedia_example_test / test
- GitHub Check: run_eval_framework_test / test
- GitHub Check: Test on macos-15
- GitHub Check: Test on macos-13
- GitHub Check: Test on macos-13
- GitHub Check: Test on macos-13
- GitHub Check: test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_simple_example_test / test
- GitHub Check: test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: windows-latest
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: test
- GitHub Check: chromadb test
- GitHub Check: run_simple_example_test
- GitHub Check: docker-compose-test
- GitHub Check: Build Cognee Backend Docker App Image
🔇 Additional comments (3)
cognee/tasks/repo_processor/get_repo_file_dependencies.py (1)
77-78: LGTM! Good practice for optional dependencies.Moving the import statement from the module level to function level is a good practice for making dependencies optional. This ensures the dependency is only required when the function is actually used.
cognee/api/client.py (1)
172-174: LGTM! Good defensive programming.This change properly handles the case where the code pipeline router is not available (when tree-sitter is not installed), preventing potential errors when including routers in the FastAPI application.
cognee/tasks/repo_processor/get_local_dependencies.py (1)
2-3: LGTM! Proper organization of imports.Moving these imports to the top of the file follows standard Python style guidelines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
cognee/api/v1/cognify/routers/get_code_pipeline_router.py (1)
24-29:⚠️ Potential issueIncorrect module import for dependency check.
The current implementation attempts to import
run_code_graph_pipelineas a module, but this is actually a function that's imported fromcognee.api.v1.cognify.code_graph_pipelineat line 35. This approach doesn't correctly verify if the tree-sitter dependency is available.Apply this diff to correctly check for the tree-sitter dependency:
def get_code_pipeline_router() -> APIRouter: try: - import run_code_graph_pipeline + from tree_sitter import Language + # Verify tree-sitter is available except ModuleNotFoundError: logger.error("codegraph dependencies not found. Skipping codegraph API routes.") return None🧰 Tools
🪛 Ruff (0.8.2)
25-25:
run_code_graph_pipelineimported but unused; consider usingimportlib.util.find_specto test for availability(F401)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)
51-52:⚠️ Potential issueMissing
awaitforasyncio.gather.The result of
asyncio.gathermust be awaited to ensure all tasks complete before returning. Without it, the function returns a coroutine object instead of the actual embedding results.-embeddings = asyncio.gather(*[self._get_embedding(prompt) for prompt in text]) +embeddings = await asyncio.gather(*[self._get_embedding(prompt) for prompt in text])
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
cognee/api/v1/cognify/routers/get_code_pipeline_router.py(1 hunks)cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py(3 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
cognee/api/v1/cognify/routers/get_code_pipeline_router.py
25-25: run_code_graph_pipeline imported but unused; consider using importlib.util.find_spec to test for availability
(F401)
⏰ Context from checks skipped due to timeout of 90000ms (33)
- GitHub Check: run_notebook_test / test
- GitHub Check: run_eval_framework_test / test
- GitHub Check: run_multimedia_example_test / test
- GitHub Check: run_networkx_metrics_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_simple_example_test / test
- GitHub Check: Test on macos-15
- GitHub Check: Test on macos-15
- GitHub Check: run_notebook_test / test
- GitHub Check: Test on macos-13
- GitHub Check: run_dynamic_steps_example_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_simple_example_test / test
- GitHub Check: Test on macos-15
- GitHub Check: test
- GitHub Check: test
- GitHub Check: Test on macos-13
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: Test on macos-13
- GitHub Check: chromadb test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: windows-latest
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: run_simple_example_test
- GitHub Check: Build Cognee Backend Docker App Image
- GitHub Check: docker-compose-test
🔇 Additional comments (4)
cognee/api/v1/cognify/routers/get_code_pipeline_router.py (1)
35-36: LGTM! Moving the import to function scope is a good practice.Moving the import statement inside the function is appropriate for this use case, as it ensures the dependency is only loaded when the endpoint is actually called.
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (3)
2-2: Use consistent HTTP client library.The code imports the synchronous
requestslibrary while the class maintains an async structure withasyncmethods, creating an architectural inconsistency.
54-54: Method remains async but uses synchronous code.The method signature is still
asyncbut now uses synchronousrequests.post()which doesn't utilize async/await paradigm.
70-70: Blocking synchronous HTTP call in async method.Using
requests.post()in an async method blocks the event loop, defeating the purpose of asynchronous programming and potentially causing performance issues.
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)
53-53:⚠️ Potential issueMissing
awaitkeyword forasyncio.gatherThe
asyncio.gathercall is missing theawaitkeyword, which will return a coroutine object instead of the actual embedding results.- embeddings = asyncio.gather(*[self._get_embedding(prompt) for prompt in text]) + embeddings = await asyncio.gather(*[self._get_embedding(prompt) for prompt in text])
🧹 Nitpick comments (1)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)
72-75: Simplify nested context managersYou can simplify the nested
withstatements for better readability.- async with aiohttp.ClientSession() as session: - async with session.post( - self.endpoint, json=payload, headers=headers, timeout=60.0 - ) as response: + async with aiohttp.ClientSession() as session, session.post( + self.endpoint, json=payload, headers=headers, timeout=60.0 + ) as response:🧰 Tools
🪛 Ruff (0.8.2)
72-75: Use a single
withstatement with multiple contexts instead of nestedwithstatements(SIM117)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
poetry.lockis excluded by!**/*.lock,!**/*.lockpyproject.tomlis excluded by!**/*.toml
📒 Files selected for processing (1)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py(3 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py
72-75: Use a single with statement with multiple contexts instead of nested with statements
(SIM117)
⏰ Context from checks skipped due to timeout of 90000ms (34)
- GitHub Check: run_eval_framework_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_networkx_metrics_test / test
- GitHub Check: Test on macos-15
- GitHub Check: Test on macos-15
- GitHub Check: run_simple_example_test / test
- GitHub Check: Test on macos-13
- GitHub Check: Test on macos-13
- GitHub Check: run_dynamic_steps_example_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_simple_example_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_multimedia_example_test / test
- GitHub Check: Test on macos-15
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: Test on macos-13
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: windows-latest
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: chromadb test
- GitHub Check: test
- GitHub Check: lint (ubuntu-latest, 3.10.x)
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: run_simple_example_test
- GitHub Check: docker-compose-test
- GitHub Check: Build Cognee Backend Docker App Image
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py
Outdated
Show resolved
Hide resolved
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)
53-54:⚠️ Potential issueMissing
awaitfor asyncio.gather.The current implementation doesn't await the result of
asyncio.gather, which means it's returning a coroutine object rather than the actual embeddings. This will cause unexpected behavior when clients try to use the returned value.-embeddings = asyncio.gather(*[self._get_embedding(prompt) for prompt in text]) +embeddings = await asyncio.gather(*[self._get_embedding(prompt) for prompt in text])
🧹 Nitpick comments (2)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (2)
72-75: Simplify nestedwithstatements.The nested
withstatements can be combined for better readability as suggested by the static analysis tool.-async with aiohttp.ClientSession() as session: - async with session.post( - self.endpoint, json=payload, headers=headers, timeout=60.0 - ) as response: +async with aiohttp.ClientSession() as session, session.post( + self.endpoint, json=payload, headers=headers, timeout=60.0 +) as response:🧰 Tools
🪛 Ruff (0.8.2)
72-75: Use a single
withstatement with multiple contexts instead of nestedwithstatements(SIM117)
56-77: Consider reusing the client session for better performance.Creating a new
aiohttp.ClientSessionfor each embedding request is inefficient, especially when embedding multiple texts. Consider creating a single session for all requests in theembed_textmethod and passing it to_get_embedding.async def embed_text(self, text: List[str]) -> List[List[float]]: """ Given a list of text prompts, returns a list of embedding vectors. """ if self.mock: return [[0.0] * self.dimensions for _ in text] + async with aiohttp.ClientSession() as session: - embeddings = asyncio.gather(*[self._get_embedding(prompt) for prompt in text]) + embeddings = await asyncio.gather(*[self._get_embedding(prompt, session) for prompt in text]) return embeddings -async def _get_embedding(self, prompt: str) -> List[float]: +async def _get_embedding(self, prompt: str, session: aiohttp.ClientSession) -> List[float]: """ Internal method to call the Ollama embeddings endpoint for a single prompt. """ # ... existing code ... retries = 0 while retries < self.MAX_RETRIES: try: - async with aiohttp.ClientSession() as session: async with session.post( self.endpoint, json=payload, headers=headers, timeout=60.0 ) as response: data = await response.json() return data["embedding"]🧰 Tools
🪛 Ruff (0.8.2)
72-75: Use a single
withstatement with multiple contexts instead of nestedwithstatements(SIM117)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py(3 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py
72-75: Use a single with statement with multiple contexts instead of nested with statements
(SIM117)
⏰ Context from checks skipped due to timeout of 90000ms (33)
- GitHub Check: run_multimedia_example_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_eval_framework_test / test
- GitHub Check: Test on macos-15
- GitHub Check: run_networkx_metrics_test / test
- GitHub Check: run_dynamic_steps_example_test / test
- GitHub Check: Test on macos-15
- GitHub Check: run_simple_example_test / test
- GitHub Check: Test on macos-13
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: Test on macos-13
- GitHub Check: run_notebook_test / test
- GitHub Check: run_simple_example_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: chromadb test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: Test on macos-15
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: windows-latest
- GitHub Check: Test on macos-13
- GitHub Check: test
- GitHub Check: test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: Build Cognee Backend Docker App Image
- GitHub Check: run_simple_example_test
- GitHub Check: docker-compose-test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)
72-77: Simplify nested context managersThe nested
withstatements can be combined for better readability.- async with aiohttp.ClientSession() as session: - async with session.post( - self.endpoint, json=payload, headers=headers, timeout=60.0 - ) as response: - data = await response.json() - return data["embedding"] + async with aiohttp.ClientSession() as session, session.post( + self.endpoint, json=payload, headers=headers, timeout=60.0 + ) as response: + data = await response.json() + return data["embedding"]🧰 Tools
🪛 Ruff (0.8.2)
72-75: Use a single
withstatement with multiple contexts instead of nestedwithstatements(SIM117)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py(3 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py
72-75: Use a single with statement with multiple contexts instead of nested with statements
(SIM117)
⏰ Context from checks skipped due to timeout of 90000ms (33)
- GitHub Check: run_notebook_test / test
- GitHub Check: run_simple_example_test / test
- GitHub Check: Test on macos-15
- GitHub Check: run_notebook_test / test
- GitHub Check: run_networkx_metrics_test / test
- GitHub Check: Test on macos-15
- GitHub Check: run_multimedia_example_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: Test on macos-13
- GitHub Check: Test on macos-15
- GitHub Check: run_simple_example_test / test
- GitHub Check: test
- GitHub Check: run_notebook_test / test
- GitHub Check: test
- GitHub Check: run_eval_framework_test / test
- GitHub Check: test
- GitHub Check: Test on macos-13
- GitHub Check: test
- GitHub Check: windows-latest
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: Test on macos-13
- GitHub Check: run_dynamic_steps_example_test / test
- GitHub Check: test
- GitHub Check: chromadb test
- GitHub Check: test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: Build Cognee Backend Docker App Image
- GitHub Check: run_simple_example_test
- GitHub Check: docker-compose-test
🔇 Additional comments (3)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (3)
53-53: Improved implementation of asyncio.gatherThe updated implementation properly uses
*instead of**with the list comprehension and correctly awaits the result ofasyncio.gather. This is now handling concurrent embedding requests properly.
56-56: Appropriate method signature updateThe method signature has been simplified by removing the client parameter, which is now created internally. This makes the method more self-contained while maintaining its asynchronous nature.
78-78: Appropriate exception handling for aiohttpThe exception type has been correctly updated to match the aiohttp library (
aiohttp.http_exceptions.HttpBadRequest), which is consistent with the library transition.
Description
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin