Skip to content

Conversation

@dexters1
Copy link
Collaborator

@dexters1 dexters1 commented Oct 24, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced new API routers for managing datasets, cognitive processing, search functionalities, and application settings.
    • Added endpoints for adding data to a graph and managing settings related to LLM and Vector Database configurations.
  • Bug Fixes

    • Streamlined error handling for login errors and other exceptions across various endpoints.
    • Enhanced error handling for dataset management operations.
  • Documentation

    • Updated import statements to enhance accessibility of new router functionalities.

These updates enhance the modularity and usability of the API, providing a more organized structure for users.

Added routers for search and settings endpoints

Refactor #COG-334
…outer

Moved permissions router in it's routers folder, added cognify router

Refactor #COG-334
Added routers for datasets and add endpoints

Refactor #COG-334
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 24, 2024

Walkthrough

The changes in this pull request involve a significant restructuring of the API routes within the cognee/api/client.py file, including the removal of several dataset-related endpoints. New routers have been introduced for managing datasets, cognify operations, search functionalities, and settings, promoting a modular architecture. Additionally, error handling has been refined, particularly for login errors. The import statements have been cleaned up for better organization. Overall, the modifications enhance the clarity and maintainability of the API's structure.

Changes

File Path Change Summary
cognee/api/client.py Restructured API routes, removed dataset-related endpoints, added modular routers for datasets, cognify, search, and settings. Streamlined error handling. Cleaned up import statements.
cognee/api/v1/add/routers/__init__.py Added import for get_add_router.
cognee/api/v1/add/routers/get_add_router.py Introduced get_add_router with a POST endpoint for adding data to a graph, handling URLs and files with error handling.
cognee/api/v1/cognify/routers/__init__.py Added import for get_cognify_router.
cognee/api/v1/cognify/routers/get_cognify_router.py Introduced get_cognify_router with a POST endpoint for cognitive processing and error handling.
cognee/api/v1/datasets/routers/__init__.py Added import for get_datasets_router.
cognee/api/v1/datasets/routers/get_datasets_router.py Introduced get_datasets_router with multiple endpoints for dataset management and error handling.
cognee/api/v1/permissions/routers/__init__.py Added import for get_permissions_router.
cognee/api/v1/search/routers/__init__.py Added import for get_search_router.
cognee/api/v1/search/routers/get_search_router.py Introduced get_search_router with a POST endpoint for search functionality and error handling.
cognee/api/v1/settings/routers/__init__.py Added import for get_settings_router.
cognee/api/v1/settings/routers/get_settings_router.py Introduced get_settings_router with GET and POST endpoints for managing settings and user authentication.

Possibly related PRs

  • feat: improve API request and response models and docs #154: The changes in this PR involve significant modifications to the cognee/api/client.py file, particularly in the restructuring of API endpoints and the introduction of new data transfer objects (DTOs) that are directly related to the changes made in the main PR, which also modifies the same file and endpoints.
  • Cog 170 pgvector adapter #158: This PR adds support for PGVector, which is relevant to the main PR's restructuring of API routes and endpoints related to datasets, as it may involve similar functionalities or endpoints that handle data operations.
  • COG-485 - Fix/integration test warnings #176: This PR includes updates to the cognee/api/client.py file, specifically regarding the handling of optional fields in DTOs and improvements in error handling for API endpoints, which aligns with the changes made in the main PR that also focuses on enhancing API response structures and error handling.

Suggested reviewers

  • borisarzentar

🐰 In the garden of code, we hop with glee,
New routers sprout, as fresh as can be!
With endpoints anew, we dance and we play,
Removing the old, making way for the day.
So here’s to the changes, so bright and so bold,
A tale of the API, in stories retold! 🌼✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Changed class order in settings router to allow proper work of settings

Refactor #COG-334
@borisarzentar borisarzentar marked this pull request as ready for review October 28, 2024 08:59
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 21

🧹 Outside diff range and nitpick comments (6)
cognee/api/v1/cognify/routers/get_cognify_router.py (1)

1-8: Consider optimizing imports for better type safety.

The imports could be improved for better type safety and organization:

  • List can be imported directly from pydantic.types since we're using it with Pydantic models
  • Consider adding a return type hint for the router factory function by importing APIRouter as a type
-from fastapi import APIRouter
-from typing import List
+from fastapi import APIRouter, Depends
+from fastapi.responses import JSONResponse
+from pydantic import BaseModel
+from pydantic.types import List
-from pydantic import BaseModel
-from cognee.modules.users.models import User
-from fastapi.responses import JSONResponse
+from cognee.modules.users.models import User
-from fastapi import Depends
cognee/api/v1/search/routers/get_search_router.py (2)

18-18: Move import statement to the module level.

The import statement for cognee_search should be moved to the top of the file with other imports for better maintainability and to follow Python's style guidelines.

from cognee.api.v1.search import SearchType
+from cognee.api.v1.search import search as cognee_search
from fastapi.responses import JSONResponse
...

@router.post("/", response_model = list)
async def search(payload: SearchPayloadDTO, user: User = Depends(get_authenticated_user)):
    """ This endpoint is responsible for searching for nodes in the graph."""
-    from cognee.api.v1.search import search as cognee_search

17-17: Enhance endpoint documentation.

The current docstring lacks important details about the endpoint's behavior, parameters, and possible responses.

-    """ This endpoint is responsible for searching for nodes in the graph."""
+    """Search for nodes in the graph based on the provided query and search type.
+    
+    Args:
+        payload (SearchPayloadDTO): The search parameters including type and query
+        user (User): The authenticated user performing the search
+    
+    Returns:
+        list: A list of matching search results
+        
+    Raises:
+        HTTPException: If the search fails or parameters are invalid
+    """
cognee/api/v1/settings/routers/get_settings_router.py (2)

29-29: Simplify type annotations for provider using Literal

The type annotations for provider can be simplified by passing multiple arguments to Literal directly, removing the unnecessary Union.

Apply this diff:

-class LLMConfigInputDTO(InDTO):
     provider: Union[Literal["openai"], Literal["ollama"], Literal["anthropic"]]
     model: str
     api_key: SecretStr

+class LLMConfigInputDTO(InDTO):
+    provider: Literal["openai", "ollama", "anthropic"]
     model: str
     api_key: SecretStr

-class VectorDBConfigInputDTO(InDTO):
     provider: Union[Literal["lancedb"], Literal["qdrant"], Literal["weaviate"], Literal["pgvector"]]
     url: str
     api_key: SecretStr

+class VectorDBConfigInputDTO(InDTO):
+    provider: Literal["lancedb", "qdrant", "weaviate", "pgvector"]
     url: str
     api_key: SecretStr

Also applies to: 34-34


24-24: Prefix unused user parameters with an underscore

The user parameter is not used in the get_settings and save_settings functions. To indicate that it is intentionally unused and to adhere to Python conventions, consider renaming it to _user.

Apply this diff:

 async def get_settings(user: User = Depends(get_authenticated_user)):
+async def get_settings(_user: User = Depends(get_authenticated_user)):
     from cognee.modules.settings import get_settings as get_cognee_settings
     return get_cognee_settings()

 async def save_settings(new_settings: SettingsPayloadDTO, user: User = Depends(get_authenticated_user)):
+async def save_settings(new_settings: SettingsPayloadDTO, _user: User = Depends(get_authenticated_user)):
     from cognee.modules.settings import save_llm_config, save_vector_db_config

     if new_settings.llm is not None:

Also applies to: 43-43

cognee/api/v1/datasets/routers/get_datasets_router.py (1)

126-131: Use HTTPException for consistent error handling.

At lines 126-131, you're returning a JSONResponse with a 404 status code and an error detail in the content. To maintain consistency across your API and leverage FastAPI's built-in exception handling, consider using raise HTTPException instead.

Apply this diff for consistency:

             if dataset is None:
-                return JSONResponse(
-                    status_code=404,
-                    content={
-                        "detail": f"Dataset ({dataset_id}) not found."
-                    }
-                )
+                raise HTTPException(
+                    status_code=404,
+                    detail=f"Dataset ({dataset_id}) not found."
+                )

This approach ensures that error responses are handled uniformly throughout your API.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 1088b58 and 02d6750.

📒 Files selected for processing (12)
  • cognee/api/client.py (3 hunks)
  • cognee/api/v1/add/routers/init.py (1 hunks)
  • cognee/api/v1/add/routers/get_add_router.py (1 hunks)
  • cognee/api/v1/cognify/routers/init.py (1 hunks)
  • cognee/api/v1/cognify/routers/get_cognify_router.py (1 hunks)
  • cognee/api/v1/datasets/routers/init.py (1 hunks)
  • cognee/api/v1/datasets/routers/get_datasets_router.py (1 hunks)
  • cognee/api/v1/permissions/routers/init.py (1 hunks)
  • cognee/api/v1/search/routers/init.py (1 hunks)
  • cognee/api/v1/search/routers/get_search_router.py (1 hunks)
  • cognee/api/v1/settings/routers/init.py (1 hunks)
  • cognee/api/v1/settings/routers/get_settings_router.py (1 hunks)
✅ Files skipped from review due to trivial changes (6)
  • cognee/api/v1/add/routers/init.py
  • cognee/api/v1/cognify/routers/init.py
  • cognee/api/v1/datasets/routers/init.py
  • cognee/api/v1/permissions/routers/init.py
  • cognee/api/v1/search/routers/init.py
  • cognee/api/v1/settings/routers/init.py
🧰 Additional context used
🪛 Ruff
cognee/api/client.py

55-55: Module level import not at top of file

(E402)


56-56: Module level import not at top of file

(E402)


57-57: Module level import not at top of file

(E402)


58-58: Module level import not at top of file

(E402)


59-59: Module level import not at top of file

(E402)


60-60: Module level import not at top of file

(E402)

cognee/api/v1/datasets/routers/get_datasets_router.py

68-68: Do not use bare except

(E722)

🔇 Additional comments (12)
cognee/api/v1/cognify/routers/get_cognify_router.py (1)

9-27: Verify router integration in the main application.

Let's ensure this router is properly integrated into the main FastAPI application structure.

✅ Verification successful

Router integration is properly implemented

The verification shows that:

  • The router is correctly imported and mounted in client.py with the appropriate prefix /api/v1/cognify
  • The router is properly exported in __init__.py
  • No path conflicts exist with other routers as they all have unique prefixes
  • The route pattern POST / is used consistently across different routers, each under their own unique prefix
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if the router is properly imported and mounted in the main app
rg -l "get_cognify_router" --type py

# Check for potential duplicate route definitions
ast-grep --pattern 'router.post("/", $$$)'

Length of output: 582


Script:

#!/bin/bash
# Check how the router is imported and mounted in client.py
rg -A 5 "get_cognify_router" cognee/api/client.py

# Check the router's registration in __init__.py
cat cognee/api/v1/cognify/routers/__init__.py

# Check for any potential path conflicts in route registration
rg "prefix=" --type py

Length of output: 818

cognee/api/v1/search/routers/get_search_router.py (1)

21-21: Verify search implementation compatibility.

Let's verify that the cognee_search function exists with the expected signature.

✅ Verification successful

Let me gather more information to verify the search implementation.


Let me check the search implementation in the imported module.


Search implementation is compatible and correctly integrated

The cognee_search function (aliased from search) is properly implemented in search_v2.py with the exact signature required:

  • search_type: SearchType
  • query: str
  • user: User = None

The function is correctly imported in __init__.py and used appropriately in the router.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for the cognee_search function definition
ast-grep --pattern 'async def search($params) {
  $$$
}'

# Look for other usages of the search function
rg -A 5 'cognee_search\(' 

Length of output: 601


Script:

#!/bin/bash
# Search for the cognee_search function definition with imports
rg -p "cognee_search" --type py

# Look for function definition
ast-grep --pattern 'async def cognee_search($params) {
  $$$
}'

# Check imports in the router file
rg "^from.*import.*cognee_search" --type py

Length of output: 348


Script:

#!/bin/bash
# Check the search implementation in the module
rg -p "async def search" cognee/api/v1/search/

# Look for the search function with its parameters
ast-grep --pattern 'async def search($search_type, $query, $user) {
  $$$
}'

# Check the __init__.py in the search module
cat cognee/api/v1/search/__init__.py

Length of output: 605

cognee/api/v1/settings/routers/get_settings_router.py (1)

31-31: Use SecretStr for api_key fields to protect sensitive data

To enhance security, use SecretStr from pydantic for the api_key fields in both LLMConfigInputDTO and VectorDBConfigInputDTO. This prevents the API keys from being accidentally logged or displayed.

[security]

Apply this diff:

+from pydantic import SecretStr

 class LLMConfigInputDTO(InDTO):
     provider: Literal["openai", "ollama", "anthropic"]
     model: str
-    api_key: str
+    api_key: SecretStr

 class VectorDBConfigInputDTO(InDTO):
     provider: Literal["lancedb", "qdrant", "weaviate", "pgvector"]
     url: str
-    api_key: str
+    api_key: SecretStr

Also applies to: 36-36

cognee/api/v1/add/routers/get_add_router.py (2)

45-49: Inconsistent Use of user Parameter in cognee_add Calls

In the call to cognee_add on lines 45-49, the user parameter is passed, but in previous calls (lines 28-31 and 40-43), user is not included. This inconsistency might lead to unexpected behaviors.

Ensure that the user parameter is consistently passed to cognee_add if required:

# For the call on lines 28-31:
                      await cognee_add(
                          "data://.data/",
                          f"{repo_name}",
+                         user=user,
                      )

# For the call on lines 40-43:
                      await cognee_add(
                          "data://.data/",
                          f"{filename}",
+                         user=user,
                      )

51-54: 🛠️ Refactor suggestion

Broad Exception Handling

Catching all exceptions with a generic except Exception can make debugging difficult and may suppress important errors.

Catch specific exceptions to handle known error cases and allow unexpected exceptions to propagate:

-         except Exception as error:
+         except SomeSpecificException as error:

Replace SomeSpecificException with the specific exceptions you expect (e.g., aiohttp.ClientError, subprocess.CalledProcessError).

Likely invalid or redundant comment.

cognee/api/client.py (6)

6-7: Importing FastAPI components

The import statements for FastAPI, JSONResponse, and Response are correctly added and necessary for the application.


130-134: Including datasets router

The get_datasets_router is correctly included, promoting a modular API structure.


136-140: Including add router

The get_add_router is properly included with the appropriate prefix and tags, enhancing the API's modularity.


142-146: Including cognify router

The get_cognify_router is correctly included, which promotes better separation of concerns within the API.


148-152: Including search router

The get_search_router is properly added with the correct prefix and tags, contributing to a more organized API.


154-158: Including settings router

The get_settings_router is correctly included, aligning with the modular architecture and enhancing maintainability.

cognee/api/v1/datasets/routers/get_datasets_router.py (1)

147-147: ⚠️ Potential issue

Ensure secure handling of file paths when returning raw data.

At line 147, the endpoint returns data.raw_data_location directly as a FileResponse. It's crucial to ensure that raw_data_location is a safe and validated file path to prevent potential security vulnerabilities like directory traversal attacks. Serving files without proper validation can expose sensitive files or allow unauthorized access.

Consider adding validation to ensure raw_data_location points to an allowed directory and sanitizing the input if necessary. For example:

import os

# Ensure the file path is within the allowed directory
allowed_directory = '/path/to/allowed/directory'
full_path = os.path.abspath(data.raw_data_location)
if not full_path.startswith(os.path.abspath(allowed_directory)):
    raise HTTPException(status_code=403, detail="Access to the requested file is forbidden.")

return FileResponse(full_path)

This approach helps prevent unauthorized file access by confirming that the requested file resides within a permitted directory.

Comment on lines 9 to 11
def get_cognify_router():
router = APIRouter()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider restructuring for better maintainability and type safety.

The current nested structure could be improved:

  1. Move CognifyPayloadDTO outside the factory function for better reusability and testing
  2. Add return type hint for better type safety
+class CognifyPayloadDTO(BaseModel):
+    datasets: List[str]
+
-def get_cognify_router():
+def get_cognify_router() -> APIRouter:
     router = APIRouter()
-
-    class CognifyPayloadDTO(BaseModel):
-        datasets: List[str]

Also applies to: 27-27

Comment on lines +15 to +18
@router.post("/", response_model=None)
async def cognify(payload: CognifyPayloadDTO, user: User = Depends(get_authenticated_user)):
""" This endpoint is responsible for the cognitive processing of the content."""
from cognee.api.v1.cognify.cognify_v2 import cognify as cognee_cognify
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve endpoint definition and imports.

Several improvements could enhance the endpoint's clarity and reliability:

  1. Use a more descriptive path than "/"
  2. Move the cognify import to the top level
  3. Define a proper response model instead of None
-@router.post("/", response_model=None)
+@router.post("/process", response_model=dict)
 async def cognify(payload: CognifyPayloadDTO, user: User = Depends(get_authenticated_user)):
     """ This endpoint is responsible for the cognitive processing of the content."""
-    from cognee.api.v1.cognify.cognify_v2 import cognify as cognee_cognify

Move to top-level imports:

from cognee.api.v1.cognify.cognify_v2 import cognify as cognee_cognify

Comment on lines +19 to +25
try:
await cognee_cognify(payload.datasets, user)
except Exception as error:
return JSONResponse(
status_code=409,
content={"error": str(error)}
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Improve error handling specificity and security.

The current error handling implementation has several concerns:

  1. Generic exception catching could mask specific errors
  2. Using 409 (Conflict) for all errors is not RESTful
  3. Directly exposing exception messages could leak sensitive information

Consider implementing more specific error handling:

-        try:
-            await cognee_cognify(payload.datasets, user)
-        except Exception as error:
-            return JSONResponse(
-                status_code=409,
-                content={"error": str(error)}
-            )
+        try:
+            await cognee_cognify(payload.datasets, user)
+        except ValueError as error:
+            return JSONResponse(
+                status_code=400,
+                content={"error": "Invalid input provided"}
+            )
+        except PermissionError:
+            return JSONResponse(
+                status_code=403,
+                content={"error": "Insufficient permissions"}
+            )
+        except Exception as error:
+            # Log the actual error for debugging
+            logger.error(f"Unexpected error in cognify: {error}")
+            return JSONResponse(
+                status_code=500,
+                content={"error": "An unexpected error occurred"}
+            )

Committable suggestion was skipped due to low confidence.

search_type: SearchType
query: str

@router.post("/", response_model = list)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Specify the response model type.

The response_model is set to list without type information. Consider creating a response DTO to properly document the structure of the returned data.

from pydantic import BaseModel

class SearchResultDTO(BaseModel):
    # Add appropriate fields based on the actual search results
    id: str
    title: str
    # ... other fields

@router.post("/", response_model=list[SearchResultDTO])

Comment on lines 11 to 13
class SearchPayloadDTO(InDTO):
search_type: SearchType
query: str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Move DTO class to module level.

The SearchPayloadDTO class should be defined at the module level rather than nested inside the router factory function. This improves reusability and follows standard Python practices.

from cognee.api.v1.search import SearchType
from fastapi.responses import JSONResponse
from cognee.modules.users.models import User
from fastapi import Depends, APIRouter
from cognee.api.DTO import InDTO
from cognee.modules.users.methods import get_authenticated_user

+class SearchPayloadDTO(InDTO):
+    search_type: SearchType
+    query: str
+
def get_search_router():
    router = APIRouter()

-    class SearchPayloadDTO(InDTO):
-        search_type: SearchType
-        query: str
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
class SearchPayloadDTO(InDTO):
search_type: SearchType
query: str
from cognee.api.v1.search import SearchType
from fastapi.responses import JSONResponse
from cognee.modules.users.models import User
from fastapi import Depends, APIRouter
from cognee.api.DTO import InDTO
from cognee.modules.users.methods import get_authenticated_user
class SearchPayloadDTO(InDTO):
search_type: SearchType
query: str
def get_search_router():
router = APIRouter()

Comment on lines 134 to 135
if dataset_data is None:
raise HTTPException(status_code=404, detail=f"Dataset ({dataset_id}) not found.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Correct error message when no data is found in the dataset.

At lines 134-135, when dataset_data is None, you raise an HTTPException indicating that the dataset was not found. However, the dataset was successfully retrieved earlier; it's the data within the dataset that's missing. This could lead to confusion for the user.

Apply this diff to provide an accurate error message:

         if dataset_data is None:
-            raise HTTPException(status_code=404, detail=f"Dataset ({dataset_id}) not found.")
+            raise HTTPException(status_code=404, detail=f"No data found in dataset ({dataset_id}).")

This change accurately reflects the issue, informing the user that the dataset exists but contains no data.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if dataset_data is None:
raise HTTPException(status_code=404, detail=f"Dataset ({dataset_id}) not found.")
if dataset_data is None:
raise HTTPException(status_code=404, detail=f"No data found in dataset ({dataset_id}).")

Comment on lines 137 to 139
data = [data for data in dataset_data if str(data.id) == data_id][0]

if data is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Handle potential IndexError when accessing data by ID.

At lines 137-139, you attempt to access the first element of a list comprehension without checking if the list is empty. If no matching data is found, this will raise an IndexError, resulting in an unhandled exception.

Apply this diff to safely handle the case where the data is not found:

-        data = [data for data in dataset_data if str(data.id) == data_id][0]
-
-        if data is None:
+        matching_data = [data for data in dataset_data if str(data.id) == data_id]
+        if not matching_data:
+            raise HTTPException(
+                status_code=404,
+                detail=f"Data ({data_id}) not found in dataset ({dataset_id})."
+            )
+        data = matching_data[0]

This modification checks whether any matching data exists before attempting to access it, preventing a potential IndexError and providing a clear error message to the user if the data is not found.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
data = [data for data in dataset_data if str(data.id) == data_id][0]
if data is None:
matching_data = [data for data in dataset_data if str(data.id) == data_id]
if not matching_data:
raise HTTPException(
status_code=404,
detail=f"Data ({data_id}) not found in dataset ({dataset_id})."
)
data = matching_data[0]

Comment on lines +92 to +95
status_code=404,
content=ErrorResponseDTO(f"Dataset ({dataset_id}) not found."),
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Ensure consistent error handling and proper serialization of responses.

At lines 92-95, you're returning a JSONResponse with content set to an instance of ErrorResponseDTO. Directly passing a Pydantic model instance to content may lead to improper serialization since JSONResponse expects a serializable data type like a dictionary. Additionally, this approach is inconsistent with error handling in other parts of your code where HTTPException is raised.

Apply this diff to raise an HTTPException for consistency:

             if dataset is None:
-                return JSONResponse(
-                    status_code=404,
-                    content=ErrorResponseDTO(f"Dataset ({dataset_id}) not found."),
-                )
+                raise HTTPException(
+                    status_code=404,
+                    detail=f"Dataset ({dataset_id}) not found."
+                )

This change leverages FastAPI's exception handling mechanisms and ensures consistent error responses across your API.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
status_code=404,
content=ErrorResponseDTO(f"Dataset ({dataset_id}) not found."),
)
raise HTTPException(
status_code=404,
detail=f"Dataset ({dataset_id}) not found."
)

Comment on lines +113 to +116
return JSONResponse(
status_code=409,
content={"error": str(error)}
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Handle exceptions more precisely and raise appropriate HTTP exceptions.

At lines 113-116, you catch all exceptions and return a JSONResponse with a 409 status code. Catching all exceptions can mask unexpected errors and make debugging more challenging. It's better to catch specific exceptions and raise an HTTPException to provide more informative error responses.

Apply this diff to improve exception handling:

             try:
                 datasets_statuses = await cognee_datasets.get_status(datasets)

                 return datasets_statuses
-            except Exception as error:
-                return JSONResponse(
-                    status_code=409,
-                    content={"error": str(error)}
-                )
+            except SpecificException as error:
+                logger.error(f"Error retrieving dataset statuses: {str(error)}")
+                raise HTTPException(
+                    status_code=500,
+                    detail=f"Error retrieving dataset statuses: {str(error)}"
+                ) from error

Replace SpecificException with the specific exception(s) you anticipate, such as DatabaseError, ConnectionError, or other relevant exceptions. This change enhances error transparency and maintains consistency.

Committable suggestion was skipped due to low confidence.

status_code=200,
content=str(graph_url),
)
except:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Avoid using bare except statements.

At line 68, the code uses a bare except: clause. Using bare except statements is discouraged because it catches all exceptions, including system-exiting exceptions like KeyboardInterrupt and SystemExit, making debugging difficult and potentially masking unexpected errors. It's better to catch specific exceptions to handle anticipated error conditions.

Apply this diff to catch specific exceptions:

-            except:
+            except SpecificException as error:

Replace SpecificException with the specific exception(s) you expect, such as ConnectionError, RuntimeError, or any other relevant exceptions that might be raised in the try block.

Committable suggestion was skipped due to low confidence.

🧰 Tools
🪛 Ruff

68-68: Do not use bare except

(E722)

Add ability to delete data instance through endpoint

Feature #COG-455
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Outside diff range and nitpick comments (6)
cognee/modules/data/methods/delete_dataset.py (1)

Line range hint 1-7: Consider adding error handling and logging.

The function would benefit from proper error handling and logging to track deletion operations and handle failures gracefully.

Consider implementing the following improvements:

 from cognee.modules.data.models import Dataset
 from cognee.infrastructure.databases.relational import get_relational_engine
+from logging import getLogger
+
+logger = getLogger(__name__)
 
 async def delete_dataset(dataset: Dataset):
     db_engine = get_relational_engine()
+    try:
+        logger.info(f"Deleting dataset with id: {dataset.id}")
+        result = await db_engine.delete_data_by_id(dataset.__tablename__, dataset.id)
+        logger.info(f"Successfully deleted dataset with id: {dataset.id}")
+        return result
+    except Exception as e:
+        logger.error(f"Failed to delete dataset with id: {dataset.id}. Error: {str(e)}")
+        raise
-    return await db_engine.delete_data_by_id(dataset.__tablename__, dataset.id)
cognee/modules/data/methods/delete_data.py (2)

1-4: Remove extra blank line

There's an unnecessary extra blank line between imports and the function definition. One blank line is sufficient according to PEP 8.

 from cognee.modules.data.models import Data
 from cognee.infrastructure.databases.relational import get_relational_engine
 
-
 async def delete_data(data: Data):

5-8: Consider adding audit logging

For data deletion operations, it's important to maintain an audit trail. Consider implementing logging to track:

  • Who initiated the deletion
  • When it occurred
  • What data was deleted
  • The reason for deletion (if applicable)

This will help with compliance requirements and debugging.

cognee/modules/data/methods/get_data.py (1)

9-10: Consider adding pagination for large datasets.

If this function might be used to retrieve large data objects, consider implementing pagination or streaming to prevent memory issues.

cognee/api/v1/datasets/routers/get_datasets_router.py (2)

41-54: Consider adding a success response.

The endpoint correctly handles authentication and errors, but consider returning a success message to improve API usability.

     await delete_dataset(dataset)
+    return JSONResponse(
+        status_code=200,
+        content={"message": f"Dataset {dataset_id} successfully deleted"}
+    )

63-63: Address the TODO comment regarding permission handling.

The comment indicates a need to handle permission checks differently. Consider implementing role-based access control (RBAC) for more granular permissions.

Would you like me to help design and implement a proper permission handling system?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 02d6750 and eca1b9f.

📒 Files selected for processing (5)
  • cognee/api/v1/datasets/routers/get_datasets_router.py (1 hunks)
  • cognee/modules/data/methods/init.py (1 hunks)
  • cognee/modules/data/methods/delete_data.py (1 hunks)
  • cognee/modules/data/methods/delete_dataset.py (1 hunks)
  • cognee/modules/data/methods/get_data.py (1 hunks)
🧰 Additional context used
🪛 Ruff
cognee/api/v1/datasets/routers/get_datasets_router.py

93-93: Do not use bare except

(E722)

🔇 Additional comments (9)
cognee/modules/data/methods/__init__.py (3)

9-9: Verify get_data function signature consistency.

Let's check the implementation of get_data to ensure it follows the same pattern as other getter functions:

#!/bin/bash
# Description: Check get_data implementation pattern
# Expected: Function should accept data_id parameter

ast-grep --pattern 'def get_data($$$):'

13-13: Verify delete_data function signature consistency.

Let's check the implementation of delete_data to ensure it follows the same pattern as other deletion functions:

#!/bin/bash
# Description: Check delete_data implementation pattern
# Expected: Function should accept similar parameters as delete_dataset

ast-grep --pattern 'def delete_data($$$):'

9-9: LGTM! Clean import organization.

The new imports maintain the existing organizational structure, with clear categorization into "Get" and "Delete" sections.

Let's verify the existence of the newly imported modules:

Also applies to: 13-13

✅ Verification successful

Import verification successful

The imported modules get_data.py and delete_data.py exist in the correct location with proper implementations:

  • get_data.py contains the async function get_data(user_id: UUID, data_id: UUID)
  • delete_data.py contains the async function delete_data(data: Data)

The imports follow the established pattern in the module, grouped with other similar data operations (get/delete).

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the imported modules exist
# Expected: Both files should exist in the same directory

# Check for the existence of the new modules
fd -t f "get_data.py|delete_data.py" cognee/modules/data/methods/

Length of output: 138


Script:

#!/bin/bash
# Let's try a different approach to verify the existence of these modules
# First, check if the files exist
ls -la cognee/modules/data/methods/get_data.py cognee/modules/data/methods/delete_data.py 2>/dev/null

# Also search for any Python files in the methods directory to understand the structure
ls -la cognee/modules/data/methods/*.py 2>/dev/null

# Use ripgrep to find any references to these functions
rg "get_data|delete_data" cognee/modules/data/methods/

Length of output: 3120

cognee/modules/data/methods/get_data.py (2)

1-4: LGTM! Clean and well-organized imports.

The imports are properly organized and include all necessary dependencies.


6-12: Verify database schema and indexes.

Ensure proper indexing exists on the data_id column for optimal query performance.

cognee/api/v1/datasets/routers/get_datasets_router.py (4)

1-19: LGTM: Imports and router setup are well-organized.

The imports are properly organized and the router setup follows FastAPI best practices.


30-40: LGTM: Proper error handling in get_datasets endpoint.

The endpoint correctly handles authentication and errors, with appropriate logging.


72-76: ⚠️ Potential issue

Fix incorrect error message in data not found case.

The error message incorrectly refers to "Dataset" when it should refer to "Data".

         if data is None:
             raise HTTPException(
                 status_code=404,
-                detail=f"Dataset ({data_id}) not found."
+                detail=f"Data ({data_id}) not found."
             )

Likely invalid or redundant comment.


162-170: ⚠️ Potential issue

Fix unsafe list indexing and maintain consistent error handling.

The current implementation could raise an IndexError if no matching data is found. Also, maintain consistent error handling pattern.

-        data = [data for data in dataset_data if str(data.id) == data_id][0]
-
-        if data is None:
-            return JSONResponse(
-                status_code=404,
-                content={
-                    "detail": f"Data ({data_id}) not found in dataset ({dataset_id})."
-                }
-            )
+        matching_data = [data for data in dataset_data if str(data.id) == data_id]
+        if not matching_data:
+            raise HTTPException(
+                status_code=404,
+                detail=f"Data ({data_id}) not found in dataset ({dataset_id})."
+            )
+        data = matching_data[0]

Likely invalid or redundant comment.

db_engine = get_relational_engine()

return await db_engine.delete_table(dataset.id)
return await db_engine.delete_data_by_id(dataset.__tablename__, dataset.id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Warning: Related data cleanup needed for dataset deletion

The Dataset model has a many-to-many relationship with Data through the DatasetData join table. The current deletion method only removes the dataset record but doesn't handle:

  • Associated records in the dataset_data join table
  • The relationship with Data model through the secondary table

Action items:

  • Add cleanup of related records in the dataset_data table
  • Consider implementing cascading delete in the Dataset model's relationship definition
  • Add transaction handling to ensure atomicity of the deletion operation
🔗 Analysis chain

Improved data safety with granular deletion.

The change from deleting an entire table to deleting a specific record is a significant improvement in data safety and granularity.

Let's verify if there are any cascading deletes or related data that needs cleanup:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for foreign key relationships to Dataset table
rg -l "ForeignKey.*Dataset" 

# Search for direct references to Dataset model
ast-grep --pattern 'class $_ {
  $$$
  Dataset $$$
  $$$
}'

Length of output: 91


Script:

#!/bin/bash
# Let's try different approaches to find relationships

# Search for Dataset model definition
rg -l "class Dataset"

# Search for any references to Dataset in models
rg -l "Dataset" --type py

# Look for table relationships or foreign keys with different patterns
rg -l "relationship.*Dataset|references.*dataset" --type py

# Search for any SQL or database operations mentioning dataset
rg -l "dataset.*foreign|cascade.*dataset" --type py -i

Length of output: 976


Script:

#!/bin/bash
# Let's examine the DatasetData model since it appears to have relationships
cat cognee/modules/data/models/DatasetData.py

# Also check the Dataset model definition
cat cognee/modules/data/models/Dataset.py

# Look for any cascade delete configurations
rg "cascade.*delete" --type py -A 2 -B 2

Length of output: 1784

Comment on lines 5 to 8
async def delete_data(data: Data):
db_engine = get_relational_engine()

return await db_engine.delete_data_by_id(data.__tablename__, data.id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add error handling, type hints, and documentation

The function needs several improvements for production readiness:

  1. Add return type hint and comprehensive docstring
  2. Implement error handling for database operations
  3. Consider using transaction context
  4. Validate table name to prevent SQL injection

Here's the suggested implementation:

-async def delete_data(data: Data):
-    db_engine = get_relational_engine()
-
-    return await db_engine.delete_data_by_id(data.__tablename__, data.id)
+from typing import Optional
+from sqlalchemy.exc import SQLAlchemyError
+
+async def delete_data(data: Data) -> Optional[bool]:
+    """Delete a data record from the database.
+
+    Args:
+        data (Data): The data object to be deleted.
+
+    Returns:
+        Optional[bool]: True if deletion was successful, False if record not found,
+                       None if operation failed.
+
+    Raises:
+        ValueError: If the data object is invalid.
+    """
+    if not data or not data.id:
+        raise ValueError("Invalid data object")
+
+    db_engine = get_relational_engine()
+    try:
+        async with db_engine.begin() as transaction:
+            # Validate table name against known models to prevent SQL injection
+            if not hasattr(data, '__tablename__') or not isinstance(data.__tablename__, str):
+                raise ValueError("Invalid table name")
+
+            return await db_engine.delete_data_by_id(data.__tablename__, data.id)
+    except SQLAlchemyError as e:
+        # Log the error here
+        return None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def delete_data(data: Data):
db_engine = get_relational_engine()
return await db_engine.delete_data_by_id(data.__tablename__, data.id)
from typing import Optional
from sqlalchemy.exc import SQLAlchemyError
async def delete_data(data: Data) -> Optional[bool]:
"""Delete a data record from the database.
Args:
data (Data): The data object to be deleted.
Returns:
Optional[bool]: True if deletion was successful, False if record not found,
None if operation failed.
Raises:
ValueError: If the data object is invalid.
"""
if not data or not data.id:
raise ValueError("Invalid data object")
db_engine = get_relational_engine()
try:
async with db_engine.begin() as transaction:
# Validate table name against known models to prevent SQL injection
if not hasattr(data, '__tablename__') or not isinstance(data.__tablename__, str):
raise ValueError("Invalid table name")
return await db_engine.delete_data_by_id(data.__tablename__, data.id)
except SQLAlchemyError as e:
# Log the error here
return None

Comment on lines 6 to 12
async def get_data(user_id: UUID, data_id: UUID) -> Optional[Data]:
db_engine = get_relational_engine()

async with db_engine.get_async_session() as session:
data = await session.get(Data, data_id)

return data No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add authorization check and error handling.

Several improvements are needed for production readiness:

  1. The user_id parameter is not used for authorization checks
  2. Missing error handling for database operations
  3. Missing function documentation

Here's a suggested implementation:

 async def get_data(user_id: UUID, data_id: UUID) -> Optional[Data]:
+    """Retrieve data by ID with user authorization check.
+
+    Args:
+        user_id (UUID): ID of the user requesting the data
+        data_id (UUID): ID of the data to retrieve
+
+    Returns:
+        Optional[Data]: The requested data object if found and authorized, None otherwise
+
+    Raises:
+        PermissionError: If user is not authorized to access the data
+        SQLAlchemyError: If database operation fails
+    """
     db_engine = get_relational_engine()

     async with db_engine.get_async_session() as session:
-        data = await session.get(Data, data_id)
-
-        return data
+        try:
+            data = await session.get(Data, data_id)
+            if data is None:
+                return None
+            
+            # Add authorization check
+            if data.user_id != user_id:
+                raise PermissionError("User not authorized to access this data")
+            
+            return data
+        except SQLAlchemyError as e:
+            # Log the error here
+            raise

Committable suggestion was skipped due to low confidence.

Comment on lines 20 to 29
class ErrorResponseDTO(BaseModel):
message: str

class DatasetDTO(OutDTO):
id: UUID
name: str
created_at: datetime
updated_at: Optional[datetime]
owner_id: UUID

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Group DTO classes together for better organization.

Consider moving the DataDTO class definition next to the other DTO classes at the top of the file for better code organization and maintainability.

 class DatasetDTO(OutDTO):
     id: UUID
     name: str
     created_at: datetime
     updated_at: Optional[datetime]
     owner_id: UUID

+class DataDTO(OutDTO):
+    id: UUID
+    name: str
+    created_at: datetime
+    updated_at: Optional[datetime]
+    extension: str
+    mime_type: str
+    raw_data_location: str
+
 @router.get("/", response_model=list[DatasetDTO])

Also applies to: 99-107

Comment on lines +137 to +141
except Exception as error:
return JSONResponse(
status_code=409,
content={"error": str(error)}
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Maintain consistent error handling across endpoints.

For consistency with other endpoints, use HTTPException instead of JSONResponse for error handling.

-            return JSONResponse(
-                status_code=409,
-                content={"error": str(error)}
-            )
+            logger.error(f"Error retrieving dataset status: {str(error)}")
+            raise HTTPException(
+                status_code=409,
+                detail=str(error)
+            ) from error
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
except Exception as error:
return JSONResponse(
status_code=409,
content={"error": str(error)}
)
except Exception as error:
logger.error(f"Error retrieving dataset status: {str(error)}")
raise HTTPException(
status_code=409,
detail=str(error)
) from error

Comment on lines +93 to +97
except:
return JSONResponse(
status_code=409,
content="Graphistry credentials are not set. Please set them in your .env file.",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Replace bare except with specific exception handling.

Using bare except clauses can mask important errors and make debugging difficult.

             return JSONResponse(
                 status_code=200,
                 content=str(graph_url),
             )
-        except:
+        except (ConnectionError, ValueError) as error:
+            logger.error(f"Error generating graph: {str(error)}")
             return JSONResponse(
                 status_code=409,
                 content="Graphistry credentials are not set. Please set them in your .env file.",
             )

Committable suggestion was skipped due to low confidence.

🧰 Tools
🪛 Ruff

93-93: Do not use bare except

(E722)

Added cascade deletion so when a dataset or data is deleted the connection in the dataset_data table is also deleted

Feature #COG-455
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
cognee/modules/data/models/Data.py (1)

Line range hint 37-38: Inconsistency in datasets serialization.

The commented-out datasets serialization in to_json() suggests potential inconsistency with the lazy="noload" setting. If datasets are needed in the JSON representation, you should either:

  1. Remove the lazy="noload" setting, or
  2. Implement explicit loading when datasets are needed

Consider one of these approaches:

- # "datasets": [dataset.to_json() for dataset in self.datasets]
+ "datasets": [dataset.to_json() for dataset in self.datasets] if self.datasets else []

Or if datasets should never be included:

- # "datasets": [dataset.to_json() for dataset in self.datasets]
🧰 Tools
🪛 Ruff

22-22: Undefined name Dataset

(F821)

cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)

96-98: Consider moving SQLite PRAGMA to session initialization.

The foreign key PRAGMA setting should ideally be set when initializing the SQLite connection/session to ensure consistent behavior across all operations.

Consider moving this to the session creation in get_async_session method:

@asynccontextmanager
async def get_async_session(self) -> AsyncGenerator[AsyncSession, None]:
    async_session_maker = self.sessionmaker
    async with async_session_maker() as session:
        try:
            if self.engine.dialect.name == "sqlite":
                await session.execute(text("PRAGMA foreign_keys = ON;"))
            yield session
        finally:
            await session.close()
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between eca1b9f and de1ba5c.

📒 Files selected for processing (4)
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1 hunks)
  • cognee/modules/data/models/Data.py (1 hunks)
  • cognee/modules/data/models/Dataset.py (1 hunks)
  • cognee/modules/data/models/DatasetData.py (1 hunks)
🔇 Additional comments (6)
cognee/modules/data/models/DatasetData.py (2)

10-11: LGTM! The CASCADE delete behavior is properly implemented.

The addition of ondelete="CASCADE" to both foreign key constraints ensures proper cleanup of junction table records when either datasets or data are deleted, maintaining referential integrity.


10-11: Verify the cascading delete behavior impact.

The addition of CASCADE delete behavior is a significant change that could affect data retention and recovery. Please ensure:

  1. The application code properly handles cascading deletes
  2. There are no audit/compliance requirements for retaining relationship data
  3. Users are aware that deleting datasets or data will remove all associated relationships

Let's verify the cascade configuration across related models:

✅ Verification successful

Let me gather more information about the relationships and cascade configurations:


Cascade delete configuration is appropriate and safe

The verification shows that:

  1. The cascade delete is properly configured in a many-to-many relationship between Dataset and Data models through the DatasetData junction table
  2. Both Dataset and Data models maintain their relationships using the same junction table with appropriate bidirectional references
  3. The application has proper database transaction handling with commit operations
  4. No specific audit requirements or logging mechanisms are tied to deletion operations except for basic error logging
  5. The cascade delete is limited to just the junction table entries, preserving the actual Data and Dataset records unless explicitly deleted

The CASCADE behavior only affects the relationship records in the junction table, which is the expected and safe behavior for many-to-many relationships.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for consistent cascade behavior across related models

# Test 1: Verify cascade configuration in Dataset model
echo "Checking Dataset model configuration..."
rg -A 5 "class Dataset.*Base" 

# Test 2: Verify cascade configuration in Data model
echo "Checking Data model configuration..."
rg -A 5 "class Data.*Base"

# Test 3: Look for any existing delete operations that might need review
echo "Checking delete operations..."
rg -B 2 -A 2 "\.delete\(\)"

Length of output: 4181


Script:

#!/bin/bash
# Check for relationship configurations and cascade behaviors

# Look for relationship definitions and cascade configurations
echo "Checking relationship definitions..."
rg -B 2 -A 2 "relationship\(" 

# Check for any data integrity constraints or hooks
echo "Checking for data integrity hooks..."
rg -B 2 -A 2 "@event\.listens_for.*delete"

# Check for any audit or logging mechanisms
echo "Checking for audit/logging mechanisms..."
rg -B 2 -A 2 "(audit|log|track).*delete"

# Look for any documentation about data retention
echo "Checking documentation..."
fd README.md | xargs rg -i "(retention|cascade|delete)"

Length of output: 4962

cognee/modules/data/models/Dataset.py (2)

22-22: LGTM: Explicit relationship target class name.

Using the explicit string "Data" as the relationship target improves code clarity and helps prevent circular import issues.


22-26: Verify cascading delete behavior implications.

The addition of cascade="all, delete" means that deleting a Dataset will trigger deletion of the relationship in the join table. Please ensure this matches your data lifecycle requirements and consider the following:

  1. Document this behavior in comments for future maintainers
  2. Add appropriate warnings in the deletion endpoints
  3. Consider adding a soft delete mechanism if data recovery might be needed

Let's check for any existing delete operations that might be affected:

cognee/modules/data/models/Data.py (1)

23-27: Verify cascade deletion behavior in many-to-many relationship.

The addition of cascade="all, delete" in a many-to-many relationship requires careful consideration:

  1. This will delete all associated Dataset records when a Data record is deleted
  2. In a many-to-many context, this might have unintended consequences if Datasets should persist independently

Let's verify the relationship configuration across related models:

✅ Verification successful

Let me check the relationship definition in the Dataset model to see if it's properly configured for this many-to-many relationship.


Let me try a different approach to find the relationship definitions using ast-grep, which might be more reliable for finding these SQLAlchemy patterns.


The cascade configuration is correctly implemented for the many-to-many relationship

The cascade behavior is properly configured:

  1. Both Dataset and Data models have matching relationship configurations with cascade="all, delete"
  2. The join table DatasetData has ondelete="CASCADE" on both foreign keys
  3. The relationship is symmetrically defined with correct back_populates references

This setup ensures:

  • Deleting a Data record will only remove the associations in the join table
  • The actual Dataset records will be preserved
  • No orphaned records will be left in the join table
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check relationship definitions in related models
echo "Checking Dataset model relationship configuration:"
rg -A 10 "class Dataset" cognee/modules/data/models/
echo "Checking DatasetData model configuration:"
rg -A 10 "class DatasetData" cognee/modules/data/models/

Length of output: 2394


Script:

#!/bin/bash
# Check relationship definitions in Dataset model
echo "Checking Dataset model relationships:"
rg -B 5 -A 15 "data = relationship" cognee/modules/data/models/Dataset.py

# Check Data model relationships for complete picture
echo -e "\nChecking Data model relationships:"
rg -B 5 -A 15 "datasets = relationship" cognee/modules/data/models/Data.py

Length of output: 322


Script:

#!/bin/bash
# Check relationship definitions in both models
echo "Checking Dataset model for relationship definitions:"
ast-grep --pattern 'relationship($$$)' cognee/modules/data/models/Dataset.py

echo -e "\nChecking Data model for relationship definitions:"
ast-grep --pattern 'relationship($$$)' cognee/modules/data/models/Data.py

# Also check the complete Dataset model class
echo -e "\nChecking complete Dataset model:"
cat cognee/modules/data/models/Dataset.py

Length of output: 2715

cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)

99-101: Verify cascade delete behavior across different dialects.

The delete operation might behave differently between SQLite and other dialects when dealing with related records. Ensure that the cascade delete behavior is consistent across different database backends.

Let's verify the foreign key relationships and cascade behavior:

Also applies to: 105-106

✅ Verification successful

Based on the search results, I can now generate the final response since I have enough information about the cascade delete behavior across different dialects:

Cascade delete behavior is properly handled across dialects

The codebase already implements proper cascade delete behavior:

  • For SQLite: Foreign key constraints with ON DELETE CASCADE are defined during table creation, as noted in the comments.
  • For other dialects (e.g., PostgreSQL): The CASCADE keyword is explicitly used in drop operations.
  • The DatasetData join table properly defines cascade delete with ondelete="CASCADE" in both foreign key constraints.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for foreign key relationships and cascade delete configurations

# Test 1: Search for foreign key definitions
echo "Checking foreign key definitions..."
rg -A 5 "ForeignKey|relationship" 

# Test 2: Search for cascade delete configurations
echo "Checking cascade delete configurations..."
rg -A 5 "ondelete|CASCADE"

Length of output: 58371

Comment on lines +92 to +107
if self.engine.dialect.name == "sqlite":
async with self.get_async_session() as session:
TableModel = await self.get_table(table_name, schema_name)

# Foreign key constraints are disabled by default in SQLite (for backwards compatibility),
# so must be enabled for each database connection/session separately.
await session.execute(text("PRAGMA foreign_keys = ON;"))

await session.execute(TableModel.delete().where(TableModel.c.id == data_id))
await session.commit()
else:
async with self.get_async_session() as session:
TableModel = await self.get_table(table_name, schema_name)
await session.execute(TableModel.delete().where(TableModel.c.id == data_id))
await session.commit()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Refactor to reduce code duplication and improve error handling.

The current implementation has duplicated session management and delete logic. Consider refactoring for better maintainability.

Here's a suggested implementation:

async def delete_data_by_id(self, table_name: str, data_id: UUID, schema_name: Optional[str] = "public"):
    """
    Delete data in given table based on id. Table must have an id Column.
    """
    try:
        async with self.get_async_session() as session:
            TableModel = await self.get_table(table_name, schema_name)
            
            if self.engine.dialect.name == "sqlite":
                # Foreign key constraints are disabled by default in SQLite
                await session.execute(text("PRAGMA foreign_keys = ON;"))
            
            await session.execute(TableModel.delete().where(TableModel.c.id == data_id))
            await session.commit()
    except Exception as e:
        # Log the error and re-raise with a more specific message
        raise Exception(f"Failed to delete data with id {data_id} from table {table_name}: {str(e)}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if self.engine.dialect.name == "sqlite":
async with self.get_async_session() as session:
TableModel = await self.get_table(table_name, schema_name)
# Foreign key constraints are disabled by default in SQLite (for backwards compatibility),
# so must be enabled for each database connection/session separately.
await session.execute(text("PRAGMA foreign_keys = ON;"))
await session.execute(TableModel.delete().where(TableModel.c.id == data_id))
await session.commit()
else:
async with self.get_async_session() as session:
TableModel = await self.get_table(table_name, schema_name)
await session.execute(TableModel.delete().where(TableModel.c.id == data_id))
await session.commit()
try:
async with self.get_async_session() as session:
TableModel = await self.get_table(table_name, schema_name)
if self.engine.dialect.name == "sqlite":
# Foreign key constraints are disabled by default in SQLite
await session.execute(text("PRAGMA foreign_keys = ON;"))
await session.execute(TableModel.delete().where(TableModel.c.id == data_id))
await session.commit()
except Exception as e:
# Log the error and re-raise with a more specific message
raise Exception(f"Failed to delete data with id {data_id} from table {table_name}: {str(e)}")

@dexters1 dexters1 self-assigned this Nov 1, 2024
Removed space to trigger github actions to test telemetry

Test

# Foreign key constraints are disabled by default in SQLite (for backwards compatibility),
# so must be enabled for each database connection/session separately.
await session.execute(text("PRAGMA foreign_keys = ON;"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we do this here? I would expect this would happen during migration/startup?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It must be set for each new connection/session. It doesn't work if it's only done during migration/startup, I've tried.

Though some people use event listeners to execute the PRAGMA command for each new session/connection for this reason.

Resolved security concerns in endpoints regarding os

Fix #COG-334-structure-routing
Updated docs to functions related to getting and deleting Data model objects

Docs #COG-334
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Outside diff range and nitpick comments (8)
cognee/modules/data/methods/get_dataset.py (2)

Line range hint 6-14: Good security check, but consider enhancing error handling.

The ownership validation is a good security measure. However, the function might benefit from more robust error handling.

Consider wrapping the database operations in a try-except block:

 async def get_dataset(user_id: UUID, dataset_id: UUID) -> Optional[Dataset]:
     db_engine = get_relational_engine()
 
     async with db_engine.get_async_session() as session:
+        try:
             dataset = await session.get(Dataset, dataset_id)
 
             if dataset and dataset.owner_id != user_id:
                 return None
 
             return dataset
+        except Exception as e:
+            # Handle specific exceptions (e.g., DatabaseError)
+            raise RuntimeError(f"Failed to retrieve dataset: {str(e)}")

Line range hint 6-14: Add logging for security auditing and debugging.

Consider adding logging to track access attempts and authorization failures.

Here's a suggested implementation:

+import logging
+
+logger = logging.getLogger(__name__)
+
 async def get_dataset(user_id: UUID, dataset_id: UUID) -> Optional[Dataset]:
     db_engine = get_relational_engine()
 
     async with db_engine.get_async_session() as session:
         dataset = await session.get(Dataset, dataset_id)
 
         if dataset and dataset.owner_id != user_id:
+            logger.warning(
+                f"Unauthorized access attempt to dataset {dataset_id} by user {user_id}"
+            )
             return None
 
+        if not dataset:
+            logger.info(f"Dataset {dataset_id} not found")
+        else:
+            logger.debug(f"Dataset {dataset_id} successfully retrieved by owner {user_id}")
         return dataset
cognee/modules/data/methods/get_data.py (1)

6-14: Documentation needs security and error handling details.

While the basic documentation is present, it should include information about error scenarios and security implications.

 async def get_data(data_id: UUID) -> Optional[Data]:
     """Retrieve data by ID.
 
         Args:
             data_id (UUID): ID of the data to retrieve
 
         Returns:
             Optional[Data]: The requested data object if found, None otherwise
+
+        Raises:
+            SQLAlchemyError: If database operation fails
+
+        Note:
+            This function provides no authorization checks. Ensure access control
+            is handled at the API route level.
     """
cognee/api/v1/settings/routers/get_settings_router.py (2)

1-8: Consider reorganizing imports for better readability.

Group imports in the following order:

  1. Standard library (typing)
  2. Third-party packages (fastapi)
  3. Local application imports (cognee.*)

Apply this diff:

+from typing import Union, Optional, Literal
+
 from fastapi import APIRouter, Depends
-from cognee.api.DTO import InDTO, OutDTO
-from typing import Union, Optional, Literal
+
+from cognee.api.DTO import InDTO, OutDTO
 from cognee.modules.users.methods import get_authenticated_user
-from fastapi import Depends
 from cognee.modules.users.models import User
 from cognee.modules.settings.get_settings import LLMConfig, VectorDBConfig

19-23: Simplify provider type annotations.

The Union[Literal] syntax is unnecessarily verbose. Literal can directly accept multiple values.

Apply this diff:

 class LLMConfigInputDTO(InDTO):
-    provider: Union[Literal["openai"], Literal["ollama"], Literal["anthropic"]]
+    provider: Literal["openai", "ollama", "anthropic"]
     model: str
     api_key: str

 class VectorDBConfigInputDTO(InDTO):
-    provider: Union[Literal["lancedb"], Literal["qdrant"], Literal["weaviate"], Literal["pgvector"]]
+    provider: Literal["lancedb", "qdrant", "weaviate", "pgvector"]
     url: str
     api_key: str

Also applies to: 24-27

cognee/api/v1/datasets/routers/get_datasets_router.py (3)

51-63: Add success response for dataset deletion.

Consider returning a success status code (204 No Content) after successful deletion to follow REST conventions.

         await delete_dataset(dataset)
+        return JSONResponse(status_code=204)

73-73: Address TODO comment regarding permission handling.

The TODO comment indicates a need to handle permission access differently. Consider implementing role-based access control (RBAC) or specific permission checks.

Would you like me to help design a permission handling system or create a GitHub issue to track this task?


163-174: Simplify data lookup logic.

The current implementation using list comprehension and length check can be simplified.

-        matching_data = [data for data in dataset_data if str(data.id) == data_id]
-
-        # Check if matching_data contains an element
-        if len(matching_data) == 0:
-            return JSONResponse(
-                status_code=404,
-                content={
-                    "detail": f"Data ({data_id}) not found in dataset ({dataset_id})."
-                }
-            )
-
-        data = matching_data[0]
+        data = next((data for data in dataset_data if str(data.id) == data_id), None)
+        if data is None:
+            raise HTTPException(
+                status_code=404,
+                detail=f"Data ({data_id}) not found in dataset ({dataset_id})."
+            )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d5a220e and 7425409.

📒 Files selected for processing (8)
  • cognee/api/v1/add/routers/get_add_router.py (1 hunks)
  • cognee/api/v1/cognify/routers/get_cognify_router.py (1 hunks)
  • cognee/api/v1/datasets/routers/get_datasets_router.py (1 hunks)
  • cognee/api/v1/search/routers/get_search_router.py (1 hunks)
  • cognee/api/v1/settings/routers/get_settings_router.py (1 hunks)
  • cognee/modules/data/methods/delete_data.py (1 hunks)
  • cognee/modules/data/methods/get_data.py (1 hunks)
  • cognee/modules/data/methods/get_dataset.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • cognee/api/v1/add/routers/get_add_router.py
  • cognee/api/v1/cognify/routers/get_cognify_router.py
  • cognee/api/v1/search/routers/get_search_router.py
🧰 Additional context used
🪛 Ruff
cognee/api/v1/datasets/routers/get_datasets_router.py

103-103: Do not use bare except

(E722)

🔇 Additional comments (10)
cognee/modules/data/methods/get_dataset.py (1)

1-4: LGTM! Clean and well-organized imports.

The imports are properly organized and the addition of Optional type is appropriate for the new return type.

cognee/modules/data/methods/get_data.py (1)

6-20: ⚠️ Potential issue

Critical: Add error handling and consider security implications.

The current implementation has several critical issues:

  1. No error handling for database operations
  2. No user authorization check (previously present but now removed)
  3. No proper session cleanup in case of errors

Here's the suggested implementation:

-async def get_data(data_id: UUID) -> Optional[Data]:
+async def get_data(data_id: UUID, user_id: UUID = None) -> Optional[Data]:
     """Retrieve data by ID.
     
         Args:
             data_id (UUID): ID of the data to retrieve
+            user_id (UUID, optional): ID of the user requesting the data
     
         Returns:
             Optional[Data]: The requested data object if found, None otherwise
+
+        Raises:
+            SQLAlchemyError: If database operation fails
+
+        Note:
+            This function provides no authorization checks. Ensure access control
+            is handled at the API route level.
     """
     db_engine = get_relational_engine()

     async with db_engine.get_async_session() as session:
-        data = await session.get(Data, data_id)
-
-        return data
+        try:
+            data = await session.get(Data, data_id)
+            if data is None:
+                return None
+            
+            # If user_id is provided, perform authorization check
+            if user_id is not None and data.user_id != user_id:
+                return None
+            
+            return data
+        except SQLAlchemyError as e:
+            # Log the error here if you have a logging system
+            raise

Let's verify if this data access is properly secured at the API route level:

cognee/modules/data/methods/delete_data.py (2)

1-4: LGTM!

The imports are clean and well-organized. The empty line provides good visual separation.


17-19: Verify database engine initialization.

The database engine initialization could fail if the configuration is incorrect.

Let's verify the database engine initialization:

✅ Verification successful

Let me generate additional scripts to check the error handling and configuration:


Let me check one more aspect to understand the database engine implementation:


Let me check one final aspect about the database engine implementation:


Database engine initialization is properly handled

Based on the codebase analysis:

  • The delete_data_by_id operation is implemented with proper session management
  • Database engine is consistently used across the codebase with proper session handling
  • The engine implementation includes dialect-specific handling (e.g., SQLite support)
  • The operation is used within a well-structured data deletion workflow
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check database engine initialization and usage
# Expected: Find database configuration and error handling

# Check for database configuration
echo "Checking database configuration..."
rg -l "DATABASE|DB_" --type yaml --type env

# Check for database engine error handling
echo "Checking database engine error handling..."
ast-grep --pattern 'get_relational_engine()' -A 5

Length of output: 12910


Script:

#!/bin/bash
# Check the implementation of get_relational_engine and its error handling
ast-grep --pattern 'def get_relational_engine'

# Check for try-catch blocks around database operations
rg -A 3 "try:.*get_relational_engine" 

# Look for the engine configuration
rg -l "class.*Engine" 

Length of output: 600


Script:

#!/bin/bash
# Check the database engine implementation
rg -A 10 "class.*RelationalEngine|class.*DatabaseEngine"

# Check for database configuration files
fd -e yaml -e ini -e conf -e config -x cat {} \; 

Length of output: 6058


Script:

#!/bin/bash
# Check the database engine implementation in infrastructure
rg -A 5 "class RelationalEngine" cognee/infrastructure/
rg -A 5 "def delete_data_by_id" cognee/infrastructure/

Length of output: 885

cognee/api/v1/settings/routers/get_settings_router.py (1)

9-18: LGTM! Well-structured output DTOs.

The output DTOs are correctly defined with proper inheritance and composition.

cognee/api/v1/datasets/routers/get_datasets_router.py (5)

1-36: LGTM! Well-structured DTOs and imports.

The imports are logically organized, and the DTO classes follow a consistent pattern with proper inheritance from base classes.


40-49: LGTM! Well-implemented get_datasets endpoint.

Good error handling with proper logging and HTTP status codes.


82-86: ⚠️ Potential issue

Fix incorrect error message in data not found case.

The error message incorrectly refers to "Dataset" when it should refer to "Data".

         if data is None:
             raise HTTPException(
                 status_code=404,
-                detail=f"Dataset ({data_id}) not found."
+                detail=f"Data ({data_id}) not found."
             )

Likely invalid or redundant comment.


138-142: 🛠️ Refactor suggestion

Use consistent error handling with HTTPException.

Maintain consistency with other endpoints by using HTTPException for error handling.

         except Exception as error:
-            return JSONResponse(
-                status_code=409,
-                content={"error": str(error)}
-            )
+            logger.error(f"Error retrieving dataset status: {error}")
+            raise HTTPException(
+                status_code=409,
+                detail=str(error)
+            ) from error

Likely invalid or redundant comment.


150-156: 🛠️ Refactor suggestion

Use consistent error handling with HTTPException.

Maintain consistency by using HTTPException instead of JSONResponse.

         if dataset is None:
-            return JSONResponse(
-                status_code=404,
-                content={
-                    "detail": f"Dataset ({dataset_id}) not found."
-                }
-            )
+            raise HTTPException(
+                status_code=404,
+                detail=f"Dataset ({dataset_id}) not found."
+            )

Likely invalid or redundant comment.

Comment on lines +1 to +4
from uuid import UUID
from typing import Optional
from cognee.infrastructure.databases.relational import get_relational_engine
from ..models import Data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add SQLAlchemy error type import.

The imports look good, but since we'll need to handle database errors, we should add:

 from uuid import UUID
 from typing import Optional
+from sqlalchemy.exc import SQLAlchemyError
 from cognee.infrastructure.databases.relational import get_relational_engine
 from ..models import Data
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from uuid import UUID
from typing import Optional
from cognee.infrastructure.databases.relational import get_relational_engine
from ..models import Data
from uuid import UUID
from typing import Optional
from sqlalchemy.exc import SQLAlchemyError
from cognee.infrastructure.databases.relational import get_relational_engine
from ..models import Data

Comment on lines +5 to +13
async def delete_data(data: Data):
"""Delete a data record from the database.
Args:
data (Data): The data object to be deleted.
Raises:
ValueError: If the data object is invalid.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add return type hint and expand documentation.

The docstring is good but incomplete. The return value and its type should be documented.

Apply this diff:

-async def delete_data(data: Data):
+async def delete_data(data: Data) -> bool:
     """Delete a data record from the database.
 
        Args:
            data (Data): The data object to be deleted.
+
+       Returns:
+           bool: True if deletion was successful, False if record not found.
 
        Raises:
            ValueError: If the data object is invalid.
     """
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def delete_data(data: Data):
"""Delete a data record from the database.
Args:
data (Data): The data object to be deleted.
Raises:
ValueError: If the data object is invalid.
"""
async def delete_data(data: Data) -> bool:
"""Delete a data record from the database.
Args:
data (Data): The data object to be deleted.
Returns:
bool: True if deletion was successful, False if record not found.
Raises:
ValueError: If the data object is invalid.
"""

Comment on lines +14 to +19
if not hasattr(data, '__tablename__'):
raise ValueError("The provided data object is missing the required '__tablename__' attribute.")

db_engine = get_relational_engine()

return await db_engine.delete_data_by_id(data.__tablename__, data.id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Enhance error handling and data validation.

The current implementation needs additional safeguards:

  1. Missing validation for data.id
  2. No error handling for database operations
  3. No transaction context for atomicity

Apply this diff to improve reliability:

     if not hasattr(data, '__tablename__'):
         raise ValueError("The provided data object is missing the required '__tablename__' attribute.")
 
+    if not data.id:
+        raise ValueError("The data object must have a valid id.")
+
     db_engine = get_relational_engine()
 
-    return await db_engine.delete_data_by_id(data.__tablename__, data.id)
+    try:
+        async with db_engine.begin() as conn:
+            return await db_engine.delete_data_by_id(data.__tablename__, data.id)
+    except Exception as e:
+        # Log the error here
+        raise ValueError(f"Failed to delete data: {str(e)}") from e
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if not hasattr(data, '__tablename__'):
raise ValueError("The provided data object is missing the required '__tablename__' attribute.")
db_engine = get_relational_engine()
return await db_engine.delete_data_by_id(data.__tablename__, data.id)
if not hasattr(data, '__tablename__'):
raise ValueError("The provided data object is missing the required '__tablename__' attribute.")
if not data.id:
raise ValueError("The data object must have a valid id.")
db_engine = get_relational_engine()
try:
async with db_engine.begin() as conn:
return await db_engine.delete_data_by_id(data.__tablename__, data.id)
except Exception as e:
# Log the error here
raise ValueError(f"Failed to delete data: {str(e)}") from e

Comment on lines +19 to +31
class LLMConfigInputDTO(InDTO):
provider: Union[Literal["openai"], Literal["ollama"], Literal["anthropic"]]
model: str
api_key: str

class VectorDBConfigInputDTO(InDTO):
provider: Union[Literal["lancedb"], Literal["qdrant"], Literal["weaviate"], Literal["pgvector"]]
url: str
api_key: str

class SettingsPayloadDTO(InDTO):
llm: Optional[LLMConfigInputDTO] = None
vector_db: Optional[VectorDBConfigInputDTO] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add field validation for api_key and url.

Consider adding Pydantic field validators to ensure:

  • api_key meets minimum length/format requirements
  • url is properly formatted

Example implementation:

from pydantic import validator, HttpUrl

class LLMConfigInputDTO(InDTO):
    provider: Literal["openai", "ollama", "anthropic"]
    model: str
    api_key: str

    @validator('api_key')
    def validate_api_key(cls, v):
        if len(v) < 8:
            raise ValueError('API key must be at least 8 characters')
        return v

class VectorDBConfigInputDTO(InDTO):
    provider: Literal["lancedb", "qdrant", "weaviate", "pgvector"]
    url: HttpUrl  # This will automatically validate URL format
    api_key: str

    @validator('api_key')
    def validate_api_key(cls, v):
        if len(v) < 8:
            raise ValueError('API key must be at least 8 characters')
        return v

Comment on lines +36 to +40
@router.get("/", response_model=SettingsDTO)
async def get_settings(user: User = Depends(get_authenticated_user)):
from cognee.modules.settings import get_settings as get_cognee_settings
return get_cognee_settings()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add error handling to GET endpoint.

The endpoint should handle potential errors from get_cognee_settings().

Apply this diff:

 @router.get("/", response_model=SettingsDTO)
 async def get_settings(user: User = Depends(get_authenticated_user)):
+    from fastapi import HTTPException
     from cognee.modules.settings import get_settings as get_cognee_settings
-    return get_cognee_settings()
+    try:
+        return get_cognee_settings()
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=f"Failed to retrieve settings: {str(e)}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@router.get("/", response_model=SettingsDTO)
async def get_settings(user: User = Depends(get_authenticated_user)):
from cognee.modules.settings import get_settings as get_cognee_settings
return get_cognee_settings()
@router.get("/", response_model=SettingsDTO)
async def get_settings(user: User = Depends(get_authenticated_user)):
from fastapi import HTTPException
from cognee.modules.settings import get_settings as get_cognee_settings
try:
return get_cognee_settings()
except Exception as e:
raise HTTPException(status_code=500, detail=f"Failed to retrieve settings: {str(e)}")

Comment on lines +41 to +50
@router.post("/", response_model=None)
async def save_settings(new_settings: SettingsPayloadDTO, user: User = Depends(get_authenticated_user)):
from cognee.modules.settings import save_llm_config, save_vector_db_config

if new_settings.llm is not None:
await save_llm_config(new_settings.llm)

if new_settings.vector_db is not None:
await save_vector_db_config(new_settings.vector_db)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Improve POST endpoint implementation.

The endpoint needs the following improvements:

  1. Return a success response instead of None
  2. Add error handling
  3. Move imports to the top level

Apply this diff:

-@router.post("/", response_model=None)
+from fastapi import HTTPException
+from pydantic import BaseModel
+
+class SettingsResponse(BaseModel):
+    message: str
+
+@router.post("/", response_model=SettingsResponse)
 async def save_settings(new_settings: SettingsPayloadDTO, user: User = Depends(get_authenticated_user)):
-    from cognee.modules.settings import save_llm_config, save_vector_db_config
-
-    if new_settings.llm is not None:
-        await save_llm_config(new_settings.llm)
-
-    if new_settings.vector_db is not None:
-        await save_vector_db_config(new_settings.vector_db)
+    try:
+        if new_settings.llm is not None:
+            await save_llm_config(new_settings.llm)
+
+        if new_settings.vector_db is not None:
+            await save_vector_db_config(new_settings.vector_db)
+
+        return SettingsResponse(message="Settings updated successfully")
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=f"Failed to save settings: {str(e)}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@router.post("/", response_model=None)
async def save_settings(new_settings: SettingsPayloadDTO, user: User = Depends(get_authenticated_user)):
from cognee.modules.settings import save_llm_config, save_vector_db_config
if new_settings.llm is not None:
await save_llm_config(new_settings.llm)
if new_settings.vector_db is not None:
await save_vector_db_config(new_settings.vector_db)
from fastapi import HTTPException
from pydantic import BaseModel
class SettingsResponse(BaseModel):
message: str
@router.post("/", response_model=SettingsResponse)
async def save_settings(new_settings: SettingsPayloadDTO, user: User = Depends(get_authenticated_user)):
try:
if new_settings.llm is not None:
await save_llm_config(new_settings.llm)
if new_settings.vector_db is not None:
await save_vector_db_config(new_settings.vector_db)
return SettingsResponse(message="Settings updated successfully")
except Exception as e:
raise HTTPException(status_code=500, detail=f"Failed to save settings: {str(e)}")

Comment on lines +116 to +120
if dataset is None:
return JSONResponse(
status_code=404,
content=ErrorResponseDTO(f"Dataset ({dataset_id}) not found."),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Maintain consistent error handling pattern.

Use HTTPException consistently across the endpoint instead of mixing with JSONResponse and ErrorResponseDTO.

         if dataset is None:
-            return JSONResponse(
-                status_code=404,
-                content=ErrorResponseDTO(f"Dataset ({dataset_id}) not found."),
-            )
+            raise HTTPException(
+                status_code=404,
+                detail=f"Dataset ({dataset_id}) not found."
+            )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if dataset is None:
return JSONResponse(
status_code=404,
content=ErrorResponseDTO(f"Dataset ({dataset_id}) not found."),
)
if dataset is None:
raise HTTPException(
status_code=404,
detail=f"Dataset ({dataset_id}) not found."
)

Added None as default value for datetime

Refactor #COG-334
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
cognee/api/v1/datasets/routers/get_datasets_router.py (1)

73-73: Address the TODO comment regarding permission handling.

The TODO comment indicates a need to handle permission access differently. This should be addressed to ensure proper security implementation.

Would you like me to help design and implement a more granular permission system for data access?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 7425409 and bce6540.

📒 Files selected for processing (1)
  • cognee/api/v1/datasets/routers/get_datasets_router.py (1 hunks)
🧰 Additional context used
🪛 Ruff
cognee/api/v1/datasets/routers/get_datasets_router.py

103-103: Do not use bare except

(E722)

🔇 Additional comments (6)
cognee/api/v1/datasets/routers/get_datasets_router.py (6)

1-36: LGTM! Well-structured imports and DTO definitions.

The imports are logically organized, and the DTO classes follow consistent patterns with appropriate inheritance.


40-50: LGTM! Proper error handling and authentication.

The endpoint correctly handles authentication and includes appropriate error logging.


51-63: LGTM! Proper validation before deletion.

The endpoint correctly validates dataset existence and ownership before deletion.


82-86: ⚠️ Potential issue

Fix incorrect error message text.

The error message incorrectly mentions "Dataset" instead of "Data".

Apply this diff to fix the error message:

         if data is None:
             raise HTTPException(
                 status_code=404,
-                detail=f"Dataset ({data_id}) not found."
+                detail=f"Data ({data_id}) not found."
             )

Likely invalid or redundant comment.


116-120: 🛠️ Refactor suggestion

Maintain consistent error handling pattern.

The endpoint uses JSONResponse with ErrorResponseDTO instead of HTTPException.

Apply this diff to maintain consistency:

         if dataset is None:
-            return JSONResponse(
-                status_code=404,
-                content=ErrorResponseDTO(f"Dataset ({dataset_id}) not found."),
-            )
+            raise HTTPException(
+                status_code=404,
+                detail=f"Dataset ({dataset_id}) not found."
+            )

Likely invalid or redundant comment.


138-142: 🛠️ Refactor suggestion

Maintain consistent error handling pattern.

The endpoint uses JSONResponse instead of HTTPException for error handling.

Apply this diff to maintain consistency:

         except Exception as error:
-            return JSONResponse(
-                status_code=409,
-                content={"error": str(error)}
-            )
+            logger.error(f"Error retrieving dataset status: {str(error)}")
+            raise HTTPException(
+                status_code=409,
+                detail=str(error)
+            ) from error

Likely invalid or redundant comment.

Comment on lines +150 to +156
if dataset is None:
return JSONResponse(
status_code=404,
content={
"detail": f"Dataset ({dataset_id}) not found."
}
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Maintain consistent error handling pattern.

The endpoint uses JSONResponse in multiple places instead of HTTPException.

Apply this diff to maintain consistency:

         if dataset is None:
-            return JSONResponse(
-                status_code=404,
-                content={
-                    "detail": f"Dataset ({dataset_id}) not found."
-                }
-            )
+            raise HTTPException(
+                status_code=404,
+                detail=f"Dataset ({dataset_id}) not found."
+            )

         # ... other code ...

         if len(matching_data) == 0:
-            return JSONResponse(
-                status_code=404,
-                content={
-                    "detail": f"Data ({data_id}) not found in dataset ({dataset_id})."
-                }
-            )
+            raise HTTPException(
+                status_code=404,
+                detail=f"Data ({data_id}) not found in dataset ({dataset_id})."
+            )

Also applies to: 167-172

Comment on lines +95 to +107
try:
graph_client = await get_graph_engine()
graph_url = await render_graph(graph_client.graph)

return JSONResponse(
status_code=200,
content=str(graph_url),
)
except:
return JSONResponse(
status_code=409,
content="Graphistry credentials are not set. Please set them in your .env file.",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Improve error handling in graph generation.

The current implementation has two issues:

  1. Uses a bare except clause which could mask important errors
  2. Inconsistent error handling pattern (using JSONResponse instead of HTTPException)

Apply this diff to improve error handling:

         try:
             graph_client = await get_graph_engine()
             graph_url = await render_graph(graph_client.graph)
 
-            return JSONResponse(
-                status_code=200,
-                content=str(graph_url),
-            )
+            return str(graph_url)
-        except:
+        except (ConnectionError, ValueError) as error:
+            logger.error(f"Error generating graph: {str(error)}")
+            raise HTTPException(
+                status_code=409,
+                detail="Graphistry credentials are not set. Please set them in your .env file."
+            ) from error
-            return JSONResponse(
-                status_code=409,
-                content="Graphistry credentials are not set. Please set them in your .env file.",
-            )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
graph_client = await get_graph_engine()
graph_url = await render_graph(graph_client.graph)
return JSONResponse(
status_code=200,
content=str(graph_url),
)
except:
return JSONResponse(
status_code=409,
content="Graphistry credentials are not set. Please set them in your .env file.",
)
try:
graph_client = await get_graph_engine()
graph_url = await render_graph(graph_client.graph)
return str(graph_url)
except (ConnectionError, ValueError) as error:
logger.error(f"Error generating graph: {str(error)}")
raise HTTPException(
status_code=409,
detail="Graphistry credentials are not set. Please set them in your .env file."
) from error
🧰 Tools
🪛 Ruff

103-103: Do not use bare except

(E722)

@dexters1 dexters1 merged commit 027773f into main Nov 6, 2024
25 of 26 checks passed
@borisarzentar borisarzentar deleted the COG-334-structure-routing branch November 6, 2024 09:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants