Cog 334 structure routing #164
Conversation
Added routers for search and settings endpoints Refactor #COG-334
…outer Moved permissions router into its routers folder, added cognify router Refactor #COG-334
Added routers for datasets and add endpoints Refactor #COG-334
Walkthrough

The changes in this pull request involve a significant restructuring of the API routes within the cognee API.

Changes
Possibly related PRs
Suggested reviewers
Changed class order in settings router so that settings work properly Refactor #COG-334
Actionable comments posted: 21
🧹 Outside diff range and nitpick comments (6)
cognee/api/v1/cognify/routers/get_cognify_router.py (1)
1-8: Consider optimizing imports for better type safety.

The imports could be improved for better type safety and organization:

- `List` can be imported directly from `pydantic.types` since we're using it with Pydantic models
- Consider adding a return type hint for the router factory function by importing `APIRouter` as a type

```diff
-from fastapi import APIRouter
-from typing import List
-from pydantic import BaseModel
-from cognee.modules.users.models import User
-from fastapi.responses import JSONResponse
-from fastapi import Depends
+from fastapi import APIRouter, Depends
+from fastapi.responses import JSONResponse
+from pydantic import BaseModel
+from pydantic.types import List
+from cognee.modules.users.models import User
```

cognee/api/v1/search/routers/get_search_router.py (2)
18-18: Move the import statement to the module level.

The import of `cognee_search` should be moved to the top of the file with the other imports, for better maintainability and to follow Python's style guidelines.

```diff
 from cognee.api.v1.search import SearchType
+from cognee.api.v1.search import search as cognee_search
 from fastapi.responses import JSONResponse
 ...

 @router.post("/", response_model = list)
 async def search(payload: SearchPayloadDTO, user: User = Depends(get_authenticated_user)):
     """ This endpoint is responsible for searching for nodes in the graph."""
-    from cognee.api.v1.search import search as cognee_search
```
17-17: Enhance endpoint documentation.

The current docstring lacks important details about the endpoint's behavior, parameters, and possible responses.

```diff
-    """ This endpoint is responsible for searching for nodes in the graph."""
+    """Search for nodes in the graph based on the provided query and search type.
+
+    Args:
+        payload (SearchPayloadDTO): The search parameters including type and query
+        user (User): The authenticated user performing the search
+
+    Returns:
+        list: A list of matching search results
+
+    Raises:
+        HTTPException: If the search fails or parameters are invalid
+    """
```

cognee/api/v1/settings/routers/get_settings_router.py (2)
29-29: Simplify the type annotations for `provider` using `Literal`.

The type annotations for `provider` can be simplified by passing multiple arguments to `Literal` directly, removing the unnecessary `Union`.

Apply this diff:

```diff
 class LLMConfigInputDTO(InDTO):
-    provider: Union[Literal["openai"], Literal["ollama"], Literal["anthropic"]]
+    provider: Literal["openai", "ollama", "anthropic"]
     model: str
     api_key: SecretStr

 class VectorDBConfigInputDTO(InDTO):
-    provider: Union[Literal["lancedb"], Literal["qdrant"], Literal["weaviate"], Literal["pgvector"]]
+    provider: Literal["lancedb", "qdrant", "weaviate", "pgvector"]
     url: str
     api_key: SecretStr
```

Also applies to: 34-34
24-24: Prefix unused `user` parameters with an underscore.

The `user` parameter is not used in the `get_settings` and `save_settings` functions. To indicate that it is intentionally unused and to adhere to Python conventions, consider renaming it to `_user`.

Apply this diff:

```diff
-async def get_settings(user: User = Depends(get_authenticated_user)):
+async def get_settings(_user: User = Depends(get_authenticated_user)):
     from cognee.modules.settings import get_settings as get_cognee_settings
     return get_cognee_settings()

-async def save_settings(new_settings: SettingsPayloadDTO, user: User = Depends(get_authenticated_user)):
+async def save_settings(new_settings: SettingsPayloadDTO, _user: User = Depends(get_authenticated_user)):
     from cognee.modules.settings import save_llm_config, save_vector_db_config

     if new_settings.llm is not None:
```

Also applies to: 43-43
cognee/api/v1/datasets/routers/get_datasets_router.py (1)
126-131: Use `HTTPException` for consistent error handling.

At lines 126-131, you're returning a `JSONResponse` with a 404 status code and an error detail in the content. To maintain consistency across your API and leverage FastAPI's built-in exception handling, consider raising `HTTPException` instead.

Apply this diff for consistency:

```diff
 if dataset is None:
-    return JSONResponse(
-        status_code=404,
-        content={
-            "detail": f"Dataset ({dataset_id}) not found."
-        }
-    )
+    raise HTTPException(
+        status_code=404,
+        detail=f"Dataset ({dataset_id}) not found."
+    )
```

This approach ensures that error responses are handled uniformly throughout your API.
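As a minimal, self-contained sketch of this pattern (the endpoint path and lookup are placeholders, not the code under review), FastAPI turns a raised HTTPException into the standard {"detail": ...} error body:

```python
from fastapi import APIRouter, HTTPException

router = APIRouter()

@router.get("/{dataset_id}")
async def get_dataset(dataset_id: str):
    dataset = None  # stand-in for the real database lookup
    if dataset is None:
        # FastAPI serializes this into a 404 response: {"detail": "Dataset (...) not found."}
        raise HTTPException(status_code=404, detail=f"Dataset ({dataset_id}) not found.")
    return dataset
```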
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (12)
- cognee/api/client.py (3 hunks)
- cognee/api/v1/add/routers/__init__.py (1 hunks)
- cognee/api/v1/add/routers/get_add_router.py (1 hunks)
- cognee/api/v1/cognify/routers/__init__.py (1 hunks)
- cognee/api/v1/cognify/routers/get_cognify_router.py (1 hunks)
- cognee/api/v1/datasets/routers/__init__.py (1 hunks)
- cognee/api/v1/datasets/routers/get_datasets_router.py (1 hunks)
- cognee/api/v1/permissions/routers/__init__.py (1 hunks)
- cognee/api/v1/search/routers/__init__.py (1 hunks)
- cognee/api/v1/search/routers/get_search_router.py (1 hunks)
- cognee/api/v1/settings/routers/__init__.py (1 hunks)
- cognee/api/v1/settings/routers/get_settings_router.py (1 hunks)
✅ Files skipped from review due to trivial changes (6)
- cognee/api/v1/add/routers/__init__.py
- cognee/api/v1/cognify/routers/__init__.py
- cognee/api/v1/datasets/routers/__init__.py
- cognee/api/v1/permissions/routers/__init__.py
- cognee/api/v1/search/routers/__init__.py
- cognee/api/v1/settings/routers/__init__.py
🧰 Additional context used
🪛 Ruff
cognee/api/client.py
55-55: Module level import not at top of file
(E402)
56-56: Module level import not at top of file
(E402)
57-57: Module level import not at top of file
(E402)
58-58: Module level import not at top of file
(E402)
59-59: Module level import not at top of file
(E402)
60-60: Module level import not at top of file
(E402)
cognee/api/v1/datasets/routers/get_datasets_router.py
68-68: Do not use bare `except` (E722)
🔇 Additional comments (12)
cognee/api/v1/cognify/routers/get_cognify_router.py (1)
9-27: Verify router integration in the main application.Let's ensure this router is properly integrated into the main FastAPI application structure.
✅ Verification successful
Router integration is properly implemented
The verification shows that:
- The router is correctly imported and mounted in `client.py` with the appropriate prefix `/api/v1/cognify`
- The router is properly exported in `__init__.py`
- No path conflicts exist with other routers, as they all have unique prefixes
- The route pattern `POST /` is used consistently across different routers, each under its own unique prefix

🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Check if the router is properly imported and mounted in the main app
rg -l "get_cognify_router" --type py

# Check for potential duplicate route definitions
ast-grep --pattern 'router.post("/", $$$)'
```

Length of output: 582
Script:
```bash
#!/bin/bash
# Check how the router is imported and mounted in client.py
rg -A 5 "get_cognify_router" cognee/api/client.py

# Check the router's registration in __init__.py
cat cognee/api/v1/cognify/routers/__init__.py

# Check for any potential path conflicts in route registration
rg "prefix=" --type py
```

Length of output: 818
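For orientation, mounting a factory-built router under a unique prefix typically looks like the sketch below; the prefix comes from the verification above, while the surrounding app setup is illustrative:

```python
from fastapi import FastAPI

from cognee.api.v1.cognify.routers import get_cognify_router

app = FastAPI()

# Each router gets its own prefix, so identical "POST /" route patterns
# inside different routers never collide.
app.include_router(get_cognify_router(), prefix="/api/v1/cognify", tags=["cognify"])
```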
cognee/api/v1/search/routers/get_search_router.py (1)
21-21: Verify search implementation compatibility.

Let's verify that the `cognee_search` function exists with the expected signature.

✅ Verification successful
Let me gather more information to verify the search implementation.
Let me check the search implementation in the imported module.
Search implementation is compatible and correctly integrated
The `cognee_search` function (aliased from `search`) is properly implemented in `search_v2.py` with the exact signature required:

- `search_type: SearchType`
- `query: str`
- `user: User = None`

The function is correctly imported in `__init__.py` and used appropriately in the router.

🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Search for the cognee_search function definition
ast-grep --pattern 'async def search($params) { $$$ }'

# Look for other usages of the search function
rg -A 5 'cognee_search\('
```

Length of output: 601
Script:
```bash
#!/bin/bash
# Search for the cognee_search function definition with imports
rg -p "cognee_search" --type py

# Look for function definition
ast-grep --pattern 'async def cognee_search($params) { $$$ }'

# Check imports in the router file
rg "^from.*import.*cognee_search" --type py
```

Length of output: 348
Script:
```bash
#!/bin/bash
# Check the search implementation in the module
rg -p "async def search" cognee/api/v1/search/

# Look for the search function with its parameters
ast-grep --pattern 'async def search($search_type, $query, $user) { $$$ }'

# Check the __init__.py in the search module
cat cognee/api/v1/search/__init__.py
```

Length of output: 605
cognee/api/v1/settings/routers/get_settings_router.py (1)
31-31: Use `SecretStr` for `api_key` fields to protect sensitive data.

To enhance security, use `SecretStr` from `pydantic` for the `api_key` fields in both `LLMConfigInputDTO` and `VectorDBConfigInputDTO`. This prevents the API keys from being accidentally logged or displayed. [security]
Apply this diff:
```diff
+from pydantic import SecretStr

 class LLMConfigInputDTO(InDTO):
     provider: Literal["openai", "ollama", "anthropic"]
     model: str
-    api_key: str
+    api_key: SecretStr

 class VectorDBConfigInputDTO(InDTO):
     provider: Literal["lancedb", "qdrant", "weaviate", "pgvector"]
     url: str
-    api_key: str
+    api_key: SecretStr
```

Also applies to: 36-36
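A small standalone sketch of why `SecretStr` helps, independent of the DTOs in this PR: the value is masked in `repr()` and logs, and must be unwrapped explicitly:

```python
from pydantic import BaseModel, SecretStr

class LLMConfig(BaseModel):
    api_key: SecretStr

config = LLMConfig(api_key="sk-very-secret")
print(config)                             # api_key=SecretStr('**********')
print(config.api_key.get_secret_value())  # the raw key, only when requested explicitly
```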
cognee/api/v1/add/routers/get_add_router.py (2)
45-49: Inconsistent use of the `user` parameter in `cognee_add` calls.

In the call to `cognee_add` on lines 45-49, the `user` parameter is passed, but in the previous calls (lines 28-31 and 40-43) `user` is not included. This inconsistency might lead to unexpected behaviors.

Ensure that the `user` parameter is consistently passed to `cognee_add` if required:

```diff
 # For the call on lines 28-31:
 await cognee_add(
     "data://.data/",
     f"{repo_name}",
+    user=user,
 )

 # For the call on lines 40-43:
 await cognee_add(
     "data://.data/",
     f"{filename}",
+    user=user,
 )
```
51-54: 🛠️ Refactor suggestion

Broad exception handling.

Catching all exceptions with a generic `except Exception` can make debugging difficult and may suppress important errors.

Catch specific exceptions to handle known error cases and allow unexpected exceptions to propagate:

```diff
-    except Exception as error:
+    except SomeSpecificException as error:
```

Replace `SomeSpecificException` with the specific exceptions you expect (e.g., `aiohttp.ClientError`, `subprocess.CalledProcessError`).

Likely invalid or redundant comment.
cognee/api/client.py (6)
6-7: Importing FastAPI components

The import statements for `FastAPI`, `JSONResponse`, and `Response` are correctly added and necessary for the application.

130-134: Including the datasets router

The `get_datasets_router` is correctly included, promoting a modular API structure.

136-140: Including the add router

The `get_add_router` is properly included with the appropriate prefix and tags, enhancing the API's modularity.

142-146: Including the cognify router

The `get_cognify_router` is correctly included, which promotes better separation of concerns within the API.

148-152: Including the search router

The `get_search_router` is properly added with the correct prefix and tags, contributing to a more organized API.

154-158: Including the settings router

The `get_settings_router` is correctly included, aligning with the modular architecture and enhancing maintainability.

cognee/api/v1/datasets/routers/get_datasets_router.py (1)
147-147: ⚠️ Potential issue

Ensure secure handling of file paths when returning raw data.

At line 147, the endpoint returns `data.raw_data_location` directly as a `FileResponse`. It's crucial to ensure that `raw_data_location` is a safe, validated file path to prevent security vulnerabilities like directory traversal attacks. Serving files without proper validation can expose sensitive files or allow unauthorized access.

Consider adding validation to ensure `raw_data_location` points to an allowed directory, sanitizing the input if necessary. For example:

```python
import os

# Ensure the file path is within the allowed directory
allowed_directory = '/path/to/allowed/directory'
full_path = os.path.abspath(data.raw_data_location)

if not full_path.startswith(os.path.abspath(allowed_directory)):
    raise HTTPException(status_code=403, detail="Access to the requested file is forbidden.")

return FileResponse(full_path)
```

This approach helps prevent unauthorized file access by confirming that the requested file resides within a permitted directory.
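One caveat with plain `startswith` checks, noted here as an aside rather than part of the original suggestion: `/data/allowed-evil` also starts with `/data/allowed`. A slightly stricter sketch using `os.path.commonpath` (directory path and function name are illustrative):

```python
import os

from fastapi import HTTPException
from fastapi.responses import FileResponse

ALLOWED_DIRECTORY = os.path.abspath("/path/to/allowed/directory")  # placeholder path

def serve_validated_file(raw_data_location: str) -> FileResponse:
    full_path = os.path.abspath(raw_data_location)
    # commonpath compares whole path components, so "/data/allowed-evil"
    # cannot slip past a check intended for "/data/allowed".
    if os.path.commonpath([full_path, ALLOWED_DIRECTORY]) != ALLOWED_DIRECTORY:
        raise HTTPException(status_code=403, detail="Access to the requested file is forbidden.")
    return FileResponse(full_path)
```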
```python
def get_cognify_router():
    router = APIRouter()
```
🛠️ Refactor suggestion
Consider restructuring for better maintainability and type safety.
The current nested structure could be improved:
- Move `CognifyPayloadDTO` outside the factory function for better reusability and testing
- Add a return type hint for better type safety
+class CognifyPayloadDTO(BaseModel):
+ datasets: List[str]
+
-def get_cognify_router():
+def get_cognify_router() -> APIRouter:
router = APIRouter()
-
- class CognifyPayloadDTO(BaseModel):
-        datasets: List[str]

Also applies to: 27-27
```python
@router.post("/", response_model=None)
async def cognify(payload: CognifyPayloadDTO, user: User = Depends(get_authenticated_user)):
    """ This endpoint is responsible for the cognitive processing of the content."""
    from cognee.api.v1.cognify.cognify_v2 import cognify as cognee_cognify
```
🛠️ Refactor suggestion
Improve endpoint definition and imports.
Several improvements could enhance the endpoint's clarity and reliability:
- Use a more descriptive path than "/"
- Move the cognify import to the top level
- Define a proper response model instead of `None`
-@router.post("/", response_model=None)
+@router.post("/process", response_model=dict)
async def cognify(payload: CognifyPayloadDTO, user: User = Depends(get_authenticated_user)):
""" This endpoint is responsible for the cognitive processing of the content."""
-    from cognee.api.v1.cognify.cognify_v2 import cognify as cognee_cognify

Move to the top-level imports:

```python
from cognee.api.v1.cognify.cognify_v2 import cognify as cognee_cognify
```

```python
    try:
        await cognee_cognify(payload.datasets, user)
    except Exception as error:
        return JSONResponse(
            status_code=409,
            content={"error": str(error)}
        )
```
Improve error handling specificity and security.
The current error handling implementation has several concerns:
- Generic exception catching could mask specific errors
- Using 409 (Conflict) for all errors is not RESTful
- Directly exposing exception messages could leak sensitive information
Consider implementing more specific error handling:
- try:
- await cognee_cognify(payload.datasets, user)
- except Exception as error:
- return JSONResponse(
- status_code=409,
- content={"error": str(error)}
- )
+ try:
+ await cognee_cognify(payload.datasets, user)
+ except ValueError as error:
+ return JSONResponse(
+ status_code=400,
+ content={"error": "Invalid input provided"}
+ )
+ except PermissionError:
+ return JSONResponse(
+ status_code=403,
+ content={"error": "Insufficient permissions"}
+ )
+ except Exception as error:
+ # Log the actual error for debugging
+ logger.error(f"Unexpected error in cognify: {error}")
+ return JSONResponse(
+ status_code=500,
+ content={"error": "An unexpected error occurred"}
+    )
```python
    search_type: SearchType
    query: str

@router.post("/", response_model = list)
```
🛠️ Refactor suggestion
Specify the response model type.
The response_model is set to list without type information. Consider creating a response DTO to properly document the structure of the returned data.
```python
from pydantic import BaseModel

class SearchResultDTO(BaseModel):
    # Add appropriate fields based on the actual search results
    id: str
    title: str
    # ... other fields

@router.post("/", response_model=list[SearchResultDTO])
```

```python
class SearchPayloadDTO(InDTO):
    search_type: SearchType
    query: str
```
🛠️ Refactor suggestion
Move DTO class to module level.
The SearchPayloadDTO class should be defined at the module level rather than nested inside the router factory function. This improves reusability and follows standard Python practices.
from cognee.api.v1.search import SearchType
from fastapi.responses import JSONResponse
from cognee.modules.users.models import User
from fastapi import Depends, APIRouter
from cognee.api.DTO import InDTO
from cognee.modules.users.methods import get_authenticated_user
+class SearchPayloadDTO(InDTO):
+ search_type: SearchType
+ query: str
+
def get_search_router():
router = APIRouter()
- class SearchPayloadDTO(InDTO):
- search_type: SearchType
-        query: str
```python
if dataset_data is None:
    raise HTTPException(status_code=404, detail=f"Dataset ({dataset_id}) not found.")
```
Correct error message when no data is found in the dataset.
At lines 134-135, when dataset_data is None, you raise an HTTPException indicating that the dataset was not found. However, the dataset was successfully retrieved earlier; it's the data within the dataset that's missing. This could lead to confusion for the user.
Apply this diff to provide an accurate error message:
if dataset_data is None:
- raise HTTPException(status_code=404, detail=f"Dataset ({dataset_id}) not found.")
+ raise HTTPException(status_code=404, detail=f"No data found in dataset ({dataset_id}).")This change accurately reflects the issue, informing the user that the dataset exists but contains no data.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if dataset_data is None: | |
| raise HTTPException(status_code=404, detail=f"Dataset ({dataset_id}) not found.") | |
| if dataset_data is None: | |
| raise HTTPException(status_code=404, detail=f"No data found in dataset ({dataset_id}).") |
```python
data = [data for data in dataset_data if str(data.id) == data_id][0]

if data is None:
```
Handle potential IndexError when accessing data by ID.
At lines 137-139, you attempt to access the first element of a list comprehension without checking if the list is empty. If no matching data is found, this will raise an IndexError, resulting in an unhandled exception.
Apply this diff to safely handle the case where the data is not found:
- data = [data for data in dataset_data if str(data.id) == data_id][0]
-
- if data is None:
+ matching_data = [data for data in dataset_data if str(data.id) == data_id]
+ if not matching_data:
+ raise HTTPException(
+ status_code=404,
+ detail=f"Data ({data_id}) not found in dataset ({dataset_id})."
+ )
+    data = matching_data[0]

This modification checks whether any matching data exists before attempting to access it, preventing a potential IndexError and providing a clear error message to the user if the data is not found.
```python
    status_code=404,
    content=ErrorResponseDTO(f"Dataset ({dataset_id}) not found."),
)
```
Ensure consistent error handling and proper serialization of responses.
At lines 92-95, you're returning a JSONResponse with content set to an instance of ErrorResponseDTO. Directly passing a Pydantic model instance to content may lead to improper serialization since JSONResponse expects a serializable data type like a dictionary. Additionally, this approach is inconsistent with error handling in other parts of your code where HTTPException is raised.
Apply this diff to raise an HTTPException for consistency:
if dataset is None:
- return JSONResponse(
- status_code=404,
- content=ErrorResponseDTO(f"Dataset ({dataset_id}) not found."),
- )
+ raise HTTPException(
+ status_code=404,
+ detail=f"Dataset ({dataset_id}) not found."
+    )

This change leverages FastAPI's exception handling mechanisms and ensures consistent error responses across your API.
```python
return JSONResponse(
    status_code=409,
    content={"error": str(error)}
)
```
🛠️ Refactor suggestion
Handle exceptions more precisely and raise appropriate HTTP exceptions.
At lines 113-116, you catch all exceptions and return a JSONResponse with a 409 status code. Catching all exceptions can mask unexpected errors and make debugging more challenging. It's better to catch specific exceptions and raise an HTTPException to provide more informative error responses.
Apply this diff to improve exception handling:
try:
datasets_statuses = await cognee_datasets.get_status(datasets)
return datasets_statuses
- except Exception as error:
- return JSONResponse(
- status_code=409,
- content={"error": str(error)}
- )
+ except SpecificException as error:
+ logger.error(f"Error retrieving dataset statuses: {str(error)}")
+ raise HTTPException(
+ status_code=500,
+ detail=f"Error retrieving dataset statuses: {str(error)}"
+    ) from error

Replace `SpecificException` with the specific exception(s) you anticipate, such as `DatabaseError`, `ConnectionError`, or other relevant exceptions. This change enhances error transparency and maintains consistency.
```python
    status_code=200,
    content=str(graph_url),
)
except:
```
Avoid using bare except statements.
At line 68, the code uses a bare except: clause. Using bare except statements is discouraged because it catches all exceptions, including system-exiting exceptions like KeyboardInterrupt and SystemExit, making debugging difficult and potentially masking unexpected errors. It's better to catch specific exceptions to handle anticipated error conditions.
Apply this diff to catch specific exceptions:
- except:
+except SpecificException as error:

Replace `SpecificException` with the specific exception(s) you expect, such as `ConnectionError`, `RuntimeError`, or any other relevant exceptions that might be raised in the try block.
🧰 Tools
🪛 Ruff
68-68: Do not use bare `except` (E722)
Add ability to delete data instance through endpoint Feature #COG-455
Actionable comments posted: 7
🧹 Outside diff range and nitpick comments (6)
cognee/modules/data/methods/delete_dataset.py (1)
Line range hint 1-7: Consider adding error handling and logging.
The function would benefit from proper error handling and logging to track deletion operations and handle failures gracefully.
Consider implementing the following improvements:
from cognee.modules.data.models import Dataset
from cognee.infrastructure.databases.relational import get_relational_engine
+from logging import getLogger
+
+logger = getLogger(__name__)
async def delete_dataset(dataset: Dataset):
db_engine = get_relational_engine()
+ try:
+ logger.info(f"Deleting dataset with id: {dataset.id}")
+ result = await db_engine.delete_data_by_id(dataset.__tablename__, dataset.id)
+ logger.info(f"Successfully deleted dataset with id: {dataset.id}")
+ return result
+ except Exception as e:
+ logger.error(f"Failed to delete dataset with id: {dataset.id}. Error: {str(e)}")
+ raise
-    return await db_engine.delete_data_by_id(dataset.__tablename__, dataset.id)

cognee/modules/data/methods/delete_data.py (2)
1-4: Remove extra blank line
There's an unnecessary extra blank line between imports and the function definition. One blank line is sufficient according to PEP 8.
from cognee.modules.data.models import Data
from cognee.infrastructure.databases.relational import get_relational_engine
-
async def delete_data(data: Data):

5-8: Consider adding audit logging.
For data deletion operations, it's important to maintain an audit trail. Consider implementing logging to track:
- Who initiated the deletion
- When it occurred
- What data was deleted
- The reason for deletion (if applicable)
This will help with compliance requirements and debugging.
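A rough sketch of such an audit trail, assuming Python's standard logging module; the logger name and fields are illustrative, not part of the PR:

```python
import logging
from datetime import datetime, timezone
from uuid import UUID

audit_logger = logging.getLogger("cognee.audit")

def log_data_deletion(user_id: UUID, data_id: UUID, reason: str = "user request") -> None:
    # One structured line per deletion: who, what, when, and why.
    audit_logger.info(
        "data_deleted user=%s data=%s at=%s reason=%s",
        user_id,
        data_id,
        datetime.now(timezone.utc).isoformat(),
        reason,
    )
```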
cognee/modules/data/methods/get_data.py (1)
9-10: Consider adding pagination for large datasets.
If this function might be used to retrieve large data objects, consider implementing pagination or streaming to prevent memory issues.
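As a hedged sketch of what offset/limit pagination could look like here, reusing the async SQLAlchemy session pattern from the other methods in this module (the function name and page size are assumptions):

```python
from sqlalchemy import select

from cognee.modules.data.models import Data
from cognee.infrastructure.databases.relational import get_relational_engine

async def get_data_page(page: int = 0, page_size: int = 100) -> list[Data]:
    db_engine = get_relational_engine()

    async with db_engine.get_async_session() as session:
        # Fetch one bounded page instead of the whole table.
        stmt = select(Data).offset(page * page_size).limit(page_size)
        result = await session.execute(stmt)
        return list(result.scalars().all())
```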
cognee/api/v1/datasets/routers/get_datasets_router.py (2)
41-54: Consider adding a success response.
The endpoint correctly handles authentication and errors, but consider returning a success message to improve API usability.
```diff
     await delete_dataset(dataset)
+    return JSONResponse(
+        status_code=200,
+        content={"message": f"Dataset {dataset_id} successfully deleted"}
+    )
```

63-63: Address the TODO comment regarding permission handling.
The comment indicates a need to handle permission checks differently. Consider implementing role-based access control (RBAC) for more granular permissions.
Would you like me to help design and implement a proper permission handling system?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
- cognee/api/v1/datasets/routers/get_datasets_router.py (1 hunks)
- cognee/modules/data/methods/__init__.py (1 hunks)
- cognee/modules/data/methods/delete_data.py (1 hunks)
- cognee/modules/data/methods/delete_dataset.py (1 hunks)
- cognee/modules/data/methods/get_data.py (1 hunks)
🧰 Additional context used
🪛 Ruff
cognee/api/v1/datasets/routers/get_datasets_router.py
93-93: Do not use bare except
(E722)
🔇 Additional comments (9)
cognee/modules/data/methods/__init__.py (3)
9-9: Verify get_data function signature consistency.
Let's check the implementation of get_data to ensure it follows the same pattern as other getter functions:
#!/bin/bash
# Description: Check get_data implementation pattern
# Expected: Function should accept data_id parameter
ast-grep --pattern 'def get_data($$$):'

13-13: Verify delete_data function signature consistency.
Let's check the implementation of delete_data to ensure it follows the same pattern as other deletion functions:
#!/bin/bash
# Description: Check delete_data implementation pattern
# Expected: Function should accept similar parameters as delete_dataset
ast-grep --pattern 'def delete_data($$$):'

9-9: LGTM! Clean import organization.
The new imports maintain the existing organizational structure, with clear categorization into "Get" and "Delete" sections.
Let's verify the existence of the newly imported modules:
Also applies to: 13-13
✅ Verification successful
Import verification successful
The imported modules get_data.py and delete_data.py exist in the correct location with proper implementations:
- `get_data.py` contains the async function `get_data(user_id: UUID, data_id: UUID)`
- `delete_data.py` contains the async function `delete_data(data: Data)`
The imports follow the established pattern in the module, grouped with other similar data operations (get/delete).
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that the imported modules exist
# Expected: Both files should exist in the same directory
# Check for the existence of the new modules
fd -t f "get_data.py|delete_data.py" cognee/modules/data/methods/
Length of output: 138
Script:
#!/bin/bash
# Let's try a different approach to verify the existence of these modules
# First, check if the files exist
ls -la cognee/modules/data/methods/get_data.py cognee/modules/data/methods/delete_data.py 2>/dev/null
# Also search for any Python files in the methods directory to understand the structure
ls -la cognee/modules/data/methods/*.py 2>/dev/null
# Use ripgrep to find any references to these functions
rg "get_data|delete_data" cognee/modules/data/methods/
Length of output: 3120
cognee/modules/data/methods/get_data.py (2)
1-4: LGTM! Clean and well-organized imports.
The imports are properly organized and include all necessary dependencies.
6-12: Verify database schema and indexes.
Ensure proper indexing exists on the data_id column for optimal query performance.
cognee/api/v1/datasets/routers/get_datasets_router.py (4)
1-19: LGTM: Imports and router setup are well-organized.
The imports are properly organized and the router setup follows FastAPI best practices.
30-40: LGTM: Proper error handling in get_datasets endpoint.
The endpoint correctly handles authentication and errors, with appropriate logging.
72-76: Fix incorrect error message in the data-not-found case.
The error message incorrectly refers to "Dataset" when it should refer to "Data".
if data is None:
raise HTTPException(
status_code=404,
- detail=f"Dataset ({data_id}) not found."
+ detail=f"Data ({data_id}) not found."
    )

Likely invalid or redundant comment.
162-170: Fix unsafe list indexing and maintain consistent error handling.
The current implementation could raise an IndexError if no matching data is found. Also, maintain consistent error handling pattern.
- data = [data for data in dataset_data if str(data.id) == data_id][0]
-
- if data is None:
- return JSONResponse(
- status_code=404,
- content={
- "detail": f"Data ({data_id}) not found in dataset ({dataset_id})."
- }
- )
+ matching_data = [data for data in dataset_data if str(data.id) == data_id]
+ if not matching_data:
+ raise HTTPException(
+ status_code=404,
+ detail=f"Data ({data_id}) not found in dataset ({dataset_id})."
+ )
+    data = matching_data[0]

Likely invalid or redundant comment.
```diff
 db_engine = get_relational_engine()

-return await db_engine.delete_table(dataset.id)
+return await db_engine.delete_data_by_id(dataset.__tablename__, dataset.id)
```
💡 Codebase verification
Warning: Related data cleanup needed for dataset deletion
The Dataset model has a many-to-many relationship with Data through the DatasetData join table. The current deletion method only removes the dataset record but doesn't handle:
- Associated records in the `dataset_data` join table
- The relationship with the `Data` model through the secondary table

Action items:

- Add cleanup of related records in the `dataset_data` table
- Consider implementing cascading delete in the Dataset model's relationship definition
- Add transaction handling to ensure atomicity of the deletion operation
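To make the action items concrete, here is a self-contained sketch of the cascade setup being described; table and column names follow the review text and may differ from the actual models:

```python
from uuid import uuid4

from sqlalchemy import Column, ForeignKey, String
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class DatasetData(Base):
    __tablename__ = "dataset_data"
    # ondelete="CASCADE" lets the database remove join rows when either side is deleted.
    dataset_id = Column(UUID(as_uuid=True), ForeignKey("datasets.id", ondelete="CASCADE"), primary_key=True)
    data_id = Column(UUID(as_uuid=True), ForeignKey("data.id", ondelete="CASCADE"), primary_key=True)

class Dataset(Base):
    __tablename__ = "datasets"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid4)
    name = Column(String)
    data = relationship("Data", secondary="dataset_data", back_populates="datasets")

class Data(Base):
    __tablename__ = "data"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid4)
    name = Column(String)
    datasets = relationship("Dataset", secondary="dataset_data", back_populates="data")
```

For plain many-to-many relationships, SQLAlchemy removes association rows automatically when an object is deleted within a session; the database-level `ondelete="CASCADE"` covers deletes issued outside the ORM, such as the raw `delete_data_by_id` call discussed in this PR.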
🔗 Analysis chain
Improved data safety with granular deletion.
The change from deleting an entire table to deleting a specific record is a significant improvement in data safety and granularity.
Let's verify if there are any cascading deletes or related data that needs cleanup:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for foreign key relationships to Dataset table
rg -l "ForeignKey.*Dataset"
# Search for direct references to Dataset model
ast-grep --pattern 'class $_ {
$$$
Dataset $$$
$$$
}'
Length of output: 91
Script:
#!/bin/bash
# Let's try different approaches to find relationships
# Search for Dataset model definition
rg -l "class Dataset"
# Search for any references to Dataset in models
rg -l "Dataset" --type py
# Look for table relationships or foreign keys with different patterns
rg -l "relationship.*Dataset|references.*dataset" --type py
# Search for any SQL or database operations mentioning dataset
rg -l "dataset.*foreign|cascade.*dataset" --type py -i
Length of output: 976
Script:
#!/bin/bash
# Let's examine the DatasetData model since it appears to have relationships
cat cognee/modules/data/models/DatasetData.py
# Also check the Dataset model definition
cat cognee/modules/data/models/Dataset.py
# Look for any cascade delete configurations
rg "cascade.*delete" --type py -A 2 -B 2
Length of output: 1784
```python
async def delete_data(data: Data):
    db_engine = get_relational_engine()

    return await db_engine.delete_data_by_id(data.__tablename__, data.id)
```
Add error handling, type hints, and documentation
The function needs several improvements for production readiness:
- Add return type hint and comprehensive docstring
- Implement error handling for database operations
- Consider using transaction context
- Validate table name to prevent SQL injection
Here's the suggested implementation:
-async def delete_data(data: Data):
- db_engine = get_relational_engine()
-
- return await db_engine.delete_data_by_id(data.__tablename__, data.id)
+from typing import Optional
+from sqlalchemy.exc import SQLAlchemyError
+
+async def delete_data(data: Data) -> Optional[bool]:
+ """Delete a data record from the database.
+
+ Args:
+ data (Data): The data object to be deleted.
+
+ Returns:
+ Optional[bool]: True if deletion was successful, False if record not found,
+ None if operation failed.
+
+ Raises:
+ ValueError: If the data object is invalid.
+ """
+ if not data or not data.id:
+ raise ValueError("Invalid data object")
+
+ db_engine = get_relational_engine()
+ try:
+ async with db_engine.begin() as transaction:
+ # Validate table name against known models to prevent SQL injection
+ if not hasattr(data, '__tablename__') or not isinstance(data.__tablename__, str):
+ raise ValueError("Invalid table name")
+
+ return await db_engine.delete_data_by_id(data.__tablename__, data.id)
+ except SQLAlchemyError as e:
+ # Log the error here
+        return None
```python
async def get_data(user_id: UUID, data_id: UUID) -> Optional[Data]:
    db_engine = get_relational_engine()

    async with db_engine.get_async_session() as session:
        data = await session.get(Data, data_id)

        return data
```
Add authorization check and error handling.
Several improvements are needed for production readiness:
- The `user_id` parameter is not used for authorization checks
- Missing error handling for database operations
- Missing function documentation
- Missing function documentation
Here's a suggested implementation:
async def get_data(user_id: UUID, data_id: UUID) -> Optional[Data]:
+ """Retrieve data by ID with user authorization check.
+
+ Args:
+ user_id (UUID): ID of the user requesting the data
+ data_id (UUID): ID of the data to retrieve
+
+ Returns:
+ Optional[Data]: The requested data object if found and authorized, None otherwise
+
+ Raises:
+ PermissionError: If user is not authorized to access the data
+ SQLAlchemyError: If database operation fails
+ """
db_engine = get_relational_engine()
async with db_engine.get_async_session() as session:
- data = await session.get(Data, data_id)
-
- return data
+ try:
+ data = await session.get(Data, data_id)
+ if data is None:
+ return None
+
+ # Add authorization check
+ if data.user_id != user_id:
+ raise PermissionError("User not authorized to access this data")
+
+ return data
+ except SQLAlchemyError as e:
+ # Log the error here
+            raise
```python
class ErrorResponseDTO(BaseModel):
    message: str


class DatasetDTO(OutDTO):
    id: UUID
    name: str
    created_at: datetime
    updated_at: Optional[datetime]
    owner_id: UUID
```
🛠️ Refactor suggestion
Group DTO classes together for better organization.
Consider moving the DataDTO class definition next to the other DTO classes at the top of the file for better code organization and maintainability.
class DatasetDTO(OutDTO):
id: UUID
name: str
created_at: datetime
updated_at: Optional[datetime]
owner_id: UUID
+class DataDTO(OutDTO):
+ id: UUID
+ name: str
+ created_at: datetime
+ updated_at: Optional[datetime]
+ extension: str
+ mime_type: str
+ raw_data_location: str
+
@router.get("/", response_model=list[DatasetDTO])Also applies to: 99-107
```python
except Exception as error:
    return JSONResponse(
        status_code=409,
        content={"error": str(error)}
    )
```
Maintain consistent error handling across endpoints.
For consistency with other endpoints, use HTTPException instead of JSONResponse for error handling.
- return JSONResponse(
- status_code=409,
- content={"error": str(error)}
- )
+ logger.error(f"Error retrieving dataset status: {str(error)}")
+ raise HTTPException(
+ status_code=409,
+ detail=str(error)
+ ) from error
```python
except:
    return JSONResponse(
        status_code=409,
        content="Graphistry credentials are not set. Please set them in your .env file.",
    )
```
Replace bare except with specific exception handling.
Using bare except clauses can mask important errors and make debugging difficult.
return JSONResponse(
status_code=200,
content=str(graph_url),
)
- except:
+ except (ConnectionError, ValueError) as error:
+ logger.error(f"Error generating graph: {str(error)}")
return JSONResponse(
status_code=409,
content="Graphistry credentials are not set. Please set them in your .env file.",
    )
🧰 Tools
🪛 Ruff
93-93: Do not use bare except
(E722)
Added cascade deletion so that when a dataset or data is deleted, the connection in the dataset_data table is also deleted Feature #COG-455
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
cognee/modules/data/models/Data.py (1)
Line range hint 37-38: Inconsistency in datasets serialization.

The commented-out datasets serialization in `to_json()` suggests a potential inconsistency with the `lazy="noload"` setting. If datasets are needed in the JSON representation, you should either:

- Remove the `lazy="noload"` setting, or
- Implement explicit loading when datasets are needed

Consider one of these approaches:

```diff
-# "datasets": [dataset.to_json() for dataset in self.datasets]
+"datasets": [dataset.to_json() for dataset in self.datasets] if self.datasets else []
```

Or, if datasets should never be included:

```diff
-# "datasets": [dataset.to_json() for dataset in self.datasets]
```

🧰 Tools
🪛 Ruff
22-22: Undefined name `Dataset` (F821)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
96-98: Consider moving the SQLite PRAGMA to session initialization.

The foreign key PRAGMA setting should ideally be set when initializing the SQLite connection/session to ensure consistent behavior across all operations.

Consider moving this to the session creation in the `get_async_session` method:

```python
@asynccontextmanager
async def get_async_session(self) -> AsyncGenerator[AsyncSession, None]:
    async_session_maker = self.sessionmaker

    async with async_session_maker() as session:
        try:
            if self.engine.dialect.name == "sqlite":
                await session.execute(text("PRAGMA foreign_keys = ON;"))
            yield session
        finally:
            await session.close()
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
- cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1 hunks)
- cognee/modules/data/models/Data.py (1 hunks)
- cognee/modules/data/models/Dataset.py (1 hunks)
- cognee/modules/data/models/DatasetData.py (1 hunks)
🔇 Additional comments (6)
cognee/modules/data/models/DatasetData.py (2)
10-11: LGTM! The CASCADE delete behavior is properly implemented.
The addition of ondelete="CASCADE" to both foreign key constraints ensures proper cleanup of junction table records when either datasets or data are deleted, maintaining referential integrity.
10-11: Verify the cascading delete behavior impact.
The addition of CASCADE delete behavior is a significant change that could affect data retention and recovery. Please ensure:
- The application code properly handles cascading deletes
- There are no audit/compliance requirements for retaining relationship data
- Users are aware that deleting datasets or data will remove all associated relationships
Let's verify the cascade configuration across related models:
✅ Verification successful
Let me gather more information about the relationships and cascade configurations:
Cascade delete configuration is appropriate and safe
The verification shows that:
- The cascade delete is properly configured in a many-to-many relationship between Dataset and Data models through the DatasetData junction table
- Both Dataset and Data models maintain their relationships using the same junction table with appropriate bidirectional references
- The application has proper database transaction handling with commit operations
- No specific audit requirements or logging mechanisms are tied to deletion operations except for basic error logging
- The cascade delete is limited to just the junction table entries, preserving the actual Data and Dataset records unless explicitly deleted
The CASCADE behavior only affects the relationship records in the junction table, which is the expected and safe behavior for many-to-many relationships.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for consistent cascade behavior across related models
# Test 1: Verify cascade configuration in Dataset model
echo "Checking Dataset model configuration..."
rg -A 5 "class Dataset.*Base"
# Test 2: Verify cascade configuration in Data model
echo "Checking Data model configuration..."
rg -A 5 "class Data.*Base"
# Test 3: Look for any existing delete operations that might need review
echo "Checking delete operations..."
rg -B 2 -A 2 "\.delete\(\)"
Length of output: 4181
Script:
#!/bin/bash
# Check for relationship configurations and cascade behaviors
# Look for relationship definitions and cascade configurations
echo "Checking relationship definitions..."
rg -B 2 -A 2 "relationship\("
# Check for any data integrity constraints or hooks
echo "Checking for data integrity hooks..."
rg -B 2 -A 2 "@event\.listens_for.*delete"
# Check for any audit or logging mechanisms
echo "Checking for audit/logging mechanisms..."
rg -B 2 -A 2 "(audit|log|track).*delete"
# Look for any documentation about data retention
echo "Checking documentation..."
fd README.md | xargs rg -i "(retention|cascade|delete)"
Length of output: 4962
cognee/modules/data/models/Dataset.py (2)
22-22: LGTM: Explicit relationship target class name.
Using the explicit string "Data" as the relationship target improves code clarity and helps prevent circular import issues.
22-26: Verify cascading delete behavior implications.
The addition of cascade="all, delete" means that deleting a Dataset will trigger deletion of the relationship in the join table. Please ensure this matches your data lifecycle requirements and consider the following:
- Document this behavior in comments for future maintainers
- Add appropriate warnings in the deletion endpoints
- Consider adding a soft delete mechanism if data recovery might be needed
Let's check for any existing delete operations that might be affected:
cognee/modules/data/models/Data.py (1)
23-27: Verify cascade deletion behavior in many-to-many relationship.
The addition of cascade="all, delete" in a many-to-many relationship requires careful consideration:
- This will delete all associated Dataset records when a Data record is deleted
- In a many-to-many context, this might have unintended consequences if Datasets should persist independently
Let's verify the relationship configuration across related models:
✅ Verification successful
Let me check the relationship definition in the Dataset model to see if it's properly configured for this many-to-many relationship.
Let me try a different approach to find the relationship definitions using ast-grep, which might be more reliable for finding these SQLAlchemy patterns.
The cascade configuration is correctly implemented for the many-to-many relationship
The cascade behavior is properly configured:
- Both `Dataset` and `Data` models have matching relationship configurations with `cascade="all, delete"`
- The join table `DatasetData` has `ondelete="CASCADE"` on both foreign keys
- The relationship is symmetrically defined with correct `back_populates` references

This setup ensures:

- Deleting a `Data` record will only remove the associations in the join table
- The actual `Dataset` records will be preserved
- No orphaned records will be left in the join table
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check relationship definitions in related models
echo "Checking Dataset model relationship configuration:"
rg -A 10 "class Dataset" cognee/modules/data/models/
echo "Checking DatasetData model configuration:"
rg -A 10 "class DatasetData" cognee/modules/data/models/
Length of output: 2394
Script:
#!/bin/bash
# Check relationship definitions in Dataset model
echo "Checking Dataset model relationships:"
rg -B 5 -A 15 "data = relationship" cognee/modules/data/models/Dataset.py
# Check Data model relationships for complete picture
echo -e "\nChecking Data model relationships:"
rg -B 5 -A 15 "datasets = relationship" cognee/modules/data/models/Data.py
Length of output: 322
Script:
#!/bin/bash
# Check relationship definitions in both models
echo "Checking Dataset model for relationship definitions:"
ast-grep --pattern 'relationship($$$)' cognee/modules/data/models/Dataset.py
echo -e "\nChecking Data model for relationship definitions:"
ast-grep --pattern 'relationship($$$)' cognee/modules/data/models/Data.py
# Also check the complete Dataset model class
echo -e "\nChecking complete Dataset model:"
cat cognee/modules/data/models/Dataset.py
Length of output: 2715
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
99-101: Verify cascade delete behavior across different dialects.
The delete operation might behave differently between SQLite and other dialects when dealing with related records. Ensure that the cascade delete behavior is consistent across different database backends.
Let's verify the foreign key relationships and cascade behavior:
Also applies to: 105-106
✅ Verification successful
Based on the search results, I can now generate the final response since I have enough information about the cascade delete behavior across different dialects:
Cascade delete behavior is properly handled across dialects
The codebase already implements proper cascade delete behavior:
- For SQLite: foreign key constraints with `ON DELETE CASCADE` are defined during table creation, as noted in the comments.
- For other dialects (e.g., PostgreSQL): the `CASCADE` keyword is explicitly used in drop operations.
- The `DatasetData` join table properly defines cascade delete with `ondelete="CASCADE"` in both foreign key constraints.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for foreign key relationships and cascade delete configurations
# Test 1: Search for foreign key definitions
echo "Checking foreign key definitions..."
rg -A 5 "ForeignKey|relationship"
# Test 2: Search for cascade delete configurations
echo "Checking cascade delete configurations..."
rg -A 5 "ondelete|CASCADE"
Length of output: 58371
```python
if self.engine.dialect.name == "sqlite":
    async with self.get_async_session() as session:
        TableModel = await self.get_table(table_name, schema_name)

        # Foreign key constraints are disabled by default in SQLite (for backwards compatibility),
        # so must be enabled for each database connection/session separately.
        await session.execute(text("PRAGMA foreign_keys = ON;"))

        await session.execute(TableModel.delete().where(TableModel.c.id == data_id))
        await session.commit()
else:
    async with self.get_async_session() as session:
        TableModel = await self.get_table(table_name, schema_name)
        await session.execute(TableModel.delete().where(TableModel.c.id == data_id))
        await session.commit()
```
🛠️ Refactor suggestion
Refactor to reduce code duplication and improve error handling.
The current implementation has duplicated session management and delete logic. Consider refactoring for better maintainability.
Here's a suggested implementation:
async def delete_data_by_id(self, table_name: str, data_id: UUID, schema_name: Optional[str] = "public"):
"""
Delete data in given table based on id. Table must have an id Column.
"""
try:
async with self.get_async_session() as session:
TableModel = await self.get_table(table_name, schema_name)
if self.engine.dialect.name == "sqlite":
# Foreign key constraints are disabled by default in SQLite
await session.execute(text("PRAGMA foreign_keys = ON;"))
await session.execute(TableModel.delete().where(TableModel.c.id == data_id))
await session.commit()
except Exception as e:
# Log the error and re-raise with a more specific message
raise Exception(f"Failed to delete data with id {data_id} from table {table_name}: {str(e)}")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if self.engine.dialect.name == "sqlite": | |
| async with self.get_async_session() as session: | |
| TableModel = await self.get_table(table_name, schema_name) | |
| # Foreign key constraints are disabled by default in SQLite (for backwards compatibility), | |
| # so must be enabled for each database connection/session separately. | |
| await session.execute(text("PRAGMA foreign_keys = ON;")) | |
| await session.execute(TableModel.delete().where(TableModel.c.id == data_id)) | |
| await session.commit() | |
| else: | |
| async with self.get_async_session() as session: | |
| TableModel = await self.get_table(table_name, schema_name) | |
| await session.execute(TableModel.delete().where(TableModel.c.id == data_id)) | |
| await session.commit() | |
| try: | |
| async with self.get_async_session() as session: | |
| TableModel = await self.get_table(table_name, schema_name) | |
| if self.engine.dialect.name == "sqlite": | |
| # Foreign key constraints are disabled by default in SQLite | |
| await session.execute(text("PRAGMA foreign_keys = ON;")) | |
| await session.execute(TableModel.delete().where(TableModel.c.id == data_id)) | |
| await session.commit() | |
| except Exception as e: | |
| # Log the error and re-raise with a more specific message | |
| raise Exception(f"Failed to delete data with id {data_id} from table {table_name}: {str(e)}") |
Removed space to trigger github actions to test telemetry Test
# Foreign key constraints are disabled by default in SQLite (for backwards compatibility),
# so must be enabled for each database connection/session separately.
await session.execute(text("PRAGMA foreign_keys = ON;"))
Why do we do this here? I would expect this would happen during migration/startup?
It must be set for each new connection/session. It doesn't work if it's only done during migration/startup, I've tried.
Though some people use event listeners to execute the PRAGMA command for each new session/connection for this reason.
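For reference, here is a minimal sketch of that event-listener approach, which is the standard SQLAlchemy recipe for enabling foreign keys on every new SQLite connection (illustration only, not what this PR ships):

from sqlalchemy import event
from sqlalchemy.engine import Engine

@event.listens_for(Engine, "connect")
def set_sqlite_pragma(dbapi_connection, connection_record):
    # Runs once per new DBAPI connection; assumes a SQLite engine.
    cursor = dbapi_connection.cursor()
    cursor.execute("PRAGMA foreign_keys=ON")
    cursor.close()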
Resolved security concerns in endpoints regarding os Fix #COG-334-structure-routing
Updated docs to functions related to getting and deleting Data model objects Docs #COG-334
Actionable comments posted: 8
🧹 Outside diff range and nitpick comments (8)
cognee/modules/data/methods/get_dataset.py (2)
Line range hint
6-14: Good security check, but consider enhancing error handling.
The ownership validation is a good security measure. However, the function might benefit from more robust error handling.
Consider wrapping the database operations in a try-except block:
async def get_dataset(user_id: UUID, dataset_id: UUID) -> Optional[Dataset]:
    db_engine = get_relational_engine()

    async with db_engine.get_async_session() as session:
+       try:
            dataset = await session.get(Dataset, dataset_id)

            if dataset and dataset.owner_id != user_id:
                return None

            return dataset
+       except Exception as e:
+           # Handle specific exceptions (e.g., DatabaseError)
+           raise RuntimeError(f"Failed to retrieve dataset: {str(e)}")
Line range hint
6-14: Add logging for security auditing and debugging.
Consider adding logging to track access attempts and authorization failures.
Here's a suggested implementation:
+import logging
+
+logger = logging.getLogger(__name__)
+
async def get_dataset(user_id: UUID, dataset_id: UUID) -> Optional[Dataset]:
    db_engine = get_relational_engine()

    async with db_engine.get_async_session() as session:
        dataset = await session.get(Dataset, dataset_id)

        if dataset and dataset.owner_id != user_id:
+           logger.warning(
+               f"Unauthorized access attempt to dataset {dataset_id} by user {user_id}"
+           )
            return None

+       if not dataset:
+           logger.info(f"Dataset {dataset_id} not found")
+       else:
+           logger.debug(f"Dataset {dataset_id} successfully retrieved by owner {user_id}")
        return dataset

cognee/modules/data/methods/get_data.py (1)
6-14: Documentation needs security and error handling details.
While the basic documentation is present, it should include information about error scenarios and security implications.
async def get_data(data_id: UUID) -> Optional[Data]:
    """Retrieve data by ID.
    Args:
        data_id (UUID): ID of the data to retrieve
    Returns:
        Optional[Data]: The requested data object if found, None otherwise
+
+   Raises:
+       SQLAlchemyError: If database operation fails
+
+   Note:
+       This function provides no authorization checks. Ensure access control
+       is handled at the API route level.
    """

cognee/api/v1/settings/routers/get_settings_router.py (2)
1-8: Consider reorganizing imports for better readability.
Group imports in the following order:
- Standard library (typing)
- Third-party packages (fastapi)
- Local application imports (cognee.*)
Apply this diff:
+from typing import Union, Optional, Literal
+
from fastapi import APIRouter, Depends
-from cognee.api.DTO import InDTO, OutDTO
-from typing import Union, Optional, Literal
+
+from cognee.api.DTO import InDTO, OutDTO
from cognee.modules.users.methods import get_authenticated_user
-from fastapi import Depends
from cognee.modules.users.models import User
from cognee.modules.settings.get_settings import LLMConfig, VectorDBConfig
19-23: Simplify provider type annotations.
The Union[Literal] syntax is unnecessarily verbose. Literal can directly accept multiple values.
Apply this diff:
class LLMConfigInputDTO(InDTO):
-    provider: Union[Literal["openai"], Literal["ollama"], Literal["anthropic"]]
+    provider: Literal["openai", "ollama", "anthropic"]
    model: str
    api_key: str

class VectorDBConfigInputDTO(InDTO):
-    provider: Union[Literal["lancedb"], Literal["qdrant"], Literal["weaviate"], Literal["pgvector"]]
+    provider: Literal["lancedb", "qdrant", "weaviate", "pgvector"]
    url: str
    api_key: str

Also applies to: 24-27
cognee/api/v1/datasets/routers/get_datasets_router.py (3)
51-63: Add success response for dataset deletion.
Consider returning a success status code (204 No Content) after successful deletion to follow REST conventions.
    await delete_dataset(dataset)
+   return JSONResponse(status_code=204)
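Worth noting: a 204 No Content reply must not carry a body, and JSONResponse(status_code=204) can still serialize one, so a bare Response is the safer FastAPI spelling. A minimal, self-contained sketch (the route signature here is illustrative, not the PR's actual code):

from fastapi import APIRouter, Response, status

router = APIRouter()

@router.delete("/{dataset_id}", status_code=status.HTTP_204_NO_CONTENT)
async def delete_dataset_endpoint(dataset_id: str) -> Response:
    # ... perform the deletion here ...
    # Returning a bare Response guarantees no JSON body on the 204 reply.
    return Response(status_code=status.HTTP_204_NO_CONTENT)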
73-73: Address TODO comment regarding permission handling.
The TODO comment indicates a need to handle permission access differently. Consider implementing role-based access control (RBAC) or specific permission checks.
Would you like me to help design a permission handling system or create a GitHub issue to track this task?
163-174: Simplify data lookup logic.
The current implementation using list comprehension and length check can be simplified.
-    matching_data = [data for data in dataset_data if str(data.id) == data_id]
-
-    # Check if matching_data contains an element
-    if len(matching_data) == 0:
-        return JSONResponse(
-            status_code=404,
-            content={
-                "detail": f"Data ({data_id}) not found in dataset ({dataset_id})."
-            }
-        )
-
-    data = matching_data[0]
+    data = next((data for data in dataset_data if str(data.id) == data_id), None)
+    if data is None:
+        raise HTTPException(
+            status_code=404,
+            detail=f"Data ({data_id}) not found in dataset ({dataset_id})."
+        )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (8)
- cognee/api/v1/add/routers/get_add_router.py (1 hunks)
- cognee/api/v1/cognify/routers/get_cognify_router.py (1 hunks)
- cognee/api/v1/datasets/routers/get_datasets_router.py (1 hunks)
- cognee/api/v1/search/routers/get_search_router.py (1 hunks)
- cognee/api/v1/settings/routers/get_settings_router.py (1 hunks)
- cognee/modules/data/methods/delete_data.py (1 hunks)
- cognee/modules/data/methods/get_data.py (1 hunks)
- cognee/modules/data/methods/get_dataset.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- cognee/api/v1/add/routers/get_add_router.py
- cognee/api/v1/cognify/routers/get_cognify_router.py
- cognee/api/v1/search/routers/get_search_router.py
🧰 Additional context used
🪛 Ruff
cognee/api/v1/datasets/routers/get_datasets_router.py
103-103: Do not use bare except
(E722)
🔇 Additional comments (10)
cognee/modules/data/methods/get_dataset.py (1)
1-4: LGTM! Clean and well-organized imports.
The imports are properly organized, and the addition of the Optional type is appropriate for the new return type.
cognee/modules/data/methods/get_data.py (1)
6-20:
Critical: Add error handling and consider security implications.
The current implementation has several critical issues:
- No error handling for database operations
- No user authorization check (previously present but now removed)
- No proper session cleanup in case of errors
Here's the suggested implementation:
-async def get_data(data_id: UUID) -> Optional[Data]:
+async def get_data(data_id: UUID, user_id: Optional[UUID] = None) -> Optional[Data]:
    """Retrieve data by ID.
    Args:
        data_id (UUID): ID of the data to retrieve
+       user_id (UUID, optional): ID of the user requesting the data
    Returns:
        Optional[Data]: The requested data object if found, None otherwise
+
+   Raises:
+       SQLAlchemyError: If database operation fails
+
+   Note:
+       This function provides no authorization checks. Ensure access control
+       is handled at the API route level.
    """
    db_engine = get_relational_engine()

    async with db_engine.get_async_session() as session:
-       data = await session.get(Data, data_id)
-
-       return data
+       try:
+           data = await session.get(Data, data_id)
+           if data is None:
+               return None
+
+           # If user_id is provided, perform authorization check
+           if user_id is not None and data.user_id != user_id:
+               return None
+
+           return data
+       except SQLAlchemyError as e:
+           # Log the error here if you have a logging system
+           raise

Let's verify if this data access is properly secured at the API route level:
cognee/modules/data/methods/delete_data.py (2)
1-4: LGTM!
The imports are clean and well-organized. The empty line provides good visual separation.
17-19: Verify database engine initialization.
The database engine initialization could fail if the configuration is incorrect.
Let's verify the database engine initialization:
✅ Verification successful
Let me generate additional scripts to check the error handling and configuration:
Let me check one more aspect to understand the database engine implementation:
Let me check one final aspect about the database engine implementation:
Database engine initialization is properly handled
Based on the codebase analysis:
- The delete_data_by_id operation is implemented with proper session management
- The database engine is consistently used across the codebase with proper session handling
- The engine implementation includes dialect-specific handling (e.g., SQLite support)
- The operation is used within a well-structured data deletion workflow
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check database engine initialization and usage
# Expected: Find database configuration and error handling
# Check for database configuration
echo "Checking database configuration..."
rg -l "DATABASE|DB_" --type yaml --type env
# Check for database engine error handling
echo "Checking database engine error handling..."
ast-grep --pattern 'get_relational_engine()' -A 5
Length of output: 12910
Script:
#!/bin/bash
# Check the implementation of get_relational_engine and its error handling
ast-grep --pattern 'def get_relational_engine'
# Check for try-catch blocks around database operations
rg -A 3 "try:.*get_relational_engine"
# Look for the engine configuration
rg -l "class.*Engine"
Length of output: 600
Script:
#!/bin/bash
# Check the database engine implementation
rg -A 10 "class.*RelationalEngine|class.*DatabaseEngine"
# Check for database configuration files
fd -e yaml -e ini -e conf -e config -x cat {} \;
Length of output: 6058
Script:
#!/bin/bash
# Check the database engine implementation in infrastructure
rg -A 5 "class RelationalEngine" cognee/infrastructure/
rg -A 5 "def delete_data_by_id" cognee/infrastructure/
Length of output: 885
cognee/api/v1/settings/routers/get_settings_router.py (1)
9-18: LGTM! Well-structured output DTOs.
The output DTOs are correctly defined with proper inheritance and composition.
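For context, a plausible shape for those output DTOs, composing the imported LLMConfig and VectorDBConfig types (an assumption based on the imports shown earlier, not the actual file contents):

class LLMConfigOutputDTO(OutDTO, LLMConfig):
    pass

class VectorDBConfigOutputDTO(OutDTO, VectorDBConfig):
    pass

class SettingsDTO(OutDTO):
    # Composes both provider configs into the single settings payload.
    llm: LLMConfigOutputDTO
    vector_db: VectorDBConfigOutputDTO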
cognee/api/v1/datasets/routers/get_datasets_router.py (5)
1-36: LGTM! Well-structured DTOs and imports.
The imports are logically organized, and the DTO classes follow a consistent pattern with proper inheritance from base classes.
40-49: LGTM! Well-implemented get_datasets endpoint.
Good error handling with proper logging and HTTP status codes.
82-86:
Fix incorrect error message in data not found case.
The error message incorrectly refers to "Dataset" when it should refer to "Data".
if data is None:
    raise HTTPException(
        status_code=404,
-       detail=f"Dataset ({data_id}) not found."
+       detail=f"Data ({data_id}) not found."
    )

Likely invalid or redundant comment.
138-142: 🛠️ Refactor suggestion
Use consistent error handling with HTTPException.
Maintain consistency with other endpoints by using HTTPException for error handling.
except Exception as error:
-   return JSONResponse(
-       status_code=409,
-       content={"error": str(error)}
-   )
+   logger.error(f"Error retrieving dataset status: {error}")
+   raise HTTPException(
+       status_code=409,
+       detail=str(error)
+   ) from error

Likely invalid or redundant comment.
150-156: 🛠️ Refactor suggestion
Use consistent error handling with HTTPException.
Maintain consistency by using HTTPException instead of JSONResponse.
if dataset is None:
-   return JSONResponse(
-       status_code=404,
-       content=ErrorResponseDTO(f"Dataset ({dataset_id}) not found."),
-   )
+   raise HTTPException(
+       status_code=404,
+       detail=f"Dataset ({dataset_id}) not found."
+   )

Likely invalid or redundant comment.
from uuid import UUID
from typing import Optional
from cognee.infrastructure.databases.relational import get_relational_engine
from ..models import Data
🛠️ Refactor suggestion
Add SQLAlchemy error type import.
The imports look good, but since we'll need to handle database errors, we should add:
from uuid import UUID
from typing import Optional
+from sqlalchemy.exc import SQLAlchemyError
from cognee.infrastructure.databases.relational import get_relational_engine
from ..models import Data
async def delete_data(data: Data):
    """Delete a data record from the database.
    Args:
        data (Data): The data object to be deleted.
    Raises:
        ValueError: If the data object is invalid.
    """
🛠️ Refactor suggestion
Add return type hint and expand documentation.
The docstring is good but incomplete. The return value and its type should be documented.
Apply this diff:
-async def delete_data(data: Data):
+async def delete_data(data: Data) -> bool:
    """Delete a data record from the database.
    Args:
        data (Data): The data object to be deleted.
+
+   Returns:
+       bool: True if deletion was successful, False if record not found.
    Raises:
        ValueError: If the data object is invalid.
    """
if not hasattr(data, '__tablename__'):
    raise ValueError("The provided data object is missing the required '__tablename__' attribute.")

db_engine = get_relational_engine()

return await db_engine.delete_data_by_id(data.__tablename__, data.id)
Enhance error handling and data validation.
The current implementation needs additional safeguards:
- Missing validation for data.id
- No error handling for database operations
- No transaction context for atomicity
Apply this diff to improve reliability:
if not hasattr(data, '__tablename__'):
    raise ValueError("The provided data object is missing the required '__tablename__' attribute.")

+if not data.id:
+   raise ValueError("The data object must have a valid id.")
+
db_engine = get_relational_engine()

-return await db_engine.delete_data_by_id(data.__tablename__, data.id)
+try:
+   async with db_engine.begin() as conn:
+       return await db_engine.delete_data_by_id(data.__tablename__, data.id)
+except Exception as e:
+   # Log the error here
+   raise ValueError(f"Failed to delete data: {str(e)}") from e
class LLMConfigInputDTO(InDTO):
    provider: Union[Literal["openai"], Literal["ollama"], Literal["anthropic"]]
    model: str
    api_key: str

class VectorDBConfigInputDTO(InDTO):
    provider: Union[Literal["lancedb"], Literal["qdrant"], Literal["weaviate"], Literal["pgvector"]]
    url: str
    api_key: str

class SettingsPayloadDTO(InDTO):
    llm: Optional[LLMConfigInputDTO] = None
    vector_db: Optional[VectorDBConfigInputDTO] = None
🛠️ Refactor suggestion
Add field validation for api_key and url.
Consider adding Pydantic field validators to ensure:
- api_key meets minimum length/format requirements
- url is properly formatted
Example implementation:
from pydantic import validator, HttpUrl

class LLMConfigInputDTO(InDTO):
    provider: Literal["openai", "ollama", "anthropic"]
    model: str
    api_key: str

    @validator('api_key')
    def validate_api_key(cls, v):
        if len(v) < 8:
            raise ValueError('API key must be at least 8 characters')
        return v

class VectorDBConfigInputDTO(InDTO):
    provider: Literal["lancedb", "qdrant", "weaviate", "pgvector"]
    url: HttpUrl  # This will automatically validate URL format
    api_key: str

    @validator('api_key')
    def validate_api_key(cls, v):
        if len(v) < 8:
            raise ValueError('API key must be at least 8 characters')
        return v
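If the project is on Pydantic v2, validator is deprecated there in favor of field_validator; a sketch of the same check in v2 style (BaseModel is used here only to keep the snippet self-contained):

from pydantic import BaseModel, field_validator

class LLMConfigInputDTO(BaseModel):
    provider: str
    model: str
    api_key: str

    @field_validator("api_key")
    @classmethod
    def validate_api_key(cls, v: str) -> str:
        # Same minimum-length rule as the v1 example above.
        if len(v) < 8:
            raise ValueError("API key must be at least 8 characters")
        return v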
@router.get("/", response_model=SettingsDTO)
async def get_settings(user: User = Depends(get_authenticated_user)):
    from cognee.modules.settings import get_settings as get_cognee_settings
    return get_cognee_settings()
Add error handling to GET endpoint.
The endpoint should handle potential errors from get_cognee_settings().
Apply this diff:
@router.get("/", response_model=SettingsDTO)
async def get_settings(user: User = Depends(get_authenticated_user)):
+ from fastapi import HTTPException
from cognee.modules.settings import get_settings as get_cognee_settings
- return get_cognee_settings()
+ try:
+ return get_cognee_settings()
+ except Exception as e:
+ raise HTTPException(status_code=500, detail=f"Failed to retrieve settings: {str(e)}")📝 Committable suggestion
@router.post("/", response_model=None)
async def save_settings(new_settings: SettingsPayloadDTO, user: User = Depends(get_authenticated_user)):
    from cognee.modules.settings import save_llm_config, save_vector_db_config

    if new_settings.llm is not None:
        await save_llm_config(new_settings.llm)

    if new_settings.vector_db is not None:
        await save_vector_db_config(new_settings.vector_db)
Improve POST endpoint implementation.
The endpoint needs the following improvements:
- Return a success response instead of None
- Add error handling
- Move imports to the top level
Apply this diff:
-@router.post("/", response_model=None)
+from fastapi import HTTPException
+from pydantic import BaseModel
+
+class SettingsResponse(BaseModel):
+   message: str
+
+@router.post("/", response_model=SettingsResponse)
async def save_settings(new_settings: SettingsPayloadDTO, user: User = Depends(get_authenticated_user)):
    from cognee.modules.settings import save_llm_config, save_vector_db_config
-
-   if new_settings.llm is not None:
-       await save_llm_config(new_settings.llm)
-
-   if new_settings.vector_db is not None:
-       await save_vector_db_config(new_settings.vector_db)
+   try:
+       if new_settings.llm is not None:
+           await save_llm_config(new_settings.llm)
+
+       if new_settings.vector_db is not None:
+           await save_vector_db_config(new_settings.vector_db)
+
+       return SettingsResponse(message="Settings updated successfully")
+   except Exception as e:
+       raise HTTPException(status_code=500, detail=f"Failed to save settings: {str(e)}")
if dataset is None:
    return JSONResponse(
        status_code=404,
        content=ErrorResponseDTO(f"Dataset ({dataset_id}) not found."),
    )
🛠️ Refactor suggestion
Maintain consistent error handling pattern.
Use HTTPException consistently across the endpoint instead of mixing with JSONResponse and ErrorResponseDTO.
if dataset is None:
-   return JSONResponse(
-       status_code=404,
-       content=ErrorResponseDTO(f"Dataset ({dataset_id}) not found."),
-   )
+   raise HTTPException(
+       status_code=404,
+       detail=f"Dataset ({dataset_id}) not found."
+   )
Added None as default value for datetime Refactor #COG-334
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (1)
cognee/api/v1/datasets/routers/get_datasets_router.py (1)
73-73: Address the TODO comment regarding permission handling.
The TODO comment indicates a need to handle permission access differently. This should be addressed to ensure proper security implementation.
Would you like me to help design and implement a more granular permission system for data access?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
cognee/api/v1/datasets/routers/get_datasets_router.py (1 hunks)
🧰 Additional context used
🪛 Ruff
cognee/api/v1/datasets/routers/get_datasets_router.py
103-103: Do not use bare except
(E722)
🔇 Additional comments (6)
cognee/api/v1/datasets/routers/get_datasets_router.py (6)
1-36: LGTM! Well-structured imports and DTO definitions.
The imports are logically organized, and the DTO classes follow consistent patterns with appropriate inheritance.
40-50: LGTM! Proper error handling and authentication.
The endpoint correctly handles authentication and includes appropriate error logging.
51-63: LGTM! Proper validation before deletion.
The endpoint correctly validates dataset existence and ownership before deletion.
82-86:
Fix incorrect error message text.
The error message incorrectly mentions "Dataset" instead of "Data".
Apply this diff to fix the error message:
if data is None:
    raise HTTPException(
        status_code=404,
-       detail=f"Dataset ({data_id}) not found."
+       detail=f"Data ({data_id}) not found."
    )

Likely invalid or redundant comment.
116-120: 🛠️ Refactor suggestion
Maintain consistent error handling pattern.
The endpoint uses JSONResponse with ErrorResponseDTO instead of HTTPException.
Apply this diff to maintain consistency:
if dataset is None:
-   return JSONResponse(
-       status_code=404,
-       content=ErrorResponseDTO(f"Dataset ({dataset_id}) not found."),
-   )
+   raise HTTPException(
+       status_code=404,
+       detail=f"Dataset ({dataset_id}) not found."
+   )

Likely invalid or redundant comment.
138-142: 🛠️ Refactor suggestion
Maintain consistent error handling pattern.
The endpoint uses JSONResponse instead of HTTPException for error handling.
Apply this diff to maintain consistency:
except Exception as error:
-   return JSONResponse(
-       status_code=409,
-       content={"error": str(error)}
-   )
+   logger.error(f"Error retrieving dataset status: {str(error)}")
+   raise HTTPException(
+       status_code=409,
+       detail=str(error)
+   ) from error

Likely invalid or redundant comment.
if dataset is None:
    return JSONResponse(
        status_code=404,
        content={
            "detail": f"Dataset ({dataset_id}) not found."
        }
    )
🛠️ Refactor suggestion
Maintain consistent error handling pattern.
The endpoint uses JSONResponse in multiple places instead of HTTPException.
Apply this diff to maintain consistency:
if dataset is None:
-   return JSONResponse(
-       status_code=404,
-       content={
-           "detail": f"Dataset ({dataset_id}) not found."
-       }
-   )
+   raise HTTPException(
+       status_code=404,
+       detail=f"Dataset ({dataset_id}) not found."
+   )

# ... other code ...

if len(matching_data) == 0:
-   return JSONResponse(
-       status_code=404,
-       content={
-           "detail": f"Data ({data_id}) not found in dataset ({dataset_id})."
-       }
-   )
+   raise HTTPException(
+       status_code=404,
+       detail=f"Data ({data_id}) not found in dataset ({dataset_id})."
+   )

Also applies to: 167-172
try:
    graph_client = await get_graph_engine()
    graph_url = await render_graph(graph_client.graph)

    return JSONResponse(
        status_code=200,
        content=str(graph_url),
    )
except:
    return JSONResponse(
        status_code=409,
        content="Graphistry credentials are not set. Please set them in your .env file.",
    )
Improve error handling in graph generation.
The current implementation has two issues:
- Uses a bare except clause which could mask important errors
- Inconsistent error handling pattern (using JSONResponse instead of HTTPException)
Apply this diff to improve error handling:
try:
    graph_client = await get_graph_engine()
    graph_url = await render_graph(graph_client.graph)

-   return JSONResponse(
-       status_code=200,
-       content=str(graph_url),
-   )
+   return str(graph_url)
-except:
+except (ConnectionError, ValueError) as error:
+   logger.error(f"Error generating graph: {str(error)}")
+   raise HTTPException(
+       status_code=409,
+       detail="Graphistry credentials are not set. Please set them in your .env file."
+   ) from error
-   return JSONResponse(
-       status_code=409,
-       content="Graphistry credentials are not set. Please set them in your .env file.",
-   )
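Note that the sketch above references logger and HTTPException; if the router module does not already define them, it would also need something like:

import logging

from fastapi import HTTPException

logger = logging.getLogger(__name__)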
🧰 Tools
🪛 Ruff
103-103: Do not use bare except
(E722)
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Documentation
These updates enhance the modularity and usability of the API, providing a more organized structure for users.