feat: Add database deletion on dataset delete #1893
Conversation
Please make sure all the checkboxes are checked:
Walkthrough

Adds per-dataset database handler utilities and integrates them into dataset deletion flows; introduces an end-to-end GitHub Actions job and test that verifies graph and vector database files are removed when datasets are deleted.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 3
♻️ Duplicate comments (2)
cognee/api/v1/datasets/routers/get_datasets_router.py (1)
225-271: Clarify future of `delete_data` endpoint or remove the TODO

The inline TODO notes that this endpoint may be redundant with other data deletion mechanisms, but the implementation is still fully active.
Either:
- Decide to keep it and remove/adjust the TODO (possibly documenting how it differs from the `/v1/delete` endpoint), or
- Plan for deprecation/removal (e.g., mark as deprecated in docs and route comments).
This will avoid long-lived ambiguity around whether clients should rely on this endpoint.
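If the deprecation route is taken, one lightweight pattern is a warning-emitting wrapper around the endpoint body; the decorator and endpoint below are hypothetical illustrations, not code from this PR (FastAPI route decorators also accept `deprecated=True`, which only affects the OpenAPI docs):

```python
import warnings


def deprecated_endpoint(func):
    """Emit a DeprecationWarning each time the wrapped endpoint is called."""

    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{func.__name__} is deprecated; use the /v1/delete endpoint instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return func(*args, **kwargs)

    return wrapper


@deprecated_endpoint
def delete_data(dataset_id: str) -> dict:
    # Placeholder body standing in for the real endpoint logic.
    return {"deleted": dataset_id}
```

This keeps the endpoint working for existing clients while making the planned removal visible in logs and test runs.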
cognee/modules/data/deletion/prune_system.py (1)
38-42: Same critical type mismatch and missing docstring as prune_graph_databases.

This function has the same issues identified in `prune_graph_databases`:
- Critical: Type mismatch between dict-like Row mappings and DatasetDatabase model instances
- Major: Missing docstring
- Minor: No validation of handler dictionary structure
Apply the same fixes as suggested for `prune_graph_databases`:

```diff
 async def prune_vector_databases():
+    """
+    Delete all vector databases for all dataset_database records.
+    Used when backend_access_control is enabled to prune vector databases per-dataset.
+    """
     db_engine = get_relational_engine()
     try:
-        dataset_databases = await db_engine.get_all_data_from_table("dataset_database")
+        dataset_database_rows = await db_engine.get_all_data_from_table("dataset_database")
         # Go through each dataset database and delete the vector database
-        for dataset_database in dataset_databases:
+        for row in dataset_database_rows:
+            dataset_database = DatasetDatabase(**row)
             handler = get_vector_dataset_database_handler(dataset_database)
+            if "handler_instance" not in handler:
+                logger.warning("Handler missing 'handler_instance' key for dataset_database")
+                continue
             await handler["handler_instance"].delete_dataset(dataset_database)
```
🧹 Nitpick comments (6)
cognee/modules/data/methods/delete_dataset.py (1)
1-33: Dataset deletion flow via graph/vector handlers looks sound; consider docstring and optional defensive handler lookup

The sequence (fetch `DatasetDatabase`, invoke graph/vector handlers, then delete the dataset entity) looks correct and matches the new handler utilities. To improve maintainability and debuggability:
- Add a short docstring documenting that this also deletes associated graph/vector databases, not just the relational dataset row.
- Optionally guard against misconfiguration of handler names (KeyError from the handler mapping) so failures surface with a clearer error message.
For example, adding a docstring:
```diff
 async def delete_dataset(dataset: Dataset):
-    db_engine = get_relational_engine()
+    """Delete a dataset and its associated graph and vector databases."""
+
+    db_engine = get_relational_engine()
```

cognee/api/v1/datasets/routers/get_datasets_router.py (1)
211-218: Avoid shadowing the router `delete_dataset` function with the imported method

Importing `delete_dataset` inside the `delete_dataset` endpoint shadows the endpoint's own name, which is confusing when reading or debugging. Consider importing under an alias (or importing at module level) to make intent clearer:
```diff
-    from cognee.modules.data.methods import delete_dataset
+    from cognee.modules.data.methods import delete_dataset as delete_dataset_method
@@
-    await delete_dataset(dataset[0])
+    await delete_dataset_method(dataset[0])
```

The authorization pattern via `get_authorized_existing_datasets` looks good.

cognee/infrastructure/databases/utils/get_graph_dataset_database_handler.py (1)
1-10: Graph handler lookup is straightforward; consider docstring and clearer error on unknown handler

The direct lookup into `supported_dataset_database_handlers` is fine assuming configuration is always valid. Two small improvements:
- Add a one-line docstring clarifying that this picks the graph handler based on `dataset_database.graph_dataset_database_handler`.
- Optionally wrap the dict access to raise a more descriptive error if an unsupported handler name is stored in the DB (instead of a bare `KeyError`).
Example:

```diff
 def get_graph_dataset_database_handler(dataset_database: DatasetDatabase) -> dict:
     from cognee.infrastructure.databases.dataset_database_handler.supported_dataset_database_handlers import (
         supported_dataset_database_handlers,
     )

-    handler = supported_dataset_database_handlers[dataset_database.graph_dataset_database_handler]
-    return handler
+    """Return the graph dataset database handler config for the given dataset."""
+    try:
+        return supported_dataset_database_handlers[
+            dataset_database.graph_dataset_database_handler
+        ]
+    except KeyError as exc:
+        raise ValueError(
+            f"Unsupported graph dataset database handler: "
+            f"{dataset_database.graph_dataset_database_handler!r}"
+        ) from exc
```

cognee/tests/test_dataset_delete.py (1)
1-76: E2E dataset deletion test is valuable; consider simplifying event-loop setup

The test does a good end-to-end check: build datasets, verify graph/vector DB files exist, delete, then assert both are gone. This directly exercises the new handler-based deletion.
If you don't rely on custom loop semantics here, you could simplify the `__main__` block with `asyncio.run(main())` rather than manually creating and managing the loop, e.g.:

```diff
-if __name__ == "__main__":
-    logger = setup_logging(log_level=ERROR)
-    loop = asyncio.new_event_loop()
-    asyncio.set_event_loop(loop)
-    try:
-        loop.run_until_complete(main())
-    finally:
-        loop.run_until_complete(loop.shutdown_asyncgens())
+if __name__ == "__main__":
+    logger = setup_logging(log_level=ERROR)
+    asyncio.run(main())
```

The current approach still works, so this is mostly about readability and reducing boilerplate.
cognee/infrastructure/databases/utils/resolve_dataset_database_connection_info.py (1)
1-29: Refactor to delegate connection resolution to handlers looks correct

Using `get_vector_dataset_database_handler` and `get_graph_dataset_database_handler` and then calling each handler's `resolve_dataset_connection_info` sequentially aligns with the function's purpose and centralizes handler logic.

If you later need better diagnostics for misconfigured handler names, the same defensive pattern suggested for the handler getter utilities can be applied there, but the current flow is coherent with the new design.
cognee/infrastructure/databases/utils/get_vector_dataset_database_handler.py (1)
1-10: Vector handler lookup mirrors graph helper; same docstring/error-handling suggestions apply

The implementation correctly mirrors the graph handler utility, using `vector_dataset_database_handler` as the key. For consistency and clarity, consider adding:
- A brief docstring explaining that this returns the vector database handler config for a dataset.
- Optional error handling to surface a clear message if an unsupported handler name is stored.
Pattern would be analogous to the graph helper refactor suggested earlier.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- .github/workflows/e2e_tests.yml (1 hunks)
- cognee/api/v1/datasets/routers/get_datasets_router.py (1 hunks)
- cognee/infrastructure/databases/utils/__init__.py (1 hunks)
- cognee/infrastructure/databases/utils/get_graph_dataset_database_handler.py (1 hunks)
- cognee/infrastructure/databases/utils/get_vector_dataset_database_handler.py (1 hunks)
- cognee/infrastructure/databases/utils/resolve_dataset_database_connection_info.py (2 hunks)
- cognee/modules/data/deletion/prune_system.py (3 hunks)
- cognee/modules/data/methods/delete_dataset.py (1 hunks)
- cognee/tests/test_dataset_delete.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code
Files:
- cognee/modules/data/deletion/prune_system.py
- cognee/infrastructure/databases/utils/get_graph_dataset_database_handler.py
- cognee/infrastructure/databases/utils/resolve_dataset_database_connection_info.py
- cognee/modules/data/methods/delete_dataset.py
- cognee/infrastructure/databases/utils/__init__.py
- cognee/infrastructure/databases/utils/get_vector_dataset_database_handler.py
- cognee/tests/test_dataset_delete.py
- cognee/api/v1/datasets/routers/get_datasets_router.py
⚙️ CodeRabbit configuration file
**/*.py: When reviewing Python code for this project:
- Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
- As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
- As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
- As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
- As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.
Files:
- cognee/modules/data/deletion/prune_system.py
- cognee/infrastructure/databases/utils/get_graph_dataset_database_handler.py
- cognee/infrastructure/databases/utils/resolve_dataset_database_connection_info.py
- cognee/modules/data/methods/delete_dataset.py
- cognee/infrastructure/databases/utils/__init__.py
- cognee/infrastructure/databases/utils/get_vector_dataset_database_handler.py
- cognee/tests/test_dataset_delete.py
- cognee/api/v1/datasets/routers/get_datasets_router.py
cognee/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use shared logging utilities from cognee.shared.logging_utils in Python code
Files:
- cognee/modules/data/deletion/prune_system.py
- cognee/infrastructure/databases/utils/get_graph_dataset_database_handler.py
- cognee/infrastructure/databases/utils/resolve_dataset_database_connection_info.py
- cognee/modules/data/methods/delete_dataset.py
- cognee/infrastructure/databases/utils/__init__.py
- cognee/infrastructure/databases/utils/get_vector_dataset_database_handler.py
- cognee/tests/test_dataset_delete.py
- cognee/api/v1/datasets/routers/get_datasets_router.py
cognee/{modules,infrastructure,tasks}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Co-locate feature-specific helpers under their respective package (modules/, infrastructure/, or tasks/)
Files:
- cognee/modules/data/deletion/prune_system.py
- cognee/infrastructure/databases/utils/get_graph_dataset_database_handler.py
- cognee/infrastructure/databases/utils/resolve_dataset_database_connection_info.py
- cognee/modules/data/methods/delete_dataset.py
- cognee/infrastructure/databases/utils/__init__.py
- cognee/infrastructure/databases/utils/get_vector_dataset_database_handler.py
.github/**
⚙️ CodeRabbit configuration file
.github/**: * When the project is hosted on GitHub: All GitHub-specific configurations, templates, and tools should be found in the '.github' directory tree.
- 'actionlint' erroneously generates false positives when dealing with GitHub's `${{ ... }}` syntax in conditionals.
- 'actionlint' erroneously generates incorrect solutions when suggesting the removal of valid `${{ ... }}` syntax.
Files:
.github/workflows/e2e_tests.yml
cognee/tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
cognee/tests/**/*.py: Place Python tests under cognee/tests/ organized by type (unit, integration, cli_tests)
Name Python test files test_*.py and use pytest.mark.asyncio for async tests
Files:
cognee/tests/test_dataset_delete.py
cognee/tests/*
⚙️ CodeRabbit configuration file
cognee/tests/*: When reviewing test code:
- Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
- As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
- As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance, pointing out any violations discovered.
- As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
- As a project rule, Python source files with names prefixed by the string "test_" and located in the project's "tests" directory are the project's unit-testing code. It is safe, albeit a heuristic, to assume these are considered part of the project's minimal acceptance testing unless a justifying exception to this assumption is documented.
- As a project rule, any files without extensions and with names prefixed by either the string "check_" or the string "test_", and located in the project's "tests" directory, are the project's non-unit test code. "Non-unit test" in this context refers to any type of testing other than unit testing, such as (but not limited to) functional testing, style linting, regression testing, etc. It can also be assumed that non-unit testing code is usually written as Bash shell scripts.
Files:
cognee/tests/test_dataset_delete.py
cognee/api/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Public APIs should be type-annotated in Python where practical
Files:
cognee/api/v1/datasets/routers/get_datasets_router.py
🧠 Learnings (2)
📚 Learning: 2025-10-11T04:18:24.594Z
Learnt from: Vattikuti-Manideep-Sitaram
Repo: topoteretes/cognee PR: 1529
File: cognee/api/v1/cognify/ontology_graph_pipeline.py:69-74
Timestamp: 2025-10-11T04:18:24.594Z
Learning: The code_graph_pipeline.py and ontology_graph_pipeline.py both follow an established pattern of calling cognee.prune.prune_data() and cognee.prune.prune_system(metadata=True) at the start of pipeline execution. This appears to be intentional behavior for pipeline operations in the cognee codebase.
Applied to files:
cognee/modules/data/deletion/prune_system.py
📚 Learning: 2024-11-18T12:54:36.758Z
Learnt from: 0xideas
Repo: topoteretes/cognee PR: 233
File: .github/workflows/test_cognee_llama_index_notebook.yml:0-0
Timestamp: 2024-11-18T12:54:36.758Z
Learning: In the `.github/workflows/test_cognee_llama_index_notebook.yml` workflow, it's acceptable to remove the `--all-extras` flag from `poetry install` to reduce costs by not installing unnecessary dependencies.
Applied to files:
.github/workflows/e2e_tests.yml
🧬 Code graph analysis (7)
cognee/modules/data/deletion/prune_system.py (7)
cognee/infrastructure/databases/utils/get_graph_dataset_database_handler.py (1)
- get_graph_dataset_database_handler (4-10)

cognee/infrastructure/databases/utils/get_vector_dataset_database_handler.py (1)
- get_vector_dataset_database_handler (4-10)

cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
- get_all_data_from_table (417-450)

cognee/infrastructure/databases/graph/kuzu/KuzuDatasetDatabaseHandler.py (1)
- delete_dataset (58-81)

cognee/infrastructure/databases/vector/lancedb/LanceDBDatasetDatabaseHandler.py (1)
- delete_dataset (43-50)

cognee/infrastructure/databases/graph/neo4j_driver/Neo4jAuraDevDatasetDatabaseHandler.py (1)
- delete_dataset (167-168)

cognee/infrastructure/databases/dataset_database_handler/dataset_database_handler_interface.py (1)
- delete_dataset (71-80)
cognee/infrastructure/databases/utils/get_graph_dataset_database_handler.py (1)
cognee/modules/users/models/DatasetDatabase.py (1)
- DatasetDatabase (7-40)
cognee/modules/data/methods/delete_dataset.py (4)
cognee/modules/users/models/DatasetDatabase.py (1)
- DatasetDatabase (7-40)

cognee/infrastructure/databases/utils/get_vector_dataset_database_handler.py (1)
- get_vector_dataset_database_handler (4-10)

cognee/infrastructure/databases/utils/get_graph_dataset_database_handler.py (1)
- get_graph_dataset_database_handler (4-10)

cognee/api/v1/datasets/routers/get_datasets_router.py (1)
- delete_dataset (184-218)
cognee/infrastructure/databases/utils/__init__.py (2)
cognee/infrastructure/databases/utils/get_graph_dataset_database_handler.py (1)
- get_graph_dataset_database_handler (4-10)

cognee/infrastructure/databases/utils/get_vector_dataset_database_handler.py (1)
- get_vector_dataset_database_handler (4-10)
cognee/infrastructure/databases/utils/get_vector_dataset_database_handler.py (1)
cognee/modules/users/models/DatasetDatabase.py (1)
- DatasetDatabase (7-40)
cognee/tests/test_dataset_delete.py (3)
cognee/shared/logging_utils.py (1)
- setup_logging (288-561)

cognee/modules/users/methods/get_default_user.py (1)
- get_default_user (13-38)

cognee/api/v1/config/config.py (2)
- data_root_directory (36-38)
- system_root_directory (18-33)
cognee/api/v1/datasets/routers/get_datasets_router.py (2)
cognee/modules/data/methods/delete_dataset.py (1)
- delete_dataset (14-34)

cognee/api/v1/delete/routers/get_delete_router.py (1)
- delete (18-60)
🔇 Additional comments (3)
.github/workflows/e2e_tests.yml (1)
240-263: New dataset-deletion E2E job is consistent with existing workflow structure

The `test-dataset-database-deletion` job reuses the standard Cognee setup, secrets, and environment pattern and correctly points to `cognee/tests/test_dataset_delete.py`. This should integrate cleanly with the existing reusable integration tests.

cognee/infrastructure/databases/utils/__init__.py (1)
3-4: Re-exporting handler getters from the utils package looks appropriate

Re-exporting `get_graph_dataset_database_handler` and `get_vector_dataset_database_handler` here matches the existing pattern and simplifies callers' imports.

cognee/modules/data/deletion/prune_system.py (1)

8-11: LGTM!

The new utility imports are correctly structured and align with the refactored deletion logic.
```python
logger = get_logger()


async def prune_graph_databases():
```
Add docstring to document function purpose.
Per coding guidelines, undocumented function definitions are considered incomplete. This function should have a docstring explaining that it deletes all graph databases for all dataset_database records when backend access control is enabled.
Apply this diff:
```diff
 async def prune_graph_databases():
+    """
+    Delete all graph databases for all dataset_database records.
+    Used when backend_access_control is enabled to prune graph databases per-dataset.
+    """
     db_engine = get_relational_engine()
```

As per coding guidelines, undocumented functions are assumed incomplete.
🤖 Prompt for AI Agents
In cognee/modules/data/deletion/prune_system.py around line 19, the function
prune_graph_databases lacks a docstring; add a concise docstring immediately
below the async def that states the function's purpose: that it deletes all
graph databases for all dataset_database records when backend access control is
enabled, describes input (none) and behavior (side-effect deletion) and notes
any exceptions or important preconditions (e.g., requires backend access control
enabled). Ensure the docstring follows project style (single- or triple-quoted
summary) and is brief and descriptive.
```diff
         dataset_databases = await db_engine.get_all_data_from_table("dataset_database")
         # Go through each dataset database and delete the graph database
-        for data_item in data:
-            await _prune_graph_db(data_item)
+        for dataset_database in dataset_databases:
+            handler = get_graph_dataset_database_handler(dataset_database)
+            await handler["handler_instance"].delete_dataset(dataset_database)
```
🧩 Analysis chain
🏁 Script executed:

```shell
cat -n cognee/modules/data/deletion/prune_system.py
```

Repository: topoteretes/cognee
Length of output: 3544

🏁 Script executed:

```shell
# Find the db_engine.get_all_data_from_table implementation
find . -type f -name "*.py" | xargs grep -l "def get_all_data_from_table" | head -5
```

Length of output: 138

🏁 Script executed:

```shell
# Find get_graph_dataset_database_handler function
rg -n "def get_graph_dataset_database_handler" --type=py
```

Length of output: 222

🏁 Script executed:

```shell
# Find DatasetDatabase class definition
rg -n "class DatasetDatabase" --type=py
```

Length of output: 280

🏁 Script executed:

```shell
cat -n cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py | grep -A 20 "def get_all_data_from_table"
```

Length of output: 1057

🏁 Script executed:

```shell
cat -n cognee/infrastructure/databases/utils/get_graph_dataset_database_handler.py
```

Length of output: 569

🏁 Script executed:

```shell
cat -n cognee/modules/users/models/DatasetDatabase.py
```

Length of output: 2140

🏁 Script executed:

```shell
cat -n cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py | sed -n '417,460p'
```

Length of output: 1885

🏁 Script executed:

```shell
# Search for complete implementation of get_all_data_from_table
rg -A 30 "async def get_all_data_from_table" cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
```

Length of output: 1234

🏁 Script executed:

```shell
# Check how get_all_data_from_table is used elsewhere in the codebase to see if there are other examples
rg -B 2 -A 5 "get_all_data_from_table\(" --type=py | head -50
```

Length of output: 3949
🏁 Script executed:
# Check SQLAlchemy Row behavior - does mappings() support attribute access?
python3 << 'EOF'
# Check SQLAlchemy documentation for Row behavior
import sys
try:
from sqlalchemy import Row
# Check if Row supports attribute access
print("SQLAlchemy Row class:", Row)
print("Row supports attribute access via __getattr__")
help(Row)
except Exception as e:
print(f"Error: {e}")
EOFRepository: topoteretes/cognee
Length of output: 97
🏁 Script executed:
# Let's check if there are any similar patterns in the codebase where Row mappings are used with attribute access
rg -B 3 -A 3 "\.mappings\(\)" --type=py cognee/infrastructure/databases/relational/sqlalchemy/Repository: topoteretes/cognee
Length of output: 802
🏁 Script executed:
# Check if DatasetDatabase is instantiated directly elsewhere or if Row objects are used directly
rg -B 2 -A 2 "DatasetDatabase\(" --type=py cognee/ | grep -v test | head -40Repository: topoteretes/cognee
Length of output: 942
🏁 Script executed:
# Check SQLAlchemy version in requirements or setup files
find . -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.py" | xargs grep -i sqlalchemyRepository: topoteretes/cognee
Length of output: 230
🏁 Script executed:
# Check if there are other usages where get_all_data_from_table results are used with attribute access
rg -B 2 -A 5 "get_all_data_from_table" --type=py cognee/ | grep -A 5 "for.*in.*get_all_data_from_table"Repository: topoteretes/cognee
Length of output: 44
🏁 Script executed:
# Look for prune_vector_databases which has same pattern as prune_graph_databases
cat -n cognee/modules/data/deletion/prune_system.py | sed -n '35,48p'Repository: topoteretes/cognee
Length of output: 825
🏁 Script executed:

```shell
# Check SQLAlchemy Row attribute access support in 2.0+
# Search for any existing uses of Row attribute access in the codebase
rg "result\[" cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py | head -10
```

Length of output: 140

🏁 Script executed:

```shell
# Look for any test that verifies the attribute access on Row objects from get_all_data_from_table
rg -B 2 -A 2 "result\[.*\]" cognee/tests/test_deduplication.py | head -20
```

Length of output: 1194

🏁 Script executed:

```shell
# Check if get_vector_dataset_database_handler also uses attribute access like get_graph_dataset_database_handler
cat -n cognee/infrastructure/databases/utils/get_vector_dataset_database_handler.py
```

Length of output: 571

🏁 Script executed:

```shell
# Check if there's a typing issue - look at the actual parameter types expected by the handlers
rg -B 5 -A 10 "def get_graph_dataset_database_handler" cognee/infrastructure/databases/utils/get_graph_dataset_database_handler.py
```

Length of output: 499

🏁 Script executed:

```shell
# Check if Row objects from SQLAlchemy 2.0 are being used with attribute access anywhere in the codebase
rg "for .* in .*get_all_data_from_table" -A 3 --type=py cognee/ | grep -v test
```

Length of output: 44

🏁 Script executed:

```shell
# Check if there are any Row to Model conversions happening anywhere
rg "Row\|mappings" --type=py cognee/infrastructure/databases/relational/sqlalchemy/
```

Length of output: 44

🏁 Script executed:

```shell
# Let me verify SQLAlchemy 2.0 Row behavior by checking pyproject.toml more carefully
cat -n pyproject.toml | grep -A 5 -B 5 "sqlalchemy"
```

Length of output: 1354

🏁 Script executed:

```shell
# Check if Row objects truly support both dict and attribute access in actual usage
# Look for any error handling or type: ignore comments related to this
rg "type: ignore|# noqa" cognee/modules/data/deletion/prune_system.py
```

Length of output: 44

🏁 Script executed:

```shell
# Search for any other functions that receive Row objects and use attribute access
rg -B 3 "\.graph_dataset_database_handler|\.vector_dataset_database_handler" --type=py cognee/
```

Length of output: 3574
🏁 Script executed:

```shell
# Verify SQLAlchemy 2.0+ Row attribute access support by checking actual Row class behavior
# Look for any issues or type annotations in the codebase related to this
rg "RoomRowMapping|LegacyRow" --type=py cognee/
```

Length of output: 44

🏁 Script executed:

```shell
# Check if there are type checking issues - look for mypy or pyright configurations
cat pyproject.toml | grep -A 10 "\[tool.mypy\]"
```

Length of output: 44

🏁 Script executed:

```shell
# Verify the actual usage pattern - if the code works, Row objects must support attribute access
# Check if there are any integration tests that use prune_system
rg "prune_graph_databases\|prune_vector_databases" --type=py cognee/tests/
```

Length of output: 44

🏁 Script executed:

```shell
# Final verification: Check SQLAlchemy Row documentation behavior in 2.0
# by looking at actual attribute access patterns in the codebase
rg "for .* in .*await.*\." -A 3 --type=py cognee/modules/data/ | head -30
```

Length of output: 44

🏁 Script executed:

```shell
# Check if Row objects from mappings() actually support attribute access
# by looking for any similar patterns where Row attributes are accessed
rg "dataset_databases\[.*\]\." --type=py cognee/
```

Length of output: 44

🏁 Script executed:

```shell
# Verify by checking actual usage in context_global_variables where similar access happens
cat -n cognee/context_global_variables.py | sed -n '1,80p'
```

Length of output: 5277

🏁 Script executed:

```shell
# Final check: SQLAlchemy 2.0+ Row objects support both dict and attribute access
# Let's verify this by checking the actual Row implementation usage
rg "from sqlalchemy" cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py | head -10
```

Length of output: 309

🏁 Script executed:

```shell
# Check if there's any Row wrapping or conversion happening
rg "class.*Row|def.*mappings" --type=py cognee/infrastructure/databases/
```

Length of output: 44
🏁 Script executed:

```shell
# Verify the complete picture - check what happens in a similar function that works
cat -n cognee/infrastructure/databases/utils/get_or_create_dataset_database.py | head -50
```

Length of output: 2232

🏁 Script executed:

```shell
# Let's verify SQLAlchemy Row attribute access by examining the actual imports and checking documentation
# SQLAlchemy 2.0+ Row objects support both dict and attribute access by default
python3 << 'EOF'
# Check SQLAlchemy 2.0 Row behavior documentation
# According to SQLAlchemy 2.0 docs, Row objects returned from mappings() support attribute access
# This is a documented feature of SQLAlchemy 2.0+
# The Row class in SQLAlchemy 2.0+ implements both:
# - __getitem__ for dict-like access: row["column_name"]
# - __getattr__ for attribute access: row.column_name
print("SQLAlchemy 2.0+ Row objects support BOTH dict-like and attribute access:")
print("- row['column_name'] works")
print("- row.column_name works")
print("\nThis is documented behavior in SQLAlchemy 2.0+")
print("See: https://docs.sqlalchemy.org/en/20/orm/queryguide.html#rows-are-namedtuples")
EOF
```

Length of output: 313

🏁 Script executed:

```shell
# Verify that the code actually works with Row attribute access by checking for any actual runtime errors
# or exception handling around this pattern
rg "AttributeError|except.*Error" cognee/modules/data/deletion/prune_system.py
```

Length of output: 250
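Independent of SQLAlchemy internals, the dict-to-model conversion that the suggested `DatasetDatabase(**row)` fix relies on can be sketched with plain dataclasses; the names below are illustrative stand-ins, not the real model:

```python
from dataclasses import dataclass


@dataclass
class DatasetDatabaseRecord:
    # Hypothetical subset of the real DatasetDatabase columns.
    dataset_id: str
    graph_dataset_database_handler: str
    vector_dataset_database_handler: str


def rows_to_records(rows):
    """Convert dict-like rows (e.g. SQLAlchemy RowMapping) into typed model instances."""
    return [DatasetDatabaseRecord(**dict(row)) for row in rows]


rows = [
    {
        "dataset_id": "ds-1",
        "graph_dataset_database_handler": "kuzu",
        "vector_dataset_database_handler": "lancedb",
    },
]
records = rows_to_records(rows)
```

Typed instances make attribute access (`record.graph_dataset_database_handler`) explicit and checkable, rather than depending on whatever access style the row type happens to support.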
Add docstring to prune_graph_databases function.
The function lacks documentation as required by project guidelines (code style guideline: "undocumented function definitions in the project's Python code are assumed incomplete").
🤖 Prompt for AI Agents
cognee/modules/data/deletion/prune_system.py lines 22-26: The function
prune_graph_databases is missing a docstring; add a concise triple-quoted
docstring immediately below the def line that states the function purpose
(prune/delete graph DBs for all dataset_database records), describes parameters
(if any, e.g., db_engine) and their types, notes return value (None) and side
effects (calls handler.delete_dataset for each dataset_database), and documents
possible exceptions or errors that may be raised; keep it short (2–4 sentences)
and follow project docstring style.
```python
handler = get_graph_dataset_database_handler(dataset_database)
await handler["handler_instance"].delete_dataset(dataset_database)
```
Consider validating handler dictionary structure.
The code accesses handler["handler_instance"] without verifying the key exists, which could raise a KeyError if the handler structure changes.
Consider adding defensive validation:
handler = get_graph_dataset_database_handler(dataset_database)
+ if "handler_instance" not in handler:
+ logger.warning("Handler missing 'handler_instance' key for dataset_database")
+ continue
await handler["handler_instance"].delete_dataset(dataset_database)

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee/modules/data/deletion/prune_system.py around lines 25-26, the code
assumes handler["handler_instance"] exists which can raise KeyError; validate
that handler is a dict and contains the "handler_instance" key (and that its
value is not None and exposes delete_dataset) before calling await
handler["handler_instance"].delete_dataset(dataset_database); if validation
fails, log or raise a clear error with context (including dataset_database) or
skip gracefully to avoid unhandled exceptions.
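The defensive check described in the prompt can be sketched as a small guard function; the dict shape and helper name below are hypothetical, mirroring the handler structure discussed above rather than a confirmed cognee API.

```python
def get_handler_instance(handler):
    """Return the handler instance from a handler dict, or None when the
    value is missing, malformed, or lacks a usable 'handler_instance'."""
    if not isinstance(handler, dict):
        return None
    instance = handler.get("handler_instance")
    if instance is None or not hasattr(instance, "delete_dataset"):
        return None
    return instance


class _FakeHandler:
    async def delete_dataset(self, dataset_database):
        pass


valid = get_handler_instance({"handler_instance": _FakeHandler()})
invalid = get_handler_instance({"wrong_key": object()})
```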
Force-pushed from ca746ea to 031cc46 (Compare)
Actionable comments posted: 5
♻️ Duplicate comments (1)
cognee-mcp/src/server.py (1)
861-867: Migration invocation will block the event loop.

The code awaits `run_migrations()`, but the implementation in `cognee/run_migrations.py` uses blocking `subprocess.run()`. This will freeze the MCP server during startup.

This issue is a consequence of the problems flagged in `cognee/run_migrations.py`. Once those are fixed (either by making the function synchronous and removing `await` here, or by making the subprocess call async), this invocation will work correctly.

The removal of custom error handling (UserAlreadyExists, etc.) is appropriate since errors will now be raised as exceptions from `run_migrations()`.
🧹 Nitpick comments (1)
cognee/run_migrations.py (1)
7-10: Clarify package structure assumptions in comments.

The comment on lines 7-8 mentions that this is a "placeholder for the path logic," which suggests temporary or incomplete implementation. The constants reference `"alembic"` as the migrations directory.

Consider expanding the comment to document the expected package structure:

-# Assuming your package is named 'cognee' and the migrations are under 'cognee/migrations'
-# This is a placeholder for the path logic.
+# Expected package structure:
+# cognee/
+#   alembic.ini
+#   alembic/
+#     versions/
 MIGRATIONS_PACKAGE = "cognee"
 MIGRATIONS_DIR_NAME = "alembic"
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`cognee-mcp/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (7)
- alembic/README (0 hunks)
- cognee-mcp/pyproject.toml (1 hunks)
- cognee-mcp/src/server.py (1 hunks)
- cognee/__init__.py (1 hunks)
- cognee/alembic/README (1 hunks)
- cognee/alembic/env.py (1 hunks)
- cognee/run_migrations.py (1 hunks)
💤 Files with no reviewable changes (1)
- alembic/README
✅ Files skipped from review due to trivial changes (1)
- cognee/alembic/README
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code
Files:
cognee/alembic/env.py
cognee-mcp/src/server.py
cognee/run_migrations.py
cognee/__init__.py
⚙️ CodeRabbit configuration file
**/*.py: When reviewing Python code for this project:
- Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
- As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
- As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
- As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
- As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.
Files:
cognee/alembic/env.py
cognee-mcp/src/server.py
cognee/run_migrations.py
cognee/__init__.py
cognee/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use shared logging utilities from cognee.shared.logging_utils in Python code
Files:
cognee/alembic/env.py
cognee/run_migrations.py
cognee/__init__.py
🧬 Code graph analysis (2)
cognee-mcp/src/server.py (2)
cognee/shared/logging_utils.py (1)
info (205-205)
cognee/run_migrations.py (1)
run_migrations(13-46)
cognee/__init__.py (1)
cognee/run_migrations.py (1)
run_migrations(13-46)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (22)
- GitHub Check: End-to-End Tests / Test using different async databases in parallel in Cognee
- GitHub Check: End-to-End Tests / Test Feedback Enrichment
- GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
- GitHub Check: End-to-End Tests / Test Entity Extraction
- GitHub Check: End-to-End Tests / Test multi tenancy with different situations in Cognee
- GitHub Check: End-to-End Tests / Concurrent Subprocess access test
- GitHub Check: End-to-End Tests / Conversation sessions test (Redis)
- GitHub Check: End-to-End Tests / Test graph edge ingestion
- GitHub Check: End-to-End Tests / S3 Bucket Test
- GitHub Check: End-to-End Tests / Test Cognify - Edge Centered Payload
- GitHub Check: End-to-End Tests / Conversation sessions test (FS)
- GitHub Check: End-to-End Tests / Deduplication Test
- GitHub Check: Basic Tests / Run Simple Examples
- GitHub Check: End-to-End Tests / Test dataset database handlers in Cognee
- GitHub Check: End-to-End Tests / Server Start Test
- GitHub Check: End-to-End Tests / Test dataset database deletion in Cognee
- GitHub Check: End-to-End Tests / Test permissions with different situations in Cognee
- GitHub Check: Basic Tests / Run Simple Examples BAML
- GitHub Check: Basic Tests / Run Unit Tests
- GitHub Check: Basic Tests / Run Integration Tests
- GitHub Check: CLI Tests / CLI Integration Tests
- GitHub Check: CLI Tests / CLI Functionality Tests
🔇 Additional comments (2)
cognee/__init__.py (1)
36-37: LGTM: Public API export is properly placed.

The import is correctly positioned after all other imports and makes `run_migrations` available as part of the public package API.

Note: The underlying implementation in `cognee/run_migrations.py` has issues flagged in a separate review.

cognee/alembic/env.py (1)

82-95: The review comment references code that does not exist in the current repository.

The actual implementation at `alembic/env.py` lines 80-83 uses `asyncio.run(run_async_migrations())`, which is the correct approach. The problematic try-except pattern with `asyncio.get_running_loop()` and `loop.run_until_complete()` is not present in the codebase. Additionally, the file path in the comment is incorrect (`cognee/alembic/env.py` should be `alembic/env.py`).

Likely an incorrect or invalid review comment.
cognee-mcp/pyproject.toml
Outdated
dependencies = [
    # For local cognee repo usage remove comment bellow and add absolute path to cognee. Then run `uv sync --reinstall` in the mcp folder on local cognee changes.
    #"cognee[postgres,codegraph,gemini,huggingface,docs,neo4j] @ file:/Users/igorilic/Desktop/cognee",
    "cognee[postgres,codegraph,gemini,huggingface,docs,neo4j] @ file:/Users/igorilic/Desktop/cognee/dist/cognee-0.5.0.dev0-py3-none-any.whl",
Remove hardcoded absolute path before merging.
The dependency uses a hardcoded absolute path specific to a single developer's machine (/Users/igorilic/Desktop/...). This will break the build for other developers and CI environments.
Consider one of these approaches:
- Use a relative path: `file://../../dist/cognee-0.5.0.dev0-py3-none-any.whl`
- Switch to the commented-out PyPI version on line 12 for production
- Document this as a local-development-only configuration in a README
🤖 Prompt for AI Agents
In cognee-mcp/pyproject.toml around line 10, the dependency entry contains a
hardcoded absolute path to a developer's wheel which will break other
environments; replace it with a non-absolute solution by either (a) changing to
a relative file URL pointing to the local dist (e.g. file://../../dist/... ),
(b) reverting to the commented PyPI package/version for CI/production, or (c) if
this must remain for local dev only, remove it from the committed file and add
documentation in the repo README describing how to install the local wheel for
development; update the file accordingly and ensure CI uses the PyPI entry or a
relative path.
cognee/run_migrations.py
Outdated
MIGRATIONS_DIR_NAME = "alembic"


async def run_migrations():
🛠️ Refactor suggestion | 🟠 Major
Remove async keyword from non-async function.
The function is declared as async but contains no await expressions. The only I/O operation is subprocess.run(), which is synchronous and blocking.
Apply this diff:
-async def run_migrations():
+def run_migrations():
"""
Finds the Alembic configuration within the installed package and
programmatically executes 'alembic upgrade head'.
"""Update callers to remove await:
# In cognee-mcp/src/server.py
run_migrations() # Remove await🤖 Prompt for AI Agents
In cognee/run_migrations.py around line 13, the function is declared async but
contains no await and calls the synchronous subprocess.run, so remove the async
keyword to make it a normal function (change "async def run_migrations()" to
"def run_migrations()") and leave the subprocess.run call as-is; then update all
callers (e.g., cognee-mcp/src/server.py) to call run_migrations() without
awaiting it.
cognee/run_migrations.py
Outdated
migration_result = subprocess.run(
    ["alembic", "upgrade", "head"],
    capture_output=True,
    text=True,
    cwd=Path(package_root),
)
Blocking subprocess call will freeze the event loop.
subprocess.run() is synchronous and will block the entire event loop if called from async code. Since this function is imported and awaited in cognee-mcp/src/server.py (line 865), it will block the MCP server startup.
If the function must remain async, use the async subprocess API:
- migration_result = subprocess.run(
- ["alembic", "upgrade", "head"],
- capture_output=True,
- text=True,
- cwd=Path(package_root),
- )
+ process = await asyncio.create_subprocess_exec(
+ "alembic", "upgrade", "head",
+ stdout=asyncio.subprocess.PIPE,
+ stderr=asyncio.subprocess.PIPE,
+ cwd=Path(package_root),
+ )
+ stdout, stderr = await process.communicate()
+ migration_result = type('Result', (), {
+ 'returncode': process.returncode,
+ 'stdout': stdout.decode(),
+ 'stderr': stderr.decode()
+ })()

Add import:

+import asyncio
🤖 Prompt for AI Agents
In cognee/run_migrations.py around lines 36 to 41 the code uses subprocess.run
which is synchronous and will block the event loop; replace it with the asyncio
subprocess API (e.g., asyncio.create_subprocess_exec or
asyncio.subprocess.create_subprocess_exec) and await the process to completion,
capturing stdout/stderr asynchronously, checking returncode and raising or
returning a proper error if non-zero; also add the necessary import (import
asyncio) and ensure cwd is passed to the async subprocess call (or use os.chdir
temporarily) and that the function remains async so callers can await it without
blocking.
cognee/run_migrations.py
Outdated
if migration_result.returncode != 0:
    migration_output = migration_result.stderr + migration_result.stdout
    print(f"Migration failed with unexpected error: {migration_output}")
    sys.exit(1)
🛠️ Refactor suggestion | 🟠 Major
Replace sys.exit() with exception raising.
Calling sys.exit(1) abruptly terminates the process without allowing cleanup or proper error propagation. This prevents the caller from handling migration failures gracefully.
Apply this diff:
if migration_result.returncode != 0:
migration_output = migration_result.stderr + migration_result.stdout
- print(f"Migration failed with unexpected error: {migration_output}")
- sys.exit(1)
+    raise RuntimeError(f"Migration failed: {migration_output}")

This allows callers to catch and handle the exception appropriately.
🤖 Prompt for AI Agents
In cognee/run_migrations.py around lines 43 to 46, replace the abrupt process
termination using sys.exit(1) with raising a descriptive exception so callers
can catch and handle migration failures; construct an exception (e.g.,
RuntimeError) including the combined migration_output (migration_result.stderr +
migration_result.stdout) in the message and raise it instead of calling
sys.exit, preserving the existing printed/logged output if desired.
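Once the helper raises instead of exiting, the caller can decide how to react. The sketch below uses a stub that stands in for `cognee.run_migrations.run_migrations` after the suggested change — it is not the real implementation, and the error text is simulated.

```python
def run_migrations():
    # Stub: simulate a failed `alembic upgrade head`, raising instead of
    # calling sys.exit(1), as the suggestion above proposes.
    migration_output = "simulated alembic error output"
    raise RuntimeError(f"Migration failed: {migration_output}")


try:
    run_migrations()
    startup_ok = True
except RuntimeError as error:
    # The caller (e.g., an MCP server startup path) can log and abort
    # gracefully instead of the process dying inside the migration helper.
    startup_ok = False
    failure_message = str(error)
```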
cognee/run_migrations.py
Outdated
if migration_result.returncode != 0:
    migration_output = migration_result.stderr + migration_result.stdout
    print(f"Migration failed with unexpected error: {migration_output}")
🛠️ Refactor suggestion | 🟠 Major
Use logging utilities instead of print().
Per coding guidelines for cognee/**/*.py, use shared logging utilities from cognee.shared.logging_utils.
Based on coding guidelines, apply this diff:
+from cognee.shared.logging_utils import get_logger
+
+logger = get_logger()
+
async def run_migrations():
"""
Finds the Alembic configuration within the installed package and
programmatically executes 'alembic upgrade head'.
"""
# ... existing code ...
if migration_result.returncode != 0:
migration_output = migration_result.stderr + migration_result.stdout
- print(f"Migration failed with unexpected error: {migration_output}")
+ logger.error(f"Migration failed with unexpected error: {migration_output}")
     sys.exit(1)
🤖 Prompt for AI Agents
In cognee/run_migrations.py around line 45 replace the call to print(...) with
the project logging utilities: import the shared logger from
cognee.shared.logging_utils at the top of the file (or reuse an existing
logger), then change print(f"Migration failed with unexpected error:
{migration_output}") to an appropriate logger method (e.g., logger.error(...))
that includes the migration_output; ensure the import is added only once and no
print() calls remain.
Actionable comments posted: 0
🧹 Nitpick comments (1)
cognee/api/v1/datasets/routers/get_datasets_router.py (1)
211-218: Clarify auth result handling and avoid `delete_dataset` name shadowing inside the endpoint

The new flow correctly (a) enforces delete permission via `get_authorized_existing_datasets` and (b) delegates to the shared `cognee.modules.data.methods.delete_dataset`, which satisfies the PR goal of cleaning up associated databases.

There are two minor points worth tightening:

- The import on Line 211 shadows the endpoint function name `delete_dataset`, which is legal but confusing for readers and makes it harder to tell at a glance whether `await delete_dataset(...)` refers to the endpoint or the data-layer method.
- On Lines 213–218, you only guard against `dataset is None`. If `get_authorized_existing_datasets` ever returns an empty list for "no authorized dataset", `dataset[0]` on Line 218 would raise `IndexError`. Using a truthiness check handles both `None` and empty lists defensively.

You can address both with a small refactor:

-    from cognee.modules.data.methods import delete_dataset
-
-    dataset = await get_authorized_existing_datasets([dataset_id], "delete", user)
-
-    if dataset is None:
-        raise DatasetNotFoundError(message=f"Dataset ({str(dataset_id)}) not found.")
-
-    await delete_dataset(dataset[0])
+    from cognee.modules.data.methods import delete_dataset as delete_dataset_method
+
+    authorized_datasets = await get_authorized_existing_datasets(
+        [dataset_id],
+        "delete",
+        user,
+    )
+
+    if not authorized_datasets:
+        raise DatasetNotFoundError(message=f"Dataset ({str(dataset_id)}) not found.")
+
+    await delete_dataset_method(authorized_datasets[0])

This keeps behavior the same in the happy path, improves robustness if the helper ever returns `[]`, and removes the local name collision, which should make the code easier to follow.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cognee/api/v1/datasets/routers/get_datasets_router.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code
Files:
cognee/api/v1/datasets/routers/get_datasets_router.py
⚙️ CodeRabbit configuration file
**/*.py: When reviewing Python code for this project:
- Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
- As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
- As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
- As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
- As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.
Files:
cognee/api/v1/datasets/routers/get_datasets_router.py
cognee/api/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Public APIs should be type-annotated in Python where practical
Files:
cognee/api/v1/datasets/routers/get_datasets_router.py
cognee/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use shared logging utilities from cognee.shared.logging_utils in Python code
Files:
cognee/api/v1/datasets/routers/get_datasets_router.py
🧬 Code graph analysis (1)
cognee/api/v1/datasets/routers/get_datasets_router.py (1)
cognee/modules/data/methods/delete_dataset.py (1)
delete_dataset(14-34)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
- GitHub Check: End-to-End Tests / Test multi tenancy with different situations in Cognee
- GitHub Check: End-to-End Tests / Test dataset database deletion in Cognee
- GitHub Check: End-to-End Tests / Test Feedback Enrichment
- GitHub Check: End-to-End Tests / Test Entity Extraction
- GitHub Check: End-to-End Tests / Conversation sessions test (Redis)
- GitHub Check: Basic Tests / Run Integration Tests
- GitHub Check: End-to-End Tests / Concurrent Subprocess access test
- GitHub Check: End-to-End Tests / Conversation sessions test (FS)
- GitHub Check: End-to-End Tests / Test dataset database handlers in Cognee
- GitHub Check: End-to-End Tests / Test permissions with different situations in Cognee
- GitHub Check: End-to-End Tests / Test Cognify - Edge Centered Payload
- GitHub Check: Basic Tests / Run Unit Tests
- GitHub Check: End-to-End Tests / Deduplication Test
- GitHub Check: End-to-End Tests / S3 Bucket Test
- GitHub Check: End-to-End Tests / Server Start Test
- GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
- GitHub Check: End-to-End Tests / Test using different async databases in parallel in Cognee
- GitHub Check: CLI Tests / CLI Integration Tests
- GitHub Check: CLI Tests / CLI Functionality Tests
Actionable comments posted: 0
🧹 Nitpick comments (1)
cognee/infrastructure/llm/LLMGateway.py (1)
12-26: Consider forwarding `**kwargs` in the BAML branch and adding brief method docstrings

Right now `acreate_structured_output` accepts `**kwargs`, but only the LiteLLM path forwards them; the BAML path ignores them. If the gateway is meant to be backend-agnostic, consider passing `**kwargs` through to the BAML `acreate_structured_output` as well for interface parity.

Also, per the project's Python guidelines, you may want to add short docstrings to the static methods (e.g., summarizing what `acreate_structured_output` expects/returns) to complement the class-level docstring and align with PEP 257-style documentation.
acreate_structured_outputexpects/returns) to complement the class-level docstring and align with PEP 257-style documentation.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- cognee/api/v1/cognify/cognify.py (1 hunks)
- cognee/infrastructure/llm/LLMGateway.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code
Files:
cognee/infrastructure/llm/LLMGateway.py
cognee/api/v1/cognify/cognify.py
⚙️ CodeRabbit configuration file
**/*.py: When reviewing Python code for this project:
- Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
- As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
- As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
- As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
- As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.
Files:
cognee/infrastructure/llm/LLMGateway.py
cognee/api/v1/cognify/cognify.py
cognee/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use shared logging utilities from cognee.shared.logging_utils in Python code
Files:
cognee/infrastructure/llm/LLMGateway.py
cognee/api/v1/cognify/cognify.py
cognee/{modules,infrastructure,tasks}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Co-locate feature-specific helpers under their respective package (modules/, infrastructure/, or tasks/)
Files:
cognee/infrastructure/llm/LLMGateway.py
cognee/api/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Public APIs should be type-annotated in Python where practical
Files:
cognee/api/v1/cognify/cognify.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: End-to-End Tests / Test Feedback Enrichment
- GitHub Check: End-to-End Tests / Test Entity Extraction
- GitHub Check: End-to-End Tests / Conversation sessions test (FS)
- GitHub Check: End-to-End Tests / Concurrent Subprocess access test
- GitHub Check: Basic Tests / Run Integration Tests
- GitHub Check: End-to-End Tests / Conversation sessions test (Redis)
- GitHub Check: End-to-End Tests / Test permissions with different situations in Cognee
- GitHub Check: End-to-End Tests / S3 Bucket Test
- GitHub Check: End-to-End Tests / Test multi tenancy with different situations in Cognee
- GitHub Check: Basic Tests / Run Simple Examples
- GitHub Check: End-to-End Tests / Server Start Test
- GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
- GitHub Check: Basic Tests / Run Unit Tests
- GitHub Check: Basic Tests / Run Simple Examples BAML
- GitHub Check: CLI Tests / CLI Functionality Tests
- GitHub Check: CLI Tests / CLI Integration Tests
🔇 Additional comments (2)
cognee/api/v1/cognify/cognify.py (1)
56-56: LGTM! Trailing comma aligns with Python conventions.

The trailing comma after `**kwargs` is a non-functional style improvement that makes future parameter additions cleaner in diffs and is consistent with modern Python formatters like ruff.

cognee/infrastructure/llm/LLMGateway.py (1)

34-37: Formatting-only change; behavior stays the same

Reformatting this call to multi-line keyword arguments preserves the existing behavior (including forwarding `**kwargs`) and improves readability. No issues here.
Description
Type of Change
Screenshots/Videos (if applicable)
Pre-submission Checklist
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
Summary by CodeRabbit
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.