-
Notifications
You must be signed in to change notification settings - Fork 606
Feature/delete preview #1385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Feature/delete preview #1385
Conversation
<!-- .github/pull_request_template.md --> ## Description <!-- Please provide a clear, human-generated description of the changes in this PR. DO NOT use AI-generated descriptions. We want to understand your thought process and reasoning. --> ## Demo https://github.com/user-attachments/assets/3873cc5e-f5c7-445d-8b95-49007ca86437 1. Run `cognee -ui`, verify cognee frontend and backend started 2. Run in separate port, verify port is in use ### Test backend integration https://github.com/user-attachments/assets/f74de72a-335b-4cec-a565-2c36c9f503c1 1. Add new file to existing dataset 2. Create new dataset ## Type of Change <!-- Please check the relevant option --> - [x] Bug fix (non-breaking change that fixes an issue) - [ ] New feature (non-breaking change that adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Documentation update - [ ] Code refactoring - [ ] Performance improvement - [ ] Other (please specify): ## Changes Made <!-- List the specific changes made in this PR --> - - - ## Testing <!-- Describe how you tested your changes --> ## Screenshots/Videos (if applicable) <!-- Add screenshots or videos to help explain your changes --> ## Pre-submission Checklist <!-- Please check all boxes that apply before submitting your PR --> - [ ] **I have tested my changes thoroughly before submitting this PR** - [ ] **This PR contains minimal changes necessary to address the issue/feature** - [ ] My code follows the project's coding standards and style guidelines - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have added necessary documentation (if applicable) - [ ] All new and existing tests pass - [ ] I have searched existing PRs to ensure this change hasn't been submitted already - [ ] I have linked any relevant issues in the description - [ ] My commits have clear and descriptive messages ## Related Issues <!-- Link any related issues using "Fixes #issue_number" or "Relates to #issue_number" --> ## Additional Notes <!-- Add any additional notes, concerns, or context for reviewers --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
Signed-off-by: shehab-badawy <[email protected]>
Please make sure all the checkboxes are checked:
|
WalkthroughAdds optional backend orchestration to the UI launcher, updates CLI to auto-start the backend with the UI, introduces a deletion preview in the delete command using a new async get_deletion_counts function, and updates messaging to reflect backend availability and preview metrics. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant CLI as CLI
participant UI as start_ui()
participant BE as Uvicorn Backend
participant FE as Frontend Server
CLI->>UI: start_ui(host, port, open_browser, start_backend=True, ...)
alt start_backend == True
UI->>BE: spawn process cognee.api.client:app (host/port)
UI-->>UI: wait briefly for backend readiness
UI->>FE: launch frontend server
UI-->>FE: attach _cognee_backend_process to FE proc
else
UI->>FE: launch frontend server
end
note over UI,FE: On failure to start frontend, terminate backend if started
CLI->>UI: stop_ui()
alt Backend attached
UI->>BE: terminate backend process
end
UI->>FE: terminate frontend process
sequenceDiagram
autonumber
participant User as User
participant CLI as delete_command
participant Cnt as get_deletion_counts()
participant Del as cognee.delete(...)
User->>CLI: cognee delete [--dataset NAME | --user-id ID | --all] [--force]
alt --force not provided
CLI->>Cnt: asyncio.run(get_deletion_counts(...))
Cnt-->>CLI: {datasets, data_entries, users}
alt counts == 0
CLI-->>User: "No data found to delete."
CLI-->>CLI: exit
else counts > 0
CLI-->>User: Show breakdown + irreversible warning
User-->>CLI: Confirm? (y/N)
alt Confirmed
CLI->>Del: delete with resolved filters
Del-->>CLI: Success/Exception
CLI-->>User: "Successfully deleted {operation}"
else
CLI-->>User: "Deletion cancelled."
end
end
else --force provided
CLI->>Del: delete immediately
Del-->>CLI: Success/Exception
CLI-->>User: "Successfully deleted {operation}"
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45–60 minutes Pre-merge checks (2 passed, 3 warnings)❌ Failed checks (3 warnings)
✅ Passed checks (2 passed)
Poem
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal). Please share your feedback with us on this Discord post. ✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @shehab-badawy, thank you for submitting a PR! We will respond as soon as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (5)
cognee/api/v1/ui/ui.py (3)
333-336
: Use explicit Optional type annotations for consistency.The new parameters should use explicit
Optional[T]
type annotations to match Python's type system recommendations and maintain consistency with the existing codebase.- start_backend: bool = False, - backend_host: str = "localhost", - backend_port: int = 8000, + start_backend: bool = False, + backend_host: str = "localhost", + backend_port: int = 8000,
412-414
: Use logging.exception for better error diagnostics.When catching exceptions, use
logger.exception()
instead oflogger.error()
to include the full stack trace in the logs.- except Exception as e: - logger.error(f"Failed to start backend server: {str(e)}") + except Exception as e: + logger.exception("Failed to start backend server")
504-506
: Document the custom attribute pattern.Using a custom attribute
_cognee_backend_process
on the Popen object is unconventional. Consider documenting this pattern or using a more standard approach.Consider using a namedtuple or a custom class to encapsulate both processes:
from typing import NamedTuple class UIServerProcesses(NamedTuple): frontend: subprocess.Popen backend: Optional[subprocess.Popen] = NoneWould you like me to refactor this to use a more explicit data structure?
cognee/modules/data/methods/get_deletion_counts.py (1)
23-24
: Consider more descriptive return for missing datasets.When a dataset is not found, consider including the dataset name in the response or throwing an exception to make debugging easier.
- if dataset is None: - return {"datasets": 0, "data_entries": 0} + if dataset is None: + return {"datasets": 0, "data_entries": 0, "message": f"Dataset '{dataset_name}' not found"}cognee/cli/commands/delete_command.py (1)
103-105
: Avoid asserting success while delete is unreliable; soften message or verify post-delete.
To prevent false positives, either soften the message or re-check counts to confirm.Option A (softer message):
- fmt.success(f"Successfully deleted {operation}") + fmt.success(f"Requested deletion of {operation}")Option B (verify outcome; may be async-heavy):
- Re-run get_deletion_counts for the same scope and log a post-delete summary; only print “Successfully deleted …” if counts dropped to zero (or decreased).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
cognee/api/v1/ui/ui.py
(5 hunks)cognee/cli/_cognee.py
(1 hunks)cognee/cli/commands/delete_command.py
(3 hunks)cognee/modules/data/methods/get_deletion_counts.py
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
cognee/modules/data/methods/get_deletion_counts.py (4)
cognee/infrastructure/databases/relational/get_relational_engine.py (1)
get_relational_engine
(5-21)cognee/modules/data/models/Data.py (1)
Data
(12-59)cognee/modules/data/models/DatasetData.py (1)
DatasetData
(6-12)cognee/cli/commands/delete_command.py (1)
execute
(35-109)
cognee/cli/_cognee.py (2)
cognee/api/v1/ui/ui.py (1)
start_ui
(328-524)cognee/cli/echo.py (2)
success
(28-30)echo
(8-10)
cognee/api/v1/ui/ui.py (1)
cognee/shared/logging_utils.py (1)
info
(175-175)
cognee/cli/commands/delete_command.py (2)
cognee/modules/data/methods/get_deletion_counts.py (1)
get_deletion_counts
(8-57)cognee/cli/echo.py (4)
echo
(8-10)success
(28-30)warning
(18-20)confirm
(38-40)
🪛 Ruff (0.12.2)
cognee/modules/data/methods/get_deletion_counts.py
9-9: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
9-9: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
cognee/api/v1/ui/ui.py
383-383: subprocess
call: check for execution of untrusted input
(S603)
412-412: Do not catch blind exception: Exception
(BLE001)
413-413: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
413-413: Use explicit conversion flag
Replace with conversion flag
(RUF010)
508-508: Consider moving this statement to an else
block
(TRY300)
510-510: Do not catch blind exception: Exception
(BLE001)
511-511: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
511-511: Use explicit conversion flag
Replace with conversion flag
(RUF010)
557-557: Do not catch blind exception: Exception
(BLE001)
558-558: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
558-558: Use explicit conversion flag
Replace with conversion flag
(RUF010)
600-600: Consider moving this statement to an else
block
(TRY300)
602-602: Do not catch blind exception: Exception
(BLE001)
603-603: Use logging.exception
instead of logging.error
Replace with exception
(TRY400)
603-603: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🪛 Pylint (3.3.7)
cognee/api/v1/ui/ui.py
[refactor] 383-398: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 527-527: Too many branches (13/12)
(R0912)
[refactor] 527-527: Too many statements (51/50)
(R0915)
🔇 Additional comments (9)
cognee/api/v1/ui/ui.py (2)
543-559
: Good error handling for backend cleanup.The implementation properly handles backend process cleanup with graceful termination followed by forced kill if necessary. The error handling is appropriate and logs are informative.
383-398
: Incorrect — subprocess uses argv list (no shell); validate host/port insteadsubprocess.Popen is invoked with a list (shell=False) in cognee/api/v1/ui/ui.py:383–392, so shell-based command injection via backend_host/backend_port is not applicable; still add validation to ensure backend_host is a valid hostname/IP and backend_port is an int in 1–65535 to avoid malformed values or accidental exposure.
Suggested quick patch (add before line 383):
import re import ipaddress # Validate backend_host if not isinstance(backend_host, str) or not backend_host: raise ValueError(f"Invalid backend_host: {backend_host!r}") try: ipaddress.ip_address(backend_host) except ValueError: if not re.fullmatch( r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?(?:\.[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?)*", backend_host, ): raise ValueError(f"Invalid backend_host: {backend_host!r}") # Validate backend_port try: backend_port = int(backend_port) except (TypeError, ValueError): raise ValueError(f"Invalid backend_port: {backend_port!r}") if not (1 <= backend_port <= 65535): raise ValueError(f"Invalid backend_port: {backend_port}. Must be 1-65535.")Likely an incorrect or invalid review comment.
cognee/cli/_cognee.py (2)
209-211
: LGTM! Backend auto-start integration is clean.The integration of the backend auto-start feature is well-implemented with the
start_backend=True
parameter.
216-216
: Good user communication about backend availability.The addition of the backend URL information helps users understand what services are available.
cognee/modules/data/methods/get_deletion_counts.py (2)
27-32
: Efficient database query implementation.Good use of SQLAlchemy's func.count() for efficient counting without loading all records into memory.
36-50
: Comprehensive counting for all_data mode.The implementation correctly counts all datasets, data entries, and users when deleting all data. The queries are efficient and the return structure is consistent.
cognee/cli/commands/delete_command.py (3)
9-9
: LGTM: import adds the required dependency for preview counts.
Clean, explicit import; avoids circulars by using late import for cognee below.
72-75
: Confirmation defaults to No — good.
The prompt uses default=False via fmt.confirm; meets safety requirement.
93-95
: Track the underlying delete bug and gate user-facing success accordingly.
Good to document the known issue; ensure there’s a linked issue reference in code or commit for traceability. Until fixed, prefer non-committal success messaging (see next comment).
# If --force is used, skip the preview and go straight to deletion | ||
if not args.force: | ||
# --- START PREVIEW LOGIC --- | ||
fmt.echo("Gathering data for preview...") | ||
preview_data = asyncio.run( | ||
get_deletion_counts( | ||
dataset_name=args.dataset_name, | ||
user_id=args.user_id, | ||
all_data=args.all, | ||
) | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Fix unsafe mismatch between preview scope and actual deletion; enforce exclusivity and align param mapping.
If a user passes multiple target flags (e.g., --all with --dataset-name), preview uses dataset scope (get_deletion_counts prioritizes dataset_name) but deletion currently takes the --all branch. That can delete more than previewed.
Apply these diffs to align the operation label and deletion params with a single, consistent precedence (dataset > user > all):
- if args.all:
- operation = "all data"
- elif args.dataset_name:
- operation = f"dataset '{args.dataset_name}'"
- elif args.user_id:
- operation = f"data for user '{args.user_id}'"
- else:
- operation = "data"
+ if args.dataset_name:
+ operation = f"dataset '{args.dataset_name}'"
+ elif args.user_id:
+ operation = f"data for user '{args.user_id}'"
+ elif args.all:
+ operation = "all data"
+ else:
+ operation = "data"
- if args.all:
- await cognee.delete(dataset_name=None, user_id=args.user_id)
- else:
- await cognee.delete(dataset_name=args.dataset_name, user_id=args.user_id)
+ # Execute deletion with the same scope used for preview
+ if args.dataset_name:
+ await cognee.delete(dataset_name=args.dataset_name, user_id=None)
+ elif args.user_id:
+ await cognee.delete(dataset_name=None, user_id=args.user_id)
+ elif args.all:
+ await cognee.delete(dataset_name=None, user_id=None)
+ else:
+ raise CliCommandInnerException("No deletion target provided (unexpected).")
Additionally, make the targets mutually exclusive at the parser level (outside this hunk) to prevent conflicting flags:
# in configure_parser(...)
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument("--dataset-name", "-d", help="Specific dataset to delete")
group.add_argument("--user-id", "-u", help="User ID to delete data for")
group.add_argument("--all", action="store_true", help="Delete all data (requires confirmation)")
parser.add_argument("--force", "-f", action="store_true", help="Skip confirmation prompts")
This also lets you drop the manual validation branch.
Also applies to: 79-87, 95-99
preview_data = asyncio.run( | ||
get_deletion_counts( | ||
dataset_name=args.dataset_name, | ||
user_id=args.user_id, | ||
all_data=args.all, | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Fall back to direct delete if preview metrics retrieval fails.
Acceptance requires continuing with deletion when metrics can’t be fetched. Currently, any failure aborts the command.
Apply this diff to catch preview failures and proceed without preview:
- preview_data = asyncio.run(
- get_deletion_counts(
- dataset_name=args.dataset_name,
- user_id=args.user_id,
- all_data=args.all,
- )
- )
+ try:
+ preview_data = asyncio.run(
+ get_deletion_counts(
+ dataset_name=args.dataset_name,
+ user_id=args.user_id,
+ all_data=args.all,
+ )
+ )
+ except Exception as ex:
+ fmt.warning(f"Could not retrieve preview metrics; proceeding without preview. Reason: {ex}")
+ preview_data = None
@@
- if not preview_data or all(value == 0 for value in preview_data.values()):
- fmt.success("No data found to delete.")
- return
+ if preview_data is not None:
+ if not preview_data or all(value == 0 for value in preview_data.values()):
+ fmt.success("No data found to delete.")
+ return
@@
- fmt.warning("This operation is irreversible!")
- if not fmt.confirm("Proceed?"):
- fmt.echo("Deletion cancelled.")
- return
+ if preview_data is not None:
+ fmt.warning("This operation is irreversible!")
+ if not fmt.confirm("Proceed?"):
+ fmt.echo("Deletion cancelled.")
+ return
Also consider adding tests for:
- delete --all / --dataset-name / --user-id previews,
- empty target messaging,
- --force path.
Also applies to: 57-76
dataset_name: str = None, user_id: str = None, all_data: bool = False | ||
) -> dict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Use explicit Optional type annotations.
PEP 484 prohibits implicit Optional. Parameters that can be None should use explicit Optional type annotations.
-async def get_deletion_counts(
- dataset_name: str = None, user_id: str = None, all_data: bool = False
-) -> dict:
+from typing import Optional
+
+async def get_deletion_counts(
+ dataset_name: Optional[str] = None,
+ user_id: Optional[str] = None,
+ all_data: bool = False
+) -> dict:
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
dataset_name: str = None, user_id: str = None, all_data: bool = False | |
) -> dict: | |
from typing import Optional | |
async def get_deletion_counts( | |
dataset_name: Optional[str] = None, | |
user_id: Optional[str] = None, | |
all_data: bool = False, | |
) -> dict: |
🧰 Tools
🪛 Ruff (0.12.2)
9-9: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
9-9: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
🤖 Prompt for AI Agents
In cognee/modules/data/methods/get_deletion_counts.py around lines 9 to 10, the
function parameters use implicit None defaults which should be annotated with
typing.Optional; change dataset_name: str = None and user_id: str = None to
dataset_name: Optional[str] = None and user_id: Optional[str] = None, and add
"from typing import Optional" to the imports if it's not already present to
satisfy PEP 484 explicit Optional annotations.
# Placeholder for user_id logic | ||
if user_id: | ||
# TODO: Implement counting logic for a specific user | ||
return {"datasets": 0, "data_entries": 0, "users": 1} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implement user_id deletion counts.
The user_id functionality is marked as TODO but returns placeholder data. This could mislead users about what will actually be deleted.
The placeholder return value {"datasets": 0, "data_entries": 0, "users": 1}
is misleading. Either implement the actual counting logic or return an error/warning to the user.
Would you like me to help implement the user_id counting logic? This would involve:
- Counting datasets owned by the user
- Counting data entries associated with those datasets
- Returning accurate counts
Description
This pull request implements the "Delete Command Preview" enhancement. The cognee delete command now includes a preview step that summarizes what will be deleted before asking for user confirmation. This improves usability and safety by helping users understand the scope of the deletion.
During development, it was discovered that the underlying cognee.delete() function for datasets appears to be failing silently. This new preview feature correctly shows the user what should be deleted, which helps to expose this separate, pre-existing bug.
Type of Change
[ ] Bug fix (non-breaking change that fixes an issue)
[x] New feature (non-breaking change that adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
[ ] Documentation update
[ ] Code refactoring
[ ] Performance improvement
[ ] Other (please specify):
Changes Made
Created a new function get_deletion_counts to efficiently query the relational database for the number of affected datasets, data entries, and users.
Modified delete_command.py to call this new function and display a formatted preview to the user.
Integrated the --force flag to allow skipping the new preview and confirmation step.
Added a "No data found to delete" message when the target of the deletion is empty, preventing unnecessary prompts.
Testing
Tested the delete --dataset-name case by adding a new dataset with cognee add and verifying that the preview correctly showed "1 datasets" and "1 data entries".
Tested delete --all and confirmed it displayed the total counts of all data in the system.
Wrote and used a helper script to get the default user ID and tested the delete --user-id case.
Confirmed that answering "n" to the confirmation prompt correctly cancels the operation.
Confirmed that the --force flag skips the preview and proceeds directly to the deletion attempt.
Screenshots/Videos (if applicable)
Pre-submission Checklist
[x] I have tested my changes thoroughly before submitting this PR
[x] This PR contains minimal changes necessary to address the issue/feature
[x] My code follows the project's coding standards and style guidelines (verified with ruff)
[ ] I have added tests that prove my fix is effective or that my feature works
[ ] I have added necessary documentation (if applicable)
[x] All new and existing tests pass
[x] I have searched existing PRs to ensure this change hasn't been submitted already
[x] I have linked any relevant issues in the description
[x] My commits have clear and descriptive messages
Related Issues
Fixes #1366
Additional Notes
This PR focuses solely on adding the preview functionality as requested. It does not fix the underlying issue where cognee.delete(dataset_name=...) fails silently. However, this feature makes that bug clearly visible to the user, as the preview will show the correct counts, but the data will remain after the operation.
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.