
Conversation

@martin0731
Contributor

@martin0731 martin0731 commented Nov 13, 2025

Description

This PR removes the obsolete check_permissions_on_dataset task and all its related imports and usages across the codebase.
The authorization logic is now handled earlier in the pipeline, so this task is no longer needed.
These changes simplify the default Cognify pipeline and make the code cleaner and easier to maintain.
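
As a rough illustration of the rationale above, the sketch below shows one way a permission check can run once at the pipeline entry point instead of as a task inside the pipeline. Every name in it (`run_pipeline`, `user_can_write`, `PermissionDenied`, the toy tasks, the ACL dictionary) is hypothetical and not taken from cognee; it makes no claim about how cognee actually performs authorization.

```python
import asyncio


class PermissionDenied(Exception):
    """Raised when a user may not process a dataset (hypothetical)."""


def user_can_write(user: str, dataset: str, acl: dict[str, set[str]]) -> bool:
    # Hypothetical ACL lookup: dataset name -> set of users allowed to write.
    return user in acl.get(dataset, set())


async def classify(docs: list[str]) -> list[str]:
    # Stand-in task: tag each document.
    return [f"classified:{d}" for d in docs]


async def chunk(docs: list[str]) -> list[str]:
    # Stand-in task: pretend to split documents into chunks.
    return [f"chunk:{d}" for d in docs]


async def run_pipeline(user, dataset, docs, acl, tasks):
    # Authorization happens once, before any task runs, so the task list
    # itself no longer needs a permission-check step.
    if not user_can_write(user, dataset, acl):
        raise PermissionDenied(f"{user} cannot write to {dataset}")
    data = docs
    for task in tasks:
        data = await task(data)
    return data
```

With the check hoisted to the entry point, the task list contains only data-processing steps, which is the simplification this description refers to.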

Changes Made

  • Removed cognee/tasks/documents/check_permissions_on_dataset.py
  • Removed import from cognee/tasks/documents/__init__.py
  • Removed import and usage in cognee/api/v1/cognify/cognify.py
  • Removed import and usage in cognee/eval_framework/corpus_builder/task_getters/get_cascade_graph_tasks.py
  • Updated comments in cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py (index positions changed)
  • Removed usage in notebooks/cognee_demo.ipynb
  • Updated documentation in examples/python/simple_example.py (process description)

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Code refactoring
  • Other (please specify): Task removal / cleanup of deprecated function

Pre-submission Checklist

  • I have tested my changes thoroughly before submitting this PR
  • This PR contains minimal changes necessary to address the issue
  • My code follows the project's coding standards and style guidelines
  • All new and existing tests pass
  • I have searched existing PRs to ensure this change hasn't been submitted already
  • I have linked any relevant issues in the description (Closes #1771, "refactor: Remove check_permissions_on_dataset function")
  • My commits have clear and descriptive messages

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

@pull-checklist

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

@coderabbitai
Contributor

coderabbitai bot commented Nov 13, 2025

Walkthrough

This PR removes the check_permissions_on_dataset function and all its usages from the codebase. The function previously validated user permissions on datasets. Changes include removing the function module, deleting imports and task usages from pipeline definitions, updating module exports, and adjusting documentation examples to reflect the removed step.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Pipeline removal | cognee/api/v1/cognify/cognify.py, cognee/eval_framework/corpus_builder/task_getters/get_cascade_graph_tasks.py | Removed check_permissions_on_dataset import and task usage from default and temporal task pipelines |
| Task index adjustment | cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py | Updated base task selection from indices [0, 1, 2] to [0, 1], excluding permission validation from task sets |
| Module API | cognee/tasks/documents/__init__.py | Removed re-export of check_permissions_on_dataset |
| Implementation removal | cognee/tasks/documents/check_permissions_on_dataset.py | Deleted entire module containing the async permission validation function |
| Documentation | examples/python/simple_example.py | Updated process step descriptions to remove permission check reference and renumber subsequent steps |
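
The index adjustment listed above follows from how positional task selection behaves: once one task is removed from the list, every later task shifts down by one position. The sketch below uses placeholder strings rather than real cognee `Task` objects; the helper `select_by_indices` is a hypothetical name, and the task ordering is assumed for illustration only.

```python
def select_by_indices(tasks: list[str], indices: list[int]) -> list[str]:
    # Pick tasks by position; hypothetical helper for illustration only.
    return [tasks[i] for i in indices]


# Assumed ordering before the PR (placeholder names, not real Task objects).
tasks_before = [
    "classify_documents",
    "check_permissions_on_dataset",
    "extract_chunks",
    "extract_graph",
]

# After the PR the permission task is gone, so later tasks shift down by one.
tasks_after = [t for t in tasks_before if t != "check_permissions_on_dataset"]

# The same two processing tasks are now reached through a shorter index list.
base_before = select_by_indices(tasks_before, [0, 1, 2])
base_after = select_by_indices(tasks_after, [0, 1])
```

Leaving the old indices in place would silently pull in the next task instead (here `extract_graph`), which is why index comments and selections need updating together with the removal.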

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Homogeneous removals across multiple files following a consistent pattern
  • No logic inversions or complex refactoring; straightforward deletion of unused function and references
  • Verify no other imports or usages of check_permissions_on_dataset exist elsewhere in the codebase
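
One possible way to carry out the verification in the last bullet is a small standard-library scan for leftover references. This is only a sketch: `find_references` is a hypothetical helper, not part of cognee or CodeRabbit, and a plain substring match will also report mentions in comments and strings.

```python
from pathlib import Path


def find_references(
    root: Path, needle: str, suffixes: tuple[str, ...] = (".py",)
) -> list[tuple[Path, int, str]]:
    """Return (file, line number, line text) for every line containing `needle`."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip unreadable or binary files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path, lineno, line.strip()))
    return hits
```

Calling `find_references(Path("."), "check_permissions_on_dataset")` from the repository root should return an empty list if the removal is complete for `.py` files; pass extra suffixes to cover other file types.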


Suggested labels

run-checks

Suggested reviewers

  • lxobr
  • hajdul88
  • borisarzentar

Poem

🐰 A permission check once guarded the way,
But now it hops off to see the light of day!
Tasks flow faster, pipelines spring free,
Simpler validations for you and for me! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: removal of check_permissions_on_dataset.py and its related references throughout the codebase. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #1771 objectives: removed check_permissions_on_dataset.py, deleted module with async function, removed all imports and references across the codebase, and updated documentation. |
| Out of Scope Changes check | ✅ Passed | All changes are in scope: removal of deprecated function, its imports, and documentation updates directly align with the stated objective of cleaning up the permissions validation task. |
| Description check | ✅ Passed | The PR description provides a clear human-written explanation of changes, rationale, and a detailed breakdown of modified files. It addresses all key template sections including description, type of change, and pre-submission checklist items. |



@github-actions github-actions bot left a comment


Hello @martin0731, thank you for submitting a PR! We will respond as soon as possible.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
cognee/api/v1/cognify/cognify.py (2)

244-294: Consider removing the unused user parameter.

The removal of the permissions check task is correct. However, the user parameter is no longer used within get_default_tasks() but remains in the function signature. While keeping it might provide backward compatibility or future extensibility, consider removing it to clarify the function's actual dependencies.

If you decide to clean up the signature, apply this diff:

 async def get_default_tasks(  # TODO: Find out a better way to do this (Boris's comment)
-    user: User = None,
     graph_model: BaseModel = KnowledgeGraph,
     chunker=TextChunker,
     chunk_size: int = None,

And update the call site in cognify() at line 217:

     tasks = await get_default_tasks(
-        user=user,
         graph_model=graph_model,
         chunker=chunker,

297-332: Consider removing the unused user parameter for consistency.

The removal of the permissions check task and docstring updates are correct. Similar to get_default_tasks(), the user parameter is no longer used within get_temporal_tasks(). For consistency and clarity, consider removing it from the signature as well.

If you decide to clean up the signature, apply this diff:

 async def get_temporal_tasks(
-    user: User = None, chunker=TextChunker, chunk_size: int = None, chunks_per_batch: int = 10
+    chunker=TextChunker, chunk_size: int = None, chunks_per_batch: int = 10
 ) -> list[Task]:
     """
     Builds and returns a list of temporal processing tasks to be executed in sequence.
 
     The pipeline includes:
     1. Document classification.
     2. Document chunking with a specified or default chunk size.
     3. Event and timestamp extraction from chunks.
     4. Knowledge graph extraction from events.
     5. Batched insertion of data points.
 
     Args:
-        user (User, optional): The user requesting task execution.
         chunker (Callable, optional): A text chunking function/class to split documents. Defaults to TextChunker.

And update the call site in cognify() at line 213:

     tasks = await get_temporal_tasks(
-        user=user, chunker=chunker, chunk_size=chunk_size, chunks_per_batch=chunks_per_batch
+        chunker=chunker, chunk_size=chunk_size, chunks_per_batch=chunks_per_batch
     )
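
Both nitpicks above weigh backward compatibility against a cleaner signature. A deprecation period is one middle ground. The sketch below is not part of this PR and does not use cognee's real signatures: `get_tasks` and its parameters are hypothetical. It keeps accepting a legacy `user` keyword, ignores it, and emits a `DeprecationWarning` so existing callers keep working while they migrate.

```python
import warnings

_UNSET = object()  # sentinel so an explicit user=None is still detected


def get_tasks(chunk_size: int = 1024, *, user=_UNSET) -> list[str]:
    # Hypothetical task getter: `user` is accepted only for old call sites.
    if user is not _UNSET:
        warnings.warn(
            "The 'user' argument is unused and will be removed in a future release.",
            DeprecationWarning,
            stacklevel=2,
        )
    return ["classify", f"chunk(size={chunk_size})"]
```

Removing the parameter outright, as the diffs above suggest, is simpler when all call sites live in the same repository; the warning approach mainly helps when external callers may still pass `user=`.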
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 487635b and 3acb581.

📒 Files selected for processing (7)
  • cognee/api/v1/cognify/cognify.py (2 hunks)
  • cognee/eval_framework/corpus_builder/task_getters/get_cascade_graph_tasks.py (0 hunks)
  • cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py (2 hunks)
  • cognee/tasks/documents/__init__.py (0 hunks)
  • cognee/tasks/documents/check_permissions_on_dataset.py (0 hunks)
  • examples/python/simple_example.py (1 hunks)
  • notebooks/cognee_demo.ipynb (1 hunks)
🔥 Files not summarized due to errors (1)
  • notebooks/cognee_demo.ipynb: Error: Server error: no LLM provider could handle the message
💤 Files with no reviewable changes (3)
  • cognee/eval_framework/corpus_builder/task_getters/get_cascade_graph_tasks.py
  • cognee/tasks/documents/check_permissions_on_dataset.py
  • cognee/tasks/documents/__init__.py
🧰 Additional context used
📓 Path-based instructions (3)
{cognee,cognee-mcp,distributed,examples,alembic}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

{cognee,cognee-mcp,distributed,examples,alembic}/**/*.py: Use 4-space indentation; name modules and functions in snake_case; name classes in PascalCase (Python)
Adhere to ruff rules, including import hygiene and configured line length (100)
Keep Python lines ≤ 100 characters

Files:

  • cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py
  • cognee/api/v1/cognify/cognify.py
  • examples/python/simple_example.py
cognee/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

cognee/**/*.py: Public APIs in the core library should be type-annotated where practical
Prefer explicit, structured error handling and use shared logging utilities from cognee.shared.logging_utils

Files:

  • cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py
  • cognee/api/v1/cognify/cognify.py
examples/python/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

When adding public APIs, provide or update targeted examples under examples/python/

Files:

  • examples/python/simple_example.py
🧠 Learnings (2)
📚 Learning: 2024-11-13T14:55:05.912Z
Learnt from: 0xideas
Repo: topoteretes/cognee PR: 205
File: cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py:7-7
Timestamp: 2024-11-13T14:55:05.912Z
Learning: When changes are made to the chunking implementation in `cognee/tasks/chunks`, the ground truth values in the corresponding tests in `cognee/tests/unit/processing/chunks` need to be updated accordingly.

Applied to files:

  • cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py
📚 Learning: 2024-10-07T11:20:44.876Z
Learnt from: borisarzentar
Repo: topoteretes/cognee PR: 144
File: cognee/tasks/chunking/query_chunks.py:1-17
Timestamp: 2024-10-07T11:20:44.876Z
Learning: The `query_chunks` function in `cognee/tasks/chunking/query_chunks.py` is used within the `search` function in `cognee/api/v1/search/search_v2.py`.

Applied to files:

  • cognee/api/v1/cognify/cognify.py
🔇 Additional comments (6)
notebooks/cognee_demo.ipynb (1)

591-597: No action required—notebook changes are correct.

Verification confirms that no source code references to check_permissions_on_dataset exist in the notebook cells. The only mentions of this function appear in old execution logs (output from previous runs), which will no longer be generated once the function is removed from the source code. The change to execution_count: null is appropriate notebook cleanup for fresh execution.
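
The distinction drawn above, between references in cell source and references in stored execution output, can be checked directly against the notebook's JSON. The sketch below assumes the standard `.ipynb` layout (a top-level `cells` list whose entries carry `source` and, for code cells, `outputs`); `find_in_notebook` is a hypothetical helper and not the check CodeRabbit actually ran.

```python
import json


def _as_text(value) -> str:
    # Notebook fields may be a single string or a list of strings.
    return "".join(value) if isinstance(value, list) else str(value)


def find_in_notebook(nb: dict, needle: str) -> dict[str, list[int]]:
    """Return cell indices whose source or stored outputs mention `needle`."""
    found = {"source": [], "outputs": []}
    for index, cell in enumerate(nb.get("cells", [])):
        if needle in _as_text(cell.get("source", "")):
            found["source"].append(index)
        # Outputs are stored as JSON; serialising them is a crude but simple search.
        if needle in json.dumps(cell.get("outputs", [])):
            found["outputs"].append(index)
    return found
```

Loading the file with `json.load` and passing the result to `find_in_notebook(nb, "check_permissions_on_dataset")` would show an empty `source` list when only old execution logs still mention the function.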

examples/python/simple_example.py (1)

35-41: LGTM! Documentation accurately reflects the updated pipeline.

The process step descriptions have been correctly updated to reflect the removal of the permissions check task. The steps now accurately describe the cognify pipeline flow.

cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py (2)

33-34: LGTM! Base task selection correctly updated.

The indices have been correctly adjusted from [0, 1, 2] to [0, 1] to reflect the removal of the permissions check task. The comment accurately describes the selected tasks (classify and extract_chunks).


54-55: LGTM! Consistent with get_no_summary_tasks.

The base task selection has been updated consistently with the changes in get_no_summary_tasks, correctly selecting only the classify and extract_chunks tasks.

cognee/api/v1/cognify/cognify.py (2)

21-24: LGTM! Import correctly removed.

The removal of check_permissions_on_dataset from the imports is correct and aligns with the deletion of the corresponding module.


80-85: LGTM! Docstring accurately reflects the updated pipeline.

The processing pipeline description has been correctly updated to remove the permissions check step and accurately describes the current cognify workflow.

@martin0731
Contributor Author

Hi! I’ve completed the removal of the check_permissions_on_dataset task and all related imports and usages.
The code builds cleanly and no errors appeared during modification.
Please let me know if you have any additional comments or suggestions.

Thanks for reviewing!

@Vasilije1990 Vasilije1990 added the community-contribution label on Nov 19, 2025
@Vasilije1990
Contributor

@dexters1 please review

Member

@borisarzentar borisarzentar left a comment


Looks good! Let's remove that one unused arg and it is perfect!

     Args:
-        user (User, optional): The user requesting task execution, used for permission checks.
+        user (User, optional): The user requesting task execution.
Member


Let's remove user argument if we don't need it.

@Vasilije1990 Vasilije1990 merged commit 75fea8d into topoteretes:main Dec 8, 2025
7 of 8 checks passed

Labels

community-contribution


3 participants