Commit 75fea8d

Removed check_permissions_on_dataset.py and related references (#1786)
&lt;!-- .github/pull_request_template.md --&gt;

## Description

This PR removes the obsolete `check_permissions_on_dataset` task and all of its imports and usages across the codebase. The authorization logic is now handled earlier in the pipeline, so this task is no longer needed. These changes simplify the default Cognify pipeline and make the code cleaner and easier to maintain.

### Changes Made

- Removed `cognee/tasks/documents/check_permissions_on_dataset.py`
- Removed the import from `cognee/tasks/documents/__init__.py`
- Removed the import and usage in `cognee/api/v1/cognify/cognify.py`
- Removed the import and usage in `cognee/eval_framework/corpus_builder/task_getters/get_cascade_graph_tasks.py`
- Updated comments in `cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py` (index positions changed)
- Removed the usage in `notebooks/cognee_demo.ipynb`
- Updated the documentation in `examples/python/simple_example.py` (process description)

---

## Type of Change

- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [x] Other (please specify): task removal / cleanup of a deprecated function

---

## Pre-submission Checklist

- [ ] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains the minimal changes necessary to address the issue**
- [x] My code follows the project's coding standards and style guidelines
- [ ] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't already been submitted
- [x] I have linked any relevant issues in the description (Closes #1771)
- [x] My commits have clear and descriptive messages

---

## DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
2 parents 00b60ae + 3acb581 commit 75fea8d
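At a glance, the net effect on the default Cognify task list is the removal of one pipeline stage. A minimal before/after sketch, assuming `Task` is importable from `cognee.modules.pipelines` (import path assumed here) and omitting the chunking kwargs shown in the diffs below:

```python
# Sketch only; see the cognify.py diff below for the authoritative change.
from cognee.modules.pipelines import Task  # import path assumed
from cognee.tasks.documents import classify_documents, extract_chunks_from_documents


def simplified_default_tasks():
    """Permission checking no longer runs as a pipeline task; authorization
    is handled earlier, before the pipeline starts."""
    return [
        Task(classify_documents),
        # Removed: Task(check_permissions_on_dataset, user=user, permissions=["write"])
        Task(extract_chunks_from_documents),
        # ... entity extraction, graph construction, and summarization follow
    ]
```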

7 files changed (+19, -58 lines)

cognee/api/v1/cognify/cognify.py

Lines changed: 10 additions & 15 deletions
@@ -19,7 +19,6 @@
 from cognee.modules.users.models import User
 
 from cognee.tasks.documents import (
-    check_permissions_on_dataset,
     classify_documents,
     extract_chunks_from_documents,
 )
@@ -78,12 +77,11 @@ async def cognify(
 
     Processing Pipeline:
     1. **Document Classification**: Identifies document types and structures
-    2. **Permission Validation**: Ensures user has processing rights
-    3. **Text Chunking**: Breaks content into semantically meaningful segments
-    4. **Entity Extraction**: Identifies key concepts, people, places, organizations
-    5. **Relationship Detection**: Discovers connections between entities
-    6. **Graph Construction**: Builds semantic knowledge graph with embeddings
-    7. **Content Summarization**: Creates hierarchical summaries for navigation
+    2. **Text Chunking**: Breaks content into semantically meaningful segments
+    3. **Entity Extraction**: Identifies key concepts, people, places, organizations
+    4. **Relationship Detection**: Discovers connections between entities
+    5. **Graph Construction**: Builds semantic knowledge graph with embeddings
+    6. **Content Summarization**: Creates hierarchical summaries for navigation
 
     Graph Model Customization:
     The `graph_model` parameter allows custom knowledge structures:
@@ -274,7 +272,6 @@ async def get_default_tasks(  # TODO: Find out a better way to do this (Boris's
 
     default_tasks = [
         Task(classify_documents),
-        Task(check_permissions_on_dataset, user=user, permissions=["write"]),
         Task(
             extract_chunks_from_documents,
             max_chunk_size=chunk_size or get_max_chunk_tokens(),
@@ -305,14 +302,13 @@ async def get_temporal_tasks(
 
     The pipeline includes:
     1. Document classification.
-    2. Dataset permission checks (requires "write" access).
-    3. Document chunking with a specified or default chunk size.
-    4. Event and timestamp extraction from chunks.
-    5. Knowledge graph extraction from events.
-    6. Batched insertion of data points.
+    2. Document chunking with a specified or default chunk size.
+    3. Event and timestamp extraction from chunks.
+    4. Knowledge graph extraction from events.
+    5. Batched insertion of data points.
 
     Args:
-        user (User, optional): The user requesting task execution, used for permission checks.
+        user (User, optional): The user requesting task execution.
         chunker (Callable, optional): A text chunking function/class to split documents. Defaults to TextChunker.
         chunk_size (int, optional): Maximum token size per chunk. If not provided, uses system default.
         chunks_per_batch (int, optional): Number of chunks to process in a single batch in Cognify
@@ -325,7 +321,6 @@ async def get_temporal_tasks(
 
     temporal_tasks = [
         Task(classify_documents),
-        Task(check_permissions_on_dataset, user=user, permissions=["write"]),
         Task(
             extract_chunks_from_documents,
             max_chunk_size=chunk_size or get_max_chunk_tokens(),
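With the permission task gone, `get_default_tasks` yields a list whose second element is chunk extraction. A hedged usage sketch (the import path follows this file's location; the function's optional kwargs are assumed and omitted):

```python
import asyncio

from cognee.api.v1.cognify.cognify import get_default_tasks  # path per this diff


async def show_default_tasks():
    tasks = await get_default_tasks()  # user/chunker/chunk_size kwargs assumed optional
    # tasks[0] classifies documents; tasks[1] now extracts chunks. The
    # permission-check task that used to sit between them is gone.
    for index, task in enumerate(tasks):
        print(index, task)


asyncio.run(show_default_tasks())
```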

cognee/eval_framework/corpus_builder/task_getters/get_cascade_graph_tasks.py

Lines changed: 0 additions & 2 deletions
@@ -8,7 +8,6 @@
 from cognee.shared.data_models import KnowledgeGraph
 from cognee.shared.utils import send_telemetry
 from cognee.tasks.documents import (
-    check_permissions_on_dataset,
     classify_documents,
     extract_chunks_from_documents,
 )
@@ -31,7 +30,6 @@ async def get_cascade_graph_tasks(
     cognee_config = get_cognify_config()
     default_tasks = [
         Task(classify_documents),
-        Task(check_permissions_on_dataset, user=user, permissions=["write"]),
         Task(
             extract_chunks_from_documents, max_chunk_tokens=get_max_chunk_tokens()
         ),  # Extract text chunks based on the document type.
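Consumers of this getter that addressed tasks by position must shift their indices down by one. A hedged sketch (the getter's signature is assumed from the `user=user` usage above, not shown in full in this diff):

```python
from cognee.eval_framework.corpus_builder.task_getters.get_cascade_graph_tasks import (
    get_cascade_graph_tasks,
)


async def build_eval_pipeline(user=None):
    tasks = await get_cascade_graph_tasks(user=user)  # signature assumed
    # tasks[1] is now chunk extraction rather than the permission check,
    # so any index-based consumer must be updated accordingly.
    return tasks
```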

cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py

Lines changed: 4 additions & 4 deletions
@@ -30,8 +30,8 @@ async def get_no_summary_tasks(
     ontology_file_path=None,
 ) -> List[Task]:
     """Returns default tasks without summarization tasks."""
-    # Get base tasks (0=classify, 1=check_permissions, 2=extract_chunks)
-    base_tasks = await get_default_tasks_by_indices([0, 1, 2], chunk_size, chunker)
+    # Get base tasks (0=classify, 1=extract_chunks)
+    base_tasks = await get_default_tasks_by_indices([0, 1], chunk_size, chunker)
 
     ontology_adapter = RDFLibOntologyResolver(ontology_file=ontology_file_path)
 
@@ -51,8 +51,8 @@ async def get_just_chunks_tasks(
     chunk_size: int = None, chunker=TextChunker, user=None
 ) -> List[Task]:
     """Returns default tasks with only chunk extraction and data points addition."""
-    # Get base tasks (0=classify, 1=check_permissions, 2=extract_chunks)
-    base_tasks = await get_default_tasks_by_indices([0, 1, 2], chunk_size, chunker)
+    # Get base tasks (0=classify, 1=extract_chunks)
+    base_tasks = await get_default_tasks_by_indices([0, 1], chunk_size, chunker)
 
     add_data_points_task = Task(add_data_points, task_config={"batch_size": 10})

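The comment updates record the index shift this removal causes. A sketch of why, assuming `get_default_tasks_by_indices` simply selects positions from the default list built in cognify.py (the real implementation is not shown in this diff):

```python
from cognee.api.v1.cognify.cognify import get_default_tasks  # path assumed


async def get_default_tasks_by_indices(indices, chunk_size=None, chunker=None):
    """Hypothetical shape: slice the default task list by position."""
    default_tasks = await get_default_tasks(chunk_size=chunk_size, chunker=chunker)
    return [default_tasks[i] for i in indices]


# Old mapping: 0=classify, 1=check_permissions, 2=extract_chunks
# New mapping: 0=classify, 1=extract_chunks
# Hence callers that asked for [0, 1, 2] now ask for [0, 1].
```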
cognee/tasks/documents/__init__.py

Lines changed: 0 additions & 1 deletion
@@ -1,3 +1,2 @@
 from .classify_documents import classify_documents
 from .extract_chunks_from_documents import extract_chunks_from_documents
-from .check_permissions_on_dataset import check_permissions_on_dataset

cognee/tasks/documents/check_permissions_on_dataset.py

Lines changed: 0 additions & 26 deletions
This file was deleted.
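The deleted 26-line module is not shown in this diff. For context, a hypothetical reconstruction of its shape, inferred only from call sites such as `Task(check_permissions_on_dataset, user=user, permissions=["write"])`; the names and logic below are illustrative, not the deleted source:

```python
from typing import List


async def user_has_permission(user, dataset_id, permission) -> bool:
    """Placeholder for the real permissions lookup (implementation unknown)."""
    return True


async def check_permissions_on_dataset(documents, user, permissions: List[str]):
    """Hypothetical: verify the user holds each required permission on the
    documents' dataset, then pass the documents through unchanged so the
    task composes with the rest of the pipeline."""
    for document in documents:
        for permission in permissions:
            # document.dataset_id is an assumed attribute for illustration.
            if not await user_has_permission(user, document.dataset_id, permission):
                raise PermissionError(f"User lacks '{permission}' permission on dataset.")
    return documents
```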

examples/python/simple_example.py

Lines changed: 4 additions & 7 deletions
@@ -32,16 +32,13 @@ async def main():
     print("Cognify process steps:")
     print("1. Classifying the document: Determining the type and category of the input text.")
     print(
-        "2. Checking permissions: Ensuring the user has the necessary rights to process the text."
+        "2. Extracting text chunks: Breaking down the text into sentences or phrases for analysis."
     )
     print(
-        "3. Extracting text chunks: Breaking down the text into sentences or phrases for analysis."
+        "3. Generating knowledge graph: Extracting entities and relationships to form a knowledge graph."
     )
-    print("4. Adding data points: Storing the extracted chunks for processing.")
-    print(
-        "5. Generating knowledge graph: Extracting entities and relationships to form a knowledge graph."
-    )
-    print("6. Summarizing text: Creating concise summaries of the content for quick insights.\n")
+    print("4. Summarizing text: Creating concise summaries of the content for quick insights.")
+    print("5. Adding data points: Storing the extracted chunks for processing.\n")
 
     # Use LLMs and cognee to create knowledge graph
     await cognee.cognify()
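For readers seeing only this hunk: the example's surrounding flow is the usual add, cognify, search sequence. A condensed sketch (`add` and `search` are part of cognee's public API as used elsewhere in the repo, but the exact argument forms here are assumed):

```python
import asyncio

import cognee


async def main():
    await cognee.add("Natural language processing (NLP) is an interdisciplinary field.")
    # Runs the five steps printed above: classify, chunk, build the graph,
    # summarize, and store data points.
    await cognee.cognify()
    results = await cognee.search(query_text="Tell me about NLP")  # argument form assumed
    for result in results:
        print(result)


asyncio.run(main())
```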

notebooks/cognee_demo.ipynb

Lines changed: 1 addition & 3 deletions
@@ -591,7 +591,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
+   "execution_count": null,
    "id": "7c431fdef4921ae0",
    "metadata": {
     "ExecuteTime": {
@@ -609,7 +609,6 @@
     "from cognee.modules.pipelines import run_tasks\n",
     "from cognee.modules.users.models import User\n",
     "from cognee.tasks.documents import (\n",
-    "    check_permissions_on_dataset,\n",
     "    classify_documents,\n",
     "    extract_chunks_from_documents,\n",
     ")\n",
@@ -627,7 +626,6 @@
     "\n",
     "    tasks = [\n",
     "        Task(classify_documents),\n",
-    "        Task(check_permissions_on_dataset, user=user, permissions=[\"write\"]),\n",
     "        Task(\n",
     "            extract_chunks_from_documents, max_chunk_size=get_max_chunk_tokens()\n",
     "        ),  # Extract text chunks based on the document type.\n",

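A trimmed sketch of the updated notebook cell after this change; the `run_tasks` invocation and the `Task`/`get_max_chunk_tokens` import paths are assumed, since the full cell body is not shown in the diff:

```python
from cognee.infrastructure.llm.utils import get_max_chunk_tokens  # path assumed
from cognee.modules.pipelines import run_tasks
from cognee.modules.pipelines.tasks.task import Task  # path assumed
from cognee.tasks.documents import classify_documents, extract_chunks_from_documents


async def run_cognify_pipeline(dataset, user):
    # Build the task list exactly as the updated cell does: no permission
    # check between classification and chunk extraction.
    tasks = [
        Task(classify_documents),
        Task(
            extract_chunks_from_documents, max_chunk_size=get_max_chunk_tokens()
        ),  # Extract text chunks based on the document type.
        # ... downstream graph and summarization tasks elided in this diff ...
    ]
    async for status in run_tasks(tasks, data=dataset, user=user):  # signature assumed
        print(status)
```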