
Conversation

@EricXiao95
Contributor

Description

This PR fixes graph visualization access for users with read permissions (#1182).

  • Add permission checks for graph visualization endpoints to ensure users can only access datasets they have permission to view
  • Create get_dataset_with_permissions method to validate user access before returning a dataset
  • Remove redundant dataset existence validation in datasets router and delegate permission checking to graph data retrieval
  • Add comprehensive test suite for graph visualization permissions covering owner access and permission granting scenarios
  • Update get_formatted_graph_data() to use the dataset owner's ID for context
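A minimal sketch of what such a permission-gated lookup might look like (only the `get_dataset_with_permissions` name comes from this PR; the signature and the injected helpers below are assumptions for illustration):

```python
import asyncio


class DatasetNotAuthorizedError(PermissionError):
    """Raised when a user lacks read permission on a dataset (illustrative)."""


async def get_dataset_with_permissions(dataset_id, user_id, load_dataset, has_permission):
    """Return the dataset only if `user_id` owns it or holds an explicit read grant.

    `load_dataset` and `has_permission` are injected async callables so the sketch
    stays self-contained; the real method would query cognee's own stores.
    """
    dataset = await load_dataset(dataset_id)
    if dataset is None:
        raise LookupError(f"Dataset {dataset_id} not found")
    if dataset["owner_id"] != user_id and not await has_permission(user_id, dataset_id, "read"):
        raise DatasetNotAuthorizedError(f"User {user_id} cannot read dataset {dataset_id}")
    return dataset
```

Owners pass the first check; other users need an explicit grant, mirroring the owner-access and permission-granting scenarios the test suite covers.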

Testing

Tests can be run with:

```bash
pytest -s cognee/tests/test_graph_visualization_permissions.py
```

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

borisarzentar and others added 26 commits July 28, 2025 23:19
## Description
…er optimization (topoteretes#1151)

## Description
feature: solve edge embedding duplicates in edge collection + retriever optimization

---------

Co-authored-by: Vasilije <[email protected]>
…opoteretes#1092)

## Description
Attempt at making incremental loading run async

## Description
Add async lock for dynamic table creation
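The idea can be sketched with a per-table `asyncio.Lock`, so concurrent tasks cannot both issue a CREATE TABLE for the same table (illustrative sketch; none of these names are cognee's actual ones):

```python
import asyncio

_table_locks: dict = {}          # one lock per table name
_registry_lock = asyncio.Lock()  # guards the lock registry itself


async def ensure_table(name: str, created: set, create_calls: list):
    """Create the table once, even when called concurrently."""
    async with _registry_lock:
        lock = _table_locks.setdefault(name, asyncio.Lock())
    async with lock:  # serialize creation per table
        if name not in created:
            await asyncio.sleep(0)      # yield, simulating DB latency
            create_calls.append(name)   # stand-in for the actual CREATE TABLE
            created.add(name)


async def demo():
    created, calls = set(), []
    await asyncio.gather(*(ensure_table("events", created, calls) for _ in range(5)))
    return calls
```

Five concurrent callers result in exactly one create call; without the lock, several tasks could observe the table as missing and all attempt creation.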

…poteretes#1177)


## Description
Add default tokenizer for custom models not available on HuggingFace
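A fallback of this kind might look like the following (purely illustrative; cognee's real default tokenizer is presumably model-aware rather than whitespace-based, and the registry here is a hypothetical stand-in for the tokenizer lookup):

```python
class FallbackTokenizer:
    """Crude whitespace tokenizer used when no model-specific tokenizer is known."""

    def encode(self, text: str):
        return text.split()

    def count_tokens(self, text: str) -> int:
        return len(self.encode(text))


def load_tokenizer(model_name: str, registry: dict):
    """Return a registered tokenizer for `model_name`, else the default fallback.

    `registry` is a hypothetical mapping of model names to tokenizer factories;
    custom models absent from HuggingFace simply miss the registry and get the default.
    """
    factory = registry.get(model_name)
    return factory() if factory else FallbackTokenizer()
```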

## Description
This PR implements the 'FEELING_LUCKY' search type, which intelligently
routes user queries to the most appropriate search retriever, addressing
[topoteretes#1162](topoteretes#1162).

- Implement new search type FEELING_LUCKY
- Add the select_search_type function to analyze queries and choose the
proper search type
- Integrate with an LLM for intelligent search type determination
- Add logging for the search type selection process
- Support fallback to RAG_COMPLETION when the LLM selection fails
- Add tests for the new search type

## How it works
When a user selects the 'FEELING_LUCKY' search type, the system first
sends their natural language query to an LLM-based classifier. This
classifier analyzes the query's intent (e.g., is it asking for a
relationship, a summary, or a factual answer?) and selects the optimal
SearchType, such as 'INSIGHTS' or 'GRAPH_COMPLETION'. The main search
function then proceeds using this dynamically selected type. If the
classification process fails, it gracefully falls back to the default
'RAG_COMPLETION' type.
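The routing described above reduces to something like the following sketch (the classifier is injected as a plain callable here; the real implementation prompts an LLM, and the enum values beyond those named in this PR are assumptions):

```python
import asyncio
from enum import Enum


class SearchType(Enum):
    RAG_COMPLETION = "RAG_COMPLETION"
    GRAPH_COMPLETION = "GRAPH_COMPLETION"
    INSIGHTS = "INSIGHTS"
    FEELING_LUCKY = "FEELING_LUCKY"


async def select_search_type(query: str, classify) -> SearchType:
    """Pick a concrete SearchType for a FEELING_LUCKY query, falling back on failure."""
    try:
        chosen = await classify(query)   # LLM-backed classification in the real code
        return SearchType(chosen)
    except Exception:
        return SearchType.RAG_COMPLETION  # the graceful fallback described above
```

Any classifier failure (LLM error, unknown label) lands on RAG_COMPLETION rather than surfacing to the user.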

## Testing
Tests can be run with:
```bash
python -m pytest cognee/tests/unit/modules/search/search_methods_test.py -k "feeling_lucky" -v
```


Signed-off-by: EricXiao <[email protected]>
## Description
Resolve issues with Cognee MCP docker use
## Description

---------

Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Co-authored-by: Vasilije <[email protected]>
Co-authored-by: vasilije <[email protected]>
Co-authored-by: Andrew Carbonetto <[email protected]>
Co-authored-by: Andy Kwok <[email protected]>
## Description

---------

Signed-off-by: Raj2604 <[email protected]>
Co-authored-by: Daulet Amirkhanov <[email protected]>
Co-authored-by: Hande <[email protected]>
Co-authored-by: Igor Ilic <[email protected]>
Co-authored-by: Igor Ilic <[email protected]>
Co-authored-by: Boris <[email protected]>
Co-authored-by: Matea Pesic <[email protected]>
Co-authored-by: github-actions[bot] <[email protected]>
Co-authored-by: hajdul88 <[email protected]>
Co-authored-by: Boris Arzentar <[email protected]>
Co-authored-by: Raj Mandhare <[email protected]>
Co-authored-by: Pedro Thompson <[email protected]>
Co-authored-by: Pedro Henrique Thompson Furtado <[email protected]>
## Description
Add multi db support for Neo4j Enterprise users

---------

Signed-off-by: Raj2604 <[email protected]>
Co-authored-by: vasilije <[email protected]>
Co-authored-by: Vasilije <[email protected]>
Co-authored-by: Daulet Amirkhanov <[email protected]>
Co-authored-by: Hande <[email protected]>
Co-authored-by: Boris <[email protected]>
Co-authored-by: Matea Pesic <[email protected]>
Co-authored-by: github-actions[bot] <[email protected]>
Co-authored-by: hajdul88 <[email protected]>
Co-authored-by: Boris Arzentar <[email protected]>
Co-authored-by: Raj Mandhare <[email protected]>
Co-authored-by: Pedro Thompson <[email protected]>
Co-authored-by: Pedro Henrique Thompson Furtado <[email protected]>
This deals with the case where the user runs a custom embedding model and LLM and passes the hosted_vllm provider option described in the LiteLLM documentation.
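Hypothetically, wiring this up could look like the environment setup below. The `hosted_vllm/` prefix is LiteLLM's provider convention for self-hosted vLLM endpoints, but the variable names, model, and URL here are illustrative assumptions, not cognee's actual config keys:

```python
import os

# Point the embedding stack at a self-hosted vLLM server via LiteLLM's
# "hosted_vllm" provider. All names below are assumptions for illustration.
os.environ["EMBEDDING_PROVIDER"] = "hosted_vllm"
os.environ["EMBEDDING_MODEL"] = "hosted_vllm/BAAI/bge-m3"
os.environ["EMBEDDING_ENDPOINT"] = "http://localhost:8000/v1"
```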

## Description
This allows the user to use hosted_vllm with LiteLLM; it applies only to custom embedding models, specifically Hugging Face models.
…sh (topoteretes#1210)

## Description
Changing deletion logic to use document id instead of content hash

## Description
- Improved list handling, removed `.index` logic from
`get_graph_from_model`, transitioned to fully datapoint-oriented
processing
- Streamlined datapoint iteration by introducing `_datapoints_generator`
with nested loops
- Generalized field processing to handle mixed lists: `[DataPoint,
(Edge, DataPoint), (Edge, [DataPoint])]`, allowing dynamic multiple
edges generation
- Small improvements and refactorings
- Added tests to `test_get_graph_from_model_flexible_edges()` covering
weighted edges and dynamic multiple edges
- Created `dynamic_multiple_edges_example.py` demonstrating dynamic
multiple edges
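The mixed-list handling above can be sketched as a small normalizing generator (illustrative; `Edge` is a stand-in for cognee's edge metadata type, and the real `get_graph_from_model` does considerably more):

```python
from typing import Any, Iterator, Optional, Tuple


class Edge:
    """Stand-in for cognee's Edge metadata type (hypothetical shape)."""

    def __init__(self, relationship_name: str, weight: Optional[float] = None):
        self.relationship_name = relationship_name
        self.weight = weight


def iter_edges(field_value: Any) -> Iterator[Tuple[Optional[Edge], Any]]:
    """Yield (edge, target) pairs from a field holding DataPoint, (Edge, DataPoint),
    or (Edge, [DataPoint]) entries, or a list mixing all three."""
    items = field_value if isinstance(field_value, list) else [field_value]
    for item in items:
        if isinstance(item, tuple):
            edge, target = item
            targets = target if isinstance(target, list) else [target]
            for t in targets:  # (Edge, [DataPoint]) fans out into multiple edges
                yield edge, t
        else:
            yield None, item  # plain DataPoint: edge metadata defaulted by caller
```

Flattening everything into `(edge, target)` pairs is what lets one field declaration produce multiple weighted edges dynamically.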

@pull-checklist

pull-checklist bot commented Aug 7, 2025

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

@coderabbitai
Contributor

coderabbitai bot commented Aug 7, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

This update introduces extensive support for structured output frameworks, notably integrating the BAML client and instructor-based LLM gateway, and refactors LLM-related logic to use a new LLMGateway abstraction. It also implements incremental data loading in pipelines, adds permission-aware dataset access, introduces new data models, and updates search type selection with a "feeling lucky" mode.

Changes

  • **Structured Output Framework: BAML Integration** — cognee/infrastructure/llm/structured_output_framework/baml/baml_client/*, cognee/infrastructure/llm/structured_output_framework/baml/baml_src/*, cognee/infrastructure/llm/structured_output_framework/baml/baml_src/extraction/*
    Introduces BAML async/sync clients, runtime, parsers, type builder, type map, and Pydantic data models. Adds BAML schema and prompt templates for content graph extraction and classification, with async summary extraction and mock summary support.
  • **LLMGateway Abstraction and LLM Refactor** — cognee/infrastructure/llm/LLMGateway.py, cognee/infrastructure/llm/__init__.py, cognee/infrastructure/llm/config.py, cognee/infrastructure/llm/utils.py, cognee/infrastructure/llm/structured_output_framework/litellm_instructor/extraction/*, cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/*, cognee/modules/retrieval/*, cognee/modules/retrieval/utils/*, cognee/modules/data/processing/document_types/*, cognee/tasks/chunk_naive_llm_classifier/chunk_naive_llm_classifier.py, cognee/eval_framework/evaluation/direct_llm_eval_adapter.py, cognee/modules/engine/utils/generate_edge_id.py
    Adds the LLMGateway class as a unified interface for LLM operations. Refactors all LLM and prompt usage to static gateway methods, removing direct client instantiation and scattered utility functions. Updates imports and usage throughout retrieval, evaluation, and chunk classification modules.
  • **Incremental Loading and Pipeline Refactor** — cognee/modules/pipelines/operations/run_tasks.py, cognee/modules/pipelines/operations/pipeline.py, cognee/api/v1/add/add.py, cognee/api/v1/cognify/cognify.py, cognee/api/v1/cognify/code_graph_pipeline.py, cognee/modules/pipelines/models/PipelineRunInfo.py, cognee/modules/pipelines/models/DataItemStatus.py, cognee/modules/pipelines/models/__init__.py, cognee/api/v1/add/routers/get_add_router.py, cognee/api/v1/cognify/routers/get_cognify_router.py, cognee/modules/pipelines/exceptions/*
    Implements incremental loading for pipeline tasks, allowing per-data-item processing and skipping already-completed items. Adds new error and status models for pipeline runs, and updates API endpoints to handle the new statuses and errors.
  • **Graph Database and Data Model Updates** — cognee/infrastructure/databases/graph/*, cognee/base_config.py, cognee/modules/data/models/Data.py
    Adds support for a graph database name in configs/adapters, updates subgraph lookup to use data_id instead of content_hash, adds pipeline status to the Data model, and updates config dictionary outputs.
  • **Dataset Permission and Access Control** — cognee/modules/data/methods/*, cognee/modules/graph/methods/get_formatted_graph_data.py, cognee/api/v1/datasets/routers/get_datasets_router.py
    Introduces permission-aware dataset retrieval, updates graph data formatting to enforce permissions, and modifies dataset status endpoint defaults.
  • **Search Type Selector and "Feeling Lucky" Mode** — cognee/modules/search/operations/select_search_type.py, cognee/modules/search/operations/__init__.py, cognee/modules/search/methods/search.py, cognee/modules/search/types/SearchType.py, cognee/infrastructure/llm/prompts/search_type_selector_prompt.txt, cognee/api/v1/search/search.py
    Adds an async search type selector using an LLM, introduces the "FEELING_LUCKY" search type, and updates search logic and documentation to support dynamic query type selection.
  • **Document and Chunk Processing** — cognee/tasks/documents/extract_chunks_from_documents.py, cognee/modules/data/processing/document_types/PdfDocument.py
    Removes custom PDF error handling, allowing exceptions to propagate directly during PDF reading and chunk extraction.
  • **Edge and Graph Utilities** — cognee/modules/engine/utils/generate_edge_id.py, cognee/modules/graph/utils/get_graph_from_model.py, cognee/modules/graph/cognee_graph/CogneeGraph.py
    Adds an edge ID generation utility, refactors graph extraction to handle relationships and edge metadata more consistently, and simplifies edge mapping and triplet importance calculations.
  • **Miscellaneous and Formatting** — cognee/infrastructure/llm/tokenizer/*, cognee/infrastructure/databases/vector/embeddings/*, cognee/infrastructure/llm/prompts/*, .env.template, cognee/shared/data_models.py
    Updates tokenizer imports and fallback logic, adds missing newlines to prompt templates, expands environment variable templates, and makes minor formatting/import changes.
  • **Removals** — cognee/modules/data/extraction/extract_categories.py
    Removes the old extract_categories function in favor of LLMGateway-based implementations.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant API
    participant LLMGateway
    participant BAMLClient
    participant InstructorClient

    API->>LLMGateway: extract_content_graph(content, response_model, mode)
    alt framework == "BAML"
        LLMGateway->>BAMLClient: ExtractContentGraphGeneric(content, mode)
        BAMLClient-->>LLMGateway: KnowledgeGraph
    else framework == "instructor"
        LLMGateway->>InstructorClient: acreate_structured_output(content, prompt, response_model)
        InstructorClient-->>LLMGateway: KnowledgeGraph
    end
    LLMGateway-->>API: KnowledgeGraph
```
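The dispatch in the first diagram amounts to something like this (a simplified sketch; only LLMGateway and extract_content_graph are named in the PR, and the backend calls are injected stubs rather than the real BAML/instructor clients):

```python
import asyncio


class LLMGateway:
    """Unified facade over structured-output backends.

    The real implementation reads the framework choice from configuration;
    here it is a plain class attribute for illustration.
    """

    framework = "instructor"  # or "BAML"

    @staticmethod
    async def extract_content_graph(content, response_model, baml_call, instructor_call):
        if LLMGateway.framework == "BAML":
            # Real code would call BAMLClient.ExtractContentGraphGeneric
            return await baml_call(content)
        # Real code would call acreate_structured_output(content, prompt, response_model)
        return await instructor_call(content, response_model)
```

Callers depend only on the gateway's static methods, so swapping structured-output frameworks is a configuration change rather than a code change.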
```mermaid
sequenceDiagram
    participant User
    participant API
    participant Pipeline
    participant DB

    User->>API: POST /add (incremental_loading=True)
    API->>Pipeline: cognee_pipeline(..., incremental_loading=True)
    Pipeline->>DB: Check data item status
    alt Already processed
        Pipeline-->>API: PipelineRunAlreadyCompleted
    else Not processed
        Pipeline->>DB: Process and update status
        Pipeline-->>API: PipelineRunCompleted
    end
```
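The skip logic in the second diagram reduces to a per-item status check (a sketch; the status strings echo the PipelineRunAlreadyCompleted / PipelineRunCompleted models named above, while the in-memory store and function shape are assumptions):

```python
import asyncio


async def run_item(item_id, status_store: dict, process):
    """Process one data item unless a previous run already completed it."""
    if status_store.get(item_id) == "completed":
        return "PipelineRunAlreadyCompleted"
    await process(item_id)                  # stand-in for the real task pipeline
    status_store[item_id] = "completed"     # persisted to the DB in the real code
    return "PipelineRunCompleted"
```

Re-running the same dataset then does no redundant work: already-completed items short-circuit before any processing starts.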

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90+ minutes


Suggested labels

run-checks

Suggested reviewers

  • borisarzentar

Poem

A rabbit hopped through code so wide,
Adding BAML and Instructor side by side.
With LLMGateway’s magic, prompts now flow,
Incremental pipelines, permissions in tow.
“Feeling Lucky?”—let the search decide,
As data and graphs are neatly supplied.
🐇✨ The garden of features, now unified!


@EricXiao95 EricXiao95 closed this Aug 7, 2025
@EricXiao95 EricXiao95 reopened this Aug 7, 2025
@EricXiao95 EricXiao95 closed this Aug 7, 2025

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants