
Conversation

@lxobr
Collaborator

@lxobr lxobr commented Aug 6, 2025

Description

  • Improved list handling: removed the .index logic from get_graph_from_model and transitioned to fully datapoint-oriented processing
  • Streamlined datapoint iteration by introducing _datapoints_generator with nested loops
  • Generalized field processing to handle mixed lists of the form [DataPoint, (Edge, DataPoint), (Edge, [DataPoint])], allowing dynamic generation of multiple edges (see the sketch after this list)
  • Small improvements and refactorings
  • Added tests in test_get_graph_from_model_flexible_edges() covering weighted edges and dynamic multiple edges
  • Created dynamic_multiple_edges_example.py demonstrating dynamic multiple edges
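
For illustration, a minimal self-contained sketch of the mixed-list field shape described above. The Car/Person models and field names are invented for this example, and the DataPoint/Edge classes below are simplified stand-ins rather than the actual cognee classes.

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataPoint:
    # Simplified stand-in for cognee's DataPoint
    name: str


@dataclass
class Edge:
    # Simplified stand-in for cognee's Edge
    relationship_name: str
    weight: Optional[float] = None


@dataclass
class Person(DataPoint):
    pass


@dataclass
class Car(DataPoint):
    # A single field may mix three target shapes:
    #   a bare DataPoint            -> a default edge is generated
    #   an (Edge, DataPoint) tuple  -> one custom (e.g. weighted) edge
    #   an (Edge, [DataPoint]) pair -> one Edge spec fanned out to many targets
    connections: list = field(default_factory=list)


alice, bob, carol = Person("Alice"), Person("Bob"), Person("Carol")
car = Car(
    name="car-1",
    connections=[
        alice,                                                   # plain DataPoint
        (Edge(relationship_name="owned_by", weight=0.9), bob),   # weighted edge
        (Edge(relationship_name="driven_by"), [alice, carol]),   # dynamic multiple edges
    ],
)

Under this layout, the (Edge, [DataPoint]) entry is what produces multiple edges dynamically, one per listed target.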

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

@lxobr lxobr requested a review from hajdul88 August 6, 2025 15:30
@pull-checklist

pull-checklist bot commented Aug 6, 2025

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

@coderabbitai
Contributor

coderabbitai bot commented Aug 6, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This update introduces a major refactor and feature expansion for LLM structured output frameworks, including full integration of the BAML framework alongside existing Instructor/litellm support. A new LLMGateway abstraction centralizes all LLM interactions. The pipeline system gains fine-grained incremental processing, and search now supports a new "FEELING_LUCKY" type with dynamic selection. Numerous modules are updated for consistency, error handling, and new configuration options.
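
As a rough illustration of the gateway pattern described here (class shape, method name, and routing below are assumptions, not the actual cognee API), callers request structured output from one place instead of instantiating per-module clients:

from typing import Any, Type


class LLMGateway:
    """One entry point that hides whether BAML or Instructor/litellm does the work."""

    def __init__(self, framework: str = "instructor"):
        self.framework = framework

    async def acreate_structured_output(
        self, text_input: str, system_prompt: str, response_model: Type[Any]
    ) -> Any:
        if self.framework == "baml":
            return await self._call_baml(text_input, system_prompt, response_model)
        return await self._call_instructor(text_input, system_prompt, response_model)

    async def _call_baml(self, text_input, system_prompt, response_model):
        ...  # route to the generated BAML client

    async def _call_instructor(self, text_input, system_prompt, response_model):
        ...  # route to an Instructor-patched litellm call


# Callers then stop creating their own clients:
# result = await gateway.acreate_structured_output(text, prompt, KnowledgeGraph)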

Changes

Cohort / File(s) Change Summary
LLM Gateway Abstraction
cognee/infrastructure/llm/LLMGateway.py, cognee/infrastructure/llm/__init__.py, cognee/modules/retrieval/code_retriever.py, cognee/modules/retrieval/graph_completion_cot_retriever.py, cognee/modules/retrieval/natural_language_retriever.py, cognee/modules/retrieval/utils/completion.py, cognee/modules/retrieval/utils/description_to_codepart_search.py, cognee/tasks/graph/cascade_extract/utils/extract_content_nodes_and_relationship_names.py, cognee/tasks/graph/cascade_extract/utils/extract_edge_triplets.py, cognee/tasks/graph/cascade_extract/utils/extract_nodes.py, cognee/tasks/entity_completion/entity_extractors/llm_entity_extractor.py, cognee/tasks/chunk_naive_llm_classifier/chunk_naive_llm_classifier.py, cognee/modules/data/processing/document_types/AudioDocument.py, cognee/modules/data/processing/document_types/ImageDocument.py, cognee/modules/data/extraction/extract_categories.py (deleted), ...
Introduces the LLMGateway class as a unified interface for LLM operations, replacing scattered client instantiations and prompt utilities. All LLM-related calls are routed through this gateway, supporting both Instructor/litellm and BAML frameworks. Removes the now-obsolete extract_categories module.
BAML Structured Output Framework Integration
cognee/infrastructure/llm/structured_output_framework/baml/baml_client/*, cognee/infrastructure/llm/structured_output_framework/baml/baml_src/*, cognee/infrastructure/llm/structured_output_framework/baml/baml_src/extraction/*, ...
Adds a full BAML client SDK, including async/sync clients, type builders, streaming types, runtime, and prompt templates for content classification, knowledge graph extraction, and summarization. Generated files provide data models and client logic.
LLM Config and Environment
.env.template, cognee/infrastructure/llm/config.py
Adds new environment variables for structured output framework selection and BAML configuration. Updates LLMConfig with BAML-specific fields and a post-init registry.
Pipeline Incremental Processing
cognee/modules/pipelines/operations/run_tasks.py, cognee/modules/pipelines/operations/pipeline.py, cognee/api/v1/add/add.py, cognee/api/v1/cognify/cognify.py, cognee/api/v1/cognify/code_graph_pipeline.py, ...
Refactors pipeline task execution to support incremental, concurrent processing of data items, with robust status tracking and error aggregation. Adds incremental_loading parameters throughout the pipeline stack (illustrated in a sketch after this table).
Graph Database and Adapter Updates
cognee/infrastructure/databases/graph/config.py, cognee/infrastructure/databases/graph/get_graph_engine.py, cognee/infrastructure/databases/graph/neo4j_driver/adapter.py, cognee/infrastructure/databases/graph/kuzu/adapter.py, cognee/infrastructure/databases/graph/neptune_driver/adapter.py, cognee/infrastructure/databases/graph/networkx/adapter.py
Adds graph_database_name to configs and adapters, removes memgraph support, and standardizes document subgraph queries to use data_id instead of content_hash. Neo4j adapter gains edge property flattening.
Search System Enhancement
cognee/modules/search/types/SearchType.py, cognee/modules/search/operations/select_search_type.py, cognee/modules/search/methods/search.py, cognee/api/v1/search/search.py, cognee/infrastructure/llm/prompts/search_type_selector_prompt.txt
Adds a new FEELING_LUCKY search type, with logic to dynamically select the best search type using an LLM and a new prompt. Updates documentation and selection logic accordingly.
Error Handling and Status Models
cognee/modules/pipelines/exceptions/exceptions.py, cognee/modules/pipelines/exceptions/__init__.py, cognee/modules/pipelines/models/PipelineRunInfo.py, cognee/modules/pipelines/models/DataItemStatus.py, cognee/modules/pipelines/models/__init__.py, cognee/api/v1/add/routers/get_add_router.py, cognee/api/v1/cognify/routers/get_cognify_router.py
Adds new error and status classes for pipeline runs and data items, with improved error propagation and HTTP response handling in API routers.
Data Model and Deletion Refactor
cognee/modules/data/models/Data.py, cognee/api/v1/delete/delete.py
Adds a mutable JSON pipeline_status field to the Data model for better status tracking. Refactors document deletion to use data_id instead of content_hash.
Graph Extraction & Traversal Refactor
cognee/modules/graph/utils/get_graph_from_model.py, cognee/modules/engine/utils/generate_edge_id.py, cognee/modules/graph/cognee_graph/CogneeGraph.py
Refactors graph extraction and traversal to unify data extraction, simplify edge creation, and introduce a utility for generating normalized edge UUIDs (see the edge-id sketch after this table).
Prompt and Tokenizer Updates
cognee/infrastructure/llm/prompts/*, cognee/infrastructure/llm/tokenizer/*, cognee/infrastructure/databases/vector/embeddings/*
Adds or updates prompt templates, including a new search type selector prompt. Tokenizer adapters are improved for fallback and error handling.
Miscellaneous Refactoring and Imports
cognee/infrastructure/llm/utils.py, cognee/modules/retrieval/context_providers/TripletSearchContextProvider.py, cognee/modules/retrieval/graph_completion_context_extension_retriever.py, cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/*, ...
Cleans up and standardizes import statements, removes unused imports, and updates function signatures and docstrings for clarity and consistency.
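
To make the Pipeline Incremental Processing row concrete, here is a hedged sketch of per-item concurrency with status tracking and error aggregation; process_item and the status values are invented here, and the real run_tasks/DataItemStatus code may differ.

import asyncio
from enum import Enum


class DataItemStatus(str, Enum):
    COMPLETED = "completed"
    ERRORED = "errored"


async def process_item(item):
    ...  # run the pipeline tasks for a single data item


async def run_incremental(items):
    # items are assumed hashable here (e.g. data ids)
    statuses, errors = {}, {}

    async def run_one(item):
        try:
            await process_item(item)
            statuses[item] = DataItemStatus.COMPLETED
        except Exception as error:  # aggregate errors instead of failing the whole run
            statuses[item] = DataItemStatus.ERRORED
            errors[item] = error

    await asyncio.gather(*(run_one(item) for item in items))
    return statuses, errors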

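Similarly, for the Graph Extraction & Traversal Refactor row, one plausible shape for a normalized edge-UUID utility (an assumption, not the repository's generate_edge_id) is a deterministic uuid5 over normalized source, target, and relationship name:

from uuid import NAMESPACE_OID, UUID, uuid5


def generate_edge_id(source_id: UUID, target_id: UUID, relationship_name: str) -> UUID:
    # Same logical edge -> same UUID, regardless of casing/whitespace in the name.
    normalized = f"{source_id}:{target_id}:{relationship_name.strip().lower()}"
    return uuid5(NAMESPACE_OID, normalized)
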
Sequence Diagram(s)

sequenceDiagram
    participant User
    participant API
    participant Pipeline
    participant LLMGateway
    participant BAML/Instructor

    User->>API: Submit data for processing (add/cognify)
    API->>Pipeline: Start pipeline with incremental_loading
    loop For each data item (concurrent)
        Pipeline->>LLMGateway: Request structured output (e.g., extract graph/categories/summary)
        alt Framework = BAML
            LLMGateway->>BAML/Instructor: Route request to BAML extraction
        else Framework = Instructor
            LLMGateway->>BAML/Instructor: Route request to Instructor extraction
        end
        BAML/Instructor-->>LLMGateway: Structured output (graph, categories, etc.)
        LLMGateway-->>Pipeline: Return structured output
        Pipeline->>API: Update status, yield result/event
    end
    API-->>User: Return pipeline run info or error
sequenceDiagram
    participant User
    participant API
    participant SearchModule
    participant LLMGateway

    User->>API: Search with type FEELING_LUCKY
    API->>SearchModule: specific_search(query, FEELING_LUCKY)
    SearchModule->>LLMGateway: select_search_type(query)
    LLMGateway-->>SearchModule: Returns best SearchType
    SearchModule->>SearchModule: Perform search with selected type
    SearchModule-->>API: Return search results
    API-->>User: Results
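
A compact sketch of the dispatch in the diagram above; the SearchType members and the select_search_type/specific_search signatures are inferred from the summary, not verified against the code.

from enum import Enum


class SearchType(str, Enum):
    CHUNKS = "chunks"
    GRAPH_COMPLETION = "graph_completion"
    FEELING_LUCKY = "feeling_lucky"


async def select_search_type(query: str) -> SearchType:
    ...  # ask the LLM (via the gateway and the new selector prompt) for the best-fitting type


async def specific_search(query: str, search_type: SearchType):
    # FEELING_LUCKY is resolved to a concrete type first, then handled as usual.
    if search_type == SearchType.FEELING_LUCKY:
        search_type = await select_search_type(query)
    ...  # run the selected search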

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90+ minutes

This PR introduces new frameworks, refactors core LLM and pipeline logic, adds new data models, and touches many files across the codebase, including generated and configuration files. Review will require careful attention to integration points, concurrency, error handling, and backward compatibility.

Possibly related PRs

Poem

A rabbit hopped through code so wide,
Bringing BAML and Gateway side by side.
Now LLMs speak with a single voice,
Incremental pipelines dance and rejoice.
"Feeling Lucky?"—search is new,
Graphs and prompts all shiny too!
🐇✨ The future’s structured, thanks to you!

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@lxobr lxobr requested a review from Vasilije1990 August 6, 2025 15:30
@lxobr lxobr changed the base branch from main to dev August 6, 2025 15:33
@lxobr lxobr changed the title from "Feature/cog 2672 dynamic multiple edges in datapoints" to "feat: dynamic multiple edges in datapoints" Aug 6, 2025
@lxobr lxobr self-assigned this Aug 6, 2025
@lxobr lxobr requested a review from hajdul88 August 7, 2025 10:32
hajdul88 previously approved these changes Aug 7, 2025
Collaborator

@hajdul88 hajdul88 left a comment


Looks okay to me

@hajdul88 hajdul88 self-requested a review August 7, 2025 10:47
@hajdul88 hajdul88 dismissed their stale review August 7, 2025 10:48

Unit tests are failing

@lxobr lxobr merged commit 6dbd8e8 into dev Aug 7, 2025
58 of 62 checks passed
@lxobr lxobr deleted the feature/cog-2672-dynamic-multiple-edges-in-datapoints branch August 7, 2025 12:50
@coderabbitai coderabbitai bot mentioned this pull request Sep 19, 2025
16 tasks