fix: fixes cognify duplicated edges and resets the methods to an olde… #242

hajdul88 · 2024-12-02T19:21:23Z

…r version

Summary by CodeRabbit

New Features
- Enhanced graph construction with improved handling of DataPoint instances, including metadata support.
- Simplified logic for creating model instances from graph data.
Bug Fixes
- Adjusted parameter handling to prevent potential issues with edge construction and retrieval.
Refactor
- Streamlined functions for clarity and reduced complexity by removing unnecessary nested functions and consolidating logic.

coderabbitai · 2024-12-02T19:21:33Z

Walkthrough

The pull request introduces significant modifications to two functions: get_graph_from_model and get_model_instance_from_graph. The get_graph_from_model function has updated parameters and streamlined logic for handling DataPoint instances, enhancing clarity and reducing complexity. The get_model_instance_from_graph function has simplified its parameter types and logic for constructing model instances from graph edges. Both functions maintain their core functionality while improving their overall structure and readability.

Changes

| File Path | Change Summary |
|---|---|
| `cognee/modules/graph/utils/get_graph_from_model.py` | Updated method signature; added `include_root` parameter; changed defaults for `added_nodes` and `added_edges`; streamlined logic for handling `DataPoint` instances; integrated functionality of `add_nodes_and_edges` directly; enhanced edge construction with metadata. |
| `cognee/modules/graph/utils/get_model_instance_from_graph.py` | Updated method signature; changed `edges` parameter type to a more generic list; simplified `node_map` construction; modified edge unpacking and retrieval of `edge_type`; streamlined model instance creation logic. |

Sequence Diagram(s)

sequenceDiagram
    participant DP as DataPoint
    participant GM as get_graph_from_model
    participant MI as get_model_instance_from_graph
    participant NM as NewModel

    DP->>GM: Pass DataPoint
    GM->>GM: Check include_root
    GM->>GM: Handle nodes and edges
    GM->>NM: Create NewModel instance
    NM-->>GM: Return instance
    GM-->>DP: Return graph

    DP->>MI: Pass nodes and edges
    MI->>MI: Construct node_map
    MI->>NM: Create NewModel instance
    NM-->>MI: Return instance
    MI-->>DP: Return model instance

Poem

In the graph where data hops,
A rabbit's joy never stops.
With nodes and edges, clear and bright,
We weave our tales in pure delight.
Hooray for changes, swift and neat,
In every line, our hearts do beat! 🐇✨

coderabbitai

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (2)

cognee/modules/graph/utils/get_graph_from_model.py (1)

21-32: Refactor duplicated code into a helper function

The code blocks from lines 21-32 and 52-63 are nearly identical. Refactoring this duplicated code into a helper function would reduce redundancy and enhance maintainability.

Consider extracting the duplicated logic into a separate function:

from datetime import datetime, timezone

def process_property_nodes(nodes, edges, property_nodes, property_edges, data_point, field_name, added_nodes, added_edges, metadata=None):
    # Collect property nodes that have not been added yet.
    for node in property_nodes:
        if str(node.id) not in added_nodes:
            nodes.append(node)
            added_nodes[str(node.id)] = True

    # Collect property edges, keyed by source id + target id + relationship name.
    for edge in property_edges:
        edge_key = str(edge[0]) + str(edge[1]) + edge[2]
        if str(edge_key) not in added_edges:
            edges.append(edge)
            added_edges[str(edge_key)] = True

    # Link the parent data point to each of its own property nodes.
    for property_node in get_own_properties(property_nodes, property_edges):
        edge_key = str(data_point.id) + str(property_node.id) + field_name
        if str(edge_key) not in added_edges:
            edge_data = {
                "source_node_id": data_point.id,
                "target_node_id": property_node.id,
                "relationship_name": field_name,
                "updated_at": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S"),
            }
            if metadata:
                edge_data["metadata"] = metadata
            edges.append((data_point.id, property_node.id, field_name, edge_data))
            added_edges[str(edge_key)] = True

Then, replace the duplicated code with calls to this helper function, passing in the nodes and edges lists that the caller is accumulating:

 # For single DataPoint instances
-# Original duplicated code
+process_property_nodes(nodes, edges, property_nodes, property_edges, data_point, field_name, added_nodes, added_edges)

 # For lists of DataPoint instances
-# Original duplicated code
+process_property_nodes(nodes, edges, property_nodes, property_edges, data_point, field_name, added_nodes, added_edges, metadata={"type": "list"})

Also applies to: 52-63

cognee/modules/graph/utils/get_model_instance_from_graph.py (1)

6-6: Specify the edge structure for clarity and type safety

The edges parameter is now typed as a generic list, reducing clarity and type safety. Defining a specific type for edges enhances readability and helps prevent errors.

Consider defining a type alias for edges:
from typing import List, Tuple, Dict, Any

Edge = Tuple[str, str, str, Dict[str, Any]]

def get_model_instance_from_graph(nodes: List[DataPoint], edges: List[Edge], entity_id: str):
    # rest of the code

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b0eb9af and 6841c83.

📒 Files selected for processing (2)

cognee/modules/graph/utils/get_graph_from_model.py (2 hunks)
cognee/modules/graph/utils/get_model_instance_from_graph.py (1 hunks)

coderabbitai · 2024-12-02T19:26:44Z

cognee/modules/graph/utils/get_graph_from_model.py

+
+                    if str(edge_key) not in added_edges:
+                        edges.append(edge)
+                        added_edges[edge_key] = True


⚠️ Potential issue

Ensure consistent key types in added_edges dictionary

In line 62, added_edges[edge_key] = True is missing the str() conversion used elsewhere. This inconsistency could lead to duplicate edges not being detected correctly due to mismatched key types.

Apply this diff to fix the inconsistency:

-                        added_edges[edge_key] = True
+                        added_edges[str(edge_key)] = True
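
A minimal sketch of the deduplication pattern this comment is about, assuming edges are `(source_id, target_id, relationship_name, properties)` tuples as in the snippets above. The names `edge_key` and `dedupe_edges` are hypothetical and not part of cognee; the point is only that lookups and writes should build the key in exactly the same way.

```python
from typing import Any, Dict, List, Tuple

# Assumed edge shape: (source_id, target_id, relationship_name, properties)
Edge = Tuple[Any, Any, str, Dict[str, Any]]


def edge_key(edge: Edge) -> str:
    """Build one canonical string key per edge (hypothetical helper)."""
    return str(edge[0]) + str(edge[1]) + edge[2]


def dedupe_edges(edges: List[Edge]) -> List[Edge]:
    """Keep the first occurrence of each (source, target, relationship) edge."""
    added_edges: Dict[str, bool] = {}
    unique: List[Edge] = []
    for edge in edges:
        key = edge_key(edge)
        # The same key is used for the membership test and for the write.
        if key not in added_edges:
            unique.append(edge)
            added_edges[key] = True
    return unique
```

Routing both the check and the assignment through one key-building function removes the chance of the two drifting apart. Note that plain string concatenation can collide for some id pairs; a tuple key such as `(edge[0], edge[1], edge[2])` would avoid that, at the cost of departing from the keys used in the PR.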

coderabbitai · 2024-12-02T19:26:45Z

cognee/modules/graph/utils/get_graph_from_model.py

-    if not added_edges:
-        added_edges = {}
-
+def get_graph_from_model(data_point: DataPoint, include_root = True, added_nodes = {}, added_edges = {}):


⚠️ Potential issue

Avoid mutable default arguments to prevent unintended behavior

Using mutable default arguments like {} for added_nodes and added_edges can lead to unexpected behavior because the default dictionaries are shared across all function calls. It's recommended to use None as the default value and initialize the dictionaries within the function.

Apply this diff to fix the issue:

-def get_graph_from_model(data_point: DataPoint, include_root = True, added_nodes = {}, added_edges = {}):
+def get_graph_from_model(data_point: DataPoint, include_root=True, added_nodes=None, added_edges=None):
+    if added_nodes is None:
+        added_nodes = {}
+    if added_edges is None:
+        added_edges = {}
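
The pitfall can be reproduced in isolation. The sketch below uses two hypothetical functions, `collect_shared` and `collect_fresh`, which are not taken from cognee; they only contrast a `{}` default with a `None` default.

```python
def collect_shared(item, seen={}):
    """Deliberately buggy: the default dict is created once and reused by every call."""
    seen[item] = True
    return seen


def collect_fresh(item, seen=None):
    """Safe variant: a new dict is created on each call unless the caller passes one."""
    if seen is None:
        seen = {}
    seen[item] = True
    return seen
```

Calling `collect_shared("a")` and then `collect_shared("b")` returns a dict that still contains `"a"`, because both calls wrote into the same default object. With `collect_fresh`, each call that omits `seen` starts from an empty dict. In a deduplicating function this matters: leftover keys from an earlier call would cause nodes or edges to be skipped as already added.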

coderabbitai · 2024-12-02T19:26:45Z

cognee/modules/graph/utils/get_model_instance_from_graph.py

+    for edge in edges:
+        source_node = node_map[edge[0]]
+        target_node = node_map[edge[1]]
+        edge_label = edge[2]
+        edge_properties = edge[3] if len(edge) == 4 else {}


⚠️ Potential issue

Add validation for edge tuple length to prevent index errors

Accessing elements of edge without ensuring it has the expected length can raise IndexError. Validate the length of edge before accessing its elements.

Apply this fix:

for edge in edges:
    if len(edge) < 3:
        # Handle error or skip invalid edge
        continue  # or raise an exception with a descriptive message
    source_node = node_map.get(edge[0])
    target_node = node_map.get(edge[1])
    if source_node is None or target_node is None:
        # Handle missing nodes
        continue  # or raise an exception
    edge_label = edge[2]
    edge_properties = edge[3] if len(edge) >= 4 else {}
    edge_metadata = edge_properties.get("metadata", {})
    # rest of the code
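
The same validation can be factored into a small generator so the main loop only sees well-formed edges. This is a sketch under assumptions: `iter_valid_edges` is a hypothetical name, `node_map` is taken to be a plain dict keyed by node id, and malformed edges are silently skipped, which may or may not be the right policy for cognee.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def iter_valid_edges(
    edges: Iterable[tuple], node_map: Dict[Any, Any]
) -> Iterator[Tuple[Any, Any, str, Dict[str, Any]]]:
    """Yield (source_node, target_node, label, properties) for usable edges only."""
    for edge in edges:
        if len(edge) < 3:
            continue  # too short to carry source, target, and label
        source_node = node_map.get(edge[0])
        target_node = node_map.get(edge[1])
        if source_node is None or target_node is None:
            continue  # edge refers to a node that is not in the map
        edge_properties = edge[3] if len(edge) >= 4 else {}
        yield source_node, target_node, edge[2], edge_properties
```

Skipping keeps the function total, but it can hide data problems; raising a `ValueError` with the offending edge in the message is the stricter alternative the comment above also mentions.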

coderabbitai · 2024-12-02T19:26:45Z

cognee/modules/graph/utils/get_model_instance_from_graph.py

-
-            node_map[source_node_id] = NewModel(**source_node_dict)
+            NewModel = copy_model(type(source_node), { edge_label: (list[type(target_node)], PydanticUndefined) })
+


⚠️ Potential issue

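Confirm that the list type syntax in copy_model is supported by the target Python version

In line 21, list[type(target_node)] subscripts the built-in list type, which works only on Python 3.9 and later; on older interpreters it raises a TypeError at runtime. If older versions must be supported, use List[type(target_node)] from the typing module.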

First, import List from typing:

+from typing import List

Then, correct line 21:

-            NewModel = copy_model(type(source_node), { edge_label: (list[type(target_node)], PydanticUndefined) })
+            NewModel = copy_model(type(source_node), { edge_label: (List[type(target_node)], PydanticUndefined) })

Committable suggestion skipped: line range outside the PR's diff.
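
For reference, on Python 3.9 and later the built-in `list[...]` and `typing.List[...]` resolve to the same origin and arguments, including when the element type is computed at runtime with `type(...)`. The helper `describe` below is hypothetical and uses only the standard library; it does not involve `copy_model` or Pydantic.

```python
from typing import List, get_args, get_origin


def describe(annotation):
    """Return the (origin, args) pair of a generic annotation."""
    return get_origin(annotation), get_args(annotation)


# Element type computed at runtime, as in the reviewed line.
element_type = type("example")  # str

builtin_form = describe(list[element_type])  # requires Python 3.9+
typing_form = describe(List[element_type])   # also works on older versions
```

Both forms describe a list of `str` here, so the choice between them is mainly a question of which Python versions the project has to support.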

fix: fixes cognify duplicated edges and resets the methods to an olde…

6841c83

…r version

coderabbitai bot reviewed Dec 2, 2024

Vasilije1990 self-requested a review December 2, 2024 19:51

Vasilije1990 approved these changes Dec 2, 2024

Vasilije1990 merged commit 42ab601 into main Dec 2, 2024
10 checks passed

Vasilije1990 deleted the main-cognify-fix branch December 2, 2024 19:53

This was referenced Dec 4, 2024

Creates edge embeddings collection #251

Merged

fix: refactor get_graph_from_model to return nodes and edges correctly #257

Merged

3 participants
