-
Notifications
You must be signed in to change notification settings - Fork 966
fix: fixes cognify duplicated edges and resets the methods to an olde… #242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe pull request introduces significant modifications to two functions: Changes
Sequence Diagram(s)sequenceDiagram
participant DP as DataPoint
participant GM as get_graph_from_model
participant MI as get_model_instance_from_graph
participant NM as NewModel
DP->>GM: Pass DataPoint
GM->>GM: Check include_root
GM->>GM: Handle nodes and edges
GM->>NM: Create NewModel instance
NM-->>GM: Return instance
GM-->>DP: Return graph
DP->>MI: Pass nodes and edges
MI->>MI: Construct node_map
MI->>NM: Create NewModel instance
NM-->>MI: Return instance
MI-->>DP: Return model instance
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (2)
cognee/modules/graph/utils/get_graph_from_model.py (1)
21-32: Refactor duplicated code into a helper functionThe code blocks from lines 21-32 and 52-63 are nearly identical. Refactoring this duplicated code into a helper function would reduce redundancy and enhance maintainability.
Consider extracting the duplicated logic into a separate function:
def process_property_nodes(property_nodes, property_edges, data_point, field_name, added_nodes, added_edges, metadata=None): for node in property_nodes: if str(node.id) not in added_nodes: nodes.append(node) added_nodes[str(node.id)] = True for edge in property_edges: edge_key = str(edge[0]) + str(edge[1]) + edge[2] if str(edge_key) not in added_edges: edges.append(edge) added_edges[str(edge_key)] = True for property_node in get_own_properties(property_nodes, property_edges): edge_key = str(data_point.id) + str(property_node.id) + field_name if str(edge_key) not in added_edges: edge_data = { "source_node_id": data_point.id, "target_node_id": property_node.id, "relationship_name": field_name, "updated_at": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S"), } if metadata: edge_data["metadata"] = metadata edges.append((data_point.id, property_node.id, field_name, edge_data)) added_edges[str(edge_key)] = TrueThen, replace the duplicated code with calls to this helper function:
# For single DataPoint instances -# Original duplicated code +process_property_nodes(property_nodes, property_edges, data_point, field_name, added_nodes, added_edges)# For lists of DataPoint instances -# Original duplicated code +process_property_nodes(property_nodes, property_edges, data_point, field_name, added_nodes, added_edges, metadata={"type": "list"})Also applies to: 52-63
cognee/modules/graph/utils/get_model_instance_from_graph.py (1)
6-6: Specify the edge structure for clarity and type safetyThe
edgesparameter is now typed as a genericlist, reducing clarity and type safety. Defining a specific type for edges enhances readability and helps prevent errors.Consider defining a type alias for edges:
from typing import List, Tuple, Dict, Any Edge = Tuple[str, str, str, Dict[str, Any]] def get_model_instance_from_graph(nodes: List[DataPoint], edges: List[Edge], entity_id: str): # rest of the code
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
cognee/modules/graph/utils/get_graph_from_model.py(2 hunks)cognee/modules/graph/utils/get_model_instance_from_graph.py(1 hunks)
|
|
||
| if str(edge_key) not in added_edges: | ||
| edges.append(edge) | ||
| added_edges[edge_key] = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure consistent key types in added_edges dictionary
In line 62, added_edges[edge_key] = True is missing the str() conversion used elsewhere. This inconsistency could lead to duplicate edges not being detected correctly due to mismatched key types.
Apply this diff to fix the inconsistency:
- added_edges[edge_key] = True
+ added_edges[str(edge_key)] = True📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| added_edges[edge_key] = True | |
| added_edges[str(edge_key)] = True |
| if not added_edges: | ||
| added_edges = {} | ||
|
|
||
| def get_graph_from_model(data_point: DataPoint, include_root = True, added_nodes = {}, added_edges = {}): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid mutable default arguments to prevent unintended behavior
Using mutable default arguments like {} for added_nodes and added_edges can lead to unexpected behavior because the default dictionaries are shared across all function calls. It's recommended to use None as the default value and initialize the dictionaries within the function.
Apply this diff to fix the issue:
-def get_graph_from_model(data_point: DataPoint, include_root = True, added_nodes = {}, added_edges = {}):
+def get_graph_from_model(data_point: DataPoint, include_root=True, added_nodes=None, added_edges=None):
+ if added_nodes is None:
+ added_nodes = {}
+ if added_edges is None:
+ added_edges = {}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def get_graph_from_model(data_point: DataPoint, include_root = True, added_nodes = {}, added_edges = {}): | |
| def get_graph_from_model(data_point: DataPoint, include_root=True, added_nodes=None, added_edges=None): | |
| if added_nodes is None: | |
| added_nodes = {} | |
| if added_edges is None: | |
| added_edges = {} |
| for edge in edges: | ||
| source_node = node_map[edge[0]] | ||
| target_node = node_map[edge[1]] | ||
| edge_label = edge[2] | ||
| edge_properties = edge[3] if len(edge) == 4 else {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add validation for edge tuple length to prevent index errors
Accessing elements of edge without ensuring it has the expected length can raise IndexError. Validate the length of edge before accessing its elements.
Apply this fix:
for edge in edges:
if len(edge) < 3:
# Handle error or skip invalid edge
continue # or raise an exception with a descriptive message
source_node = node_map.get(edge[0])
target_node = node_map.get(edge[1])
if source_node is None or target_node is None:
# Handle missing nodes
continue # or raise an exception
edge_label = edge[2]
edge_properties = edge[3] if len(edge) >= 4 else {}
edge_metadata = edge_properties.get("metadata", {})
# rest of the code|
|
||
| node_map[source_node_id] = NewModel(**source_node_dict) | ||
| NewModel = copy_model(type(source_node), { edge_label: (list[type(target_node)], PydanticUndefined) }) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct the syntax for specifying list type in copy_model
In line 21, list[type(target_node)] is invalid syntax. To specify a list of a type, use List[type(target_node)] from the typing module.
First, import List from typing:
+from typing import ListThen, correct line 21:
- NewModel = copy_model(type(source_node), { edge_label: (list[type(target_node)], PydanticUndefined) })
+ NewModel = copy_model(type(source_node), { edge_label: (List[type(target_node)], PydanticUndefined) })Committable suggestion skipped: line range outside the PR's diff.
…r version
Summary by CodeRabbit
New Features
DataPointinstances, including metadata support.Bug Fixes
Refactor