
Conversation


@alekszievr alekszievr commented Jan 29, 2025

Description

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin

Summary by CodeRabbit

  • New Features

    • Enabled an option to retrieve more detailed graph metrics, providing richer descriptive analytics of the stored graph.
  • Refactor

    • Standardized the way metrics are obtained across components for consistent behavior and improved data accuracy.
  • Chore

    • Made internal changes to support optional detailed metric calculations, so that expensive metrics are computed only when requested.


coderabbitai bot commented Jan 29, 2025

Walkthrough

The changes add an include_optional parameter to several methods and functions related to graph metrics retrieval. In the API and data modules, the parameter is now passed to task instantiation and metric storage functions. Meanwhile, multiple adapters for graph databases (GraphDBInterface, Neo4jAdapter, and NetworkXAdapter) have updated their get_graph_metrics signatures to accept the parameter—with the NetworkX adapter now performing detailed metric calculations. Additionally, an extra import is added for SQLAlchemy functionality.

Changes

Files | Change Summary

  • cognee/api/v1/.../cognify_v2.py
    cognee/modules/data/methods/store_descriptive_metrics.py
    Updated task and function signatures to include include_optional; store_descriptive_metrics now passes this flag to metric retrieval.
  • cognee/infrastructure/databases/graph/graph_db_interface.py
    cognee/infrastructure/databases/graph/neo4j_driver/adapter.py
    cognee/infrastructure/databases/graph/networkx/adapter.py
    Modified the get_graph_metrics methods to accept an include_optional parameter (default False for some adapters); the NetworkX adapter now computes detailed graph statistics when requested.
  • cognee/modules/data/models/GraphMetrics.py
    Added an import for func from sqlalchemy.sql.
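
As a rough orientation, the signature change described above might look like the following minimal sketch (names are taken from the table; the bodies are placeholders, not the project's actual implementations):

class GraphDBInterface:
    async def get_graph_metrics(self, include_optional: bool = False) -> dict:
        raise NotImplementedError


class NetworkXAdapter(GraphDBInterface):
    async def get_graph_metrics(self, include_optional: bool = False) -> dict:
        # Cheap metrics are always returned.
        metrics = {"num_nodes": 0, "num_edges": 0}
        if include_optional:
            # Expensive metrics (diameter, clustering, ...) are computed only on request.
            metrics.update({"diameter": None, "avg_clustering": None})
        return metrics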

Sequence Diagram(s)

sequenceDiagram
    participant C as Client Call
    participant S as store_descriptive_metrics
    participant E as Graph Engine
    participant A as Graph Adapter

    C->>S: Call store_descriptive_metrics(data_points, include_optional)
    S->>E: get_graph_metrics(include_optional)
    E->>A: get_graph_metrics(include_optional)
    A-->>E: Return computed metrics
    E-->>S: Return metrics data
    S-->>C: Process and store metrics

Possibly related PRs

Suggested reviewers

  • borisarzentar
  • lxobr

Poem

I'm a rabbit in the code, full of glee,
Hop by hop, I set parameters free.
Optional flags now lead my way,
In metrics fields, I leap and play.
With a twitch of whiskers and a soft "thump,"
Every enhancement makes my heart jump!
🥕 Hop on and celebrate, code chumps!


@alekszievr alekszievr requested a review from lxobr January 29, 2025 20:04
@alekszievr alekszievr self-assigned this Jan 29, 2025
@alekszievr alekszievr changed the title from "Calculate descriptive metrics in networkx adapter" to "Calculate descriptive metrics in networkx adapter [COG-1082]" on Jan 29, 2025
@borisarzentar borisarzentar changed the title from "Calculate descriptive metrics in networkx adapter [COG-1082]" to "feat: Calculate graph metrics for networkx graph [COG-1082]" on Jan 31, 2025
@alekszievr alekszievr force-pushed the feat/cog-1082-metrics-in-networkx-adapter branch from 788a4f9 to f064f52 on February 3, 2025 10:48
@alekszievr alekszievr force-pushed the feat/cog-1082-metrics-in-graphdb-interface branch from 05138fa to 268d778 on February 3, 2025 13:15
Base automatically changed from feat/cog-1082-metrics-in-graphdb-interface to dev on February 3, 2025 14:25
@alekszievr alekszievr force-pushed the feat/cog-1082-metrics-in-networkx-adapter branch 2 times, most recently from e89c9b9 to 27feae8 on February 3, 2025 14:37
@alekszievr alekszievr force-pushed the feat/cog-1082-metrics-in-networkx-adapter branch from 27feae8 to af8e798 on February 3, 2025 14:46

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🔭 Outside diff range comments (1)
cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (1)

534-546: Implement actual graph metrics calculation.

The method currently returns hardcoded placeholder values and doesn't utilize the include_optional parameter. Consider implementing actual metrics calculation using Neo4j's graph algorithms.

Here's a suggested implementation:

     async def get_graph_metrics(self, include_optional=False):
+        """Get graph metrics from Neo4j.
+
+        Args:
+            include_optional (bool, optional): Whether to include computationally expensive metrics. Defaults to False.
+
+        Returns:
+            dict: A dictionary containing graph metrics.
+        """
+        # Basic metrics query
+        basic_metrics_query = """
+        MATCH (n)
+        OPTIONAL MATCH (n)-[r]-()
+        WITH COUNT(DISTINCT n) as nodes, COUNT(r) as edges
+        RETURN nodes as num_nodes, 
+               edges as num_edges,
+               CASE WHEN nodes > 0 THEN toFloat(edges)/nodes ELSE 0 END as mean_degree,
+               CASE WHEN nodes > 1 THEN toFloat(edges)/(nodes * (nodes-1)) ELSE 0 END as edge_density
+        """
+        basic_results = await self.query(basic_metrics_query)
+        metrics = basic_results[0] if basic_results else {}
+        
+        if include_optional:
+            # Additional metrics for connected components
+            components_query = """
+            CALL gds.wcc.stream('graph')
+            YIELD componentId
+            WITH componentId, count(*) as size
+            RETURN count(distinct componentId) as num_components,
+                   collect(size) as component_sizes
+            """
+            components_results = await self.query(components_query)
+            if components_results:
+                metrics.update({
+                    "num_connected_components": components_results[0]["num_components"],
+                    "sizes_of_connected_components": components_results[0]["component_sizes"]
+                })
+            
+            # Additional metrics for clustering and path lengths
+            advanced_metrics_query = """
+            CALL gds.graph.project('temp_graph', '*', '*')
+            CALL gds.alpha.clustering.average.stream('temp_graph')
+            YIELD averageClusteringCoefficient
+            CALL gds.shortestPath.dijkstra.stream('temp_graph')
+            YIELD pathLength
+            RETURN averageClusteringCoefficient as avg_clustering,
+                   max(pathLength) as diameter,
+                   avg(pathLength) as avg_shortest_path_length
+            """
+            advanced_results = await self.query(advanced_metrics_query)
+            if advanced_results:
+                metrics.update(advanced_results[0])
+        
+        return metrics
🧹 Nitpick comments (6)
cognee/infrastructure/databases/graph/networkx/adapter.py (4)

390-392: Consider providing a docstring.
Adding a short docstring clarifying the purpose of this method (and the effect of include_optional) would improve readability and serve as clear documentation for future maintainers.


404-415: Clarify handling of disconnected graphs when computing diameter and shortest paths.
Returning None for diameter and average shortest path length in the case of non-strongly-connected graphs might be confusing for downstream consumers. Consider making this behavior more explicit in documentation, or providing partial coverage (e.g., largest strongly connected component) if that better fits the use case.
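
For reference, the largest-component option could be sketched roughly like this (the helper name is illustrative and not part of the adapter):

import networkx as nx


def _diameter_of_largest_scc(graph: nx.DiGraph):
    # Fall back to the largest strongly connected component rather than returning None
    # for graphs that are not strongly connected.
    if graph.number_of_nodes() == 0:
        return None
    if nx.is_strongly_connected(graph):
        return nx.diameter(graph)
    largest = max(nx.strongly_connected_components(graph), key=len)
    if len(largest) < 2:
        return None  # no path structure worth measuring
    return nx.diameter(graph.subgraph(largest))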


423-432: Performance consideration for large graph metrics.
Metrics like connected components, diameter, or average shortest path might be expensive for very large graphs. If performance is a concern, consider providing an alternate asynchronous or batched approach, or a mechanism to skip certain metrics.
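
One possible shape for such an approach, sketched with the standard-library asyncio.to_thread and assuming a plain nx.DiGraph (the helper and the metric selection are illustrative):

import asyncio

import networkx as nx


async def _expensive_metrics(graph: nx.DiGraph) -> dict:
    # Run the CPU-heavy NetworkX calls in a worker thread so other coroutines can
    # still be scheduled while they execute.
    num_components = await asyncio.to_thread(nx.number_weakly_connected_components, graph)
    avg_clustering = await asyncio.to_thread(nx.average_clustering, graph.to_undirected())
    return {
        "num_connected_components": num_components,
        "avg_clustering": avg_clustering,
    }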


434-449: Use consistent placeholders for unsupported metrics.
Currently, optional metrics are either calculated or returned as -1, but if the graph isn't strongly connected, they become None. For consistency, consider returning None in all “not applicable” scenarios or using -1 consistently, depending on the needs of your application logic.
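
If a single sentinel is preferred, a tiny normalization helper would be enough (purely illustrative):

def _normalize_optional_metric(value, sentinel=None):
    # Map both "not computed" conventions (-1 and None) onto one sentinel so
    # downstream consumers only ever see a single "not applicable" value.
    return sentinel if value in (-1, None) else value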

cognee/infrastructure/databases/graph/graph_db_interface.py (1)

59-59: Provide a default value for the new parameter.
To maintain backward compatibility and keep method calls simpler, consider defaulting include_optional=False in the interface declaration. This aligns with the adapters’ updated signatures.

-    async def get_graph_metrics(self, include_optional):
+    async def get_graph_metrics(self, include_optional=False):
     raise NotImplementedError
cognee/modules/data/methods/store_descriptive_metrics.py (1)

26-26: Add type hint for the include_optional parameter.

The function signature should include a type hint for better code maintainability and IDE support.

-async def store_descriptive_metrics(data_points: list[DataPoint], include_optional: bool):
+async def store_descriptive_metrics(data_points: list[DataPoint], include_optional: bool = False) -> list[DataPoint]:
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5119992 and af8e798.

📒 Files selected for processing (6)
  • cognee/api/v1/cognify/cognify_v2.py (1 hunks)
  • cognee/infrastructure/databases/graph/graph_db_interface.py (1 hunks)
  • cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (1 hunks)
  • cognee/infrastructure/databases/graph/networkx/adapter.py (2 hunks)
  • cognee/modules/data/methods/store_descriptive_metrics.py (1 hunks)
  • cognee/modules/data/models/GraphMetrics.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: windows-latest
  • GitHub Check: test
  • GitHub Check: docker-compose-test
🔇 Additional comments (4)
cognee/infrastructure/databases/graph/networkx/adapter.py (1)

17-17: Import of numpy is appropriate.
Importing numpy here makes sense given the new graph metric calculations relying on array-like operations.

cognee/modules/data/models/GraphMetrics.py (2)

2-2: Importing func is valid.
Using from sqlalchemy.sql import func is consistent with your switch to database-side defaults.


27-28: Retaining server-side timestamps is fine.
Switching from Python-based timestamps to func.now() ensures that your database handles date/time generation. This can help maintain consistency and accuracy in multi-instance deployments.
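
For context, a server-side timestamp column in SQLAlchemy typically looks like the sketch below (the model and column names here are illustrative, not necessarily those in GraphMetrics.py):

from sqlalchemy import Column, DateTime, Integer
from sqlalchemy.orm import declarative_base
from sqlalchemy.sql import func

Base = declarative_base()


class ExampleMetrics(Base):
    __tablename__ = "example_metrics"

    id = Column(Integer, primary_key=True)
    # func.now() is rendered as a database-side DEFAULT, so the timestamp is generated
    # by the database rather than by whichever application instance issues the INSERT.
    created_at = Column(DateTime(timezone=True), server_default=func.now())
    updated_at = Column(DateTime(timezone=True), server_default=func.now(), onupdate=func.now())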

cognee/api/v1/cognify/cognify_v2.py (1)

168-168: LGTM!

The Task instantiation correctly includes the new parameter with a sensible default value.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🔭 Outside diff range comments (1)
cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (1)

534-546: Missing integration with Neo4j GDS & unused parameter

The current implementation of get_graph_metrics returns hardcoded values and does not utilize the include_optional parameter. Moreover, our search for neo4j-graph-data-science dependencies and any GDS procedure calls in the repository returned no results—indicating that the required Neo4j Graph Data Science library isn’t configured. This discrepancy means that if you proceed with the suggested GDS-based metric calculations, you’ll first need to add and properly configure the Neo4j GDS dependency.

  • Ensure the repository includes the required Neo4j GDS (neo4j-graph-data-science) dependency and configuration.
  • Either remove or start using the include_optional parameter.
  • Update get_graph_metrics to compute real metrics—using either the configured Neo4j GDS procedures or an alternative approach that aligns with your dependencies.
🔗 Analysis chain

Implement actual metric calculations using Neo4j's graph algorithms.

The method currently returns hardcoded placeholder values (-1) for all metrics. Consider implementing the actual calculations using Neo4j's Graph Data Science (GDS) library, which provides efficient algorithms for computing these metrics.

Here's a suggested implementation:

async def get_graph_metrics(self, include_optional=False):
-    return {
-        "num_nodes": -1,
-        "num_edges": -1,
-        "mean_degree": -1,
-        "edge_density": -1,
-        "num_connected_components": -1,
-        "sizes_of_connected_components": -1,
-        "num_selfloops": -1,
-        "diameter": -1,
-        "avg_shortest_path_length": -1,
-        "avg_clustering": -1,
-    }
+    # Basic metrics
+    basic_metrics_query = """
+    MATCH (n)
+    OPTIONAL MATCH (n)-[r]-()
+    WITH COUNT(DISTINCT n) as nodes, COUNT(r) as edges
+    RETURN nodes as num_nodes,
+           edges as num_edges,
+           toFloat(edges)/nodes as mean_degree,
+           toFloat(edges)/(nodes * (nodes-1)) as edge_density
+    """
+    basic_metrics = (await self.query(basic_metrics_query))[0]
+
+    # Component metrics using GDS
+    component_metrics_query = """
+    CALL gds.wcc.stats.estimate('*')
+    YIELD componentCount, componentDistribution
+    RETURN componentCount as num_connected_components,
+           componentDistribution as sizes_of_connected_components
+    """
+    component_metrics = (await self.query(component_metrics_query))[0]
+
+    # Self-loops
+    selfloops_query = """
+    MATCH (n)-[r]->(n)
+    RETURN COUNT(r) as num_selfloops
+    """
+    selfloops = (await self.query(selfloops_query))[0]
+
+    metrics = {
+        **basic_metrics,
+        **component_metrics,
+        **selfloops
+    }
+
+    if include_optional:
+        # Optional metrics using GDS
+        optional_metrics_query = """
+        CALL gds.alpha.allShortestPaths.stream('*')
+        YIELD sourceNodeId, targetNodeId, distance
+        WITH COLLECT(distance) as paths
+        RETURN 
+            MAX(paths) as diameter,
+            AVG(paths) as avg_shortest_path_length
+        """
+        optional_metrics = (await self.query(optional_metrics_query))[0]
+
+        clustering_query = """
+        CALL gds.localClusteringCoefficient.stream('*')
+        YIELD nodeId, localClusteringCoefficient
+        RETURN AVG(localClusteringCoefficient) as avg_clustering
+        """
+        clustering = (await self.query(clustering_query))[0]
+
+        metrics.update({
+            **optional_metrics,
+            **clustering
+        })
+
+    return metrics

Note: Ensure that the Neo4j Graph Data Science library is installed and properly configured in your Neo4j instance.

Let's verify the Neo4j GDS library installation and configuration:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if Neo4j GDS library is installed and configured
# Expected: Find references to Neo4j GDS library in configuration or dependency files

# Search for GDS library in project dependencies
rg -l "neo4j-graph-data-science|gds" --type yaml --type toml --type json

# Search for GDS procedure calls in Neo4j queries
rg "CALL\s+gds\." --type python

Length of output: 139


Script:

#!/bin/bash
# Re-run searches for Neo4j GDS library references using explicit glob patterns

# 1. Search for Neo4j GDS dependency references in common dependency configuration files
echo "Searching for 'neo4j-graph-data-science' in dependency files:"
rg "neo4j-graph-data-science" --glob "**/{requirements.txt,setup.py,pyproject.toml,environment.yml}" || echo "No dependency reference found."

# 2. Search for GDS procedure calls in Python files by explicitly targeting .py files
echo "Searching for 'CALL gds.' in Python files:"
rg "CALL\s+gds\." --glob "*.py" || echo "No GDS procedure calls found."

Length of output: 514

🧹 Nitpick comments (3)
cognee/infrastructure/databases/graph/graph_db_interface.py (1)

59-59: Add type hints and default value for include_optional parameter.

The parameter should be typed and have a default value for better usability.

-    async def get_graph_metrics(self, include_optional):
+    async def get_graph_metrics(self, include_optional: bool = False):
cognee/modules/data/methods/store_descriptive_metrics.py (1)

26-26: Add type hints for include_optional parameter.

Add type hints to improve code maintainability and IDE support.

-async def store_descriptive_metrics(data_points: list[DataPoint], include_optional: bool):
+async def store_descriptive_metrics(data_points: list[DataPoint], include_optional: bool = False) -> list[DataPoint]:
cognee/infrastructure/databases/graph/networkx/adapter.py (1)

393-396: Add docstring and error handling for mean degree calculation.

The helper function should have a docstring and handle empty graphs consistently.

 def _get_mean_degree(graph):
+    """Calculate the mean degree of the graph.
+    
+    Args:
+        graph: A NetworkX graph object
+    
+    Returns:
+        float: Mean degree of the graph, or 0 for empty graphs
+    """
     degrees = [d for _, d in graph.degree()]
     return np.mean(degrees) if degrees else 0
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5119992 and 8e4b2bc.

📒 Files selected for processing (6)
  • cognee/api/v1/cognify/cognify_v2.py (1 hunks)
  • cognee/infrastructure/databases/graph/graph_db_interface.py (1 hunks)
  • cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (1 hunks)
  • cognee/infrastructure/databases/graph/networkx/adapter.py (2 hunks)
  • cognee/modules/data/methods/store_descriptive_metrics.py (1 hunks)
  • cognee/modules/data/models/GraphMetrics.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • cognee/modules/data/models/GraphMetrics.py
⏰ Context from checks skipped due to timeout of 90000ms (18)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: windows-latest
  • GitHub Check: profiler
  • GitHub Check: docker-compose-test
🔇 Additional comments (3)
cognee/modules/data/methods/store_descriptive_metrics.py (1)

29-29: Add error handling for graph metrics retrieval.

The function should handle potential errors when retrieving graph metrics to ensure graceful failure.
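
A minimal sketch of that kind of graceful failure, assuming the engine object is obtained as in the sequence diagram above (the exact signature and the persistence step are simplified):

import logging

logger = logging.getLogger(__name__)


async def store_descriptive_metrics(data_points, graph_engine, include_optional: bool = False):
    try:
        metrics = await graph_engine.get_graph_metrics(include_optional)
    except Exception:
        logger.exception("Graph metrics retrieval failed; continuing without metrics")
        metrics = None
    # ...persist data_points (and metrics, when available) as in the existing code...
    return data_points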

cognee/api/v1/cognify/cognify_v2.py (1)

168-168: Verify performance impact of always including optional metrics.

Setting include_optional=True by default means optional metrics (diameter, clustering, etc.) will always be calculated, which could impact performance for large graphs.

Run the following script to check the graph size and estimate the impact:

cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (1)

534-534: Verify implementation consistency across adapters.

The include_optional parameter has been added to align with other adapters, but its implementation might differ. Please ensure consistent behavior across all graph database adapters.

Let's verify the implementation in other adapters:

✅ Verification successful

I'll run additional shell scripts to search for both synchronous and asynchronous implementations of get_graph_metrics (and to see how the include_optional parameter is used) across the codebase.


Graph Metrics Adapter Consistency Verified
Both the networkx and neo4j adapters now define an asynchronous get_graph_metrics method that accepts the include_optional parameter (defaulting to False), which aligns with the abstract definition in graph_db_interface.py. While the two adapters differ in the format of the returned metrics (a tuple in networkx vs. a dictionary in neo4j), the handling of the include_optional parameter is consistent across implementations.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check implementation of get_graph_metrics across adapters
# Expected: Find all implementations of get_graph_metrics method

# Search for get_graph_metrics implementations
ast-grep --pattern 'def get_graph_metrics(self, $_)'

Length of output: 54


Script:

#!/bin/bash
echo "Searching for synchronous definitions of get_graph_metrics..."
rg -n "def get_graph_metrics" -C 2 .

echo "Searching for asynchronous definitions of get_graph_metrics..."
rg -n "async def get_graph_metrics" -C 2 .

echo "Searching for get_graph_metrics implementations that mention include_optional..."
rg -n "get_graph_metrics.*include_optional" -C 2 .

Length of output: 5053


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
cognee/infrastructure/databases/graph/networkx/adapter.py (2)

393-396: Consider handling empty graphs in mean degree calculation.

The current implementation returns 0 for empty graphs, which might be misleading. Consider returning None instead to distinguish between graphs with actual zero mean degree and empty graphs.

 def _get_mean_degree(graph):
     degrees = [d for _, d in graph.degree()]
-    return np.mean(degrees) if degrees else 0
+    return np.mean(degrees) if degrees else None

410-415: Improve average shortest path length calculation.

Similar to the diameter calculation, consider calculating the average shortest path length for the largest component in disconnected graphs.

 def _get_avg_shortest_path_length(graph):
     if nx.is_strongly_connected(graph):
         return nx.average_shortest_path_length(graph)
+    elif graph.number_of_nodes() > 0:
+        largest = max(nx.weakly_connected_components(graph), key=len)
+        subgraph = graph.subgraph(largest)
+        return nx.average_shortest_path_length(subgraph)
     else:
         return None
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e4b2bc and 58259b0.

📒 Files selected for processing (1)
  • cognee/infrastructure/databases/graph/networkx/adapter.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: windows-latest
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: docker-compose-test
🔇 Additional comments (5)
cognee/infrastructure/databases/graph/networkx/adapter.py (5)

17-17: LGTM!

The numpy import is appropriate for statistical calculations used in the graph metrics.


404-409: Improve diameter calculation for disconnected graphs.

The current implementation returns None for disconnected graphs. Consider calculating the diameter of the largest component instead.


423-432: LGTM!

The mandatory metrics calculation is comprehensive and includes essential graph properties.


434-449: Add error handling for optional metrics calculation.

The current implementation might fail if any of the optional metric calculations raise an exception.
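
One lightweight pattern for that is sketched below; the wrapper is illustrative and not part of the adapter:

import logging

logger = logging.getLogger(__name__)


def _safe_metric(metric_fn, *args, default=None):
    # Run a single optional-metric calculation and fall back to a default on failure,
    # so one failing metric does not abort the whole get_graph_metrics call.
    try:
        return metric_fn(*args)
    except Exception:
        logger.exception("Optional graph metric %s failed", getattr(metric_fn, "__name__", metric_fn))
        return default

# Usage (illustrative): metrics["diameter"] = _safe_metric(nx.diameter, graph)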


416-422: 🛠️ Refactor suggestion

Consider using undirected graph for clustering coefficient.

The current implementation converts to a directed graph (DiGraph), but clustering coefficient is typically calculated on undirected graphs.

 def _get_avg_clustering(graph):
     try:
-        return nx.average_clustering(nx.DiGraph(graph))
+        return nx.average_clustering(graph.to_undirected())
     except Exception as e:
         logger.warning("Failed to calculate clustering coefficient: %s", e)
         return None

Likely invalid or redundant comment.

@alekszievr alekszievr merged commit 2858a67 into dev Feb 3, 2025
27 checks passed
@alekszievr alekszievr deleted the feat/cog-1082-metrics-in-networkx-adapter branch February 3, 2025 17:05