-
Notifications
You must be signed in to change notification settings - Fork 966
Test: test descriptive graph metric calculation in neo4j and networkx adapters [COG-1188] #500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…-tokens-to-metric-table
…add-num-tokens-to-metric-table
…og-1082-metrics-in-networkx-adapter
…g-1082-metrics-in-neo4j-adapter
…g-1082-metrics-in-neo4j-adapter
WalkthroughThis update removes the default inclusion of the descriptive metrics task from the cognify API. The Neo4j adapter now features new asynchronous methods to check, project, and drop graphs, and its metrics function has been enhanced. The GraphMetrics model has been updated to use database-side timestamp handling. Additionally, several asynchronous test modules have been added to verify metrics consistency between Neo4j and NetworkX, and a new GitHub Actions workflow has been introduced to run these tests. Changes
Sequence Diagram(s)sequenceDiagram
participant TestRunner
participant Neo4jAdapter
participant NetworkXEngine
TestRunner->>Neo4jAdapter: get_neo4j_metrics(include_optional=False)
Neo4jAdapter-->>TestRunner: Return graph metrics data
TestRunner->>NetworkXEngine: get_networkx_metrics(include_optional=False)
NetworkXEngine-->>TestRunner: Return graph metrics data
TestRunner->>TestRunner: Compare metrics and assert consistency
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Tip 🌐 Web search-backed reviews and chat
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms (17)
🔇 Additional comments (4)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (5)
cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (1)
540-573: Consider sanitizing node and relationship labels
Currently, labels are directly interpolated into the Cypher statement. If a label contains special characters (including quotes), it could cause a syntax error. Consider escaping or sanitizing the labels to avoid query breakage.cognee/tests/tasks/descriptive_metrics/metric_consistency_test.py (1)
1-14: Consider using a tolerance-based comparison for floating-point metrics
The assertion performs a direct equality check on the metrics. If any metric is floating-point, minimal numerical differences could cause test flakiness. A tolerance-based comparison may be more robust for floating values.cognee/tests/tasks/descriptive_metrics/metrics_test_utils.py (1)
1-26: Add a clarifying docstring for the self-loop design
This function deliberately creates a self-loop by addingdoc_chunkto its owncontainslist. Including a docstring or detailed comment can inform maintainers that this is intentional for testing self-loop detection.cognee/tests/tasks/descriptive_metrics/neo4j_metrics_test.py (2)
19-38: Add test cases for optional metrics.The test verifies basic metrics but doesn't test optional metrics like diameter, avg_shortest_path_length, and avg_clustering that are tested in the networkx version.
Add assertions for optional metrics to maintain consistency with networkx tests:
async def test_neo4j_metrics(): neo4j_metrics = await get_neo4j_metrics(include_optional=True) assert neo4j_metrics["num_nodes"] == 9, f"Expected 9 nodes, got {neo4j_metrics['num_nodes']}" assert neo4j_metrics["num_edges"] == 9, f"Expected 9 edges, got {neo4j_metrics['num_edges']}" assert neo4j_metrics["mean_degree"] == 2, ( f"Expected mean degree is 2, got {neo4j_metrics['mean_degree']}" ) assert neo4j_metrics["edge_density"] == 0.125, ( f"Expected edge density is 0.125, got {neo4j_metrics['edge_density']}" ) assert neo4j_metrics["num_connected_components"] == 2, ( f"Expected 2 connected components, got {neo4j_metrics['num_connected_components']}" ) assert neo4j_metrics["sizes_of_connected_components"] == [5, 4], ( f"Expected connected components of size [5, 4], got {neo4j_metrics['sizes_of_connected_components']}" ) assert neo4j_metrics["num_selfloops"] == 1, ( f"Expected 1 self-loop, got {neo4j_metrics['num_selfloops']}" ) + assert neo4j_metrics["diameter"] is None, ( + f"Diameter should be None for disconnected graphs, got {neo4j_metrics['diameter']}" + ) + assert neo4j_metrics["avg_shortest_path_length"] is None, ( + f"Average shortest path should be None for disconnected graphs, got {neo4j_metrics['avg_shortest_path_length']}" + ) + assert neo4j_metrics["avg_clustering"] == 0, ( + f"Expected 0 average clustering, got {neo4j_metrics['avg_clustering']}" + )
19-38: Consider grouping related assertions.The test assertions could be organized better by grouping related metrics together.
Consider reorganizing the test into logical groups:
async def test_neo4j_metrics(): neo4j_metrics = await get_neo4j_metrics(include_optional=True) + # Basic graph properties assert neo4j_metrics["num_nodes"] == 9, f"Expected 9 nodes, got {neo4j_metrics['num_nodes']}" assert neo4j_metrics["num_edges"] == 9, f"Expected 9 edges, got {neo4j_metrics['num_edges']}" assert neo4j_metrics["num_selfloops"] == 1, f"Expected 1 self-loop, got {neo4j_metrics['num_selfloops']}" + + # Connectivity metrics assert neo4j_metrics["mean_degree"] == 2, ( f"Expected mean degree is 2, got {neo4j_metrics['mean_degree']}" ) assert neo4j_metrics["edge_density"] == 0.125, ( f"Expected edge density is 0.125, got {neo4j_metrics['edge_density']}" ) + + # Component analysis assert neo4j_metrics["num_connected_components"] == 2, ( f"Expected 2 connected components, got {neo4j_metrics['num_connected_components']}" ) assert neo4j_metrics["sizes_of_connected_components"] == [5, 4], ( f"Expected connected components of size [5, 4], got {neo4j_metrics['sizes_of_connected_components']}" ) - assert neo4j_metrics["num_selfloops"] == 1, ( - f"Expected 1 self-loop, got {neo4j_metrics['num_selfloops']}" - )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
.github/workflows/test_networkx_descriptive_metrics.yml(1 hunks)cognee/api/v1/cognify/cognify_v2.py(0 hunks)cognee/infrastructure/databases/graph/neo4j_driver/adapter.py(1 hunks)cognee/modules/data/models/GraphMetrics.py(1 hunks)cognee/tests/tasks/descriptive_metrics/metric_consistency_test.py(1 hunks)cognee/tests/tasks/descriptive_metrics/metrics_test_utils.py(1 hunks)cognee/tests/tasks/descriptive_metrics/neo4j_metrics_test.py(1 hunks)cognee/tests/tasks/descriptive_metrics/networkx_metrics_test.py(1 hunks)
💤 Files with no reviewable changes (1)
- cognee/api/v1/cognify/cognify_v2.py
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: run_notebook_test / test
- GitHub Check: windows-latest
- GitHub Check: docker-compose-test
🔇 Additional comments (4)
cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (2)
534-539: Validate GDS plugin presence and handle potential errors
The function relies on the GDS plugin. If the plugin is missing or if the query fails, the code might raise an exception. Consider adding error handling or user feedback if the GDS plugin isn't available.
579-674: Validate non-empty query results before indexing
Accessingnodes[0]["nodes"]oredges[0]["elements"]without first checking ifnodesoredgesis empty can lead to anIndexError. Consider adding a guard clause to ensure results are not empty before indexing.cognee/modules/data/models/GraphMetrics.py (2)
27-28: LGTM! Good improvement in timestamp handling.Switching to database functions (
func.now()) for timestamp management is a better practice as it ensures consistency and reduces overhead.
13-14: Address the TODO comment about graph database ID.The TODO comment suggests that the ID column needs to be updated to reflect the unique ID of the graph database.
Would you like me to help implement a solution for this? I can suggest approaches to integrate the graph database's unique identifier with this model.
a3a0f5a to
9b0cb77
Compare
9b0cb77 to
92ae1d0
Compare
Description
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin
Summary by CodeRabbit