
Conversation

@alekszievr
Contributor

@alekszievr commented Feb 5, 2025

Description

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin

Summary by CodeRabbit

  • Refactor
    • Updated the default processing flow by removing a descriptive metrics task.
  • New Features
    • Introduced asynchronous graph management capabilities including checks, projection, and deletion.
    • Enhanced graph metrics extraction with additional analytics.
  • Chores
    • Improved timestamp handling using database-driven defaults.
  • Tests
    • Added tests to verify graph metrics consistency and accuracy.
    • Integrated a new CI workflow for automated testing of graph metrics.

alekszievr and others added 30 commits January 28, 2025 12:11
@coderabbitai
Contributor

coderabbitai bot commented Feb 5, 2025

Walkthrough

This update removes the default inclusion of the descriptive metrics task from the cognify API. The Neo4j adapter now features new asynchronous methods to check, project, and drop graphs, and its metrics function has been enhanced. The GraphMetrics model has been updated to use database-side timestamp handling. Additionally, several asynchronous test modules have been added to verify metrics consistency between Neo4j and NetworkX, and a new GitHub Actions workflow has been introduced to run these tests.

Changes

File(s) — Change Summary

  • cognee/api/v1/.../cognify_v2.py — Removed the task Task(store_descriptive_metrics, include_optional=True) from the get_default_tasks function.
  • cognee/infrastructure/.../adapter.py — Added async methods graph_exists, project_entire_graph, and drop_graph; updated get_graph_metrics to compute metrics (node count, edge count, mean degree, etc.), logging metrics that are not yet implemented. A hedged sketch of these methods follows this table.
  • cognee/modules/data/models/GraphMetrics.py — Updated the created_at and updated_at columns to use server_default=func.now() and onupdate=func.now() respectively, removing the reliance on Python lambda functions.
  • cognee/tests/tasks/descriptive_metrics/*.py — Added several async test functions: a consistency check between Neo4j and NetworkX metrics, a disconnected test-graph creator, and individual tests (neo4j_metrics_test.py and networkx_metrics_test.py) validating graph metrics.
  • .github/workflows/test_descriptive_graph_metrics.yml — Introduced a new GitHub Actions workflow that triggers descriptive graph metrics tests through a reusable workflow, with concurrency controls and secrets management for secure resource access.
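
For orientation, here is a minimal sketch of what the three new adapter methods could look like, assuming the official neo4j async Python driver and the GDS Cypher procedures (gds.graph.exists, gds.graph.project, gds.graph.drop); the class name, session handling, and default graph name are illustrative assumptions, not the PR's actual implementation:

from neo4j import AsyncGraphDatabase


class Neo4jAdapterSketch:
    def __init__(self, uri: str, auth: tuple):
        self.driver = AsyncGraphDatabase.driver(uri, auth=auth)

    async def graph_exists(self, graph_name: str = "entire_graph") -> bool:
        # gds.graph.exists yields a boolean `exists` column for the named projection.
        async with self.driver.session() as session:
            result = await session.run(
                "CALL gds.graph.exists($name) YIELD exists", name=graph_name
            )
            record = await result.single()
            return bool(record and record["exists"])

    async def project_entire_graph(self, graph_name: str = "entire_graph") -> None:
        # Project every node and relationship into an in-memory GDS graph.
        async with self.driver.session() as session:
            await session.run("CALL gds.graph.project($name, '*', '*')", name=graph_name)

    async def drop_graph(self, graph_name: str = "entire_graph") -> None:
        # Release the in-memory projection once metrics have been computed.
        async with self.driver.session() as session:
            await session.run("CALL gds.graph.drop($name)", name=graph_name)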

Sequence Diagram(s)

sequenceDiagram
    participant TestRunner
    participant Neo4jAdapter
    participant NetworkXEngine

    TestRunner->>Neo4jAdapter: get_neo4j_metrics(include_optional=False)
    Neo4jAdapter-->>TestRunner: Return graph metrics data
    TestRunner->>NetworkXEngine: get_networkx_metrics(include_optional=False)
    NetworkXEngine-->>TestRunner: Return graph metrics data
    TestRunner->>TestRunner: Compare metrics and assert consistency

Possibly related PRs

Suggested labels

run-checks, do not merge

Suggested reviewers

  • borisarzentar
  • lxobr

Poem

Hi, I'm a little rabbit on the run,
Hopping through changes, oh what fun!
Graphs and metrics now dance in time,
With tests and code that rhyme sublime.
ASCII hops and code so neat –
Celebrating each small, clever beat!
🐰✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a3a0f5a and eddfef0.

📒 Files selected for processing (1)
  • .github/workflows/test_descriptive_graph_metrics.yml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (17)
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_networkx_metrics_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: windows-latest
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: docker-compose-test
🔇 Additional comments (4)
.github/workflows/test_descriptive_graph_metrics.yml (4)

1-2: Workflow Name Declaration is Clear and Descriptive
The workflow name “test | descriptive graph metrics” concisely communicates the purpose of the test.


3-7: Workflow Trigger Configuration is Appropriate
The workflow is configured to trigger on both workflow_dispatch and specific pull_request events (i.e., on labeled and synchronize types). This approach allows for manual invocations as well as automated testing when PRs are updated. If additional trigger events (such as opened) are desired, consider adding them; otherwise, this configuration is suitable for its intended purpose.


9-12: Concurrency Setup is Well-Implemented
The use of the concurrency group based on ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }} is a smart choice to avoid duplicate runs on concurrent events. Just verify that this expression handles both PR and non-PR contexts as expected.


13-29: Job Definition and Secrets Management are Correct
The job run_networkx_metrics_test leverages a reusable workflow (./.github/workflows/reusable_python_example.yml) and specifies its test script location via the example-location parameter. The mapping of numerous secrets (for LLM, embedding models, and Graphistry credentials) is done securely and according to best practices. Ensure that the referenced reusable workflow exists and that the secret names match the repository’s configuration.



@coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (5)
cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (1)

540-573: Consider sanitizing node and relationship labels
Currently, labels are directly interpolated into the Cypher statement. If a label contains special characters (including quotes), it could cause a syntax error. Consider escaping or sanitizing the labels to avoid query breakage.

cognee/tests/tasks/descriptive_metrics/metric_consistency_test.py (1)

1-14: Consider using a tolerance-based comparison for floating-point metrics
The assertion performs a direct equality check on the metrics. If any metric is floating-point, minimal numerical differences could cause test flakiness. A tolerance-based comparison may be more robust for floating values.

cognee/tests/tasks/descriptive_metrics/metrics_test_utils.py (1)

1-26: Add a clarifying docstring for the self-loop design
This function deliberately creates a self-loop by adding doc_chunk to its own contains list. Including a docstring or detailed comment can inform maintainers that this is intentional for testing self-loop detection.

cognee/tests/tasks/descriptive_metrics/neo4j_metrics_test.py (2)

19-38: Add test cases for optional metrics.

The test verifies basic metrics but doesn't test optional metrics like diameter, avg_shortest_path_length, and avg_clustering that are tested in the networkx version.

Add assertions for optional metrics to maintain consistency with networkx tests:

 async def test_neo4j_metrics():
     neo4j_metrics = await get_neo4j_metrics(include_optional=True)
     assert neo4j_metrics["num_nodes"] == 9, f"Expected 9 nodes, got {neo4j_metrics['num_nodes']}"
     assert neo4j_metrics["num_edges"] == 9, f"Expected 9 edges, got {neo4j_metrics['num_edges']}"
     assert neo4j_metrics["mean_degree"] == 2, (
         f"Expected mean degree is 2, got {neo4j_metrics['mean_degree']}"
     )
     assert neo4j_metrics["edge_density"] == 0.125, (
         f"Expected edge density is 0.125, got {neo4j_metrics['edge_density']}"
     )
     assert neo4j_metrics["num_connected_components"] == 2, (
         f"Expected 2 connected components, got {neo4j_metrics['num_connected_components']}"
     )
     assert neo4j_metrics["sizes_of_connected_components"] == [5, 4], (
         f"Expected connected components of size [5, 4], got {neo4j_metrics['sizes_of_connected_components']}"
     )
     assert neo4j_metrics["num_selfloops"] == 1, (
         f"Expected 1 self-loop, got {neo4j_metrics['num_selfloops']}"
     )
+    assert neo4j_metrics["diameter"] is None, (
+        f"Diameter should be None for disconnected graphs, got {neo4j_metrics['diameter']}"
+    )
+    assert neo4j_metrics["avg_shortest_path_length"] is None, (
+        f"Average shortest path should be None for disconnected graphs, got {neo4j_metrics['avg_shortest_path_length']}"
+    )
+    assert neo4j_metrics["avg_clustering"] == 0, (
+        f"Expected 0 average clustering, got {neo4j_metrics['avg_clustering']}"
+    )

19-38: Consider grouping related assertions.

The test assertions could be organized better by grouping related metrics together.

Consider reorganizing the test into logical groups:

 async def test_neo4j_metrics():
     neo4j_metrics = await get_neo4j_metrics(include_optional=True)
+    # Basic graph properties
     assert neo4j_metrics["num_nodes"] == 9, f"Expected 9 nodes, got {neo4j_metrics['num_nodes']}"
     assert neo4j_metrics["num_edges"] == 9, f"Expected 9 edges, got {neo4j_metrics['num_edges']}"
     assert neo4j_metrics["num_selfloops"] == 1, f"Expected 1 self-loop, got {neo4j_metrics['num_selfloops']}"
+
+    # Connectivity metrics
     assert neo4j_metrics["mean_degree"] == 2, (
         f"Expected mean degree is 2, got {neo4j_metrics['mean_degree']}"
     )
     assert neo4j_metrics["edge_density"] == 0.125, (
         f"Expected edge density is 0.125, got {neo4j_metrics['edge_density']}"
     )
+
+    # Component analysis
     assert neo4j_metrics["num_connected_components"] == 2, (
         f"Expected 2 connected components, got {neo4j_metrics['num_connected_components']}"
     )
     assert neo4j_metrics["sizes_of_connected_components"] == [5, 4], (
         f"Expected connected components of size [5, 4], got {neo4j_metrics['sizes_of_connected_components']}"
     )
-    assert neo4j_metrics["num_selfloops"] == 1, (
-        f"Expected 1 self-loop, got {neo4j_metrics['num_selfloops']}"
-    )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between df163b0 and a3a0f5a.

📒 Files selected for processing (8)
  • .github/workflows/test_networkx_descriptive_metrics.yml (1 hunks)
  • cognee/api/v1/cognify/cognify_v2.py (0 hunks)
  • cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (1 hunks)
  • cognee/modules/data/models/GraphMetrics.py (1 hunks)
  • cognee/tests/tasks/descriptive_metrics/metric_consistency_test.py (1 hunks)
  • cognee/tests/tasks/descriptive_metrics/metrics_test_utils.py (1 hunks)
  • cognee/tests/tasks/descriptive_metrics/neo4j_metrics_test.py (1 hunks)
  • cognee/tests/tasks/descriptive_metrics/networkx_metrics_test.py (1 hunks)
💤 Files with no reviewable changes (1)
  • cognee/api/v1/cognify/cognify_v2.py
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: windows-latest
  • GitHub Check: docker-compose-test
🔇 Additional comments (4)
cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (2)

534-539: Validate GDS plugin presence and handle potential errors
The function relies on the GDS plugin. If the plugin is missing or if the query fails, the code might raise an exception. Consider adding error handling or user feedback if the GDS plugin isn't available.


579-674: Validate non-empty query results before indexing
Accessing nodes[0]["nodes"] or edges[0]["elements"] without first checking if nodes or edges is empty can lead to an IndexError. Consider adding a guard clause to ensure results are not empty before indexing.
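
A minimal guard along the lines the comment suggests; self.query and the column aliases mirror the snippet in the comment and are otherwise illustrative:

# Hypothetical excerpt from get_graph_metrics: tolerate empty result sets.
nodes = await self.query("MATCH (n) RETURN count(n) AS nodes")
edges = await self.query("MATCH ()-[r]->() RETURN count(r) AS elements")

num_nodes = nodes[0]["nodes"] if nodes else 0
num_edges = edges[0]["elements"] if edges else 0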

cognee/modules/data/models/GraphMetrics.py (2)

27-28: LGTM! Good improvement in timestamp handling.

Switching to database functions (func.now()) for timestamp management is a better practice as it ensures consistency and reduces overhead.
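
Roughly what the updated columns look like, assuming a SQLAlchemy declarative model; the table name, primary key, and timezone flag here are placeholders, while server_default=func.now() and onupdate=func.now() come from the change itself:

from sqlalchemy import Column, DateTime, Integer, func
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class GraphMetricsSketch(Base):
    __tablename__ = "graph_metrics_sketch"

    id = Column(Integer, primary_key=True)  # placeholder; the real model keys differently

    # created_at is stamped by the database at INSERT time; updated_at is
    # re-stamped with the database's now() on every UPDATE issued via the ORM.
    created_at = Column(DateTime(timezone=True), server_default=func.now())
    updated_at = Column(
        DateTime(timezone=True), server_default=func.now(), onupdate=func.now()
    )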


13-14: Address the TODO comment about graph database ID.

The TODO comment suggests that the ID column needs to be updated to reflect the unique ID of the graph database.

Would you like me to help implement a solution for this? I can suggest approaches to integrate the graph database's unique identifier with this model.

@alekszievr force-pushed the test/metrics_in_adapters branch from a3a0f5a to 9b0cb77 on February 5, 2025 at 11:37
@alekszievr force-pushed the test/metrics_in_adapters branch from 9b0cb77 to 92ae1d0 on February 5, 2025 at 11:41
@alekszievr changed the title from "Test: test descriptive graph metric calculation in neo4j and networkx adapters" to "Test: test descriptive graph metric calculation in neo4j and networkx adapters [COG-1188]" on February 5, 2025
@alekszievr self-assigned this on February 5, 2025
@alekszievr requested a review from lxobr on February 5, 2025 at 15:56
@alekszievr merged commit 460691b into dev on February 7, 2025
27 of 28 checks passed
@alekszievr deleted the test/metrics_in_adapters branch on February 7, 2025 at 16:27
