
Conversation


@hajdul88 hajdul88 commented Jan 31, 2025

Description

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin

Summary by CodeRabbit

Release Notes

  • Tests

    • Added comprehensive unit tests for graph model generation
    • Introduced new test scenarios covering various data structures and edge cases
    • Implemented tests for document, chunk, and entity relationships
  • Chores

    • Updated the continuous deployment workflow to trigger only on the dev branch

The release focuses on improving test coverage and refining the deployment process.


coderabbitai bot commented Jan 31, 2025

Walkthrough

This pull request introduces comprehensive unit tests for the get_graph_from_model function in the Cognee project. The changes include defining new data classes (Document, DocumentChunk, EntityType, and Entity) to support testing various scenarios of graph model generation. Multiple asynchronous test functions have been added to verify the function's behavior under different conditions, such as simple structures, document-chunk relationships, duplicate references, multi-level nesting, and edge cases with no contained entities.
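For orientation, the test fixtures likely look roughly like the sketch below. The DocumentChunk definition and the Document _metadata are quoted from the review diffs later in this thread; the Document field, the remaining _metadata values, and the DataPoint import path are assumptions, not the PR's exact code:

from typing import List

from cognee.infrastructure.engine import DataPoint  # import path assumed

class Document(DataPoint):
    path: str  # field name assumed for illustration
    _metadata = {"index_fields": [], "type": "Document"}

class DocumentChunk(DataPoint):
    part_of: Document
    text: str
    contains: List["Entity"] = None  # forward reference, resolved below
    _metadata = {"index_fields": ["text"], "type": "DocumentChunk"}

class EntityType(DataPoint):
    name: str
    _metadata = {"index_fields": ["name"], "type": "EntityType"}

class Entity(DataPoint):
    name: str
    is_type: EntityType
    _metadata = {"index_fields": ["name"], "type": "Entity"}

DocumentChunk.model_rebuild()  # resolve the List["Entity"] forward reference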

Changes

  • cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py: Added new data classes and comprehensive async test methods for the get_graph_from_model function
  • .github/workflows/cd.yaml: Removed the feature/* branch pattern from the push event trigger


Suggested labels

do not merge, testing, unit-tests

Suggested reviewers

  • borisarzentar

Poem

🐰 Graphing Adventures, a Rabbit's Tale 🌟

In tests we weave, with classes so bright,
Entities dancing in graph's pure delight
Chunks and documents, a nested embrace
Our model now leaps with algorithmic grace!

Hop, test, explore - the rabbit's refrain! 🚀



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py (1)

34-56: Consider adding more test cases for better coverage.

While the current test case is good, consider adding:

  1. Edge cases (e.g., cyclic references; see the sketch after this list)
  2. Negative test cases (e.g., invalid relationships)
  3. Complex structures with multiple levels
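For the cyclic-reference item, such a test might look like the following sketch. It uses a hypothetical self-referencing model that is not part of the PR; the import paths and the expected counts (which assume each node and edge is emitted exactly once) are assumptions about the deduplication behavior:

import pytest
from typing import List

from cognee.infrastructure.engine import DataPoint  # import path assumed
from cognee.modules.graph.utils.get_graph_from_model import get_graph_from_model

class LinkedNode(DataPoint):  # hypothetical model, for illustration only
    name: str
    related_to: List["LinkedNode"] = None
    _metadata = {"index_fields": ["name"], "type": "LinkedNode"}

LinkedNode.model_rebuild()

@pytest.mark.asyncio
async def test_get_graph_from_model_cyclic_reference():
    a = LinkedNode(name="A")
    b = LinkedNode(name="B", related_to=[a])
    a.related_to = [b]  # close the cycle: A -> B -> A

    nodes, edges = await get_graph_from_model(a, {}, {}, {})

    # The visited-properties bookkeeping should keep traversal finite.
    assert len(nodes) == 2, f"Expected 2 nodes, got {len(nodes)}"
    assert len(edges) == 2, f"Expected 2 edges, got {len(edges)}"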
cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (1)

208-221: LGTM! Good use of APOC merge procedure.

The switch to apoc.merge.relationship is a good improvement as it prevents duplicate relationships. The implementation is efficient with batch processing using UNWIND.
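For context, a batched merge of this shape might look like the sketch below. The edge field names (source_node_id, relationship_name, properties) are assumptions, not the adapter's actual payload:

# Sketch of the UNWIND + apoc.merge.relationship pattern described above.
query = """
UNWIND $edges AS edge
MATCH (source {id: edge.source_node_id})
MATCH (target {id: edge.target_node_id})
CALL apoc.merge.relationship(
    source,
    edge.relationship_name,  // relationship type, passed dynamically
    {},                      // identifying properties: merge on type alone
    edge.properties,         // properties set on create
    target,
    edge.properties          // properties set on match
) YIELD rel
RETURN count(rel) AS merged
"""
results = await self.query(query, dict(edges=edges))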

Consider adding more specific error handling for common Neo4j relationship errors. Here's a suggested improvement:

     try:
         results = await self.query(query, dict(edges=edges))
         return results
     except Neo4jError as error:
+        if 'ConstraintValidationFailed' in str(error):
+            logger.error("Constraint validation failed while adding edges: %s", error)
+            raise ValueError("Invalid edge data: Constraint validation failed") from error
+        if 'NodeNotFound' in str(error):
+            logger.error("Source or target node not found while adding edges: %s", error)
+            raise ValueError("Invalid edge data: Node not found") from error
         logger.error("Neo4j query error: %s", error, exc_info=True)
         raise error
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f843c25 and a1a89df.

📒 Files selected for processing (5)
  • cognee/api/v1/cognify/cognify_v2.py (1 hunks)
  • cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (1 hunks)
  • cognee/modules/graph/utils/get_graph_from_model.py (1 hunks)
  • cognee/tasks/storage/add_data_points.py (1 hunks)
  • cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py (1 hunks)
🔇 Additional comments (4)
cognee/tasks/storage/add_data_points.py (1)

8-8: LGTM! Function signature simplified.

The removal of the only_root parameter simplifies the interface while maintaining the core functionality.
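In sketch form (the surrounding signature and the flag's default are assumed; only the only_root removal is from the diff):

# before (sketch)
async def add_data_points(data_points, only_root=False): ...

# after (sketch): callers no longer choose between root-only and full traversal
async def add_data_points(data_points): ...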

cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py (1)

8-31: LGTM! Well-structured test classes.

The class hierarchy and metadata definitions are clear and follow best practices.

cognee/modules/graph/utils/get_graph_from_model.py (1)

100-100: LGTM! Simplified node processing logic.

The removal of the only_root check simplifies the traversal logic while maintaining correct behavior.

cognee/api/v1/cognify/cognify_v2.py (1)

167-167: LGTM! Task configuration simplified.

The removal of the only_root parameter from the Task instantiation maintains consistency with the refactored functions.

@hajdul88 hajdul88 marked this pull request as draft January 31, 2025 12:23

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py (3)

2-2: Remove unused import.

The Optional type hint is imported but not used in the code.

-from typing import List, Optional
+from typing import List
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting. Code does not conform to Ruff formatting standards.


13-17: Consider a cleaner approach to forward references.

The contains field references the Entity class before it is defined. While a string-literal type annotation works, consider using from __future__ import annotations for a more modern way to handle forward references.

+from __future__ import annotations

class DocumentChunk(DataPoint):
    part_of: Document
    text: str
-    contains: List["Entity"] = None
+    contains: List[Entity] = None
    _metadata = {"index_fields": ["text"], "type": "DocumentChunk"}
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting. Code does not conform to Ruff formatting standards.


34-56: Enhance test coverage and assertions.

While the test covers basic functionality, consider these improvements:

  1. Add cleanup using pytest fixtures
  2. Add more assertions to verify node/edge properties
  3. Add edge cases and error scenarios

Example enhancement:

@pytest_asyncio.fixture  # async fixtures require pytest_asyncio.fixture (or asyncio_mode=auto)
async def setup_test_data():
    entitytype = EntityType(name="TestType")
    entity = Entity(name="TestEntity", is_type=entitytype)
    yield entitytype, entity
    # Cleanup if needed
    
@pytest.mark.asyncio
async def test_get_graph_from_model_simple_structure(setup_test_data):
    entitytype, entity = setup_test_data
    added_nodes = {}
    added_edges = {}
    visited_properties = {}
    
    nodes, edges = await get_graph_from_model(
        entity, added_nodes, added_edges, visited_properties
    )
    
    # Existing assertions
    assert len(nodes) == 2
    assert len(edges) == 1
    edge_key = str(entity.id) + str(entitytype.id) + "is_type"
    assert edge_key in added_edges
    
    # Additional assertions
    assert entity.id in added_nodes
    assert entitytype.id in added_nodes
    assert added_edges[edge_key]["type"] == "is_type"
    assert added_edges[edge_key]["source"] == entity.id
    assert added_edges[edge_key]["target"] == entitytype.id
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting. Code does not conform to Ruff formatting standards.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a1a89df and 03fdc41.

📒 Files selected for processing (1)
  • cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: ruff format
cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py

[warning] File requires formatting. Code does not conform to Ruff formatting standards.

⏰ Context from checks skipped due to timeout of 90000ms (21)
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: profiler
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: docker-compose-test
  • GitHub Check: profiler
  • GitHub Check: Publish Cognee Docker image
🔇 Additional comments (1)
cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py (1)

31-31: Ensure consistent model rebuilding.

model_rebuild() is called only for DocumentChunk. Consider whether other models also need rebuilding for consistency.

✅ Verification successful

Model rebuilding implementation is consistent with codebase patterns

The model_rebuild() usage for DocumentChunk follows the established pattern across the codebase, where models are rebuilt in their respective contexts including test setups, model definitions, and utility functions.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other model_rebuild() calls in the codebase
rg "model_rebuild\(\)" -A 2 -B 2

Length of output: 2599

🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting. Code does not conform to Ruff formatting standards.

@borisarzentar borisarzentar changed the title Feature/cog 754 implement unit tests and extensive checks around the feat: implement unit tests and extensive checks around the get_graph_from_model [COG-754] Jan 31, 2025
@hajdul88 hajdul88 marked this pull request as ready for review January 31, 2025 15:57

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py (2)

31-31: Consider model rebuild for all classes.

Only DocumentChunk is being rebuilt. For consistency, consider rebuilding all model classes that have circular references.

Apply this diff to rebuild all models:

 DocumentChunk.model_rebuild()
+Document.model_rebuild()
+EntityType.model_rebuild()
+Entity.model_rebuild()

34-55: Consider using test fixtures for common test data.

The test creates entity types and entities that are reused across multiple tests. Consider using pytest fixtures to reduce code duplication and improve maintainability.

Example implementation:

@pytest.fixture
def entity_type():
    return EntityType(name="TestType")

@pytest.fixture
def entity(entity_type):
    return Entity(name="TestEntity", is_type=entity_type)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 03fdc41 and 9e9ca24.

📒 Files selected for processing (1)
  • cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: windows-latest
🔇 Additional comments (2)
cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py (2)

81-100: Add negative test case for duplicate handling.

The test verifies duplicate entity handling but doesn't test the negative case where entities have the same name but different IDs.

Add a test case for entities with same name but different IDs:

entity_type = EntityType(name="Animal")
entity1 = Entity(name="Cat", is_type=entity_type)
entity2 = Entity(name="Cat", is_type=entity_type)  # Same name, different ID
doc_chunk.contains.extend([entity1, entity2])
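Fleshed out with assertions, that negative case might read as follows. The Document field is assumed, and the expected counts are assumptions following the edge arithmetic used elsewhere in these tests:

@pytest.mark.asyncio
async def test_same_name_distinct_ids():
    doc = Document(path="test.txt")  # Document fields assumed
    doc_chunk = DocumentChunk(part_of=doc, text="some text", contains=[])
    entity_type = EntityType(name="Animal")
    entity1 = Entity(name="Cat", is_type=entity_type)
    entity2 = Entity(name="Cat", is_type=entity_type)  # same name, different ID
    doc_chunk.contains.extend([entity1, entity2])

    nodes, edges = await get_graph_from_model(doc_chunk, {}, {}, {})

    # Both "Cat" entities should survive ID-based deduplication:
    # doc + chunk + type + 2 entities = 5 nodes;
    # part_of + 2x contains + 2x is_type = 5 edges.
    assert len(nodes) == 5, f"Expected 5 nodes, got {len(nodes)}"
    assert len(edges) == 5, f"Expected 5 edges, got {len(edges)}"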
✅ Verification successful

The suggested test case is valid and necessary

The review comment correctly identifies a gap in test coverage. The current test verifies handling of duplicate references to the same entity instance, while the suggested test would verify that entities with the same name but different IDs are correctly treated as distinct nodes. This aligns with the implementation's ID-based deduplication logic in get_graph_from_model.

🏁 Scripts executed

The following scripts were executed for the analysis:

Error: No shell scripts found


Script:

#!/bin/bash
# Find Entity class definition
ast-grep --pattern 'class Entity'

# Find get_graph_from_model implementation
ast-grep --pattern 'async def get_graph_from_model'

# Search for other test files related to graph
fd "test.*graph.*\.py$"

Length of output: 184


Script:

#!/bin/bash
# Search for Entity class with more context
rg -l "class Entity" 

# Look for get_graph_from_model implementation
rg "def get_graph_from_model" -A 5

# Check imports in the test file
cat cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py | grep "import"

# Search for model definitions
fd "models?.py$"

Length of output: 2481


Script:

#!/bin/bash
# Check Entity class implementation
cat cognee/modules/engine/models/Entity.py

# Check get_graph_from_model implementation
cat cognee/modules/graph/utils/get_graph_from_model.py

# Look for any existing duplicate-related tests
rg -l "duplicate" cognee/tests/

Length of output: 4522


104-135: Verify node uniqueness across multiple calls.

The test combines results from multiple get_graph_from_model calls but doesn't verify that nodes aren't duplicated when processing related chunks.

Add assertions to verify node uniqueness:

# Verify no duplicate nodes
node_ids = {node.id for node in all_nodes}
assert len(node_ids) == len(all_nodes), "Duplicate nodes found"

# Verify shared nodes (doc) appear only once
doc_nodes = [node for node in all_nodes if node.id == doc.id]
assert len(doc_nodes) == 1, "Document node duplicated"


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/workflows/test_python_3_12.yml (1)

44-47: Consider a more CI-friendly approach to handling pytest-asyncio.

While the current implementation works, using poetry add in CI modifies the project's dependencies. Consider these alternatives:

  1. Add pytest-asyncio to pyproject.toml directly
  2. Install a pinned version explicitly (as in the diff below)

Apply this diff to make it more CI-friendly:

-      - name: Add pytest-asyncio
-        run: |
-          poetry add --dev pytest-asyncio
+      - name: Install pytest-asyncio
+        run: |
+          poetry run pip install "pytest-asyncio==0.23.3"

Or better yet, add it to pyproject.toml:

[tool.poetry.group.dev.dependencies]
pytest-asyncio = "^0.23.3"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9e9ca24 and 41007aa.

📒 Files selected for processing (1)
  • .github/workflows/test_python_3_12.yml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: windows-latest
  • GitHub Check: test
  • GitHub Check: docker-compose-test
  • GitHub Check: Publish Cognee Docker image

@hajdul88 hajdul88 requested review from borisarzentar, dexters1 and lxobr and removed request for lxobr January 31, 2025 16:29

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (6)
cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_test.py (6)

8-32: LGTM! Consider adding type hints for _metadata.

The data class hierarchy is well-structured with clear relationships. Consider adding type hints for the _metadata class variable to improve code maintainability:

-    _metadata = {"index_fields": [], "type": "Document"}
+    _metadata: dict[str, list[str] | str] = {"index_fields": [], "type": "Document"}

34-55: Enhance test coverage with node property assertions.

The test verifies basic structure but could be more thorough. Consider adding assertions for node properties to ensure the graph maintains the correct entity information:

     assert len(nodes) == 2, f"Expected 2 nodes, got {len(nodes)}"
     assert len(edges) == 1, f"Expected 1 edges, got {len(edges)}"
 
     edge_key = str(entity.id) + str(entitytype.id) + "is_type"
     assert edge_key in added_edges, f"Edge {edge_key} not found"
+    
+    # Verify node properties
+    entity_node = next(n for n in nodes if n["id"] == str(entity.id))
+    assert entity_node["name"] == "TestEntity"
+    
+    type_node = next(n for n in nodes if n["id"] == str(entitytype.id))
+    assert type_node["name"] == "TestType"

57-78: Add relationship validation between nodes.

While the test verifies node and edge counts, it should also validate the relationships between nodes to ensure correct graph structure:

     assert len(nodes) == 5, f"Expected 5 nodes, got {len(nodes)}"
     assert len(edges) == 5, f"Expected 5 edges, got {len(edges)}"
+    
+    # Verify relationships
+    doc_chunk_edges = [e for e in edges if str(doc_chunk.id) in e["source"]]
+    assert len(doc_chunk_edges) == 3  # part_of + 2 contains
+    
+    entity_edges = [e for e in edges if e["type"] == "is_type"]
+    assert len(entity_edges) == 2  # Two entities with is_type relationship

80-101: Add explicit verification of entity uniqueness.

The test effectively verifies deduplication through counts, but could be more explicit in checking entity uniqueness:

     assert len(nodes) == 4, f"Expected 4 nodes, got {len(nodes)}"
     assert len(edges) == 3, f"Expected 3 edges, got {len(edges)}"
+    
+    # Verify entity uniqueness
+    entity_nodes = [n for n in nodes if n.get("type") == "Entity"]
+    assert len(entity_nodes) == 1, "Duplicate entity nodes found"
+    
+    contains_edges = [e for e in edges if e["type"] == "contains"]
+    assert len(contains_edges) == 1, "Duplicate contains edges found"

103-136: Add validation of node type distribution.

The test handles complex nesting well but should verify the distribution of node types:

     assert len(all_nodes) == 8, f"Expected 8 nodes, got {len(all_nodes)}"
     assert len(all_edges) == 8, f"Expected 8 edges, got {len(all_edges)}"
+    
+    # Verify node type distribution
+    node_types = {n.get("type"): [] for n in all_nodes}
+    for node in all_nodes:
+        node_types[node.get("type")].append(node)
+    
+    assert len(node_types["Document"]) == 1
+    assert len(node_types["DocumentChunk"]) == 2
+    assert len(node_types["Entity"]) == 3
+    assert len(node_types["EntityType"]) == 2

138-151: Add type verification for empty contains case.

The test covers the empty contains case but should explicitly verify node types:

     assert len(nodes) == 2, f"Expected 2 nodes, got {len(nodes)}"
     assert len(edges) == 1, f"Expected 1 edge, got {len(edges)}"
+    
+    # Verify node types
+    node_types = {n["id"]: n.get("type") for n in nodes}
+    assert node_types[str(doc.id)] == "Document"
+    assert node_types[str(chunk.id)] == "DocumentChunk"
+    
+    # Verify edge type
+    assert edges[0]["type"] == "part_of"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 41007aa and a7ab508.

📒 Files selected for processing (2)
  • .github/workflows/cd.yaml (0 hunks)
  • cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_test.py (1 hunks)
💤 Files with no reviewable changes (1)
  • .github/workflows/cd.yaml

@hajdul88 hajdul88 merged commit 2fd6bfa into dev Jan 31, 2025
26 of 28 checks passed
@hajdul88 hajdul88 deleted the feature/cog-754-implement-unit-tests-and-extensive-checks-around-the branch January 31, 2025 17:17