Cog 577 add unit test task #197
Conversation
Walkthrough
The pull request updates the testing workflows for Python versions 3.9, 3.10, and 3.11 by modifying the paths for unit tests and adding integration test steps. The workflow now includes conditional execution based on documentation changes and specific labels. Additionally, several test files were deleted, while new test files were created to validate the functionality of various components.
Actionable comments posted: 10
🧹 Outside diff range and nitpick comments (18)
cognee/tests/unit/documents/pdf_document.py (2)
22-33: Enhance test coverage and error handling
The current test only covers the happy path with valid input. We should add tests for edge cases and error conditions.
Consider adding these test cases:
- Test with non-existent file
- Test with empty PDF
- Test with invalid chunk sizes
- Test with PDF containing no text
- Test memory usage with large files
Example:
def test_pdf_document_with_invalid_file():
    with pytest.raises(FileNotFoundError):
        pdf_doc = PdfDocument(
            id=uuid.uuid4(), name="nonexistent.pdf", raw_data_location="nonexistent.pdf"
        )
        list(pdf_doc.read(chunk_size=1024))

def test_pdf_document_with_invalid_chunk_size(test_pdf_path):
    pdf_doc = PdfDocument(
        id=uuid.uuid4(), name="Test document.pdf", raw_data_location=test_pdf_path
    )
    with pytest.raises(ValueError):
        list(pdf_doc.read(chunk_size=0))
5-8: Consider parameterizing test data
The ground truth data is hardcoded in the test file. This makes it difficult to maintain and extend the test cases.
Consider using pytest's parameterize feature:
@pytest.mark.parametrize("expected", [
    {"word_count": 879, "len_text": 5622, "cut_type": "sentence_end"},
    {"word_count": 951, "len_text": 6384, "cut_type": "sentence_end"},
])
def test_pdf_document_paragraphs(test_pdf_path, expected):
    pdf_doc = PdfDocument(
        id=uuid.uuid4(), name="Test document.pdf", raw_data_location=test_pdf_path
    )
    paragraph_data = next(pdf_doc.read(chunk_size=1024))
    assert expected["word_count"] == paragraph_data.word_count
    assert expected["len_text"] == len(paragraph_data.text)
    assert expected["cut_type"] == paragraph_data.cut_type
cognee/tests/unit/utils/get_mock_user.py (2)
11-21: Add example usage to the docstring.
The docstring is clear but would benefit from an example usage section to help other developers understand how to use this utility function effectively.
Add an example section to the docstring:
     Returns:
         Mock: A configured mock User instance.
+
+    Example:
+        >>> mock_user = get_mock_user(groups=[mock_group], name="Test User")
+        >>> assert mock_user.id is not None
+        >>> assert mock_user.name == "Test User"
     """
37-38: Add safety checks for dynamic attribute setting.
The current implementation allows overriding critical attributes through additional_attributes, which could lead to unexpected behavior in tests.
Consider adding protection for critical attributes:
+    # Protected attributes that shouldn't be overridden
+    PROTECTED_ATTRS = {'id', '__tablename__', '__mapper_args__', 'groups'}
+
     # Add any additional attributes
     for attr_name, attr_value in additional_attributes.items():
+        if attr_name in PROTECTED_ATTRS:
+            raise ValueError(f"Cannot override protected attribute: {attr_name}")
         setattr(mock_user, attr_name, attr_value)
cognee/tests/unit/run_tasks/test_run_tasks.py (1)
8-8: Add docstring and improve test result validation.
The test function would benefit from:
- A clear docstring explaining the test's purpose and expected behavior
- Documentation of how the expected results are calculated
- More explicit assertions showing the transformation steps
Consider refactoring like this:
async def run_and_check_tasks(): + """Test the run_tasks pipeline with a sequence of number transformations. + + Pipeline steps: + 1. Generate numbers 1-10 + 2. Add 1 to each number (batched by 5) + 3. Multiply each number by 2 + 4. Add 1 to each number + + Example: 1 -> 2 -> 4 -> 5 + """ # ... existing code ... - results = [5, 7, 9, 11, 13, 15, 17, 19, 21, 23] - index = 0 + # Calculate expected results step by step + input_numbers = list(range(1, 11)) + expected_results = [ + ((num + 1) * 2) + 1 # Shows the transformation chain + for num in input_numbers + ] async for result in pipeline: - print(result) - assert result == results[index] - index += 1 + expected = expected_results.pop(0) + assert result == expected, f"Expected {expected}, got {result}"Also applies to: 34-39
cognee/tests/unit/run_tasks/test_run_tasks_from_queue.py (2)
32-38: Consider removing or conditionally enabling debug print.
The print statement in the assertion loop might pollute test output. Consider:
- Removing it if not needed for debugging
- Using a debug flag or pytest's caplog fixture for conditional logging
 async for result in tasks_run:
-    print(result)
+    # Use pytest's caplog fixture or debug flag if logging is needed
     assert result == results[index]
     index += 1
Line range hint 40-50: Consider improving test architecture.
Several architectural improvements could make this test more robust:
- Consider using a mock queue or async queue implementation for better test control
- Add proper cleanup in case of test failures
- Consider using pytest-asyncio for better async test handling
Example implementation:
import pytest
from asyncio import Queue as AsyncQueue

@pytest.mark.asyncio
async def test_run_tasks_from_queue():
    # Use async queue instead of sync queue
    data_queue = AsyncQueue()
    data_queue.is_closed = False

    try:
        await run_queue()
    finally:
        # Ensure cleanup
        data_queue.is_closed = True
.github/workflows/test_python_3_9.yml (1)
Line range hint 54-63: Consider consolidating test execution steps
The workflow currently has two separate test execution steps:
- Running unit tests with pytest
- Running test_library.py separately
This split test execution could lead to:
- Inconsistent test reporting
- Harder maintenance
- Longer workflow execution time
Consider consolidating all tests under the pytest framework and running them in a single step. This would provide:
- Unified test reporting
- Consistent test execution environment
- Easier maintenance
-    - name: Run tests
-      run: poetry run pytest cognee/tests/unit/
-
-    - name: Run default basic pipeline
-      env:
-        ENV: 'dev'
-        LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-      run: poetry run python ./cognee/tests/test_library.py
+    - name: Run all tests
+      env:
+        ENV: 'dev'
+        LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+      run: poetry run pytest cognee/tests/unit/ cognee/tests/test_library.py
.github/workflows/test_python_3_10.yml (1)
54-54: Consider enhancing test reporting.
The current workflow could benefit from additional test reporting features:
- Test result artifacts for debugging failed runs
- Test coverage reporting
- Test execution time tracking
Consider adding these improvements:
-      run: poetry run pytest cognee/tests/unit/
+      run: |
+        poetry run pytest cognee/tests/unit/ \
+          --junitxml=test-results.xml \
+          --cov=cognee \
+          --cov-report=xml \
+          --cov-report=html
+
+    - name: Upload test results
+      if: always()
+      uses: actions/upload-artifact@v3
+      with:
+        name: test-results
+        path: |
+          test-results.xml
+          coverage.xml
+          htmlcov/
.github/workflows/test_python_3_11.yml (1)
54-54: Consider including integration tests in the workflow
The test path has been narrowed to only run unit tests (cognee/tests/unit/). While unit tests are crucial, consider adding a separate step for integration tests to maintain comprehensive test coverage.
Example addition:
     - name: Run tests
       run: poetry run pytest cognee/tests/unit/
+    - name: Run integration tests
+      run: poetry run pytest cognee/tests/integration/
cognee/tests/unit/processing/test_chunking.py (3)
3-38: Consider adding more test cases to GROUND_TRUTH
The test data could be more comprehensive. Consider adding test cases for the following (a rough sketch appears after this list):
- Empty text
- Single paragraph
- Paragraphs at the word count boundary (12 words)
- Paragraphs exceeding the word count limit
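For illustration, a sketch of how these extra cases could be added. The new keys and expected chunk lists below are placeholders rather than values taken from chunk_by_paragraph's real output, and it assumes the run_chunking_test helper already defined in this module:

```python
import pytest

# Hypothetical edge-case inputs; the expected chunk lists are placeholders and
# must be filled in from chunk_by_paragraph's actual behaviour.
EDGE_CASE_INPUT_TEXT = {
    "empty_text": "",
    "single_paragraph": "A single short paragraph.",
}
EDGE_CASE_GROUND_TRUTH = {
    "empty_text": [],
    "single_paragraph": [],
}

@pytest.mark.parametrize("case_name", sorted(EDGE_CASE_INPUT_TEXT))
def test_chunking_edge_cases(case_name):
    # Reuses the module's run_chunking_test helper on each new scenario.
    run_chunking_test(EDGE_CASE_INPUT_TEXT[case_name], EDGE_CASE_GROUND_TRUTH[case_name])
```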
40-47: Consider moving test data to fixtures
The test data could be moved to a separate fixture file to improve reusability and maintainability, especially if these test cases might be useful for testing related functionality; a sketch of such a fixture is shown below.
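One possible shape for such a fixture, assuming the data stays as plain dicts and is moved into a conftest.py next to the chunking tests (the file location, fixture name, and keys below are illustrative only):

```python
# cognee/tests/unit/processing/conftest.py  (hypothetical location)
import pytest

@pytest.fixture(scope="session")
def chunking_test_data():
    """Shared input texts and expected chunking results for the chunking tests."""
    return {
        "input_text": {
            "whole_text": "...",  # move the INPUT_TEXT contents here
            "cut_text": "...",
        },
        "ground_truth": {
            "whole_text": [],  # move the GROUND_TRUTH contents here
            "cut_text": [],
        },
    }
```

Tests would then accept chunking_test_data as an argument instead of importing module-level constants.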
64-69: Refactor test cases using pytest.parametrize
The test cases could be combined using pytest's parametrize decorator for better maintainability and clearer test reporting.
Here's a suggested improvement:
+import pytest
+
+@pytest.mark.parametrize(
+    "scenario,test_text,ground_truth",
+    [
+        ("whole_text", INPUT_TEXT["whole_text"], GROUND_TRUTH["whole_text"]),
+        ("cut_text", INPUT_TEXT["cut_text"], GROUND_TRUTH["cut_text"]),
+    ],
+)
+def test_chunking(scenario: str, test_text: str, ground_truth: list[dict]) -> None:
+    """
+    Test chunk_by_paragraph with different scenarios.
+
+    Args:
+        scenario: Test scenario name
+        test_text: Input text to be chunked
+        ground_truth: Expected chunking results
+    """
+    run_chunking_test(test_text, ground_truth)
-def test_chunking_whole_text():
-    run_chunking_test(INPUT_TEXT["whole_text"], GROUND_TRUTH["whole_text"])
-
-def test_chunking_cut_text():
-    run_chunking_test(INPUT_TEXT["cut_text"], GROUND_TRUTH["cut_text"])
cognee/tests/unit/integration/test_model_to_graph_to_model.py (4)
44-54: Remove duplicate ground truth data.
PARSED_PERSON_GROUND_TRUTH is identical to PERSON_GROUND_TRUTH. Consider reusing PERSON_GROUND_TRUTH instead of maintaining duplicate data that could become inconsistent during maintenance.
-PARSED_PERSON_GROUND_TRUTH = {
-    "id": "boris",
-    "name": "Boris",
-    "age": 30,
-    "driving_license": {
-        "issued_by": "PU Vrsac",
-        "issued_on": "2025-11-06",
-        "number": "1234567890",
-        "expires_on": "2025-11-06",
-    },
-}
Then update line 171 to use PERSON_GROUND_TRUTH.
69-91: Enhance type hints and add docstrings.
The classes are well-structured but could benefit from:
- More specific type hints for _metadata
- Class-level docstrings explaining the purpose and relationships
- Documentation for the index_fields metadata usage
Example enhancement:
 class CarType(DataPoint):
+    """Represents a type of car with indexable name field."""
     id: str
     name: CarTypeName
-    _metadata: dict = dict(index_fields=["name"])
+    _metadata: dict[str, list[str]] = dict(index_fields=["name"])
93-109: Improve test helper function readability and maintainability.
Consider the following improvements:
- Use more descriptive variable names (e.g., attribute_name instead of key)
- Extract the time delta threshold to a constant
- Format error messages for better readability
+MAX_TIME_DELTA_SECONDS = 20 + def run_test_agains_ground_truth( test_target_item_name, test_target_item, ground_truth_dict ): - for key, ground_truth in ground_truth_dict.items(): - if isinstance(ground_truth, dict): - for key2, ground_truth2 in ground_truth.items(): + for attr_name, expected_value in ground_truth_dict.items(): + if isinstance(expected_value, dict): + for nested_attr, expected_nested_value in expected_value.items(): + actual_value = getattr(test_target_item, attr_name)[nested_attr] assert ( - ground_truth2 == getattr(test_target_item, key)[key2] - ), f"{test_target_item_name}/{key = }/{key2 = }: {ground_truth2 = } != {getattr(test_target_item, key)[key2] = }" + expected_nested_value == actual_value + ), f"Mismatch in {test_target_item_name}.{attr_name}.{nested_attr}:\n Expected: {expected_nested_value}\n Actual: {actual_value}" else: - assert ground_truth == getattr( - test_target_item, key - ), f"{test_target_item_name}/{key = }: {ground_truth = } != {getattr(test_target_item, key) = }" + actual_value = getattr(test_target_item, attr_name) + assert expected_value == actual_value, ( + f"Mismatch in {test_target_item_name}.{attr_name}:\n" + f" Expected: {expected_value}\n" + f" Actual: {actual_value}" + ) time_delta = datetime.now(timezone.utc) - getattr(test_target_item, "updated_at") - assert time_delta.total_seconds() < 20, f"{ time_delta.total_seconds() = }" + assert time_delta.total_seconds() < MAX_TIME_DELTA_SECONDS, ( + f"Update timestamp too old: {time_delta.total_seconds()} seconds" + )
147-173: Add docstrings to test functions.
The test functions would benefit from clear docstrings explaining what aspects they're testing and their expected outcomes.
Example enhancement:
 def test_extracted_person(graph_outputs):
+    """
+    Test that a Person instance is correctly extracted to a graph node.
+    Verifies all attributes match the ground truth data.
+    """
     (_, person, _, _) = graph_outputs
     run_test_agains_ground_truth("person", person, PERSON_GROUND_TRUTH)
cognee/tests/test_pgvector.py (1)
Line range hint 1-96: Convert integration test to proper unit tests
This file appears to be an integration test rather than a unit test, which conflicts with its new location in the unit tests directory. The test has several design issues:
- It performs real I/O and database operations instead of using mocks
- Lacks proper test isolation and cleanup
- Mixes setup, execution, and assertions in a single function
- Has no error handling for database operations
Consider refactoring this into multiple focused unit tests:
import pytest from unittest.mock import Mock, patch from cognee.api.v1.search import SearchType @pytest.fixture def mock_vector_engine(): with patch('cognee.infrastructure.databases.vector.get_vector_engine') as mock: engine = Mock() engine.search.return_value = [Mock(payload={"text": "test_node"})] mock.return_value = engine yield mock @pytest.fixture def mock_cognee(): with patch('cognee.search') as mock: mock.return_value = ["test_result"] yield mock class TestPGVector: async def test_vector_search(self, mock_vector_engine): engine = mock_vector_engine() results = await engine.search("Entity_name", "AI") assert len(results) > 0 assert "text" in results[0].payload @pytest.mark.parametrize("search_type", [ SearchType.INSIGHTS, SearchType.CHUNKS, SearchType.SUMMARIES ]) async def test_cognee_search(self, mock_cognee, search_type): results = await cognee.search(search_type, query="test_query") assert len(results) > 0
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (4)
- cognee/modules/data/processing/document_types/__tests__/artificial-inteligence.pdf is excluded by !**/*.pdf
- cognee/modules/data/processing/document_types/__tests__/soldiers-home.pdf is excluded by !**/*.pdf
- cognee/modules/pipelines/operations/__tests__/artificial-inteligence.v1.pdf is excluded by !**/*.pdf
- cognee/modules/pipelines/operations/__tests__/artificial-inteligence.v2.pdf is excluded by !**/*.pdf
📒 Files selected for processing (18)
- .github/workflows/test_python_3_10.yml (1 hunks)
- .github/workflows/test_python_3_11.yml (1 hunks)
- .github/workflows/test_python_3_9.yml (1 hunks)
- .gitignore (1 hunks)
- cognee/infrastructure/engine/__tests__/model_to_graph_to_model.test.py (0 hunks)
- cognee/modules/data/processing/document_types/__tests__/PdfDocument.test.py (0 hunks)
- cognee/modules/pipelines/operations/__tests__/get_graph_url.py (0 hunks)
- cognee/modules/pipelines/operations/__tests__/run_tasks.test.py (0 hunks)
- cognee/tasks/chunks/__tests__/chunk_by_paragraph.test.py (0 hunks)
- cognee/tasks/infer_data_ontology/models/models.py (0 hunks)
- cognee/tests/test_pgvector.py (1 hunks)
- cognee/tests/unit/documents/pdf_document.py (1 hunks)
- cognee/tests/unit/integration/test_model_to_graph_to_model.py (1 hunks)
- cognee/tests/unit/processing/test_chunking.py (1 hunks)
- cognee/tests/unit/run_tasks/test_run_tasks.py (1 hunks)
- cognee/tests/unit/run_tasks/test_run_tasks_from_queue.py (3 hunks)
- cognee/tests/unit/utils/get_mock_user.py (1 hunks)
- pytest.ini (0 hunks)
💤 Files with no reviewable changes (7)
- cognee/infrastructure/engine/__tests__/model_to_graph_to_model.test.py
- cognee/modules/data/processing/document_types/__tests__/PdfDocument.test.py
- cognee/modules/pipelines/operations/__tests__/get_graph_url.py
- cognee/modules/pipelines/operations/__tests__/run_tasks.test.py
- cognee/tasks/chunks/__tests__/chunk_by_paragraph.test.py
- cognee/tasks/infer_data_ontology/models/models.py
- pytest.ini
✅ Files skipped from review due to trivial changes (1)
- .gitignore
🧰 Additional context used
🪛 Ruff
cognee/tests/unit/integration/test_model_to_graph_to_model.py
139-139: Do not use bare except
(E722)
cognee/tests/unit/utils/get_mock_user.py
8-8: Undefined name Group
(F821)
26-26: Undefined name User
(F821)
🔇 Additional comments (10)
cognee/tests/unit/documents/pdf_document.py (1)
1-33: Verify test data file existence
The test depends on a PDF file in the test_data directory. We should verify this dependency is available.
✅ Verification successful
Test data file dependency is present and referenced correctly ✓
The verification confirms:
- The test data file
artificial-intelligence.pdfexists at the expected location:cognee/tests/test_data/artificial-intelligence.pdf - The file is referenced in two test files:
- The current test file:
cognee/tests/unit/documents/pdf_document.py - Another test file:
cognee/tests/test_library.py
- The current test file:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if the test data file exists
fd artificial-intelligence.pdf
# Look for other test files that might be using the same test data
rg -l "artificial-intelligence.pdf" --type py
Length of output: 202
cognee/tests/unit/utils/get_mock_user.py (1)
6-40: Verify consistent usage pattern across test files.
Let's ensure this mock utility is used consistently across the test suite and that we're not duplicating similar mock creation logic elsewhere.
✅ Verification successful
Let me check the actual usage patterns in those test files to ensure consistency.
Mock user utility is consistently used and no duplicates found
The verification shows that:
- get_mock_user is the only mock user creation utility in the codebase
- It's consistently used in both test files (test_run_tasks.py and test_run_tasks_from_queue.py)
- The usage pattern is identical in both files, where it's imported and used within a patched wrapper
- No duplicate or inconsistent mock user creation patterns were found
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for similar mock user creation patterns and current usage
# Look for similar mock user creation patterns
echo "Checking for similar mock user creation patterns..."
rg -l "create_autospec.*User" "cognee/tests"
# Look for current usage of get_mock_user
echo "Checking current usage of get_mock_user..."
rg -l "get_mock_user" "cognee/tests"
# Look for potential duplicate mock user creation utilities
echo "Checking for potential duplicate utilities..."
rg -l "mock.*user" "cognee/tests"
Length of output: 751
Script:
#!/bin/bash
# Check the actual usage of get_mock_user in test files
echo "Usage in test_run_tasks.py:"
rg "get_mock_user" "cognee/tests/unit/run_tasks/test_run_tasks.py" -B 2 -A 2
echo -e "\nUsage in test_run_tasks_from_queue.py:"
rg "get_mock_user" "cognee/tests/unit/run_tasks/test_run_tasks_from_queue.py" -B 2 -A 2
Length of output: 1043
🧰 Tools
🪛 Ruff
8-8: Undefined name Group
(F821)
26-26: Undefined name User
(F821)
cognee/tests/unit/run_tasks/test_run_tasks.py (1)
1-5: LGTM! Imports are well-organized and complete.
The imports cover all necessary components for async testing, mocking, and task execution.
cognee/tests/unit/run_tasks/test_run_tasks_from_queue.py (2)
5-6: LGTM! Clean import additions for mocking.
The new imports are well-organized and specifically imported for the mocking functionality.
23-30: Verify pipeline_name parameter handling.
The addition of pipeline_name parameter is good for identification, but let's verify its handling in the run_tasks function.
✅ Verification successful
Based on the shell script results, I can now generate the final response as the implementation details of run_tasks function are clear.
Pipeline name parameter is properly handled with good defaults and telemetry
The pipeline_name parameter in run_tasks function has proper implementation:
- Has a default value "default_pipeline"
- Used consistently in logging and telemetry
- Properly propagated through error handling
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if pipeline_name is properly handled in run_tasks function
# Expected: Find implementation details of run_tasks function
# Search for run_tasks function definition
ast-grep --pattern 'def run_tasks($$$)'
# Search for pipeline_name usage
rg -A 5 'pipeline_name' --type py
Length of output: 4092
.github/workflows/test_python_3_9.yml (1)
54-54: Verify test path consistency across workflows
The test path has been updated to cognee/tests/unit/. Let's ensure this change is consistent across all Python workflow files.
✅ Verification successful
Test path is consistent across all Python workflow files
The verification shows that all Python workflow files (3.9, 3.10, and 3.11) consistently use the same test path cognee/tests/unit/ for pytest execution.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check test paths in all Python workflow files
# Expected: All workflow files should use the same test path pattern
echo "Checking test paths in Python workflow files..."
rg -l 'run: poetry run pytest' .github/workflows/ | while read -r file; do
echo "=== $file ==="
rg 'run: poetry run pytest.*' "$file"
done
Length of output: 885
.github/workflows/test_python_3_10.yml (1)
54-54: Verify the scope of test coverage.
The test path has been narrowed to only run unit tests. While this makes the test execution more focused, ensure that other types of tests (integration, functional) are covered in separate workflows.
Let's verify the test coverage setup:
✅ Verification successful
Test coverage is properly distributed across different test types
The test path change is appropriate as the repository maintains a well-organized test structure:
- Unit tests are in
cognee/tests/unit/ - Integration tests for different components (neo4j, pgvector, qdrant, weaviate) are in
cognee/tests/ - The project has proper test dependencies configured in pyproject.toml including pytest and coverage tools
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for other test types and coverage configuration
# Check for other test directories
echo "Looking for other test directories:"
fd -t d "tests" -E "unit"
# Check for coverage configuration
echo -e "\nLooking for coverage configuration:"
fd -t f "coverage|pytest.ini|setup.cfg|pyproject.toml"
Length of output: 284
Script:
#!/bin/bash
# Check the content of pyproject.toml for test configuration and coverage settings
echo "Checking pyproject.toml for test configuration:"
cat pyproject.toml
# Check for test files in other test directories
echo -e "\nChecking for test files in other directories:"
fd -t f "test_.*\.py" tests/
fd -t f "test_.*\.py" cognee/tests/ -E "unit"
Length of output: 3332
.github/workflows/test_python_3_11.yml (1)
54-54: Verify test path consistency across Python version workflows
The test path has been updated to cognee/tests/unit/. Let's ensure this change is consistent across all Python version workflows.
✅ Verification successful
Test paths are consistent across all Python workflows
The verification confirms that all Python version workflows (3.9, 3.10, and 3.11) use the same test path cognee/tests/unit/ for running tests with pytest.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if test paths are consistent across Python workflow files
# Expected: All workflow files should use the same test path
echo "Checking test paths in Python workflow files..."
rg -A 1 'Run tests' .github/workflows/test_python_*.yml
Length of output: 702
cognee/tests/unit/processing/test_chunking.py (1)
1-2: LGTM: Clean and focused import
The import statement is specific and appropriately imports only the required function.
cognee/tests/test_pgvector.py (1)
68-68: Verify search term change impact on test reliability
The search term has been changed from "Quantum computer" to "AI", but the test data still contains content about quantum computers. This mismatch might affect test reliability.
Let's verify if the test data contains relevant content for the "AI" search term:
✅ Verification successful
The test data and search term are correctly aligned
The test data file Natural_language_processing.txt contains content about NLP and AI, which aligns well with the search term "AI". The test file's content specifically mentions "Natural language processing (NLP) is an interdisciplinary subfield of computer science" and discusses AI-related concepts like machine learning and language processing. Therefore, the change from "Quantum computer" to "AI" as a search term actually improves the test's reliability by making it search for content that exists in the test data.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if test data contains AI-related content
# Expected: The test data should contain content related to AI to ensure reliable search results
# Search for AI-related terms in the test data file
rg -i "(\bai\b|artificial intelligence)" "cognee/tests/test_data/Natural_language_processing.txt"
Length of output: 97
Script:
#!/bin/bash
# Let's first check if the file exists and see its content
fd "Natural_language_processing.txt" --type f
# Let's check all test data files in cognee/tests/test_data directory
fd . "cognee/tests/test_data" --type f --exec cat {}
# Let's also check the test file itself to understand the test setup
cat cognee/tests/test_pgvector.py
Length of output: 37004
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (6)
cognee/tests/unit/utils/get_mock_user.py (1)
24-42: Consider adding validation and type checking.
The implementation is good but could be enhanced with:
- Validation of additional_attributes to prevent overwriting essential attributes
- Type checking for the groups parameter to ensure each item is a Group instance
- Making the polymorphic_identity configurable for different user types
Here's a suggested improvement:
def get_mock_user( user_id: Optional[UUID] = None, groups: Optional[List["Group"]] = None, + polymorphic_identity: str = "user", **additional_attributes ) -> Mock: # Generate a random UUID if none provided user_id = user_id or uuid4() # Create base mock mock_user = create_autospec(User, instance=True) + # Validate groups if provided + if groups is not None: + for group in groups: + if not isinstance(group, (Group, Mock)): + raise TypeError(f"Expected Group instance, got {type(group)}") + + # Prevent overwriting essential attributes + protected_attrs = {"id", "__tablename__", "groups", "__mapper_args__"} + if protected_attrs.intersection(additional_attributes): + raise ValueError(f"Cannot override protected attributes: {protected_attrs}") # Configure basic attributes mock_user.id = user_id mock_user.__tablename__ = "users" mock_user.groups = groups or [] # Set polymorphic identity - mock_user.__mapper_args__ = {"polymorphic_identity": "user"} + mock_user.__mapper_args__ = {"polymorphic_identity": polymorphic_identity} # Add any additional attributes for attr_name, attr_value in additional_attributes.items(): setattr(mock_user, attr_name, attr_value) return mock_usercognee/tests/unit/run_tasks/test_run_tasks_from_queue.py (3)
24-31: Consider adding error handling to the pipeline tasks
The pipeline tasks could benefit from proper error handling to ensure test stability and better error reporting.
tasks_run = run_tasks( [ - Task(queue_consumer), - Task(add_one), - Task(multiply_by_two), + Task(queue_consumer, on_error=lambda e: print(f"Queue consumer error: {e}")), + Task(add_one, on_error=lambda e: print(f"Add one error: {e}")), + Task(multiply_by_two, on_error=lambda e: print(f"Multiply error: {e}")), ], pipeline_name="test_run_tasks_from_queue", )
33-38: Replace hardcoded results with dynamic calculation
The hardcoded results list could be replaced with a dynamic calculation to make the test more maintainable.
-    results = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
+    results = [(i + 1) * 2 for i in range(10)]
     index = 0
     async for result in tasks_run:
         print(result)
         assert result == results[index]
         index += 1
Line range hint 41-51: Improve test stability and resource cleanup
The current implementation has potential stability issues:
- Fixed sleep times could lead to flaky tests
- No explicit cleanup of queue resources
async def run_queue(): data_queue = Queue() data_queue.is_closed = False + try: async def queue_producer(): for i in range(0, 10): data_queue.put(i) - await asyncio.sleep(0.1) + await asyncio.sleep(0.01) # Reduce sleep time data_queue.is_closed = True await asyncio.gather(pipeline(data_queue), queue_producer()) + finally: + data_queue.is_closed = True # Ensure queue is closed even if test failscognee/tests/unit/integration/test_model_to_graph_to_model.py (2)
43-53: Consider applying DRY principle to ground truth data.
PARSED_PERSON_GROUND_TRUTH is identical to PERSON_GROUND_TRUTH. Consider deriving it from PERSON_GROUND_TRUTH to avoid duplication:
-PARSED_PERSON_GROUND_TRUTH = {
-    "id": "boris",
-    "name": "Boris",
-    "age": 30,
-    "driving_license": {
-        "issued_by": "PU Vrsac",
-        "issued_on": "2025-11-06",
-        "number": "1234567890",
-        "expires_on": "2025-11-06",
-    },
-}
+PARSED_PERSON_GROUND_TRUTH = PERSON_GROUND_TRUTH
92-107: Fix typo and consider simplifying the helper function.
- The function name has a typo: "agains" should be "against"
- The nested dictionary comparison could be simplified using dictionary comparison
-def run_test_agains_ground_truth( +def run_test_against_ground_truth( test_target_item_name, test_target_item, ground_truth_dict ): for key, ground_truth in ground_truth_dict.items(): if isinstance(ground_truth, dict): - for key2, ground_truth2 in ground_truth.items(): - assert ( - ground_truth2 == getattr(test_target_item, key)[key2] - ), f"{test_target_item_name}/{key = }/{key2 = }: {ground_truth2 = } != {getattr(test_target_item, key)[key2] = }" + assert ground_truth == getattr(test_target_item, key), ( + f"{test_target_item_name}/{key = }: {ground_truth = } != " + f"{getattr(test_target_item, key) = }" + ) else: assert ground_truth == getattr( test_target_item, key ), f"{test_target_item_name}/{key = }: {ground_truth = } != {getattr(test_target_item, key) = }"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
cognee/tests/unit/documents/pdf_document.py(1 hunks)cognee/tests/unit/integration/test_model_to_graph_to_model.py(1 hunks)cognee/tests/unit/run_tasks/test_run_tasks.py(1 hunks)cognee/tests/unit/run_tasks/test_run_tasks_from_queue.py(3 hunks)cognee/tests/unit/utils/get_mock_user.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- cognee/tests/unit/documents/pdf_document.py
- cognee/tests/unit/run_tasks/test_run_tasks.py
🧰 Additional context used
🪛 Ruff
cognee/tests/unit/integration/test_model_to_graph_to_model.py
138-138: Do not use bare except
(E722)
cognee/tests/unit/utils/get_mock_user.py
10-10: Undefined name Group
(F821)
🔇 Additional comments (5)
cognee/tests/unit/utils/get_mock_user.py (1)
8-23: Well-documented function signature!
The function signature is well-typed and the docstring provides clear documentation for all parameters.
🧰 Tools
🪛 Ruff
10-10: Undefined name Group
(F821)
cognee/tests/unit/run_tasks/test_run_tasks_from_queue.py (1)
3-4: LGTM: Appropriate test dependencies imported
The added imports for mocking and test utilities are correctly implemented.
Also applies to: 7-8
cognee/tests/unit/integration/test_model_to_graph_to_model.py (3)
56-89: Well-structured class definitions with proper type safety!
The class hierarchy is well-designed with:
- Proper use of enums for type safety
- Clear inheritance from DataPoint
- Appropriate type hints
- Well-defined metadata for indexing
110-143: LGTM with existing review feedback!
The fixture is well-structured and creates appropriate test data. Please address the error handling feedback from the previous review.
🧰 Tools
🪛 Ruff
138-138: Do not use bare except
(E722)
146-172: Well-structured test suite with good separation of concerns!
The test functions are well-organized with:
- Clear focus on specific components (person, car, edge, parsed person)
- Consistent use of the helper function
- Comprehensive validation including nested objects
from typing import List, Optional
from unittest.mock import Mock, create_autospec
from uuid import UUID, uuid4

from cognee.modules.users.models import User
Add missing import for Group class.
The code uses Group as a type hint but it's not imported. This will cause type checking issues.
Add the following import:
from typing import List, Optional
from unittest.mock import Mock, create_autospec
from uuid import UUID, uuid4
from cognee.modules.users.models import User
+from cognee.modules.users.models import Group  # Assuming it's in the same module as User
Committable suggestion skipped: line range outside the PR's diff.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (7)
cognee/tests/unit/run_tasks/conftest.py (2)
11-13: Enhance the warning message and mock user implementation.
The warning message could be more professional and the mock user implementation could be more explicit about its parameters.
Consider this improvement:
def get_mock_user_wrapper(): - warnings.warn("\n\n\n---------------USING MOCK USER--------------------\n\n\n") - return get_mock_user(None, None) + warnings.warn("Using mock user for testing purposes", UserWarning) + return get_mock_user(user_id=None, api_key=None)
15-18: Make the cleanup mechanism more robust.
While the current implementation works, it could be made more robust by ensuring cleanup happens even if an exception occurs during testing.
Consider this improvement:
-    get_default_user = cognee.modules.users.methods.get_default_user
-    cognee.modules.users.methods.get_default_user = get_mock_user_wrapper()
-    yield
-    cognee.modules.users.methods.get_default_user = get_default_user
+    original_get_default_user = cognee.modules.users.methods.get_default_user
+    try:
+        cognee.modules.users.methods.get_default_user = get_mock_user_wrapper()
+        yield
+    finally:
+        cognee.modules.users.methods.get_default_user = original_get_default_user
🧰 Tools
🪛 Ruff
15-15: Redefinition of unused get_default_user from line 4 (F811)
cognee/tests/unit/run_tasks/test_run_tasks.py (1)
Line range hint 10-42: Add comprehensive test scenarios and improve test maintainability.
The current implementation has several areas for improvement:
- Test only covers the happy path
- Print statement should be removed from test code
- Hardcoded results make maintenance difficult
Consider this refactoring:
async def run_and_check_tasks(): def number_generator(num): for i in range(num): yield i + 1 async def add_one(nums): for num in nums: yield num + 1 async def multiply_by_two(num): yield num * 2 async def add_one_single(num): yield num + 1 pipeline = run_tasks( [ Task(number_generator), Task(add_one, task_config={"batch_size": 5}), Task(multiply_by_two, task_config={"batch_size": 1}), Task(add_one_single), ], 10, pipeline_name="test_run_tasks", ) - results = [5, 7, 9, 11, 13, 15, 17, 19, 21, 23] + # Calculate expected results dynamically + expected_results = [(i + 1 + 1) * 2 + 1 for i in range(10)] index = 0 async for result in pipeline: - print(result) - assert result == results[index] + assert result == expected_results[index], f"Mismatch at index {index}" index += 1 + assert index == len(expected_results), "Not all results were processed"cognee/tests/unit/run_tasks/test_run_tasks_from_queue.py (3)
Line range hint 8-12: Fix potential race condition in queue_consumer
The queue_consumer function has a potential race condition between checking is_closed and empty(). The queue state could change between these checks.
Consider this safer implementation:
async def queue_consumer(): - while not data_queue.is_closed: - if not data_queue.empty(): - yield data_queue.get() - else: - await asyncio.sleep(0.3) + while True: + try: + if data_queue.is_closed and data_queue.empty(): + break + if not data_queue.empty(): + yield data_queue.get() + else: + await asyncio.sleep(0.3) + except Exception as e: + print(f"Error in queue_consumer: {e}") + break
21-28: Improve pipeline configuration and type safety
The pipeline configuration could be improved in several ways:
- Pipeline name is hardcoded which could cause conflicts in parallel tests
- Missing type hints which could help catch errors early
Consider these improvements:
+from typing import AsyncIterator, Any + tasks_run = run_tasks( [ Task(queue_consumer), Task(add_one), Task(multiply_by_two), ], - pipeline_name="test_run_tasks_from_queue", + pipeline_name=f"test_run_tasks_from_queue_{id(data_queue)}", # Ensure unique names )
Line range hint 38-48: Improve queue management and error handling
The queue management could be improved with proper cleanup and error handling.
Consider these improvements:
async def run_queue(): - data_queue = Queue() - data_queue.is_closed = False + data_queue = Queue() + data_queue.is_closed = False + + try: + async def queue_producer(): + try: + for i in range(0, 10): + data_queue.put(i) + await asyncio.sleep(0.1) + except Exception as e: + print(f"Error in producer: {e}") + finally: + data_queue.is_closed = True - async def queue_producer(): - for i in range(0, 10): - data_queue.put(i) - await asyncio.sleep(0.1) - data_queue.is_closed = True - - await asyncio.gather(pipeline(data_queue), queue_producer()) + await asyncio.gather(pipeline(data_queue), queue_producer()) + finally: + # Ensure queue is marked as closed even if an error occurs + data_queue.is_closed = Truecognee/tests/unit/utils/get_mock_user.py (1)
23-41: Consider adding validation for better robustness
The implementation is good, but could be enhanced with:
- Validation for additional_attributes to prevent overwriting critical attributes
- Type checking for the groups parameter to ensure each item is a Group mock
Here's a suggested improvement:
def get_mock_user( user_id: Optional[UUID] = None, groups: Optional[List["Group"]] = None, **additional_attributes ) -> Mock: # Generate a random UUID if none provided user_id = user_id or uuid4() # Create base mock mock_user = create_autospec(User, instance=True) # Configure basic attributes mock_user.id = user_id mock_user.__tablename__ = "users" - mock_user.groups = groups or [] + # Validate groups + if groups is not None: + if not isinstance(groups, list) or not all(isinstance(g, Mock) for g in groups): + raise ValueError("groups must be a list of Group mocks") + mock_user.groups = groups or [] # Set polymorphic identity mock_user.__mapper_args__ = {"polymorphic_identity": "user"} # Add any additional attributes + protected_attrs = {'id', '__tablename__', 'groups', '__mapper_args__'} for attr_name, attr_value in additional_attributes.items(): + if attr_name in protected_attrs: + raise ValueError(f"Cannot override protected attribute: {attr_name}") setattr(mock_user, attr_name, attr_value) return mock_user
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
cognee/tests/unit/run_tasks/conftest.py(1 hunks)cognee/tests/unit/run_tasks/test_run_tasks.py(3 hunks)cognee/tests/unit/run_tasks/test_run_tasks_from_queue.py(3 hunks)cognee/tests/unit/utils/get_mock_user.py(1 hunks)pytest.ini(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pytest.ini
🧰 Additional context used
🪛 Ruff
cognee/tests/unit/run_tasks/conftest.py
15-15: Redefinition of unused get_default_user from line 4
(F811)
cognee/tests/unit/utils/get_mock_user.py
9-9: Undefined name Group
(F821)
🔇 Additional comments (6)
cognee/tests/unit/run_tasks/conftest.py (1)
8-9: Verify the session-scoped mock's impact.
The session-scoped mock user affects all tests. Ensure this is intentional and that no tests require the actual user implementation.
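If a particular test package ever does need the real implementation, standard pytest fixture overriding offers an escape hatch. A sketch, assuming the session fixture keeps the name set_get_mock_user_wrapper used in this PR; note that because the original fixture patches module state for the whole session, an override like this only helps if those tests run before the patch is applied or in a separate session:

```python
# conftest.py of a test package that needs the real default user (hypothetical)
import pytest

@pytest.fixture(autouse=True)
def set_get_mock_user_wrapper():
    # Shadows the session-wide fixture of the same name, so it never runs
    # for tests collected under this conftest.
    yield
```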
cognee/tests/unit/run_tasks/test_run_tasks.py (3)
2-7: LGTM: Import statements are well-organized.
The new imports are properly structured and necessary for the test implementation.
25-34: 🛠️ Refactor suggestion
Add validation for task configurations and error handling.
The pipeline configuration lacks error handling and validation testing.
Add test cases for:
- Invalid batch sizes
- Failed task execution
- Pipeline termination
- Empty input
Example test case for invalid batch size:
async def test_invalid_batch_size():
with pytest.raises(ValueError, match="Batch size must be positive"):
pipeline = run_tasks(
[Task(add_one, task_config={"batch_size": -1})],
10,
pipeline_name="test_invalid_batch"
)
async for _ in pipeline:
            pass
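In the same spirit, a hedged sketch for the "failed task execution" case from the list above. It assumes run_tasks surfaces exceptions raised inside a task to the consuming async-for loop, which should be confirmed against the actual pipeline behaviour:

```python
async def test_failed_task_execution():
    async def failing_task(num):
        raise RuntimeError("boom")
        yield num  # unreachable; only here to make this an async generator

    pipeline = run_tasks(
        [Task(failing_task)],
        10,
        pipeline_name="test_failed_task",
    )
    with pytest.raises(RuntimeError, match="boom"):
        async for _ in pipeline:
            pass
```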
Line range hint 10-45: Verify test coverage for the pipeline module.
Let's ensure adequate test coverage for the pipeline functionality.
cognee/tests/unit/utils/get_mock_user.py (2)
1-5: Missing import for Group class
This issue was previously identified in a past review comment. The Group class is used as a type annotation but not imported.
7-22: Well-documented function signature!
The function signature is properly typed and the docstring provides comprehensive information about the parameters and return value.
🧰 Tools
🪛 Ruff
9-9: Undefined name Group
(F821)
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (7)
cognee/tests/unit/run_tasks/conftest.py (3)
8-9: Add docstring to explain the fixture's purpose and scope
The fixture would benefit from a clear docstring explaining its purpose, scope, and impact on tests.
@pytest.fixture(autouse=True, scope="session") def set_get_mock_user_wrapper(): + """ + Session-wide fixture that replaces the default user getter with a mock version. + This fixture is automatically applied to all tests in the session. + + Yields: + None: The fixture handles setup and teardown of the mock user functionality. + """
11-13: Enhance the warning message with more context
The current warning message could be more informative about the implications of using a mock user.
def get_mock_user_wrapper(): - warnings.warn("\n\n\n---------------USING MOCK USER--------------------\n\n\n") + warnings.warn( + "Using mock user for testing. This replaces the default user " + "authentication for the entire test session.", + UserWarning + ) return get_mock_user(None, None)
8-18: Consider using pytest's monkeypatch fixture
The current implementation modifies global state directly. Consider using pytest's built-in monkeypatch fixture for safer module-level modifications. Would you like me to provide an alternative implementation using monkeypatch? This would be safer as it ensures proper cleanup even if tests fail (a sketch follows below).
🧰 Tools
🪛 Ruff
15-15: Redefinition of unused get_default_user from line 4 (F811)
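A minimal sketch of what that monkeypatch-based alternative could look like, assuming the get_mock_user helper and module path used elsewhere in this PR (an illustration, not a confirmed implementation):

```python
import pytest
import cognee.modules.users.methods
from cognee.tests.unit.utils.get_mock_user import get_mock_user  # assumed import path

@pytest.fixture(autouse=True)
def set_get_mock_user_wrapper(monkeypatch):
    # monkeypatch undoes the setattr automatically, even if the test fails.
    monkeypatch.setattr(
        cognee.modules.users.methods,
        "get_default_user",
        lambda: get_mock_user(None, None),
    )
    yield
```

Because monkeypatch is function-scoped, this runs per test rather than once per session, trading a little speed for automatic, exception-safe restoration.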
cognee/tests/unit/run_tasks/test_run_tasks_from_queue.py (2)
Line range hint 6-28: Improve pipeline implementation with error handling and resource management
The pipeline implementation needs several improvements:
- The queue_consumer while loop should have a timeout mechanism
- Error handling is missing for queue operations
- The busy-wait loop with sleep could cause unnecessary CPU usage
Consider this improved implementation:
async def pipeline(data_queue): async def queue_consumer(): + timeout = 30 # seconds + start_time = asyncio.get_event_loop().time() while not data_queue.is_closed: + current_time = asyncio.get_event_loop().time() + if current_time - start_time > timeout: + raise TimeoutError("Queue consumer timed out") if not data_queue.empty(): - yield data_queue.get() + try: + yield data_queue.get() + except Exception as e: + raise RuntimeError(f"Error processing queue item: {e}") else: - await asyncio.sleep(0.3) + await asyncio.sleep(0.1) # Reduced sleep time
Line range hint 38-48: Add proper resource cleanup and error handling to queue runner
The queue runner implementation needs improvements in resource management and error handling.
Consider this improved implementation:
async def run_queue(): - data_queue = Queue() - data_queue.is_closed = False + data_queue = None + try: + data_queue = Queue() + data_queue.is_closed = False - async def queue_producer(): - for i in range(0, 10): - data_queue.put(i) - await asyncio.sleep(0.1) - data_queue.is_closed = True + async def queue_producer(): + try: + for i in range(0, 10): + data_queue.put(i) + await asyncio.sleep(0.1) + finally: + data_queue.is_closed = True - await asyncio.gather(pipeline(data_queue), queue_producer()) + await asyncio.gather(pipeline(data_queue), queue_producer()) + except Exception as e: + if data_queue: + data_queue.is_closed = True + raisecognee/tests/unit/integration/test_model_to_graph_to_model.py (2)
32-54: Consider applying DRY principle to ground truth data.
PERSON_GROUND_TRUTH and PARSED_PERSON_GROUND_TRUTH are identical. Consider using a single constant to avoid duplication and potential maintenance issues.
-PARSED_PERSON_GROUND_TRUTH = {
-    "id": "boris",
-    "name": "Boris",
-    "age": 30,
-    "driving_license": {
-        "issued_by": "PU Vrsac",
-        "issued_on": "2025-11-06",
-        "number": "1234567890",
-        "expires_on": "2025-11-06",
-    },
-}
+PARSED_PERSON_GROUND_TRUTH = PERSON_GROUND_TRUTH
93-109: Consider making the time delta check more CI-friendly.
The 20-second time delta check might be too strict for CI environments where test execution could be delayed. Consider:
- Making the time delta configurable via environment variable
- Using a mock clock for more reliable testing
+import os + def run_test_agains_ground_truth( test_target_item_name, test_target_item, ground_truth_dict ): for key, ground_truth in ground_truth_dict.items(): if isinstance(ground_truth, dict): for key2, ground_truth2 in ground_truth.items(): assert ( ground_truth2 == getattr(test_target_item, key)[key2] ), f"{test_target_item_name}/{key = }/{key2 = }: {ground_truth2 = } != {getattr(test_target_item, key)[key2] = }" else: assert ground_truth == getattr( test_target_item, key ), f"{test_target_item_name}/{key = }: {ground_truth = } != {getattr(test_target_item, key) = }" time_delta = datetime.now(timezone.utc) - getattr(test_target_item, "updated_at") - assert time_delta.total_seconds() < 20, f"{ time_delta.total_seconds() = }" + max_delta = float(os.getenv("MAX_TIME_DELTA_SECONDS", "20")) + assert time_delta.total_seconds() < max_delta, f"{ time_delta.total_seconds() = } >= {max_delta}"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (4)
cognee/modules/data/processing/document_types/__tests__/artificial-inteligence.pdfis excluded by!**/*.pdfcognee/modules/data/processing/document_types/__tests__/soldiers-home.pdfis excluded by!**/*.pdfcognee/modules/pipelines/operations/__tests__/artificial-inteligence.v1.pdfis excluded by!**/*.pdfcognee/modules/pipelines/operations/__tests__/artificial-inteligence.v2.pdfis excluded by!**/*.pdf
📒 Files selected for processing (15)
.github/workflows/test_python_3_10.yml(1 hunks).github/workflows/test_python_3_11.yml(1 hunks).github/workflows/test_python_3_9.yml(1 hunks)cognee/infrastructure/engine/__tests__/model_to_graph_to_model.test.py(0 hunks)cognee/modules/data/processing/document_types/__tests__/PdfDocument.test.py(0 hunks)cognee/modules/pipelines/operations/__tests__/get_graph_url.py(0 hunks)cognee/tasks/chunks/__tests__/chunk_by_paragraph.test.py(0 hunks)cognee/tests/unit/documents/pdf_document.py(1 hunks)cognee/tests/unit/integration/test_model_to_graph_to_model.py(1 hunks)cognee/tests/unit/processing/test_chunking.py(1 hunks)cognee/tests/unit/run_tasks/conftest.py(1 hunks)cognee/tests/unit/run_tasks/test_run_tasks.py(3 hunks)cognee/tests/unit/run_tasks/test_run_tasks_from_queue.py(3 hunks)cognee/tests/unit/utils/get_mock_user.py(1 hunks)pytest.ini(1 hunks)
💤 Files with no reviewable changes (4)
- cognee/infrastructure/engine/__tests__/model_to_graph_to_model.test.py
- cognee/modules/data/processing/document_types/__tests__/PdfDocument.test.py
- cognee/modules/pipelines/operations/__tests__/get_graph_url.py
- cognee/tasks/chunks/__tests__/chunk_by_paragraph.test.py
🚧 Files skipped from review as they are similar to previous changes (7)
- .github/workflows/test_python_3_10.yml
- .github/workflows/test_python_3_11.yml
- .github/workflows/test_python_3_9.yml
- cognee/tests/unit/documents/pdf_document.py
- cognee/tests/unit/processing/test_chunking.py
- cognee/tests/unit/run_tasks/test_run_tasks.py
- pytest.ini
🧰 Additional context used
🪛 Ruff
cognee/tests/unit/integration/test_model_to_graph_to_model.py
139-139: Do not use bare except
(E722)
cognee/tests/unit/run_tasks/conftest.py
15-15: Redefinition of unused get_default_user from line 4
(F811)
cognee/tests/unit/utils/get_mock_user.py
9-9: Undefined name Group
(F821)
🔇 Additional comments (9)
cognee/tests/unit/run_tasks/conftest.py (2)
1-6: LGTM: Imports are well-organized and necessary
The imports are logically structured and all serve a clear purpose in the fixture implementation.
8-18: Verify fixture placement and potential conflicts
Since this is a session-wide fixture that automatically affects all tests, we should verify its placement and potential conflicts.
✅ Verification successful
Fixture placement and potential conflicts verified
No conflicting fixtures found and placement is correct.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for other fixtures that might conflict with this one
echo "Checking for other fixtures that might handle user authentication..."
fd conftest.py | xargs rg -l "get_default_user"
# Check if this is the right location for this fixture
echo "Verifying the directory structure for unit tests..."
fd -t d "unit/run_tasks" -x ls -R {} \;
Length of output: 390
🧰 Tools
🪛 Ruff
15-15: Redefinition of unused get_default_user from line 4
(F811)
cognee/tests/unit/run_tasks/test_run_tasks_from_queue.py (2)
30-36: Previous review comment about improving test assertions is still valid
The validation logic still needs the improvements suggested in the previous review.
51-52: Previous review comment about improving test isolation and error handling is still valid
The test function implementation still needs the improvements suggested in the previous review.
cognee/tests/unit/utils/get_mock_user.py (3)
7-22: Well-documented function with clear type hints!
The function signature and docstring are well-structured, providing clear information about parameters and return type.
🧰 Tools
🪛 Ruff
9-9: Undefined name Group
(F821)
26-40: Verify mock configuration completeness.
While the basic mock setup looks good, consider verifying that all essential SQLAlchemy model attributes are properly mocked. This is particularly important for:
- Required columns that might cause issues if not set
- Relationships beyond
groups - Any model-specific methods that might be called during testing
Let's check the User model definition to ensure we haven't missed any critical attributes:
#!/bin/bash
# Search for User class definition and its attributes
ast-grep --pattern 'class User' -A 201-4:
Add missing import for Group type.
The Group type is used in the type hint but not imported. This will cause type checking issues.
Let's verify the correct import path for Group:
cognee/tests/unit/integration/test_model_to_graph_to_model.py (2)
57-91: LGTM! Well-structured data models with proper type safety.
The data models are well-designed with:
- Proper inheritance from DataPoint
- Clear type hints
- Enum-based validation for car types
- Appropriate metadata configuration for indexing
147-173: LGTM! Well-structured and comprehensive test coverage.
The test functions are well-organized with:
- Clear separation of concerns
- Focused test cases
- Comprehensive validation of model-to-graph-to-model conversion
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (6)
cognee/tests/unit/run_tasks/conftest.py (1)
11-13: Enhance the warning message with more context
The current warning message could be more informative by including details about the test context and implications of using a mock user.
-    warnings.warn("\n\n\n---------------USING MOCK USER--------------------\n\n\n")
+    warnings.warn(
+        "Using mock user for testing. This replaces the default user getter "
+        "throughout the test session. All authentication will use mock credentials."
+    )
cognee/tests/unit/utils/get_mock_user.py (1)
23-41: Consider adding safety checks for attribute overrides.
The implementation is good, but consider these safety improvements:
- Prevent overwriting of essential attributes via additional_attributes.
- Add type validation for elements in the groups list.
def get_mock_user( user_id: Optional[UUID] = None, groups: Optional[List["Group"]] = None, **additional_attributes ) -> Mock: # Generate a random UUID if none provided user_id = user_id or uuid4() # Create base mock mock_user = create_autospec(User, instance=True) # Configure basic attributes mock_user.id = user_id mock_user.__tablename__ = "users" - mock_user.groups = groups or [] + # Validate groups if provided + if groups is not None: + if not all(isinstance(g, (Group, Mock)) for g in groups): + raise TypeError("All elements in groups must be instances of Group or Mock") + mock_user.groups = groups or [] # Set polymorphic identity mock_user.__mapper_args__ = {"polymorphic_identity": "user"} + # Prevent overwriting essential attributes + protected_attrs = {"id", "__tablename__", "groups", "__mapper_args__"} + invalid_attrs = protected_attrs.intersection(additional_attributes.keys()) + if invalid_attrs: + raise ValueError(f"Cannot override protected attributes: {invalid_attrs}") # Add any additional attributes for attr_name, attr_value in additional_attributes.items(): setattr(mock_user, attr_name, attr_value) return mock_user.gitignore (1)
8-8: Improve directory pattern syntax
The *.lance/ pattern has an unconventional trailing slash. For directory patterns in gitignore, it's more conventional to use either *.lance or **/*.lance/. Choose one of these alternatives based on your needs:
-*.lance/
+*.lance
or, for recursive matching:
-*.lance/
+**/*.lance/
cognee/tests/unit/integration/test_model_to_graph_to_model.py (3)
32-54: Consider consolidating duplicate ground truth data.
PERSON_GROUND_TRUTHandPARSED_PERSON_GROUND_TRUTHare identical. Consider using a single constant to avoid maintenance overhead and potential inconsistencies.-PARSED_PERSON_GROUND_TRUTH = { - "id": "boris", - "name": "Boris", - "age": 30, - "driving_license": { - "issued_by": "PU Vrsac", - "issued_on": "2025-11-06", - "number": "1234567890", - "expires_on": "2025-11-06", - }, -}Then update the test to use
PERSON_GROUND_TRUTH:- "parsed_person", parsed_person, PARSED_PERSON_GROUND_TRUTH + "parsed_person", parsed_person, PERSON_GROUND_TRUTH
69-91: Add docstrings to model classes.
Consider adding docstrings to CarType, Car, and Person classes to document their purpose, attributes, and relationships. This will improve code maintainability and help other developers understand the test data model. Example for the Person class:
class Person(DataPoint): """Represents a person with car ownership information. Attributes: id (str): Unique identifier for the person name (str): Person's name age (int): Person's age owns_car (list[Car]): List of cars owned by the person driving_license (Optional[dict]): Driving license details if available _metadata (dict): Indexing configuration """
158-166: Improve edge test readability and maintainability.
The current edge test uses tuple slicing and manual dictionary comparison. Consider restructuring it to be more explicit and maintainable.
def test_extracted_edge(graph_outputs): (_, _, edge, _) = graph_outputs - assert ( - EDGE_GROUND_TRUTH[:3] == edge[:3] - ), f"{EDGE_GROUND_TRUTH[:3] = } != {edge[:3] = }" - for key, ground_truth in EDGE_GROUND_TRUTH[3].items(): - assert ground_truth == edge[3][key], f"{ground_truth = } != {edge[3][key] = }" + # Test edge components + source_id, target_id, relationship = edge[:3] + assert source_id == EDGE_GROUND_TRUTH[0], "Incorrect source node ID" + assert target_id == EDGE_GROUND_TRUTH[1], "Incorrect target node ID" + assert relationship == EDGE_GROUND_TRUTH[2], "Incorrect relationship name" + + # Test edge metadata + expected_metadata = EDGE_GROUND_TRUTH[3] + actual_metadata = edge[3] + assert actual_metadata == expected_metadata, ( + "Edge metadata mismatch:\n" + f"Expected: {expected_metadata}\n" + f"Actual: {actual_metadata}" + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (4)
cognee/modules/data/processing/document_types/__tests__/artificial-inteligence.pdfis excluded by!**/*.pdfcognee/modules/data/processing/document_types/__tests__/soldiers-home.pdfis excluded by!**/*.pdfcognee/modules/pipelines/operations/__tests__/artificial-inteligence.v1.pdfis excluded by!**/*.pdfcognee/modules/pipelines/operations/__tests__/artificial-inteligence.v2.pdfis excluded by!**/*.pdf
📒 Files selected for processing (16)
.github/workflows/test_python_3_10.yml(1 hunks).github/workflows/test_python_3_11.yml(1 hunks).github/workflows/test_python_3_9.yml(1 hunks).gitignore(1 hunks)cognee/infrastructure/engine/__tests__/model_to_graph_to_model.test.py(0 hunks)cognee/modules/data/processing/document_types/__tests__/PdfDocument.test.py(0 hunks)cognee/modules/pipelines/operations/__tests__/get_graph_url.py(0 hunks)cognee/tasks/chunks/__tests__/chunk_by_paragraph.test.py(0 hunks)cognee/tests/unit/documents/pdf_document.py(1 hunks)cognee/tests/unit/integration/test_model_to_graph_to_model.py(1 hunks)cognee/tests/unit/processing/test_chunking.py(1 hunks)cognee/tests/unit/run_tasks/conftest.py(1 hunks)cognee/tests/unit/run_tasks/test_run_tasks.py(3 hunks)cognee/tests/unit/run_tasks/test_run_tasks_from_queue.py(3 hunks)cognee/tests/unit/utils/get_mock_user.py(1 hunks)pytest.ini(1 hunks)
💤 Files with no reviewable changes (4)
- cognee/infrastructure/engine/tests/model_to_graph_to_model.test.py
- cognee/modules/data/processing/document_types/tests/PdfDocument.test.py
- cognee/modules/pipelines/operations/tests/get_graph_url.py
- cognee/tasks/chunks/tests/chunk_by_paragraph.test.py
🚧 Files skipped from review as they are similar to previous changes (8)
- .github/workflows/test_python_3_10.yml
- .github/workflows/test_python_3_11.yml
- .github/workflows/test_python_3_9.yml
- cognee/tests/unit/documents/pdf_document.py
- cognee/tests/unit/processing/test_chunking.py
- cognee/tests/unit/run_tasks/test_run_tasks.py
- cognee/tests/unit/run_tasks/test_run_tasks_from_queue.py
- pytest.ini
🧰 Additional context used
🪛 Ruff
cognee/tests/unit/integration/test_model_to_graph_to_model.py
139-139: Do not use bare except
(E722)
cognee/tests/unit/run_tasks/conftest.py
15-15: Redefinition of unused get_default_user from line 4
(F811)
cognee/tests/unit/utils/get_mock_user.py
9-9: Undefined name Group
(F821)
🔇 Additional comments (4)
cognee/tests/unit/run_tasks/conftest.py (2)
1-6: LGTM!
The imports are well-organized and all are necessary for the fixture's functionality.
15-18:
Fix the function replacement implementation
The current implementation has the same issues as previously identified:
- The wrapper function is being called immediately instead of being passed as a reference
- The module-level function replacement could be made more explicit
- get_default_user = cognee.modules.users.methods.get_default_user
- cognee.modules.users.methods.get_default_user = get_mock_user_wrapper()
+ # Store the original function
+ original_get_default_user = cognee.modules.users.methods.get_default_user
+ # Replace with the wrapper function (not its result)
+ cognee.modules.users.methods.get_default_user = get_mock_user_wrapper
yield
- cognee.modules.users.methods.get_default_user = get_default_user
+ # Restore the original function
+ cognee.modules.users.methods.get_default_user = original_get_default_user
🧰 Tools
🪛 Ruff
15-15: Redefinition of unused get_default_user from line 4
(F811)
cognee/tests/unit/utils/get_mock_user.py (2)
7-22: Well-documented function signature!
The function signature is properly typed and the docstring provides clear documentation of all parameters and return value.
🧰 Tools
🪛 Ruff
9-9: Undefined name Group
(F821)
7-41: Verify integration with existing test infrastructure.
Let's verify the integration with test files mentioned in the summary.
🧰 Tools
🪛 Ruff
9-9: Undefined name Group
(F821)
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (4)
cognee/tests/unit/utils/get_mock_user.py (1)
25-25: Enhance warning message for better visibility and context.
The current warning message could be more informative by including context about why a mock user is being used and potential implications.
Consider updating the warning message:
- warnings.warn("\n\n\n---------------USING MOCK USER--------------------\n\n\n") + warnings.warn( + "\nMOCK USER IN USE: This test is using a mock user instance instead of a real user." + "\nEnsure this is intended and that all necessary user attributes are properly mocked.", + UserWarning + )cognee/tests/unit/integration/test_model_to_graph_to_model.py (3)
43-53: Consider consolidating duplicate ground truth data.
PARSED_PERSON_GROUND_TRUTHis identical toPERSON_GROUND_TRUTH. Consider reusingPERSON_GROUND_TRUTHdirectly to avoid data duplication and potential maintenance issues.-PARSED_PERSON_GROUND_TRUTH = { - "id": "boris", - "name": "Boris", - "age": 30, - "driving_license": { - "issued_by": "PU Vrsac", - "issued_on": "2025-11-06", - "number": "1234567890", - "expires_on": "2025-11-06", - }, -}Then update the test to use
PERSON_GROUND_TRUTH:- "parsed_person", parsed_person, PARSED_PERSON_GROUND_TRUTH + "parsed_person", parsed_person, PERSON_GROUND_TRUTH
88-88: Add type hints for driving_license dictionary fields.Consider defining a TypedDict for driving_license to improve type safety and documentation.
+from typing import TypedDict + +class DrivingLicense(TypedDict): + issued_by: str + issued_on: str + number: str + expires_on: str + class Person(DataPoint): id: str name: str age: int owns_car: list[Car] - driving_license: Optional[dict] + driving_license: Optional[DrivingLicense]
146-172: Add docstrings to test functions.The test functions lack documentation. Consider adding docstrings to explain the purpose and expectations of each test.
Example for the first test:
def test_extracted_person(graph_outputs): """ Test that a Person instance is correctly extracted from the graph. Verifies that all person attributes match the ground truth data, including nested structures like driving_license. """ (_, person, _, _) = graph_outputs run_test_agains_ground_truth("person", person, PERSON_GROUND_TRUTH)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
cognee/tests/unit/documents/pdf_document.py(1 hunks)cognee/tests/unit/integration/test_model_to_graph_to_model.py(1 hunks)cognee/tests/unit/run_tasks/conftest.py(1 hunks)cognee/tests/unit/run_tasks/test_run_tasks.py(3 hunks)cognee/tests/unit/run_tasks/test_run_tasks_from_queue.py(3 hunks)cognee/tests/unit/utils/get_mock_user.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- cognee/tests/unit/documents/pdf_document.py
- cognee/tests/unit/run_tasks/conftest.py
- cognee/tests/unit/run_tasks/test_run_tasks.py
- cognee/tests/unit/run_tasks/test_run_tasks_from_queue.py
🧰 Additional context used
🪛 Ruff
cognee/tests/unit/integration/test_model_to_graph_to_model.py
138-138: Do not use bare except
(E722)
cognee/tests/unit/utils/get_mock_user.py
11-11: Undefined name Group
(F821)
🔇 Additional comments (6)
cognee/tests/unit/utils/get_mock_user.py (4)
1-7: Add missing import for Group type.
The Group type is used in type hints but not imported.
9-24: LGTM! Well-documented function signature.
The function signature is properly typed and the docstring provides comprehensive documentation of all parameters and return type.
🧰 Tools
🪛 Ruff
11-11: Undefined name Group
(F821)
30-39: Verify required User model attributes.
The mock configuration assumes certain attributes (__tablename__, __mapper_args__) exist on the User model. Let's verify these are actually required.
Run the following to check the User model implementation:
#!/bin/bash
# Description: Check if these attributes are defined in the User model
ast-grep --pattern 'class User {
$$$
__tablename__ = $$$
$$$
}'
ast-grep --pattern 'class User {
$$$
__mapper_args__ = $$$
$$$
}'41-44: Consider validating additional attributes.
The current implementation sets any additional attributes without validation. This could lead to setting attributes that don't exist on the real User model.
Let's verify the available User model attributes:
Consider adding validation:
# Add any additional attributes
+ valid_attributes = set(dir(User)) # Get all attributes of User class
for attr_name, attr_value in additional_attributes.items():
+ if attr_name not in valid_attributes:
+ warnings.warn(f"Setting non-standard attribute '{attr_name}' on mock user")
setattr(mock_user, attr_name, attr_value)cognee/tests/unit/integration/test_model_to_graph_to_model.py (2)
92-108: Skip comment: Existing review comment is still valid.
The previous review comment already covers the necessary improvements for the helper function, including fixing the typo, improving error messages, and making the time threshold configurable.
135-140: Skip comment: Existing review comment is still valid.
The previous review comment already covers the necessary improvements for error handling in the fixture.
🧰 Tools
🪛 Ruff
138-138: Do not use bare except
(E722)
    car, person = nodes[0], nodes[1]
    edge = edges[0]
🛠️ Refactor suggestion
Make node unpacking more robust.
The current implementation assumes a specific order of nodes, which is fragile. Consider using node IDs to find the correct nodes.
- car, person = nodes[0], nodes[1]
+ car = next(node for node in nodes if node.id == "car1")
+ person = next(node for node in nodes if node.id == "boris")
Committable suggestion skipped: line range outside the PR's diff.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (6)
cognee/tests/integration/run_toy_tasks/conftest.py (2)
1-4: Add docstring to explain fixture's purpose
The fixture lacks documentation about its purpose and usage.
Add a docstring:
import os import pytest + @pytest.fixture(autouse=True, scope="session") def copy_cognee_db_to_target_location(): + """Prepares the test environment by copying a test database file. + + This fixture runs automatically once per test session to ensure + the required database file is available for integration tests. + """
1-8: Consider implementing proper database setup/teardown
Instead of copying a static database file, consider implementing proper database initialization and cleanup. This would be more maintainable and less prone to data consistency issues.
Consider these alternatives:
- Use an in-memory database for tests
- Create a fresh database with test data programmatically
- Use database migrations to set up the test database
- Implement proper database fixtures that handle setup and teardown
Would you like me to provide an example implementation of any of these approaches?
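As a sketch of the first alternative, assuming plain sqlite3 is enough for the toy pipelines (the table name and rows below are invented for illustration, not taken from cognee):
import sqlite3

import pytest


@pytest.fixture()
def in_memory_db():
    # Build a throwaway database per test instead of copying a pre-built file.
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE pipeline_state (id INTEGER PRIMARY KEY, name TEXT, status TEXT)"
    )
    connection.execute(
        "INSERT INTO pipeline_state (name, status) VALUES ('toy_pipeline', 'ready')"
    )
    connection.commit()
    yield connection
    connection.close()  # teardown runs even when the test body fails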
cognee/tests/integration/run_toy_tasks/test_run_tasks.py (1)
44-45: Consider adding timeout and error handling
While the test runner is functional, consider enhancing it with timeout and error handling for improved robustness.
def test_run_tasks(): - asyncio.run(run_and_check_tasks()) + try: + # Add timeout to prevent infinite loops + asyncio.run(asyncio.wait_for(run_and_check_tasks(), timeout=30)) + except asyncio.TimeoutError: + pytest.fail("Test exceeded timeout") + except Exception as e: + pytest.fail(f"Test failed with error: {str(e)}")cognee/tests/integration/run_toy_tasks/test_run_tasks_from_queue.py (3)
22-29: Consider adding error handling for task failures
The pipeline configuration looks good, but it lacks error handling for potential task failures. This is particularly important for integration tests to ensure proper error reporting and cleanup.
Consider wrapping the tasks in try-except blocks or adding error handlers:
tasks_run = run_tasks( [ - Task(queue_consumer), - Task(add_one), - Task(multiply_by_two), + Task(queue_consumer, on_failure=lambda e: print(f"Queue consumer failed: {e}")), + Task(add_one, on_failure=lambda e: print(f"Add one failed: {e}")), + Task(multiply_by_two, on_failure=lambda e: print(f"Multiply failed: {e}")), ], pipeline_name="test_run_tasks_from_queue", )
Line range hint
39-49: Improve queue handling reliability and thread safety
The current implementation has several potential issues:
- Using a custom is_closed flag on Queue isn't thread-safe
- Fixed sleep delays could make tests flaky
- No cleanup handling for failure scenarios
Consider using asyncio.Queue instead:
- data_queue = Queue() - data_queue.is_closed = False + data_queue = asyncio.Queue() + queue_closed = asyncio.Event() async def queue_producer(): - for i in range(0, 10): - data_queue.put(i) - await asyncio.sleep(0.1) - data_queue.is_closed = True + try: + for i in range(0, 10): + await data_queue.put(i) + await asyncio.sleep(0.1) + finally: + queue_closed.set() - await asyncio.gather(pipeline(data_queue), queue_producer()) + try: + await asyncio.gather(pipeline(data_queue), queue_producer()) + except Exception: + queue_closed.set() + raiseAlso update the consumer accordingly:
async def queue_consumer(): while not queue_closed.is_set() or not data_queue.empty(): try: yield await asyncio.wait_for(data_queue.get(), timeout=0.3) data_queue.task_done() except asyncio.TimeoutError: continue
Integration tests are excluded from the default pytest configuration
The test
cognee/tests/integration/run_toy_tasks/test_run_tasks_from_queue.py is located in the integration tests directory, but pytest.ini is set to run only unit tests (addopts = cognee/tests/unit). Ensure integration tests are included in the test run or adjust their configuration accordingly.
Line range hint
1-53: Verify test location in the new testing structure
According to the AI summary, test execution paths are being refined to target unit tests in
cognee/tests/unit/. However, this integration test is located in cognee/tests/integration/. Let's verify the test organization:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check the current test organization structure fd -t d . cognee/tests/ # Check for similar integration tests fd -t f "test_.*queue.*" cognee/tests/ # Check pytest configuration cat pytest.iniLength of output: 432
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
cognee/tests/integration/run_toy_tasks/conftest.py(1 hunks)cognee/tests/integration/run_toy_tasks/test_run_tasks.py(3 hunks)cognee/tests/integration/run_toy_tasks/test_run_tasks_from_queue.py(3 hunks)
🔇 Additional comments (3)
cognee/tests/integration/run_toy_tasks/conftest.py (1)
6-6: Consider test isolation implications of session-scoped fixture
The autouse=True with scope="session" means this fixture will run once for all tests and modify system state. This could lead to test interference if multiple test sessions run concurrently or if tests expect a clean state.
Let's check if there are any concurrent test runs or dependencies:
cognee/tests/integration/run_toy_tasks/test_run_tasks.py (2)
Line range hint 10-41: Well-structured test implementation with clear naming
The renaming from main() to run_and_check_tasks() better reflects the function's purpose. The nested functions create a clear and logical pipeline for number transformation testing.
25-34: Verify pipeline_name parameter handling
The addition of pipeline_name="test_run_tasks" improves traceability, but let's verify how this parameter is handled in the run_tasks implementation.
✅ Verification successful
Pipeline_name parameter is properly handled
The pipeline_name="test_run_tasks" is correctly utilized in the run_tasks implementation for logging and telemetry purposes. No additional validation is required.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for run_tasks implementation to verify pipeline_name parameter handling
ast-grep --pattern 'def run_tasks($$$pipeline_name$$$):
$$$
'
# Search for any validation or error handling related to pipeline_name
rg -A 5 'pipeline_name' --type py
Length of output: 4248
Hey @borisarzentar, I moved the tests for running the toy tasks out of the unit test folder and into an integration test folder, as they require a SQLite database. I tried monkeypatching the database access but that didn't work, so now I copy over a file that enables these pipelines. By relying on outside state, they are no longer unit tests, so I moved them to the integration test folder.
64723a6 to
184b73b
Compare
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (4)
cognee/tests/unit/interfaces/test_model_to_graph_to_model.py (4)
33-55: Consider consolidating duplicate ground truth data.
PERSON_GROUND_TRUTHandPARSED_PERSON_GROUND_TRUTHcontain identical data. To avoid maintenance overhead and potential inconsistencies, consider using a single constant.-PARSED_PERSON_GROUND_TRUTH = { - "id": "boris", - "name": "Boris", - "age": 30, - "driving_license": { - "issued_by": "PU Vrsac", - "issued_on": "2025-11-06", - "number": "1234567890", - "expires_on": "2025-11-06", - }, -}Then update the test to use
PERSON_GROUND_TRUTH:- run_test_against_ground_truth( - "parsed_person", parsed_person, PARSED_PERSON_GROUND_TRUTH - ) + run_test_against_ground_truth( + "parsed_person", parsed_person, PERSON_GROUND_TRUTH + )
85-91: Enhance ground truth data structure to match model relationships.The
Personclass definesowns_caraslist[Car], but this relationship isn't reflected inPERSON_GROUND_TRUTH. Consider updating the ground truth to include the car relationship for more comprehensive testing.PERSON_GROUND_TRUTH = { "id": "boris", "name": "Boris", "age": 30, + "owns_car": [CAR_GROUND_TRUTH], "driving_license": { "issued_by": "PU Vrsac", "issued_on": "2025-11-06", "number": "1234567890", "expires_on": "2025-11-06", }, }
112-143: Add type hints and improve fixture organization.The fixture could benefit from type hints and better organization:
- Add return type hint for clarity
- Consider splitting the fixture into smaller, focused fixtures
+from typing import Tuple + @pytest.fixture(scope="session") -def graph_outputs(): +def graph_outputs() -> Tuple[Car, Person, tuple, Person]: boris = Person( id="boris", name="Boris", age=30, owns_car=[ Car( id="car1", brand="Toyota", model="Camry", year=2020, color="Blue", is_type=CarType(id="sedan", name=CarTypeName.Sedan), ) ], driving_license={ "issued_by": "PU Vrsac", "issued_on": "2025-11-06", "number": "1234567890", "expires_on": "2025-11-06", }, ) nodes, edges = get_graph_from_model(boris) car, person = nodes[0], nodes[1] edge = edges[0] parsed_person = get_model_instance_from_graph(nodes, edges, "boris") return (car, person, edge, parsed_person)Consider splitting into focused fixtures:
@pytest.fixture(scope="session") def test_person() -> Person: return Person(...) @pytest.fixture(scope="session") def graph_data(test_person) -> Tuple[list, list]: return get_graph_from_model(test_person) @pytest.fixture(scope="session") def parsed_person(graph_data) -> Person: nodes, edges = graph_data return get_model_instance_from_graph(nodes, edges, "boris")
145-171: Enhance test case descriptions and assertions.The test cases could benefit from more descriptive docstrings and better assertion messages:
def test_extracted_person(graph_outputs): + """Test that a Person object is correctly extracted from the graph representation.""" (_, person, _, _) = graph_outputs run_test_against_ground_truth("person", person, PERSON_GROUND_TRUTH) def test_extracted_car(graph_outputs): + """Test that a Car object is correctly extracted from the graph representation.""" (car, _, _, _) = graph_outputs run_test_against_ground_truth("car", car, CAR_GROUND_TRUTH) def test_extracted_edge(graph_outputs): + """Test that the relationship between Person and Car is correctly extracted.""" (_, _, edge, _) = graph_outputs assert ( EDGE_GROUND_TRUTH[:3] == edge[:3] - ), f"{EDGE_GROUND_TRUTH[:3] = } != {edge[:3] = }" + ), f"Edge tuple mismatch: expected {EDGE_GROUND_TRUTH[:3]}, got {edge[:3]}" for key, ground_truth in EDGE_GROUND_TRUTH[3].items(): - assert ground_truth == edge[3][key], f"{ground_truth = } != {edge[3][key] = }" + assert ground_truth == edge[3][key], ( + f"Edge metadata mismatch for {key}: " + f"expected {ground_truth}, got {edge[3][key]}" + ) def test_parsed_person(graph_outputs): + """Test that a Person object can be reconstructed from its graph representation.""" (_, _, _, parsed_person) = graph_outputs run_test_against_ground_truth( "parsed_person", parsed_person, PARSED_PERSON_GROUND_TRUTH ) run_test_against_ground_truth("car", parsed_person.owns_car[0], CAR_GROUND_TRUTH)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (4)
cognee/modules/data/processing/document_types/__tests__/artificial-inteligence.pdfis excluded by!**/*.pdfcognee/modules/data/processing/document_types/__tests__/soldiers-home.pdfis excluded by!**/*.pdfcognee/modules/pipelines/operations/__tests__/artificial-inteligence.v1.pdfis excluded by!**/*.pdfcognee/modules/pipelines/operations/__tests__/artificial-inteligence.v2.pdfis excluded by!**/*.pdf
📒 Files selected for processing (15)
.github/workflows/test_python_3_10.yml(1 hunks).github/workflows/test_python_3_11.yml(1 hunks).github/workflows/test_python_3_9.yml(1 hunks)cognee/infrastructure/engine/__tests__/model_to_graph_to_model.test.py(0 hunks)cognee/modules/data/processing/document_types/__tests__/PdfDocument.test.py(0 hunks)cognee/modules/pipelines/operations/__tests__/get_graph_url.py(0 hunks)cognee/tasks/chunks/__tests__/chunk_by_paragraph.test.py(0 hunks)cognee/tasks/documents/classify_documents.py(0 hunks)cognee/tests/integration/run_toy_tasks/conftest.py(1 hunks)cognee/tests/integration/run_toy_tasks/test_run_tasks.py(2 hunks)cognee/tests/integration/run_toy_tasks/test_run_tasks_from_queue.py(3 hunks)cognee/tests/unit/documents/pdf_document.py(1 hunks)cognee/tests/unit/interfaces/test_model_to_graph_to_model.py(1 hunks)cognee/tests/unit/processing/test_chunking.py(1 hunks)pytest.ini(0 hunks)
💤 Files with no reviewable changes (6)
- cognee/infrastructure/engine/tests/model_to_graph_to_model.test.py
- cognee/modules/data/processing/document_types/tests/PdfDocument.test.py
- cognee/modules/pipelines/operations/tests/get_graph_url.py
- cognee/tasks/chunks/tests/chunk_by_paragraph.test.py
- cognee/tasks/documents/classify_documents.py
- pytest.ini
🚧 Files skipped from review as they are similar to previous changes (8)
- .github/workflows/test_python_3_10.yml
- .github/workflows/test_python_3_11.yml
- .github/workflows/test_python_3_9.yml
- cognee/tests/integration/run_toy_tasks/conftest.py
- cognee/tests/integration/run_toy_tasks/test_run_tasks.py
- cognee/tests/integration/run_toy_tasks/test_run_tasks_from_queue.py
- cognee/tests/unit/documents/pdf_document.py
- cognee/tests/unit/processing/test_chunking.py
def run_test_against_ground_truth(
    test_target_item_name, test_target_item, ground_truth_dict
):
    for key, ground_truth in ground_truth_dict.items():
        if isinstance(ground_truth, dict):
            for key2, ground_truth2 in ground_truth.items():
                assert (
                    ground_truth2 == getattr(test_target_item, key)[key2]
                ), f"{test_target_item_name}/{key = }/{key2 = }: {ground_truth2 = } != {getattr(test_target_item, key)[key2] = }"
        else:
            assert ground_truth == getattr(
                test_target_item, key
            ), f"{test_target_item_name}/{key = }: {ground_truth = } != {getattr(test_target_item, key) = }"

    time_delta = datetime.now(timezone.utc) - getattr(test_target_item, "updated_at")

    assert time_delta.total_seconds() < 20, f"{ time_delta.total_seconds() = }"
🛠️ Refactor suggestion
Enhance test helper robustness and configurability.
The helper function has a few areas for improvement:
- The 20-second threshold for timestamp freshness is hardcoded.
- The function doesn't handle all possible data structures (e.g., lists).
- Error messages could be more descriptive.
Consider these improvements:
+# At the top of the file
+TIMESTAMP_FRESHNESS_THRESHOLD = 20 # seconds
+
def run_test_against_ground_truth(
- test_target_item_name, test_target_item, ground_truth_dict
+ test_target_item_name: str,
+ test_target_item: DataPoint,
+ ground_truth_dict: dict,
+ timestamp_threshold: int = TIMESTAMP_FRESHNESS_THRESHOLD,
):
for key, ground_truth in ground_truth_dict.items():
+ test_value = getattr(test_target_item, key)
+ if test_value is None:
+ assert ground_truth is None, (
+ f"{test_target_item_name}: {key} is None but ground truth is {ground_truth}"
+ )
+ continue
if isinstance(ground_truth, dict):
for key2, ground_truth2 in ground_truth.items():
assert (
- ground_truth2 == getattr(test_target_item, key)[key2]
+ ground_truth2 == test_value[key2]
), f"{test_target_item_name}/{key = }/{key2 = }: {ground_truth2 = } != {getattr(test_target_item, key)[key2] = }"
+ elif isinstance(ground_truth, list):
+ assert len(ground_truth) == len(test_value), (
+ f"{test_target_item_name}: {key} length mismatch. "
+ f"Expected {len(ground_truth)}, got {len(test_value)}"
+ )
+ for i, (expected, actual) in enumerate(zip(ground_truth, test_value)):
+ if isinstance(expected, dict):
+ run_test_against_ground_truth(
+ f"{test_target_item_name}[{i}]",
+ actual,
+ expected,
+ timestamp_threshold,
+ )
+ else:
+ assert expected == actual, (
+ f"{test_target_item_name}: {key}[{i}] mismatch. "
+ f"Expected {expected}, got {actual}"
+ )
else:
assert ground_truth == getattr(
test_target_item, key
), f"{test_target_item_name}/{key = }: {ground_truth = } != {getattr(test_target_item, key) = }"
time_delta = datetime.now(timezone.utc) - getattr(test_target_item, "updated_at")
- assert time_delta.total_seconds() < 20, f"{ time_delta.total_seconds() = }"
+ assert time_delta.total_seconds() < timestamp_threshold, (
+ f"Timestamp too old: {time_delta.total_seconds()}s > {timestamp_threshold}s"
+ )
Actionable comments posted: 10
🧹 Outside diff range and nitpick comments (20)
cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_test.py (2)
1-2: Use parentheses for line continuation instead of backslash
Replace the backslash with parentheses for a more Pythonic line continuation:
-from cognee.tests.unit.interfaces.graph.util import \ - run_test_against_ground_truth +from cognee.tests.unit.interfaces.graph.util import ( + run_test_against_ground_truth +)
4-14: Consider using relative dates instead of hardcoded dates
The hardcoded date "2025-11-06" for the driving license could cause test failures after it expires. Consider using relative dates based on the current date:
+from datetime import datetime, timedelta + +# At the top of the file +FUTURE_DATE = (datetime.now() + timedelta(days=365)).strftime("%Y-%m-%d") + PARSED_PERSON_GROUND_TRUTH = { "id": "boris", "name": "Boris", "age": 30, "driving_license": { "issued_by": "PU Vrsac", - "issued_on": "2025-11-06", + "issued_on": FUTURE_DATE, "number": "1234567890", - "expires_on": "2025-11-06", + "expires_on": FUTURE_DATE, }, }cognee/tests/unit/interfaces/graph/util.py (3)
2-3: Remove extra blank lineThere's an unnecessary extra blank line after the imports.
from datetime import datetime, timezone -
17-19: Improve timestamp validation
The timestamp validation could be more robust and configurable.
+ MAX_AGE_SECONDS = 20 # Consider making this configurable + + if not hasattr(test_target_item, "updated_at"): + raise AttributeError(f"{test_target_item_name} has no 'updated_at' attribute") + time_delta = datetime.now(timezone.utc) - getattr(test_target_item, "updated_at") - - assert time_delta.total_seconds() < 20, f"{ time_delta.total_seconds() = }" + age_seconds = time_delta.total_seconds() + assert age_seconds < MAX_AGE_SECONDS, ( + f"{test_target_item_name} is too old: " + f"age={age_seconds:.2f}s, max_age={MAX_AGE_SECONDS}s" + )
1-19: Consider moving utility to a common test utilities locationThis validation function seems generally useful for comparing objects against ground truth data. Consider moving it to a common test utilities location (e.g.,
cognee/tests/utils.py) to promote reuse across different test suites.cognee/tests/integration/run_toy_tasks/run_tasks_test.py (1)
33-39: Improve test readability with calculated expected resultsInstead of using a hardcoded results array, consider calculating the expected results to make the test more maintainable and self-documenting.
- results = [5, 7, 9, 11, 13, 15, 17, 19, 21, 23] + # Calculate expected results: (i + 1 + 1) * 2 + 1 for i in range(10) + results = [(i + 2) * 2 + 1 for i in range(10)] index = 0 async for result in pipeline: assert ( result == results[index] - ), f"at {index = }: {result = } != {results[index] = }" + ), f"Pipeline transformation failed at index {index}. Got {result}, expected {results[index]} (((index + 1) + 1) * 2 + 1)" index += 1cognee/tests/unit/documents/PdfDocument_test.py (2)
7-10: Document the ground truth data source and format
The GROUND_TRUTH constant lacks documentation about where these values come from and what they represent.
+# Ground truth values extracted from 'artificial-intelligence.pdf' +# These values represent expected metrics for the first two paragraphs: +# - word_count: Number of words in the paragraph +# - len_text: Total length of the paragraph text +# - cut_type: How the paragraph was split (sentence_end, max_length, etc.) GROUND_TRUTH = [ {"word_count": 879, "len_text": 5622, "cut_type": "sentence_end"}, {"word_count": 951, "len_text": 6384, "cut_type": "sentence_end"}, ]
24-35: Improve test assertions structure and error messages
The current assertions could be better structured for clarity and maintainability. Consider using pytest's built-in features for better error reporting.
Here's a suggested improvement:
- for ground_truth, paragraph_data in zip( - GROUND_TRUTH, pdf_doc.read(chunk_size=1024) - ): - assert ( - ground_truth["word_count"] == paragraph_data.word_count - ), f'{ground_truth["word_count"] = } != {paragraph_data.word_count = }' - assert ground_truth["len_text"] == len( - paragraph_data.text - ), f'{ground_truth["len_text"] = } != {len(paragraph_data.text) = }' - assert ( - ground_truth["cut_type"] == paragraph_data.cut_type - ), f'{ground_truth["cut_type"] = } != {paragraph_data.cut_type = }' +def test_pdf_document_read(pdf_document): + for i, (expected, actual) in enumerate(zip( + GROUND_TRUTH, + pdf_document.read(chunk_size=1024) + )): + # Test each paragraph's metrics + assert actual.word_count == expected["word_count"], \ + f"Paragraph {i}: Wrong word count" + assert len(actual.text) == expected["len_text"], \ + f"Paragraph {i}: Wrong text length" + assert actual.cut_type == expected["cut_type"], \ + f"Paragraph {i}: Wrong cut type"cognee/tests/integration/run_toy_tasks/run_task_from_queue_test.py (5)
22-29: Add documentation for the pipeline configuration
The pipeline setup would benefit from a docstring explaining the transformation chain and why this specific pipeline name is used.
Consider adding a docstring:
+ # Pipeline that processes numbers by adding 1 and then multiplying by 2 + # Example: input 0 -> (0+1)*2 = 2, input 1 -> (1+1)*2 = 4, etc. tasks_run = run_tasks( [ Task(queue_consumer), Task(add_one), Task(multiply_by_two), ], pipeline_name="test_run_tasks_from_queue", )
31-37: Improve test readability and error messages
The hardcoded results list and basic assertion message make it difficult to understand the test's logic and debug failures.
Consider this improvement:
- results = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20] + # Generate expected results using the same transformation logic + results = [(i + 1) * 2 for i in range(10)] index = 0 async for result in tasks_run: assert ( result == results[index] - ), f"at {index = }: {result = } != {results[index] = }" + ), f"Transform failed for input {index}. Expected (({index}+1)*2)={results[index]}, but got {result}" index += 1
Line range hint
40-50: Consider using asyncio.Queue for better async integration
The current implementation uses a standard Queue with a custom closure mechanism. Using asyncio.Queue would provide better integration with async operations and built-in task coordination.
Consider this improvement:
async def run_queue(): - data_queue = Queue() - data_queue.is_closed = False + data_queue = asyncio.Queue() + producer_done = asyncio.Event() async def queue_producer(): for i in range(0, 10): - data_queue.put(i) + await data_queue.put(i) await asyncio.sleep(0.1) - data_queue.is_closed = True + producer_done.set() await asyncio.gather(pipeline(data_queue), queue_producer())This change would require updating the queue_consumer to use:
async def queue_consumer(): while not producer_done.is_set() or not data_queue.empty(): try: yield await data_queue.get() except asyncio.QueueEmpty: await asyncio.sleep(0.3)
53-54: Add error handling and cleanup to the test function
The test function should handle potential asyncio errors and ensure proper cleanup.
Consider this improvement:
def test_run_tasks_from_queue(): - asyncio.run(run_queue()) + try: + asyncio.run(run_queue()) + except Exception as e: + pytest.fail(f"Test failed with error: {e}") + finally: + # Reset event loop policy to prevent interference with other tests + asyncio.set_event_loop_policy(None)
Unintended Dependencies in Integration Tests
- cognee/tests/integration/run_toy_tasks/conftest.py imports from cognee.tests.unit
- cognee/tests/integration/run_toy_tasks/run_task_from_queue_test.py imports from cognee.tests.unit
- cognee/tests/integration/run_toy_tasks/run_tasks_test.py imports from cognee.tests.unit
Line range hint
1-54: Verify integration test directory structure
The move to integration tests is appropriate given the external state dependencies mentioned in the PR comments. Let's verify the directory structure is consistent.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify integration test structure and related files # Check if there are any remaining references to this test in the unit test directory rg -l "run_task_from_queue_test" cognee/tests/unit/ # Verify the integration test directory structure find cognee/tests/integration -type f -name "*test.py" -exec dirname {} \; | sort | uniq -c # Check for similar database-dependent tests that might need to be moved rg -l "sqlite" cognee/tests/unit/Length of output: 229
Script:
#!/bin/bash # Description: Further verify integration test isolation and absence of unintended dependencies # Check for any import statements in integration tests that reference unit tests rg "from cognee\.tests\.unit" cognee/tests/integration/**/*.py # Verify that integration tests do not import modules exclusively meant for unit tests rg "import .*" cognee/tests/integration/**/*.py | grep "cognee/modules/unit" # Ensure no test files are duplicated between unit and integration directories fd -e py -x basename {} \; | sort | uniq -dLength of output: 609
cognee/tests/unit/interfaces/graph/get_graph_from_model_test.py (2)
4-5: Use parentheses for line continuation instead of backslash
Python style guides recommend using parentheses for line continuation instead of backslashes.
-from cognee.tests.unit.interfaces.graph.util import \ - run_test_against_ground_truth +from cognee.tests.unit.interfaces.graph.util import ( + run_test_against_ground_truth +)
40-44: Enhance test functions with docstrings and consistent assertions
The test functions could benefit from the following improvements:
- Add docstrings to describe test purposes
- Simplify edge comparison using dictionary comparison
- Use consistent assertion message formatting
def test_extracted_person(graph_outputs): + """Test that extracted person data matches the ground truth.""" (_, person, _, _) = graph_outputs run_test_against_ground_truth("person", person, PERSON_GROUND_TRUTH) def test_extracted_car(graph_outputs): + """Test that extracted car data matches the ground truth.""" (car, _, _, _) = graph_outputs run_test_against_ground_truth("car", car, CAR_GROUND_TRUTH) def test_extracted_edge(graph_outputs): + """Test that extracted edge data matches the ground truth.""" (_, _, edge, _) = graph_outputs - assert ( - EDGE_GROUND_TRUTH[:3] == edge[:3] - ), f"{EDGE_GROUND_TRUTH[:3] = } != {edge[:3] = }" - for key, ground_truth in EDGE_GROUND_TRUTH[3].items(): - assert ground_truth == edge[3][key], f"{ground_truth = } != {edge[3][key] = }" + # Compare tuple elements + assert EDGE_GROUND_TRUTH[:3] == edge[:3], ( + f"Expected edge data {EDGE_GROUND_TRUTH[:3]}, got {edge[:3]}" + ) + # Compare metadata dictionary + assert EDGE_GROUND_TRUTH[3] == edge[3], ( + f"Expected metadata {EDGE_GROUND_TRUTH[3]}, got {edge[3]}" + )Also applies to: 46-49, 51-58
cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py (2)
3-38: Consider enhancing test data coverage.
The test data could be more comprehensive to cover edge cases (a parametrized sketch follows this list):
- Empty text
- Single-word paragraphs
- Very long paragraphs
- Special characters and Unicode
- Multiple consecutive newlines
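A self-contained sketch of how those cases could be parametrized; the split_paragraphs helper below is a stand-in so the example runs on its own, and the expected counts are assumptions, so the real test would call the chunking task and use hand-checked ground truth instead:
import pytest


def split_paragraphs(text: str) -> list[str]:
    # Stand-in for the real paragraph chunker, used only to keep the sketch runnable.
    return [part for part in text.split("\n\n") if part.strip()]


@pytest.mark.parametrize(
    "text, expected_count",
    [
        ("", 0),                      # empty text
        ("word", 1),                  # single-word paragraph
        ("one\n\n\n\ntwo", 2),        # multiple consecutive newlines
        ("emoji ✨ and ünïcode", 1),  # special characters and Unicode
    ],
)
def test_edge_case_paragraph_counts(text, expected_count):
    assert len(split_paragraphs(text)) == expected_count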
64-69: Enhance test coverage and documentation.Consider the following improvements:
- Add docstrings explaining the purpose of each test
- Add negative test cases (invalid inputs, edge cases)
- Consider using pytest.mark.parametrize for more concise test cases
Example improvement:
import pytest @pytest.mark.parametrize("test_case,expected", [ ("whole_text", "whole_text"), ("cut_text", "cut_text"), ]) def test_chunking(test_case, expected): """ Test paragraph chunking functionality. Args: test_case: Type of input text to test expected: Expected ground truth key """ run_chunking_test(INPUT_TEXT[test_case], GROUND_TRUTH[expected])cognee/tests/unit/interfaces/graph/conftest.py (3)
35-40: Consider using relative dates for driving license validityThe hardcoded future dates (2025-11-06) in the driving license ground truth data may cause test failures when these dates expire. Consider using relative dates based on the current date to ensure the tests remain valid over time.
- "issued_on": "2025-11-06", - "expires_on": "2025-11-06", + "issued_on": datetime.now(timezone.utc).strftime("%Y-%m-%d"), + "expires_on": datetime.now(timezone.utc).replace(year=datetime.now().year + 2).strftime("%Y-%m-%d"),
62-69: Consider adding index_fields metadata to Car modelBoth CarType and Person models have index_fields metadata defined, but Car model is missing it. Consider adding index_fields for consistent indexing across models.
class Car(DataPoint): id: str brand: str model: str year: int color: str is_type: CarType + _metadata: dict = dict(index_fields=["brand", "model"])
80-81: Add return type hint to the fixtureThe fixture's return type should be explicitly defined for better code maintainability and IDE support.
@pytest.fixture(scope="session") -def graph_outputs(): +def graph_outputs() -> tuple[Car, Person, tuple, Person]:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (9)
cognee/tests/integration/run_toy_tasks/conftest.py(1 hunks)cognee/tests/integration/run_toy_tasks/run_task_from_queue_test.py(3 hunks)cognee/tests/integration/run_toy_tasks/run_tasks_test.py(2 hunks)cognee/tests/unit/documents/PdfDocument_test.py(1 hunks)cognee/tests/unit/interfaces/graph/conftest.py(1 hunks)cognee/tests/unit/interfaces/graph/get_graph_from_model_test.py(1 hunks)cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_test.py(1 hunks)cognee/tests/unit/interfaces/graph/util.py(1 hunks)cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- cognee/tests/integration/run_toy_tasks/conftest.py
🔇 Additional comments (10)
cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_test.py (1)
25-30: Verify test discovery and similar patterns
Let's ensure this test is properly discovered and follows consistent patterns with other tests.
cognee/tests/integration/run_toy_tasks/run_tasks_test.py (2)
Line range hint 7-21: LGTM: Well-structured pipeline setup
The nested functions are well-organized, following a clear pattern with single responsibilities. The use of async generators is appropriate for streaming data processing.
42-43: LGTM: Clean test runner implementation
The test runner function follows pytest conventions and correctly handles the async-to-sync transition.
cognee/tests/integration/run_toy_tasks/run_task_from_queue_test.py (1)
1-7: LGTM: Clean and appropriate imports
The imports are well-organized with a clear separation between standard library and custom modules.
cognee/tests/unit/interfaces/graph/get_graph_from_model_test.py (1)
1-58: Verify test categorization as unit test
Based on the PR objectives mentioning the reorganization of tests between unit and integration folders, please verify if this test should remain in the unit test folder. If the test requires external state (like a database) or complex dependencies, it might be better suited as an integration test.
cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py (3)
1-1: LGTM! Clean and focused import.
1-69: Verify test categorization.
Given the PR objectives mention reorganizing tests between unit and integration folders, please verify that this test belongs in the unit test folder. If it requires any external state (like the mentioned SQLite database), it might need to be moved to the integration test folder.
#!/bin/bash
# Check if chunk_by_paragraph has any database dependencies
rg -l "sqlite|database|db" --type py | xargs rg "chunk_by_paragraph"
7-8: Verify word count accuracy and cut type logic.
The word count and cut type assertions need verification:
- "Third paragraph is cut and is missing the dot at the end" is marked as 12 words but appears to have 13 words
- The logic for determining "sentence_cut" vs "sentence_end" should be verified
Also applies to: 33-35
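A quick local check of the count, using the sentence exactly as quoted above; the chunker itself may tokenize differently, so this only establishes the plain whitespace count:
sentence = "Third paragraph is cut and is missing the dot at the end"
print(len(sentence.split()))  # prints 12 for a plain whitespace split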
cognee/tests/unit/interfaces/graph/conftest.py (2)
44-54: LGTM!
The CarTypeName enum is well-structured with comprehensive coverage of car types.
80-110: Consider moving this fixture to integration tests
This fixture creates a complete graph with database-like structures and complex object relationships. Based on the PR objectives mentioning the separation of unit and integration tests, this might be better suited for the integration test directory since it's testing the interaction between multiple components.
cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_test.py
    for ground_truth, paragraph_data in zip(
        GROUND_TRUTH, pdf_doc.read(chunk_size=1024)
    ):
💡 Codebase verification
Chunk size not adequately tested
Tests do not verify different chunk sizes, which may affect the results.
- cognee/tests/unit/documents/PdfDocument_test.py lines 24-26
🔗 Analysis chain
Verify chunk size handling
The test uses a fixed chunk size of 1024 bytes but doesn't verify how different chunk sizes might affect the results.
Let's verify if the chunk size is documented or tested elsewhere:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for other tests or documentation of chunk size handling
rg -g '!*.pyc' -g '!*.pdf' "chunk_size" ./cognee/
Length of output: 7381
Script:
#!/bin/bash
# Search for different chunk_size values in test files
rg -g 'tests/**/*.py' 'chunk_size\s*=\s*(?!1024)\d+' ./cognee/
Length of output: 313
Script:
#!/bin/bash
# Search for different chunk_size values in test files, excluding 1024
rg -g 'tests/**/*.py' 'chunk_size\s*=\s*\d+' ./cognee/ | grep -v 'chunk_size\s*=\s*1024'
Length of output: 90
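Since only chunk_size=1024 appears in the tests above, one option is a parametrized check across several sizes. The reader below is a byte-slicing stand-in so the sketch runs on its own; against the real PdfDocument.read the invariant would need restating (for example, total word count across paragraphs) since it yields paragraph chunks rather than raw slices:
import pytest


def read_in_chunks(text: str, chunk_size: int):
    # Stand-in for a chunked reader; yields fixed-size slices of the input.
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


@pytest.mark.parametrize("chunk_size", [64, 256, 1024, 4096])
def test_chunk_size_preserves_content(chunk_size):
    text = "paragraph one. " * 200
    assert "".join(read_in_chunks(text, chunk_size)) == text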
cognee/tests/unit/interfaces/graph/get_graph_from_model_test.py
d7a1c28 to
58a733c
Compare
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (6)
cognee/tests/integration/run_toy_tasks/run_task_from_queue_test.py (3)
34-36: Enhance assertion error message for better debugging
While the current error message includes the index and values, it could be more descriptive about the operation being tested.
Consider updating the assertion message to be more descriptive:
- ), f"at {index = }: {result = } != {results[index] = }" + ), f"Pipeline result mismatch at step {index}: got {result}, expected {results[index]} (after add_one and multiply_by_two operations)"
53-54: Add test isolation and cleanup
The test function should handle cleanup of resources and ensure proper isolation.
Consider adding setup and teardown:
+import pytest + +@pytest.fixture(autouse=True) +async def setup_teardown(): + # Setup (if needed) + yield + # Cleanup any remaining queue items or reset state + await asyncio.sleep(0.1) # Allow any pending tasks to complete + def test_run_tasks_from_queue(): asyncio.run(run_queue())
Line range hint
1-54: Consider adding comprehensive test scenarios
While the current test covers the happy path, consider adding tests for the following (a sketch of the first case follows this list):
- Empty queue handling
- Error scenarios (e.g., malformed data in queue)
- Queue closure before completion
- Concurrent queue operations with multiple producers
This would provide better coverage of edge cases and failure scenarios.
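For the first of these, a self-contained sketch that reuses the is_closed convention from this module (helper names invented): an already-closed, empty queue should simply yield nothing.
import asyncio
from queue import Queue


async def drain(queue: Queue):
    # Mirrors the queue_consumer pattern: stop once the queue is closed and empty.
    while not queue.is_closed or not queue.empty():
        if queue.empty():
            await asyncio.sleep(0.01)
        else:
            yield queue.get()


def test_empty_closed_queue_yields_nothing():
    queue = Queue()
    queue.is_closed = True  # producer finished without putting anything

    async def collect():
        return [item async for item in drain(queue)]

    assert asyncio.run(collect()) == []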
cognee/tests/unit/interfaces/graph/conftest.py (3)
14-24: Add docstring to improve code documentation.
Consider adding a docstring to explain the purpose and usage of this enum.
class CarTypeName(Enum): + """Enumeration of supported car types used for classification and validation.""" Pickup = "Pickup" Sedan = "Sedan"
26-48: Enhance type safety and documentation of data models.
Consider the following improvements:
- Add docstrings to describe each model's purpose and relationships
- Use TypedDict for driving_license structure
- Make type hints more specific where possible
+from typing import TypedDict + +class DrivingLicense(TypedDict): + issued_by: str + issued_on: str + number: str + expires_on: str class CarType(DataPoint): + """Represents a classification type for vehicles.""" id: str name: CarTypeName _metadata: dict = dict(index_fields=["name"]) class Car(DataPoint): + """Represents a vehicle with its properties and type classification.""" id: str brand: str model: str year: int color: str is_type: CarType class Person(DataPoint): + """Represents an individual with their properties and owned vehicles.""" id: str name: str age: int owns_car: list[Car] - driving_license: Optional[dict] + driving_license: Optional[DrivingLicense] _metadata: dict = dict(index_fields=["name"])
1-1: Add docstring explaining the conftest.py purpose.
In response to @borisarzentar's question:
conftest.py is a special pytest filename that automatically shares fixtures across multiple test files in the same directory and its subdirectories. This is a pytest convention for creating reusable test components.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (4)
cognee/modules/data/processing/document_types/__tests__/artificial-inteligence.pdfis excluded by!**/*.pdfcognee/modules/data/processing/document_types/__tests__/soldiers-home.pdfis excluded by!**/*.pdfcognee/modules/pipelines/operations/__tests__/artificial-inteligence.v1.pdfis excluded by!**/*.pdfcognee/modules/pipelines/operations/__tests__/artificial-inteligence.v2.pdfis excluded by!**/*.pdf
📒 Files selected for processing (18)
.github/workflows/test_python_3_10.yml(1 hunks).github/workflows/test_python_3_11.yml(1 hunks).github/workflows/test_python_3_9.yml(1 hunks)cognee/infrastructure/engine/__tests__/model_to_graph_to_model.test.py(0 hunks)cognee/modules/data/processing/document_types/__tests__/PdfDocument.test.py(0 hunks)cognee/modules/pipelines/operations/__tests__/get_graph_url.py(0 hunks)cognee/tasks/chunks/__tests__/chunk_by_paragraph.test.py(0 hunks)cognee/tasks/documents/classify_documents.py(1 hunks)cognee/tests/integration/run_toy_tasks/conftest.py(1 hunks)cognee/tests/integration/run_toy_tasks/run_task_from_queue_test.py(3 hunks)cognee/tests/integration/run_toy_tasks/run_tasks_test.py(2 hunks)cognee/tests/unit/documents/PdfDocument_test.py(1 hunks)cognee/tests/unit/interfaces/graph/conftest.py(1 hunks)cognee/tests/unit/interfaces/graph/get_graph_from_model_test.py(1 hunks)cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_test.py(1 hunks)cognee/tests/unit/interfaces/graph/util.py(1 hunks)cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py(1 hunks)pytest.ini(0 hunks)
💤 Files with no reviewable changes (5)
- cognee/infrastructure/engine/tests/model_to_graph_to_model.test.py
- cognee/modules/data/processing/document_types/tests/PdfDocument.test.py
- cognee/modules/pipelines/operations/tests/get_graph_url.py
- cognee/tasks/chunks/tests/chunk_by_paragraph.test.py
- pytest.ini
🚧 Files skipped from review as they are similar to previous changes (10)
- .github/workflows/test_python_3_10.yml
- .github/workflows/test_python_3_11.yml
- .github/workflows/test_python_3_9.yml
- cognee/tasks/documents/classify_documents.py
- cognee/tests/integration/run_toy_tasks/conftest.py
- cognee/tests/integration/run_toy_tasks/run_tasks_test.py
- cognee/tests/unit/documents/PdfDocument_test.py
- cognee/tests/unit/interfaces/graph/get_graph_from_model_test.py
- cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_test.py
- cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py
🔇 Additional comments (3)
cognee/tests/integration/run_toy_tasks/run_task_from_queue_test.py (1)
22-29: LGTM: Well-structured task pipeline with improved traceability
The addition of the pipeline_name parameter enhances observability and provides clear context for the test execution.
cognee/tests/unit/interfaces/graph/util.py (1)
5-17: Consider relocating this utility to the integration test folder
Based on the PR objectives mentioning moving tests with external state dependencies to integration tests, this utility function might be better placed in the integration test folder if it's primarily used with tests that require external state (like SQLite database).
Let's check its usage:
✅ Verification successful
Utility function is appropriately located in the unit test folder
The run_test_against_ground_truth function is only used within unit tests, hence it does not require relocation to the integration test folder.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for usage of run_test_against_ground_truth
rg "run_test_against_ground_truth" --type py
Length of output: 955
cognee/tests/unit/interfaces/graph/conftest.py (1)
1-11: LGTM! Clean and well-organized imports.
The imports are properly organized with standard library imports separated from third-party imports.
    time_delta = datetime.now(timezone.utc) - getattr(test_target_item, "updated_at")

    assert time_delta.total_seconds() < 60, f"{ time_delta.total_seconds() = }"
🛠️ Refactor suggestion
Improve timestamp validation robustness
The timestamp validation needs improvements in several areas:
- No handling for missing updated_at attribute
- Hardcoded 60-second threshold
- Non-descriptive error message
Apply this diff:
+ MAX_AGE_SECONDS = 60 # Consider making this configurable
+
+ if not hasattr(test_target_item, "updated_at"):
+ raise AttributeError(f"{test_target_item_name} missing required 'updated_at' attribute")
+
time_delta = datetime.now(timezone.utc) - getattr(test_target_item, "updated_at")
- assert time_delta.total_seconds() < 60, f"{ time_delta.total_seconds() = }"
+ assert time_delta.total_seconds() < MAX_AGE_SECONDS, (
+ f"{test_target_item_name} is too old: "
+ f"age={time_delta.total_seconds():.2f}s > threshold={MAX_AGE_SECONDS}s"
+ )Committable suggestion skipped: line range outside the PR's diff.
    for key, ground_truth in ground_truth_dict.items():
        if isinstance(ground_truth, dict):
            for key2, ground_truth2 in ground_truth.items():
                assert (
                    ground_truth2 == getattr(test_target_item, key)[key2]
                ), f"{test_target_item_name}/{key = }/{key2 = }: {ground_truth2 = } != {getattr(test_target_item, key)[key2] = }"
        else:
            assert ground_truth == getattr(
                test_target_item, key
            ), f"{test_target_item_name}/{key = }: {ground_truth = } != {getattr(test_target_item, key) = }"
Add robust error handling for missing attributes
The current implementation could raise unclear exceptions when attributes or dictionary keys are missing. The previous review suggestion for better error handling hasn't been addressed yet.
Apply this diff to improve error handling:
for key, ground_truth in ground_truth_dict.items():
+ if not hasattr(test_target_item, key):
+ raise AttributeError(f"{test_target_item_name} missing required attribute '{key}'")
+ target_value = getattr(test_target_item, key)
+
if isinstance(ground_truth, dict):
+ if not isinstance(target_value, dict):
+ raise TypeError(f"Expected dict for '{key}', got {type(target_value)}")
for key2, ground_truth2 in ground_truth.items():
+ if key2 not in target_value:
+ raise KeyError(f"Missing key '{key2}' in {key}")
assert (
- ground_truth2 == getattr(test_target_item, key)[key2]
+ ground_truth2 == target_value[key2]
), f"{test_target_item_name}/{key = }/{key2 = }: {ground_truth2 = } != {target_value[key2] = }"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-for key, ground_truth in ground_truth_dict.items():
-    if isinstance(ground_truth, dict):
-        for key2, ground_truth2 in ground_truth.items():
-            assert (
-                ground_truth2 == getattr(test_target_item, key)[key2]
-            ), f"{test_target_item_name}/{key = }/{key2 = }: {ground_truth2 = } != {getattr(test_target_item, key)[key2] = }"
-    else:
-        assert ground_truth == getattr(
-            test_target_item, key
-        ), f"{test_target_item_name}/{key = }: {ground_truth = } != {getattr(test_target_item, key) = }"
+for key, ground_truth in ground_truth_dict.items():
+    if not hasattr(test_target_item, key):
+        raise AttributeError(f"{test_target_item_name} missing required attribute '{key}'")
+    target_value = getattr(test_target_item, key)
+
+    if isinstance(ground_truth, dict):
+        if not isinstance(target_value, dict):
+            raise TypeError(f"Expected dict for '{key}', got {type(target_value)}")
+        for key2, ground_truth2 in ground_truth.items():
+            if key2 not in target_value:
+                raise KeyError(f"Missing key '{key2}' in {key}")
+            assert (
+                ground_truth2 == target_value[key2]
+            ), f"{test_target_item_name}/{key = }/{key2 = }: {ground_truth2 = } != {target_value[key2] = }"
+    else:
+        assert ground_truth == getattr(
+            test_target_item, key
+        ), f"{test_target_item_name}/{key = }: {ground_truth = } != {getattr(test_target_item, key) = }"
@pytest.fixture(scope="session")
def graph_outputs():
    boris = Person(
        id="boris",
        name="Boris",
        age=30,
        owns_car=[
            Car(
                id="car1",
                brand="Toyota",
                model="Camry",
                year=2020,
                color="Blue",
                is_type=CarType(id="sedan", name=CarTypeName.Sedan),
            )
        ],
        driving_license={
            "issued_by": "PU Vrsac",
            "issued_on": "2025-11-06",
            "number": "1234567890",
            "expires_on": "2025-11-06",
        },
    )
    nodes, edges = get_graph_from_model(boris)

    car, person = nodes[0], nodes[1]
    edge = edges[0]

    parsed_person = get_model_instance_from_graph(nodes, edges, "boris")

    return (car, person, edge, parsed_person)
🛠️ Refactor suggestion
Improve fixture robustness and maintainability.
Consider the following improvements:
- Use constants for test data values
- Use dynamic dates relative to current time
- Add return type annotation
- Add basic validation of graph conversion results
+from dataclasses import dataclass
+from typing import NamedTuple
+
+@dataclass
+class TestData:
+ PERSON_ID: str = "boris"
+ PERSON_NAME: str = "Boris"
+ PERSON_AGE: int = 30
+ CAR_ID: str = "car1"
+ LICENSE_ISSUER: str = "PU Vrsac"
+ LICENSE_NUMBER: str = "1234567890"
+class GraphOutputs(NamedTuple):
+ car: dict
+ person: dict
+ edge: dict
+ parsed_person: Person
@pytest.fixture(scope="session")
-def graph_outputs():
+def graph_outputs() -> GraphOutputs:
+ """Provides test data for graph conversion testing.
+
+ Returns:
+ GraphOutputs: Tuple containing car, person, edge, and parsed person data
+ """
+ # Calculate license dates relative to current time
+ current_date = datetime.now(timezone.utc)
+ expiry_date = current_date.replace(year=current_date.year + 2)
+ formatted_date = expiry_date.strftime("%Y-%m-%d")
+
boris = Person(
- id="boris",
- name="Boris",
- age=30,
+ id=TestData.PERSON_ID,
+ name=TestData.PERSON_NAME,
+ age=TestData.PERSON_AGE,
owns_car=[
Car(
- id="car1",
+ id=TestData.CAR_ID,
brand="Toyota",
model="Camry",
year=2020,
color="Blue",
is_type=CarType(id="sedan", name=CarTypeName.Sedan),
)
],
driving_license={
- "issued_by": "PU Vrsac",
- "issued_on": "2025-11-06",
- "number": "1234567890",
- "expires_on": "2025-11-06",
+ "issued_by": TestData.LICENSE_ISSUER,
+ "issued_on": formatted_date,
+ "number": TestData.LICENSE_NUMBER,
+ "expires_on": formatted_date,
},
)
nodes, edges = get_graph_from_model(boris)
+
+ # Basic validation
+ assert len(nodes) == 2, "Expected two nodes (car and person)"
+ assert len(edges) == 1, "Expected one edge (person owns car)"
car, person = nodes[0], nodes[1]
edge = edges[0]
- parsed_person = get_model_instance_from_graph(nodes, edges, "boris")
+ parsed_person = get_model_instance_from_graph(nodes, edges, TestData.PERSON_ID)
+ assert parsed_person.id == TestData.PERSON_ID, "Parsed person ID mismatch"
- return (car, person, edge, parsed_person)
+ return GraphOutputs(car, person, edge, parsed_person)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-@pytest.fixture(scope="session")
-def graph_outputs():
-    boris = Person(
-        id="boris",
-        name="Boris",
-        age=30,
-        owns_car=[
-            Car(
-                id="car1",
-                brand="Toyota",
-                model="Camry",
-                year=2020,
-                color="Blue",
-                is_type=CarType(id="sedan", name=CarTypeName.Sedan),
-            )
-        ],
-        driving_license={
-            "issued_by": "PU Vrsac",
-            "issued_on": "2025-11-06",
-            "number": "1234567890",
-            "expires_on": "2025-11-06",
-        },
-    )
-    nodes, edges = get_graph_from_model(boris)
-    car, person = nodes[0], nodes[1]
-    edge = edges[0]
-    parsed_person = get_model_instance_from_graph(nodes, edges, "boris")
-    return (car, person, edge, parsed_person)
+from dataclasses import dataclass
+from typing import NamedTuple
+
+@dataclass
+class TestData:
+    PERSON_ID: str = "boris"
+    PERSON_NAME: str = "Boris"
+    PERSON_AGE: int = 30
+    CAR_ID: str = "car1"
+    LICENSE_ISSUER: str = "PU Vrsac"
+    LICENSE_NUMBER: str = "1234567890"
+
+class GraphOutputs(NamedTuple):
+    car: dict
+    person: dict
+    edge: dict
+    parsed_person: Person
+
+@pytest.fixture(scope="session")
+def graph_outputs() -> GraphOutputs:
+    """Provides test data for graph conversion testing.
+
+    Returns:
+        GraphOutputs: Tuple containing car, person, edge, and parsed person data
+    """
+    # Calculate license dates relative to current time
+    current_date = datetime.now(timezone.utc)
+    expiry_date = current_date.replace(year=current_date.year + 2)
+    formatted_date = expiry_date.strftime("%Y-%m-%d")
+
+    boris = Person(
+        id=TestData.PERSON_ID,
+        name=TestData.PERSON_NAME,
+        age=TestData.PERSON_AGE,
+        owns_car=[
+            Car(
+                id=TestData.CAR_ID,
+                brand="Toyota",
+                model="Camry",
+                year=2020,
+                color="Blue",
+                is_type=CarType(id="sedan", name=CarTypeName.Sedan),
+            )
+        ],
+        driving_license={
+            "issued_by": TestData.LICENSE_ISSUER,
+            "issued_on": formatted_date,
+            "number": TestData.LICENSE_NUMBER,
+            "expires_on": formatted_date,
+        },
+    )
+    nodes, edges = get_graph_from_model(boris)
+
+    # Basic validation
+    assert len(nodes) == 2, "Expected two nodes (car and person)"
+    assert len(edges) == 1, "Expected one edge (person owns car)"
+
+    car, person = nodes[0], nodes[1]
+    edge = edges[0]
+
+    parsed_person = get_model_instance_from_graph(nodes, edges, TestData.PERSON_ID)
+    assert parsed_person.id == TestData.PERSON_ID, "Parsed person ID mismatch"
+
+    return GraphOutputs(car, person, edge, parsed_person)
Summary by CodeRabbit
New Features
PdfDocument and graph-related functionalities.
Bug Fixes
Documentation
Chores
pytest.ini configuration file.