Add unit tests for storage utility functions (#2312)#2338
Add unit tests for storage utility functions (#2312)#2338haroldfabla2-hue wants to merge 3 commits intotopoteretes:mainfrom
Conversation
- Add entrypoint.py as portable alternative to bash script - Windows Docker Desktop has issues executing .sh files - Python entrypoint works cross-platform without shell dependencies - Fallback bash entrypoint.sh kept for Linux/macOS environments Fixes: topoteretes#2274
…coded 'public' schema This fixes the bug where delete_dataset() fails in non-public Postgres schemas because the schema was hardcoded to 'public'. The fix changes the default schema_name from 'public' to None, which makes the function use the database's search_path configuration. Closes: topoteretes#2291
- Add test_utils.py with tests for copy_model() and get_own_properties() - Tests cover basic copying, include_fields, exclude_fields, and edge cases - Follows existing test structure in cognee/tests/unit/modules/
Please make sure all the checkboxes are checked:
|
WalkthroughThis PR introduces a new Python entrypoint for Docker with database migration and Gunicorn server startup logic, updates the SqlAlchemyAdapter method signature to support multi-schema database operations, adds unit tests for storage utility functions, and reconfigures the Dockerfile to use the new Python entrypoint. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
275-288:⚠️ Potential issue | 🔴 CriticalFix critical bug:
schema_name=Nonecausesfull_table_nameto become literal string"None.table".When
delete_entity_by_idis called withoutschema_name, it defaults toNoneand passes this explicitly toget_table(). Inget_table()at line 410, the code constructs:full_table_name = f"{schema_name}.{table_name}" # becomes "None.users" (literal string)This will never match actual table names in
metadata.tables, causingEntityNotFoundErroron line 414 even when the table exists. Themetadata.reflect(schema=None)call works correctly for PostgreSQL search_path, but the subsequent table name lookup fails.For non-SQLite databases, handle
schema_name=Noneexplicitly before constructingfull_table_name. WhenNone, either omit the schema prefix or explicitly use"public"(similar to the pattern at line 655).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py` around lines 275 - 288, delete_entity_by_id is passing schema_name=None into get_table which builds full_table_name = f"{schema_name}.{table_name}", producing "None.table" and failing lookup; update the call-site (delete_entity_by_id) and/or get_table to explicitly handle schema_name is None by either omitting the schema prefix when building full_table_name or substituting a real schema like "public" for non-SQLite DBs (mirror the pattern used around line 655), and ensure get_table never constructs f"{None}.{...}" so metadata.tables lookups succeed and EntityNotFoundError is not raised incorrectly.
🧹 Nitpick comments (1)
entrypoint.py (1)
11-25: Consider adding type hints for consistency.The function lacks type hints. While not critical for an entrypoint script, adding them would align with project guidelines.
💡 Proposed type hints
-def run_command(cmd, cwd=None, check=True): +def run_command(cmd: list[str], cwd: str | None = None, check: bool = True) -> subprocess.CompletedProcess: """Run a shell command and return the result."""🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@entrypoint.py` around lines 11 - 25, Add type hints to run_command: annotate cmd as Sequence[str], cwd as Optional[Union[str, Path]] = None, and check as bool = True, and set the return type to subprocess.CompletedProcess[str]; also import the necessary typing symbols (Optional, Sequence, Union) and Path from pathlib at the top of the file. Ensure the function signature is updated to def run_command(cmd: Sequence[str], cwd: Optional[Union[str, Path]] = None, check: bool = True) -> subprocess.CompletedProcess[str]: and adjust imports accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cognee/tests/unit/modules/storage/test_utils.py`:
- Around line 95-105: The test incorrectly asserts that belongs_to_set is
excluded even though get_own_properties only excludes lists whose first element
is a DataPoint; update the test to match implementation by either (A) changing
the assertion to assert "belongs_to_set" in properties, or (B) if you intend to
test exclusion of nested DataPoint objects, replace the list value with
DataPoint instances (e.g., belongs_to_set=[DataPoint(...), DataPoint(...)]) so
get_own_properties will exclude it; refer to the DataPoint constructor and
get_own_properties function to make the change.
- Around line 13-24: The test fails because copy_model expects a class (accesses
model.__name__) but the test passes an instance; either update the test to call
copy_model(type(original)) or make copy_model accept instances by normalizing
its input (e.g., in copy_model, detect if the input is a class or instance and
set model_class = model if inspect.isclass(model) else type(model), then use
model_class.__name__ and model_class.model_fields); pick one approach and apply
consistently with other callers like get_model_instance_from_graph,
get_graph_from_model, and LanceDBAdapter.
---
Outside diff comments:
In `@cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py`:
- Around line 275-288: delete_entity_by_id is passing schema_name=None into
get_table which builds full_table_name = f"{schema_name}.{table_name}",
producing "None.table" and failing lookup; update the call-site
(delete_entity_by_id) and/or get_table to explicitly handle schema_name is None
by either omitting the schema prefix when building full_table_name or
substituting a real schema like "public" for non-SQLite DBs (mirror the pattern
used around line 655), and ensure get_table never constructs f"{None}.{...}" so
metadata.tables lookups succeed and EntityNotFoundError is not raised
incorrectly.
---
Nitpick comments:
In `@entrypoint.py`:
- Around line 11-25: Add type hints to run_command: annotate cmd as
Sequence[str], cwd as Optional[Union[str, Path]] = None, and check as bool =
True, and set the return type to subprocess.CompletedProcess[str]; also import
the necessary typing symbols (Optional, Sequence, Union) and Path from pathlib
at the top of the file. Ensure the function signature is updated to def
run_command(cmd: Sequence[str], cwd: Optional[Union[str, Path]] = None, check:
bool = True) -> subprocess.CompletedProcess[str]: and adjust imports
accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 60257173-c4c1-4543-ab6d-35bc7777336b
📒 Files selected for processing (5)
Dockerfilecognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.pycognee/tests/unit/modules/storage/__init__.pycognee/tests/unit/modules/storage/test_utils.pyentrypoint.py
| def test_copy_model_basic(self): | ||
| """Test basic DataPoint copying without modifications.""" | ||
| original = DataPoint( | ||
| id=UUID("12345678-1234-5678-1234-567812345678"), | ||
| type="TestPoint", | ||
| ) | ||
|
|
||
| copied = copy_model(original) | ||
|
|
||
| assert copied.__name__ == "TestPoint" | ||
| assert "id" in copied.model_fields | ||
| assert "type" in copied.model_fields |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the actual copy_model implementation for __name__ usage
ast-grep --pattern $'def copy_model($$$):
$$$
'Repository: topoteretes/cognee
Length of output: 1570
🏁 Script executed:
#!/bin/bash
# Also check if __name__ is accessed on class or instance
rg -n "__name__" cognee/modules/storage/utils/__init__.py -B2 -A2Repository: topoteretes/cognee
Length of output: 253
🏁 Script executed:
#!/bin/bash
# Check if DataPoint is defined and what it inherits from
rg -n "class DataPoint" -A5 cognee/Repository: topoteretes/cognee
Length of output: 1669
🏁 Script executed:
#!/bin/bash
# Check all usages of copy_model to see if it's called with instances or classes
rg -n "copy_model\(" cognee/ -B2 -A2 | head -100Repository: topoteretes/cognee
Length of output: 4626
🏁 Script executed:
#!/bin/bash
# Check if BaseModel instances have __name__ by looking at Pydantic usage patterns
rg -n "\.\_\_name\_\_" cognee/modules/storage/ -B2 -A2Repository: topoteretes/cognee
Length of output: 144
The test passes an instance to copy_model() but the implementation expects a class.
The test creates a DataPoint instance and passes it to copy_model(original), but the implementation at line 37 accesses model.__name__, which only exists on classes. When an instance is passed, this will raise AttributeError: 'DataPoint' object has no attribute '__name__'.
All other usages in the codebase pass the class via type(source_node) or similar (see get_model_instance_from_graph.py, get_graph_from_model.py, LanceDBAdapter.py). Either the function signature should accept type[DataPoint] instead of DataPoint, or the tests should pass type(original) instead of original.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cognee/tests/unit/modules/storage/test_utils.py` around lines 13 - 24, The
test fails because copy_model expects a class (accesses model.__name__) but the
test passes an instance; either update the test to call
copy_model(type(original)) or make copy_model accept instances by normalizing
its input (e.g., in copy_model, detect if the input is a class or instance and
set model_class = model if inspect.isclass(model) else type(model), then use
model_class.__name__ and model_class.model_fields); pick one approach and apply
consistently with other callers like get_model_instance_from_graph,
get_graph_from_model, and LanceDBAdapter.
| def test_get_own_properties_excludes_nested(self): | ||
| """Test that nested objects are excluded.""" | ||
| data_point = DataPoint( | ||
| type="TestPoint", | ||
| belongs_to_set=["item1", "item2"], | ||
| ) | ||
|
|
||
| properties = get_own_properties(data_point) | ||
|
|
||
| # belongs_to_set should be excluded as it's a list with specific types | ||
| assert "belongs_to_set" not in properties |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the actual get_own_properties implementation
rg -n "def get_own_properties" cognee/modules/storage/utils/__init__.py -A 20Repository: topoteretes/cognee
Length of output: 655
Test assertion is incorrect based on the implementation.
The test expects belongs_to_set to be excluded when set to ["item1", "item2"], but the get_own_properties implementation only excludes lists where the first element is a DataPoint instance. Since "item1" is a string, belongs_to_set will be included in the returned properties, causing this test to fail. Either correct the assertion to check that "belongs_to_set" is in properties, or adjust the test data to use DataPoint instances if the intent is to test exclusion of nested objects.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cognee/tests/unit/modules/storage/test_utils.py` around lines 95 - 105, The
test incorrectly asserts that belongs_to_set is excluded even though
get_own_properties only excludes lists whose first element is a DataPoint;
update the test to match implementation by either (A) changing the assertion to
assert "belongs_to_set" in properties, or (B) if you intend to test exclusion of
nested DataPoint objects, replace the list value with DataPoint instances (e.g.,
belongs_to_set=[DataPoint(...), DataPoint(...)]) so get_own_properties will
exclude it; refer to the DataPoint constructor and get_own_properties function
to make the change.
Summary
Adds unit tests for the storage utility functions in
cognee/modules/storage/utils/__init__.pyas requested in issue #2312.Changes
Created
cognee/tests/unit/modules/storage/test_utils.pywith:copy_model()covering basic copying, include_fields, exclude_fields, combined usage, and edge casesget_own_properties()covering basic extraction, nested object exclusion, and primitive value preservationCreated
cognee/tests/unit/modules/storage/__init__.py(required for pytest discovery)Testing
Tests are designed to run without external dependencies (pure unit tests with mocked LLM where needed).
Closes #2312
Summary by CodeRabbit
New Features
Improvements
Tests
Chores