
Add unit tests for storage utility functions (#2312) #2338

Open
haroldfabla2-hue wants to merge 3 commits into topoteretes:main from haroldfabla2-hue:test/issue-2312-storage-utils


haroldfabla2-hue commented Mar 9, 2026

Summary

Adds unit tests for the storage utility functions in cognee/modules/storage/utils/__init__.py as requested in issue #2312.

Changes

  • Created cognee/tests/unit/modules/storage/test_utils.py with:

    • 5 unit tests for copy_model() covering basic copying, include_fields, exclude_fields, combined usage, and edge cases
    • 3 unit tests for get_own_properties() covering basic extraction, nested object exclusion, and primitive value preservation
  • Created cognee/tests/unit/modules/storage/__init__.py (required for pytest discovery)

Testing

Tests are designed to run without external dependencies (pure unit tests with mocked LLM where needed).

Closes #2312

Summary by CodeRabbit

  • New Features

    • Added Python-based container entrypoint with environment variable configuration and automated database migrations.
    • Introduced debug mode and environment-specific server configuration.
  • Improvements

    • Updated database schema handling to support multi-schema setups via search_path.
  • Tests

    • Added comprehensive unit tests for storage utility functions.
  • Chores

    • Updated Docker configuration for improved Windows/Docker compatibility.

Alberto Farah and others added 3 commits March 7, 2026 07:44
- Add entrypoint.py as portable alternative to bash script
- Windows Docker Desktop has issues executing .sh files
- Python entrypoint works cross-platform without shell dependencies
- Fallback bash entrypoint.sh kept for Linux/macOS environments

Fixes: topoteretes#2274
…coded 'public' schema

This fixes the bug where delete_dataset() fails in non-public Postgres
schemas because the schema was hardcoded to 'public'.

The fix changes the default schema_name from 'public' to None, which
makes the function use the database's search_path configuration.

Closes: topoteretes#2291
- Add test_utils.py with tests for copy_model() and get_own_properties()
- Tests cover basic copying, include_fields, exclude_fields, and edge cases
- Follows existing test structure in cognee/tests/unit/modules/

pull-checklist bot commented Mar 9, 2026

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.


coderabbitai bot commented Mar 9, 2026

Walkthrough

This PR introduces a new Python entrypoint for Docker with database migration and Gunicorn server startup logic, updates the SqlAlchemyAdapter method signature to support multi-schema database operations, adds unit tests for storage utility functions, and reconfigures the Dockerfile to use the new Python entrypoint.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Docker Configuration (Dockerfile, entrypoint.py) | Dockerfile updated to copy and execute entrypoint.py instead of entrypoint.sh. New Python entrypoint script added with Alembic migration execution, debugpy integration for development, Gunicorn server launch with UvicornWorker, and environment-based configuration (DEBUG, ENVIRONMENT, ports). |
| Database Adapter (cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py) | Method signature changed: the default of delete_entity_by_id's schema_name parameter moves from "public" to None to support multi-schema setups via the database search_path; docstring updated accordingly. |
| Storage Utility Tests (cognee/tests/unit/modules/storage/test_utils.py) | New unit test file covering the copy_model and get_own_properties functions, with scenarios including basic copying, field inclusion/exclusion, and edge cases. |
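The summary above lists what the new entrypoint.py does, but its source is not shown on this page. As a rough illustration only, the environment-driven command selection could look like the sketch below. The variable names HOST, HTTP_PORT and DEBUG_PORT, the "dev"/"local" ENVIRONMENT values and the ASGI app path cognee.api.client:app are assumptions, not taken from the PR; the gunicorn, UvicornWorker, debugpy and alembic invocations use those tools' standard command-line forms.

```python
import sys
from typing import List, Mapping


def build_migration_command() -> List[str]:
    # Alembic's standard "apply all pending migrations" invocation.
    return ["alembic", "upgrade", "head"]


def build_server_command(env: Mapping[str, str]) -> List[str]:
    """Build the Gunicorn command, wrapped in debugpy when debugging is enabled."""
    host = env.get("HOST", "0.0.0.0")  # hypothetical variable name
    port = env.get("HTTP_PORT", "8000")  # hypothetical variable name
    gunicorn_args = [
        "-m", "gunicorn",
        "-k", "uvicorn.workers.UvicornWorker",
        "--bind", f"{host}:{port}",
        "cognee.api.client:app",  # hypothetical ASGI app path
    ]
    debug = env.get("DEBUG", "").lower() == "true"
    # Assumed ENVIRONMENT values; the real script may use different ones.
    dev_like = env.get("ENVIRONMENT", "").lower() in ("dev", "local")
    if debug and dev_like:
        debug_port = env.get("DEBUG_PORT", "5678")  # hypothetical variable name
        # debugpy starts the given module (-m gunicorn ...) under the debugger
        # and blocks until an IDE attaches.
        return [
            sys.executable, "-m", "debugpy",
            "--listen", f"{host}:{debug_port}",
            "--wait-for-client",
            *gunicorn_args,
        ]
    return [sys.executable, *gunicorn_args]
```

In an actual entrypoint, the migration list would typically go to subprocess.run(..., check=True) and the server list to os.execvp, so that Gunicorn replaces the Python process and receives container signals directly. Keeping the command construction in pure functions also makes it testable without Docker.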

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • Vasilije1990
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Out of Scope Changes check | ⚠️ Warning | The PR description focuses on the storage utility function tests, but the raw_summary includes substantial unrelated changes: Dockerfile and Docker entrypoint updates, SqlAlchemyAdapter schema_name parameter changes, and a new entrypoint.py script. | Remove the Dockerfile, SqlAlchemyAdapter, and entrypoint.py changes from this PR and create separate PRs for each distinct feature (Docker entrypoint refactoring and schema handling improvements). |
| Linked Issues check | ❓ Inconclusive | The PR addresses #2312's core testing requirements, but the raw_summary shows changes to Dockerfile, SqlAlchemyAdapter, and entrypoint.py that are not mentioned in the PR description and appear unrelated to the storage utility function tests. | Clarify whether the Dockerfile, SqlAlchemyAdapter, and entrypoint.py changes are intentional or were included by mistake; if unintended, move them to separate PRs focused on their specific objectives. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title 'Add unit tests for storage utility functions (#2312)' clearly and concisely describes the main objective of the changeset, focusing on the primary contribution of new unit tests. |
| Description check | ✅ Passed | The PR description includes a summary, detailed changes, and testing information. While some template sections are omitted, the essential information is present and the description adequately communicates the PR's purpose and scope. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 91.67%, which is sufficient. The required threshold is 80.00%. |


coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)

275-288: ⚠️ Potential issue | 🔴 Critical

Fix critical bug: schema_name=None causes full_table_name to become literal string "None.table".

When delete_entity_by_id is called without schema_name, it defaults to None and passes this explicitly to get_table(). In get_table() at line 410, the code constructs:

full_table_name = f"{schema_name}.{table_name}"  # becomes "None.users" (literal string)

This will never match actual table names in metadata.tables, causing EntityNotFoundError on line 414 even when the table exists. The metadata.reflect(schema=None) call works correctly for PostgreSQL search_path, but the subsequent table name lookup fails.

For non-SQLite databases, handle schema_name=None explicitly before constructing full_table_name. When None, either omit the schema prefix or explicitly use "public" (similar to the pattern at line 655).
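A minimal sketch of the "omit the schema prefix" option follows. It relies on SQLAlchemy's keying convention: entries in metadata.tables are keyed as "schema.table" when a schema is set and by the bare table name when the table was reflected with schema=None. The helper name build_full_table_name is hypothetical and is not part of SqlAlchemyAdapter.

```python
from typing import Optional


def build_full_table_name(schema_name: Optional[str], table_name: str) -> str:
    """Build a metadata.tables lookup key without ever producing "None.<table>".

    Tables reflected with schema=None (i.e. via the connection's search_path)
    are keyed by their bare name, so no prefix is added in that case.
    """
    if schema_name is None:
        return table_name
    return f"{schema_name}.{table_name}"


# The bug described above: formatting None directly yields a literal "None." prefix.
print(f"{None}.users")                          # None.users
print(build_full_table_name(None, "users"))     # users
print(build_full_table_name("analytics", "users"))  # analytics.users
```

The reviewer's other option, substituting "public" when schema_name is None, would bring back the hardcoded-schema behavior that the commit set out to remove, which is why this sketch omits the prefix instead.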

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py`
around lines 275 - 288, delete_entity_by_id is passing schema_name=None into
get_table which builds full_table_name = f"{schema_name}.{table_name}",
producing "None.table" and failing lookup; update the call-site
(delete_entity_by_id) and/or get_table to explicitly handle schema_name is None
by either omitting the schema prefix when building full_table_name or
substituting a real schema like "public" for non-SQLite DBs (mirror the pattern
used around line 655), and ensure get_table never constructs f"{None}.{...}" so
metadata.tables lookups succeed and EntityNotFoundError is not raised
incorrectly.
🧹 Nitpick comments (1)
entrypoint.py (1)

11-25: Consider adding type hints for consistency.

The function lacks type hints. While not critical for an entrypoint script, adding them would align with project guidelines.

💡 Proposed type hints
-def run_command(cmd, cwd=None, check=True):
+def run_command(cmd: list[str], cwd: str | None = None, check: bool = True) -> subprocess.CompletedProcess:
     """Run a shell command and return the result."""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@entrypoint.py` around lines 11 - 25, Add type hints to run_command: annotate
cmd as Sequence[str], cwd as Optional[Union[str, Path]] = None, and check as
bool = True, and set the return type to subprocess.CompletedProcess[str]; also
import the necessary typing symbols (Optional, Sequence, Union) and Path from
pathlib at the top of the file. Ensure the function signature is updated to def
run_command(cmd: Sequence[str], cwd: Optional[Union[str, Path]] = None, check:
bool = True) -> subprocess.CompletedProcess[str]: and adjust imports
accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cognee/tests/unit/modules/storage/test_utils.py`:
- Around line 95-105: The test incorrectly asserts that belongs_to_set is
excluded even though get_own_properties only excludes lists whose first element
is a DataPoint; update the test to match implementation by either (A) changing
the assertion to assert "belongs_to_set" in properties, or (B) if you intend to
test exclusion of nested DataPoint objects, replace the list value with
DataPoint instances (e.g., belongs_to_set=[DataPoint(...), DataPoint(...)]) so
get_own_properties will exclude it; refer to the DataPoint constructor and
get_own_properties function to make the change.
- Around line 13-24: The test fails because copy_model expects a class (accesses
model.__name__) but the test passes an instance; either update the test to call
copy_model(type(original)) or make copy_model accept instances by normalizing
its input (e.g., in copy_model, detect if the input is a class or instance and
set model_class = model if inspect.isclass(model) else type(model), then use
model_class.__name__ and model_class.model_fields); pick one approach and apply
consistently with other callers like get_model_instance_from_graph,
get_graph_from_model, and LanceDBAdapter.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 60257173-c4c1-4543-ab6d-35bc7777336b

📥 Commits

Reviewing files that changed from the base of the PR and between c9370a8 and 6edc0b0.

📒 Files selected for processing (5)
  • Dockerfile
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
  • cognee/tests/unit/modules/storage/__init__.py
  • cognee/tests/unit/modules/storage/test_utils.py
  • entrypoint.py

Comment on lines +13 to +24
    def test_copy_model_basic(self):
        """Test basic DataPoint copying without modifications."""
        original = DataPoint(
            id=UUID("12345678-1234-5678-1234-567812345678"),
            type="TestPoint",
        )

        copied = copy_model(original)

        assert copied.__name__ == "TestPoint"
        assert "id" in copied.model_fields
        assert "type" in copied.model_fields

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the actual copy_model implementation for __name__ usage
ast-grep --pattern $'def copy_model($$$):
  $$$
'

Repository: topoteretes/cognee

Length of output: 1570


🏁 Script executed:

#!/bin/bash
# Also check if __name__ is accessed on class or instance
rg -n "__name__" cognee/modules/storage/utils/__init__.py -B2 -A2

Repository: topoteretes/cognee

Length of output: 253


🏁 Script executed:

#!/bin/bash
# Check if DataPoint is defined and what it inherits from
rg -n "class DataPoint" -A5 cognee/

Repository: topoteretes/cognee

Length of output: 1669


🏁 Script executed:

#!/bin/bash
# Check all usages of copy_model to see if it's called with instances or classes
rg -n "copy_model\(" cognee/ -B2 -A2 | head -100

Repository: topoteretes/cognee

Length of output: 4626


🏁 Script executed:

#!/bin/bash
# Check if BaseModel instances have __name__ by looking at Pydantic usage patterns
rg -n "\.\_\_name\_\_" cognee/modules/storage/ -B2 -A2

Repository: topoteretes/cognee

Length of output: 144


The test passes an instance to copy_model() but the implementation expects a class.

The test creates a DataPoint instance and passes it to copy_model(original), but the implementation at line 37 accesses model.__name__, which only exists on classes. When an instance is passed, this will raise AttributeError: 'DataPoint' object has no attribute '__name__'.

All other usages in the codebase pass the class via type(source_node) or similar (see get_model_instance_from_graph.py, get_graph_from_model.py, LanceDBAdapter.py). Either the function signature should accept type[DataPoint] instead of DataPoint, or the tests should pass type(original) instead of original.
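To illustrate the failure mode and the "normalize the input" option from the review, here is a stdlib-only sketch. Point is a hypothetical stand-in for a DataPoint-like model, and resolve_model_class is a hypothetical helper name; in copy_model itself, the normalization would happen before model.__name__ and model.model_fields are read.

```python
import inspect
from typing import Any


def resolve_model_class(model: Any) -> type:
    """Return model if it is already a class, otherwise the class of the instance."""
    return model if inspect.isclass(model) else type(model)


class Point:  # hypothetical stand-in for a DataPoint-like model
    pass


instance = Point()
print(hasattr(Point, "__name__"))     # True: classes carry __name__
print(hasattr(instance, "__name__"))  # False: instances do not
print(resolve_model_class(instance).__name__)  # Point
print(resolve_model_class(Point).__name__)     # Point
```

The lighter-weight fix is the other option named in the review: leave copy_model unchanged and call copy_model(type(original)) in the test, which matches how get_model_instance_from_graph, get_graph_from_model, and LanceDBAdapter already call it.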


Comment on lines +95 to +105
    def test_get_own_properties_excludes_nested(self):
        """Test that nested objects are excluded."""
        data_point = DataPoint(
            type="TestPoint",
            belongs_to_set=["item1", "item2"],
        )

        properties = get_own_properties(data_point)

        # belongs_to_set should be excluded as it's a list with specific types
        assert "belongs_to_set" not in properties
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the actual get_own_properties implementation
rg -n "def get_own_properties" cognee/modules/storage/utils/__init__.py -A 20

Repository: topoteretes/cognee

Length of output: 655


Test assertion is incorrect based on the implementation.

The test expects belongs_to_set to be excluded when set to ["item1", "item2"], but the get_own_properties implementation only excludes lists where the first element is a DataPoint instance. Since "item1" is a string, belongs_to_set will be included in the returned properties, causing this test to fail. Either correct the assertion to check that "belongs_to_set" is in properties, or adjust the test data to use DataPoint instances if the intent is to test exclusion of nested objects.
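The difference between options (A) and (B) can be shown with a simplified, stdlib-only stand-in. StubDataPoint is a hypothetical dataclass with just two fields, not cognee's pydantic DataPoint, and this get_own_properties models only the list rule described in the review. The real function may apply additional rules (for example to single nested DataPoint fields).

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class StubDataPoint:
    """Hypothetical stand-in for cognee's DataPoint (only two fields)."""
    type: str = "DataPoint"
    belongs_to_set: List[Any] = field(default_factory=list)


def get_own_properties(data_point: StubDataPoint) -> Dict[str, Any]:
    """Collect field values, skipping lists whose first element is a data point."""
    properties: Dict[str, Any] = {}
    for f in fields(data_point):
        value = getattr(data_point, f.name)
        # Rule described in the review: only a non-empty list that starts with
        # a data point is treated as a nested relation and left out.
        if isinstance(value, list) and value and isinstance(value[0], StubDataPoint):
            continue
        properties[f.name] = value
    return properties


# Option (A): strings are kept, so the test should assert inclusion.
plain = StubDataPoint(type="TestPoint", belongs_to_set=["item1", "item2"])
assert "belongs_to_set" in get_own_properties(plain)

# Option (B): a list of data points is what actually triggers exclusion.
nested = StubDataPoint(type="Parent", belongs_to_set=[StubDataPoint(type="Child")])
assert "belongs_to_set" not in get_own_properties(nested)
```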



Development

Successfully merging this pull request may close these issues.

Add unit tests for storage utility functions (copy_model, etc.)

1 participant