docs: Add module-level docstrings to cognee/__init__.py #2361
haroldfabla2-hue wants to merge 4 commits into topoteretes:main
Conversation
- Add entrypoint.py as portable alternative to bash script
- Windows Docker Desktop has issues executing .sh files
- Python entrypoint works cross-platform without shell dependencies
- Fallback bash entrypoint.sh kept for Linux/macOS environments

Fixes: topoteretes#2274
…coded 'public' schema

This fixes the bug where delete_dataset() fails in non-public Postgres schemas because the schema was hardcoded to 'public'. The fix changes the default schema_name from 'public' to None, which makes the function use the database's search_path configuration.

Closes: topoteretes#2291
- Add test_utils.py with tests for copy_model() and get_own_properties()
- Tests cover basic copying, include_fields, exclude_fields, and edge cases
- Follows existing test structure in cognee/tests/unit/modules/
Add comprehensive docstrings to the main public API functions exported from cognee/__init__.py. This improves IDE autocomplete and provides clear usage examples for new users.

Addresses: topoteretes#2311
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
275-288: ⚠️ Potential issue | 🔴 Critical

The `None` default causes a runtime failure on PostgreSQL. Changing the `schema_name` default from `"public"` to `None` will break deletions on PostgreSQL, because the `get_table` method (line 380) doesn't handle `None` properly:

- When `schema_name=None` is passed to `get_table`, the lookup key evaluates to `f"None.{table_name}"` (the literal string `"None.table_name"`).
- This won't match any key in `metadata.tables`, since `reflect(schema=None)` produces unqualified table names.
- Result: `EntityNotFoundError` is always raised.

The existing caller in `cognee/modules/data/methods/delete_dataset.py:58` calls `delete_entity_by_id` without `schema_name`, so this change will break dataset deletion.

Either fix `get_table` to handle `None` properly, or keep the `"public"` default for consistency with the other methods (`delete_table`, `insert_data`, `get_table`, `get_all_data_from_table`).

Option 1: Revert to the `"public"` default for consistency

```diff
 async def delete_entity_by_id(
-    self, table_name: str, data_id: UUID, schema_name: Optional[str] = None
+    self, table_name: str, data_id: UUID, schema_name: Optional[str] = "public"
 ):
```

Option 2: Fix `get_table` to handle a `None` schema

```diff
 async def get_table(self, table_name: str, schema_name: Optional[str] = "public") -> Table:
     ...
     else:
         # Create a MetaData instance to load table information
         metadata = MetaData()
         # Load table information from schema into MetaData
         await connection.run_sync(metadata.reflect, schema=schema_name)
-        # Define the full table name
-        full_table_name = f"{schema_name}.{table_name}"
-        # Check if table is in list of tables for the given schema
-        if full_table_name in metadata.tables:
-            return metadata.tables[full_table_name]
-        raise EntityNotFoundError(message=f"Table '{full_table_name}' not found.")
+        # Define the full table name (handle None schema)
+        if schema_name is None:
+            full_table_name = table_name
+        else:
+            full_table_name = f"{schema_name}.{table_name}"
+        # Check if table is in list of tables for the given schema
+        if full_table_name in metadata.tables:
+            return metadata.tables[full_table_name]
+        raise EntityNotFoundError(message=f"Table '{full_table_name}' not found.")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py` around lines 275 - 288, The failure is caused by get_table building a qualified key like f"{schema_name}.{table_name}" when schema_name is None, producing "None.table" and causing EntityNotFoundError; fix get_table (used by delete_entity_by_id) to handle None by checking unqualified table keys too: if schema_name is None (or equals the string "None") look up metadata.tables by table_name (and as a fallback try f"{schema_name}.{table_name}" for explicit schemas), and ensure reflect(schema=None) behavior is considered when building keys; update the lookup logic in get_table (and any helper that constructs the key) so delete_entity_by_id can safely be called without a schema_name.
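The `"None.table_name"` key mismatch described above can be reproduced in isolation. This is a minimal sketch of the lookup-key logic only, not the adapter's actual code:

```python
# Minimal sketch of the lookup-key problem described in the review (not the
# adapter's actual code): naive f-string formatting embeds the literal "None".
table_name = "datasets"

schema_name = None
naive_key = f"{schema_name}.{table_name}"
print(naive_key)  # "None.datasets" — never matches reflect(schema=None) output

# Option 2 from the review builds an unqualified key when schema_name is None.
safe_key = table_name if schema_name is None else f"{schema_name}.{table_name}"
print(safe_key)  # "datasets"
```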
🧹 Nitpick comments (1)
Dockerfile (1)
31-31: Keep `entrypoint.py` out of the dependency-cache layer.

Including `entrypoint.py` in the `COPY` before `uv sync --no-install-project` means any entrypoint-only change now forces a full dependency reinstall. Copy it with the application sources instead, so Docker only rebuilds the cheap layers for this kind of change.

♻️ Suggested layering change

```diff
-COPY README.md pyproject.toml uv.lock entrypoint.sh entrypoint.py ./
+COPY README.md pyproject.toml uv.lock entrypoint.sh ./
@@
 COPY ./cognee /app/cognee
 COPY ./distributed /app/distributed
+COPY ./entrypoint.py /app/entrypoint.py
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile` at line 31, The Dockerfile currently copies entrypoint.py into the layer used for dependency caching (COPY README.md pyproject.toml uv.lock entrypoint.sh entrypoint.py ./), causing any change to entrypoint.py to bust the dependency-install layer; remove entrypoint.py from that COPY (keep README.md, pyproject.toml, uv.lock, and entrypoint.sh as needed), run the uv sync --no-install-project dependency step, then copy entrypoint.py together with the application sources in the later application COPY (the same step that copies your app files) so changes to entrypoint.py only rebuild the cheap app layer rather than forcing a full dependency reinstall.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cognee/__init__.py`:
- Around line 17-21: The example in the module shows top-level await which is
invalid; wrap the sequence of await calls to cognee.add, cognee.cognify, and
cognee.search inside an async entrypoint (e.g., define async def main(): ...),
then run that entrypoint with asyncio.run(main()) so the example is runnable in
standard Python; ensure the example imports asyncio and calls
asyncio.run(main()) instead of using await at top-level.
In `@cognee/tests/unit/modules/storage/test_utils.py`:
- Around line 95-105: The test test_get_own_properties_excludes_nested currently
uses belongs_to_set=["item1","item2"], which is a list[str] and won't trigger
the exclusion logic in get_own_properties; update the test to supply a list
whose first element is a DataPoint (e.g., belongs_to_set=[DataPoint(...)] or
similar nested DataPoint instance) so the exclusion branch in get_own_properties
(in cognee/modules/storage/utils/__init__.py) is exercised and the assertion
that "belongs_to_set" is not in properties will pass.
- Around line 13-25: The test test_copy_model_basic is passing a DataPoint
instance to copy_model but copy_model expects a class and uses model.__name__;
update the test to pass the class (DataPoint) instead of an instance, then
assert the copied model name matches the class name (e.g., "DataPoint") and that
"id" and "type" are in copied.model_fields; alternatively, if you prefer
copy_model to accept instances, adjust copy_model to extract the name from the
instance's type field (accessing instance.type) and update its usage in
copy_model where model.__name__ is referenced.
---
Outside diff comments:
In `@cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py`:
- Around line 275-288: The failure is caused by get_table building a qualified
key like f"{schema_name}.{table_name}" when schema_name is None, producing
"None.table" and causing EntityNotFoundError; fix get_table (used by
delete_entity_by_id) to handle None by checking unqualified table keys too: if
schema_name is None (or equals the string "None") look up metadata.tables by
table_name (and as a fallback try f"{schema_name}.{table_name}" for explicit
schemas), and ensure reflect(schema=None) behavior is considered when building
keys; update the lookup logic in get_table (and any helper that constructs the
key) so delete_entity_by_id can safely be called without a schema_name.
---
Nitpick comments:
In `@Dockerfile`:
- Line 31: The Dockerfile currently copies entrypoint.py into the layer used for
dependency caching (COPY README.md pyproject.toml uv.lock entrypoint.sh
entrypoint.py ./), causing any change to entrypoint.py to bust the
dependency-install layer; remove entrypoint.py from that COPY (keep README.md,
pyproject.toml, uv.lock, and entrypoint.sh as needed), run the uv sync
--no-install-project dependency step, then copy entrypoint.py together with the
application sources in the later application COPY (the same step that copies
your app files) so changes to entrypoint.py only rebuild the cheap app layer
rather than forcing a full dependency reinstall.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 704a5350-f07f-4ab6-8fa0-75f29309bfc4
📒 Files selected for processing (6)
- Dockerfile
- cognee/__init__.py
- cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
- cognee/tests/unit/modules/storage/__init__.py
- cognee/tests/unit/modules/storage/test_utils.py
- entrypoint.py
```
Example Usage:
    >>> import cognee
    >>> await cognee.add("Cognee builds AI memory.", dataset_name="demo")
    >>> await cognee.cognify(dataset_name="demo")
    >>> results = await cognee.search("What does Cognee do?")
```
🧩 Analysis chain
🌐 Web query:
Does the standard Python 3.12 interpreter or REPL allow `await` at top level outside an `async def`? Please cite the relevant Python documentation.
💡 Result:
No. In standard Python 3.12 code (including the normal interactive REPL), await cannot be used at top level; the language reference states that an await expression “can only be used inside a coroutine function” (i.e., inside async def). [1]
If you want a standard-library REPL that does accept top-level await, Python provides the asyncio REPL via python -m asyncio, which explicitly says to “Use await directly instead of asyncio.run()” and shows await ... at the prompt. [2]
Sources
[1] Python 3.12 Language Reference — 6.4. Await expression https://docs.python.org/3.12/reference/expressions.html
[2] Python docs — asyncio REPL (python -m asyncio) https://docs.python.org/3.12/library/asyncio.html
Wrap the example in an async entrypoint.
The example code shows await at the top level of the Python REPL, which is not valid in standard Python. The await keyword can only be used inside a coroutine function (inside async def). To make this example runnable, wrap it in an async function and use asyncio.run() to execute it:
```
>>> import asyncio
>>> import cognee
>>> async def main():
...     await cognee.add("Cognee builds AI memory.", dataset_name="demo")
...     await cognee.cognify(dataset_name="demo")
...     results = await cognee.search("What does Cognee do?")
>>> asyncio.run(main())
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cognee/__init__.py` around lines 17 - 21, The example in the module shows
top-level await which is invalid; wrap the sequence of await calls to
cognee.add, cognee.cognify, and cognee.search inside an async entrypoint (e.g.,
define async def main(): ...), then run that entrypoint with asyncio.run(main())
so the example is runnable in standard Python; ensure the example imports
asyncio and calls asyncio.run(main()) instead of using await at top-level.
```python
def test_copy_model_basic(self):
    """Test basic DataPoint copying without modifications."""
    original = DataPoint(
        id=UUID("12345678-1234-5678-1234-567812345678"),
        type="TestPoint",
    )

    copied = copy_model(original)

    assert copied.__name__ == "TestPoint"
    assert "id" in copied.model_fields
    assert "type" in copied.model_fields
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
echo "copy_model implementation:"
sed -n '23,40p' cognee/modules/storage/utils/__init__.py
echo
echo "new copy_model tests:"
sed -n '13,25p' cognee/tests/unit/modules/storage/test_utils.py
```

Repository: topoteretes/cognee
Length of output: 1237
Pass a class to copy_model(), not an instance.
copy_model() expects a class as its model parameter and uses model.__name__ to get the class name. The test passes a DataPoint instance instead, which means model.__name__ resolves to "DataPoint" (the class name), not "TestPoint" (the field value). The assertion assert copied.__name__ == "TestPoint" will fail.
Either pass the class (DataPoint itself) or redesign the function to accept instances and extract the name from a field (e.g., type field).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cognee/tests/unit/modules/storage/test_utils.py` around lines 13 - 25, The
test test_copy_model_basic is passing a DataPoint instance to copy_model but
copy_model expects a class and uses model.__name__; update the test to pass the
class (DataPoint) instead of an instance, then assert the copied model name
matches the class name (e.g., "DataPoint") and that "id" and "type" are in
copied.model_fields; alternatively, if you prefer copy_model to accept
instances, adjust copy_model to extract the name from the instance's type field
(accessing instance.type) and update its usage in copy_model where
model.__name__ is referenced.
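The distinction the reviewer draws — `__name__` is a class attribute, so `model.__name__` yields the class name regardless of any field values — can be confirmed with a plain class. This is a generic sketch, unrelated to cognee's actual DataPoint model:

```python
# Generic sketch (not cognee's DataPoint): __name__ lives on the class, so a
# function that reads model.__name__ must receive the class, not an instance.
class DataPoint:
    def __init__(self, type):
        self.type = type

instance = DataPoint(type="TestPoint")

print(DataPoint.__name__)             # DataPoint — classes carry __name__
print(type(instance).__name__)        # DataPoint — the class name, not "TestPoint"
print(hasattr(instance, "__name__"))  # False — instances have no __name__
```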
```python
def test_get_own_properties_excludes_nested(self):
    """Test that nested objects are excluded."""
    data_point = DataPoint(
        type="TestPoint",
        belongs_to_set=["item1", "item2"],
    )

    properties = get_own_properties(data_point)

    # belongs_to_set should be excluded as it's a list with specific types
    assert "belongs_to_set" not in properties
```
This case does not hit the exclusion branch you are asserting.
In cognee/modules/storage/utils/__init__.py:43-61, get_own_properties() only filters lists when the first element is a DataPoint. belongs_to_set=["item1", "item2"] is a list[str], so this property will be preserved and Line 105 will fail. Use a nested DataPoint list here if you want to test the exclusion path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cognee/tests/unit/modules/storage/test_utils.py` around lines 95 - 105, The
test test_get_own_properties_excludes_nested currently uses
belongs_to_set=["item1","item2"], which is a list[str] and won't trigger the
exclusion logic in get_own_properties; update the test to supply a list whose
first element is a DataPoint (e.g., belongs_to_set=[DataPoint(...)] or similar
nested DataPoint instance) so the exclusion branch in get_own_properties (in
cognee/modules/storage/utils/__init__.py) is exercised and the assertion that
"belongs_to_set" is not in properties will pass.
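The exclusion rule the reviewer describes — drop a list-valued property only when its first element is a DataPoint — can be sketched roughly as below. This is a simplified stand-in for `get_own_properties`, not the real implementation in `cognee/modules/storage/utils/__init__.py`:

```python
from dataclasses import dataclass, field

@dataclass
class DataPoint:
    # Simplified stand-in for cognee's DataPoint, for illustration only.
    type: str
    belongs_to_set: list = field(default_factory=list)

def get_own_properties_sketch(obj):
    # Keep plain properties; skip lists whose first element is a DataPoint
    # (the nested-object exclusion branch the review refers to).
    props = {}
    for key, value in vars(obj).items():
        if isinstance(value, list) and value and isinstance(value[0], DataPoint):
            continue
        props[key] = value
    return props

# A list[str] does NOT hit the exclusion branch, so it is preserved:
plain = DataPoint(type="TestPoint", belongs_to_set=["item1", "item2"])
print("belongs_to_set" in get_own_properties_sketch(plain))  # True

# A list whose first element is a DataPoint is excluded:
nested = DataPoint(type="TestPoint", belongs_to_set=[DataPoint(type="Child")])
print("belongs_to_set" in get_own_properties_sketch(nested))  # False
```

This is why the test as written would fail: its `["item1", "item2"]` input takes the "preserved" path, not the exclusion path.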
Summary

Add comprehensive docstrings to the main public API functions exported from cognee/__init__.py. This improves IDE autocomplete and provides clear usage examples for new users.

Changes
Issue
Fixes #2311
Checklist
Summary by CodeRabbit
Bug Fixes
Documentation
Tests