
Conversation


@Vasilije1990 Vasilije1990 commented Jan 6, 2026

Description

Security issue reported by a user.

Acceptance Criteria

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Code refactoring
  • Performance improvement
  • Other (please specify): Security package upgrade

Screenshots/Videos (if applicable)

Pre-submission Checklist

  • I have tested my changes thoroughly before submitting this PR
  • This PR contains minimal changes necessary to address the issue/feature
  • My code follows the project's coding standards and style guidelines
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if applicable)
  • All new and existing tests pass
  • I have searched existing PRs to ensure this change hasn't been submitted already
  • I have linked any relevant issues in the description
  • My commits have clear and descriptive messages

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.


Note

  • Frontend (notebooks/UX):
    • Add instance-aware fetching (cloudFetch, localFetch) and pass instance into useNotebooks; update dashboard and datasets to support cloud/local.
    • New markdown rendering via react-markdown (MarkdownPreview); refactor Notebook UI with memoized cells and preview/edit toggle.
    • Replace contentEditable with throttled auto-expanding TextArea for better performance; improve handleServerErrors to treat 401/403 and optional retry.
    • Package upgrades: next 16.1, react 19.2.3, @auth0/nextjs-auth0 4.14; add react-markdown deps.
  • DB migrations:
    • Add data.label (nullable) and data.last_accessed (TZ DateTime; optional backfill via ENABLE_LAST_ACCESSED).
    • Mark old tutorial notebook as deletable.
  • CI/workflows:
    • Convert several tests to pytest -v; add new tests: custom data label and S3 permissions example with Postgres service.
    • search_db_tests.yml: add Python version matrix (3.10–3.13) across providers.
    • Release workflow: trigger docs and community repos via repository_dispatch on main releases.
  • Infra/docs:
    • Docker adds chromadb extra; .env.template supports DATABASE_CONNECT_ARGS.
    • CONTRIBUTING: quick "Simple Example" run instructions.
  • Tests:
    • cognee-mcp test client now asserts zero failures instead of warning.

Written by Cursor Bugbot for commit 34c6652. This will update automatically on new commits.

Summary by CodeRabbit

  • New Features

    • Added custom data labeling for organized dataset management
    • Added access tracking with automatic cleanup for unused data
    • Introduced comprehensive tutorial notebooks with guided knowledge graph examples
    • Added configuration examples for ChromaDB, KuzuDB, Neptune Analytics, and PGVector databases
    • Enhanced multi-database support with improved connection argument handling
  • Improvements

    • Expanded test coverage across retrieval systems and search functionality
    • Enhanced error handling and resilience in API connections
    • Improved documentation with migration guides and deprecation notices
    • Added support for distributed execution patterns and cloud deployments
    • Strengthened access control and permission management workflows


chinu0609 and others added 30 commits October 29, 2025 20:12
… in completion retriever and graph_completion retriever
feat: adding cleanup function and adding update_node_acess_timestamps…
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
dexters1 and others added 24 commits December 19, 2025 11:55
<!-- .github/pull_request_template.md -->

## Description
This PR changes the permission test in e2e tests to use pytest.

Introduces:

- fixtures for the environment setup 
- one eventloop for all pytest tests
- mocking for acreate_structured_output answer generation (for search)
- Asserts in the permission test (before, we only ran the example); a sketch of the fixture-and-mock pattern follows below
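
A minimal sketch of that fixture-and-mock pattern, under assumed patch targets (the real module paths and return shapes differ); older pytest-asyncio versions share one loop by overriding the `event_loop` fixture, newer ones configure loop scope instead:

```
import asyncio
from unittest.mock import AsyncMock, patch

import pytest


@pytest.fixture(scope="session")
def event_loop():
    # One event loop shared by every async test in the session.
    loop = asyncio.new_event_loop()
    yield loop
    loop.close()


@pytest.fixture
def mock_structured_output():
    # Hypothetical patch target for acreate_structured_output;
    # adjust to the real module path.
    with patch(
        "cognee.infrastructure.llm.LLMGateway.LLMGateway.acreate_structured_output",
        new_callable=AsyncMock,
    ) as mock:
        mock.return_value = {"answer": "stubbed search answer"}
        yield mock
```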


## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Entity model now includes description and metadata fields for richer
entity information and indexing.

* **Tests**
* Expanded and restructured permission tests covering multi-tenant and
role-based access flows; improved test scaffolding and stability.
  * E2E test workflow now runs pytest with verbose output and INFO logs.

* **Bug Fixes**
* Access-tracking updates now commit transactions so access timestamps
persist.

* **Chores**
* General formatting, cleanup, and refactoring across modules and
maintenance scripts.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- .github/pull_request_template.md -->

## Description
This PR covers the higher-level search.py logic with unit tests. As
part of the implementation we fully cover the following core logic:

- search.py
- get_search_type_tools (with all the core search types)
- the `search` / `prepare_search_results` contract (testing behavior
through the search.py interface)

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Tests**
* Added comprehensive unit test coverage for search functionality,
including search type tool selection, search operations, and result
preparation workflows across multiple scenarios and edge cases.


<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Documentation**
* Deprecated legacy examples and added a migration guide mapping old
paths to new locations
* Added a comprehensive new-examples README detailing configurations,
pipelines, demos, and migration notes

* **New Features**
* Added many runnable examples and demos: database configs,
embedding/LLM setups, permissions and access-control, custom pipelines
(organizational, product recommendation, code analysis, procurement),
multimedia, visualization, temporal/ontology demos, and a local UI
starter

* **Chores**
  * Updated CI/test entrypoints to use the new-examples layout

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: lxobr <[email protected]>
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

- `map_vector_distances_to_graph_nodes` and
`map_vector_distances_to_graph_edges` accept both single-query (flat
list) and multi-query (nested list) inputs.
- `query_list_length` controls the mode: omit it for single-query
behavior, or provide it to enable multi-query mode with strict length
validation and per-query results.
- `vector_distance` on `Node` and `Edge` is now a list (one distance per
query). Constructors set it to `None`, and `reset_distances` initializes
it at the start of each search.
- `Node.update_distance_for_query` and `Edge.update_distance_for_query`
are the only methods that write to `vector_distance`. They ensure the
list has enough elements and keep unmatched queries at the penalty
value (see the sketch after this list).
- `triplet_distance_penalty` is the default distance value used
everywhere. Unmatched nodes/edges and missing scores all use this same
penalty for consistency.
- `edges_by_distance_key` is an index mapping edge labels to matching
edges. This lets us update all edges with the same label at once,
instead of scanning the full edge list repeatedly.
- `calculate_top_triplet_importances` returns `List[Edge]` for
single-query mode and `List[List[Edge]]` for multi-query mode.


## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [x] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [x] My code follows the project's coding standards and style
guidelines
- [x] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been
submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Multi-query support for mapping/scoring node and edge distances and a
configurable triplet distance penalty.
* Distance-keyed edge indexing for more accurate distance-to-edge
matching.

* **Refactor**
* Vector distance metadata changed from scalars to per-query lists;
added reset/normalization and per-query update flows.
* Node/edge distance initialization now supports deferred/listed
distances.

* **Tests**
* Updated and expanded tests for multi-query flows, list-based
distances, edge-key handling, and related error cases.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
…1949)

<!-- .github/pull_request_template.md -->

## Description
This PR adds support for structured outputs with llama cpp using litellm
and instructor. It returns a Pydantic instance. Based on the github
issue described
[here](#1947).

It features the following:
- works for both local and server modes (OpenAI api compatible)
- defaults to `JSON` mode (**not JSON schema mode, which is too rigid**); see the sketch below
- uses existing patterns around logging & tenacity decorator consistent
with other adapters
- Respects max_completion_tokens / max_tokens
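
A minimal sketch of what the default `JSON` mode looks like when wiring instructor over litellm; the exact construction inside the adapter may differ:

```
import instructor
from litellm import acompletion

# Mode.JSON asks the model for free-form JSON and validates it against
# the response model, instead of constraining generation with a rigid
# JSON schema grammar (Mode.JSON_SCHEMA).
client = instructor.from_litellm(acompletion, mode=instructor.Mode.JSON)
```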

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

I used the script below to test it with the [Phi-3-mini-4k-instruct
model](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf).
This tests a basic structured data extraction and a more complex one
locally, then verifies that data extraction works in server mode.

There are instructions in the script on how to set up the models. If you
are testing this on a Mac, run `brew install llama.cpp` to get llama.cpp
working locally. If you don't have an Apple silicon chip, you will need to
alter the script or the configs to run this on a GPU.
```
"""
Comprehensive test script for LlamaCppAPIAdapter - Tests LOCAL and SERVER modes

SETUP INSTRUCTIONS:
===================

1. Download a small model (pick ONE):

   # Phi-3-mini (2.3GB, recommended - best balance)
   wget https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf

   # OR TinyLlama (1.1GB, smallest but lower quality)
   wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf

2. For SERVER mode tests, start a server:
   python -m llama_cpp.server  --model ./Phi-3-mini-4k-instruct-q4.gguf --port 8080 --n_gpu_layers -1
"""

import asyncio
import os
from typing import Optional
from pydantic import BaseModel
from cognee.infrastructure.llm.structured_output_framework.litellm_instructor.llm.llama_cpp.adapter import (
    LlamaCppAPIAdapter,
)


class Person(BaseModel):
    """Simple test model for person extraction"""

    name: str
    age: int


class EntityExtraction(BaseModel):
    """Test model for entity extraction"""

    entities: list[str]
    summary: str


# Configuration - UPDATE THESE PATHS
MODEL_PATHS = [
    "./Phi-3-mini-4k-instruct-q4.gguf",
    "./tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
]


def find_model() -> Optional[str]:
    """Find the first available model file"""
    for path in MODEL_PATHS:
        if os.path.exists(path):
            return path
    return None


async def test_local_mode():
    """Test LOCAL mode (in-process, no server needed)"""
    print("=" * 70)
    print("Test 1: LOCAL MODE (In-Process)")
    print("=" * 70)

    model_path = find_model()
    if not model_path:
        print("❌ No model found! Download a model first:")
        print()
        return False

    print(f"Using model: {model_path}")

    try:
        adapter = LlamaCppAPIAdapter(
            name="LlamaCpp-Local",
            model_path=model_path,  # Local mode parameter
            max_completion_tokens=4096,
            n_ctx=2048,
            n_gpu_layers=-1,  # 0 for CPU, -1 for all GPU layers
        )

        print(f"✓ Adapter initialized in {adapter.mode_type.upper()} mode")
        print("  Sending request...")

        result = await adapter.acreate_structured_output(
            text_input="John Smith is 30 years old",
            system_prompt="Extract the person's name and age.",
            response_model=Person,
        )

        print(f"✅ Success!")
        print(f"   Name: {result.name}")
        print(f"   Age: {result.age}")
        print()
        return True
    except ImportError as e:
        print(f"❌ ImportError: {e}")
        print("   Install llama-cpp-python: pip install llama-cpp-python")
        print()
        return False
    except Exception as e:
        print(f"❌ Failed: {e}")
        print()
        return False


async def test_server_mode():
    """Test SERVER mode (localhost HTTP endpoint)"""
    print("=" * 70)
    print("Test 3: SERVER MODE (Localhost HTTP)")
    print("=" * 70)

    try:
        adapter = LlamaCppAPIAdapter(
            name="LlamaCpp-Server",
            endpoint="http://localhost:8080/v1",  # Server mode parameter
            api_key="dummy",
            model="Phi-3-mini-4k-instruct-q4.gguf",
            max_completion_tokens=1024,
            chat_format="phi-3"
        )

        print(f"✓ Adapter initialized in {adapter.mode_type.upper()} mode")
        print(f"  Endpoint: {adapter.endpoint}")
        print("  Sending request...")

        result = await adapter.acreate_structured_output(
            text_input="Sarah Johnson is 25 years old",
            system_prompt="Extract the person's name and age.",
            response_model=Person,
        )

        print(f"✅ Success!")
        print(f"   Name: {result.name}")
        print(f"   Age: {result.age}")
        print()
        return True
    except Exception as e:
        print(f"❌ Failed: {e}")
        print("   Make sure llama-cpp-python server is running on port 8080:")
        print("   python -m llama_cpp.server --model your-model.gguf --port 8080")
        print()
        return False


async def test_entity_extraction_local():
    """Test more complex extraction with local mode"""
    print("=" * 70)
    print("Test 2: Complex Entity Extraction (Local Mode)")
    print("=" * 70)

    model_path = find_model()
    if not model_path:
        print("❌ No model found!")
        print()
        return False

    try:
        adapter = LlamaCppAPIAdapter(
            name="LlamaCpp-Local",
            model_path=model_path,
            max_completion_tokens=1024,
            n_ctx=2048,
            n_gpu_layers=-1,
        )

        print(f"✓ Adapter initialized")
        print("  Sending complex extraction request...")

        result = await adapter.acreate_structured_output(
            text_input="Natural language processing (NLP) is a subfield of artificial intelligence (AI) and computer science.",
            system_prompt="Extract all technical entities mentioned and provide a brief summary.",
            response_model=EntityExtraction,
        )

        print(f"✅ Success!")
        print(f"   Entities: {', '.join(result.entities)}")
        print(f"   Summary: {result.summary}")
        print()
        return True
    except Exception as e:
        print(f"❌ Failed: {e}")
        print()
        return False


async def main():
    """Run all tests"""
    print("\n" + "🦙" * 35)
    print("Llama CPP Adapter - Comprehensive Test Suite")
    print("Testing LOCAL and SERVER modes")
    print("🦙" * 35 + "\n")

    results = {}

    # Test 1: Local mode (no server needed)
    print("=" * 70)
    print("PHASE 1: Testing LOCAL mode (in-process)")
    print("=" * 70)
    print()
    results["local_basic"] = await test_local_mode()

    results["local_complex"] = await test_entity_extraction_local()

    # Test 2: Server mode (requires server on 8080)
    print("\n" + "=" * 70)
    print("PHASE 2: Testing SERVER mode (requires server running)")
    print("=" * 70)
    print()
    results["server"] = await test_server_mode()

    # Summary
    print("\n" + "=" * 70)
    print("TEST SUMMARY")
    print("=" * 70)
    for test_name, passed in results.items():
        status = "✅ PASSED" if passed else "❌ FAILED"
        print(f"  {test_name:20s}: {status}")

    passed_count = sum(results.values())
    total_count = len(results)
    print()
    print(f"Total: {passed_count}/{total_count} tests passed")

    if passed_count == total_count:
        print("\n🎉 All tests passed! The adapter is working correctly.")
    elif results.get("local_basic"):
        print("\n✓ Local mode works! Server/cloud tests need llama-cpp-python server running.")
    else:
        print("\n⚠️  Please check setup instructions at the top of this file.")


if __name__ == "__main__":
    asyncio.run(main())

```

**The following screenshots show the tests passing**
<img width="622" height="149" alt="image"
src="https://github.com/user-attachments/assets/9df02f66-39a9-488a-96a6-dc79b47e3001"
/>

Test 1
<img width="939" height="750" alt="image"
src="https://github.com/user-attachments/assets/87759189-8fd2-450f-af7f-0364101a5690"
/>

Test 2
<img width="938" height="746" alt="image"
src="https://github.com/user-attachments/assets/61e423c0-3d41-4fde-acaf-ae77c3463d66"
/>

Test 3
<img width="944" height="232" alt="image"
src="https://github.com/user-attachments/assets/f7302777-2004-447c-a2fe-b12762241ba9"
/>


**Note:** I also tried testing with the `TinyLlama-1.1B-Chat` model,
but such a small model is bad at producing structured JSON consistently.

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
see above

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [X] **I have tested my changes thoroughly before submitting this PR**
- [X] **This PR contains minimal changes necessary to address the
issue/feature**
- [X] My code follows the project's coding standards and style
guidelines
- [X] I have added tests that prove my fix is effective or that my
feature works
- [X] I have added necessary documentation (if applicable)
- [X] All new and existing tests pass
- [X] I have searched existing PRs to ensure this change hasn't been
submitted already
- [X] I have linked any relevant issues in the description
- [X] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Llama CPP integration supporting local (in-process) and server
(OpenAI‑compatible) modes.
* Selectable provider with configurable model path, context size, GPU
layers, and chat format.
* Asynchronous structured-output generation with rate limiting,
retries/backoff, and debug logging.

* **Chores**
  * Added llama-cpp-python dependency and bumped project version.

* **Documentation**
* CONTRIBUTING updated with a “Running Simple Example” walkthrough for
local/server usage.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Two interactive tutorial notebooks added (Cognee Basics, Python
Development) with runnable code and rich markdown; MarkdownPreview for
rendered markdown; instance-aware notebook support and cloud proxy with
API key handling; notebook CRUD (create, save, run, delete).

* **Bug Fixes**
  * Improved authentication handling to treat 401/403 consistently.

* **Improvements**
* Auto-expanding text areas; better error propagation from dataset
operations; migration to allow toggling deletability for legacy tutorial
notebooks.

* **Tests**
  * Expanded tests for tutorial creation and loading.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
```
feat(auth): make JWT token expiration configurable via environment variable

- Add JWT_LIFETIME_SECONDS environment variable to configure token expiration
- Set default expiration to 3600 seconds (1 hour) for both API and client auth backends
- Remove hardcoded expiration values in favor of environment-based configuration
- Add documentation comments explaining the JWT strategy configuration

feat(auth): make cookie domain configurable via environment variable

- Add AUTH_TOKEN_COOKIE_DOMAIN environment variable to configure cookie domain
- When not set or empty, cookie domain defaults to None allowing cross-domain usage
- Add documentation explaining cookie expiration is handled by JWT strategy
- Update default_transport to use environment-based cookie domain

feat(docker): add CORS_ALLOWED_ORIGINS environment variable

- Add CORS_ALLOWED_ORIGINS environment variable with default value of '*'
- Configure frontend to use NEXT_PUBLIC_BACKEND_API_URL environment variable
- Set default backend API URL to http://localhost:8000

feat(docker): add restart policy to all services

- Add restart: always policy to cognee, frontend, neo4j, chromadb, and postgres services
- This ensures services automatically restart on failure or system reboot
- Improves container reliability and uptime
```
```
refactor(auth): remove redundant comments from JWT strategy configuration

Remove duplicate comments that were explaining the JWT lifetime configuration
in both API and client authentication backends. The code remains functionally
unchanged but comments are cleaned up for better maintainability.
```
```
fix(auth): add error handling for JWT lifetime configuration

- Add try-catch block to handle invalid JWT_LIFETIME_SECONDS environment variable
- Default to 3600 seconds when environment variable is not a valid integer
- Apply same fix to both API and client authentication backends

docs(docker): add security warning for CORS configuration

- Add comment warning about default CORS_ALLOWED_ORIGINS setting
- Emphasize need to override wildcard with specific domains in production
```
```
fix(embeddings): handle empty API key in LiteLLMEmbeddingEngine

- Add conditional check for empty API key to prevent authentication errors
- Set default API key to "EMPTY" when no valid key is provided
- This ensures proper fallback behavior when API key is not configured
```
```
feat(Dockerfile): add chromadb support and China mirror option

- Add chromadb extra dependency to uv sync commands in Dockerfile
- Include optional aliyun mirror configuration for users in China
- Update dependency installation to include chromadb extra
```
<!-- .github/pull_request_template.md -->

## Description
This PR addresses a runtime error where the application fails because
ChromaDB is not installed. The error message `"ChromaDB is not
installed. Please install it with 'pip install chromadb'"` occurs when
attempting to use features that depend on ChromaDB.

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
* Updated dependency management to include chromadb in the build
configuration.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
fix(embeddings): handle empty API key in LiteLLMEmbeddingEngine

- Add conditional check for empty API key to prevent authentication errors
- Set default API key to "EMPTY" when no valid key is provided
- This ensures proper fallback behavior when API key is not configured

<!-- .github/pull_request_template.md -->

## Description
This PR fixes an issue where the `LiteLLMEmbeddingEngine` throws an authentication error when the `EMBEDDING_API_KEY` environment variable is empty or not set. The error message indicated `"api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable"`.

Log Error: 2025-12-23T11:36:58.220908 [error    ] Error embedding text: litellm.AuthenticationError: AuthenticationError: OpenAIException - The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable [LiteLLMEmbeddingEngine]

**Root Cause**: When initializing the embedding engine, if the `api_key` parameter is an empty string, the underlying LiteLLM client doesn't treat it as "no key provided" but instead uses this empty string to make API requests, triggering authentication failure.

**Solution**: Added a conditional check in the code that creates the `LiteLLMEmbeddingEngine` instance. If the `EMBEDDING_API_KEY` read from configuration is empty (`None` or empty string), we explicitly set the `api_key` parameter passed to the engine constructor to a non-empty placeholder string `"EMPTY"`. This aligns with LiteLLM's handling of optional authentication and prevents exceptions in scenarios where keys are not required or are obtained from other sources (sketched below).
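
A minimal sketch of that guard; the helper name and fallback chain are illustrative, not the exact code:

```
def resolve_embedding_api_key(embedding_api_key, llm_api_key=None):
    # An empty string would be passed straight through to LiteLLM and
    # rejected; fall back to the LLM key when available, otherwise use
    # the "EMPTY" placeholder accepted by endpoints without auth.
    if embedding_api_key:
        return embedding_api_key
    return llm_api_key or "EMPTY"
```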

**How to Reproduce**:
Configure the application with the following settings (as shown in the error log):

```
EMBEDDING_PROVIDER="custom"
EMBEDDING_MODEL="openai/Qwen/Qwen3-Embedding-xxx"
EMBEDDING_ENDPOINT="xxxxx"
EMBEDDING_API_VERSION=""
EMBEDDING_DIMENSIONS=1024
EMBEDDING_MAX_TOKENS=16384
EMBEDDING_BATCH_SIZE=10
# If the embedding key is not provided, the key set for LLM_API_KEY will be used
EMBEDDING_API_KEY=""
```


## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] I have tested my changes thoroughly before submitting this PR
- [x] This PR contains minimal changes necessary to address the issue/feature
- [ ] My code follows the project's coding standards and style guidelines
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **Bug Fixes**
  * Improved API key validation for the embedding service to properly handle blank or missing API keys, ensuring more reliable embedding generation and preventing potential service errors.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
…vice restart policies (#1956)

<!-- .github/pull_request_template.md -->

## Description
This PR introduces several configuration improvements to enhance the
application's flexibility and reliability. The changes make JWT token
expiration and cookie domain configurable via environment variables,
improve CORS configuration, and add container restart policies for
better uptime.

**JWT Token Expiration Configuration:**
- Added `JWT_LIFETIME_SECONDS` environment variable to configure JWT
token expiration time
- Set default expiration to 3600 seconds (1 hour) for both API and
client authentication backends
- Removed hardcoded expiration values in favor of environment-based
configuration
- Added documentation comments explaining the JWT strategy configuration

**Cookie Domain Configuration:**
- Added `AUTH_TOKEN_COOKIE_DOMAIN` environment variable to configure
cookie domain
- When not set or empty, cookie domain defaults to `None` allowing
cross-domain usage
- Added documentation explaining cookie expiration is handled by JWT
strategy
- Updated `default_transport` to use the environment-based cookie domain (both environment reads are sketched below)
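
A minimal sketch of the two environment reads described above (variable names match; the surrounding code is illustrative):

```
import os

DEFAULT_JWT_LIFETIME_SECONDS = 3600  # 1 hour


def get_jwt_lifetime_seconds():
    # Invalid or missing values fall back to the default instead of
    # breaking auth backend initialization.
    try:
        return int(os.environ.get("JWT_LIFETIME_SECONDS", DEFAULT_JWT_LIFETIME_SECONDS))
    except (TypeError, ValueError):
        return DEFAULT_JWT_LIFETIME_SECONDS


def get_cookie_domain():
    # Unset or empty resolves to None, which allows cross-domain usage.
    return os.environ.get("AUTH_TOKEN_COOKIE_DOMAIN") or None
```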

**CORS Configuration Enhancement:**
- Added `CORS_ALLOWED_ORIGINS` environment variable with default value
of `'*'`
- Configured frontend to use `NEXT_PUBLIC_BACKEND_API_URL` environment
variable
- Set default backend API URL to `http://localhost:8000`

**Docker Service Reliability:**
- Added `restart: always` policy to all services (cognee, frontend,
neo4j, chromadb, and postgres)
- This ensures services automatically restart on failure or system
reboot
- Improves container reliability and uptime in production and
development environments

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Services now automatically restart on failure for improved
reliability.

* **Configuration**
* Cookie domain for authentication is now configurable via environment
variable, defaulting to None if not set.
* JWT token lifetime is now configurable via environment variable, with
a 3600-second default.
* CORS allowed origins are now configurable with a default of all
origins (*).
* Frontend backend API URL is now configurable, defaulting to
http://localhost:8000.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

pull-checklist bot commented Jan 6, 2026

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.


coderabbitai bot commented Jan 6, 2026

Caution

Review failed

The pull request is closed.

Walkthrough

Comprehensive refactoring introducing database connection argument support, notebook tutorial system redesign, data labeling and access tracking, frontend cloud/local fetch strategies, LLM adapter unification, multi-query graph distance handling, and extensive test suite migration to pytest patterns. Includes new cleanup utilities, example configurations, and deployment configuration updates.

Changes

Cohort / File(s) Summary
Database Configuration & Connection Arguments
.env.template, cognee/infrastructure/databases/relational/config.py, cognee/infrastructure/databases/relational/create_relational_engine.py, cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py, cognee/infrastructure/databases/vector/create_vector_engine.py, cognee/tests/unit/infrastructure/databases/relational/test_RelationalConfig.py
Added DATABASE_CONNECT_ARGS environment variable and configuration parsing to support provider-specific connection parameters. Updated SQLAlchemy adapter and engine creation to accept and apply connect_args for PostgreSQL/asyncpg and SQLite drivers via URL.create() for proper credential handling.
Notebook Tutorial System Restructuring
cognee/modules/notebooks/methods/create_notebook.py, cognee/modules/notebooks/methods/create_tutorial_notebooks.py, cognee/modules/notebooks/methods/get_notebooks.py, cognee/modules/notebooks/methods/__init__.py, alembic/versions/1a58b986e6e1_enable_delete_for_old_tutorial_notebooks.py, cognee/modules/notebooks/tutorials/...
Refactored tutorial notebook creation from inline helper to dedicated module loading tutorials from filesystem directories. Added support for tutorial configuration (name, deletable flag) via config.json. Introduced UUID5-based stable IDs and comprehensive cell parsing. Created two tutorial directories with extensive markdown/Python cell content.
Data Labeling & Last-Accessed Tracking
cognee/modules/data/models/Data.py, alembic/versions/a1b2c3d4e5f6_add_label_column_to_data.py, alembic/versions/e1ec1dcb50b6_add_last_accessed_to_data.py, cognee/api/v1/datasets/routers/get_datasets_router.py, cognee/tasks/ingestion/data_item.py, cognee/tasks/ingestion/ingest_data.py, cognee/tasks/ingestion/save_data_item_to_storage.py, cognee/tasks/cleanup/cleanup_unused_data.py, cognee/modules/retrieval/utils/access_tracking.py, cognee/tests/test_custom_data_label.py, cognee/tests/test_cleanup_unused_data.py
Added label and last_accessed columns to Data model. Introduced DataItem wrapper for per-item labeling. Implemented access timestamp tracking on data retrieval and cleanup utility for removing unused data based on access recency. Added Alembic migrations for schema updates.
Frontend API Layer & Cloud/Local Integration
cognee-frontend/src/modules/instances/cloudFetch.ts, cognee-frontend/src/modules/instances/localFetch.ts, cognee-frontend/src/modules/instances/types.ts, cognee-frontend/src/app/dashboard/Dashboard.tsx, cognee-frontend/src/app/dashboard/InstanceDatasetsAccordion.tsx, cognee-frontend/src/modules/notebooks/useNotebooks.ts, cognee-frontend/src/modules/notebooks/createNotebook.ts, cognee-frontend/src/modules/notebooks/deleteNotebook.ts, cognee-frontend/src/modules/notebooks/getNotebooks.ts, cognee-frontend/src/modules/notebooks/runNotebookCell.ts, cognee-frontend/src/modules/notebooks/saveNotebook.ts
Created centralized fetch abstractions (cloudFetch, localFetch) for dual-endpoint support. Refactored notebook hook to accept CogneeInstance parameter and delegate to dedicated API modules. Added setApiKey mechanism for cloud authentication. Introduced MarkdownPreview component with TailwindCSS styling. Extracted all inline fetch calls to reusable module functions.
LLM Infrastructure & Adapter Unification
cognee/infrastructure/llm/LLMGateway.py, cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py, cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/anthropic/adapter.py, cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/gemini/adapter.py, cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py, cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py, cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/ollama/adapter.py, cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/llama_cpp/adapter.py, cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py, cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/types.py
Unified LLM adapters to inherit from GenericAPIAdapter base class. Added transcription and image-processing capabilities (create_transcript, transcribe_image) to generic adapter with TranscriptionReturnType. Introduced new LlamaCppAPIAdapter with local/server dual-mode support. Updated LLMGateway to remove sync create_structured_output wrapper. Added observability decorators (@observe(as_type="generation")). Removed sync method from LLMGateway.
Graph Distance & Multi-Query Support
cognee/modules/graph/cognee_graph/CogneeGraph.py, cognee/modules/graph/cognee_graph/CogneeGraphElements.py, cognee/modules/graph/utils/__init__.py, cognee/modules/graph/utils/get_entity_nodes_from_triplets.py
Extended Node/Edge to support per-query vector distance lists instead of single scalar values. Added edges_by_distance_key, triplet_distance_penalty, and multi-query distance management methods. Introduced reset_distances, map_vector_distances_to_graph_nodes/edges, and calculate_top_triplet_importances methods supporting single/multi-query scenarios. Added get_entity_nodes_from_triplets utility.
Retrieval & Search Infrastructure
cognee/modules/retrieval/chunks_retriever.py, cognee/modules/retrieval/completion_retriever.py, cognee/modules/retrieval/graph_completion_retriever.py, cognee/modules/retrieval/summaries_retriever.py, cognee/modules/retrieval/triplet_retriever.py, cognee/modules/retrieval/utils/brute_force_triplet_search.py
Integrated access timestamp tracking across all retrievers via update_node_access_timestamps(). Changed TripletRetriever default top_k from 1 to 5. Removed debug scaffolding from format_triplets. Updated all retrievers to track and update last-accessed timestamps post-retrieval.
Test Refactoring to Pytest Fixtures
cognee/tests/test_permissions.py, cognee/tests/test_search_db.py, cognee/tests/test_custom_data_label.py, cognee/tests/test_cleanup_unused_data.py
Migrated procedural tests to pytest-asyncio fixture-based patterns with helper functions and mocks. Restructured test_permissions.py and test_search_db.py from monolithic flows to modular setup/fixtures. Added session-scoped fixtures for end-to-end state management. Introduced assertion-based validation replacing print-based verification.
Unit Test Expansion & Mocking
cognee/tests/unit/modules/retrieval/..., cognee/tests/unit/modules/search/..., cognee/tests/unit/modules/graph/..., cognee/tests/unit/modules/users/test_tutorial_notebook_creation.py, cognee/tests/unit/eval_framework/...
Expanded retrieval test coverage with mock-based unit tests replacing integration-style setups. Added comprehensive tests for graph distance handling, search type tools, completion logic, and user feedback. Implemented extensive test fixtures for chunk retrieval, graph/triplet completion, summaries, RAG, and temporal retrievers.
Integration Test Additions
cognee/tests/integration/retrieval/test_chunks_retriever.py, cognee/tests/integration/retrieval/test_graph_completion_retriever.py, cognee/tests/integration/retrieval/test_graph_completion_retriever_context_extension.py, cognee/tests/integration/retrieval/test_graph_completion_retriever_cot.py, cognee/tests/integration/retrieval/test_rag_completion_retriever.py, cognee/tests/integration/retrieval/test_structured_output.py, cognee/tests/integration/retrieval/test_summaries_retriever.py, cognee/tests/integration/retrieval/test_temporal_retriever.py, cognee/tests/integration/retrieval/test_triplet_retriever.py
Added comprehensive integration tests for all retriever types with multi-scenario setup fixtures (simple, complex, empty graph). Tests validate context retrieval, triplet ordering, top-k limiting, empty-graph behavior, and completion generation paths.
GitHub Workflows & CI/CD
.github/workflows/e2e_tests.yml, .github/workflows/examples_tests.yml, .github/workflows/release.yml, .github/workflows/release_test.yml, .github/workflows/search_db_tests.yml
Added new e2e workflow job for label testing. Renamed and added S3-permissions-example test job with PostgreSQL service. Added trigger-docs-test-suite and trigger-community-test-suite jobs to release workflow. Introduced python-version matrix support to search_db_tests. Switched test invocations from direct Python scripts to pytest with verbose output.
Deployment & Docker Configuration
docker-compose.yml, .env.template
Added restart: always policy to all services. Added CORS_ALLOWED_ORIGINS environment variable to cognee service. Added NEXT_PUBLIC_BACKEND_API_URL to frontend service.
Data Model Changes
cognee/modules/chunking/models/DocumentChunk.py, cognee/modules/engine/models/Entity.py, cognee-frontend/package.json, cognee-frontend/src/ui/elements/TextArea.tsx, cognee-frontend/src/ui/elements/Notebook/MarkdownPreview.tsx, cognee-frontend/src/ui/elements/Notebook/Notebook.tsx, cognee-frontend/src/ui/elements/Notebook/NotebookCellHeader.tsx
Removed metadata defaults from DocumentChunk and Entity models. Updated TextArea to support auto-expansion with optional value/onChange props. Refactored Notebook component to use composition with NotebookCell and MarkdownPreview components. Updated frontend dependencies (react-markdown, next, auth0).
API Enhancements
cognee/api/v1/add/add.py, cognee/api/v1/memify/routers/get_memify_router.py
Extended add() function to accept DataItem and list[DataItem] types. Added run_in_background field to MemifyPayloadDTO and propagated to memify invocation.
Authentication & Configuration
cognee/modules/users/authentication/default/default_transport.py, cognee/modules/users/authentication/get_api_auth_backend.py, cognee/modules/users/authentication/get_client_auth_backend.py, cognee/modules/users/methods/create_user.py
Added environment-driven AUTH_TOKEN_COOKIE_DOMAIN configuration for cookie domain handling. Implemented robust JWT_LIFETIME_SECONDS parsing with fallback defaults. Removed unused imports from create_user.
Utilities & Helpers
cognee-mcp/src/test_client.py, cognee/modules/retrieval/utils/brute_force_triplet_search.py, cognee/tests/unit/modules/retrieval/test_brute_force_triplet_search.py, cognee/tests/unit/api/test_get_raw_data_endpoint.py
Changed test summary from non-fatal warning to assertion failure. Added format_triplets as public export. Added unit tests for raw data endpoint supporting local file and S3 streaming paths.
Frontend Components
cognee-frontend/src/modules/ingestion/useDatasets.ts, cognee-frontend/src/utils/handleServerErrors.ts
Updated handleServerErrors signature to use explicit null defaults and treat 403 as unauthorized. Added error re-throw in fetchDatasets.
New Examples & Documentation
new-examples/..., examples/README.md, cognee-starter-kit/README.md, CONTRIBUTING.md, new-examples/README.md
Added extensive new example configurations and custom pipeline examples covering database, LLM, embedding, and permission setups. Migrated starter-kit examples to new-examples with deprecation notices. Added running examples and test instructions to CONTRIBUTING.md.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes


Suggested labels

backend, frontend, database, testing, ci-cd, llm-infrastructure, graph-processing, feature-complete

Suggested reviewers

  • pazone

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b339529 and 34c6652.

⛔ Files ignored due to path filters (8)
  • cognee-frontend/package-lock.json is excluded by !**/package-lock.json
  • cognee-mcp/uv.lock is excluded by !**/*.lock
  • new-examples/configurations/permissions_example/data/artificial_intelligence.pdf is excluded by !**/*.pdf
  • new-examples/demos/multimedia_processing/data/example.png is excluded by !**/*.png
  • new-examples/demos/multimedia_processing/data/text_to_speech.mp3 is excluded by !**/*.mp3
  • new-examples/demos/ontology_medical_comparison/data/scientific_papers/TOJ-22-0073_152Mendoza.pdf is excluded by !**/*.pdf
  • new-examples/demos/ontology_medical_comparison/data/scientific_papers/nutrients-13-01241.pdf is excluded by !**/*.pdf
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (195)
  • .env.template
  • .github/workflows/e2e_tests.yml
  • .github/workflows/examples_tests.yml
  • .github/workflows/release.yml
  • .github/workflows/release_test.yml
  • .github/workflows/search_db_tests.yml
  • CONTRIBUTING.md
  • Dockerfile
  • alembic/versions/1a58b986e6e1_enable_delete_for_old_tutorial_notebooks.py
  • alembic/versions/a1b2c3d4e5f6_add_label_column_to_data.py
  • alembic/versions/e1ec1dcb50b6_add_last_accessed_to_data.py
  • cognee-frontend/package.json
  • cognee-frontend/src/app/dashboard/Dashboard.tsx
  • cognee-frontend/src/app/dashboard/InstanceDatasetsAccordion.tsx
  • cognee-frontend/src/modules/ingestion/useDatasets.ts
  • cognee-frontend/src/modules/instances/cloudFetch.ts
  • cognee-frontend/src/modules/instances/localFetch.ts
  • cognee-frontend/src/modules/instances/types.ts
  • cognee-frontend/src/modules/notebooks/createNotebook.ts
  • cognee-frontend/src/modules/notebooks/deleteNotebook.ts
  • cognee-frontend/src/modules/notebooks/getNotebooks.ts
  • cognee-frontend/src/modules/notebooks/runNotebookCell.ts
  • cognee-frontend/src/modules/notebooks/saveNotebook.ts
  • cognee-frontend/src/modules/notebooks/useNotebooks.ts
  • cognee-frontend/src/ui/elements/Notebook/MarkdownPreview.tsx
  • cognee-frontend/src/ui/elements/Notebook/Notebook.tsx
  • cognee-frontend/src/ui/elements/Notebook/NotebookCellHeader.tsx
  • cognee-frontend/src/ui/elements/TextArea.tsx
  • cognee-frontend/src/utils/handleServerErrors.ts
  • cognee-mcp/src/test_client.py
  • cognee-starter-kit/README.md
  • cognee/api/v1/add/add.py
  • cognee/api/v1/datasets/routers/get_datasets_router.py
  • cognee/api/v1/memify/routers/get_memify_router.py
  • cognee/infrastructure/databases/relational/config.py
  • cognee/infrastructure/databases/relational/create_relational_engine.py
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
  • cognee/infrastructure/databases/vector/create_vector_engine.py
  • cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py
  • cognee/infrastructure/llm/LLMGateway.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/anthropic/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/gemini/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/llama_cpp/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/llm_interface.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/ollama/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/types.py
  • cognee/modules/chunking/models/DocumentChunk.py
  • cognee/modules/data/models/Data.py
  • cognee/modules/engine/models/Entity.py
  • cognee/modules/graph/cognee_graph/CogneeGraph.py
  • cognee/modules/graph/cognee_graph/CogneeGraphElements.py
  • cognee/modules/graph/utils/__init__.py
  • cognee/modules/graph/utils/get_entity_nodes_from_triplets.py
  • cognee/modules/notebooks/methods/__init__.py
  • cognee/modules/notebooks/methods/create_notebook.py
  • cognee/modules/notebooks/methods/create_tutorial_notebooks.py
  • cognee/modules/notebooks/methods/get_notebooks.py
  • cognee/modules/notebooks/tutorials/cognee-basics/cell-1.md
  • cognee/modules/notebooks/tutorials/cognee-basics/cell-2.md
  • cognee/modules/notebooks/tutorials/cognee-basics/cell-3.md
  • cognee/modules/notebooks/tutorials/cognee-basics/cell-4.py
  • cognee/modules/notebooks/tutorials/cognee-basics/cell-5.py
  • cognee/modules/notebooks/tutorials/cognee-basics/cell-6.py
  • cognee/modules/notebooks/tutorials/cognee-basics/cell-7.py
  • cognee/modules/notebooks/tutorials/cognee-basics/config.json
  • cognee/modules/notebooks/tutorials/python-development-with-cognee/cell-1.md
  • cognee/modules/notebooks/tutorials/python-development-with-cognee/cell-10.md
  • cognee/modules/notebooks/tutorials/python-development-with-cognee/cell-11.md
  • cognee/modules/notebooks/tutorials/python-development-with-cognee/cell-12.py
  • cognee/modules/notebooks/tutorials/python-development-with-cognee/cell-13.md
  • cognee/modules/notebooks/tutorials/python-development-with-cognee/cell-14.py
  • cognee/modules/notebooks/tutorials/python-development-with-cognee/cell-15.md
  • cognee/modules/notebooks/tutorials/python-development-with-cognee/cell-16.py
  • cognee/modules/notebooks/tutorials/python-development-with-cognee/cell-2.md
  • cognee/modules/notebooks/tutorials/python-development-with-cognee/cell-3.md
  • cognee/modules/notebooks/tutorials/python-development-with-cognee/cell-4.md
  • cognee/modules/notebooks/tutorials/python-development-with-cognee/cell-5.md
  • cognee/modules/notebooks/tutorials/python-development-with-cognee/cell-6.py
  • cognee/modules/notebooks/tutorials/python-development-with-cognee/cell-7.md
  • cognee/modules/notebooks/tutorials/python-development-with-cognee/cell-8.md
  • cognee/modules/notebooks/tutorials/python-development-with-cognee/cell-9.py
  • cognee/modules/notebooks/tutorials/python-development-with-cognee/config.json
  • cognee/modules/notebooks/tutorials/python-development-with-cognee/data/copilot_conversations.json
  • cognee/modules/notebooks/tutorials/python-development-with-cognee/data/guido_contributions.json
  • cognee/modules/notebooks/tutorials/python-development-with-cognee/data/my_developer_rules.md
  • cognee/modules/notebooks/tutorials/python-development-with-cognee/data/pep_style_guide.md
  • cognee/modules/notebooks/tutorials/python-development-with-cognee/data/zen_principles.md
  • cognee/modules/retrieval/chunks_retriever.py
  • cognee/modules/retrieval/completion_retriever.py
  • cognee/modules/retrieval/graph_completion_retriever.py
  • cognee/modules/retrieval/summaries_retriever.py
  • cognee/modules/retrieval/triplet_retriever.py
  • cognee/modules/retrieval/utils/access_tracking.py
  • cognee/modules/retrieval/utils/brute_force_triplet_search.py
  • cognee/modules/users/authentication/default/default_transport.py
  • cognee/modules/users/authentication/get_api_auth_backend.py
  • cognee/modules/users/authentication/get_client_auth_backend.py
  • cognee/modules/users/methods/create_user.py
  • cognee/tasks/cleanup/cleanup_unused_data.py
  • cognee/tasks/ingestion/data_item.py
  • cognee/tasks/ingestion/ingest_data.py
  • cognee/tasks/ingestion/save_data_item_to_storage.py
  • cognee/tasks/summarization/models.py
  • cognee/tests/integration/retrieval/test_chunks_retriever.py
  • cognee/tests/integration/retrieval/test_graph_completion_retriever.py
  • cognee/tests/integration/retrieval/test_graph_completion_retriever_context_extension.py
  • cognee/tests/integration/retrieval/test_graph_completion_retriever_cot.py
  • cognee/tests/integration/retrieval/test_rag_completion_retriever.py
  • cognee/tests/integration/retrieval/test_structured_output.py
  • cognee/tests/integration/retrieval/test_summaries_retriever.py
  • cognee/tests/integration/retrieval/test_temporal_retriever.py
  • cognee/tests/integration/retrieval/test_triplet_retriever.py
  • cognee/tests/test_cleanup_unused_data.py
  • cognee/tests/test_custom_data_label.py
  • cognee/tests/test_permissions.py
  • cognee/tests/test_search_db.py
  • cognee/tests/unit/api/test_get_raw_data_endpoint.py
  • cognee/tests/unit/eval_framework/benchmark_adapters_test.py
  • cognee/tests/unit/eval_framework/corpus_builder_test.py
  • cognee/tests/unit/infrastructure/databases/relational/test_RelationalConfig.py
  • cognee/tests/unit/modules/graph/cognee_graph_elements_test.py
  • cognee/tests/unit/modules/graph/cognee_graph_test.py
  • cognee/tests/unit/modules/retrieval/chunks_retriever_test.py
  • cognee/tests/unit/modules/retrieval/conversation_history_test.py
  • cognee/tests/unit/modules/retrieval/graph_completion_retriever_context_extension_test.py
  • cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py
  • cognee/tests/unit/modules/retrieval/graph_completion_retriever_test.py
  • cognee/tests/unit/modules/retrieval/rag_completion_retriever_test.py
  • cognee/tests/unit/modules/retrieval/summaries_retriever_test.py
  • cognee/tests/unit/modules/retrieval/temporal_retriever_test.py
  • cognee/tests/unit/modules/retrieval/test_brute_force_triplet_search.py
  • cognee/tests/unit/modules/retrieval/test_completion.py
  • cognee/tests/unit/modules/retrieval/test_graph_summary_completion_retriever.py
  • cognee/tests/unit/modules/retrieval/test_user_qa_feedback.py
  • cognee/tests/unit/modules/retrieval/triplet_retriever_test.py
  • cognee/tests/unit/modules/search/test_get_search_type_tools.py
  • cognee/tests/unit/modules/search/test_search.py
  • cognee/tests/unit/modules/search/test_search_prepare_search_result_contract.py
  • cognee/tests/unit/modules/users/test_tutorial_notebook_creation.py
  • docker-compose.yml
  • examples/README.md
  • new-examples/README.md
  • new-examples/configurations/database_examples/chromadb_vector_database_configuration.py
  • new-examples/configurations/database_examples/kuzu_graph_database_configuration.py
  • new-examples/configurations/database_examples/neo4j_graph_database_configuration.py
  • new-examples/configurations/database_examples/neptune_analytics_aws_database_configuration.py
  • new-examples/configurations/database_examples/pgvector_postgres_vector_database_configuration.py
  • new-examples/configurations/database_examples/s3_storage_configuration.py
  • new-examples/configurations/distributed_execution_with_modal_example.py
  • new-examples/configurations/embedding_configurations/azure_openai_setup.py
  • new-examples/configurations/embedding_configurations/openai_setup.py
  • new-examples/configurations/llm_configurations/azure_openai_setup.py
  • new-examples/configurations/llm_configurations/openai_setup.py
  • new-examples/configurations/permissions_example/user_permissions_and_access_control_example.py
  • new-examples/configurations/structured_output_configurations.py/baml_setup.py
  • new-examples/configurations/structured_output_configurations.py/litellm_intructor_setup.py
  • new-examples/custom_pipelines/agentic_reasoning_procurement_example.py
  • new-examples/custom_pipelines/code_graph_repository_analysis_example.py
  • new-examples/custom_pipelines/custom_cognify_pipeline_example.py
  • new-examples/custom_pipelines/dynamic_steps_resume_analysis_hr_example.py
  • new-examples/custom_pipelines/memify_coding_agent_rule_extraction_example.py
  • new-examples/custom_pipelines/organizational_hierarchy/data/companies.json
  • new-examples/custom_pipelines/organizational_hierarchy/data/people.json
  • new-examples/custom_pipelines/organizational_hierarchy/organizational_hierarchy_pipeline_example.py
  • new-examples/custom_pipelines/organizational_hierarchy/organizational_hierarchy_pipeline_low_level_example.py
  • new-examples/custom_pipelines/product_recommendation/data/customers.json
  • new-examples/custom_pipelines/product_recommendation/product_recommendation_example.py
  • new-examples/custom_pipelines/relational_database_to_knowledge_graph_migration_example.py
  • new-examples/demos/conversation_session_persistence_example.py
  • new-examples/demos/core_features_getting_started_example.py
  • new-examples/demos/custom_graph_model_entity_schema_definition.py
  • new-examples/demos/custom_prompt_guide.py
  • new-examples/demos/direct_llm_call_for_structured_output_example.py
  • new-examples/demos/dynamic_multiple_weighted_edges_example.py
  • new-examples/demos/feedback_enrichment_minimal_example.py
  • new-examples/demos/graph_visualization_example.py
  • new-examples/demos/multimedia_processing/multimedia_audio_image_processing_example.py
  • new-examples/demos/nodeset_memory_grouping_with_tags_example.py
  • new-examples/demos/ontology_medical_comparison/data/enriched_medical_ontology_with_classes.owl
  • new-examples/demos/ontology_medical_comparison/ontology_medical_domain_comparison_example.py
  • new-examples/demos/ontology_reference_vocabulary/data/basic_ontology.owl
  • new-examples/demos/ontology_reference_vocabulary/ontology_as_reference_vocabulary_example.py
  • new-examples/demos/retrievers_and_search_examples.py
  • new-examples/demos/simple_default_cognee_pipelines_example.py
  • new-examples/demos/simple_document_qa/data/alice_in_wonderland.txt
  • new-examples/demos/simple_document_qa/simple_document_qa_demo.py
  • new-examples/demos/start_local_ui_frontend_example.py
  • new-examples/demos/temporal_awareness_example.py
  • new-examples/demos/web_url_content_ingestion_example.py
  • new-examples/demos/weighted_edges_relationships_example.py
  • pyproject.toml

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment on lines +141 to +154
    needs: release-pypi-package
    if: ${{ inputs.flavour == 'main' }}
    runs-on: ubuntu-22.04
    steps:
      - name: Trigger docs tests
        run: |
          curl -L -X POST \
            -H "Accept: application/vnd.github+json" \
            -H "Authorization: Bearer ${{ secrets.REPO_DISPATCH_PAT_TOKEN }}" \
            -H "X-GitHub-Api-Version: 2022-11-28" \
            https://api.github.com/repos/topoteretes/cognee-docs/dispatches \
            -d '{"event_type":"new-main-release","client_payload":{"caller_repo":"'"${GITHUB_REPOSITORY}"'"}}'
  trigger-community-test-suite:

Check warning (Code scanning / CodeQL): Workflow does not contain permissions (Medium)

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {}

Copilot Autofix (AI, 4 days ago)

To fix this, explicitly limit the GITHUB_TOKEN permissions so the job does not inherit broad repository defaults. The minimal, safe change is to add a permissions block. Since only some jobs already define permissions (release-pypi-package, release-docker-image), and the flagged job trigger-docs-test-suite (and also trigger-community-test-suite) do not need any GitHub API access via GITHUB_TOKEN, we can:

  • Add a workflow‑level permissions: contents: read so the default for all jobs is read‑only.
  • Optionally, if you want to be extra strict, you can set permissions: {} on the two trigger jobs, but that’s not required if they don’t use GITHUB_TOKEN at all.

The single best fix with minimal functional impact is adding a root‑level permissions block right under name: release.yml. This ensures:

  • Jobs without their own permissions (including the one at line 141) use read‑only permissions.
  • Existing job‑specific permissions blocks stay unchanged and continue to override the default.

No additional methods, imports, or definitions are needed; only the YAML header must be updated.

Suggested changeset 1
.github/workflows/release.yml

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -1,4 +1,6 @@
 name: release.yml
+permissions:
+  contents: read
 on:
   workflow_dispatch:
     inputs:
EOF
Comment on lines +155 to +166
    needs: release-pypi-package
    if: ${{ inputs.flavour == 'main' }}
    runs-on: ubuntu-22.04
    steps:
      - name: Trigger community tests
        run: |
          curl -L -X POST \
            -H "Accept: application/vnd.github+json" \
            -H "Authorization: Bearer ${{ secrets.REPO_DISPATCH_PAT_TOKEN }}" \
            -H "X-GitHub-Api-Version: 2022-11-28" \
            https://api.github.com/repos/topoteretes/cognee-community/dispatches \
            -d '{"event_type":"new-main-release","client_payload":{"caller_repo":"'"${GITHUB_REPOSITORY}"'"}}'

Check warning (Code scanning / CodeQL): Workflow does not contain permissions (Medium)

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {}

Copilot Autofix (AI, 4 days ago)

In general, the fix is to explicitly define a permissions block for each job that currently relies on default GITHUB_TOKEN permissions, granting only the minimal scope needed. For the two jobs that just run curl with a separate PAT, they do not need write permissions from GITHUB_TOKEN and can safely run with read-only or even no explicit content write scopes.

Concretely, in .github/workflows/release.yml, add a permissions block to both trigger-docs-test-suite and trigger-community-test-suite jobs. They do not appear to perform any repository modifications; they only send repository dispatch events to other repositories using secrets.REPO_DISPATCH_PAT_TOKEN. Thus, setting permissions: contents: read (or even permissions: {}) is sufficient to satisfy CodeQL’s requirement to restrict GITHUB_TOKEN. To keep consistency with the rest of the workflow (other jobs use contents: read or contents: write), we can set contents: read on these jobs. No additional imports, methods, or definitions are needed—only YAML changes within this workflow file.

Suggested changeset 1
.github/workflows/release.yml

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -141,6 +141,8 @@
     needs: release-pypi-package
     if: ${{ inputs.flavour == 'main' }}
     runs-on: ubuntu-22.04
+    permissions:
+      contents: read
     steps:
       - name: Trigger docs tests
         run: |
@@ -155,6 +157,8 @@
     needs: release-pypi-package
     if: ${{ inputs.flavour == 'main' }}
     runs-on: ubuntu-22.04
+    permissions:
+      contents: read
     steps:
       - name: Trigger community tests
         run: |
EOF
gitguardian bot commented Jan 6, 2026

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
  • GitGuardian id: 17116131
  • Status: Triggered
  • Secret: Generic Password
  • Commit: 5f8a3e2
  • Filename: new-examples/configurations/database_examples/neo4j_graph_database_configuration.py
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely, for example by loading it from the environment as shown in the sketch after this list.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
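For the flagged example file, a minimal sketch of loading the credential from the environment instead of hardcoding it (the GRAPH_DATABASE_* variable names are an assumption about cognee's configuration, not taken from this PR):

    import os

    # Read the Neo4j connection settings from the environment; never commit real values.
    graph_db_url = os.environ.get("GRAPH_DATABASE_URL", "bolt://localhost:7687")
    graph_db_username = os.environ.get("GRAPH_DATABASE_USERNAME", "neo4j")
    graph_db_password = os.environ["GRAPH_DATABASE_PASSWORD"]  # fail fast if unset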

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@cursor cursor bot left a comment

This PR is being reviewed by Cursor Bugbot

You are on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.


# revision identifiers, used by Alembic.
revision: str = "a1b2c3d4e5f6"
down_revision: Union[str, None] = "46a6ce2bd2b2"

Alembic migration branch creates multiple heads

Both a1b2c3d4e5f6_add_label_column_to_data.py and 1a58b986e6e1_enable_delete_for_old_tutorial_notebooks.py set down_revision = "46a6ce2bd2b2", creating a branched migration history. Alembic will detect multiple heads and fail migrations with a "Multiple heads detected" error. One migration's down_revision needs to be updated to point at the other so the chain is linear again.
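A minimal sketch of the linearization the comment asks for; which migration gets re-parented is a judgment call (here the tutorial-notebook migration is assumed to be moved on top of the label migration):

    # alembic/versions/1a58b986e6e1_enable_delete_for_old_tutorial_notebooks.py
    from typing import Union

    revision: str = "1a58b986e6e1"
    # Was "46a6ce2bd2b2"; pointing at the label migration restores a single linear head.
    down_revision: Union[str, None] = "a1b2c3d4e5f6"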

Additional Locations (1)

