-
Notifications
You must be signed in to change notification settings - Fork 952
Fix Ollama embedding response key handling #1809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Ollama embedding response key handling #1809
Conversation
Ollama's /api/embeddings endpoint returns different response keys based on input: - Single prompt: returns "embedding" (singular) - Multiple prompts: returns "embeddings" (plural) Cognee sends a single prompt but was incorrectly accessing data["embeddings"][0], causing KeyError that blocked all Ollama embedding generation. Updated OllamaEmbeddingEngine._get_embedding() to: - Check for "embedding" (singular) first - matches Ollama's actual response - Fall back to "embeddings" for backwards compatibility - Raise clear KeyError if neither key exists Fixes: KeyError blocking Ollama embeddings in v0.3.9 and v0.4.0 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> Signed-off-by: HillviewCap <[email protected]>
Please make sure all the checkboxes are checked:
|
WalkthroughOllamaEmbeddingEngine._get_embedding now surfaces HTTP errors and accepts responses containing either "embedding" or "embeddings" (returning the first item); a new async test exercises single and batch embedding responses against a live Ollama server. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Test as Test Script
participant Engine as OllamaEmbeddingEngine
participant API as Ollama Server
Test->>Engine: request_embedding(texts)
Engine->>API: POST /embeddings (payload)
API-->>Engine: HTTP Response (status + body)
alt HTTP error (status >=400)
Engine->>Engine: response.raise_for_status() -> raises HTTPError
Engine-->>Test: propagate/raise error
else HTTP 2xx
Engine->>Engine: parse JSON body
alt body contains "embedding"
Engine-->>Test: return embedding (list/array)
else body contains "embeddings"
Engine-->>Test: return first embedding from "embeddings"
else
Engine-->>Test: raise KeyError with descriptive message
end
end
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
🧰 Additional context used🧬 Code graph analysis (1)cognee/tests/test_ollama_fix.py (1)
🪛 Pylint (4.0.3)cognee/tests/test_ollama_fix.py[error] 1-1: Unrecognized option found: suggestion-mode (E0015) [refactor] 12-12: Too many statements (53/50) (R0915) 🪛 Ruff (0.14.5)cognee/tests/test_ollama_fix.py38-38: Do not catch blind exception: (BLE001) 61-61: Do not catch blind exception: (BLE001) 80-80: Do not catch blind exception: (BLE001) Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @HillviewCap, thank you for submitting a PR! We will respond as soon as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (3)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (2)
123-126: Consider validating HTTP response status before parsing JSON.The code calls
response.json()without checking if the request succeeded. If Ollama returns an error status (4xx, 5xx), the response body might not contain the expected keys, leading to the KeyError on line 133 instead of a more informative HTTP error.Apply this diff to add status validation:
async with session.post( self.endpoint, json=payload, headers=headers, timeout=60.0 ) as response: + response.raise_for_status() data = await response.json()This will raise an
aiohttp.ClientResponseErrorwith the HTTP status code and message before attempting to parse the response, making debugging easier.
133-135: Optional: Consider extracting the error message to a constant.Static analysis flagged TRY003, suggesting that long error messages should be defined outside the exception instantiation. While not critical, extracting the message improves testability and reusability.
Example refactor:
+_MISSING_EMBEDDING_KEY_MSG = "No 'embedding' or 'embeddings' key found in Ollama API response" + # ... in _get_embedding method: else: - raise KeyError( - "No 'embedding' or 'embeddings' key found in Ollama API response" - ) + raise KeyError(_MISSING_EMBEDDING_KEY_MSG)Based on static analysis hints.
cognee/tests/test_ollama_fix.py (1)
1-1: Optional: Make script executable or remove shebang.The file has a shebang (
#!/usr/bin/env python3) but isn't executable.Either make it executable:
chmod +x cognee/tests/test_ollama_fix.pyOr remove the shebang if it's not meant to be run directly. This addresses the EXE001 static analysis hint.
Based on static analysis hints.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock,!**/*.lock
📒 Files selected for processing (2)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py(1 hunks)cognee/tests/test_ollama_fix.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
cognee/tests/test_ollama_fix.py (1)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)
OllamaEmbeddingEngine(30-171)
🪛 Pylint (4.0.3)
cognee/tests/test_ollama_fix.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
[refactor] 13-13: Too many statements (53/50)
(R0915)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py
[refactor] 128-135: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
🪛 Ruff (0.14.5)
cognee/tests/test_ollama_fix.py
1-1: Shebang is present but file is not executable
(EXE001)
24-24: f-string without any placeholders
Remove extraneous f prefix
(F541)
27-27: f-string without any placeholders
Remove extraneous f prefix
(F541)
39-39: Do not catch blind exception: Exception
(BLE001)
49-49: f-string without any placeholders
Remove extraneous f prefix
(F541)
55-55: f-string without any placeholders
Remove extraneous f prefix
(F541)
62-62: Do not catch blind exception: Exception
(BLE001)
76-76: f-string without any placeholders
Remove extraneous f prefix
(F541)
81-81: Do not catch blind exception: Exception
(BLE001)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py
133-135: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (1)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)
127-135: LGTM! The fix correctly handles Ollama's response format variations.The conditional logic properly addresses the KeyError by checking for "embedding" (singular) first, then falling back to "embeddings" (plural) for compatibility. The explicit error message will aid debugging if the API response changes unexpectedly.
| async def test_ollama_embedding(): | ||
| """Test OllamaEmbeddingEngine with real Ollama server.""" | ||
|
|
||
| print("=" * 80) | ||
| print("Testing OllamaEmbeddingEngine Fix") | ||
| print("=" * 80) | ||
|
|
||
| # Configure for your Ollama server | ||
| ollama_endpoint = "http://10.0.10.9:11434/api/embeddings" | ||
| ollama_model = "nomic-embed-text" | ||
|
|
||
| print(f"\nConfiguration:") | ||
| print(f" Endpoint: {ollama_endpoint}") | ||
| print(f" Model: {ollama_model}") | ||
| print(f" Expected dimensions: 768") | ||
|
|
||
| # Initialize the embedding engine | ||
| print("\n1. Initializing OllamaEmbeddingEngine...") | ||
| try: | ||
| engine = OllamaEmbeddingEngine( | ||
| model=ollama_model, | ||
| dimensions=768, | ||
| endpoint=ollama_endpoint, | ||
| huggingface_tokenizer="bert-base-uncased", | ||
| ) | ||
| print(" ✅ Engine initialized successfully") | ||
| except Exception as e: | ||
| print(f" ❌ Failed to initialize engine: {e}") | ||
| sys.exit(1) | ||
|
|
||
| # Test single text embedding | ||
| print("\n2. Testing single text embedding...") | ||
| test_texts = ["The sky is blue and the grass is green."] | ||
|
|
||
| try: | ||
| embeddings = await engine.embed_text(test_texts) | ||
| print(f" ✅ Embedding generated successfully") | ||
| print(f" 📊 Embedding shape: {len(embeddings)} texts, {len(embeddings[0])} dimensions") | ||
| print(f" 📊 First 5 values: {embeddings[0][:5]}") | ||
|
|
||
| # Verify dimensions | ||
| if len(embeddings[0]) == 768: | ||
| print(f" ✅ Dimensions match expected (768)") | ||
| else: | ||
| print(f" ⚠️ Dimensions mismatch: got {len(embeddings[0])}, expected 768") | ||
|
|
||
| except KeyError as e: | ||
| print(f" ❌ KeyError (this is the bug we're fixing): {e}") | ||
| sys.exit(1) | ||
| except Exception as e: | ||
| print(f" ❌ Failed to generate embedding: {type(e).__name__}: {e}") | ||
| sys.exit(1) | ||
|
|
||
| # Test multiple texts | ||
| print("\n3. Testing multiple text embeddings...") | ||
| test_texts_multiple = [ | ||
| "Hello world", | ||
| "Machine learning is fascinating", | ||
| "Ollama embeddings work great" | ||
| ] | ||
|
|
||
| try: | ||
| embeddings = await engine.embed_text(test_texts_multiple) | ||
| print(f" ✅ Multiple embeddings generated successfully") | ||
| print(f" 📊 Generated {len(embeddings)} embeddings") | ||
| for i, emb in enumerate(embeddings): | ||
| print(f" 📊 Text {i+1}: {len(emb)} dimensions, first 3 values: {emb[:3]}") | ||
|
|
||
| except Exception as e: | ||
| print(f" ❌ Failed to generate embeddings: {type(e).__name__}: {e}") | ||
| sys.exit(1) | ||
|
|
||
| # Success! | ||
| print("\n" + "=" * 80) | ||
| print("✅ ALL TESTS PASSED!") | ||
| print("=" * 80) | ||
| print("\nThe OllamaEmbeddingEngine fix is working correctly!") | ||
| print("- Handles 'embedding' (singular) response from Ollama API") | ||
| print("- Generates embeddings successfully") | ||
| print("- Correct dimensions (768 for nomic-embed-text)") | ||
| print("\n✅ Ready to submit PR!") | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| asyncio.run(test_ollama_embedding()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Convert integration test to proper unit test with mocking.
This script requires a live Ollama server, making it unsuitable for automated testing in CI/CD pipelines. Consider converting it to a proper unit test using mocking.
Create a proper unit test that mocks the HTTP response:
import unittest
from unittest.mock import AsyncMock, patch, MagicMock
from cognee.infrastructure.databases.vector.embeddings.OllamaEmbeddingEngine import OllamaEmbeddingEngine
class TestOllamaEmbeddingEngine(unittest.TestCase):
@patch('aiohttp.ClientSession')
async def test_embedding_singular_key(self, mock_session):
"""Test that engine handles 'embedding' (singular) response key."""
# Mock the API response with singular 'embedding' key
mock_response = AsyncMock()
mock_response.json = AsyncMock(return_value={"embedding": [0.1] * 768})
mock_response.__aenter__ = AsyncMock(return_value=mock_response)
mock_response.__aexit__ = AsyncMock()
mock_session.return_value.__aenter__.return_value.post.return_value = mock_response
engine = OllamaEmbeddingEngine(
model="nomic-embed-text",
dimensions=768,
endpoint="http://localhost:11434/api/embeddings",
huggingface_tokenizer="bert-base-uncased",
)
embeddings = await engine.embed_text(["test text"])
self.assertEqual(len(embeddings), 1)
self.assertEqual(len(embeddings[0]), 768)
@patch('aiohttp.ClientSession')
async def test_embedding_plural_key(self, mock_session):
"""Test backward compatibility with 'embeddings' (plural) response key."""
# Mock the API response with plural 'embeddings' key
mock_response = AsyncMock()
mock_response.json = AsyncMock(return_value={"embeddings": [[0.1] * 768]})
mock_response.__aenter__ = AsyncMock(return_value=mock_response)
mock_response.__aexit__ = AsyncMock()
mock_session.return_value.__aenter__.return_value.post.return_value = mock_response
engine = OllamaEmbeddingEngine(
model="nomic-embed-text",
dimensions=768,
endpoint="http://localhost:11434/api/embeddings",
huggingface_tokenizer="bert-base-uncased",
)
embeddings = await engine.embed_text(["test text"])
self.assertEqual(len(embeddings), 1)
self.assertEqual(len(embeddings[0]), 768)
@patch('aiohttp.ClientSession')
async def test_missing_embedding_keys_raises_error(self, mock_session):
"""Test that missing keys raise descriptive KeyError."""
mock_response = AsyncMock()
mock_response.json = AsyncMock(return_value={"error": "Invalid model"})
mock_response.__aenter__ = AsyncMock(return_value=mock_response)
mock_response.__aexit__ = AsyncMock()
mock_session.return_value.__aenter__.return_value.post.return_value = mock_response
engine = OllamaEmbeddingEngine(
model="nomic-embed-text",
dimensions=768,
endpoint="http://localhost:11434/api/embeddings",
huggingface_tokenizer="bert-base-uncased",
)
with self.assertRaises(KeyError) as context:
await engine.embed_text(["test text"])
self.assertIn("No 'embedding' or 'embeddings' key found", str(context.exception))This approach:
- Runs without external dependencies
- Works in CI/CD pipelines
- Follows standard testing patterns
- Tests both response formats and error cases
- Uses assertions instead of sys.exit()
🧰 Tools
🪛 Pylint (4.0.3)
[refactor] 13-13: Too many statements (53/50)
(R0915)
🪛 Ruff (0.14.5)
24-24: f-string without any placeholders
Remove extraneous f prefix
(F541)
27-27: f-string without any placeholders
Remove extraneous f prefix
(F541)
39-39: Do not catch blind exception: Exception
(BLE001)
49-49: f-string without any placeholders
Remove extraneous f prefix
(F541)
55-55: f-string without any placeholders
Remove extraneous f prefix
(F541)
62-62: Do not catch blind exception: Exception
(BLE001)
76-76: f-string without any placeholders
Remove extraneous f prefix
(F541)
81-81: Do not catch blind exception: Exception
(BLE001)
| print("=" * 80) | ||
|
|
||
| # Configure for your Ollama server | ||
| ollama_endpoint = "http://10.0.10.9:11434/api/embeddings" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Hardcoded private IP address will break CI/CD and other environments.
The endpoint "http://10.0.10.9:11434/api/embeddings" is a private network address specific to your local setup. This test will fail in:
- Continuous integration pipelines
- Other developers' machines
- Production environments
- Any system without access to this specific IP
Apply this diff to make the endpoint configurable:
- ollama_endpoint = "http://10.0.10.9:11434/api/embeddings"
+ ollama_endpoint = os.getenv("OLLAMA_ENDPOINT", "http://localhost:11434/api/embeddings")
ollama_model = "nomic-embed-text"Add this import at the top:
import asyncio
import sys
+import os
from cognee.infrastructure.databases.vector.embeddings.OllamaEmbeddingEngine import (This allows customization via environment variables while defaulting to localhost.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee/tests/test_ollama_fix.py around line 21, the test uses a hardcoded
private IP endpoint "http://10.0.10.9:11434/api/embeddings" which will break CI
and other environments; change it to read from an environment variable (e.g.,
OLLAMA_ENDPOINT) with a sensible default like
"http://localhost:11434/api/embeddings", add the necessary import for os at the
top of the file, and replace the hardcoded string with
os.getenv("OLLAMA_ENDPOINT", "http://localhost:11434/api/embeddings") so the
endpoint can be configured externally while preserving local defaults.
| except Exception as e: | ||
| print(f" ❌ Failed to initialize engine: {e}") | ||
| sys.exit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Replace sys.exit() with proper test assertions.
Using sys.exit(1) is unconventional for tests. Standard test frameworks use assertions or raise exceptions to indicate failures, allowing test runners to properly collect and report results.
If keeping this as an integration test script, at least replace sys.exit() with exceptions:
except Exception as e:
print(f" ❌ Failed to initialize engine: {e}")
- sys.exit(1)
+ raise RuntimeError(f"Engine initialization failed: {e}") from eBetter yet, use the proper unit test approach suggested in the previous comment.
Also applies to: 59-64, 81-83
🧰 Tools
🪛 Ruff (0.14.5)
39-39: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In cognee/tests/test_ollama_fix.py around lines 39-41, 59-64 and 81-83, replace
the calls to sys.exit(1) inside exception handlers with proper test failures by
either raising an exception or using the test framework's failure helpers (e.g.,
pytest.fail("...") or assert False, with a clear message containing the original
exception), so the test runner can record the failure instead of terminating the
process; ensure the original exception text is included in the failure message
for diagnostics.
- Add HTTP response status validation before parsing JSON - Extract error message to constant to improve testability - Simplify elif to if after return statement - Remove shebang from non-executable test file - Fix unnecessary f-string prefixes in test file These changes improve error handling, code quality, and address all actionable comments from the CodeRabbit review. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
|
Hey @HillviewCap , thanks for the PR! We are aware of this change that Ollama did, and we made the changes on our side to reflect the new response Ollama gives. The Hope this helps, if you have any more questions, do not hesitate to ask! |
|
Description
Fixes the
KeyError: 'embeddings'that blocks all Ollama embedding generation in Cognee v0.3.9 and v0.4.0I found that during Ollama's
/api/embeddingsendpoint returns different response keys based on input type:{"embedding": [...]}(singular){"embeddings": [[...], [...]]}(plural)The
OllamaEmbeddingEnginewas incorrectly accessingdata["embeddings"][0]for single-prompt requests, causing aKeyErrorthus preventing any Ollama-based embedding generation.Validated against Ollama API documentation
All linting checks passed (
ruff format .andruff check .)Backwards compatible with potential future multi-prompt support
Type of Change
Screenshots/Videos (if applicable)
None
Pre-submission Checklist
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.