
Conversation

@hajdul88
Collaborator

@hajdul88 hajdul88 commented Feb 18, 2025

This PR adds the Ollama-specific LLM adapter together with the Ollama embedding engine.

Tested with the following models and settings:

LLM_API_KEY="ollama" llm_model = "llama3.1:8b" LLM_PROVIDER = "ollama" llm_endpoint = "http://localhost:11434/v1" EMBEDDING_PROVIDER="ollama" EMBEDDING_MODEL="avr/sfr-embedding-mistral:latest" EMBEDDING_ENDPOINT="http://localhost:11434/api/embeddings" EMBEDDING_DIMENSIONS=4096 HUGGINGFACE_TOKENIZER="Salesforce/SFR-Embedding-Mistral"

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin

Summary by CodeRabbit

  • New Features

    • Introduced a new embedding option that uses a locally hosted Ollama server for asynchronous text embedding.
    • Added language model integration through a dedicated Ollama adapter that returns structured outputs from local models.
  • Enhancements

    • Expanded configuration settings to include a new tokenizer option.
    • Updated provider selection logic to support the new Ollama embedding engine and LLM adapter.

@hajdul88 hajdul88 marked this pull request as draft February 18, 2025 14:32
@coderabbitai
Contributor

coderabbitai bot commented Feb 18, 2025

Walkthrough

The changes integrate an Ollama-based service into the system. A new OllamaEmbeddingEngine class extends the existing embedding functionality with asynchronous text embedding, external API calls, and exponential backoff for retries. The configuration is updated with an optional huggingface_tokenizer attribute. The embedding engine selection logic now supports an "ollama" provider. A new OllamaAPIAdapter class has been introduced to handle LLM interactions, and the LLM client creation now instantiates this adapter when the "OLLAMA" provider is specified.
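For illustration, here is a minimal usage sketch of the new adapter path. It assumes get_llm_client() is the no-argument factory in cognee/infrastructure/llm/get_llm_client.py and that configuration comes from the environment variables listed above; the response model, prompt, and function name extract_person are hypothetical and not part of this PR.

from pydantic import BaseModel

from cognee.infrastructure.llm.get_llm_client import get_llm_client


class PersonInfo(BaseModel):
    # Hypothetical response model, used only for this illustration.
    name: str
    role: str


async def extract_person(text: str) -> PersonInfo:
    # With LLM_PROVIDER="ollama" set, get_llm_client() is expected to return the
    # new OllamaAPIAdapter; the method name comes from this PR's change summary.
    client = get_llm_client()
    return await client.acreate_structured_output(
        text_input=text,
        system_prompt="Extract the person's name and role from the text.",
        response_model=PersonInfo,
    )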

Changes

File(s) Change Summary
cognee/infrastructure/.../OllamaEmbeddingEngine.py
cognee/infrastructure/.../config.py
cognee/infrastructure/.../get_embedding_engine.py
- Added new OllamaEmbeddingEngine class with async embed_text, _get_embedding (with exponential backoff), get_vector_size, and get_tokenizer methods.
- Updated EmbeddingConfig with a huggingface_tokenizer attribute.
- Expanded engine selection logic to support the "ollama" provider.
cognee/infrastructure/llm/get_llm_client.py
cognee/infrastructure/llm/ollama/adapter.py
- Introduced new OllamaAPIAdapter class implementing LLM interactions via an async method acreate_structured_output with retry logic.
- Modified the LLM client to return an instance of OllamaAPIAdapter for the "OLLAMA" provider.

Sequence Diagram(s)

sequenceDiagram
    participant C as Client
    participant EE as OllamaEmbeddingEngine
    participant HTTP as HTTP Client
    participant EX as External Embedding Service
    participant Env as Environment

    C->>EE: call embed_text(prompts)
    EE->>Env: Check MOCK_EMBEDDING
    alt MOCK Enabled
        EE-->>C: Return zero vectors
    else
        loop For each prompt
            EE->>HTTP: _get_embedding(prompt)
            HTTP->>EX: Send async request
            EX-->>HTTP: Return embedding vector / error
            note over EE,HTTP: Retry with exponential backoff if needed
        end
        EE-->>C: Return list of embeddings
    end
sequenceDiagram
    participant C as Client
    participant G as get_llm_client
    participant OA as OllamaAPIAdapter
    participant LLM as External LLM Service

    C->>G: Request LLM client for "OLLAMA" provider
    G->>OA: Instantiate OllamaAPIAdapter
    C->>OA: call acreate_structured_output(text, system_prompt, response_model)
    OA->>LLM: Send chat completion request (with retry logic)
    LLM-->>OA: Return structured response
    OA-->>C: Return output as BaseModel

Poem

I'm a bunny of bytes, hopping through new code lanes,
With Ollama engines and adapters boosting our gains.
I twirl through async calls and retry with grace,
Leaving no bug behind in my rapid-paced race.
My little paws tap joyful keys in digital delight,
Celebrating clean changes deep into the night.
🐇 Keep hopping, keep coding — the future's bright!


@Vasilije1990 Vasilije1990 marked this pull request as ready for review February 19, 2025 01:50
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (6)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (3)

14-23: Class attributes are well-declared and documented, but consider specifying type hints at the class level.
You might want to leverage typed class attributes with Python 3.9+ (e.g., model: Optional[str] at top) to enable better IDE support and static checks on initialization.
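For example, typed class-level attributes could look like the sketch below. The attribute names are assumptions based on this PR's configuration variables, and the real class also inherits from the existing embedding engine base class, which is omitted here.

from typing import Optional


class OllamaEmbeddingEngine:
    # Typed class-level attributes give IDEs and static checkers something to verify.
    model: str
    endpoint: str
    dimensions: int
    huggingface_tokenizer: Optional[str] = None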


24-43: Environment-based mocking is convenient, but ensure consistent boolean parsing.
Currently, enable_mocking is cast to string and checked against "true", "1", "yes". For added clarity and future maintainability, consider consolidating or normalizing similar environment variables in a helper function or config.
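A small helper along these lines could centralize the truthy-string parsing; the helper name and location are assumptions, not part of this PR.

import os


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret common truthy strings ("true", "1", "yes") as True for an env var."""
    value = os.getenv(name)
    if value is None:
        return default
    return value.strip().lower() in {"true", "1", "yes"}


# e.g. enable_mocking = env_flag("MOCK_EMBEDDING")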


44-57: Async embedding loop is correct, but consider parallelization for performance.
The current usage of for prompt in text: processes embeddings sequentially, which may be slow under high load. You could gather tasks concurrently to speed up large batch embeddings.
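A rough sketch of the concurrent variant of embed_text inside OllamaEmbeddingEngine, assuming _get_embedding(prompt) returns one vector per prompt:

import asyncio
from typing import List


async def embed_text(self, text: List[str]) -> List[List[float]]:
    # Dispatch all embedding requests at once instead of awaiting them one by one.
    # If the local Ollama server struggles under load, an asyncio.Semaphore can cap concurrency.
    return await asyncio.gather(*(self._get_embedding(prompt) for prompt in text))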

cognee/infrastructure/llm/ollama/adapter.py (2)

9-10: Improve class documentation to be Ollama-specific.

The current docstring is generic and doesn't reflect that this is specifically an Ollama adapter.

-    """Adapter for a Generic API LLM provider using instructor with an OpenAI backend."""
+    """Adapter for Ollama LLM provider using instructor with an OpenAI-compatible backend."""

12-21: Add input validation and type hints.

The constructor should validate required parameters and include type hints for better code maintainability.

-    def __init__(self, endpoint: str, api_key: str, model: str, name: str, max_tokens: int):
+    def __init__(
+        self,
+        endpoint: str,
+        api_key: str,
+        model: str,
+        name: str,
+        max_tokens: int,
+    ) -> None:
+        if not endpoint or not api_key or not model:
+            raise ValueError("endpoint, api_key, and model are required parameters")
+        if max_tokens <= 0:
+            raise ValueError("max_tokens must be positive")
         self.name = name
         self.model = model
         self.api_key = api_key
         self.endpoint = endpoint
         self.max_tokens = max_tokens
cognee/infrastructure/llm/get_llm_client.py (1)

54-54: Remove unused import of GenericAPIAdapter.

The import of GenericAPIAdapter is no longer needed in the OLLAMA case.

-        from .generic_llm_api.adapter import GenericAPIAdapter

         return OllamaAPIAdapter(
             llm_config.llm_endpoint,
             llm_config.llm_api_key,
             llm_config.llm_model,
             "Ollama",
             max_tokens=max_tokens,
         )

Also applies to: 56-62

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e98d51a and 2325a88.

📒 Files selected for processing (5)
  • cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1 hunks)
  • cognee/infrastructure/databases/vector/embeddings/config.py (1 hunks)
  • cognee/infrastructure/databases/vector/embeddings/get_embedding_engine.py (1 hunks)
  • cognee/infrastructure/llm/get_llm_client.py (2 hunks)
  • cognee/infrastructure/llm/ollama/adapter.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on ubuntu-22.04
🔇 Additional comments (5)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (3)

1-13: Great job structuring your imports and initial setup!
This segment cleanly organizes your dependencies, including httpx for async requests and the logging configuration. It follows Pythonic style for clarity. No issues to address here.


58-91: Robust retry logic with exponential backoff looks good, but handle unexpected JSON schema gracefully.
If data["embedding"] is missing or malformed, the code will raise a KeyError. You may want to catch or check if "embedding" exists before returning the data to provide a clearer error or fallback.


92-102: Tokenizer acquisition is straightforward, no major issues noted.
It’s good that you’re keeping the tokenizer logic contained and easily testable.
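For reference, a sketch of how the tokenizer could be loaded from the HUGGINGFACE_TOKENIZER setting, assuming the transformers library is used; the actual get_tokenizer method may wrap this differently.

from transformers import AutoTokenizer


def load_embedding_tokenizer(name: str = "Salesforce/SFR-Embedding-Mistral"):
    # Load (from cache or the Hub) the tokenizer matching the embedding model,
    # as configured via HUGGINGFACE_TOKENIZER in this PR.
    return AutoTokenizer.from_pretrained(name)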

cognee/infrastructure/databases/vector/embeddings/config.py (1)

14-14: Optional tokenizer addition aligns well with your new Ollama engine.
This extra attribute cleanly extends the configuration without breaking existing usage.
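A minimal sketch of the added setting, assuming a pydantic-settings based EmbeddingConfig; existing fields are omitted.

from typing import Optional

from pydantic_settings import BaseSettings


class EmbeddingConfig(BaseSettings):
    # Only the new optional tokenizer setting is shown here.
    huggingface_tokenizer: Optional[str] = None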

cognee/infrastructure/databases/vector/embeddings/get_embedding_engine.py (1)

19-27: Support for the new "ollama" provider is well integrated.
The new branch correctly initializes the OllamaEmbeddingEngine with the relevant configuration. Ensure that any future updates to the Ollama engine (e.g., new parameters) also get passed here.
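A sketch of the kind of branch this refers to; the function and constructor parameter names are assumptions based on this PR's configuration variables, not the actual code.

def create_embedding_engine(config):
    # ...branches for existing providers above...
    if config.embedding_provider == "ollama":
        from cognee.infrastructure.databases.vector.embeddings.OllamaEmbeddingEngine import (
            OllamaEmbeddingEngine,
        )

        return OllamaEmbeddingEngine(
            model=config.embedding_model,
            endpoint=config.embedding_endpoint,
            dimensions=config.embedding_dimensions,
            huggingface_tokenizer=config.huggingface_tokenizer,
        )
    # ...fall through to the remaining providers...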

Comment on lines +23 to +44
    async def acreate_structured_output(
        self, text_input: str, system_prompt: str, response_model: Type[BaseModel]
    ) -> BaseModel:
        """Generate a structured output from the LLM using the provided text and system prompt."""

        response = self.aclient.chat.completions.create(
            model=self.model,
            messages=[
                {
                    "role": "user",
                    "content": f"Use the given format to extract information from the following input: {text_input}",
                },
                {
                    "role": "system",
                    "content": system_prompt,
                },
            ],
            max_retries=5,
            response_model=response_model,
        )

        return response
Contributor


🛠️ Refactor suggestion

Add error handling and configuration options.

The method should handle API errors and allow configuration of model parameters.

     async def acreate_structured_output(
         self, text_input: str, system_prompt: str, response_model: Type[BaseModel]
     ) -> BaseModel:
-        """Generate a structured output from the LLM using the provided text and system prompt."""
+        """Generate a structured output from Ollama using the provided text and system prompt.
+        
+        Args:
+            text_input: The input text to process
+            system_prompt: The system prompt to guide the model
+            response_model: Pydantic model for response structure
+            
+        Returns:
+            BaseModel: Structured response matching response_model
+            
+        Raises:
+            OpenAIError: If API call fails
+            ValueError: If input validation fails
+        """
+        if not text_input or not system_prompt:
+            raise ValueError("text_input and system_prompt are required")
 
-        response = self.aclient.chat.completions.create(
-            model=self.model,
-            messages=[
-                {
-                    "role": "user",
-                    "content": f"Use the given format to extract information from the following input: {text_input}",
-                },
-                {
-                    "role": "system",
-                    "content": system_prompt,
-                },
-            ],
-            max_retries=5,
-            response_model=response_model,
-        )
+        try:
+            response = await self.aclient.chat.completions.create(
+                model=self.model,
+                messages=[
+                    {
+                        "role": "user",
+                        "content": f"Use the given format to extract information from the following input: {text_input}",
+                    },
+                    {
+                        "role": "system",
+                        "content": system_prompt,
+                    },
+                ],
+                max_retries=5,
+                response_model=response_model,
+                temperature=0.7,  # Add configurable parameters
+                timeout=30,  # Add timeout
+            )
+            return response
+        except Exception as e:
+            raise OpenAIError(f"Failed to generate structured output: {str(e)}")
-        return response

@Vasilije1990 Vasilije1990 merged commit 0bcaf5c into dev Feb 19, 2025
36 of 37 checks passed
@Vasilije1990 Vasilije1990 deleted the feature/cog-1358-local-ollama-model-support-for-cognee branch February 19, 2025 01:54
@coderabbitai coderabbitai bot mentioned this pull request Feb 22, 2025
