
Conversation

@hajdul88
Collaborator

@hajdul88 hajdul88 commented Feb 18, 2025

This PR adds the Ollama-specific LLM adapter together with the Ollama embedding engine.

Tested with the following models and settings:

LLM_API_KEY="ollama" llm_model = "llama3.1:8b" LLM_PROVIDER = "ollama" llm_endpoint = "http://localhost:11434/v1" EMBEDDING_PROVIDER="ollama" EMBEDDING_MODEL="avr/sfr-embedding-mistral:latest" EMBEDDING_ENDPOINT="http://localhost:11434/api/embeddings" EMBEDDING_DIMENSIONS=4096 HUGGINGFACE_TOKENIZER="Salesforce/SFR-Embedding-Mistral"

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin

Summary by CodeRabbit

  • New Features

    • Introduced a new embedding option that uses a locally hosted Ollama server for asynchronous text embedding.
    • Added language model integration through a dedicated Ollama adapter that returns structured outputs from local models.
  • Enhancements

    • Expanded configuration settings to include a new tokenizer option.
    • Updated provider selection logic to support the new Ollama embedding engine and LLM adapter.

@hajdul88 hajdul88 marked this pull request as draft February 18, 2025 14:32
@coderabbitai
Contributor

coderabbitai bot commented Feb 18, 2025

Walkthrough

The changes integrate an Ollama-based service into the system. A new OllamaEmbeddingEngine class extends the existing embedding functionality with asynchronous text embedding, external API calls, and exponential backoff for retries. The configuration is updated with an optional huggingface_tokenizer attribute. The embedding engine selection logic now supports an "ollama" provider. A new OllamaAPIAdapter class has been introduced to handle LLM interactions, and the LLM client creation now instantiates this adapter when the "OLLAMA" provider is specified.
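For illustration, here is a minimal usage sketch of the new adapter path. It assumes get_llm_client() is the no-argument factory in cognee/infrastructure/llm/get_llm_client.py and that configuration comes from the environment variables listed above; the response model, prompt, and function name extract_person are hypothetical and not part of this PR.

from pydantic import BaseModel

from cognee.infrastructure.llm.get_llm_client import get_llm_client


class PersonInfo(BaseModel):
    # Hypothetical response model, used only for this illustration.
    name: str
    role: str


async def extract_person(text: str) -> PersonInfo:
    # With LLM_PROVIDER="ollama" set, get_llm_client() is expected to return the
    # new OllamaAPIAdapter; the method name comes from this PR's change summary.
    client = get_llm_client()
    return await client.acreate_structured_output(
        text_input=text,
        system_prompt="Extract the person's name and role from the text.",
        response_model=PersonInfo,
    )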

Changes

File(s) Change Summary
cognee/infrastructure/.../OllamaEmbeddingEngine.py
cognee/infrastructure/.../config.py
cognee/infrastructure/.../get_embedding_engine.py
- Added new OllamaEmbeddingEngine class with async embed_text, _get_embedding (with exponential backoff), get_vector_size, and get_tokenizer methods.
- Updated EmbeddingConfig with a huggingface_tokenizer attribute.
- Expanded engine selection logic to support the "ollama" provider.
cognee/infrastructure/llm/get_llm_client.py
cognee/infrastructure/llm/ollama/adapter.py
- Introduced new OllamaAPIAdapter class implementing LLM interactions via an async method acreate_structured_output with retry logic.
- Modified the LLM client to return an instance of OllamaAPIAdapter for the "OLLAMA" provider.

Sequence Diagram(s)

sequenceDiagram
    participant C as Client
    participant EE as OllamaEmbeddingEngine
    participant HTTP as HTTP Client
    participant EX as External Embedding Service
    participant Env as Environment

    C->>EE: call embed_text(prompts)
    EE->>Env: Check MOCK_EMBEDDING
    alt MOCK Enabled
        EE-->>C: Return zero vectors
    else
        loop For each prompt
            EE->>HTTP: _get_embedding(prompt)
            HTTP->>EX: Send async request
            EX-->>HTTP: Return embedding vector / error
            note over EE,HTTP: Retry with exponential backoff if needed
        end
        EE-->>C: Return list of embeddings
    end
sequenceDiagram
    participant C as Client
    participant G as get_llm_client
    participant OA as OllamaAPIAdapter
    participant LLM as External LLM Service

    C->>G: Request LLM client for "OLLAMA" provider
    G->>OA: Instantiate OllamaAPIAdapter
    C->>OA: call acreate_structured_output(text, system_prompt, response_model)
    OA->>LLM: Send chat completion request (with retry logic)
    LLM-->>OA: Return structured response
    OA-->>C: Return output as BaseModel

Poem

I'm a bunny of bytes, hopping through new code lanes,
With Ollama engines and adapters boosting our gains.
I twirl through async calls and retry with grace,
Leaving no bug behind in my rapid-paced race.
My little paws tap joyful keys in digital delight,
Celebrating clean changes deep into the night.
🐇 Keep hopping, keep coding — the future's bright!


@Vasilije1990 Vasilije1990 marked this pull request as ready for review February 19, 2025 01:50
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (6)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (3)

14-23: Class attributes are well-declared and documented, but consider specifying type hints at the class level.
You might want to leverage typed class attributes with Python 3.9+ (e.g., model: Optional[str] at top) to enable better IDE support and static checks on initialization.
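For example, typed class-level attributes could look like the sketch below. The attribute names are assumptions based on this PR's configuration variables, and the real class also inherits from the existing embedding engine base class, which is omitted here.

from typing import Optional


class OllamaEmbeddingEngine:
    # Typed class-level attributes give IDEs and static checkers something to verify.
    model: str
    endpoint: str
    dimensions: int
    huggingface_tokenizer: Optional[str] = None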


24-43: Environment-based mocking is convenient, but ensure consistent boolean parsing.
Currently, enable_mocking is cast to string and checked against "true", "1", "yes". For added clarity and future maintainability, consider consolidating or normalizing similar environment variables in a helper function or config.
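A small helper along these lines could centralize the truthy-string parsing; the helper name and location are assumptions, not part of this PR.

import os


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret common truthy strings ("true", "1", "yes") as True for an env var."""
    value = os.getenv(name)
    if value is None:
        return default
    return value.strip().lower() in {"true", "1", "yes"}


# e.g. enable_mocking = env_flag("MOCK_EMBEDDING")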


44-57: Async embedding loop is correct, but consider parallelization for performance.
The current usage of for prompt in text: processes embeddings sequentially, which may be slow under high load. You could gather tasks concurrently to speed up large batch embeddings.
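A rough sketch of the concurrent variant of embed_text inside OllamaEmbeddingEngine, assuming _get_embedding(prompt) returns one vector per prompt:

import asyncio
from typing import List


async def embed_text(self, text: List[str]) -> List[List[float]]:
    # Dispatch all embedding requests at once instead of awaiting them one by one.
    # If the local Ollama server struggles under load, an asyncio.Semaphore can cap concurrency.
    return await asyncio.gather(*(self._get_embedding(prompt) for prompt in text))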

cognee/infrastructure/llm/ollama/adapter.py (2)

9-10: Improve class documentation to be Ollama-specific.

The current docstring is generic and doesn't reflect that this is specifically an Ollama adapter.

-    """Adapter for a Generic API LLM provider using instructor with an OpenAI backend."""
+    """Adapter for Ollama LLM provider using instructor with an OpenAI-compatible backend."""

12-21: Add input validation and type hints.

The constructor should validate required parameters and include type hints for better code maintainability.

-    def __init__(self, endpoint: str, api_key: str, model: str, name: str, max_tokens: int):
+    def __init__(
+        self,
+        endpoint: str,
+        api_key: str,
+        model: str,
+        name: str,
+        max_tokens: int,
+    ) -> None:
+        if not endpoint or not api_key or not model:
+            raise ValueError("endpoint, api_key, and model are required parameters")
+        if max_tokens <= 0:
+            raise ValueError("max_tokens must be positive")
         self.name = name
         self.model = model
         self.api_key = api_key
         self.endpoint = endpoint
         self.max_tokens = max_tokens
cognee/infrastructure/llm/get_llm_client.py (1)

54-54: Remove unused import of GenericAPIAdapter.

The import of GenericAPIAdapter is no longer needed in the OLLAMA case.

-        from .generic_llm_api.adapter import GenericAPIAdapter

         return OllamaAPIAdapter(
             llm_config.llm_endpoint,
             llm_config.llm_api_key,
             llm_config.llm_model,
             "Ollama",
             max_tokens=max_tokens,
         )

Also applies to: 56-62

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e98d51a and 2325a88.

📒 Files selected for processing (5)
  • cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1 hunks)
  • cognee/infrastructure/databases/vector/embeddings/config.py (1 hunks)
  • cognee/infrastructure/databases/vector/embeddings/get_embedding_engine.py (1 hunks)
  • cognee/infrastructure/llm/get_llm_client.py (2 hunks)
  • cognee/infrastructure/llm/ollama/adapter.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on ubuntu-22.04
🔇 Additional comments (5)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (3)

1-13: Great job structuring your imports and initial setup!
This segment cleanly organizes your dependencies, including httpx for async requests and the logging configuration. It follows Pythonic style for clarity. No issues to address here.


58-91: Robust retry logic with exponential backoff looks good, but handle unexpected JSON schema gracefully.
If data["embedding"] is missing or malformed, the code will raise a KeyError. You may want to catch or check if "embedding" exists before returning the data to provide a clearer error or fallback.


92-102: Tokenizer acquisition is straightforward, no major issues noted.
It’s good that you’re keeping the tokenizer logic contained and easily testable.
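For reference, a sketch of how the tokenizer could be loaded from the HUGGINGFACE_TOKENIZER setting, assuming the transformers library is used; the actual get_tokenizer method may wrap this differently.

from transformers import AutoTokenizer


def load_embedding_tokenizer(name: str = "Salesforce/SFR-Embedding-Mistral"):
    # Load (from cache or the Hub) the tokenizer matching the embedding model,
    # as configured via HUGGINGFACE_TOKENIZER in this PR.
    return AutoTokenizer.from_pretrained(name)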

cognee/infrastructure/databases/vector/embeddings/config.py (1)

14-14: Optional tokenizer addition aligns well with your new Ollama engine.
This extra attribute cleanly extends the configuration without breaking existing usage.
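A minimal sketch of the added setting, assuming a pydantic-settings based EmbeddingConfig; existing fields are omitted.

from typing import Optional

from pydantic_settings import BaseSettings


class EmbeddingConfig(BaseSettings):
    # Only the new optional tokenizer setting is shown here.
    huggingface_tokenizer: Optional[str] = None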

cognee/infrastructure/databases/vector/embeddings/get_embedding_engine.py (1)

19-27: Support for the new "ollama" provider is well integrated.
The new branch correctly initializes the OllamaEmbeddingEngine with the relevant configuration. Ensure that any future updates to the Ollama engine (e.g., new parameters) also get passed here.
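A sketch of the kind of branch this refers to; the function and constructor parameter names are assumptions based on this PR's configuration variables, not the actual code.

def create_embedding_engine(config):
    # ...branches for existing providers above...
    if config.embedding_provider == "ollama":
        from cognee.infrastructure.databases.vector.embeddings.OllamaEmbeddingEngine import (
            OllamaEmbeddingEngine,
        )

        return OllamaEmbeddingEngine(
            model=config.embedding_model,
            endpoint=config.embedding_endpoint,
            dimensions=config.embedding_dimensions,
            huggingface_tokenizer=config.huggingface_tokenizer,
        )
    # ...fall through to the remaining providers...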

Comment on lines +23 to +44
    async def acreate_structured_output(
        self, text_input: str, system_prompt: str, response_model: Type[BaseModel]
    ) -> BaseModel:
        """Generate a structured output from the LLM using the provided text and system prompt."""

        response = self.aclient.chat.completions.create(
            model=self.model,
            messages=[
                {
                    "role": "user",
                    "content": f"Use the given format to extract information from the following input: {text_input}",
                },
                {
                    "role": "system",
                    "content": system_prompt,
                },
            ],
            max_retries=5,
            response_model=response_model,
        )

        return response
Contributor


🛠️ Refactor suggestion

Add error handling and configuration options.

The method should handle API errors and allow configuration of model parameters.

     async def acreate_structured_output(
         self, text_input: str, system_prompt: str, response_model: Type[BaseModel]
     ) -> BaseModel:
-        """Generate a structured output from the LLM using the provided text and system prompt."""
+        """Generate a structured output from Ollama using the provided text and system prompt.
+        
+        Args:
+            text_input: The input text to process
+            system_prompt: The system prompt to guide the model
+            response_model: Pydantic model for response structure
+            
+        Returns:
+            BaseModel: Structured response matching response_model
+            
+        Raises:
+            OpenAIError: If API call fails
+            ValueError: If input validation fails
+        """
+        if not text_input or not system_prompt:
+            raise ValueError("text_input and system_prompt are required")
 
-        response = self.aclient.chat.completions.create(
-            model=self.model,
-            messages=[
-                {
-                    "role": "user",
-                    "content": f"Use the given format to extract information from the following input: {text_input}",
-                },
-                {
-                    "role": "system",
-                    "content": system_prompt,
-                },
-            ],
-            max_retries=5,
-            response_model=response_model,
-        )
+        try:
+            response = await self.aclient.chat.completions.create(
+                model=self.model,
+                messages=[
+                    {
+                        "role": "user",
+                        "content": f"Use the given format to extract information from the following input: {text_input}",
+                    },
+                    {
+                        "role": "system",
+                        "content": system_prompt,
+                    },
+                ],
+                max_retries=5,
+                response_model=response_model,
+                temperature=0.7,  # Add configurable parameters
+                timeout=30,  # Add timeout
+            )
+            return response
+        except Exception as e:
+            raise OpenAIError(f"Failed to generate structured output: {str(e)}")
-        return response

@Vasilije1990 Vasilije1990 merged commit 0bcaf5c into dev Feb 19, 2025
36 of 37 checks passed
@Vasilije1990 Vasilije1990 deleted the feature/cog-1358-local-ollama-model-support-for-cognee branch February 19, 2025 01:54
@coderabbitai coderabbitai bot mentioned this pull request Feb 22, 2025
