Feature/cog 1358 local ollama model support for cognee #555
Conversation
Walkthrough
The changes integrate an Ollama-based service into the system. A new OllamaEmbeddingEngine and a new OllamaAPIAdapter are introduced, together with configuration and client-factory support for the "ollama" provider.
Sequence Diagram(s)

sequenceDiagram
    participant C as Client
    participant EE as OllamaEmbeddingEngine
    participant HTTP as HTTP Client
    participant EX as External Embedding Service
    participant Env as Environment
    C->>EE: call embed_text(prompts)
    EE->>Env: Check MOCK_EMBEDDING
    alt MOCK Enabled
        EE-->>C: Return zero vectors
    else
        loop For each prompt
            EE->>HTTP: _get_embedding(prompt)
            HTTP->>EX: Send async request
            EX-->>HTTP: Return embedding vector / error
            note over EE,HTTP: Retry with exponential backoff if needed
        end
        EE-->>C: Return list of embeddings
    end
sequenceDiagram
participant C as Client
participant G as get_llm_client
participant OA as OllamaAPIAdapter
participant LLM as External LLM Service
C->>G: Request LLM client for "OLLAMA" provider
G->>OA: Instantiate OllamaAPIAdapter
C->>OA: call acreate_structured_output(text, system_prompt, response_model)
OA->>LLM: Send chat completion request (with retry logic)
LLM-->>OA: Return structured response
OA-->>C: Return output as BaseModel
Actionable comments posted: 1
🧹 Nitpick comments (6)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (3)
14-23: Class attributes are well-declared and documented, but consider specifying type hints at the class level.
You might want to leverage typed class attributes with Python 3.9+ (e.g., model: Optional[str] at the top of the class) to enable better IDE support and static checks on initialization.
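For illustration, a minimal sketch of what class-level annotations could look like; the attribute names here are assumptions based on the embedding settings exercised in this PR, not the class's actual fields:

    from typing import Optional

    class OllamaEmbeddingEngine:
        # Declared at class level so IDEs and static checkers know the expected types.
        model: Optional[str] = None
        endpoint: Optional[str] = None
        dimensions: Optional[int] = None
        huggingface_tokenizer: Optional[str] = None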
24-43: Environment-based mocking is convenient, but ensure consistent boolean parsing.
Currently, enable_mocking is cast to string and checked against "true", "1", "yes". For added clarity and future maintainability, consider consolidating or normalizing similar environment variables in a helper function or config.
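One option, sketched here with an illustrative helper name that is not part of this PR, is to normalize truthy environment values in a single place:

    import os

    _TRUTHY = {"true", "1", "yes", "y", "on"}

    def env_flag(name: str, default: bool = False) -> bool:
        """Parse an environment variable as a boolean, accepting common truthy spellings."""
        value = os.getenv(name)
        if value is None:
            return default
        return value.strip().lower() in _TRUTHY

    # Hypothetical usage: enable_mocking = env_flag("MOCK_EMBEDDING")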
44-57: Async embedding loop is correct, but consider parallelization for performance.
The current usage of for prompt in text: processes embeddings sequentially, which may be slow under high load. You could gather tasks concurrently to speed up large batch embeddings.
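A rough sketch of the concurrent variant using asyncio.gather; it assumes a per-prompt _get_embedding coroutine as described above and is not the engine's actual implementation:

    import asyncio

    class OllamaEmbeddingEngine:  # illustrative stand-in for the real class
        async def _get_embedding(self, prompt: str) -> list[float]:
            ...  # single-prompt request, as in the existing code

        async def embed_text(self, text: list[str]) -> list[list[float]]:
            # Launch one request per prompt and await them all together instead of sequentially.
            tasks = [self._get_embedding(prompt) for prompt in text]
            return list(await asyncio.gather(*tasks))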
cognee/infrastructure/llm/ollama/adapter.py (2)
9-10: Improve class documentation to be Ollama-specific.
The current docstring is generic and doesn't reflect that this is specifically an Ollama adapter.
- """Adapter for a Generic API LLM provider using instructor with an OpenAI backend.""" + """Adapter for Ollama LLM provider using instructor with an OpenAI-compatible backend."""
12-21: Add input validation and type hints.
The constructor should validate required parameters and include type hints for better code maintainability.
- def __init__(self, endpoint: str, api_key: str, model: str, name: str, max_tokens: int):
+ def __init__(
+     self,
+     endpoint: str,
+     api_key: str,
+     model: str,
+     name: str,
+     max_tokens: int,
+ ) -> None:
+     if not endpoint or not api_key or not model:
+         raise ValueError("endpoint, api_key, and model are required parameters")
+     if max_tokens <= 0:
+         raise ValueError("max_tokens must be positive")
      self.name = name
      self.model = model
      self.api_key = api_key
      self.endpoint = endpoint
      self.max_tokens = max_tokens

cognee/infrastructure/llm/get_llm_client.py (1)
54-54: Remove unused import of GenericAPIAdapter.
The import of GenericAPIAdapter is no longer needed in the OLLAMA case.

- from .generic_llm_api.adapter import GenericAPIAdapter
  return OllamaAPIAdapter(
      llm_config.llm_endpoint,
      llm_config.llm_api_key,
      llm_config.llm_model,
      "Ollama",
      max_tokens=max_tokens,
  )

Also applies to: 56-62
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1 hunks)
- cognee/infrastructure/databases/vector/embeddings/config.py (1 hunks)
- cognee/infrastructure/databases/vector/embeddings/get_embedding_engine.py (1 hunks)
- cognee/infrastructure/llm/get_llm_client.py (2 hunks)
- cognee/infrastructure/llm/ollama/adapter.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Test on macos-15
- GitHub Check: Test on ubuntu-22.04
🔇 Additional comments (5)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (3)
1-13: Great job structuring your imports and initial setup!
This segment cleanly organizes your dependencies, including httpx for async requests and the logging configuration. It follows Pythonic style for clarity. No issues to address here.
58-91: Robust retry logic with exponential backoff looks good, but handle unexpected JSON schema gracefully.
Ifdata["embedding"]is missing or malformed, the code will raise a KeyError. You may want to catch or check if"embedding"exists before returning the data to provide a clearer error or fallback.
92-102: Tokenizer acquisition is straightforward, no major issues noted.
It’s good that you’re keeping the tokenizer logic contained and easily testable.
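For reference, a hedged sketch of loading the configured tokenizer with Hugging Face transformers; the engine's real tokenizer logic may differ:

    from transformers import AutoTokenizer

    # Model id taken from the HUGGINGFACE_TOKENIZER setting in this PR's test configuration.
    tokenizer = AutoTokenizer.from_pretrained("Salesforce/SFR-Embedding-Mistral")
    token_count = len(tokenizer.encode("How many tokens does this sentence use?"))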
cognee/infrastructure/databases/vector/embeddings/config.py (1)
14-14: Optional tokenizer addition aligns well with your new Ollama engine.
This extra attribute cleanly extends the configuration without breaking existing usage.
cognee/infrastructure/databases/vector/embeddings/get_embedding_engine.py (1)
19-27: Support for the new "ollama" provider is well integrated.
The new branch correctly initializes the OllamaEmbeddingEngine with the relevant configuration. Ensure that any future updates to the Ollama engine (e.g., new parameters) also get passed here.
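Not the file's actual contents, but roughly the shape such a dispatch takes; the class and config attribute names are assumptions drawn from the files and environment variables in this PR:

    def get_embedding_engine(config):
        # Hypothetical sketch of the "ollama" branch in get_embedding_engine.py.
        if config.embedding_provider == "ollama":
            from cognee.infrastructure.databases.vector.embeddings.OllamaEmbeddingEngine import (
                OllamaEmbeddingEngine,
            )

            return OllamaEmbeddingEngine(
                model=config.embedding_model,
                endpoint=config.embedding_endpoint,
                dimensions=config.embedding_dimensions,
                huggingface_tokenizer=config.huggingface_tokenizer,
            )
        raise ValueError(f"Unsupported embedding provider: {config.embedding_provider}")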
async def acreate_structured_output(
    self, text_input: str, system_prompt: str, response_model: Type[BaseModel]
) -> BaseModel:
    """Generate a structured output from the LLM using the provided text and system prompt."""

    response = self.aclient.chat.completions.create(
        model=self.model,
        messages=[
            {
                "role": "user",
                "content": f"Use the given format to extract information from the following input: {text_input}",
            },
            {
                "role": "system",
                "content": system_prompt,
            },
        ],
        max_retries=5,
        response_model=response_model,
    )

    return response
🛠️ Refactor suggestion
Add error handling and configuration options.
The method should handle API errors and allow configuration of model parameters.
  async def acreate_structured_output(
      self, text_input: str, system_prompt: str, response_model: Type[BaseModel]
  ) -> BaseModel:
-     """Generate a structured output from the LLM using the provided text and system prompt."""
+     """Generate a structured output from Ollama using the provided text and system prompt.
+
+     Args:
+         text_input: The input text to process
+         system_prompt: The system prompt to guide the model
+         response_model: Pydantic model for response structure
+
+     Returns:
+         BaseModel: Structured response matching response_model
+
+     Raises:
+         OpenAIError: If API call fails
+         ValueError: If input validation fails
+     """
+     if not text_input or not system_prompt:
+         raise ValueError("text_input and system_prompt are required")
-     response = self.aclient.chat.completions.create(
-         model=self.model,
-         messages=[
-             {
-                 "role": "user",
-                 "content": f"Use the given format to extract information from the following input: {text_input}",
-             },
-             {
-                 "role": "system",
-                 "content": system_prompt,
-             },
-         ],
-         max_retries=5,
-         response_model=response_model,
-     )
+     try:
+         response = await self.aclient.chat.completions.create(
+             model=self.model,
+             messages=[
+                 {
+                     "role": "user",
+                     "content": f"Use the given format to extract information from the following input: {text_input}",
+                 },
+                 {
+                     "role": "system",
+                     "content": system_prompt,
+                 },
+             ],
+             max_retries=5,
+             response_model=response_model,
+             temperature=0.7,  # Add configurable parameters
+             timeout=30,  # Add timeout
+         )
+         return response
+     except Exception as e:
+         raise OpenAIError(f"Failed to generate structured output: {str(e)}")
-     return response

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
async def acreate_structured_output(
    self, text_input: str, system_prompt: str, response_model: Type[BaseModel]
) -> BaseModel:
    """Generate a structured output from Ollama using the provided text and system prompt.

    Args:
        text_input: The input text to process
        system_prompt: The system prompt to guide the model
        response_model: Pydantic model for response structure

    Returns:
        BaseModel: Structured response matching response_model

    Raises:
        OpenAIError: If API call fails
        ValueError: If input validation fails
    """
    if not text_input or not system_prompt:
        raise ValueError("text_input and system_prompt are required")

    try:
        response = await self.aclient.chat.completions.create(
            model=self.model,
            messages=[
                {
                    "role": "user",
                    "content": f"Use the given format to extract information from the following input: {text_input}",
                },
                {
                    "role": "system",
                    "content": system_prompt,
                },
            ],
            max_retries=5,
            response_model=response_model,
            temperature=0.7,  # Add configurable parameters
            timeout=30,  # Add timeout
        )
        return response
    except Exception as e:
        raise OpenAIError(f"Failed to generate structured output: {str(e)}")
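For context, a hedged end-to-end usage sketch of the structured-output path; the provider value and config fields follow the sequence diagram and PR description, while the exact get_llm_client signature and the response model are assumptions:

    import asyncio
    from pydantic import BaseModel

    from cognee.infrastructure.llm.get_llm_client import get_llm_client

    class Person(BaseModel):
        name: str
        age: int

    async def main() -> None:
        # With LLM_PROVIDER set to "ollama", the factory is expected to return an OllamaAPIAdapter.
        client = get_llm_client()
        person = await client.acreate_structured_output(
            text_input="Alice is a 30 year old engineer.",
            system_prompt="Extract the person described in the text.",
            response_model=Person,
        )
        print(person)

    if __name__ == "__main__":
        asyncio.run(main())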
This PR contains the Ollama-specific LLM adapter together with the embedding engine.
Tested with the following models:
LLM_API_KEY="ollama" llm_model = "llama3.1:8b" LLM_PROVIDER = "ollama" llm_endpoint = "http://localhost:11434/v1" EMBEDDING_PROVIDER="ollama" EMBEDDING_MODEL="avr/sfr-embedding-mistral:latest" EMBEDDING_ENDPOINT="http://localhost:11434/api/embeddings" EMBEDDING_DIMENSIONS=4096 HUGGINGFACE_TOKENIZER="Salesforce/SFR-Embedding-Mistral"DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin
Summary by CodeRabbit
New Features
Enhancements