feat: Draft ollama test #566
Changes from 44 commits
New file, `test | gemini` workflow (`@@ -0,0 +1,33 @@`):

```yaml
name: test | gemini

on:
  workflow_dispatch:
  pull_request:
    types: [labeled, synchronize]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  run_simple_example_test:
    uses: ./.github/workflows/reusable_python_example.yml
    with:
      example-location: ./examples/python/simple_example.py
    secrets:
      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      GRAPHISTRY_USERNAME: ${{ secrets.GRAPHISTRY_USERNAME }}
      GRAPHISTRY_PASSWORD: ${{ secrets.GRAPHISTRY_PASSWORD }}
      EMBEDDING_PROVIDER: "gemini"
      EMBEDDING_API_KEY: ${{ secrets.GEMINI_API_KEY }}
      EMBEDDING_MODEL: "gemini/text-embedding-004"
      EMBEDDING_ENDPOINT: "https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004"
      EMBEDDING_API_VERSION: "v1beta"
      EMBEDDING_DIMENSIONS: 768
      EMBEDDING_MAX_TOKENS: 8076
      LLM_PROVIDER: "gemini"
      LLM_API_KEY: ${{ secrets.GEMINI_API_KEY }}
      LLM_MODEL: "gemini/gemini-1.5-flash"
      LLM_ENDPOINT: "https://generativelanguage.googleapis.com/"
      LLM_API_VERSION: "v1beta"
```
Comment on lines +18 to +33 (Contributor):

Secrets Mismatch with Reusable Workflow
The job passes several secrets that are not defined in the reusable workflow. Please either update the reusable workflow file to accept these additional secrets (if they are necessary for the workflow's operation) or remove them from here to avoid potential configuration issues.

🧰 Tools / 🪛 actionlint (1.7.4), lines 22-33: the secrets `EMBEDDING_PROVIDER`, `EMBEDDING_API_KEY`, `EMBEDDING_MODEL`, `EMBEDDING_ENDPOINT`, `EMBEDDING_API_VERSION`, `EMBEDDING_DIMENSIONS`, `EMBEDDING_MAX_TOKENS`, `LLM_PROVIDER`, `LLM_MODEL`, `LLM_ENDPOINT`, and `LLM_API_VERSION` are not defined in the `./.github/workflows/reusable_python_example.yml` reusable workflow; the defined secrets are `GRAPHISTRY_PASSWORD`, `GRAPHISTRY_USERNAME`, `LLM_API_KEY`, and `OPENAI_API_KEY` (workflow-call).
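The fix the comment asks for would live in the reusable workflow itself. A minimal sketch, assuming these values should indeed be passed through as secrets; the `required` flags are illustrative, not part of this PR:

```yaml
# Hypothetical addition to ./.github/workflows/reusable_python_example.yml
on:
  workflow_call:
    secrets:
      OPENAI_API_KEY:
        required: true
      LLM_API_KEY:
        required: true
      EMBEDDING_PROVIDER:
        required: false
      EMBEDDING_API_KEY:
        required: false
      # ...declare the remaining EMBEDDING_* / LLM_* entries the caller passes
```

Alternatively, non-sensitive values such as `EMBEDDING_PROVIDER` or `LLM_MODEL` could be passed as `with:` inputs rather than secrets.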
New file, `test | ollama` workflow (`@@ -0,0 +1,115 @@`):

```yaml
name: test | ollama

on:
  workflow_dispatch:
  pull_request:
    types: [ labeled, synchronize ]

jobs:
  run_simple_example_test:
    runs-on: ubuntu-latest
    # services:
    #   ollama:
    #     image: ollama/ollama
    #     ports:
    #       - 11434:11434

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12.x'

      - name: Install Poetry
        uses: snok/install-poetry@v1.4.1
        with:
          virtualenvs-create: true
          virtualenvs-in-project: true
          installer-parallel: true

      - name: Install dependencies
        run: |
          poetry install --no-interaction --all-extras
          poetry add torch

      # - name: Install ollama
      #   run: curl -fsSL https://ollama.com/install.sh | sh
      # - name: Run ollama
      #   run: |
      #     ollama serve --openai &
      #     ollama pull llama3.2 &
      #     ollama pull avr/sfr-embedding-mistral:latest

      - name: Start Ollama container
        run: |
          docker run -d --name ollama -p 11434:11434 ollama/ollama
          sleep 5
          docker exec -d ollama bash -c "ollama serve --openai"

      - name: Check Ollama logs
        run: docker logs ollama

      - name: Wait for Ollama to be ready
        run: |
          for i in {1..30}; do
            if curl -s http://localhost:11434/v1/models > /dev/null; then
              echo "Ollama is ready"
              exit 0
            fi
            echo "Waiting for Ollama... attempt $i"
            sleep 2
          done
          echo "Ollama failed to start"
          exit 1

      - name: Pull required Ollama models
        run: |
          curl -X POST http://localhost:11434/api/pull -d '{"name": "llama3.2"}'
          curl -X POST http://localhost:11434/api/pull -d '{"name": "avr/sfr-embedding-mistral:latest"}'

      - name: Call ollama API
        run: |
          curl -X POST http://localhost:11434/v1/chat/completions \
            -H "Content-Type: application/json" \
            -d '{
              "model": "llama3.2",
              "stream": false,
              "messages": [
                { "role": "system", "content": "You are a helpful assistant." },
                { "role": "user", "content": "Whatever I say, answer with Yes." }
              ]
            }'
          curl -X POST http://127.0.0.1:11434/v1/embeddings \
            -H "Content-Type: application/json" \
            -d '{
              "model": "avr/sfr-embedding-mistral:latest",
              "input": "This is a test sentence to generate an embedding."
            }'

      - name: Dump Docker logs
        run: |
          docker ps
          docker logs $(docker ps --filter "ancestor=ollama/ollama" --format "{{.ID}}")

      - name: Run example test
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          GRAPHISTRY_USERNAME: ${{ secrets.GRAPHISTRY_USERNAME }}
          GRAPHISTRY_PASSWORD: ${{ secrets.GRAPHISTRY_PASSWORD }}
          PYTHONFAULTHANDLER: 1
          LLM_PROVIDER: "ollama"
          LLM_API_KEY: "ollama"
          LLM_ENDPOINT: "http://localhost:11434/v1/"
          LLM_MODEL: "llama3.2"
          EMBEDDING_PROVIDER: "ollama"
          EMBEDDING_MODEL: "avr/sfr-embedding-mistral:latest"
          EMBEDDING_ENDPOINT: "http://localhost:11434/v1/"
          EMBEDDING_DIMENSIONS: "4096"
          HUGGINGFACE_TOKENIZER: "Salesforce/SFR-Embedding-Mistral"
        run: poetry run python ./examples/python/simple_example.py
```
Changes to `cognee/infrastructure/llm/ollama/adapter.py` (`@@ -1,44 +1,109 @@`):

```diff
-from typing import Type
+from typing import Type, Optional
 from pydantic import BaseModel
 import instructor
 from cognee.infrastructure.llm.llm_interface import LLMInterface
 from cognee.infrastructure.llm.config import get_llm_config
-from openai import OpenAI
+from openai import AsyncOpenAI  # Use AsyncOpenAI for async compatibility
+import base64
+import os


 class OllamaAPIAdapter(LLMInterface):
-    """Adapter for a Generic API LLM provider using instructor with an OpenAI backend."""
+    """Adapter for an Ollama API LLM provider using instructor with an OpenAI backend."""

+    MAX_RETRIES = 5
+
     def __init__(self, endpoint: str, api_key: str, model: str, name: str, max_tokens: int):
```

Suggested change:

```python
MAX_RETRIES = 5
transcription = self.aclient.audio.transcriptions.create(
    model="whisper-1",  # Ensure the correct model for transcription
    file=audio_file,
    language="en",
    max_retries=self.MAX_RETRIES,
)
```
💡 Verification agent
🧩 Analysis chain
Fix issues in the audio transcription implementation.
This implementation has multiple critical issues:

- It uses `self.aclient` instead of `self.client`
- Based on previous reviews, Ollama may not support OpenAI's audio transcription API
- There's no error handling for the API call
```diff
 def create_transcript(self, input_file: str) -> str:
     """Generate an audio transcript from a user query."""
     if not os.path.isfile(input_file):
         raise FileNotFoundError(f"The file {input_file} does not exist.")
-    with open(input_file, "rb") as audio_file:
-        transcription = self.aclient.audio.transcriptions.create(
-            model="whisper-1",  # Ensure the correct model for transcription
-            file=audio_file,
-            language="en",
-        )
-
-    # Ensure the response contains a valid transcript
-    if not hasattr(transcription, "text"):
-        raise ValueError("Transcription failed. No text returned.")
-
-    return transcription.text
+    try:
+        with open(input_file, "rb") as audio_file:
+            transcription = self.client.audio.transcriptions.create(
+                model="whisper-1",  # Ensure the correct model for transcription
+                file=audio_file,
+                language="en",
+                max_retries=self.MAX_RETRIES,
+            )
+
+        # Ensure the response contains a valid transcript
+        if not hasattr(transcription, "text"):
+            raise ValueError("Transcription failed. No text returned.")
+
+        return transcription.text
+    except Exception as e:
+        raise Exception(f"Error transcribing audio: {str(e)}")
```

🌐 Web query:
Does Ollama support OpenAI-compatible audio transcription API?
💡 Result:
Ollama does not natively support OpenAI-compatible audio transcription APIs like Whisper. However, developers can combine Ollama with OpenAI's Whisper API or third-party tools to achieve this functionality[1][3][7]. Here's how it works:
- Audio Processing Workflow: typical implementations use OpenAI's Whisper for speech-to-text:

  ```js
  // Transcribe audio using OpenAI's API
  const transcription = await openai.audio.transcriptions.create({
    model: 'whisper-1',
    file: audioFile,
  });
  ```

  Then pass the text to Ollama for LLM processing[1][4].

- Ollama's OpenAI Compatibility: while Ollama supports the OpenAI-style chat completions API for text generation[2][5][8]:

  ```js
  // Using Ollama with OpenAI-compatible chat API
  const response = await generateText({
    model: ollama('llama3.1'),
    prompt: transcribedText,
  });
  ```

  This compatibility doesn't extend to audio endpoints[2][5].

- Alternative Implementations: some projects combine offline tools with Ollama:
  - WhisperLive for real-time transcription[7]
  - Ollama-Voice stack (Whisper + Ollama + pyttsx3)[10]
  - Custom integrations using Nvidia GPUs[3]
For direct audio API support, developers currently need to use OpenAI's services or third-party solutions alongside Ollama's text capabilities[1][3][7]. The native Ollama API focuses on text generation through its OpenAI-compatible chat endpoints[2][5][8].
Citations:
- 1: https://www.thisdot.co/blog/how-to-build-an-ai-assistant-with-openai-vercel-ai-sdk-and-ollama-with-next
- 2: https://ollama.com/blog/openai-compatibility
- 3: https://www.restack.io/p/ollama-whisperx-answer-cat-ai
- 4: https://dev.to/josmel/automating-it-interviews-with-ollama-and-audio-capabilities-in-python-545o
- 5: https://bestofai.com/article/openai-compatibility-ollama-blog
- 6: Realtime API like OpenAI (full fledged voice to voice integrations) ollama/ollama#7514
- 7: https://www.singleapi.net/2024/09/23/offline-whisper-audio-transcription-and-ollama-voice-assistant/
- 8: https://docs.spring.io/spring-ai/reference/api/chat/ollama-chat.html
- 9: https://docs.openwebui.com/features/
- 10: https://github.com/maudoin/ollama-voice
Critical: Audio Transcription Implementation Requires Revision
The recent changes correctly update the client reference (from `self.aclient` to `self.client`) and add error handling with a retry mechanism. However, the core issue remains: Ollama does not natively support OpenAI's audio transcription API (e.g., the Whisper endpoint). To ensure reliable functionality, please consider one of the following improvements:
- Integrate a Supported Transcription Service: If audio transcription is required, use a dedicated service (such as OpenAI’s Whisper API or another third-party solution) rather than relying on Ollama’s endpoints.
- Refactor the Workflow: Separate the transcription step from the LLM workflow. This will help avoid confusion and prevent the use of unsupported API calls in the Ollama adapter.
Affected Code Location: `cognee/infrastructure/llm/ollama/adapter.py` (lines 52-69)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:

```python
def create_transcript(self, input_file: str) -> str:
    """Generate an audio transcript from a user query."""
    if not os.path.isfile(input_file):
        raise FileNotFoundError(f"The file {input_file} does not exist.")
    with open(input_file, "rb") as audio_file:
        transcription = self.aclient.audio.transcriptions.create(
            model="whisper-1",  # Ensure the correct model for transcription
            file=audio_file,
            language="en",
        )
    # Ensure the response contains a valid transcript
    if not hasattr(transcription, "text"):
        raise ValueError("Transcription failed. No text returned.")
    return transcription.text
```

After:

```python
def create_transcript(self, input_file: str) -> str:
    """Generate an audio transcript from a user query."""
    if not os.path.isfile(input_file):
        raise FileNotFoundError(f"The file {input_file} does not exist.")
    try:
        with open(input_file, "rb") as audio_file:
            transcription = self.client.audio.transcriptions.create(
                model="whisper-1",  # Ensure the correct model for transcription
                file=audio_file,
                language="en",
                max_retries=self.MAX_RETRIES,
            )
        # Ensure the response contains a valid transcript
        if not hasattr(transcription, "text"):
            raise ValueError("Transcription failed. No text returned.")
        return transcription.text
    except Exception as e:
        raise Exception(f"Error transcribing audio: {str(e)}")
```
Fix client reference and add error handling in transcribe_image.
The `transcribe_image` method uses the base64 encoding correctly, but it should use `self.client` instead of `self.aclient` and include error handling.

```diff
 def transcribe_image(self, input_file: str) -> str:
     """Transcribe content from an image using base64 encoding."""
     if not os.path.isfile(input_file):
         raise FileNotFoundError(f"The file {input_file} does not exist.")
-    with open(input_file, "rb") as image_file:
-        encoded_image = base64.b64encode(image_file.read()).decode("utf-8")
+    try:
+        with open(input_file, "rb") as image_file:
+            encoded_image = base64.b64encode(image_file.read()).decode("utf-8")
```

Committable suggestion skipped: line range outside the PR's diff.
Fix undefined client reference.
The code uses `self.aclient`, but only `self.client` is defined in the constructor. This will cause a runtime error.

```diff
-response = self.aclient.chat.completions.create(
+response = self.client.chat.completions.create(
```
💡 Verification agent
🧩 Analysis chain
Verify Ollama supports multimodal content in this format.
The implementation assumes Ollama supports OpenAI's multimodal message format. Verify that Ollama's API accepts this structure for image analysis.
🌐 Web query:
Does Ollama support OpenAI-compatible image analysis with base64 encoded images?
💡 Result:
Ollama supports image analysis using base64-encoded images through its vision models like LLaVA and Llama 3.2 Vision, though implementation details differ slightly from OpenAI's approach. Here's how it works:
Base64 Image Handling
- Encoding requirement: Images must be converted to base64 strings without the `data:image/...;base64,` prefix[6][8]. For example:

  ```python
  def encode_image_to_base64(image_path):
      with Image.open(image_path) as img:
          buffered = io.BytesIO()
          img.save(buffered, format="PNG")
          return base64.b64encode(buffered.getvalue()).decode('utf-8')
  ```

  This matches OpenAI's requirement for raw base64 data[6][8].

- API integration: Base64 strings are passed via the `images` parameter in chat requests:

  ```python
  response = ollama.chat(
      model="llava:13b",
      messages=[{
          "role": "user",
          "content": "Describe this image:",
          "images": [base64_image_string]
      }]
  )
  ```
This structure mirrors OpenAI's vision API format[7][9].
Key Differences from OpenAI
| Feature | Ollama | OpenAI |
|---|---|---|
| Model options | LLaVA, Llama 3.2 Vision | GPT-4 Vision |
| Local execution | ✅ Runs entirely offline | ❌ Cloud-only |
| Base64 prefix handling | Requires raw string | Requires prefix stripping |
| Model licensing | Apache 2.0/LLaMA license | Proprietary |
Supported Use Cases
- Document analysis: Receipts, charts, and diagrams[3][7]
- Object detection: Local security camera integration[5]
- OCR: Book cover text extraction[2]
- Multimodal RAG systems: Combined text/image retrieval[3]
To implement this with Llama 3.2 Vision:
```shell
ollama run llama3.2-vision
```

Then use either file paths or base64 strings in your API calls[9][10]. The system supports both CLI and programmatic access through Python/JavaScript libraries[7][9].
Citations:
- 1: https://github.com/codearrangertoo/ollama-vision
- 2: https://www.datadrifters.com/blog/llama-32vision-for-high-precision-ocr-with-ollama
- 3: https://ai.plainenglish.io/ollamas-new-vision-model-support-a-comprehensive-guide-ba1da6c4f45b
- 4: https://python.langchain.com/v0.1/docs/integrations/llms/ollama/
- 5: https://github.com/the-smart-home-maker/hass_ollama_image_analysis
- 6: API Llava Image Path ollama/ollama#1547
- 7: https://ollama.com/blog/vision-models
- 8: Ollama / Llava should be able to take in Base64 format w/ prefix when prompting with image ollama/ollama-js#68
- 9: https://www.youtube.com/watch?v=UjiUwNTErRE
- 10: https://ollama.com/blog/llama3.2-vision
Action: Update Image Handling to Meet Ollama API Requirements
We've confirmed that while Ollama does support image analysis with base64-encoded images, it requires some important adjustments compared to OpenAI’s multimodal message format:
- Strip the Prefix: Ollama expects a raw base64 string without the `data:image/jpeg;base64,` prefix.
- Parameter Structure: Instead of using the `image_url` field, the API typically accepts images via an `images` parameter in the chat message.

Please update the code in `cognee/infrastructure/llm/ollama/adapter.py` (lines 82-89) accordingly to comply with these requirements.
💡 Verification agent
🧩 Analysis chain
Update image handling to meet Ollama API requirements.
The current implementation assumes Ollama supports OpenAI's multimodal message format, but according to previous verification, Ollama requires different formatting for image processing.
```diff
-response = self.client.chat.completions.create(
+response = self.client.chat.completions.create(
     model=self.model,
     messages=[
         {
             "role": "user",
-            "content": [
-                {"type": "text", "text": "What's in this image?"},
-                {
-                    "type": "image_url",
-                    "image_url": {"url": f"data:image/jpeg;base64,{encoded_image}"},
-                },
-            ],
+            "content": "What's in this image?",
+            "images": [encoded_image],  # Ollama expects raw base64 strings without prefix
         }
     ],
```

🌐 Web query:
Does Ollama API support image analysis with content structure like OpenAI or does it require different format?
💡 Result:
Ollama's API supports image analysis with structured content output capabilities similar to OpenAI, though implementation details differ. Here's a technical breakdown:
Image Analysis Capabilities
Ollama supports vision models like LLaVA (v1.6) that enable:
- Object detection
- Text recognition in images
- Detailed scene description
- Chart/diagram interpretation [9]
The API accepts images through:
- File paths in CLI (`./image.jpg`)
- Base64-encoded data in programmatic calls
- Direct image URLs [9]
```python
# Python example using Ollama's vision API
import ollama

response = ollama.chat(
    model='llava:13b',
    messages=[{
        'role': 'user',
        'content': 'Analyze this security footage:',
        'images': ['/path/to/image.jpg']
    }]
)
print(response['message']['content'])
```

Structured Output Formatting
Ollama introduced JSON schema constraints in December 2024:
```python
# Structured output example with image analysis
response = ollama.generate(
    model='llava:34b',
    prompt="Extract vehicle details from traffic cam image",
    format={
        "type": "object",
        "properties": {
            "make": {"type": "string"},
            "color": {"type": "string"},
            "license_plate": {"type": "string"}
        }
    }
)
```

Key differences from OpenAI:
- Requires explicit JSON schema definition
- Uses the native `format` parameter instead of a separate JSON mode flag
- Supports vision models through multimodal LLMs [3][7]
Implementation Considerations

1. Model Requirements: must use vision-capable models like `llava:7b`, `llava:13b`, `llava:34b`, `bakllava`, `moondream` [9]
2. Resolution Handling:
   - Supports up to 4MP images (4x previous versions)
   - Automatic downsampling for large images [9]
3. API Compatibility:

```python
# OpenAI-style implementation with Ollama
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama'
)
response = client.chat.completions.create(
    model="llava:13b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this medical scan"},
            {"type": "image_url", "image_url": {"url": "base64_encoded_image"}}
        ]
    }]
)
```

This compatibility layer allows direct porting of OpenAI-based implementations [2][8].
For production systems, Ollama recommends combining vision models with structured output constraints to ensure reliable parsing of image analysis results [4][7].
Citations:
- 1: https://www.restack.io/p/ollama-answer-image-recognition-cat-ai
- 2: https://ollama.com/blog/openai-compatibility
- 3: https://ollama.com/blog/structured-outputs
- 4: https://python.useinstructor.com/examples/ollama/
- 5: https://github.com/the-smart-home-maker/hass_ollama_image_analysis
- 6: https://community.home-assistant.io/t/llm-vision-let-home-assistant-see/729241
- 7: https://github.com/ollama/ollama/blob/main/docs/api.md?plain=1
- 8: https://www.cohorte.co/blog/a-comprehensive-guide-to-ollama
- 9: https://ollama.com/blog/vision-models
- 10: https://www.restack.io/p/ollama-answer-comparison-cat-ai
Action Required: Update Image Payload Format for Ollama API Compatibility
The current implementation uses OpenAI’s multimodal message format. Based on updated Ollama API guidelines, you should remove the nested content structure and instead provide the query as a plain text string along with an “images” field containing the raw base64-encoded string (without the MIME data prefix). This will ensure proper image processing with Ollama’s vision models.
- In file `cognee/infrastructure/llm/ollama/adapter.py` (lines 64-76), update the message payload as shown below.
- Remove the image encapsulation within the "content" array.
- Supply the encoded image directly in the `images` list without the `"data:image/jpeg;base64,"` prefix.
```diff
-response = self.client.chat.completions.create(
-    model=self.model,
-    messages=[
-        {
-            "role": "user",
-            "content": [
-                {"type": "text", "text": "What's in this image?"},
-                {
-                    "type": "image_url",
-                    "image_url": {"url": f"data:image/jpeg;base64,{encoded_image}"},
-                },
-            ],
-        }
-    ],
+response = self.client.chat.completions.create(
+    model=self.model,
+    messages=[
+        {
+            "role": "user",
+            "content": "What's in this image?",
+            "images": [encoded_image],  # Ollama expects raw base64 strings without the MIME prefix
+        }
+    ],
```

Committable suggestion skipped: line range outside the PR's diff.
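To make the payload shape the review describes concrete, here is a minimal sketch of building it, assuming a raw-base64 `images` field as discussed above; the helper name `build_ollama_image_payload` and the default model are illustrative, not part of this PR:

```python
import base64


def build_ollama_image_payload(image_bytes: bytes, prompt: str, model: str = "llava:13b") -> dict:
    """Build a chat payload in the shape Ollama's API expects:
    plain-text content plus an `images` array of raw base64 strings
    (no `data:image/...;base64,` prefix)."""
    encoded_image = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": prompt,          # plain text, not a nested content array
                "images": [encoded_image],  # raw base64 string, no MIME prefix
            }
        ],
    }
```

The resulting dict can then be POSTed to the local endpoint (e.g. `http://localhost:11434/api/chat`) with any HTTP client.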
🛠️ Refactor suggestion
Use instance variable for max_tokens.
Replace the hardcoded `max_tokens` value with the instance variable for consistency.

```diff
-    max_tokens=300,
+    max_tokens=self.max_tokens,
```
💡 Verification agent
🧩 Analysis chain
Fix client reference and update image payload format for Ollama compatibility.
This implementation has several issues:

- It uses `self.aclient` instead of `self.client`
- The format for image processing is incompatible with Ollama's API
- It uses a hardcoded `max_tokens` value instead of the class variable
- There's no error handling for the API call
```diff
-response = self.aclient.chat.completions.create(
-    model=self.model,
-    messages=[
-        {
-            "role": "user",
-            "content": [
-                {"type": "text", "text": "What's in this image?"},
-                {
-                    "type": "image_url",
-                    "image_url": {"url": f"data:image/jpeg;base64,{encoded_image}"},
-                },
-            ],
-        }
-    ],
-    max_tokens=300,
-);
+response = self.client.chat.completions.create(
+    model=self.model,
+    messages=[
+        {
+            "role": "user",
+            "content": "What's in this image?",
+            "images": [encoded_image],  # Ollama expects raw base64 strings
+        }
+    ],
+    max_tokens=self.max_tokens,
+    max_retries=self.MAX_RETRIES,
+)
```

🌐 Web query:
What is the correct format for sending base64 encoded images to Ollama API?
💡 Result:
To send base64-encoded images to Ollama API, include them as an array under the images parameter in your request body. Here's how to format it correctly:
Image Formatting Requirements
- Base64 encoding: Images must be converted to base64 strings without additional metadata prefixes (e.g., remove "data:image/png;base64,")
- Array structure: Wrap images in an array even for single-image requests
- Multimodal models: Use supported models like LLaVA or Llama 3.2 Vision
Example Request Structure

```json
{
  "model": "llava:13b",
  "prompt": "Describe this image",
  "stream": false,
  "images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAY..."]
}
```

Key parameters:
- `model`: Must support multimodal processing
- `prompt`: Your text query/instruction
- `images`: Array of base64-encoded image strings[1][3][4]

Implementation Examples

Python:

```python
import base64
import ollama

with open("image.jpg", "rb") as f:
    b64_image = base64.b64encode(f.read()).decode("utf-8")

response = ollama.chat(
    model="llava:13b",
    messages=[{
        "role": "user",
        "content": "What's in this image?",
        "images": [b64_image]
    }]
)
```

cURL:

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2-vision",
  "prompt": "Explain this diagram",
  "images": ["/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAYEBQY..."]
}'
```
| Model | VRAM Requirement | Base64 Handling |
|---|---|---|
| LLaVA 13B | ≥8GB | Direct string array |
| Llama3.2-Vision | 11B:8GB, 90B:64GB | File path conversion required |
For optimal results:
- Use Ollama v0.4+ for vision models[8]
- Remove metadata prefixes from base64 strings[9]
- Combine images with text prompts for contextual analysis[2][6]
This format works across all Ollama endpoints supporting multimodal input, including /api/generate and /api/chat[1][8].
Citations:
- 1: https://hexdocs.pm/ollama/0.3.0/Ollama.API.html
- 2: https://docs.spring.io/spring-ai/reference/api/chat/ollama-chat.html
- 3: https://github.com/ollama/ollama/blob/main/docs/api.md?plain=1
- 4: https://benyoung.blog/blog/your-private-ai-can-have-eyes-ollama-with-the-llava-model/
- 5: https://geshan.com.np/blog/2025/02/ollama-api/
- 6: https://www.runcomfy.com/comfyui-nodes/ComfyUI-OllamaGemini/ollama-api
- 7: https://www.postman.com/postman-student-programs/ollama-api/documentation/suc47x8/ollama-rest-api
- 8: https://ollama.com/blog/llama3.2-vision
- 9: API Llava Image Path ollama/ollama#1547
- 10: https://pipedream.com/apps/hyperise/integrations/ollama/generate-chat-completion-with-ollama-api-on-new-image-impression-from-hyperise-api-int_z3sVGLQG
Action Required: Update Image Handling & Add Error Handling
The updated code correctly fixes the client reference, updates the image payload format (using the images array with a raw base64 string), and replaces the hardcoded max_tokens with the class variable while also adding max_retries. However, the API call still lacks error handling. Please ensure that:
- Encoded Image: The `encoded_image` variable contains a raw base64 string (i.e. any `"data:image/jpeg;base64,"` prefix is removed) as required by Ollama's API.
- Error Handling: Wrap the API call in an error-handling block (e.g., try/except) to manage potential API failures gracefully.
- Consistency: Verify that all usages of the client and token/retry values across the codebase adhere to these updated patterns.
Committable suggestion skipped: line range outside the PR's diff.
🛠️ Refactor suggestion
Add proper error handling for the entire API call.
While you've added validation for the response, you should wrap the entire API call in a try-except block to handle API exceptions properly.
```diff
+    except Exception as e:
+        raise Exception(f"Error transcribing image: {str(e)}")
+
     return response.choices[0].message.content
```

Committable suggestion skipped: line range outside the PR's diff.
Indentation is not consistent.