Merged

Changes from 44 commits (50 commits total)
c572e27 Draft ollama test (Vasilije1990, Feb 19, 2025)
2de364b Ollama test end to end (Vasilije1990, Feb 20, 2025)
8ebd9a7 Ollama test end to end (Vasilije1990, Feb 20, 2025)
9d0d96e Fix ollama (Vasilije1990, Feb 21, 2025)
b4088be Fix ollama (Vasilije1990, Feb 21, 2025)
b670697 Fix ollama (Vasilije1990, Feb 21, 2025)
bfe039d Fix ollama (Vasilije1990, Feb 21, 2025)
6bc4f6a Fix ollama (Vasilije1990, Feb 21, 2025)
96adcfb Fix ollama (Vasilije1990, Feb 21, 2025)
c06c28d Fix ollama (Vasilije1990, Feb 21, 2025)
edd681f Fix ollama (Vasilije1990, Feb 21, 2025)
02b0109 Fix ollama (Vasilije1990, Feb 21, 2025)
a91e83e Fix ollama (Vasilije1990, Feb 21, 2025)
326c418 Fix ollama (Vasilije1990, Feb 21, 2025)
92602aa Fix ollama (Vasilije1990, Feb 22, 2025)
f2d0909 Fix ollama (Vasilije1990, Feb 22, 2025)
97465f1 Fix ollama (Vasilije1990, Feb 22, 2025)
73662b8 Fix ollama (Vasilije1990, Feb 22, 2025)
90d96aa Fix ollama (Vasilije1990, Feb 22, 2025)
3a88b94 Fix ollama (Vasilije1990, Feb 22, 2025)
11442df Fix ollama (Vasilije1990, Feb 22, 2025)
1dfb0dd Fix ollama (Vasilije1990, Feb 22, 2025)
4c4723b Fix ollama (Vasilije1990, Feb 22, 2025)
846c45e Fix ollama (Vasilije1990, Feb 22, 2025)
2c0bfc8 Fix ollama (Vasilije1990, Feb 22, 2025)
91512cd Merge branch 'dev' into COG-1368 (Vasilije1990, Feb 22, 2025)
5c7b4a5 Ruff it. (soobrosa, Feb 25, 2025)
7a85e71 Merge branch 'dev' into COG-1368 (soobrosa, Feb 25, 2025)
0bba1f8 Response model fun. (soobrosa, Feb 25, 2025)
061fbbd OpenAI mode. (soobrosa, Feb 25, 2025)
0ed6aa6 Typo. (soobrosa, Feb 25, 2025)
3090333 Add a call, homogenous localhost. (soobrosa, Feb 25, 2025)
80ccf55 Should conform more. (soobrosa, Feb 25, 2025)
ce8c2da Unset, my friend, unset. (soobrosa, Feb 25, 2025)
65927b3 Update test_ollama.yml (Vasilije1990, Feb 25, 2025)
6463c2e Update test_ollama.yml (Vasilije1990, Feb 25, 2025)
70f9b5f Update test_ollama.yml (Vasilije1990, Feb 25, 2025)
468268c Docker Composish way. (soobrosa, Feb 26, 2025)
c72b12d Let's be Pydantic. (soobrosa, Feb 26, 2025)
c224556 Launch Docker manually. (soobrosa, Feb 26, 2025)
ec9bbca Cosmetics. (soobrosa, Feb 26, 2025)
cabbfd6 Maybe we could fly without the Hugger. (soobrosa, Feb 26, 2025)
c329cef OHMY. (soobrosa, Feb 26, 2025)
44f02df Async it. (soobrosa, Feb 26, 2025)
6b49078 Response model. (soobrosa, Feb 26, 2025)
fe7da60 Will graph fly. (soobrosa, Feb 26, 2025)
a77655a Oops, putting back create transcript. (soobrosa, Feb 26, 2025)
cfc93e3 Clean up adapter. (soobrosa, Feb 26, 2025)
647d872 Phi4 can respond reasonably. (soobrosa, Feb 27, 2025)
01bb8cb Beefy runner. (soobrosa, Feb 28, 2025)
33 changes: 33 additions & 0 deletions .github/workflows/test_gemini.yml
@@ -0,0 +1,33 @@
name: test | gemini

on:
workflow_dispatch:
pull_request:
types: [labeled, synchronize]


concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true

jobs:
run_simple_example_test:
uses: ./.github/workflows/reusable_python_example.yml
with:
example-location: ./examples/python/simple_example.py
secrets:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
Contributor comment: Indentation is not consistent.

GRAPHISTRY_USERNAME: ${{ secrets.GRAPHISTRY_USERNAME }}
GRAPHISTRY_PASSWORD: ${{ secrets.GRAPHISTRY_PASSWORD }}
EMBEDDING_PROVIDER: "gemini"
EMBEDDING_API_KEY: ${{ secrets.GEMINI_API_KEY }}
EMBEDDING_MODEL: "gemini/text-embedding-004"
EMBEDDING_ENDPOINT: "https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004"
EMBEDDING_API_VERSION: "v1beta"
EMBEDDING_DIMENSIONS: 768
EMBEDDING_MAX_TOKENS: 8076
LLM_PROVIDER: "gemini"
LLM_API_KEY: ${{ secrets.GEMINI_API_KEY }}
LLM_MODEL: "gemini/gemini-1.5-flash"
LLM_ENDPOINT: "https://generativelanguage.googleapis.com/"
LLM_API_VERSION: "v1beta"
Contributor comment on lines +18 to +33:

⚠️ Potential issue

Secrets Mismatch with Reusable Workflow

The job passes several secrets that are not defined in the reusable workflow (reusable_python_example.yml). According to the static analysis hints, only OPENAI_API_KEY, GRAPHISTRY_USERNAME, GRAPHISTRY_PASSWORD, and LLM_API_KEY are expected, yet the configuration includes additional secrets such as:

  • EMBEDDING_PROVIDER
  • EMBEDDING_API_KEY
  • EMBEDDING_MODEL
  • EMBEDDING_ENDPOINT
  • EMBEDDING_API_VERSION
  • EMBEDDING_DIMENSIONS
  • EMBEDDING_MAX_TOKENS
  • LLM_PROVIDER
  • LLM_MODEL
  • LLM_ENDPOINT
  • LLM_API_VERSION

Please either update the reusable workflow file to accept these additional secrets (if they are necessary for the workflow’s operation) or remove them from here to avoid potential configuration issues.

🧰 Tools
🪛 actionlint (1.7.4)

22-22: secret "EMBEDDING_PROVIDER" is not defined in "./.github/workflows/reusable_python_example.yml" reusable workflow. defined secrets are "GRAPHISTRY_PASSWORD", "GRAPHISTRY_USERNAME", "LLM_API_KEY", "OPENAI_API_KEY"

(workflow-call)


23-23: secret "EMBEDDING_API_KEY" is not defined in "./.github/workflows/reusable_python_example.yml" reusable workflow. defined secrets are "GRAPHISTRY_PASSWORD", "GRAPHISTRY_USERNAME", "LLM_API_KEY", "OPENAI_API_KEY"

(workflow-call)


24-24: secret "EMBEDDING_MODEL" is not defined in "./.github/workflows/reusable_python_example.yml" reusable workflow. defined secrets are "GRAPHISTRY_PASSWORD", "GRAPHISTRY_USERNAME", "LLM_API_KEY", "OPENAI_API_KEY"

(workflow-call)


25-25: secret "EMBEDDING_ENDPOINT" is not defined in "./.github/workflows/reusable_python_example.yml" reusable workflow. defined secrets are "GRAPHISTRY_PASSWORD", "GRAPHISTRY_USERNAME", "LLM_API_KEY", "OPENAI_API_KEY"

(workflow-call)


26-26: secret "EMBEDDING_API_VERSION" is not defined in "./.github/workflows/reusable_python_example.yml" reusable workflow. defined secrets are "GRAPHISTRY_PASSWORD", "GRAPHISTRY_USERNAME", "LLM_API_KEY", "OPENAI_API_KEY"

(workflow-call)


27-27: secret "EMBEDDING_DIMENSIONS" is not defined in "./.github/workflows/reusable_python_example.yml" reusable workflow. defined secrets are "GRAPHISTRY_PASSWORD", "GRAPHISTRY_USERNAME", "LLM_API_KEY", "OPENAI_API_KEY"

(workflow-call)


28-28: secret "EMBEDDING_MAX_TOKENS" is not defined in "./.github/workflows/reusable_python_example.yml" reusable workflow. defined secrets are "GRAPHISTRY_PASSWORD", "GRAPHISTRY_USERNAME", "LLM_API_KEY", "OPENAI_API_KEY"

(workflow-call)


29-29: secret "LLM_PROVIDER" is not defined in "./.github/workflows/reusable_python_example.yml" reusable workflow. defined secrets are "GRAPHISTRY_PASSWORD", "GRAPHISTRY_USERNAME", "LLM_API_KEY", "OPENAI_API_KEY"

(workflow-call)


31-31: secret "LLM_MODEL" is not defined in "./.github/workflows/reusable_python_example.yml" reusable workflow. defined secrets are "GRAPHISTRY_PASSWORD", "GRAPHISTRY_USERNAME", "LLM_API_KEY", "OPENAI_API_KEY"

(workflow-call)


32-32: secret "LLM_ENDPOINT" is not defined in "./.github/workflows/reusable_python_example.yml" reusable workflow. defined secrets are "GRAPHISTRY_PASSWORD", "GRAPHISTRY_USERNAME", "LLM_API_KEY", "OPENAI_API_KEY"

(workflow-call)


33-33: secret "LLM_API_VERSION" is not defined in "./.github/workflows/reusable_python_example.yml" reusable workflow. defined secrets are "GRAPHISTRY_PASSWORD", "GRAPHISTRY_USERNAME", "LLM_API_KEY", "OPENAI_API_KEY"

(workflow-call)
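One way to resolve the mismatch, sketched here under the assumption that the extra values really are needed by the example, is to declare the additional names under `on.workflow_call.secrets` in `reusable_python_example.yml`. Which entries are `required` is an assumption, not something stated in this PR:

```yaml
# Sketch only: merge into .github/workflows/reusable_python_example.yml,
# keeping its existing inputs and secrets. Names are taken from the caller above.
on:
  workflow_call:
    secrets:
      OPENAI_API_KEY:
        required: true
      LLM_API_KEY:
        required: false
      EMBEDDING_API_KEY:
        required: false
      EMBEDDING_PROVIDER:
        required: false
      LLM_PROVIDER:
        required: false
      # ...repeat for EMBEDDING_MODEL, EMBEDDING_ENDPOINT, EMBEDDING_API_VERSION,
      # EMBEDDING_DIMENSIONS, EMBEDDING_MAX_TOKENS, LLM_MODEL, LLM_ENDPOINT,
      # and LLM_API_VERSION.
```

Non-sensitive configuration such as `EMBEDDING_DIMENSIONS` or model names could arguably be declared under `on.workflow_call.inputs` and passed via `with:` instead, reserving `secrets:` for actual credentials.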

115 changes: 115 additions & 0 deletions .github/workflows/test_ollama.yml
@@ -0,0 +1,115 @@
name: test | ollama

on:
workflow_dispatch:
pull_request:
types: [ labeled, synchronize ]

jobs:

run_simple_example_test:

runs-on: ubuntu-latest
# services:
# ollama:
# image: ollama/ollama
# ports:
# - 11434:11434

steps:
- name: Checkout repository
uses: actions/checkout@v4

- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: '3.12.x'

- name: Install Poetry
uses: snok/install-poetry@v1.4.1
with:
virtualenvs-create: true
virtualenvs-in-project: true
installer-parallel: true

- name: Install dependencies
run: |
poetry install --no-interaction --all-extras
poetry add torch

# - name: Install ollama
# run: curl -fsSL https://ollama.com/install.sh | sh
# - name: Run ollama
# run: |
# ollama serve --openai &
# ollama pull llama3.2 &
# ollama pull avr/sfr-embedding-mistral:latest

- name: Start Ollama container
run: |
docker run -d --name ollama -p 11434:11434 ollama/ollama
sleep 5
docker exec -d ollama bash -c "ollama serve --openai"

- name: Check Ollama logs
run: docker logs ollama

- name: Wait for Ollama to be ready
run: |
for i in {1..30}; do
if curl -s http://localhost:11434/v1/models > /dev/null; then
echo "Ollama is ready"
exit 0
fi
echo "Waiting for Ollama... attempt $i"
sleep 2
done
echo "Ollama failed to start"
exit 1

- name: Pull required Ollama models
run: |
curl -X POST http://localhost:11434/api/pull -d '{"name": "llama3.2"}'
curl -X POST http://localhost:11434/api/pull -d '{"name": "avr/sfr-embedding-mistral:latest"}'

- name: Call ollama API
run: |
curl -X POST http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.2",
"stream": false,
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "Whatever I say, answer with Yes." }
]
}'
curl -X POST http://127.0.0.1:11434/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "avr/sfr-embedding-mistral:latest",
"input": "This is a test sentence to generate an embedding."
}'

- name: Dump Docker logs
run: |
docker ps
docker logs $(docker ps --filter "ancestor=ollama/ollama" --format "{{.ID}}")


- name: Run example test
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
GRAPHISTRY_USERNAME: ${{ secrets.GRAPHISTRY_USERNAME }}
GRAPHISTRY_PASSWORD: ${{ secrets.GRAPHISTRY_PASSWORD }}
PYTHONFAULTHANDLER: 1
LLM_PROVIDER: "ollama"
LLM_API_KEY: "ollama"
LLM_ENDPOINT: "http://localhost:11434/v1/"
LLM_MODEL: "llama3.2"
EMBEDDING_PROVIDER: "ollama"
EMBEDDING_MODEL: "avr/sfr-embedding-mistral:latest"
EMBEDDING_ENDPOINT: "http://localhost:11434/v1/"
EMBEDDING_DIMENSIONS: "4096"
HUGGINGFACE_TOKENIZER: "Salesforce/SFR-Embedding-Mistral"
run: poetry run python ./examples/python/simple_example.py
23 changes: 22 additions & 1 deletion .github/workflows/upgrade_deps.yml
@@ -2,8 +2,29 @@ name: Update Poetry Dependencies

on:
schedule:
- cron: '0 3 * * 0'
- cron: '0 3 * * 0' # Runs at 3 AM every Sunday
push:
paths:
- 'poetry.lock'
- 'pyproject.toml'
branches:
- main
- dev
pull_request:
paths:
- 'poetry.lock'
- 'pyproject.toml'
types: [opened, synchronize, reopened]
branches:
- main
- dev
workflow_dispatch:
inputs:
debug_enabled:
type: boolean
description: 'Run the update with debug logging'
required: false
default: false

jobs:
update-dependencies:
95 changes: 80 additions & 15 deletions cognee/infrastructure/llm/ollama/adapter.py
@@ -1,44 +1,109 @@
from typing import Type
from typing import Type, Optional
from pydantic import BaseModel
import instructor
from cognee.infrastructure.llm.llm_interface import LLMInterface
from cognee.infrastructure.llm.config import get_llm_config
from openai import OpenAI
from openai import AsyncOpenAI # Use AsyncOpenAI for async compatibility
import base64
import os


class OllamaAPIAdapter(LLMInterface):
"""Adapter for a Generic API LLM provider using instructor with an OpenAI backend."""
"""Adapter for an Ollama API LLM provider using instructor with an OpenAI backend."""

def __init__(self, endpoint: str, api_key: str, model: str, name: str, max_tokens: int):
MAX_RETRIES = 5

Contributor comment:

🛠️ Refactor suggestion

Utilize MAX_RETRIES constant in API calls

You've defined a MAX_RETRIES constant, but it's not being used in any of the API calls. To make your code more resilient, add this parameter to all API calls.

 transcription = self.aclient.audio.transcriptions.create(
     model="whisper-1",  # Ensure the correct model for transcription
     file=audio_file,
     language="en",
+    max_retries=self.MAX_RETRIES,
 )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
MAX_RETRIES = 5
transcription = self.aclient.audio.transcriptions.create(
model="whisper-1", # Ensure the correct model for transcription
file=audio_file,
language="en",
max_retries=self.MAX_RETRIES,
)

def __init__(
self,
endpoint: str,
api_key: str,
model: str,
name: str,
max_tokens: int,
api_version: Optional[str] = None,
) -> None:
self.name = name
self.model = model
self.api_key = api_key
self.endpoint = endpoint
self.max_tokens = max_tokens
self.api_version = api_version

# Use AsyncOpenAI for proper async handling
self.aclient = instructor.from_openai(
OpenAI(base_url=self.endpoint, api_key=self.api_key), mode=instructor.Mode.JSON
AsyncOpenAI(base_url=self.endpoint, api_key=self.api_key), mode=instructor.Mode.JSON
)

async def acreate_structured_output(
self, text_input: str, system_prompt: str, response_model: Type[BaseModel]
) -> BaseModel:
"""Generate a structured output from the LLM using the provided text and system prompt."""

# Ensure the function being awaited is actually async
if not callable(getattr(self.aclient.chat.completions, "create", None)):
raise TypeError("self.aclient.chat.completions.create is not callable!")

response = await self.aclient.chat.completions.create(
model=self.model,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": text_input},
],
max_tokens=self.max_tokens,
)

# Ensure response is valid before passing to Pydantic model
if not isinstance(response, dict):
raise ValueError(f"Unexpected response format: {response}")

return response_model(**response)

def create_transcript(self, input_file: str) -> str:
"""Generate an audio transcript from a user query."""

if not os.path.isfile(input_file):
raise FileNotFoundError(f"The file {input_file} does not exist.")

with open(input_file, "rb") as audio_file:
transcription = self.aclient.audio.transcriptions.create(
model="whisper-1", # Ensure the correct model for transcription
file=audio_file,
language="en",
)

# Ensure the response contains a valid transcript
if not hasattr(transcription, "text"):
raise ValueError("Transcription failed. No text returned.")

return transcription.text
Contributor comment on lines +48 to +65:

💡 Verification agent

⚠️ Potential issue

🧩 Analysis chain

Fix issues in the audio transcription implementation.

This implementation has multiple critical issues:

  1. It uses self.aclient instead of self.client
  2. Based on previous reviews, Ollama may not support OpenAI's audio transcription API
  3. There's no error handling for the API call
 def create_transcript(self, input_file: str) -> str:
     """Generate an audio transcript from a user query."""

     if not os.path.isfile(input_file):
         raise FileNotFoundError(f"The file {input_file} does not exist.")

-    with open(input_file, "rb") as audio_file:
-        transcription = self.aclient.audio.transcriptions.create(
-            model="whisper-1",  # Ensure the correct model for transcription
-            file=audio_file,
-            language="en",
-        )
-
-    # Ensure the response contains a valid transcript
-    if not hasattr(transcription, "text"):
-        raise ValueError("Transcription failed. No text returned.")
-
-    return transcription.text
+    try:
+        with open(input_file, "rb") as audio_file:
+            transcription = self.client.audio.transcriptions.create(
+                model="whisper-1",  # Ensure the correct model for transcription
+                file=audio_file,
+                language="en",
+                max_retries=self.MAX_RETRIES,
+            )
+
+        # Ensure the response contains a valid transcript
+        if not hasattr(transcription, "text"):
+            raise ValueError("Transcription failed. No text returned.")
+
+        return transcription.text
+    except Exception as e:
+        raise Exception(f"Error transcribing audio: {str(e)}")

🌐 Web query:

Does Ollama support OpenAI-compatible audio transcription API?

💡 Result:

Ollama does not natively support OpenAI-compatible audio transcription APIs like Whisper. However, developers can combine Ollama with OpenAI's Whisper API or third-party tools to achieve this functionality[1][3][7]. Here's how it works:

  1. Audio Processing Workflow
    Typical implementations use OpenAI's Whisper for speech-to-text:

    // Transcribe audio using OpenAI's API
    const transcription = await openai.audio.transcriptions.create({
      model: 'whisper-1',
      file: audioFile,
    });

    Then pass the text to Ollama for LLM processing[1][4].

  2. Ollama's OpenAI Compatibility
    While Ollama supports OpenAI-style chat completions API for text generation[2][5][8]:

    // Using Ollama with OpenAI-compatible chat API
    const response = await generateText({
      model: ollama('llama3.1'),
      prompt: transcribedText,
    });

    This compatibility doesn't extend to audio endpoints[2][5].

  3. Alternative Implementations
    Some projects combine offline tools with Ollama:

    • WhisperLive for real-time transcription[7]
    • Ollama-Voice stack (Whisper + Ollama + pyttsx3)[10]
    • Custom integrations using Nvidia GPUs[3]

For direct audio API support, developers currently need to use OpenAI's services or third-party solutions alongside Ollama's text capabilities[1][3][7]. The native Ollama API focuses on text generation through its OpenAI-compatible chat endpoints[2][5][8].



Critical: Audio Transcription Implementation Requires Revision

The recent changes correctly update the client reference (from self.aclient to self.client) and add error handling with a retry mechanism. However, the core issue still remains: Ollama does not natively support OpenAI’s audio transcription API (e.g., the Whisper endpoint). To ensure reliable functionality, please consider one of the following improvements:

  • Integrate a Supported Transcription Service: If audio transcription is required, use a dedicated service (such as OpenAI’s Whisper API or another third-party solution) rather than relying on Ollama’s endpoints.
  • Refactor the Workflow: Separate the transcription step from the LLM workflow. This will help avoid confusion and prevent the use of unsupported API calls in the Ollama adapter.

Affected Code Location:

  • cognee/infrastructure/llm/ollama/adapter.py (lines 52-69)
📝 Committable suggestion


Suggested change
def create_transcript(self, input_file: str) -> str:
"""Generate an audio transcript from a user query."""
if not os.path.isfile(input_file):
raise FileNotFoundError(f"The file {input_file} does not exist.")
with open(input_file, "rb") as audio_file:
transcription = self.aclient.audio.transcriptions.create(
model="whisper-1", # Ensure the correct model for transcription
file=audio_file,
language="en",
)
# Ensure the response contains a valid transcript
if not hasattr(transcription, "text"):
raise ValueError("Transcription failed. No text returned.")
return transcription.text
def create_transcript(self, input_file: str) -> str:
"""Generate an audio transcript from a user query."""
if not os.path.isfile(input_file):
raise FileNotFoundError(f"The file {input_file} does not exist.")
try:
with open(input_file, "rb") as audio_file:
transcription = self.client.audio.transcriptions.create(
model="whisper-1", # Ensure the correct model for transcription
file=audio_file,
language="en",
max_retries=self.MAX_RETRIES,
)
# Ensure the response contains a valid transcript
if not hasattr(transcription, "text"):
raise ValueError("Transcription failed. No text returned.")
return transcription.text
except Exception as e:
raise Exception(f"Error transcribing audio: {str(e)}")
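Given that conclusion, one possible refactor is to move transcription out of the Ollama adapter entirely and into a small service wrapping whatever Whisper-capable client is available. This is a hypothetical sketch, not code from this PR: the class name is invented, and the client is injected so that, for example, `openai.OpenAI(api_key=...)` pointed at api.openai.com would satisfy it.

```python
import os


class TranscriptionService:
    """Speech-to-text kept separate from the Ollama adapter, since Ollama's
    OpenAI-compatible layer does not expose audio endpoints."""

    def __init__(self, client):
        # `client` is any object exposing OpenAI's audio.transcriptions.create
        # shape, e.g. openai.OpenAI(api_key=...) against api.openai.com (not Ollama).
        self.client = client

    def transcribe(self, input_file: str) -> str:
        if not os.path.isfile(input_file):
            raise FileNotFoundError(f"The file {input_file} does not exist.")

        with open(input_file, "rb") as audio_file:
            result = self.client.audio.transcriptions.create(
                model="whisper-1", file=audio_file, language="en"
            )

        # Ensure the response contains a valid transcript
        text = getattr(result, "text", None)
        if not text:
            raise ValueError("Transcription failed. No text returned.")
        return text
```

The adapter's `create_transcript` could then delegate to this service, keeping unsupported audio calls out of the Ollama code path.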


def transcribe_image(self, input_file: str) -> str:
"""Transcribe content from an image using base64 encoding."""

if not os.path.isfile(input_file):
raise FileNotFoundError(f"The file {input_file} does not exist.")

with open(input_file, "rb") as image_file:
encoded_image = base64.b64encode(image_file.read()).decode("utf-8")

Contributor comment on lines +71 to 75:

⚠️ Potential issue

Fix client reference and add error handling in transcribe_image.

The transcribe_image method uses the base64 encoding correctly, but it should use self.client instead of self.aclient and include error handling.

 def transcribe_image(self, input_file: str) -> str:
     """Transcribe content from an image using base64 encoding."""

     if not os.path.isfile(input_file):
         raise FileNotFoundError(f"The file {input_file} does not exist.")

-    with open(input_file, "rb") as image_file:
-        encoded_image = base64.b64encode(image_file.read()).decode("utf-8")
+    try:
+        with open(input_file, "rb") as image_file:
+            encoded_image = base64.b64encode(image_file.read()).decode("utf-8")

Committable suggestion skipped: line range outside the PR's diff.

response = self.aclient.chat.completions.create(
Contributor comment:

⚠️ Potential issue

Fix undefined client reference.

The code uses self.aclient, but only self.client is defined in the constructor. This will cause a runtime error.

-response = self.aclient.chat.completions.create(
+response = self.client.chat.completions.create(
📝 Committable suggestion


Suggested change
response = self.aclient.chat.completions.create(
response = self.client.chat.completions.create(

model=self.model,
messages=[
{
"role": "user",
"content": f"Use the given format to extract information from the following input: {text_input}",
},
{
"role": "system",
"content": system_prompt,
},
"content": [
{"type": "text", "text": "What’s in this image?"},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{encoded_image}"},
},
],
}
Contributor comment on lines +81 to +88:

💡 Verification agent

🧩 Analysis chain

Verify Ollama supports multimodal content in this format.

The implementation assumes Ollama supports OpenAI's multimodal message format. Verify that Ollama's API accepts this structure for image analysis.


🌐 Web query:

Does Ollama support OpenAI-compatible image analysis with base64 encoded images?

💡 Result:

Ollama supports image analysis using base64-encoded images through its vision models like LLaVA and Llama 3.2 Vision, though implementation details differ slightly from OpenAI's approach. Here's how it works:

Base64 Image Handling

  1. Encoding requirement:
    Images must be converted to base64 strings without the data:image/...;base64, prefix[6][8]. For example:

    def encode_image_to_base64(image_path):
        with Image.open(image_path) as img:
            buffered = io.BytesIO()
            img.save(buffered, format="PNG")
            return base64.b64encode(buffered.getvalue()).decode('utf-8')

    This matches OpenAI's requirement for raw base64 data[6][8].

  2. API integration:
    Base64 strings are passed via the images parameter in chat requests:

    response = ollama.chat(
        model="llava:13b",
        messages=[{
            "role": "user",
            "content": "Describe this image:",
            "images": [base64_image_string]
        }]
    )

    This structure mirrors OpenAI's vision API format[7][9].

Key Differences from OpenAI

Feature Ollama OpenAI
Model options LLaVA, Llama 3.2 Vision GPT-4 Vision
Local execution ✅ Runs entirely offline ❌ Cloud-only
Base64 prefix handling Requires raw string Requires prefix stripping
Model licensing Apache 2.0/LLaMA license Proprietary

Supported Use Cases

  • Document analysis: Receipts, charts, and diagrams[3][7]
  • Object detection: Local security camera integration[5]
  • OCR: Book cover text extraction[2]
  • Multimodal RAG systems: Combined text/image retrieval[3]

To implement this with Llama 3.2 Vision:

ollama run llama3.2-vision

Then use either file paths or base64 strings in your API calls[9][10]. The system supports both CLI and programmatic access through Python/JavaScript libraries[7][9].



Action: Update Image Handling to Meet Ollama API Requirements

We've confirmed that while Ollama does support image analysis with base64-encoded images, it requires some important adjustments compared to OpenAI’s multimodal message format:

  • Strip the Prefix: Ollama expects a raw base64 string without the data:image/jpeg;base64, prefix.
  • Parameter Structure: Instead of using the image_url field, the API typically accepts images via an images parameter in the chat message.

Please update the code in cognee/infrastructure/llm/ollama/adapter.py (lines 82–89) accordingly to comply with these requirements.
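The two adjustments above can be sketched as small helpers (hypothetical names, not part of the PR): encode the image as a raw base64 string with no data-URI prefix, and put it in an `images` list next to plain-text `content`.

```python
import base64


def encode_image_raw_base64(path: str) -> str:
    """Return the raw base64 payload Ollama expects,
    with no 'data:image/...;base64,' prefix."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def build_ollama_image_message(prompt: str, encoded_image: str) -> dict:
    # Ollama's native chat format: plain-text content plus an "images" list,
    # instead of OpenAI's nested image_url content parts.
    return {"role": "user", "content": prompt, "images": [encoded_image]}
```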

Contributor comment on lines 65 to 88:

💡 Verification agent

⚠️ Potential issue

🧩 Analysis chain

Update image handling to meet Ollama API requirements.

The current implementation assumes Ollama supports OpenAI's multimodal message format, but according to previous verification, Ollama requires different formatting for image processing.

-response = self.client.chat.completions.create(
+response = self.client.chat.completions.create(
     model=self.model,
     messages=[
         {
             "role": "user",
-            "content": [
-                {"type": "text", "text": "What's in this image?"},
-                {
-                    "type": "image_url",
-                    "image_url": {"url": f"data:image/jpeg;base64,{encoded_image}"},
-                },
-            ],
+            "content": "What's in this image?",
+            "images": [encoded_image],  # Ollama expects raw base64 strings without prefix
         }
     ],

🌐 Web query:

Does Ollama API support image analysis with content structure like OpenAI or does it require different format?

💡 Result:

Ollama's API supports image analysis with structured content output capabilities similar to OpenAI, though implementation details differ. Here's a technical breakdown:

Image Analysis Capabilities

Ollama supports vision models like LLaVA (v1.6) that enable:

  • Object detection
  • Text recognition in images
  • Detailed scene description
  • Chart/diagram interpretation [9]

The API accepts images through:

  1. File paths in CLI (./image.jpg)
  2. Base64-encoded data in programmatic calls
  3. Direct image URLs [9]
# Python example using Ollama's vision API
import ollama

response = ollama.chat(
    model='llava:13b',
    messages=[{
        'role': 'user',
        'content': 'Analyze this security footage:',
        'images': ['/path/to/image.jpg']
    }]
)
print(response['message']['content'])

Structured Output Formatting

Ollama introduced JSON schema constraints in December 2024:

# Structured output example with image analysis
response = ollama.generate(
    model='llava:34b',
    prompt="Extract vehicle details from traffic cam image",
    format={
        "type": "object",
        "properties": {
            "make": {"type": "string"},
            "color": {"type": "string"},
            "license_plate": {"type": "string"}
        }
    }
)

Key differences from OpenAI:

  1. Requires explicit JSON schema definition
  2. Uses native format parameter instead of separate JSON mode flag
  3. Supports vision models through multimodal LLMs [3][7]

Implementation Considerations

  1. Model Requirements: Must use vision-capable models like:

    • llava:7b, llava:13b, llava:34b
    • bakllava
    • moondream [9]
  2. Resolution Handling:

    • Supports up to 4MP images (4x previous versions)
    • Automatic downsampling for large images [9]
  3. API Compatibility:

    # OpenAI-style implementation with Ollama
    from openai import OpenAI
    
    client = OpenAI(
        base_url='http://localhost:11434/v1',
        api_key='ollama'
    )
    
    response = client.chat.completions.create(
        model="llava:13b",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this medical scan"},
                {"type": "image_url", "image_url": {"url": "base64_encoded_image"}}
            ]
        }]
    )

    This compatibility layer allows direct porting of OpenAI-based implementations [2][8].

For production systems, Ollama recommends combining vision models with structured output constraints to ensure reliable parsing of image analysis results [4][7].



Action Required: Update Image Payload Format for Ollama API Compatibility

The current implementation uses OpenAI’s multimodal message format. Based on updated Ollama API guidelines, you should remove the nested content structure and instead provide the query as a plain text string along with an “images” field containing the raw base64-encoded string (without the MIME data prefix). This will ensure proper image processing with Ollama’s vision models.

  • In file cognee/infrastructure/llm/ollama/adapter.py (lines 64–76), update the message payload as shown below.
  • Remove the image encapsulation within the “content” array.
  • Supply the encoded image directly in the “images” list without the "data:image/jpeg;base64," prefix.
-response = self.client.chat.completions.create(
-    model=self.model,
-    messages=[
-        {
-            "role": "user",
-            "content": [
-                {"type": "text", "text": "What's in this image?"},
-                {
-                    "type": "image_url",
-                    "image_url": {"url": f"data:image/jpeg;base64,{encoded_image}"},
-                },
-            ],
-        }
-    ],
+response = self.client.chat.completions.create(
+    model=self.model,
+    messages=[
+        {
+            "role": "user",
+            "content": "What's in this image?",
+            "images": [encoded_image],  # Ollama expects raw base64 strings without the MIME prefix
+        }
+    ],

Committable suggestion skipped: line range outside the PR's diff.

],
max_retries=5,
response_model=response_model,
max_tokens=300,
)
Contributor comment on lines +102 to 91:

🛠️ Refactor suggestion

Use instance variable for max_tokens

Replace the hardcoded max_tokens value with the instance variable for consistency.

-    max_tokens=300,
+    max_tokens=self.max_tokens,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    max_tokens=300,
-)
+    max_tokens=self.max_tokens,
+)

Comment on lines 81 to 91

💡 Verification agent

⚠️ Potential issue

🧩 Analysis chain

Fix client reference and update image payload format for Ollama compatibility.

This implementation has several issues:

  1. It uses self.aclient instead of self.client
  2. The format for image processing is incompatible with Ollama's API
  3. It uses a hardcoded max_tokens value instead of the class variable
  4. There's no error handling for the API call
-response = self.aclient.chat.completions.create(
-    model=self.model,
-    messages=[
-        {
-            "role": "user",
-            "content": [
-                {"type": "text", "text": "What's in this image?"},
-                {
-                    "type": "image_url",
-                    "image_url": {"url": f"data:image/jpeg;base64,{encoded_image}"},
-                },
-            ],
-        }
-    ],
-    max_tokens=300,
-);
+        response = self.client.chat.completions.create(
+            model=self.model,
+            messages=[
+                {
+                    "role": "user",
+                    "content": "What's in this image?",
+                    "images": [encoded_image],  # Ollama expects raw base64 strings
+                }
+            ],
+            max_tokens=self.max_tokens,
+            max_retries=self.MAX_RETRIES,
+        )

🌐 Web query:

What is the correct format for sending base64 encoded images to Ollama API?

💡 Result:

To send base64-encoded images to the Ollama API, include them as an array under the images parameter in your request body. Here's how to format it correctly:

Image Formatting Requirements

  • Base64 encoding: Images must be converted to base64 strings without additional metadata prefixes (e.g., remove "data:image/png;base64,")
  • Array structure: Wrap images in an array even for single-image requests
  • Multimodal models: Use supported models like LLaVA or Llama 3.2 Vision

Example Request Structure

{
  "model": "llava:13b",
  "prompt": "Describe this image",
  "stream": false,
  "images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAY..."] 
}

Key parameters:

  • model: Must support multimodal processing
  • prompt: Your text query/instruction
  • images: Array of base64-encoded image strings[1][3][4]

Implementation Examples

Python:

import base64
import ollama

with open("image.jpg", "rb") as f:
    b64_image = base64.b64encode(f.read()).decode("utf-8")

response = ollama.chat(
    model="llava:13b",
    messages=[{
        "role": "user",
        "content": "What's in this image?",
        "images": [b64_image]
    }]
)

cURL:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2-vision",
  "prompt": "Explain this diagram",
  "images": ["/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAYEBQY..."]
}'

Supported Models & Requirements

Model             | VRAM Requirement      | Base64 Handling
LLaVA 13B         | ≥8GB                  | Direct string array
Llama3.2-Vision   | 11B: 8GB, 90B: 64GB   | File path conversion required

For optimal results:

  1. Use Ollama v0.4+ for vision models[8]
  2. Remove metadata prefixes from base64 strings[9]
  3. Combine images with text prompts for contextual analysis[2][6]

This format works across all Ollama endpoints supporting multimodal input, including /api/generate and /api/chat[1][8].

Action Required: Update Image Handling & Add Error Handling

The updated code correctly fixes the client reference, updates the image payload format (using the images array with a raw base64 string), and replaces the hardcoded max_tokens with the class variable while also adding max_retries. However, the API call still lacks error handling. Please ensure that:

  • Encoded Image: The encoded_image variable contains a raw base64 string (i.e. any "data:image/jpeg;base64," prefix is removed) as required by Ollama’s API.
  • Error Handling: Wrap the API call in an error handling block (e.g., try/except) to manage potential API failures gracefully.
  • Consistency: Verify that all usages of the client and token/retry values across the codebase adhere to these updated patterns.
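Putting those points together, a hedged sketch (function and parameter names are illustrative, not the PR's actual code) could look like:

```python
def transcribe_image(client, model: str, encoded_image: str,
                     prompt: str = "What's in this image?",
                     max_tokens: int = 300) -> str:
    """Send an image to an Ollama-backed chat client and return the text reply.
    `client` is assumed to expose an OpenAI-compatible
    chat.completions.create() method."""
    # 1. Normalize to a raw base64 string (drop any data-URI prefix).
    if encoded_image.startswith("data:") and "base64," in encoded_image:
        encoded_image = encoded_image.split("base64,", 1)[1]

    # 2. Wrap the API call so failures surface with context.
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "user", "content": prompt, "images": [encoded_image]}
            ],
            max_tokens=max_tokens,
        )
    except Exception as e:
        raise RuntimeError(f"Error transcribing image: {e}") from e

    # 3. Validate the response before dereferencing it.
    if not getattr(response, "choices", None):
        raise ValueError("Image transcription failed. No response received.")
    return response.choices[0].message.content
```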

Committable suggestion skipped: line range outside the PR's diff.


return response

# Ensure response is valid before accessing .choices[0].message.content
if not hasattr(response, "choices") or not response.choices:
    raise ValueError("Image transcription failed. No response received.")

return response.choices[0].message.content
Comment on lines 93 to 97

🛠️ Refactor suggestion

Add proper error handling for the entire API call.

While you've added validation for the response, you should wrap the entire API call in a try-except block to handle API exceptions properly.

+    try:
+        response = self.client.chat.completions.create(...)  # existing API call, unchanged
+    except Exception as e:
+        raise Exception(f"Error transcribing image: {str(e)}")
+
     return response.choices[0].message.content

Committable suggestion skipped: line range outside the PR's diff.
