
Conversation

@dexters1 dexters1 (Collaborator) commented Dec 16, 2025

Description

Run CI/CD for audio/image transcription PR from contributor @rajeevrajeshuni

Acceptance Criteria

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Code refactoring
  • Performance improvement
  • Other (please specify):

Screenshots/Videos (if applicable)

Pre-submission Checklist

  • I have tested my changes thoroughly before submitting this PR
  • This PR contains minimal changes necessary to address the issue/feature
  • My code follows the project's coding standards and style guidelines
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if applicable)
  • All new and existing tests pass
  • I have searched existing PRs to ensure this change hasn't been submitted already
  • I have linked any relevant issues in the description
  • My commits have clear and descriptive messages

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added audio transcription capability across LLM providers.
    • Added image transcription and description capability.
    • Enhanced observability and monitoring for AI operations.
  • Breaking Changes

    • Removed synchronous structured output method; use asynchronous alternative instead.
  • Refactor

    • Unified LLM provider architecture for improved consistency and maintainability.


@pull-checklist

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.


gitguardian bot commented Dec 16, 2025

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
  • GitGuardian id: 9573981
  • Status: Triggered
  • Secret: Generic Password
  • Commit: 13c034e
  • Filename: .github/workflows/examples_tests.yml
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely, following best practices for secret storage.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future, consider:


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

coderabbitai bot (Contributor) commented Dec 16, 2025

Walkthrough

Refactored LLM adapter architecture by removing the synchronous create_structured_output method from LLMGateway, consolidating adapter classes to inherit from a new GenericAPIAdapter base class instead of LLMInterface, and adding transcription capabilities and observability decorators across adapters.
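The consolidation described in the walkthrough can be sketched roughly as follows. The class and parameter names mirror the summary above, but the bodies are illustrative stand-ins, not the PR's actual code:

```python
class GenericAPIAdapter:
    """Illustrative base class; parameter names follow the walkthrough, bodies are stand-ins."""

    def __init__(
        self,
        api_key: str,
        model: str,
        max_completion_tokens: int,
        transcription_model: str = None,
        image_transcribe_model: str = None,
    ):
        self.api_key = api_key
        self.model = model
        self.max_completion_tokens = max_completion_tokens
        self.transcription_model = transcription_model
        self.image_transcribe_model = image_transcribe_model


class AnthropicAdapter(GenericAPIAdapter):
    """Provider adapter that delegates shared setup to the base class, as the migrated adapters do."""

    def __init__(self, api_key: str, model: str, max_completion_tokens: int, **kwargs):
        super().__init__(api_key, model, max_completion_tokens, **kwargs)


adapter = AnthropicAdapter("sk-test", "claude-3", 4096, transcription_model="whisper-1")
```

The point of the migration is that provider-specific adapters keep only their own client setup while shared configuration lives in one place.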

Changes

  • LLMGateway refactoring (cognee/infrastructure/llm/LLMGateway.py):
    Removed the public static method create_structured_output(); callers must use acreate_structured_output instead.
  • Generic adapter base class enhancement (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py):
    Expanded GenericAPIAdapter with constructor signature changes (added endpoint, api_version, transcription_model, image_transcribe_model, and fallback_* parameters); added public methods create_transcript() and transcribe_image(); added observability decorators (@observe) to the generation and transcription paths.
  • Adapter base class migration (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/anthropic/adapter.py, gemini/adapter.py, mistral/adapter.py, openai/adapter.py):
    Changed inheritance from LLMInterface to GenericAPIAdapter across all four adapters; updated constructor signatures to match the new base class (moved api_key and model to required positional args, added transcription_model/image_transcribe_model parameters); delegated initialization to super().__init__(...); added the @observe(as_type="generation") decorator to acreate_structured_output.
  • Transcription support (mistral/adapter.py):
    Added a new public method create_transcript() returning TranscriptionReturnType with Mistral client integration.
  • Transcription type definition (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/types.py):
    Added a new TranscriptionReturnType class with text: str and payload: BaseModel attributes.
  • Adapter instantiation updates (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py):
    Updated adapter constructor calls to pass max_completion_tokens and related fields as positional arguments instead of keyword arguments.
  • Interface documentation (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/llm_interface.py):
    Updated the docstring to reference multimodal processing; removed the LLMGateway import; no signature changes.
  • Minor cleanup (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/ollama/adapter.py):
    Removed an extraneous blank line in imports.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Constructor signature compatibility: Verify all adapter constructors correctly invoke super().__init__() with the expected parameter order and types across anthropic, gemini, mistral, and openai adapters
  • Inheritance behavioral changes: Review whether the shift from LLMInterface to GenericAPIAdapter preserves existing functionality (especially around error handling, client initialization, and fallback flows)
  • Transcription implementation: Validate the new create_transcript() and transcribe_image() methods across adapters (particularly mistral and openai) for correctness and consistency
  • Call site updates in get_llm_client.py: Confirm all adapter instantiation calls match the new constructor signatures and parameter passing conventions
  • Observability integration: Ensure @observe decorators are correctly applied and don't introduce unintended side effects
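The decorator pattern mentioned in the last point can be sketched in isolation. Here observe is a hypothetical no-op stand-in for the project's get_observe() factory, and the decorated function is shown synchronous for brevity:

```python
import functools


def observe(as_type: str = None):
    """Hypothetical no-op stand-in for the project's get_observe() decorator factory."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # A real hook would report a span to the observability backend here;
            # this stand-in only records what kind of call was observed.
            wrapper.calls.append(as_type)
            return func(*args, **kwargs)

        wrapper.calls = []
        return wrapper

    return decorator


@observe(as_type="generation")
def acreate_structured_output(text_input: str) -> str:
    """Simplified synchronous stand-in for the adapters' async method."""
    return f"structured:{text_input}"


result = acreate_structured_output("hello")
```

Because the wrapper preserves the original function via functools.wraps and only adds bookkeeping around the call, applying it should not change return values, which is exactly the side-effect question the reviewer raises.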

Possibly related PRs

Suggested reviewers

  • Vasilije1990
  • borisarzentar
  • hajdul88

Pre-merge checks and finishing touches

❌ Failed checks (3 warnings, 1 inconclusive)

  • Description check (⚠️ Warning): The pull request description is incomplete, containing only "Run CI/CD for audio/image transcription PR from contributor @rajeevrajeshuni" without addressing required template sections like Acceptance Criteria, Type of Change, or Pre-submission Checklist details. Resolution: Complete the pull request description by filling out the Acceptance Criteria section with specific requirements, selecting the appropriate Type of Change, providing acceptance proof and testing instructions, and checking relevant Pre-submission Checklist items.
  • Linked Issues check (⚠️ Warning): The pull request description does not include any links to related issues, tickets, or references to the original contributor's PR that this change is based on. Resolution: Add links to relevant GitHub issues or the original PR #1911 context in the description to establish traceability and provide context for reviewers.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 57.14%, which is below the required threshold of 80.00%. Resolution: Run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check (❓ Inconclusive): The title "Test audio image transcription" is partially related to the changeset but is vague and overly broad, using non-descriptive language that doesn't convey the specific architectural changes made. Resolution: Revise the title to clearly describe the main change, such as "Refactor LLM adapters to use GenericAPIAdapter base class with transcription support" or "Add audio and image transcription capabilities to LLM adapters".

✅ Passed checks (1 passed)

  • Out of Scope Changes check (✅ Passed): The changes appear focused on refactoring LLM adapters to use a new GenericAPIAdapter base class and adding transcription capabilities, which aligns with the stated PR objective of testing audio/image transcription features.
✨ Finishing touches
  • 📝 Generate docstrings

🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch test-audio-image-transcription

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@dexters1 dexters1 self-assigned this Dec 16, 2025
@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py (1)

97-99: Unused import of GenericAPIAdapter in OLLAMA branch.

GenericAPIAdapter is imported but OllamaAPIAdapter is instantiated. This import appears to be dead code.

```diff
     elif provider == LLMProvider.OLLAMA:
         if llm_config.llm_api_key is None and raise_api_key_error:
             raise LLMAPIKeyNotSetError()

-        from cognee.infrastructure.llm.structured_output_framework.litellm_instructor.llm.generic_llm_api.adapter import (
-            GenericAPIAdapter,
-        )
-
         return OllamaAPIAdapter(
```
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/anthropic/adapter.py (1)

79-91: Hardcoded max_tokens=4096 ignores configured max_completion_tokens.

The constructor accepts max_completion_tokens and passes it to the base class (line 41), but line 81 uses a hardcoded value of 4096. This means the configured token limit is ignored, potentially causing unexpected behavior.

```diff
             return await self.aclient(
                 model=self.model,
-                max_tokens=4096,
+                max_tokens=self.max_completion_tokens,
                 max_retries=2,
                 messages=[
```
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py (2)

179-181: Commented-out code should be removed or reinstated.

The api_base=self.fallback_endpoint line is commented out but the base class GenericAPIAdapter uses it in its fallback path. This inconsistency could cause the fallback to behave differently than expected.

```diff
                         api_key=self.fallback_api_key,
-                        # api_base=self.fallback_endpoint,
+                        api_base=self.fallback_endpoint,
                         response_model=response_model,
```

227-236: Replace litellm.transcription() with async alternative to prevent event loop blocking.

The synchronous litellm.transcription() blocks the event loop in async context; use await litellm.atranscription(), or wrap the call in asyncio.to_thread() if needed.
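The offloading pattern the comment recommends can be shown in isolation. The transcription function below is a stand-in for any synchronous SDK call; only the asyncio.to_thread wrapping is the point:

```python
import asyncio
import time


def blocking_transcription(path: str) -> str:
    """Stand-in for a synchronous SDK call such as litellm.transcription()."""
    time.sleep(0.01)  # simulate blocking I/O that would stall the event loop
    return f"transcript of {path}"


async def create_transcript(path: str) -> str:
    # Offload the blocking call to a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(blocking_transcription, path)


result = asyncio.run(create_transcript("audio.mp3"))
```

When the SDK offers a native async entry point (as litellm.atranscription() does), awaiting it directly is preferable to thread offloading, since it avoids the extra worker thread.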

🧹 Nitpick comments (10)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/llm_interface.py (1)

9-14: Docstring mentions multimodal processing but interface lacks corresponding methods.

The docstring claims the interface defines methods for "multimodal processing" (line 10), but only acreate_structured_output is documented and implemented. If multimodal capabilities are intended, consider adding method signatures for transcription and image processing to align with the documented scope, or update the docstring to reflect the actual interface.

```diff
 class LLMInterface(Protocol):
     """
-    Define an interface for LLM models with methods for structured output, multimodal processing, and prompt display.
+    Define an interface for LLM models with methods for structured output generation.

     Methods:
     - acreate_structured_output(text_input: str, system_prompt: str, response_model: Type[BaseModel])
     """
```
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/types.py (1)

1-10: Consider using @dataclass for cleaner data container definition.

The class uses class-level type annotations (lines 5-6) which aren't enforced at runtime and don't serve as instance attributes. Using @dataclass would be more idiomatic and eliminate the boilerplate __init__.

Additionally, a module docstring would help document this type's purpose per coding guidelines.

+"""Types for LLM transcription return values."""
+
+from dataclasses import dataclass
 from pydantic import BaseModel


+@dataclass
 class TranscriptionReturnType:
+    """Container for transcription results with raw text and structured payload."""
+
     text: str
     payload: BaseModel
-
-    def __init__(self, text: str, payload: BaseModel):
-        self.text = text
-        self.payload = payload
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py (2)

115-120: Inconsistent argument passing style: consider using keyword arguments.

AnthropicAdapter uses positional arguments (lines 116-118) while other adapters like OpenAIAdapter, GeminiAdapter, and MistralAdapter use keyword arguments. Using keyword arguments consistently improves readability and reduces risk of positional mismatches if constructor signatures change.

```diff
         return AnthropicAdapter(
-            llm_config.llm_api_key,
-            llm_config.llm_model,
-            max_completion_tokens,
+            api_key=llm_config.llm_api_key,
+            model=llm_config.llm_model,
+            max_completion_tokens=max_completion_tokens,
             instructor_mode=llm_config.llm_instructor_mode.lower(),
         )
```

130-139: Same inconsistency: GenericAPIAdapter uses positional arguments.

For consistency with other adapters and maintainability, prefer keyword arguments here as well.

```diff
         return GenericAPIAdapter(
-            llm_config.llm_api_key,
-            llm_config.llm_model,
-            max_completion_tokens,
+            api_key=llm_config.llm_api_key,
+            model=llm_config.llm_model,
+            max_completion_tokens=max_completion_tokens,
             "Custom",
             instructor_mode=llm_config.llm_instructor_mode.lower(),
```

Note that the remaining positional "Custom" argument would also need to be passed by keyword, since Python does not allow positional arguments after keyword arguments.
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py (4)

7-7: Unused import: Optional

Optional is imported but not used anywhere in this file.

```diff
-from typing import Type, Optional
+from typing import Type
```

198-220: Type annotation missing and inefficient MIME type validation order.

  1. The input parameter should be typed as str.
  2. MIME type validation occurs after the file has already been read and base64-encoded. If validation fails, work was wasted and the error message is misleading since we did successfully open the file.
```diff
-    async def create_transcript(self, input) -> TranscriptionReturnType:
+    async def create_transcript(self, input: str) -> TranscriptionReturnType:
         """
         Generate an audio transcript from a user query.
         ...
         """
+        mime_type, _ = mimetypes.guess_type(input)
+        if not mime_type or not mime_type.startswith("audio/"):
+            raise ValueError(
+                f"Could not determine MIME type for audio file: {input}. Is the extension correct?"
+            )
         async with open_data_file(input, mode="rb") as audio_file:
             encoded_string = base64.b64encode(audio_file.read()).decode("utf-8")
-        mime_type, _ = mimetypes.guess_type(input)
-        if not mime_type or not mime_type.startswith("audio/"):
-            raise ValueError(
-                f"Could not determine MIME type for audio file: {input}. Is the extension correct?"
-            )
         response = await litellm.acompletion(
```

252-274: Type annotation and MIME validation order; return type mismatch.

  1. The input parameter should be typed as str.
  2. Same issue as create_transcript: validate MIME type before reading the file.
  3. The return type annotation says BaseModel, but the method returns a raw litellm response object, not a BaseModel instance. Consider using Any or defining a proper return type.
```diff
-    async def transcribe_image(self, input) -> BaseModel:
+    async def transcribe_image(self, input: str):
         """
         Generate a transcription of an image from a user query.
         ...
         Returns:
         --------
-            - BaseModel: A structured output generated by the model, returned as an instance of
-              BaseModel.
+            The raw response from the LLM API containing the image description.
         """
+        mime_type, _ = mimetypes.guess_type(input)
+        if not mime_type or not mime_type.startswith("image/"):
+            raise ValueError(
+                f"Could not determine MIME type for image file: {input}. Is the extension correct?"
+            )
         async with open_data_file(input, mode="rb") as image_file:
             encoded_image = base64.b64encode(image_file.read()).decode("utf-8")
-        mime_type, _ = mimetypes.guess_type(input)
-        if not mime_type or not mime_type.startswith("image/"):
-            raise ValueError(
-                f"Could not determine MIME type for image file: {input}. Is the extension correct?"
-            )
```
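The validate-before-read ordering recommended in both comments can be isolated as a small helper (the helper name is illustrative, not from the PR):

```python
import mimetypes


def validate_media_file(path: str, expected_prefix: str) -> str:
    """Guess the MIME type from the file extension and check its media category
    before any file I/O happens, so a bad extension fails fast and cheaply."""
    mime_type, _ = mimetypes.guess_type(path)
    if not mime_type or not mime_type.startswith(expected_prefix):
        raise ValueError(f"Could not determine MIME type for {path}. Is the extension correct?")
    return mime_type


audio_mime = validate_media_file("speech.mp3", "audio/")
image_mime = validate_media_file("diagram.png", "image/")
```

Since mimetypes.guess_type only inspects the path string, running it before opening the file costs nothing and avoids wasted base64 encoding on invalid inputs.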

297-297: Hardcoded max_completion_tokens=300 may be too restrictive.

For detailed image descriptions, 300 tokens could truncate the response. Consider making this configurable or using self.max_completion_tokens for consistency with other methods.

```diff
-            max_completion_tokens=300,
+            max_completion_tokens=self.max_completion_tokens,
```
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py (1)

41-46: Docstring lists create_structured_output but method may be removed.

The docstring mentions create_structured_output as a public method, but based on the AI summary, the synchronous version was removed. Update the docstring to reflect the current API surface.

```diff
     Public methods:

     - acreate_structured_output
-    - create_structured_output
     - create_transcript
     - transcribe_image
     - show_prompt
```
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py (1)

138-138: Return type Optional[TranscriptionReturnType] is misleading.

The method always returns a TranscriptionReturnType or raises an exception; it never returns None. Remove Optional for accurate typing.

```diff
-    async def create_transcript(self, input) -> Optional[TranscriptionReturnType]:
+    async def create_transcript(self, input: str) -> TranscriptionReturnType:
```
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 12e6ad1 and 8027263.

📒 Files selected for processing (10)
  • cognee/infrastructure/llm/LLMGateway.py (0 hunks)
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/anthropic/adapter.py (2 hunks)
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/gemini/adapter.py (3 hunks)
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py (7 hunks)
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py (3 hunks)
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/llm_interface.py (1 hunks)
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py (5 hunks)
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/ollama/adapter.py (0 hunks)
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py (5 hunks)
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/types.py (1 hunks)
💤 Files with no reviewable changes (2)
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/ollama/adapter.py
  • cognee/infrastructure/llm/LLMGateway.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code

Files:

  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/types.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/gemini/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/llm_interface.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/anthropic/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py

⚙️ CodeRabbit configuration file

**/*.py: When reviewing Python code for this project:

  1. Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
  2. As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
  3. As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
  4. As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
  5. As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.

Files:

  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/types.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/gemini/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/llm_interface.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/anthropic/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py
cognee/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use shared logging utilities from cognee.shared.logging_utils in Python code

Files:

  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/types.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/gemini/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/llm_interface.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/anthropic/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py
cognee/{modules,infrastructure,tasks}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Co-locate feature-specific helpers under their respective package (modules/, infrastructure/, or tasks/)

Files:

  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/types.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/gemini/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/llm_interface.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/anthropic/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py
🧬 Code graph analysis (4)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/gemini/adapter.py (4)
cognee/infrastructure/llm/exceptions.py (1)
  • ContentPolicyFilterError (4-5)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py (1)
  • GenericAPIAdapter (38-300)
cognee/shared/logging_utils.py (1)
  • get_logger (212-224)
cognee/modules/observability/get_observe.py (1)
  • get_observe (5-25)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py (3)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py (2)
  • GenericAPIAdapter (38-300)
  • create_transcript (198-242)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/types.py (1)
  • TranscriptionReturnType (4-10)
cognee/infrastructure/llm/LLMGateway.py (1)
  • create_transcript (41-47)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py (3)
cognee/infrastructure/files/utils/open_data_file.py (1)
  • open_data_file (11-54)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py (2)
  • GenericAPIAdapter (38-300)
  • create_transcript (198-242)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/types.py (1)
  • TranscriptionReturnType (4-10)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py (4)
cognee/infrastructure/files/utils/open_data_file.py (1)
  • open_data_file (11-54)
cognee/modules/observability/get_observe.py (1)
  • get_observe (5-25)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/types.py (1)
  • TranscriptionReturnType (4-10)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py (1)
  • create_transcript (208-237)
🔇 Additional comments (7)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/anthropic/adapter.py (1)

35-49: LGTM on the adapter refactoring to use GenericAPIAdapter.

The inheritance change, constructor delegation, and API client initialization using self.api_key from the base class are correctly implemented. The observability decorator integration aligns with the broader PR changes.

cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/gemini/adapter.py (3)

62-78: Well-structured base class initialization with extended parameters.

The super().__init__ call correctly passes all relevant configuration including endpoint, api_version, transcription_model, and fallback credentials. The integration with GenericAPIAdapter is properly implemented.


134-182: Content policy error handling with fallback is comprehensive.

The error handling catches content policy violations, checks for valid fallback configuration before attempting retry, and properly re-raises non-content-policy InstructorRetryException errors. The nested try/except for the fallback path mirrors the primary error handling appropriately.


118-127: No action required. The message ordering in the messages array does not affect Gemini's response quality when using the instructor library adapter, as the adapter automatically converts and properly formats messages for the Gemini API.

cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py (1)

77-88: LGTM: Proper delegation to base class constructor.

The super().__init__() call correctly passes all required parameters to GenericAPIAdapter, including the new transcription_model and fallback configuration.

cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py (2)

51-59: LGTM: Proper base class initialization.

The constructor correctly delegates to GenericAPIAdapter.__init__() with appropriate parameters including the new transcription model options.


70-70: Good addition of observability decorator.

Adding @observe(as_type="generation") aligns with the consistent observability pattern across all adapters in this refactor.

Comment on lines 41 to 45
```diff
     Public methods:
-    - acreate_structured_output(text_input: str, system_prompt: str, response_model:
-      Type[BaseModel]) -> BaseModel
+    - acreate_structured_output(text_input: str, system_prompt: str, response_model: Type[BaseModel]) -> BaseModel
     - create_transcript(input) -> BaseModel: Transcribe audio files to text
     - transcribe_image(input) -> BaseModel: Inherited from GenericAPIAdapter
     """
```

⚠️ Potential issue | 🟡 Minor

Docstring inaccuracy: create_transcript return type and method ownership.

The docstring states create_transcript(input) -> BaseModel but per the GenericAPIAdapter context, this method returns TranscriptionReturnType. Also, both create_transcript and transcribe_image are inherited from the base class, not defined in this adapter.

```diff
     Public methods:
     - acreate_structured_output(text_input: str, system_prompt: str, response_model: Type[BaseModel]) -> BaseModel
-    - create_transcript(input) -> BaseModel: Transcribe audio files to text
-    - transcribe_image(input) -> BaseModel: Inherited from GenericAPIAdapter
+
+    Inherited from GenericAPIAdapter:
+    - create_transcript(input) -> TranscriptionReturnType: Transcribe audio files to text
+    - transcribe_image(input) -> BaseModel: Transcribe image content
     """
```

```python
        transcription_model = self.transcription_model
        if self.transcription_model.startswith("mistral"):
            transcription_model = self.transcription_model.split("/")[-1]
        file_name = input.split("/")[-1]
```

⚠️ Potential issue | 🟡 Minor

Use os.path.basename() instead of string split for path handling.

input.split("/")[-1] is not portable across operating systems. Use os.path.basename() for cross-platform compatibility. As per coding guidelines, prioritize portability.

+import os
 ...
-        file_name = input.split("/")[-1]
+        file_name = os.path.basename(input)
🤖 Prompt for AI Agents
In
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py
around line 156, replace the non-portable path handling "input.split('/')[-1]"
with os.path.basename(input) to ensure cross-platform correctness; if os is not
already imported at the top of the module, add "import os" and update the
variable assignment accordingly so the filename is derived using
os.path.basename.
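A quick sketch of the difference (standard library only; the paths are illustrative):

```python
import os
import ntpath
import posixpath

path = "audio/recordings/sample.mp3"
# String splitting only handles one separator style:
print(path.split("/")[-1])        # sample.mp3

# os.path.basename delegates to the platform-appropriate implementation
# (posixpath on POSIX, ntpath on Windows):
print(os.path.basename(path))

# The difference shows up with Windows-style paths:
win_path = r"C:\recordings\sample.mp3"
print(win_path.split("/")[-1])    # the whole string — there is no "/" to split on
print(ntpath.basename(win_path))  # sample.mp3
```

Using `os.path.basename()` keeps the filename extraction correct regardless of which separator convention the input path uses on the host OS.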

Comment on lines +157 to +166
async with open_data_file(input, mode="rb") as f:
    transcription_response = self.mistral_client.audio.transcriptions.complete(
        model=transcription_model,
        file={
            "content": f,
            "file_name": file_name,
        },
    )

return TranscriptionReturnType(transcription_response.text, transcription_response)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

(Collapsed verification detail: repository searches located the Mistral adapter and its `mistral_client` usage, checked the `mistralai` dependency in pyproject.toml, and found existing `asyncio.to_thread`/`run_in_executor` usage elsewhere in cognee; no async transcription method was found on the client.)
Wrap synchronous Mistral API call in asyncio.to_thread() to avoid blocking the event loop.

The create_transcript method is declared as async, but self.mistral_client.audio.transcriptions.complete() is a synchronous call that blocks the event loop. Replace the call at line 158 with:

transcription_response = await asyncio.to_thread(
    self.mistral_client.audio.transcriptions.complete,
    model=transcription_model,
    file={
        "content": f,
        "file_name": file_name,
    },
)

Alternatively, check if the mistralai SDK provides an async audio transcription API.

🤖 Prompt for AI Agents
In
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py
around lines 157 to 166, the synchronous call to
self.mistral_client.audio.transcriptions.complete(...) inside an async function
blocks the event loop; wrap that call with asyncio.to_thread(...) and await it
(or switch to an async SDK method if available) so the blocking work runs in a
thread; also ensure asyncio is imported at top of the file and keep passing the
same model and file arguments when calling via to_thread.
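The shape of the fix can be sketched with a stand-in for the SDK call (`blocking_transcribe`, the model name, and the file payload below are placeholders, not the mistralai API):

```python
import asyncio
import os
import time

def blocking_transcribe(model, file):
    # Stand-in for a synchronous SDK call such as
    # client.audio.transcriptions.complete(...): it blocks the calling thread.
    time.sleep(0.05)
    return {"model": model, "text": f"transcript of {file['file_name']}"}

async def create_transcript(path):
    file_name = os.path.basename(path)
    # asyncio.to_thread runs the blocking call in a worker thread, so the
    # event loop stays free to serve other coroutines in the meantime.
    response = await asyncio.to_thread(
        blocking_transcribe,
        model="placeholder-transcription-model",
        file={"content": b"\x00", "file_name": file_name},
    )
    return response["text"]

result = asyncio.run(create_transcript("audio/sample.mp3"))
print(result)  # transcript of sample.mp3
```

`asyncio.to_thread` (Python 3.9+) forwards both positional and keyword arguments to the wrapped callable, so the existing `model=` and `file=` arguments carry over unchanged.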

@Vasilije1990 Vasilije1990 merged commit aeda1d8 into dev Dec 16, 2025
132 of 133 checks passed
@Vasilije1990 Vasilije1990 deleted the test-audio-image-transcription branch December 16, 2025 18:06