
Conversation

@dexters1 dexters1 (Collaborator) commented Dec 16, 2025

Description

Run CI/CD for audio/image transcription PR from contributor @rajeevrajeshuni

Acceptance Criteria

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Code refactoring
  • Performance improvement
  • Other (please specify):

Screenshots/Videos (if applicable)

Pre-submission Checklist

  • I have tested my changes thoroughly before submitting this PR
  • This PR contains minimal changes necessary to address the issue/feature
  • My code follows the project's coding standards and style guidelines
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if applicable)
  • All new and existing tests pass
  • I have searched existing PRs to ensure this change hasn't been submitted already
  • I have linked any relevant issues in the description
  • My commits have clear and descriptive messages

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added audio transcription capability across LLM providers.
    • Added image transcription and description capability.
    • Enhanced observability and monitoring for AI operations.
  • Breaking Changes

    • Removed synchronous structured output method; use asynchronous alternative instead.
  • Refactor

    • Unified LLM provider architecture for improved consistency and maintainability.


@pull-checklist

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.


gitguardian bot commented Dec 16, 2025

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
  • GitGuardian id: 9573981
  • Status: Triggered
  • Secret: Generic Password
  • Commit: 13c034e
  • Filename: .github/workflows/examples_tests.yml
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely, following best practices for secret storage.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future, consider:


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

coderabbitai bot (Contributor) commented Dec 16, 2025

Walkthrough

Refactored LLM adapter architecture by removing the synchronous create_structured_output method from LLMGateway, consolidating adapter classes to inherit from a new GenericAPIAdapter base class instead of LLMInterface, and adding transcription capabilities and observability decorators across adapters.
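The consolidation described in the walkthrough can be sketched roughly as follows. The class and parameter names mirror the summary above, but the bodies are illustrative stand-ins, not the PR's actual code:

```python
class GenericAPIAdapter:
    """Illustrative base class; parameter names follow the walkthrough, bodies are stand-ins."""

    def __init__(
        self,
        api_key: str,
        model: str,
        max_completion_tokens: int,
        transcription_model: str = None,
        image_transcribe_model: str = None,
    ):
        self.api_key = api_key
        self.model = model
        self.max_completion_tokens = max_completion_tokens
        self.transcription_model = transcription_model
        self.image_transcribe_model = image_transcribe_model


class AnthropicAdapter(GenericAPIAdapter):
    """Provider adapter that delegates shared setup to the base class, as the migrated adapters do."""

    def __init__(self, api_key: str, model: str, max_completion_tokens: int, **kwargs):
        super().__init__(api_key, model, max_completion_tokens, **kwargs)


adapter = AnthropicAdapter("sk-test", "claude-3", 4096, transcription_model="whisper-1")
```

The point of the migration is that provider-specific adapters keep only their own client setup while shared configuration lives in one place.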

Changes

  • LLMGateway refactoring (cognee/infrastructure/llm/LLMGateway.py):
    Removed the public static method create_structured_output(); callers must use acreate_structured_output instead.
  • Generic adapter base class enhancement (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py):
    Expanded GenericAPIAdapter with constructor signature changes (added endpoint, api_version, transcription_model, image_transcribe_model, and fallback_* parameters); added public methods create_transcript() and transcribe_image(); added observability decorators (@observe) to the generation and transcription paths.
  • Adapter base class migration (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/anthropic/adapter.py, gemini/adapter.py, mistral/adapter.py, openai/adapter.py):
    Changed inheritance from LLMInterface to GenericAPIAdapter across all four adapters; updated constructor signatures to match the new base class (moved api_key and model to required positional args, added transcription_model/image_transcribe_model parameters); delegated initialization to super().__init__(...); added the @observe(as_type="generation") decorator to acreate_structured_output.
  • Transcription support (mistral/adapter.py):
    Added a new public method create_transcript() returning TranscriptionReturnType with Mistral client integration.
  • Transcription type definition (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/types.py):
    Added a new TranscriptionReturnType class with text: str and payload: BaseModel attributes.
  • Adapter instantiation updates (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py):
    Updated adapter constructor calls to pass max_completion_tokens and related fields as positional arguments instead of keyword arguments.
  • Interface documentation (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/llm_interface.py):
    Updated the docstring to reference multimodal processing; removed the LLMGateway import; no signature changes.
  • Minor cleanup (cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/ollama/adapter.py):
    Removed an extraneous blank line in imports.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Constructor signature compatibility: Verify all adapter constructors correctly invoke super().__init__() with the expected parameter order and types across anthropic, gemini, mistral, and openai adapters
  • Inheritance behavioral changes: Review whether the shift from LLMInterface to GenericAPIAdapter preserves existing functionality (especially around error handling, client initialization, and fallback flows)
  • Transcription implementation: Validate the new create_transcript() and transcribe_image() methods across adapters (particularly mistral and openai) for correctness and consistency
  • Call site updates in get_llm_client.py: Confirm all adapter instantiation calls match the new constructor signatures and parameter passing conventions
  • Observability integration: Ensure @observe decorators are correctly applied and don't introduce unintended side effects
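The decorator pattern mentioned in the last point can be sketched in isolation. Here observe is a hypothetical no-op stand-in for the project's get_observe() factory, and the decorated function is shown synchronous for brevity:

```python
import functools


def observe(as_type: str = None):
    """Hypothetical no-op stand-in for the project's get_observe() decorator factory."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # A real hook would report a span to the observability backend here;
            # this stand-in only records what kind of call was observed.
            wrapper.calls.append(as_type)
            return func(*args, **kwargs)

        wrapper.calls = []
        return wrapper

    return decorator


@observe(as_type="generation")
def acreate_structured_output(text_input: str) -> str:
    """Simplified synchronous stand-in for the adapters' async method."""
    return f"structured:{text_input}"


result = acreate_structured_output("hello")
```

Because the wrapper preserves the original function via functools.wraps and only adds bookkeeping around the call, applying it should not change return values, which is exactly the side-effect question the reviewer raises.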

Possibly related PRs

Suggested reviewers

  • Vasilije1990
  • borisarzentar
  • hajdul88

Pre-merge checks and finishing touches

❌ Failed checks (3 warnings, 1 inconclusive)

  • Description check (⚠️ Warning): The pull request description is incomplete, containing only "Run CI/CD for audio/image transcription PR from contributor @rajeevrajeshuni" without addressing required template sections like Acceptance Criteria, Type of Change, or Pre-submission Checklist details. Resolution: Complete the pull request description by filling out the Acceptance Criteria section with specific requirements, selecting the appropriate Type of Change, providing acceptance proof and testing instructions, and checking relevant Pre-submission Checklist items.
  • Linked Issues check (⚠️ Warning): The pull request description does not include any links to related issues, tickets, or references to the original contributor's PR that this change is based on. Resolution: Add links to relevant GitHub issues or the original PR #1911 context in the description to establish traceability and provide context for reviewers.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 57.14%, which is below the required threshold of 80.00%. Resolution: Run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check (❓ Inconclusive): The title "Test audio image transcription" is partially related to the changeset but is vague and overly broad, using non-descriptive language that doesn't convey the specific architectural changes made. Resolution: Revise the title to clearly describe the main change, such as "Refactor LLM adapters to use GenericAPIAdapter base class with transcription support" or "Add audio and image transcription capabilities to LLM adapters".

✅ Passed checks (1 passed)

  • Out of Scope Changes check (✅ Passed): The changes appear focused on refactoring LLM adapters to use a new GenericAPIAdapter base class and adding transcription capabilities, which aligns with the stated PR objective of testing audio/image transcription features.
✨ Finishing touches
  • 📝 Generate docstrings

🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch test-audio-image-transcription

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@dexters1 dexters1 self-assigned this Dec 16, 2025
@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py (1)

97-99: Unused import of GenericAPIAdapter in OLLAMA branch.

GenericAPIAdapter is imported but OllamaAPIAdapter is instantiated. This import appears to be dead code.

```diff
     elif provider == LLMProvider.OLLAMA:
         if llm_config.llm_api_key is None and raise_api_key_error:
             raise LLMAPIKeyNotSetError()

-        from cognee.infrastructure.llm.structured_output_framework.litellm_instructor.llm.generic_llm_api.adapter import (
-            GenericAPIAdapter,
-        )
-
         return OllamaAPIAdapter(
```
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/anthropic/adapter.py (1)

79-91: Hardcoded max_tokens=4096 ignores configured max_completion_tokens.

The constructor accepts max_completion_tokens and passes it to the base class (line 41), but line 81 uses a hardcoded value of 4096. This means the configured token limit is ignored, potentially causing unexpected behavior.

```diff
             return await self.aclient(
                 model=self.model,
-                max_tokens=4096,
+                max_tokens=self.max_completion_tokens,
                 max_retries=2,
                 messages=[
```
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py (2)

179-181: Commented-out code should be removed or reinstated.

The api_base=self.fallback_endpoint line is commented out but the base class GenericAPIAdapter uses it in its fallback path. This inconsistency could cause the fallback to behave differently than expected.

```diff
                         api_key=self.fallback_api_key,
-                        # api_base=self.fallback_endpoint,
+                        api_base=self.fallback_endpoint,
                         response_model=response_model,
```

227-236: Replace litellm.transcription() with async alternative to prevent event loop blocking.

The synchronous litellm.transcription() blocks the event loop in async context; use await litellm.atranscription(), or wrap the call in asyncio.to_thread() if needed.
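The offloading pattern the comment recommends can be shown in isolation. The transcription function below is a stand-in for any synchronous SDK call; only the asyncio.to_thread wrapping is the point:

```python
import asyncio
import time


def blocking_transcription(path: str) -> str:
    """Stand-in for a synchronous SDK call such as litellm.transcription()."""
    time.sleep(0.01)  # simulate blocking I/O that would stall the event loop
    return f"transcript of {path}"


async def create_transcript(path: str) -> str:
    # Offload the blocking call to a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(blocking_transcription, path)


result = asyncio.run(create_transcript("audio.mp3"))
```

When the SDK offers a native async entry point (as litellm.atranscription() does), awaiting it directly is preferable to thread offloading, since it avoids the extra worker thread.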

🧹 Nitpick comments (10)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/llm_interface.py (1)

9-14: Docstring mentions multimodal processing but interface lacks corresponding methods.

The docstring claims the interface defines methods for "multimodal processing" (line 10), but only acreate_structured_output is documented and implemented. If multimodal capabilities are intended, consider adding method signatures for transcription and image processing to align with the documented scope, or update the docstring to reflect the actual interface.

```diff
 class LLMInterface(Protocol):
     """
-    Define an interface for LLM models with methods for structured output, multimodal processing, and prompt display.
+    Define an interface for LLM models with methods for structured output generation.

     Methods:
     - acreate_structured_output(text_input: str, system_prompt: str, response_model: Type[BaseModel])
     """
```
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/types.py (1)

1-10: Consider using @dataclass for cleaner data container definition.

The class uses class-level type annotations (lines 5-6) which aren't enforced at runtime and don't serve as instance attributes. Using @dataclass would be more idiomatic and eliminate the boilerplate __init__.

Additionally, a module docstring would help document this type's purpose per coding guidelines.

+"""Types for LLM transcription return values."""
+
+from dataclasses import dataclass
 from pydantic import BaseModel


+@dataclass
 class TranscriptionReturnType:
+    """Container for transcription results with raw text and structured payload."""
+
     text: str
     payload: BaseModel
-
-    def __init__(self, text: str, payload: BaseModel):
-        self.text = text
-        self.payload = payload
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py (2)

115-120: Inconsistent argument passing style: consider using keyword arguments.

AnthropicAdapter uses positional arguments (lines 116-118) while other adapters like OpenAIAdapter, GeminiAdapter, and MistralAdapter use keyword arguments. Using keyword arguments consistently improves readability and reduces risk of positional mismatches if constructor signatures change.

```diff
         return AnthropicAdapter(
-            llm_config.llm_api_key,
-            llm_config.llm_model,
-            max_completion_tokens,
+            api_key=llm_config.llm_api_key,
+            model=llm_config.llm_model,
+            max_completion_tokens=max_completion_tokens,
             instructor_mode=llm_config.llm_instructor_mode.lower(),
         )
```

130-139: Same inconsistency: GenericAPIAdapter uses positional arguments.

For consistency with other adapters and maintainability, prefer keyword arguments here as well.

```diff
         return GenericAPIAdapter(
-            llm_config.llm_api_key,
-            llm_config.llm_model,
-            max_completion_tokens,
+            api_key=llm_config.llm_api_key,
+            model=llm_config.llm_model,
+            max_completion_tokens=max_completion_tokens,
             "Custom",
             instructor_mode=llm_config.llm_instructor_mode.lower(),
```

Note that the remaining positional "Custom" argument would also need to be passed by keyword, since Python does not allow positional arguments after keyword arguments.
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py (4)

7-7: Unused import: Optional

Optional is imported but not used anywhere in this file.

```diff
-from typing import Type, Optional
+from typing import Type
```

198-220: Type annotation missing and inefficient MIME type validation order.

  1. The input parameter should be typed as str.
  2. MIME type validation occurs after the file has already been read and base64-encoded. If validation fails, work was wasted and the error message is misleading since we did successfully open the file.
```diff
-    async def create_transcript(self, input) -> TranscriptionReturnType:
+    async def create_transcript(self, input: str) -> TranscriptionReturnType:
         """
         Generate an audio transcript from a user query.
         ...
         """
+        mime_type, _ = mimetypes.guess_type(input)
+        if not mime_type or not mime_type.startswith("audio/"):
+            raise ValueError(
+                f"Could not determine MIME type for audio file: {input}. Is the extension correct?"
+            )
         async with open_data_file(input, mode="rb") as audio_file:
             encoded_string = base64.b64encode(audio_file.read()).decode("utf-8")
-        mime_type, _ = mimetypes.guess_type(input)
-        if not mime_type or not mime_type.startswith("audio/"):
-            raise ValueError(
-                f"Could not determine MIME type for audio file: {input}. Is the extension correct?"
-            )
         response = await litellm.acompletion(
```

252-274: Type annotation and MIME validation order; return type mismatch.

  1. The input parameter should be typed as str.
  2. Same issue as create_transcript: validate MIME type before reading the file.
  3. The return type annotation says BaseModel, but the method returns a raw litellm response object, not a BaseModel instance. Consider using Any or defining a proper return type.
```diff
-    async def transcribe_image(self, input) -> BaseModel:
+    async def transcribe_image(self, input: str):
         """
         Generate a transcription of an image from a user query.
         ...
         Returns:
         --------
-            - BaseModel: A structured output generated by the model, returned as an instance of
-              BaseModel.
+            The raw response from the LLM API containing the image description.
         """
+        mime_type, _ = mimetypes.guess_type(input)
+        if not mime_type or not mime_type.startswith("image/"):
+            raise ValueError(
+                f"Could not determine MIME type for image file: {input}. Is the extension correct?"
+            )
         async with open_data_file(input, mode="rb") as image_file:
             encoded_image = base64.b64encode(image_file.read()).decode("utf-8")
-        mime_type, _ = mimetypes.guess_type(input)
-        if not mime_type or not mime_type.startswith("image/"):
-            raise ValueError(
-                f"Could not determine MIME type for image file: {input}. Is the extension correct?"
-            )
```
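The validate-before-read ordering recommended in both comments can be isolated as a small helper (the helper name is illustrative, not from the PR):

```python
import mimetypes


def validate_media_file(path: str, expected_prefix: str) -> str:
    """Guess the MIME type from the file extension and check its media category
    before any file I/O happens, so a bad extension fails fast and cheaply."""
    mime_type, _ = mimetypes.guess_type(path)
    if not mime_type or not mime_type.startswith(expected_prefix):
        raise ValueError(f"Could not determine MIME type for {path}. Is the extension correct?")
    return mime_type


audio_mime = validate_media_file("speech.mp3", "audio/")
image_mime = validate_media_file("diagram.png", "image/")
```

Since mimetypes.guess_type only inspects the path string, running it before opening the file costs nothing and avoids wasted base64 encoding on invalid inputs.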

297-297: Hardcoded max_completion_tokens=300 may be too restrictive.

For detailed image descriptions, 300 tokens could truncate the response. Consider making this configurable or using self.max_completion_tokens for consistency with other methods.

```diff
-            max_completion_tokens=300,
+            max_completion_tokens=self.max_completion_tokens,
```
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py (1)

41-46: Docstring lists create_structured_output but method may be removed.

The docstring mentions create_structured_output as a public method, but based on the AI summary, the synchronous version was removed. Update the docstring to reflect the current API surface.

```diff
     Public methods:

     - acreate_structured_output
-    - create_structured_output
     - create_transcript
     - transcribe_image
     - show_prompt
```
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py (1)

138-138: Return type Optional[TranscriptionReturnType] is misleading.

The method always returns a TranscriptionReturnType or raises an exception; it never returns None. Remove Optional for accurate typing.

```diff
-    async def create_transcript(self, input) -> Optional[TranscriptionReturnType]:
+    async def create_transcript(self, input: str) -> TranscriptionReturnType:
```
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 12e6ad1 and 8027263.

📒 Files selected for processing (10)
  • cognee/infrastructure/llm/LLMGateway.py (0 hunks)
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/anthropic/adapter.py (2 hunks)
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/gemini/adapter.py (3 hunks)
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py (7 hunks)
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py (3 hunks)
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/llm_interface.py (1 hunks)
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py (5 hunks)
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/ollama/adapter.py (0 hunks)
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py (5 hunks)
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/types.py (1 hunks)
💤 Files with no reviewable changes (2)
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/ollama/adapter.py
  • cognee/infrastructure/llm/LLMGateway.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code

Files:

  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/types.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/gemini/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/llm_interface.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/anthropic/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py

⚙️ CodeRabbit configuration file

**/*.py: When reviewing Python code for this project:

  1. Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
  2. As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
  3. As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
  4. As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
  5. As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.

Files:

  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/types.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/gemini/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/llm_interface.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/anthropic/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py
cognee/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use shared logging utilities from cognee.shared.logging_utils in Python code

Files:

  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/types.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/gemini/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/llm_interface.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/anthropic/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py
cognee/{modules,infrastructure,tasks}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Co-locate feature-specific helpers under their respective package (modules/, infrastructure/, or tasks/)

Files:

  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/types.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/gemini/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/llm_interface.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/anthropic/adapter.py
  • cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py
🧬 Code graph analysis (4)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/gemini/adapter.py (4)
cognee/infrastructure/llm/exceptions.py (1)
  • ContentPolicyFilterError (4-5)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py (1)
  • GenericAPIAdapter (38-300)
cognee/shared/logging_utils.py (1)
  • get_logger (212-224)
cognee/modules/observability/get_observe.py (1)
  • get_observe (5-25)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py (3)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py (2)
  • GenericAPIAdapter (38-300)
  • create_transcript (198-242)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/types.py (1)
  • TranscriptionReturnType (4-10)
cognee/infrastructure/llm/LLMGateway.py (1)
  • create_transcript (41-47)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py (3)
cognee/infrastructure/files/utils/open_data_file.py (1)
  • open_data_file (11-54)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py (2)
  • GenericAPIAdapter (38-300)
  • create_transcript (198-242)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/types.py (1)
  • TranscriptionReturnType (4-10)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/generic_llm_api/adapter.py (4)
cognee/infrastructure/files/utils/open_data_file.py (1)
  • open_data_file (11-54)
cognee/modules/observability/get_observe.py (1)
  • get_observe (5-25)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/types.py (1)
  • TranscriptionReturnType (4-10)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py (1)
  • create_transcript (208-237)
🔇 Additional comments (7)
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/anthropic/adapter.py (1)

35-49: LGTM on the adapter refactoring to use GenericAPIAdapter.

The inheritance change, constructor delegation, and API client initialization using self.api_key from the base class are correctly implemented. The observability decorator integration aligns with the broader PR changes.

cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/gemini/adapter.py (3)

62-78: Well-structured base class initialization with extended parameters.

The super().__init__ call correctly passes all relevant configuration including endpoint, api_version, transcription_model, and fallback credentials. The integration with GenericAPIAdapter is properly implemented.


134-182: Content policy error handling with fallback is comprehensive.

The error handling catches content policy violations, checks for valid fallback configuration before attempting retry, and properly re-raises non-content-policy InstructorRetryException errors. The nested try/except for the fallback path mirrors the primary error handling appropriately.


118-127: No action required. The message ordering in the messages array does not affect Gemini's response quality when using the instructor library adapter, as the adapter automatically converts and properly formats messages for the Gemini API.

cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/openai/adapter.py (1)

77-88: LGTM: Proper delegation to base class constructor.

The super().__init__() call correctly passes all required parameters to GenericAPIAdapter, including the new transcription_model and fallback configuration.

cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py (2)

51-59: LGTM: Proper base class initialization.

The constructor correctly delegates to GenericAPIAdapter.__init__() with appropriate parameters including the new transcription model options.


70-70: Good addition of observability decorator.

Adding @observe(as_type="generation") aligns with the consistent observability pattern across all adapters in this refactor.

Comment on lines 41 to 45
```diff
     Public methods:
-    - acreate_structured_output(text_input: str, system_prompt: str, response_model:
-      Type[BaseModel]) -> BaseModel
+    - acreate_structured_output(text_input: str, system_prompt: str, response_model: Type[BaseModel]) -> BaseModel
     - create_transcript(input) -> BaseModel: Transcribe audio files to text
     - transcribe_image(input) -> BaseModel: Inherited from GenericAPIAdapter
     """
```

⚠️ Potential issue | 🟡 Minor

Docstring inaccuracy: create_transcript return type and method ownership.

The docstring states create_transcript(input) -> BaseModel but per the GenericAPIAdapter context, this method returns TranscriptionReturnType. Also, both create_transcript and transcribe_image are inherited from the base class, not defined in this adapter.

```diff
     Public methods:
     - acreate_structured_output(text_input: str, system_prompt: str, response_model: Type[BaseModel]) -> BaseModel
-    - create_transcript(input) -> BaseModel: Transcribe audio files to text
-    - transcribe_image(input) -> BaseModel: Inherited from GenericAPIAdapter
+
+    Inherited from GenericAPIAdapter:
+    - create_transcript(input) -> TranscriptionReturnType: Transcribe audio files to text
+    - transcribe_image(input) -> BaseModel: Transcribe image content
     """
```

```python
        transcription_model = self.transcription_model
        if self.transcription_model.startswith("mistral"):
            transcription_model = self.transcription_model.split("/")[-1]
        file_name = input.split("/")[-1]
```

⚠️ Potential issue | 🟡 Minor

Use os.path.basename() instead of string split for path handling.

input.split("/")[-1] is not portable across operating systems. Use os.path.basename() for cross-platform compatibility. As per coding guidelines, prioritize portability.

+import os
 ...
-        file_name = input.split("/")[-1]
+        file_name = os.path.basename(input)
🤖 Prompt for AI Agents
In
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py
around line 156, replace the non-portable path handling "input.split('/')[-1]"
with os.path.basename(input) to ensure cross-platform correctness; if os is not
already imported at the top of the module, add "import os" and update the
variable assignment accordingly so the filename is derived using
os.path.basename.
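A quick sketch of the difference (standard library only; the paths are illustrative):

```python
import os
import ntpath
import posixpath

path = "audio/recordings/sample.mp3"
# String splitting only handles one separator style:
print(path.split("/")[-1])        # sample.mp3

# os.path.basename delegates to the platform-appropriate implementation
# (posixpath on POSIX, ntpath on Windows):
print(os.path.basename(path))

# The difference shows up with Windows-style paths:
win_path = r"C:\recordings\sample.mp3"
print(win_path.split("/")[-1])    # the whole string — there is no "/" to split on
print(ntpath.basename(win_path))  # sample.mp3
```

Using `os.path.basename()` keeps the filename extraction correct regardless of which separator convention the input path uses on the host OS.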

Comment on lines +157 to +166
async with open_data_file(input, mode="rb") as f:
    transcription_response = self.mistral_client.audio.transcriptions.complete(
        model=transcription_model,
        file={
            "content": f,
            "file_name": file_name,
        },
    )

return TranscriptionReturnType(transcription_response.text, transcription_response)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

(Collapsed verification detail: repository searches located the Mistral adapter and its `mistral_client` usage, checked the `mistralai` dependency in pyproject.toml, and found existing `asyncio.to_thread`/`run_in_executor` usage elsewhere in cognee; no async transcription method was found on the client.)
Wrap synchronous Mistral API call in asyncio.to_thread() to avoid blocking the event loop.

The create_transcript method is declared as async, but self.mistral_client.audio.transcriptions.complete() is a synchronous call that blocks the event loop. Replace the call at line 158 with:

transcription_response = await asyncio.to_thread(
    self.mistral_client.audio.transcriptions.complete,
    model=transcription_model,
    file={
        "content": f,
        "file_name": file_name,
    },
)

Alternatively, check if the mistralai SDK provides an async audio transcription API.

🤖 Prompt for AI Agents
In
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/mistral/adapter.py
around lines 157 to 166, the synchronous call to
self.mistral_client.audio.transcriptions.complete(...) inside an async function
blocks the event loop; wrap that call with asyncio.to_thread(...) and await it
(or switch to an async SDK method if available) so the blocking work runs in a
thread; also ensure asyncio is imported at top of the file and keep passing the
same model and file arguments when calling via to_thread.
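The shape of the fix can be sketched with a stand-in for the SDK call (`blocking_transcribe`, the model name, and the file payload below are placeholders, not the mistralai API):

```python
import asyncio
import os
import time

def blocking_transcribe(model, file):
    # Stand-in for a synchronous SDK call such as
    # client.audio.transcriptions.complete(...): it blocks the calling thread.
    time.sleep(0.05)
    return {"model": model, "text": f"transcript of {file['file_name']}"}

async def create_transcript(path):
    file_name = os.path.basename(path)
    # asyncio.to_thread runs the blocking call in a worker thread, so the
    # event loop stays free to serve other coroutines in the meantime.
    response = await asyncio.to_thread(
        blocking_transcribe,
        model="placeholder-transcription-model",
        file={"content": b"\x00", "file_name": file_name},
    )
    return response["text"]

result = asyncio.run(create_transcript("audio/sample.mp3"))
print(result)  # transcript of sample.mp3
```

`asyncio.to_thread` (Python 3.9+) forwards both positional and keyword arguments to the wrapped callable, so the existing `model=` and `file=` arguments carry over unchanged.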

@Vasilije1990 Vasilije1990 merged commit aeda1d8 into dev Dec 16, 2025
132 of 133 checks passed
@Vasilije1990 Vasilije1990 deleted the test-audio-image-transcription branch December 16, 2025 18:06