Add Mistral OCR (/v1/ocr) support to symfony/ai-mistral-platform #2072

@tacman

Summary

The mistral-ocr-latest model uses a dedicated /v1/ocr endpoint that is structurally different from chat completions. The current symfony/ai-mistral-platform bridge only wraps /v1/chat/completions and /v1/embeddings, so applications that need document OCR must bypass the agent abstraction entirely and call the Mistral API via raw HttpClient.

The gap

The Mistral OCR endpoint accepts a document payload (URL or base64-encoded image/PDF) and returns structured output with pages[], markdown text, layout blocks, and embedded images. It is not a chat call — you cannot route it through AgentInterface::call() or MessageBag. There is currently no Platform, Model, or Response class in symfony/ai-mistral-platform that covers it.

Example request shape:

POST https://api.mistral.ai/v1/ocr
{
  "model": "mistral-ocr-latest",
  "document": {
    "type": "image_url",
    "image_url": "https://..."
  },
  "include_image_base64": true,
  "document_annotation_format": { ... }
}
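
To make the request shape concrete, here is a minimal, language-neutral sketch in Python of how a caller might build that body from either a URL or raw bytes. The function name `build_ocr_request` and the `mime_type` parameter are hypothetical, and sending binary input as a base64 data URI in `image_url` is an assumption about the API rather than something confirmed in this issue. `document_annotation_format` is left out because its contents are elided above.

```python
import base64
import json


def build_ocr_request(document, mime_type="image/png", include_image_base64=True):
    """Build the JSON body for POST /v1/ocr from a URL string or raw bytes.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    if isinstance(document, bytes):
        # Assumption: binary payloads are sent inline as a base64 data URI.
        encoded = base64.b64encode(document).decode("ascii")
        image_url = f"data:{mime_type};base64,{encoded}"
    else:
        # Anything else is treated as an already-reachable document URL.
        image_url = document
    return {
        "model": "mistral-ocr-latest",
        "document": {"type": "image_url", "image_url": image_url},
        "include_image_base64": include_image_base64,
    }


if __name__ == "__main__":
    # Print the body that would be POSTed for a (truncated) binary payload.
    print(json.dumps(build_ocr_request(b"\x89PNG"), indent=2))
```

A bridge implementation would presumably do the same normalisation once, so callers never have to hand-assemble the `document` object.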

Example response shape:

{
  "pages": [
    {
      "index": 0,
      "markdown": "...",
      "dimensions": { "width": 2480, "height": 3508 },
      "images": [ { "id": "img-0", "image_base64": "...", ... } ],
      "document_annotation": "{ \"blocks\": [...] }"
    }
  ]
}
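
As a rough illustration of what a typed result could look like, the sketch below parses the response shape above into plain Python dataclasses. The names `OcrPage` and `parse_ocr_response` are hypothetical, and the sketch assumes `document_annotation` arrives as a JSON-encoded string, as in the example; fields not shown in the example are ignored.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OcrPage:
    """One page of OCR output (hypothetical type, fields taken from the example)."""
    index: int
    markdown: str
    width: int
    height: int
    image_ids: List[str] = field(default_factory=list)
    annotation: Optional[dict] = None


def parse_ocr_response(raw: str) -> List[OcrPage]:
    """Turn the raw JSON response body into a list of typed pages."""
    payload = json.loads(raw)
    pages = []
    for page in payload.get("pages", []):
        dims = page.get("dimensions") or {}
        annotation = page.get("document_annotation")
        pages.append(OcrPage(
            index=page["index"],
            markdown=page.get("markdown", ""),
            width=dims.get("width", 0),
            height=dims.get("height", 0),
            image_ids=[img["id"] for img in page.get("images", [])],
            # Assumption: the annotation is a JSON string that needs a second decode.
            annotation=json.loads(annotation) if annotation else None,
        ))
    return pages
```

The double decode of `document_annotation` is the kind of detail a dedicated response class could hide from application code.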

Proposed addition

A MistralOcrPlatform (or an OcrModel within the existing Mistral platform) that:

  • Accepts a document URL or binary payload
  • POSTs to https://api.mistral.ai/v1/ocr
  • Returns a typed OcrResponse with pages, layout_blocks, and image_blocks
  • Integrates with the existing MISTRAL_API_KEY environment variable wiring

This would allow applications to use Mistral OCR through the same DI-configured service layer as all other AI tasks, rather than managing a raw HTTP client alongside the agent abstraction.

Context

Mistral positions OCR as a first-class document intelligence product (see docs.mistral.ai/api/endpoint/ocr). It supports PDFs with multiple pages, layout detection, bounding boxes, and per-page annotation — well beyond what vision models produce via chat completions. Given that symfony/ai-mistral-platform already exists, this feels like a natural addition to the package.

Happy to contribute a PR if the direction is accepted.
