Summary
The mistral-ocr-latest model uses a dedicated /v1/ocr endpoint that is structurally different from chat completions. The current symfony/ai-mistral-platform bridge only wraps /v1/chat/completions and /v1/embeddings, so applications that need document OCR must bypass the agent abstraction entirely and call the Mistral API via raw HttpClient.
The gap
The Mistral OCR endpoint accepts a document payload (URL or base64-encoded image/PDF) and returns structured output with pages[], markdown text, layout blocks, and embedded images. It is not a chat call — you cannot route it through AgentInterface::call() or MessageBag. There is currently no Platform, Model, or Response class in symfony/ai-mistral-platform that covers it.
Example request shape:
POST https://api.mistral.ai/v1/ocr
{
  "model": "mistral-ocr-latest",
  "document": {
    "type": "image_url",
    "image_url": "https://..."
  },
  "include_image_base64": true,
  "document_annotation_format": { ... }
}
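To make the request shape concrete, here is a minimal sketch of a payload builder in plain PHP. The function names `buildOcrPayload()` and `toDataUri()` are hypothetical and not part of symfony/ai-mistral-platform. Passing base64 content as a `data:` URI in place of a URL is an assumption about how the endpoint takes inline content and should be checked against the Mistral docs.

```php
<?php

declare(strict_types=1);

/**
 * Build the JSON body for POST /v1/ocr.
 * Hypothetical helper for illustration only.
 *
 * @param array<string, mixed>|null $annotationFormat optional annotation schema
 *
 * @return array<string, mixed>
 */
function buildOcrPayload(string $imageUrl, bool $includeImageBase64 = true, ?array $annotationFormat = null): array
{
    $payload = [
        'model' => 'mistral-ocr-latest',
        'document' => [
            'type' => 'image_url',
            'image_url' => $imageUrl,
        ],
        'include_image_base64' => $includeImageBase64,
    ];

    // Only send the annotation schema when the caller supplies one.
    if (null !== $annotationFormat) {
        $payload['document_annotation_format'] = $annotationFormat;
    }

    return $payload;
}

/**
 * Encode raw bytes as a data URI.
 * Assumption: the endpoint accepts this in place of an https URL.
 */
function toDataUri(string $bytes, string $mimeType): string
{
    return sprintf('data:%s;base64,%s', $mimeType, base64_encode($bytes));
}

$payload = buildOcrPayload('https://example.com/scan.png');
echo json_encode($payload, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), PHP_EOL;
```

With Symfony's HttpClient, an array like this could be passed as the `json` request option and the API key as `auth_bearer`, which is roughly what applications have to hand-roll today.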
Example response shape:
{
  "pages": [
    {
      "index": 0,
      "markdown": "...",
      "dimensions": { "width": 2480, "height": 3508 },
      "images": [ { "id": "img-0", "image_base64": "...", ... } ],
      "document_annotation": "{ \"blocks\": [...] }"
    }
  ]
}
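A typed response could map onto this shape roughly as follows. This is a sketch only: the class names `OcrPage` and `OcrResponse` and their fields are assumptions based on the example above, not an existing API. It also assumes `document_annotation` arrives as a JSON-encoded string, as shown, so it is decoded a second time.

```php
<?php

declare(strict_types=1);

/** Hypothetical value object for one entry of "pages". */
final class OcrPage
{
    /**
     * @param array<int, array<string, mixed>> $images
     * @param array<string, mixed>|null        $annotation
     */
    public function __construct(
        public readonly int $index,
        public readonly string $markdown,
        public readonly int $width,
        public readonly int $height,
        public readonly array $images,
        public readonly ?array $annotation,
    ) {
    }

    /** @param array<string, mixed> $page */
    public static function fromArray(array $page): self
    {
        // The annotation is a JSON string inside the JSON response; decode it separately.
        $raw = $page['document_annotation'] ?? null;
        $annotation = is_string($raw)
            ? json_decode($raw, true, 512, JSON_THROW_ON_ERROR)
            : null;

        return new self(
            (int) ($page['index'] ?? 0),
            (string) ($page['markdown'] ?? ''),
            (int) ($page['dimensions']['width'] ?? 0),
            (int) ($page['dimensions']['height'] ?? 0),
            $page['images'] ?? [],
            $annotation,
        );
    }
}

/** Hypothetical wrapper around the decoded response body. */
final class OcrResponse
{
    /** @param array<int, OcrPage> $pages */
    public function __construct(public readonly array $pages)
    {
    }

    public static function fromJson(string $json): self
    {
        $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

        return new self(array_map(OcrPage::fromArray(...), $data['pages'] ?? []));
    }
}
```

Keeping the pages as small immutable objects means callers can read `$response->pages[0]->markdown` without knowing the raw JSON layout. The sketch requires PHP 8.1 or later for `readonly` properties and first-class callable syntax.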
Proposed addition
A MistralOcrPlatform (or an OcrModel within the existing Mistral platform) that:
- Accepts a document URL or binary payload
- POSTs to https://api.mistral.ai/v1/ocr
- Returns a typed OcrResponse with pages, layout_blocks, and image_blocks
- Integrates with the existing MISTRAL_API_KEY environment variable wiring
This would allow applications to use Mistral OCR through the same DI-configured service layer as all other AI tasks, rather than managing a raw HTTP client alongside the agent abstraction.
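As a rough illustration of the proposal, the sketch below shows one possible shape for such a service. The class name `MistralOcrClient`, its `process()` method, and the callable transport are all hypothetical. In an actual bridge the transport would presumably be Symfony's HttpClientInterface; a plain callable is used here only to keep the example self-contained.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical OCR client. The transport is any callable with the signature
 * fn(string $url, array $headers, array $body): array, standing in for an HTTP client.
 */
final class MistralOcrClient
{
    private const ENDPOINT = 'https://api.mistral.ai/v1/ocr';

    /** @var callable(string, array, array): array */
    private $transport;

    public function __construct(callable $transport, private readonly string $apiKey)
    {
        $this->transport = $transport;
    }

    /** @return array<int, array<string, mixed>> the decoded "pages" entries */
    public function process(string $imageUrl): array
    {
        $response = ($this->transport)(
            self::ENDPOINT,
            ['Authorization' => 'Bearer '.$this->apiKey],
            [
                'model' => 'mistral-ocr-latest',
                'document' => ['type' => 'image_url', 'image_url' => $imageUrl],
            ],
        );

        return $response['pages'] ?? [];
    }
}

// Stub transport that records the call and returns a canned response.
$calls = [];
$stub = function (string $url, array $headers, array $body) use (&$calls): array {
    $calls[] = [$url, $headers, $body];

    return ['pages' => [['index' => 0, 'markdown' => 'Hello']]];
};

// The key comes from the same environment variable the existing bridge uses.
$client = new MistralOcrClient($stub, getenv('MISTRAL_API_KEY') ?: 'test-key');
$pages = $client->process('https://example.com/scan.png');
echo $pages[0]['markdown'], PHP_EOL;
```

Injecting the transport keeps the client testable without network access, in the same way a DI-configured HTTP client would.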
Context
Mistral positions OCR as a first-class document intelligence product (see docs.mistral.ai/api/endpoint/ocr). It supports PDFs with multiple pages, layout detection, bounding boxes, and per-page annotation — well beyond what vision models produce via chat completions. Given that symfony/ai-mistral-platform already exists, this feels like a natural addition to the package.
Happy to contribute a PR if the direction is accepted.