
[BUG]: Multimodal requests sent in dynamo are incompatible with llava chat templates. #4501

@KrishnanPrash

Describe the Bug

Multimodal inference requests sent to llava-hf/llava-1.5-7b-hf cause the model to crash with the following error:

2025-11-20T08:30:58.906682Z  INFO dynamo_llm::discovery::watcher: added model model_name="llava-hf/llava-1.5-7b-hf" namespace="dynamo"
...
2025-11-20T08:31:13.806872Z  INFO http_client.get_http_client: Shared HTTP client initialized with timeout=30.0s
2025-11-20T08:31:14.115455Z  INFO _client._send_single_request: HTTP Request: GET http:/... "HTTP/1.1 200 OK"
...
Traceback (most recent call last):
  File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 370, in generate
    q = await self.add_request(
        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 284, in add_request
    prompt_str, request = self.processor.process_inputs(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/engine/processor.py", line 377, in process_inputs
    processed_inputs: ProcessorInputs = self.input_preprocessor.preprocess(
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 644, in preprocess
    return self._process_decoder_only_prompt(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 614, in _process_decoder_only_prompt
    prompt_comps = self._prompt_to_llm_inputs(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 388, in _prompt_to_llm_inputs
    return self._process_tokens(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 317, in _process_tokens
    inputs = self._process_multimodal(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 242, in _process_multimodal
    mm_input = mm_processor.apply(
               ^^^^^^^^^^^^^^^^^^^
  File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/multimodal/processing.py", line 2045, in apply
    prompt_ids, prompt, mm_placeholders = self._maybe_apply_prompt_updates(
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/multimodal/processing.py", line 1997, in _maybe_apply_prompt_updates
    ) = self._apply_prompt_updates(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/multimodal/processing.py", line 1919, in _apply_prompt_updates
    assert update_idx is not None, (
           ^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Failed to apply prompt replacement for mm_items['image'][0]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/components/src/dynamo/vllm/handlers.py", line 319, in generate
    async for tok in self.generate_tokens(
  File "/workspace/components/src/dynamo/vllm/handlers.py", line 232, in generate_tokens
    async for res in gen:
  File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 420, in generate
    raise EngineGenerateError() from e
vllm.v1.engine.exceptions.EngineGenerateError

My mental model for what is going wrong: the AssertionError: Failed to apply prompt replacement for mm_items['image'][0] means that vLLM tried to expand/replace the <image> tokens that should have been inserted during chat template application, but could not find any. In other words, the chat template failed to emit an <image> token when it was applied to the inference request.
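
A quick way to see this, as a minimal sketch: rendering just the image loop from the llava template (quoted under Additional Context below, simplified to one line here for illustration) over dynamo's "image_url"-shaped content produces no <image> token, while a part labeled "image" does.

from jinja2 import Template

# Simplified excerpt of the llava chat template's image loop; only the
# selectattr filter behavior matters for this illustration.
tmpl = Template(
    "{% for content in message['content'] "
    "| selectattr('type', 'equalto', 'image') %}<image>{% endfor %}"
)

# Content shaped the way the dynamo frontend currently passes it.
dynamo_style = {"content": [
    {"type": "text", "text": "What is in this image?"},
    {"type": "image_url",
     "image_url": {"url": "http://images.cocodataset.org/test2017/000000155781.jpg"}},
]}

# Content shaped the way the template expects it.
template_style = {"content": [
    {"type": "text", "text": "What is in this image?"},
    {"type": "image"},
]}

print(repr(tmpl.render(message=dynamo_style)))    # '' -> no <image> token emitted
print(repr(tmpl.render(message=template_style)))  # '<image>'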

Steps to Reproduce

# Assuming etcd/NATS are already started.
python -m dynamo.frontend &
python -m dynamo.vllm --model llava-hf/llava-1.5-7b-hf --max-model-len 4096 &

Then, we send an inference request that looks like:

curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "http://images.cocodataset.org/test2017/000000155781.jpg"
          }
        }
      ]
    }
  ],
  "model": "llava-hf/llava-1.5-7b-hf"
}'

Additional Context

The issue becomes clear if you look at the llava-hf/llava-1.5-7b-hf chat template:

...
  {# Render all images first #}
  {% for content in message['content'] | selectattr('type', 'equalto', 'image') %}
    <image>
  {% endfor %}
...

Currently the dynamo frontend applies the chat template to the following messages object:

  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {
         "type": "image_url",
         "image_url": { "url": "http://images.cocodataset.org/test2017/000000155781.jpg"}
        }
      ]

If we look closely, dynamo passes the image content part with "type": "image_url", but the chat template only renders an <image> token for parts whose "type" is "image", so the image part is silently skipped and no placeholder is emitted.
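
One possible direction, sketched under the assumption that the fix belongs in the frontend's preprocessing (the normalize_for_llava_template helper below is hypothetical, not an existing dynamo or vLLM API): remap OpenAI-style "image_url" content parts to the {"type": "image"} shape before applying the chat template, so the template's selectattr loop can emit the <image> placeholder that vLLM later replaces.

from typing import Any


def normalize_for_llava_template(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical helper: rewrite OpenAI-style 'image_url' content parts
    into the {'type': 'image'} shape the llava chat template matches on."""
    normalized = []
    for msg in messages:
        content = msg.get("content")
        if not isinstance(content, list):
            normalized.append(msg)
            continue
        new_parts = []
        for part in content:
            if part.get("type") == "image_url":
                # The template only needs the type marker to emit <image>;
                # the image URL itself is carried separately on the
                # multimodal data path.
                new_parts.append({"type": "image"})
            else:
                new_parts.append(part)
        normalized.append({**msg, "content": new_parts})
    return normalized

With the request from the reproduction above, the user message's content would then contain a {"type": "image"} part, the template's image loop would render <image>, and the prompt replacement in vllm/multimodal/processing.py would have a placeholder to apply.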
