Describe the Bug
Multimodal inference requests sent to llava-hf/llava-1.5-7b-hf cause the model to crash with the following error:

```
2025-11-20T08:30:58.906682Z INFO dynamo_llm::discovery::watcher: added model model_name="llava-hf/llava-1.5-7b-hf" namespace="dynamo"
...
2025-11-20T08:31:13.806872Z INFO http_client.get_http_client: Shared HTTP client initialized with timeout=30.0s
2025-11-20T08:31:14.115455Z INFO _client._send_single_request: HTTP Request: GET http:/... "HTTP/1.1 200 OK"
...
Traceback (most recent call last):
File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 370, in generate
q = await self.add_request(
^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 284, in add_request
prompt_str, request = self.processor.process_inputs(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/engine/processor.py", line 377, in process_inputs
processed_inputs: ProcessorInputs = self.input_preprocessor.preprocess(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 644, in preprocess
return self._process_decoder_only_prompt(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 614, in _process_decoder_only_prompt
prompt_comps = self._prompt_to_llm_inputs(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 388, in _prompt_to_llm_inputs
return self._process_tokens(
^^^^^^^^^^^^^^^^^^^^^
File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 317, in _process_tokens
inputs = self._process_multimodal(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 242, in _process_multimodal
mm_input = mm_processor.apply(
^^^^^^^^^^^^^^^^^^^
File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/multimodal/processing.py", line 2045, in apply
prompt_ids, prompt, mm_placeholders = self._maybe_apply_prompt_updates(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/multimodal/processing.py", line 1997, in _maybe_apply_prompt_updates
) = self._apply_prompt_updates(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/multimodal/processing.py", line 1919, in _apply_prompt_updates
assert update_idx is not None, (
^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Failed to apply prompt replacement for mm_items['image'][0]
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/workspace/components/src/dynamo/vllm/handlers.py", line 319, in generate
async for tok in self.generate_tokens(
File "/workspace/components/src/dynamo/vllm/handlers.py", line 232, in generate_tokens
async for res in gen:
File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 420, in generate
raise EngineGenerateError() from e
vllm.v1.engine.exceptions.EngineGenerateError
```

My mental model for what is missing: the `AssertionError: Failed to apply prompt replacement for mm_items['image'][0]` means that vLLM attempted to expand/swap out the `<image>` tokens that should have been placed during chat template application. In other words, the chat template failed to emit the `<image>` token when it was applied to the inference request.
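One way to test this hypothesis is to render the chat template for both content shapes and check whether the `<image>` placeholder ever appears. A minimal sketch, assuming a recent transformers release (the processor and its chat template are fetched from the Hub):

```python
# Sketch: check whether the chat template emits the <image> placeholder.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")

# The shape dynamo forwards today (OpenAI chat-completions style).
openai_style = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url",
         "image_url": {"url": "http://images.cocodataset.org/test2017/000000155781.jpg"}},
    ],
}]

# The shape the HF chat template was written for.
hf_style = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image"},
    ],
}]

print("<image>" in processor.apply_chat_template(openai_style, add_generation_prompt=True))  # False
print("<image>" in processor.apply_chat_template(hf_style, add_generation_prompt=True))      # True
```

If the first print is False, the prompt reaching vLLM carries no `<image>` placeholder, which is exactly the state the assertion guards against.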
Steps to Reproduce
```
# Assuming etcd/NATS are started.
python -m dynamo.frontend &
python -m dynamo.vllm --model llava-hf/llava-1.5-7b-hf --max-model-len 4096 &
```

Then, we send an inference request that looks like:

```
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What is in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "http://images.cocodataset.org/test2017/000000155781.jpg"
}
}
]
}
],
"model": "llava-hf/llava-1.5-7b-hf"
}'
```
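For completeness, the same request through the `openai` Python client crashes identically. A sketch, assuming the `openai` package is installed and the frontend exposes its OpenAI-compatible API on localhost:8000 as started above:

```python
# Sketch: the failing multimodal request via the openai client instead of curl.
from openai import OpenAI

# The local frontend ignores the API key, but the client requires one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="llava-hf/llava-1.5-7b-hf",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": "http://images.cocodataset.org/test2017/000000155781.jpg"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```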
Additional Context

The issue is that if you look at the llava-hf/llava-1.5-7b-hf chat template:

```
...
{# Render all images first #}
{% for content in message['content'] | selectattr('type', 'equalto', 'image') %}
<image>
{% endfor %}
...
```

Currently the dynamo frontend applies the chat template to the following messages object:

```
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "What is in this image?"},
{
"type": "image_url",
"image_url": { "url": "http://images.cocodataset.org/test2017/000000155781.jpg"}
}
        ]
    }
]
```

If we look closely, dynamo expects the chat template to pick up the `image_url` field, but the chat template expects the field to be labeled `image`.
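The mismatch is easy to demonstrate by rendering the template's image loop in isolation. A minimal sketch using `jinja2` directly, with the fragment quoted above collapsed to one line:

```python
# Sketch: render the chat template's image loop against both content shapes.
from jinja2 import Template

# The image loop from the model's chat template, collapsed to one line.
fragment = Template(
    "{% for content in message['content'] | selectattr('type', 'equalto', 'image') %}"
    "<image>"
    "{% endfor %}"
)

hf_style = {"content": [
    {"type": "text", "text": "What is in this image?"},
    {"type": "image"},
]}

openai_style = {"content": [
    {"type": "text", "text": "What is in this image?"},
    {"type": "image_url",
     "image_url": {"url": "http://images.cocodataset.org/test2017/000000155781.jpg"}},
]}

print(repr(fragment.render(message=hf_style)))      # '<image>' -> placeholder emitted
print(repr(fragment.render(message=openai_style)))  # ''       -> item filtered out, no placeholder
```

The `selectattr('type', 'equalto', 'image')` filter silently drops the `image_url` item, so the rendered prompt never contains `<image>`, and vLLM's multimodal processor later asserts because it receives an image input with no placeholder to replace. A plausible fix direction (hypothetical, not verified against the frontend code) is for the frontend to normalize OpenAI-style `image_url` parts to the `image` shape before applying the chat template.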