Merged
Changes from 1 commit
Commits
23 commits
8d67b7d
feat: add more robust handling for MM prompt
hhzhang16 Jun 4, 2025
b65efb5
feat: [WIP] generalize workers
hhzhang16 Jun 4, 2025
40c2154
Merge branch 'main' of github.com:ai-dynamo/dynamo into hannahz/dep-1…
hhzhang16 Jun 4, 2025
e13f827
feat: remove cls token
hhzhang16 Jun 4, 2025
a866d73
Merge branch 'main' of github.com:ai-dynamo/dynamo into hannahz/dep-1…
hhzhang16 Jun 4, 2025
0adb7e6
Merge branch 'main' of github.com:ai-dynamo/dynamo into hannahz/dep-1…
hhzhang16 Jun 5, 2025
86c6135
Merge branch 'main' of github.com:ai-dynamo/dynamo into hannahz/dep-1…
hhzhang16 Jun 5, 2025
19f2158
Merge branch 'main' of github.com:ai-dynamo/dynamo into hannahz/dep-1…
hhzhang16 Jun 6, 2025
a766509
feat: working multimodal agg for multiple vision models
hhzhang16 Jun 7, 2025
17aecda
Merge branch 'main' of github.com:ai-dynamo/dynamo into hannahz/dep-1…
hhzhang16 Jun 7, 2025
496ee57
feat: addressing ci comments
hhzhang16 Jun 9, 2025
820c7e3
feat: addressing ci comments
hhzhang16 Jun 9, 2025
bb4f95e
Merge branch 'main' of github.com:ai-dynamo/dynamo into hannahz/dep-1…
hhzhang16 Jun 9, 2025
027341a
Update examples/multimodal/README.md
hhzhang16 Jun 9, 2025
0eff4e0
feat: trust remote code when loading autoconfig
hhzhang16 Jun 9, 2025
d736895
feat: working code for phi3v
hhzhang16 Jun 10, 2025
36eacb9
docs: add phi3v to multimodal readme
hhzhang16 Jun 10, 2025
d586343
feat: working for Qwen 2.5 VL
hhzhang16 Jun 11, 2025
d5025a7
docs: fixing dash issue
hhzhang16 Jun 11, 2025
1b0efc0
Merge branch 'main' into hannahz/dep-114-generalize-vlm-embedding-ext…
hhzhang16 Jun 11, 2025
843d586
docs: add readme note about disagg support
hhzhang16 Jun 11, 2025
d12e86d
Merge branch 'hannahz/dep-114-generalize-vlm-embedding-extraction' of…
hhzhang16 Jun 11, 2025
073ad67
feat: remove pynvml from this MR
hhzhang16 Jun 11, 2025
feat: addressing ci comments
hhzhang16 committed Jun 9, 2025
commit 820c7e3b90530759f7b99f485d7a6c3d28a38270
14 changes: 12 additions & 2 deletions examples/multimodal/README.md
@@ -18,7 +18,6 @@ limitations under the License.
# Multimodal Deployment Examples

This directory provides example workflows and reference implementations for deploying a multimodal model using Dynamo.
The examples are based on the [llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf) model.

## Multimodal Aggregated Serving

@@ -51,7 +50,10 @@ flowchart LR

```bash
cd $DYNAMO_HOME/examples/multimodal
# Serve a LLaVA 1.5 7B model:
dynamo serve graphs.agg:Frontend -f ./configs/agg-llava.yaml
# Serve a Qwen2 VL model:
# dynamo serve graphs.agg:Frontend -f ./configs/agg-qwen.yaml
```

### Client
@@ -85,6 +87,8 @@ curl http://localhost:8000/v1/chat/completions \
}'
```

If serving the example Qwen model, replace `"llava-hf/llava-1.5-7b-hf"` in the `"model"` field with `"Qwen/Qwen2-VL-7B-Instruct"`.
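
For reference, a request against the Qwen deployment might look like the following. This is a minimal sketch: the prompt text and image URL are placeholders, and the rest of the body is assumed to mirror the LLaVA example above; only the `"model"` field differs.

```bash
# Hypothetical request against the Qwen2-VL deployment; prompt and image URL are placeholders.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2-VL-7B-Instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image."},
          {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
      }
    ],
    "max_tokens": 300,
    "stream": false
  }'
```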

You should see a response similar to this:
```json
{"id": "c37b946e-9e58-4d54-88c8-2dbd92c47b0c", "object": "chat.completion", "created": 1747725277, "model": "llava-hf/llava-1.5-7b-hf", "choices": [{"index": 0, "message": {"role": "assistant", "content": " In the image, there is a city bus parked on a street, with a street sign nearby on the right side. The bus appears to be stopped out of service. The setting is in a foggy city, giving it a slightly moody atmosphere."}, "finish_reason": "stop"}]}
@@ -151,6 +155,7 @@ curl http://localhost:8000/v1/chat/completions \
}
],
"max_tokens": 300,
"temperature": 0.0,
"stream": false
}'
```
@@ -195,8 +200,10 @@ DYNAMO_TAG=$(dynamo build graphs.agg:Frontend | grep "Successfully built" | awk

# Deploy to Kubernetes
export DEPLOYMENT_NAME=multimodal-agg
# For aggregated serving:
# For aggregated serving with LLaVA:
dynamo deploy $DYNAMO_TAG -n $DEPLOYMENT_NAME -f ./configs/agg-llava.yaml
# For aggregated serving with Qwen2-VL:
# dynamo deploy $DYNAMO_TAG -n $DEPLOYMENT_NAME -f ./configs/agg-qwen.yaml
# For disaggregated serving:
# export DEPLOYMENT_NAME=multimodal-disagg
# dynamo deploy $DYNAMO_TAG -n $DEPLOYMENT_NAME -f ./configs/disagg.yaml
@@ -233,8 +240,11 @@ curl localhost:8000/v1/chat/completions \
}
],
"max_tokens": 300,
"temperature": 0.0,
"stream": false
}'
```

If serving the example Qwen model, replace `"llava-hf/llava-1.5-7b-hf"` in the `"model"` field with `"Qwen/Qwen2-VL-7B-Instruct"`.

For more details on managing deployments, testing, and troubleshooting, please refer to the [Operator Deployment Guide](../../docs/guides/dynamo_deploy/operator_deployment.md).
5 changes: 1 addition & 4 deletions examples/multimodal/components/encode_worker.py
@@ -182,7 +182,7 @@ async def encode(self, request: EncodeRequest) -> AsyncIterator[EncodeResponse]:
embeddings = self.vision_model.get_multimodal_embeddings(**image_embeds)
if isinstance(embeddings, tuple):
# The result multimodal_embeddings is tuple of tensors, with each
# tensor correspoending to a multimodal data item (image or video).
# tensor corresponding to a multimodal data item (image or video).
# TODO: for multi-image support, this result will contain multiple tensors.
embeddings = embeddings[0].unsqueeze(0)

@@ -195,9 +195,6 @@ async def encode(self, request: EncodeRequest) -> AsyncIterator[EncodeResponse]:
f"Request serialized_request is None for request: {{ id: {request_id} }}."
)

assert (
embeddings.is_contiguous()
), "Embeddings tensor must be contiguous!"
# Create a descriptor for the embeddings, this will register the memory with the connector (and the NIXL runtime).
descriptor = connect.Descriptor(embeddings)
# Create a write operation using the serialized request and the descriptor.