Merged
Changes from 1 commit
Commits
23 commits
8d67b7d
feat: add more robust handling for MM prompt
hhzhang16 Jun 4, 2025
b65efb5
feat: [WIP] generalize workers
hhzhang16 Jun 4, 2025
40c2154
Merge branch 'main' of github.com:ai-dynamo/dynamo into hannahz/dep-1…
hhzhang16 Jun 4, 2025
e13f827
feat: remove cls token
hhzhang16 Jun 4, 2025
a866d73
Merge branch 'main' of github.com:ai-dynamo/dynamo into hannahz/dep-1…
hhzhang16 Jun 4, 2025
0adb7e6
Merge branch 'main' of github.com:ai-dynamo/dynamo into hannahz/dep-1…
hhzhang16 Jun 5, 2025
86c6135
Merge branch 'main' of github.com:ai-dynamo/dynamo into hannahz/dep-1…
hhzhang16 Jun 5, 2025
19f2158
Merge branch 'main' of github.com:ai-dynamo/dynamo into hannahz/dep-1…
hhzhang16 Jun 6, 2025
a766509
feat: working multimodal agg for multiple vision models
hhzhang16 Jun 7, 2025
17aecda
Merge branch 'main' of github.com:ai-dynamo/dynamo into hannahz/dep-1…
hhzhang16 Jun 7, 2025
496ee57
feat: addressing ci comments
hhzhang16 Jun 9, 2025
820c7e3
feat: addressing ci comments
hhzhang16 Jun 9, 2025
bb4f95e
Merge branch 'main' of github.com:ai-dynamo/dynamo into hannahz/dep-1…
hhzhang16 Jun 9, 2025
027341a
Update examples/multimodal/README.md
hhzhang16 Jun 9, 2025
0eff4e0
feat: trust remote code when loading autoconfig
hhzhang16 Jun 9, 2025
d736895
feat: working code for phi3v
hhzhang16 Jun 10, 2025
36eacb9
docs: add phi3v to multimodal readme
hhzhang16 Jun 10, 2025
d586343
feat: working for Qwen 2.5 VL
hhzhang16 Jun 11, 2025
d5025a7
docs: fixing dash issue
hhzhang16 Jun 11, 2025
1b0efc0
Merge branch 'main' into hannahz/dep-114-generalize-vlm-embedding-ext…
hhzhang16 Jun 11, 2025
843d586
docs: add readme note about disagg support
hhzhang16 Jun 11, 2025
d12e86d
Merge branch 'hannahz/dep-114-generalize-vlm-embedding-extraction' of…
hhzhang16 Jun 11, 2025
073ad67
feat: remove pynvml from this MR
hhzhang16 Jun 11, 2025
feat: addressing ci comments
hhzhang16 committed Jun 9, 2025
commit 820c7e3b90530759f7b99f485d7a6c3d28a38270
14 changes: 12 additions & 2 deletions examples/multimodal/README.md
@@ -18,7 +18,6 @@ limitations under the License.
# Multimodal Deployment Examples

This directory provides example workflows and reference implementations for deploying a multimodal model using Dynamo.
The examples are based on the [llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf) model.

## Multimodal Aggregated Serving

@@ -51,7 +50,10 @@ flowchart LR

```bash
cd $DYNAMO_HOME/examples/multimodal
# Serve a LLaVA 1.5 7B model:
dynamo serve graphs.agg:Frontend -f ./configs/agg-llava.yaml
# Serve a Qwen2 VL model:
# dynamo serve graphs.agg:Frontend -f ./configs/agg-qwen.yaml
```

### Client
@@ -85,6 +87,8 @@ curl http://localhost:8000/v1/chat/completions \
}'
```

If serving the example Qwen model, replace `"llava-hf/llava-1.5-7b-hf"` in the `"model"` field with `"Qwen/Qwen2-VL-7B-Instruct"`.
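
For reference, a request against the Qwen deployment might look like the following. This is a minimal sketch: the prompt text and image URL are placeholders, and the rest of the body is assumed to mirror the LLaVA example above; only the `"model"` field differs.

```bash
# Hypothetical request against the Qwen2-VL deployment; prompt and image URL are placeholders.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2-VL-7B-Instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image."},
          {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
      }
    ],
    "max_tokens": 300,
    "stream": false
  }'
```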

You should see a response similar to this:
```json
{"id": "c37b946e-9e58-4d54-88c8-2dbd92c47b0c", "object": "chat.completion", "created": 1747725277, "model": "llava-hf/llava-1.5-7b-hf", "choices": [{"index": 0, "message": {"role": "assistant", "content": " In the image, there is a city bus parked on a street, with a street sign nearby on the right side. The bus appears to be stopped out of service. The setting is in a foggy city, giving it a slightly moody atmosphere."}, "finish_reason": "stop"}]}
@@ -151,6 +155,7 @@ curl http://localhost:8000/v1/chat/completions \
}
],
"max_tokens": 300,
"temperature": 0.0,
"stream": false
}'
```
@@ -195,8 +200,10 @@ DYNAMO_TAG=$(dynamo build graphs.agg:Frontend | grep "Successfully built" | awk

# Deploy to Kubernetes
export DEPLOYMENT_NAME=multimodal-agg
# For aggregated serving:
# For aggregated serving with LLaVA:
dynamo deploy $DYNAMO_TAG -n $DEPLOYMENT_NAME -f ./configs/agg-llava.yaml
# For aggregated serving with Qwen2-VL:
# dynamo deploy $DYNAMO_TAG -n $DEPLOYMENT_NAME -f ./configs/agg-qwen.yaml
# For disaggregated serving:
# export DEPLOYMENT_NAME=multimodal-disagg
# dynamo deploy $DYNAMO_TAG -n $DEPLOYMENT_NAME -f ./configs/disagg.yaml
@@ -233,8 +240,11 @@ curl localhost:8000/v1/chat/completions \
}
],
"max_tokens": 300,
"temperature": 0.0,
"stream": false
}'
```

If serving the example Qwen model, replace `"llava-hf/llava-1.5-7b-hf"` in the `"model"` field with `"Qwen/Qwen2-VL-7B-Instruct"`.

For more details on managing deployments, testing, and troubleshooting, please refer to the [Operator Deployment Guide](../../docs/guides/dynamo_deploy/operator_deployment.md).
5 changes: 1 addition & 4 deletions examples/multimodal/components/encode_worker.py
@@ -182,7 +182,7 @@ async def encode(self, request: EncodeRequest) -> AsyncIterator[EncodeResponse]:
embeddings = self.vision_model.get_multimodal_embeddings(**image_embeds)
if isinstance(embeddings, tuple):
# The result multimodal_embeddings is tuple of tensors, with each
# tensor correspoending to a multimodal data item (image or video).
# tensor corresponding to a multimodal data item (image or video).
# TODO: for multi-image support, this result will contain multiple tensors.
embeddings = embeddings[0].unsqueeze(0)

@@ -195,9 +195,6 @@ async def encode(self, request: EncodeRequest) -> AsyncIterator[EncodeResponse]:
f"Request serialized_request is None for request: {{ id: {request_id} }}."
)

assert (
embeddings.is_contiguous()
), "Embeddings tensor must be contiguous!"
# Create a descriptor for the embeddings, this will register the memory with the connector (and the NIXL runtime).
descriptor = connect.Descriptor(embeddings)
# Create a write operation using the serialized request and the descriptor.