
Conversation

@yuekaizhang
Contributor

@yuekaizhang yuekaizhang commented Aug 28, 2025

Overview:

This PR adds a Qwen2-Audio multimodal example based on vLLM.

Details:

Where should the reviewer start?

audio_encode_worker.py and audio_agg.sh are a good place to start.

@krishung5 I was wondering if you have some time to review this PR. Many thanks!

Summary by CodeRabbit

  • New Features

    • Added audio modality support to the multimodal pipeline, including processing of audio_url content and streaming responses.
    • Introduced an audio encoder worker and integrated audio embeddings into the aggregated serving flow.
    • Added a launcher script to run an end-to-end aggregated audio setup.
    • Enabled model support for Qwen/Qwen2-Audio-7B-Instruct.
    • Improved input validation to accept audio alongside image/video.
  • Documentation

    • Expanded README with “Multimodal Aggregated Audio Serving,” including architecture diagram, setup instructions, curl example, and sample response.

@copy-pr-bot

copy-pr-bot bot commented Aug 28, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Aug 28, 2025

Walkthrough

Adds audio modality support to the multimodal example: new audio encoder worker, audio loader utility, protocol/model updates for audio embeddings, processor and PD worker adjustments, a new launch script for audio aggregation, and README documentation describing the aggregated audio-serving workflow.

Changes

Cohort / File(s) Summary
Documentation
examples/multimodal/README.md
Adds “Multimodal Aggregated Audio Serving” section with topology, Mermaid diagram, curl example, and sample response.
Protocol and Model
examples/multimodal/utils/protocol.py, examples/multimodal/utils/model.py
Introduces AudioURL types, extends MessageContent and MultiModalInput with audio_url, broadens embeddings shape to include 2D, adds Qwen2-Audio-7B constant, and updates construct_mm_data to accept/use audio_embeds.
Processor
examples/multimodal/components/processor.py
Handles audio_url content, enforces exclusivity with image/video, and updates required-input validation/messages.
Workers
examples/multimodal/components/audio_encode_worker.py, examples/multimodal/components/worker.py
Adds VllmEncodeWorker for audio embeddings and downstream orchestration; updates PDWorker to route audio inputs and clear audio_url after building multi_modal_data.
Utilities
examples/multimodal/utils/audio_loader.py
Adds async AudioLoader with HTTP(S) fetch, librosa decoding on threadpool, and LRU-like in-memory cache.
Launch Scripts
examples/multimodal/launch/audio_agg.sh, examples/multimodal/launch/agg.sh
New audio aggregation launcher orchestrating ingress, processor, audio encoder, and prefill; trivial EOF newline change to existing agg.sh.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant U as Client
  participant F as Ingress Frontend
  participant P as Processor
  participant AE as AudioEncodeWorker
  participant PD as PD/LLM Worker

  U->>F: HTTP request (messages incl. audio_url)
  F->>P: vLLMMultimodalRequest
  Note right of P: Tokenize prompt<br/>Extract audio_url

  P->>AE: Encode request (audio_url)
  AE->>AE: Load audio (AudioLoader)
  AE->>AE: Compute audio embeddings
  AE-->>P: Embeddings descriptor

  P->>PD: Prefill/Decode request<br/>(multi_modal_data with audio)
  PD-->>F: Streamed tokens/results
  F-->>U: Streamed response

  rect rgba(200, 230, 255, 0.25)
    Note over AE,PD: New/updated interactions for audio modality
  end
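
To make the new request schema concrete, here is a rough pydantic sketch of the audio additions described above (class and field names follow the change summary and the review comments later in this thread; treat it as a sketch, not a copy of the actual protocol.py):

from typing import Optional
from pydantic import BaseModel


class AudioURLDetail(BaseModel):
    # http(s) URL of the audio clip to encode
    url: str


class AudioURL(BaseModel):
    type: str = "audio_url"
    audio_url: AudioURLDetail


class MultiModalInput(BaseModel):
    # At most one modality should be set per request.
    image_url: Optional[str] = None
    video_url: Optional[str] = None
    audio_url: Optional[str] = None  # new in this PR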

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

In cables hums a whispered tune,
I hop through bytes beneath the moon.
From waves to words, I nibble sound,
Embeddings bloom where beats are found.
Two workers dance, a stereo pair—
Prefill, decode—responses share.
Thump-thump, the audio’s in the air! 🐇🎧


@github-actions

👋 Hi yuekaizhang! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test GitHub Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added the external-contribution label (Pull request is from an external contributor) on Aug 28, 2025
@yuekaizhang yuekaizhang changed the title [multimodal] support qwen2audio example feat: Add vLLM multimodal audio support Aug 28, 2025
@github-actions github-actions bot added the feat label Aug 28, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
examples/multimodal/launch/agg.sh (2)

61-64: Wire up the audio encoder worker (and quote args).

Currently launches components/encode_worker.py, not the new audio encoder; also unquoted args risk word-splitting. Switch based on model name and quote.

-# run E/P/D workers
-CUDA_VISIBLE_DEVICES=0 python3 components/encode_worker.py --model $MODEL_NAME &
-CUDA_VISIBLE_DEVICES=1 python3 components/worker.py --model $MODEL_NAME --worker-type prefill &
+# run E/P/D workers
+if [[ "${MODEL_NAME,,}" == *"audio"* ]]; then
+  CUDA_VISIBLE_DEVICES=0 python3 components/audio_encode_worker.py --model "$MODEL_NAME" &
+else
+  CUDA_VISIBLE_DEVICES=0 python3 components/encode_worker.py --model "$MODEL_NAME" &
+fi
+CUDA_VISIBLE_DEVICES=1 python3 components/worker.py --model "$MODEL_NAME" --worker-type prefill &

55-66: Background failure handling: exit fast on first crash.

set -e won’t catch background job failures. Use a wait loop to detect first failing PID and then kill others.

-# Wait for all background processes to complete
-wait
+# Wait for all background processes to complete; exit on first failure
+pids=($(jobs -p))
+status=0
+for pid in "${pids[@]}"; do
+  wait "$pid" || { status=$?; break; }
+done
+if [ "$status" -ne 0 ]; then
+  kill "${pids[@]}" 2>/dev/null || true
+fi
+exit "$status"
examples/multimodal/utils/model.py (1)

44-52: Fix dtype handling, None-guard, and return typing for audio path.

  • Don’t force bf16; respect embeddings_dtype to avoid mismatch with the encoder/PD buffer.
  • Guard None before .to(...)
  • Return type annotation doesn’t match actual values (audio: List[tensor], video: np.ndarray). Relax to Dict[str, Any].
-def construct_mm_data(
+def construct_mm_data(
     model: str,
     embeddings_dtype: torch.dtype,
     image_embeds: Optional[torch.Tensor] = None,
     video_numpy: Optional[Any] = None,
     image_grid_thw: Optional[List[Any]] = None,
     audio_embeds: Optional[torch.Tensor] = None,
-) -> Dict[str, torch.Tensor | Dict[str, Any]]:
+) -> Dict[str, Any]:
@@
-    if model == SupportedModels.QWEN_2_AUDIO_7B:
-        audio_embeds = audio_embeds.to(torch.bfloat16)
-        assert audio_embeds.ndim == 2, "Audio embeddings must be 2D"
-        return {"audio": [audio_embeds]}
+    if model == SupportedModels.QWEN_2_AUDIO_7B:
+        if audio_embeds is None:
+            raise ValueError("No audio embeddings provided.")
+        audio_embeds = audio_embeds.to(embeddings_dtype)
+        assert audio_embeds.ndim == 2, "Audio embeddings must be 2D"
+        return {"audio": [audio_embeds]}

Also applies to: 53-56
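
For context, a minimal sketch of how the PD side might call the updated helper on the audio path, assuming the signature shown in the diff above (variable names are illustrative, not copied from worker.py):

# embeddings: 2D audio-embeddings tensor received from the encoder, e.g. shape [T, D]
multi_modal_data = construct_mm_data(
    model=self.engine_args.model,
    embeddings_dtype=self.EMBEDDINGS_DTYPE,
    audio_embeds=embeddings,
)
# For Qwen/Qwen2-Audio-7B-Instruct this yields {"audio": [embeddings]}.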

examples/multimodal/components/worker.py (2)

248-252: Set audio embeddings dtype explicitly.

Align PD buffer dtype with the model/audio path (and your model.construct_mm_data call). Recommend bf16 for audio to avoid double casting and potential precision loss.

-        if "video" in self.engine_args.model.lower():
+        if "video" in self.engine_args.model.lower():
             self.EMBEDDINGS_DTYPE = torch.uint8
-        else:
+        elif "audio" in self.engine_args.model.lower():
+            self.EMBEDDINGS_DTYPE = torch.bfloat16
+        else:
             self.EMBEDDINGS_DTYPE = torch.float16

285-312: Guard the URL path for audio/video; current else-branch assumes image and will mis-handle audio_url.

If audio_url survives to PD (e.g., encoder didn’t clear it), this code attempts to load an image with None URL. Make the URL branch explicit.

-        if (
+        if (
             request.multimodal_input.image_url is None
             and request.multimodal_input.video_url is None
             and request.multimodal_input.audio_url is None
         ):
@@
-        else:
-            # Use PIL image instead of image embeddings
-            multi_modal_data = {
-                "image": await self.image_loader.load_image(
-                    request.multimodal_input.image_url
-                )
-            }
+        else:
+            if request.multimodal_input.image_url is not None:
+                # Use PIL image instead of image embeddings
+                multi_modal_data = {
+                    "image": await self.image_loader.load_image(
+                        request.multimodal_input.image_url
+                    )
+                }
+            elif request.multimodal_input.video_url is not None:
+                raise ValueError(
+                    "video_url cannot be consumed by PD worker; provide precomputed embeddings"
+                )
+            elif request.multimodal_input.audio_url is not None:
+                raise ValueError(
+                    "audio_url cannot be consumed by PD worker; provide precomputed embeddings"
+                )

Also applies to: 319-331

🧹 Nitpick comments (14)
examples/multimodal/launch/agg.sh (2)

39-53: Add prompt-template handling for audio models or enforce explicit template.

No template branch for Qwen2-Audio; script exits for audio by default. Either add a model-specific template or make --prompt-template mandatory when MODEL includes "Audio".


58-60: Quote PROMPT_TEMPLATE and MODEL_NAME.

Prevent word-splitting or globbing in args.

-python3 components/processor.py --model $MODEL_NAME --prompt-template "$PROMPT_TEMPLATE" &
+python3 components/processor.py --model "$MODEL_NAME" --prompt-template "$PROMPT_TEMPLATE" &
examples/multimodal/utils/protocol.py (1)

110-117: Validate audio_url scheme.

Add a light validator to reject non-http(s) URLs early (mirrors typical image_url hygiene).

 class AudioURLDetail(BaseModel):
-    url: str
+    url: str
+
+    @field_validator("url")
+    @classmethod
+    def _http_only(cls, v: str) -> str:
+        if not (v.startswith("http://") or v.startswith("https://")):
+            raise ValueError("audio_url must start with http:// or https://")
+        return v
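
Note: the validator above assumes field_validator is available in protocol.py; if the module currently imports only BaseModel from pydantic, add the corresponding import (from pydantic import field_validator) alongside this change.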
examples/multimodal/components/worker.py (1)

274-284: Descriptor/embeddings creation assumes embeddings_shape is set.

If upstream forgets to set embeddings_shape for audio, this will break. Add a precondition check with a clearer error.

-        embeddings = torch.empty(
+        if request.embeddings_shape is None:
+            raise ValueError("embeddings_shape must be set for non-URL multimodal inputs")
+        embeddings = torch.empty(
             request.embeddings_shape,
             dtype=self.EMBEDDINGS_DTYPE,
             device=self.EMBEDDINGS_DEVICE,
         )
examples/multimodal/README.md (3)

511-514: Tighten wording and fix minor grammar.

  • Use “passes” (singular) and parallel phrasing.

Apply:

-- processor: Tokenizes the prompt and passes it to the AudioEncodeWorker.
-- frontend: HTTP endpoint to handle incoming requests.
+- processor: Tokenizes the prompt and passes it to the AudioEncodeWorker.
+- frontend: Provides an HTTP endpoint to handle incoming requests.

543-568: Align the example payload with the note.

If keeping the temporary image_url hack, explicitly label it in the example to prevent users from switching to audio_url and breaking the flow. Also, consider lowering max_tokens from 6000 unless known-safe.

Apply:

-            {
-              "type": "image_url",
-              "image_url": {
-                "url": "https://raw.githubusercontent.com/yuekaizhang/Triton-ASR-Client/main/datasets/mini_en/wav/1221-135766-0002.wav"
-              }
-            }
+            {
+              "type": "image_url",
+              "image_url": {
+                "url": "https://raw.githubusercontent.com/yuekaizhang/Triton-ASR-Client/main/datasets/mini_en/wav/1221-135766-0002.wav"
+              }
+            }
+            // Note: using image_url to carry audio temporarily.

543-545: Port inconsistency across sections.

Audio uses port 8000 (matches audio_agg.sh), earlier sections use 8080. Add a one-liner callout at the start of the audio section or standardize ports across examples.

Apply:

-### Client
+### Client
+Note: The audio example serves on port 8000 (audio_agg.sh). Other examples may use 8080.
examples/multimodal/launch/audio_agg.sh (3)

24-29: Help text mentions unrelated templates.

This launcher is audio-specific; drop LLaVA/Qwen2.5/Phi3V template note to avoid confusion.

Apply:

-            echo "  --prompt-template <template> Specify the multi-modal prompt template to use. LLaVA 1.5 7B, Qwen2.5-VL, and Phi3V models have predefined templates."
+            echo "  --prompt-template <template> Specify the multimodal prompt template to use for the selected audio model."

55-60: Quote args; unify python invocation.

Quote MODEL_NAME to avoid word-splitting and use python3 consistently.

Apply:

-python -m dynamo.frontend --http-port 8000 &
+python3 -m dynamo.frontend --http-port 8000 &

-python3 components/processor.py --model $MODEL_NAME --prompt-template "$PROMPT_TEMPLATE" &
+python3 components/processor.py --model "$MODEL_NAME" --prompt-template "$PROMPT_TEMPLATE" &

-CUDA_VISIBLE_DEVICES=0 python3 components/audio_encode_worker.py --model $MODEL_NAME &
-CUDA_VISIBLE_DEVICES=1 python3 components/worker.py --model $MODEL_NAME --worker-type prefill &
+CUDA_VISIBLE_DEVICES=0 python3 components/audio_encode_worker.py --model "$MODEL_NAME" &
+CUDA_VISIBLE_DEVICES=1 python3 components/worker.py --model "$MODEL_NAME" --worker-type prefill &

57-60: Guard for single-GPU environments (optional). Defaults assume two GPUs (0 and 1); either document this requirement or add a runtime check (e.g. via nvidia-smi) to warn when fewer GPUs are available.

examples/multimodal/utils/audio_loader.py (3)

65-71: Honor sampling_rate and move CPU-bound load off-loop (already good).

Also ensure we pass the desired SR.

Apply:

-            def _load_audio():
-                return librosa.load(audio_data_stream, sr=16000)
+            def _load_audio():
+                return librosa.load(audio_data_stream, sr=sampling_rate or 16000)

73-81: Make cache mutation atomic.

Wrap eviction and insert with a lock.

Apply:

-            if parsed_url.scheme in ("http", "https"):
-                audio_url_lower = audio_url.lower()
-                # Cache the audio for future use, and evict the oldest audio if the cache is full
-                if self._cache_queue.full():
-                    oldest_audio_url = await self._cache_queue.get()
-                    del self._audio_cache[oldest_audio_url]
-
-                self._audio_cache[audio_url_lower] = (audio_data, sr)
-                await self._cache_queue.put(audio_url_lower)
+            if parsed_url.scheme in ("http", "https"):
+                audio_url_lower = audio_url.lower()
+                async with self._lock:
+                    if self._cache_queue.full():
+                        oldest_audio_url = await self._cache_queue.get()
+                        self._audio_cache.pop(oldest_audio_url, None)
+                    self._audio_cache[audio_url_lower] = (audio_data, sr)
+                    await self._cache_queue.put(audio_url_lower)

44-49: Cache lookup should also be locked (optional).

Prevents torn reads under concurrent evictions.

Apply:

-            if audio_url_lower in self._audio_cache:
-                logger.debug(f"Audio found in cache for URL: {audio_url}")
-                return self._audio_cache[audio_url_lower]
+            async with self._lock:
+                if audio_url_lower in self._audio_cache:
+                    logger.debug(f"Audio found in cache for URL: {audio_url}")
+                    return self._audio_cache[audio_url_lower]
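
Both locking snippets assume the loader owns an asyncio lock. If AudioLoader does not create one yet, a minimal sketch of the missing constructor piece could look like this (attribute names follow the snippets above; the cache size is an arbitrary example):

import asyncio


class AudioLoader:
    def __init__(self, cache_size: int = 8):
        self._audio_cache = {}  # maps lowercased URL -> (audio_data, sample_rate)
        self._cache_queue = asyncio.Queue(maxsize=cache_size)  # FIFO of cached URLs for eviction
        self._lock = asyncio.Lock()  # guards cache lookups, evictions, and inserts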
examples/multimodal/components/audio_encode_worker.py (1)

39-55: Unused cupy/numpy selection block.

DEVICE/array_module are unused. Remove or wire into compute.

Apply:

-try:
-    import cupy as array_module
-
-    if not array_module.cuda.is_available():
-        raise ImportError("CUDA is not available.")
-    DEVICE = "cuda"
-    logger.info("Using cupy for array operations (GPU mode).")
-except ImportError as e:
-    logger.warning(f"Failed to import cupy, falling back to numpy: {e}.")
-    import numpy as array_module
-
-    DEVICE = "cpu"
+# (Optional) Remove cupy/numpy runtime selection; not used in this module.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e3619ce and 0aef825.

📒 Files selected for processing (9)
  • examples/multimodal/README.md (1 hunks)
  • examples/multimodal/components/audio_encode_worker.py (1 hunks)
  • examples/multimodal/components/processor.py (1 hunks)
  • examples/multimodal/components/worker.py (3 hunks)
  • examples/multimodal/launch/agg.sh (1 hunks)
  • examples/multimodal/launch/audio_agg.sh (1 hunks)
  • examples/multimodal/utils/audio_loader.py (1 hunks)
  • examples/multimodal/utils/model.py (2 hunks)
  • examples/multimodal/utils/protocol.py (3 hunks)
🧰 Additional context used
🪛 LanguageTool
examples/multimodal/README.md

[grammar] ~511-~511: There might be a mistake here.
Context: .../worker.py) for prefilling and decoding. - processor: Tokenizes the prompt and pass...

(QB_NEW_EN)


[grammar] ~512-~512: There might be a mistake here.
Context: ... and passes it to the AudioEncodeWorker. - frontend: HTTP endpoint to handle incomi...

(QB_NEW_EN)


[grammar] ~519-~519: There might be a mistake here.
Context: ...onents/backends/vllm/README.md) example. By separating the audio processing from ...

(QB_NEW_EN)


[grammar] ~520-~520: There might be a mistake here.
Context: ...a more flexible deployment and scale the AudioEncodeWorker independently from the...

(QB_NEW_EN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (5)
examples/multimodal/utils/protocol.py (1)

128-129: Schema extension looks consistent with the pipeline.

Audio content union + MultiModalInput.audio_url + embeddings_shape 2-tuple are coherent with the PD worker and model helpers.

Please confirm the engine-side expectation for audio embeddings shape tuple order (e.g., [T, D]) matches the audio encoder output.

Also applies to: 145-149, 155-157

examples/multimodal/utils/model.py (1)

31-32: Add QWEN_2_AUDIO_7B constant — LGTM.

Clearly scoped addition.

examples/multimodal/components/worker.py (2)

306-312: Audio path construction — LGTM (given encoder provides 2D).

The new branch routes embeddings to construct_mm_data correctly.


248-256: No action needed: audio encoder and worker both use torch.float16.

The audio model is loaded with torch_dtype=torch.float16 and the worker’s EMBEDDINGS_DTYPE is set to torch.float16 when not handling video, so there’s no RDMA type mismatch.

examples/multimodal/components/processor.py (1)

257-263: Required multimodal input validation — LGTM.

Clear and correct inclusion of audio_url.

Contributor

@krishung5 krishung5 left a comment


Thank you for enabling audio support! The general workflow LGTM; I left a few comments/questions.

Could we also add some pytest coverage here: https://github.com/ai-dynamo/dynamo/blob/main/tests/serve/test_vllm.py?
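
For illustration, something along these lines could work (the fixture name, endpoint path, and payload shape are placeholders; the existing harness in test_vllm.py will dictate the real structure):

import requests

AUDIO_URL = (
    "https://raw.githubusercontent.com/yuekaizhang/Triton-ASR-Client/"
    "main/datasets/mini_en/wav/1221-135766-0002.wav"
)


def test_qwen2_audio_aggregated(audio_agg_server):  # hypothetical fixture that launches audio_agg.sh
    payload = {
        "model": "Qwen/Qwen2-Audio-7B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Transcribe the audio."},
                    # The README example currently carries the audio via image_url;
                    # switch to audio_url once the client-facing schema settles.
                    {"type": "image_url", "image_url": {"url": AUDIO_URL}},
                ],
            }
        ],
        "max_tokens": 64,
    }
    resp = requests.post(
        "http://localhost:8000/v1/chat/completions", json=payload, timeout=300
    )
    assert resp.status_code == 200
    content = resp.json()["choices"][0]["message"]["content"]
    assert content  # exact transcription is model-dependent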

@yuekaizhang yuekaizhang requested a review from a team as a code owner August 29, 2025 03:59
@yuekaizhang
Contributor Author

yuekaizhang commented Aug 29, 2025

@krishung5 Would you mind checking the latest commit?

Also, the current recipe depends on vllm-project/vllm#23625. What do you think about adding the test case in another PR after dynamo updates the vLLM commit tag?

Contributor

@whoisj whoisj left a comment


LGTM, at least all of the nixl_connector stuff looks correct.

Approved assuming all tests pass and @yuekaizhang has done some form of validation for both the aggregated and disaggregated use cases.

@yuekaizhang
Contributor Author

LGTM, at least all of the nixl_connector stuff looks correct.

Approved assuming all tests pass and @yuekaizhang has done some form of validation for both the aggregated and disaggregated use cases.

Hi @whoisj and @krishung5, I've just added a test case to this PR, which depends on vllm v0.10.2rc1 (from https://github.com/vllm-project/vllm/releases/tag/v0.10.2rc1). I checked the latest vllm pip index and saw that only version 0.10.1 is available. Before the official v0.10.2 is released, what do you think about merging this PR first? This would help avoid potential code conflicts.

@krishung5
Contributor

Hi @yuekaizhang, sorry for the late reply; I was OOO for the past two weeks. Some questions:
If the audio path depends on vLLM v0.10.2, would pip install vllm==0.10.2 simply work, or do we need to rebuild dynamo against that vLLM version as well? I think I ran into some symbol-not-found issues when pip installing a vLLM version different from the one dynamo was built with. Just wanted to double check on this.

btw - I think vLLM released v0.10.2 on the pip index.

Besides, could you help resolve the conflicts in the PR and rerun a pipeline? I can help on the CI side if you don't have access to GitLab. Thank you!

@yuekaizhang
Contributor Author

@krishung5 Hi, I have rebased the code. Also, I have verified the correctness of the whole model serving and client test process.

Would you mind helping check it again?

@krishung5
Contributor

/ok to test 2e3a2b3

Signed-off-by: Yuekai Zhang <[email protected]>
@yuekaizhang
Contributor Author

@krishung5 Sorry for the delay; would you mind helping check the CI/CD again? Many thanks!

@krishung5
Contributor

/ok to test 5ab7e9c

@krishung5
Contributor

/ok to test 00484b8

@krishung5
Contributor

/ok to test 00f8463

@krishung5
Contributor

/ok to test bcf5e9e

@krishung5 krishung5 merged commit 0f5dd2b into ai-dynamo:main Nov 13, 2025
38 of 47 checks passed
daiyaanarfeen pushed a commit that referenced this pull request Nov 14, 2025
Signed-off-by: Yuekai Zhang <[email protected]>
Signed-off-by: krishung5 <[email protected]>
Co-authored-by: Kris Hung <[email protected]>
Signed-off-by: Daiyaan <[email protected]>
dagil-nvidia added a commit that referenced this pull request Dec 10, 2025
Cherry-pick of PR #2760 to release/0.7.1.

Adds audio multimodal support for vLLM including:
- AudioEncodeWorker for processing audio inputs
- Audio aggregated and disaggregated serving examples
- Audio loader utilities
- Documentation for audio multimodal serving

Original PR: #2760
Original author: Yuekai Zhang <[email protected]>
Co-authored-by: Kris Hung <[email protected]>

Signed-off-by: Dan Gil <[email protected]>
