
Conversation

@indrajit96 (Contributor) commented Dec 1, 2025

Overview:

Adds standalone encoder support to the dynamo TRT-LLM backend, enabling the full E-P-D (Encode-Prefill-Decode) disaggregated multimodal serving architecture.

Details:

  • TensorRTLLMEngine changes: Initializes a MultimodalEncoder instead of the standard LLM when in encode mode
  • process_encode_request: Extended to handle both pre-computed embeddings (.pt files via NIXL) and raw images (via MultimodalEncoder)
  • EPD flow in PrefillHandler: Calls encode worker for image URLs and passes ep_disaggregated_params through the pipeline
  • Metadata forwarding: EPD-specific fields (_epd_processed_prompt, _epd_prompt_token_ids) are packed into disaggregated_params for the decode worker (see the sketch after this list)
  • New launch script: epd_multimodal.sh for running the full E-P-D setup with LLaVA
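
As a rough illustration of the metadata forwarding above, a minimal sketch: the _epd_metadata packing key and field names come from this PR, but the helper itself is hypothetical.

def pack_epd_metadata(
    disagg_params_dict: dict,
    processed_prompt: str,
    prompt_token_ids: list,
) -> dict:
    # Attach the encoder-produced prompt data so the decode worker can rebuild
    # its input without re-running the multimodal processor.
    disagg_params_dict["_epd_metadata"] = {
        "_epd_processed_prompt": processed_prompt,
        "_epd_prompt_token_ids": prompt_token_ids,
    }
    return disagg_params_dict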

Where should the reviewer start?

  • components/src/dynamo/trtllm/encode_helper.py - Core encoding logic and process_encode_request
  • components/src/dynamo/trtllm/request_handlers/handlers.py - EncodeHandler and PrefillHandler.remote_encode_full_epd
  • components/src/dynamo/trtllm/engine.py - MultimodalEncoder initialization
  • components/src/dynamo/trtllm/request_handlers/handler_base.py - EPD metadata handling in _prepare_decode_input and _encode_and_pack_disaggregated_params

Summary by CodeRabbit

  • New Features

    • Introduced disaggregated inference mode support, enabling flexible splitting of model computation into encode, prefill, and decode stages for optimized performance
    • Enhanced multimodal processing capabilities with improved image embedding and handling
  • Chores

    • Added example configuration files and launcher scripts for multimodal model deployment


Signed-off-by: Indrajit Bhosale <[email protected]>
@indrajit96 changed the title from "Ibhosale epd 1.2" to "feat: Standalone encoder in dynamo trtllm" Dec 1, 2025
@github-actions github-actions bot added the feat label Dec 1, 2025
Signed-off-by: Indrajit Bhosale <[email protected]>
draft_tokens=disaggregated_params.draft_tokens,
# E-P Disaggregated Params (for full EPD flow)
# Use getattr with None default for backward compatibility with text-only requests
multimodal_embedding_handles=getattr(
Contributor

Maybe stupid question: these already default to None in the DisaggregatedParams definition from TRTLLM: https://github.com/NVIDIA/TensorRT-LLM/blob/v1.2.0rc4/tensorrt_llm/disaggregated_params.py#L37

Why not use:

dataclasses.replace(disaggregated_params, opaque_state=opaque_state)

since opaque_state seems to be the only actual difference between the input disaggregated_params and what we're returning here?

Similar question for the encode method below.
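
For reference, a minimal sketch of the suggested pattern, assuming disaggregated_params is a dataclass instance (as in TRT-LLM's definition linked above):

import dataclasses

def with_opaque_state(disaggregated_params, opaque_state):
    # Copy the dataclass, changing only opaque_state; every other field
    # (all of which default to None in DisaggregatedParams) carries over.
    return dataclasses.replace(disaggregated_params, opaque_state=opaque_state)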

yield {"error": "Dictionary embeddings missing 'mm_embeddings' key"}
# Handle both tensor and dictionary formats
if isinstance(loaded_data, dict):
# Dictionary format (e.g., maverick_mm_embed_seashore_v3.pt)
Contributor

Nit: was this in reference to some local debugging? Should it be removed?

yield {"ep_disaggregated_params": None}
return
if (
hasattr(ep_disaggregated_params, "multimodal_embedding_handles")
Contributor

Maybe stupid question: aren't we guaranteed that this attribute exists? Couldn't we just check for if ep_disaggregated_params.multimodal_embedding_handles is not None directly?

Or is the idea that we want to support multiple TRTLLM versions somehow?
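
For context, the two patterns under discussion, in a self-contained sketch (ep_disaggregated_params is assumed to be a TRT-LLM DisaggregatedParams instance):

def get_embedding_handles(ep_disaggregated_params):
    # Version-tolerant: getattr returns None if the attribute doesn't exist
    # at all (e.g., an older TRT-LLM where the field hasn't been added yet).
    handles = getattr(ep_disaggregated_params, "multimodal_embedding_handles", None)

    # Direct access would instead assume the current DisaggregatedParams
    # definition, where the field exists and already defaults to None:
    #   handles = ep_disaggregated_params.multimodal_embedding_handles
    return handles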

# Tokenize the processed prompt for prefill worker
if processed_prompt and tokenizer is not None:
prompt_token_ids = tokenizer.encode(
processed_prompt, add_special_tokens=False
Contributor

Could you leave a comment why add_special_tokens is set to False?
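
For illustration, the behavior in question, as a sketch; whether the chat template already inserts the special tokens is an assumption the requested comment should confirm:

def tokenize_processed_prompt(tokenizer, processed_prompt: str) -> list:
    # With add_special_tokens=True the tokenizer would prepend BOS (e.g. <s>
    # for Mistral-based models). processed_prompt has already been through the
    # chat template, so re-adding special tokens here could duplicate them.
    return tokenizer.encode(processed_prompt, add_special_tokens=False)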

async def initialize(self):
if not self._llm:
self._llm = self._llm_cls(**self.engine_args)
if self.disaggregation_mode == DisaggregationMode.ENCODE:
Contributor

Out of curiosity, how is the engine initialized for a prefill worker? Is that encapsulated via engine_args itself somehow? (It might be worth leaving a comment whatever the case is 🙏 )

self._llm = self._llm_cls(**self.engine_args)
if self.disaggregation_mode == DisaggregationMode.ENCODE:
# Initialize the multimodal encoder for full EPD
max_batch_size = self.engine_args.pop("max_batch_size", 1)
Contributor

Out of curiosity, why was this necessary? Maybe leave a comment? 🙏

)
self._llm = MultimodalEncoder(
model=model,
max_batch_size=max_batch_size,
Contributor

Out of curiosity, why not forward the rest of the self.engine_args? MultimodalEncoder is also a LLM class: https://github.com/NVIDIA/TensorRT-LLM/blob/v1.2.0rc4/tensorrt_llm/llmapi/mm_encoder.py#L16


@property
def llm(self):
def llm(self) -> Union[LLM, MultimodalEncoder]:

# Setup disaggregated_params for PREFILL mode
if self.disaggregation_mode == DisaggregationMode.PREFILL:
if ep_disaggregated_params:
Contributor

Nit: could these if / elif clauses' contents be moved to helper functions for readability? 🙏

out["disaggregated_params"] = asdict(
DisaggregatedParamsCodec.encode(output.disaggregated_params)
# In EPD flow, output.disaggregated_params might be None, use the input params
logging.info(
Contributor

Similar readability comment here 🙏

Signed-off-by: Indrajit Bhosale <[email protected]>
@indrajit96 indrajit96 marked this pull request as ready for review December 2, 2025 22:04
@indrajit96 indrajit96 requested review from a team as code owners December 2, 2025 22:04
@indrajit96 indrajit96 requested a review from a team as a code owner December 2, 2025 22:04
@indrajit96 indrajit96 requested a review from PeaBrane December 2, 2025 22:04
coderabbitai bot (Contributor) commented Dec 2, 2025

Walkthrough

Introduces a disaggregation mode framework for TensorRT-LLM enabling encoder/prefill/decode pipeline separation. Adds mode-aware engine initialization, multimodal encoding support, request routing logic, configuration files, and launch infrastructure for multimodal model deployment.

Changes

Cohort / File(s) — Summary

  • Disaggregation Mode Constants (components/src/dynamo/trtllm/constants.py): New enum DisaggregationMode with four modes: AGGREGATED, PREFILL, DECODE, ENCODE. Enables mode-aware initialization and request routing throughout the pipeline.
  • Engine Initialization & Management (components/src/dynamo/trtllm/engine.py, components/src/dynamo/trtllm/main.py): TensorRTLLMEngine.__init__() now accepts a disaggregation_mode parameter and conditionally initializes MultimodalEncoder when the mode is ENCODE. get_llm_engine() propagates the disaggregation mode from config. The llm property returns Union[LLM, MultimodalEncoder].
  • Encoding & Multimodal Processing (components/src/dynamo/trtllm/encode_helper.py, components/src/dynamo/trtllm/multimodal_processor.py): process_embedding_request() renamed to process_encode_request() with an expanded interface (tokenizer, model_dir, model_type, engine). Adds dual-path handling: load embeddings from paths or generate them via the multimodal encoder. process_openai_request() signature extended with ep_disaggregated_params; new encoder/data-plane flow handling and refactored token-streaming logic.
  • Request Flow & Routing (components/src/dynamo/trtllm/request_handlers/handler_base.py, components/src/dynamo/trtllm/request_handlers/handlers.py): HandlerBase adds three new helper methods: _decode_disaggregated_params_from_prefill(), _prepare_decode_input(), and _encode_and_pack_disaggregated_params(). generate_locally() signature extended with ep_disaggregated_params. EncodeHandler adds model_dir, model_type, and tokenizer attributes. PrefillHandler adds a remote_encode_full_epd() method and manages disaggregated-params reconstruction and request-metadata injection.
  • Utilities (components/src/dynamo/trtllm/utils/disagg_utils.py): Enhanced DisaggregatedParamsCodec to thread multimodal_embedding_handles and multimodal_hashes through the decode/encode paths with backward-compatible defaults.
  • Engine Configuration Files (examples/backends/trtllm/engine_configs/llava-v1.6-mistral-7b-hf/decode.yaml, encode.yaml, prefill.yaml): Three new YAML configuration files specifying model runtime settings (tensor parallelism, batch sizes, KV cache config, overlap scheduler settings) for the disaggregated decode, encode, and prefill workers.
  • Launch & Prompt Template (examples/backends/trtllm/launch/epd_multimodal.sh, examples/backends/trtllm/templates/llava_multimodal.jinja): Bash launcher orchestrating the frontend and three disaggregated workers (encode, prefill, decode) with device binding and cleanup handling. Jinja2 template formats multimodal messages with special handling for system, user (with image placeholders), and assistant roles.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Areas requiring extra attention:

  • encode_helper.py — Dual-path embedding loading (dict vs. tensor) with NIXL metadata serialization and disaggregated params encoding introduces logic density; verify correctness of auxiliary data preservation and error handling paths.
  • request_handlers/handler_base.py — Three new helper methods and refactored generate_locally() flow orchestrate disaggregation logic; review state transitions, metadata threading (_epd_processed_prompt, _epd_prompt_token_ids), and interaction with multimodal stripping.
  • request_handlers/handlers.py — PrefillHandler.remote_encode_full_epd() and its interaction with ep_disaggregated_params reconstruction requires tracing the disaggregated-params lifecycle through the prefill→decode handoff.
  • multimodal_processor.py — Token streaming delta calculation refactor and EPD flow override logic; verify previous_decoded_text initialization and delta computation across chunk boundaries.
  • engine.py — Mode-dependent initialization path selection (ENCODE → MultimodalEncoder vs. LLM); ensure proper cleanup in exception scenarios and type safety of union return types.

Poem

🐰 A disaggregated dance we now perform,
Encode, prefill, decode — each form,
Multimodal whispers threaded through,
Embeddings and prompts find their true
Path to harmony, pixel by token anew! ✨

Pre-merge checks

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 53.33%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed: The title clearly and concisely summarizes the main change: adding standalone encoder support to the dynamo TRT-LLM backend.
  • Description check — ✅ Passed: The description includes all required sections from the template (Overview, Details, Where should the reviewer start) with comprehensive and specific information about the changes.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/src/dynamo/trtllm/request_handlers/handlers.py (1)

90-94: Unreachable code after return statement.

Lines 90-94 are unreachable because line 85 returns unconditionally after the connector path. This dead code should be removed.

             yield response
         return
-    else:
-        logging.error("encode handler: no Dynamo NIXL connector found")
-        raise RuntimeError("encode handler: no Dynamo NIXL connector found")
-
-    if not request.get("streaming", False):
-        yield request
-        return
-
-    yield request
+    # No connector available
+    logging.error("encode handler: no Dynamo NIXL connector found")
+    raise RuntimeError("encode handler: no Dynamo NIXL connector found")
♻️ Duplicate comments (2)
components/src/dynamo/trtllm/engine.py (1)

44-56: Conditional initialization for ENCODE mode is correct.

The MultimodalEncoder initialization properly handles the ENCODE disaggregation mode. Note that engine_args.pop() mutates the dictionary, which could cause issues if engine_args is used elsewhere after initialization. Since past reviews have already discussed parameter forwarding to MultimodalEncoder, I'll defer to those discussions.

components/src/dynamo/trtllm/encode_helper.py (1)

316-318: Add comment explaining add_special_tokens=False.

Per previous review feedback, please add a comment explaining why special tokens are disabled during tokenization.

🧹 Nitpick comments (7)
examples/backends/trtllm/launch/epd_multimodal.sh (2)

21-28: Consider debugging-friendly cleanup for troubleshooting.

The cleanup trap unconditionally terminates all background processes on EXIT/INT/TERM. Based on learnings from similar launch infrastructure (PR 1730), keeping background processes alive after script exit can enable users to manually connect and debug issues without restarting everything. However, since this script launches ephemeral worker processes (not persistent infrastructure services), the current unconditional cleanup is simpler and may be appropriate for this use case.

Evaluate whether your operational workflows would benefit from selective cleanup or keeping workers alive for debugging attached nodes.


9-11: Consider validating engine configuration paths before launching workers.

The script references engine config files that may not exist (e.g., llava-v1.6-mistral-7b-hf/encode.yaml). If files are missing, workers will fail with cryptic errors. A simple validation check at the top of the script would surface the problem sooner.

Example validation (optional):

for config_var in ENCODE_ENGINE_ARGS PREFILL_ENGINE_ARGS DECODE_ENGINE_ARGS; do
  config_path="${!config_var}"
  if [[ ! -f "$config_path" ]]; then
    echo "Error: Config file not found: $config_path" >&2
    exit 1
  fi
done
components/src/dynamo/trtllm/multimodal_processor.py (2)

162-162: Instance attribute should be initialized in __init__.

previous_decoded_text is set as an instance attribute here but isn't declared in __init__. This works but is unconventional and could cause issues if create_response_chunk is called before process_openai_request.

Consider initializing in __init__:

 def __init__(
     self,
     model_type: str,
     model_dir: str,
     max_file_size_mb: int,
     tokenizer: Optional[TokenizerProtocol] = None,
     allowed_local_media_path: str = "",
 ):
     self.model_type = model_type
     self.model_dir = model_dir
     self.tokenizer = tokenizer
     self.modality = ""
     self.allowed_local_media_path = allowed_local_media_path
     self.max_file_size_mb = max_file_size_mb
     self.max_file_size_bytes = max_file_size_mb * 1024 * 1024
+    self.previous_decoded_text = ""

180-184: Simplify key check with dict.get().

Per static analysis hint (RUF019), the key check before dictionary access can be simplified.

-            if "_epd_prompt_token_ids" in request and request["_epd_prompt_token_ids"]:
-                result["prompt_token_ids"] = request["_epd_prompt_token_ids"]
+            prompt_token_ids = request.get("_epd_prompt_token_ids")
+            if prompt_token_ids:
+                result["prompt_token_ids"] = prompt_token_ids
components/src/dynamo/trtllm/encode_helper.py (1)

278-278: Potential blocking call in async context.

engine.llm.generate(inputs) appears to be a synchronous call wrapped in list(). In an async handler, this could block the event loop. Consider whether this should be run in an executor or if TRTLLM provides an async variant.

If blocking is confirmed, consider:

import asyncio
encoder_outputs = await asyncio.get_event_loop().run_in_executor(
    None, lambda: list(engine.llm.generate(inputs))
)
components/src/dynamo/trtllm/request_handlers/handlers.py (2)

105-113: Consider adding error handling for generator exhaustion.

The method breaks after the first response, but if the generator yields no items, encode_response is never assigned, so the subsequent check would raise UnboundLocalError rather than the intended RuntimeError. Initialize it to None first, or consider async for ... else or anext() for clarity.

     async def remote_encode_full_epd(self, request: dict):
-        async for res in await self.encode_client.round_robin(request):
-            encode_response = res.data()
-            break
-
-        if not encode_response:
-            raise RuntimeError("Did not receive a response from the encode worker.")
-
-        return encode_response
+        encode_response = None
+        async for res in await self.encode_client.round_robin(request):
+            encode_response = res.data()
+            break
+
+        if not encode_response:
+            raise RuntimeError("Did not receive a response from the encode worker.")
+
+        return encode_response

Alternatively, consider extracting the round-robin single-response pattern to a shared helper since remote_encode_with_nixl uses the same pattern.


158-185: EPD flow correctly reconstructs DisaggregatedParams and propagates metadata.

The logic correctly:

  1. Calls remote_encode_full_epd for image URLs
  2. Reconstructs DisaggregatedParams from the dict response
  3. Sets request_type to "context_only" for prefill phase
  4. Stores _epd_processed_prompt and _epd_prompt_token_ids in the request for downstream use

One minor improvement per static analysis: the nested key check on lines 179-181 can be simplified.

-                            if (
-                                "prompt_token_ids" in encode_response
-                                and encode_response["prompt_token_ids"]
-                            ):
+                            prompt_token_ids = encode_response.get("prompt_token_ids")
+                            if prompt_token_ids:
                                 request["_epd_prompt_token_ids"] = encode_response[
                                     "prompt_token_ids"
                                 ]
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d9fc85 and 1f152c0.

📒 Files selected for processing (13)
  • components/src/dynamo/trtllm/constants.py (1 hunks)
  • components/src/dynamo/trtllm/encode_helper.py (3 hunks)
  • components/src/dynamo/trtllm/engine.py (5 hunks)
  • components/src/dynamo/trtllm/main.py (1 hunks)
  • components/src/dynamo/trtllm/multimodal_processor.py (2 hunks)
  • components/src/dynamo/trtllm/request_handlers/handler_base.py (6 hunks)
  • components/src/dynamo/trtllm/request_handlers/handlers.py (4 hunks)
  • components/src/dynamo/trtllm/utils/disagg_utils.py (2 hunks)
  • examples/backends/trtllm/engine_configs/llava-v1.6-mistral-7b-hf/decode.yaml (1 hunks)
  • examples/backends/trtllm/engine_configs/llava-v1.6-mistral-7b-hf/encode.yaml (1 hunks)
  • examples/backends/trtllm/engine_configs/llava-v1.6-mistral-7b-hf/prefill.yaml (1 hunks)
  • examples/backends/trtllm/launch/epd_multimodal.sh (1 hunks)
  • examples/backends/trtllm/templates/llava_multimodal.jinja (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-09-16T19:47:30.312Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.

Applied to files:

  • components/src/dynamo/trtllm/multimodal_processor.py
📚 Learning: 2025-07-03T10:14:30.570Z
Learnt from: fsaady
Repo: ai-dynamo/dynamo PR: 1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

Applied to files:

  • examples/backends/trtllm/launch/epd_multimodal.sh
📚 Learning: 2025-06-05T01:10:51.865Z
Learnt from: tanmayv25
Repo: ai-dynamo/dynamo PR: 1391
File: examples/tensorrt_llm/common/base_engine.py:171-176
Timestamp: 2025-06-05T01:10:51.865Z
Learning: In examples/tensorrt_llm/common/base_engine.py, the _init_engine method is called only once during initialization, so direct mutation of the _default_sampling_params object during setup is safe and appropriate.

Applied to files:

  • components/src/dynamo/trtllm/engine.py
🧬 Code graph analysis (5)
components/src/dynamo/trtllm/main.py (1)
components/src/dynamo/trtllm/engine.py (1)
  • get_llm_engine (109-120)
components/src/dynamo/trtllm/multimodal_processor.py (1)
components/src/dynamo/trtllm/utils/disagg_utils.py (1)
  • decode (27-50)
examples/backends/trtllm/launch/epd_multimodal.sh (3)
components/src/dynamo/trtllm/engine.py (1)
  • cleanup (58-65)
components/src/dynamo/trtllm/utils/disagg_utils.py (2)
  • encode (53-77)
  • decode (27-50)
components/src/dynamo/trtllm/multimodal_processor.py (1)
  • decode (40-41)
components/src/dynamo/trtllm/engine.py (2)
components/src/dynamo/trtllm/constants.py (1)
  • DisaggregationMode (7-11)
components/src/dynamo/sglang/args.py (1)
  • DisaggregationMode (144-147)
components/src/dynamo/trtllm/request_handlers/handlers.py (2)
components/src/dynamo/trtllm/encode_helper.py (1)
  • process_encode_request (191-327)
components/src/dynamo/trtllm/multimodal_processor.py (1)
  • extract_prompt_and_media (132-154)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/4668/merge) by indrajit96.
examples/backends/trtllm/launch/epd_multimodal.sh

[error] 1-1: Check for executable shebang scripts: the file has a shebang but is not marked executable. Run 'chmod +x examples/backends/trtllm/launch/epd_multimodal.sh' (or 'git add --chmod=+x ...' on Windows if needed).

🪛 Ruff (0.14.7)
components/src/dynamo/trtllm/multimodal_processor.py

180-180: Unnecessary key check before dictionary access

Replace with dict.get

(RUF019)

components/src/dynamo/trtllm/request_handlers/handlers.py

111-111: Avoid specifying long messages outside the exception class

(TRY003)


180-181: Unnecessary key check before dictionary access

Replace with dict.get

(RUF019)

components/src/dynamo/trtllm/request_handlers/handler_base.py

191-191: Unused method argument: prefill_result

(ARG002)


457-457: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (30)
examples/backends/trtllm/engine_configs/llava-v1.6-mistral-7b-hf/decode.yaml (2)

22-22: Verify enable_chunked_prefill is appropriate for decode-only configuration.

In the EPD disaggregated architecture, the decode phase handles tokens post-prefill. The enable_chunked_prefill: true setting on line 22 is typically a prefill-phase optimization and may be unrelated or problematic in a decode-only context.

Please confirm:

  1. Whether this setting is actively used by the TensorRT-LLM decode engine or safely ignored.
  2. Whether this was intentionally copied from a full-model config and should be removed for decode-only mode.

15-29: Configuration looks reasonable for decode-only workload; verify EPD completeness.

The conservative memory allocation (free_gpu_memory_fraction: 0.30) and batch size (max_batch_size: 16) are appropriate for the decode phase of an EPD pipeline. However, verify that companion encode.yaml and prefill.yaml configurations exist elsewhere in the PR and are properly integrated into the disaggregated setup.

examples/backends/trtllm/engine_configs/llava-v1.6-mistral-7b-hf/prefill.yaml (3)

1-14: License header is present and correct.

The Apache 2.0 license header is properly formatted and applied.


26-31: KV cache and cache transceiver configuration appears appropriate for disaggregated prefill.

The free_gpu_memory_fraction: 0.30 is conservative, leaving substantial headroom for input embeddings and compute. The enable_block_reuse: false is reasonable for a disaggregated setup where workers are specialized and may not benefit from block reuse across different request flows.

Confirm that the 30% KV cache allocation (line 27) leaves sufficient GPU memory for the prefill worker's input embeddings, attention computation, and output buffering. If memory utilization runs high, consider adjusting the free_gpu_memory_fraction upward.


15-24: Configuration parameters and prefill-only constraints are appropriate.

The settings are conservatively tuned for a prefill-only disaggregated worker. The comment on line 23 correctly documents that the overlap scheduler is unsupported for prefill-only workers, aligning with the disaggregation mode framework.

All related configuration files (encode.yaml and decode.yaml) exist in the same directory with consistent memory and parallelism settings across phases. YAML syntax validation confirms all three files are well-formed. The prefill.yaml configuration is correctly referenced in launch scripts (e.g., examples/backends/trtllm/launch/epd_multimodal.sh). Parameter choices like max_num_tokens: 8192, max_batch_size: 16, and enable_chunked_prefill: true are sensible for prefill-only workers. Note that decode.yaml appropriately sets disable_overlap_scheduler: false for the decode phase, distinguishing it from the prefill phase.

examples/backends/trtllm/templates/llava_multimodal.jinja (1)

1-13: Template structure looks correct for LLaVA-Mistral format.

The template correctly handles the multimodal content structure with <image> placeholders for image_url types. A few observations:

  1. Unknown role types are silently ignored - consider adding an else clause with a warning or error for debugging.
  2. Verify that the BOS token (<s>) is added elsewhere in the pipeline, as Mistral-based models typically require it at the start.
components/src/dynamo/trtllm/utils/disagg_utils.py (2)

44-50: Backward-compatible multimodal parameter handling looks good.

Using getattr with None default provides backward compatibility with older TensorRT-LLM versions that may not have these fields.


64-77: Consistent encoding of multimodal parameters.

The encode path correctly mirrors the decode path for multimodal fields.

components/src/dynamo/trtllm/constants.py (1)

7-11: Clean enum definition for disaggregation modes.

The enum clearly defines the four supported modes. Note that the AGGREGATED value differs from sglang's equivalent ("prefill_and_decode" vs "agg" in components/src/dynamo/sglang/args.py). If cross-backend consistency is desired, consider aligning the string values.
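
For context, a sketch of the enum under discussion; the AGGREGATED value is quoted above, while the other string values are assumptions:

from enum import Enum

class DisaggregationMode(Enum):
    AGGREGATED = "prefill_and_decode"  # quoted above; differs from sglang's "agg"
    PREFILL = "prefill"                # assumed value
    DECODE = "decode"                  # assumed value
    ENCODE = "encode"                  # assumed value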

components/src/dynamo/trtllm/main.py (1)

290-290: Correctly passes disaggregation mode to engine initialization.

The change properly forwards config.disaggregation_mode to get_llm_engine, enabling mode-aware engine initialization.

examples/backends/trtllm/engine_configs/llava-v1.6-mistral-7b-hf/encode.yaml (1)

15-31: Reasonable encode worker configuration.

The configuration appropriately sets a lower free_gpu_memory_fraction (0.30) for the encode-only worker, which doesn't need as much KV cache memory. The comment explaining why disable_overlap_scheduler is required is helpful.

components/src/dynamo/trtllm/multimodal_processor.py (2)

173-184: EPD flow handling looks correct.

The logic properly handles the EPD case by using the encoder-provided processed prompt and token IDs.


229-242: Token streaming refactor is correct.

The incremental delta calculation using cached previous_decoded_text properly handles both first and subsequent chunks.
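
A minimal sketch of the incremental-delta pattern being reviewed; the class and method names are illustrative, not the PR's actual API:

class DeltaStreamer:
    """Yields only the newly decoded suffix for each streamed chunk."""

    def __init__(self) -> None:
        self.previous_decoded_text = ""

    def delta(self, decoded_text: str) -> str:
        # First chunk: previous text is empty, so the whole string is the delta.
        new_text = decoded_text[len(self.previous_decoded_text):]
        self.previous_decoded_text = decoded_text
        return new_text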

components/src/dynamo/trtllm/engine.py (2)

58-65: Improved cleanup with try/finally.

The cleanup method now properly guards against exceptions during shutdown and ensures _llm is set to None in all cases.


109-112: Updated context manager signature correctly passes disaggregation mode.

The async context manager now properly accepts and forwards the disaggregation mode to TensorRTLLMEngine.

components/src/dynamo/trtllm/encode_helper.py (3)

279-300: Good defensive error handling for encoder outputs.

The code properly handles cases where encoder outputs are empty or missing disaggregated params, with appropriate logging at different levels (error vs warning).


222-265: Embedding paths flow with NIXL looks correct.

The logic properly handles both dict and tensor formats, creates NIXL readable operations, and waits for completion before returning.


274-275: Not an issue: Intentional single-image processing design.

The code uses image_urls[0] as intended. The default_multimodal_input_loader function expects a single media item (not a list), as documented on line 305 with the comment "default_multimodal_input_loader returns a list, get the first element." Processing one image at a time through this flow is the designed behavior.

components/src/dynamo/trtllm/request_handlers/handlers.py (5)

6-6: LGTM!

The import of DisaggregatedParams from tensorrt_llm.llmapi is correctly added to support the EPD flow parameter handling.


60-69: LGTM!

Good defensive initialization pattern - setting attributes to None first, then conditionally populating them from multimodal_processor. This prevents AttributeError if multimodal_processor is not available.


75-85: LGTM!

The updated call to process_encode_request now passes all required parameters for both NIXL embedding transfer and full EPD encoding paths. The early return on line 85 ensures no further processing after the async generator completes.


145-149: Unused variable _ from extract_prompt_and_media.

The text_prompt (assigned to _) is extracted but not used in this method. If it's intentionally unused, that's fine, but verify this matches the expected EPD flow where text_prompt handling happens elsewhere.


186-199: LGTM!

The generate_locally call correctly passes both embeddings_tensor and ep_disaggregated_params, enabling the base handler to use either NIXL-transferred embeddings or EPD disaggregated params. The single-response assertion is appropriate for prefill mode.

components/src/dynamo/trtllm/request_handlers/handler_base.py (7)

35-35: LGTM!

Import of DisaggregationMode from constants centralizes the enum definition, improving maintainability.


144-185: Well-structured helper for decoding disaggregated params.

The method correctly:

  1. Extracts EPD metadata from the packed _epd_metadata field
  2. Uses pop() to remove metadata from params_dict before decoding
  3. Sets request_type to "generation_only" for decode phase
  4. Clears multimodal_embedding_handles to avoid TRT-LLM validation errors

The docstring clearly explains the return tuple.


253-328: Well-designed helper for encoding and packing disaggregated params.

This method addresses the previous review comment about extracting helper functions for readability. It properly:

  1. Chooses between output and input disaggregated params
  2. Preserves multimodal_embedding_handles and multimodal_hashes for EPD flow
  3. Packs EPD metadata into the params dict for transmission

The logic for preserving handles from input when output doesn't have them (lines 287-302) correctly handles the case where TRT-LLM doesn't propagate these fields through prefill.


365-410: Complex flow but well-structured with clear comments.

The initialization flow correctly:

  1. Normalizes request format (stop_conditions, sampling_options)
  2. Sets up disaggregated_params for PREFILL mode with proper request_type
  3. Decodes params from prefill_result for DECODE mode
  4. Makes ep_disaggregated_params available to multimodal processor

The comments explaining the Rust frontend's max_tokens handling are helpful for maintainability.


416-442: Mode-based processing correctly routes to appropriate handlers.

The branching logic properly separates DECODE mode (using _prepare_decode_input) from PREFILL/ENCODE modes (using multimodal_processor.process_openai_request). The fallback to request.get("token_ids") for text-only flows is appropriate.


463-482: Defensive handling of optional request fields.

Good defensive pattern - checking for presence of sampling_options and stop_conditions before iterating, and checking each value before setting. This handles cases where decode workers may receive minimal requests.
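
A sketch of the defensive pattern described above; the field names follow the request shape discussed in this PR, but the helper itself is illustrative:

def apply_optional_fields(sampling_params, request: dict) -> None:
    # Decode workers may receive minimal requests, so both the containers and
    # the individual values are checked before being applied.
    for section in ("sampling_options", "stop_conditions"):
        options = request.get(section) or {}
        for key, value in options.items():
            if value is not None:
                setattr(sampling_params, key, value)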


537-543: LGTM - Cleaner refactor using helper method.

The PREFILL mode handling is now cleaner with the extracted _encode_and_pack_disaggregated_params helper. The None check before assignment is correct.

Comment on lines +187 to +251
async def _prepare_decode_input(
    self,
    request: dict,
    epd_metadata: dict,
    prefill_result: Optional[dict],
    embeddings: Any,
    ep_disaggregated_params: Any,
) -> Optional[Any]:
    """
    Prepare input for DECODE mode processing.

    Handles EPD flow (with encoder) by extracting prompt and token IDs,
    or falls back to multimodal processor for other flows.

    Args:
        request: The request dictionary
        epd_metadata: EPD metadata extracted from prefill result
        prefill_result: Result from prefill worker
        embeddings: Multimodal embeddings (if any)
        ep_disaggregated_params: Disaggregated params from encoder/prefill

    Returns:
        Processed input ready for the engine, or None if not available
    """
    # Decode worker with generation_only mode
    # Pass the same inputs format as prefill
    # Check epd_metadata (packed by prefill), then prefill_result, then direct request
    epd_prompt = epd_metadata.get("_epd_processed_prompt")
    epd_token_ids = epd_metadata.get("_epd_prompt_token_ids")

    if epd_prompt:
        # In EPD generation-only mode (decode), pass the SAME input format as prefill
        # This matches TRT-LLM's test: llm_decode.generate(inputs, disaggregated_params=...)
        # The inputs dict provides prompt structure, disaggregated_params provide multimodal embeddings
        if epd_token_ids:
            prompt_token_ids = epd_token_ids

        processed_input = {
            "prompt": epd_prompt,
            "prompt_token_ids": prompt_token_ids,
        }

        # Remove ALL multimodal data from request to avoid TRT-LLM validation error
        # In generation-only mode, ALL multimodal data must be in disaggregated_params only
        mm_keys_to_remove = ["multi_modal_data", "image_data", "mm_data"]
        for key in mm_keys_to_remove:
            if key in request:
                request.pop(key)
                logging.debug(
                    f"DECODE: Removed {key} from request (already in disaggregated_params)"
                )
        return processed_input
    elif self.multimodal_processor:
        # Encode/Prefill worker: Process multimodal content normally
        # In EPD flow, multimodal_processor should be called in PREFILL/ENCODE modes only
        # DECODE mode should skip this and use EPD metadata from prefill
        processed_input = await self.multimodal_processor.process_openai_request(
            request, embeddings, ep_disaggregated_params
        )
        return processed_input
    else:
        logging.debug(
            "DECODE: No multimodal_processor found, using request token_ids"
        )
        return None
Contributor

⚠️ Potential issue | 🟡 Minor

Unused parameter prefill_result.

The static analysis correctly identifies that prefill_result is unused within this method. Since EPD metadata is already extracted and passed via epd_metadata, consider removing this parameter unless it's intended for future use.

     async def _prepare_decode_input(
         self,
         request: dict,
         epd_metadata: dict,
-        prefill_result: Optional[dict],
         embeddings: Any,
         ep_disaggregated_params: Any,
     ) -> Optional[Any]:

And update the call site at lines 417-423.


Comment on lines +217 to +227
if epd_prompt:
    # In EPD generation-only mode (decode), pass the SAME input format as prefill
    # This matches TRT-LLM's test: llm_decode.generate(inputs, disaggregated_params=...)
    # The inputs dict provides prompt structure, disaggregated_params provide multimodal embeddings
    if epd_token_ids:
        prompt_token_ids = epd_token_ids

    processed_input = {
        "prompt": epd_prompt,
        "prompt_token_ids": prompt_token_ids,
    }
Contributor

⚠️ Potential issue | 🟠 Major

Potential UnboundLocalError if epd_token_ids is falsy.

If epd_prompt is truthy but epd_token_ids is falsy, the code at line 226 references prompt_token_ids which may not be defined in this scope.

         if epd_prompt:
             # In EPD generation-only mode (decode), pass the SAME input format as prefill
             # This matches TRT-LLM's test: llm_decode.generate(inputs, disaggregated_params=...)
             # The inputs dict provides prompt structure, disaggregated_params provide multimodal embeddings
+            prompt_token_ids = None
             if epd_token_ids:
                 prompt_token_ids = epd_token_ids

             processed_input = {
                 "prompt": epd_prompt,
                 "prompt_token_ids": prompt_token_ids,
             }

@@ -0,0 +1,67 @@
#!/bin/bash
Contributor

⚠️ Potential issue | 🔴 Critical

Make the script executable to pass pre-merge validation.

The pipeline check flags that the file has a shebang but is not marked executable.

Apply the following to fix:

git add --chmod=+x examples/backends/trtllm/launch/epd_multimodal.sh

Alternatively, if using a standard Unix environment:

chmod +x examples/backends/trtllm/launch/epd_multimodal.sh

export PREFILL_ENGINE_ARGS=${PREFILL_ENGINE_ARGS:-"$DYNAMO_HOME/examples/backends/trtllm/engine_configs/llava-v1.6-mistral-7b-hf/prefill.yaml"}
export DECODE_ENGINE_ARGS=${DECODE_ENGINE_ARGS:-"$DYNAMO_HOME/examples/backends/trtllm/engine_configs/llava-v1.6-mistral-7b-hf/decode.yaml"}
export ENCODE_ENGINE_ARGS=${ENCODE_ENGINE_ARGS:-"$DYNAMO_HOME/examples/backends/trtllm/engine_configs/llava-v1.6-mistral-7b-hf/encode.yaml"}
export PREFILL_CUDA_VISIBLE_DEVICES=${PREFILL_CUDA_VISIBLE_DEVICES:-"0"}
@rmccorm4 (Contributor) commented Dec 2, 2025

Does the encode worker need a whole dedicated GPU, or can it fit on the same GPU as prefill? I ask because our CI (at least on the gitlab side + pytest markers) has 1- and 2-GPU tests set up, but I'm not sure about 3-GPU tests (we could use a 4-GPU runner if needed).

@rmccorm4 (Contributor) commented Dec 2, 2025

Similar to https://github.com/ai-dynamo/dynamo/pull/4668/files#r2582944082, can we enable this script as a CI test like we do for the other scripts?

ex:

"disaggregated_multimodal": TRTLLMConfig(
name="disaggregated_multimodal",
directory=trtllm_dir,
script_name="disagg_multimodal.sh",
marks=[pytest.mark.gpu_2, pytest.mark.trtllm, pytest.mark.multimodal],
model="Qwen/Qwen2-VL-7B-Instruct",
models_port=8000,
timeout=900,
delayed_start=60,
request_payloads=[multimodal_payload_default()],
),

Contributor

Are we using a custom template because the default LLaVA template is incompatible with dynamo's pre-processing? From my understanding, LLaVA models will expect image instead of image_url, which is what gets passed to the jinja template (more details: #4501).

# Handle image URLs (full E-PD flow with MultimodalEncoder)
elif image_urls:
if self.encode_client and self.connector:
encode_response = await self.remote_encode_full_epd(request)
@KrishnanPrash (Contributor) commented Dec 3, 2025

I'm relatively new to the EPD flow, so please feel free to correct me, but based on this call, does the execution path look like this: Frontend -> Prefill -> Encode -> Prefill -> Decode?

If so, I was wondering how feasible a flow like this would be: Frontend -> Encode -> Prefill -> Decode? Is this something we would want to do to save an extra network hop per request?

Contributor

@krishung5 any thoughts on this one?

Contributor Author

@KrishnanPrash yes, this makes sense from a performance POV.
The design is currently this way due to an initial prefill/decode-first routing limitation (which is now gone).
I feel we can take this up in a new PR, as it would need some rework in the prefill worker.
We can use this PR for functionality and a new one for optimization.

@KrishnanPrash (Contributor) commented Dec 8, 2025

Also, just wanted to get your thoughts on this:
From my understanding, the current EPD flow looks like this:

  • Frontend -> Prefill Worker
  • Within the prefill worker we make a blocking call to the encode worker (await self.remote_encode_full_epd(request))
  • Encode worker finishes work and yields to the prefill worker

At a request level, we are currently blocking at the encode worker before calling the prefill worker, which makes sense. But if we need to implement a feature like batching in the future for this flow, would the prefill worker be blocked until all the requests are done on the encoder side?
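
To make the concern concrete, one hypothetical way to overlap encoder calls rather than serializing them (remote_encode_full_epd is the PR's method; the gather-based batching is purely illustrative):

import asyncio

async def encode_batch(handler, requests: list) -> list:
    # Fan out the encode calls so the caller is not blocked on each request
    # completing serially on the encoder side.
    return await asyncio.gather(
        *(handler.remote_encode_full_epd(req) for req in requests)
    )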

engine=None,
):
"""
Process embedding request by loading embeddings and creating NIXL readable operation.
Contributor

nit: comments for this function need to be updated.

)
await readable_op.wait_for_completion()
logging.debug("EncodeHelper completed readable operation.")
elif image_urls and text_prompt:
Contributor

Do we need to add an else case to handle the situation where image_urls, text_prompt, or both are empty?
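
One possible shape for that guard, as a sketch; the variable names come from the surrounding diff, and the error payload format is an assumption:

def validate_encode_inputs(image_urls, text_prompt):
    # Returns None when inputs are usable, otherwise an error payload naming
    # whichever of the two inputs is missing.
    missing = [
        name
        for name, value in (("image_urls", image_urls), ("text_prompt", text_prompt))
        if not value
    ]
    if missing:
        return {"error": f"encode request missing: {', '.join(missing)}"}
    return None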
