
Conversation

@yilin-void (Collaborator) commented Jul 17, 2025

DeepEP diff: https://github.com/deepseek-ai/DeepEP/compare/eb3f072664251c05074c3ecc3c3f5dad179c29a9...7b15af835942675df041eca2dcb9930b880287e1?expand=1

I fixed the address of dispatch_rdma_recv_count_buffer so that it no longer needs to be cleaned after every change in hidden_size or token_num. This eliminates the need to call the low_latency_buffer twice (once before and once after the LL dispatch). Additionally, we can use all_rank_max_num_tokens instead of self.deep_ep_max_num_tokens for dispatch and combine, which avoids the copy overhead.
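
For illustration, a minimal sketch of the simplified caller-side path this enables, assuming the names used in the review diffs below (deep_ep_buffer, deep_ep_topk_idx, num_slots); the real implementation lives in fused_moe_wide_ep.py and may differ in detail:

def _ll_dispatch(self, x, deep_ep_topk_idx, all_rank_max_num_tokens):
    # The actual per-rank maximum must still fit the pre-allocated
    # low-latency buffer that was sized with deep_ep_max_num_tokens.
    assert all_rank_max_num_tokens <= self.deep_ep_max_num_tokens
    # Dispatch with the actual token count; since the recv-count buffer
    # address is now fixed, no second buffer call or padding copy up to
    # the configured maximum is needed.
    x, recv_expert_count, deep_ep_handle = self.deep_ep_buffer.low_latency_dispatch(
        x, deep_ep_topk_idx, all_rank_max_num_tokens, self.num_slots)
    return x, recv_expert_count, deep_ep_handle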

Summary by CodeRabbit

  • Bug Fixes

    • Improved validation logic for token dispatching to allow more flexibility in buffer sizes.
    • Enhanced consistency checks to ensure token counts do not exceed configured limits, reducing potential errors during processing.
  • Refactor

    • Simplified tensor handling by removing redundant buffer cleaning and reshaping steps.
    • Unified dispatch and combination processes to use the actual maximum token count per rank for improved reliability.
  • Chores

    • Updated the version of a key dependency for improved compatibility.

coderabbitai bot (Contributor) commented Jul 17, 2025

Walkthrough

The updates modify the DeepEP dependency version in the build configuration, replace the num_max_dispatch_tokens_per_rank variable with num_experts in the low-latency buffer class, simplify its validation logic, and add assertions to enforce token-count constraints in the fused MoE module. Redundant buffer cleaning, tensor truncation, and adapter logic are removed to streamline the dispatch and combine flows.
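
A hedged sketch of the buffer-side change described here (illustrative only; the actual class lives in deep_ep_utils.py and its signatures may differ):

class VariableLengthLowLatencyBuffer:
    """Illustrative sketch only; not the real deep_ep_utils.py class."""

    def __init__(self, max_num_tokens: int, num_experts: int):
        self.max_num_tokens = max_num_tokens  # configured ceiling for the buffer
        self.num_experts = num_experts        # stored instead of num_max_dispatch_tokens_per_rank

    def low_latency_dispatch(self, hidden_states, topk_idx,
                             num_max_dispatch_tokens_per_rank, num_experts):
        # Simplified validation: the per-call token count only has to fit
        # the pre-allocated buffer rather than match it exactly.
        assert num_max_dispatch_tokens_per_rank <= self.max_num_tokens
        # New consistency check on the expert count.
        assert num_experts == self.num_experts
        # ... hand off to the underlying DeepEP buffer here ...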

Changes

  • cpp/tensorrt_llm/deep_ep/CMakeLists.txt: Updated DeepEP dependency commit hash to a newer version.
  • tensorrt_llm/_torch/modules/fused_moe/deep_ep_utils.py: Replaced num_max_dispatch_tokens_per_rank with num_experts; added assertions for num_experts consistency; simplified dispatch validation logic.
  • tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py: Added assertions for token counts, unified dispatch buffer sizing using actual max tokens, removed redundant buffer cleaning and tensor truncation, and eliminated adapter logic in the combine step.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant FusedMoEWideEP
    participant VariableLengthLowLatencyBuffer
    participant DeepEP

    User->>FusedMoEWideEP: Call forward(...)
    FusedMoEWideEP->>VariableLengthLowLatencyBuffer: low_latency_dispatch(all_rank_max_num_tokens)
    VariableLengthLowLatencyBuffer->>VariableLengthLowLatencyBuffer: Assert num_experts consistency
    VariableLengthLowLatencyBuffer->>DeepEP: Dispatch tokens
    DeepEP-->>VariableLengthLowLatencyBuffer: Return results
    FusedMoEWideEP->>FusedMoEWideEP: Combine results (direct call, no adapter)
    FusedMoEWideEP-->>User: Return output

Suggested reviewers

  • yizhang-nv
  • yuantailing

Poem

A hop and a skip through the code today,
Token counts checked in a rabbit’s own way.
Buffers are tidied, old checks swept aside,
With DeepEP now fresher, we leap with more pride.
No more padding or fuss—just swift, happy hops,
As the code garden blooms with these latest crop drops!
🐇✨



coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (2)

465-469: Good safety check and more accurate token count usage.

The assertion ensures we don't exceed the configured buffer size, and using all_rank_max_num_tokens instead of the fixed maximum aligns with the actual dispatch requirements.

However, line 467 exceeds the 120-character limit.

-                    x, recv_expert_count, deep_ep_handle = \
-                        self.deep_ep_buffer.low_latency_dispatch(x, deep_ep_topk_idx, all_rank_max_num_tokens, self.num_slots)
+                    x, recv_expert_count, deep_ep_handle = self.deep_ep_buffer.low_latency_dispatch(
+                        x, deep_ep_topk_idx, all_rank_max_num_tokens, self.num_slots)

623-626: Consistent safety check for postquant dispatch.

Good to see the same assertion and token count handling applied here for consistency.

Line 626 exceeds the 120-character limit.

-                fp4_packed_tensor, recv_expert_count, deep_ep_handle = \
-                    self.deep_ep_buffer.low_latency_dispatch(fp4_packed_tensor, deep_ep_topk_idx, all_rank_max_num_tokens, self.num_slots)
+                fp4_packed_tensor, recv_expert_count, deep_ep_handle = self.deep_ep_buffer.low_latency_dispatch(
+                    fp4_packed_tensor, deep_ep_topk_idx, all_rank_max_num_tokens, self.num_slots)
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a718486 and ee78914.

📒 Files selected for processing (3)
  • cpp/tensorrt_llm/deep_ep/CMakeLists.txt (1 hunks)
  • tensorrt_llm/_torch/modules/fused_moe/deep_ep_utils.py (1 hunks)
  • tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
tensorrt_llm/_torch/modules/fused_moe/deep_ep_utils.py (1)
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.374Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py

467-467: Line too long (126 > 120)

(E501)


626-626: Line too long (138 > 120)

(E501)

🔇 Additional comments (2)
cpp/tensorrt_llm/deep_ep/CMakeLists.txt (1)

1-1: DeepEP dependency updated to newer commit.

The update aligns with the refined token handling changes in the Python modules.

tensorrt_llm/_torch/modules/fused_moe/deep_ep_utils.py (1)

142-142: Good improvement to validation logic.

Relaxing the check from strict equality to less-than-or-equal allows more flexible dispatch token counts while still preventing buffer overflow. This change enables dispatching with variable token counts up to the configured maximum.
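
For instance, a hypothetical standalone version of the relaxed guard (variable names are illustrative, not the exact ones in deep_ep_utils.py):

def check_dispatch_size(num_tokens_this_call: int, buffer_max_tokens: int) -> None:
    # Previously the count had to equal the configured buffer size, which
    # forced callers to pad or copy up to the maximum. Now anything that
    # fits the pre-allocated buffer is accepted.
    assert num_tokens_this_call <= buffer_max_tokens, (
        f"dispatch token count {num_tokens_this_call} exceeds buffer "
        f"capacity {buffer_max_tokens}")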

@yilin-void yilin-void requested a review from yuantailing July 17, 2025 10:40
@yilin-void yilin-void marked this pull request as ready for review July 17, 2025 10:40
@yilin-void yilin-void requested a review from a team as a code owner July 17, 2025 10:40
@yilin-void yilin-void requested review from liji-nv and yizhang-nv July 17, 2025 10:40
@yilin-void (Collaborator, Author): /bot run

@yilin-void yilin-void requested review from hyukn and removed request for liji-nv and yizhang-nv July 17, 2025 10:41
@tensorrt-cicd (Collaborator): PR_Github #12199 [ run ] triggered by Bot

@yilin-void (Collaborator, Author): /bot run

coderabbitai bot (Contributor) left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee78914 and 55fdcec.

📒 Files selected for processing (3)
  • cpp/tensorrt_llm/deep_ep/CMakeLists.txt (1 hunks)
  • tensorrt_llm/_torch/modules/fused_moe/deep_ep_utils.py (1 hunks)
  • tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • cpp/tensorrt_llm/deep_ep/CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (1)
  • tensorrt_llm/_torch/modules/fused_moe/deep_ep_utils.py
🧰 Additional context used
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py

467-467: Line too long (126 > 120)

(E501)


626-626: Line too long (138 > 120)

(E501)

🔇 Additional comments (2)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (2)

465-468: LGTM: Assertion and dispatch parameter update look correct.

The assertion ensures all_rank_max_num_tokens doesn't exceed the configured limit before dispatch, and updating the dispatch call to use the actual maximum tokens per rank instead of the hardcoded limit is a good improvement for handling variable token counts.


711-717: low_latency_combine signature & reshape usage verified

The low_latency_combine method in deep_ep_utils.py is defined as:

def low_latency_combine(self,
                        hidden_states: torch.Tensor,
                        topk_idx: torch.Tensor,
                        topk_weights: torch.Tensor,
                        handle: Tuple):
    …

This matches the call in tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (lines 711–717):

num_tokens_per_expert_for_fused_moe = (
    self.mapping.moe_ep_size * all_rank_max_num_tokens
)
final_hidden_states = final_hidden_states.view(
    self.expert_size_per_partition,
    num_tokens_per_expert_for_fused_moe,
    self.hidden_size,
)
final_hidden_states = self.deep_ep_buffer.low_latency_combine(
    final_hidden_states,
    deep_ep_topk_idx,
    deep_ep_topk_weights,
    deep_ep_handle,
)

Reshaping with all_rank_max_num_tokens produces the expected
(expert_size_per_partition, num_tokens_per_expert, hidden_size) tensor,
and all arguments align with the combine implementation. No changes needed.
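
As a quick sanity check of the shape arithmetic, with purely hypothetical sizes (none of these numbers come from the PR):

import torch

moe_ep_size = 4                   # hypothetical EP group size
all_rank_max_num_tokens = 128     # hypothetical per-rank maximum this step
expert_size_per_partition = 8     # hypothetical local expert count
hidden_size = 1024                # hypothetical hidden dimension

num_tokens_per_expert = moe_ep_size * all_rank_max_num_tokens  # 512

flat = torch.empty(expert_size_per_partition * num_tokens_per_expert, hidden_size)
shaped = flat.view(expert_size_per_partition, num_tokens_per_expert, hidden_size)
assert shaped.shape == (8, 512, 1024)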

@tensorrt-cicd (Collaborator): PR_Github #12200 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator): PR_Github #12199 [ run ] completed with state ABORTED

@tensorrt-cicd (Collaborator): PR_Github #12200 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9062 completed with status: 'SUCCESS'
Pipeline passed with automatically retried tests. Check the rerun report for details.

@yilin-void (Collaborator, Author): /bot run

@tensorrt-cicd (Collaborator): PR_Github #12268 [ run ] triggered by Bot

@yilin-void (Collaborator, Author): /bot run

@tensorrt-cicd (Collaborator): PR_Github #12275 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator): PR_Github #12268 [ run ] completed with state ABORTED

@yilin-void (Collaborator, Author): /bot run

coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (1)

625-628: Consistent implementation with remaining formatting issue.

The assertion and dispatch call update are correctly implemented, consistent with the earlier occurrence in the file. However, the line length violation on line 628 still needs to be addressed.

Apply this formatting fix:

-                    self.deep_ep_buffer.low_latency_dispatch(fp4_packed_tensor, deep_ep_topk_idx, all_rank_max_num_tokens, self.num_slots)
+                    self.deep_ep_buffer.low_latency_dispatch(
+                        fp4_packed_tensor, deep_ep_topk_idx, all_rank_max_num_tokens, self.num_slots)
🧹 Nitpick comments (1)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (1)

466-468: Logic changes look good, but fix line length violation.

The assertion correctly validates the token count constraint and the dispatch call update aligns with the PR objectives to use actual token counts instead of hardcoded maximums.

However, there's a line length violation on line 468. Apply this formatting fix:

-                    self.deep_ep_buffer.low_latency_dispatch(x, deep_ep_topk_idx, all_rank_max_num_tokens, self.num_slots)
+                    self.deep_ep_buffer.low_latency_dispatch(
+                        x, deep_ep_topk_idx, all_rank_max_num_tokens, self.num_slots)
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f69e8fc and 3e4a14f.

📒 Files selected for processing (3)
  • cpp/tensorrt_llm/deep_ep/CMakeLists.txt (1 hunks)
  • tensorrt_llm/_torch/modules/fused_moe/deep_ep_utils.py (3 hunks)
  • tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • cpp/tensorrt_llm/deep_ep/CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (1)
  • tensorrt_llm/_torch/modules/fused_moe/deep_ep_utils.py
🧰 Additional context used
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py

468-468: Line too long (126 > 120)

(E501)


628-628: Line too long (138 > 120)

(E501)

@tensorrt-cicd (Collaborator): PR_Github #12299 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator): PR_Github #12275 [ run ] completed with state ABORTED

@yilin-void (Collaborator, Author): /bot run

coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (1)

625-628: Fix line length violation - duplicate formatting issue.

This segment has the same line length violation as the previous dispatch call. The assertion and parameter logic are correct, but the formatting needs to be consistent.

Apply this diff to fix the line length violation:

-                assert all_rank_max_num_tokens <= self.deep_ep_max_num_tokens
-                fp4_packed_tensor, recv_expert_count, deep_ep_handle = \
-                    self.deep_ep_buffer.low_latency_dispatch(fp4_packed_tensor, deep_ep_topk_idx, all_rank_max_num_tokens, self.num_slots)
+                assert all_rank_max_num_tokens <= self.deep_ep_max_num_tokens
+                fp4_packed_tensor, recv_expert_count, deep_ep_handle = self.deep_ep_buffer.low_latency_dispatch(
+                    fp4_packed_tensor, deep_ep_topk_idx, all_rank_max_num_tokens, self.num_slots
+                )
🧹 Nitpick comments (1)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (1)

466-470: Fix line length violation while maintaining correct logic.

The assertion and dispatch parameter changes are correct - using all_rank_max_num_tokens instead of the hardcoded maximum aligns with the PR's goal of supporting variable token counts. However, there's a formatting issue that needs to be addressed.

Apply this diff to fix the line length violation:

-                assert all_rank_max_num_tokens <= self.deep_ep_max_num_tokens
-                x, recv_expert_count, deep_ep_handle = \
-                    self.deep_ep_buffer.low_latency_dispatch(x, deep_ep_topk_idx, all_rank_max_num_tokens, self.num_slots)
+                assert all_rank_max_num_tokens <= self.deep_ep_max_num_tokens
+                x, recv_expert_count, deep_ep_handle = self.deep_ep_buffer.low_latency_dispatch(
+                    x, deep_ep_topk_idx, all_rank_max_num_tokens, self.num_slots
+                )
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3e4a14f and f32c0ec.

📒 Files selected for processing (3)
  • cpp/tensorrt_llm/deep_ep/CMakeLists.txt (1 hunks)
  • tensorrt_llm/_torch/modules/fused_moe/deep_ep_utils.py (3 hunks)
  • tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • cpp/tensorrt_llm/deep_ep/CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (1)
  • tensorrt_llm/_torch/modules/fused_moe/deep_ep_utils.py
🧰 Additional context used
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py

468-468: Line too long (126 > 120)

(E501)


628-628: Line too long (138 > 120)

(E501)

@tensorrt-cicd (Collaborator): PR_Github #12316 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator): PR_Github #12299 [ run ] completed with state ABORTED

@tensorrt-cicd (Collaborator): PR_Github #12316 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9150 completed with status: 'FAILURE'

Signed-off-by: Yilin Zhang <[email protected]>
@yilin-void (Collaborator, Author): /bot run

coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (1)

618-620: Line length violation persists - reformat the dispatch call.

The assertion and logic changes are correct, but the line length violation on line 620 needs to be fixed by reformatting the dispatch call arguments.

This appears to be the same formatting issue identified in previous reviews. Please apply the formatting fix:

-                assert all_rank_max_num_tokens <= self.deep_ep_max_num_tokens
-                fp4_packed_tensor, recv_expert_count, deep_ep_handle = \
-                    self.deep_ep_buffer.low_latency_dispatch(fp4_packed_tensor, deep_ep_topk_idx, all_rank_max_num_tokens, self.num_slots)
+                assert all_rank_max_num_tokens <= self.deep_ep_max_num_tokens
+                fp4_packed_tensor, recv_expert_count, deep_ep_handle = self.deep_ep_buffer.low_latency_dispatch(
+                    fp4_packed_tensor,
+                    deep_ep_topk_idx,
+                    all_rank_max_num_tokens,
+                    self.num_slots,
+                )
🧹 Nitpick comments (1)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (1)

466-469: Fix line length violation and verify assertion logic.

The assertion correctly validates token count limits before dispatch, and using all_rank_max_num_tokens aligns with the PR objective of supporting variable token numbers. However, there's a line length violation that needs to be addressed.

Apply this diff to fix the line length violation:

-                assert all_rank_max_num_tokens <= self.deep_ep_max_num_tokens
-                x, recv_expert_count, deep_ep_handle = \
-                    self.deep_ep_buffer.low_latency_dispatch(x, deep_ep_topk_idx, all_rank_max_num_tokens, self.num_slots)
+                assert all_rank_max_num_tokens <= self.deep_ep_max_num_tokens
+                x, recv_expert_count, deep_ep_handle = self.deep_ep_buffer.low_latency_dispatch(
+                    x, deep_ep_topk_idx, all_rank_max_num_tokens, self.num_slots
+                )
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f32c0ec and d578007.

📒 Files selected for processing (3)
  • cpp/tensorrt_llm/deep_ep/CMakeLists.txt (1 hunks)
  • tensorrt_llm/_torch/modules/fused_moe/deep_ep_utils.py (3 hunks)
  • tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • cpp/tensorrt_llm/deep_ep/CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (1)
  • tensorrt_llm/_torch/modules/fused_moe/deep_ep_utils.py
🧰 Additional context used
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py

468-468: Line too long (126 > 120)

(E501)


620-620: Line too long (138 > 120)

(E501)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (1)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (1)

466-620: Implementation correctly supports variable token counts with proper validation.

The changes successfully implement the PR objective of supporting variable hidden sizes and token numbers by:

  1. Adding safety assertions - Both dispatch calls now validate that all_rank_max_num_tokens doesn't exceed deep_ep_max_num_tokens before dispatching
  2. Using dynamic token counts - Replacing hardcoded deep_ep_max_num_tokens with all_rank_max_num_tokens for actual dispatch operations
  3. Maintaining consistency - Both DeepEPLowLatency dispatch paths follow the same pattern

This aligns with the PR summary's goal of avoiding overhead from unnecessary data copying and supporting variable token numbers.

The logic changes are sound and improve the flexibility of the DeepEP implementation as intended.

@tensorrt-cicd (Collaborator): PR_Github #12366 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator): PR_Github #12366 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9187 completed with status: 'SUCCESS'

@yilin-void yilin-void merged commit 118307c into NVIDIA:main Jul 20, 2025
3 checks passed
reasonsolo pushed a commit to reasonsolo/TensorRT-LLM that referenced this pull request Jul 21, 2025
timlee0212 pushed a commit to timlee0212/TensorRT-LLM that referenced this pull request Jul 21, 2025
@coderabbitai coderabbitai bot mentioned this pull request Jul 23, 2025
NVShreyas pushed a commit to NVShreyas/TensorRT-LLM that referenced this pull request Jul 28, 2025
Ransiki pushed a commit to Ransiki/TensorRT-LLM that referenced this pull request Jul 29, 2025
@yilin-void yilin-void deleted the dev/deep_ep branch September 28, 2025 03:27
