add corresponding qwen3-32b-fp8 aic based disagg performance tuning md guide #4655

davilu-nvidia · 2025-11-27T15:26:04Z

Overview:

Add the AIC-based disaggregated serving (disagg) performance tuning guide for the qwen3-32b-fp8 recipe.

Details:

Adds the related Markdown guide, which works with the qwen3-32b-fp8 recipe.

Summary by CodeRabbit

Documentation
- Added comprehensive guide for advanced performance tuning in disaggregated serving deployments, including optimization methodology, QPS matching workflows, automation strategies, practical case studies, and step-by-step fine-tuning guidance.


copy-pr-bot · 2025-11-27T15:26:08Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2025-11-27T15:28:43Z

Walkthrough

This PR adds a new Markdown document to the qwen3-32b-fp8 recipe that details performance tuning for disaggregated serving. It covers the QPS matching methodology (worker parallelism and batch sizing), the AIC-driven automated deployment workflow, manual fine-tuning guidance, and deployment case studies that report performance gains.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Qwen3 Disaggregation Performance Tuning Documentation `recipes/qwen3-32b-fp8/aic_based_disagg_perf_tuning.md` | New technical guide covering advanced disaggregation performance tuning, including QPS matching methodology, worker configuration workflows, AIC-driven automation deployment, manual fine-tuning steps, and case studies with performance metrics. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

- Technical accuracy of QPS matching methodology and worker parallelism calculations
- Completeness and clarity of the AIC-driven automation workflow steps
- Validity of performance tuning recommendations and deployment guidance
- Consistency with referenced external deployment guides

Poem

🐰 A recipe for speed, so crisp and neat,
With AIC tuning, the performance's sweet,
Disaggregation dances, workers align,
QPS matched up—the stars now shine! ✨

Pre-merge checks

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: adding a qwen3-32b-fp8 AIC-based disaggregation performance tuning markdown guide, which matches the file added. |
| Description check | ✅ Passed | The description covers the Overview and Details sections from the template but is missing the 'Where should the reviewer start?' and 'Related Issues' sections. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

recipes/qwen3-32b-fp8/aic_based_disagg_perf_tuning.md (1)
25-25: Minor wording suggestion: simplify "in view of".

Line 25 uses "in view of" which is somewhat wordy. Consider a shorter alternative like "based on" or simply restructuring the sentence for greater clarity.

Example:
-### __Match__ the N prefill worker candidates with M decode worker candidates in view of __sequence throughput seq/s__
+### __Match__ the N prefill worker candidates with M decode worker candidates by __sequence throughput seq/s__

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9d7d95 and 5a5c8a5.

⛔ Files ignored due to path filters (7)

recipes/qwen3-32b-fp8/images/agg_allignment.png is excluded by !**/*.png
recipes/qwen3-32b-fp8/images/challenges_in_disagg.png is excluded by !**/*.png
recipes/qwen3-32b-fp8/images/disagg_aic_allignment.png is excluded by !**/*.png
recipes/qwen3-32b-fp8/images/disagg_allignment.png is excluded by !**/*.png
recipes/qwen3-32b-fp8/images/find_worker_SLA.png is excluded by !**/*.png
recipes/qwen3-32b-fp8/images/local_deploy_k8s.png is excluded by !**/*.png
recipes/qwen3-32b-fp8/images/qps_match.png is excluded by !**/*.png

📒 Files selected for processing (1)

recipes/qwen3-32b-fp8/aic_based_disagg_perf_tuning.md (1 hunks)

🧰 Additional context used

🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/4655/merge) by davilu-nvidia.

recipes/qwen3-32b-fp8/aic_based_disagg_perf_tuning.md

[error] 1-1: Trailing whitespace found. pre-commit hook trailing-whitespace failed and modified the file in place. Fix the trailing spaces in this file.

🪛 LanguageTool

recipes/qwen3-32b-fp8/aic_based_disagg_perf_tuning.md

[grammar] ~16-~16: Ensure spelling is correct
Context: ...n: 0 auto;"> ## Disagg pd QPS Matching Methology ### We can firstly __find a worker that meet...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[style] ~17-~17: Consider using “who” when you are referring to a person instead of an object.
Context: ...ogy ### We can firstly find a worker that meets SLA and under constraints - En...

(THAT_WHO)

[style] ~25-~25: ‘in view of’ might be wordy. Consider a shorter alternative.
Context: ...didates with M decode worker candidates in view of sequence throughput seq/s - Seq/s ...

(EN_WORDINESS_PREMIUM_IN_VIEW_OF)

[grammar] ~43-~43: Use a hyphen to join words.
Context: ... >= 60 - disable prefix caching ### AIC based full automation deployment [AIC a...

(QB_NEW_EN_HYPHEN)

[grammar] ~51-~51: Ensure spelling is correct
Context: ...- AIC projection and ai-perf actual run allignment What's the problem here: this tps/user -...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~56-~56: Use a hyphen to join words.
Context: ...the most important SLAs ### Manual fine tuning based on AIC suggestions __agg a...

(QB_NEW_EN_HYPHEN)

[grammar] ~72-~72: Ensure spelling is correct
Context: ...t (60), which means we need more decode wokers and to tune decode max_batch_size (us...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~74-~74: Use a hyphen to join words.
Context: ...o enhance decoding capability. We fine tuned with more decode GPUs (2 x tp2 a...

(QB_NEW_EN_HYPHEN)

[grammar] ~74-~74: Ensure spelling is correct
Context: ...ker max_batch_size and prefill worker parallism setting, finally we found best disagg ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~80-~80: Use a hyphen to join words.
Context: ...Based on AIC run and minimum manual fine tuning process - Under TTFT constraint...

(QB_NEW_EN_HYPHEN)

[grammar] ~86-~86: Ensure spelling is correct
Context: ...e been working on fine-grained AIC perf allignment, advanced feature such as prefix cachin...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)

recipes/qwen3-32b-fp8/aic_based_disagg_perf_tuning.md

88-88: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

90-90: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Build and Test - dynamo

coderabbitai · 2025-11-27T15:28:46Z

recipes/qwen3-32b-fp8/aic_based_disagg_perf_tuning.md

+# Advanced Disagg Perf Tuning 
+## Challenges in Disaggregated Serving Deployment
+__Challenge1__ – Is disaggregated serving always better than aggregated serving? How much perf gain is reasonable?
+
+A: For example, considering __ISL:OSL=1:4000__, do we have perf gain by using disaggregated serving? – __NO__
+
+__Challenge2__ – How to configure disaggregated serving to solve the problem __throughput @ latency__
+
+- Parallelism of the worker
+- How many p and d
+- Depend on ISL, OSL, TTFT, TPOT
+- The tuning efforts are tremendous
+
+<img src="images/challenges_in_disagg.png" width="700" alt="challenges_in_disagg" style="display: block; margin: 0 auto;">
+
+## Disagg pd QPS Matching Methology 
+### We can firstly __find a worker that meets SLA and under constraints__
+
+- Enumerate parallelism combination of a worker, tp x pp x attn dp x moe tp x moe ep
+- Find max batch size of the worker which meets TTFT and TPOT respectively (Disagg is awesome! We can achieve this separately)
+- Ensure there's no OOM
+
+<img src="images/find_worker_SLA.png" width="600" alt="find_worker_SLA" style="display: block; margin: 0 auto;">
+
+### __Match__ the N prefill worker candidates with M decode worker candidates in view of __sequence throughput seq/s__
+
+- Seq/s of prefill = how many sequences I can process and finish context phase per second => __producer__
+- Seq/s of decode = how many sequences I can process and finish the whole generation phase per second => __consumer__
+- The throughput should __match__ between xP and yD
+- Finally, sweep X and Y for a given (prefill, decode) worker combination, find the best seq/s/gpu, thus the best tokens/s/gpu
+
+<img src="images/qps_match.png" width="500" alt="qps_match" style="display: block; margin: 0 auto;">
+
+# agg/disagg best perf tuning based on AIC
+## Case Study
+### Settings
+- model: qwen3-32b-fp8-per-block
+- ISL:OSL = 4000/500
+- TTFT SLA = 600/1200 ms
+- TPS/user SLA >= 60
+- disable prefix caching
+
+### AIC based full automation deployment 
+[AIC automation deploy guide](https://github.com/ai-dynamo/aiconfigurator/blob/main/docs/dynamo_deployment_guide.md)
+
+AIC is now supporting automate everything in one script, starting from configuring the deployment, generating configs, preparing docker image and container, pulling model checkpoints, deploying service, benchmarking and summarizing. Refer to [Automation](https://github.com/ai-dynamo/aiconfigurator/blob/main/tools/automation/README.md) for more details
+
+### local deployment vs. k8s deployment 
+<img src="images/local_deploy_k8s.png" width="900" alt="local_deploy_k8s" style="display: block; margin: 0 auto;">
+
+### disagg - AIC projection and ai-perf actual run allignment 
+<img src="images/disagg_aic_allignment.png" width="800" alt="disagg_aic_allignment" style="display: block; margin: 0 auto;">
+
+What's the problem here: this tps/user - tps/gpu pareto plot __does not have TTFT info at all__ while TTFT is one of the most important SLAs
+
+### Manual fine tuning based on AIC suggestions 
+__agg allignment__ 
+
+<img src="images/agg_allignment.png" width="500" alt="agg_allignment" style="display: block; margin: 0 auto;">
+
+TTFT estimation is complicated. 
+Currently AIC can handle TTFT from engine execution, but not other online serving overheads, say request queuing
+
+actual TTFT is higher than expected, so we're supposed to reduce `max_batch_size` with `TP2` to meet TTFT SLA
+
+We did look around with other combinations of `TP_size` and `max_batch_size` and AIC was right, `TP2` is the best choice
+
+__disagg allignment__ 
+
+<img src="images/disagg_allignment.png" width="500" alt="disagg_allignment" style="display: block; margin: 0 auto;">
+
+Actual run based on AIC's suggestion is noted in yellow, as we can observe, tps/user is a little bit less than SLA requirement (60), which means we need more decode wokers and to tune decode `max_batch_size` (usually equals to `request concurrency`) to enhance decoding capability. 
+
+We fine tuned with more decode GPUs (`2 x tp2` and `1 x tp4`) with corresponding decode worker `max_batch_size` and prefill worker `parallism setting`, finally we found best disagg config __within minimum search area__ 
+
+Regarding prefill, considering that `tp1` has less communication consumption, we tried `4 x tp1` and it shows better performance than that of `2 x tp2`.
+
+__Conclusion__ 
+
+Based on AIC run and minimum manual fine tuning process 
+
+- Under TTFT constraint of 600 ms, disagg delivers a __148%__ tps/gpu perf gain over agg
+
+- Under TTFT constraint of 1200 ms, disagg delivers a __102%__ tps/gpu perf gain over agg
+
+We've been working on fine-grained AIC perf allignment, advanced feature such as prefix caching modeling, vllm/sglang (including wide ep) backend supports etc. Stay tuned!
+
+__Corresponding recipe__
+
+https://github.com/ai-dynamo/dynamo/tree/main/recipes/qwen3-32b-fp8
+
+
+
+
+
+
+
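The QPS matching step described in the guide above can be sketched in Python. This is a minimal sketch under stated assumptions: prefill replicas act as the producer, decode replicas as the consumer, and the sweep over x and y keeps the plan with the best seq/s/gpu. The `Worker` class, the `best_match` helper, the `max_replicas` bound, and every throughput number are hypothetical placeholders, not AIC output or measured results.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Worker:
    """One worker candidate that already meets its SLA (hypothetical numbers)."""
    name: str
    gpus: int          # GPUs used by one replica of this worker
    seq_per_s: float   # sequences finished per second by one replica


def best_match(prefill, decode, max_replicas=8):
    """Sweep x prefill and y decode replicas; return the best seq/s/gpu plan."""
    best = None
    for p, d in product(prefill, decode):
        for x, y in product(range(1, max_replicas + 1), repeat=2):
            # Prefill produces sequences and decode consumes them,
            # so the slower side bounds the end-to-end throughput.
            system_seq_per_s = min(x * p.seq_per_s, y * d.seq_per_s)
            total_gpus = x * p.gpus + y * d.gpus
            score = system_seq_per_s / total_gpus
            if best is None or score > best[0]:
                best = (score, x, p.name, y, d.name)
    return best


# Hypothetical candidates; real values would come from profiling or AIC.
prefill_candidates = [Worker("prefill-tp1", 1, 4.0), Worker("prefill-tp2", 2, 7.0)]
decode_candidates = [Worker("decode-tp2", 2, 3.0), Worker("decode-tp4", 4, 8.0)]

score, x, p_name, y, d_name = best_match(prefill_candidates, decode_candidates)
print(f"{x} x {p_name} + {y} x {d_name}: {score:.3f} seq/s/gpu")
```

For a fixed OSL, multiplying seq/s/gpu by the output length gives generated tokens/s/gpu, which is why the plan with the best seq/s/gpu is also the plan with the best tokens/s/gpu.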


⚠️ Potential issue | 🟡 Minor

Remove trailing whitespace.

The pre-commit hook detected trailing whitespace in the file. Ensure all lines, including blank lines at the end of the file, have no trailing spaces.


🤖 Prompt for AI Agents

recipes/qwen3-32b-fp8/aic_based_disagg_perf_tuning.md lines 1-97: the file contains trailing whitespace (including possible blank lines at EOF) which trips the pre-commit hook; remove all trailing spaces from every line and ensure the file ends with a single newline and no trailing spaces—use your editor's "trim trailing whitespace" feature or run a command like `sed -i 's/[ \t]*$//' <file>` (or enable trim on save) and re-save, then re-run the pre-commit checks.
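As an alternative to the `sed` one-liner, the cleanup can be sketched in Python. This is a minimal sketch, assuming the goal is only to strip trailing spaces and tabs from each line and to end the file with a single newline; the helper name `strip_trailing_whitespace` is hypothetical, and the pre-commit hooks remain the source of truth.

```python
def strip_trailing_whitespace(text: str) -> str:
    """Drop trailing spaces/tabs on every line and end the text with one newline."""
    lines = [line.rstrip(" \t") for line in text.splitlines()]
    # Remove blank lines at EOF so that exactly one final newline remains.
    while lines and lines[-1] == "":
        lines.pop()
    return "\n".join(lines) + "\n" if lines else ""


sample = "# Advanced Disagg Perf Tuning \n\nbody text\t\n\n\n"
cleaned = strip_trailing_whitespace(sample)
print(repr(cleaned))
```

To apply it to the guide, read the file, pass its contents through the helper, and write the result back before re-running the pre-commit checks.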

coderabbitai · 2025-11-27T15:28:47Z

recipes/qwen3-32b-fp8/aic_based_disagg_perf_tuning.md

+
+<img src="images/challenges_in_disagg.png" width="700" alt="challenges_in_disagg" style="display: block; margin: 0 auto;">
+
+## Disagg pd QPS Matching Methology 


⚠️ Potential issue | 🟡 Minor

Fix hyphenation and spelling errors in headings and text.

Several spelling and grammar errors need correction throughout the document:

Line 16: "Methology" → "Methodology"

Line 43: "AIC based" → "AIC-based"

Line 56: "fine tuning" → "fine-tuning"

Line 80: "fine tuning" → "fine-tuning"

Apply this diff:

-## Disagg pd QPS Matching Methology
+## Disagg pd QPS Matching Methodology

-### AIC based full automation deployment
+### AIC-based full automation deployment

-### Manual fine tuning based on AIC suggestions
+### Manual fine-tuning based on AIC suggestions

-Based on AIC run and minimum manual fine tuning process
+Based on AIC run and minimum manual fine-tuning process

Also applies to: 43-43, 56-56, 80-80


🤖 Prompt for AI Agents

In recipes/qwen3-32b-fp8/aic_based_disagg_perf_tuning.md around lines 16, 43, 56 and 80, fix heading and text hyphenation/spelling: change "Methology" to "Methodology" on line 16, "AIC based" to "AIC-based" on line 43, and both occurrences of "fine tuning" on lines 56 and 80 to "fine-tuning"; update the content in-place ensuring punctuation and capitalization remain consistent.

coderabbitai · 2025-11-27T15:28:47Z

recipes/qwen3-32b-fp8/aic_based_disagg_perf_tuning.md

+### local deployment vs. k8s deployment 
+<img src="images/local_deploy_k8s.png" width="900" alt="local_deploy_k8s" style="display: block; margin: 0 auto;">
+
+### disagg - AIC projection and ai-perf actual run allignment 


⚠️ Potential issue | 🟡 Minor

Fix spelling errors and hyphenation in case study and conclusion.

Multiple spelling errors need correction:

Line 51: "allignment" → "alignment"

Line 72: "wokers" → "workers"

Line 74: "parallism" → "parallelism", "fine tuned" → "fine-tuned"

Line 86: "allignment" → "alignment"

Apply this diff:

-### disagg - AIC projection and ai-perf actual run allignment
+### disagg - AIC projection and ai-perf actual run alignment

-Actual run based on AIC's suggestion is noted in yellow, as we can observe, tps/user is a little bit less than SLA requirement (60), which means we need more decode wokers and to tune decode `max_batch_size` (usually equals to `request concurrency`) to enhance decoding capability.
+Actual run based on AIC's suggestion is noted in yellow, as we can observe, tps/user is a little bit less than SLA requirement (60), which means we need more decode workers and to tune decode `max_batch_size` (usually equals to `request concurrency`) to enhance decoding capability.

-We fine tuned with more decode GPUs (`2 x tp2` and `1 x tp4`) with corresponding decode worker `max_batch_size` and prefill worker `parallism setting`, finally we found best disagg config __within minimum search area__
+We fine-tuned with more decode GPUs (`2 x tp2` and `1 x tp4`) with corresponding decode worker `max_batch_size` and prefill worker `parallelism setting`, finally we found best disagg config __within minimum search area__

-We've been working on fine-grained AIC perf allignment, advanced feature such as prefix caching modeling, vllm/sglang (including wide ep) backend supports etc. Stay tuned!
+We've been working on fine-grained AIC perf alignment, advanced feature such as prefix caching modeling, vllm/sglang (including wide ep) backend supports etc. Stay tuned!

Also applies to: 72-72, 74-74, 86-86


🤖 Prompt for AI Agents

In recipes/qwen3-32b-fp8/aic_based_disagg_perf_tuning.md around lines 51, 72, 74, and 86, there are spelling and hyphenation issues: change "allignment" to "alignment" on lines 51 and 86, change "wokers" to "workers" on line 72, and on line 74 change "parallism" to "parallelism" and "fine tuned" to the hyphenated "fine-tuned"; update each instance accordingly to correct spelling and apply hyphenation.
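The spelling and hyphenation fixes requested in these review comments could be applied with a small script. The sketch below is illustrative only: the `FIXES` table and the `fix_typos` helper are hypothetical names. It assumes that image paths such as `images/agg_allignment.png` must stay untouched, because the image files themselves carry the misspelled names, so a blanket search-and-replace would break the `<img>` links.

```python
import re

# Typos and hyphenation fixes named in the review comments.
FIXES = {
    "Methology": "Methodology",
    "allignment": "alignment",
    "wokers": "workers",
    "parallism": "parallelism",
    "AIC based": "AIC-based",
    "fine tuning": "fine-tuning",
    "fine tuned": "fine-tuned",
}

# Match a whole src="..." attribute first so file paths are skipped,
# then fall back to the typo alternatives.
PATTERN = re.compile(r'src="[^"]*"|' + "|".join(re.escape(k) for k in FIXES))


def fix_typos(line: str) -> str:
    """Apply FIXES to a line, leaving src="..." attributes unchanged."""
    return PATTERN.sub(lambda m: FIXES.get(m.group(0), m.group(0)), line)


print(fix_typos("## Disagg pd QPS Matching Methology"))
print(fix_typos('<img src="images/agg_allignment.png" width="500">'))
```

Renaming the image files is a separate decision; if they are renamed, the `src` guard can be dropped.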

coderabbitai · 2025-11-27T15:28:47Z

recipes/qwen3-32b-fp8/aic_based_disagg_perf_tuning.md

+__Corresponding recipe__
+
+https://github.com/ai-dynamo/dynamo/tree/main/recipes/qwen3-32b-fp8


⚠️ Potential issue | 🟡 Minor

Convert emphasis to proper markdown heading and format bare URL.

Line 88 uses emphasis as a heading (violates MD036), and line 90 contains a bare URL without markdown link formatting (violates MD034).

Apply this diff:

-__Corresponding recipe__
+## Corresponding Recipe

-https://github.com/ai-dynamo/dynamo/tree/main/recipes/qwen3-32b-fp8
+[qwen3-32b-fp8 Recipe](https://github.com/ai-dynamo/dynamo/tree/main/recipes/qwen3-32b-fp8)


🤖 Prompt for AI Agents

In recipes/qwen3-32b-fp8/aic_based_disagg_perf_tuning.md around lines 88 to 90, the text uses emphasis as a heading (MD036) and leaves a bare URL (MD034); change the emphasized line to a proper Markdown heading (prepend one or more # as appropriate) and replace the bare URL with a Markdown link (e.g. [Corresponding recipe](https://github.com/ai-dynamo/dynamo/tree/main/recipes/qwen3-32b-fp8)) so the heading and link follow MD style rules.
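The two markdownlint fixes can be sketched as a per-line rewrite. This is a rough illustration rather than a reimplementation of the MD036 or MD034 rules: the `fix_line` helper, the default heading level, and the choice of the last URL path segment as link text are all assumptions made for the example.

```python
import re

# A line that consists only of one bold/emphasis span, e.g. __Title__.
EMPHASIS_ONLY = re.compile(r"^(?:__|\*\*)([^_*]+)(?:__|\*\*)\s*$")
# A line that consists only of a URL.
BARE_URL = re.compile(r"^(https?://\S+)\s*$")


def fix_line(line: str, heading_level: int = 2) -> str:
    """Rewrite an emphasis-only line as a heading and a bare-URL line as a link."""
    m = EMPHASIS_ONLY.match(line)
    if m:
        return "#" * heading_level + " " + m.group(1)
    m = BARE_URL.match(line)
    if m:
        url = m.group(1)
        # Use the last path segment as link text (an illustrative choice).
        label = url.rstrip("/").rsplit("/", 1)[-1]
        return f"[{label}]({url})"
    return line


print(fix_line("__Corresponding recipe__"))
print(fix_line("https://github.com/ai-dynamo/dynamo/tree/main/recipes/qwen3-32b-fp8"))
```

Lines that mix emphasis with other text are returned unchanged, so inline bold such as `__NO__` inside a sentence is not promoted to a heading.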

BenHamm · 2025-12-01T23:34:21Z

Seeing mismatch between this guide and AIC 0.4.0 results: https://gist.github.com/BenHamm/3ec1e1e92312302e966ee75606fe1931


davilu-nvidia requested review from a team as code owners November 27, 2025 15:26

pull-request-size bot added the size/M label Nov 27, 2025

coderabbitai bot reviewed Nov 27, 2025
