[https://nvbugs/5558117][fix] Allow per-layer quant config from hf_quant_config.json #8617

rosenrodt · 2025-10-23T06:54:33Z

Summary by CodeRabbit

Release Notes

New Features
- Added fallback mechanism for loading quantization configurations, automatically switching to alternative sources when primary configuration is unavailable.
- Enhanced validation for quantization settings compatibility.
Improvements
- Improved readability of quantization configuration logging with truncated output values.
- Ensured consistent data type attribute handling across configurations.
Tests
- Added comprehensive tests for quantization configuration loading scenarios and fallback behavior.

Description

This adds support for mixed quantization recipes (e.g., FP8 QKV + W4A8 MoE) from ModelOpt PTQ models by allowing a per-layer quant config to be loaded from hf_quant_config.json.

Test Coverage

tests/unittest/llmapi/test_llm_quant.py

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is given. If the Git commit ID has changed, this option is always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages that don't match the specified backends. Supported backends: [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

coderabbitai · 2025-10-23T07:00:09Z

📝 Walkthrough

Walkthrough

This PR enhances quantization configuration handling by implementing a tolerant fallback mechanism in ModelConfig that attempts to load quant_cfg.json first, then falls back to hf_quant_config.json when unavailable. It aggregates configurations from both sources, validates kv_cache settings, builds per-layer QuantConfig entries, and ensures torch_dtype is consistently populated. Logging improvements truncate verbose output.
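
The lookup order described in this walkthrough can be sketched as follows. This is a minimal illustration, not the code in `tensorrt_llm/_torch/model_config.py`: the helper name `load_first_available_json` and its return shape are hypothetical, and it covers only the fallback order, not the aggregation of both sources. It catches `OSError` and `json.JSONDecodeError` rather than a blanket `Exception`.

```python
import json
import logging
from pathlib import Path
from typing import Any, Dict, Optional, Sequence, Tuple

logger = logging.getLogger(__name__)

# File names come from the walkthrough; the order is the one it describes.
CANDIDATE_FILES = ("quant_cfg.json", "hf_quant_config.json")


def load_first_available_json(
    model_dir: Path,
    candidates: Sequence[str] = CANDIDATE_FILES,
) -> Tuple[Optional[str], Dict[str, Any]]:
    """Return (file name, parsed content) of the first readable candidate.

    Hypothetical helper: returns (None, {}) when no candidate can be loaded.
    """
    for name in candidates:
        path = Path(model_dir) / name
        try:
            with open(path, "r", encoding="utf-8") as f:
                data = json.load(f)
        except (OSError, json.JSONDecodeError) as err:
            # Missing or malformed file: log it and try the next source.
            logger.info("Could not load %s (%s); trying next source.", name, err)
        else:
            return name, data
    return None, {}
```

Keeping the `try` body to the file read and parse, with the return in `else`, means only I/O and JSON errors trigger the fallback.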

Changes

Cohort / File(s)	Summary
Quantization Config Handling `tensorrt_llm/_torch/model_config.py`	Modified `load_modelopt_quant_config` to implement tolerant loading: attempt quant_cfg.json, fallback to hf_quant_config.json with logging. Aggregates configs from both sources, validates kv_cache_quant_algo compatibility, and builds per-layer QuantConfig entries. Updated `from_pretrained` to populate `torch_dtype` from dtype if not already set. Normalized comment capitalization.
Logging Enhancement `tensorrt_llm/llmapi/llm_utils.py`	Truncated quantization config value logging to 100 characters with "..." ellipsis for longer values.
Unit Tests `tests/unittest/llmapi/test_llm_quant.py`	Added two new test functions: `test_quant_cfg_from_quant_cfg_json` validates MIXED_PRECISION loading from quant_cfg.json and per-layer settings; `test_quant_cfg_from_hf_quant_config` validates fallback to hf_quant_config.json. Imported ModelConfig for testing.

Sequence Diagram

sequenceDiagram
    participant User
    participant model_config as ModelConfig.load_<br/>modelopt_quant_config()
    participant quant_cfg as quant_cfg.json
    participant hf_quant as hf_quant_config.json
    participant config as Merged Config

    User->>model_config: load quantization config
    model_config->>quant_cfg: try read quant_cfg.json
    alt quant_cfg.json exists
        quant_cfg-->>model_config: config data
        model_config->>config: load quant_cfg
    else quant_cfg.json missing/fails
        model_config->>hf_quant: fallback: read hf_quant_config.json
        hf_quant-->>model_config: config data
        model_config->>model_config: log fallback message
        model_config->>config: load hf_quant_config
    end
    
    rect rgb(200, 220, 240)
        note over model_config,config: Aggregate & Validate
        model_config->>config: merge both configs
        model_config->>config: validate kv_cache_quant_algo<br/>compatibility across layers
    end
    
    model_config->>config: build per-layer QuantConfig<br/>with kv_cache_quant_algo
    config-->>User: merged quantization config
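
The final step of the diagram, building per-layer entries that each carry the global KV-cache setting, can be sketched with plain dataclasses. Everything here is an assumption for illustration: `LayerQuant` is a stand-in for the real `QuantConfig` in `tensorrt_llm/models/modeling_utils.py`, `build_per_layer_configs` is a hypothetical name, and the `quantized_layers` layout and the second layer name are guesses at the JSON shape rather than a documented schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class LayerQuant:
    """Stand-in for a per-layer quantization entry (not the real QuantConfig)."""
    quant_algo: Optional[str] = None
    kv_cache_quant_algo: Optional[str] = None


def build_per_layer_configs(quantization: Dict[str, Any]) -> Dict[str, LayerQuant]:
    """Build one LayerQuant per layer, each inheriting the global KV-cache algo."""
    kv_algo = quantization.get("kv_cache_quant_algo")
    layers = quantization.get("quantized_layers", {})
    return {
        name: LayerQuant(quant_algo=cfg.get("quant_algo"),
                         kv_cache_quant_algo=kv_algo)
        for name, cfg in layers.items()
    }


# Illustrative mixed-precision input (assumed layout, assumed layer names).
example = {
    "quant_algo": "MIXED_PRECISION",
    "kv_cache_quant_algo": "FP8",
    "quantized_layers": {
        "model.layers.0.self_attn.q_proj": {"quant_algo": "FP8"},
        "model.layers.0.mlp.experts": {"quant_algo": "W4A8_AWQ"},
    },
}
per_layer = build_per_layer_configs(example)
```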

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

The changes involve new fallback logic and configuration aggregation with validation in the primary file, alongside straightforward logging truncation and two focused test cases. While the logic is relatively clear, the interaction between mixed quantization sources and per-layer config building requires careful verification of the aggregation and validation flow.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The PR description is largely incomplete compared to the template requirements. While the Description section provides minimal content ("This support mixed quant recipe (e.g., FP8 QKV + W4A8 MoE) from ModelOpt PTQ models"), it lacks detail about the issue being solved and how the solution addresses it. More critically, the Test Coverage section is entirely empty—no test names or details are provided despite the raw summary showing two new test functions were added (test_quant_cfg_from_quant_cfg_json and test_quant_cfg_from_hf_quant_config). The description reads as vague and does not fully explain the purpose, mechanism, or testing strategy for the changes.	The author should complete the Test Coverage section by listing and explaining the two new test functions that validate per-layer quantization loading from both quant_cfg.json and hf_quant_config.json. Additionally, the Description section should be expanded to clearly explain the problem (lack of per-layer quantization config support from hf_quant_config.json) and how the solution addresses it through the implemented fallback mechanism and per-layer handling logic. This will ensure downstream readers understand both the "what" and "why" of the changes.
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The PR title "[https://nvbugs/5558117][fix] Allow per-layer quant config from hf_quant_config.json" follows the required template format with a valid NVBugs ID, a [fix] type indicator, and a clear summary. The title is concise and directly describes the primary change: enabling per-layer quantization configuration to be loaded from hf_quant_config.json. This aligns with the core functionality changes documented in the raw summary, which describes the fallback mechanism from quant_cfg.json to hf_quant_config.json and per-layer quantization support. The title is specific enough that teammates scanning history would understand the main change without ambiguity.

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

tensorrt_llm/llmapi/llm_utils.py (1)

1-1: Missing NVIDIA Apache-2.0 header (2025).

Per coding guidelines, prepend the 2025 NVIDIA Apache-2.0 copyright header to all .py files.

Apply at file top:

+ # Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.

tensorrt_llm/_torch/model_config.py (1)

1-1: Missing NVIDIA Apache-2.0 header (2025).

Add the standard 2025 NVIDIA Apache-2.0 header at the top.

+ # Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.

🧹 Nitpick comments (3)

tensorrt_llm/llmapi/llm_utils.py (1)

393-397: Avoid repeated str() and length checks in log.

Compute once to reduce overhead and keep code clearer.

-            for key, value in hf_quant_config.items():
-                logger.info(
-                    f"Setting {key}={str(value)[:100]}{'...' if len(str(value)) > 100 else ''} from HF quant config."
-                )
+            for key, value in hf_quant_config.items():
+                val_str = str(value)
+                truncated = (val_str[:100] + '...') if len(val_str) > 100 else val_str
+                logger.info(f"Setting {key}={truncated} from HF quant config.")
                 setattr(quant_config, key, value)
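
The same truncation can also be factored into a small standalone helper, shown here as a sketch. `truncate_for_log` and its `limit` parameter are hypothetical names, not part of `llm_utils.py`.

```python
def truncate_for_log(value: object, limit: int = 100) -> str:
    """Return str(value), cut to `limit` characters plus '...' when longer."""
    text = str(value)  # convert once, reuse for both the length check and the slice
    return text if len(text) <= limit else text[:limit] + "..."


short = truncate_for_log("FP8")
long = truncate_for_log("x" * 250)
```

Values at or under the limit pass through unchanged; longer values keep their first `limit` characters followed by the ellipsis.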

tests/unittest/llmapi/test_llm_quant.py (2)

79-137: Good coverage for quant_cfg.json + hf fallback; add a couple of assertions.

Add asserts for layer kv_cache_quant_algo and use enum for consistency.
-        assert quant_config.kv_cache_quant_algo == "FP8"
+        assert str(quant_config.kv_cache_quant_algo) == "FP8"
+        # Each per-layer config should inherit the global kv cache setting
+        assert layer_quant_config["model.layers.0.self_attn.k_proj"].kv_cache_quant_algo == "FP8"
Optionally add a negative test for mismatched kv_cache between files raising RuntimeError.
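
A negative test of the kind suggested here could look like the sketch below. The exact validation inside `load_modelopt_quant_config` is not shown in this thread, so `check_kv_cache_consistency` is a hypothetical stand-in that raises `RuntimeError` on conflicting values. The test uses plain `try`/`except` so that it runs without pytest.

```python
from typing import Iterable, Optional


def check_kv_cache_consistency(values: Iterable[Optional[str]]) -> Optional[str]:
    """Return the single non-None kv_cache_quant_algo, or raise on a conflict."""
    distinct = {v for v in values if v is not None}
    if len(distinct) > 1:
        raise RuntimeError(
            f"Conflicting kv_cache_quant_algo values: {sorted(distinct)}")
    return next(iter(distinct), None)


def test_mismatched_kv_cache_raises() -> None:
    try:
        check_kv_cache_consistency(["FP8", "INT8"])
    except RuntimeError:
        return  # expected: the mismatch was rejected
    raise AssertionError("expected RuntimeError for mismatched kv cache settings")


test_mismatched_kv_cache_raises()
```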

139-174: Fallback-only path looks good; add one more per-layer kv cache check.

Ensure per-layer kv cache inherits the global value in fallback case too.
-        assert layer_quant_config[
-            "model.layers.0.self_attn.q_proj"].quant_algo == "FP8"
+        assert layer_quant_config["model.layers.0.self_attn.q_proj"].quant_algo == "FP8"
+        assert layer_quant_config["model.layers.0.self_attn.q_proj"].kv_cache_quant_algo == "FP8"

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 04e2b27 and 78f99be.

📒 Files selected for processing (3)

tensorrt_llm/_torch/model_config.py (3 hunks)
tensorrt_llm/llmapi/llm_utils.py (1 hunks)
tests/unittest/llmapi/test_llm_quant.py (2 hunks)

🧰 Additional context used

📓 Path-based instructions (3)

**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

tensorrt_llm/llmapi/llm_utils.py
tensorrt_llm/_torch/model_config.py
tests/unittest/llmapi/test_llm_quant.py

**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

tensorrt_llm/llmapi/llm_utils.py
tensorrt_llm/_torch/model_config.py
tests/unittest/llmapi/test_llm_quant.py

**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

tensorrt_llm/llmapi/llm_utils.py
tensorrt_llm/_torch/model_config.py
tests/unittest/llmapi/test_llm_quant.py

🧬 Code graph analysis (2)

tensorrt_llm/_torch/model_config.py (3)

tensorrt_llm/llmapi/llm_args.py (4)

quant_config (2534-2537)

quant_config (2540-2541)

model_dir (1442-1444)

model_dir (1447-1451)

tensorrt_llm/models/modeling_utils.py (2)

QuantConfig (131-271)

quant_algo (550-551)

tensorrt_llm/logger.py (1)

info (138-139)

tests/unittest/llmapi/test_llm_quant.py (2)

tensorrt_llm/_torch/model_config.py (1)

load_modelopt_quant_config (258-321)

tensorrt_llm/quantization/mode.py (1)

QuantAlgo (23-47)

🪛 Ruff (0.14.1)

tensorrt_llm/_torch/model_config.py

284-284: Do not catch blind exception: Exception

(BLE001)

286-286: f-string without any placeholders

Remove extraneous f prefix

(F541)

299-302: Avoid specifying long messages outside the exception class

(TRY003)
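
One common way to address a TRY003 finding is to move the message into a dedicated exception class, as in the sketch below. The flagged lines are not shown in this thread, so `KvCacheQuantMismatchError` and `require_same_kv_cache_algo` are hypothetical names and the message wording is illustrative.

```python
from typing import Optional


class KvCacheQuantMismatchError(RuntimeError):
    """Hypothetical exception that owns its own message text."""

    def __init__(self, expected: Optional[str], found: Optional[str]) -> None:
        self.expected = expected
        self.found = found
        super().__init__(
            f"kv_cache_quant_algo mismatch: expected {expected!r}, found {found!r}")


def require_same_kv_cache_algo(expected: Optional[str], found: Optional[str]) -> None:
    # The raise site stays short; the wording lives in the exception class.
    if expected != found:
        raise KvCacheQuantMismatchError(expected, found)
```

Subclassing `RuntimeError` keeps existing `except RuntimeError` handlers working.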

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Check PR Checklist Resolution
GitHub Check: Check PR Title Format
GitHub Check: Pre-commit Check

rosenrodt · 2025-10-23T07:03:14Z

/bot run

tensorrt-cicd · 2025-10-23T07:10:44Z

PR_Github #22261 [ run ] triggered by Bot. Commit: 78f99be

tensorrt-cicd · 2025-10-23T09:33:48Z

PR_Github #22261 [ run ] completed with state SUCCESS. Commit: 78f99be
/LLM/main/L0_MergeRequest_PR pipeline #16783 completed with status: 'FAILURE'

rosenrodt · 2025-10-25T14:00:05Z

/bot run

tensorrt-cicd · 2025-10-25T14:05:36Z

PR_Github #22501 [ run ] triggered by Bot. Commit: 78f99be

tensorrt-cicd · 2025-10-25T15:41:52Z

PR_Github #22501 [ run ] completed with state SUCCESS. Commit: 78f99be
/LLM/main/L0_MergeRequest_PR pipeline #16958 completed with status: 'FAILURE'

rosenrodt · 2025-10-25T15:45:10Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-10-25T15:51:09Z

PR_Github #22506 [ run ] triggered by Bot. Commit: 78f99be

tensorrt-cicd · 2025-10-26T07:15:16Z

PR_Github #22506 [ run ] completed with state SUCCESS. Commit: 78f99be
/LLM/main/L0_MergeRequest_PR pipeline #16963 completed with status: 'FAILURE'

rosenrodt · 2025-10-27T01:43:06Z

/bot run

Restart CI due to GB300s being offline

tensorrt-cicd · 2025-10-27T01:48:47Z

PR_Github #22546 [ run ] triggered by Bot. Commit: 78f99be

tensorrt-cicd · 2025-10-27T05:01:36Z

PR_Github #22546 [ run ] completed with state FAILURE. Commit: 78f99be
/LLM/main/L0_MergeRequest_PR pipeline #16995 completed with status: 'FAILURE'

rosenrodt · 2025-10-27T07:02:07Z

/bot run --disable-fail-fast

Retrigger CI due to B300 being offline

tensorrt-cicd · 2025-10-27T07:08:37Z

PR_Github #22598 [ run ] triggered by Bot. Commit: 45f074b

tensorrt-cicd · 2025-10-27T11:45:26Z

PR_Github #22598 [ run ] completed with state SUCCESS. Commit: 45f074b
/LLM/main/L0_MergeRequest_PR pipeline #17035 completed with status: 'FAILURE'

rosenrodt · 2025-10-27T14:47:03Z

/bot run

Retrigger CI due to hitting HF rate limit in test_multi_request_batch_chat[llava-v1.6-mistral-7b-hf]. Error:

huggingface_hub.errors.HfHubHTTPError: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/models/llava-hf/llava-v1.6-mistral-7b-hf/revision/main

tensorrt-cicd · 2025-10-27T14:52:52Z

PR_Github #22655 [ run ] triggered by Bot. Commit: 4cc93f0

tensorrt-cicd · 2025-10-27T17:49:44Z

PR_Github #22655 [ run ] completed with state SUCCESS. Commit: 4cc93f0
/LLM/main/L0_MergeRequest_PR pipeline #17078 completed with status: 'FAILURE'

rosenrodt · 2025-10-29T05:55:08Z

/bot run

tensorrt-cicd · 2025-10-29T06:00:36Z

PR_Github #22852 [ run ] triggered by Bot. Commit: a8e9206

tensorrt-cicd · 2025-10-29T16:13:44Z

PR_Github #22852 [ run ] completed with state SUCCESS. Commit: a8e9206
/LLM/main/L0_MergeRequest_PR pipeline #17236 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

rosenrodt · 2025-10-30T04:22:09Z

/bot run --disable-reuse-test --disable-fail-fast

tensorrt-cicd · 2025-10-30T04:28:01Z

PR_Github #22979 [ run ] triggered by Bot. Commit: a8e9206

rosenrodt · 2025-10-30T04:48:15Z

/bot kill

tensorrt-cicd · 2025-10-30T04:54:19Z

PR_Github #22983 [ kill ] triggered by Bot. Commit: a8e9206

tensorrt-cicd · 2025-10-30T04:54:21Z

PR_Github #22979 [ run ] completed with state ABORTED. Commit: a8e9206
LLM/main/L0_MergeRequest_PR #17323 (Blue Ocean) completed with status: ABORTED

tensorrt-cicd · 2025-10-30T04:54:51Z

PR_Github #22983 [ kill ] completed with state SUCCESS. Commit: a8e9206
Successfully killed previous jobs for commit a8e9206

rosenrodt · 2025-10-31T08:41:38Z

/bot reuse-pipeline

Revert to last good pipeline from #8617 (comment). I accidentally triggered a new pipeline without making any change.

tensorrt-cicd · 2025-10-31T08:47:13Z

PR_Github #23174 [ reuse-pipeline ] triggered by Bot. Commit: a8e9206

tensorrt-cicd · 2025-10-31T08:48:23Z

PR_Github #23174 [ reuse-pipeline ] completed with state SUCCESS. Commit: a8e9206
Can't reuse PR_Github #22979 with status: ABORTED

yilin-void

LGTM

Superjomn

LGTM on the llmapi changes.

QiJune · 2025-10-31T11:17:19Z

/bot skip --comment "pipeline passed before without code changes"

tensorrt-cicd · 2025-10-31T11:22:58Z

PR_Github #23191 [ skip ] triggered by Bot. Commit: a8e9206

tensorrt-cicd · 2025-10-31T11:41:42Z

PR_Github #23191 [ skip ] completed with state SUCCESS. Commit: a8e9206
Skipping testing for commit a8e9206

…ant_config.json (NVIDIA#8617) Signed-off-by: Anthony Chang <[email protected]> Signed-off-by: FredricZ-2007 <[email protected]>

rosenrodt requested review from a team as code owners October 23, 2025 06:54

rosenrodt requested review from Superjomn, nv-yilinf and yilin-void October 23, 2025 06:54

coderabbitai bot reviewed Oct 23, 2025

Resolved review threads:

tensorrt_llm/_torch/model_config.py (3)
tests/unittest/llmapi/test_llm_quant.py (1)

rosenrodt force-pushed the load-modelopt-perlayer-quant branch 2 times, most recently from d17849f to 45f074b Compare October 27, 2025 06:48

rosenrodt requested a review from yuantailing October 27, 2025 07:15

rosenrodt force-pushed the load-modelopt-perlayer-quant branch from 45f074b to 4cc93f0 Compare October 27, 2025 14:45

allow loading per-layer quant config from hf_quant_config.json/quant_…

a8e9206

…cfg.json Signed-off-by: Anthony Chang <[email protected]>

rosenrodt force-pushed the load-modelopt-perlayer-quant branch from 4cc93f0 to a8e9206 Compare October 28, 2025 01:16

yilin-void approved these changes Oct 31, 2025

Superjomn approved these changes Oct 31, 2025

QiJune enabled auto-merge (squash) October 31, 2025 11:16

QiJune merged commit 852e506 into NVIDIA:main Oct 31, 2025
5 checks passed

syuoni mentioned this pull request Nov 19, 2025

Support W4A8 method of AngleSlim tool #6857

Open
