feat: Enable AutoDeploy to llm-eval example #4020
Conversation
lucaslie left a comment
Please make sure to update the usage of lm_eval_ad.py as well, for example, in the integration tests
why is this not set anymore?
where is this coming from?
Hi @meenchen, does it support
All run with Llama3.1 and the number is
I cannot run TP>1 with
Force-pushed ff6bcc9 to 731b278 (compare)
@Fridah-nv, I am able to run TP>2 for gsm8k. My command:
What's the error you encounter?
Force-pushed 4b4e476 to 33fe071 (compare)
@Fridah-nv any updates?
@meenchen will wrap up this PR. Please let me know if there are further questions.
Signed-off-by: weimingc <[email protected]>
Force-pushed 33fe071 to 4b6a7fd (compare)
📝 Walkthrough
The changes introduce expanded backend support and chat templating in the model evaluation harness, update documentation and test configurations to reflect new usage patterns, and adjust testing to use the new backend and parameter formats. Some tests are disabled, and dependency versions are updated. Minor code cleanup and configuration adjustments are also included.
Sequence Diagram(s)
sequenceDiagram
participant User
participant LM_Eval_Harness
participant TRTLLMEvalBase
participant LLM_Backend
User->>LM_Eval_Harness: Run evaluation with CLI args (model, backend, etc.)
LM_Eval_Harness->>TRTLLMEvalBase: Initialize with backend and parameters
TRTLLMEvalBase->>LLM_Backend: Configure backend (torch/pytorch/autodeploy)
TRTLLMEvalBase->>TRTLLMEvalBase: Apply chat template (if needed)
TRTLLMEvalBase->>LLM_Backend: Generate or compute loglikelihood with temperature
LLM_Backend-->>TRTLLMEvalBase: Return results
TRTLLMEvalBase-->>LM_Eval_Harness: Return evaluation results
LM_Eval_Harness-->>User: Output metrics/results
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~15–25 minutes
Actionable comments posted: 2
♻️ Duplicate comments (2)
examples/llm-eval/lm-eval-harness/lm_eval_tensorrt_llm.py (2)
177-178: Clarify why max_length is no longer set. A previous reviewer asked why self.max_length is no longer being set. Please provide clarification on this change.
188-203: Clarify the source of the chat template implementation. A previous reviewer asked about the origin of this code. Please provide context on whether this is adapted from another implementation or newly written.
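For background on what such a chat-template path usually does, here is a minimal sketch of the common pattern, assuming a Hugging Face tokenizer; the model name and messages are placeholders, and this is not necessarily how the PR implements it:

```python
# Minimal sketch of applying a chat template before generation.
# Assumes a Hugging Face tokenizer; the model name and messages below are
# placeholders, not taken from this PR.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
]

# Render the conversation into a single prompt string using the tokenizer's
# built-in chat template, appending the assistant turn header for generation.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```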
🧹 Nitpick comments (3)
examples/llm-eval/lm-eval-harness/lm_eval_tensorrt_llm.py (2)
71-77: Consider adding docstrings for new parameters. The new parameters attn_backend, tokenized_requests, and temperature would benefit from documentation explaining their purpose and valid values.
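A possible shape for that documentation, following the Google docstring style called out in the coding guidelines below; the class stub, defaults, and wording here are illustrative assumptions rather than the PR's actual signature:

```python
class TRTLLMEvalBase:  # illustrative stub only, not the class from this PR
    def __init__(
        self,
        attn_backend: str = "TRTLLM",      # assumed default, for illustration
        tokenized_requests: bool = False,  # assumed default
        temperature: float = 0.0,          # assumed default
        **kwargs,
    ):
        """Initialize the evaluation wrapper.

        Args:
            attn_backend: Name of the attention backend forwarded to the LLM
                configuration (for example "TRTLLM" or "FlashInfer").
            tokenized_requests: Whether incoming requests are already tokenized
                before being handed to the backend.
            temperature: Sampling temperature applied to all generation
                requests; 0 yields deterministic (greedy) decoding.
            **kwargs: Remaining backend-specific options.
        """
        self.attn_backend = attn_backend
        self.tokenized_requests = tokenized_requests
        self.temperature = temperature
```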
180-187: Fix docstring formatting issues. The docstring needs a blank line after the summary, and line 184 exceeds the character limit.

     @property
     def tokenizer_name(self) -> str:
         """Must be defined for LM subclasses which implement Chat Templating.
    +    Should return the name of the tokenizer or chat template used.
    -    Used only to properly fingerprint caches when requests are being cached with `--cache_requests`, otherwise not used.
    +    Used only to properly fingerprint caches when requests are being cached
    +    with `--cache_requests`, otherwise not used.
         """
         return ""

tests/unittest/_torch/auto_deploy/integration/test_lm_eval.py (1)
61-124: Consider using pytest.mark.skip instead of commenting out test cases. Rather than commenting out test parameters, consider keeping them active with pytest.mark.skip decorators. This preserves the test definitions while clearly marking them as temporarily disabled.
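A minimal sketch of that pattern; the parameter values and skip reason are placeholders rather than the actual entries in test_lm_eval.py:

```python
import pytest

@pytest.mark.parametrize(
    "model_name",
    [
        "model_a",  # placeholder: an active test case
        pytest.param(
            "model_b",  # placeholder: a temporarily disabled case
            marks=pytest.mark.skip(reason="temporarily disabled in this PR"),
        ),
    ],
)
def test_lm_eval_accuracy(model_name):
    # Skipped parameters still appear in the test report as "skipped"
    # instead of silently vanishing from the suite.
    ...
```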
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- examples/auto_deploy/README.md (1 hunks)
- examples/auto_deploy/build_and_run_flux.py (0 hunks)
- examples/llm-eval/lm-eval-harness/lm_eval_tensorrt_llm.py (8 hunks)
- examples/llm-eval/lm-eval-harness/requirements.txt (1 hunks)
- tests/unittest/_torch/auto_deploy/integration/test_lm_eval.py (3 hunks)
- tests/unittest/pytest.ini (1 hunks)
💤 Files with no reviewable changes (1)
- examples/auto_deploy/build_and_run_flux.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile = ...).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL = ...).
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a class in the constructor in Python.
For interfaces that may be used outside a file, prefer docstrings over comments in Python.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
- examples/llm-eval/lm-eval-harness/lm_eval_tensorrt_llm.py
- tests/unittest/_torch/auto_deploy/integration/test_lm_eval.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
- examples/llm-eval/lm-eval-harness/lm_eval_tensorrt_llm.py
- tests/unittest/_torch/auto_deploy/integration/test_lm_eval.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
- tests/unittest/pytest.ini
- examples/auto_deploy/README.md
- examples/llm-eval/lm-eval-harness/lm_eval_tensorrt_llm.py
- tests/unittest/_torch/auto_deploy/integration/test_lm_eval.py
📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Applied to files:
- tests/unittest/pytest.ini
- examples/auto_deploy/README.md
- examples/llm-eval/lm-eval-harness/lm_eval_tensorrt_llm.py
- tests/unittest/_torch/auto_deploy/integration/test_lm_eval.py
📚 Learning: in tensorrt-llm, test files (files under tests/ directories) do not require nvidia copyright headers...
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
Applied to files:
- examples/auto_deploy/README.md
- examples/llm-eval/lm-eval-harness/lm_eval_tensorrt_llm.py
- tests/unittest/_torch/auto_deploy/integration/test_lm_eval.py
📚 Learning: applies to **/*.py : the code developed for tensorrt-llm should conform to python 3.8+....
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T08:45:40.701Z
Learning: Applies to **/*.py : The code developed for TensorRT-LLM should conform to Python 3.8+.
Applied to files:
- examples/auto_deploy/README.md
- examples/llm-eval/lm-eval-harness/lm_eval_tensorrt_llm.py
- tests/unittest/_torch/auto_deploy/integration/test_lm_eval.py
📚 Learning: in tensorrt-llm's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()...
Learnt from: yechank-nvidia
PR: NVIDIA/TensorRT-LLM#6254
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204
Timestamp: 2025-07-22T09:22:14.726Z
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()` is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call `strip_for_generation()` to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.
Applied to files:
- examples/auto_deploy/README.md
- examples/llm-eval/lm-eval-harness/lm_eval_tensorrt_llm.py
- tests/unittest/_torch/auto_deploy/integration/test_lm_eval.py
📚 Learning: ministral is a valid model name from mistral ai, distinct from the regular mistral models. in tensor...
Learnt from: venkywonka
PR: NVIDIA/TensorRT-LLM#6650
File: tests/integration/test_lists/qa/llm_perf_cluster.yml:33-37
Timestamp: 2025-08-06T03:47:16.802Z
Learning: Ministral is a valid model name from Mistral AI, distinct from the regular Mistral models. In TensorRT-LLM test configurations, "ministral_8b" and "ministral_8b_fp8" are correct model identifiers and should not be changed to "mistral_8b".
Applied to files:
- examples/auto_deploy/README.md
🪛 Ruff (0.12.2)
examples/llm-eval/lm-eval-harness/lm_eval_tensorrt_llm.py
128-128: Line too long (121 > 120)
(E501)
131-131: Local variable pytorch_config is assigned to but never used
Remove assignment to unused variable pytorch_config
(F841)
182-185: 1 blank line required between summary line and description
(D205)
184-184: Line too long (124 > 120)
(E501)
🔇 Additional comments (8)
examples/llm-eval/lm-eval-harness/requirements.txt (1)
1-1: LGTM! The patch version update from 0.4.7 to 0.4.8 is appropriate for the new backend support features.
examples/llm-eval/lm-eval-harness/lm_eval_tensorrt_llm.py (3)
46-52: LGTM! The JsonChatStr utility class is well-designed for handling JSON-encoded chat prompts.
324-326: LGTM! The temperature override implementation allows for consistent generation behavior across all requests.
139-151: LGTM! The LLM instantiation correctly supports the new backend architecture.
tests/unittest/pytest.ini (1)
8-8: LGTM! The pythonpath update correctly reflects the new location of the evaluation harness module.
examples/auto_deploy/README.md (1)
195-200: LGTM! The documentation updates correctly reflect the new evaluation harness usage with autodeploy backend support. The example command and reference link are helpful for users.
tests/unittest/_torch/auto_deploy/integration/test_lm_eval.py (2)
12-12: LGTM! The import correctly references the new evaluation harness module.
133-140: LGTM! The model arguments correctly configure the autodeploy backend with deterministic generation settings appropriate for testing.
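As an illustration of those settings, here is a sketch of how a deterministic autodeploy configuration maps onto the comma-separated model_args string that lm-eval-harness expects; it mirrors the command under Test Coverage further below, and the checkpoint path is a placeholder:

```python
# Illustrative model_args for the autodeploy backend; the checkpoint path is a
# placeholder, and the keys mirror the command shown under "Test Coverage".
model_args = {
    "model": "/path/to/Meta-Llama-3.1-8B",  # placeholder checkpoint location
    "backend": "autodeploy",                # select the AutoDeploy backend
    "max_context_length": 2048,             # required for the autodeploy backend
    "max_gen_toks": 128,
    "tp": 1,                                # tensor-parallel size
    "temperature": 0,                       # greedy decoding for reproducible numbers
}

# lm-eval-harness takes model_args as a single comma-separated string on the CLI.
model_args_str = ",".join(f"{key}={value}" for key, value in model_args.items())
print(model_args_str)
```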
    if self.backend == 'pytorch':
        if hasattr(PyTorchConfig, "moe_backend"):
            pytorch_config_params["moe_backend"] = self.moe_backend
            print(f"Info: moe_backend is set to {self.moe_backend}")
        pytorch_config_params["attn_backend"] = attn_backend.upper()
        print(f"Info: attn_backend is set to {attn_backend}")
        pytorch_config = PyTorchConfig(**pytorch_config_params)
Remove duplicate moe_backend configuration.
The moe_backend is already configured at lines 116-118. This duplicate block is redundant.
     if self.backend == 'pytorch':
    -    if hasattr(PyTorchConfig, "moe_backend"):
    -        pytorch_config_params["moe_backend"] = self.moe_backend
    -        print(f"Info: moe_backend is set to {self.moe_backend}")
         pytorch_config_params["attn_backend"] = attn_backend.upper()
         print(f"Info: attn_backend is set to {attn_backend}")
         pytorch_config = PyTorchConfig(**pytorch_config_params)

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    if self.backend == 'pytorch':
        pytorch_config_params["attn_backend"] = attn_backend.upper()
        print(f"Info: attn_backend is set to {attn_backend}")
        pytorch_config = PyTorchConfig(**pytorch_config_params)
🤖 Prompt for AI Agents
In examples/llm-eval/lm-eval-harness/lm_eval_tensorrt_llm.py around lines 120 to
126, remove the duplicate setting of moe_backend in the pytorch_config_params
dictionary since it is already configured at lines 116-118. Keep only the unique
configuration lines such as setting attn_backend and creating the PyTorchConfig
instance, eliminating the redundant moe_backend assignment and its print
statement.
    elif self.backend == 'autodeploy':
        assert self.max_context_length is not None, "max_context_length must be specified for autodeploy backend"
        # Only FlashInfer is supported for autodeploy backend.
        pytorch_config_params["attn_backend"] = "FlashInfer"
        pytorch_config = AutoDeployConfig(**pytorch_config_params)
Fix unused variable and line length issues.
The pytorch_config variable is created but not used. Also, line 128 exceeds the 120 character limit.
     elif self.backend == 'autodeploy':
    -    assert self.max_context_length is not None, "max_context_length must be specified for autodeploy backend"
    +    assert self.max_context_length is not None, \
    +        "max_context_length must be specified for autodeploy backend"
         # Only FlashInfer is supported for autodeploy backend.
         pytorch_config_params["attn_backend"] = "FlashInfer"
    -    pytorch_config = AutoDeployConfig(**pytorch_config_params)
    +    # AutoDeployConfig is passed directly through kwargs

Since the config is passed via kwargs, there's no need to assign it to pytorch_config.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    elif self.backend == 'autodeploy':
        assert self.max_context_length is not None, \
            "max_context_length must be specified for autodeploy backend"
        # Only FlashInfer is supported for autodeploy backend.
        pytorch_config_params["attn_backend"] = "FlashInfer"
        # AutoDeployConfig is passed directly through kwargs
🧰 Tools
🪛 Ruff (0.12.2)
128-128: Line too long (121 > 120)
(E501)
131-131: Local variable pytorch_config is assigned to but never used
Remove assignment to unused variable pytorch_config
(F841)
🤖 Prompt for AI Agents
In examples/llm-eval/lm-eval-harness/lm_eval_tensorrt_llm.py around lines 127 to
131, remove the assignment to the unused variable pytorch_config and instead
directly call AutoDeployConfig with the unpacked pytorch_config_params. Also,
break the long line 128 into multiple lines or simplify it to ensure it does not
exceed 120 characters.
PR title
Description
Test Coverage
python lm_eval_tensorrt_llm.py --model trt-llm --model_args model=/home/scratch.omniml_data_1/models/llama3.1/Meta-Llama-3.1-8B,backend=autodeploy,max_context_length=2048,max_gen_toks=128,tp=1,temperature=0 --tasks mmlu --batch_size 4
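For reference, a hedged sketch of driving the same evaluation through lm-eval-harness's Python API instead of the CLI; it assumes the custom trt-llm model wrapper from lm_eval_tensorrt_llm.py has already been imported and registered with the harness, and the checkpoint path is a placeholder:

```python
import lm_eval

# Programmatic equivalent of the CLI command above. Assumes the "trt-llm"
# model wrapper defined in lm_eval_tensorrt_llm.py has been registered with
# lm-eval-harness; the checkpoint path is a placeholder.
results = lm_eval.simple_evaluate(
    model="trt-llm",
    model_args=(
        "model=/path/to/Meta-Llama-3.1-8B,"  # placeholder checkpoint path
        "backend=autodeploy,"
        "max_context_length=2048,"
        "max_gen_toks=128,"
        "tp=1,"
        "temperature=0"
    ),
    tasks=["mmlu"],
    batch_size=4,
)
print(results["results"])
```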
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user-friendly way for developers to interact with a Jenkins server.
Run /bot [-h|--help] to print this help message. See details below for each supported subcommand.
Details
run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]
Launch build/test pipelines. All previously running jobs will be killed.
- --disable-fail-fast (OPTIONAL): Disable fail fast on build/tests/infra failures.
- --skip-test (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- --stage-list "A10-1, xxx" (OPTIONAL): Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.
- --gpu-type "A30, H100_PCIe" (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- --only-multi-gpu-test (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- --disable-multi-gpu-test (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- --add-multi-gpu-test (OPTIONAL): Force run the multi-GPU tests. Will also run the L0 pre-merge pipeline.
- --post-merge (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- --extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL): Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

kill
kill
Kill all running builds associated with the pull request.

skip
skip --comment COMMENT
Skip testing for the latest commit on the pull request.
--comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline
reuse-pipeline
Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores