
Conversation

@meenchen (Collaborator) commented May 2, 2025

Description

  • Support the autodeploy backend in lm_eval_tensorrt_llm.py and deprecate the lm_eval script in autodeploy.
  • Enable chat templates in lm_eval_tensorrt_llm.py.
  • Upgrade lm_eval to the latest release (to align with modelopt).

Test Coverage

python lm_eval_tensorrt_llm.py --model trt-llm --model_args model=/home/scratch.omniml_data_1/models/llama3.1/Meta-Llama-3.1-8B,backend=autodeploy,max_context_length=2048,max_gen_toks=128,tp=1,temperature=0 --tasks mmlu --batch_size 4

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| mmlu_llama | 1 | strict_match | | exact_match | 0.6737 | ± 0.0037 |
| - humanities | 1 | strict_match | | exact_match | 0.6225 | ± 0.0067 |
| - other | 1 | strict_match | | exact_match | 0.7480 | ± 0.0074 |
| - social sciences | 1 | strict_match | | exact_match | 0.7722 | ± 0.0074 |
| - stem | 0 | strict_match | | exact_match | 0.5807 | ± 0.0084 |

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

Summary by CodeRabbit

  • New Features

    • Added support for multiple backends and enhanced chat templating in model evaluation, including temperature control for generation.
  • Bug Fixes

    • Updated test configurations to use the latest model evaluation scripts and parameters, and disabled select unstable test cases.
  • Documentation

    • Revised example commands and instructions for running model evaluation, and added a reference to the official evaluation harness documentation.
  • Chores

    • Updated dependency version for evaluation harness.
    • Adjusted Python path settings for improved module discovery during testing.
    • Removed unused configuration in example scripts.

@meenchen changed the title from "Add AutoDeploy to llm-eval example" to "feat: Enable AutoDeploy to llm-eval example" on May 2, 2025
@lucaslie (Member) left a comment:

Please make sure to update the usage of lm_eval_ad.py as well, for example, in the integration tests.

Review comment (Member):

why is this not set anymore?

Comment on lines +173 to +203
Review comment (Member):

where is this coming from?

@Fridah-nv (Collaborator) commented:

Hi @meenchen, does it support --tasks gsm8k and TP>1?
I can't get the gsm8k results from lm_eval_ad.py and lm_eval_tensorrt_llm.py to align. Here are my results:

| Configuration | lm_eval_ad | lm_eval_tensorrt_llm |
|---|---|---|
| upstream/main | 0.42 | 0.42 |
| upstream/main without rope transformation | 0.78 | 0.43 |

All runs use Llama3.1, and the numbers are strict-match.

The lm_eval_ad.py result uses the config from the first test case in test_lm_eval.py; lm_eval_tensorrt_llm.py uses this command:

python lm_eval_tensorrt_llm.py --model trt-llm --model_args model=/home/scratch.omniml_data_1/models/llama3.1/Meta-Llama-3.1-8B,backend=autodeploy,max_context_length=2048,max_gen_toks=128,tp=1,temperature=0 --tasks gsm8k --batch_size 4

I cannot run TP>1 with lm_eval_tensorrt_llm.py or TP=1 with test_lm_eval.py.

@meenchen self-assigned this May 9, 2025
@meenchen force-pushed the user/weimingc/lm_eval_refactor branch from ff6bcc9 to 731b278 on May 9, 2025 17:11
@meenchen (Collaborator, Author) commented May 9, 2025

> Hi @meenchen, does it support --tasks gsm8k and TP>1? I can't get the gsm8k results from lm_eval_ad.py and lm_eval_tensorrt_llm.py to align. Here are my results:
>
> | Configuration | lm_eval_ad | lm_eval_tensorrt_llm |
> |---|---|---|
> | upstream/main | 0.42 | 0.42 |
> | upstream/main without rope transformation | 0.78 | 0.43 |
>
> All runs use Llama3.1, and the numbers are strict-match.
>
> The lm_eval_ad.py result uses the config from the first test case in test_lm_eval.py; lm_eval_tensorrt_llm.py uses this command:
>
> python lm_eval_tensorrt_llm.py --model trt-llm --model_args model=/home/scratch.omniml_data_1/models/llama3.1/Meta-Llama-3.1-8B,backend=autodeploy,max_context_length=2048,max_gen_toks=128,tp=1,temperature=0 --tasks gsm8k --batch_size 4
>
> I cannot run TP>1 with lm_eval_tensorrt_llm.py or TP=1 with test_lm_eval.py.

@Fridah-nv, I am able to run gsm8k with TP=2. My command:

python lm_eval_tensorrt_llm.py --model trt-llm --model_args model=/home/scratch.omniml_data_1/models/llama3.1/Meta-Llama-3.1-8B-Instruct,backend=autodeploy,max_context_length=2048,max_gen_toks=128,tp=2,temperature=0 --tasks gsm8k --batch_size 4 --limit 0.05

What error are you encountering?

@lucaslie added the AutoDeploy (<NV> AutoDeploy Backend) label on May 14, 2025
@lucaslie moved this from Backlog to In progress in AutoDeploy Board on May 14, 2025
@meenchen force-pushed the user/weimingc/lm_eval_refactor branch from 4b4e476 to 33fe071 on June 3, 2025 22:25
@meenchen marked this pull request as draft on June 3, 2025 22:25
@poweiw added the Community want to contribute (PRs initiated from Community) label on Jun 5, 2025
@poweiw assigned Fridah-nv and unassigned meenchen on Jun 5, 2025
@poweiw added the triaged (Issue has been triaged by maintainers) label on Jun 5, 2025
@poweiw (Collaborator) commented Jun 5, 2025

@Fridah-nv any updates?

@lucaslie removed the triaged (Issue has been triaged by maintainers) and Community want to contribute (PRs initiated from Community) labels on Jun 5, 2025
@Fridah-nv assigned meenchen and unassigned Fridah-nv on Jun 16, 2025
@Fridah-nv (Collaborator) commented Jun 16, 2025

> @Fridah-nv any updates?

@meenchen will wrap up this PR. Please let me know if there are further questions.

@poweiw added the Community want to contribute (PRs initiated from Community) label on Jun 24, 2025
Signed-off-by: weimingc <[email protected]>
@ajrasane force-pushed the user/weimingc/lm_eval_refactor branch from 33fe071 to 4b6a7fd on August 6, 2025 19:56
@coderabbitai bot (Contributor) commented Aug 6, 2025

📝 Walkthrough

The changes introduce expanded backend support and chat templating in the model evaluation harness, update documentation and test configurations to reflect new usage patterns, and adjust testing to use the new backend and parameter formats. Some tests are disabled, and dependency versions are updated. Minor code cleanup and configuration adjustments are also included.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Backend & Chat Templating Enhancements<br>examples/llm-eval/lm-eval-harness/lm_eval_tensorrt_llm.py | Added support for multiple backends (torch, pytorch, autodeploy), introduced chat templating, temperature control, a new JsonChatStr class, and reorganized LLM instantiation logic. |
| Documentation Update<br>examples/auto_deploy/README.md | Updated model evaluation command instructions to reflect the new script location, backend, and argument requirements; added a reference to the official lm-eval-harness README. |
| Test Adaptation and Reduction<br>tests/unittest/_torch/auto_deploy/integration/test_lm_eval.py | Switched to the new evaluation harness import, disabled several parameterized tests, and simplified the remaining test to focus on "mmlu_llama" with consolidated arguments and the updated backend. |
| Dependency Update<br>examples/llm-eval/lm-eval-harness/requirements.txt | Updated the lm_eval package version from 0.4.7 to 0.4.8. |
| Python Path Configuration<br>tests/unittest/pytest.ini | Changed the Python path from ../../examples/auto_deploy to ../../examples/llm-eval/lm-eval-harness for pytest. |
| Script Cleanup<br>examples/auto_deploy/build_and_run_flux.py | Removed setting of torch._dynamo.config.cache_size_limit. |
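
To make the chat-templating change concrete, here is a minimal sketch of what the JsonChatStr wrapper could look like. The class name comes from the change summary above; the field and helper shown here are assumptions for illustration, not the actual source.

```python
import json
from dataclasses import dataclass


@dataclass
class JsonChatStr:
    """Wraps a chat conversation serialized as a JSON string.

    The harness can pass chat-templated prompts around as str-like objects,
    so the wrapper only needs to carry the JSON payload and support encoding.
    """

    prompt: str  # assumed: JSON-encoded list of {"role": ..., "content": ...} messages

    def encode(self, encoding: str = "utf-8") -> bytes:
        # Delegate to the underlying JSON string.
        return self.prompt.encode(encoding)


# Hypothetical helper: turn plain user text into a chat-formatted prompt.
def to_chat_prompt(user_text: str) -> JsonChatStr:
    messages = [{"role": "user", "content": user_text}]
    return JsonChatStr(json.dumps(messages))
```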

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant LM_Eval_Harness
    participant TRTLLMEvalBase
    participant LLM_Backend

    User->>LM_Eval_Harness: Run evaluation with CLI args (model, backend, etc.)
    LM_Eval_Harness->>TRTLLMEvalBase: Initialize with backend and parameters
    TRTLLMEvalBase->>LLM_Backend: Configure backend (torch/pytorch/autodeploy)
    TRTLLMEvalBase->>TRTLLMEvalBase: Apply chat template (if needed)
    TRTLLMEvalBase->>LLM_Backend: Generate or compute loglikelihood with temperature
    LLM_Backend-->>TRTLLMEvalBase: Return results
    TRTLLMEvalBase-->>LM_Eval_Harness: Return evaluation results
    LM_Eval_Harness-->>User: Output metrics/results
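As a rough Python-level illustration of the flow above, the harness can also be driven through lm-eval's simple_evaluate API. This is a minimal sketch, assuming that importing lm_eval_tensorrt_llm registers the "trt-llm" model wrapper (as the CLI usage in this PR suggests); the model path is shortened for readability.

```python
import lm_eval_tensorrt_llm  # noqa: F401  # assumed to register the "trt-llm" wrapper
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="trt-llm",
    model_args="model=Meta-Llama-3.1-8B,backend=autodeploy,max_gen_toks=128,temperature=0",
    tasks=["mmlu_llama"],
    batch_size=4,
)
print(results["results"])  # per-task metrics, as in the results table above
```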

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15–25 minutes


Suggested reviewers

  • pcastonguay
  • litaotju
  • syuoni


@coderabbitai bot left a comment:

Actionable comments posted: 2

♻️ Duplicate comments (2)
examples/llm-eval/lm-eval-harness/lm_eval_tensorrt_llm.py (2)

177-178: Clarify why max_length is no longer set.

A previous reviewer asked why self.max_length is no longer being set. Please provide clarification on this change.


188-203: Clarify the source of the chat template implementation.

A previous reviewer asked about the origin of this code. Please provide context on whether this is adapted from another implementation or newly written.

🧹 Nitpick comments (3)
examples/llm-eval/lm-eval-harness/lm_eval_tensorrt_llm.py (2)

71-77: Consider adding docstrings for new parameters.

The new parameters attn_backend, tokenized_requests, and temperature would benefit from documentation explaining their purpose and valid values.
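
As a hedged sketch of what that documentation could look like (default values and parameter semantics here are inferred from the review, not taken from the source):

```python
class TRTLLMEvalBase:
    def __init__(
        self,
        attn_backend: str = "TRTLLM",
        tokenized_requests: bool = False,
        temperature: float = 0.0,
    ) -> None:
        """Initialize the evaluation wrapper.

        Args:
            attn_backend: Attention backend passed to the runtime config
                (e.g. "TRTLLM" or "FlashInfer").
            tokenized_requests: If True, send requests as token IDs rather
                than raw text.
            temperature: Sampling temperature applied to all generation
                requests; 0 yields greedy decoding.
        """
        self.attn_backend = attn_backend
        self.tokenized_requests = tokenized_requests
        self.temperature = temperature
```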


180-187: Fix docstring formatting issues.

The docstring needs a blank line after the summary and line 184 exceeds the character limit.

 @property
 def tokenizer_name(self) -> str:
     """Must be defined for LM subclasses which implement Chat Templating.
+
     Should return the name of the tokenizer or chat template used.
-    Used only to properly fingerprint caches when requests are being cached with `--cache_requests`, otherwise not used.
+    Used only to properly fingerprint caches when requests are being cached
+    with `--cache_requests`, otherwise not used.
     """
     return ""
tests/unittest/_torch/auto_deploy/integration/test_lm_eval.py (1)

61-124: Consider using pytest.mark.skip instead of commenting out test cases.

Rather than commenting out test parameters, consider keeping them active with pytest.mark.skip decorators. This preserves the test definitions while clearly marking them as temporarily disabled.
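
For illustration, a minimal sketch of that pattern (the task names and skip reason are placeholders):

```python
import pytest


@pytest.mark.parametrize(
    "task",
    [
        "mmlu_llama",
        pytest.param("gsm8k", marks=pytest.mark.skip(reason="temporarily disabled")),
    ],
)
def test_lm_eval_task(task):
    # The real test would run the evaluation harness for `task`.
    assert isinstance(task, str)
```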

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a16ba64 and 4b6a7fd.

📒 Files selected for processing (6)
  • examples/auto_deploy/README.md (1 hunks)
  • examples/auto_deploy/build_and_run_flux.py (0 hunks)
  • examples/llm-eval/lm-eval-harness/lm_eval_tensorrt_llm.py (8 hunks)
  • examples/llm-eval/lm-eval-harness/requirements.txt (1 hunks)
  • tests/unittest/_torch/auto_deploy/integration/test_lm_eval.py (3 hunks)
  • tests/unittest/pytest.ini (1 hunks)
💤 Files with no reviewable changes (1)
  • examples/auto_deploy/build_and_run_flux.py
🪛 Ruff (0.12.2)
examples/llm-eval/lm-eval-harness/lm_eval_tensorrt_llm.py

128-128: Line too long (121 > 120)

(E501)


131-131: Local variable pytorch_config is assigned to but never used

Remove assignment to unused variable pytorch_config

(F841)


182-185: 1 blank line required between summary line and description

(D205)


184-184: Line too long (124 > 120)

(E501)

🔇 Additional comments (8)
examples/llm-eval/lm-eval-harness/requirements.txt (1)

1-1: LGTM!

The patch version update from 0.4.7 to 0.4.8 is appropriate for the new backend support features.

examples/llm-eval/lm-eval-harness/lm_eval_tensorrt_llm.py (3)

46-52: LGTM!

The JsonChatStr utility class is well-designed for handling JSON-encoded chat prompts.


324-326: LGTM!

The temperature override implementation allows for consistent generation behavior across all requests.
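
For context, an override of this kind might look like the following minimal sketch; SamplingParams is the TensorRT-LLM LLM API's sampling configuration, and the helper name is an assumption:

```python
from tensorrt_llm import SamplingParams


def make_sampling_params(max_gen_toks: int, temperature: float) -> SamplingParams:
    # Apply the same temperature to every request so results are reproducible.
    return SamplingParams(max_tokens=max_gen_toks, temperature=temperature)
```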


139-151: LGTM!

The LLM instantiation correctly supports the new backend architecture.

tests/unittest/pytest.ini (1)

8-8: LGTM!

The pythonpath update correctly reflects the new location of the evaluation harness module.

examples/auto_deploy/README.md (1)

195-200: LGTM!

The documentation updates correctly reflect the new evaluation harness usage with autodeploy backend support. The example command and reference link are helpful for users.

tests/unittest/_torch/auto_deploy/integration/test_lm_eval.py (2)

12-12: LGTM!

The import correctly references the new evaluation harness module.


133-140: LGTM!

The model arguments correctly configure the autodeploy backend with deterministic generation settings appropriate for testing.

Comment on lines +120 to +126
if self.backend == 'pytorch':
    if hasattr(PyTorchConfig, "moe_backend"):
        pytorch_config_params["moe_backend"] = self.moe_backend
        print(f"Info: moe_backend is set to {self.moe_backend}")
    pytorch_config_params["attn_backend"] = attn_backend.upper()
    print(f"Info: attn_backend is set to {attn_backend}")
    pytorch_config = PyTorchConfig(**pytorch_config_params)

⚠️ Potential issue

Remove duplicate moe_backend configuration.

The moe_backend is already configured at lines 116-118. This duplicate block is redundant.

 if self.backend == 'pytorch':
-    if hasattr(PyTorchConfig, "moe_backend"):
-        pytorch_config_params["moe_backend"] = self.moe_backend
-        print(f"Info: moe_backend is set to {self.moe_backend}")
     pytorch_config_params["attn_backend"] = attn_backend.upper()
     print(f"Info: attn_backend is set to {attn_backend}")
     pytorch_config = PyTorchConfig(**pytorch_config_params)
📝 Committable suggestion

Suggested change

if self.backend == 'pytorch':
    pytorch_config_params["attn_backend"] = attn_backend.upper()
    print(f"Info: attn_backend is set to {attn_backend}")
    pytorch_config = PyTorchConfig(**pytorch_config_params)
🤖 Prompt for AI Agents
In examples/llm-eval/lm-eval-harness/lm_eval_tensorrt_llm.py around lines 120 to
126, remove the duplicate setting of moe_backend in the pytorch_config_params
dictionary since it is already configured at lines 116-118. Keep only the unique
configuration lines such as setting attn_backend and creating the PyTorchConfig
instance, eliminating the redundant moe_backend assignment and its print
statement.

Comment on lines +127 to +131
elif self.backend == 'autodeploy':
    assert self.max_context_length is not None, "max_context_length must be specified for autodeploy backend"
    # Only FlashInfer is supported for autodeploy backend.
    pytorch_config_params["attn_backend"] = "FlashInfer"
    pytorch_config = AutoDeployConfig(**pytorch_config_params)

⚠️ Potential issue

Fix unused variable and line length issues.

The pytorch_config variable is created but not used. Also, line 128 exceeds the 120 character limit.

 elif self.backend == 'autodeploy':
-    assert self.max_context_length is not None, "max_context_length must be specified for autodeploy backend"
+    assert self.max_context_length is not None, \
+        "max_context_length must be specified for autodeploy backend"
     # Only FlashInfer is supported for autodeploy backend.
     pytorch_config_params["attn_backend"] = "FlashInfer"
-    pytorch_config = AutoDeployConfig(**pytorch_config_params)
+    # AutoDeployConfig is passed directly through kwargs

Since the config is passed via kwargs, there's no need to assign it to pytorch_config.

📝 Committable suggestion

Suggested change

elif self.backend == 'autodeploy':
    assert self.max_context_length is not None, \
        "max_context_length must be specified for autodeploy backend"
    # Only FlashInfer is supported for autodeploy backend.
    pytorch_config_params["attn_backend"] = "FlashInfer"
    # AutoDeployConfig is passed directly through kwargs

🤖 Prompt for AI Agents
In examples/llm-eval/lm-eval-harness/lm_eval_tensorrt_llm.py around lines 127 to
131, remove the assignment to the unused variable pytorch_config and instead
directly call AutoDeployConfig with the unpacked pytorch_config_params. Also,
break the long line 128 into multiple lines or simplify it to ensure it does not
exceed 120 characters.

@meenchen closed this on Nov 14, 2025
The github-project-automation bot moved this from In progress to Done in AutoDeploy Board on Nov 14, 2025

Labels

AutoDeploy (<NV> AutoDeploy Backend) · Community want to contribute (PRs initiated from Community)
