Closed — 556 commits (the diff below shows changes from a single commit)
194a708
[fix] Fix test_attention_mla (#5084)
jinyangyuan-nvidia Jun 10, 2025
6cb2b7d
CI: Allow run (#5101)
IzzyPutterman Jun 10, 2025
fcd7192
[fix] Unwaive test_llama_eagle3 (#5042)
mikeiovine Jun 10, 2025
1b79041
fix: XQA is not enabled when history_length < kMinHistoryTokensPerBlo…
bobboli Jun 11, 2025
580a925
test: conditional disagg and cache aware balancing for deepseek v3 (#…
zhengd-nv Jun 11, 2025
273c6b9
[https://nvbugspro.nvidia.com/bug/5332927][fix] Fix the bug in the ro…
ChristinaZ Jun 11, 2025
035b048
infra: Add timeout and retry for wget in docker image build (#5035)
ZhanruiSunCh Jun 11, 2025
0a9f105
Waive L0 tests (#5111)
yiqingy0 Jun 11, 2025
00991d1
chore: Merge remaining changes from feat/large-ep branch to main (#5039)
syuoni Jun 11, 2025
fdf1c47
[TRTLLM-4995][feat] TRTLLM Sampler log probs support (#4836)
dcampora Jun 11, 2025
e2863a3
chore: bump version to 0.21.0rc2 (#5112)
ZhanruiSunCh Jun 11, 2025
56abae0
test: add more llama_v3.3_70b cases in perf test (#4979)
ruodil Jun 11, 2025
8282d6c
[fix] Fix llama4 min latency (#5117)
liji-nv Jun 11, 2025
a90dd57
[TRTLLM-5082] - Add a bot run option for detailed logs (#4390)
yiqingy0 Jun 11, 2025
11b94fe
test: skip disaggregated tests on arm (#5070)
xinhe-nv Jun 11, 2025
ddfe4fc
[chore] 2025-06-10 update allowlist (#5102)
tburt-nv Jun 11, 2025
ad99a08
[TRTLLM-5581][infra] Update Module Owners (#5052)
poweiw Jun 12, 2025
ee44fa0
chore: rename IOFormatter to BaseCacheFormatter (#5068)
zhengd-nv Jun 12, 2025
c592798
fix: limit process pool size when prefetching (#5088)
zhengd-nv Jun 12, 2025
4319237
Use backend to replace macro to control enablement of MNNVL all reduc…
HuiGao-NV Jun 12, 2025
e692779
Solve underallocation in VSWA+/VGQA (#4667)
netanel-haber Jun 12, 2025
49d7268
[nvbugs/5331013] fix AutoDeploy for PyTorch 25.05 dependency upgrade …
lucaslie Jun 12, 2025
c3b2eb6
test(perf): Add remaining Llama-Nemotron perftests (nano, super, ultr…
venkywonka Jun 12, 2025
0daa709
Fix Llama-3_3-Nemotron-Super-49B-v1 FP8 accuracy threshold configs (#…
moraxu Jun 12, 2025
505678a
update the free_gpu_mem_fraction for H100 qwen3 qa test (#5114)
byshiue Jun 12, 2025
06d9f1e
[test] Use LLM API for Nemotron-H correctness test (#5097)
tomeras91 Jun 12, 2025
d021cc5
test: set enable_attention_dp to False for non-deepseek models and ad…
ruodil Jun 12, 2025
53983ad
[TRTLLM-4932] Add Llama-3.1-Nemotron-Nano-8B-v1-FP8 accuracy tests (#…
moraxu Jun 12, 2025
e462677
Fix logprobs issues. (#5136)
dcampora Jun 12, 2025
4d070d3
chore: fix typo in tests (#5092)
lfr-0531 Jun 12, 2025
10ab979
[fix] Do not reuse dummy request KVCache (#4804)
liji-nv Jun 12, 2025
a97f458
infra: upload imageTag info to artifactory and add ngc_staging to sav…
ZhanruiSunCh Jun 12, 2025
b563696
doc:fix invalid links for trtllm-serve doc (#5145)
nv-guomingz Jun 12, 2025
88cba5f
test: waive the NIXL related tests (#5153)
Shixiaowei02 Jun 12, 2025
59c9588
enh(doc): Add `ci-overview` in `docs/source/reference/` (#5137)
venkywonka Jun 12, 2025
22281cf
doc: Added documentation for enable_trtllm_sampler. (#4990)
dcampora Jun 12, 2025
58d4ca2
fix:remove duplicated trust_remote_code knob from trtllm-serve (#5143)
nv-guomingz Jun 12, 2025
cf35a07
fix:https://nvbugs/5298661 (#5022)
nv-guomingz Jun 12, 2025
8cfb567
fix: Updates to yarn implementation (#5105)
brb-nv Jun 12, 2025
dfeeaf6
Move allreduce_strategy from committed api to reference (#5147)
HuiGao-NV Jun 12, 2025
690873b
[nvbug/5334370][fix] Fix one model EAGLE3 (#5134)
mikeiovine Jun 12, 2025
655bce0
[fix][test] report individual unittests results to jenkins (#5116)
omera-nv Jun 12, 2025
3a04c9f
chore: Include prompt_token_ids only for context-only disagg requests…
pcastonguay Jun 12, 2025
cc2a134
None: fix OOM because of unnecessary mha workspace (#5056)
ttyio Jun 12, 2025
a0b6c63
[feat] trtllmGen MoE routing: added support for top groups and top K …
MatthiasKohl Jun 12, 2025
38a907a
[TRTLLM-5278][feat] Add attention dp support to MTP relaxed acceptanc…
lfr-0531 Jun 13, 2025
4ae46b6
fix: [nvbugs/5324229] Fix broken WInt4AFP8FusedMoEMethod since FusedM…
yuxianq Jun 13, 2025
a891013
[feat] Optimize KV Cache Reuse for MLA (#4869)
zhhuang-nv Jun 13, 2025
fa582cb
test: add more cases for rtx_pro_6000_se and add option kv_cache_dtyp…
ruodil Jun 13, 2025
d9be419
tests: update tests for b200 (#5180)
xinhe-nv Jun 13, 2025
b79eb34
[fix]: Fall back to HMAC to Avoid IPC Serialization Churn (#5074)
yibinl-nvidia Jun 13, 2025
dec326b
[fix] Reenable test return logits (#5160)
dcampora Jun 13, 2025
01bd4c0
Add two MTP disaggregated test (#4546)
Tabrizian Jun 13, 2025
28cd536
[test] Update timeout params in QA test list (#5124)
crazydemo Jun 13, 2025
4d0a5ad
chore: gracefully exit disagg process in tests; better startup and lo…
zhengd-nv Jun 13, 2025
514baf1
[fix] Fix comment to pass guardwords check (#5191)
MatthiasKohl Jun 13, 2025
12e075e
[nvbug 5333996 ][fix] Unload XQA cubins early to avoid static lifetim…
lowsfer Jun 13, 2025
30c5b41
refactoring: port customized kernels with public cutlass version (#5027)
yunruis Jun 13, 2025
b959618
refactor [BREAKING CHANGE]: remove the redundant use_kv_cache field …
nv-guomingz Jun 13, 2025
30d9d0f
test: [CI] Add failed cases into waives.txt (#5178)
xinhe-nv Jun 13, 2025
089be89
feat: Basic skeleton for Gemma3 VLM (#5108)
brb-nv Jun 13, 2025
e96d686
add doc for open-sourced cutlass kernels (#5194)
yunruis Jun 13, 2025
e5be3a9
fix: fix license bug (#5200)
yunruis Jun 13, 2025
8e99370
ucxx: only use ucp_feature_tag to avoid some issues on some platforms (…
chuangz0 Jun 13, 2025
952f33d
CI: move all test cases of TensorRT backend into post merge (#5186)
QiJune Jun 13, 2025
3d87770
[https://nvbugspro.nvidia.com/bug/5295470] support headDim 256 for bl…
PerkzZheng Jun 13, 2025
25aa388
[nvbug/5319281][fix] Stop drafting when we hit the draft model's max …
mikeiovine Jun 13, 2025
06342ff
[feat] Implement model-agnostic one-engine eagle3 (#4778)
nv-yilinf Jun 13, 2025
5f2785f
fix: Fix waive list (#5205)
syuoni Jun 13, 2025
82e280f
feat: add multi-node support for Triton with pytorch backend (#5172)
achartier Jun 13, 2025
97657bf
optimize memset before alltoall communication (#5188)
dongxuy04 Jun 14, 2025
3b7b5a5
refactor [BREAKING CHANGE]: enhance the llm args pytorch config part …
nv-guomingz Jun 14, 2025
b99c5ce
Feat/ds r1 min latency opt round3, add router gemm, fused a gemm, PDL…
yunruis Jun 14, 2025
443b2eb
refactor: Speculative decoding buffers (#5091)
Funatiq Jun 14, 2025
0b60da2
feat: large-scale EP(part 7: DeepEP integration) (#4792)
yuantailing Jun 14, 2025
dc52b67
linting(python): Enable ruff on more files (wave 1/N) (#5140)
2ez4bz Jun 14, 2025
1389f5a
feat: Add support for fp8 rowwise quantization (#4876)
achartier Jun 14, 2025
e055af1
chore: improve disagg test failure detection (#4738)
ixlmar Jun 14, 2025
6bce733
perf: avoid dynamic import overhead in is_llm_response with duck typi…
tongyuantongyu Jun 14, 2025
63bc62d
feat: Enable EPLB to existing MoE models (#5203)
syuoni Jun 15, 2025
dce1dcc
feat: Support post_proc for bench (#5122)
kaiyux Jun 15, 2025
159ffc5
fix: fix cuda graph max batch size for spec decoding cases. (#5076)
lfr-0531 Jun 15, 2025
4eade3a
[fix][test] Speedup Nemotron NAS unittests (#5202)
omera-nv Jun 15, 2025
5a01ba5
use cu for fmha_v2 (#4694)
qsang-nv Jun 15, 2025
39bba63
[TRTLLM-4983] feat: enable overlap scheduler between draft forwards (…
lfr-0531 Jun 15, 2025
109c426
Enable trtllm-bench to run LoRA and add basic e2e perf testing capabi…
amitz-nv Jun 15, 2025
c84e41f
fix: build_config in TorchLlmArgs and avoid arbitrary args (#4972)
Superjomn Jun 16, 2025
7a5e0fd
[fix] Fix Llama4 min-latency import error (#5209)
nv-yilinf Jun 16, 2025
babdd9c
test: Add json_mode_eval for guided decoding evaluation (#5179)
syuoni Jun 16, 2025
3d22f27
test: add more cases for llama_v3.3/3.1 70b fp8 and set enable_attent…
ruodil Jun 16, 2025
2848e01
test: add llama4 models for perf test (#5187)
ruodil Jun 16, 2025
9b616db
test: Add fixture to skip tests based on MPI world size (#5028)
yizhang-nv Jun 16, 2025
ef3fdc8
feat: Add w4a8_mxfp4_fp8 quantization recipe. (#4867)
Tracin Jun 16, 2025
0acf231
[Stress test] Add DeepSeek-R1 stress test (#5033)
Wanli-Jiang Jun 16, 2025
dda6416
refactor: Scheduling based on KV cache state (#4865)
Funatiq Jun 16, 2025
1d2b0d3
use file lock to avoid port conflict (#5123)
chuangz0 Jun 16, 2025
4f9fa9f
feat: MoE trtllm backend kernel update (#5183)
rosenrodt Jun 16, 2025
b6ca677
refactor: remove decoder request from decoder interface (#5129)
Funatiq Jun 16, 2025
8445416
Waive L0 tests (#5233)
yiqingy0 Jun 16, 2025
802f22c
test: [CI] Add failed cases into waives.txt (#5221)
xinhe-nv Jun 16, 2025
64b7f04
[test] split nemotron test cases from examples_test_list (#5238)
crazydemo Jun 16, 2025
03f1a6a
Update DeepSeek R1 perf numbers to latest release/0.20 results (#5235)
litaotju Jun 16, 2025
dd29063
[feat] Add llm args to tune python gc threshold (#5141)
nv-yilinf Jun 16, 2025
cea5dd1
[TRTLLM-5835][feat] Optimized Mamba2Mixer prefill (#5128)
tomeras91 Jun 16, 2025
e607768
Speculation: Draft Target in new FW (#4558)
IzzyPutterman Jun 16, 2025
5c18160
chore: Waive CI failure. (#5252)
SimengLiu-nv Jun 16, 2025
c53bc19
[infra] Make test_chunked_prefill faster (#5248)
mikeiovine Jun 16, 2025
a2e8ae1
Update internal cutlass commit. (#5228)
Tracin Jun 17, 2025
bb23483
test: add more pytorch cases in perf test (#5237)
ruodil Jun 17, 2025
546274d
fix ci (#5259)
QiJune Jun 17, 2025
a49ad79
test: [CI] remove closed bugs (#5218)
xinhe-nv Jun 17, 2025
4b82b8b
[TRTLLM-5330] perf: Optimize MoE supplementary kernels for large-scal…
syuoni Jun 17, 2025
134cb66
fix mla test (#5240)
qsang-nv Jun 17, 2025
6a6b9d2
doc: add document of benchmarking for Qwen3 (#5158)
byshiue Jun 17, 2025
faca19c
update setup.py for special cases (#5227)
qsang-nv Jun 17, 2025
517c1ec
move some test cases of TensorRT backend back (#5232)
QiJune Jun 17, 2025
498fadc
[feat] Add EAGLE3 support for Qwen3 (#5206)
nv-yilinf Jun 17, 2025
2ad8758
[TRTLLM-5786][https://nvbugspro.nvidia.com/bug/5310520][test] Add QA …
crazydemo Jun 17, 2025
ccd9adb
CI: move multi-gpu test cases of tensorrt backend to h200 (#5272)
QiJune Jun 17, 2025
dc3861b
refactor: Unify decoder test with e2e workflow (#5239)
Funatiq Jun 17, 2025
13eef64
[feat] Piecewise cuda graph support for MLA (#4467)
liji-nv Jun 17, 2025
8451a87
chore: Mass integration of release/0.20 (#5082)
amirkl94 Jun 17, 2025
44fb3c1
[TRTLLM-5770] feat: Integrate TRT-LLM Gen FP8 block scale MoE with Py…
DomBrown Jun 17, 2025
f4cdbfc
None - Some clean-ups for the automation pipeline (#5245)
chzblych Jun 17, 2025
f899c4d
Re-implement LlmResponse in Python to reduce host overhead of pybind …
QiJune Jun 17, 2025
5236bb9
delete cubins (#5274)
qsang-nv Jun 17, 2025
dcf18c4
infra[TRTLLM-5635] remove package stage in CI build (#5075)
niukuo Jun 17, 2025
ff32caf
[Infra] - Update dependencies with NGC PyTorch 25.05 and TRT 10.11 (#…
EmmaQiaoCh Jun 17, 2025
9bf69c9
[chore] Remove BaseDraftTokenManager (#5251)
mikeiovine Jun 17, 2025
2df9f87
[infra] Report CI authorization errors to PR (#5175)
tburt-nv Jun 17, 2025
7d55c38
Revert "[infra] Report CI authorization errors to PR" (#5298)
tburt-nv Jun 17, 2025
627062c
refactor: Update decoder buffer and logits management (#4450)
Funatiq Jun 18, 2025
e1e5f72
fix: only set _mpi_session if world_size is > 1 (#5253)
achartier Jun 18, 2025
855036d
update LlmRequest.is_dummy property (#5283)
QiJune Jun 18, 2025
41cfcaa
test: update qa test list (#5305)
crazydemo Jun 18, 2025
3c0fecb
CI: extend model weights load time for dsv3 in stress test. (#5275)
dominicshanshan Jun 18, 2025
f501ce5
[fix][test] move deepseek single gpu tests to post merge (#5280)
omera-nv Jun 18, 2025
8f67e36
Waive L0 tests (#5308)
yiqingy0 Jun 18, 2025
e44f768
feat: Add no_kv_cache_reuse option and streaming support for trtllm s…
yizhang-nv Jun 18, 2025
724e495
chore: partition LLM class into TorchLLM and TrtLLM (#4900)
Superjomn Jun 18, 2025
908463a
[feat]: improve performance of XQA-MLA for sm120 (#5087)
lowsfer Jun 18, 2025
ee26965
doc:update contributing md for internal developers (#5250)
nv-guomingz Jun 18, 2025
3b5d916
test: cherry-pick deepseek rcca cases in main branch (#5307)
ruodil Jun 18, 2025
6711ad9
[TRTLLM-5589] feat: Minor optimizations for tunable FP8 batched GEMM …
hyukn Jun 18, 2025
9ea7bb6
CI: fix TensorRT H200 tests (#5301)
QiJune Jun 18, 2025
3a02489
[TRTLLM-5758] test: Add Bielik-11B-v2.2 Model Support (#5159)
Wanli-Jiang Jun 18, 2025
d76bda7
chore: Refine printed info of CHECK_TYPE. (#5295)
bobboli Jun 18, 2025
38547b9
refactor: Introduce ResourceManagerType enum for resource management …
Funatiq Jun 18, 2025
516bd4d
chore: bump version to 0.21.0rc3 (#5309)
ZhanruiSunCh Jun 18, 2025
f599ee6
test: correct unittest rerun behavior (#5273)
tongyuantongyu Jun 18, 2025
a3a4841
Fix rerun step (#5319)
yiqingy0 Jun 18, 2025
375dd0b
Waive L0 (#5311)
yizhang-nv Jun 18, 2025
610a49f
tests: add multi nodes tests (#5196)
xinhe-nv Jun 18, 2025
0623ffe
feat: Add LLGuidance Support for PyTorch Backend (#5214)
jellysnack Jun 18, 2025
b29ac5b
[Infra] Update 5080 and 5090 case condition due to the driver update …
EmmaQiaoCh Jun 18, 2025
00bdd39
chore: Update README.md to expose meet-up info (#5329)
juney-nvidia Jun 18, 2025
d13d2f4
Remove duplicated test cases (#5323)
HuiGao-NV Jun 18, 2025
857108a
Add disagg slurm scripts (#5243)
qiaoxj07 Jun 18, 2025
e5ee5c5
Unwaive disaggregated serving accuracy tests (#5095)
Tabrizian Jun 18, 2025
a1c5704
[feat] Multi-node CI testing support via Slurm (#4771)
yuanjingx87 Jun 18, 2025
a28a152
[fix][test] remove some cpp test cases from h100 (#5335)
omera-nv Jun 18, 2025
5010f87
[fix][test] remove duplicate test runs (#5241)
omera-nv Jun 18, 2025
d25f93c
chore: skip test_llm_gpt2_medium_fp8 for fp8_pc_pt + quant_lm_head (#…
achartier Jun 18, 2025
0b6d005
[fix][test] clear cuda cache before unittests automatically (#5121)
omera-nv Jun 18, 2025
3946e79
fix[nvbug5298640]: trtllm-llmapi-launch multiple LLM instances (#4727)
Superjomn Jun 18, 2025
1a7c6e7
ci: Split long running jobs into multiple jobs (#5268)
Funatiq Jun 18, 2025
2b23cd5
[feat] Fusion finalize and allreduce for qwenmoe model (#5223)
zongfeijing Jun 19, 2025
6a388b1
chore: remove torch_compile prefix for TorchCompileConfig field membe…
nv-guomingz Jun 19, 2025
6c3210a
[test] add nvfp4 DeepSeek-V3-Lite-mtp tests (#5125)
lfr-0531 Jun 19, 2025
da576bc
Waive L0 test (#5349)
yiqingy0 Jun 19, 2025
decfe2f
chore: bump version to 0.21.0 (#5325)
yiqingy0 Jun 19, 2025
e87cf62
tests: cherry-pick from main branch, add qwen3 test cases and amend t…
ruodil Jun 19, 2025
8686805
[Infra]cherry pick sanity check yml change for 5080 and 5090 from mai…
EmmaQiaoCh Jun 19, 2025
ebc6dbc
doc: cherry pick #5334 (#5368)
MartinMarciniszyn Jun 19, 2025
2d5e202
fix: Fix skip by mpi size fixture (#5355)
yizhang-nv Jun 21, 2025
2b56957
Fix: missing clientId when serialize and deserialize response (cherry…
kaiyux Jun 24, 2025
9e110b2
tests: fix typos in qa test (#5421)
crazydemo Jun 25, 2025
32f50de
nvbugs-5331031; nvbugs-5344203 - address intermittent issues with Mis…
brb-nv Jun 25, 2025
af58393
feat: TRTLLM-5941 Upgrade xgrammar to 0.1.18 (#5364)
Wanli-Jiang Jun 25, 2025
5e50fcc
test: set enable_attention_dp=True in default deepseek settings (#5461)
ruodil Jun 25, 2025
5cd87be
tests: Set kv cache free memory fraction in test case (#5462)
HuiGao-NV Jun 25, 2025
b6d23d5
[Infra] - Waive failed tests on release/0.21 (#5477)
EmmaQiaoCh Jun 25, 2025
fc64f13
Fix permission for local user issues in NGC docker container. (#5373)
MartinMarciniszyn Jun 25, 2025
87ead4e
[nvbug 5273941] fix: broken cyclic reference detect (#5417)
Superjomn Jun 25, 2025
c2799d0
[nvbug/5354825] Fix nougat test image url (#5496)
amukkara Jun 26, 2025
a811077
fix: fix regression in LOCAL_USER (#5517)
ixlmar Jun 26, 2025
30a2a8b
doc: Fix benchmark cmd in disagg scripts (#5516)
kaiyux Jun 26, 2025
312fd47
fix: constrain grepping in docker/Makefile (#5493)
ixlmar Jun 26, 2025
e2054bb
[Infra][release/0.21] - waive failed tests (#5537)
EmmaQiaoCh Jun 27, 2025
b78ad75
ci: unwaive llmapi launch test (#5281)
Superjomn Jun 27, 2025
abb7357
[TRTLLM-5989, TRTLLM-5991, TRTLLM-5993] doc: Update container instruc…
ixlmar Jun 27, 2025
4fc0666
[cherry-pick] [CI] Waive `test_fp8_block_scales_4gpus[ep4-mtp_nextn=0…
venkywonka Jun 27, 2025
647e070
[Infra][release/0.21]Update nccl to 2.27.5 (#5539)
EmmaQiaoCh Jun 29, 2025
d6c81ba
fix [nvbug5351244]: test_mpi_session submit sync/async (#5608)
Superjomn Jun 30, 2025
9fe1dd6
fix:https://nvbugs/5362398 (#5609)
nv-guomingz Jun 30, 2025
1824c44
[nvbug 5300551] test: increase block count in eviction test (#5465)
zhengd-nv Jul 1, 2025
aa0b927
test: add more tests for GB200 with 8 GPUs/2 nodes in L0 tests (#5397)
yizhang-nv Jul 1, 2025
682b164
doc: Fix outdated config in DeepSeek best perf practice doc (#5638)
kaiyux Jul 1, 2025
d5606b0
fix: [https://nvbugs/5355219] Fix bug of Qwen3 235B CI on dgx_gb200 (…
byshiue Jul 2, 2025
92d3a2d
[https://nvbugspro.nvidia.com/bug/5351333][fix] Update to chunking ca…
FrankD412 Jul 2, 2025
a3c0cf0
fix: Investigate Gemma3 1B decoder output discrepancy (#5564)
brb-nv Jul 3, 2025
2f9d061
[Infra] - Waive failed cases on release/0.21 (#5674)
EmmaQiaoCh Jul 3, 2025
14f938e
Doc: Update invalid hugging face URLs (#5683)
Linda-Stadter Jul 3, 2025
8a8d2e9
[NVBUG:5355009] Modify check for fuse_fp4_quant on SM120 (#5651)
farazkh80 Jul 3, 2025
2aacdba
[TRTLLM-6100] fix: Nvbug 5356427: autotuned TRTLLM Gen fp8 block scal…
DomBrown Jul 4, 2025
2b66fe8
[nvbug/5341178][fix] Fix OOM in Llama 4 accuracy test (#5735)
brb-nv Jul 4, 2025
53394e0
test: Move some of the test from post merge to pre-merge, update dgx …
yizhang-nv Jul 4, 2025
b0354ef
[5321981] fix: Fix the Llama3.1 405B hanging issue. (#5698)
hyukn Jul 4, 2025
3e44db1
[Infra][nvbugs/5370968] - Unwaive l0 test (#5750)
yiqingy0 Jul 4, 2025
5ac92bb
[nvbugs/5336321][fix] Enable attention dp = False test case, Fix TRTL…
yizhang-nv Jul 4, 2025
518915b
[nvbug/5337601][fix] Fix disagg + speculative decoding (#5558)
Tabrizian Jul 4, 2025
aa4d0f0
[Infra] - Always use x86 image for the Jenkins agent (#5756)
chzblych Jul 6, 2025
6103466
test: fix some test failure and add llama_nemotron models in perf san…
ruodil Jul 7, 2025
9106b5d
fix: Skip rope scaling for local layers in Gemma3 VLM (#5773)
brb-nv Jul 7, 2025
7524c77
[nvbug 5004744][fix] rewrite completion API to avoid repetitive token…
LinPoly Jul 7, 2025
3a58db8
fix _pad_attention_dp_dummy_request (#5583)
QiJune Jul 7, 2025
06f8327
Fix docker cache mount (#5763)
MartinMarciniszyn Jul 7, 2025
4fa9284
[nvbug/5302638][nvbugs/5310314] fix _handle_cancelled_requests (#5532)
QiJune Jul 7, 2025
d47ac4e
cherry pick #5416 (#5776)
QiJune Jul 7, 2025
0a0ac7b
[nvbug 5304752][fix] enhance _check_arguments to filter illegal reque…
LinPoly Jul 7, 2025
97f4c9e
[nvbug5266240] chore: unwaive test_llm_with_dummy_weights (#5744)
Superjomn Jul 7, 2025
5a50e2b
[https://nvbugspro.nvidia.com/bug/5355054] fallback to cubins for fp8…
PerkzZheng Jul 8, 2025
6062dc6
fix: [https://nvbugspro.nvidia.com/bug/5345215] Unwaive for bug 53452…
bobboli Jul 8, 2025
f8b4077
[nvbugs/5326453] Avoid nesting NCCL grouping in allgather OP (#5789)
QiJune Jul 8, 2025
6d7a2cb
fix: [https://nvbugs/5351130][https://nvbugs/5333654] Unwaive for bug…
bobboli Jul 8, 2025
39ad602
doc: Update gb200 doc (#5840)
yizhang-nv Jul 8, 2025
cbcc55e
test: remove duplicate cases in perf sanity test (#5870)
ruodil Jul 9, 2025
2e21e34
[nvbug 5327706][fix] fix mgmn postprocess error (#5835)
LinPoly Jul 9, 2025
fd94d3c
[nvbugs/5345391] fix: chunked prefill + overlap scheduling (#5761)
Funatiq Jul 9, 2025
ce048ec
cherry-pick: [fix: nvbugs/5355493] Correctly clamp max sequence len t…
netanel-haber Jul 9, 2025
d9e265d
[https://nvbugs/5355316] fix: update torch.compile option to fix trit…
dc3671 Jul 10, 2025
ff9aabb
test: Add Gemma3 unit tests to CI in release/0.21 (#5899)
brb-nv Jul 10, 2025
cd7aeec
tests: Fix lora perf test (#5875)
amirkl94 Jul 10, 2025
8b7422c
fix: [nvbugs/5351130] Adjust DSV3-Lite tests free_gpu_memory_fraction…
bobboli Jul 10, 2025
8429c8b
chore: Port leftover 0.20 (#5907)
amirkl94 Jul 10, 2025
bfa917f
fix [nvbug/5351244]: address remote mpi session submit (#5664)
Superjomn Jul 10, 2025
aeea5b3
fix: [5328141] increase tolerance for test_fp8_block_scale_gemm (#5849)
nekorobov Jul 10, 2025
e831673
fix: timeout and broken pipe in disagg and worker tests (#5827)
zhengd-nv Jul 11, 2025
4905cac
[nvbugs/5333742] fix MTP illegal memory access in cuda graph warmup (…
lfr-0531 Jul 12, 2025
bed78a2
fix: fix index out of bounds error in spec decoding (#5954)
lfr-0531 Jul 14, 2025
332a65b
[nvbugs/5368410][fix] Disable moe allreduce for multi node (#5918)
yizhang-nv Jul 14, 2025
2e7da20
[fix] Release slots with spec decode + disagg (#5975)
Tabrizian Jul 14, 2025
63f4a7a
[TRTLLM-6495] doc: add disclaimer for 3rd party software installation…
nv-guomingz Jul 15, 2025
69a15c8
[None] - Waive L0 tests (#6082)
yiqingy0 Jul 16, 2025
bce13bb
Cherry Pick: PR #6076 (#6088)
ZhanruiSunCh Jul 16, 2025
f6db521
add release notes for 0.21 release (#6049)
QiJune Jul 16, 2025
4d0bcbc
fix: Fix triton backend build [nvbug 5396469] (#6098)
pcastonguay Jul 16, 2025
eeca3ad
[None][infra] Cherry-pick #6128 and #6130 from main branch (#6151)
chzblych Jul 18, 2025
9323de6
[Doc][Qwen3] update qwen3 into support-matrix (#6161)
byshiue Jul 18, 2025
ab4e178
[fix]: Revert commit 388b491 (#6143)
LinPoly Jul 18, 2025
[TRTLLM-5758] test: Add Bielik-11B-v2.2 Model Support (#5159)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Wanli-Jiang authored Jun 18, 2025
commit 3a02489e86ccbc3e2baf7be1010744ce70e57286
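
In summary, this commit adds Bielik-11B-v2.2-Instruct coverage to the test suite: GSM8K and MMLU reference accuracies (baseline and FP8), an accuracy-test class for the PyTorch LLM API, two QA-list entries, and a multi-LoRA unit test.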
5 changes: 5 additions & 0 deletions tests/integration/defs/accuracy/references/gsm8k.yaml
@@ -92,3 +92,8 @@ nvidia/Llama-3_1-Nemotron-Ultra-253B-v1:
     accuracy: 94.16
 kanana-1.5-2.1b-instruct-2505:
   - accuracy: 75.81
+speakleash/Bielik-11B-v2.2-Instruct:
+  - accuracy: 41.51
+  - quant_algo: FP8
+    kv_cache_quant_algo: FP8
+    accuracy: 40.41
5 changes: 5 additions & 0 deletions tests/integration/defs/accuracy/references/mmlu.yaml
@@ -179,3 +179,8 @@ nvidia/Llama-3_1-Nemotron-Ultra-253B-v1:
     accuracy: 83.36
 kanana-1.5-2.1b-instruct-2505:
   - accuracy: 56.89
+speakleash/Bielik-11B-v2.2-Instruct:
+  - accuracy: 64.47
+  - quant_algo: FP8
+    kv_cache_quant_algo: FP8
+    accuracy: 64.36
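
For orientation, a minimal sketch of how a harness could resolve the expected score from reference files like these; the helper name and lookup logic are illustrative assumptions, not the repository's actual implementation:

import yaml

def expected_accuracy(ref_path, model_name, quant_algo=None):
    """Return the reference accuracy for a model/quantization pair."""
    with open(ref_path) as f:
        references = yaml.safe_load(f)
    for entry in references[model_name]:
        # Entries with no quant_algo key are the unquantized baseline,
        # matched when quant_algo is None.
        if entry.get("quant_algo") == quant_algo:
            return entry["accuracy"]
    raise KeyError(f"no reference for {model_name} with quant_algo={quant_algo}")

For example, expected_accuracy("gsm8k.yaml", "speakleash/Bielik-11B-v2.2-Instruct", quant_algo="FP8") would return 40.41 from the entry above.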
19 changes: 19 additions & 0 deletions tests/integration/defs/accuracy/test_llm_api_pytorch.py
@@ -1488,3 +1488,22 @@ def test_auto_dtype(self):
             task.evaluate(llm)
             task = GSM8K(self.MODEL_NAME)
             task.evaluate(llm)
+
+
+class TestBielik11BInstruct(LlmapiAccuracyTestHarness):
+    MODEL_NAME = "speakleash/Bielik-11B-v2.2-Instruct"
+
+    def test_auto_dtype(self):
+        with LLM(f"{llm_models_root()}/Bielik-11B-v2.2-Instruct") as llm:
+            task = MMLU(self.MODEL_NAME)
+            task.evaluate(llm)
+            task = GSM8K(self.MODEL_NAME)
+            task.evaluate(llm)
+
+    @skip_pre_hopper
+    def test_fp8(self):
+        with LLM(f"{llm_models_root()}/Bielik-11B-v2.2-Instruct-FP8") as llm:
+            task = MMLU(self.MODEL_NAME)
+            task.evaluate(llm)
+            task = GSM8K(self.MODEL_NAME)
+            task.evaluate(llm)
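
With the checkpoints available under llm_models_root(), the new tests can be selected by pytest node ID, matching the entries added to the QA list below (assuming the integration defs are collected directly by pytest):

pytest "tests/integration/defs/accuracy/test_llm_api_pytorch.py::TestBielik11BInstruct::test_auto_dtype"
pytest "tests/integration/defs/accuracy/test_llm_api_pytorch.py::TestBielik11BInstruct::test_fp8"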
2 changes: 2 additions & 0 deletions tests/integration/test_lists/qa/examples_test_list.txt
@@ -484,6 +484,8 @@ accuracy/test_llm_api_pytorch.py::TestQwen3_235B_A22B::test_fp8[latency]
 accuracy/test_llm_api_pytorch.py::TestQwen3_235B_A22B::test_nvfp4[throughput_latency]
 accuracy/test_llm_api_pytorch.py::TestQwen3_235B_A22B::test_nvfp4[latency]
 accuracy/test_llm_api_pytorch.py::TestKanana_Instruct::test_auto_dtype
+accuracy/test_llm_api_pytorch.py::TestBielik11BInstruct::test_auto_dtype
+accuracy/test_llm_api_pytorch.py::TestBielik11BInstruct::test_fp8
 
 test_e2e.py::test_llama_e2e[use_cpp_session-remove_input_padding-]
 test_e2e.py::test_llama_e2e[use_py_session-remove_input_padding-]
50 changes: 50 additions & 0 deletions tests/unittest/llmapi/test_llm_pytorch.py
@@ -304,3 +304,53 @@ def test_codellama_fp8_with_bf16_lora() -> None:
                            lora_request=lora_requests)
 
     assert len(outputs) == 2
+
+
+@skip_gpu_memory_less_than_80gb
+def test_bielik_11b_v2_2_instruct_multi_lora() -> None:
+    from tensorrt_llm._torch.llm import LLM
+
+    model_dir = f"{llm_models_root()}/Bielik-11B-v2.2-Instruct"
+
+    target_modules = ['attn_q', 'attn_k', 'attn_v']
+
+    # Set up temporary directory for LoRA adapters
+    with tempfile.TemporaryDirectory() as lora_dir:
+        print("Creating dummy LoRAs...")
+
+        model = AutoModelForCausalLM.from_pretrained(model_dir,
+                                                     torch_dtype=torch.bfloat16,
+                                                     device_map="auto")
+        hf_modules = ["q_proj", "k_proj", "v_proj"]
+        peft_lora_config = PeftLoraConfig(r=8,
+                                          target_modules=hf_modules,
+                                          bias="none",
+                                          task_type="CAUSAL_LM")
+        lora_paths = []
+        for i in range(2):
+            lora_model = get_peft_model(model, peft_lora_config)
+            for param in lora_model.parameters():
+                param.data.zero_()
+            lora_path = f"{lora_dir}/lora_{i}"
+            lora_model.save_pretrained(lora_path)
+            lora_paths.append(lora_path)
+
+        trtllm_lora_config = LoraConfig(lora_dir=lora_paths,
+                                        lora_target_modules=target_modules,
+                                        max_lora_rank=8)
+        llm = LLM(model_dir, lora_config=trtllm_lora_config)
+
+        prompts = [
+            "Kim był Mikołaj Kopernik i z czego zasłynął?",
+            "Gdzie znajduje się stolica Polski?",
+        ]
+        lora_req1 = LoRARequest("lora-1", 0, lora_paths[0])
+        lora_req2 = LoRARequest("lora-2", 1, lora_paths[1])
+        lora_requests = [lora_req1, lora_req2]
+        sampling_params = SamplingParams(max_tokens=200)
+
+        outputs = llm.generate(prompts,
+                               sampling_params,
+                               lora_request=lora_requests)
+
+        assert len(outputs) == 2
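
Two details of this test are worth noting. Both dummy adapters are saved with every LoRA parameter zeroed, so each adapter's low-rank update is exactly zero and generation should match the base model; the assertion therefore exercises the multi-adapter loading and request-routing path (two LoRARequests paired with two prompts in a single generate call) rather than adapter numerics. The Polish prompts ("Who was Nicolaus Copernicus and what is he famous for?", "Where is the capital of Poland?") reflect that Bielik is a Polish-language model.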