Commit a9bcdfa

Update building TRT engine in README.md
1 parent 2d15de6 commit a9bcdfa

File tree

1 file changed: +4 −2 lines changed


README.md

Lines changed: 4 additions & 2 deletions
@@ -78,7 +78,9 @@ Arguments
 
 For RTX 4090 (TensorRT 9.1.0.4 & TensorRT-LLM 0.5.0), a prebuilt TRT engine is provided. For other RTX GPUs or TensorRT versions, follow these steps to build your TRT engine:
 
-Download LLaMa 2 13B AWQ int4 weights **model.pt** from [here](https://catalog.ngc.nvidia.com/orgs/nvidia/models/llama2-13b/files)
+Download LLaMa 2 13B chat model from [https://huggingface.co/meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf)
+
+Download LLaMa 2 13B AWQ int4 checkpoints **model.pt** from [here](https://catalog.ngc.nvidia.com/orgs/nvidia/models/llama2-13b/files)
 
 Clone the [TensorRT LLM](https://github.com/NVIDIA/TensorRT-LLM/) repository:
 ```
@@ -87,7 +89,7 @@ git clone https://github.com/NVIDIA/TensorRT-LLM.git
 
 Navigate to the examples\llama directory and run the following script:
 ```
-python build.py --model_dir <path to llama13_awq_int4_chat> --quant_ckpt_path <path to llama13_int4_chat>\model.pt --dtype float16 --use_gpt_attention_plugin float16 --use_gemm_plugin float16 --use_weight_only --weight_only_precision int4_awq --per_group --enable_context_fmha --max_batch_size 1 --max_input_len 3500 --max_output_len 1024 --output_dir <TRT engine folder>
+python build.py --model_dir <path to llama13_chat model> --quant_ckpt_path <path to model.pt> --dtype float16 --use_gpt_attention_plugin float16 --use_gemm_plugin float16 --use_weight_only --weight_only_precision int4_awq --per_group --enable_context_fmha --max_batch_size 1 --max_input_len 3500 --max_output_len 1024 --output_dir <TRT engine folder>
 ```
 
 ## Adding your own data
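The updated steps from this commit can be sketched as one shell session. The `build.py` flags come from the diff above; the local paths (`Llama-2-13b-chat-hf`, `model.pt`, `engine_out`) are placeholder assumptions, not part of the commit, and the commands need GPU hardware, Hugging Face access approval, and the NGC checkpoint download to actually run:

```shell
# Hypothetical end-to-end sketch of the updated build steps (paths are placeholders).

# 1. Download the LLaMa 2 13B chat model (gated; requires approved access on Hugging Face).
git clone https://huggingface.co/meta-llama/Llama-2-13b-chat-hf

# 2. Download the AWQ int4 checkpoint model.pt from the NGC model page
#    (https://catalog.ngc.nvidia.com/orgs/nvidia/models/llama2-13b/files)
#    and place it in the current directory.

# 3. Clone TensorRT-LLM and build the engine from the llama example directory.
git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM/examples/llama
python build.py \
    --model_dir ../../../Llama-2-13b-chat-hf \
    --quant_ckpt_path ../../../model.pt \
    --dtype float16 \
    --use_gpt_attention_plugin float16 \
    --use_gemm_plugin float16 \
    --use_weight_only \
    --weight_only_precision int4_awq \
    --per_group \
    --enable_context_fmha \
    --max_batch_size 1 \
    --max_input_len 3500 \
    --max_output_len 1024 \
    --output_dir ../../../engine_out
```

Note the key change in the commit: `--model_dir` now points at the full Hugging Face chat model directory, while `--quant_ckpt_path` points only at the separately downloaded `model.pt` checkpoint.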
