How to use the TensorRT C++ API for high-performance GPU machine-learning inference.
Supports models with single or multiple inputs and single or multiple outputs, with batching.
Project Overview Video
Code Deep-Dive Video
This project demonstrates how to use the TensorRT C++ API for high-performance GPU inference. It covers the following:
- How to install TensorRT 8 on Ubuntu 20.04.
- How to generate a TRT engine file optimized for your GPU.
- How to specify a simple optimization profile.
- How to read data from and write data to GPU memory, and how to work with GPU images.
- How to use a CUDA stream to run inference asynchronously and synchronize later.
- How to work with models with static and dynamic batch sizes.
- New: Supports models with multiple outputs (and even works with batching!).
- New: Supports models with multiple inputs.
- New: Video walkthrough where I explain every line of code.
- The code can be used as a base for many models, including Insightface ArcFace, YoloV7, SCRFD face detection, and many other single / multiple input - single / multiple output models. You will just need to implement the appropriate post-processing code.
- TODO: Add support for models with dynamic input shapes.
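To make "multiple outputs with batching" concrete, here is a minimal sketch of regrouping raw output buffers into one feature vector per batch element and per output. It assumes each output arrives as a flat, batch-major buffer of `batchSize * length` floats. The helper name `splitBatchedOutputs` and the nested-vector layout are illustrative assumptions, not necessarily what this project does internally.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of this project's API).
// flatOutputs[o] holds batchSize * outputLengths[o] floats in batch-major order.
// Returns result[batch][output][feature].
std::vector<std::vector<std::vector<float>>> splitBatchedOutputs(
    const std::vector<std::vector<float>>& flatOutputs,
    const std::vector<std::size_t>& outputLengths,
    std::size_t batchSize) {
    if (flatOutputs.size() != outputLengths.size()) {
        throw std::invalid_argument("one length is required per output buffer");
    }
    // Validate every buffer up front so no partial result is ever returned.
    for (std::size_t o = 0; o < flatOutputs.size(); ++o) {
        if (flatOutputs[o].size() != batchSize * outputLengths[o]) {
            throw std::invalid_argument("buffer size does not match batchSize * length");
        }
    }

    std::vector<std::vector<std::vector<float>>> result(batchSize);
    for (std::size_t b = 0; b < batchSize; ++b) {
        result[b].reserve(flatOutputs.size());
        for (std::size_t o = 0; o < flatOutputs.size(); ++o) {
            const std::size_t len = outputLengths[o];
            // Slice out the b-th chunk of this output's flat buffer.
            const auto first =
                flatOutputs[o].begin() + static_cast<std::ptrdiff_t>(b * len);
            result[b].emplace_back(first, first + static_cast<std::ptrdiff_t>(len));
        }
    }
    return result;
}
```

With this layout, post-processing code for a given model can loop over `result[b]` and handle each output of batch element `b` independently.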
The following instructions assume you are using Ubuntu 20.04.
You will need to supply your own ONNX model for this sample code, or you can download the sample model (see Sanity Check section below). Be sure to specify a dynamic batch size when exporting the ONNX model if you would like to use batching. If you do not, you will need to set `Options.doesSupportDynamicBatchSize` to `false`.
- `sudo apt install build-essential`
- `sudo apt install python3-pip`
- `pip3 install cmake`
- Install OpenCV with CUDA support. Instructions can be found here.
- Download TensorRT 8 from here.
- Extract, and then navigate to the `CMakeLists.txt` file and replace the `TODO` with the path to your TensorRT installation.

- `mkdir build && cd build`
- `cmake ..`
- `make -j$(nproc)`
- To perform a sanity check, download the following ArcFace model from here and place it in the `./models` directory.
- Make sure `Options.doesSupportDynamicBatchSize` is set to `false` before passing the `Options` to the `Engine` constructor on this line.
- Uncomment the code for printing out the feature vector at the bottom of `./src/main.cpp`.
- Running inference using said model and the image located in `inputs/face_chip.jpg` should produce the following feature vector:
-0.0548096 -0.0994873 0.176514 0.161377 0.226807 0.215942 -0.296143 -0.0601807 0.240112 -0.18457 ...
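The printed values may not match the reference digit for digit on every setup, so comparing with a small tolerance is usually more practical than exact equality. The following is a minimal sketch of such a check against the ten reference values listed above. The helper name `matchesExpectedPrefix` is hypothetical, and any tolerance you pass in (for example `1e-3f`) is an assumption that you may need to adjust.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// First ten values of the reference feature vector listed above.
const std::vector<float> kExpectedPrefix = {
    -0.0548096f, -0.0994873f, 0.176514f,   0.161377f, 0.226807f,
    0.215942f,   -0.296143f,  -0.0601807f, 0.240112f, -0.18457f};

// Hypothetical helper: returns true when the first expected.size() values of
// `actual` are each within `tolerance` of the corresponding expected value.
bool matchesExpectedPrefix(const std::vector<float>& actual,
                           const std::vector<float>& expected,
                           float tolerance) {
    if (actual.size() < expected.size()) {
        return false;  // Not enough values to compare.
    }
    for (std::size_t i = 0; i < expected.size(); ++i) {
        const float diff = std::fabs(actual[i] - expected[i]);
        // Written as !(diff <= tolerance) so that NaN values are rejected too.
        if (!(diff <= tolerance)) {
            return false;
        }
    }
    return true;
}
```

Calling `matchesExpectedPrefix(featureVector, kExpectedPrefix, 1e-3f)` on the vector printed by `main.cpp` (here `featureVector` stands for whatever variable holds your output) gives a quick pass/fail signal for the sanity check.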
- The bulk of the implementation is in `src/engine.cpp`. I have written lots of comments throughout the code, which should make it easy to understand what is going on.
- You can also check out my deep-dive video, in which I explain every line of code.
- If you are having issues creating the TensorRT engine file from the ONNX model, I would advise using the `trtexec` command line tool (it comes packaged in the TensorRT download bundle in the `/bin` directory). It will provide you with more debug information.
If this project was helpful to you, I would appreciate it if you could give it a star. That will encourage me to keep it up to date and solve issues quickly.
V2.2
- Serialize model name as part of engine file.
V2.1
- Added support for models with multiple inputs. Implementation now supports models with single inputs, multiple inputs, single outputs, multiple outputs, and batching.
V2.0
- Requires OpenCV cuda to be installed. To install, follow instructions here.
- `Options.optBatchSizes` has been removed, replaced by `Options.optBatchSize`.
- Support models with more than a single output (e.g. SCRFD).
- Added support for models which do not support batch inference (first input dimension is fixed).
- More error checking.
- Fixed a bunch of common issues people were running into with the original V1.0 version.
- Removed whitespace from the GPU device name.
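Two of the changelog entries (serializing the model name and removing whitespace from the GPU device name) suggest that the engine file is identified by both the model and the GPU it was built for. As a rough sketch of that idea, the code below builds a file name from an ONNX path, a device name, and a batch size. The naming scheme, the function names, and the example inputs are all assumptions for illustration; the project's actual format may differ.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Remove every whitespace character, e.g. "RTX 3080" -> "RTX3080".
std::string removeWhitespace(std::string s) {
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) { return std::isspace(c) != 0; }),
            s.end());
    return s;
}

// Strip the directory and the final extension: "./models/foo.onnx" -> "foo".
std::string modelStem(const std::string& path) {
    const std::size_t slash = path.find_last_of("/\\");
    const std::string file =
        (slash == std::string::npos) ? path : path.substr(slash + 1);
    const std::size_t dot = file.find_last_of('.');
    return (dot == std::string::npos || dot == 0) ? file : file.substr(0, dot);
}

// Hypothetical naming scheme: <model>.engine.<gpu-without-spaces>.batch<N>
std::string makeEngineFileName(const std::string& onnxPath,
                               const std::string& gpuName,
                               int optBatchSize) {
    return modelStem(onnxPath) + ".engine." + removeWhitespace(gpuName) +
           ".batch" + std::to_string(optBatchSize);
}
```

Because an engine file is optimized for a specific GPU, encoding the device name in the file name makes it harder to accidentally load an engine that was built on different hardware, and stripping whitespace keeps the name free of spaces.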