Still a work in progress
- Place the ONNX model in the sample folder `/intent_model/1/`
- Configure `config.pbtxt` (a sketch is shown after this list)

  2.1 Define the input/output of the model (shape, dtype). Look at `config.pbtxt` for more details.

  2.2 Define the optimization technique, OpenVINO or TensorRT (optional). Look at `config.pbtxt` for more details.
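For reference, a minimal `config.pbtxt` sketch for an ONNX model served through the ONNX Runtime backend might look like the block below. The tensor names, shapes, and data types are placeholders and must match your exported model; the OpenVINO block is the optional optimization from step 2.2 (for TensorRT, use `gpu_execution_accelerator : [ { name : "tensorrt" } ]` instead).

```protobuf
name: "intent_model"
platform: "onnxruntime_onnx"
max_batch_size: 8

# Input/output names, shapes and dtypes must match the exported ONNX graph.
input [
  {
    name: "input_ids"      # placeholder tensor name
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "logits"         # placeholder tensor name
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]

# Optional: run the ONNX model through the OpenVINO accelerator
# (applies when the model instances run on CPU).
optimization {
  execution_accelerators {
    cpu_execution_accelerator : [ { name : "openvino" } ]
  }
}
```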
- Download the model weights from HuggingFace first
- Configure `vllm/1/model.json` (a sketch is shown after this list)

  2.1 Go to `vllm/1/model.json` and replace `model_name` with the model you downloaded from HuggingFace.
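For illustration, `vllm/1/model.json` could look like the sketch below; `{HF_REPO_ID}` is a placeholder for the HuggingFace repository you downloaded, and any other engine options already present in the file should be left as they are.

```json
{
  "model_name": "{HF_REPO_ID}"
}
```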
- Create a new folder following this structure:

  ```
  /{MODEL_NAME}
      /{VERSION}
          {weights in *.onnx}
      config.pbtxt
  ```
- Point `--model-store` in `docker-compose` to the new folder:

  ```yaml
  command: >
    tritonserver
    --model-store ./{MY_NEW_MODEL_WORKSPACE}/
    --grpc-port ${TRITON_GRPC_PORT:-8012}
    --http-port ${TRITON_HTTP_PORT:-8013}
    --metrics-port ${TRITON_METRICS_PORT:-8014}
  ```
- Default behavior: Triton loads all models onto the GPU. For more fine-grained control, see the Triton documentation; an `instance_group` sketch is shown below.
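For example, placement can be controlled per model with an `instance_group` block in that model's `config.pbtxt`. The sketch below, which pins one instance to GPU 0, is illustrative rather than required:

```protobuf
# Run a single instance of this model on GPU 0.
# Use kind: KIND_CPU to keep the model off the GPU entirely.
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```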
```sh
chmod +x ./start.sh
./start.sh
```
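Once the server is up, you can check that it is ready and that your model loaded. The port below assumes the default `TRITON_HTTP_PORT` of 8013 from the compose file, and `intent_model` stands in for your model name:

```sh
# Server readiness (returns HTTP 200 when Triton is ready)
curl -v localhost:8013/v2/health/ready

# Per-model readiness
curl -v localhost:8013/v2/models/intent_model/ready
```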