feat: add runtime image for trtllm container build #1796
Changes from 1 commit
```diff
@@ -397,8 +397,8 @@ COPY --from=build /usr/local/bin/etcd/ /usr/local/bin/etcd/
 COPY --from=build /usr/local/ucx /usr/local/ucx
 # Copy NIXL source from build image (required for NIXL plugins)
 COPY --from=build /usr/local/nixl /usr/local/nixl
-# Copy HPCX from base image
-COPY --from=build /opt/hpcx /opt/hpcx
+# Copy OpenMPI from build image
+COPY --from=build /opt/hpcx/ompi /opt/hpcx/ompi
 # Copy NUMA library from build image
 COPY --from=build /usr/lib/x86_64-linux-gnu/libnuma.so* /usr/lib/x86_64-linux-gnu/
```
```diff
@@ -408,21 +408,22 @@ RUN uv venv $VIRTUAL_ENV --python 3.12 && \
     echo "source $VIRTUAL_ENV/bin/activate" >> ~/.bashrc

 # Common dependencies
 # ToDo: Remove extra install and use pyproject.toml to define all dependencies
 RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requirements.txt \
     uv pip install --requirement /tmp/requirements.txt

 # Install test dependencies
-#TODO: Remove this once we have a functional dev image built on top of the runtime image
+# TODO: Remove this once we have a functional CI image built on top of the runtime image
 RUN --mount=type=bind,source=./container/deps/requirements.test.txt,target=/tmp/requirements.txt \
     uv pip install --requirement /tmp/requirements.txt

+# Copy CUDA toolkit components needed for nvcc, cudafe, cicc etc.
```
Contributor: I wonder if we will run into an issue where the BASE container is not on the same CUDA version as the RUNTIME container, causing issues down the line with straight-copying the libs. I would highlight this as a risk rather than a hard stop.

Contributor (Author): Yes, but it's the same case with other packages. I believe torch and other packages also expect a certain version of CUDA, so they would also need to match between the BASE and RUNTIME containers. I can try using the cuda-devel image, which comes with all these CUDA binaries, but I would need to check the size impact of that.
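One low-cost way to surface the risk the reviewer describes at build time, rather than at runtime, is a guard step that compares the two CUDA version strings and fails the build on mismatch. A minimal sketch under stated assumptions: in a real Dockerfile the versions would be extracted in each stage (e.g. from `nvcc --version`); the hard-coded values here are placeholders.

```shell
# Hypothetical build-time guard: fail if the BASE and RUNTIME CUDA
# versions diverge. In a real Dockerfile these would be extracted in
# each stage, e.g.:
#   nvcc --version | sed -n 's/^.*release \([0-9.]*\),.*$/\1/p'
# Placeholder values for illustration only:
BASE_CUDA="12.8"
RUNTIME_CUDA="12.8"

if [ "$BASE_CUDA" != "$RUNTIME_CUDA" ]; then
    echo "CUDA version mismatch: base=$BASE_CUDA runtime=$RUNTIME_CUDA" >&2
    exit 1
fi
echo "CUDA versions match: $BASE_CUDA"
```

In the Dockerfile this could live in a single `RUN` step of the runtime stage, turning a silent version drift into an explicit build failure.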
```diff
-COPY --from=build /usr/local/cuda/bin/ /usr/local/cuda/bin/
-COPY --from=build /usr/local/cuda/include /usr/local/cuda/include
+COPY --from=build /usr/local/cuda/bin/nvcc /usr/local/cuda/bin/nvcc
+COPY --from=build /usr/local/cuda/bin/cudafe++ /usr/local/cuda/bin/cudafe++
+COPY --from=build /usr/local/cuda/bin/ptxas /usr/local/cuda/bin/ptxas
+COPY --from=build /usr/local/cuda/bin/fatbinary /usr/local/cuda/bin/fatbinary
+COPY --from=build /usr/local/cuda/include/ /usr/local/cuda/include/
+COPY --from=build /usr/local/cuda/lib64/libcudart.so* /usr/local/cuda/lib64/
+COPY --from=build /usr/local/cuda/lib64/libnvvm.so* /usr/local/cuda/lib64/
+COPY --from=build /usr/local/cuda/lib64/libnvvmx.so* /usr/local/cuda/lib64/
+COPY --from=build /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/stubs/
+COPY --from=build /usr/local/cuda/nvvm /usr/local/cuda/nvvm

 # Copy pytorch installation from NGC PyTorch
```
Contributor: Would this all be better defined as a
I get why you did it this way to begin with: to prove we have all the packages needed for trtllm. Now we have something to compare the next iteration against to make it better.

Contributor (Author): Some of these are internal versions, so I don't think there's any easy way to pip install them? I could be wrong here.
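One alternative to enumerating every `COPY --from=build` of dist-packages would be to freeze the build stage's installed set and replay it in the runtime stage, serving any internal or NGC-only wheels from a local wheelhouse. A rough sketch, not the PR's approach; the freeze file below is illustrative, with versions taken from the `ARG`s in this diff:

```shell
# In the build stage, one could export the installed package set:
#   pip freeze --exclude-editable > /tmp/runtime-freeze.txt
# and in the runtime stage replay it:
#   uv pip install -r /tmp/runtime-freeze.txt --find-links /wheelhouse
# Illustrative freeze file (versions mirror the ARGs in this diff):
cat > /tmp/runtime-freeze.txt <<'EOF'
networkx==3.4.2
sympy==1.14.0
packaging==23.2
mpmath==1.3.0
EOF
wc -l < /tmp/runtime-freeze.txt
```

As the author notes, this would not cover packages that exist only as internal builds; those would still need to be copied or published to a reachable index.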
```diff
@@ -435,7 +436,6 @@ ARG NETWORKX_VER=3.4.2
 ARG SYMPY_VER=1.14.0
 ARG PACKAGING_VER=23.2
 ARG FLASH_ATTN_VER=2.7.3
-ARG MPI4PY_VER
 ARG MPMATH_VER=1.3.0
 COPY --from=build /usr/local/lib/lib* /usr/local/lib/
 COPY --from=build /usr/local/lib/python3.12/dist-packages/torch /usr/local/lib/python3.12/dist-packages/torch
```
```diff
@@ -447,8 +447,8 @@ COPY --from=build /usr/local/lib/python3.12/dist-packages/torchvision.libs /usr/
 COPY --from=build /usr/local/lib/python3.12/dist-packages/setuptools /usr/local/lib/python3.12/dist-packages/setuptools
 COPY --from=build /usr/local/lib/python3.12/dist-packages/setuptools-${SETUPTOOLS_VER}.dist-info /usr/local/lib/python3.12/dist-packages/setuptools-${SETUPTOOLS_VER}.dist-info
 COPY --from=build /usr/local/lib/python3.12/dist-packages/functorch /usr/local/lib/python3.12/dist-packages/functorch
-COPY --from=build /usr/local/lib/python3.12/dist-packages/pytorch_triton-${PYTORCH_TRITON_VER}.dist-info /usr/local/lib/python3.12/dist-packages/pytorch_triton-${PYTORCH_TRITON_VER}.dist-info
 COPY --from=build /usr/local/lib/python3.12/dist-packages/triton /usr/local/lib/python3.12/dist-packages/triton
+COPY --from=build /usr/local/lib/python3.12/dist-packages/pytorch_triton-${PYTORCH_TRITON_VER}.dist-info /usr/local/lib/python3.12/dist-packages/pytorch_triton-${PYTORCH_TRITON_VER}.dist-info
 COPY --from=build /usr/local/lib/python3.12/dist-packages/jinja2 /usr/local/lib/python3.12/dist-packages/jinja2
 COPY --from=build /usr/local/lib/python3.12/dist-packages/jinja2-${JINJA2_VER}.dist-info /usr/local/lib/python3.12/dist-packages/jinja2-${JINJA2_VER}.dist-info
 COPY --from=build /usr/local/lib/python3.12/dist-packages/networkx /usr/local/lib/python3.12/dist-packages/networkx
```
```diff
@@ -460,21 +460,16 @@ COPY --from=build /usr/local/lib/python3.12/dist-packages/packaging-${PACKAGING_
 COPY --from=build /usr/local/lib/python3.12/dist-packages/flash_attn /usr/local/lib/python3.12/dist-packages/flash_attn
 COPY --from=build /usr/local/lib/python3.12/dist-packages/flash_attn-${FLASH_ATTN_VER}.dist-info /usr/local/lib/python3.12/dist-packages/flash_attn-${FLASH_ATTN_VER}.dist-info
 COPY --from=build /usr/local/lib/python3.12/dist-packages/flash_attn_2_cuda.cpython-312-*-linux-gnu.so /usr/local/lib/python3.12/dist-packages/
-COPY --from=build /usr/local/lib/python3.12/dist-packages/mpmath /usr/local/lib/python3.12/dist-packages/mpmath
-COPY --from=build /usr/local/lib/python3.12/dist-packages/mpmath-${MPMATH_VER}.dist-info /usr/local/lib/python3.12/dist-packages/mpmath-${MPMATH_VER}.dist-info
+# COPY --from=build /usr/local/lib/python3.12/dist-packages/mpmath /usr/local/lib/python3.12/dist-packages/mpmath
+# COPY --from=build /usr/local/lib/python3.12/dist-packages/mpmath-${MPMATH_VER}.dist-info /usr/local/lib/python3.12/dist-packages/mpmath-${MPMATH_VER}.dist-info

 # Setup environment variables
 ARG ARCH_ALT
 ENV NIXL_PLUGIN_DIR=/usr/local/nixl/lib/${ARCH_ALT}-linux-gnu/plugins
 ENV LD_LIBRARY_PATH=/usr/local/nixl/lib/${ARCH_ALT}-linux-gnu:/usr/local/nixl/lib/${ARCH_ALT}-linux-gnu/plugins:/usr/local/ucx/lib:/opt/hpcx/ompi/lib:$LD_LIBRARY_PATH
 ENV OMPI_HOME=/opt/hpcx/ompi
-ENV PATH=/opt/hpcx/ompi/bin:/usr/local/bin/etcd/:$PATH
+ENV PATH=/opt/hpcx/ompi/bin:/usr/local/bin/etcd/:/usr/local/cuda/nvvm/bin:$PATH
 ENV OPAL_PREFIX=/opt/hpcx/ompi

-#TODO: Remove this once we have a functional dev image built on top of the runtime image
-COPY . /workspace
-RUN uv pip install /workspace/benchmarks

 # Install TensorRT-LLM (same as in build stage)
 ARG HAS_TRTLLM_CONTEXT=0
 ARG TENSORRTLLM_PIP_WHEEL="tensorrt-llm"
```
```diff
@@ -490,8 +485,17 @@ RUN uv pip install --index-url "${TENSORRTLLM_INDEX_URL}" \
     uv pip install ai-dynamo --find-links wheelhouse && \
     uv pip install nixl --find-links wheelhouse
```
Member: In the future I would consider installing

Contributor (Author): Good point, but my concern with installing nixl first is that if it brings in any dependency that is common with ai-dynamo and not version-controlled there, the one from nixl will be installed and used. We should do a single install, I think; options there could be
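The single-install idea discussed above can be sketched with a pip constraints file, so that regardless of resolution order, shared transitive dependencies stay pinned and nixl cannot override versions ai-dynamo depends on. The package names mirror this diff; the pinned version is a placeholder, and the assumption that `uv pip install` accepts `--constraint` like pip is worth verifying against the uv docs:

```shell
# Hypothetical single-install: one resolver pass for both packages, with
# shared transitive dependencies pinned via a constraints file.
# Placeholder pin for illustration only:
cat > /tmp/constraints.txt <<'EOF'
numpy==1.26.4
EOF
# The actual install step would then be a single command, e.g.:
#   uv pip install --constraint /tmp/constraints.txt \
#       ai-dynamo nixl --find-links wheelhouse
grep -c '==' /tmp/constraints.txt
```

With one resolver pass, a conflicting pin surfaces as an install-time error instead of a silently shadowed package.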
```diff
+# Copy TensorRT-LLM environment setup script
+# Setup TRTLLM environment variables, same as in dev image
+ENV TRTLLM_USE_UCX_KVCACHE=1
+COPY --from=dev /usr/local/bin/set_trtllm_env.sh /usr/local/bin/set_trtllm_env.sh
+RUN echo 'source /usr/local/bin/set_trtllm_env.sh' >> /root/.bashrc

+# Copy benchmarks, examples and tests for CI
+# TODO: Remove this once we have a functional CI image built on top of the runtime image
+COPY tests /workspace/tests
+COPY benchmarks /workspace/benchmarks
+COPY examples /workspace/examples
+RUN uv pip install /workspace/benchmarks

 # Copy launch banner
 RUN --mount=type=bind,source=./container/launch_message.txt,target=/workspace/launch_message.txt \
```

nv-anants marked this conversation as resolved.