Merged
Changes from 1 commit
Commits
113 commits
ac7e888
docs: fix helm chart urls (#2033)
nealvaidya Jul 21, 2025
76fd471
refactor: support for turning prefix cache off (#2034)
alec-flowers Jul 22, 2025
4449f3d
fix: never sleep on the eos (#2039)
alec-flowers Jul 22, 2025
20c5daf
fix: install torch distribution matching container cuda version (#2027)
ptarasiewiczNV Jul 22, 2025
e5a8628
feat: add a hierarchical Prometheus MetricsRegistry trait for Distrib…
keivenchang Jul 22, 2025
7882693
feat: use atomic transactions when creating etcd kv (#2044)
PeaBrane Jul 22, 2025
d65ce1b
chore(sglang): Move examples/sglang to components/backends/sglang (#2…
grahamking Jul 22, 2025
73505c7
fix: correct Nixl plugin paths in Dockerfile. (#2048)
karya0 Jul 22, 2025
c49a13e
docs: Cleanup index.rst (#2007)
atchernych Jul 22, 2025
9f2356c
chore: Remove unused portion of kv bindings test (#2052)
rmccorm4 Jul 22, 2025
f3e3d94
refactor: vLLM to new Python UX (#1983)
alec-flowers Jul 22, 2025
9cfaa7b
chore: Bump genai-perf to v0.0.15 (#2051)
ptarasiewiczNV Jul 22, 2025
22e6c96
chore: Change vllm K8s from dynamo-run to python -m dynamo.frontend (…
grahamking Jul 22, 2025
b127d95
feat: health check changes based on endpoint served (#1996)
nnshah1 Jul 23, 2025
1958b3a
build: Fixes for vLLM Blackwell Builds (#2020)
zaristei Jul 23, 2025
2c642fd
fix: vllm deployment examples (#2062)
biswapanda Jul 23, 2025
6a69ef4
fix: cryptic error message for empty messages list in /chat/completio…
heisenberglit Jul 23, 2025
c6f12f6
ci: Add RUN_SGLANG to CI variables (#1928)
pvijayakrish Jul 23, 2025
e0a5194
feat: Connect Library (#1478)
whoisj Jul 23, 2025
ffb5409
fix: endpoint changes should be prioritized over new requests in kv s…
PeaBrane Jul 23, 2025
eebc741
docs: Adjust the path to examples (#2056)
atchernych Jul 23, 2025
f9b1757
fix: Bring back ignore_eos/min_tokens support in trtllm component (#2…
rmccorm4 Jul 23, 2025
66b7d2c
fix: updates versions and adds ahashmap to BPE (#2072)
paulhendricks Jul 23, 2025
9bdceac
fix: github ci triggers (#2075)
biswapanda Jul 23, 2025
7a0013b
chore: update attributions for 0.3.2 release (#1837) (#2032)
nv-anants Jul 23, 2025
13560ab
feat: sglang examples launch and deploy (#2068)
biswapanda Jul 23, 2025
f3d784f
feat: query instance_id based on routing strategy (#1787)
biswapanda Jul 23, 2025
3c500ae
docs: Update docs for new UX (#2070)
grahamking Jul 23, 2025
19a77ae
chore(dynamo-run): Remove out=sglang|vllm|trtllm (#1920)
grahamking Jul 24, 2025
ee3a8e4
feat: add initial Grove support (#2012)
julienmancuso Jul 24, 2025
cde8db3
docs: Replace a sym link with and actual markdown link (#2074)
atchernych Jul 24, 2025
13d3cc1
feat: add nixl benchmark deployment instructions (#2060)
biswapanda Jul 24, 2025
2fc65ad
feat: dump radix tree as router events (#2057)
PeaBrane Jul 24, 2025
ba3ac23
test: add router e2e test with mockers to per-merge ci (#2073)
PeaBrane Jul 24, 2025
fe718fd
feat: deploy SLA profiler to k8s (#2030)
hhzhang16 Jul 24, 2025
a2874fd
feat: add possibility to use grove in dynamo graph helm chart (#1954)
julienmancuso Jul 24, 2025
f03f8be
docs: hello_world python binding example (#2083)
nealvaidya Jul 24, 2025
2bbbd44
chore: Remove unused trtllm requirements.txt (#2098)
rmccorm4 Jul 24, 2025
f0e382a
fix: Merge env vars correctly (#2096)
julienmancuso Jul 24, 2025
3094278
docs: Create a guide for writing dynamo deployments CR (#1999)
atchernych Jul 24, 2025
ff92053
docs: add NAMESPACE (#2105)
atchernych Jul 25, 2025
a2cb1c3
feat: update python packaging for new dynamo UX (#2054)
grahamking Jul 25, 2025
24cb926
docs: Clean index.rst (#2104)
atchernych Jul 25, 2025
412a12a
fix: rm enforce eager from vllm deploy - prefer perf over pod launch …
biswapanda Jul 25, 2025
2cd96ec
build: Add TensorRT-LLM to optional dependency and corresponding inst…
tanmayv25 Jul 25, 2025
384e449
fix: agg router test (#2123)
alec-flowers Jul 25, 2025
4dc529a
chore: remove vLLM v0 multimodal example (#2099)
GuanLuo Jul 25, 2025
4498a77
fix: move docker-compose.yml to deploy/, and update frontend port (#2…
keivenchang Jul 25, 2025
222245e
refactor: Move engine and publisher from dynamo.llm.tensorrt_llm to d…
tanmayv25 Jul 26, 2025
b8461b6
chore: updated health checks to use new probes (#2124)
nnshah1 Jul 27, 2025
e2a514b
fix: remove prints (#2142)
alec-flowers Jul 28, 2025
615580d
feat: Base metrics: add generic ingress handler metrics (#2090)
keivenchang Jul 28, 2025
e82bc4e
chore: update vLLM to 0.10.0 (#2114)
ptarasiewiczNV Jul 28, 2025
803bfa8
feat: proper local hashes for mockers + router watches endpoints (#2132)
PeaBrane Jul 28, 2025
0cb01b3
feat: updates to structured logging (#2061)
nnshah1 Jul 28, 2025
ca0035f
fix: copy whole workspace for pre-merge vllm tests (#2146)
nv-anants Jul 28, 2025
d23d48b
feat: Deploy SLA planner to Kubernetes (#2135)
hhzhang16 Jul 28, 2025
708d7c3
docs: add Llama4 eagle3 one model example and configs (#2087)
jhaotingc Jul 28, 2025
096d117
docs: update router docs (#2148)
PeaBrane Jul 28, 2025
1e6709d
feat: allow to override any podSpec property (#2116)
julienmancuso Jul 28, 2025
f809659
docs: hello world deploy example (#2102)
atchernych Jul 28, 2025
cfc6178
feat: add sglang disagg deployment examples (#2137)
biswapanda Jul 28, 2025
bbe8dbb
fix: remove containers from required property of extraPodSpec (#2153)
julienmancuso Jul 28, 2025
fdcf611
chore: Add Request Migration docs and minor enhancements (#2038)
kthui Jul 28, 2025
095ea3e
chore: updating and removing tests (#2130)
nnshah1 Jul 29, 2025
4747790
feat: deprecate sdk as dependency (#2149)
biswapanda Jul 29, 2025
3175b10
docs: Update to README.md (#2141)
athreesh Jul 29, 2025
7fbd43a
docs: Update dynamo_glossary.md (#2082)
athreesh Jul 29, 2025
358e908
docs: Adding document for running Dynamo on Azure Kubernetes Services…
saurabh-nvidia Jul 29, 2025
195c4c4
docs: Quickstart with new UX (#2005)
nealvaidya Jul 29, 2025
291df28
docs: add disagg example + explanation (#2086)
nealvaidya Jul 29, 2025
ca5b681
docs: add multinode example (#2155)
nealvaidya Jul 29, 2025
a8cb655
docs: update readme install instructions (#2170)
nv-anants Jul 29, 2025
5be23eb
Readmes + eks additions (#2157)
athreesh Jul 29, 2025
2befa38
feat: claim support for AL2023 x86_64 (#2150)
saturley-hall Jul 29, 2025
e542f00
chore: cleanup examples codeowners (#2171)
nealvaidya Jul 29, 2025
12a7b83
docs: Examples README/restructuring, framework READMEs, EKS examples …
athreesh Jul 29, 2025
8b0a035
docs: Update the operator docs (#2172)
atchernych Jul 29, 2025
8248a11
feat: gaie helm chart based example (#2168)
biswapanda Jul 29, 2025
157714a
chore: add instructions to modify SLA to profile_sla doc; update comp…
tedzhouhk Jul 29, 2025
30d4612
fix: install rdma libs in runtime image. (#2163)
karya0 Jul 29, 2025
da0c572
chore: update sgl version and fix h100 wideep example (#2169)
ishandhanani Jul 30, 2025
4c90b1b
chore: Version bump to 0.4.0 (#2179)
dmitry-tokarev-nv Jul 30, 2025
ee09de0
fix: link to point to bindings/python/README.md (#2186)
keivenchang Jul 30, 2025
dabfea3
chore: address QA broken links comments (#2184)
athreesh Jul 30, 2025
b69c507
fix: add better port logic (#2175)
alec-flowers Jul 30, 2025
7fc94da
fix(container): update sgl dockerfile install commands (#2194)
ishandhanani Jul 30, 2025
57482dc
docs: Bug 5424387 (#2196)
atchernych Jul 30, 2025
f3868b1
fix: support config without resource limit for profile sla script (#2…
tedzhouhk Jul 31, 2025
f8b0a5a
feat: Add trtllm deploy examples for k8s (#2133)
tanmayv25 Jul 31, 2025
62c7898
fix: add curl and jq for health checks (#2203)
biswapanda Jul 31, 2025
c546b63
fix: update SGLang version in instructions and Dockerfile to revert t…
ishandhanani Jul 31, 2025
97390ac
fix(k8s): sglang disagg now uses decode worker (#2206)
ishandhanani Jul 31, 2025
f10aab3
fix: Migrating trtllm examples from `1.0.0rc0` to `1.0.4rc4` (#2217)
KrishnanPrash Jul 31, 2025
3bf22bb
feat: reorganize sglang and add expert distribution endpoints (#2181)
ishandhanani Jul 31, 2025
bae25dc
feat: skip downloading model weights if using mocker (only tokenizer)…
PeaBrane Jul 31, 2025
cbc0e20
fix: fix endpoint run to return error DIS-325 (#2156)
keivenchang Jul 31, 2025
625578c
chore: update nixl version to 0.4.1 (#2221)
nv-anants Jul 31, 2025
7e3b3fa
fix: Add default configs in LLMAPI. Fixes OOM issues (#2198)
tanmayv25 Jul 31, 2025
f10e44c
fix: Integration tests fixes (#2161)
keivenchang Jul 31, 2025
f14f59c
chore: Remove multimodal readme. (#2212)
krishung5 Jul 31, 2025
dbd33df
fix: handle groveTerminationDelay and auto-detect grove installation …
julienmancuso Aug 1, 2025
66231cf
feat: reduce / revert routing overheads, do not consider output token…
PeaBrane Aug 1, 2025
8c75ed7
fix: frontend metrics to be renamed from nv_llm_http_service_* => dyn…
keivenchang Aug 1, 2025
1ad6abe
feat: add sgl deploy readme (#2238)
ishandhanani Aug 1, 2025
efd863d
fix: dynamo_component to be added in metric names (#2180)
keivenchang Aug 1, 2025
faafa5f
docs: add a docs/guides/metrics.md (#2160)
keivenchang Aug 1, 2025
cb1492a
rebase main
ziqifan617 Aug 1, 2025
ae51b3f
test: Request Migration Docs and E2E vLLM Tests (#2177)
kthui Aug 1, 2025
959f810
feat: sglang + gb200 (#2223)
ishandhanani Aug 1, 2025
fa492bb
docs: Dyn 591 (#2247)
atchernych Aug 2, 2025
357f34b
cleanup (#2250)
ziqifan617 Aug 2, 2025
2954005
Merge branch 'main' into ziqi/connector-250801
ziqifan617 Aug 2, 2025
feat: update python packaging for new dynamo UX (#2054)
Signed-off-by: Anant Sharma <[email protected]>
Co-authored-by: Anant Sharma <[email protected]>
grahamking and nv-anants authored Jul 25, 2025
commit a2cb1c33f184a94d84ab5a2762aefa43511fc7d2
2 changes: 1 addition & 1 deletion Earthfile
@@ -105,7 +105,7 @@ dynamo-build:
FROM +rust-base
WORKDIR /workspace
COPY Cargo.toml Cargo.lock ./
-COPY pyproject.toml README.md hatch_build.py ./
+COPY pyproject.toml README.md ./
COPY components/ components/
COPY lib/ lib/
COPY launch/ launch/
5 changes: 0 additions & 5 deletions components/backends/sglang/requirements.txt

This file was deleted.

5 changes: 0 additions & 5 deletions components/backends/vllm/requirements.txt

This file was deleted.

35 changes: 7 additions & 28 deletions container/Dockerfile.sglang
@@ -26,8 +26,8 @@ ARG RUNTIME_IMAGE_TAG="12.8.1-runtime-ubuntu24.04"
ARG ARCH=amd64
ARG ARCH_ALT=x86_64

# Make sure to update the dependency version in pyproject.toml when updating this
ARG SGLANG_VERSION="0.4.9.post1"
ARG SGL_KERNEL_VERSION="0.2.4"

##################################
########## Base Image ############
@@ -116,7 +116,7 @@ RUN git clone "https://github.com/ai-dynamo/nixl.git" ${NIXL_SRC_DIR} && \
cd ${NIXL_SRC_DIR} && \
git checkout ${NIXL_REF} && \
if [ "$ARCH" = "arm64" ]; then \
-nixl_build_args="-Ddisable_gds_backend=true -Dgds_path=/usr/local/cuda/targets/sbsa-linux"; \
+nixl_build_args="-Ddisable_gds_backend=true"; \
else \
nixl_build_args=""; \
fi && \
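Review note: the hunk above keeps a per-architecture switch for the NIXL build flags and drops the explicit `-Dgds_path` setting on arm64. A minimal shell sketch of that selection follows. The helper name `nixl_build_args_for` is hypothetical and exists only for illustration; the Dockerfile inlines the same `if`/`else` in its `RUN` body.

```shell
# Hypothetical helper mirroring the Dockerfile's inline if/else.
# arm64 builds disable the GDS backend; every other arch passes no extra flags.
nixl_build_args_for() {
  case "$1" in
    arm64) printf '%s' "-Ddisable_gds_backend=true" ;;
    *)     printf '%s' "" ;;
  esac
}

echo "arm64 args: '$(nixl_build_args_for arm64)'"
echo "amd64 args: '$(nixl_build_args_for amd64)'"
```

The same flag string is repeated in the later `uv build --config-settings=setup-args=...` step, so the two places need to stay in sync when the arm64 workaround is removed.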
@@ -155,7 +155,7 @@ ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
# TEMP: disable gds backend for arm64
RUN if [ "$ARCH" = "arm64" ]; then \
cd ${NIXL_SRC_DIR} && uv build . --out-dir /workspace/wheels/nixl \
---config-settings=setup-args="-Ddisable_gds_backend=true -Dgds_path=/usr/local/cuda/targets/sbsa-linux"; \
+--config-settings=setup-args="-Ddisable_gds_backend=true"; \
else \
cd ${NIXL_SRC_DIR} && uv build . --out-dir /workspace/wheels/nixl; \
fi && \
@@ -269,8 +269,6 @@ RUN SNIPPET="export PROMPT_COMMAND='history -a' && export HISTFILE=$HOME/.comman

RUN mkdir -p /home/$USERNAME/.cache/

ENV VLLM_KV_CAPI_PATH=$HOME/dynamo/.build/target/debug/libdynamo_llm_capi.so

ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]

##################################
@@ -321,7 +319,6 @@ COPY LICENSE /workspace/
COPY Cargo.toml /workspace/
COPY Cargo.lock /workspace/
COPY rust-toolchain.toml /workspace/
COPY hatch_build.py /workspace/

# Copy source code
COPY lib/ /workspace/lib/
@@ -364,18 +361,11 @@ COPY --from=wheel_builder $CARGO_HOME $CARGO_HOME
# Copy rest of the code
COPY . /workspace

# Build C bindings, creates lib/bindings/c/include
RUN cd /workspace/lib/bindings/c && cargo build --release --locked

# Package the bindings
RUN mkdir -p /opt/dynamo/bindings/wheels && \
mkdir /opt/dynamo/bindings/lib && \
cp dist/ai_dynamo*cp312*.whl /opt/dynamo/bindings/wheels/. && \
cp target/release/libdynamo_llm_capi.so /opt/dynamo/bindings/lib/. && \
cp -r lib/bindings/c/include /opt/dynamo/bindings/. && \
cp target/release/dynamo-run /usr/local/bin && \
cp target/release/metrics /usr/local/bin && \
cp target/release/mock_worker /usr/local/bin
cp target/release/metrics /usr/local/bin

RUN uv pip install /workspace/dist/ai_dynamo_runtime*cp312*.whl && \
uv pip install /workspace/dist/ai_dynamo*any.whl
@@ -385,9 +375,6 @@ RUN --mount=type=bind,source=./container/launch_message.txt,target=/workspace/la
sed '/^#\s/d' /workspace/launch_message.txt > ~/.launch_screen && \
echo "cat ~/.launch_screen" >> ~/.bashrc

# Tell vllm to use the Dynamo LLM C API for KV Cache Routing
ENV VLLM_KV_CAPI_PATH=/opt/dynamo/bindings/lib/libdynamo_llm_capi.so

ENV PYTHONPATH=/workspace/dynamo/deploy/sdk/src:/workspace/dynamo/components/planner/src:/workspace/examples/sglang/utils:$PYTHONPATH

########################################
@@ -442,21 +429,13 @@ RUN apt-get update && \
uv venv $VIRTUAL_ENV --python 3.12 && \
echo "source $VIRTUAL_ENV/bin/activate" >> ~/.bashrc

# Install SGLang and related packages (sgl-kernel, einops, sentencepiece) since they are not included in the runtime wheel
# https://github.com/sgl-project/sglang/blob/v0.4.9.post1/python/pyproject.toml#L18-51
ARG SGLANG_VERSION
ARG SGL_KERNEL_VERSION
RUN --mount=type=cache,target=/root/.cache/uv \
uv pip install sglang[runtime_common]==${SGLANG_VERSION} einops sgl-kernel==${SGL_KERNEL_VERSION} sentencepiece

# Install the wheels and symlink executables to /usr/local/bin so dynamo components can use them
# Copy metrics binary from wheel_builder image, not part of ai-dynamo wheel
# Dynamo components currently do not have the VIRTUAL_ENV in their PATH, so we need to symlink the executables
COPY --from=ci_minimum /workspace/target/release/metrics /usr/local/bin/metrics
COPY --from=wheel_builder /workspace/dist/*.whl wheelhouse/
COPY --from=base /workspace/wheels/nixl/*.whl wheelhouse/
RUN uv pip install ai-dynamo nixl --find-links wheelhouse

# Tell vllm to use the Dynamo LLM C API for KV Cache Routing
ENV VLLM_KV_CAPI_PATH="/opt/dynamo/bindings/lib/libdynamo_llm_capi.so"
RUN uv pip install ai-dynamo[sglang] --find-links wheelhouse
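Review note: in a shell-form `RUN`, the unquoted requirement `ai-dynamo[sglang]` is also a glob pattern (the literal `ai-dynamo` followed by one character from the set `s`, `g`, `l`, `a`, `n`). It is normally passed through unchanged because nothing in the working directory matches. The sketch below shows the difference quoting makes, assuming a POSIX shell with default globbing. The helper `show_arg` and the file name `ai-dynamos` are hypothetical and exist only for the demonstration.

```shell
# Print the argument exactly as a command would receive it.
show_arg() {
  printf '%s\n' "$1"
}

# Throwaway directory containing a file whose name happens to match the pattern.
demo_dir="$(mktemp -d)"
touch "$demo_dir/ai-dynamos"

# Unquoted: the shell expands the bracket expression against the directory.
unquoted="$(cd "$demo_dir" && show_arg ai-dynamo[sglang])"
# Quoted: the requirement string reaches the command untouched.
quoted="$(cd "$demo_dir" && show_arg "ai-dynamo[sglang]")"

echo "unquoted -> $unquoted"
echo "quoted   -> $quoted"
rm -rf "$demo_dir"
```

Quoting the requirement would keep the argument literal regardless of what the build's working directory contains.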

# Copy launch banner
RUN --mount=type=bind,source=./container/launch_message.txt,target=/workspace/launch_message.txt \
16 changes: 5 additions & 11 deletions container/Dockerfile.tensorrt_llm
@@ -110,7 +110,7 @@ RUN git clone "https://github.com/ai-dynamo/nixl.git" ${NIXL_SRC_DIR} && \
cd ${NIXL_SRC_DIR} && \
git checkout ${NIXL_REF} && \
if [ "$ARCH" = "arm64" ]; then \
-nixl_build_args="-Ddisable_gds_backend=true -Dgds_path=/usr/local/cuda/targets/sbsa-linux"; \
+nixl_build_args="-Ddisable_gds_backend=true"; \
else \
nixl_build_args=""; \
fi && \
@@ -220,7 +220,7 @@ ENV VIRTUAL_ENV=/opt/dynamo/venv
# TEMP: disable gds backend for arm64
RUN if [ "$ARCH" = "arm64" ]; then \
cd ${NIXL_SRC_DIR} && uv build . --out-dir /workspace/wheels/nixl \
---config-settings=setup-args="-Ddisable_gds_backend=true -Dgds_path=/usr/local/cuda/targets/sbsa-linux"; \
+--config-settings=setup-args="-Ddisable_gds_backend=true"; \
else \
cd ${NIXL_SRC_DIR} && uv build . --out-dir /workspace/wheels/nixl; \
fi && \
@@ -273,7 +273,6 @@ COPY LICENSE /workspace/
COPY Cargo.toml /workspace/
COPY Cargo.lock /workspace/
COPY rust-toolchain.toml /workspace/
COPY hatch_build.py /workspace/

# Copy source code
COPY lib/ /workspace/lib/
@@ -311,18 +310,11 @@ COPY --from=wheel_builder $CARGO_HOME $CARGO_HOME
# Copy rest of the code
COPY . /workspace

# Build C bindings, creates lib/bindings/c/include
RUN cd /workspace/lib/bindings/c && cargo build --release --locked

# Package the bindings
RUN mkdir -p /opt/dynamo/bindings/wheels && \
mkdir /opt/dynamo/bindings/lib && \
cp dist/ai_dynamo*cp312*.whl /opt/dynamo/bindings/wheels/. && \
cp target/release/libdynamo_llm_capi.so /opt/dynamo/bindings/lib/. && \
cp -r lib/bindings/c/include /opt/dynamo/bindings/. && \
cp target/release/dynamo-run /usr/local/bin && \
cp target/release/metrics /usr/local/bin && \
cp target/release/mock_worker /usr/local/bin
cp target/release/metrics /usr/local/bin

# Install wheels
RUN . /opt/dynamo/venv/bin/activate && \
@@ -484,8 +476,10 @@ ARG TENSORRTLLM_PIP_WHEEL="tensorrt-llm"
ARG TENSORRTLLM_INDEX_URL="https://pypi.python.org/simple"

# Copy Dynamo wheels into wheelhouse
# Copy metrics binary from wheel_builder image, not part of ai-dynamo wheel
COPY --from=dev /workspace/wheels/nixl/*.whl wheelhouse/
COPY --from=wheel_builder /workspace/dist/*.whl wheelhouse/
COPY --from=dev /workspace/target/release/metrics /usr/local/bin/metrics

# NOTE: If a package (tensorrt_llm) exists on both --index-url and --extra-index-url,
# uv will prioritize the --extra-index-url, unless --index-strategy unsafe-best-match
56 changes: 27 additions & 29 deletions container/Dockerfile.vllm
@@ -18,6 +18,9 @@ ARG TORCH_BACKEND="cu128"
ARG DEEPGEMM_REF="03d0be3"
ARG FLASHINF_REF="1d72ed4"

# Make sure to update the dependency version in pyproject.toml when updating this
ARG VLLM_VERSION="0.9.2"

# Define general architecture ARGs for supporting both x86 and aarch64 builds.
# ARCH: Used for package suffixes (e.g., amd64, arm64)
# ARCH_ALT: Used for Rust targets, manylinux suffix (e.g., x86_64, aarch64)
@@ -39,10 +42,11 @@ ARG ARCH_ALT=x86_64

FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS base

-# Redeclare ARCH, ARCH_ALT, TORCH_BACKEND so they're available in this stage
+# Redeclare ARCH, ARCH_ALT, TORCH_BACKEND, VLLM_VERSION so they're available in this stage
ARG ARCH
ARG ARCH_ALT
ARG TORCH_BACKEND
ARG VLLM_VERSION

USER root
ARG PYTHON_VERSION=3.12
@@ -134,7 +138,7 @@ RUN git clone "https://github.com/ai-dynamo/nixl.git" ${NIXL_SRC_DIR} && \
cd ${NIXL_SRC_DIR} && \
git checkout ${NIXL_REF} && \
if [ "$ARCH" = "arm64" ]; then \
-nixl_build_args="-Ddisable_gds_backend=true -Dgds_path=/usr/local/cuda/targets/sbsa-linux"; \
+nixl_build_args="-Ddisable_gds_backend=true"; \
else \
nixl_build_args=""; \
fi && \
@@ -171,8 +175,7 @@ ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
# TEMP: disable gds backend for arm64
RUN if [ "$ARCH" = "arm64" ]; then \
cd ${NIXL_SRC_DIR} && uv build . --out-dir /workspace/wheels/nixl \
---config-settings=setup-args="-Ddisable_gds_backend=true" \
---config-settings=setup-args="-Dgds_path=/usr/local/cuda/targets/sbsa-linux"; \
+--config-settings=setup-args="-Ddisable_gds_backend=true"; \
else \
cd ${NIXL_SRC_DIR} && uv build . --out-dir /workspace/wheels/nixl; \
fi && \
@@ -190,13 +193,17 @@ ARG MAX_JOBS=16
ENV MAX_JOBS=$MAX_JOBS
ENV CUDA_HOME=/usr/local/cuda

-# TODO - split vllm, DeepEP, DeepGeMM, PPLX installs
-# Should be able to select how you want your build to go
RUN --mount=type=bind,source=./container/deps/,target=/tmp/deps \
--mount=type=cache,target=/root/.cache/uv \
-cp /tmp/deps/vllm/install_vllm.sh /tmp/install_vllm.sh && \
-chmod +x /tmp/install_vllm.sh && \
-/tmp/install_vllm.sh --editable --vllm-ref $VLLM_REF --max-jobs $MAX_JOBS --arch $ARCH --installation-dir /opt --deepgemm-ref $DEEPGEMM_REF --flashinf-ref $FLASHINF_REF --torch-backend $TORCH_BACKEND
+if [ "$ARCH" = "arm64" ]; then \
+# TODO - split vllm, DeepEP, DeepGeMM, PPLX installs
+# Should be able to select how you want your build to go
+cp /tmp/deps/vllm/install_vllm.sh /tmp/install_vllm.sh && \
+chmod +x /tmp/install_vllm.sh && \
+/tmp/install_vllm.sh --editable --vllm-ref $VLLM_REF --max-jobs $MAX_JOBS --arch $ARCH --installation-dir /opt --deepgemm-ref $DEEPGEMM_REF --flashinf-ref $FLASHINF_REF --torch-backend $TORCH_BACKEND; \
+else \
+uv pip install "vllm==${VLLM_VERSION}"; \
+fi
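Review note: after this change, only arm64 builds run the source install script; other architectures install a pinned vLLM release with `uv pip install`. A minimal sketch of that branch follows. The helper name `vllm_install_command` is hypothetical, the script invocation is abbreviated to two of its flags, and the function only prints the command it would choose rather than running anything.

```shell
# Hypothetical helper: print (do not run) the install command for an arch.
# $1 = architecture (e.g. arm64, amd64), $2 = pinned vLLM version.
vllm_install_command() {
  arch="$1"
  version="$2"
  if [ "$arch" = "arm64" ]; then
    # Source build path; the real invocation passes several more flags.
    printf '%s\n' "/tmp/install_vllm.sh --editable --arch $arch"
  else
    # Release path pinned to the version given by the VLLM_VERSION build arg.
    printf '%s\n' "uv pip install vllm==$version"
  fi
}

vllm_install_command arm64 0.9.2
vllm_install_command amd64 0.9.2
```

One consequence worth checking in review: on the non-arm64 path nothing is installed under `/opt/vllm`, while later instructions in this file still reference that directory.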

ENV LD_LIBRARY_PATH=\
/opt/vllm/tools/ep_kernels/ep_kernels_workspace/nvshmem_install/lib:\
@@ -348,7 +355,6 @@ COPY LICENSE /workspace/
COPY Cargo.toml /workspace/
COPY Cargo.lock /workspace/
COPY rust-toolchain.toml /workspace/
COPY hatch_build.py /workspace/

# Copy source code
COPY lib/ /workspace/lib/
@@ -392,22 +398,11 @@ COPY --from=wheel_builder $CARGO_HOME $CARGO_HOME
# Copy rest of the code
COPY . /workspace

# Build C bindings, creates lib/bindings/c/include
#
# TODO: In theory the 'cargo build' in earlier stage covers this, we "just" need to copy the
# `lib/bindings/c/include` folder that build.rs generated across.
# I couldn't get that to work, hence TODO.
RUN cd /workspace/lib/bindings/c && cargo build --release --locked

# Package the bindings
RUN mkdir -p /opt/dynamo/bindings/wheels && \
mkdir /opt/dynamo/bindings/lib && \
cp dist/ai_dynamo*cp312*.whl /opt/dynamo/bindings/wheels/. && \
cp target/release/libdynamo_llm_capi.so /opt/dynamo/bindings/lib/. && \
cp -r lib/bindings/c/include /opt/dynamo/bindings/. && \
cp target/release/dynamo-run /usr/local/bin && \
cp target/release/metrics /usr/local/bin && \
cp target/release/mock_worker /usr/local/bin
cp target/release/metrics /usr/local/bin

RUN uv pip install /workspace/dist/ai_dynamo_runtime*cp312*.whl && \
uv pip install /workspace/dist/ai_dynamo*any.whl
@@ -455,9 +450,6 @@ RUN apt-get update && \
cuda-toolkit-12-8 && \
rm -rf /var/lib/apt/lists/*

### COPY BINDINGS ###
# Copy all bindings (wheels, lib, include) from ci_minimum
COPY --from=ci_minimum /opt/dynamo/bindings /opt/dynamo/bindings
### COPY NATS & ETCD ###
# Copy nats and etcd from base image
COPY --from=base /usr/bin/nats-server /usr/bin/nats-server
@@ -466,11 +458,16 @@ ENV PATH=/usr/local/bin/etcd/:$PATH

# Copy UCX from base image as plugin for NIXL
# Copy NIXL source from wheel_builder image
# Copy dynamo wheels for gitlab artifacts
COPY --from=base /usr/local/ucx /usr/local/ucx
COPY --from=wheel_builder $NIXL_PREFIX $NIXL_PREFIX
COPY --from=wheel_builder /workspace/dist/*.whl wheelhouse/

# Copies vllm, DeepEP, DeepGEMM, PPLX repos (all editable installs) and nvshmem binaries
-COPY --from=base /opt/vllm /opt/vllm
+RUN if [ "$ARCH" = "arm64" ]; then \
+COPY --from=base /opt/vllm /opt/vllm; \
+fi
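Review note: `COPY` is a Dockerfile instruction, not a shell command. Inside a shell-form `RUN`, the whole body is handed to the shell, so `COPY --from=base ...` is looked up as an executable and the `--from=base` stage reference is never interpreted by the builder. The sketch below reproduces what the shell sees, assuming no program named `COPY` is on `PATH`. The function name `run_body` is hypothetical.

```shell
# Mimics the body of: RUN if [ "$ARCH" = "arm64" ]; then COPY ...; fi
# Assumption: no executable named COPY exists on PATH.
run_body() {
  if [ "$1" = "arm64" ]; then
    COPY --from=base /opt/vllm /opt/vllm
  fi
}

# Non-arm64: the condition is false, nothing runs, and the step succeeds.
amd64_status=0
run_body amd64 || amd64_status=$?

# arm64: the shell tries to execute COPY and reports "command not found".
arm64_status=0
run_body arm64 2>/dev/null || arm64_status=$?

echo "amd64 exit status: $amd64_status"
echo "arm64 exit status: $arm64_status"
```

Under that assumption the step is a no-op on amd64 and fails on arm64, so `/opt/vllm` is not copied in either case. If the intent is to copy the directory only for arm64 images, one possible approach is to select between build stages with a build argument; that is a suggestion, not something this commit does.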

ENV LD_LIBRARY_PATH=\
/opt/vllm/tools/ep_kernels/ep_kernels_workspace/nvshmem_install/lib:\
$NIXL_LIB_DIR:\
@@ -479,10 +476,11 @@ $NIXL_PLUGIN_DIR:\
/usr/local/ucx/lib/ucx:\
$LD_LIBRARY_PATH


# Copy entire venv
# Theres a lot of stuff we'd have to re-compile
# Think its better to just copy
# Theres a lot of stuff we'd have to re-compile (for arm64)
# TODO: use pip ai-dynamo[vllm] in venv to replicate end user environment
# Copy metrics binary from wheel_builder image, not part of ai-dynamo wheel
COPY --from=ci_minimum /workspace/target/release/metrics /usr/local/bin/metrics
COPY --from=ci_minimum ${VIRTUAL_ENV} ${VIRTUAL_ENV}

# Once UX refactor is merged
1 change: 0 additions & 1 deletion deploy/sdk/src/dynamo/sdk/tests/test_e2e_args.py
@@ -22,7 +22,6 @@

from dynamo.sdk.cli.cli import cli

pytestmark = pytest.mark.pre_merge
runner = CliRunner()


1 change: 0 additions & 1 deletion deploy/sdk/src/dynamo/sdk/tests/test_e2e_config.py
@@ -23,7 +23,6 @@

from dynamo.sdk.cli.cli import cli

pytestmark = pytest.mark.pre_merge
runner = CliRunner()


3 changes: 0 additions & 3 deletions deploy/sdk/src/dynamo/sdk/tests/test_link.py
@@ -13,12 +13,9 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import pytest

from dynamo.sdk.core.protocol.interface import LinkedServices

pytestmark = pytest.mark.pre_merge


def test_remove_backend2():
from dynamo.sdk.tests.pipeline import Backend, Backend2, Frontend, Middle
2 changes: 0 additions & 2 deletions deploy/sdk/src/dynamo/sdk/tests/test_resources.py
@@ -19,8 +19,6 @@
from dynamo.sdk.core.protocol.interface import ServiceInterface
from dynamo.sdk.core.runner import TargetEnum

pytestmark = pytest.mark.pre_merge


@pytest.fixture(scope="module", autouse=True)
def setup_and_teardown():
30 changes: 0 additions & 30 deletions hatch_build.py

This file was deleted.
