
Conversation

@mikebonnet (Collaborator) commented Dec 15, 2025

Only copy the required binaries and libraries from the installation directory into the final image, and install only the necessary runtime dependencies. The final image size is reduced by over 2 GB.

Summary by Sourcery

Use a multi-stage build for the ROCm container image and adjust build scripts to support installation into a temporary prefix for ROCm builds.

Enhancements:

  • Refactor the ROCm Containerfile to use a builder stage and produce a slimmer runtime image by copying only required binaries and shared libraries.
  • Update the build_llama_and_whisper.sh script to treat ROCm like other GPU-specific images for install prefix handling and to use a unified GPU targets flag instead of AMDGPU-specific targets.

@sourcery-ai bot (Contributor) commented Dec 15, 2025


Reviewer's Guide

Converts the ROCm container image to a multi-stage build in which the llama/whisper binaries are installed under /tmp/install in a builder stage and copied into a slim runtime image, adjusts the build script to use a temporary install prefix for ROCm and GPU_TARGETS instead of AMDGPU_TARGETS, and removes some redundant install logic for the whisper and llama binaries.

File-Level Changes

container-images/rocm/Containerfile: convert the ROCm Containerfile to a multi-stage build with a slim runtime image (a hedged sketch of the resulting layout follows this list).
  • Change the base stage to a named builder stage using Fedora 43 and run the existing build_llama_and_whisper.sh rocm script there.
  • Introduce a second Fedora 43 runtime stage that copies only selected llama and whisper binaries from /tmp/install/bin into /usr/bin.
  • Copy only shared libraries from /tmp/install/lib64 into /usr/lib64 in the final image.
  • Install only the required ROCm runtime packages (hipblas, rocblas, rocm-hip, rocm-runtime, rocsolver) with weak dependencies disabled, then clean the dnf caches.

container-images/scripts/build_llama_and_whisper.sh: align the build script's install prefix and flags with the ROCm multi-stage layout and simplify the install steps.
  • Extend set_install_prefix to use /tmp/install for the rocm containerfile, matching the other GPU-specific images.
  • Change the ROCm CMake flag from AMDGPU_TARGETS to GPU_TARGETS, preserving the default list of architectures.
  • Remove the explicit mkdir of the install_prefix/bin directory from the whisper.cpp build function (now handled by cmake install or prior steps).
  • Remove the redundant local install_prefix variable and its use in clone_and_build_llama_cpp, relying instead on cmake install behavior or centralized prefix logic.
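
For orientation, here is a minimal sketch of the layout those bullets describe. This is not the Containerfile as merged: the base image reference, the build-script invocation, and the full list of copied binaries are assumptions; only the binaries visible in the review's diff context below are listed.

```Dockerfile
# Builder stage: compile llama.cpp and whisper.cpp for ROCm, installing into a
# temporary prefix that only this stage carries.
# (Base image reference and script location are assumptions.)
FROM quay.io/fedora/fedora:43 AS builder
COPY container-images/scripts /scripts
RUN /scripts/build_llama_and_whisper.sh rocm   # installs under /tmp/install

# Runtime stage: a clean base plus only the needed binaries, shared libraries,
# and ROCm runtime packages.
FROM quay.io/fedora/fedora:43
# Only the binaries visible in the diff context are shown; the merged
# Containerfile copies more.
COPY --from=builder /tmp/install/bin/llama-quantize \
                    /tmp/install/bin/whisper-server \
                    /usr/bin/
COPY --from=builder /tmp/install/lib64/*.so /usr/lib64/
RUN dnf -y --setopt=install_weak_deps=false install \
        hipblas rocblas rocm-hip rocm-runtime rocsolver && \
    dnf clean all
```

The dnf package list and the /tmp/install paths come straight from the bullets above; everything else is illustrative.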

Possibly linked issues

  • #rocm-ubi container build fails around dnf remove gcc-c++: PR’s multi-stage rocm build avoids the problematic dnf remove step that caused the original build failure.
  • #ramalama new rocm image is broken due to missing libhipblas.so.2: both address the broken ROCm image; this PR adds the hipblas/ROCm runtimes via the multi-stage build, fixing the missing libhipblas.so.2 (a spot-check sketch follows below).
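
A hedged way to spot-check that second issue against the rebuilt image, assuming ldd is available inside it; the tag is the on-PR image mentioned later in this thread, and --entrypoint guards against any entrypoint the image defines.

```bash
# Report any unresolved shared libraries for whisper-server (the earlier
# breakage was a missing libhipblas.so.2).
podman run --rm --entrypoint /usr/bin/ldd \
  quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr-34482d11f78e7c93649c4cddb070c285e355ac39 \
  /usr/bin/whisper-server | grep "not found" || echo "all shared libraries resolved"
```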


@gemini-code-assist bot (Contributor) commented

Summary of Changes

Hello @mikebonnet, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a multi-stage build strategy for the ROCm container image. The core purpose is to dramatically decrease the final image's footprint by isolating the build artifacts and only transferring the essential components to the production-ready image. This optimization ensures that the container remains lean while still providing full functionality for ROCm-accelerated applications.

Highlights

  • Multi-stage build implementation: The ROCm Containerfile has been refactored to utilize a multi-stage build process, separating the build environment from the final runtime image.
  • Significant image size reduction: By selectively copying only the necessary binaries and libraries from the build stage and installing minimal runtime dependencies, the final image size is reduced by over 2GB.
  • Optimized dependency management: The final image now installs only essential ROCm runtime dependencies using dnf with install_weak_deps=false.
  • Build script adjustments: The build_llama_and_whisper.sh script was updated to set the installation prefix to /tmp/install for ROCm builds, aligning with the multi-stage approach, and to pass GPU_TARGETS instead of AMDGPU_TARGETS to CMake (sketched below).
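
A minimal sketch of those two script-side changes, with hypothetical case labels and variable names for anything the summaries do not spell out (the real default architecture list is elided):

```bash
# Sketch only: prefix selection and the CMake flag rename.
set_install_prefix() {
  case "$containerfile" in
    rocm|cuda|vulkan)          # hypothetical list; rocm now joins the GPU-specific images
      echo "/tmp/install" ;;   # temporary prefix the runtime stage copies from
    *)
      echo "/usr" ;;
  esac
}

# The HIP build flags now pass GPU_TARGETS rather than AMDGPU_TARGETS; the
# default architecture list itself is unchanged ($default_gpu_targets stands in for it).
common_flags+=("-DGPU_TARGETS=$default_gpu_targets")
```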

@sourcery-ai bot (Contributor) left a comment


Hey there - I've reviewed your changes and found some issues that need to be addressed.

  • In clone_and_build_llama_cpp, install_prefix is no longer defined after removing the local assignment, but it’s still used in the final install command, which will cause the script to fail or use an unintended value—either reintroduce a local assignment from set_install_prefix or adjust the install path accordingly.
  • The HIP build flags changed from -DAMDGPU_TARGETS=... to -DGPU_TARGETS=...; please confirm this matches the current llama.cpp/whisper.cpp CMake option names, as this rename may be ignored or break target selection if the upstream variable is still AMDGPU_TARGETS.
## Individual Comments

### Comment 1
<location> `container-images/scripts/build_llama_and_whisper.sh:246-245` </location>
<code_context>

 clone_and_build_llama_cpp() {
   local llama_cpp_commit="${LLAMA_CPP_PULL_REF:-$DEFAULT_LLAMA_CPP_COMMIT}"
-  local install_prefix
-  install_prefix=$(set_install_prefix)
   git_clone_specific_commit "${LLAMA_CPP_REPO:-https://github.com/ggml-org/llama.cpp}" "$llama_cpp_commit"
   cmake_steps "${common_flags[@]}"
   install -m 755 build/bin/rpc-server "$install_prefix"/bin/rpc-server
</code_context>

<issue_to_address>
**issue (bug_risk):** Removing the local install_prefix assignment risks installing rpc-server into an unintended path.

This function still calls `install ... "$install_prefix"/bin/rpc-server`, but `install_prefix` is no longer initialized here. Unless it’s guaranteed to be set in the environment, this will likely resolve to `/bin/rpc-server` or fail. Either restore `local install_prefix=$(set_install_prefix)` or remove this manual `install` in favor of `cmake --install` with the prefix configured in `cmake_steps`.
</issue_to_address>
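
For illustration only, the first option described above: restore the local assignment in the function from the diff context (a sketch, not the patch as merged).

```bash
clone_and_build_llama_cpp() {
  local llama_cpp_commit="${LLAMA_CPP_PULL_REF:-$DEFAULT_LLAMA_CPP_COMMIT}"
  local install_prefix
  install_prefix=$(set_install_prefix)   # restored so the manual install below has a defined target
  git_clone_specific_commit "${LLAMA_CPP_REPO:-https://github.com/ggml-org/llama.cpp}" "$llama_cpp_commit"
  cmake_steps "${common_flags[@]}"
  install -m 755 build/bin/rpc-server "$install_prefix/bin/rpc-server"
}
```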

### Comment 2
<location> `container-images/rocm/Containerfile:18` </location>
<code_context>
+  /tmp/install/bin/llama-quantize \
+  /tmp/install/bin/whisper-server \
+  /usr/bin/
+COPY --from=builder /tmp/install/lib64/*.so /usr/lib64/
+
+RUN dnf -y --setopt=install_weak_deps=false install hipblas rocblas rocm-hip rocm-runtime rocsolver && \
</code_context>

<issue_to_address>
**suggestion:** The COPY glob for shared libraries can fail if no .so files exist or miss versioned symlinks.

To avoid this fragility, either copy the entire directory (e.g., `COPY --from=builder /tmp/install/lib64/ /usr/lib64/`) or use a broader pattern such as `*.so*` so that all ROCm shared libraries and their symlinks are reliably included.

```suggestion
COPY --from=builder /tmp/install/lib64/ /usr/lib64/
```
</issue_to_address>


@gemini-code-assist bot (Contributor) left a comment


Code Review

The changes refactor the ROCm container build into a multi-stage Containerfile: llama.cpp and whisper.cpp are compiled in a builder stage, and only specific binaries and shared libraries, along with the ROCm runtime dependencies, are copied or installed into the final image. The build script was updated to configure the install prefix and GPU targets for ROCm accordingly. A review comment points out that the rpc-server binary is not copied to the final image and suggests using a wildcard in the COPY command for binaries so that all executables are reliably included.

Only copy required binaries and libraries from the installation directory into
the final image, and install only necessary runtime dependencies. The final image
size is reduced by over 2Gb.

Signed-off-by: Mike Bonnet <[email protected]>
Override the use of "uv", so python dependencies get installed into system directories.

Signed-off-by: Mike Bonnet <[email protected]>
@mikebonnet (Collaborator, Author) commented

/retest ramalama-on-pull-request

@rhatdan (Member) commented Dec 16, 2025

LGTM, assuming you tested this on an AMD system?

@mikebonnet (Collaborator, Author) commented

Unfortunately I don't have an AMD system to test with. If someone could test ramalama run or ramalama serve with --image quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr-34482d11f78e7c93649c4cddb070c285e355ac39, that would be very helpful.

@rhatdan (Member) commented Dec 16, 2025

Well, I have an AMD laptop, and it fails the same way with your image as with the quay.io/ramalama/ramalama image. Luckily for me, the laptop works with the vulkan driver.

@rhatdan (Member) commented Dec 16, 2025

.Memory critical error by agent node-0 (Agent handle: 0x3390320) on address 0x7f599f65e000. Reason: Memory in use.

@mikebonnet (Collaborator, Author) commented

Does the quay.io/ramalama/rocm:0.15 image behave the same way?

@rhatdan (Member) commented Dec 16, 2025

Yes

@mikebonnet (Collaborator, Author) commented

There were reports that it was a firmware problem. Have you tried with different/older firmware?

@olliewalsh (Collaborator) commented

> Unfortunately I don't have an AMD system to test with. If someone could test ramalama run or ramalama serve with --image quay.io/redhat-user-workloads/ramalama-tenant/rocm:on-pr-34482d11f78e7c93649c4cddb070c285e355ac39, that would be very helpful.

I'm almost done setting up an AMD system, so I can test this soon, hopefully tomorrow.
