vlasky (Contributor) commented Dec 8, 2025

Note: This PR targets release v0.9.3 (commit ff0c02e) to make it easy for users relying on the last stable release to apply this fix to their local builds. Upstream main has since restructured to use submodules and patches.

Summary

  • Move cosmo_args() before determine_program() so embedded .args flags like --server --v2 are visible when determining program mode
  • Rename lf::server::main() to lf::server::run() with args_already_loaded parameter to avoid double-loading .args

Problem

When a llamafile has an embedded .args file containing --server --v2, determine_program() runs before cosmo_args() loads the embedded args. determine_program() therefore never sees these flags, and the program falls through to chatbot mode instead of launching llamafiler.
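The ordering problem can be illustrated with a small, self-contained sketch. Everything in it is a hypothetical stand-in: determine_program_sketch() and load_embedded_args_sketch() only imitate determine_program() and cosmo_args(), the rule "server mode needs both --server and --v2" is a simplification, and the position where embedded args are inserted is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins; these are NOT the real llamafile functions.
enum class Program { Chatbot, Server };

// Imitates determine_program(): choose server mode only if both flags appear.
// (Simplified assumption about the real dispatch rule.)
Program determine_program_sketch(const std::vector<std::string>& args) {
    bool server = false, v2 = false;
    for (const auto& a : args) {
        if (a == "--server") server = true;
        if (a == "--v2") v2 = true;
    }
    return (server && v2) ? Program::Server : Program::Chatbot;
}

// Imitates cosmo_args(): splice the embedded .args contents in after argv[0].
// (The insertion position is an assumption made for this sketch.)
std::vector<std::string> load_embedded_args_sketch(
        std::vector<std::string> argv,
        const std::vector<std::string>& embedded) {
    auto pos = argv.empty() ? argv.end() : argv.begin() + 1;
    argv.insert(pos, embedded.begin(), embedded.end());
    return argv;
}

// Old order: the mode is decided before the embedded args are loaded.
Program old_order(const std::vector<std::string>& argv,
                  const std::vector<std::string>& embedded) {
    Program p = determine_program_sketch(argv);      // flags not visible yet
    (void)load_embedded_args_sketch(argv, embedded); // loaded too late
    return p;
}

// Fixed order: the embedded args are loaded first, then the mode is decided.
Program new_order(const std::vector<std::string>& argv,
                  const std::vector<std::string>& embedded) {
    return determine_program_sketch(load_embedded_args_sketch(argv, embedded));
}
```

With argv of {"llamafile"} and embedded args of {"--server", "--v2"}, old_order() selects the chatbot stand-in while new_order() selects the server stand-in.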

Issue with PR #788

PR #788 correctly identifies this bug and moves cosmo_args() before determine_program(). However, it introduces a double-loading issue: when the program dispatches to llamafiler mode, lf::server::main() in prog.cpp calls cosmo_args() again. This causes .args to be loaded twice, which is problematic for accumulator flags like --header that append values rather than overwrite them.
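A minimal sketch of why double-loading matters for accumulator flags, under stated assumptions: FlagsSketch, load_args_sketch(), and parse_flags_sketch() are hypothetical and do not reflect the real llamafiler flag parser, the header value is made up, and --example-scalar is an invented overwrite-style flag used only for contrast with --header.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical flag state; not the real llamafiler data structures.
struct FlagsSketch {
    std::vector<std::string> headers; // accumulator: every --header appends
    int scalar = 0;                   // overwrite-style: last value wins
};

// Imitates one .args load by appending the embedded args to argv.
void load_args_sketch(std::vector<std::string>& argv,
                      const std::vector<std::string>& embedded) {
    argv.insert(argv.end(), embedded.begin(), embedded.end());
}

// Parses "--flag value" pairs; unknown tokens are skipped.
void parse_flags_sketch(const std::vector<std::string>& args, FlagsSketch& flags) {
    for (std::size_t i = 0; i + 1 < args.size(); ++i) {
        if (args[i] == "--header") {
            flags.headers.push_back(args[++i]);  // appends, so duplicates pile up
        } else if (args[i] == "--example-scalar") {
            flags.scalar = std::stoi(args[++i]); // overwrites, so duplicates are harmless
        }
    }
}

// Loads the same embedded args `loads` times, then parses the result once.
FlagsSketch parse_after_loads(int loads) {
    const std::vector<std::string> embedded = {
        "--header", "X-Example: 1", "--example-scalar", "8"};
    std::vector<std::string> argv = {"llamafiler"};
    for (int i = 0; i < loads; ++i) load_args_sketch(argv, embedded);
    FlagsSketch flags;
    parse_flags_sketch(argv, flags);
    return flags;
}
```

In this sketch, parse_after_loads(2) ends up with two identical header entries, whereas the overwrite-style value is the same as after a single load.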

Solution

This PR:

  1. Moves the cosmo_args() call before determine_program() in llama.cpp/main/main.cpp
  2. Renames lf::server::main() to lf::server::run() and adds an args_already_loaded parameter
  3. Has the dispatcher pass true to skip the redundant .args load
  4. Has the standalone llamafiler binary pass false so that it loads its own args

This provides the same fix as PR #788 while avoiding the double-loading issue.
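The shape of this solution might look roughly like the following sketch. It is not the code from this PR: the namespace, the ArgvSketch type, and all function signatures are hypothetical, and server_run() merely stands in for lf::server::run(), whose real signature is not shown here. The sketch only demonstrates that both entry points end up loading .args exactly once.

```cpp
#include <string>
#include <vector>

namespace sketch {

// Hypothetical argv wrapper that also counts how often .args was loaded.
struct ArgvSketch {
    std::vector<std::string> args;
    int loads = 0;
};

// Imitates cosmo_args(): append the embedded args and record the load.
void load_embedded_args(ArgvSketch& a, const std::vector<std::string>& embedded) {
    a.args.insert(a.args.end(), embedded.begin(), embedded.end());
    ++a.loads;
}

// Stand-in for lf::server::run(): loads .args only if the caller has not.
// Returns the number of --header flags it sees, as a proxy for flag state.
int server_run(ArgvSketch& a, const std::vector<std::string>& embedded,
               bool args_already_loaded) {
    if (!args_already_loaded) load_embedded_args(a, embedded);
    int headers = 0;
    for (const auto& arg : a.args)
        if (arg == "--header") ++headers;
    return headers;
}

// Dispatcher path: loads .args up front (so mode detection can see the
// flags), then passes true so the server does not load them again.
int dispatcher_main(ArgvSketch& a, const std::vector<std::string>& embedded) {
    load_embedded_args(a, embedded);
    return server_run(a, embedded, /*args_already_loaded=*/true);
}

// Standalone path: nothing was loaded yet, so pass false.
int standalone_main(ArgvSketch& a, const std::vector<std::string>& embedded) {
    return server_run(a, embedded, /*args_already_loaded=*/false);
}

// Helper for checking: runs one path and reports {load count, header count}.
std::vector<int> run_path(bool via_dispatcher) {
    const std::vector<std::string> embedded = {
        "--server", "--v2", "--header", "X-Example: 1"};
    ArgvSketch a;
    a.args = {"llamafile"};
    int headers = via_dispatcher ? dispatcher_main(a, embedded)
                                 : standalone_main(a, embedded);
    return {a.loads, headers};
}

} // namespace sketch
```

A boolean parameter keeps the change small; the trade-off is that each caller has to know whether it has already loaded .args.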

Test plan

  • Build llamafile
  • Embed a .args file containing --server --v2 in a llamafile using zipalign
  • Run the llamafile and verify it launches llamafiler instead of chatbot
  • Verify standalone llamafiler binary still works correctly

Enhances the .args timing fix from PR #788.

Move cosmo_args() before determine_program() so that embedded .args flags
like --server --v2 are visible when determining program mode. Without this
fix, a .llamafile with embedded server flags would fall through to chatbot
mode instead of launching llamafiler.

To avoid double-loading .args (once in main.cpp, once in prog.cpp), rename
lf::server::main() to lf::server::run() and add an args_already_loaded
parameter. The dispatcher passes true since it already loaded .args, while
the standalone llamafiler binary passes false to load its own args.

Enhances the .args timing fix from PR mozilla-ai#788.