Fix .args loading order for program mode detection #840

vlasky · 2025-12-08T03:58:17Z

Note: This PR targets release v0.9.3 (commit ff0c02e) to make it easy for users relying on the last stable release to apply this fix to their local builds. Upstream main has since restructured to use submodules and patches.

Summary

Move cosmo_args() before determine_program() so embedded .args flags like --server --v2 are visible when determining program mode
Rename lf::server::main() to lf::server::run() and add an args_already_loaded parameter to avoid double-loading .args

Problem

When a llamafile has an embedded .args file containing --server --v2, determine_program() runs BEFORE cosmo_args() has loaded the embedded args. determine_program() therefore never sees these flags, and the program falls through to chatbot mode instead of launching llamafiler.

Issue with PR #788

PR #788 correctly identifies this bug and moves cosmo_args() before determine_program(). However, it introduces a double-loading issue: when the program dispatches to llamafiler mode, lf::server::main() in prog.cpp calls cosmo_args() again. This causes .args to be loaded twice, which is problematic for accumulator flags like --header that append values rather than overwrite them.
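To illustrate why a second load matters for accumulator flags, here is a minimal sketch. It is not the llamafile code: load_embedded_args() and collect_headers() are hypothetical stand-ins for cosmo_args() and the server's flag parsing, and the sketch assumes embedded arguments are placed ahead of the command-line arguments.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Args = std::vector<std::string>;

// Hypothetical stand-in for cosmo_args(): inserts the embedded .args
// entries right after argv[0], ahead of any command-line arguments.
Args load_embedded_args(const Args& embedded, const Args& argv) {
    Args out;
    if (!argv.empty()) out.push_back(argv[0]);  // keep the program name
    out.insert(out.end(), embedded.begin(), embedded.end());
    if (argv.size() > 1) out.insert(out.end(), argv.begin() + 1, argv.end());
    return out;
}

// Hypothetical accumulator-style parser: each --header occurrence
// appends its value instead of overwriting the previous one.
Args collect_headers(const Args& argv) {
    Args headers;
    for (std::size_t i = 1; i + 1 < argv.size(); ++i) {
        if (argv[i] == "--header") headers.push_back(argv[++i]);
    }
    return headers;
}
```

Loading the same embedded arguments once yields one header value per --header entry; loading them a second time on top of the already-expanded argv yields each value twice.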

Solution

This PR:

Moves cosmo_args() call before determine_program() in llama.cpp/main/main.cpp
Renames lf::server::main() to lf::server::run() with an args_already_loaded parameter
The dispatcher passes true to skip the redundant .args load
Standalone llamafiler binary passes false to load its own args

This provides the same fix as PR #788 while avoiding the double-loading issue.
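The resulting control flow can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual patch: every name ending in _sketch, the load_embedded_args() helper, the Program enum, and the parameter order are hypothetical, and the sketch assumes that --server together with --v2 is what selects llamafiler.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Args = std::vector<std::string>;

enum class Program { Chatbot, Llamafiler };  // hypothetical mode enum

// Hypothetical stand-in for cosmo_args(): embedded entries follow argv[0].
Args load_embedded_args(const Args& embedded, const Args& argv) {
    Args out;
    if (!argv.empty()) out.push_back(argv[0]);
    out.insert(out.end(), embedded.begin(), embedded.end());
    if (argv.size() > 1) out.insert(out.end(), argv.begin() + 1, argv.end());
    return out;
}

// Simplified stand-in for determine_program(): assumes --server plus --v2
// selects llamafiler and anything else falls through to chatbot mode.
Program determine_program_sketch(const Args& argv) {
    bool server = false, v2 = false;
    for (std::size_t i = 1; i < argv.size(); ++i) {
        if (argv[i] == "--server") server = true;
        if (argv[i] == "--v2") v2 = true;
    }
    return (server && v2) ? Program::Llamafiler : Program::Chatbot;
}

// Stand-in for lf::server::run(): loads .args only if the caller has not.
Args server_run_sketch(Args argv, const Args& embedded, bool args_already_loaded) {
    if (!args_already_loaded) argv = load_embedded_args(embedded, argv);
    return argv;  // the real function would go on to parse flags and serve
}

struct Launch {
    Program program;
    Args argv;
};

// Dispatcher: load .args first, then pick the mode, then pass true so the
// server entry point does not load .args a second time.
Launch dispatch_sketch(const Args& cmdline, const Args& embedded) {
    Args argv = load_embedded_args(embedded, cmdline);
    Program program = determine_program_sketch(argv);
    if (program == Program::Llamafiler) {
        argv = server_run_sketch(argv, embedded, /*args_already_loaded=*/true);
    }
    return {program, argv};
}
```

With the load done before mode detection, embedded --server --v2 flags select llamafiler, and each embedded argument still appears exactly once in the argv the server sees. A standalone entry point would call server_run_sketch() with false so that it performs the single load itself.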

Test plan

Build llamafile
Embed a .args file containing --server --v2 in a llamafile using zipalign
Run the llamafile and verify it launches llamafiler instead of chatbot
Verify standalone llamafiler binary still works correctly

Enhances the .args timing fix from PR #788.

Move cosmo_args() before determine_program() so that embedded .args flags like --server --v2 are visible when determining program mode. Without this fix, a .llamafile with embedded server flags would fall through to chatbot mode instead of launching llamafiler.

To avoid double-loading .args (once in main.cpp, once in prog.cpp), rename lf::server::main() to lf::server::run() and add an args_already_loaded parameter. The dispatcher passes true since it already loaded .args, while the standalone llamafiler binary passes false to load its own args.

Enhances the .args timing fix from PR mozilla-ai#788.

github-actions bot added the llama.cpp and llamafile labels on Dec 8, 2025

vlasky mentioned this pull request Dec 8, 2025

Fix Server v2 production issues (#767, #783, #787) #788

Open
