@anivar commented Dec 7, 2025

Fixes #835

This PR addresses a remote DoS vulnerability where attackers can crash the llamafile server by sending requests with extremely large prompts.

The Problem

When a prompt exceeds roughly 2.1 billion characters (INT_MAX, 2^31 - 1), the server overflows while sizing the token buffer. The code adds the text length, a 64-bit size_t, to a small constant and stores the result in a 32-bit int. For oversized text the result wraps around to a negative value; when that value is then used as a vector size it is converted back to a huge unsigned count, the allocation throws std::length_error, and the entire process crashes.

The bug is in llamafile/llama.cpp line 50:

int n_tokens = text.size() + 2 * add_special;
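The wraparound can be reproduced without allocating a multi-gigabyte string. The sketch below is illustrative only: narrowed_token_count is a hypothetical helper, not a function in llamafile, and it assumes a 32-bit int. It performs the same arithmetic as the line above on a plain size value.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical helper (not part of llamafile): mimics the arithmetic of the
// buggy line using a plain size value instead of a real std::string.
int narrowed_token_count(std::uint64_t text_size, bool add_special) {
    // The sum is computed in 64 bits, exactly as with size_t on a 64-bit host.
    std::uint64_t wide = text_size + 2 * static_cast<std::uint64_t>(add_special);
    // Narrowing to int discards the high bits. Assumes a 32-bit int; the
    // result is modulo 2^32 in C++20 and implementation-defined before that.
    return static_cast<int>(wide);
}
```

With a 32-bit int on mainstream compilers, narrowed_token_count(INT_MAX, true) comes out negative (INT_MAX + 2 wraps to -2147483647), while small inputs such as 100 give the expected 102.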

The Fix

Check the text size before doing the math. If it's too large, throw an exception that gets caught by the existing error handler instead of letting the overflow happen:

if (text.size() > static_cast<size_t>(INT_MAX) - 2) {
    throw std::length_error("cannot create std::vector larger than max_size()");
}

The worker's exception handler (worker.cpp:122) already catches these exceptions and logs them, so the server stays up instead of crashing.
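As a rough sketch of how the guard and a catching handler fit together: both functions below are hypothetical stand-ins written for illustration, not the actual code in llamafile/llama.cpp or worker.cpp, and the "ok"/"error" strings are invented for the example.

```cpp
#include <climits>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper: computes the token buffer size, rejecting inputs
// whose size plus 2 would not fit in an int.
int checked_token_capacity(std::size_t text_size, bool add_special) {
    if (text_size > static_cast<std::size_t>(INT_MAX) - 2) {
        throw std::length_error("cannot create std::vector larger than max_size()");
    }
    // Safe: text_size + 2 is at most INT_MAX here.
    return static_cast<int>(text_size) + 2 * static_cast<int>(add_special);
}

// Hypothetical stand-in for a worker-level handler: turns the exception
// into an error result instead of letting it terminate the process.
std::string handle_request(std::size_t text_size) {
    try {
        int n = checked_token_capacity(text_size, true);
        return "ok: " + std::to_string(n);
    } catch (const std::exception& e) {
        return std::string("error: ") + e.what();
    }
}
```

The check is done on the unsigned size before any arithmetic, so the comparison itself cannot overflow; subtracting 2 from INT_MAX on the right-hand side leaves room for the two special tokens.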

Impact

This closes a remote DoS vector. An attacker can no longer crash the server just by sending an oversized prompt. The fix makes llamafile behave like standalone llama.cpp, which handles this case gracefully.

Fixes mozilla-ai#835

When an extremely large prompt (>2^31 characters) is sent to the
llamafile server, the tokenization function would experience integer
overflow, causing a crash with std::length_error and terminating
the entire server process.

Root cause: In llamafile/llama.cpp line 50, text.size() (size_t/uint64)
was being added to a small value and assigned to int (int32), causing
overflow when text.size() exceeded INT_MAX.

Fix: Added bounds checking before the addition to prevent overflow.
If the input text is too large, we now throw std::length_error with
the same error message that llama.cpp naturally throws, which the
worker exception handler will catch and log.

This matches the behavior of standalone llama.cpp which has internal
bounds checks in std::vector and returns a controlled 500 error rather
than crashing the process.

Security impact: Prevents remote unauthenticated DoS attack where an
attacker could crash the llamafile server by sending an oversized prompt.
@anivar anivar force-pushed the fix/integer-overflow-dos-835 branch from d05e8ce to 8eca66e Compare December 7, 2025 04:54
Linked issue: Bug: Integer overflow in llamafile leads to arbitrary DoS in llamafile
