Fix integer overflow DoS vulnerability in tokenization (#835) #839
Fixes #835
This PR addresses a remote DoS vulnerability where attackers can crash the llamafile server by sending requests with extremely large prompts.
The Problem
When a request arrives with a prompt longer than INT_MAX (roughly 2.1 billion) characters, the server sizes a buffer for tokenization by adding a small constant to the text length. The text length is a 64-bit value, but the sum is stored in a 32-bit integer, so for text that large the result wraps around to a negative number. That negative count is then converted to the vector's unsigned size type, producing an enormous allocation request; the vector constructor throws std::length_error, and since nothing catches it there, the entire process crashes.
The bug is in llamafile/llama.cpp line 50:
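The exact code isn't reproduced in this description; here is a minimal sketch of the overflow pattern, assuming the helper mirrors the upstream llama.cpp tokenization wrapper (names are illustrative):

```cpp
// text.length() is a 64-bit size_t, but the sum is truncated
// into a 32-bit int: for text longer than about INT_MAX
// characters, n_tokens wraps around to a negative value.
int n_tokens = text.length() + 2 * add_special;

// The negative int converts to a huge unsigned size here, so
// the vector constructor throws std::length_error, which is
// never caught at this point and terminates the process.
std::vector<llama_token> result(n_tokens);
```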
The Fix
Check the text size before doing the math. If it's too large, throw an exception that gets caught by the existing error handler instead of letting the overflow happen:
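A sketch of the guard under the same assumptions as the snippet above (the bound and message are illustrative, not necessarily the exact patch):

```cpp
#include <climits>
#include <stdexcept>

// Reject the input before the addition can overflow: the
// upper bound for the token count must fit in an int.
if (text.length() > INT_MAX - 2 * add_special) {
    throw std::runtime_error("input text too big to tokenize");
}
int n_tokens = text.length() + 2 * add_special;
std::vector<llama_token> result(n_tokens);
```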
The worker's exception handler (worker.cpp:122) already catches these exceptions and logs them, so the server stays up instead of crashing.
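For context, a minimal, self-contained sketch of that catch-and-log pattern (the actual worker.cpp code is not reproduced here; handle_request is a hypothetical stand-in for the real per-request dispatch):

```cpp
#include <cstdio>
#include <stdexcept>

// Stand-in for the real per-request dispatch; here it just
// simulates the tokenizer rejecting an oversized prompt.
static void handle_request() {
    throw std::runtime_error("input text too big to tokenize");
}

int main() {
    try {
        handle_request();
    } catch (const std::exception &e) {
        // Log and drop the one bad request; the process stays
        // alive and keeps serving other clients.
        fprintf(stderr, "worker caught exception: %s\n", e.what());
    }
    return 0;
}
```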
Impact
This closes a remote DoS vector: an attacker can no longer crash the server just by sending an oversized prompt. The fix makes llamafile behave like standalone llama.cpp, which handles this case gracefully.