Commit 2722844
ytian218
server: fix crash when batch > ubatch with embeddings (#12836)
Fixes #12836 where the server crashes with GGML_ASSERT failure when
running with embeddings enabled and n_batch > n_ubatch.
Root cause: Embeddings use non-causal attention which requires all
tokens to be processed within a single ubatch. When n_batch > n_ubatch,
the server attempts to split processing, causing assertion failure.
Solution:
- Add parameter validation in main() after common_params_parse()
- When embeddings enabled and n_batch > n_ubatch:
* Log warnings explaining the issue
* Automatically set n_batch = n_ubatch
* Prevent server crash
This follows the approach suggested by @ggerganov in issue #12836.
Note: This supersedes stalled PR #12940 which attempted a runtime fix
in the old examples/server/server.cpp location. This implementation
validates at startup in tools/server/server.cpp (current location).
Testing:
- Build: Compiles successfully
- Validation triggers: Warns when -b > -ub with --embedding
- Auto-correction works: Adjusts n_batch = n_ubatch
- No false positives: Valid params don't trigger warnings
- Verified on macOS M3 Pro with embedding model1 parent 583cb83 commit 2722844
1 file changed
+9
-0
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
3657 | 3657 | | |
3658 | 3658 | | |
3659 | 3659 | | |
| 3660 | + | |
| 3661 | + | |
| 3662 | + | |
| 3663 | + | |
| 3664 | + | |
| 3665 | + | |
| 3666 | + | |
| 3667 | + | |
| 3668 | + | |
3660 | 3669 | | |
3661 | 3670 | | |
3662 | 3671 | | |
| |||
0 commit comments