Commits (29)
07b506d
llama-router: multi-model serving with dynamic backends
ServeurpersoCom Nov 29, 2025
25f1433
llama-router: fix logging init via static constructor
ServeurpersoCom Nov 29, 2025
4cbedbc
llama-router: centralize defaults and add mmproj auto-detection
ServeurpersoCom Nov 29, 2025
cb7c489
llama-router: add process grouping for selective VRAM management
ServeurpersoCom Nov 29, 2025
dbf3250
llama-router: add legacy endpoint support for single-model compat
ServeurpersoCom Nov 29, 2025
70eec73
llama-router: add comprehensive debug logging
ServeurpersoCom Nov 29, 2025
0f090e2
llama-router: implement SSE streaming and production safety features
ServeurpersoCom Nov 29, 2025
dac95e8
llama-router: fix segfault from static initialization order fiasco
ServeurpersoCom Nov 29, 2025
ee94bfc
llama-router: auto-detect sibling binary, capture logs, wait for back…
ServeurpersoCom Nov 29, 2025
e472330
llama-router: implement cross-platform subprocess I/O forwarding and …
ServeurpersoCom Nov 30, 2025
728bccc
llama-router: validate binary before spawn, clean child error handling
ServeurpersoCom Nov 30, 2025
cbcc8a8
llama-router: add multi-engine support with configurable spawn and en…
ServeurpersoCom Nov 30, 2025
635b70d
llama-router: fix SSE streaming termination and use-after-free
ServeurpersoCom Nov 30, 2025
232799a
llama-router: auto-rescan, admin endpoints, and fixes
ServeurpersoCom Nov 30, 2025
7f274d5
llama-router: add --import-dir for custom model collections
ServeurpersoCom Nov 30, 2025
bfb3e62
llama-router: add README with CLI reference and configuration guide
ServeurpersoCom Nov 30, 2025
4bc8f69
llama-router: document KISS philosophy, optimization patterns, and sy…
ServeurpersoCom Nov 30, 2025
b14ea20
llama-router: fix PATH binary support and macOS detection
ServeurpersoCom Nov 30, 2025
c5fdd3a
llama-router: separate quick-start guide from technical architecture …
ServeurpersoCom Nov 30, 2025
cb44f59
llama-router: async polling for process termination after SIGKILL
ServeurpersoCom Nov 30, 2025
85f418d
llama-router: separate PROCESS (OS) and BACKEND (HTTP) polling constants
ServeurpersoCom Nov 30, 2025
41f506a
llama-router: add real-time model swap notifications via SSE
ServeurpersoCom Dec 1, 2025
da65c5f
llama-router: document notify_model_swap feature in README and ARCHIT…
ServeurpersoCom Dec 1, 2025
919e581
llama-router: add embedded WebUI support
ServeurpersoCom Dec 1, 2025
b248838
llama-router: add startup_model configuration option
ServeurpersoCom Dec 1, 2025
6e93322
llama-router: document startup_model in README and ARCHITECTURE
ServeurpersoCom Dec 1, 2025
47408bc
llama-router: auto-configure startup_model on first HF download
ServeurpersoCom Dec 1, 2025
1a014b2
llama-router: add --jinja to default spawn configuration
ServeurpersoCom Dec 1, 2025
d99d952
llama-router: replace implicit arg injection with explicit placeholders
ServeurpersoCom Dec 1, 2025
llama-router: replace implicit arg injection with explicit placeholders
Remove automatic --model/--port/--host appending in favor of $path, $port, and $host placeholders in spawn commands. All parameters are now visible in the configuration for full transparency and flexibility.
ServeurpersoCom committed Dec 1, 2025
commit d99d95206ede5e7c20a66c8dfa41637b37b4517c
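For a concrete sense of the change, here is a minimal sketch of a placeholder-based spawn command as it would appear in the router configuration (the model path, port, and trimmed flag list below are illustrative, not taken from this PR):

```json
{
  "default_spawn": {
    "command": [
      "llama-server",
      "-m", "$path",
      "--port", "$port",
      "--host", "$host",
      "--jinja"
    ]
  }
}
```

At spawn time the router substitutes the placeholders, so the backend would be launched as something like `llama-server -m /models/example.gguf --port 50000 --host 127.0.0.1 --jinja`; previously `--model`, `--port`, and `--host` were appended implicitly and never appeared in the configuration.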
7 changes: 7 additions & 0 deletions tools/router/ARCHITECTURE.md
@@ -148,6 +148,13 @@ Spawn commands support both absolute/relative paths and PATH-based binaries:

The router only validates file existence for commands containing `/` or `\\` path separators, allowing seamless use of system-installed binaries.

### Spawn Command Placeholders

The router expands placeholders in spawn commands:
- `$path` → The model file path from `path` field
- `$port` → Dynamically assigned port (increments from `base_port`)
- `$host` → Always expands to `127.0.0.1` for security

### Model-Scoped Route Stripping

Routes like `/<model>/health` are router-side aliases for convenience. Before proxying to the backend, the router strips the model prefix:
43 changes: 35 additions & 8 deletions tools/router/README.md
@@ -196,7 +196,15 @@ Override with `--config`:
"notify_model_swap": false
},
"default_spawn": {
"command": ["llama-server", "--jinja", "--ctx-size", "4096", "--n-gpu-layers", "99"],
"command": [
"llama-server",
"-m", "$path",
"--port", "$port",
"--host", "$host",
"--jinja",
"--ctx-size", "4096",
"--n-gpu-layers", "99"
],
"proxy_endpoints": ["/v1/", "/health", "/slots", "/props"],
"health_endpoint": "/health"
},
@@ -233,16 +241,31 @@ The `default_spawn` block defines how llama-server instances are launched:

```json
{
"command": ["llama-server", "--jinja", "--ctx-size", "4096", "--n-gpu-layers", "99"],
"command": [
"llama-server",
"-m", "$path",
"--port", "$port",
"--host", "$host",
"--jinja",
"--ctx-size", "4096",
"--n-gpu-layers", "99"
],
"proxy_endpoints": ["/v1/", "/health", "/slots", "/props"],
"health_endpoint": "/health"
}
```

The router automatically appends these arguments:
- `--model <path>` - The model file path
- `--port <port>` - Dynamically assigned port
- `--host 127.0.0.1` - Localhost binding for security
### Spawn Command Placeholders

The router supports placeholders in spawn commands for dynamic value injection:

| Placeholder | Description | Example expansion |
|-------------|-------------|-------------------|
| `$path` | Model file path from configuration | `/home/user/.cache/llama.cpp/model.gguf` |
| `$port` | Dynamically assigned port | `50000`, `50001`, etc. |
| `$host` | Bind address (always `127.0.0.1`) | `127.0.0.1` |

This makes all spawn parameters explicit and visible in the configuration.

### Optimizing for Your Hardware

@@ -253,6 +276,9 @@ The `default_spawn` is where you tune performance for your specific hardware. **
"default_spawn": {
"command": [
"llama-server",
"-m", "$path",
"--port", "$port",
"--host", "$host",
"-ngl", "999",
"-ctk", "q8_0",
"-ctv", "q8_0",
@@ -277,8 +303,6 @@ The `default_spawn` is where you tune performance for your specific hardware. **
- `-kvu`: Use single unified KV buffer for all sequences (also `--kv-unified`)
- `--jinja`: Enable Jinja template support

**Note:** The router automatically appends `--model`, `--port`, and `--host` - do not include these in your command.

Change `default_spawn`, reload the router, and all `auto` models instantly use the new configuration.

### Per-Model Spawn Override
@@ -293,6 +317,9 @@ Individual models can override the default spawn configuration:
"spawn": {
"command": [
"llama-server",
"-m", "$path",
"--port", "$port",
"--host", "$host",
"--jinja",
"--ctx-size", "8192",
"--n-gpu-layers", "99",
18 changes: 12 additions & 6 deletions tools/router/router-app.cpp
@@ -82,12 +82,18 @@ bool RouterApp::ensure_running(const std::string & model_name, std::string & err
const SpawnConfig spawn_cfg = resolve_spawn_config(cfg);

std::vector<std::string> command = spawn_cfg.command;
command.push_back("--model");
command.push_back(expand_user_path(cfg.path));
command.push_back("--port");
command.push_back(std::to_string(port));
command.push_back("--host");
command.push_back("127.0.0.1");
const std::string model_path = expand_user_path(cfg.path);

// Replace all placeholders
for (auto & arg : command) {
if (arg == "$path") {
arg = model_path;
} else if (arg == "$port") {
arg = std::to_string(port);
} else if (arg == "$host") {
arg = "127.0.0.1";
}
}

LOG_INF("Starting %s on port %d\n", model_name.c_str(), port);

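Taken out of the diff, the substitution is a single pass over the argument vector. A self-contained sketch of the same idea follows; the helper name, model path, and port are illustrative and not part of the PR:

```cpp
#include <iostream>
#include <string>
#include <vector>

// Expand the router's spawn-command placeholders into concrete values.
// Arguments that are not an exact placeholder are left untouched.
static std::vector<std::string> expand_spawn_command(
        std::vector<std::string> command,
        const std::string & model_path,
        int port) {
    for (auto & arg : command) {
        if (arg == "$path") {
            arg = model_path;
        } else if (arg == "$port") {
            arg = std::to_string(port);
        } else if (arg == "$host") {
            arg = "127.0.0.1"; // backends always bind to localhost
        }
    }
    return command;
}

int main() {
    const std::vector<std::string> cmd = {
        "llama-server", "-m", "$path", "--port", "$port", "--host", "$host", "--jinja"
    };
    for (const auto & arg : expand_spawn_command(cmd, "/models/example.gguf", 50000)) {
        std::cout << arg << ' ';
    }
    std::cout << '\n';
    // prints: llama-server -m /models/example.gguf --port 50000 --host 127.0.0.1 --jinja
    return 0;
}
```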
2 changes: 1 addition & 1 deletion tools/router/router-config.cpp
@@ -107,7 +107,7 @@ static json serialize_spawn_config(const SpawnConfig & spawn) {
const SpawnConfig & get_default_spawn() {
static const SpawnConfig spawn = [] {
SpawnConfig default_spawn = {
/*command =*/ {"llama-server", "--jinja", "--ctx-size", "4096", "--n-gpu-layers", "99"},
/*command =*/ {"llama-server", "-m", "$path", "--port", "$port", "--host", "$host", "--jinja", "--ctx-size", "4096", "--n-gpu-layers", "99"},
/*proxy_endpoints =*/ {"/v1/", "/health", "/slots", "/props"},
/*health_endpoint =*/ "/health",
};