Merged
fix: add scaling adapter
Signed-off-by: Julien Mancuso <[email protected]>
julienmancuso committed Dec 4, 2025
commit de1a1573adad76d0e8a79706ce5b6462af72b04e
8 changes: 4 additions & 4 deletions docs/kubernetes/autoscaling.md
@@ -185,7 +185,7 @@ Dynamo metrics include these labels for filtering:

#### Example: Scale Decode Service Based on TTFT

Using HPA with Prometheus Adapter requires configuring external metrics.

**Step 1: Configure Prometheus Adapter**

@@ -208,7 +208,7 @@ rules:
as: "dynamo_ttft_p95_seconds"
metricsQuery: |
histogram_quantile(0.95,
sum(rate(dynamo_frontend_time_to_first_token_seconds_bucket{<<.LabelMatchers>>}[5m]))
by (le, namespace, dynamo_namespace)
)
```
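
The `metricsQuery` above relies on PromQL's `histogram_quantile` to turn per-bucket rates into a p95 estimate. As a rough illustration of what that function computes for classic histograms, here is a minimal Python sketch. It is a simplified approximation, not Prometheus's implementation (which handles further edge cases), and the bucket boundaries and rates in the example are hypothetical values, not Dynamo's actual buckets.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (le, count) buckets.

    Simplified sketch of the classic-histogram behavior: find the first
    bucket whose cumulative count reaches the target rank, then
    interpolate linearly inside that bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only supports 0 < q <= 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must include an le=+Inf bucket")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window

    rank = q * total
    for i, (le, count) in enumerate(buckets):
        if count >= rank:
            break

    if math.isinf(le):
        # The quantile falls in the +Inf bucket: report the highest
        # finite upper bound instead of interpolating to infinity.
        return buckets[-2][0] if len(buckets) > 1 else math.nan

    # The first bucket is assumed to start at 0 (latencies are non-negative).
    lower_bound = buckets[i - 1][0] if i > 0 else 0.0
    lower_count = buckets[i - 1][1] if i > 0 else 0.0
    fraction = (rank - lower_count) / (count - lower_count)
    return lower_bound + (le - lower_bound) * fraction


# Hypothetical cumulative per-bucket request rates over a 5m window.
example_buckets = [
    (0.1, 40.0),
    (0.25, 70.0),
    (0.5, 90.0),
    (1.0, 98.0),
    (math.inf, 100.0),
]
ttft_p95 = histogram_quantile(0.95, example_buckets)
print(f"estimated TTFT p95: {ttft_p95:.4f}s")
```

With these made-up numbers the estimate lands at roughly 0.81 seconds, inside the 0.5 to 1.0 second bucket. Because the result is interpolated within a bucket, its precision depends on how finely the histogram's buckets are spaced around the value being compared against.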
@@ -383,7 +383,7 @@ spec:
metricName: dynamo_ttft_p95
query: |
histogram_quantile(0.95,
sum(rate(dynamo_frontend_time_to_first_token_seconds_bucket{dynamo_namespace="default-sglang-agg"}[5m]))
by (le)
)
threshold: "0.5" # Scale up when TTFT p95 > 500ms (0.5 seconds)
@@ -519,7 +519,7 @@ spec:
serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090
query: |
histogram_quantile(0.95,
sum(rate(dynamo_frontend_time_to_first_token_seconds_bucket{dynamo_namespace="default-sglang-agg"}[5m]))
by (le)
)
threshold: "0.5"