Merged
Changes from 1 commit
docs(deploy): remove outdated health checks and clarify model configuration details in README.md
ishandhanani committed Aug 1, 2025
commit f3c9619e2c6ea36da97089b28a4162862dabd5f0
31 changes: 1 addition & 30 deletions components/backends/sglang/deploy/README.md
@@ -57,18 +57,6 @@ resources:
gpu: "1"
```

**Health Checks:**
```yaml
livenessProbe:
httpGet: # For Frontend
path: /health
port: 8000
exec: # For Workers
command: ["/bin/sh", "-c", "exit 0"]
readinessProbe:
# Similar structure
```

**Container Configuration:**
```yaml
extraPodSpec:
@@ -119,31 +107,14 @@ args:
kubectl apply -f <your-template>.yaml
```

## Resource Requirements

| Component | CPU | Memory | GPU | Purpose |
|-----------|-----|--------|-----|---------|
| Frontend | 5 cores | 10Gi | 0 | HTTP API server |
| DecodeWorker | 10 cores | 20Gi | 1 | Model inference |
| PrefillWorker | 10 cores | 20Gi | 1 | Initial token processing (disagg only) |

**Note:** Adjust resources based on your model size and performance requirements.

## Model Configuration

All templates use **DeepSeek-R1-Distill-Llama-8B** as the default model. Key parameters:

- `--page-size 16`: KV cache page size
- `--tp 1`: Tensor parallelism degree
- `--trust-remote-code`: Enable custom model code
- `--skip-tokenizer-init`: Optimize startup time
All templates use **DeepSeek-R1-Distill-Llama-8B** as the default model. But you can use any sglang argument and configuration. Key parameters:

## Monitoring and Health

- **Frontend health endpoint**: `http://<frontend-service>:8000/health`
- **Liveness probes**: Check process health every 60s
- **Readiness probes**: Ensure service readiness before routing traffic
- **Failure threshold**: 10 consecutive failures trigger restart

## Further Reading
