A Kubernetes-compatible runtime service for OpenHands that provisions sandbox pods for agent sessions. This service implements the OpenHands Remote Runtime API contract and supports Kubernetes versions 1.30+.
- ✅ Complete OpenHands Remote Runtime API implementation
- ✅ Subdomain-based routing for agent server, VSCode, and worker ports
- ✅ Kubernetes pod provisioning with proper resource management
- ✅ Session management with pause/resume capabilities
- ✅ Automatic idle sandbox cleanup (configurable timeout)
- ✅ API key authentication
- ✅ Support for custom runtime classes (sysbox-runc, gvisor)
- ✅ Structured logging and error handling
- ✅ Health checks and readiness probes
- ✅ Automatic cleanup of orphaned resources (failed and idle pods)
The service creates the following Kubernetes resources for each sandbox:
- Pod: Runs the OpenHands agent server with exposed ports
  - Port 60000: Agent server
  - Port 60001: VSCode
  - Port 12000: Worker 1
  - Port 12001: Worker 2
- Service: ClusterIP service to expose pod ports
- Ingress: Subdomain-based routing for each port
  - {session-id}.sandbox.example.com → Agent server
  - vscode-{session-id}.sandbox.example.com → VSCode
  - work-1-{session-id}.sandbox.example.com → Worker 1
  - work-2-{session-id}.sandbox.example.com → Worker 2
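The subdomain scheme above can be sketched as a small helper. This is an illustration only: `sandbox_hosts` and its dictionary keys are hypothetical names, not part of the service, and the default base domain is the example value used in this README.

```python
from typing import Dict


def sandbox_hosts(session_id: str, base_domain: str = "sandbox.example.com") -> Dict[str, str]:
    """Return the hostname for each exposed port, following the documented scheme."""
    return {
        "agent": f"{session_id}.{base_domain}",
        "vscode": f"vscode-{session_id}.{base_domain}",
        "worker_1": f"work-1-{session_id}.{base_domain}",
        "worker_2": f"work-2-{session_id}.{base_domain}",
    }


if __name__ == "__main__":
    # Print the four hostnames a sandbox with session ID "abc123" would use.
    for name, host in sandbox_hosts("abc123").items():
        print(f"{name}: {host}")
```

Because every hostname is derived from the session ID, a single wildcard record for the base domain covers all sandboxes.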
You can add custom annotations to each sandbox Ingress (e.g. for TLS/cert-manager) via SANDBOX_INGRESS_ANNOTATIONS: set it to comma-separated key=value pairs, e.g. cert-manager.io/issuer=my-issuer,cert-manager.io/issuer-group=cert-manager.io. These are merged with the default annotations (ssl-redirect, websocket-services).
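As a rough sketch of how a comma-separated key=value string like this could be parsed and merged, consider the following. The function names are hypothetical, the default annotation keys and values are placeholders (the README only names them in short form), and the rule that custom values win on a key collision is an assumption rather than documented behavior.

```python
from typing import Dict

# Placeholder defaults: the real keys and values are set by the service
# and are not spelled out in this README.
DEFAULT_ANNOTATIONS = {
    "ssl-redirect": "<default>",
    "websocket-services": "<default>",
}


def parse_annotations(raw: str) -> Dict[str, str]:
    """Parse 'k1=v1,k2=v2' into a dict, skipping empty entries."""
    result: Dict[str, str] = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        result[key.strip()] = value.strip()
    return result


def merged_annotations(raw: str) -> Dict[str, str]:
    """Merge custom annotations over the defaults (assumed precedence)."""
    return {**DEFAULT_ANNOTATIONS, **parse_annotations(raw)}
```

One consequence of the comma-separated format is that an annotation value containing a comma cannot be expressed this way.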
If your DNS provider is slow to propagate new subdomain records (e.g. >5 minutes), you can route sandbox traffic through the runtime API so that only one stable hostname is needed.
- Set PROXY_BASE_URL to the public URL of this runtime API (e.g. https://runtime-api.your-domain.com).
- The /start response will then return:
  - url: {PROXY_BASE_URL}/sandbox/{runtime_id} (agent server; OpenHands uses this for actions).
  - vscode_url: {PROXY_BASE_URL}/sandbox/{runtime_id}/vscode (for "Open in VSCode" in the browser).
- All agent and VSCode traffic is reverse-proxied by the runtime API to the sandbox pod via in-cluster service DNS. No per-sandbox DNS or wildcard DNS is required for proxy mode.
- Ingress resources for each sandbox are still created (for optional direct access once DNS has propagated), but OpenHands and the browser use the proxy URLs immediately.
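The proxy-mode URL layout described above can be sketched as follows. `proxy_urls` is a hypothetical helper for illustration; stripping a trailing slash from the base URL is an assumption about input handling, not documented behavior.

```python
from typing import Dict


def proxy_urls(proxy_base_url: str, runtime_id: str) -> Dict[str, str]:
    """Build the url and vscode_url values returned by /start in proxy mode."""
    base = proxy_base_url.rstrip("/")  # tolerate a trailing slash (assumption)
    sandbox = f"{base}/sandbox/{runtime_id}"
    return {"url": sandbox, "vscode_url": f"{sandbox}/vscode"}


if __name__ == "__main__":
    print(proxy_urls("https://runtime-api.your-domain.com", "def456"))
```

Both URLs share the runtime API's hostname, which is why only one DNS record is needed.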
- Kubernetes cluster version 1.30 or higher
- kubectl configured to access your cluster
- Ingress controller installed (e.g., nginx-ingress)
- Wildcard DNS configured for your base domain (not required for sandbox traffic when proxy mode is enabled via PROXY_BASE_URL)
- Container registry access for OpenHands runtime images
Set up wildcard DNS for your base domain. For example, if using sandbox.example.com:
*.sandbox.example.com → Your Ingress Controller IP
Edit k8s/deployment.yaml and update the following:
# In ConfigMap
BASE_DOMAIN: "your-domain.com" # Change to your domain
REGISTRY_PREFIX: "your-registry/openhands" # Change to your registry
# In Secret
API_KEY: "your-secure-api-key" # Generate a secure key
# In Ingress
host: runtime-api.your-domain.com # Change to your API endpoint

# Apply the manifests
kubectl apply -f k8s/deployment.yaml
# Verify deployment
kubectl get pods -n openhands
kubectl get svc -n openhands
kubectl get ingress -n openhands

# Check if the API is running (no authentication required)
curl https://runtime-api.your-domain.com/health
# Get registry prefix
curl -H "X-API-Key: your-api-key" https://runtime-api.your-domain.com/registry_prefix

All endpoints except /health require the X-API-Key header for authentication.
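For clients that are not using curl, the header requirement can be sketched with Python's standard library. `build_request` is a hypothetical helper, and choosing POST whenever a JSON payload is supplied is an assumption of this sketch (it matches the /start and /stop calls shown in this README).

```python
import json
import urllib.request
from typing import Optional


def build_request(base_url: str, path: str, api_key: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build an authenticated request; POST with a JSON body if payload is given."""
    url = base_url.rstrip("/") + path
    headers = {"X-API-Key": api_key}
    data = None
    method = "GET"
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
        method = "POST"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


if __name__ == "__main__":
    req = build_request("https://runtime-api.your-domain.com", "/registry_prefix", "your-api-key")
    # Sending requires a reachable deployment:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it easy to inspect the method, URL, and headers before anything goes over the network.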
Start a new runtime sandbox (POST /start).
Request:
{
"image": "ghcr.io/openhands/runtime:latest",
"command": "/usr/local/bin/openhands-agent-server --port 60000",
"working_dir": "/openhands/code/",
"environment": {
"DEBUG": "true"
},
"session_id": "abc123",
"resource_factor": 1.0,
"runtime_class": "sysbox-runc"
}

Response:
{
"runtime_id": "def456",
"session_id": "abc123",
"url": "https://abc123.sandbox.example.com",
"session_api_key": "session-key-here",
"status": "running",
"pod_status": "ready",
"work_hosts": {
"https://work-1-abc123.sandbox.example.com": 12000,
"https://work-2-abc123.sandbox.example.com": 12001
}
}

Stop a running runtime (POST /stop).
Request:
{
"runtime_id": "def456"
}

Pause a running runtime (deletes pod, keeps state).
Request:
{
"runtime_id": "def456"
}

Resume a paused runtime (recreates pod).
Request:
{
"runtime_id": "def456"
}

List all runtimes.
Response:
{
"runtimes": [...]
}

Get details of a specific runtime.

Get runtime by session ID (GET /sessions/{session_id}).

Batch query multiple sessions.

Get the container registry prefix (GET /registry_prefix).
Response:
{
"registry_prefix": "ghcr.io/openhands"
}

Check if a container image exists.
Response:
{
"exists": true
}

Environment variables:

| Variable | Default | Description |
|---|---|---|
| SERVER_PORT | 8080 | HTTP server port |
| API_KEY | (required) | API authentication key |
| LOG_LEVEL | info | Logging level: info or debug (enables verbose logging with request/response details) |
| NAMESPACE | openhands | Kubernetes namespace for sandboxes |
| INGRESS_CLASS | nginx | Ingress class to use |
| BASE_DOMAIN | sandbox.example.com | Base domain for subdomain routing |
| REGISTRY_PREFIX | ghcr.io/openhands | Container registry prefix |
| DEFAULT_IMAGE | ghcr.io/openhands/runtime:latest | Default runtime image |
| IMAGE_PULL_SECRETS | (none) | Comma-separated Kubernetes secret names for pulling sandbox images (e.g. private registry). Required when using images that need a pull secret. |
| AGENT_SERVER_PORT | 60000 | Agent server port in pods |
| VSCODE_PORT | 60001 | VSCode port in pods |
| WORKER_1_PORT | 12000 | Worker 1 port in pods |
| WORKER_2_PORT | 12001 | Worker 2 port in pods |
| APP_SERVER_URL | (optional) | OpenHands app server URL for webhooks |
| APP_SERVER_PUBLIC_URL | (optional) | Public URL for CORS configuration |
| PROXY_BASE_URL | (optional) | When set, sandbox URLs are served via this API (e.g. https://runtime-api.your-domain.com) so only one DNS record is needed; avoids DNS propagation delay for new sandboxes |
| IDLE_TIMEOUT_HOURS | 12 | Hours of inactivity before a sandbox is automatically cleaned up |
| REAPER_CHECK_INTERVAL | 15m | How often to check for idle sandboxes (e.g. 15m, 30m, 1h) |
| CLEANUP_ENABLED | true | Enable automatic cleanup of orphaned resources |
| CLEANUP_INTERVAL_MINUTES | 5 | Interval between cleanup runs (in minutes) |
| CLEANUP_FAILED_THRESHOLD_MINUTES | 60 | Time before cleaning up failed pods (in minutes) |
| CLEANUP_IDLE_THRESHOLD_MINUTES | 1440 | Time before cleaning up idle pods (in minutes, default 24 hours) |
The runtime API automatically cleans up sandbox pods that have been idle for a configurable duration. This helps prevent resource waste from forgotten or orphaned sandboxes.
- Activity tracking: The last activity timestamp is updated whenever the sandbox receives API requests through the proxy endpoint (/sandbox/{runtime_id})
- Automatic cleanup: A background reaper process runs every REAPER_CHECK_INTERVAL and removes sandboxes idle for more than IDLE_TIMEOUT_HOURS
- Graceful shutdown: Cleanup deletes the pod, service, and ingress resources and removes the runtime from state
- Only running sandboxes: Paused or stopped sandboxes are not affected by the idle timeout
- Logged: All cleanup operations are logged with the sandbox ID and idle duration
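The selection rule the reaper applies can be sketched as below. This is a minimal illustration, not the service's implementation: the record fields (`runtime_id`, `status`, `last_activity`) and the function name are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def idle_sandboxes(sandboxes: List[dict], now: datetime,
                   idle_timeout_hours: float = 12) -> List[str]:
    """Return IDs of running sandboxes whose last activity is older than the timeout."""
    cutoff = now - timedelta(hours=idle_timeout_hours)
    return [
        s["runtime_id"]
        for s in sandboxes
        # Paused or stopped sandboxes are skipped, as described above.
        if s["status"] == "running" and s["last_activity"] < cutoff
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    boxes = [
        {"runtime_id": "a", "status": "running", "last_activity": now - timedelta(hours=13)},
        {"runtime_id": "b", "status": "paused", "last_activity": now - timedelta(hours=48)},
    ]
    print(idle_sandboxes(boxes, now))
```

Since the check only runs every REAPER_CHECK_INTERVAL, a sandbox can outlive its timeout by up to one interval before it is removed.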
Example configuration for a shorter timeout (useful for development):
IDLE_TIMEOUT_HOURS=2 # Clean up after 2 hours of inactivity
REAPER_CHECK_INTERVAL=10m # Check every 10 minutes

The runtime API also cleans up orphaned resources to prevent resource leaks and maintain cluster health. The cleanup service runs periodically and removes:
- Failed Pods: Pods that have been in a failed state (Failed or CrashLoopBackOff) for longer than CLEANUP_FAILED_THRESHOLD_MINUTES (default: 60 minutes)
- Idle Pods: Pods that have been running for longer than CLEANUP_IDLE_THRESHOLD_MINUTES (default: 24 hours)
When a runtime is cleaned up, all associated resources (Pod, Service, and Ingress) are deleted from Kubernetes, and the runtime is removed from the internal state.
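The two cleanup rules can be expressed as a single classification function. This is a sketch under assumptions: how the service measures the time a pod has spent in a state is not documented here, so `minutes_in_state` is a hypothetical input, and the state strings simply mirror the names used above.

```python
from typing import Optional

FAILED_STATES = {"Failed", "CrashLoopBackOff"}


def cleanup_reason(state: str, minutes_in_state: float,
                   failed_threshold: float = 60,
                   idle_threshold: float = 1440) -> Optional[str]:
    """Return 'failed' or 'idle' if the pod should be cleaned up, else None."""
    if state in FAILED_STATES and minutes_in_state > failed_threshold:
        return "failed"
    if state == "Running" and minutes_in_state > idle_threshold:
        return "idle"
    return None


if __name__ == "__main__":
    print(cleanup_reason("CrashLoopBackOff", 90))  # exceeds the 60-minute default
    print(cleanup_reason("Running", 30))           # well under the 24-hour default
```

Returning the reason rather than a boolean lets a cleanup run count failed and idle removals separately for its log summary.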
Configuration:
- Set CLEANUP_ENABLED=false to disable automatic cleanup
- Adjust CLEANUP_INTERVAL_MINUTES to change how often cleanup runs (default: 5 minutes)
- Adjust CLEANUP_FAILED_THRESHOLD_MINUTES to change when failed pods are cleaned up (default: 60 minutes)
- Adjust CLEANUP_IDLE_THRESHOLD_MINUTES to change when idle pods are cleaned up (default: 1440 minutes / 24 hours)
Monitoring:
- Cleanup operations are logged at INFO level
- Each cleanup run reports the number of runtimes cleaned (failed vs idle)
- Errors during cleanup are logged but don't stop the service
To enable detailed debug logging, set LOG_LEVEL=debug. Debug mode logs:
- Full request/response bodies for all API calls
- Kubernetes operations (pod/service/ingress creation/deletion)
- State management operations
- Authentication and authorization checks
- Detailed error messages
Configure your OpenHands instance to use this runtime:
# config.toml
[sandbox]
api_key = "your-api-key"
remote_runtime_api_url = "https://runtime-api.your-domain.com"
runtime_container_image = "ghcr.io/openhands/runtime:latest"

Or using environment variables:
export SANDBOX_API_KEY="your-api-key"
export SANDBOX_REMOTE_RUNTIME_API_URL="https://runtime-api.your-domain.com"
export SANDBOX_RUNTIME_CONTAINER_IMAGE="ghcr.io/openhands/runtime:latest"

# Build the binary
go build -o runtime-api ./cmd/runtime-api
# Run locally (requires kubeconfig)
export API_KEY="test-key"
export BASE_DOMAIN="localhost"
./runtime-api

# Build
docker build -t openhands-kubernetes-remote-runtime:latest .
# Push to registry
docker tag openhands-kubernetes-remote-runtime:latest your-registry/openhands-kubernetes-remote-runtime:latest
docker push your-registry/openhands-kubernetes-remote-runtime:latest

# Start a runtime
curl -X POST https://runtime-api.your-domain.com/start \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"image": "ghcr.io/openhands/runtime:latest",
"command": "/usr/local/bin/openhands-agent-server --port 60000",
"working_dir": "/openhands/code/",
"session_id": "test-123"
}'
# Check runtime status
curl https://runtime-api.your-domain.com/sessions/test-123 \
-H "X-API-Key: your-api-key"
# Stop runtime
curl -X POST https://runtime-api.your-domain.com/stop \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{"runtime_id": "returned-runtime-id"}'

- All API endpoints except /health require X-API-Key authentication
- Session API keys are generated for each sandbox
- Pods are isolated in the openhands namespace
- Optional support for gvisor or sysbox runtime classes for additional isolation
# Check pod status
kubectl get pods -n openhands -l app=openhands-runtime
# View pod logs
kubectl logs -n openhands <pod-name>
# Describe pod for events
kubectl describe pod -n openhands <pod-name>

# Check ingress configuration
kubectl get ingress -n openhands
# View ingress controller logs
kubectl logs -n ingress-nginx <ingress-controller-pod>

Ensure your wildcard DNS is configured correctly:
# Test DNS resolution
nslookup test.sandbox.example.com
# Check if it points to your ingress controller

See LICENSE file.