OpenHands Kubernetes Remote Runtime

A Kubernetes-compatible runtime service for OpenHands that provisions sandbox pods for agent sessions. This service implements the OpenHands Remote Runtime API contract and supports Kubernetes versions 1.30+.

Features

✅ Complete OpenHands Remote Runtime API implementation
✅ Subdomain-based routing for agent server, VSCode, and worker ports
✅ Kubernetes pod provisioning with proper resource management
✅ Session management with pause/resume capabilities
✅ Automatic idle sandbox cleanup (configurable timeout)
✅ API key authentication
✅ Support for custom runtime classes (sysbox-runc, gvisor)
✅ Structured logging and error handling
✅ Health checks and readiness probes
✅ Automatic cleanup of orphaned resources (failed and idle pods)

Architecture

The service creates the following Kubernetes resources for each sandbox:

Pod: Runs the OpenHands agent server with exposed ports
- Port 60000: Agent server
- Port 60001: VSCode
- Port 12000: Worker 1
- Port 12001: Worker 2
Service: ClusterIP service to expose pod ports
Ingress: Subdomain-based routing for each port
- {session-id}.sandbox.example.com → Agent server
- vscode-{session-id}.sandbox.example.com → VSCode
- work-1-{session-id}.sandbox.example.com → Worker 1
- work-2-{session-id}.sandbox.example.com → Worker 2
You can add custom annotations to each sandbox Ingress (e.g. for TLS/cert-manager) via SANDBOX_INGRESS_ANNOTATIONS: set to comma-separated key=value pairs, e.g. cert-manager.io/issuer=my-issuer,cert-manager.io/issuer-group=cert-manager.io. These are merged with the default annotations (ssl-redirect, websocket-services).

Proxy mode (optional)

If your DNS provider is slow to propagate new subdomain records (e.g. >5 minutes), you can route sandbox traffic through the runtime API so that only one stable hostname is needed.

Set PROXY_BASE_URL to the public URL of this runtime API (e.g. https://runtime-api.your-domain.com).
The /start response will then return:
- url: {PROXY_BASE_URL}/sandbox/{runtime_id} (agent server; OpenHands uses this for actions).
- vscode_url: {PROXY_BASE_URL}/sandbox/{runtime_id}/vscode (for "Open in VSCode" in the browser).
All agent and VSCode traffic is reverse-proxied by the runtime API to the sandbox pod via in-cluster service DNS. No per-sandbox DNS or wildcard DNS is required for proxy mode.
Ingress resources for each sandbox are still created (for optional direct access once DNS has propagated), but OpenHands and the browser use the proxy URLs immediately.

Prerequisites

Kubernetes cluster version 1.30 or higher
kubectl configured to access your cluster
Ingress controller installed (e.g., nginx-ingress)
Wildcard DNS configured for your base domain
Container registry access for OpenHands runtime images

Installation

1. Configure DNS

Set up wildcard DNS for your base domain. For example, if using sandbox.example.com:

*.sandbox.example.com → Your Ingress Controller IP

2. Update Configuration

Edit k8s/deployment.yaml and update the following:

# In ConfigMap
BASE_DOMAIN: "your-domain.com"  # Change to your domain
REGISTRY_PREFIX: "your-registry/openhands"  # Change to your registry

# In Secret
API_KEY: "your-secure-api-key"  # Generate a secure key

# In Ingress
host: runtime-api.your-domain.com  # Change to your API endpoint

3. Deploy to Kubernetes

# Apply the manifests
kubectl apply -f k8s/deployment.yaml

# Verify deployment
kubectl get pods -n openhands
kubectl get svc -n openhands
kubectl get ingress -n openhands

4. Verify Installation

# Check if the API is running (no authentication required)
curl https://runtime-api.your-domain.com/health

# Get registry prefix
curl -H "X-API-Key: your-api-key" https://runtime-api.your-domain.com/registry_prefix

API Endpoints

All endpoints require the X-API-Key header for authentication.

POST /start

Start a new runtime sandbox.

Request:

{
  "image": "ghcr.io/openhands/runtime:latest",
  "command": "/usr/local/bin/openhands-agent-server --port 60000",
  "working_dir": "/openhands/code/",
  "environment": {
    "DEBUG": "true"
  },
  "session_id": "abc123",
  "resource_factor": 1.0,
  "runtime_class": "sysbox-runc"
}

Response:

{
  "runtime_id": "def456",
  "session_id": "abc123",
  "url": "https://abc123.sandbox.example.com",
  "session_api_key": "session-key-here",
  "status": "running",
  "pod_status": "ready",
  "work_hosts": {
    "https://work-1-abc123.sandbox.example.com": 12000,
    "https://work-2-abc123.sandbox.example.com": 12001
  }
}

POST /stop

Stop a running runtime.

Request:

{
  "runtime_id": "def456"
}

POST /pause

Pause a running runtime (deletes pod, keeps state).

Request:

{
  "runtime_id": "def456"
}

POST /resume

Resume a paused runtime (recreates pod).

Request:

{
  "runtime_id": "def456"
}

GET /list

List all runtimes.

Response:

{
  "runtimes": [...]
}

GET /runtime/{runtime_id}

Get details of a specific runtime.

GET /sessions/{session_id}

Get runtime by session ID.

GET /sessions/batch?ids=session1,session2

Batch query multiple sessions.

GET /registry_prefix

Get the container registry prefix.

Response:

{
  "registry_prefix": "ghcr.io/openhands"
}

GET /image_exists?image=ghcr.io/openhands/runtime:latest

Check if a container image exists.

Response:

{
  "exists": true
}

Configuration

Environment variables:

Variable	Default	Description
`SERVER_PORT`	`8080`	HTTP server port
`API_KEY`	(required)	API authentication key
`LOG_LEVEL`	`info`	Logging level: `info` or `debug` (enables verbose logging with request/response details)
`NAMESPACE`	`openhands`	Kubernetes namespace for sandboxes
`INGRESS_CLASS`	`nginx`	Ingress class to use
`BASE_DOMAIN`	`sandbox.example.com`	Base domain for subdomain routing
`REGISTRY_PREFIX`	`ghcr.io/openhands`	Container registry prefix
`DEFAULT_IMAGE`	`ghcr.io/openhands/runtime:latest`	Default runtime image
`IMAGE_PULL_SECRETS`	(none)	Comma-separated Kubernetes secret names for pulling sandbox images (e.g. private registry). Required when using images that need a pull secret.
`AGENT_SERVER_PORT`	`60000`	Agent server port in pods
`VSCODE_PORT`	`60001`	VSCode port in pods
`WORKER_1_PORT`	`12000`	Worker 1 port in pods
`WORKER_2_PORT`	`12001`	Worker 2 port in pods
`APP_SERVER_URL`	(optional)	OpenHands app server URL for webhooks
`APP_SERVER_PUBLIC_URL`	(optional)	Public URL for CORS configuration
`PROXY_BASE_URL`	(optional)	When set, sandbox URLs are served via this API (e.g. `https://runtime-api.your-domain.com`) so only one DNS record is needed; avoids DNS propagation delay for new sandboxes
`IDLE_TIMEOUT_HOURS`	`12`	Hours of inactivity before a sandbox is automatically cleaned up
`REAPER_CHECK_INTERVAL`	`15m`	How often to check for idle sandboxes (e.g. `15m`, `30m`, `1h`)
`CLEANUP_ENABLED`	`true`	Enable automatic cleanup of orphaned resources
`CLEANUP_INTERVAL_MINUTES`	`5`	Interval between cleanup runs (in minutes)
`CLEANUP_FAILED_THRESHOLD_MINUTES`	`60`	Time before cleaning up failed pods (in minutes)
`CLEANUP_IDLE_THRESHOLD_MINUTES`	`1440`	Time before cleaning up idle pods (in minutes, default 24 hours)

Idle Sandbox Cleanup

The runtime API automatically cleans up sandbox pods that have been idle for a configurable duration. This helps prevent resource waste from forgotten or orphaned sandboxes.

Activity tracking: The last activity timestamp is updated whenever the sandbox receives API requests through the proxy endpoint (/sandbox/{runtime_id})
Automatic cleanup: A background reaper process runs every REAPER_CHECK_INTERVAL and removes sandboxes idle for more than IDLE_TIMEOUT_HOURS
Graceful shutdown: Cleanup deletes the pod, service, and ingress resources and removes the runtime from state
Only running sandboxes: Paused or stopped sandboxes are not affected by the idle timeout
Logged: All cleanup operations are logged with the sandbox ID and idle duration

Example configuration for shorter timeout (useful for development):

IDLE_TIMEOUT_HOURS=2      # Clean up after 2 hours of inactivity
REAPER_CHECK_INTERVAL=10m # Check every 10 minutes

Automatic Resource Cleanup

The runtime API also cleans up orphaned resources to prevent resource leaks and maintain cluster health. The cleanup service runs periodically and removes:

Failed Pods: Pods that have been in a failed state (Failed or CrashLoopBackOff) for longer than CLEANUP_FAILED_THRESHOLD_MINUTES (default: 60 minutes)
Idle Pods: Pods that have been running for longer than CLEANUP_IDLE_THRESHOLD_MINUTES (default: 24 hours)

When a runtime is cleaned up, all associated resources (Pod, Service, and Ingress) are deleted from Kubernetes, and the runtime is removed from the internal state.

Configuration:

Set CLEANUP_ENABLED=false to disable automatic cleanup
Adjust CLEANUP_INTERVAL_MINUTES to change how often cleanup runs (default: 5 minutes)
Adjust CLEANUP_FAILED_THRESHOLD_MINUTES to change when failed pods are cleaned up (default: 60 minutes)
Adjust CLEANUP_IDLE_THRESHOLD_MINUTES to change when idle pods are cleaned up (default: 1440 minutes / 24 hours)

Monitoring:

Cleanup operations are logged at INFO level
Each cleanup run reports the number of runtimes cleaned (failed vs idle)
Errors during cleanup are logged but don't stop the service

Debug Logging

To enable detailed debug logging, set LOG_LEVEL=debug. Debug mode logs:

Full request/response bodies for all API calls
Kubernetes operations (pod/service/ingress creation/deletion)
State management operations
Authentication and authorization checks
Detailed error messages

⚠️ Security Warning: Debug mode logs full request/response bodies which may contain sensitive information such as API keys, session tokens, and environment variables. Only enable debug logging in development or when troubleshooting specific issues in controlled environments. Never enable debug logging in production with untrusted users or where logs are stored insecurely.

Integration with OpenHands

Configure your OpenHands instance to use this runtime:

# config.toml
[sandbox]
api_key = "your-api-key"
remote_runtime_api_url = "https://runtime-api.your-domain.com"
runtime_container_image = "ghcr.io/openhands/runtime:latest"

Or using environment variables:

export SANDBOX_API_KEY="your-api-key"
export SANDBOX_REMOTE_RUNTIME_API_URL="https://runtime-api.your-domain.com"
export SANDBOX_RUNTIME_CONTAINER_IMAGE="ghcr.io/openhands/runtime:latest"

Development

Building Locally

# Build the binary
go build -o runtime-api ./cmd/runtime-api

# Run locally (requires kubeconfig)
export API_KEY="test-key"
export BASE_DOMAIN="localhost"
./runtime-api

Building Docker Image

# Build
docker build -t openhands-kubernetes-remote-runtime:latest .

# Push to registry
docker tag openhands-kubernetes-remote-runtime:latest your-registry/openhands-kubernetes-remote-runtime:latest
docker push your-registry/openhands-kubernetes-remote-runtime:latest

Testing

# Start a runtime
curl -X POST https://runtime-api.your-domain.com/start \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "image": "ghcr.io/openhands/runtime:latest",
    "command": "/usr/local/bin/openhands-agent-server --port 60000",
    "working_dir": "/openhands/code/",
    "session_id": "test-123"
  }'

# Check runtime status
curl https://runtime-api.your-domain.com/sessions/test-123 \
  -H "X-API-Key: your-api-key"

# Stop runtime
curl -X POST https://runtime-api.your-domain.com/stop \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"runtime_id": "returned-runtime-id"}'

Security

All API endpoints require X-API-Key authentication
Session API keys are generated for each sandbox
Pods are isolated in the openhands namespace
Optional support for gvisor or sysbox runtime classes for additional isolation

Troubleshooting

Pods not starting

# Check pod status
kubectl get pods -n openhands -l app=openhands-runtime

# View pod logs
kubectl logs -n openhands <pod-name>

# Describe pod for events
kubectl describe pod -n openhands <pod-name>

Ingress not routing correctly

# Check ingress configuration
kubectl get ingress -n openhands

# View ingress controller logs
kubectl logs -n ingress-nginx <ingress-controller-pod>

DNS not resolving

Ensure your wildcard DNS is configured correctly:

# Test DNS resolution
nslookup test.sandbox.example.com

# Check if it points to your ingress controller

License

See LICENSE file.

Name		Name	Last commit message	Last commit date
Latest commit History 94 Commits
.github/workflows		.github/workflows
cmd/runtime-api		cmd/runtime-api
pkg		pkg
.env.example		.env.example
.gitignore		.gitignore
.golangci.yml		.golangci.yml
AGENTS.md		AGENTS.md
CONTRIBUTING.md		CONTRIBUTING.md
DEPLOYMENT.md		DEPLOYMENT.md
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
TESTING.md		TESTING.md
go.mod		go.mod
go.sum		go.sum

Folders and files

Latest commit

History

Repository files navigation

OpenHands Kubernetes Remote Runtime

Features

Architecture

Proxy mode (optional)

Prerequisites

Installation

1. Configure DNS

2. Update Configuration

3. Deploy to Kubernetes

4. Verify Installation

API Endpoints

POST /start

POST /stop

POST /pause

POST /resume

GET /list

GET /runtime/{runtime_id}

GET /sessions/{session_id}

GET /sessions/batch?ids=session1,session2

GET /registry_prefix

GET /image_exists?image=ghcr.io/openhands/runtime:latest

Configuration

Idle Sandbox Cleanup

Automatic Resource Cleanup

Debug Logging

Integration with OpenHands

Development

Building Locally

Building Docker Image

Testing

Security

Troubleshooting

Pods not starting

Ingress not routing correctly

DNS not resolving

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages