A Kubernetes operator that automates the deployment and lifecycle management of LMCache multiprocess cache servers. It manages a single CRD (LMCacheEngine) and reconciles it into a DaemonSet, ConfigMap, Service, and optional ServiceMonitor.
See DESIGN.md for architecture details, reconciliation logic, and CRD spec reference.
- Kubernetes 1.20+
- `kubectl` configured to access your cluster
- NVIDIA GPU Operator with the `nvidia` RuntimeClass available on GPU nodes
- (Optional) Prometheus Operator for ServiceMonitor support
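A quick way to confirm the RuntimeClass prerequisite (the name is typically `nvidia` when installed by the GPU Operator, but may differ in your cluster):

```shell
# Should list the RuntimeClass that GPU pods reference
kubectl get runtimeclass nvidia
```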
> [!IMPORTANT]
> The operator runs LMCache pods with `runtimeClassName: nvidia` and `privileged: true` to gain GPU visibility without consuming GPU resources via the device plugin. This allows the serving engine (e.g., vLLM) to claim all GPUs on the node. Clusters using Pod Security Standards must allow the `privileged` profile for the LMCache namespace.
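For example, on clusters that enforce Pod Security Standards through namespace labels, the namespace hosting LMCache pods (shown here as a hypothetical `lmcache` namespace) could be opened up like this:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: lmcache  # hypothetical namespace for LMCache pods
  labels:
    # Allow the "privileged" Pod Security Standard in this namespace
    pod-security.kubernetes.io/enforce: privileged
```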
### Option A: One-line install from release (recommended)

Install the latest stable release:

```bash
kubectl apply -f https://github.com/LMCache/LMCache/releases/download/operator-latest/install.yaml
```

Or use the nightly build from the `dev` branch:

```bash
kubectl apply -f https://github.com/LMCache/LMCache/releases/download/operator-nightly-latest/install.yaml
```

### Option B: Build from source

```bash
cd operator
make build
make install
make deploy IMG=<your-registry>/lmcache-operator:latest
```

A minimal CR deploys a DaemonSet with a 60 GB L1 cache on every node:
```yaml
# lmcache-engine.yaml
apiVersion: lmcache.lmcache.ai/v1alpha1
kind: LMCacheEngine
metadata:
  name: my-cache
spec:
  l1:
    sizeGB: 60
```

```bash
kubectl apply -f lmcache-engine.yaml
```

The operator automatically handles `hostIPC`, GPU visibility (`runtimeClassName: nvidia`, `privileged: true`), node-local service routing, resource sizing, and Prometheus metrics; see DESIGN.md for details.
The operator creates a ConfigMap named `<engine-name>-connection` containing the kv-transfer-config JSON that vLLM needs. Use it in your vLLM Deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      # Required for CUDA IPC between vLLM and LMCache
      hostIPC: true
      containers:
        - name: vllm
          image: lmcache/vllm-openai:latest
          env:
            # Deterministic hashing required by LMCache
            - name: PYTHONHASHSEED
              value: "0"
          command: ["/bin/sh", "-c"]
          args:
            - |
              exec python3 -m vllm.entrypoints.openai.api_server \
                --model <your-model> \
                --port 8000 \
                --gpu-memory-utilization 0.8 \
                --kv-transfer-config "$(cat /etc/lmcache/kv-transfer-config.json)"
          ports:
            - name: http
              containerPort: 8000
          volumeMounts:
            - name: kv-transfer-config
              mountPath: /etc/lmcache
              readOnly: true
          resources:
            limits:
              nvidia.com/gpu: "1"
      volumes:
        - name: kv-transfer-config
          configMap:
            name: my-cache-connection # Must match your LMCacheEngine name + "-connection"
```

Key points for vLLM pods:
- `hostIPC: true` is required: CUDA IPC (`cudaIpcOpenMemHandle`) needs a shared IPC namespace between vLLM and LMCache. Without this, GPU memory mapping fails.
- `PYTHONHASHSEED=0` ensures deterministic token hashing so vLLM and LMCache produce consistent cache keys.
- ConfigMap mount: the `$(cat ...)` pattern reads the connection JSON and passes it inline to `--kv-transfer-config`. The ConfigMap name is always `<LMCacheEngine name>-connection`.
- No `hostNetwork` needed: the operator creates a ClusterIP Service with `internalTrafficPolicy=Local`, and kube-proxy routes traffic to the LMCache pod on the same node automatically. The ConfigMap points to the Service DNS name, so neither LMCache nor vLLM pods need `hostNetwork`.
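As a sketch of an alternative to the `$(cat ...)` pattern, the same JSON could be injected as an environment variable via `configMapKeyRef`; this assumes the ConfigMap key is `kv-transfer-config.json`, matching the file path used above:

```yaml
# Hypothetical alternative: inject the connection JSON as an env var
env:
  - name: KV_TRANSFER_CONFIG
    valueFrom:
      configMapKeyRef:
        name: my-cache-connection
        key: kv-transfer-config.json  # assumed key name
# ...and pass it in args as: --kv-transfer-config "$KV_TRANSFER_CONFIG"
```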
> [!WARNING]
> Do NOT mount an `emptyDir` at `/dev/shm` on either LMCache or vLLM pods. With `hostIPC: true`, both pods share the host's `/dev/shm`. Mounting an `emptyDir` (even with `medium: Memory`) shadows it with a private tmpfs, breaking CUDA IPC: `cudaIpcOpenMemHandle` fails because IPC handles from one pod become invisible to the other.
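For clarity, this is the anti-pattern to avoid; a volume like the following must not appear in either pod spec when `hostIPC: true` is set:

```yaml
# DO NOT do this on LMCache or vLLM pods with hostIPC: true
volumes:
  - name: shm
    emptyDir:
      medium: Memory
containers:
  - name: vllm
    volumeMounts:
      - name: shm
        mountPath: /dev/shm  # shadows the host's /dev/shm; CUDA IPC breaks
```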
```bash
# Check LMCacheEngine status
kubectl get lmc
```

```
NAME       PHASE     READY   DESIRED   AGE
my-cache   Running   3       3         5m
```

```bash
# Check the connection ConfigMap
kubectl get configmap my-cache-connection -o yaml

# Check LMCache pods
kubectl get pods -l app.kubernetes.io/managed-by=lmcache-operator

# Check detailed status with endpoints
kubectl describe lmc my-cache
```

Use `nodeSelector` to run LMCache only on GPU nodes. New GPU nodes automatically get an LMCache pod:
```yaml
apiVersion: lmcache.lmcache.ai/v1alpha1
kind: LMCacheEngine
metadata:
  name: my-cache
spec:
  nodeSelector:
    nvidia.com/gpu.present: "true"
  l1:
    sizeGB: 60
```

If the default port (5555) conflicts with other services:
```yaml
apiVersion: lmcache.lmcache.ai/v1alpha1
kind: LMCacheEngine
metadata:
  name: my-cache
spec:
  server:
    port: 6555
  l1:
    sizeGB: 60
```

The connection ConfigMap updates automatically; vLLM pods pick up the new port on restart.
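For example, assuming the vLLM Deployment from the earlier example is named `vllm`, a rolling restart makes the pods re-read the mounted connection ConfigMap:

```shell
# Restart vLLM so it re-reads the connection ConfigMap at startup
kubectl rollout restart deployment/vllm
kubectl rollout status deployment/vllm
```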
```yaml
apiVersion: lmcache.lmcache.ai/v1alpha1
kind: LMCacheEngine
metadata:
  name: production-cache
  namespace: llm-serving
spec:
  nodeSelector:
    nvidia.com/gpu.present: "true"
  image:
    repository: lmcache/standalone
    tag: v0.1.0
  server:
    port: 6555
    chunkSize: 256
    maxWorkers: 4
  l1:
    sizeGB: 60
    eviction:
      triggerWatermark: 0.8
      evictionRatio: 0.2
  prometheus:
    enabled: true
    port: 9090
    serviceMonitor:
      enabled: true
      labels:
        release: kube-prometheus-stack
  podAnnotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
  priorityClassName: system-node-critical
```

Add a Redis L2 adapter for persistent KV cache storage beyond L1 memory:
```yaml
apiVersion: lmcache.lmcache.ai/v1alpha1
kind: LMCacheEngine
metadata:
  name: cache-with-redis
spec:
  l1:
    sizeGB: 60
  l2Backend:
    resp:
      host: redis.default.svc.cluster.local
      port: 6379
      numWorkers: 8
```

For Redis authentication, create a Secret with `username` and `password` keys and reference it. Credentials are injected as environment variables and never appear in pod args or `kubectl describe` output. The Secret can live in a different namespace; the operator creates a managed copy automatically:
```yaml
# Create the secret (or reference an existing one in another namespace):
#   kubectl create secret generic redis-auth \
#     --from-literal=username=myuser \
#     --from-literal=password=mypassword
spec:
  l2Backend:
    resp:
      host: redis.default.svc.cluster.local
      port: 6379
      authSecretRef:
        name: redis-auth
        namespace: redis # omit if the Secret is in the same namespace
```

For adapter types not yet natively supported by the operator (e.g. `nixl_store`, `fs`, `mock`), use the `raw` escape hatch. The JSON is passed through to `--l2-adapter` as-is:
```yaml
spec:
  l2Backend:
    raw:
      type: nixl_store
      config:
        backend: "POSIX"
        backend_params:
          file_path: "/data/lmcache/l2"
          use_direct_io: "false"
        pool_size: 64
```

> [!NOTE]
> Currently only a single L2 adapter is supported at a time. While LMCache multiprocess mode is designed to support multiple L2 adapters in cascade, this functionality is not yet fully tested. Once the multi-adapter pipeline is validated and performance is confirmed, the operator will be updated to support multiple adapters.
By default, the operator derives memory requests/limits from `l1.sizeGB`. To override:
```yaml
spec:
  l1:
    sizeGB: 60
  resourceOverrides:
    requests:
      memory: "70Gi"
      cpu: "8"
    limits:
      memory: "100Gi"
```

```bash
make generate   # Generate DeepCopy methods
make manifests  # Generate CRD YAML + RBAC
make build      # Compile operator binary
make fmt        # go fmt
make vet        # go vet
make test       # Run unit tests
make lint       # Run golangci-lint
```

```bash
# Docker Hub
docker login
make docker-build docker-push IMG=docker.io/<your-user>/lmcache-operator:latest
make deploy IMG=docker.io/<your-user>/lmcache-operator:latest

# Private registry
docker login <your-registry>
make docker-build docker-push IMG=<your-registry>/lmcache-operator:latest
make deploy IMG=<your-registry>/lmcache-operator:latest

# Multi-platform (amd64 + arm64)
make docker-buildx IMG=<your-registry>/lmcache-operator:latest
```

If your cluster needs pull credentials, create a secret and reference it in `config/manager/manager.yaml`:
```bash
kubectl create secret docker-registry regcred \
  --docker-server=<your-registry> \
  --docker-username=<username> \
  --docker-password=<password> \
  -n lmcache-operator-system
```

Copyright 2026.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.