⚠️ Disclaimer: This README is Work In Progress. Please verify all commands, variable names, and steps before using it in any environment.
Orchestrate asynchronous disaster recovery (DR) for KubeVirt VMs across OpenShift clusters using VolSync. This project ships Ansible roles and playbooks that:
- install the required operators (VolSync, optional MetalLB),
- discover VM disks and create `ReplicationSource`/`ReplicationDestination` objects,
- schedule periodic syncs and pick up new VM disks via re-scans,
- capture sanitized VM specs on the destination cluster for like‑for‑like restore (CPU, RAM, disks, NICs, MACs), and
- perform failover by pausing the `ReplicationDestination` and restoring the captured VM spec pointing at the replicated PVCs.
Goal. Keep one or more namespaces / VMs on a source OpenShift cluster asynchronously replicated to a destination cluster using VolSync. Dataflow is PVC→PVC with a transport (typically restic to an object store, or rsync directly between clusters), driven by the cron-format schedule defined on the VolSync CRs.
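For reference, the CR pair this results in looks roughly like the sketch below (standard VolSync resources using the restic transport; names, namespace, schedule, sizes, and the repository Secret are placeholders, not values the roles are guaranteed to render):

```yaml
# Illustrative sketch of a VolSync CR pair for one VM disk (restic transport).
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: web-01-rootdisk            # typically one CR per replicated PVC
  namespace: my-workload-ns        # source cluster
spec:
  sourcePVC: web-01-rootdisk       # the VM disk PVC discovered on the source
  trigger:
    schedule: "*/30 * * * *"       # cron-format sync schedule
  restic:
    repository: restic-repo-web-01-rootdisk   # Secret with repo URL + credentials
    copyMethod: Snapshot           # point-in-time snapshot before upload
    pruneIntervalDays: 7
    retain:
      daily: 7
      weekly: 4
---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: web-01-rootdisk
  namespace: my-workload-ns        # destination (DR) cluster
spec:
  trigger:
    schedule: "*/30 * * * *"
  restic:
    repository: restic-repo-web-01-rootdisk
    copyMethod: Snapshot
    capacity: 30Gi                 # at least the source PVC size
    accessModes: ["ReadWriteOnce"]
    storageClassName: ocs-storagecluster-ceph-rbd
```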
What the automation does.
- Installs VolSync (and optionally MetalLB) via OperatorHub API resources.
- Discovers a VM’s DataVolumes/PVCs on the source cluster and generates a matching `ReplicationSource`.
- Creates a `ReplicationDestination` on the destination cluster with a compatible storageClass and size.
- Schedules periodic syncs and optional retention.
- Captures/exports a sanitized VM manifest on the destination cluster and stores it for DR (same CPU, memory, disks, network interfaces, MACs, and NADs when possible).
- For failover, pauses the `ReplicationDestination`, performs a final sync/promote, rebinds PVCs, and recreates the VM from the captured manifest.
Note: VolSync does not provide live migration of replicated disks; this is asynchronous, point‑in‑time replication. RPO is roughly your sync schedule; RTO depends on PVC promotion plus VM restore time.
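To make the capture/restore step concrete, a sanitized manifest of the kind described above could look roughly like the following. This is a sketch of a standard KubeVirt `VirtualMachine` with illustrative names and sizes; the exact fields this project keeps or strips may differ.

```yaml
# Sketch of a captured, sanitized VirtualMachine for DR restore; names, sizes,
# and the NAD reference are illustrative. The claimName points at the PVC that
# VolSync replicated to the destination cluster.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: web-01
  namespace: my-workload-ns
spec:
  running: false                     # started explicitly during failover
  template:
    spec:
      domain:
        cpu:
          cores: 2                   # same CPU as the source VM
        memory:
          guest: 4Gi                 # same RAM
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
          interfaces:
            - name: net1
              bridge: {}
              macAddress: "02:00:00:aa:bb:01"   # preserved MAC
      networks:
        - name: net1
          multus:
            networkName: my-workload-ns/vlan100  # NAD, when it exists at DR
      volumes:
        - name: rootdisk
          persistentVolumeClaim:
            claimName: web-01-rootdisk           # replicated PVC
```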
- Ansible Core and Python 3 with the Kubernetes client libraries:
  dnf install -y ansible-core git python3-pip   # or: apt/yum
  pip3 install kubernetes
- `oc` CLI installed and logged into both clusters at least once (to seed kubeconfigs/contexts), or provide paths to kubeconfig files in the inventory.
- Two OpenShift clusters: source (primary) and destination (DR).
- Working storage classes on both sides with sufficient capacity.
- Object storage credentials if using the restic transport (S3/compatible) — recommended for geo DR.
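If you use restic, VolSync expects the repository location and credentials in a Secret that the CRs reference by name; for an S3-compatible backend it looks roughly like this (bucket, endpoint, and names below are placeholders):

```yaml
# Placeholder values; the Secret name must match the `repository` field in the
# ReplicationSource/ReplicationDestination CRs.
apiVersion: v1
kind: Secret
metadata:
  name: restic-repo-web-01-rootdisk
  namespace: my-workload-ns
stringData:
  RESTIC_REPOSITORY: "s3:https://s3.example.com/dr-bucket/web-01-rootdisk"
  RESTIC_PASSWORD: change-me              # encrypts the restic repository
  AWS_ACCESS_KEY_ID: example-access-key
  AWS_SECRET_ACCESS_KEY: example-secret-key
```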
.
├── ansible.cfg
├── requirements.yml
├── inventories/
│ └── lab/ # example inventory
├── playbooks/ # task entry points (install, discover, configure, capture, failover, etc.)
└── roles/ # role-ized logic used by the playbooks
Tip: Keep your own inventory (e.g., `inventories/prod/`) separate from the sample `lab` one.
# 1) Clone
git clone https://github.com/linusali/ocp-virt-async-dr
cd ocp-virt-async-dr
# 2) Prepare Python and Ansible bits (once)
pip3 install kubernetes
ansible-galaxy collection install -r requirements.yml
# 3) Copy the sample inventory and edit
cp -r inventories/lab inventories/my-site
$EDITOR inventories/my-site/group_vars/all.yml # see sections below
$EDITOR inventories/my-site/hosts.ini # set contexts/kubeconfigs
# 4) Install operators on both clusters (not tested)
ansible-playbook -i inventories/my-site playbooks/install-operators.yml
# 5) Discover PVCs & configure VolSync for the selected VMs
ansible-playbook -i inventories/my-site playbooks/configure-sync.yml
# 6) Test a planned failover (namespaced)
ansible-playbook -i inventories/my-site playbooks/failover.yml

All playbooks are idempotent. Re-running configure after editing the inventory will reconcile (create/update) the VolSync CRs.
The project expects a local connection (you talk to clusters via the Kubernetes API), so hosts.ini usually just targets localhost.
inventories/my-site/hosts.ini
[localhost]
127.0.0.1 ansible_connection=local

inventories/my-site/group_vars/all.yml
# Identify clusters by kubeconfig+context
source:
kubeconfig: "{{ lookup('env', 'HOME') }}/.kube/source.kubeconfig" # or leave empty to use default
context: "admin/source-cluster"
destination:
kubeconfig: "{{ lookup('env', 'HOME') }}/.kube/destination.kubeconfig"
context: "admin/destination-cluster"
# Default storage classes and PVC sizing behavior at DR
storage:
default_sc: "ocs-storagecluster-ceph-rbd" # adjust to your DR class
expand_to_source_size: true # ensure dest ≥ source
# Select which namespaces are in scope (optional, otherwise VM list drives scope)
namespaces: ["my-workload-ns"]You can choose VMs explicitly or by label selectors. The roles will discover the relevant DataVolumes/PVCs for each VM and configure VolSync CRs accordingly.
Explicit list (recommended for first run):
vms:
- name: web-01
namespace: my-workload-ns
- name: db-01
namespace: my-workload-ns

Installs/ensures the VolSync (and optionally MetalLB) operators exist in both clusters. Assumes OperatorHub installation via Subscription/OperatorGroup resources.
ansible-playbook -i inventories/my-site playbooks/install-operators.yml

Discovers the DataVolumes/PVCs for each selected VM on the source cluster, then creates/updates ReplicationSource and ReplicationDestination CRs across both clusters with your schedule and transport.
ansible-playbook -i inventories/my-site playbooks/configure-replication.yml

For a controlled switchover of a namespace:
ansible-playbook -i inventories/my-site playbooks/failover.yml

TODO
- Start with one namespace, one VM, and verify RPO/RTO.
- For databases, consider application‑level quiesce hooks before sync windows.
- Ensure time sync (NTP/Chrony) on nodes; VolSync cron scheduling depends on it.
- If using restic, test repo credentials and retention windows outside of prod.
- Keep storage classes compatible (block vs filesystem, access modes, volumeModes).
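To make that last point concrete: a typical KubeVirt disk PVC on Ceph RBD is block-mode and often RWX, so the DR storage class has to offer the same capabilities (names and size below are illustrative):

```yaml
# Illustrative source-side VM disk PVC; the DR storage class must support the
# same volumeMode and access modes, or the replicated volume will not attach
# the same way on the destination.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: web-01-rootdisk
  namespace: my-workload-ns
spec:
  storageClassName: ocs-storagecluster-ceph-rbd
  accessModes: ["ReadWriteMany"]   # RWX enables live migration on the source
  volumeMode: Block
  resources:
    requests:
      storage: 30Gi
```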
TODO
Apache-2.0 (see repository for details).