
⚠️ Disclaimer: This README is Work In Progress. Please verify all commands, variable names, and steps before using it in any environment.

OpenShift Virtualization Async DR with VolSync (Ansible-driven)

Orchestrate asynchronous disaster recovery (DR) for KubeVirt VMs across OpenShift clusters using VolSync. This project ships Ansible roles and playbooks that:

  • install the required operators (VolSync, optional MetalLB),
  • discover VM disks and create ReplicationSource/ReplicationDestination,
  • schedule periodic syncs, pick up new VM disks via re-scans,
  • capture sanitized VM specs on the destination cluster for like‑for‑like restore (CPU, RAM, disks, NICs, MACs), and
  • perform failover by pausing RD and restoring the captured VM spec pointing to replicated PVCs.

Architecture

Goal. Keep one or more namespaces / VMs on a source OpenShift cluster asynchronously replicated to a destination cluster using VolSync. Data flows PVC→PVC over a mover transport (typically restic to an object store, or direct rsync), on a cron schedule defined in the VolSync CRs; the operator launches a mover Job for each iteration.

What the automation does.

  1. Installs VolSync (and optionally MetalLB) via OperatorHub API resources.
  2. Discovers a VM’s data volumes / PVCs on the source cluster and generates a matching ReplicationSource.
  3. Creates a ReplicationDestination on the destination cluster with compatible storageClass/size.
  4. Schedules periodic syncs and optional retention.
  5. Captures/exports a sanitized VM manifest on destination and stores it for DR (same CPU, memory, disks, network interfaces, MACs and NADs when possible).
  6. For failover, pauses destination RD, performs a final sync/promote, rebinds PVCs, and recreates VM from the captured manifest.

Note: VolSync does not live-migrate replicated disks; this is asynchronous, point‑in‑time replication. RPO ≈ your sync schedule; RTO depends on PVC promotion plus VM restore time.
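
For orientation, here is a minimal sketch of the CR pair the playbooks generate for a single VM disk (names, schedule, repository Secret, storage class, and size are illustrative; the real values come from the discovery step and your inventory):

apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: web-01-rootdisk               # illustrative: one CR per discovered VM PVC
  namespace: my-workload-ns
spec:
  sourcePVC: web-01-rootdisk
  trigger:
    schedule: "*/15 * * * *"          # replication cadence = your RPO target
  restic:
    repository: web-01-rootdisk-restic-secret   # Secret holding repo URL + credentials
    copyMethod: Snapshot
    retain:
      daily: 7
---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: web-01-rootdisk
  namespace: my-workload-ns           # on the destination cluster
spec:
  trigger:
    schedule: "*/15 * * * *"
  restic:
    repository: web-01-rootdisk-restic-secret
    destinationPVC: web-01-rootdisk   # PVC the restored VM will point at
    copyMethod: Snapshot
    capacity: 30Gi                    # must be ≥ the source PVC size
    accessModes: [ReadWriteOnce]
    storageClassName: ocs-storagecluster-ceph-rbd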


Prerequisites

Workstation

  • Ansible Core and Python 3 with Kubernetes client libs:

    dnf install -y ansible-core git python3-pip  # or: apt/yum
    pip3 install kubernetes
  • oc CLI installed and logged into both clusters at least once (to seed kubeconfigs/contexts) or provide paths to kubeconfig files in the inventory.
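
  The Kubernetes modules used by the roles come from the kubernetes.core collection; if you ever need to recreate requirements.yml, a minimal equivalent would be the following (an assumption — check the file shipped in the repo):

  collections:
    - name: kubernetes.core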

Clusters

  • Two OpenShift clusters: source (primary) and destination (DR).
  • Working storage classes on both sides with sufficient capacity.
  • Object storage credentials if using the restic transport (S3/compatible) — recommended for geo DR.
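
If you use the restic transport, VolSync reads the repository location and credentials from a Secret in the workload namespace on both clusters. A sketch (secret name, bucket URL, and key values are placeholders):

apiVersion: v1
kind: Secret
metadata:
  name: web-01-rootdisk-restic-secret
  namespace: my-workload-ns
stringData:
  RESTIC_REPOSITORY: s3:https://s3.example.com/dr-bucket/web-01-rootdisk
  RESTIC_PASSWORD: "<restic-repo-password>"
  AWS_ACCESS_KEY_ID: "<access-key>"
  AWS_SECRET_ACCESS_KEY: "<secret-key>"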

Repository layout

.
├── ansible.cfg
├── requirements.yml
├── inventories/
│   └── lab/           # example inventory
├── playbooks/         # task entry points (install, discover, configure, capture, failover, etc.)
└── roles/             # role-ized logic used by the playbooks

Tip: Keep your own inventory (e.g., inventories/prod/) separate from the sample lab one.


Quick start

# 1) Clone
git clone https://github.com/linusali/ocp-virt-async-dr
cd ocp-virt-async-dr

# 2) Prepare Python and Ansible bits (once)
pip3 install kubernetes
ansible-galaxy collection install -r requirements.yml

# 3) Copy the sample inventory and edit
cp -r inventories/lab inventories/my-site
$EDITOR inventories/my-site/group_vars/all.yml   # see sections below
$EDITOR inventories/my-site/hosts.ini            # set contexts/kubeconfigs

# 4) Install operators on both clusters (not tested)
ansible-playbook -i inventories/my-site playbooks/install-operators.yml

# 5) Discover PVCs & configure VolSync for the selected VMs
ansible-playbook -i inventories/my-site playbooks/configure-sync.yml

# 6) Test a planned failover (namespaced)
ansible-playbook -i inventories/my-site playbooks/failover.yml 

All playbooks are idempotent. Re-running configure after editing the inventory will reconcile (create/update) the VolSync CRs.


Inventory and variables

The project expects a local connection (you talk to clusters via the Kubernetes API), so hosts.ini usually just targets localhost.

Minimal inventory

inventories/my-site/hosts.ini

[localhost]
127.0.0.1 ansible_connection=local

inventories/my-site/group_vars/all.yml

# Identify clusters by kubeconfig+context
source:
  kubeconfig: "{{ lookup('env', 'HOME') }}/.kube/source.kubeconfig"  # or leave empty to use default
  context:    "admin/source-cluster"

destination:
  kubeconfig: "{{ lookup('env', 'HOME') }}/.kube/destination.kubeconfig"
  context:    "admin/destination-cluster"

# Default storage classes and PVC sizing behavior at DR
storage:
  default_sc: "ocs-storagecluster-ceph-rbd"  # adjust to your DR class
  expand_to_source_size: true                 # ensure dest ≥ source

# Select which namespaces are in scope (optional, otherwise VM list drives scope)
namespaces: ["my-workload-ns"]

Defining which VMs to replicate

You can choose VMs explicitly or by label selectors (a label‑selector sketch follows the explicit example below). The roles will discover the relevant DataVolumes/PVCs for each VM and configure VolSync CRs accordingly.

Explicit list (recommended for first run):

vms:
  - name: web-01
    namespace: my-workload-ns

  - name: db-01
    namespace: my-workload-ns
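
For the label‑selector alternative, a hypothetical sketch is shown below; the variable name vm_selector and its keys are assumptions here, so check the role defaults for the actual interface:

# Hypothetical: select every VM in the namespace carrying a dr=enabled label
vm_selector:
  namespace: my-workload-ns
  match_labels:
    dr: "enabled"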

Typical workflows

Install operators

Installs/ensures VolSync (and optionally MetalLB) operators exist in both clusters. Assumes OperatorHub installation via Subscription/OperatorGroup resources.

ansible-playbook -i inventories/my-site playbooks/install-operators.yml
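
On OpenShift, the VolSync operator is typically installed with a Subscription like the one below. The channel and catalog names reflect the Red Hat catalog at the time of writing and are assumptions about what the playbook applies; openshift-operators already ships a global OperatorGroup, so none needs to be created there:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: volsync-product
  namespace: openshift-operators
spec:
  channel: stable
  name: volsync-product
  source: redhat-operators
  sourceNamespace: openshift-marketplace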

Discover VM disks and configure replication

Discovers the DataVolumes/PVCs for each selected VM on source, then creates/updates ReplicationSource and ReplicationDestination CRs across clusters with your schedule & transport.

ansible-playbook -i inventories/my-site playbooks/configure-replication.yml
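
Once a sync iteration completes, the ReplicationDestination status is the quickest health signal; with copyMethod: Snapshot, the latest replicated point in time is exposed as latestImage (values below are illustrative):

status:
  lastSyncTime: "2024-05-01T10:15:00Z"
  lastSyncDuration: 2m13s
  latestImage:                        # VolumeSnapshot used when rebinding PVCs at failover
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: volsync-web-01-rootdisk-dst # illustrative name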

Planned failover (promote DR)

For a controlled switchover of a namespace:

ansible-playbook -i inventories/my-site playbooks/failover.yml 
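
During a planned switchover, a final sync is usually forced on the source before the VM is recreated at DR. In VolSync this can be done with a manual trigger on the ReplicationSource (shown here for reference; the playbook may handle it for you):

spec:
  trigger:
    manual: final-sync-1   # VolSync runs one iteration, then records the value in .status.lastManualSync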

Running with AWX/Automation Controller

TODO


Operational tips

  • Start with one namespace, one VM, and verify RPO/RTO.
  • For databases, consider application‑level quiesce hooks before sync windows.
  • Ensure time sync (NTP/Chrony) on nodes; VolSync cron scheduling depends on it.
  • If using restic, test repo credentials and retention windows outside of prod.
  • Keep storage classes compatible (block vs filesystem, access modes, volumeModes).

Troubleshooting

TODO


FAQ

TODO

License

Apache-2.0 (see repository for details).
