Description
What steps did you take and what happened:
VolumePolicy lets you control how Velero handles volumes matching certain conditions by supporting three actions, which are documented as follows:
- skip: don’t back up the action matching volume’s data
- snapshot: back up the action matching volume’s data by the snapshot way
- fs-backup: back up the action matching volume’s data by the fs-backup way
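For context, a policy that applies the skip action could look roughly like the sketch below. It follows the resource policy layout described in the Velero documentation; the storage class name managed-csi is only an assumed example value, not the configuration actually used in this report.

```yaml
# Minimal sketch of a Velero volume policy using the skip action.
# "managed-csi" is a hypothetical storage class name chosen for illustration.
version: v1
volumePolicies:
  - conditions:
      # Match volumes provisioned from this storage class.
      storageClass:
        - managed-csi
    action:
      # Do not back up the data of matching volumes.
      type: skip
```

Such a policy is normally stored in a ConfigMap in the Velero namespace and referenced when the backup is created.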
Everything seems to be working fine during the backup.
But when we do a restore, the result is very strange for PVCs backed up with the skip action. The restore recreates the PVC (OK), but it also recreates the bound PV exactly as it was at backup time, i.e. all of the PV's information is restored as is. No new PV is dynamically provisioned, as happens with the other actions, so the restored PV has the same VolumeHandle as the backed-up PV. This is very strange, unintuitive, and dangerous:
- If we restore a PVC with a Delete reclaim policy, we first delete the PVC so that Velero can restore it, but doing so also deletes the bound PV and the underlying storage. The PV is then restored exactly as it was backed up, so its VolumeHandle references a cloud resource (such as an Azure disk or an Azure file share) that no longer exists. Pods using this PVC will never start because the CSI driver cannot attach or mount the storage.
- If we test a DRP and restore a backup taken from a production cluster into a cluster dedicated to the DRP, the PVCs with the skip action are present in both clusters, but their bound PVs have the same VolumeHandle, so they use the same underlying storage. This can corrupt data, because two workloads use the same storage for different work. And if the DRP cluster is cleaned up and deleted, the PVCs, PVs, and underlying storage will also be deleted, leaving the production cluster with PVs that have no underlying storage, so the pods using them stop working.
- If we want to restore the backup in an environment where the underlying storage is not available (for example in another cloud region or with another provider), the restore will fail, because these PVs will have a VolumeHandle that references storage that is not available there.
What did you expect to happen:
I think the behavior of the skip policy is dangerous and should not work like this. The documentation says precisely "don’t back up the action matching volume’s data", so it should simply not back up the data. On restore, a new PV should be provisioned, as with the other actions, and no data should be restored (since none was backed up).
That way restores would work correctly, with no risk of data corruption or loss.
Environment:
- Velero version (use velero version): 1.16.1
- Kubernetes version (use kubectl version): 1.33.3
- Kubernetes installer & version: AKS
- Cloud provider or hardware configuration: Azure
Vote on this issue!
This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.
- 👍 for "I would like to see this bug fixed as soon as possible"
- 👎 for "There are more important bugs to focus on right now"