
Behavior of skip volume policy is strange/problematic #9318

@fredgate

Description


What steps did you take and what happened:
VolumePolicy allows controlling how Velero handles volumes matching certain conditions by supporting three actions, which are documented like this:

  • skip: don’t back up the action matching volume’s data
  • snapshot: back up the action matching volume’s data by the snapshot way
  • fs-backup: back up the action matching volume’s data by the fs-backup way
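For reference, a volume policy that skips matching volumes looks roughly like this (a sketch based on the documented resource-policies format; the `managed-csi` storage class name is a placeholder for illustration). The policy is stored in a ConfigMap and referenced at backup time, e.g. via `velero backup create --resource-policies-configmap`:

```yaml
# Illustrative Velero resource policy (stored under the ConfigMap's
# resource-policies key). The storage class name is a made-up example.
version: v1
volumePolicies:
  - conditions:
      storageClass:
        - managed-csi
    action:
      type: skip   # don't back up the matching volumes' data
```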

Everything seems to work fine during the backup.
But when we do a restore, the result is very strange for the PVCs backed up with the skip action. The restore recreates the PVC (ok), but it also recreates the bound PV exactly as it was at backup time, i.e. all of the PV's information is restored as-is. There is no dynamic provisioning of a PV as with the other actions, so the restored PV has the same VolumeHandle as the backed-up PV. This is very strange, unintuitive, and dangerous.

  • if we restore a PVC with a Delete reclaim policy, we delete the PVC first so that Velero can restore it: but by doing that the bound PV is also deleted, along with the underlying storage. The PV is then restored identically to how it was backed up, so its VolumeHandle references a cloud resource (an Azure disk, an Azure file share, ...) that no longer exists. Pods using this PVC will never start, because the CSI driver cannot attach/mount its storage.
  • if we test a DRP and restore a backup taken from a production cluster into a cluster dedicated to the DRP, then the PVCs with the skip action are present in both clusters, but their bound PVs have the same VolumeHandle, so they use the same underlying storage. This can corrupt data, as two workloads use the same storage for different work. And if the DRP cluster is cleared and deleted, the PVCs, PVs, and underlying storage will also be deleted. The production cluster will then have PVs with no underlying storage, so the pods using them no longer work.
  • if we want to restore the backup in an environment where the underlying storage is not available (for example in another cloud region or at another provider), the restore will fail, because these PVs have VolumeHandles that reference unavailable storage.
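To make the scenarios concrete, here is an illustrative fragment of what such a restored PV looks like (driver and handle are made-up Azure-style values, not taken from a real cluster; the point is that the VolumeHandle is carried over verbatim from the source cluster):

```yaml
# Illustrative restored PV: the spec is copied as-is from the backup,
# including the CSI volume handle, instead of being re-provisioned.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-0a1b2c3d           # placeholder name
spec:
  persistentVolumeReclaimPolicy: Delete
  csi:
    driver: disk.csi.azure.com
    # Same handle as the source cluster's PV: it may point to a deleted
    # disk (scenario 1) or to storage still used by production (scenario 2).
    volumeHandle: /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Compute/disks/<disk-name>
```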

What did you expect to happen:
I think the behavior of the skip policy is dangerous and should not work like that. The documentation says precisely "don't back up the action matching volume's data": so it should simply not back up the data, and on restore a new PV should be provisioned, as with the other actions, with no data restored (since none was backed up).
Restores would then work correctly, with no risk of data corruption or loss.
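In other words, the restore should emit only a PVC of roughly this shape (an illustrative sketch; name, class, and size are placeholders), leaving spec.volumeName unset so that the CSI provisioner binds a fresh, empty PV:

```yaml
# Expected restored object: a bare PVC with no pre-bound PV.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data            # placeholder name
spec:
  storageClassName: managed-csi   # placeholder class
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  # no spec.volumeName: dynamic provisioning creates a new PV with a
  # new VolumeHandle, exactly as for volumes restored without data.
```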

Environment:

  • Velero version (use velero version): 1.16.1
  • Kubernetes version (use kubectl version): 1.33.3
  • Kubernetes installer & version: AKS
  • Cloud provider or hardware configuration: Azure

Vote on this issue!
This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"

Labels

Good first issue, Needs info, Restore
