From c03094364ecc71fed1b6f85a3df45721e144f628 Mon Sep 17 00:00:00 2001 From: "Hoang, Phuong" Date: Mon, 19 Oct 2020 23:17:53 -0400 Subject: [PATCH] Specify timeout for plugins of specific Kind. Signed-off-by: Hoang, Phuong --- design/backup-resources-order.md | 40 -------------------------------- design/plugins-timeout.md | 34 +++++++++++++++++++++++++++ 2 files changed, 34 insertions(+), 40 deletions(-) delete mode 100644 design/backup-resources-order.md create mode 100644 design/plugins-timeout.md diff --git a/design/backup-resources-order.md b/design/backup-resources-order.md deleted file mode 100644 index 46521b76a1..0000000000 --- a/design/backup-resources-order.md +++ /dev/null @@ -1,40 +0,0 @@ -## Backup Resources Order -This document proposes a solution that allows user to specify a backup order for resources of specific resource type. - -## Background -During backup process, user may need to back up resources of specific type in some specific order to ensure the resources were backup properly because these resources are related and ordering might be required to preserve the consistency for the apps to recover itself from the backup image -(Ex: primary-secondary database pods in a cluster). - -## Goals -- Enable user to specify an order of back up resources belong to specific resource type - -## Alternatives Considered -- Use a plugin to backup an resources and all the sub resources. For example use a plugin for StatefulSet and backup pods belong to the StatefulSet in specific order. This plugin solution is not generic and requires plugin for each resource type. - -## High-Level Design -User will specify a map of resource type to list resource names (separate by semicolons). Each name will be in the format "namespaceName/resourceName" to enable ordering accross namespaces. Based on this map, the resources of each resource type will be sorted by the order specified in the list of resources. 
If a resource instance belong to that specific type but its name is not in the order list, then it will be put behind other resources that are in the list. - -### Changes to BackupSpec -Add new field to BackupSpec - - type BackupSpec struct { - ... - // OrderedResources contains a list of key-value pairs that represent the order - // of backup of resources that belong to specific resource type - // +optional - // +nullable - OrderedResources map[string]string - } - -### Changes to itemCollector -Function getResourceItems collects all items belong to a specific resource type. This function will be enhanced to check with the map to see whether the OrderedResources has specified the order for this resource type. If such order exists, then sort the items by such order being process before return. - -### Changes to velero CLI -Add new flag "--ordered-resources" to Velero backup create command which takes a string of key-values pairs which represents the map between resource type and the order of the items of such resource type. Key-value pairs are separated by semicolon, items in the value are separated by commas. - -Example: ->velero backup create mybackup --ordered-resources "pod=ns1/pod1,ns1/pod2;persistentvolumeclaim=n2/slavepod,ns2/primarypod" - -## Open Issues -- In the CLI, the design proposes to use commas to separate items of a resource type and semicolon to separate key-value pairs. This follows the convention of using commas to separate items in a list (For example: --include-namespaces ns1,ns2). However, the syntax for map in labels and annotations use commas to seperate key-value pairs. So it introduces some inconsistency. -- For pods that managed by Deployment or DaemonSet, this design may not work because the pods' name is randomly generated and if pods are restarted, they would have different names so the Backup operation may not consider the restarted pods in the sorting algorithm. 
This problem will be addressed when we enhance the design to use regular expression to specify the OrderResources instead of exact match. diff --git a/design/plugins-timeout.md b/design/plugins-timeout.md new file mode 100644 index 0000000000..626761f1db --- /dev/null +++ b/design/plugins-timeout.md @@ -0,0 +1,34 @@ +## Timeout for Plugins +This document proposes a solution that allows the user to specify a timeout, applied during backup, for all plugin executions on an object of a specific resource type. + +## Background +The execution of plugins in either the Backup or the Restore data path is a blocking call. If the plugin code does not behave as expected and does not return within a certain period of time, the application may fail. For example, in an app-consistent backup, the application pod is quiesced while the plugin is being executed, so user operations are blocked until the plugin execution completes and the pod is unquiesced. If for some reason the plugin execution takes longer than a certain threshold or hangs, the application starts to fail user operations. + +## Goals +- Enable the user to specify a timeout, applied during backup, for all plugin executions on an object of a specific resource type. + +## Non-goals +- The Restore data path may require a similar timeout, but it is beyond the scope of this change because of the complex dependencies between objects being restored. + +## Alternatives Considered +- Set a timeout for the entire Velero Backup or Restore to keep the backup or restore workflow from hanging. This helps in some scenarios, but it does not prevent an app-consistent pod from being quiesced for too long. +- Set a timeout for each plugin executed on an object. When multiple plugins are executed on an object, the total time of these plugin executions may still exceed the limit even though each plugin individually finishes within its own timeout. 
+ + +## High-Level Design +- Enhance the BackupSpec to contain a map from resource type (Kind) to the timeout for executing all plugins on an object of that resource type. + type BackupSpec struct { + ... + TotalPluginsTimeouts map[string]int `json:"totalPluginsTimeouts,omitempty"` // map of resource type to total plugins timeout (in milliseconds) + } +- While backing up an item, if the item's type matches a type specified in the TotalPluginsTimeouts map, all the plugins must finish executing within that timeout. If the item's type is not in the map, the plugins are executed without a timeout. + +### Changes to item_backupper.go +- In itemBackupper.backupItem, use the resource type of the item to look up its timeout in TotalPluginsTimeouts. If a timeout exists, create a goroutine to run executeActions so that the function either completes within the specified timeout or is cancelled. If no timeout exists, run executeActions directly as before. + +### Changes to velero CLI +Add a new flag "--total-plugins-timeouts" to the Velero backup create command. The flag takes a string of key-value pairs that represents the map between resource types and the total plugins timeout for each resource type. The key-value pairs are separated by semicolons. + +Example: +>velero backup create mybackup --total-plugins-timeouts "persistentvolumeclaims=30000,pods=60000" +