| 1 | +--- |
| 2 | +title: RRDD archival redesign |
| 3 | +layout: default |
| 4 | +design_doc: true |
| 5 | +revision: 1 |
| 6 | +status: proposed |
| 7 | +--- |
| 8 | + |
| 9 | +## Introduction |
| 10 | + |
| 11 | +Current problems with rrdd: |
| 12 | + |
| 13 | +* rrdd stores knowledge about whether it is running on a master or a slave |
| 14 | + |
| 15 | +This determines the host to which rrdd will archive a VM's rrd when the VM's |
| 16 | +domain disappears - rrdd will always try to archive to the master. However, |
| 17 | +when a host joins a pool as a slave, rrdd is not restarted, so this knowledge |
| 18 | +is out of date. When a VM shuts down on the slave, rrdd (still believing it is |
| 19 | +on the master) archives the rrd locally. When this VM is started again, the |
| 20 | +master xapi will attempt to push any locally-existing rrd to the host on which |
| 21 | +the VM is being started, but since no rrd archive exists on the master, the |
| 22 | +slave rrdd will end up creating a new rrd and the previous rrd will be lost. |
| 23 | + |
| 24 | +* rrdd handles rebooting VMs unpredictably |
| 25 | + |
| 26 | +When rebooting a VM, there is a chance rrdd will attempt to update that VM's rrd |
| 27 | +during the brief period when there is no domain for that VM. If this happens, |
| 28 | +rrdd will archive the VM's rrd to the master, and then create a new rrd for the |
| 29 | +VM when it sees the new domain. If rrdd doesn't attempt to update that VM's rrd |
| 30 | +during this period, rrdd will continue to add data for the new domain to the old |
| 31 | +rrd. |
| 32 | + |
| 33 | +## Proposal |
| 34 | + |
| 35 | +To solve these problems, we will remove some of the intelligence from rrdd and |
| 36 | +make it into more of a slave process of xapi. This will entail removing all |
| 37 | +knowledge from rrdd of whether it is running on a master or a slave, and also |
| 38 | +modifying rrdd to start monitoring a VM only when it is told to, and to archive |
| 39 | +an rrd (to a specified address) only when it is told to. This matches the |
| 40 | +way xenopsd only manages domains which it has been told to manage. |
| 41 | + |
| 42 | +## Design |
| 43 | + |
| 44 | +For most VM lifecycle operations, xapi and rrdd processes (sometimes across more |
| 45 | +than one host) cooperate to start or stop recording a VM's metrics and/or to |
| 46 | +restore or back up the VM's archived metrics. Below we will describe, for each |
| 47 | +relevant VM operation, how the VM's rrd is currently handled, and how we propose |
| 48 | +it will be handled after the redesign. |
| 49 | + |
| 50 | +#### VM.destroy |
| 51 | + |
| 52 | +The master xapi makes a remove_rrd call to the local rrdd, which causes rrdd to |
| 53 | +delete the VM's archived rrd from disk. This behaviour will remain unchanged. |
| 54 | + |
| 55 | +#### VM.start(\_on) and VM.resume(\_on) |
| 56 | + |
| 57 | +The master xapi makes a push_rrd call to the local rrdd, which causes rrdd to |
| 58 | +send any locally-archived rrd for the VM in question to the rrdd of the host on |
| 59 | +which the VM is starting. This behaviour will remain unchanged. |
| 60 | + |
| 61 | +#### VM.shutdown and VM.suspend |
| 62 | + |
| 63 | +Every update cycle rrdd compares its list of registered VMs to the list of |
| 64 | +domains actually running on the host. Any registered VMs which do not have a |
| 65 | +corresponding domain have their rrds archived to the rrdd running on the host |
| 66 | +believed to be the master. We will change this behaviour by stopping rrdd from |
| 67 | +doing the archiving itself; instead we will expose a new function in rrdd's |
| 68 | +interface: |
| 69 | + |
| 70 | +```ocaml |
| 71 | +val archive_rrd : vm_uuid:string -> remote_address:string -> unit |
| 72 | +``` |
| 73 | + |
| 74 | +This will cause rrdd to remove the specified rrd from its table of registered |
| 75 | +VMs, and archive the rrd to the specified host. When a VM has finished shutting |
| 76 | +down or suspending, the xapi process on the host on which the VM was running |
| 77 | +will call archive_rrd to ask the local rrdd to archive back to the master rrdd. |
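
The intended semantics of `archive_rrd` can be sketched as follows. This is a minimal in-memory model, not rrdd's actual implementation: the `rrd` record, the `registered` table, the `sent` log and the `send_to` helper are all hypothetical names introduced for illustration, and ignoring an unregistered VM is an assumption rather than something this design specifies.

```ocaml
(* Hypothetical model of an rrd; the real rrd type is much richer. *)
type rrd = { vm_uuid : string; data : string }

(* Hypothetical table of the VMs rrdd is currently monitoring, keyed by UUID. *)
let registered : (string, rrd) Hashtbl.t = Hashtbl.create 16

(* Stand-in for the network transfer: records (address, uuid) pairs
   instead of sending anything. *)
let sent : (string * string) list ref = ref []

let send_to ~remote_address rrd =
  sent := (remote_address, rrd.vm_uuid) :: !sent

(* Same shape as the proposed signature:
   val archive_rrd : vm_uuid:string -> remote_address:string -> unit *)
let archive_rrd ~vm_uuid ~remote_address =
  match Hashtbl.find_opt registered vm_uuid with
  | None -> () (* assumption: a VM that is not registered is ignored *)
  | Some rrd ->
    (* Stop monitoring the VM, then archive its rrd to the given host. *)
    Hashtbl.remove registered vm_uuid;
    send_to ~remote_address rrd
```

In this model the caller (xapi) supplies the destination address, so rrdd needs no knowledge of which host is the master.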
| 78 | + |
| 79 | +#### VM.reboot |
| 80 | + |
| 81 | +Removing rrdd's ability to automatically archive the rrds for disappeared |
| 82 | +domains will have the bonus effect of fixing how the rrds of rebooting VMs are |
| 83 | +handled, as we don't want the rrds of rebooting VMs to be archived at all. |
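
To illustrate why this fixes reboots, here is a rough sketch of an update cycle that no longer archives on its own. The `registered` table (mapping a VM UUID to a sample count) and the `update_cycle` function are hypothetical simplifications, not rrdd's real code; the point is only that a registered VM without a domain is skipped rather than archived, so it keeps the same rrd once its new domain appears.

```ocaml
module StringSet = Set.Make (String)

(* Hypothetical model: each registered VM maps to the number of samples
   recorded in its rrd so far. *)
let registered : (string, int) Hashtbl.t = Hashtbl.create 16

(* One update cycle under the proposed design. [running] is the set of
   VM UUIDs that currently have a domain on this host. *)
let update_cycle ~running =
  Hashtbl.filter_map_inplace
    (fun vm_uuid samples ->
      if StringSet.mem vm_uuid running then Some (samples + 1)
      else Some samples (* no domain: skip the update, keep the rrd *))
    registered
```

A VM whose domain is briefly absent during a reboot stays registered throughout, and its rrd is only removed when xapi explicitly asks for it to be archived.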
| 84 | + |
| 85 | +#### VM.checkpoint |
| 86 | + |
| 87 | +This will be handled automatically, as internally VM.checkpoint carries out a |
| 88 | +VM.suspend followed by a VM.resume. |
| 89 | + |
| 90 | +#### VM.pool_migrate and VM.migrate_send |
| 91 | + |
| 92 | +The source host's xapi makes a migrate_rrd call to the local rrdd, with a |
| 93 | +destination address and an optional session ID. The session ID is only required |
| 94 | +for cross-pool migration. The local rrdd sends the rrd for that VM to the |
| 95 | +destination host's rrdd as an HTTP PUT. This behaviour will remain unchanged. |