Commit 0cffb60

Merge pull request #118 from johnelse/rrdd-archival-redesign
Add redesign of rrdd archival
2 parents bf1718d + 5322d50 commit 0cffb60

1 file changed

+95
-0
lines changed

rrdd/futures/archival-redesign.md

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
---
title: RRDD archival redesign
layout: default
design_doc: true
revision: 1
status: proposed
---

## Introduction

Current problems with rrdd:

* rrdd stores knowledge about whether it is running on a master or a slave

This determines the host to which rrdd will archive a VM's rrd when the VM's
domain disappears - rrdd will always try to archive to the master. However,
when a host joins a pool as a slave, rrdd is not restarted, so this knowledge is
out of date. When a VM shuts down on the slave, rrdd will archive the rrd
locally. When starting this VM again, the master xapi will attempt to push any
locally-existing rrd to the host on which the VM is being started, but since
no rrd archive exists on the master, the slave rrdd will end up creating a new
rrd and the previous rrd will be lost.

* rrdd handles rebooting VMs unpredictably

When rebooting a VM, there is a chance rrdd will attempt to update that VM's rrd
during the brief period when there is no domain for that VM. If this happens,
rrdd will archive the VM's rrd to the master, and then create a new rrd for the
VM when it sees the new domain. If rrdd doesn't attempt to update that VM's rrd
during this period, rrdd will continue to add data for the new domain to the old
rrd.

## Proposal

To solve these problems, we will remove some of the intelligence from rrdd and
make it into more of a slave process of xapi. This will entail removing all
knowledge from rrdd of whether it is running on a master or a slave, and also
modifying rrdd to only start monitoring a VM when it is told to, and to only
archive an rrd (to a specified address) when it is told to. This matches the
way xenopsd only manages domains which it has been told to manage.

## Design

For most VM lifecycle operations, xapi and rrdd processes (sometimes across more
than one host) cooperate to start or stop recording a VM's metrics and/or to
restore or back up the VM's archived metrics. Below we will describe, for each
relevant VM operation, how the VM's rrd is currently handled, and how we propose
it will be handled after the redesign.

#### VM.destroy

The master xapi makes a remove_rrd call to the local rrdd, which causes rrdd to
delete the VM's archived rrd from disk. This behaviour will remain unchanged.

#### VM.start(\_on) and VM.resume(\_on)

The master xapi makes a push_rrd call to the local rrdd, which causes rrdd to
send any locally-archived rrd for the VM in question to the rrdd of the host on
which the VM is starting. This behaviour will remain unchanged.

#### VM.shutdown and VM.suspend

Every update cycle, rrdd compares its list of registered VMs to the list of
domains actually running on the host. Any registered VMs which do not have a
corresponding domain have their rrds archived to the rrdd running on the host
believed to be the master. We will change this behaviour by stopping rrdd from
doing the archiving itself; instead we will expose a new function in rrdd's
interface:

```
val archive_rrd : vm_uuid:string -> remote_address:string -> unit
```

This will cause rrdd to remove the specified rrd from its table of registered
VMs, and archive the rrd to the specified host. When a VM has finished shutting
down or suspending, the xapi process on the host on which the VM was running
will call archive_rrd to ask the local rrdd to archive back to the master rrdd.
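
As a rough illustration of the semantics described above, the following is a minimal in-memory sketch of `archive_rrd`, not rrdd's actual implementation. Only the function name and its labelled arguments come from this document; the `rrd` record, the `registered` table, the `send_rrd` helper and the `sent` log are hypothetical stand-ins, and ignoring an unregistered VM is an assumption.

```ocaml
(* Hypothetical model: apart from archive_rrd and its labelled arguments,
   none of these names are taken from rrdd. *)

(* Stand-in for a real rrd: just a list of recorded samples. *)
type rrd = { samples : float list }

(* Table of VMs that rrdd is currently monitoring, keyed by VM uuid. *)
let registered : (string, rrd) Hashtbl.t = Hashtbl.create 16

(* Log of (vm_uuid, remote_address) pairs; used here instead of a real
   network transfer so that the sketch stays self-contained. *)
let sent : (string * string) list ref = ref []

let send_rrd ~vm_uuid ~remote_address (_ : rrd) =
  sent := (vm_uuid, remote_address) :: !sent

(* Same shape as the proposed signature:
   val archive_rrd : vm_uuid:string -> remote_address:string -> unit *)
let archive_rrd ~vm_uuid ~remote_address =
  match Hashtbl.find_opt registered vm_uuid with
  | None -> () (* assumption: an unregistered VM is silently ignored *)
  | Some rrd ->
    (* Stop monitoring the VM first, then archive to the given address. *)
    Hashtbl.remove registered vm_uuid;
    send_rrd ~vm_uuid ~remote_address rrd
```

In this model the caller (xapi, in the proposal) chooses `remote_address`, so rrdd itself needs no notion of which host is the master.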
78+
79+
### VM.reboot
80+
81+
Removing rrdd's ability to automatically archive the rrds for disappeared
82+
domains will have the bonus effect of fixing how the rrds of rebooting VMs are
83+
handled, as we don't want the rrds of rebooting VMs to be archived at all.
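
To make the reboot case concrete, here is a small sketch of how an update cycle might behave once automatic archiving is removed. It is a simplified model under stated assumptions, not rrdd code: `registered`, `update_cycle` and the per-VM sample counter are all hypothetical names chosen for illustration.

```ocaml
(* Hypothetical model of one rrdd update cycle after the redesign. *)
module StringSet = Set.Make (String)

(* VM uuid -> number of samples recorded so far (stand-in for a real rrd). *)
let registered : (string, int) Hashtbl.t = Hashtbl.create 16

(* Record one sample for every registered VM that currently has a domain.
   A registered VM without a domain is left untouched: it is neither
   archived nor unregistered, so a rebooting VM keeps its existing rrd. *)
let update_cycle ~running_domains =
  Hashtbl.filter_map_inplace
    (fun uuid count ->
      if StringSet.mem uuid running_domains then Some (count + 1)
      else Some count)
    registered
```

If a VM's domain is missing for one cycle during a reboot, the VM simply records no sample for that cycle and carries on with the same rrd when the new domain appears.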

#### VM.checkpoint

This will be handled automatically, as internally VM.checkpoint carries out a
VM.suspend followed by a VM.resume.

#### VM.pool_migrate and VM.migrate_send

The source host's xapi makes a migrate_rrd call to the local rrdd, with a
destination address and an optional session ID. The session ID is only required
for cross-pool migration. The local rrdd sends the rrd for that VM to the
destination host's rrdd as an HTTP PUT. This behaviour will remain unchanged.
