
Commit 7093851

Merge pull request #57 from djs55/snapshot

snapshots: put the revert implementation details in a subsection

2 parents a5a1dcd + 505aba0

File tree

9 files changed: +66 −3 lines changed
Binary file not shown (3.26 KB)
features/snapshots/coalesce1.png (36.3 KB)
Binary file not shown (3.22 KB)
features/snapshots/coalesce2.png (36.2 KB)
Binary file not shown (2.74 KB)
features/snapshots/coalesce3.png (29.8 KB)
Binary file not shown (3.57 KB)
features/snapshots/lun-trees.png (48.4 KB)

features/snapshots/snapshots.md

Lines changed: 66 additions & 3 deletions
@@ -29,10 +29,10 @@ conform to an api (the SMAPI) which has operations including
 - vdi_snapshot: create a snapshot of a disk
 
 
-Example vhd implementation
-==========================
+File-based vhd implementation
+=============================
 
-The existing "EXT" and "NFS" Xapi SM plugins store disk data in
+The existing "EXT" and "NFS" file-based Xapi SM plugins store disk data in
 trees of .vhd files as in the following diagram:
 
 ![Relationship between VDIs and vhd files](vhd-trees.png)
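
A minimal OCaml sketch of the tree shape in the diagram above; the `vhd` type, the behaviour attributed to `vdi_snapshot` here (the old leaf becoming a shared parent with two fresh, nearly-empty leaves on top), and all of the names are assumptions for illustration rather than the SM plugins' actual code.

```ocaml
(* A toy model of the vhd trees pictured above: leaves correspond to
   VDIs (the current disk and its snapshots) and interior nodes hold
   the blocks they share. *)
type vhd = { name : string; parent : vhd option }

(* One plausible reading of vdi_snapshot on such a tree: the current
   leaf becomes a shared parent and two fresh leaves are created on
   top of it - one for the snapshot VDI, one to carry on as the
   current VDI. *)
let vdi_snapshot current =
  let snapshot = { name = current.name ^ ".snap"; parent = Some current } in
  let new_current = { name = current.name ^ ".new"; parent = Some current } in
  snapshot, new_current

let () =
  let base = { name = "vdi-1.vhd"; parent = None } in
  let snapshot, current = vdi_snapshot base in
  let parent_name v = match v.parent with Some p -> p.name | None -> "(root)" in
  Printf.printf "snapshot %s and current %s share parent %s\n"
    snapshot.name current.name (parent_name current)
```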
@@ -41,6 +41,32 @@ From the XenAPI point of view, we have one current VDI and a set of snapshots,
 each taken at a different point in time. These VDIs correspond to leaf vhds in
 a tree stored on disk, where the non-leaf nodes contain all the shared blocks.
 
+The vhd files are always thinly-provisioned which means they only allocate new
+blocks on an as-needed basis. The snapshot leaf vhd files only contain vhd
+metadata and therefore are very small (a few KiB). The parent nodes containing
+the shared blocks only contain the shared blocks. The current leaf initially
+contains only the vhd metadata and therefore is very small (a few KiB) and will
+only grow when the VM writes blocks.
+
+File-based vhd implementations are a good choice if a "gold image" snapshot
+is going to be cloned lots of times.
+
+Block-based vhd implementation
+==============================
+
+The existing "LVM", "LVMoISCSI" and "LVMoHBA" block-based Xapi SM plugins store
+disk data in trees of .vhd files contained within LVM logical volumes:
+
+![Relationship between VDIs and LVs containing vhd data](lun-trees.png)
+
+Non-snapshot VDIs are always stored full size (a.k.a. thickly-provisioned).
+When parent nodes are created they are automatically shrunk to the minimum size
+needed to store the shared blocks. The LVs corresponding with snapshot VDIs
+only contain vhd metadata and by default consume 8MiB. Note: this is different
+to VDI.clones which are stored full size.
+
+Block-based vhd implementations are not a good choice if a "gold image" snapshot
+is going to be cloned lots of times, since each clone will be stored full size.
 
 Hypothetical LUN implementation
 ===============================
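
To make the provisioning difference concrete, here is a back-of-the-envelope OCaml sketch using the figures quoted in the text (roughly 8 MiB of vhd metadata per snapshot LV on block-based SRs; "a few KiB" per snapshot vhd file, taken here as 8 KiB) together with invented example sizes; the accounting is deliberately crude and ignores the vhd metadata inside the data-bearing nodes.

```ocaml
(* Back-of-the-envelope space accounting for the two kinds of SR
   described above.  All sizes are in bytes; the constants and the
   example disk are illustrative assumptions only. *)
let kib = 1024.
let mib = 1024. *. kib
let gib = 1024. *. mib

(* File-based ("EXT"/"NFS") SRs: everything is thin, so space is
   dominated by the blocks actually written; each snapshot leaf is
   only a few KiB of vhd metadata (taken here as 8 KiB). *)
let file_based_bytes ~written ~snapshots =
  written +. (float_of_int snapshots *. 8. *. kib)

(* Block-based ("LVM"/"LVMoISCSI"/"LVMoHBA") SRs: the current,
   non-snapshot VDI is a full-size LV; blocks captured by the
   snapshots are assumed to live in parent nodes shrunk to fit;
   each snapshot LV holds ~8 MiB of vhd metadata. *)
let block_based_bytes ~virtual_size ~written ~snapshots =
  virtual_size +. written +. (float_of_int snapshots *. 8. *. mib)

let () =
  (* an invented example: a 20 GiB disk with 5 GiB written, snapshotted 3 times *)
  let virtual_size = 20. *. gib and written = 5. *. gib and snapshots = 3 in
  Printf.printf "file-based  ~ %.2f GiB\nblock-based ~ %.2f GiB\n"
    (file_based_bytes ~written ~snapshots /. gib)
    (block_based_bytes ~virtual_size ~written ~snapshots /. gib)
```

The gap widens further with every full-size clone on the block-based SRs, which is the "gold image" caveat above.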
@@ -85,9 +111,46 @@ We have fields that help navigate the new objects: ```VM.snapshot_of```,
 and ```VDI.snapshot_of```. These, like you would expect, point to the
 relevant other objects.
 
+Deleting VM snapshots
+=====================
+
+When a snapshot is deleted Xapi calls the SM API `vdi_delete`. The Xapi SM
+plugins which use vhd format data do not reclaim space immediately; instead
+they mark the corresponding vhd leaf node as "hidden" and, at some point later,
+run a garbage collector process.
+
+The garbage collector will first determine whether a "coalesce" should happen i.e.
+whether any parent nodes have only one child i.e. the "shared" blocks are only
+"shared" with one other node. In the following example the snapshot delete leaves
+such a parent node and the coalesce process copies blocks from the redundant
+parent's only child into the parent:
+
+![We coalesce parent blocks into grand parent nodes](coalesce1.png)
+
+Note that if the vhd data is being stored in LVM, then the parent node will
+have had to be expanded to full size to accommodate the writes. Unfortunately
+this means the act of reclaiming space actually consumes space itself, which
+means it is important to never completely run out of space in such an SR.
+
+Once the blocks have been copied, we can now cut one of the parents out of the
+tree by relinking its children into their grandparent:
+
+![Relink children into grand parent](coalesce2.png)
+
+Finally the garbage collector can remove unused vhd files / LVM LVs:
+
+![Clean up](coalesce3.png)
+
 Reverting VM snapshots
 ======================
 
+The XenAPI call `VM.revert` overwrites the VM metadata with the snapshot VM
+metadata, deletes the current VDIs and replaces them with clones of the
+snapshot VDIs. Note there is no "vdi_revert" in the SMAPI.
+
+Revert implementation details
+-----------------------------
+
 This is the process by which we revert a VM to a snapshot. The
 first thing to notice is that there is some logic that is called
 from [message_forwarding.ml](https://github.com/xapi-project/xen-api/blob/ce6d3f276f0a56ef57ebcf10f45b0f478fd70322/ocaml/xapi/message_forwarding.ml#L1528),
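
The coalesce decision and relinking described in the added "Deleting VM snapshots" section can be sketched as follows; the `node` type, the `copy_blocks` stand-in and the example tree are invented for illustration, and the real garbage collector works on on-disk vhd files and LVM LVs rather than an in-memory structure.

```ocaml
(* A toy version of the coalesce step described above. *)
type node = { name : string; mutable children : node list }

(* Stand-in for copying the only child's allocated blocks into the
   parent - the expensive part, and the reason an LVM parent must
   first be inflated back to full size. *)
let copy_blocks ~src:_ ~dst:_ = ()

(* A parent whose "shared" blocks are shared with only one other node
   is redundant: copy its only child's blocks into it, relink the
   grandchildren, and the child's vhd file / LV becomes unused. *)
let coalesce parent =
  match parent.children with
  | [ only_child ] ->
      copy_blocks ~src:only_child ~dst:parent;
      parent.children <- only_child.children;
      only_child.children <- []
  | _ -> ()

let () =
  (* the shape left behind by the snapshot delete: the root has a
     single child, whose own children are the surviving leaves *)
  let current = { name = "current-leaf"; children = [] } in
  let snap = { name = "snapshot-leaf"; children = [] } in
  let middle = { name = "middle-node"; children = [ snap; current ] } in
  let root = { name = "root-node"; children = [ middle ] } in
  coalesce root;
  Printf.printf "root-node's children after coalescing: %s\n"
    (String.concat ", " (List.map (fun n -> n.name) root.children))
```

Note the direction of the copy: blocks move upwards into the surviving parent, which on LVM must first be inflated back to full size; that is why reclaiming space temporarily consumes space, as the section warns.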
