Commit 143fbda: Merge pull request #70 from djs55/local-database,
"Add a simple local database design" (2 parents: 1a279c9 + 8725118)

---
title: Local database
layout: default
design_doc: true
revision: 1
status: proposed
---

All hosts in a pool use the shared database by sending queries to
the pool master. This creates a performance bottleneck as the pool
size increases. All hosts in a pool periodically receive a database
backup from the master, every couple of hours. This creates a
reliability problem: updates may be lost if the master fails during
the window before the next backup.

The reliability problem can be avoided by running with HA or the redo
log enabled, but this is not always possible.

We propose to:

- adapt the existing event machinery to allow every host to maintain
  an up-to-date database replica;
- actively cache the database locally on each host and satisfy read
  operations from the cache. Most database operations are reads, so
  this should reduce the number of RPCs across the network.

In a later phase we can move to a completely
[distributed database](distributed-database.html).

Replicating the database
------------------------

We will create a database-level variant of the existing XenAPI `event.from`
API. The new RPC will block until a database event is generated, and then
the events will be returned using the existing "redo-log" event types. We
will add a delay of a few seconds into the RPC to batch the updates.

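The blocking-with-batching behaviour can be sketched in miniature. This is a hypothetical in-memory model, not xapi code: the names `DbEventStream`, `publish`, `event_from` and `BATCH_DELAY` are illustrative, and the real RPC would use redo-log event types rather than plain tuples.

```python
import threading
import time

BATCH_DELAY = 0.05  # stands in for the "few seconds" delay; shortened for the demo

class DbEventStream:
    """Hypothetical sketch of the proposed blocking database-event RPC."""

    def __init__(self):
        self.lock = threading.Condition()
        self.generation = 0
        self.events = []          # list of (generation, event) pairs

    def publish(self, event):
        """Called on the master whenever the database is updated."""
        with self.lock:
            self.generation += 1
            self.events.append((self.generation, event))
            self.lock.notify_all()

    def event_from(self, token):
        """Block until at least one event is newer than `token`, wait a
        little longer so closely-spaced updates are batched into one
        reply, then return the new token and the new events."""
        with self.lock:
            while self.generation <= token:
                self.lock.wait()
        time.sleep(BATCH_DELAY)   # batching delay
        with self.lock:
            new = [e for (g, e) in self.events if g > token]
            return self.generation, new

stream = DbEventStream()
stream.publish({"table": "VM", "op": "write"})
stream.publish({"table": "host", "op": "write"})
token, events = stream.event_from(0)
print(token, len(events))  # → 2 2
```

The condition variable is what makes the call "block until a database event is generated"; the returned token plays the same role as the `event.from` token in the existing XenAPI.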
We will replace the pool database download logic with an `event.from`-like
loop which fetches all the events from the master's database and applies
them to the local copy. The first call will naturally return the full database
contents.

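One iteration of that slave-side loop might look like the following sketch. The names `fetch_events` (standing in for the `event.from`-like RPC) and `apply` are hypothetical, and the `(table, key, value)` event shape is a simplification of the real redo-log events.

```python
def apply(local_db, event):
    """Apply one redo-log-style event to the local copy."""
    table, key, value = event
    local_db.setdefault(table, {})[key] = value

def replicate_once(local_db, token, fetch_events):
    """Fetch everything newer than `token` from the master and apply it
    locally. A token of 0 means "from the beginning", so the first call
    naturally returns (and applies) the full database contents."""
    new_token, events = fetch_events(token)
    for event in events:
        apply(local_db, event)
    return new_token

# Fake master with two events in its log, for illustration only.
log = [(1, ("VM", "vm1", {"power_state": "Running"})),
       (2, ("host", "h1", {"enabled": True}))]

def fetch_events(token):
    newer = [(g, e) for (g, e) in log if g > token]
    return (newer[-1][0] if newer else token), [e for (_, e) in newer]

db = {}
token = replicate_once(db, 0, fetch_events)
print(token, db["VM"]["vm1"]["power_state"])  # → 2 Running
```

Running `replicate_once` in a loop, feeding each returned token into the next call, keeps the replica converging on the master's state.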
We will turn on the existing "in memory db cache" mechanism on all hosts,
not just the master. This will be where the database updates will go.

The result should be that every host will have a `/var/xapi/state.db` file,
with writes going to the master first and then filtering down to all slaves.

Using the replica as a cache
----------------------------

We will re-use the [Disaster Recovery](../../../features/DR/DR.html) multiple
database mechanism to allow slaves to access their local database. We will
change the default database "context" to snapshot the local database,
perform reads locally and write through to the master.

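The snapshot-read, write-through split can be sketched as below. The classes `Master` and `CacheContext` are illustrative names, not part of xapi; the point is only that reads never leave the host while writes always go to the master first.

```python
import copy

class Master:
    """Stand-in for the pool master's authoritative database."""
    def __init__(self):
        self.db = {}

    def write(self, key, value):
        self.db[key] = value

class CacheContext:
    """Hypothetical sketch of the proposed database context: snapshot
    the local replica at creation time, satisfy reads from the snapshot,
    and write through to the master."""
    def __init__(self, master, local_replica):
        self.master = master
        self.snapshot = copy.deepcopy(local_replica)

    def read(self, key):
        return self.snapshot[key]          # local read, no RPC

    def write(self, key, value):
        self.master.write(key, value)      # write-through to the master

master = Master()
replica = {"vm1": "Halted"}                # kept fresh by the event loop
ctx = CacheContext(master, replica)
print(ctx.read("vm1"))                     # → Halted
ctx.write("vm1", "Running")                # lands on the master first
print(master.db["vm1"])                    # → Running
```

Note that the write does not appear in the snapshot: the new value only reaches the replica once it filters back down through the event mechanism, which is exactly the coherence policy described below.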
We will add an HTTP header to all forwarded XenAPI calls from the master which
will include the current database generation count. When a forwarded XenAPI
operation is received, the slave will deliberately wait until the local cache
is at least as new as this, so that we always use fresh metadata for XenAPI
calls (e.g. `VM.start` uses the absolute latest VM memory size).

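The slave-side wait is essentially a generation-count barrier. A minimal sketch, assuming a monotonically increasing generation number carried in the (here unnamed) HTTP header; `LocalCache`, `advance` and `wait_until` are hypothetical names:

```python
import threading

class LocalCache:
    """Tracks how far the local replica has caught up with the master."""

    def __init__(self):
        self.cond = threading.Condition()
        self.generation = 0

    def advance(self, generation):
        """Called by the replication loop after applying events."""
        with self.cond:
            self.generation = max(self.generation, generation)
            self.cond.notify_all()

    def wait_until(self, required_generation):
        """Called before handling a forwarded XenAPI call: block until
        the cache is at least as new as the master's database was when
        the call was forwarded."""
        with self.cond:
            while self.generation < required_generation:
                self.cond.wait()

cache = LocalCache()
cache.advance(42)
cache.wait_until(40)     # returns immediately: cache is already newer
print(cache.generation)  # → 42
```

If the required generation has not yet arrived, the call simply blocks until the replication loop calls `advance`, which guarantees the fresh-metadata property without any extra round trip to the master.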
We will document the new database coherence policy, i.e. that writes on a host
will not immediately be seen by reads on another host. We believe this is only
a problem when we are using the database for locking and are attempting to
hand over a lock to another host. We already use XenAPI calls forwarded to the
master for some of this, but may need to do more; in particular the storage
backends may need some updating.
