---
title: Local database
layout: default
design_doc: true
revision: 1
status: proposed
---

All hosts in a pool use the shared database by sending queries to
the pool master. This creates a performance bottleneck as the pool
size increases. All hosts in a pool receive a database backup from
the master periodically, every couple of hours. This creates a
reliability problem as updates may be lost if the master fails during
the window before the backup.
The reliability problem can be avoided by running with HA or the redo
log enabled, but this is not always possible.

We propose to:

- adapt the existing event machinery to allow every host to maintain
  an up-to-date database replica;
- actively cache the database locally on each host and satisfy read
  operations from the cache. Most database operations are reads, so
  this should reduce the number of RPCs across the network.

In a later phase we can move to a completely
[distributed database](distributed-database.html).

Replicating the database
------------------------

We will create a database-level variant of the existing XenAPI `event.from`
API. The new RPC will block until a database event is generated, and then
the events will be returned using the existing "redo-log" event types. We
will add a delay of a few seconds to the RPC to batch the updates.

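The blocking-and-batching behaviour described above can be sketched as follows. This is an illustrative model, not the real xapi implementation: the `EventServer` class, its method names, and the event representation are all assumptions, and the batching delay is shortened for demonstration.

```python
import threading
import time

BATCH_DELAY = 0.1  # illustrative; the proposal suggests a few seconds


class EventServer:
    """Toy model of the master side of the proposed event RPC:
    accumulates database update events and hands them out in batches."""

    def __init__(self):
        self.lock = threading.Condition()
        self.events = []     # (generation, event) pairs, oldest first
        self.generation = 0  # monotonically increasing update counter

    def record_update(self, event):
        """Called whenever a database write happens on the master."""
        with self.lock:
            self.generation += 1
            self.events.append((self.generation, event))
            self.lock.notify_all()

    def event_from(self, token):
        """Block until at least one event newer than `token` exists, then
        wait a short batching delay so that nearby updates coalesce into
        a single response."""
        with self.lock:
            while not any(g > token for g, _ in self.events):
                self.lock.wait()
        time.sleep(BATCH_DELAY)  # let further in-flight updates arrive
        with self.lock:
            batch = [(g, e) for g, e in self.events if g > token]
            return batch, batch[-1][0]  # events plus the new token
```

A caller that passes back the returned token each time sees each update exactly once, while bursts of writes on the master are delivered as one batch rather than one RPC per write.
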
We will replace the pool database download logic with an `event.from`-like
loop which fetches all the events from the master's database and applies
them to the local copy. The first call will naturally return the full database
contents.

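The slave-side loop might look like the following sketch. The `FakeMaster` class and its `event_from` signature are hypothetical stand-ins for the real RPC; the point is that a token of 0 naturally yields the entire database contents, and subsequent calls yield only the new updates.

```python
class FakeMaster:
    """Stand-in for the pool master: an append-only log of updates."""

    def __init__(self):
        self.log = []  # list of (key, value) updates, oldest first

    def write(self, key, value):
        self.log.append((key, value))

    def event_from(self, token):
        # The token is simply an index into the log, so the first call
        # (token 0) returns the full database contents.
        return self.log[token:], len(self.log)


def sync_once(master, replica, token):
    """One iteration of the replication loop: fetch all events newer
    than `token` and apply them, in order, to the local copy."""
    updates, token = master.event_from(token)
    for key, value in updates:
        replica[key] = value
    return token
```

In the real system this would run forever in a background thread, blocking inside `event_from` until the master has something new, replacing the periodic full-database download.
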
We will turn on the existing "in-memory db cache" mechanism on all hosts,
not just the master. This is where the database updates will go.
The result should be that every host will have a `/var/xapi/state.db` file,
with writes going to the master first and then filtering down to all slaves.

Using the replica as a cache
----------------------------

We will re-use the [Disaster Recovery](../../../features/DR/DR.html) multiple
database mechanism to allow slaves to access their local database. We will
change the default database "context" to snapshot the local database,
perform reads locally and write through to the master.

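The read-local/write-through split can be illustrated with a minimal sketch. The `LocalContext` class and its method names are assumptions made for illustration; the real xapi database context API differs.

```python
class FakeMaster:
    """Stand-in for the master database, reached over RPC."""

    def __init__(self):
        self.db = {}

    def write(self, key, value):
        self.db[key] = value


class LocalContext:
    """Illustrative database context: reads come from a snapshot of the
    local replica; writes go through to the master."""

    def __init__(self, master, replica):
        self.master = master
        self.snapshot = dict(replica)  # snapshot the local database

    def read(self, key):
        return self.snapshot[key]      # no RPC: served from the cache

    def write(self, key, value):
        # Write-through: the update goes to the master first and only
        # reaches the local replica later, via the event mechanism.
        self.master.write(key, value)
```

Note that a write is deliberately *not* visible in the same context's snapshot: it becomes visible once the replication loop has applied the master's event, which is exactly the coherence policy documented below.
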
We will add an HTTP header to all forwarded XenAPI calls from the master which
will include the current database generation count. When a forwarded XenAPI
operation is received, the slave will deliberately wait until the local cache
is at least as new as this, so that we always use fresh metadata for XenAPI
calls (e.g. `VM.start` uses the absolute latest VM memory size).

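The generation barrier amounts to a condition wait on the replica's generation counter, as in this sketch. The `ReplicaCache` class is hypothetical; in reality the generation would arrive in the HTTP header of the forwarded call and the counter would be advanced by the replication loop.

```python
import threading


class ReplicaCache:
    """Tracks how far the local replica has caught up with the master."""

    def __init__(self):
        self.cond = threading.Condition()
        self.generation = 0

    def apply_updates(self, generation):
        """Called by the replication loop after applying a batch."""
        with self.cond:
            self.generation = generation
            self.cond.notify_all()

    def wait_for(self, required_generation, timeout=None):
        """Block a forwarded XenAPI call until the local cache is at
        least as new as the generation the master stamped on it.
        Returns False if the timeout expires first."""
        with self.cond:
            return self.cond.wait_for(
                lambda: self.generation >= required_generation, timeout)
```

A forwarded operation would call `wait_for` with the header's generation before touching the local database, guaranteeing it never reads metadata older than what the master saw when it forwarded the call.
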
We will document the new database coherence policy, i.e. that writes on one
host will not immediately be seen by reads on another host. We believe this
is only a problem when we are using the database for locking and are attempting
to hand over a lock to another host. We already use XenAPI calls forwarded
to the master for some of this, but may need to do so more widely; in
particular the storage backends may need some updating.