Commit 1849b89
CA-273775: remove race in vgpu_receiver_sync during vm migration
During a VM migration, the new receive_vgpu thread races with the original
receive_memory thread in the receiving host. A new 'Synchronisation
point 1-vgpu' was meant to indicate to the sending host that the table
vgpu_receiver_sync had been initialised in the receiving host, and therefore
it should be safe for both sending and receiving hosts to go past the original
'Synchronisation point 1' and start streaming the VM state. The problem with
going past the original point 1 is that the table vgpu_receiver_sync is used
after it: if the table is still uninitialised, the migration proceeds without
the vgpu stream information.
However, the original 'Synchronisation point 1' only blocks the sending host,
not the receiving host. This means that the new 'Synchronisation point 1-vgpu'
signal was just half of the necessary signalling infrastructure to protect the
use of the table vgpu_receiver_sync, asserting only to the sending host that the
original 'Synchronisation point 1' comes after 'Synchronisation point 1-vgpu'.
The receiving host's receive_memory thread is still free to race after
'Synchronisation point 1' before the 'Synchronisation point 1-vgpu' is reached
on the receiving host's receive_vgpu thread.
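
The ordering problem described above can be illustrated with a small,
deterministic model. This is only a sketch in Python, not xenopsd code: the
function names, the "vm-1" identifier and the "vgpu-stream" value are
hypothetical, and the table is modelled as a plain dictionary.

```python
# Hypothetical model of the receiving host; none of these helpers are xenopsd APIs.
vgpu_receiver_sync = {}  # vm_id -> vgpu stream info, filled in by receive_vgpu


def receive_vgpu_init(vm_id):
    """Stands in for receive_vgpu reaching 'Synchronisation point 1-vgpu'."""
    vgpu_receiver_sync[vm_id] = "vgpu-stream"


def receive_memory_lookup(vm_id):
    """Stands in for receive_memory using the table after point 1."""
    return vgpu_receiver_sync.get(vm_id)  # None means no vgpu stream info


def run(schedule, vm_id="vm-1"):
    """Run the two steps in the given order; return what receive_memory saw."""
    vgpu_receiver_sync.clear()
    seen = None
    for step in schedule:
        if step == "init":
            receive_vgpu_init(vm_id)
        elif step == "lookup":
            seen = receive_memory_lookup(vm_id)
    return seen


good = run(["init", "lookup"])  # receive_vgpu wins the race
bad = run(["lookup", "init"])   # receive_memory wins the race
print(good, bad)
```

In the second schedule the lookup finds nothing, which corresponds to the
migration proceeding without the vgpu stream information.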
This patch adds a new 1-vgpu ACK signal, sent by the sending host just after
the 'Synchronisation point 1-vgpu' is reached, which the receiving host's
receive_memory thread will wait for before the table vgpu_receiver_sync is used.
Therefore, after this patch, both the sending and the receiving host will know
that the table vgpu_receiver_sync has been initialised and is ready to be used
after they get past the 1-vgpu ACK point.
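
A minimal sketch of this handshake, assuming each signal can be modelled as a
threading.Event and each host-side role as a Python thread. The real signals
travel over the migration connection between two hosts; the event names, the
5-second timeouts and the artificial delay are assumptions made for
illustration only.

```python
import threading
import time

vgpu_receiver_sync = {}             # table initialised by receive_vgpu
point_1_vgpu = threading.Event()    # receiver -> sender: table is initialised
ack_1_vgpu = threading.Event()      # sender -> receiver: the new 1-vgpu ACK
result = {}


def receive_vgpu(vm_id):
    time.sleep(0.05)                # deliberately slow, to provoke the race
    vgpu_receiver_sync[vm_id] = "vgpu-stream"
    point_1_vgpu.set()              # 'Synchronisation point 1-vgpu' reached


def sender():
    # The sending host only acknowledges once point 1-vgpu has been signalled.
    if point_1_vgpu.wait(timeout=5):
        ack_1_vgpu.set()


def receive_memory(vm_id):
    # New behaviour: block on the ACK before touching the table.
    if ack_1_vgpu.wait(timeout=5):
        result["seen"] = vgpu_receiver_sync.get(vm_id)


threads = [
    threading.Thread(target=receive_memory, args=("vm-1",)),
    threading.Thread(target=sender),
    threading.Thread(target=receive_vgpu, args=("vm-1",)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(result.get("seen"))
```

Even though receive_memory is started first and receive_vgpu is delayed, the
lookup cannot run until the ACK has been set, so it always finds the
initialised entry.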
This is an invasive change because it changes the xenopsd VM migration protocol,
and it is incompatible with the previous protocol: VMs using previous xenopsd
versions cannot be migrated with the changed protocol. For this reason, the
change only affects the VM migration protocol when a VGPU is present. This
means that:
- if a VGPU is not present, the VM migration still works from older xenopsd, so
it's backwards compatible.
- if a VGPU is present, the new protocol will not work with older xenopsd, so
it's not backwards compatible. Since this VGPU-migration is a new feature, this
is not a problem, because it's not present in a supported manner in older
versions of xenopsd.
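
The compatibility rule in the two bullets above can be written as a small
sketch. The function names are hypothetical and this is not how xenopsd
implements the check; it only restates that the extra signalling is used when
a VGPU is present and not otherwise.

```python
def extra_signals(vgpu_present):
    """Signals added on top of the original 'Synchronisation point 1'."""
    if vgpu_present:
        return ["Synchronisation point 1-vgpu", "1-vgpu ACK"]
    return []  # protocol unchanged when no VGPU is present


def works_with_older_xenopsd(vgpu_present):
    """Migration stays compatible only when no extra signals are exchanged."""
    return len(extra_signals(vgpu_present)) == 0


print(works_with_older_xenopsd(False), works_with_older_xenopsd(True))
```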
Signed-off-by: Marcus Granado <[email protected]>

1 parent a619065 commit 1849b89
1 file changed: 17 additions, 0 deletions

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 1660-1662 | 1660-1662 | unchanged context |
| | 1663-1664 | + (2 lines added) |
| 1663-1665 | 1665-1667 | unchanged context |
| 1682-1684 | 1684-1686 | unchanged context |
| | 1687-1701 | + (15 lines added) |
| 1685-1687 | 1702-1704 | unchanged context |