Commit 27d1c29

seize: fix pause devices for frozen containers
The container checkpointing procedure in Kubernetes freezes running containers to create a consistent snapshot of both the runtime state and the rootfs of the container. However, when checkpointing a GPU container, it must be unfrozen before invoking `cuda-checkpoint`. This is achieved in `prepare_freezer_for_interrupt_only_mode()`, which needs to be called before the `PAUSE_DEVICES` hook.

Fixes: checkpoint-restore#2514

Signed-off-by: Radostin Stoyanov <[email protected]>
1 parent f6c14ee commit 27d1c29

1 file changed, +14 -5 lines changed

criu/seize.c

Lines changed: 14 additions & 5 deletions
@@ -1060,22 +1060,31 @@ int collect_pstree(void)
 	 */
 	alarm(opts.timeout);
 
-	ret = run_plugins(PAUSE_DEVICES, pid);
-	if (ret < 0 && ret != -ENOTSUP) {
-		goto err;
-	}
-
 	if (opts.freeze_cgroup && cgroup_version())
 		goto err;
 
 	pr_debug("Detected cgroup V%d freezer\n", cgroup_v2 ? 2 : 1);
 
 	if (opts.freeze_cgroup && !compel_interrupt_only_mode) {
+		ret = run_plugins(PAUSE_DEVICES, pid);
+		if (ret < 0 && ret != -ENOTSUP) {
+			goto err;
+		}
+
 		if (freeze_processes())
 			goto err;
 	} else {
 		if (opts.freeze_cgroup && prepare_freezer_for_interrupt_only_mode())
 			goto err;
+
+		/* We need to call PAUSE_DEVICES after prepare_freezer_for_interrupt_only_mode()
+		 * to be able to checkpoint containers in a frozen state.
+		 */
+		ret = run_plugins(PAUSE_DEVICES, pid);
+		if (ret < 0 && ret != -ENOTSUP) {
+			goto err;
+		}
+
 		if (compel_interrupt_task(pid)) {
 			set_cr_errno(ESRCH);
 			goto err;
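
The reordering in this hunk can be illustrated outside of CRIU with a small stand-alone sketch. Everything below is hypothetical scaffolding rather than CRIU code: `seize_order()`, `trace`, and the four stub functions only mimic the branch structure of the patched `collect_pstree()` and record the order in which the steps run. The stub for the device hook returns `-ENOTSUP`, which the patched code treats as "no plugin handled it" rather than as a failure.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical trace buffer: records the order of the stubbed steps. */
static char trace[128];

static void note(const char *step)
{
	if (trace[0] != '\0')
		strncat(trace, ",", sizeof(trace) - strlen(trace) - 1);
	strncat(trace, step, sizeof(trace) - strlen(trace) - 1);
}

/* Stubs standing in for the real calls; names and return values are assumptions. */
static int pause_devices(void)    { note("pause");     return -ENOTSUP; }
static int freeze_processes(void) { note("freeze");    return 0; }
static int prepare_freezer(void)  { note("prepare");   return 0; }
static int interrupt_task(void)   { note("interrupt"); return 0; }

/*
 * Mirrors the branch structure after the patch:
 *  - freezer mode:        pause devices, then freeze
 *  - interrupt-only mode: prepare the freezer first, then pause devices,
 *                         then interrupt the task
 */
static int seize_order(bool freeze_cgroup, bool interrupt_only)
{
	int ret;

	trace[0] = '\0';

	if (freeze_cgroup && !interrupt_only) {
		ret = pause_devices();
		if (ret < 0 && ret != -ENOTSUP)
			return -1;
		if (freeze_processes())
			return -1;
	} else {
		if (freeze_cgroup && prepare_freezer())
			return -1;
		ret = pause_devices();
		if (ret < 0 && ret != -ENOTSUP)
			return -1;
		if (interrupt_task())
			return -1;
	}
	return 0;
}
```

In this sketch, `seize_order(true, true)` leaves `trace` as `prepare,pause,interrupt`, while `seize_order(true, false)` leaves it as `pause,freeze`. Before the patch, the device hook ran unconditionally ahead of both branches, so the interrupt-only path would have paused devices before the freezer was prepared.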
