rst0git (Member) commented Nov 4, 2024

Container runtimes like CRI-O and containerd use the freezer cgroup to create a consistent snapshot of container rootfs changes: the container is frozen before CRIU is invoked, a copy of the container rootfs diff is saved once CRIU completes successfully, and the container is then unfrozen. To support GPU checkpointing with these runtimes, CRIU needs to unfreeze the cgroup during the dump and restore it to its original state at the end.
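The "unfreeze, then restore the original state" idea can be sketched in a few lines. This is only an illustration and not CRIU's implementation (which is written in C): it assumes a cgroup v2 hierarchy, where a cgroup's `cgroup.freeze` file holds `1` (frozen) or `0` (running), and the names `temporarily_thawed` and `demo` are hypothetical. The demo runs against a temporary directory instead of a real cgroup, and the sketch does not wait for the kernel to report the new state.

```python
import contextlib
import tempfile
from pathlib import Path


@contextlib.contextmanager
def temporarily_thawed(cgroup_dir):
    """Thaw a (cgroup v2 style) cgroup for the duration of the block.

    Hypothetical helper: it remembers whether the cgroup was frozen,
    thaws it if needed, and puts the original state back on exit.
    """
    freeze_file = Path(cgroup_dir) / "cgroup.freeze"
    was_frozen = freeze_file.read_text().strip() == "1"
    if was_frozen:
        freeze_file.write_text("0")  # unfreeze before the GPU step
    try:
        yield was_frozen
    finally:
        if was_frozen:
            freeze_file.write_text("1")  # return to the original state


def demo():
    """Exercise the helper on a fake cgroup directory (no root needed)."""
    with tempfile.TemporaryDirectory() as fake_cgroup:
        freeze_file = Path(fake_cgroup) / "cgroup.freeze"
        freeze_file.write_text("1")  # the runtime froze the container
        states = [freeze_file.read_text()]
        with temporarily_thawed(fake_cgroup):
            states.append(freeze_file.read_text())  # state during the dump
        states.append(freeze_file.read_text())  # state after the dump
        return states
```

A cgroup that was not frozen to begin with is left untouched, so the helper is safe to use regardless of what the container runtime did beforehand.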

When the CUDA plugin is installed, container checkpointing with Kubernetes fails, even for containers that don't use GPUs. This patch aims to resolve this issue.

Fixes: #2508

rst0git requested a review from avagin November 4, 2024 20:06
rst0git marked this pull request as ready for review November 4, 2024 20:07
rst0git requested a review from adrianreber November 4, 2024 20:08
rst0git force-pushed the 2024-11-04-seize-checkpointing-freezen-containers branch from 70b3ad9 to 511d073 November 4, 2024 20:08
adrianreber (Member) commented

Doesn't this break the expectations of the container engines? You wrote that they freeze the container to avoid changes to the container file system. Does the container now continue to run with your change?

rst0git (Member, Author) commented Nov 5, 2024

> Does the container now continue to run with your change?

No, during checkpointing we seize the processes without using the freezer cgroup (see #2475 and #2470).
After `criu dump`, the container should remain in a frozen state.

rst0git force-pushed the 2024-11-04-seize-checkpointing-freezen-containers branch 2 times, most recently from d39fc15 to 979e277 November 7, 2024 12:06
avagin (Member) commented Nov 7, 2024

> Doesn't this break the expectations of the container engines? You wrote that they freeze the container to avoid changes to the container file system. Does the container now continue to run with your change?

Strictly speaking, this expectation was not right even before this change. CRIU itself changes the file system while dumping processes; for example, it creates ghost files.

I think the right expectation here is that file systems are not changed after the processes have been dumped, and that statement isn't affected by this change.

avagin (Member) commented Nov 8, 2024

LGTM. Thanks.

Container runtimes like CRI-O and containerd utilize the freezer cgroup
to create a consistent snapshot of container root filesystem (rootfs)
changes. In this case, the container is frozen before invoking CRIU.
After CRIU successfully completes, a copy of the container rootfs diff
is saved, and the container is then unfrozen.

However, the `cuda-checkpoint` tool is not able to perform a 'lock'
action on frozen threads.  To support GPU checkpointing with these
container runtimes, we need to unfreeze the cgroup and return it to its
original state once the checkpointing is complete.

To reflect this new behavior, the following changes are applied:
 - `dont_use_freeze_cgroup(void)` -> `set_compel_interrupt_only_mode(void)`
 - `bool freeze_cgroup_disabled` -> `bool compel_interrupt_only_mode`
 - `check_freezer_cgroup(void)` -> `prepare_freezer_for_interrupt_only_mode(void)`

Note that when `compel_interrupt_only_mode` is set to `true`,
`compel_interrupt_task()` is used instead of `freeze_processes()`
to prevent tasks from running during `criu dump`.

Fixes: checkpoint-restore#2508

Signed-off-by: Radostin Stoyanov <[email protected]>
rst0git force-pushed the 2024-11-04-seize-checkpointing-freezen-containers branch from 979e277 to 495e39e November 8, 2024 13:44
avagin merged commit 31b38d6 into checkpoint-restore:criu-dev Nov 12, 2024
38 of 41 checks passed
rst0git deleted the 2024-11-04-seize-checkpointing-freezen-containers branch November 12, 2024 09:20
rst0git added a commit to rst0git/criu that referenced this pull request May 11, 2025
The container checkpointing procedure in Kubernetes freezes running
containers to create a consistent snapshot of both the runtime state and
the rootfs of the container. However, when checkpointing a GPU container,
it must be unfrozen before invoking `cuda-checkpoint`. This is achieved
in `prepare_freezer_for_interrupt_only_mode()`, which needs to be called
before the `PAUSE_DEVICES` hook.

Fixes: checkpoint-restore#2514

Signed-off-by: Radostin Stoyanov <[email protected]>
rst0git added a commit to rst0git/criu that referenced this pull request May 11, 2025
rst0git added a commit to rst0git/criu that referenced this pull request May 11, 2025
The container checkpointing procedure in Kubernetes freezes running
containers to create a consistent snapshot of both the runtime state
and the rootfs of the container. However, when checkpointing a GPU
container, the container must be unfrozen before invoking the
cuda-checkpoint tool.

This is achieved in prepare_freezer_for_interrupt_only_mode(), which
needs to be called before the PAUSE_DEVICES hook. The patch introducing
this functionality fixes this problem for containers with multiple
processes. However, if the container has a single process,
prepare_freezer_for_interrupt_only_mode() must be invoked immediately
before the PAUSE_DEVICES hook.

Fixes: checkpoint-restore#2514

Signed-off-by: Radostin Stoyanov <[email protected]>
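The ordering constraint described in this commit message can be modelled with a small, self-contained sketch. It is an assumption-laden illustration rather than CRIU code: `Container`, `prepare_freezer`, `pause_devices_hook` and `dump` are hypothetical Python stand-ins for the container, `prepare_freezer_for_interrupt_only_mode()`, the `PAUSE_DEVICES` hook and the dump sequence, and the point at which the freezer state is restored is simplified.

```python
class FrozenError(RuntimeError):
    """Raised when a device 'lock' is attempted on frozen threads."""


class Container:
    def __init__(self, frozen):
        self.frozen = frozen


def prepare_freezer(container, log):
    """Stand-in for prepare_freezer_for_interrupt_only_mode():
    remember the freezer state and thaw the container."""
    log.append("prepare_freezer")
    was_frozen = container.frozen
    container.frozen = False
    return was_frozen


def pause_devices_hook(container, log):
    """Stand-in for the PAUSE_DEVICES hook: the lock action
    cannot be performed while the threads are frozen."""
    if container.frozen:
        raise FrozenError("cannot lock frozen threads")
    log.append("pause_devices")


def dump(container):
    """Model of the required order: thaw first, then pause devices,
    and put the freezer back into its original state at the end."""
    log = []
    was_frozen = prepare_freezer(container, log)
    try:
        pause_devices_hook(container, log)
        log.append("dump_tasks")
    finally:
        container.frozen = was_frozen
        log.append("restore_freezer")
    return log
```

Calling `pause_devices_hook()` on a still-frozen `Container` raises `FrozenError`, which is the failure mode the reordering avoids; in this model the same order applies whether the container has one process or several.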
rst0git added a commit to rst0git/criu that referenced this pull request May 14, 2025
avagin pushed a commit that referenced this pull request May 15, 2025
avagin pushed a commit to avagin/criu that referenced this pull request Oct 21, 2025
avagin pushed a commit to avagin/criu that referenced this pull request Nov 2, 2025
avagin pushed a commit to avagin/criu that referenced this pull request Nov 2, 2025
Linked issue: seccomp: Can't find entry on tid_real