Conversation

@avagin
Member

@avagin avagin commented Oct 21, 2025

No description provided.

@rst0git
Member

rst0git commented Oct 21, 2025

@avagin Would it be possible to include the patches from the following PRs in the release?

adrianreber and others added 10 commits November 2, 2025 07:42
Signed-off-by: Adrian Reber <[email protected]>
This is highly confusing, and it seems that the ret variable
is not handled in the subsequent process.

Signed-off-by: Yuanhong Peng <[email protected]>
The stack test incorrectly assumed the page immediately
following the stack pointer could never be changed. This doesn't work,
because this page can be a part of another mapping.

This commit introduces a dedicated "stack redzone," a small guard region
directly after the stack. The stack test is modified to specifically
check for corruption within this redzone.
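
A minimal sketch of the redzone idea, not the actual test code: a small guard region is filled with a known pattern and later checked byte by byte. The names `REDZONE_SIZE`, `REDZONE_BYTE`, `redzone_fill` and `redzone_intact` are hypothetical, and the size and pattern are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical size and fill pattern; the real test may use other values. */
#define REDZONE_SIZE 64
#define REDZONE_BYTE 0xA5

/* Fill the guard region that sits directly after the stack area. */
void redzone_fill(unsigned char *redzone)
{
	assert(redzone != NULL);
	memset(redzone, REDZONE_BYTE, REDZONE_SIZE);
}

/* Return 1 if every byte still holds the pattern, 0 if it was corrupted. */
int redzone_intact(const unsigned char *redzone)
{
	for (size_t i = 0; i < REDZONE_SIZE; i++)
		if (redzone[i] != REDZONE_BYTE)
			return 0;
	return 1;
}
```

Checking only the region the test owns avoids false failures when the neighbouring page belongs to another mapping.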

Signed-off-by: Andrei Vagin <[email protected]>
Thomas Gleixner introduced a new interface to create POSIX timers
with specified timer IDs:
torvalds/linux@ec2d0c0

Previously, CRIU recreated timers by repeatedly creating and deleting
them until the desired ID was reached. This approach isn't fast,
especially for timers with large IDs. For example, restoring two timers
with IDs 1000000 and 2000000 took approximately 1.5 seconds.

The new `prctl()` based interface allows direct creation of timers with
specified IDs, reducing the restoration time to around 3 microseconds
for the same example.
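
The two approaches can be sketched as follows. This is an illustration under assumptions, not CRIU's code: the `PR_TIMER_CREATE_RESTORE_IDS*` fallback values are taken from my reading of the kernel commit and should be verified against `linux/prctl.h`, and the helper names are hypothetical. The sketch uses the raw syscall so that it works with kernel timer IDs directly.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Fallback values (assumed); older headers do not define them. */
#ifndef PR_TIMER_CREATE_RESTORE_IDS
#define PR_TIMER_CREATE_RESTORE_IDS 77
#define PR_TIMER_CREATE_RESTORE_IDS_OFF 0
#define PR_TIMER_CREATE_RESTORE_IDS_ON 1
#endif

/* Raw timer_create(): returns the kernel timer ID, or -1 on error.
 * In restore mode the kernel reads the requested ID from *id. */
int raw_timer_create(int *id)
{
	struct sigevent sev;

	memset(&sev, 0, sizeof(sev));
	sev.sigev_notify = SIGEV_NONE; /* no signal delivery needed here */
	if (syscall(SYS_timer_create, CLOCK_MONOTONIC, &sev, id))
		return -1;
	return *id;
}

/* Old approach: create and delete timers until the wanted ID comes up. */
int create_timer_legacy(int want)
{
	for (;;) {
		int id = 0;

		if (raw_timer_create(&id) < 0)
			return -1;
		if (id == want)
			return 0;
		syscall(SYS_timer_delete, id);
		if (id > want)
			return -1; /* the wanted ID was skipped */
	}
}

/* New approach: request the ID directly; fall back if unsupported. */
int create_timer_with_id(int want)
{
	int id = want;
	int ret;

	if (prctl(PR_TIMER_CREATE_RESTORE_IDS,
		  (unsigned long)PR_TIMER_CREATE_RESTORE_IDS_ON, 0UL, 0UL, 0UL))
		return create_timer_legacy(want);
	ret = raw_timer_create(&id);
	prctl(PR_TIMER_CREATE_RESTORE_IDS,
	      (unsigned long)PR_TIMER_CREATE_RESTORE_IDS_OFF, 0UL, 0UL, 0UL);
	return ret == want ? 0 : -1;
}
```

The legacy loop needs one `timer_create()`/`timer_delete()` pair per skipped ID, which is why large IDs were slow to restore.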

Signed-off-by: Andrei Vagin <[email protected]>
When handling errors for functions such as `ptrace()`, `pipe()`, and
`fork()` it would be better to use `pr_perror` instead of `pr_err`
as it would include a message describing the encountered error.
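
To illustrate the difference, here is a small sketch with stand-in helpers. `format_err` and `format_perror` are hypothetical and write into a buffer; they only mimic the idea that a `pr_perror`-style message also carries the `errno` description.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a pr_err-style message: only the caller's text. */
int format_err(char *buf, size_t size, const char *msg)
{
	return snprintf(buf, size, "Error: %s", msg);
}

/* Stand-in for a pr_perror-style message: text plus strerror(errno). */
int format_perror(char *buf, size_t size, const char *msg)
{
	int saved_errno = errno; /* keep errno stable across library calls */

	return snprintf(buf, size, "Error: %s: %s", msg, strerror(saved_errno));
}
```

With `errno` set by a failed call, the second form tells the reader why the call failed, not only that it failed.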

Signed-off-by: Radostin Stoyanov <[email protected]>
The `goto interrupt` label is unnecessary as the code directly
returns after `cuda_process_checkpoint_action()`.

Signed-off-by: Radostin Stoyanov <[email protected]>
On a RHEL 8 based system building CRIU fails with:

criu/arch/aarch64/crtools.c: In function 'save_pac_keys':
criu/arch/aarch64/crtools.c:73:39: error: 'NT_ARM_PAC_ENABLED_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_PACA_KEYS'?
   ret = ptrace(PTRACE_GETREGSET, pid, NT_ARM_PAC_ENABLED_KEYS, &iov);
                                       ^~~~~~~~~~~~~~~~~~~~~~~
                                       NT_ARM_PACA_KEYS
criu/arch/aarch64/crtools.c:73:39: note: each undeclared identifier is reported only once for each function it appears in
criu/arch/aarch64/crtools.c: In function 'arch_ptrace_restore':
criu/arch/aarch64/crtools.c:261:44: error: 'NT_ARM_PAC_ENABLED_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_PACA_KEYS'?
   if ((ret = ptrace(PTRACE_SETREGSET, pid, NT_ARM_PAC_ENABLED_KEYS, &iov))) {
                                            ^~~~~~~~~~~~~~~~~~~~~~~
                                            NT_ARM_PACA_KEYS

This adds the missing define if it is undefined.
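
A fallback define of this kind typically looks like the sketch below. The value `0x40a` is what current Linux headers use to my knowledge; treat it as an assumption to verify. `pac_enabled_keys_regset` is a hypothetical helper added only so the constant is used.

```c
#include <assert.h>
#include <elf.h>

/* Older distro headers (e.g. RHEL 8) lack this regset note type. */
#ifndef NT_ARM_PAC_ENABLED_KEYS
#define NT_ARM_PAC_ENABLED_KEYS 0x40a
#endif

/* Illustration only: the note type a ptrace regset call would pass. */
int pac_enabled_keys_regset(void)
{
	return NT_ARM_PAC_ENABLED_KEYS;
}
```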

Signed-off-by: Adrian Reber <[email protected]>
Currently we save FP regs before parasite code runs, and restore after
for --leave-running, --check-only, and in case of errors. In case of
errors the error may have happened before FP regs were saved, so we
should only restore them if they were actually saved.
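
The fix boils down to tracking whether the save actually happened. A minimal sketch, with a hypothetical `struct fp_state` standing in for the real per-thread state:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the saved FP register state. */
struct fp_state {
	bool saved;             /* set only after a successful save */
	unsigned char regs[16]; /* placeholder for the real register set */
};

void fp_save(struct fp_state *st, const unsigned char *live)
{
	memcpy(st->regs, live, sizeof(st->regs));
	st->saved = true;
}

/* Returns true if registers were written back, false if nothing was saved. */
bool fp_restore_if_saved(const struct fp_state *st, unsigned char *live)
{
	if (!st->saved)
		return false; /* error hit before the save: leave registers alone */
	memcpy(live, st->regs, sizeof(st->regs));
	return true;
}
```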

Signed-off-by: Younes Manton <[email protected]>
CRIU locks the network during restore in an "empty" network namespace.
However, "empty" in this context means CRIU isn't restoring the
namespace. This network namespace can be the same namespace where
processes have been dumped and so the network is already locked in it.

Fixes checkpoint-restore#2650

Signed-off-by: Andrei Vagin <[email protected]>
Building the CRIU package on Debian 11 aarch64 fails with:

criu/arch/aarch64/crtools.c: In function 'save_pac_keys':
criu/arch/aarch64/crtools.c:32:31: error: storage size of 'paca' isn't known
  struct user_pac_address_keys paca;
                               ^~~~
criu/arch/aarch64/crtools.c:33:31: error: storage size of 'pacg' isn't known
  struct user_pac_generic_keys pacg;
                               ^~~~
criu/arch/aarch64/crtools.c:47:15: error: 'HWCAP_PACA' undeclared (first use in this function); did you mean 'HWCAP_FCMA'?
  if (hwcaps & HWCAP_PACA) {
               ^~~~~~~~~~
               HWCAP_FCMA
criu/arch/aarch64/crtools.c:47:15: note: each undeclared identifier is reported only once for each function it appears in
criu/arch/aarch64/crtools.c:53:44: error: 'NT_ARM_PACA_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_SVE'?
   if ((ret = ptrace(PTRACE_GETREGSET, pid, NT_ARM_PACA_KEYS, &iov))) {
                                            ^~~~~~~~~~~~~~~~
                                            NT_ARM_SVE
criu/arch/aarch64/crtools.c:73:39: error: 'NT_ARM_PAC_ENABLED_KEYS' undeclared (first use in this function)
   ret = ptrace(PTRACE_GETREGSET, pid, NT_ARM_PAC_ENABLED_KEYS, &iov);
                                       ^~~~~~~~~~~~~~~~~~~~~~~
criu/arch/aarch64/crtools.c:82:15: error: 'HWCAP_PACG' undeclared (first use in this function); did you mean 'HWCAP_AES'?
  if (hwcaps & HWCAP_PACG) {
               ^~~~~~~~~~
               HWCAP_AES
criu/arch/aarch64/crtools.c:88:44: error: 'NT_ARM_PACG_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_SVE'?
   if ((ret = ptrace(PTRACE_GETREGSET, pid, NT_ARM_PACG_KEYS, &iov))) {
                                            ^~~~~~~~~~~~~~~~
                                            NT_ARM_SVE
criu/arch/aarch64/crtools.c:33:31: error: unused variable 'pacg' [-Werror=unused-variable]
  struct user_pac_generic_keys pacg;
                               ^~~~
criu/arch/aarch64/crtools.c:32:31: error: unused variable 'paca' [-Werror=unused-variable]
  struct user_pac_address_keys paca;
                               ^~~~
criu/arch/aarch64/crtools.c: In function 'arch_ptrace_restore':
criu/arch/aarch64/crtools.c:227:31: error: storage size of 'upaca' isn't known
  struct user_pac_address_keys upaca;
                               ^~~~~
criu/arch/aarch64/crtools.c:228:31: error: storage size of 'upacg' isn't known
  struct user_pac_generic_keys upacg;
                               ^~~~~
criu/arch/aarch64/crtools.c:241:18: error: 'HWCAP_PACA' undeclared (first use in this function); did you mean 'HWCAP_FCMA'?
   if (!(hwcaps & HWCAP_PACA)) {
                  ^~~~~~~~~~
                  HWCAP_FCMA
criu/arch/aarch64/crtools.c:255:44: error: 'NT_ARM_PACA_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_SVE'?
   if ((ret = ptrace(PTRACE_SETREGSET, pid, NT_ARM_PACA_KEYS, &iov))) {
                                            ^~~~~~~~~~~~~~~~
                                            NT_ARM_SVE
criu/arch/aarch64/crtools.c:261:44: error: 'NT_ARM_PAC_ENABLED_KEYS' undeclared (first use in this function)
   if ((ret = ptrace(PTRACE_SETREGSET, pid, NT_ARM_PAC_ENABLED_KEYS, &iov))) {
                                            ^~~~~~~~~~~~~~~~~~~~~~~
criu/arch/aarch64/crtools.c:268:18: error: 'HWCAP_PACG' undeclared (first use in this function); did you mean 'HWCAP_AES'?
   if (!(hwcaps & HWCAP_PACG)) {
                  ^~~~~~~~~~
                  HWCAP_AES
criu/arch/aarch64/crtools.c:275:44: error: 'NT_ARM_PACG_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_SVE'?
   if ((ret = ptrace(PTRACE_SETREGSET, pid, NT_ARM_PACG_KEYS, &iov))) {
                                            ^~~~~~~~~~~~~~~~
                                            NT_ARM_SVE
criu/arch/aarch64/crtools.c:233:6: error: variable 'ret' set but not used [-Werror=unused-but-set-variable]
  int ret;
      ^~~
criu/arch/aarch64/crtools.c:228:31: error: unused variable 'upacg' [-Werror=unused-variable]
  struct user_pac_generic_keys upacg;
                               ^~~~~
criu/arch/aarch64/crtools.c:227:31: error: unused variable 'upaca' [-Werror=unused-variable]
  struct user_pac_address_keys upaca;
                               ^~~~~
This patch adds the missing constants and structs if undefined.
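
A sketch of what such fallbacks can look like. The constant values and struct layouts follow the Linux arm64 UAPI headers as far as I know and should be verified; the `compat_` struct names are hypothetical and chosen so they cannot clash with headers that already define the originals. How the actual patch guards the struct definitions is not shown here.

```c
#include <assert.h>

/* Fallbacks for headers that predate pointer authentication support. */
#ifndef HWCAP_PACA
#define HWCAP_PACA (1UL << 30)
#endif
#ifndef HWCAP_PACG
#define HWCAP_PACG (1UL << 31)
#endif
#ifndef NT_ARM_PACA_KEYS
#define NT_ARM_PACA_KEYS 0x407
#endif
#ifndef NT_ARM_PACG_KEYS
#define NT_ARM_PACG_KEYS 0x408
#endif

/* Hypothetical compat_ names mirroring the kernel layouts. */
struct compat_user_pac_address_keys {
	__uint128_t apiakey;
	__uint128_t apibkey;
	__uint128_t apdakey;
	__uint128_t apdbkey;
};

struct compat_user_pac_generic_keys {
	__uint128_t apgakey;
};
```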

Signed-off-by: Radostin Stoyanov <[email protected]>
Snorch and others added 11 commits November 2, 2025 07:48
Mount flags belong to the mount and the mount namespace of the container,
so we should preserve them, as a container user will not expect mounts to
switch between ro and rw over c/r.

Fixes: checkpoint-restore#2632

v5: fix both mount-v1 and mount-v2

Signed-off-by: Pavel Tikhomirov <[email protected]>
Add {'bind': 'path/to/bindmount'} zdtm descriptor option, so that in
test mount namespace a directory bindmount can be created before running
the test.

This is useful to leave the test directory writable (e.g. for logs) while
the test makes the root mount readonly. Note: we create this bindmount
early so that all test files are opened on it initially and not on the
underlying mount. Will be used in the mnt_ro_root test.

Signed-off-by: Pavel Tikhomirov <[email protected]>
It makes root mount readonly and checks that it is still readonly after
migration.

Make zdtm/static writable for logs via "bind" desc option.

v2: explain why we don't have explicit rw/ro flag check
v3: use new zdtm "bind" desc option

Signed-off-by: Pavel Tikhomirov <[email protected]>
With Go version 1.24, ListenConfig now uses MPTCP by default [1].
Checkpoint/restore for this protocol is not currently supported
and adding support requires kernel changes that are not trivial
to implement. As a result, checkpointing of many containers that
run Go programs is likely to fail with the following error [2]:

(00.026522) Error (criu/sk-inet.c:130): inet: Unsupported proto 262 for socket 2f9bc5

This patch adds a message with suggested workaround for this problem.

[1] https://go.dev/doc/go1.24#netpkgnet
[2] checkpoint-restore#2655

Signed-off-by: Radostin Stoyanov <[email protected]>
In some cases, they might not work in virtual machines if the hypervisor
doesn't virtualize them. For example, they don't work in AMD SEV virtual
machines if the Debug Virtualization extension isn't supported or isn't
enabled in SEV_FEATURES.

Fixes checkpoint-restore#2658

Signed-off-by: Andrei Vagin <[email protected]>
In 0a7c5fd we swapped the BSD
implementation of strlcat and strlcpy in favor of our own replacement.

The checks and the predefined macros are not needed anymore.

Signed-off-by: Lorenzo Fontana <[email protected]>
The container checkpointing procedure in Kubernetes freezes running
containers to create a consistent snapshot of both the runtime state
and the rootfs of the container. However, when checkpointing a GPU
container, the container must be unfrozen before invoking the
cuda-checkpoint tool.

This is achieved in prepare_freezer_for_interrupt_only_mode(), which
needs to be called before the PAUSE_DEVICES hook. The patch introducing
this functionality fixes this problem for containers with multiple
processes. However, if the container has a single process,
prepare_freezer_for_interrupt_only_mode() must be invoked immediately
before the PAUSE_DEVICES hook.

Fixes: checkpoint-restore#2514

Signed-off-by: Radostin Stoyanov <[email protected]>
Building CRIU on Ubuntu 20.04 fails with the following error:

criu/sk-inet.c: In function 'can_dump_ipproto':
criu/sk-inet.c:131:16: error: 'IPPROTO_MPTCP' undeclared (first use in this function); did you mean 'IPPROTO_MTP'?
  131 |   if (proto == IPPROTO_MPTCP)
      |                ^~~~~~~~~~~~~
      |                IPPROTO_MTP

Add definition for MPTCP to fix this error.
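
Protocol number 262 in the earlier "Unsupported proto 262" error is `IPPROTO_MPTCP`. A sketch of the fallback define together with a trimmed-down check: `proto_is_dumpable` is a hypothetical helper, and the hint wording is an assumption rather than the message the patch prints.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>

/* Older libc headers (e.g. Ubuntu 20.04) do not provide this number. */
#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif

/* Hypothetical check: returns 1 if the protocol can be dumped, else 0. */
int proto_is_dumpable(int proto)
{
	if (proto == IPPROTO_MPTCP) {
		/* Hint text is illustrative only. */
		fprintf(stderr, "MPTCP sockets are not supported; "
				"consider disabling MPTCP in the application\n");
		return 0;
	}
	return proto == IPPROTO_TCP || proto == IPPROTO_UDP;
}
```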

Signed-off-by: Radostin Stoyanov <[email protected]>
Currently, in the target process, device-related restore operations and
other restore operations almost run sequentially. When the target
process executes the corresponding CRIU hook functions, it can't perform
other restore operations.  However, for GPU applications, some device
restore operations have no logical dependencies on other common restore
operations and can be parallelized with other operations to speed up the
process.

Instead of launching a thread in child processes for parallelization,
this patch chooses to add a new hook, `POST_FORKING`, in the main CRIU
process to handle these restore operations. This is because the
restoration of memory state in the restore blob is one of the most
time-consuming parts of all restore logic. The main CRIU process can
easily parallelize these operations, whereas parallelizing in threads
within child processes is challenging.

- POST_FORKING

*POST_FORKING: Hook to enable the main CRIU process to perform some
restore operations of plugins.
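
The overlap that `POST_FORKING` enables can be pictured with a plain `fork()` sketch. This is not the plugin API: `post_forking_hook_t`, `restore_with_post_forking` and `sample_hook` are hypothetical names, and the child here is an empty placeholder for a restored task.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical hook type: work the main process does after forking. */
typedef int (*post_forking_hook_t)(void);

/* Fork a child standing in for a restored task, run the hook in the
 * parent while the child works, then wait. Returns 0 if both succeed. */
int restore_with_post_forking(post_forking_hook_t hook)
{
	int status, hook_ret;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0); /* placeholder for the child's own restore work */

	hook_ret = hook(); /* overlaps with the child's work */

	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;
	return hook_ret;
}

/* Example hook that only counts its invocations. */
int hook_calls;

int sample_hook(void)
{
	hook_calls++;
	return 0;
}
```

Running the hook in the parent keeps the child code path unchanged, which matches the reasoning above for not starting threads inside child processes.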

Signed-off-by: Yanning Yang <[email protected]>
Currently, when CRIU calls `cr_plugin_init`, `fdstore` is not
initialized. However, during the plugin restore procedure, there may be
some common file operations used in multiple hooks. This patch moves
`cr_plugin_init` after `fdstore_init`, allowing `cr_plugin_init` to use
`fdstore` to place these file operations.

Signed-off-by: Yanning Yang <[email protected]>
Currently, parallel restore only focuses on the single-process
situation. Therefore, it needs an interface to know if there is only one
process to restore. This patch adds a `has_children` function in
`pstree.h` and replaces some existing implementations with this
function.
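
A helper like this is usually a one-liner. The sketch below uses a hypothetical, simplified tree node; CRIU's real process-tree structure differs, so only the shape of the check is meant to carry over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified process-tree node. */
struct tree_item {
	struct tree_item *first_child;
	struct tree_item *next_sibling;
};

/* True if the item has at least one child process. */
bool has_children(const struct tree_item *item)
{
	return item->first_child != NULL;
}
```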

Signed-off-by: Yanning Yang <[email protected]>
svilenkov and others added 27 commits November 5, 2025 15:41
1. create shadow stack vma during vma_remap cycle
2. copy contents from a premapped non-shstk VMA into it
3. unmap premapped non-shstk VMA
4. Mark shstk VMA for remap into the final destination
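
The four steps above can be sketched with ordinary anonymous memory. This is an assumption-heavy illustration: a real shadow stack VMA needs a dedicated kernel interface, the sketch moves the mapping immediately instead of marking it for a later remap, and `copy_and_remap` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>

/* Returns the final address, or MAP_FAILED on error. */
void *copy_and_remap(void *premapped, size_t len, void *final_addr)
{
	/* 1. create the new VMA (plain anonymous memory in this sketch) */
	void *vma = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (vma == MAP_FAILED)
		return MAP_FAILED;

	/* 2. copy contents from the premapped VMA into it */
	memcpy(vma, premapped, len);

	/* 3. unmap the premapped VMA */
	if (munmap(premapped, len))
		return MAP_FAILED;

	/* 4. move the new VMA to its final destination */
	return mremap(vma, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, final_addr);
}
```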

Signed-off-by: Igor Svilenkov Bozic <[email protected]>
Co-Authored-By: Andrei Vagin <[email protected]>
Co-Authored-By: Alexander Mikhalitsyn <[email protected]>
[ alex: debugging, rework together with Andrei and code cleanup ]
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
* call shstk_vma_restore() for VMA_AREA_SHSTK in vma_remap()
* delete map/copy/unmap from shstk_restore() and keep token setup + finalize
* previously the loop naturally stopped at cet->ssp-8, so a -8 nudge is now required here

Signed-off-by: Igor Svilenkov Bozic <[email protected]>
Co-Authored-By: Andrei Vagin <[email protected]>
[ alex: small code cleanups ]
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
* add SHSTK_ENABLE=1 toggle
* passes -mshstk to compiler and -z shstk to linker

Example:
  $ make -C test/zdtm/static clean
  $ make -C test/zdtm/static V=1 SHSTK_ENABLE=1 env00

  $ readelf --notes test/zdtm/static/env00 | grep SHSTK
      Properties: x86 feature: SHSTK

Signed-off-by: Igor Svilenkov Bozic <[email protected]>
Co-Authored-By: Andrei Vagin <[email protected]>
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
We use LGPL-v2.1 license for the libcriu and pycriu as they are
intended to be usable by both proprietary and open-source applications.

Signed-off-by: Andrii Herheliuk <[email protected]>
Signed-off-by: Radostin Stoyanov <[email protected]>
pycriu depends on protobuf to function correctly. Currently,
it raises an error if protobuf is not installed. Adding
protobuf to the dependencies ensures it is available after
installing pycriu.

Signed-off-by: Andrii Herheliuk <[email protected]>
Regardless of the actual error message, "Unknown" was always appended
to the end of the string, resulting in messages like:
"DUMP failed: Error(3): No process with such pidUnknown".

Fixed by changing standalone if statements to else-if blocks so
"Unknown" is only added when no specific error condition matches.

Signed-off-by: Andrii Herheliuk <[email protected]>
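
The "Unknown" suffix bug described in the commit above comes from a trailing `else` that is attached only to the last standalone `if`. The original code is Python in pycriu; the C sketch below reproduces one plausible shape of the pattern, with hypothetical error codes and helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical error codes for illustration. */
enum { ERR_EXISTS = 1, ERR_NO_PROCESS = 3 };

/* Append s to buf without overflowing it; size must exceed strlen(buf). */
void append(char *buf, size_t size, const char *s)
{
	strncat(buf, s, size - strlen(buf) - 1);
}

/* Buggy shape: the else belongs only to the second if. */
void describe_buggy(int err, char *buf, size_t size)
{
	buf[0] = '\0';
	if (err == ERR_NO_PROCESS)
		append(buf, size, "No process with such pid");
	if (err == ERR_EXISTS)
		append(buf, size, "Already exists");
	else
		append(buf, size, "Unknown");
}

/* Fixed shape: one else-if chain, so "Unknown" is the true fallback. */
void describe_fixed(int err, char *buf, size_t size)
{
	buf[0] = '\0';
	if (err == ERR_NO_PROCESS)
		append(buf, size, "No process with such pid");
	else if (err == ERR_EXISTS)
		append(buf, size, "Already exists");
	else
		append(buf, size, "Unknown");
}
```
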
This patch consolidates the action-script tests into
`test/others/action-script` to ensure all tests are executed
consistently and reduce duplication. Since we had two tests that appear
to do the same thing, we can remove the one that doesn't use zdtm.py.

Signed-off-by: Radostin Stoyanov <[email protected]>
The existing test collects all action-script hooks triggered during
`h`, `ns`, and `uns` runs with ZDTM into `actions_called.txt`, then
verifies that each hook appears at least once. However, the test does
not verify that hooks are invoked *exactly once* or in *correct order*.

This change updates the test to run ZDTM only with ns flavour as this
seems to cover all action-script hooks, and checks that all hooks are
called correctly.
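
The stricter check amounts to comparing the recorded hook sequence with the expected one. A sketch under the assumption that expected hook names are unique; `hooks_called_in_order` is a hypothetical helper, not the test's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `called` lists exactly the hooks in `expected`, each once
 * and in the same order; otherwise return 0. */
int hooks_called_in_order(const char *const *called, size_t n_called,
			  const char *const *expected, size_t n_expected)
{
	if (n_called != n_expected)
		return 0; /* a hook is missing or was called more than once */
	for (size_t i = 0; i < n_expected; i++)
		if (strcmp(called[i], expected[i]) != 0)
			return 0; /* wrong hook at this position */
	return 1;
}
```

An "appears at least once" check would accept both a duplicated hook and a swapped pair; the sequence comparison rejects both.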

Signed-off-by: Radostin Stoyanov <[email protected]>
Don't install external pip dependencies when running `make install`.
As we are not really into developing a Python project, we should
not install additional packages. CRIU does that nowhere else.

Signed-off-by: Radostin Stoyanov <[email protected]>
These dependencies are required for `pip install`.

Signed-off-by: Radostin Stoyanov <[email protected]>
which is used in Makefiles to check for dependencies:

Example:
export USE_ASCIIDOCTOR ?= $(shell which asciidoctor 2>/dev/null)

Signed-off-by: Radostin Stoyanov <[email protected]>
Unlike "which", which is a separate executable not always installed by
default, "command -v" is a shell built-in available at least for bash,
dash, and busybox shell.

Unlike "which", "command -v" is also easier to grep for, and it is
already used in a few places here.

Inspired by commit 57251d8.

Signed-off-by: Kir Kolyshkin <[email protected]>
Container runtimes that use libcriu (e.g., crun) need to specify a CRIU
configuration file that allows overwriting default options set via RPC.
This is particularly useful to set options such as `--tcp-established`
via `/etc/criu/runc.conf` in Kubernetes.

Signed-off-by: Radostin Stoyanov <[email protected]>
Use system-installed CRIU binary instead of a local file

Thanks to @avagin for suggesting this solution.

Co-authored-by: Andrei Vagin <[email protected]>
Signed-off-by: Andrii Herheliuk <[email protected]>
[Errno 2] No such file or directory -> Socket file not found.
[Errno 111] Connection refused -> Service not running.
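
The mapping above is implemented in Python in pycriu; the same idea in C, as a sketch with a hypothetical helper name, looks like this:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Map a connect() errno to a hint; hint texts follow the list above. */
const char *describe_connect_error(int err)
{
	switch (err) {
	case ENOENT:
		return "Socket file not found.";
	case ECONNREFUSED:
		return "Service not running.";
	default:
		return strerror(err); /* no specific hint: use the system text */
	}
}
```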

Signed-off-by: Andrii Herheliuk <[email protected]>
This change allows users to call criu.use_sk() without any
parameters to use the default socket name.

Co-authored-by: Radostin Stoyanov <[email protected]>
Signed-off-by: Andrii Herheliuk <[email protected]>
Move the code that opens the images directory, resolves its absolute
path via readlink(), selects the work_dir, and chdir()s into it into a
new function: setup_images_and_workdir(). This reduces the size of
`setup_opts_from_req()`, improves its readability, and allows this
functionality to be reused.

While at it, change open_image_dir() to take a const char *dir
parameter, reflecting that the path is not modified by the function and
allowing callers to pass string literals without casts.

No functional changes are intended.

Signed-off-by: Radostin Stoyanov <[email protected]>
Commit 9089ce8 ("service: use setproctitle") extended cr-service to
get the full path of images_dir using readlink(). However, the RPC
API was later extended to allow a custom path (folder) to
be set instead of passing a file descriptor, which causes readlink()
to fail as the path is not a symbolic link.
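
The failure mode is easy to reproduce: `readlink()` on a path that is not a symbolic link fails with `EINVAL`. A small sketch, with `try_readlink` as a hypothetical wrapper:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <unistd.h>

/* Return 0 if `path` is a symlink that could be read, else -errno. */
int try_readlink(const char *path, char *buf, size_t size)
{
	ssize_t n = readlink(path, buf, size - 1);

	if (n < 0)
		return -errno; /* -EINVAL when `path` is not a symbolic link */
	buf[n] = '\0'; /* readlink() does not terminate the string */
	return 0;
}
```

Passing a `/proc/self/fd/N` path works because those entries are symlinks; passing a plain directory path does not.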

It would be better to drop the code setting the images-dir path as a
string in the proctitle.

Fixes: checkpoint-restore#2794

Suggested-by: Andrei Vagin <[email protected]>
Co-authored-by: Andrii Herheliuk <[email protected]>
Signed-off-by: Radostin Stoyanov <[email protected]>
Move the images_dir selection logic from setup_opts_from_req() into a
new function: resolve_images_dir_path(). This improves readability and
allows the code to be reused. While at it, use snprintf() instead of
sprintf() for the /proc path and ensure NULL termination after strncpy().
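
A simplified sketch of such a helper, showing the bounded `snprintf()` and the explicit termination after `strncpy()`. The function name `resolve_dir_path` and its signature are hypothetical and do not reflect the real `resolve_images_dir_path()`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Fill `out` from a directory fd (if fd >= 0) or from a path string.
 * Returns 0 on success, -1 on failure. */
int resolve_dir_path(int fd, const char *path, char *out, size_t size)
{
	if (fd >= 0) {
		char link[64];
		ssize_t n;

		/* snprintf() bounds the write, unlike sprintf() */
		snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
		n = readlink(link, out, size - 1);
		if (n < 0)
			return -1;
		out[n] = '\0';
		return 0;
	}
	if (!path)
		return -1;
	strncpy(out, path, size - 1);
	out[size - 1] = '\0'; /* strncpy() may leave the buffer unterminated */
	return 0;
}
```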

Signed-off-by: Radostin Stoyanov <[email protected]>
Move the logging initialization into a helper function that
can be reused.

No functional change intended.

Signed-off-by: Radostin Stoyanov <[email protected]>
The check() functionality is very different from dump, pre-dump,
and restore. It is used only to check if the kernel supports required
features, and does not need the majority of options set via RPC.

In particular, we don't need to open `image_dir` when running `check()`
because this functionality doesn't create or process image files. In
this case, `image_dir` is used as `work_dir`, only when the latter is
not specified and a log file is used.

This patch updates the RPC options parser so that it only handles the
logging options when check() is used. Logging to a file is required when
log_file is explicitly set or log_to_stderr is not used. In such a case, we
also resolve images_dir and work_dir where the log file will be created.
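
The condition described here can be written as a small predicate. `struct check_opts` is a hypothetical subset of the RPC options, used only to make the rule explicit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the RPC options relevant to check(). */
struct check_opts {
	const char *log_file; /* NULL when not set */
	bool log_to_stderr;
};

/* A log file, and therefore images_dir/work_dir resolution, is needed
 * when log_file is set explicitly or stderr logging is off. */
bool check_needs_log_file(const struct check_opts *o)
{
	return o->log_file != NULL || !o->log_to_stderr;
}
```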

Fixes: checkpoint-restore#2758

Suggested-by: Andrei Vagin <[email protected]>
Signed-off-by: Radostin Stoyanov <[email protected]>
This allows users to specify RPC options when
using the check() functionality.

Co-authored-by: Andrii Herheliuk <[email protected]>
Signed-off-by: Radostin Stoyanov <[email protected]>
__init__.py defines the public API for pycriu. It is important to use
explicit imports to avoid leaking every symbol from criu.py into the
pycriu namespace. This avoids import-time side effects, prevents name
collisions, and circular-import traps.

Fixes the following lint error:
  F403 `from .criu import *` used; unable to detect undefined names

Signed-off-by: Radostin Stoyanov <[email protected]>
The --mntns-compat-mode option is no longer parsed with CHECK.
Use --log-file instead to test the error message.

Signed-off-by: Radostin Stoyanov <[email protected]>
Use nr_pages when available, falling back to compat_nr_pages
for compatibility.
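
The fallback can be sketched as below. The struct is a hypothetical, simplified entry; the field widths and the `has_nr_pages` presence flag are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified entry with both page-count fields. */
struct pagemap_entry {
	bool has_nr_pages;
	uint64_t nr_pages;
	uint32_t compat_nr_pages;
};

/* Prefer nr_pages when present, otherwise use the compat field. */
uint64_t entry_nr_pages(const struct pagemap_entry *e)
{
	return e->has_nr_pages ? e->nr_pages : e->compat_nr_pages;
}
```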

Signed-off-by: alam0rt <[email protected]>
Signed-off-by: Radostin Stoyanov <[email protected]>
@avagin avagin merged commit cb8e1da into checkpoint-restore:master Nov 5, 2025
19 of 40 checks passed