Checkpoint / Restore Functionality with CRIU #200

Open
opened 2025-12-29 00:20:08 +01:00 by adam · 6 comments

Originally created by @christosnc on GitHub (Jun 11, 2021).

Hello,

I would like to say this is a great project. I managed to install macOS on Ubuntu 20.04. As others have pointed out, booting is very slow (especially for CI/CD), so I would like a suspend feature that lets me start straight into an already-booted system.

I found Docker's experimental checkpoint feature. To set it up, I did the following:

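sudo systemctl stop docker
# Write the setting with tee: in "sudo echo ... >> file" the redirection runs as the calling user, not root.
# This overwrites /etc/docker/daemon.json; merge by hand if the file already exists.
echo '{"experimental": true}' | sudo tee /etc/docker/daemon.json
sudo add-apt-repository ppa:criu/ppa
sudo apt-get update
sudo apt-get -y install criu
sudo systemctl start docker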
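
Before attempting a checkpoint, it may help to confirm that the config file really enables experimental mode. A minimal sketch, assuming the config lives at /etc/docker/daemon.json as in the steps above; `has_experimental_flag` is a hypothetical helper name, and the grep is a loose text match rather than a real JSON parse:

```shell
# Hypothetical helper: succeeds if the given daemon.json enables experimental mode.
# Loose text match, not a JSON parser.
has_experimental_flag() {
  grep -Eq '"experimental"[[:space:]]*:[[:space:]]*true' "$1"
}

# Usage (path as in the steps above):
#   has_experimental_flag /etc/docker/daemon.json && echo "experimental enabled in config"
# The running daemon can also be asked directly:
#   docker version --format '{{.Server.Experimental}}'
```

The file check only shows what is configured; the `docker version` query shows what the running daemon actually loaded.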

Then I should be able to boot into macOS, and create a checkpoint with:

docker checkpoint create --leave-running=true <container-id> checkpoint-1

(I tried starting the container with and without -ai)

But the checkpoint command always fails with:

Error response from daemon: Cannot checkpoint container: runc did not terminate successfully: criu failed: type NOTIFY errno 0

Am I missing something, or is it impossible to create checkpoints for this project?
(And is there any other solution that I am missing? I couldn't find anything else that suits my needs.)

Thanks!


@christosnc commented on GitHub (Jun 11, 2021):

PS:

I also tried with and without --leave-running=true, with and without sudo, and with both the short and the full container ID.

Also, sudo criu check --all returns "Looks good."

The following errors appear at the bottom of the CRIU log:

(00.564986) Error (criu/proc_parse.c:453): Unknown shit 600 (anon_inode:kvm-vcpu:3)
(00.565000) Error (criu/proc_parse.c:661): Can't open 2730's mapfile link 7f5567a43000: No such device or address
(00.565009) Error (criu/cr-dump.c:1250): Collect mappings (pid: 2730) failed with -1
(00.565137) Unlock network
(00.565141) Running network-unlock scripts
(00.565144)     RPC
(00.569669) Unfreezing tasks into 1
(00.569708)     Unseizing 2568 into 1
(00.569722)     Unseizing 2729 into 1
(00.569734)     Unseizing 2738 into 1
(00.569753)     Unseizing 2730 into 1
(00.569971) Error (criu/cr-dump.c:1768): Dumping FAILED.
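
The first error line names the mapping CRIU could not handle. A minimal sketch for pulling that name out of a dump log, assuming the message format shown above; `unsupported_anon_inodes` and the file name `dump.log` are hypothetical, not part of CRIU:

```shell
# Hypothetical helper: read a CRIU dump log on stdin and print the
# anon_inode name from each "Unknown shit ... (anon_inode:NAME)" line.
unsupported_anon_inodes() {
  sed -n 's/.*Unknown shit [0-9]* (anon_inode:\(.*\))[[:space:]]*$/\1/p'
}

# Usage:
#   unsupported_anon_inodes < dump.log
```

For the log above this prints kvm-vcpu:3, which points at the KVM vCPU mapping held by the QEMU process inside the container.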

@christosnc commented on GitHub (Jun 13, 2021):

Update

I also tried creating a snapshot directly in QEMU with the savevm command, but this fails as well:

Error: State blocked by non-migratable CPU device (invtsc flag)
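
QEMU refuses to save state here because the invtsc CPU flag marks the CPU as non-migratable. One thing to try is dropping that flag from the -cpu argument; whether the macOS guest still boots and behaves correctly without it is not confirmed in this thread. A minimal sketch, assuming the flag appears as +invtsc or invtsc=on in a comma-separated -cpu value; `strip_invtsc` and the sample CPU string are hypothetical:

```shell
# Hypothetical helper: remove the invtsc flag from a comma-separated
# QEMU -cpu value and print the result.
strip_invtsc() {
  printf '%s\n' "$1" | sed -e 's/,+invtsc//g' -e 's/,invtsc=on//g'
}

# Usage (sample value, not taken from this project's launch script):
#   strip_invtsc 'Penryn,+invtsc,vmware-cpuid-freq=on'
```

The result would then be passed to QEMU's -cpu option in place of the original value before retrying savevm.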

@sickcodes commented on GitHub (Jun 16, 2021):

Hey, this is very interesting, and I would also like to save the state of the machine. I'll take a look at this during the week.


@christosnc commented on GitHub (Jun 19, 2021):

That's awesome! I hope we can get something working.


@KernelDash commented on GitHub (Jan 11, 2024):

I get a similar error when dumping processes that use GPU acceleration. Here are the logs:

(00.019628) Error (criu/proc_parse.c:467): Unknown shit 600 (anon_inode:i915.gem)
(00.019645) Error (criu/proc_parse.c:694): Can't open 74009's mapfile link 7f2e13200000: No such device or address
(00.019655) Error (criu/cr-dump.c:1558): Collect mappings (pid: 74009) failed with -1
(00.019768) net: Unlock network
(00.019776) Unfreezing tasks into 1
(00.019779) Unseizing 74009 into 1
(00.020261) Error (criu/cr-dump.c:2093): Dumping FAILED.

Edit: I think it's because of the GPU, since the log mentions "anon_inode:i915.gem" and i915 is a GPU driver.
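
A process can be checked for such mappings before a dump is attempted. A minimal sketch, assuming anon_inode mappings show up in the pathname column of /proc/<pid>/maps; `list_anon_inode_mappings` is a hypothetical helper name:

```shell
# Hypothetical helper: print the distinct anon_inode mappings found in
# a maps file (column 6 is the pathname in /proc/<pid>/maps).
list_anon_inode_mappings() {
  awk '$6 ~ /^anon_inode:/ { print $6 }' "$1" | sort -u
}

# Usage (PID taken from the log above):
#   list_anon_inode_mappings /proc/74009/maps
```

An empty result does not guarantee that the dump will succeed; it only rules out this particular class of mapping.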


@silenceli commented on GitHub (Oct 28, 2025):

I hit the same issue when dumping a vLLM process on a T4 GPU:

(base) root@rcuda001:~# criu dump --shell-job --images-dir demo2 --tree 49826 --ghost-limit $((10 * 1024 * 1024))
Error (cuda_plugin.c:222): cuda_plugin: Failed to launch cuda-checkpoint to retrieve restore tid: Could not find restore thread for process ID 49988

Error (cuda_plugin.c:222): cuda_plugin: Failed to launch cuda-checkpoint to retrieve restore tid: Could not find restore thread for process ID 49988

Error (criu/proc_parse.c:479): Unknown shit 600 (anon_inode:[io_uring])
Error (criu/proc_parse.c:707): Can't open 49826's mapfile link 7240f0c5e000: No such device or address
Error (criu/cr-dump.c:1570): Collect mappings (pid: 49826) failed with -1
Error (criu/cr-dump.c:2111): Dumping FAILED.

Reference: starred/Docker-OSX#200