Here, we investigate the behaviour of LXD when moving containers between LXD cluster nodes, with a focus on various types of (filesystem) snapshots.

LXD can snapshot its containers itself, but with a ZFS storage backend one can also use a tool like Sanoid to snapshot a container’s filesystem. When moving an LXD container from one cluster node to another, one of course wants those filesystem snapshots to move along as well. Spoiler: this isn’t always the case.

Let’s create a test container on my home LXD cluster (which uses ZFS as default storage backend), starting on node wiske2:

lxc launch ubuntu:22.04 snapmovetest --target=wiske2

Check the container is running:

lxc list snapmovetest

+--------------+---------+-----------------------+-------------------------------------------+-----------+-----------+----------+
|     NAME     |  STATE  |         IPV4          |                   IPV6                    |   TYPE    | SNAPSHOTS | LOCATION |
+--------------+---------+-----------------------+-------------------------------------------+-----------+-----------+----------+
| snapmovetest | RUNNING | 192.168.10.158 (eth0) | 2a10:3781:782:1:216:3eff:fed5:ef48 (eth0) | CONTAINER | 0         | wiske2   |
+--------------+---------+-----------------------+-------------------------------------------+-----------+-----------+----------+

Now, let’s use LXD to create two snapshots:

lxc snapshot snapmovetest "Test1"
sleep 10
lxc snapshot snapmovetest "Test2"

Check the snapshots have been made:

lxc info snapmovetest | awk '$1=="Snapshots:" {toprint=1}; {if(toprint==1) {print $0}}'

Snapshots:
+-------+----------------------+------------+----------+
| NAME  |       TAKEN AT       | EXPIRES AT | STATEFUL |
+-------+----------------------+------------+----------+
| Test1 | 2023/03/11 22:22 CET |            | NO       |
+-------+----------------------+------------+----------+
| Test2 | 2023/03/11 22:22 CET |            | NO       |
+-------+----------------------+------------+----------+

At the ZFS level:

zfs list -rtall rpool/lxd/containers/snapmovetest

NAME                                               USED  AVAIL     REFER  MOUNTPOINT
rpool/lxd/containers/snapmovetest                 24.7M   192G      748M  legacy
rpool/lxd/containers/snapmovetest@snapshot-Test1    60K      -      748M  -
rpool/lxd/containers/snapmovetest@snapshot-Test2    60K      -      748M  -

All is fine! Now, let’s move the container to node wiske3:

lxc stop snapmovetest
lxc move snapmovetest snapmovetest --target=wiske3
lxc list snapmovetest

+--------------+---------+------+------+-----------+-----------+----------+
|     NAME     |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS | LOCATION |
+--------------+---------+------+------+-----------+-----------+----------+
| snapmovetest | STOPPED |      |      | CONTAINER | 2         | wiske3   |
+--------------+---------+------+------+-----------+-----------+----------+

Check the snapshots:

lxc info snapmovetest | awk '$1=="Snapshots:" {toprint=1}; {if(toprint==1) {print $0}}'

Snapshots:
+-------+----------------------+------------+----------+
| NAME  |       TAKEN AT       | EXPIRES AT | STATEFUL |
+-------+----------------------+------------+----------+
| Test1 | 2023/03/11 22:22 CET |            | NO       |
+-------+----------------------+------------+----------+
| Test2 | 2023/03/11 22:22 CET |            | NO       |
+-------+----------------------+------------+----------+

At the ZFS level:

zfs list -rtall rpool/lxd/containers/snapmovetest

NAME                                               USED  AVAIL     REFER  MOUNTPOINT
rpool/lxd/containers/snapmovetest                  749M   202G      748M  legacy
rpool/lxd/containers/snapmovetest@snapshot-Test1    60K      -      748M  -
rpool/lxd/containers/snapmovetest@snapshot-Test2    60K      -      748M  -

So far so good: snapshots taken with the native LXD toolchain get moved. Now let’s manually create a ZFS snapshot:

zfs snapshot rpool/lxd/containers/snapmovetest@manual_zfs_snap
zfs list -rtall rpool/lxd/containers/snapmovetest

NAME                                                USED  AVAIL     REFER  MOUNTPOINT
rpool/lxd/containers/snapmovetest                   749M   202G      748M  legacy
rpool/lxd/containers/snapmovetest@snapshot-Test1     60K      -      748M  -
rpool/lxd/containers/snapmovetest@snapshot-Test2     60K      -      748M  -
rpool/lxd/containers/snapmovetest@manual_zfs_snap     0B      -      748M  -
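
Before moving a container, it can help to see which snapshots on its dataset LXD does not know about. The following is a minimal sketch, assuming (based on the listing above) that LXD names its own ZFS snapshots `<dataset>@snapshot-<name>`. The function name `non_lxd_snapshots` is made up for this example, and a manual snapshot that happens to start with `snapshot-` would slip through:

```shell
# Print the ZFS snapshot names read from stdin that do not follow the
# "@snapshot-<name>" naming observed for LXD snapshots in this post.
# Hypothetical helper; expects one snapshot name per line.
non_lxd_snapshots() {
    # grep exits non-zero when nothing matches; treat that as "none found"
    grep -v '@snapshot-' || true
}

# Intended use on the node that currently hosts the container:
#   zfs list -H -o name -t snapshot -r rpool/lxd/containers/snapmovetest | non_lxd_snapshots
```

Anything this prints is a snapshot that only ZFS knows about.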

Now move the container back to node wiske2:

lxc move snapmovetest snapmovetest --target=wiske2
lxc list snapmovetest

+--------------+---------+------+------+-----------+-----------+----------+
|     NAME     |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS | LOCATION |
+--------------+---------+------+------+-----------+-----------+----------+
| snapmovetest | STOPPED |      |      | CONTAINER | 2         | wiske2   |
+--------------+---------+------+------+-----------+-----------+----------+

What happened to the snapshots?

lxc info snapmovetest | awk '$1=="Snapshots:" {toprint=1}; {if(toprint==1) {print $0}}'

Snapshots:
+-------+----------------------+------------+----------+
| NAME  |       TAKEN AT       | EXPIRES AT | STATEFUL |
+-------+----------------------+------------+----------+
| Test1 | 2023/03/11 22:22 CET |            | NO       |
+-------+----------------------+------------+----------+
| Test2 | 2023/03/11 22:22 CET |            | NO       |
+-------+----------------------+------------+----------+

zfs list -rtall rpool/lxd/containers/snapmovetest

NAME                                               USED  AVAIL     REFER  MOUNTPOINT
rpool/lxd/containers/snapmovetest                  749M   191G      748M  legacy
rpool/lxd/containers/snapmovetest@snapshot-Test1    60K      -      748M  -
rpool/lxd/containers/snapmovetest@snapshot-Test2    60K      -      748M  -

Somehow, the ZFS-level snapshot has been removed… I guess this part of the LXD manual should be written in bold (emphasis mine):

LXD assumes that it has full control over the ZFS pool and dataset. Therefore, you should never maintain any datasets or file system entities that are not owned by LXD in a ZFS pool or dataset, because LXD might delete them.
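
To make this kind of loss visible, one could record the ZFS snapshot names before and after a move and compare the two lists. This is a rough sketch, not an LXD feature: `lost_snapshots` is a hypothetical helper, and both lists are assumed to have been saved to files in the same place, since they originate on different nodes.

```shell
# Print snapshot names present in the "before" list but absent from the
# "after" list. Both arguments are files with one snapshot name per line.
# Hypothetical helper, not part of LXD or ZFS.
lost_snapshots() {
    before_sorted=$(mktemp)
    after_sorted=$(mktemp)
    sort "$1" > "$before_sorted"
    sort "$2" > "$after_sorted"
    # comm -23 keeps the lines that occur only in the first file
    comm -23 "$before_sorted" "$after_sorted"
    rm -f "$before_sorted" "$after_sorted"
}

# Intended use (file names are illustrative):
#   zfs list -H -o name -t snapshot -r rpool/lxd/containers/snapmovetest > before.txt  # on the source node
#   lxc move snapmovetest snapmovetest --target=wiske2
#   zfs list -H -o name -t snapshot -r rpool/lxd/containers/snapmovetest > after.txt   # on the target node
#   lost_snapshots before.txt after.txt
```

Empty output means every snapshot in the first list is also in the second.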

Consequently, in an LXD cluster one shouldn’t use Sanoid to make snapshots of ZFS-backed LXD container filesystems. Instead, use LXD’s built-in automatic snapshot capabilities (see the snapshots.expiry and snapshots.schedule options).
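
As a sketch of what that could look like for a single container: the wrapper below sets both options with `lxc config set`. The function name `enable_auto_snapshots`, the `LXC` override variable and the example values are assumptions for illustration, not taken from the test above. The same keys can presumably also be set in a profile.

```shell
# Command used to talk to LXD; override (e.g. LXC=echo) for a dry run.
LXC="${LXC:-lxc}"

# Hypothetical wrapper: set a snapshot schedule and expiry on one instance.
#   $1 = instance name
#   $2 = schedule (cron expression or an alias such as @daily)
#   $3 = expiry (e.g. 2w for two weeks)
enable_auto_snapshots() {
    "$LXC" config set "$1" snapshots.schedule "$2" &&
    "$LXC" config set "$1" snapshots.expiry "$3"
}

# Example (values are illustrative):
#   enable_auto_snapshots snapmovetest "@daily" 2w
```

Snapshots created this way are LXD snapshots, so they should behave like Test1 and Test2 above when the container is moved.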

Clean up:

lxc delete snapmovetest
