Debian 12 + LXD/LXC security.idmap.isolated fails - eviltoast

On Debian 12.1 (kernel 6.1.0-11-amd64) running LXD/LXC, setting security.idmap.isolated=true on an unprivileged container seems to fail to update the owner/group of the container’s files.

Here is an example:

# lxc launch images:debian/12 debian
(...)

# lxc config get debian volatile.idmap.base
296608

# lxc stop debian
Error: The instance is already stopped

# lxc config set debian security.idmap.isolated true

# lxc config get debian security.idmap.isolated
true

# lxc start debian

Now, if I list the files on the container volume, they’re all owned by the host’s root user:

# ls -la /mnt/NVME1/lxd/containers/debian/rootfs/
total 24
drwxr-xr-x 1 root   root  154 Sep  5 06:28 .
d--x------ 1 296608 root   78 Sep  5 15:59 ..
lrwxrwxrwx 1 root   root    7 Sep  5 06:25 bin -> usr/bin
drwxr-xr-x 1 root   root    0 Jul 14 17:00 boot
drwxr-xr-x 1 root   root    0 Sep  5 06:28 dev
drwxr-xr-x 1 root   root 1570 Sep  5 06:28 etc

I tried multiple versions of LXD/LXC. This happens with 5.0.2 from apt as well as with 4.0 and 5.17 (latest) from snap.

Interestingly, I have another machine running Debian 10 (kernel 4.19.0-25-amd64) with an older LXD 4 from snap, and on that one things work as expected:

# ls -la /mnt/NVME1/lxd/containers/debian/rootfs/
total 0
drwxr-xr-x 1 1065536 1065536  138 Oct 29  2020 .
d--x------ 1 1065536 root      78 Oct 14  2020 ..
drwxr-xr-x 1 1065536 1065536 1328 Jul 24 19:07 bin
drwxr-xr-x 1 1065536 1065536    0 Sep 19  2020 boot
drwxr-xr-x 1 1065536 1065536    0 Oct 14  2020 dev
drwxr-xr-x 1 1065536 1065536 1716 Jul 24 19:08 etc

As you can see, on this system all the files are owned by 1065536:1065536.
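The old on-disk method boils down to simple arithmetic: every container-side uid/gid is offset by the map's host base and the result is written to disk. A minimal sketch, assuming a single contiguous range starting at container id 0 (as in the maps shown in this thread); shift_to_host is a hypothetical helper, not LXD code:

```python
def shift_to_host(ns_id: int, host_base: int, map_range: int = 65536) -> int:
    """Return the host id an on-disk shift would store for a container-side id.

    Hypothetical helper: assumes one range mapping container ids
    0..map_range-1 onto host ids host_base..host_base+map_range-1.
    """
    if not 0 <= ns_id < map_range:
        raise ValueError(f"id {ns_id} is outside the mapped range")
    return host_base + ns_id


# Container root (0) is stored as 1065536, matching the Debian 10 listing above.
print(shift_to_host(0, 1065536))     # 1065536
# A regular container user such as uid 1000 would be stored as 1066536.
print(shift_to_host(1000, 1065536))  # 1066536
```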


Update:

I probed the maps with lxc config show debian on both machines and saw this:

Machine running Debian 10:

security.idmap.isolated: "true"
(...)
volatile.idmap.base: "1065536"
volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1065536,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1065536,"Nsid":0,"Maprange":65536}]'
volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1065536,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1065536,"Nsid":0,"Maprange":65536}]'
volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1065536,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1065536,"Nsid":0,"Maprange":65536}]'

Machine running Debian 12:

security.idmap.isolated: "true"
(...)
volatile.idmap.base: "231072"
volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":231072,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":231072,"Nsid":0,"Maprange":65536}]'
volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":231072,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":231072,"Nsid":0,"Maprange":65536}]'
volatile.last_state.idmap: '[]'
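The volatile.idmap.* keys hold JSON arrays of map entries, one per range, each flagged as a uid and/or gid range. A short sketch that decodes them, using the field names from the output above; IdMapEntry and parse_volatile_idmap are hypothetical names chosen for illustration:

```python
import json
from typing import NamedTuple


class IdMapEntry(NamedTuple):
    """One range from a volatile.idmap.* value (field names as LXD prints them)."""
    is_uid: bool
    is_gid: bool
    host_id: int
    ns_id: int
    map_range: int


def parse_volatile_idmap(raw: str) -> list[IdMapEntry]:
    """Decode the JSON array stored in a volatile.idmap.* key."""
    return [
        IdMapEntry(e["Isuid"], e["Isgid"], e["Hostid"], e["Nsid"], e["Maprange"])
        for e in json.loads(raw)
    ]


current = ('[{"Isuid":true,"Isgid":false,"Hostid":231072,"Nsid":0,"Maprange":65536},'
           '{"Isuid":false,"Isgid":true,"Hostid":231072,"Nsid":0,"Maprange":65536}]')
last_state = '[]'

entries = parse_volatile_idmap(current)
uid_map = [e for e in entries if e.is_uid]
gid_map = [e for e in entries if e.is_gid]
print(uid_map[0].host_id, uid_map[0].map_range)  # 231072 65536
print(parse_volatile_idmap(last_state))          # []
```

Decoding the current/next values from both machines gives the same shape (one 65536-id range for uids, one for gids); only the host base differs.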

Why wasn’t volatile.last_state.idmap populated (it’s still '[]')?

How can I fix it? Thank you.

  • TCB13@lemmy.world (OP) · 1 year ago

    Apparently this is by design: it’s a feature of newer kernels. Here is a good explanation by Stéphane Graber, maintainer of LXC:

    Prior to VFS idmap being available, we needed to work around file ownership by having LXD manually rewrite the owner of every single file on disk. That’s what you’re showing here on an older kernel.

    On newer kernels, this is no longer needed as we can have the kernel keep the permissions on-disk unshifted and just shift in-kernel so the ownership looks correct inside of the container.

    What you’re showing above looks like a perfectly working setup on a kernel that does support VFS idmap.
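The difference can be modeled in a few lines. This is a simplified sketch, not kernel code: it assumes a single range mapping container ids 0..65535 to host ids starting at 231072 (the values from /proc/self/uid_map in this thread), and the function names are hypothetical. Files keep their unshifted ids on disk, which is why the host listing shows root:root; the shift is applied only when an id crosses the idmapped mount.

```python
BASE, COUNT = 231072, 65536  # assumed single range: container 0 -> host 231072


def to_kernel(ns_id: int) -> int:
    """Container-side id -> kernel/host-side id."""
    if not 0 <= ns_id < COUNT:
        raise ValueError(f"{ns_id} is not mapped")
    return BASE + ns_id


def to_container(kernel_id: int) -> int:
    """Kernel/host-side id -> container-side id (the reverse map)."""
    if not BASE <= kernel_id < BASE + COUNT:
        raise ValueError(f"{kernel_id} is not mapped")
    return kernel_id - BASE


def read_owner(disk_id: int) -> tuple[int, int]:
    """Reading a file: the idmapped mount shifts the on-disk id up, and the
    container's user namespace reports it back in container terms."""
    kernel_id = to_kernel(disk_id)
    return kernel_id, to_container(kernel_id)


def write_owner(container_id: int) -> tuple[int, int]:
    """Creating a file: the process's kernel-side id goes through the
    reverse map, so the file is stored on disk unshifted."""
    kernel_id = to_kernel(container_id)
    return kernel_id, to_container(kernel_id)


print(read_owner(0))      # (231072, 0): on-disk root appears as root in the container
print(write_owner(1000))  # (232072, 1000): container uid 1000 is stored as 1000
```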

    I could indeed confirm this on the host machine:

    root@vm-debian-12-cli:~# lxc info | grep 'shift\|idmap'
    - storage_shifted
        idmapped_mounts: "true"
        shiftfs: "false"
        idmapped_mounts_v2: "true"
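This check can be scripted. A minimal sketch that scans lxc info output for the kernel feature flags shown above; it does a plain line scan rather than full YAML parsing, and kernel_features is a hypothetical helper. On a real host the text could come from subprocess.run(["lxc", "info"], capture_output=True, text=True).stdout.

```python
WANTED = ("idmapped_mounts", "idmapped_mounts_v2", "shiftfs")


def kernel_features(info_text: str, wanted=WANTED) -> dict[str, bool]:
    """Collect 'key: "true"/"false"' flags from lxc info output (line scan, not YAML)."""
    found: dict[str, bool] = {}
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in wanted:
            found[key] = value.strip().strip('"') == "true"
    return found


sample = """\
- storage_shifted
    idmapped_mounts: "true"
    shiftfs: "false"
    idmapped_mounts_v2: "true"
"""
features = kernel_features(sample)
print(features)  # {'idmapped_mounts': True, 'shiftfs': False, 'idmapped_mounts_v2': True}
print(features.get("idmapped_mounts", False))  # True
```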
    

    And inside the container, the root mount point also shows up as idmapped (last line):

    root@debian:~# cat /proc/self/uid_map
             0     231072      65536
    
    root@debian:~# cat /proc/self/gid_map
             0     231072      65536
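Each line of /proc/self/uid_map and gid_map reads: first id inside the namespace, first id outside it, and the number of ids in the range. A small sketch that parses the format and translates an id; parse_id_map and map_to_host are hypothetical helpers. Inside a container the text could be read with open("/proc/self/uid_map").read().

```python
def parse_id_map(text: str) -> list[tuple[int, int, int]]:
    """Parse uid_map/gid_map text into (ns_start, host_start, count) tuples."""
    ranges = []
    for line in text.splitlines():
        if line.strip():
            ns_start, host_start, count = (int(f) for f in line.split())
            ranges.append((ns_start, host_start, count))
    return ranges


def map_to_host(ranges: list[tuple[int, int, int]], ns_id: int) -> int:
    """Translate a namespace id to the host id using the parsed ranges."""
    for ns_start, host_start, count in ranges:
        if ns_start <= ns_id < ns_start + count:
            return host_start + (ns_id - ns_start)
    raise ValueError(f"{ns_id} is not mapped")


uid_map = parse_id_map("         0     231072      65536\n")
print(uid_map)                  # [(0, 231072, 65536)]
print(map_to_host(uid_map, 0))  # 231072
```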
    
    root@debian:~# cat /proc/self/mountinfo
    490 460 0:24 /@rootfs/mnt/NVME1/lxd/containers/debian/rootfs / rw,relatime,idmapped shared:251 master:1 - btrfs /dev/sda1 rw,space_cache=v2,user_subvol_rm_allowed,subvolid=259,subvol=/@rootfs/mnt/NVME1/lxd/containers/debian
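In mountinfo, the fields before the lone "-" separator are: mount ID, parent ID, major:minor, root, mount point, per-mount options, then optional fields such as shared:251. The idmapped flag appears among the per-mount options. A minimal sketch that lists idmapped mount points; idmapped_mount_points is a hypothetical helper, and it relies on the kernel escaping spaces in paths (as \040) so that splitting on whitespace is safe.

```python
def idmapped_mount_points(mountinfo: str) -> list[str]:
    """Return the mount points whose per-mount options include 'idmapped'."""
    points = []
    for line in mountinfo.splitlines():
        if not line.strip():
            continue
        before_sep = line.split(" - ", 1)[0]  # drop fs type, source, super options
        fields = before_sep.split()
        mount_point, mount_opts = fields[4], fields[5]
        if "idmapped" in mount_opts.split(","):
            points.append(mount_point)
    return points


line = ("490 460 0:24 /@rootfs/mnt/NVME1/lxd/containers/debian/rootfs / "
        "rw,relatime,idmapped shared:251 master:1 - btrfs /dev/sda1 "
        "rw,space_cache=v2,user_subvol_rm_allowed,subvolid=259,"
        "subvol=/@rootfs/mnt/NVME1/lxd/containers/debian")
print(idmapped_mount_points(line))  # ['/']
```

Inside a container the input could be read with open("/proc/self/mountinfo").read().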
    

    To disable this, one could add an override to LXD’s systemd unit that passes the environment variable LXD_IDMAPPED_MOUNTS_DISABLE=1.

    However, according to Mr. Graber, we shouldn’t do that:

    Okay, so your system is operating perfectly normally and with the lowest overhead possible right now, nothing to be worried about.

    The old pre-start shifting method was very slow and very risky as a crash or failure to shift a particular bit of metadata (ACL, xattr, …) could allow for a security issue with the container. It was also horrible for CoW filesystems as it effectively made it look like every single file in the container had been modified, potentially duplicating GBs of data.

    shiftfs (which was an Ubuntu-specific hack) and now the proper VFS idmap shifting, simply have the kernel apply the reverse uidmap/gidmap on any filesystem operation to a mount that’s marked as idmapped. It’s an extremely trivial operation to perform, allows for dynamic changes to the container maps (very useful for isolated), allows for sharing data between containers and properly supports everything that can hold a uid/gid (ioctl, xattr, acl, …) so doing away with the risk of having missed something.
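For context, security.idmap.isolated gives each isolated container its own host id range, unique among isolated instances, so isolated containers never share host ids with each other. Per the explanation above, because on-disk ids stay unshifted, moving a container to a different range needs no file rewrite. A sketch of the non-overlap property being guaranteed, using hypothetical bases for two containers; ranges_overlap is a hypothetical helper:

```python
def ranges_overlap(a_start: int, a_count: int, b_start: int, b_count: int) -> bool:
    """True if host id ranges [a_start, a_start+a_count) and [b_start, b_start+b_count) intersect."""
    return a_start < b_start + b_count and b_start < a_start + a_count


# Hypothetical isolated containers placed back to back: no shared host ids.
print(ranges_overlap(231072, 65536, 296608, 65536))  # False
# Hypothetical containers sharing one map: every host id is shared.
print(ranges_overlap(231072, 65536, 231072, 65536))  # True
```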