mirror of https://github.com/systemd/systemd synced 2026-03-01 18:54:47 +01:00

Compare commits

63 Commits

Author SHA1 Message Date
Daan De Meyer
e6be5fb720 ssh-proxy: Support ssh machine/xxx for nspawn containers 2026-02-19 21:32:29 +01:00
David Santamaría Rogado
3b51529cbf hwdb: sensor: hp use board product name as hp-wmi
Doing so also covers the 14t-fh000, the product name that initial
units of the OmniBook Ultra Flip 14 had; this is intended.

Order the entries by product name.

Follow up: fadb0b53f7d8d2d9e9d8dd141bc05de9116b083a.
2026-02-20 04:54:53 +09:00
Daan De Meyer
59a83c37bf ci: Simplify musl build setup
No need to set up symlink farms, we can just use the host's /usr/include
now.
2026-02-19 20:04:06 +01:00
Daan De Meyer
5ab8bba2b0 meson: Explicitly check for musl for gshadow and nss
This allows building with musl on glibc systems as follows:

env \
    CC=musl-gcc \
    CXX=musl-gcc \
    CFLAGS="-idirafter /usr/include" \
    CXXFLAGS="-idirafter /usr/include" \
        meson setup --auto-features=disabled -Dlibc=musl musl
2026-02-19 20:04:06 +01:00
Nandakumar Raghavan
fd6506eb9a repart: return 1 from probe_sector_size_prefer_ioctl() on block device success
probe_sector_size() returns 1 when it successfully determines the sector size,
0 when falling back to the default. blockdev_get_sector_size() returns 0 on
success. probe_sector_size_prefer_ioctl() was passing the blockdev_get_sector_size()
return value through directly, so callers checking r > 0 to detect a
successfully probed sector size never saw it for block devices.

In context_load_partition_table(), this caused fs_secsz to stay at 4096 bytes
even on 512-byte sector block devices, making verity hash partition sizes wrong
unless --sector-size=512 was passed explicitly.

Fix by returning 1 on success from the block device path to match probe_sector_size()
convention.
2026-02-19 17:43:56 +01:00
Yu Watanabe
551227e3a0
Python modernization followups (#40755) 2026-02-20 01:33:07 +09:00
Yu Watanabe
e819c31c05 NEWS: move and extend entry for PTP device permission
Follow-up for 1e6854e112e9723be6108b83f6935ec7e04cea17.
2026-02-20 01:25:25 +09:00
Yu Watanabe
5329f4bf76 man: fix typo
Follow-up for 6b22ac31afcfab53dc9b51d6b5f7862e52607923.
2026-02-20 01:18:59 +09:00
Yu Watanabe
e94c410e51 man: fix typo
Follow-up for eb581ff6d9556d29f1b9b57d6a40c4adefde16a6.
2026-02-20 01:18:54 +09:00
Yu Watanabe
fcf5f1db94 mstack: fix typo
Follow-up for 8343032a86b62f62780de85a696ab8f9d2632244.
2026-02-20 01:14:30 +09:00
Yu Watanabe
5d01b27ffc import: fix typo
Follow-up for a9f6ba04969d6eb2e629e30299fab7538ef42a57.
2026-02-20 01:13:02 +09:00
Yu Watanabe
9bc3c2e54a TODO: fix typo
Follow-up for 3bbada87e290f3f0c2ca17f4f10396ec037b03c9.
2026-02-20 01:11:30 +09:00
Lennart Poettering
0036d62e6e
importd: add support for downloading OCI images (#39621)
This adds the ability to download OCI images via importd. 

Not a fan of the OCI format tbh, in particular its security properties
are a bit sad. But I guess it exists and is very popular, hence we might
as well add support for it, even if it comes at much weaker security
properties than DDIs.

Fixes #36447
2026-02-19 16:43:11 +01:00
Lennart Poettering
d911ca8454
Bring Bash profile for reporting context via Operating System Commands (OSC) into compliance with specifications (#40696)
This script fails to comply with the spec it's designed to implement,
[UAPI.15 OSC 3008: Hierarchical Context
Signalling](https://uapi-group.org/specifications/specs/osc_context/),
and fails to correctly utilize the specs provided by
[POSIX.1-2024](https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/mindex.html)
and [man 1
bash](https://www.man7.org/linux/man-pages//man1/bash.1.html); improve
compliance.

Changes are made in small atomic commits, with more detailed
descriptions of the work done in each message.
2026-02-19 15:50:24 +01:00
Zbigniew Jędrzejewski-Szmek
a6b328fd95 elf2efi: modernize typing annotations
We still need Union and Optional as long as compat with Python 3.9
is needed.
2026-02-19 15:34:10 +01:00
Zbigniew Jędrzejewski-Szmek
3c7bb0c37b elf2efi: make mypy-clean 2026-02-19 15:34:10 +01:00
Zbigniew Jędrzejewski-Szmek
528a939c89 elf2efi: import whole module, not individual symbols
When reading the code, it was hard to figure out if the given name was
imported or a local class. And the renaming of imports also made it
harder to look things up online. Arguably, the deeply nested import
structure and inconsistent naming in elftools is partially to blame:
there is just no good way to make this look nice. But anyway, let's use
the usual style of importing the module and using names prefixed with
the module path so that the origin of imported names is clear.

elftools.elf.elffile is imported separately, because a) it needs to be
imported separately anyway because the module does lazy imports
internally, b) the name already indicates the origin, c) it is used in
quite a few places so the shorter name is nice.
2026-02-19 15:34:10 +01:00
Zbigniew Jędrzejewski-Szmek
469879aa44 generate-sym-test: skip everything that is not a file
The generator looks for files in the filesystem, and it sometimes fails
on emacs "lock files" which are a symlink. Ignore those.
2026-02-19 15:33:59 +01:00
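The filtering idea in 469879aa44 can be sketched in shell as follows. This is only an illustration under stated assumptions, not the generator's code: the function name `list_regular_files` is hypothetical, and the sketch assumes the problematic lock files are dangling symlinks, which `[ -f ]` rejects because it follows the link.

```shell
# Hypothetical helper: print only the entries of directory $1 that are regular files.
list_regular_files() {
    for path in "$1"/* "$1"/.[!.]*; do
        # -f follows symlinks, so a dangling symlink (or an unmatched glob) fails the test.
        [ -f "$path" ] || continue
        printf '%s\n' "$path"
    done
}
```

Directories and dangling symlinks are skipped silently, so a stray editor lock file no longer aborts the run.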
Yaping Li
b0fba6abd3 metrics: fix casing for metrics names (take 2)
Change the casing for metrics names to mimic properties exposed via
varlink/dbus: Use PascalCase.
2026-02-19 15:30:18 +01:00
Daan De Meyer
b6bdd540f9 machine: Fix cid passed to machine_add_from_params()
The default value is VMADDR_CID_ANY, not zero.
2026-02-19 15:17:10 +01:00
Lennart Poettering
3bbada87e2 update TODO 2026-02-19 15:08:20 +01:00
Lennart Poettering
c86a72af04 ci: drop 'Ex' suffix from transient props
The "Ex" is mostly internal, and our parsers will append it
automatically when needed
2026-02-19 15:08:20 +01:00
Lennart Poettering
e31ee582fc ci: add test for OCI downloading 2026-02-19 15:08:20 +01:00
Lennart Poettering
eb581ff6d9 man: document everything we just added 2026-02-19 15:08:20 +01:00
Lennart Poettering
cab2caa170 mountpoint-util: fix typo in comment 2026-02-19 15:08:20 +01:00
Lennart Poettering
6ebaea6990 portable: fix log levels
portable_extract_by_path() and install_image() can't agree whether to be
of the "logging" or "non-logging" kind
2026-02-19 15:08:20 +01:00
Lennart Poettering
968f3ebd20 discover-image: make sure we can remove mstacks 2026-02-19 15:08:20 +01:00
Lennart Poettering
e93fd71094 core: introduce PinnedResource
This introduces PinnedResources as a structure combining pinned
references to a root directory, root image, or root mstack. This is not
only easier to work with, but essential to make certain unpriv things
work, as we need some mechanism to pin resources before we drop into a
userns which might possibly not provide access anymore to those
resources.

Hence this does two things: introduce the new structure, and immediately
hook it up so that we pin things properly before dropping into userns,
and then makes use of this after dropping the right way, and enables
unpriv userns operation.

The concept is generic enough to eventually implement extension images +
mount images with the same structure, but in order to keep the changes
manageable this is left for another time.

(This also makes one further clean-up: client-side verity-reuse checks
are moved server side if we are unpriv. Previously we'd do them client
side, but they were doomed to fail because of lack of privs. Hence let's
drop the client side if we are unpriv and purely do them server-side in
that case.)
2026-02-19 15:08:20 +01:00
Lennart Poettering
68ed4f3c66 mountfsd,nsresource: allow recycling mountfsd/nsresourced client connections
So far we opened a new Varlink connection for every mountfsd/nsresourced
method call. Given each tool only does a very small number of calls
(usually 1…5) on them and the connections are cheap this is not too
wasteful. Nonetheless, let's do something about it, and allow reusing
the connection for multiple calls.

This not only makes things a bit more efficient, but has one more
important benefit: Varlink connections pin the security context of the
client when connecting. This means that varlink method calls done with a
connection established while some code was privileged will still operate
as privileged once privs are dropped, until the connection is closed.
This pinning effect is really nice, as it gives us behaviour in a
"capability system" like scheme. Later code is going to use that to
continue doing certain priv userns ops even after unsharing userns and
becoming fully unpriv.
2026-02-19 15:08:19 +01:00
Lennart Poettering
fe487d3670 namespace: extend bind mount ignore field to permission issues
A later commit will add transient allocation of user namespaces with
dynamic UID range assignment. That creates certain permission issues.
Let's hence allow them to be handled gracefully in case the 'ignore'
field is set for a mount.
2026-02-19 15:07:19 +01:00
Lennart Poettering
26f80a9d33 namespace: port mount_private_apivfs() to fsopen() and friends
This is not just refactoring, but has the big benefit that it makes us
independent from a temporary directory we might not have enough access
to create. (This matters with the new PrivateUsers=managed).
2026-02-19 15:07:19 +01:00
Lennart Poettering
051dfed9b6 private 2026-02-19 15:07:18 +01:00
Lennart Poettering
6b22ac31af core: add PrivateUsers=managed 2026-02-19 15:05:15 +01:00
Lennart Poettering
fbaf05bfeb importctl: add 'pull-oci' client API 2026-02-19 15:05:15 +01:00
Lennart Poettering
9628a5ced4 importd: add bus/varlink api for downloading OCIs 2026-02-19 15:05:15 +01:00
Lennart Poettering
fe1f8c1acd run: support RootMStack= on the client side for systemd-run 2026-02-19 15:05:15 +01:00
Lennart Poettering
38e65b6e4c portable: support .mstack images 2026-02-19 15:05:15 +01:00
Lennart Poettering
32b88dd25a pid1: introduce RootMStack= for using an mstack as root dir for a service 2026-02-19 15:05:15 +01:00
Lennart Poettering
116394f7ce tree-wide: move logging from varlink clients in nsresource.c/dissect-image.c into callers
These calls are "library-like", hence better should only debug log on
their own, not more.
2026-02-19 15:05:15 +01:00
Lennart Poettering
1bc237dd7a nspawn: add support for running mstack container images 2026-02-19 15:05:15 +01:00
Lennart Poettering
8cd29712cb discover-image: add support for discovering mstack images 2026-02-19 15:05:15 +01:00
Lennart Poettering
8187cd18d6 add mstack tool for accessing mstacks from the command line 2026-02-19 15:05:15 +01:00
Lennart Poettering
944161d0d4 vpick: add generic definition for mstack image pick filters 2026-02-19 15:05:15 +01:00
Lennart Poettering
8343032a86 mstack: introduce "mstack" concept 2026-02-19 15:05:15 +01:00
Lennart Poettering
a9f6ba0496 pull: add OCI support 2026-02-19 15:05:15 +01:00
Lennart Poettering
087f2ec344 core: introduce exec_context_with_rootfs_strict() as a stricter version of exec_context_with_rootfs()
We have two very similar checks in place: in some contexts we want to
know if *any* RootDirectory= is configured, in the other we want to
suppress if it is configured to our regular root. Let's add a helper for
both (even if we only need it once), to make the mirrored behaviour
clear.
2026-02-19 15:05:15 +01:00
Lennart Poettering
6d327e9fdd core: use exec_context_with_rootfs() at one more place 2026-02-19 15:05:15 +01:00
Lennart Poettering
6527861039 tar-util: add support for extracting OCI compatible whiteouts, and turn them into overlayfs whiteouts 2026-02-19 15:05:15 +01:00
Lennart Poettering
d66e3cc79a pull-job: make sure pull_job_restart() can be used to fetch the same resource again, just with new headers
Let's flush out all response state from the job, but let's keep the
request data previously configured, in particular the headers set. This
is useful to re-request a resource, just with a slightly modified or
identical URL.
2026-02-19 15:05:15 +01:00
Lennart Poettering
93915ec17a pull-job: add helpers to detect requests for authentication, and accept bearer tokens 2026-02-19 15:05:14 +01:00
Lennart Poettering
1f593d2b5e pull-job: add 'description' field to PullJob
This is shown in the output in place of the URL if non-NULL. This is
useful for OCI's hash-based URLs, which alone are very opaque to read.
2026-02-19 15:05:14 +01:00
Lennart Poettering
98b714e0cb pull-job: optionally free userdata when we destroy a PullJob 2026-02-19 15:05:14 +01:00
Lennart Poettering
4a886d9ab4 pull-job: add interface for controlling Accept: header sent to http server 2026-02-19 15:05:14 +01:00
Lennart Poettering
41b0c68760 pull-job: keep track of content type reported by server 2026-02-19 15:05:14 +01:00
Lennart Poettering
dbf13060e1 uid-range: add uid_range_base() that returns the lowest entry 2026-02-19 15:05:14 +01:00
Lennart Poettering
4c701125df basic: define Architecture typedef in basic-forward.h 2026-02-19 15:05:14 +01:00
Carolina Jubran
1e6854e112 udev: grant read access to PTP devices for unprivileged users
Change the default udev rule for /dev/ptp* from 0660 to 0664,
allowing unprivileged users read-only access.

NIC telemetry and hardware logs often use device timestamps that must
be correlated with host time via read-only PTP ioctls (e.g.
cross-timestamp queries). Requiring privileged access makes these
workflows unnecessarily restrictive.

Older kernels lacked proper permission checks in some PTP ioctls.
Kernel commit b4e53b15c04e3852949003752f48f7a14ae39e86 ("ptp: Add PHC
file mode checks. Allow RO adjtime() without FMODE_WRITE.") introduces
the necessary file mode validation, ensuring that read access does not
permit clock modification or configuration changes, which still require
write permissions.

This commit has been backported to all actively maintained stable
kernel branches.

Related to #31034
2026-02-19 14:51:12 +01:00
Yu Watanabe
dbc83d6353 NEWS: mention python requirement bump 2026-02-19 22:23:04 +09:00
Chris Lindee
2d738a0aee profile/systemd-osc-context: Enforce length limits
UAPI.15 limits the length of fields; adhere to them.

References:
 [0] https://uapi-group.org/specifications/specs/osc_context/#syntax-in-abnf
2026-02-16 03:15:53 -06:00
Chris Lindee
72e71897f9 profile/systemd-osc-context: Remove invalid octets
UAPI.15 requires:
* No C0 Control Characters (`\x00-\x1f`)
* No DEL character (`\x7f`)

The following would be nice, but requires a `sed` implementation that is
aware of UTF-8: `-e $'y/\x00-\x1f\x7f/␀-␟␡/'`.

References:
 [0] https://uapi-group.org/specifications/specs/osc_context/#general-syntax
2026-02-16 03:15:53 -06:00
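The removal described in 72e71897f9 can be sketched with `tr`. This is a minimal illustration and not the code from the commit: the function name `strip_control_octets` is hypothetical, and the sketch simply deletes the forbidden bytes rather than substituting visible placeholders.

```shell
# Hypothetical helper: drop C0 control characters and DEL from $1.
strip_control_octets() {
    # LC_ALL=C makes tr work on bytes, so multi-byte UTF-8 sequences (all >= 0x80) pass through untouched.
    printf '%s' "$1" | LC_ALL=C tr -d '\000-\037\177'
}
```

Because every byte of a multi-byte UTF-8 sequence is 0x80 or above, deleting the ranges 0x00-0x1f and 0x7f cannot split a character.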
Chris Lindee
6a46846871 profile/systemd-osc-context: Bring escape up to spec
UAPI.15 requires:
* `\` to be replaced with `\x5c`
* `;` to be replaced with `\x3b`

References:
 [0] https://uapi-group.org/specifications/specs/osc_context/#general-syntax
2026-02-16 03:15:53 -06:00
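The two replacements listed in 6a46846871 can be sketched with `sed`. This is an illustration under assumptions, not the commit's implementation: the function name `osc_escape` is hypothetical. The order matters, since escaping `;` introduces a backslash that must not be escaped a second time.

```shell
# Hypothetical helper: escape '\' and ';' in $1 as \x5c and \x3b.
osc_escape() {
    # Backslashes are handled first; the backslashes added for ';' are then left alone.
    printf '%s' "$1" | sed -e 's/\\/\\x5c/g' -e 's/;/\\x3b/g'
}
```

For example, `osc_escape 'a;b\c'` prints `a\x3bb\x5cc`.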
Chris Lindee
b88ab1c2dd profile/systemd-osc-context: Acknowledge uncertainty
Bash does not provide an easy way to discern if an exit status came from
a signal, or was a legitimate non-zero exit (i.e. a failure).  It can be
done, by using job control or by invoking another program; however, such
approaches require modifying the command entered by a user and are, thus,
invasive and risky.

Since an exit status of 129 on a command could either indicate it exited
cleanly with `exit(129)` or was interrupted by `SIGHUP`, the osc context
should report both possibilities, to acknowledge our uncertainty.  Given
we have no idea what happened, besides an unsuccessful exit, the exit is
described as `exit=failure`.

Moreover, discerning between an `interrupt` and a `crash` with a command
likely involves categorizing every signal.  It is fairly obvious `SIGINT`
is an interrupt and also obvious, IMO, that `SIGSEGV` is a crash.  Avoid
the complication (and potential disagreements) by using the encompassing,
generic value — one that remains true if no signal occurred.

References:
 [0] https://stackoverflow.com/a/66431355
2026-02-16 03:15:51 -06:00
Chris Lindee
3de092fad7 profile/systemd-osc-context: Bring exit up to spec
The exit status should be compared against 128 — not 127 — as the Bash
manual, under the EXIT STATUS section, states [0]:

    When a command terminates on a fatal signal N, bash uses the value
    of 128+N as the exit status.

    If a command is not found, the child process created to execute it
    returns a status of 127.  If a command is found but is not
    executable, the return status is 126.

Furthermore, UAPI.15 specifies the `signal` value must be the symbolic
signal name, not a numerical value [1].  Luckily, POSIX.1-2008 ensures
the shell must supply `kill -l`, which must be able to convert `$?` to
the signal that could cause such an exit in that shell [2].  It should
be noted that only a select handful of signals have a standard numeric
value; all others are implementation-defined [3].  This is why UAPI.15
requires the symbolic name, as the value may differ across systems.

Notably, not every value above 128 needs to correspond to a signal; on
my Linux system, only exit codes 129–192 indicate a signal.  Again, we
can use `kill -l`, which will exit non-zero when an exit code does not
come from a signal (plus 128) [4].

Signal 0 (which would correspond to exit code 128) is reserved for the
null signal, per POSIX.1-2008 [5], and therefore will never be an exit
signal.

This change brings the implementation into compliance.

References:
 [0] https://www.man7.org/linux/man-pages//man1/bash.1.html#EXIT_STATUS
 [1] https://uapi-group.org/specifications/specs/osc_context/
 [2] https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/utilities/kill.html#:~:text=The%20letter%20ell
 [3] https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/utilities/kill.html#:~:text=The%20effects%20of%20specifying%20any%20signal_number
 [4] https://www.man7.org/linux/man-pages//man1/bash.1.html#:~:text=false%20if%20an%20error%20occurs%20or%20an%20invalid%20option%20is%20encountered
 [5] https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/basedefs/signal.h.html#:~:text=The%20value%200%20is%20reserved%20for%20use%20as%20the%20null%20signal
2026-02-16 03:14:43 -06:00
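The status-to-signal lookup described in 3de092fad7 can be sketched as follows. This is a minimal illustration, not the code from the commit: the function name `signal_for_status` is hypothetical, and the sketch relies only on the shell's `kill -l`, which accepts an exit status and prints the signal name without the `SIG` prefix.

```shell
# Hypothetical helper: print the signal name encoded in exit status $1, if any.
signal_for_status() {
    # 128 would map to signal 0, the reserved null signal, and lower values are plain exit statuses.
    [ "$1" -gt 128 ] || return 1
    # kill -l exits non-zero when the status does not correspond to a signal.
    kill -l "$1" 2>/dev/null
}
```

On a system where SIGINT is signal 2, `signal_for_status 130` prints `INT`, while `signal_for_status 127` prints nothing and returns non-zero. As b88ab1c2dd points out, a match only means a signal could have caused the status; a command calling `exit(130)` looks identical.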
96 changed files with 6792 additions and 925 deletions

@@ -1,24 +0,0 @@
-#!/bin/bash
-# SPDX-License-Identifier: LGPL-2.1-or-later
-set -eux
-if ! command -v musl-gcc >/dev/null; then
-echo "musl-gcc is not installed, skipping the test."
-exit 77
-fi
-TMPDIR=$(mktemp -d)
-cleanup() (
-set +e
-if [[ -d "$TMPDIR" ]]; then
-rm -rf "$TMPDIR"
-fi
-)
-trap cleanup EXIT ERR INT TERM
-tools/setup-musl-build.sh "${TMPDIR}/build"
-ninja -v -C "${TMPDIR}/build"

@@ -81,4 +81,12 @@ jobs:
 run: mkosi box -- meson test -C build --suite=clang-tidy --print-errorlogs --no-stdsplit --quiet
 - name: Build with musl
-run: mkosi box -- .github/workflows/build-test-musl.sh
+run: |
+mkosi box -- \
+env \
+CC=musl-gcc \
+CXX=musl-gcc \
+CFLAGS="-idirafter /usr/include" \
+CXXFLAGS="-idirafter /usr/include" \
+meson setup -Dlibc=musl -Ddbus-interfaces-dir=no musl
+mkosi box -- ninja -C musl

13
NEWS
@@ -32,7 +32,8 @@ CHANGES WITH 260 in spe:
 libseccomp 2.3.1 → 2.4.0,
 glibc 2.31 → 2.34,
 libxcrypt or libcrypt from glibc → libxcrypt 4.4.0 only,
-OpenSSL 1.1.0 → 3.0.0.
+OpenSSL 1.1.0 → 3.0.0,
+Python 3.7.0 → 3.9.0.
 The Linux kernel version requirements have been updated too:
 baseline 5.4 → 5.10, recommended baseline 5.7 → 5.14, 6.6 for full
@@ -109,6 +110,16 @@ CHANGES WITH 260 in spe:
 Changes in udev:
+* Permissions for /dev/ptp* are now set to 0664 (previously 0660),
+allowing unprivileged read-only access. This relies on the kernel fix
+"ptp: Add PHC file mode checks. Allow RO adjtime() without
+FMODE_WRITE." (commit b4e53b15c04e3852949003752f48f7a14ae39e86 in
+v6.15, backported to LTS releases in v6.12.68, v6.6.122, v6.1.162,
+v5.15.199, and v5.10.249), which adds missing PTP ioctl permission
+checks and keeps clock-modifying operations write-restricted. Systems
+running stable kernel branches should ensure they are updated to patch
+levels that include the fix.
 * Persistent network interface naming has bee extended to MCTP devices
 with the "mc" prefix.

3
README
@@ -77,7 +77,8 @@ REQUIREMENTS:
 ≥ 6.10 for fcntl(F_DUPFD_QUERY), unprivileged linkat(AT_EMPTY_PATH),
 and block device 'partscan' sysfs attribute
 ≥ 6.12 for AT_HANDLE_MNT_ID_UNIQUE
-≥ 6.13 for PIDFD_GET_INFO and {set,remove}xattrat()
+≥ 6.13 for PIDFD_GET_INFO and {set,remove}xattrat() and
+FSCONFIG_SET_FD support for overlayfs layers
 ≥ 6.16 for coredump pattern '%F' (pidfd) specifier and SO_PASSRIGHTS
 ✅ systemd utilizes several new kernel APIs, but will fall back gracefully

30
TODO
@@ -124,6 +124,15 @@ Features:
 * networkd: maintain a file in /run/ that can be symlinked into /run/issue.d/
 that always shows the current primary IP address
+* oci: add support for blake hashes for layers
+* oci: add support for "importctl import-oci" which implements the "OCI layout"
+spec (i.e. acquiring via local fs access), as opposed to the current
+"importctl pull-oci" which focusses on the "OCI image spec", i.e. downloads
+from the web (i.e. acquiring via URLs).
+* oci: support "data" in any OCI descriptor, not just manifest config.
 * report:
 - should the list of metrics use JSON-SEQ? or maybe be wrapped in a json
 array (the latter might be necessary, once we sign the combination)
@@ -204,6 +213,10 @@ Features:
 * homed/pam_systemd: allow authentication by ssh-agent, so that run0/polkit can
 be allowed if caller comes with the right ssh-agent keys.
+* machined: gc for OCI layers that are not referenced anymore by any .mstack/ links.
+* pull-oci: progress notification
 * networkd/machined: implement reverse name lookups in the resolved hook
 * networkd's resolved hook: optionally map all lease IP addresses handed out to
@@ -220,19 +233,8 @@ Features:
 the file systems to disk later (using btrfs device replacement), if needed as
 part of an installer logic.
-* add a concept of overlay directory stacks to image discovery, i.e. have a dir
-with a name suffix of ".ovl" or so that contains DDIs and plain dirs (and
-possible .v dirs) that are glued together on use via overlayfs. one special
-subdir should be used as writable top layer.
 * journald: log pidfid as another field, i.e. _PIDFDID=
-* systemd-nspawn: something like --volatile= but that isn't volatile, but
-stores the data in some separate dir on disk. Usecase: keep always up-to-date
-DDIs of some OS in your home dir, but combine its /usr/ with a locally
-maintained root fs in a regular dir to maintain local state. (idea: call it
---mutable= and take dir or DDI and merge in)
 * measure all log-in attempts into a new nvpcr
 * measure all DDI activations into a new nvpcr
@@ -1128,12 +1130,6 @@ Features:
 automatically support reverting back to older OS version images if newer ones
 fail to boot.
-* implement new "systemd-fsrebind" tool that works like gpt-auto-generator but
-looks at a root dir and then applies vpick on various dirs/images to pick a
-root tree, a /usr/ tree, a /home/, a /srv/, a /var/ tree and so on. Dirs
-could also be btrfs subvols (combine with btrfs auto-snapshort approach for
-creating versions like these automatically).
 * remove tomoyo support, it's obsolete and unmaintained apparently
 * In .socket units, add ConnectStream=, ConnectDatagram=,

@ -530,8 +530,7 @@ sensor:modalias:acpi:BMA250E:*:dmi:*:svnInsyde*:pni101c:*
######################################### #########################################
# HP # HP
######################################### #########################################
# With HP match in system vendor and board product name as hp-wmi module does.
# TODO: Look if for HP is better to use pvr.
# Most HP Laptop using the lis3lv02d device have it in the base, # Most HP Laptop using the lis3lv02d device have it in the base,
# mark these sensors as such. # mark these sensors as such.
@ -539,20 +538,20 @@ sensor:modalias:platform:lis3lv02d:dmi:*:svnHewlett-Packard:*
sensor:modalias:platform:lis3lv02d:dmi:*:svnHP:* sensor:modalias:platform:lis3lv02d:dmi:*:svnHP:*
ACCEL_LOCATION=base ACCEL_LOCATION=base
sensor:modalias:platform:HID-SENSOR-200073:dmi:*:svnHP:pnHPOmniBookUltraFlipLaptop14-fh0xxx:* # OmniBook Ultra Flip 14 sensor:modalias:platform:HID-SENSOR-200073:dmi:*:svnHP:*:rn8CDE:* # OmniBook Ultra Flip Laptop 14-fh0xxx and 14t-fh000
ACCEL_MOUNT_MATRIX=1, 0, 0; 0, 1, 0; 0, 0, -1 ACCEL_MOUNT_MATRIX=1, 0, 0; 0, 1, 0; 0, 0, -1
sensor:modalias:acpi:SMO8500:*:dmi:*:svnHewlett-Packard:pnHPStream7Tablet:* # Stream 7 Tablet sensor:modalias:i2c:bmc150_accel:dmi:*:svnHewlett-Packard:*:rn8021:* # Pavilion X2 10-k010nr
sensor:modalias:acpi:SMO8500:*:dmi:*:svnHewlett-Packard:pnHPStream8Tablet:* # Stream 8 Tablet sensor:modalias:i2c:bmc150_accel:dmi:*:svnHewlett-Packard:*:rn815D:* # Pavilion X2 10-n000nd
ACCEL_MOUNT_MATRIX=0, 1, 0; 1, 0, 0; 0, 0, 1
sensor:modalias:i2c:bmc150_accel:dmi:*:svnHewlett-Packard:pnHPPavilionx2Detachable:*:rn815D:* # Pavilion X2 10-n000nd
sensor:modalias:i2c:bmc150_accel:dmi:*:svnHewlett-Packard:pnHPPavilionx2DetachablePC10:* # Pavilion X2 10-k010nr
ACCEL_MOUNT_MATRIX=0, -1, 0; -1, 0, 0; 0, 0, 1 ACCEL_MOUNT_MATRIX=0, -1, 0; -1, 0, 0; 0, 0, 1
sensor:modalias:i2c:bmc150_accel:dmi:*:svnHewlett-Packard:pnHPProTablet408:*:rn8048:* # Pro Tablet 408 G1 sensor:modalias:i2c:bmc150_accel:dmi:*:svnHewlett-Packard:*:rn8048:* # Pro Tablet 408 G1
ACCEL_MOUNT_MATRIX=0, 1, 0; -1, 0, 0; 0, 0, 1 ACCEL_MOUNT_MATRIX=0, 1, 0; -1, 0, 0; 0, 0, 1
sensor:modalias:acpi:SMO8500:*:dmi:*:svnHewlett-Packard:*:rn8031:* # Stream 7 Tablet
sensor:modalias:acpi:SMO8500:*:dmi:*:svnHewlett-Packard:*:rn8032:* # Stream 8 Tablet
ACCEL_MOUNT_MATRIX=0, 1, 0; 1, 0, 0; 0, 0, 1
######################################### #########################################
# HUAWEI # HUAWEI
######################################### #########################################

@ -39,9 +39,9 @@
<para><command>importctl</command> may be used to download, import, and export disk images via <para><command>importctl</command> may be used to download, import, and export disk images via
<citerefentry><refentrytitle>systemd-importd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>.</para> <citerefentry><refentrytitle>systemd-importd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>.</para>
<para><command>importctl</command> operates both on block-level disk images (such as DDIs) as well as <para><command>importctl</command> operates both on block-level disk images (such as DDIs),
file-system-level images (tarballs). It supports disk images in one of the four following file-system-level images (tarballs), as well as OCI images. It supports disk images in one of the four
classes:</para> following classes:</para>
<itemizedlist> <itemizedlist>
<listitem><para>VM images or full OS container images, that may be run via <listitem><para>VM images or full OS container images, that may be run via
@ -171,6 +171,24 @@
<xi:include href="version-info.xml" xpointer="v256"/></listitem> <xi:include href="version-info.xml" xpointer="v256"/></listitem>
</varlistentry> </varlistentry>
<varlistentry>
<term><command>pull-oci</command> <replaceable>REF</replaceable> [<replaceable>NAME</replaceable>]</term>
<listitem><para>Downloads the specified OCI container image, and makes it available under the
specified local name in the image directory for the selected <option>--class=</option>. The first
argument must be an OCI container reference, such as <literal>library/nginx</literal> If the local
name is omitted, it is automatically derived from the last component of the URL, with its suffix
removed.</para>
<para>When downloading images of this type no image verification is done beyond the usual
authentication of the HTTPS certificates.</para>
<para>Note that pressing Control-c during execution of this command will not abort the download. Use
<command>cancel-transfer</command>, described below.</para>
<xi:include href="version-info.xml" xpointer="v260"/></listitem>
</varlistentry>
<varlistentry> <varlistentry>
<term><command>import-tar</command> <replaceable>FILE</replaceable> [<replaceable>NAME</replaceable>]</term> <term><command>import-tar</command> <replaceable>FILE</replaceable> [<replaceable>NAME</replaceable>]</term>
<term><command>import-raw</command> <replaceable>FILE</replaceable> [<replaceable>NAME</replaceable>]</term> <term><command>import-raw</command> <replaceable>FILE</replaceable> [<replaceable>NAME</replaceable>]</term>

@ -136,6 +136,12 @@ node /org/freedesktop/import1 {
in t flags, in t flags,
out u transfer_id, out u transfer_id,
out o transfer_path); out o transfer_path);
PullOci(in s ref,
in s local_name,
in s class,
in t flags,
out u transfer_id,
out o transfer_path);
ListTransfers(out a(usssdo) transfers); ListTransfers(out a(usssdo) transfers);
ListTransfersEx(in s class, ListTransfersEx(in s class,
in t flags, in t flags,
@ -191,6 +197,8 @@ node /org/freedesktop/import1 {
<variablelist class="dbus-method" generated="True" extra-ref="PullRawEx()"/> <variablelist class="dbus-method" generated="True" extra-ref="PullRawEx()"/>
<variablelist class="dbus-method" generated="True" extra-ref="PullOci()"/>
<variablelist class="dbus-method" generated="True" extra-ref="ListTransfers()"/> <variablelist class="dbus-method" generated="True" extra-ref="ListTransfers()"/>
<variablelist class="dbus-method" generated="True" extra-ref="ListTransfersEx()"/> <variablelist class="dbus-method" generated="True" extra-ref="ListTransfersEx()"/>
@ -290,6 +298,19 @@ node /org/freedesktop/import1 {
export calls above, these calls return a pair of transfer identifier and object path for the ongoing export calls above, these calls return a pair of transfer identifier and object path for the ongoing
download.</para> download.</para>
<para><function>PullOci()</function> is similar to <function>PullTarEx()</function> or
<function>PullRawEx()</function> and may be used to download and import an OCI container image from an
OCI registry. It takes an OCI container reference as argument. The second argument is a local name for
the image (which will be generated as
<citerefentry><refentrytitle>systemd.mstack</refentrytitle><manvolnum>7</manvolnum></citerefentry>
directory referencing the OCI layers). It should be suitable as a hostname, similarly to the matching
argument of the <function>PullTar()</function>/<function>PullTarEx()</function> and
<function>PullRaw()</function>/<function>PullRawEx()</function> methods above. The last argument is a
64-bit flags parameter, where bit 0 controls the <literal>force</literal> flag, bit 1 is a
<literal>read_only</literal> flag that controls whether the created image shall be marked
read-only. Like the pull calls above, this call returns a pair of transfer identifier and object path
for the ongoing download.</para>
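As a rough illustration of the flags argument described above, the following Python sketch composes the 64-bit value from the two documented bits. The constant and function names are made up for this example and are not part of any systemd API; only the bit positions (bit 0 for force, bit 1 for read_only) come from the text.

```python
# Hypothetical names; only the bit positions are taken from the description above.
PULL_OCI_FORCE = 2 ** 0      # bit 0: the "force" flag
PULL_OCI_READ_ONLY = 2 ** 1  # bit 1: mark the created image read-only


def pull_oci_flags(force: bool = False, read_only: bool = False) -> int:
    """Return the value to pass as the 't' (uint64) flags argument of PullOci()."""
    flags = 0
    if force:
        flags |= PULL_OCI_FORCE
    if read_only:
        flags |= PULL_OCI_READ_ONLY
    return flags


print(pull_oci_flags(read_only=True))              # 2
print(pull_oci_flags(force=True, read_only=True))  # 3
```

The resulting integer would be passed as the fourth input argument of the call, after the reference, local name and class strings listed in the introspection data.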
<para><function>ImportFileSystem()</function>/<function>ImportFileSystemEx()</function> are similar to <para><function>ImportFileSystem()</function>/<function>ImportFileSystemEx()</function> are similar to
<function>ImportTar()</function>/<function>ImportTarEx()</function> but import a directory tree. The <function>ImportTar()</function>/<function>ImportTarEx()</function> but import a directory tree. The
first argument must refer to a directory file descriptor for the source hierarchy to import.</para> first argument must refer to a directory file descriptor for the source hierarchy to import.</para>
@ -464,6 +485,7 @@ node /org/freedesktop/import1/transfer/_1 {
<function>ExportRawEx()</function>, <function>PullTarEx()</function>, <function>PullRawEx()</function>, <function>ExportRawEx()</function>, <function>PullTarEx()</function>, <function>PullRawEx()</function>,
<function>ListTransfersEx()</function>, <function>ListImages()</function> were added in version <function>ListTransfersEx()</function>, <function>ListImages()</function> were added in version
256.</para> 256.</para>
<para><function>PullOci()</function> was added in version 260.</para>
</refsect2> </refsect2>
<refsect2> <refsect2>
<title>Transfer Objects</title> <title>Transfer Objects</title>


@ -3137,6 +3137,8 @@ node /org/freedesktop/systemd1/unit/avahi_2ddaemon_2eservice {
@org.freedesktop.DBus.Property.EmitsChangedSignal("const") @org.freedesktop.DBus.Property.EmitsChangedSignal("const")
readonly b RootEphemeral = ...; readonly b RootEphemeral = ...;
@org.freedesktop.DBus.Property.EmitsChangedSignal("const") @org.freedesktop.DBus.Property.EmitsChangedSignal("const")
readonly s RootMStack = '...';
@org.freedesktop.DBus.Property.EmitsChangedSignal("const")
readonly as ExtensionDirectories = ['...', ...]; readonly as ExtensionDirectories = ['...', ...];
@org.freedesktop.DBus.Property.EmitsChangedSignal("const") @org.freedesktop.DBus.Property.EmitsChangedSignal("const")
readonly a(sba(ss)) ExtensionImages = [...]; readonly a(sba(ss)) ExtensionImages = [...];
@ -4496,6 +4498,8 @@ node /org/freedesktop/systemd1/unit/avahi_2ddaemon_2eservice {
<variablelist class="dbus-property" generated="True" extra-ref="RootEphemeral"/> <variablelist class="dbus-property" generated="True" extra-ref="RootEphemeral"/>
<variablelist class="dbus-property" generated="True" extra-ref="RootMStack"/>
<variablelist class="dbus-property" generated="True" extra-ref="ExtensionDirectories"/> <variablelist class="dbus-property" generated="True" extra-ref="ExtensionDirectories"/>
<variablelist class="dbus-property" generated="True" extra-ref="ExtensionImages"/> <variablelist class="dbus-property" generated="True" extra-ref="ExtensionImages"/>
@ -4938,6 +4942,7 @@ node /org/freedesktop/systemd1/unit/avahi_2ddaemon_2eservice {
<varname>MountImages</varname> <varname>MountImages</varname>
<varname>ExtensionImages</varname> <varname>ExtensionImages</varname>
<varname>ExtensionDirectories</varname> <varname>ExtensionDirectories</varname>
<varname>RootMStack</varname>
see systemd.exec(5) for their meaning.</para> see systemd.exec(5) for their meaning.</para>
<para><varname>MemoryAvailable</varname> takes into account unit's and parents' <literal>MemoryMax</literal> <para><varname>MemoryAvailable</varname> takes into account unit's and parents' <literal>MemoryMax</literal>
@ -5392,6 +5397,8 @@ node /org/freedesktop/systemd1/unit/avahi_2ddaemon_2esocket {
@org.freedesktop.DBus.Property.EmitsChangedSignal("const") @org.freedesktop.DBus.Property.EmitsChangedSignal("const")
readonly b RootEphemeral = ...; readonly b RootEphemeral = ...;
@org.freedesktop.DBus.Property.EmitsChangedSignal("const") @org.freedesktop.DBus.Property.EmitsChangedSignal("const")
readonly s RootMStack = '...';
@org.freedesktop.DBus.Property.EmitsChangedSignal("const")
readonly as ExtensionDirectories = ['...', ...]; readonly as ExtensionDirectories = ['...', ...];
@org.freedesktop.DBus.Property.EmitsChangedSignal("const") @org.freedesktop.DBus.Property.EmitsChangedSignal("const")
readonly a(sba(ss)) ExtensionImages = [...]; readonly a(sba(ss)) ExtensionImages = [...];
@ -6741,6 +6748,8 @@ node /org/freedesktop/systemd1/unit/avahi_2ddaemon_2esocket {
<variablelist class="dbus-property" generated="True" extra-ref="RootEphemeral"/> <variablelist class="dbus-property" generated="True" extra-ref="RootEphemeral"/>
<variablelist class="dbus-property" generated="True" extra-ref="RootMStack"/>
<variablelist class="dbus-property" generated="True" extra-ref="ExtensionDirectories"/> <variablelist class="dbus-property" generated="True" extra-ref="ExtensionDirectories"/>
<variablelist class="dbus-property" generated="True" extra-ref="ExtensionImages"/> <variablelist class="dbus-property" generated="True" extra-ref="ExtensionImages"/>
@ -7461,6 +7470,8 @@ node /org/freedesktop/systemd1/unit/home_2emount {
@org.freedesktop.DBus.Property.EmitsChangedSignal("const") @org.freedesktop.DBus.Property.EmitsChangedSignal("const")
readonly b RootEphemeral = ...; readonly b RootEphemeral = ...;
@org.freedesktop.DBus.Property.EmitsChangedSignal("const") @org.freedesktop.DBus.Property.EmitsChangedSignal("const")
readonly s RootMStack = '...';
@org.freedesktop.DBus.Property.EmitsChangedSignal("const")
readonly as ExtensionDirectories = ['...', ...]; readonly as ExtensionDirectories = ['...', ...];
@org.freedesktop.DBus.Property.EmitsChangedSignal("const") @org.freedesktop.DBus.Property.EmitsChangedSignal("const")
readonly a(sba(ss)) ExtensionImages = [...]; readonly a(sba(ss)) ExtensionImages = [...];
@ -8642,6 +8653,8 @@ node /org/freedesktop/systemd1/unit/home_2emount {
<variablelist class="dbus-property" generated="True" extra-ref="RootEphemeral"/> <variablelist class="dbus-property" generated="True" extra-ref="RootEphemeral"/>
<variablelist class="dbus-property" generated="True" extra-ref="RootMStack"/>
<variablelist class="dbus-property" generated="True" extra-ref="ExtensionDirectories"/> <variablelist class="dbus-property" generated="True" extra-ref="ExtensionDirectories"/>
<variablelist class="dbus-property" generated="True" extra-ref="ExtensionImages"/> <variablelist class="dbus-property" generated="True" extra-ref="ExtensionImages"/>
@ -9495,6 +9508,8 @@ node /org/freedesktop/systemd1/unit/dev_2dsda3_2eswap {
@org.freedesktop.DBus.Property.EmitsChangedSignal("const") @org.freedesktop.DBus.Property.EmitsChangedSignal("const")
readonly b RootEphemeral = ...; readonly b RootEphemeral = ...;
@org.freedesktop.DBus.Property.EmitsChangedSignal("const") @org.freedesktop.DBus.Property.EmitsChangedSignal("const")
readonly s RootMStack = '...';
@org.freedesktop.DBus.Property.EmitsChangedSignal("const")
readonly as ExtensionDirectories = ['...', ...]; readonly as ExtensionDirectories = ['...', ...];
@org.freedesktop.DBus.Property.EmitsChangedSignal("const") @org.freedesktop.DBus.Property.EmitsChangedSignal("const")
readonly a(sba(ss)) ExtensionImages = [...]; readonly a(sba(ss)) ExtensionImages = [...];
@ -10640,6 +10655,8 @@ node /org/freedesktop/systemd1/unit/dev_2dsda3_2eswap {
<variablelist class="dbus-property" generated="True" extra-ref="RootEphemeral"/> <variablelist class="dbus-property" generated="True" extra-ref="RootEphemeral"/>
<variablelist class="dbus-property" generated="True" extra-ref="RootMStack"/>
<variablelist class="dbus-property" generated="True" extra-ref="ExtensionDirectories"/> <variablelist class="dbus-property" generated="True" extra-ref="ExtensionDirectories"/>
<variablelist class="dbus-property" generated="True" extra-ref="ExtensionImages"/> <variablelist class="dbus-property" generated="True" extra-ref="ExtensionImages"/>
@ -12520,9 +12537,8 @@ $ gdbus introspect --system --dest org.freedesktop.systemd1 \
<varname>ManagedOOMKills</varname>, <varname>ManagedOOMKills</varname>,
<varname>ExecReloadPost</varname>, and <varname>ExecReloadPost</varname>, and
<varname>ExecReloadPostEx</varname> were added in version 259.</para> <varname>ExecReloadPostEx</varname> were added in version 259.</para>
<para><varname>BindNetworkInterface</varname>, <para><varname>BindNetworkInterface</varname>, <varname>MemoryTHP</varname>,
<varname>MemoryTHP</varname>, and <varname>RefreshOnReload</varname>, and <varname>RootMStack</varname> were added in version 260.</para>
<varname>RefreshOnReload</varname> were added in version 260.</para>
</refsect2> </refsect2>
<refsect2> <refsect2>
<title>Socket Unit Objects</title> <title>Socket Unit Objects</title>
@ -12591,8 +12607,8 @@ $ gdbus introspect --system --dest org.freedesktop.systemd1 \
<para><varname>UserNamespacePath</varname>, <para><varname>UserNamespacePath</varname>,
<varname>OOMKills</varname>, and <varname>OOMKills</varname>, and
<varname>ManagedOOMKills</varname> were added in 259.</para> <varname>ManagedOOMKills</varname> were added in 259.</para>
<para><varname>BindNetworkInterface</varname>, and <para><varname>BindNetworkInterface</varname>, <varname>MemoryTHP</varname>, and
<varname>MemoryTHP</varname> were added in version 260.</para> <varname>RootMStack</varname> were added in version 260.</para>
</refsect2> </refsect2>
<refsect2> <refsect2>
<title>Mount Unit Objects</title> <title>Mount Unit Objects</title>
@ -12656,8 +12672,8 @@ $ gdbus introspect --system --dest org.freedesktop.systemd1 \
<para><varname>UserNamespacePath</varname>, <para><varname>UserNamespacePath</varname>,
<varname>OOMKills</varname>, and <varname>OOMKills</varname>, and
<varname>ManagedOOMKills</varname> were added in 259.</para> <varname>ManagedOOMKills</varname> were added in 259.</para>
<para><varname>BindNetworkInterface</varname>, and <para><varname>BindNetworkInterface</varname>, <varname>MemoryTHP</varname>, and
<varname>MemoryTHP</varname> were added in version 260.</para> <varname>RootMStack</varname> were added in version 260.</para>
</refsect2> </refsect2>
<refsect2> <refsect2>
<title>Swap Unit Objects</title> <title>Swap Unit Objects</title>
@ -12719,8 +12735,8 @@ $ gdbus introspect --system --dest org.freedesktop.systemd1 \
<para><varname>UserNamespacePath</varname>, <para><varname>UserNamespacePath</varname>,
<varname>OOMKills</varname>, and <varname>OOMKills</varname>, and
<varname>ManagedOOMKills</varname> were added in 259.</para> <varname>ManagedOOMKills</varname> were added in 259.</para>
<para><varname>BindNetworkInterface</varname>, and <para><varname>BindNetworkInterface</varname>, <varname>MemoryTHP</varname>, and
<varname>MemoryTHP</varname> were added in version 260.</para> <varname>RootMStack</varname> were added in version 260.</para>
</refsect2> </refsect2>
<refsect2> <refsect2>
<title>Slice Unit Objects</title> <title>Slice Unit Objects</title>


@ -1074,6 +1074,7 @@ manpages = [
['systemd-modules-load.service', '8', ['systemd-modules-load'], 'HAVE_KMOD'], ['systemd-modules-load.service', '8', ['systemd-modules-load'], 'HAVE_KMOD'],
['systemd-mount', '1', ['systemd-umount'], ''], ['systemd-mount', '1', ['systemd-umount'], ''],
['systemd-mountfsd.service', '8', ['systemd-mountfsd'], 'ENABLE_MOUNTFSD'], ['systemd-mountfsd.service', '8', ['systemd-mountfsd'], 'ENABLE_MOUNTFSD'],
['systemd-mstack', '1', ['mount.mstack'], 'HAVE_BLKID'],
['systemd-mute-console', ['systemd-mute-console',
'1', '1',
['systemd-mute-console.socket', 'systemd-mute-console@.service'], ['systemd-mute-console.socket', 'systemd-mute-console@.service'],
@ -1249,6 +1250,7 @@ manpages = [
['systemd.kill', '5', [], ''], ['systemd.kill', '5', [], ''],
['systemd.link', '5', [], ''], ['systemd.link', '5', [], ''],
['systemd.mount', '5', [], ''], ['systemd.mount', '5', [], ''],
['systemd.mstack', '7', [], ''],
['systemd.net-naming-scheme', '7', [], ''], ['systemd.net-naming-scheme', '7', [], ''],
['systemd.netdev', '5', [], 'ENABLE_NETWORKD'], ['systemd.netdev', '5', [], 'ENABLE_NETWORKD'],
['systemd.network', '5', [], 'ENABLE_NETWORKD'], ['systemd.network', '5', [], 'ENABLE_NETWORKD'],

218
man/systemd-mstack.xml Normal file

@ -0,0 +1,218 @@
<?xml version='1.0'?> <!--*-nxml-*-->
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<!-- SPDX-License-Identifier: LGPL-2.1-or-later -->
<refentry id="systemd-mstack"
xmlns:xi="http://www.w3.org/2001/XInclude">
<refentryinfo>
<title>systemd-mstack</title>
<productname>systemd</productname>
</refentryinfo>
<refmeta>
<refentrytitle>systemd-mstack</refentrytitle>
<manvolnum>1</manvolnum>
</refmeta>
<refnamediv>
<refname>systemd-mstack</refname>
<refname>mount.mstack</refname>
<refpurpose>Mstack Discoverable Disk Images (DDIs)</refpurpose>
</refnamediv>
<refsynopsisdiv>
<cmdsynopsis>
<command>systemd-mstack</command> <arg choice="opt" rep="repeat">OPTIONS</arg> <arg choice="plain"><replaceable>IMAGE</replaceable></arg>
</cmdsynopsis>
<cmdsynopsis>
<command>systemd-mstack</command> <arg choice="opt" rep="repeat">OPTIONS</arg> <arg>--mount</arg> <arg choice="plain"><replaceable>IMAGE</replaceable></arg> <arg choice="plain"><replaceable>PATH</replaceable></arg>
</cmdsynopsis>
<cmdsynopsis>
<command>systemd-mstack</command> <arg choice="opt" rep="repeat">OPTIONS</arg> <arg>--umount</arg> <arg choice="plain"><replaceable>PATH</replaceable></arg>
</cmdsynopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para><command>systemd-mstack</command> is a tool for introspecting and interacting with
<filename>.mstack/</filename> mount stack directories, as described in
<citerefentry><refentrytitle>systemd.mstack</refentrytitle><manvolnum>7</manvolnum></citerefentry>. It
supports three different operations:</para>
<orderedlist>
<listitem><para>Show general mount stack information, including all described
<literal>overlayfs</literal> layers and bind mounts.</para></listitem>
<listitem><para>Mount a mount stack to a local directory.</para></listitem>
<listitem><para>Unmount a mount stack from a local directory.</para></listitem>
</orderedlist>
<para>The <command>systemd-mstack</command> command may be invoked as <command>mount.mstack</command> in
which case it implements the <citerefentry
project='man-pages'><refentrytitle>mount</refentrytitle><manvolnum>8</manvolnum></citerefentry> "external
helper" interface. This ensures mount stack directories compatible with <command>systemd-mstack</command>
can be mounted directly by <command>mount</command> and <citerefentry
project='man-pages'><refentrytitle>fstab</refentrytitle><manvolnum>5</manvolnum></citerefentry>. For
details see below.</para>
<xi:include href="vpick.xml" xpointer="image"/>
</refsect1>
<refsect1>
<title>Commands</title>
<para>If none of the command switches listed below is passed, the specified mount stack is opened and
general information about it is shown, including a list of all defined layers.</para>
<variablelist>
<varlistentry>
<term><option>--mount</option></term>
<term><option>-m</option></term>
<listitem><para>Mount the specified mount stack to the specified directory.</para>
<para>To unmount a mount stack directory mounted like this use the <option>--umount</option> operation.</para>
<para>Note that this functionality is also available in <citerefentry
project='man-pages'><refentrytitle>mount</refentrytitle><manvolnum>8</manvolnum></citerefentry> via a
command such as <command>mount -t mstack mystack.mstack targetdir/</command>, as well as in <citerefentry
project='man-pages'><refentrytitle>fstab</refentrytitle><manvolnum>5</manvolnum></citerefentry>. For
details, see below.</para>
<xi:include href="version-info.xml" xpointer="v260"/></listitem>
</varlistentry>
<varlistentry>
<term><option>-M</option></term>
<listitem><para>This is a shortcut for <option>--mount --mkdir</option>.</para>
<xi:include href="version-info.xml" xpointer="v260"/></listitem>
</varlistentry>
<varlistentry>
<term><option>--umount</option></term>
<term><option>-u</option></term>
<listitem><para>Unmount a mount stack from the specified directory. This command expects one argument:
the directory where the mount stack was mounted.</para>
<para>All mounts established for it are unmounted recursively.</para>
<xi:include href="version-info.xml" xpointer="v260"/></listitem>
</varlistentry>
<varlistentry>
<term><option>-U</option></term>
<listitem><para>This is a shortcut for <option>--umount --rmdir</option>.</para>
<xi:include href="version-info.xml" xpointer="v260"/></listitem>
</varlistentry>
<xi:include href="standard-options.xml" xpointer="help" />
<xi:include href="standard-options.xml" xpointer="version" />
</variablelist>
</refsect1>
<refsect1>
<title>Options</title>
<para>The following options are understood:</para>
<variablelist>
<varlistentry>
<term><option>--read-only</option></term>
<term><option>-r</option></term>
<listitem><para>Operate in read-only mode. By default, <option>--mount</option> will establish
writable mount points. If this option is specified they are established in read-only mode
instead.</para>
<xi:include href="version-info.xml" xpointer="v260"/></listitem>
</varlistentry>
<varlistentry>
<term><option>--mkdir</option></term>
<listitem><para>If combined with <option>--mount</option> the directory to mount the mount stack to
is created if it is missing. Note that the directory is not automatically removed when the mount
stack is unmounted again.</para>
<xi:include href="version-info.xml" xpointer="v260"/></listitem>
</varlistentry>
<varlistentry>
<term><option>--rmdir</option></term>
<listitem><para>If combined with <option>--umount</option> the specified directory where the mount stack
is mounted is removed after unmounting it.</para>
<xi:include href="version-info.xml" xpointer="v260"/></listitem>
</varlistentry>
<xi:include href="standard-options.xml" xpointer="image-policy-open" />
<xi:include href="standard-options.xml" xpointer="image-filter" />
<xi:include href="standard-options.xml" xpointer="no-pager" />
<xi:include href="standard-options.xml" xpointer="no-legend" />
<xi:include href="standard-options.xml" xpointer="json" />
</variablelist>
</refsect1>
<refsect1>
<title>Exit status</title>
<para>On success, 0 is returned, a non-zero failure code otherwise.</para>
</refsect1>
<refsect1>
<title>Invocation as <command>/sbin/mount.mstack</command></title>
<!-- In case you are wondering why this is spelled out with the /sbin/ prefix, rather than /usr/bin/ or
so, or omitting the prefix entirely: it's simply because util-linux mount always uses precisely this
spelling in their API docs. For example open mount(8), and grep for '/sbin/mount', and you see a ton
of occurrences. We just inherit this spelling here, to make clear this is just an instantiation of
their plugin mechanism. -->
<para>The <command>systemd-mstack</command> executable may be symlinked to
<filename>/sbin/mount.mstack</filename>. If invoked through that it implements <citerefentry
project='man-pages'><refentrytitle>mount</refentrytitle><manvolnum>8</manvolnum></citerefentry>'s
"external helper" interface for the (pseudo) file system type <literal>mstack</literal>. This means
conformant mount stack directories may be mounted directly via</para>
<programlisting># mount -t mstack mystack.mstack targetdir/</programlisting>
<para>in a fashion mostly equivalent to:</para>
<programlisting># systemd-mstack --mount mystack.mstack targetdir/</programlisting>
<para>Note that since a single mount stack may contain multiple mount points it should later be unmounted with
<command>umount -R targetdir/</command>, for recursive operation.</para>
<para>This functionality is particularly useful to mount mount stacks automatically at boot via simple
<filename>/etc/fstab</filename> entries. For example:</para>
<programlisting>/path/to/mystack.mstack /images/mystack/ mstack defaults 0 0</programlisting>
<para>When invoked this way the mount options <literal>ro</literal>, <literal>rw</literal> map to the
corresponding options listed above (i.e. <option>--read-only</option>).</para>
</refsect1>
<refsect1>
<title>See Also</title>
<para><simplelist type="inline">
<member><citerefentry><refentrytitle>systemd</refentrytitle><manvolnum>1</manvolnum></citerefentry></member>
<member><citerefentry><refentrytitle>systemd.mstack</refentrytitle><manvolnum>7</manvolnum></citerefentry></member>
<member><citerefentry><refentrytitle>systemd-nspawn</refentrytitle><manvolnum>1</manvolnum></citerefentry></member>
<member><citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry></member>
<member><citerefentry project='man-pages'><refentrytitle>mount</refentrytitle><manvolnum>8</manvolnum></citerefentry></member>
<member><citerefentry project='man-pages'><refentrytitle>umount</refentrytitle><manvolnum>8</manvolnum></citerefentry></member>
</simplelist></para>
</refsect1>
</refentry>
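The mount and unmount operations documented in the page above can be sketched as command lines assembled in Python. This is only an illustration: the helper names are hypothetical, the option names are the ones listed in the page, and actually running the resulting commands would presumably require a systemd version that ships `systemd-mstack` as well as sufficient privileges.

```python
import shlex
from typing import List


def mstack_mount_argv(image: str, path: str, *, mkdir: bool = False,
                      read_only: bool = False) -> List[str]:
    """Build an argv that mounts IMAGE to PATH (hypothetical helper)."""
    argv = ["systemd-mstack", "--mount"]
    if mkdir:
        argv.append("--mkdir")      # create PATH if it is missing
    if read_only:
        argv.append("--read-only")  # the documented default is writable
    return argv + [image, path]


def mstack_umount_argv(path: str, *, rmdir: bool = False) -> List[str]:
    """Build an argv that unmounts the stack mounted at PATH (hypothetical helper)."""
    argv = ["systemd-mstack", "--umount"]
    if rmdir:
        argv.append("--rmdir")      # remove PATH after unmounting
    return argv + [path]


# Print the command lines instead of executing them.
print(shlex.join(mstack_mount_argv("mystack.mstack", "targetdir/", mkdir=True)))
print(shlex.join(mstack_umount_argv("targetdir/", rmdir=True)))
```

According to the page, `--mount --mkdir` and `--umount --rmdir` may be abbreviated as `-M` and `-U` respectively; the sketch spells out the long forms for readability.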


@ -370,6 +370,18 @@
<xi:include href="version-info.xml" xpointer="v254"/></listitem> <xi:include href="version-info.xml" xpointer="v254"/></listitem>
</varlistentry> </varlistentry>
<varlistentry>
<term><option>--mstack=</option></term>
<listitem><para>Mount stack to mount the root directory for the container from. Takes a path to a
directory implementing
<citerefentry><refentrytitle>systemd.mstack</refentrytitle><manvolnum>7</manvolnum></citerefentry>. This
may be used to run a container off an <literal>overlayfs</literal> assembled from a number of layers,
possibly writable and augmented with additional bind mounts.</para>
<xi:include href="version-info.xml" xpointer="v260"/></listitem>
</varlistentry>
<varlistentry> <varlistentry>
<term><option>--oci-bundle=</option></term> <term><option>--oci-bundle=</option></term>
@ -2038,6 +2050,7 @@ After=sys-subsystem-net-devices-ens1.device</programlisting>
<member><citerefentry><refentrytitle>importctl</refentrytitle><manvolnum>1</manvolnum></citerefentry></member> <member><citerefentry><refentrytitle>importctl</refentrytitle><manvolnum>1</manvolnum></citerefentry></member>
<member><citerefentry><refentrytitle>systemd-mountfsd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry></member> <member><citerefentry><refentrytitle>systemd-mountfsd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry></member>
<member><citerefentry><refentrytitle>systemd-nsresourced.service</refentrytitle><manvolnum>8</manvolnum></citerefentry></member> <member><citerefentry><refentrytitle>systemd-nsresourced.service</refentrytitle><manvolnum>8</manvolnum></citerefentry></member>
<member><citerefentry><refentrytitle>systemd.mstack</refentrytitle><manvolnum>7</manvolnum></citerefentry></member>
<member><citerefentry project='url'><refentrytitle url='https://btrfs.readthedocs.io/en/latest/btrfs.html'>btrfs</refentrytitle><manvolnum>8</manvolnum></citerefentry></member> <member><citerefentry project='url'><refentrytitle url='https://btrfs.readthedocs.io/en/latest/btrfs.html'>btrfs</refentrytitle><manvolnum>8</manvolnum></citerefentry></member>
</simplelist></para> </simplelist></para>
</refsect1> </refsect1>


@ -352,6 +352,26 @@
<xi:include href="version-info.xml" xpointer="v254"/></listitem> <xi:include href="version-info.xml" xpointer="v254"/></listitem>
</varlistentry> </varlistentry>
<varlistentry>
<term><varname>RootMStack=</varname></term>
<listitem><para>Takes a path to a
<citerefentry><refentrytitle>systemd.mstack</refentrytitle><manvolnum>7</manvolnum></citerefentry>
directory encapsulating a mount stack consisting of layers and bind mounts. Similar to
<varname>RootDirectory=</varname> and <varname>RootImage=</varname> this runs the service off a
distinct root file system, in this case set up via <literal>overlayfs</literal>.</para>
<para>Since <filename>.mstack/</filename> directories may reference disk images (DDIs), the same device
policy extensions and dependencies are in effect when <varname>RootMStack=</varname> is used as when
<varname>RootImage=</varname> is used.</para>
<xi:include href="vpick.xml" xpointer="image"/>
<xi:include href="system-or-user-ns-mountfsd.xml" xpointer="singular"/>
<xi:include href="version-info.xml" xpointer="v260"/></listitem>
</varlistentry>
<varlistentry> <varlistentry>
<term><varname>MountAPIVFS=</varname></term> <term><varname>MountAPIVFS=</varname></term>
@ -450,16 +470,16 @@
<term><varname>BindPaths=</varname></term> <term><varname>BindPaths=</varname></term>
<term><varname>BindReadOnlyPaths=</varname></term> <term><varname>BindReadOnlyPaths=</varname></term>
<listitem><para>Configures unit-specific bind mounts. A bind mount makes a particular file or directory <listitem><para>Configures unit-specific bind mounts. A bind mount makes a particular file or
available at an additional place in the unit's view of the file system. Any bind mounts created with this directory available at an additional place in the unit's view of the file system. Any bind mounts
option are specific to the unit, and are not visible in the host's mount table. This option expects a created with this option are specific to the unit, and are not visible in the host's mount
whitespace separated list of bind mount definitions. Each definition consists of a colon-separated triple of table. This option expects a whitespace separated list of bind mount definitions. Each definition
source path, destination path and option string, where the latter two are optional. If only a source path is consists of a colon-separated triple of source path, destination path and option string, where the
specified the source and destination is taken to be the same. The option string may be either latter two are optional. If only a source path is specified the source and destination is taken to be
<literal>rbind</literal> or <literal>norbind</literal> for configuring a recursive or non-recursive bind the same. The option string may be either <literal>rbind</literal> or <literal>norbind</literal> for
mount. If the destination path is omitted, the option string must be omitted too. configuring a recursive or non-recursive bind mount. If the destination path is omitted, the option
Each bind mount definition may be prefixed with <literal>-</literal>, in which case it will be ignored string must be omitted too. Each bind mount definition may be prefixed with <literal>-</literal>, in
when its source path does not exist.</para> which case it will be ignored when its source path does not exist or is not accessible.</para>
<para><varname>BindPaths=</varname> creates regular writable bind mounts (unless the source file system mount <para><varname>BindPaths=</varname> creates regular writable bind mounts (unless the source file system mount
is already marked read-only), while <varname>BindReadOnlyPaths=</varname> creates read-only bind mounts. These is already marked read-only), while <varname>BindReadOnlyPaths=</varname> creates read-only bind mounts. These
@ -2202,16 +2222,17 @@ BindReadOnlyPaths=/var/lib/systemd</programlisting>
<varlistentry> <varlistentry>
<term><varname>PrivateUsers=</varname></term> <term><varname>PrivateUsers=</varname></term>
<listitem><para>Takes a boolean argument or one of <literal>self</literal>, <literal>identity</literal>, <listitem><para>Takes a boolean argument or one of <literal>self</literal>,
or <literal>full</literal>. Defaults to false. If enabled, sets up a new user namespace for the <literal>identity</literal>, <literal>full</literal> or <literal>managed</literal>. Defaults to
executed processes and configures a user and group mapping. If set to a true value or false. If enabled, sets up a new user namespace for the executed processes and configures a user and
<literal>self</literal>, a minimal user and group mapping is configured that maps the group mapping. If set to a true value or <literal>self</literal>, a minimal user and group mapping is
<literal>root</literal> user and group as well as the unit's own user and group to themselves and configured that maps the <literal>root</literal> user and group as well as the unit's own user and
everything else to the <literal>nobody</literal> user and group. This is useful to securely detach group to themselves and everything else to the <literal>nobody</literal> user and group. This is
the user and group databases used by the unit from the rest of the system, and thus to create an useful to securely detach the user and group databases used by the unit from the rest of the system,
effective sandbox environment. All files, directories, processes, IPC objects and other resources and thus to create an effective sandbox environment. All files, directories, processes, IPC objects
owned by users/groups not equaling <literal>root</literal> or the unit's own will stay visible from and other resources owned by users/groups not equaling <literal>root</literal> or the unit's own will
within the unit but appear owned by the <literal>nobody</literal> user and group. </para> stay visible from within the unit but appear owned by the <literal>nobody</literal> user and
group. </para>
<para>If the parameter is <literal>identity</literal>, user namespacing is set up with an identity <para>If the parameter is <literal>identity</literal>, user namespacing is set up with an identity
mapping for the first 65536 UIDs/GIDs. Any UIDs/GIDs above 65536 will be mapped to the mapping for the first 65536 UIDs/GIDs. Any UIDs/GIDs above 65536 will be mapped to the
@ -2224,14 +2245,21 @@ BindReadOnlyPaths=/var/lib/systemd</programlisting>
to call <function>setgroups()</function> system calls (by setting to call <function>setgroups()</function> system calls (by setting
<filename>/proc/<replaceable>pid</replaceable>/setgroups</filename> to <literal>allow</literal>). <filename>/proc/<replaceable>pid</replaceable>/setgroups</filename> to <literal>allow</literal>).
Similar to <literal>identity</literal>, this does not provide UID/GID isolation, but it does provide Similar to <literal>identity</literal>, this does not provide UID/GID isolation, but it does provide
process capability isolation.</para> process capability isolation. If this mode is enabled, all unit processes are run without privileges
in the host user namespace (regardless of whether the unit's own user/group is
<literal>root</literal> or not). Specifically this means that the process will have zero process
capabilities on the host's user namespace, but full capabilities within the service's user
namespace. Settings such as <varname>CapabilityBoundingSet=</varname> will affect only the latter,
and there's no way to acquire additional capabilities in the host's user namespace.</para>
<para>If this mode is enabled, all unit processes are run without privileges in the host user <para>If the parameter is <literal>managed</literal> a transient, dynamically allocated range of
namespace (regardless of whether the unit's own user/group is <literal>root</literal> or not). Specifically 65536 UIDs/GIDs is allocated for the unit, and a UID/GID mapping is assigned to the unit's process
this means that the process will have zero process capabilities on the host's user namespace, but so the UID/GID 0 from inside the unit maps to the first UID/GID of the allocated mapping. Note that
full capabilities within the service's user namespace. Settings such as in this mode the UID/GID the service process will run as is different depending if looking from the
<varname>CapabilityBoundingSet=</varname> will affect only the latter, and there's no way to acquire host side (where it will be a high, dynamically assigned UID) or from inside the unit (where it will
additional capabilities in the host's user namespace.</para> be 0). Also note that this mode will enable file system UID mapping for the file systems this service
accesses, mapping the "foreign" UID range on disk to the selected dynamic UID range at
runtime.</para>
<para>When this setting is set up by a per-user instance of the service manager, the mapping of the <para>When this setting is set up by a per-user instance of the service manager, the mapping of the
<literal>root</literal> user and group to itself is omitted (unless the user manager is root). <literal>root</literal> user and group to itself is omitted (unless the user manager is root).

126
man/systemd.mstack.xml Normal file
View File

@ -0,0 +1,126 @@
<?xml version='1.0'?> <!--*-nxml-*-->
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<!-- SPDX-License-Identifier: LGPL-2.1-or-later -->
<refentry id="systemd.mstack">
<refentryinfo>
<title>systemd.mstack</title>
<productname>systemd</productname>
</refentryinfo>
<refmeta>
<refentrytitle>systemd.mstack</refentrytitle>
<manvolnum>7</manvolnum>
</refmeta>
<refnamediv>
<refname>systemd.mstack</refname>
<refpurpose>Mount stacks in self-descriptive directories</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>Directories with the <literal>.mstack/</literal> suffix may encode 'mount stacks' for assembling OS
mount hierarchies based on bind and overlay mounts, for use in
<citerefentry><refentrytitle>systemd-nspawn</refentrytitle><manvolnum>1</manvolnum></citerefentry>'s
<option>--mstack=</option> switch or the service manager's <option>RootMStack=</option> setting for
services. <literal>.mstack/</literal> directories may contain various files and subdirectories, where
each will effect one layer of an <literal>overlayfs</literal> mount, or a bind mount. The name of the
file or subdirectory indicates how it shall be used in the mount hierarchy. Specifically, the following
names are defined:</para>
<orderedlist>
<listitem><para>A <filename>layer@<replaceable>id</replaceable>/</filename> directory will be turned into
a layer of an overlayfs mount. The <literal>id</literal> identifier is used to define the order of the
layers: a version sort is executed, with the first entry being the bottom layer in the
<literal>overlayfs</literal> stack, and the last entry becoming the highest layer (precisely:
highest "lowerdir") in the <literal>overlayfs</literal> stack.</para></listitem>
<listitem><para>Similarly, a <filename>layer@<replaceable>id</replaceable>.raw</filename> regular file
will be mounted as a DDI, and the resulting mount will be turned into an overlayfs layer, following the
same sorting rules.</para></listitem>
<listitem><para>An <filename>rw</filename> directory will be turned into a writable layer at the very top
of the <literal>overlayfs</literal> stack. A subdirectory <filename>data</filename> of it will become
the "upperdir", a subdirectory <filename>work</filename> will become the "workdir". Note that these two
subdirectories do not need to be created explicitly; they are created automatically on first use should
they be missing.</para></listitem>
<listitem><para>A <filename>bind@<replaceable>location</replaceable>/</filename> directory will be bind
mounted to the mount point indicated by the <replaceable>location</replaceable> identifier, in read-write
fashion. The location is encoded via the same escaping logic used for naming <literal>.mount</literal>
units, i.e. slashes become dashes.</para></listitem>
<listitem><para>Similarly, a
<filename>bind@<replaceable>location</replaceable>.raw</filename> file will be mounted as a DDI, and the
resulting mount bind mounted to the specified location.</para></listitem>
<listitem><para>A <filename>robind@<replaceable>location</replaceable>/</filename> directory is treated very
similarly to <filename>bind@<replaceable>location</replaceable>/</filename>, but the resulting bind mount
is read-only.</para></listitem>
<listitem><para>Similarly, <filename>robind@<replaceable>location</replaceable>.raw</filename> creates a
read-only bind mount from a DDI.</para></listitem>
<listitem><para>If a <filename>root/</filename> subdirectory exists, it is used as the root of the resulting mount
hierarchy, and only the <filename>usr/</filename> subtree of the overlayfs mount will be bound to
<filename>usr/</filename> in the hierarchy.</para></listitem>
</orderedlist>
<para>Note that each of the entry types above may be a symbolic link pointing to a directory or image
file, instead of a directory or image file itself.</para>
<para>On each listed file or subdirectory type the
<citerefentry><refentrytitle>systemd.v</refentrytitle><manvolnum>7</manvolnum></citerefentry>
functionality may be used, for automatic selection of versioned resources.</para>
<para>Use the
<citerefentry><refentrytitle>systemd-mstack</refentrytitle><manvolnum>1</manvolnum></citerefentry> tool
to process or mount <filename>.mstack/</filename> directories from the command line.</para>
</refsect1>
<refsect1>
<title>Examples</title>
<para>The following <filename>.mstack/</filename> consists of two read-only overlayfs layers provided as DDIs, plus one
writable directory on top. The read-only layers are symlinked:</para>
<orderedlist>
<listitem><para><filename>foobar.mstack/layer@0.raw</filename> → <filename>../base.raw</filename></para></listitem>
<listitem><para><filename>foobar.mstack/layer@1.raw</filename> → <filename>../app.raw</filename></para></listitem>
<listitem><para><filename>foobar.mstack/rw/</filename></para></listitem>
</orderedlist>
<para>The following <filename>.mstack/</filename> consists of a read-only DDI mounted to <literal>/usr/</literal>
and a writable root:</para>
<orderedlist>
<listitem><para><filename>waldo.mstack/layer@0.raw</filename> → <filename>../vendor.raw</filename></para></listitem>
<listitem><para><filename>waldo.mstack/root/</filename></para></listitem>
</orderedlist>
<para>The following <filename>.mstack/</filename> consists of a read-only DDI mounted as root, but a
writable <filename>/var/</filename> mounted on top:</para>
<orderedlist>
<listitem><para><filename>quux.mstack/layer@0.raw</filename> → <filename>../myapp1.raw</filename></para></listitem>
<listitem><para><filename>quux.mstack/bind@var</filename> → <filename>../myapp1-var/</filename></para></listitem>
</orderedlist>
</refsect1>
<refsect1>
<title>See Also</title>
<para><simplelist type="inline">
<member><citerefentry><refentrytitle>systemd</refentrytitle><manvolnum>1</manvolnum></citerefentry></member>
<member><citerefentry><refentrytitle>systemd-mstack</refentrytitle><manvolnum>1</manvolnum></citerefentry></member>
<member><citerefentry><refentrytitle>systemd-nspawn</refentrytitle><manvolnum>1</manvolnum></citerefentry></member>
<member><citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry></member>
<member><citerefentry><refentrytitle>systemd.v</refentrytitle><manvolnum>7</manvolnum></citerefentry></member>
<member><citerefentry><refentrytitle>systemd-vpick</refentrytitle><manvolnum>1</manvolnum></citerefentry></member>
</simplelist></para>
</refsect1>
</refentry>
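
Note on the examples in the man page above: the following is a minimal sketch, not taken from the commit, of how the first and third example layouts could be assembled by hand. The scratch directory and the empty placeholder files standing in for real DDIs are assumptions made purely for illustration; only the entry names (layer@id.raw, rw/, bind@location) come from the man page text.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory (hypothetical location, for illustration only).
workdir=$(mktemp -d)
cd "$workdir"

# Empty placeholder files standing in for real DDI images (assumption).
: > base.raw
: > app.raw
: > myapp1.raw
mkdir myapp1-var

# First example: two read-only overlayfs layers as symlinked DDIs,
# plus a writable directory on top. "data" and "work" below rw/ are
# not created here; per the man page they appear on first use.
mkdir foobar.mstack
ln -s ../base.raw foobar.mstack/layer@0.raw
ln -s ../app.raw  foobar.mstack/layer@1.raw
mkdir foobar.mstack/rw

# Third example: a read-only DDI as root, with a writable directory
# bind mounted at /var (location "var" in the entry name).
mkdir quux.mstack
ln -s ../myapp1.raw foobar.mstack/../quux.mstack/layer@0.raw
ln -s ../myapp1-var quux.mstack/bind@var

ls -l foobar.mstack quux.mstack
```

A directory prepared this way would be the kind of input the man page describes for systemd-nspawn's --mstack= switch or the RootMStack= service setting; with empty placeholder images it only illustrates the naming scheme and is not mountable.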

View File

@ -675,8 +675,6 @@ conf.set('GPERF_LEN_TYPE', gperf_len_type,
##################################################################### #####################################################################
foreach header : [ foreach header : [
'gshadow.h',
'nss.h',
'sys/sdt.h', 'sys/sdt.h',
'threads.h', 'threads.h',
'valgrind/memcheck.h', 'valgrind/memcheck.h',
@ -1603,7 +1601,7 @@ foreach tuple : [
['efi'], ['efi'],
['environment-d'], ['environment-d'],
['firstboot'], ['firstboot'],
['gshadow', conf.get('HAVE_GSHADOW_H') == 1, 'gshadow.h not found'], ['gshadow', get_option('libc') != 'musl', 'musl does not support it'],
['hibernate'], ['hibernate'],
['hostnamed'], ['hostnamed'],
['hwdb'], ['hwdb'],
@ -1619,8 +1617,8 @@ foreach tuple : [
['mountfsd'], ['mountfsd'],
['networkd'], ['networkd'],
['nsresourced'], ['nsresourced'],
['nss-myhostname', conf.get('HAVE_NSS_H') == 1, 'nss.h not found'], ['nss-myhostname', get_option('libc') != 'musl', 'musl does not support it'],
['nss-systemd', conf.get('HAVE_NSS_H') == 1, 'nss.h not found'], ['nss-systemd', get_option('libc') != 'musl', 'musl does not support it'],
['oomd'], ['oomd'],
['portabled'], ['portabled'],
['pstore'], ['pstore'],
@ -1657,15 +1655,15 @@ foreach tuple : [['nss-mymachines', 'machined'],
['nss-resolve', 'resolve']] ['nss-resolve', 'resolve']]
want = get_option(tuple[0]) want = get_option(tuple[0])
if want.enabled() if want.enabled()
if conf.get('HAVE_NSS_H') != 1 if get_option('libc') == 'musl'
error('@0@ is requested but nss.h not found'.format(tuple[0])) error('@0@ is requested but musl does not support it'.format(tuple[0]))
endif endif
if not get_option(tuple[1]) if not get_option(tuple[1])
error('@0@ is requested but @1@ is disabled'.format(tuple[0], tuple[1])) error('@0@ is requested but @1@ is disabled'.format(tuple[0], tuple[1]))
endif endif
have = true have = true
elif want.allowed() elif want.allowed()
have = get_option(tuple[1]) and conf.get('HAVE_NSS_H') == 1 have = get_option(tuple[1]) and get_option('libc') != 'musl'
else else
have = false have = false
endif endif
@ -2379,6 +2377,7 @@ subdir('src/measure')
subdir('src/modules-load') subdir('src/modules-load')
subdir('src/mount') subdir('src/mount')
subdir('src/mountfsd') subdir('src/mountfsd')
subdir('src/mstack')
subdir('src/mute-console') subdir('src/mute-console')
subdir('src/network') subdir('src/network')
subdir('src/notify') subdir('src/notify')

View File

@ -565,3 +565,6 @@ option('vmlinux-h-path', type : 'string', value : '',
option('default-mountfsd-trusted-directories', type : 'boolean', value: false, option('default-mountfsd-trusted-directories', type : 'boolean', value: false,
description : 'controls whether mountfsd should apply a relaxed policy on DDIs in system DDI directories') description : 'controls whether mountfsd should apply a relaxed policy on DDIs in system DDI directories')
option('default-oci-registry', type : 'string', value : 'docker.io',
description : 'Default OCI registry name')

View File

@ -28,27 +28,27 @@ __systemd_osc_context_escape() {
# uuids, id128, hostnames, usernames, since they all come with syntax # uuids, id128, hostnames, usernames, since they all come with syntax
# requirements that exclude \ and ; anyway. This hence primarily is about # requirements that exclude \ and ; anyway. This hence primarily is about
# escaping the current working directory. # escaping the current working directory.
echo "$1" | sed -e 's/\\/\\x5x/g' -e 's/;/\\x3b/g' echo "$1" | sed -e 's/\\/\\x5c/g' -e 's/;/\\x3b/g' -e 's/[[:cntrl:]]/⍰/g'
} }
__systemd_osc_context_common() { __systemd_osc_context_common() {
if [ -f /etc/machine-id ]; then if [ -f /etc/machine-id ]; then
printf ";machineid=%s" "$(</etc/machine-id)" printf ";machineid=%.36s" "$(</etc/machine-id)"
fi fi
printf ";user=%s;hostname=%s;bootid=%s;pid=%s" "$USER" "$HOSTNAME" "$(</proc/sys/kernel/random/boot_id)" "$$" printf ";user=%.255s;hostname=%.255s;bootid=%.36s;pid=%.20d" "$USER" "$HOSTNAME" "$(</proc/sys/kernel/random/boot_id)" "$$"
} }
__systemd_osc_context_precmdline() { __systemd_osc_context_precmdline() {
local systemd_exitstatus="$?" local systemd_exitstatus="$?" systemd_signal
# Close previous command # Close previous command
if [ -n "${systemd_osc_context_cmd_id:-}" ]; then if [ -n "${systemd_osc_context_cmd_id:-}" ]; then
if [ "$systemd_exitstatus" -ge 127 ]; then if [ "$systemd_exitstatus" -gt 128 ] && systemd_signal=$(kill -l "$systemd_exitstatus" 2>&-); then
printf "\033]3008;end=%s;exit=interrupt;signal=%s\033\\" "$systemd_osc_context_cmd_id" $((systemd_exitstatus-127)) printf "\033]3008;end=%.64s;exit=failure;status=%d;signal=SIG%s\033\\" "$systemd_osc_context_cmd_id" "$systemd_exitstatus" "$systemd_signal"
elif [ "$systemd_exitstatus" -ne 0 ]; then elif [ "$systemd_exitstatus" -ne 0 ]; then
printf "\033]3008;end=%s;exit=failure;status=%s\033\\" "$systemd_osc_context_cmd_id" $((systemd_exitstatus)) printf "\033]3008;end=%.64s;exit=failure;status=%d\033\\" "$systemd_osc_context_cmd_id" $((systemd_exitstatus))
else else
printf "\033]3008;end=%s;exit=success\033\\" "$systemd_osc_context_cmd_id" printf "\033]3008;end=%.64s;exit=success\033\\" "$systemd_osc_context_cmd_id"
fi fi
fi fi
@ -58,7 +58,7 @@ __systemd_osc_context_precmdline() {
fi fi
# Create or update the shell session # Create or update the shell session
printf "\033]3008;start=%s%s;type=shell;cwd=%s\033\\" "$systemd_osc_context_shell_id" "$(__systemd_osc_context_common)" "$(__systemd_osc_context_escape "$PWD")" printf "\033]3008;start=%.64s%s;type=shell;cwd=%.255s\033\\" "$systemd_osc_context_shell_id" "$(__systemd_osc_context_common)" "$(__systemd_osc_context_escape "$PWD")"
# Prepare cmd id for next command # Prepare cmd id for next command
read -r systemd_osc_context_cmd_id </proc/sys/kernel/random/uuid read -r systemd_osc_context_cmd_id </proc/sys/kernel/random/uuid
@ -68,7 +68,7 @@ __systemd_osc_context_ps0() {
# Skip if PROMPT_COMMAND= is cleared manually or by other profiles. # Skip if PROMPT_COMMAND= is cleared manually or by other profiles.
[ -n "${systemd_osc_context_cmd_id:-}" ] || return [ -n "${systemd_osc_context_cmd_id:-}" ] || return
printf "\033]3008;start=%s%s;type=command;cwd=%s\033\\" "$systemd_osc_context_cmd_id" "$(__systemd_osc_context_common)" "$(__systemd_osc_context_escape "$PWD")" printf "\033]3008;start=%.64s%s;type=command;cwd=%.255s\033\\" "$systemd_osc_context_cmd_id" "$(__systemd_osc_context_common)" "$(__systemd_osc_context_escape "$PWD")"
} }
if [ -n "${BASH_VERSION:-}" ]; then if [ -n "${BASH_VERSION:-}" ]; then

View File

@ -31,7 +31,7 @@ SUBSYSTEM=="pci|usb|platform", IMPORT{builtin}="path_id"
SUBSYSTEM=="net", IMPORT{builtin}="net_driver" SUBSYSTEM=="net", IMPORT{builtin}="net_driver"
SUBSYSTEM=="ptp", GROUP="clock", MODE="0660" SUBSYSTEM=="ptp", GROUP="clock", MODE="0664"
SUBSYSTEM=="ptp", ATTR{clock_name}=="KVM virtual PTP", SYMLINK+="ptp_kvm" SUBSYSTEM=="ptp", ATTR{clock_name}=="KVM virtual PTP", SYMLINK+="ptp_kvm"
SUBSYSTEM=="ptp", ATTR{clock_name}=="hyperv", SYMLINK+="ptp_hyperv" SUBSYSTEM=="ptp", ATTR{clock_name}=="hyperv", SYMLINK+="ptp_hyperv"
SUBSYSTEM=="ptp", ATTR{clock_name}=="ptp_vmw", SYMLINK+="ptp_vmware" SUBSYSTEM=="ptp", ATTR{clock_name}=="ptp_vmw", SYMLINK+="ptp_vmware"

View File

@ -9,7 +9,7 @@
* processor features, models, generations or even ABIs. Hence we * processor features, models, generations or even ABIs. Hence we
* focus on general family, and distinguish word width and endianness. */ * focus on general family, and distinguish word width and endianness. */
typedef enum { typedef enum Architecture {
ARCHITECTURE_ALPHA, ARCHITECTURE_ALPHA,
ARCHITECTURE_ARC, ARCHITECTURE_ARC,
ARCHITECTURE_ARC_BE, ARCHITECTURE_ARC_BE,

View File

@ -83,6 +83,7 @@ struct iovec_wrapper;
union in_addr_union; union in_addr_union;
union sockaddr_union; union sockaddr_union;
typedef enum Architecture Architecture;
typedef enum CGroupFlags CGroupFlags; typedef enum CGroupFlags CGroupFlags;
typedef enum CGroupMask CGroupMask; typedef enum CGroupMask CGroupMask;
typedef enum ChaseFlags ChaseFlags; typedef enum ChaseFlags ChaseFlags;

View File

@ -621,7 +621,7 @@ bool mount_new_api_supported(void) {
if (cache >= 0) if (cache >= 0)
return cache; return cache;
/* This is the newer API among the ones we use, so use it as boundary */ /* This is the newest API among the ones we use, so use it as boundary */
r = RET_NERRNO(mount_setattr(-EBADF, NULL, 0, NULL, 0)); r = RET_NERRNO(mount_setattr(-EBADF, NULL, 0, NULL, 0));
if (r == 0 || ERRNO_IS_NOT_SUPPORTED(r)) /* This should return an error if it is working properly */ if (r == 0 || ERRNO_IS_NOT_SUPPORTED(r)) /* This should return an error if it is working properly */
return (cache = false); return (cache = false);

View File

@ -157,6 +157,11 @@ int stat_verify_socket(const struct stat *st) {
return 0; return 0;
} }
int is_socket(const char *path) {
assert(!isempty(path));
return verify_stat_at(AT_FDCWD, path, /* follow= */ true, stat_verify_socket, /* verify= */ false);
}
int stat_verify_linked(const struct stat *st) { int stat_verify_linked(const struct stat *st) {
assert(st); assert(st);

View File

@ -21,6 +21,7 @@ int fd_verify_symlink(int fd);
int is_symlink(const char *path); int is_symlink(const char *path);
int stat_verify_socket(const struct stat *st); int stat_verify_socket(const struct stat *st);
int is_socket(const char *path);
int stat_verify_linked(const struct stat *st); int stat_verify_linked(const struct stat *st);
int fd_verify_linked(int fd); int fd_verify_linked(int fd);

View File

@ -379,3 +379,18 @@ int uid_map_search_root(pid_t pid, UIDRangeUsernsMode mode, uid_t *ret) {
} }
} }
} }
uid_t uid_range_base(const UIDRange *range) {
/* Returns the lowest UID in the range (note that elements are sorted, hence we just need to look at
* the first one that is populated). */
if (uid_range_is_empty(range))
return UID_INVALID;
FOREACH_ARRAY(e, range->entries, range->n_entries)
if (e->nr > 0)
return e->start;
return UID_INVALID;
}

View File

@ -54,3 +54,5 @@ int uid_range_load_userns_by_fd(int userns_fd, UIDRangeUsernsMode mode, UIDRange
bool uid_range_overlaps(const UIDRange *range, uid_t start, uid_t nr); bool uid_range_overlaps(const UIDRange *range, uid_t start, uid_t nr);
int uid_map_search_root(pid_t pid, UIDRangeUsernsMode mode, uid_t *ret); int uid_map_search_root(pid_t pid, UIDRangeUsernsMode mode, uid_t *ret);
uid_t uid_range_base(const UIDRange *range);

View File

@ -1258,6 +1258,7 @@ const sd_bus_vtable bus_exec_vtable[] = {
SD_BUS_PROPERTY("RootHashSignaturePath", "s", NULL, offsetof(ExecContext, root_hash_sig_path), SD_BUS_VTABLE_PROPERTY_CONST), SD_BUS_PROPERTY("RootHashSignaturePath", "s", NULL, offsetof(ExecContext, root_hash_sig_path), SD_BUS_VTABLE_PROPERTY_CONST),
SD_BUS_PROPERTY("RootVerity", "s", NULL, offsetof(ExecContext, root_verity), SD_BUS_VTABLE_PROPERTY_CONST), SD_BUS_PROPERTY("RootVerity", "s", NULL, offsetof(ExecContext, root_verity), SD_BUS_VTABLE_PROPERTY_CONST),
SD_BUS_PROPERTY("RootEphemeral", "b", bus_property_get_bool, offsetof(ExecContext, root_ephemeral), SD_BUS_VTABLE_PROPERTY_CONST), SD_BUS_PROPERTY("RootEphemeral", "b", bus_property_get_bool, offsetof(ExecContext, root_ephemeral), SD_BUS_VTABLE_PROPERTY_CONST),
SD_BUS_PROPERTY("RootMStack", "s", NULL, offsetof(ExecContext, root_mstack), SD_BUS_VTABLE_PROPERTY_CONST),
SD_BUS_PROPERTY("ExtensionDirectories", "as", NULL, offsetof(ExecContext, extension_directories), SD_BUS_VTABLE_PROPERTY_CONST), SD_BUS_PROPERTY("ExtensionDirectories", "as", NULL, offsetof(ExecContext, extension_directories), SD_BUS_VTABLE_PROPERTY_CONST),
SD_BUS_PROPERTY("ExtensionImages", "a(sba(ss))", property_get_extension_images, 0, SD_BUS_VTABLE_PROPERTY_CONST), SD_BUS_PROPERTY("ExtensionImages", "a(sba(ss))", property_get_extension_images, 0, SD_BUS_VTABLE_PROPERTY_CONST),
SD_BUS_PROPERTY("MountImages", "a(ssba(ss))", property_get_mount_images, 0, SD_BUS_VTABLE_PROPERTY_CONST), SD_BUS_PROPERTY("MountImages", "a(ssba(ss))", property_get_mount_images, 0, SD_BUS_VTABLE_PROPERTY_CONST),
@ -1882,6 +1883,9 @@ int bus_exec_context_set_transient_property(
if (streq(name, "RootImage")) if (streq(name, "RootImage"))
return bus_set_transient_path(u, name, &c->root_image, message, flags, reterr_error); return bus_set_transient_path(u, name, &c->root_image, message, flags, reterr_error);
if (streq(name, "RootMStack"))
return bus_set_transient_path(u, name, &c->root_mstack, message, flags, reterr_error);
if (streq(name, "RootImageOptions")) { if (streq(name, "RootImageOptions")) {
_cleanup_(mount_options_free_allp) MountOptions *options = NULL; _cleanup_(mount_options_free_allp) MountOptions *options = NULL;
_cleanup_free_ char *format_str = NULL; _cleanup_free_ char *format_str = NULL;

View File

@ -14,6 +14,7 @@
#include <unistd.h> #include <unistd.h>
#include "sd-messages.h" #include "sd-messages.h"
#include "sd-varlink.h"
#include "apparmor-util.h" /* IWYU pragma: keep */ #include "apparmor-util.h" /* IWYU pragma: keep */
#include "argv-util.h" #include "argv-util.h"
@ -55,8 +56,10 @@
#include "mkdir-label.h" #include "mkdir-label.h"
#include "mount-util.h" #include "mount-util.h"
#include "mountpoint-util.h" #include "mountpoint-util.h"
#include "mstack.h"
#include "namespace-util.h" #include "namespace-util.h"
#include "nsflags.h" #include "nsflags.h"
#include "nsresource.h"
#include "open-file.h" #include "open-file.h"
#include "osc-context.h" #include "osc-context.h"
#include "pam-util.h" #include "pam-util.h"
@ -81,6 +84,7 @@
#include "strv.h" #include "strv.h"
#include "strxcpyx.h" #include "strxcpyx.h"
#include "terminal-util.h" #include "terminal-util.h"
#include "uid-range.h"
#include "user-util.h" #include "user-util.h"
#include "utmp-wtmp.h" #include "utmp-wtmp.h"
#include "vpick.h" #include "vpick.h"
@ -2395,11 +2399,14 @@ static int setup_private_users_child(int unshare_ready_fd, const char *uid_map,
} }
static int setup_private_users( static int setup_private_users(
sd_varlink *nsresource_link,
PrivateUsers private_users, PrivateUsers private_users,
uid_t ouid, uid_t saved_uid, /* service manager uid */
gid_t ogid, gid_t saved_gid, /* service manager gid */
uid_t uid, uid_t *uid, /* unit uid (seen from inside) [input+output] */
gid_t gid, gid_t *gid, /* unit gid (ditto) [input+output] */
uid_t *outside_uid, /* uid seen from the outside (which is the same as *uid, except if userns is used) */
gid_t *outside_gid, /* gid seen from the outside (similar) */
bool allow_setgroups) { bool allow_setgroups) {
_cleanup_free_ char *uid_map = NULL, *gid_map = NULL; _cleanup_free_ char *uid_map = NULL, *gid_map = NULL;
@ -2410,6 +2417,11 @@ static int setup_private_users(
ssize_t n; ssize_t n;
int r; int r;
assert(uid);
assert(gid);
assert(outside_uid);
assert(outside_gid);
/* Set up a user namespace and map the original UID/GID (IDs from before any user or group changes, i.e. /* Set up a user namespace and map the original UID/GID (IDs from before any user or group changes, i.e.
* the IDs from the user or system manager(s)) to itself, the selected UID/GID to itself, and everything else to * the IDs from the user or system manager(s)) to itself, the selected UID/GID to itself, and everything else to
* nobody. In order to be able to write this mapping we need CAP_SETUID in the original user namespace, which * nobody. In order to be able to write this mapping we need CAP_SETUID in the original user namespace, which
@ -2425,6 +2437,45 @@ static int setup_private_users(
case PRIVATE_USERS_NO: case PRIVATE_USERS_NO:
return 0; /* Early exit */ return 0; /* Early exit */
case PRIVATE_USERS_MANAGED: {
if (uid_is_valid(*uid) || uid_is_valid(*gid))
return log_debug_errno(SYNTHETIC_ERRNO(EPERM), "When allocating dynamic user namespace range, target UID/GID must be root, refusing.");
_cleanup_close_ int userns_fd = nsresource_allocate_userns(
nsresource_link,
/* name= */ NULL,
NSRESOURCE_UIDS_64K);
if (userns_fd < 0)
return userns_fd;
_cleanup_(uid_range_freep) UIDRange *uid_range = NULL;
r = uid_range_load_userns_by_fd(userns_fd, UID_RANGE_USERNS_OUTSIDE, &uid_range);
if (r < 0)
return log_debug_errno(r, "Failed to read outside userns UID range: %m");
_cleanup_(uid_range_freep) UIDRange *gid_range = NULL;
r = uid_range_load_userns_by_fd(userns_fd, GID_RANGE_USERNS_OUTSIDE, &gid_range);
if (r < 0)
return log_debug_errno(r, "Failed to read outside userns GID range: %m");
if (setns(userns_fd, CLONE_NEWUSER) < 0)
return log_debug_errno(errno, "Failed to join freshly allocated user namespace: %m");
/* In "managed" mode the originating UID is not mapped hence we need to explicitly become root in the new userns now. */
r = reset_uid_gid();
if (r < 0)
return log_debug_errno(r, "Failed to reset UID/GID to root: %m");
/* Now report our new UIDs/GIDs, both inside and outside */
*uid = 0;
*gid = 0;
*outside_uid = uid_range_base(uid_range);
*outside_gid = uid_range_base(gid_range);
return 1; /* Early exit */
}
case PRIVATE_USERS_IDENTITY: case PRIVATE_USERS_IDENTITY:
uid_map = strdup("0 0 65536\n"); uid_map = strdup("0 0 65536\n");
if (!uid_map) if (!uid_map)
@ -2445,15 +2496,15 @@ static int setup_private_users(
case PRIVATE_USERS_SELF: case PRIVATE_USERS_SELF:
/* Can only set up multiple mappings with CAP_SETUID. */ /* Can only set up multiple mappings with CAP_SETUID. */
if (uid_is_valid(uid) && uid != ouid && have_effective_cap(CAP_SETUID) > 0) if (uid_is_valid(*uid) && *uid != saved_uid && have_effective_cap(CAP_SETUID) > 0)
r = asprintf(&uid_map, r = asprintf(&uid_map,
UID_FMT " " UID_FMT " 1\n" /* Map $OUID → $OUID */ UID_FMT " " UID_FMT " 1\n" /* Map $OUID → $OUID */
UID_FMT " " UID_FMT " 1\n", /* Map $UID → $UID */ UID_FMT " " UID_FMT " 1\n", /* Map $UID → $UID */
ouid, ouid, uid, uid); saved_uid, saved_uid, *uid, *uid);
else else
r = asprintf(&uid_map, r = asprintf(&uid_map,
UID_FMT " " UID_FMT " 1\n", /* Map $OUID → $OUID */ UID_FMT " " UID_FMT " 1\n", /* Map $OUID → $OUID */
ouid, ouid); saved_uid, saved_uid);
if (r < 0) if (r < 0)
return -ENOMEM; return -ENOMEM;
@ -2479,15 +2530,15 @@ static int setup_private_users(
case PRIVATE_USERS_SELF: case PRIVATE_USERS_SELF:
/* Can only set up multiple mappings with CAP_SETGID. */ /* Can only set up multiple mappings with CAP_SETGID. */
if (gid_is_valid(gid) && gid != ogid && have_effective_cap(CAP_SETGID) > 0) if (gid_is_valid(*gid) && *gid != saved_gid && have_effective_cap(CAP_SETGID) > 0)
r = asprintf(&gid_map, r = asprintf(&gid_map,
GID_FMT " " GID_FMT " 1\n" /* Map $OGID → $OGID */ GID_FMT " " GID_FMT " 1\n" /* Map $OGID → $OGID */
GID_FMT " " GID_FMT " 1\n", /* Map $GID → $GID */ GID_FMT " " GID_FMT " 1\n", /* Map $GID → $GID */
ogid, ogid, gid, gid); saved_gid, saved_gid, *gid, *gid);
else else
r = asprintf(&gid_map, r = asprintf(&gid_map,
GID_FMT " " GID_FMT " 1\n", /* Map $OGID -> $OGID */ GID_FMT " " GID_FMT " 1\n", /* Map $OGID -> $OGID */
ogid, ogid); saved_gid, saved_gid);
if (r < 0) if (r < 0)
return -ENOMEM; return -ENOMEM;
break; break;
@ -3487,8 +3538,7 @@ static int compile_symlinks(
static bool insist_on_sandboxing( static bool insist_on_sandboxing(
const ExecContext *context, const ExecContext *context,
const char *root_dir, const PinnedResource *rootfs,
const char *root_image,
const BindMount *bind_mounts, const BindMount *bind_mounts,
size_t n_bind_mounts) { size_t n_bind_mounts) {
@ -3502,7 +3552,7 @@ static bool insist_on_sandboxing(
if (context->n_temporary_filesystems > 0) if (context->n_temporary_filesystems > 0)
return true; return true;
if (root_dir || root_image || context->root_directory_as_fd) if (pinned_resource_is_set(rootfs))
return true; return true;
if (context->n_mount_images > 0) if (context->n_mount_images > 0)
@ -3529,8 +3579,7 @@ static bool insist_on_sandboxing(
static int setup_ephemeral( static int setup_ephemeral(
const ExecContext *context, const ExecContext *context,
ExecRuntime *runtime, ExecRuntime *runtime,
char **root_image, /* both input and output! modified if ephemeral logic enabled */ PinnedResource *rootfs, /* both input and output! modified if ephemeral logic enabled */
char **root_directory, /* ditto */
char **reterr_path) { char **reterr_path) {
_cleanup_close_ int fd = -EBADF; _cleanup_close_ int fd = -EBADF;
@ -3538,12 +3587,10 @@ static int setup_ephemeral(
int r; int r;
assert(context); assert(context);
assert(!context->root_directory_as_fd);
assert(runtime); assert(runtime);
assert(root_image); assert(rootfs);
assert(root_directory);
if (!*root_image && !*root_directory) if (!rootfs->image && !rootfs->directory)
return 0; return 0;
if (!runtime->ephemeral_copy) if (!runtime->ephemeral_copy)
@ -3569,32 +3616,32 @@ static int setup_ephemeral(
if (fd != -EAGAIN) if (fd != -EAGAIN)
return log_debug_errno(fd, "Failed to receive file descriptor queued on ephemeral storage socket: %m"); return log_debug_errno(fd, "Failed to receive file descriptor queued on ephemeral storage socket: %m");
if (*root_image) { if (rootfs->image) {
log_debug("Making ephemeral copy of %s to %s", *root_image, new_root); log_debug("Making ephemeral copy of %s to %s", rootfs->image, new_root);
fd = copy_file(*root_image, new_root, O_EXCL, 0600, fd = copy_file(rootfs->image, new_root, O_EXCL, 0600,
COPY_LOCK_BSD|COPY_REFLINK|COPY_CRTIME|COPY_NOCOW_AFTER); COPY_LOCK_BSD|COPY_REFLINK|COPY_CRTIME|COPY_NOCOW_AFTER);
if (fd < 0) { if (fd < 0) {
*reterr_path = strdup(*root_image); *reterr_path = strdup(rootfs->image);
return log_debug_errno(fd, "Failed to copy image %s to %s: %m", return log_debug_errno(fd, "Failed to copy image %s to %s: %m",
*root_image, new_root); rootfs->image, new_root);
} }
} else { } else {
assert(*root_directory); assert(rootfs->directory);
log_debug("Making ephemeral snapshot of %s to %s", *root_directory, new_root); log_debug("Making ephemeral snapshot of %s to %s", rootfs->directory, new_root);
fd = btrfs_subvol_snapshot_at( fd = btrfs_subvol_snapshot_at(
AT_FDCWD, *root_directory, AT_FDCWD, rootfs->directory,
AT_FDCWD, new_root, AT_FDCWD, new_root,
BTRFS_SNAPSHOT_FALLBACK_COPY | BTRFS_SNAPSHOT_FALLBACK_COPY |
BTRFS_SNAPSHOT_FALLBACK_DIRECTORY | BTRFS_SNAPSHOT_FALLBACK_DIRECTORY |
BTRFS_SNAPSHOT_RECURSIVE | BTRFS_SNAPSHOT_RECURSIVE |
BTRFS_SNAPSHOT_LOCK_BSD); BTRFS_SNAPSHOT_LOCK_BSD);
if (fd < 0) { if (fd < 0) {
*reterr_path = strdup(*root_directory); *reterr_path = strdup(rootfs->directory);
return log_debug_errno(fd, "Failed to snapshot directory %s to %s: %m", return log_debug_errno(fd, "Failed to snapshot directory %s to %s: %m",
*root_directory, new_root); rootfs->directory, new_root);
} }
} }
@ -3602,11 +3649,14 @@ static int setup_ephemeral(
if (r < 0) if (r < 0)
return log_debug_errno(r, "Failed to queue file descriptor on ephemeral storage socket: %m"); return log_debug_errno(r, "Failed to queue file descriptor on ephemeral storage socket: %m");
if (*root_image) if (rootfs->image) {
free_and_replace(*root_image, new_root); free_and_replace(rootfs->image, new_root);
else { close_and_replace(rootfs->image_fd, fd);
assert(*root_directory); } else {
free_and_replace(*root_directory, new_root); assert(rootfs->directory);
free_and_replace(rootfs->directory, new_root);
close_and_replace(rootfs->directory_fd, fd);
} }
return 1; return 1;
@ -3660,20 +3710,35 @@ static int verity_settings_prepare(
return 0; return 0;
} }
static int pick_versions( static int pin_rootfs(
const ExecContext *context, const ExecContext *context,
const ExecParameters *params, const ExecParameters *params,
char **ret_root_image, PinnedResource *ret,
char **ret_root_directory,
char **reterr_path) { char **reterr_path) {
int r; int r;
assert(context); assert(context);
assert(!context->root_directory_as_fd);
assert(params); assert(params);
assert(ret_root_image); assert(ret);
assert(ret_root_directory);
if (!FLAGS_SET(params->flags, EXEC_APPLY_CHROOT)) {
*ret = PINNED_RESOURCE_NULL;
return 0;
}
if (context->root_directory_as_fd) {
_cleanup_close_ int fd = fcntl(params->root_directory_fd, F_DUPFD_CLOEXEC, 3);
if (fd < 0)
return log_debug_errno(errno, "Failed to duplicate root directory fd: %m");
*ret = (PinnedResource) {
.directory_fd = TAKE_FD(fd),
.image_fd = -EBADF,
};
return 1;
}
if (context->root_image) { if (context->root_image) {
_cleanup_(pick_result_done) PickResult result = PICK_RESULT_NULL; _cleanup_(pick_result_done) PickResult result = PICK_RESULT_NULL;
@ -3695,8 +3760,24 @@ static int pick_versions(
return log_debug_errno(SYNTHETIC_ERRNO(ENOENT), "No matching entry in .v/ directory %s found.", context->root_image); return log_debug_errno(SYNTHETIC_ERRNO(ENOENT), "No matching entry in .v/ directory %s found.", context->root_image);
} }
*ret_root_image = TAKE_PTR(result.path); /* path_pick() returns us an O_PATH fd, let's turn this into a fully opened file, because
*ret_root_directory = NULL; * mountfsd will want this later, and it wants a fully opened fd, so that security checks
* have been passed */
_cleanup_close_ int reopened_fd = -EBADF;
reopened_fd = fd_reopen(result.fd, O_CLOEXEC|O_NONBLOCK|O_NOCTTY|O_RDWR);
if (ERRNO_IS_NEG_FS_WRITE_REFUSED(reopened_fd))
reopened_fd = fd_reopen(result.fd, O_CLOEXEC|O_NONBLOCK|O_NOCTTY|O_RDONLY);
if (reopened_fd < 0) {
*reterr_path = strdup(context->root_image);
return log_debug_errno(reopened_fd, "Failed to open image '%s': %m", context->root_image);
}
*ret = (PinnedResource) {
.image = TAKE_PTR(result.path),
.image_fd = TAKE_FD(reopened_fd),
.directory_fd = -EBADF,
};
return r; return r;
} }
@ -3720,12 +3801,53 @@ static int pick_versions(
return log_debug_errno(SYNTHETIC_ERRNO(ENOENT), "No matching entry in .v/ directory %s found.", context->root_directory); return log_debug_errno(SYNTHETIC_ERRNO(ENOENT), "No matching entry in .v/ directory %s found.", context->root_directory);
} }
*ret_root_image = NULL; *ret = (PinnedResource) {
*ret_root_directory = TAKE_PTR(result.path); .directory = TAKE_PTR(result.path),
.directory_fd = TAKE_FD(result.fd),
.image_fd = -EBADF,
};
return r; return r;
} }
*ret_root_image = *ret_root_directory = NULL; if (context->root_mstack) {
_cleanup_(pick_result_done) PickResult result = PICK_RESULT_NULL;
r = path_pick(/* toplevel_path= */ NULL,
/* toplevel_fd= */ AT_FDCWD,
context->root_mstack,
pick_filter_image_mstack,
/* n_filters= */ 1,
PICK_ARCHITECTURE|PICK_TRIES|PICK_RESOLVE,
&result);
if (r < 0) {
*reterr_path = strdup(context->root_mstack);
return r;
}
if (!result.path) {
*reterr_path = strdup(context->root_mstack);
return log_debug_errno(SYNTHETIC_ERRNO(ENOENT), "No matching entry in .v/ directory %s found.", context->root_mstack);
}
_cleanup_(mstack_freep) MStack *mstack = NULL;
r = mstack_load(result.path, result.fd, &mstack);
if (r < 0) {
*reterr_path = TAKE_PTR(result.path);
return r;
}
*ret = (PinnedResource) {
.mstack = TAKE_PTR(result.path),
.mstack_loaded = TAKE_PTR(mstack),
.image_fd = -EBADF,
.directory_fd = -EBADF,
};
return r;
}
*ret = PINNED_RESOURCE_NULL;
return 0; return 0;
} }
@ -3733,7 +3855,8 @@ static int apply_mount_namespace(
ExecCommandFlags command_flags, ExecCommandFlags command_flags,
const ExecContext *context, const ExecContext *context,
const ExecParameters *params, const ExecParameters *params,
ExecRuntime *runtime, const ExecRuntime *runtime,
const PinnedResource *rootfs,
const char *memory_pressure_path, const char *memory_pressure_path,
bool needs_sandboxing, bool needs_sandboxing,
uid_t exec_directory_uid, uid_t exec_directory_uid,
@ -3741,13 +3864,14 @@ static int apply_mount_namespace(
PidRef *bpffs_pidref, PidRef *bpffs_pidref,
int bpffs_socket_fd, int bpffs_socket_fd,
int bpffs_errno_pipe, int bpffs_errno_pipe,
sd_varlink *mountfsd_link,
char **reterr_path) { char **reterr_path) {
_cleanup_(verity_settings_done) VeritySettings verity = VERITY_SETTINGS_DEFAULT; _cleanup_(verity_settings_done) VeritySettings verity = VERITY_SETTINGS_DEFAULT;
_cleanup_strv_free_ char **empty_directories = NULL, **symlinks = NULL, _cleanup_strv_free_ char **empty_directories = NULL, **symlinks = NULL,
**read_write_paths_cleanup = NULL; **read_write_paths_cleanup = NULL;
_cleanup_free_ char *creds_path = NULL, *incoming_dir = NULL, *propagate_dir = NULL, _cleanup_free_ char *creds_path = NULL, *incoming_dir = NULL, *propagate_dir = NULL,
*private_namespace_dir = NULL, *host_os_release_stage = NULL, *root_image = NULL, *root_dir = NULL; *private_namespace_dir = NULL, *host_os_release_stage = NULL;
const char *tmp_dir = NULL, *var_tmp_dir = NULL; const char *tmp_dir = NULL, *var_tmp_dir = NULL;
char **read_write_paths; char **read_write_paths;
bool setup_os_release_symlink; bool setup_os_release_symlink;
@ -3761,26 +3885,6 @@ static int apply_mount_namespace(
CLEANUP_ARRAY(bind_mounts, n_bind_mounts, bind_mount_free_many); CLEANUP_ARRAY(bind_mounts, n_bind_mounts, bind_mount_free_many);
if (params->flags & EXEC_APPLY_CHROOT && !context->root_directory_as_fd) {
r = pick_versions(
context,
params,
&root_image,
&root_dir,
reterr_path);
if (r < 0)
return r;
r = setup_ephemeral(
context,
runtime,
&root_image,
&root_dir,
reterr_path);
if (r < 0)
return r;
}
r = compile_bind_mounts(context, params, exec_directory_uid, exec_directory_gid, &bind_mounts, &n_bind_mounts, &empty_directories); r = compile_bind_mounts(context, params, exec_directory_uid, exec_directory_gid, &bind_mounts, &n_bind_mounts, &empty_directories);
if (r < 0) if (r < 0)
return r; return r;
@ -3819,7 +3923,7 @@ static int apply_mount_namespace(
} }
/* Symlinks (exec dirs, os-release) are set up after other mounts, before they are made read-only. */ /* Symlinks (exec dirs, os-release) are set up after other mounts, before they are made read-only. */
setup_os_release_symlink = needs_sandboxing && exec_context_get_effective_mount_apivfs(context) && (root_dir || root_image); setup_os_release_symlink = needs_sandboxing && exec_context_get_effective_mount_apivfs(context) && pinned_resource_is_set(rootfs);
r = compile_symlinks(context, params, setup_os_release_symlink, &symlinks); r = compile_symlinks(context, params, setup_os_release_symlink, &symlinks);
if (r < 0) if (r < 0)
return r; return r;
@ -3866,10 +3970,10 @@ static int apply_mount_namespace(
return -ENOMEM; return -ENOMEM;
} }
if (root_image) { if (rootfs->image) {
r = verity_settings_prepare( r = verity_settings_prepare(
&verity, &verity,
root_image, rootfs->image,
&context->root_hash, context->root_hash_path, &context->root_hash, context->root_hash_path,
&context->root_hash_sig, context->root_hash_sig_path, &context->root_hash_sig, context->root_hash_sig_path,
context->root_verity); context->root_verity);
@ -3880,9 +3984,7 @@ static int apply_mount_namespace(
NamespaceParameters parameters = { NamespaceParameters parameters = {
.runtime_scope = params->runtime_scope, .runtime_scope = params->runtime_scope,
.root_directory = root_dir, .rootfs = rootfs,
.root_image = root_image,
.root_directory_fd = params->flags & EXEC_APPLY_CHROOT ? params->root_directory_fd : -EBADF,
.root_image_options = context->root_image_options, .root_image_options = context->root_image_options,
.root_image_policy = context->root_image_policy ?: &image_policy_service, .root_image_policy = context->root_image_policy ?: &image_policy_service,
@ -3930,7 +4032,7 @@ static int apply_mount_namespace(
/* If DynamicUser=no and RootDirectory= is set then let's pass a relaxed sandbox info, /* If DynamicUser=no and RootDirectory= is set then let's pass a relaxed sandbox info,
* otherwise enforce it, don't ignore protected paths and fail if we are unable to apply the * otherwise enforce it, don't ignore protected paths and fail if we are unable to apply the
* sandbox inside the mount namespace. */ * sandbox inside the mount namespace. */
.ignore_protect_paths = !needs_sandboxing && !context->dynamic_user && root_dir, .ignore_protect_paths = !needs_sandboxing && !context->dynamic_user && pinned_resource_is_set(rootfs),
.protect_control_groups = needs_sandboxing ? exec_get_protect_control_groups(context) : PROTECT_CONTROL_GROUPS_NO, .protect_control_groups = needs_sandboxing ? exec_get_protect_control_groups(context) : PROTECT_CONTROL_GROUPS_NO,
.protect_kernel_tunables = needs_sandboxing && context->protect_kernel_tunables, .protect_kernel_tunables = needs_sandboxing && context->protect_kernel_tunables,
@ -3956,10 +4058,13 @@ static int apply_mount_namespace(
.protect_proc = needs_sandboxing ? context->protect_proc : PROTECT_PROC_DEFAULT, .protect_proc = needs_sandboxing ? context->protect_proc : PROTECT_PROC_DEFAULT,
.proc_subset = needs_sandboxing ? context->proc_subset : PROC_SUBSET_ALL, .proc_subset = needs_sandboxing ? context->proc_subset : PROC_SUBSET_ALL,
.private_bpf = needs_sandboxing ? context->private_bpf : PRIVATE_BPF_NO, .private_bpf = needs_sandboxing ? context->private_bpf : PRIVATE_BPF_NO,
.private_users = needs_sandboxing ? context->private_users : PRIVATE_USERS_NO,
.bpffs_pidref = bpffs_pidref, .bpffs_pidref = bpffs_pidref,
.bpffs_socket_fd = bpffs_socket_fd, .bpffs_socket_fd = bpffs_socket_fd,
.bpffs_errno_pipe = bpffs_errno_pipe, .bpffs_errno_pipe = bpffs_errno_pipe,
.mountfsd_link = mountfsd_link,
}; };
r = setup_namespace(&parameters, reterr_path); r = setup_namespace(&parameters, reterr_path);
@ -3970,17 +4075,18 @@ static int apply_mount_namespace(
if (r == -ENOANO) { if (r == -ENOANO) {
if (insist_on_sandboxing( if (insist_on_sandboxing(
context, context,
root_dir, root_image, rootfs,
bind_mounts, bind_mounts,
n_bind_mounts)) n_bind_mounts))
return log_debug_errno(SYNTHETIC_ERRNO(EOPNOTSUPP), return log_debug_errno(SYNTHETIC_ERRNO(EOPNOTSUPP),
"Failed to set up namespace, and refusing to continue since " "Failed to set up namespace, and refusing to continue since "
"the selected namespacing options alter mount environment non-trivially.\n" "the selected namespacing options alter mount environment non-trivially.\n"
"Bind mounts: %zu, temporary filesystems: %zu, root directory: %s, root image: %s, dynamic user: %s", "Bind mounts: %zu, temporary filesystems: %zu, root directory: %s, root image: %s, root mstack: %s, dynamic user: %s",
n_bind_mounts, n_bind_mounts,
context->n_temporary_filesystems, context->n_temporary_filesystems,
yes_no(root_dir), yes_no(rootfs->directory_fd >= 0),
yes_no(root_image), yes_no(rootfs->image_fd >= 0),
yes_no(!!rootfs->mstack_loaded),
yes_no(context->dynamic_user)); yes_no(context->dynamic_user));
log_debug("Failed to set up namespace, assuming containerized execution and ignoring."); log_debug("Failed to set up namespace, assuming containerized execution and ignoring.");
@ -4530,10 +4636,8 @@ static bool exec_needs_cap_sys_admin(const ExecContext *context, const ExecParam
context->bind_log_sockets > 0 || context->bind_log_sockets > 0 ||
context->n_bind_mounts > 0 || context->n_bind_mounts > 0 ||
context->n_temporary_filesystems > 0 || context->n_temporary_filesystems > 0 ||
context->root_directory || exec_context_with_rootfs_strict(context) ||
context->root_directory_as_fd ||
!strv_isempty(context->extension_directories) || !strv_isempty(context->extension_directories) ||
context->root_image ||
context->n_mount_images > 0 || context->n_mount_images > 0 ||
context->n_extension_images > 0 || context->n_extension_images > 0 ||
context->protect_system != PROTECT_SYSTEM_NO || context->protect_system != PROTECT_SYSTEM_NO ||
@ -4605,7 +4709,8 @@ static bool exec_namespace_is_delegated(
static int setup_delegated_namespaces( static int setup_delegated_namespaces(
const ExecContext *context, const ExecContext *context,
ExecParameters *params, ExecParameters *params,
ExecRuntime *runtime, const ExecRuntime *runtime,
const PinnedResource *rootfs,
bool delegate, bool delegate,
const char *memory_pressure_path, const char *memory_pressure_path,
uid_t uid, uid_t uid,
@ -4616,6 +4721,7 @@ static int setup_delegated_namespaces(
PidRef *bpffs_pidref, PidRef *bpffs_pidref,
int bpffs_socket_fd, int bpffs_socket_fd,
int bpffs_errno_pipe, int bpffs_errno_pipe,
sd_varlink *mountfsd_link,
int *reterr_exit_status) { int *reterr_exit_status) {
int r; int r;
@ -4630,6 +4736,7 @@ static int setup_delegated_namespaces(
assert(context); assert(context);
assert(params); assert(params);
assert(runtime); assert(runtime);
assert(rootfs);
assert(reterr_exit_status); assert(reterr_exit_status);
if (exec_needs_network_namespace(context) && if (exec_needs_network_namespace(context) &&
@ -4730,10 +4837,12 @@ static int setup_delegated_namespaces(
exec_namespace_is_delegated(context, params, have_cap_sys_admin, CLONE_NEWNS) == delegate) { exec_namespace_is_delegated(context, params, have_cap_sys_admin, CLONE_NEWNS) == delegate) {
_cleanup_free_ char *error_path = NULL; _cleanup_free_ char *error_path = NULL;
r = apply_mount_namespace(command->flags, r = apply_mount_namespace(
command->flags,
context, context,
params, params,
runtime, runtime,
rootfs,
memory_pressure_path, memory_pressure_path,
needs_sandboxing, needs_sandboxing,
uid, uid,
@ -4741,6 +4850,7 @@ static int setup_delegated_namespaces(
bpffs_pidref, bpffs_pidref,
bpffs_socket_fd, bpffs_socket_fd,
bpffs_errno_pipe, bpffs_errno_pipe,
mountfsd_link,
&error_path); &error_path);
if (r < 0) { if (r < 0) {
*reterr_exit_status = EXIT_NAMESPACE; *reterr_exit_status = EXIT_NAMESPACE;
@ -5093,10 +5203,10 @@ int exec_invoke(
#if HAVE_SECCOMP #if HAVE_SECCOMP
uint64_t saved_bset = 0; uint64_t saved_bset = 0;
#endif #endif
uid_t saved_uid = getuid(); /* saved_uid → where we started from; uid → where we ended up in; outside_uid → where we ended up in,
gid_t saved_gid = getgid(); * but seen from the outside (covers userns mappings) */
uid_t uid = UID_INVALID; uid_t uid = UID_INVALID, saved_uid = getuid(), outside_uid = UID_INVALID;
gid_t gid = GID_INVALID; gid_t gid = GID_INVALID, saved_gid = getgid(), outside_gid = GID_INVALID;
int secure_bits; int secure_bits;
_cleanup_free_ gid_t *gids = NULL, *gids_after_pam = NULL; _cleanup_free_ gid_t *gids = NULL, *gids_after_pam = NULL;
int ngids = 0, ngids_after_pam = 0; int ngids = 0, ngids_after_pam = 0;
@ -5266,7 +5376,9 @@ int exec_invoke(
return log_error_errno(errno, "Failed to update environment: %m"); return log_error_errno(errno, "Failed to update environment: %m");
} }
if (context->dynamic_user && runtime->dynamic_creds) { if (exec_context_get_effective_private_users(context, params) == PRIVATE_USERS_MANAGED)
log_debug("Running with a managed user namespace, not initializing UIDs/GIDs.");
else if (context->dynamic_user && runtime->dynamic_creds) {
_cleanup_strv_free_ char **suggested_paths = NULL; _cleanup_strv_free_ char **suggested_paths = NULL;
/* On top of that, make sure we bypass our own NSS module nss-systemd comprehensively for any NSS /* On top of that, make sure we bypass our own NSS module nss-systemd comprehensively for any NSS
@ -5348,6 +5460,7 @@ int exec_invoke(
} }
} }
if (exec_context_get_effective_private_users(context, params) != PRIVATE_USERS_MANAGED) {
/* Initialize user supplementary groups and get SupplementaryGroups= ones */ /* Initialize user supplementary groups and get SupplementaryGroups= ones */
ngids = get_supplementary_groups(context, username, gid, &gids); ngids = get_supplementary_groups(context, username, gid, &gids);
if (ngids < 0) { if (ngids < 0) {
@ -5362,6 +5475,7 @@ int exec_invoke(
} }
params->user_lookup_fd = safe_close(params->user_lookup_fd); params->user_lookup_fd = safe_close(params->user_lookup_fd);
}
r = acquire_home(context, &pwent_home, &home_buffer); r = acquire_home(context, &pwent_home, &home_buffer);
if (r < 0) { if (r < 0) {
@ -5697,6 +5811,24 @@ int exec_invoke(
} }
} }
_cleanup_(sd_varlink_unrefp) sd_varlink *mountfsd_link = NULL, *nsresource_link = NULL;
if (needs_sandboxing &&
exec_context_get_effective_private_users(context, params) == PRIVATE_USERS_MANAGED) {
/* In managed mode we need to allocate a userns via nsresource, and then assign mounts to
* it. We must do so with our original privileges (since after creating the userns, we might
* simply not have the necessary privs for the IPC calls anymore), hence do this here, ahead
* of time. */
r = mountfsd_connect(&mountfsd_link);
if (r < 0)
return log_error_errno(r, "Failed to connect to mountfsd: %m");
r = nsresource_connect(&nsresource_link);
if (r < 0)
return log_error_errno(r, "Failed to connect to nsresourced: %m");
}
needs_mount_namespace = exec_needs_mount_namespace(context, params, runtime); needs_mount_namespace = exec_needs_mount_namespace(context, params, runtime);
for (ExecDirectoryType dt = 0; dt < _EXEC_DIRECTORY_TYPE_MAX; dt++) { for (ExecDirectoryType dt = 0; dt < _EXEC_DIRECTORY_TYPE_MAX; dt++) {
@ -5867,6 +5999,20 @@ int exec_invoke(
} }
} }
_cleanup_(pinned_resource_done) PinnedResource rootfs = PINNED_RESOURCE_NULL;
_cleanup_free_ char *error_path = NULL;
r = pin_rootfs(context, params, &rootfs, &error_path);
if (r < 0) {
*exit_status = EXIT_NAMESPACE;
return log_error_errno(r, "Failed to open service's root fs%s%s: %m", error_path ? ": " : "", strempty(error_path));
}
r = setup_ephemeral(context, runtime, &rootfs, &error_path);
if (r < 0) {
*exit_status = EXIT_NAMESPACE;
return log_error_errno(r, "Failed to make ephemeral copy of service's root fs%s%s: %m", error_path ? ": " : "", strempty(error_path));
}
/* Load a bunch of libraries we'll possibly need later, before we turn off dlopen() */ /* Load a bunch of libraries we'll possibly need later, before we turn off dlopen() */
(void) dlopen_bpf(); (void) dlopen_bpf();
(void) dlopen_cryptsetup(); (void) dlopen_cryptsetup();
@ -5891,16 +6037,26 @@ int exec_invoke(
/* The kernel requires /proc/pid/setgroups be set to "deny" prior to writing /proc/pid/gid_map in /* The kernel requires /proc/pid/setgroups be set to "deny" prior to writing /proc/pid/gid_map in
* unprivileged user namespaces. */ * unprivileged user namespaces. */
r = setup_private_users(pu, saved_uid, saved_gid, uid, gid, /* allow_setgroups= */ false); r = setup_private_users(
nsresource_link,
pu,
saved_uid,
saved_gid,
&uid,
&gid,
&outside_uid,
&outside_gid,
/* allow_setgroups= */ false);
/* If it was requested explicitly and we can't set it up, fail early. Otherwise, continue and let /* If it was requested explicitly and we can't set it up, fail early. Otherwise, continue and let
* the actual requested operations fail (or silently continue). */ * the actual requested operations fail (or silently continue). */
if (r < 0 && context->private_users != PRIVATE_USERS_NO) { if (r < 0) {
if (context->private_users != PRIVATE_USERS_NO) {
*exit_status = EXIT_USER; *exit_status = EXIT_USER;
return log_error_errno(r, "Failed to set up user namespacing for unprivileged user: %m"); return log_error_errno(r, "Failed to set up user namespacing for unprivileged user: %m");
} }
if (r < 0)
log_info_errno(r, "Failed to set up user namespacing for unprivileged user, ignoring: %m"); log_notice_errno(r, "Failed to set up user namespacing for unprivileged user, ignoring: %m");
else { } else {
assert(r > 0); assert(r > 0);
userns_set_up = true; userns_set_up = true;
log_debug("Set up unprivileged user namespace"); log_debug("Set up unprivileged user namespace");
@ -5912,6 +6068,7 @@ int exec_invoke(
context, context,
params, params,
runtime, runtime,
&rootfs,
/* delegate= */ false, /* delegate= */ false,
memory_pressure_path, memory_pressure_path,
uid, uid,
@ -5922,6 +6079,7 @@ int exec_invoke(
&bpffs_pidref, &bpffs_pidref,
bpffs_socket_fd, bpffs_socket_fd,
bpffs_errno_pipe, bpffs_errno_pipe,
mountfsd_link,
exit_status); exit_status);
if (r < 0) if (r < 0)
return r; return r;
@ -5971,7 +6129,15 @@ int exec_invoke(
} else if (needs_sandboxing && !userns_set_up) { } else if (needs_sandboxing && !userns_set_up) {
PrivateUsers pu = exec_context_get_effective_private_users(context, params); PrivateUsers pu = exec_context_get_effective_private_users(context, params);
r = setup_private_users(pu, saved_uid, saved_gid, uid, gid, r = setup_private_users(
nsresource_link,
pu,
saved_uid,
saved_gid,
&uid,
&gid,
&outside_uid,
&outside_gid,
/* allow_setgroups= */ pu == PRIVATE_USERS_FULL); /* allow_setgroups= */ pu == PRIVATE_USERS_FULL);
if (r < 0) { if (r < 0) {
*exit_status = EXIT_USER; *exit_status = EXIT_USER;
@ -5981,11 +6147,25 @@ int exec_invoke(
log_debug("Set up privileged user namespace"); log_debug("Set up privileged user namespace");
} }
if (params->user_lookup_fd >= 0) {
/* If we haven't sent the UIDs/GIDs we settled on upstream yet, let's do so now, as we
* finally know our final UID/GID range */
r = send_user_lookup(params->unit_id, params->user_lookup_fd, outside_uid, outside_gid);
if (r < 0) {
*exit_status = EXIT_USER;
return log_error_errno(r, "Failed to send user credentials to PID1: %m");
}
params->user_lookup_fd = safe_close(params->user_lookup_fd);
}
/* Call setup_delegated_namespaces() the second time to unshare all delegated namespaces. */ /* Call setup_delegated_namespaces() the second time to unshare all delegated namespaces. */
r = setup_delegated_namespaces( r = setup_delegated_namespaces(
context, context,
params, params,
runtime, runtime,
&rootfs,
/* delegate= */ true, /* delegate= */ true,
memory_pressure_path, memory_pressure_path,
uid, uid,
@ -5996,10 +6176,19 @@ int exec_invoke(
&bpffs_pidref, &bpffs_pidref,
bpffs_socket_fd, bpffs_socket_fd,
bpffs_errno_pipe, bpffs_errno_pipe,
mountfsd_link,
exit_status); exit_status);
if (r < 0) if (r < 0)
return r; return r;
/* We are done now with the nsresourced/mountfsd shenanigans, let's close the connections */
nsresource_link = sd_varlink_unref(nsresource_link);
mountfsd_link = sd_varlink_unref(mountfsd_link);
/* We don't need the pinned rootfs anymore at this point. Close the fds now, so that they are
* definitely gone before we do our fd rearrangements below. */
pinned_resource_done(&rootfs);
/* Kill unnecessary process, for the case that e.g. when the bpffs mount point is hidden. */ /* Kill unnecessary process, for the case that e.g. when the bpffs mount point is hidden. */
pidref_done_sigkill_wait(&bpffs_pidref); pidref_done_sigkill_wait(&bpffs_pidref);
@ -6433,7 +6622,6 @@ int exec_invoke(
} }
} }
#endif #endif
} }
if (!strv_isempty(context->unset_environment)) { if (!strv_isempty(context->unset_environment)) {

@ -1633,6 +1633,10 @@ static int exec_context_serialize(const ExecContext *c, FILE *f) {
if (r < 0) if (r < 0)
return r; return r;
r = serialize_item_escaped(f, "exec-context-root-mstack", c->root_mstack);
if (r < 0)
return r;
r = serialize_item_format(f, "exec-context-umask", "%04o", c->umask); r = serialize_item_format(f, "exec-context-umask", "%04o", c->umask);
if (r < 0) if (r < 0)
return r; return r;
@ -2568,6 +2572,14 @@ static int exec_context_deserialize(ExecContext *c, FILE *f) {
if (r < 0) if (r < 0)
return r; return r;
c->root_ephemeral = r; c->root_ephemeral = r;
} else if ((val = startswith(l, "exec-context-root-mstack="))) {
ssize_t k;
char *p;
k = cunescape(val, 0, &p);
if (k < 0)
return k;
free_and_replace(c->root_mstack, p);
} else if ((val = startswith(l, "exec-context-umask="))) { } else if ((val = startswith(l, "exec-context-umask="))) {
r = parse_mode(val, &c->umask); r = parse_mode(val, &c->umask);
if (r < 0) if (r < 0)

@ -263,7 +263,8 @@ bool exec_needs_mount_namespace(
assert(context); assert(context);
if (context->root_image) if (context->root_image ||
context->root_mstack)
return true; return true;
if (context->root_directory_as_fd) if (context->root_directory_as_fd)
@ -356,7 +357,7 @@ const char* exec_get_private_notify_socket_path(const ExecContext *context, cons
if (!needs_sandboxing) if (!needs_sandboxing)
return NULL; return NULL;
if (!context->root_directory && !context->root_image && !context->root_directory_as_fd) if (!exec_context_with_rootfs(context))
return NULL; return NULL;
if (!exec_context_get_effective_mount_apivfs(context)) if (!exec_context_get_effective_mount_apivfs(context))
@ -684,6 +685,7 @@ void exec_context_done(ExecContext *c) {
iovec_done(&c->root_hash_sig); iovec_done(&c->root_hash_sig);
c->root_hash_sig_path = mfree(c->root_hash_sig_path); c->root_hash_sig_path = mfree(c->root_hash_sig_path);
c->root_verity = mfree(c->root_verity); c->root_verity = mfree(c->root_verity);
c->root_mstack = mfree(c->root_mstack);
c->tty_path = mfree(c->tty_path); c->tty_path = mfree(c->tty_path);
c->syslog_identifier = mfree(c->syslog_identifier); c->syslog_identifier = mfree(c->syslog_identifier);
c->user = mfree(c->user); c->user = mfree(c->user);
@ -1203,6 +1205,9 @@ void exec_context_dump(const ExecContext *c, FILE* f, const char *prefix) {
if (c->root_verity) if (c->root_verity)
fprintf(f, "%sRootVerity: %s\n", prefix, c->root_verity); fprintf(f, "%sRootVerity: %s\n", prefix, c->root_verity);
if (c->root_mstack)
fprintf(f, "%sRootMStack: %s\n", prefix, c->root_mstack);
STRV_FOREACH(e, c->environment) STRV_FOREACH(e, c->environment)
fprintf(f, "%sEnvironment: %s\n", prefix, *e); fprintf(f, "%sEnvironment: %s\n", prefix, *e);
@ -2058,9 +2063,19 @@ bool exec_context_restrict_filesystems_set(const ExecContext *c) {
bool exec_context_with_rootfs(const ExecContext *c) { bool exec_context_with_rootfs(const ExecContext *c) {
assert(c); assert(c);
/* Checks if RootDirectory=, RootImage= or RootDirectoryFileDescriptor= are used */ /* Checks if RootDirectory=, RootImage=, RootMStack= or RootDirectoryFileDescriptor= are used */
return !empty_or_root(c->root_directory) || c->root_image || c->root_directory_as_fd; return !empty_or_root(c->root_directory) || c->root_image || c->root_directory_as_fd || c->root_mstack;
}
bool exec_context_with_rootfs_strict(const ExecContext *c) {
assert(c);
/* just like exec_context_with_rootfs(), but doesn't suppress a root directory of "/", i.e. returns
* true in more cases: when a root directory is explicitly configured, even if it's our usual
* root. */
return c->root_directory || c->root_image || c->root_directory_as_fd || c->root_mstack;
} }
int exec_context_has_vpicked_extensions(const ExecContext *context) { int exec_context_has_vpicked_extensions(const ExecContext *context) {

@ -199,9 +199,12 @@ typedef struct ExecContext {
char **unset_environment; char **unset_environment;
struct rlimit *rlimit[_RLIMIT_MAX]; struct rlimit *rlimit[_RLIMIT_MAX];
char *working_directory, *root_directory, *root_image, *root_verity, *root_hash_path, *root_hash_sig_path; char *working_directory;
char *root_directory;
char *root_image, *root_verity, *root_hash_path, *root_hash_sig_path;
struct iovec root_hash, root_hash_sig; struct iovec root_hash, root_hash_sig;
MountOptions *root_image_options; MountOptions *root_image_options;
char *root_mstack;
bool root_ephemeral; bool root_ephemeral;
bool working_directory_missing_ok:1; bool working_directory_missing_ok:1;
bool working_directory_home:1; bool working_directory_home:1;
@ -577,6 +580,7 @@ char** exec_context_get_restrict_filesystems(const ExecContext *c);
bool exec_context_restrict_namespaces_set(const ExecContext *c); bool exec_context_restrict_namespaces_set(const ExecContext *c);
bool exec_context_restrict_filesystems_set(const ExecContext *c); bool exec_context_restrict_filesystems_set(const ExecContext *c);
bool exec_context_with_rootfs(const ExecContext *c); bool exec_context_with_rootfs(const ExecContext *c);
bool exec_context_with_rootfs_strict(const ExecContext *c);
int exec_context_has_vpicked_extensions(const ExecContext *context); int exec_context_has_vpicked_extensions(const ExecContext *context);

@ -11,6 +11,7 @@
{{type}}.RootHashSignature, config_parse_exec_root_hash_sig, 0, offsetof({{type}}, exec_context) {{type}}.RootHashSignature, config_parse_exec_root_hash_sig, 0, offsetof({{type}}, exec_context)
{{type}}.RootVerity, config_parse_unit_path_printf, true, offsetof({{type}}, exec_context.root_verity) {{type}}.RootVerity, config_parse_unit_path_printf, true, offsetof({{type}}, exec_context.root_verity)
{{type}}.RootEphemeral, config_parse_bool, 0, offsetof({{type}}, exec_context.root_ephemeral) {{type}}.RootEphemeral, config_parse_bool, 0, offsetof({{type}}, exec_context.root_ephemeral)
{{type}}.RootMStack, config_parse_unit_path_printf, true, offsetof({{type}}, exec_context.root_mstack)
{{type}}.ExtensionDirectories, config_parse_namespace_path_strv, 0, offsetof({{type}}, exec_context.extension_directories) {{type}}.ExtensionDirectories, config_parse_namespace_path_strv, 0, offsetof({{type}}, exec_context.extension_directories)
{{type}}.ExtensionImages, config_parse_extension_images, 0, offsetof({{type}}, exec_context) {{type}}.ExtensionImages, config_parse_extension_images, 0, offsetof({{type}}, exec_context)
{{type}}.ExtensionImagePolicy, config_parse_image_policy, 0, offsetof({{type}}, exec_context.extension_image_policy) {{type}}.ExtensionImagePolicy, config_parse_image_policy, 0, offsetof({{type}}, exec_context.extension_image_policy)

@ -34,6 +34,7 @@
#include "mkdir-label.h" #include "mkdir-label.h"
#include "mount-util.h" #include "mount-util.h"
#include "mountpoint-util.h" #include "mountpoint-util.h"
#include "mstack.h"
#include "namespace.h" #include "namespace.h"
#include "namespace-util.h" #include "namespace-util.h"
#include "nsflags.h" #include "nsflags.h"
@ -1360,7 +1361,7 @@ static int mount_private_dev(const MountEntry *m, const NamespaceParameters *p)
/* We assume /run/systemd/journal/ is available if not changing root, which isn't entirely accurate /* We assume /run/systemd/journal/ is available if not changing root, which isn't entirely accurate
* but shouldn't matter, as either way the user would get ENOENT when accessing /dev/log */ * but shouldn't matter, as either way the user would get ENOENT when accessing /dev/log */
if ((!p->root_image && !p->root_directory && p->root_directory_fd < 0) || p->bind_log_sockets) { if (!pinned_resource_is_set(p->rootfs) || p->bind_log_sockets) {
const char *devlog = strjoina(temporary_mount, "/dev/log"); const char *devlog = strjoina(temporary_mount, "/dev/log");
if (symlink("/run/systemd/journal/dev-log", devlog) < 0) if (symlink("/run/systemd/journal/dev-log", devlog) < 0)
log_debug_errno(errno, log_debug_errno(errno,
@ -1454,16 +1455,45 @@ static int mount_private_apivfs(
assert(entry_path); assert(entry_path);
assert(bind_source); assert(bind_source);
(void) mkdir_p_label(entry_path, 0755); bool noprivs = false;
/* First, check if we have enough privileges to mount a new instance. */
_cleanup_close_ int mount_fd = make_fsmount(
LOG_DEBUG,
/* what= */ fstype,
fstype,
MS_NOSUID|MS_NOEXEC|MS_NODEV,
opts,
/* userns_fd= */ -EBADF);
if (ERRNO_IS_NEG_PRIVILEGE(mount_fd))
noprivs = true;
else if (ERRNO_IS_NEG_NOT_SUPPORTED(mount_fd)) {
/* Fallback for kernels lacking mount_setattr() */
// FIXME: This compatibility code path shall be removed once kernel 5.12
// becomes the new minimal baseline
/* First, check if we have enough privileges to mount a new instance. Note, a new sysfs instance
* cannot be mounted on an already existing mount. Let's use a temporary place. */
r = create_temporary_mount_point(scope, &temporary_mount); r = create_temporary_mount_point(scope, &temporary_mount);
if (r < 0) if (r < 0)
return r; return r;
r = mount_nofollow_verbose(LOG_DEBUG, fstype, temporary_mount, fstype, MS_NOSUID|MS_NOEXEC|MS_NODEV, opts); r = mount_nofollow_verbose(
if (ERRNO_IS_NEG_PRIVILEGE(r)) { LOG_DEBUG,
fstype,
temporary_mount,
fstype,
MS_NOSUID|MS_NOEXEC|MS_NODEV,
opts);
if (ERRNO_IS_NEG_PRIVILEGE(r))
noprivs = true;
else if (r < 0)
return r;
} else if (mount_fd < 0)
return log_debug_errno(mount_fd, "Failed to make file system mount: %m");
(void) mkdir_p_label(entry_path, 0755);
if (noprivs) {
/* When we do not have enough privileges to mount a new instance, fall back to use an /* When we do not have enough privileges to mount a new instance, fall back to use an
* existing mount. */ * existing mount. */
@ -1482,8 +1512,6 @@ static int mount_private_apivfs(
return 1; return 1;
} }
if (r < 0)
return r;
/* OK. We have a new mount instance. Let's clear an existing mount and its submounts. */ /* OK. We have a new mount instance. Let's clear an existing mount and its submounts. */
r = umount_recursive(entry_path, /* flags= */ 0); r = umount_recursive(entry_path, /* flags= */ 0);
@ -1491,9 +1519,16 @@ static int mount_private_apivfs(
log_debug_errno(r, "Failed to unmount directories below '%s', ignoring: %m", entry_path); log_debug_errno(r, "Failed to unmount directories below '%s', ignoring: %m", entry_path);
/* Then, move the new mount instance. */ /* Then, move the new mount instance. */
if (mount_fd >= 0) {
r = RET_NERRNO(move_mount(mount_fd, "", -EBADF, entry_path, MOVE_MOUNT_F_EMPTY_PATH));
if (r < 0)
return log_debug_errno(r, "Failed to attach '%s' to '%s': %m", fstype, entry_path);
} else if (temporary_mount) {
r = mount_nofollow_verbose(LOG_DEBUG, temporary_mount, entry_path, /* fstype= */ NULL, MS_MOVE, /* options= */ NULL); r = mount_nofollow_verbose(LOG_DEBUG, temporary_mount, entry_path, /* fstype= */ NULL, MS_MOVE, /* options= */ NULL);
if (r < 0) if (r < 0)
return r; return r;
} else
assert_not_reached();
/* We mounted a new instance now. Let's bind mount the children over now. This matters for nspawn /* We mounted a new instance now. Let's bind mount the children over now. This matters for nspawn
* where a bunch of files are overmounted, in particular the boot id. */ * where a bunch of files are overmounted, in particular the boot id. */
@ -1974,12 +2009,20 @@ static int apply_one_mount(
} }
r = chase(mount_entry_source(m), NULL, CHASE_TRAIL_SLASH|CHASE_TRIGGER_AUTOFS, &chased, NULL); r = chase(mount_entry_source(m), NULL, CHASE_TRAIL_SLASH|CHASE_TRIGGER_AUTOFS, &chased, NULL);
if (r == -ENOENT && m->ignore) { if (r < 0) {
log_debug_errno(r, "Path %s does not exist, ignoring.", mount_entry_source(m)); if (m->ignore) {
if (r == -ENOENT) {
log_debug_errno(r, "Path '%s' does not exist, ignoring.", mount_entry_source(m));
return 0; return 0;
} }
if (r < 0) if (ERRNO_IS_NEG_PRIVILEGE(r)) {
log_debug_errno(r, "Path '%s' is not accessible, ignoring: %m", mount_entry_source(m));
return 0;
}
}
return log_debug_errno(r, "Failed to follow symlinks on %s: %m", mount_entry_source(m)); return log_debug_errno(r, "Failed to follow symlinks on %s: %m", mount_entry_source(m));
}
log_debug("Followed source symlinks %s %s %s.", log_debug("Followed source symlinks %s %s %s.",
mount_entry_source(m), glyph(GLYPH_ARROW_RIGHT), chased); mount_entry_source(m), glyph(GLYPH_ARROW_RIGHT), chased);
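The reworked error handling after chase() can be read as a small decision: a failed source lookup is skipped only when the mount entry is marked as ignorable and the error is either "does not exist" or a privilege error. A minimal sketch of that decision follows. The names sketch_is_neg_privilege() and sketch_should_skip_source() are hypothetical, and the assumption that a privilege error means -EACCES or -EPERM is not taken from this diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumption for this sketch: "privilege" errors are -EACCES and -EPERM. */
static bool sketch_is_neg_privilege(int r) {
        return r == -EACCES || r == -EPERM;
}

/* Returns true if a failed source lookup should be skipped silently,
 * false if it succeeded or must be reported to the caller. */
static bool sketch_should_skip_source(int r, bool ignore) {
        if (r >= 0 || !ignore)
                return false;

        return r == -ENOENT || sketch_is_neg_privilege(r);
}
```

Any other negative value, or any failure on an entry that is not marked ignorable, still reaches the "Failed to follow symlinks" path.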
@ -2497,6 +2540,17 @@ static bool home_read_only(
         return false;
 }
+static bool namespace_read_only(const NamespaceParameters *p) {
+        assert(p);
+        return root_read_only(p->read_only_paths,
+                              p->protect_system) &&
+                home_read_only(p->read_only_paths, p->inaccessible_paths, p->empty_directories,
+                               p->bind_mounts, p->n_bind_mounts, p->temporary_filesystems, p->n_temporary_filesystems,
+                               p->protect_home) &&
+                strv_isempty(p->read_write_paths);
+}
 int setup_namespace(const NamespaceParameters *p, char **reterr_path) {
         _cleanup_(loop_device_unrefp) LoopDevice *loop_device = NULL;
@ -2518,6 +2572,7 @@ int setup_namespace(const NamespaceParameters *p, char **reterr_path) {
DISSECT_IMAGE_PIN_PARTITION_DEVICES | DISSECT_IMAGE_PIN_PARTITION_DEVICES |
DISSECT_IMAGE_ALLOW_USERSPACE_VERITY | DISSECT_IMAGE_ALLOW_USERSPACE_VERITY |
DISSECT_IMAGE_VERITY_SHARE; DISSECT_IMAGE_VERITY_SHARE;
MStackFlags mstack_flags = 0;
int r; int r;
assert(p); assert(p);
@ -2529,22 +2584,70 @@ int setup_namespace(const NamespaceParameters *p, char **reterr_path) {
bool setup_propagate = !isempty(p->propagate_dir) && !isempty(p->incoming_dir); bool setup_propagate = !isempty(p->propagate_dir) && !isempty(p->incoming_dir);
unsigned long mount_propagation_flag = p->mount_propagation_flag != 0 ? p->mount_propagation_flag : MS_SHARED; unsigned long mount_propagation_flag = p->mount_propagation_flag != 0 ? p->mount_propagation_flag : MS_SHARED;
if (p->root_image) {
/* Make the whole image read-only if we can determine that we only access it in a read-only fashion. */ /* Make the whole image read-only if we can determine that we only access it in a read-only fashion. */
if (root_read_only(p->read_only_paths, bool ro = namespace_read_only(p);
p->protect_system) && if (ro) {
home_read_only(p->read_only_paths, p->inaccessible_paths, p->empty_directories,
p->bind_mounts, p->n_bind_mounts, p->temporary_filesystems, p->n_temporary_filesystems,
p->protect_home) &&
strv_isempty(p->read_write_paths))
dissect_image_flags |= DISSECT_IMAGE_READ_ONLY; dissect_image_flags |= DISSECT_IMAGE_READ_ONLY;
mstack_flags |= MSTACK_RDONLY;
}
_cleanup_close_ int _root_mount_fd = -EBADF;
int root_mount_fd = -EBADF;
if (pinned_resource_is_set(p->rootfs)) {
if (p->rootfs->directory_fd >= 0) {
/* In "managed" mode we need to map from foreign UID/GID space, hence go via mountfsd */
if (p->private_users == PRIVATE_USERS_MANAGED) {
userns_fd = namespace_open_by_type(NAMESPACE_USER);
if (userns_fd < 0)
return log_debug_errno(userns_fd, "Failed to open our own user namespace: %m");
r = mountfsd_mount_directory_fd(
p->mountfsd_link,
p->rootfs->directory_fd,
userns_fd,
dissect_image_flags,
&_root_mount_fd);
if (r < 0)
return r;
root_mount_fd = _root_mount_fd;
}
/* Try to clone the directory mount if we have privs to, so that we can apply the
* MS_SLAVE propagation settings right away. */
if (root_mount_fd < 0) {
_root_mount_fd = open_tree_attr_with_fallback(
p->rootfs->directory_fd,
"",
OPEN_TREE_CLONE|OPEN_TREE_CLOEXEC|AT_SYMLINK_NOFOLLOW|AT_EMPTY_PATH|AT_RECURSIVE,
&(struct mount_attr) {
/* We just remounted / as slave, but that didn't affect the detached
* mount that we just mounted, so remount that one as slave recursive
* as well now. */
.propagation = MS_SLAVE,
});
if (_root_mount_fd < 0 && !ERRNO_IS_NEG_PRIVILEGE(_root_mount_fd) && _root_mount_fd != -EINVAL)
return log_debug_errno(_root_mount_fd, "Failed to clone specified directory: %m");
root_mount_fd = _root_mount_fd;
}
/* If we have only a root fd (and we couldn't make it ours), and we have no path,
* then try to go on with the literal fd */
if (root_mount_fd < 0 && !p->rootfs->directory)
root_mount_fd = p->rootfs->directory_fd;
}
if (p->rootfs->image_fd >= 0) {
SET_FLAG(dissect_image_flags, DISSECT_IMAGE_NO_PARTITION_TABLE, p->verity && p->verity->data_path); SET_FLAG(dissect_image_flags, DISSECT_IMAGE_NO_PARTITION_TABLE, p->verity && p->verity->data_path);
if (p->runtime_scope == RUNTIME_SCOPE_SYSTEM) {
/* In system mode we mount directly */
/* First check if we have a verity device already open and with a fstype pinned by policy. If it /* First check if we have a verity device already open and with a fstype pinned by policy. If it
* cannot be found, then fallback to the slow path (full dissect). */ * cannot be found, then fallback to the slow path (full dissect). */
r = dissected_image_new_from_existing_verity( r = dissected_image_new_from_existing_verity(
p->root_image, p->rootfs->image,
p->verity, p->verity,
p->root_image_options, p->root_image_options,
p->root_image_policy, p->root_image_policy,
@ -2555,14 +2658,13 @@ int setup_namespace(const NamespaceParameters *p, char **reterr_path) {
if (r < 0 && !ERRNO_IS_NEG_DEVICE_ABSENT(r) && r != -ENOPKG) if (r < 0 && !ERRNO_IS_NEG_DEVICE_ABSENT(r) && r != -ENOPKG)
return r; return r;
if (r >= 0) if (r >= 0)
log_debug("Reusing pre-existing verity-protected root image %s", p->root_image); log_debug("Reusing pre-existing verity-protected root image %s", p->rootfs->image);
else { else {
if (p->runtime_scope == RUNTIME_SCOPE_SYSTEM) { r = loop_device_make(
/* In system mode we mount directly */ p->rootfs->image_fd,
FLAGS_SET(dissect_image_flags, DISSECT_IMAGE_DEVICE_READ_ONLY) ? O_RDONLY : -1 /* < 0 means take access mode from fd */,
r = loop_device_make_by_path( /* offset= */ 0,
p->root_image, /* size= */ UINT64_MAX,
FLAGS_SET(dissect_image_flags, DISSECT_IMAGE_DEVICE_READ_ONLY) ? O_RDONLY : -1 /* < 0 means writable if possible, read-only as fallback */,
/* sector_size= */ UINT32_MAX, /* sector_size= */ UINT32_MAX,
FLAGS_SET(dissect_image_flags, DISSECT_IMAGE_NO_PARTITION_TABLE) ? 0 : LO_FLAGS_PARTSCAN, FLAGS_SET(dissect_image_flags, DISSECT_IMAGE_NO_PARTITION_TABLE) ? 0 : LO_FLAGS_PARTSCAN,
LOCK_SH, LOCK_SH,
@ -2603,13 +2705,15 @@ int setup_namespace(const NamespaceParameters *p, char **reterr_path) {
dissect_image_flags); dissect_image_flags);
if (r < 0) if (r < 0)
return log_debug_errno(r, "Failed to decrypt dissected image: %m"); return log_debug_errno(r, "Failed to decrypt dissected image: %m");
}
} else { } else {
userns_fd = namespace_open_by_type(NAMESPACE_USER); userns_fd = namespace_open_by_type(NAMESPACE_USER);
if (userns_fd < 0) if (userns_fd < 0)
return log_debug_errno(userns_fd, "Failed to open our own user namespace: %m"); return log_debug_errno(userns_fd, "Failed to open our own user namespace: %m");
r = mountfsd_mount_image( r = mountfsd_mount_image_fd(
p->root_image, p->mountfsd_link,
p->rootfs->image_fd,
userns_fd, userns_fd,
p->root_image_options, p->root_image_options,
p->root_image_policy, p->root_image_policy,
@ -2620,10 +2724,28 @@ int setup_namespace(const NamespaceParameters *p, char **reterr_path) {
return r; return r;
} }
} }
if (p->rootfs->mstack_loaded) {
if (p->runtime_scope != RUNTIME_SCOPE_SYSTEM) {
userns_fd = namespace_open_by_type(NAMESPACE_USER);
if (userns_fd < 0)
return log_debug_errno(userns_fd, "Failed to open our own user namespace: %m");
} }
if (p->root_directory) r = mstack_open_images(
root = p->root_directory; p->rootfs->mstack_loaded,
p->mountfsd_link,
userns_fd,
p->root_image_policy,
/* image_filter= */ NULL,
mstack_flags);
if (r < 0)
return r;
}
}
if (p->rootfs && p->rootfs->directory)
root = p->rootfs->directory;
else { else {
/* /run/systemd should have been created by PID 1 early on already, but in some cases, like /* /run/systemd should have been created by PID 1 early on already, but in some cases, like
* when running tests (test-execute), it might not have been created yet so let's make sure * when running tests (test-execute), it might not have been created yet so let's make sure
@ -2964,21 +3086,36 @@ int setup_namespace(const NamespaceParameters *p, char **reterr_path) {
/* Remount / as SLAVE so that nothing now mounted in the namespace /* Remount / as SLAVE so that nothing now mounted in the namespace
* shows up in the parent */ * shows up in the parent */
if (mount(NULL, "/", NULL, MS_SLAVE|MS_REC, NULL) < 0) r = mount_nofollow_verbose(LOG_DEBUG, /* what= */ NULL, "/", /* fstype= */ NULL, MS_SLAVE|MS_REC, /* options= */ NULL);
return log_debug_errno(errno, "Failed to remount '/' as SLAVE: %m"); if (r < 0)
return r;
if (p->root_directory_fd >= 0) { if (root_mount_fd >= 0) {
/* If we have root_mount_fd we have a ready-to-use detached mount. Attach it. */
if (move_mount(p->root_directory_fd, "", AT_FDCWD, root, MOVE_MOUNT_F_EMPTY_PATH) < 0) if (move_mount(root_mount_fd, "", AT_FDCWD, root, MOVE_MOUNT_F_EMPTY_PATH) < 0)
return log_debug_errno(errno, "Failed to move detached mount to '%s': %m", root); return log_debug_errno(errno, "Failed to move detached mount to '%s': %m", root);
/* We just remounted / as slave, but that didn't affect the detached mount that we just r = mount_nofollow_verbose(LOG_DEBUG, /* what= */ NULL, root, /* fstype= */ NULL, MS_SLAVE|MS_REC, /* options= */ NULL);
* mounted, so remount that one as slave recursive as well now. */ if (r < 0)
return r;
if (mount(NULL, root, NULL, MS_SLAVE|MS_REC, NULL) < 0) } else if (p->rootfs && p->rootfs->directory) {
return log_debug_errno(errno, "Failed to remount '%s' as SLAVE: %m", root);
/* If we do not have root_mount_fd, but a directory was specified, then we can use it directly. */
/* A root directory is specified. Turn its directory into bind mount, if it isn't one yet. */
r = path_is_mount_point_full(root, /* root = */ NULL, AT_SYMLINK_FOLLOW);
if (r < 0)
return log_debug_errno(r, "Failed to detect that %s is a mount point or not: %m", root);
if (r == 0) {
r = mount_nofollow_verbose(LOG_DEBUG, root, root, /* fstype= */ NULL, MS_BIND|MS_REC, /* options= */ NULL);
if (r < 0)
return r;
}
} else if (dissected_image) {
} else if (p->root_image) {
/* A root image is specified, mount it to the right place */ /* A root image is specified, mount it to the right place */
r = dissected_image_mount( r = dissected_image_mount(
dissected_image, dissected_image,
@ -3002,17 +3139,15 @@ int setup_namespace(const NamespaceParameters *p, char **reterr_path) {
if (r < 0) if (r < 0)
return log_debug_errno(r, "Failed to relinquish dissected image: %m"); return log_debug_errno(r, "Failed to relinquish dissected image: %m");
} else if (p->root_directory) { } else if (p->rootfs && p->rootfs->mstack_loaded) {
/* A root directory is specified. Turn its directory into bind mount, if it isn't one yet. */ r = mstack_make_mounts(p->rootfs->mstack_loaded, root, mstack_flags);
r = path_is_mount_point_full(root, /* root= */ NULL, AT_SYMLINK_FOLLOW); if (r < 0)
if (r < 0) return r;
return log_debug_errno(r, "Failed to detect that %s is a mount point or not: %m", root);
if (r == 0) { r = mstack_bind_mounts(p->rootfs->mstack_loaded, root, /* where_fd= */ -EBADF, mstack_flags, /* ret_root_fd= */ NULL);
r = mount_nofollow_verbose(LOG_DEBUG, root, root, NULL, MS_BIND|MS_REC, NULL);
if (r < 0) if (r < 0)
return r; return r;
}
} else { } else {
/* Let's mount the main root directory to the root directory to use */ /* Let's mount the main root directory to the root directory to use */
@ -3022,7 +3157,7 @@ int setup_namespace(const NamespaceParameters *p, char **reterr_path) {
} }
/* Try to set up the new root directory before mounting anything else there. */ /* Try to set up the new root directory before mounting anything else there. */
if (p->root_image || p->root_directory || p->root_directory_fd >= 0) if (pinned_resource_is_set(p->rootfs))
(void) base_filesystem_create(root, UID_INVALID, GID_INVALID); (void) base_filesystem_create(root, UID_INVALID, GID_INVALID);
/* Now make the magic happen */ /* Now make the magic happen */
@ -3032,7 +3167,7 @@ int setup_namespace(const NamespaceParameters *p, char **reterr_path) {
/* MS_MOVE does not work on MS_SHARED so the remount MS_SHARED will be done later */ /* MS_MOVE does not work on MS_SHARED so the remount MS_SHARED will be done later */
r = mount_switch_root(root, /* mount_propagation_flag = */ 0); r = mount_switch_root(root, /* mount_propagation_flag = */ 0);
if (r == -EINVAL && p->root_directory) { if (r == -EINVAL && p->rootfs && p->rootfs->directory) {
/* If we are using root_directory and we don't have privileges (ie: user manager in a user /* If we are using root_directory and we don't have privileges (ie: user manager in a user
* namespace) and the root_directory is already a mount point in the parent namespace, * namespace) and the root_directory is already a mount point in the parent namespace,
* MS_MOVE will fail as we don't have permission to change it (with EINVAL rather than * MS_MOVE will fail as we don't have permission to change it (with EINVAL rather than
@ -3041,6 +3176,7 @@ int setup_namespace(const NamespaceParameters *p, char **reterr_path) {
r = mount_nofollow_verbose(LOG_DEBUG, root, root, NULL, MS_BIND|MS_REC, NULL); r = mount_nofollow_verbose(LOG_DEBUG, root, root, NULL, MS_BIND|MS_REC, NULL);
if (r < 0) if (r < 0)
return r; return r;
r = mount_switch_root(root, /* mount_propagation_flag= */ 0); r = mount_switch_root(root, /* mount_propagation_flag= */ 0);
} }
if (r < 0) if (r < 0)
@ -4094,6 +4230,7 @@ static const char* const private_users_table[_PRIVATE_USERS_MAX] = {
         [PRIVATE_USERS_SELF] = "self",
         [PRIVATE_USERS_IDENTITY] = "identity",
         [PRIVATE_USERS_FULL] = "full",
+        [PRIVATE_USERS_MANAGED] = "managed",
 };
 DEFINE_STRING_TABLE_LOOKUP_WITH_BOOLEAN(private_users, PrivateUsers, PRIVATE_USERS_SELF);
@ -4104,3 +4241,26 @@ static const char* const private_pids_table[_PRIVATE_PIDS_MAX] = {
 };
 DEFINE_STRING_TABLE_LOOKUP_WITH_BOOLEAN(private_pids, PrivatePIDs, PRIVATE_PIDS_YES);
+void pinned_resource_done(PinnedResource *p) {
+        assert(p);
+        p->directory_fd = safe_close(p->directory_fd);
+        p->directory = mfree(p->directory);
+        p->image_fd = safe_close(p->image_fd);
+        p->image = mfree(p->image);
+        p->mstack_loaded = mstack_free(p->mstack_loaded);
+        p->mstack = mfree(p->mstack);
+}
+bool pinned_resource_is_set(const PinnedResource *p) {
+        if (!p)
+                return false;
+        return p->directory_fd >= 0 ||
+                p->directory ||
+                p->image_fd >= 0 ||
+                p->image ||
+                p->mstack_loaded ||
+                p->mstack;
+}


@ -70,6 +70,7 @@ typedef enum PrivateUsers {
         PRIVATE_USERS_SELF,
         PRIVATE_USERS_IDENTITY,
         PRIVATE_USERS_FULL,
+        PRIVATE_USERS_MANAGED,
         _PRIVATE_USERS_MAX,
         _PRIVATE_USERS_INVALID = -EINVAL,
 } PrivateUsers;
@ -90,6 +91,24 @@ typedef enum PrivatePIDs {
         _PRIVATE_PIDS_INVALID = -EINVAL,
 } PrivatePIDs;
+typedef struct PinnedResource {
+        /* Pins a disk image, directory or mstack by file descriptors. The paths are stored too, but they are
+         * intended to be decoration only, to enhance log messages and should not be load-bearing
+         * otherwise. */
+        int directory_fd;
+        char *directory;
+        int image_fd;
+        char *image;
+        MStack *mstack_loaded;
+        char *mstack;
+} PinnedResource;
+#define PINNED_RESOURCE_NULL \
+        (PinnedResource) { \
+                .directory_fd = -EBADF, \
+                .image_fd = -EBADF, \
+        }
 typedef struct BindMount {
         char *source;
         char *destination;
@ -127,9 +146,7 @@ typedef struct MountImage {
 typedef struct NamespaceParameters {
         RuntimeScope runtime_scope;
-        int root_directory_fd;
-        const char *root_directory;
-        const char *root_image;
+        const PinnedResource *rootfs;
         const MountOptions *root_image_options;
         const ImagePolicy *root_image_policy;
@ -199,10 +216,13 @@ typedef struct NamespaceParameters {
         PrivateTmp private_tmp;
         PrivateTmp private_var_tmp;
         PrivatePIDs private_pids;
+        PrivateUsers private_users;
         PidRef *bpffs_pidref;
         int bpffs_socket_fd;
         int bpffs_errno_pipe;
+        sd_varlink *mountfsd_link;
 } NamespaceParameters;
 int setup_namespace(const NamespaceParameters *p, char **reterr_path);
@ -300,3 +320,6 @@ int refresh_extensions_in_namespace(
                 const PidRef *target,
                 const char *hierarchy_env,
                 const NamespaceParameters *p);
+void pinned_resource_done(PinnedResource *p);
+bool pinned_resource_is_set(const PinnedResource *p);

View File

@ -3019,7 +3019,6 @@ static void service_enter_refresh_extensions(Service *s) {
.n_extension_images = s->exec_context.n_extension_images, .n_extension_images = s->exec_context.n_extension_images,
.extension_directories = s->exec_context.extension_directories, .extension_directories = s->exec_context.extension_directories,
.extension_image_policy = s->exec_context.extension_image_policy, .extension_image_policy = s->exec_context.extension_image_policy,
.root_directory_fd = -EBADF,
}; };
/* Only reload confext, and not sysext as they also typically contain the executable(s) used /* Only reload confext, and not sysext as they also typically contain the executable(s) used
@ -5887,8 +5886,9 @@ int service_determine_exec_selinux_label(Service *s, char **ret) {
if (s->exec_context.root_image || if (s->exec_context.root_image ||
s->exec_context.n_extension_images > 0 || s->exec_context.n_extension_images > 0 ||
!strv_isempty(s->exec_context.extension_directories)) /* We cannot chase paths through images */ !strv_isempty(s->exec_context.extension_directories) ||
return log_unit_debug_errno(UNIT(s), SYNTHETIC_ERRNO(ENODATA), "Service with RootImage=, ExtensionImages= or ExtensionDirectories= set, cannot determine socket SELinux label before activation, ignoring."); s->exec_context.root_mstack) /* We cannot chase paths through images */
return log_unit_debug_errno(UNIT(s), SYNTHETIC_ERRNO(ENODATA), "Service with RootImage=, ExtensionImages=, ExtensionDirectories= or RootMStack= set cannot determine socket SELinux label before activation, ignoring.");
ExecCommand *c = s->exec_command[SERVICE_EXEC_START]; ExecCommand *c = s->exec_command[SERVICE_EXEC_START];
if (!c) if (!c)


@ -1266,6 +1266,12 @@ int unit_add_exec_dependencies(Unit *u, ExecContext *c) {
return r; return r;
} }
if (c->root_mstack) {
r = unit_add_mounts_for(u, c->root_mstack, UNIT_DEPENDENCY_FILE, UNIT_MOUNT_WANTS);
if (r < 0)
return r;
}
for (ExecDirectoryType dt = 0; dt < _EXEC_DIRECTORY_TYPE_MAX; dt++) { for (ExecDirectoryType dt = 0; dt < _EXEC_DIRECTORY_TYPE_MAX; dt++) {
if (!u->manager->prefix[dt]) if (!u->manager->prefix[dt])
continue; continue;
@ -1322,9 +1328,9 @@ int unit_add_exec_dependencies(Unit *u, ExecContext *c) {
return r; return r;
} }
if (c->root_image) { if (c->root_image || c->root_mstack) {
/* We need to wait for /dev/loopX to appear when doing RootImage=, hence let's add an /* We need to wait for /dev/loopX to appear when doing RootImage=, hence let's add an
* implicit dependency on udev */ * implicit dependency on udev. (And for RootMStack= we might need it) */
r = unit_add_dependency_by_name(u, UNIT_AFTER, SPECIAL_UDEVD_SERVICE, true, UNIT_DEPENDENCY_FILE); r = unit_add_dependency_by_name(u, UNIT_AFTER, SPECIAL_UDEVD_SERVICE, true, UNIT_DEPENDENCY_FILE);
if (r < 0) if (r < 0)
@ -4322,6 +4328,9 @@ static int unit_verify_contexts(const Unit *u) {
if (ec->pam_name && kc && !IN_SET(kc->kill_mode, KILL_CONTROL_GROUP, KILL_MIXED)) if (ec->pam_name && kc && !IN_SET(kc->kill_mode, KILL_CONTROL_GROUP, KILL_MIXED))
return log_unit_error_errno(u, SYNTHETIC_ERRNO(ENOEXEC), "Unit has PAM enabled. Kill mode must be set to 'control-group' or 'mixed'. Refusing."); return log_unit_error_errno(u, SYNTHETIC_ERRNO(ENOEXEC), "Unit has PAM enabled. Kill mode must be set to 'control-group' or 'mixed'. Refusing.");
if ((ec->user || ec->dynamic_user || ec->group || ec->pam_name) && ec->private_users == PRIVATE_USERS_MANAGED)
return log_unit_error_errno(u, SYNTHETIC_ERRNO(ENOEXEC), "PrivateUsers=managed may not be used in combination with User=/DynamicUser=/Group=/PAMName=, refusing.");
return 0; return 0;
} }
@ -4461,9 +4470,10 @@ int unit_patch_contexts(Unit *u) {
/* Only add these if needed, as they imply that everything else is blocked. */ /* Only add these if needed, as they imply that everything else is blocked. */
if (cgroup_context_has_device_policy(cc)) { if (cgroup_context_has_device_policy(cc)) {
if (ec->root_image || ec->mount_images) { if (ec->root_image || ec->mount_images || ec->root_mstack) {
/* When RootImage= or MountImages= is specified, the following devices are touched. */ /* When RootImage= or MountImages= is specified, the following devices are
* touched. For RootMStack= there's the possibility they are touched. */
FOREACH_STRING(p, "/dev/loop-control", "/dev/mapper/control") { FOREACH_STRING(p, "/dev/loop-control", "/dev/mapper/control") {
r = cgroup_context_add_device_allow(cc, p, CGROUP_DEVICE_READ|CGROUP_DEVICE_WRITE); r = cgroup_context_add_device_allow(cc, p, CGROUP_DEVICE_READ|CGROUP_DEVICE_WRITE);
if (r < 0) if (r < 0)

View File

@ -794,6 +794,7 @@ int unit_exec_context_build_json(sd_json_variant **ret, const char *name, void *
JSON_BUILD_PAIR_CALLBACK_NON_NULL("WorkingDirectory", working_directory_build_json, c), JSON_BUILD_PAIR_CALLBACK_NON_NULL("WorkingDirectory", working_directory_build_json, c),
JSON_BUILD_PAIR_STRING_NON_EMPTY("RootDirectory", c->root_directory), JSON_BUILD_PAIR_STRING_NON_EMPTY("RootDirectory", c->root_directory),
JSON_BUILD_PAIR_STRING_NON_EMPTY("RootImage", c->root_image), JSON_BUILD_PAIR_STRING_NON_EMPTY("RootImage", c->root_image),
JSON_BUILD_PAIR_STRING_NON_EMPTY("RootMStack", c->root_mstack),
JSON_BUILD_PAIR_CALLBACK_NON_NULL("RootImageOptions", root_image_options_build_json, c->root_image_options), JSON_BUILD_PAIR_CALLBACK_NON_NULL("RootImageOptions", root_image_options_build_json, c->root_image_options),
SD_JSON_BUILD_PAIR_BOOLEAN("RootEphemeral", c->root_ephemeral), SD_JSON_BUILD_PAIR_BOOLEAN("RootEphemeral", c->root_ephemeral),
JSON_BUILD_PAIR_BASE64_NON_EMPTY("RootHash", c->root_hash.iov_base, c->root_hash.iov_len), JSON_BUILD_PAIR_BASE64_NON_EMPTY("RootHash", c->root_hash.iov_base, c->root_hash.iov_len),

View File

@ -146,31 +146,31 @@ static int units_by_state_total_build_json(MetricFamilyContext *context, void *u
 const MetricFamily metric_family_table[] = {
         /* Keep metrics ordered alphabetically */
         {
-                .name = METRIC_IO_SYSTEMD_MANAGER_PREFIX "nRestarts",
+                .name = METRIC_IO_SYSTEMD_MANAGER_PREFIX "NRestarts",
                 .description = "Per unit metric: number of restarts",
                 .type = METRIC_FAMILY_TYPE_COUNTER,
                 .generate = nrestarts_build_json,
         },
         {
-                .name = METRIC_IO_SYSTEMD_MANAGER_PREFIX "unitActiveState",
+                .name = METRIC_IO_SYSTEMD_MANAGER_PREFIX "UnitActiveState",
                 .description = "Per unit metric: active state",
                 .type = METRIC_FAMILY_TYPE_STRING,
                 .generate = unit_active_state_build_json,
         },
         {
-                .name = METRIC_IO_SYSTEMD_MANAGER_PREFIX "unitLoadState",
+                .name = METRIC_IO_SYSTEMD_MANAGER_PREFIX "UnitLoadState",
                 .description = "Per unit metric: load state",
                 .type = METRIC_FAMILY_TYPE_STRING,
                 .generate = unit_load_state_build_json,
         },
         {
-                .name = METRIC_IO_SYSTEMD_MANAGER_PREFIX "unitsByStateTotal",
+                .name = METRIC_IO_SYSTEMD_MANAGER_PREFIX "UnitsByStateTotal",
                 .description = "Total number of units of different state",
                 .type = METRIC_FAMILY_TYPE_GAUGE,
                 .generate = units_by_state_total_build_json,
         },
         {
-                .name = METRIC_IO_SYSTEMD_MANAGER_PREFIX "unitsByTypeTotal",
+                .name = METRIC_IO_SYSTEMD_MANAGER_PREFIX "UnitsByTypeTotal",
                 .description = "Total number of units of different types",
                 .type = METRIC_FAMILY_TYPE_GAUGE,
                 .generate = units_by_type_total_build_json,

View File

@ -2247,12 +2247,16 @@ static int run(int argc, char *argv[]) {
/* Don't run things in private userns, if the mount shall be attached to the host /* Don't run things in private userns, if the mount shall be attached to the host
* or if we're copying from/to the host. */ * or if we're copying from/to the host. */
if (!IN_SET(arg_action, ACTION_MOUNT, ACTION_WITH, ACTION_COPY_FROM, ACTION_COPY_TO)) { if (!IN_SET(arg_action, ACTION_MOUNT, ACTION_WITH, ACTION_COPY_FROM, ACTION_COPY_TO)) {
userns_fd = nsresource_allocate_userns(/* name= */ NULL, NSRESOURCE_UIDS_64K); /* allocate 64K users by default */ userns_fd = nsresource_allocate_userns(
/* vl= */ NULL,
/* name= */ NULL,
NSRESOURCE_UIDS_64K); /* allocate 64K users by default */
if (userns_fd < 0) if (userns_fd < 0)
return log_error_errno(userns_fd, "Failed to allocate user namespace with 64K users: %m"); return log_error_errno(userns_fd, "Failed to allocate user namespace with 64K users: %m");
} }
r = mountfsd_mount_image( r = mountfsd_mount_image(
/* vl= */ NULL,
arg_image, arg_image,
userns_fd, userns_fd,
/* options= */ NULL, /* options= */ NULL,
@ -2261,7 +2265,7 @@ static int run(int argc, char *argv[]) {
arg_flags, arg_flags,
&m); &m);
if (r < 0) if (r < 0)
return r; return log_error_errno(r, "Failed to mount image via mountfsd: %m");
} }
} }

View File

@ -363,9 +363,14 @@ int tar_export_start(
return log_error_errno(r, "Failed to open '%s': %m", p); return log_error_errno(r, "Failed to open '%s': %m", p);
_cleanup_close_ int mapped_fd = -EBADF; _cleanup_close_ int mapped_fd = -EBADF;
r = mountfsd_mount_directory_fd(directory_fd, e->userns_fd, DISSECT_IMAGE_FOREIGN_UID, &mapped_fd); r = mountfsd_mount_directory_fd(
/* vl= */ NULL,
directory_fd,
e->userns_fd,
DISSECT_IMAGE_FOREIGN_UID,
&mapped_fd);
if (r < 0) if (r < 0)
return r; return log_error_errno(r, "Failed to mount directory via mountfsd: %m");
/* Drop O_PATH */ /* Drop O_PATH */
e->tree_fd = fd_reopen(mapped_fd, O_DIRECTORY|O_CLOEXEC); e->tree_fd = fd_reopen(mapped_fd, O_DIRECTORY|O_CLOEXEC);

View File

@ -375,7 +375,10 @@ int import_make_foreign_userns(int *userns_fd) {
if (*userns_fd >= 0) if (*userns_fd >= 0)
return 0; return 0;
*userns_fd = nsresource_allocate_userns(/* name= */ NULL, NSRESOURCE_UIDS_64K); /* allocate 64K users */ *userns_fd = nsresource_allocate_userns(
/* vl= */ NULL,
/* name= */ NULL,
NSRESOURCE_UIDS_64K); /* allocate 64K users */
if (*userns_fd < 0) if (*userns_fd < 0)
return log_error_errno(*userns_fd, "Failed to allocate transient user namespace: %m"); return log_error_errno(*userns_fd, "Failed to allocate transient user namespace: %m");

View File

@ -23,13 +23,14 @@ typedef enum ImportFlags {
         IMPORT_PULL_ROOTHASH_SIGNATURE = 1 << 11, /* only for raw: download .roothash.p7s file for verity */
         IMPORT_PULL_VERITY = 1 << 12, /* only for raw: download .verity file for verity */
-        /* The supported flags for the tar and the raw importing */
+        /* The supported flags for the tar and raw importing */
         IMPORT_FLAGS_MASK_TAR = IMPORT_FORCE|IMPORT_READ_ONLY|IMPORT_BTRFS_SUBVOL|IMPORT_BTRFS_QUOTA|IMPORT_DIRECT|IMPORT_SYNC|IMPORT_FOREIGN_UID,
         IMPORT_FLAGS_MASK_RAW = IMPORT_FORCE|IMPORT_READ_ONLY|IMPORT_CONVERT_QCOW2|IMPORT_DIRECT|IMPORT_SYNC,
-        /* The supported flags for the tar and the raw pulling */
+        /* The supported flags for the tar, raw, oci pulling */
         IMPORT_PULL_FLAGS_MASK_TAR = IMPORT_FLAGS_MASK_TAR|IMPORT_PULL_KEEP_DOWNLOAD|IMPORT_PULL_SETTINGS,
         IMPORT_PULL_FLAGS_MASK_RAW = IMPORT_FLAGS_MASK_RAW|IMPORT_PULL_KEEP_DOWNLOAD|IMPORT_PULL_SETTINGS|IMPORT_PULL_ROOTHASH|IMPORT_PULL_ROOTHASH_SIGNATURE|IMPORT_PULL_VERITY,
+        IMPORT_PULL_FLAGS_MASK_OCI = IMPORT_FORCE|IMPORT_READ_ONLY|IMPORT_BTRFS_SUBVOL|IMPORT_BTRFS_QUOTA|IMPORT_SYNC|IMPORT_FOREIGN_UID|IMPORT_PULL_SETTINGS,
         _IMPORT_FLAGS_INVALID = -EINVAL,
 } ImportFlags;
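A per-format mask such as the new OCI one is typically used to reject requests that carry flags the format does not support. A minimal sketch of such a check follows, assuming the usual "no bits outside the mask" test. The SKETCH_* names and their bit positions are hypothetical and chosen only for illustration. They are not the real ImportFlags values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this sketch. */
enum {
        SKETCH_FORCE     = 1 << 0,
        SKETCH_READ_ONLY = 1 << 1,
        SKETCH_DIRECT    = 1 << 2,
        SKETCH_SYNC      = 1 << 3,

        /* Like the OCI mask in the diff, this mask leaves out the "direct" flag. */
        SKETCH_MASK_OCI  = SKETCH_FORCE|SKETCH_READ_ONLY|SKETCH_SYNC,
};

/* Returns true if no bit outside of the given mask is set. */
static bool sketch_flags_valid(uint64_t flags, uint64_t mask) {
        return (flags & ~mask) == 0;
}
```

With these definitions, a request for "force" plus "sync" passes the OCI mask, while one carrying "direct" is refused.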

View File

@ -4,6 +4,7 @@
#include "sd-daemon.h" #include "sd-daemon.h"
#include "sd-event.h" #include "sd-event.h"
#include "sd-varlink.h"
#include "alloc-util.h" #include "alloc-util.h"
#include "btrfs-util.h" #include "btrfs-util.h"
@ -256,14 +257,29 @@ static int tar_import_fork_tar(TarImport *i) {
if (r < 0) if (r < 0)
return r; return r;
_cleanup_close_ int directory_fd = -EBADF; _cleanup_(sd_varlink_unrefp) sd_varlink *mountfsd_link = NULL;
r = mountfsd_make_directory(d, MODE_INVALID, /* flags= */ 0, &directory_fd); r = mountfsd_connect(&mountfsd_link);
if (r < 0) if (r < 0)
return r; return log_error_errno(r, "Failed to connect to mountfsd: %m");
r = mountfsd_mount_directory_fd(directory_fd, i->userns_fd, DISSECT_IMAGE_FOREIGN_UID, &i->tree_fd); _cleanup_close_ int directory_fd = -EBADF;
r = mountfsd_make_directory(
mountfsd_link,
d,
MODE_INVALID,
/* flags= */ 0,
&directory_fd);
if (r < 0) if (r < 0)
return r; return log_error_errno(r, "Failed to make directory via mountfsd: %m");
r = mountfsd_mount_directory_fd(
mountfsd_link,
directory_fd,
i->userns_fd,
DISSECT_IMAGE_FOREIGN_UID,
&i->tree_fd);
if (r < 0)
return log_error_errno(r, "Failed to mount directory via mountfsd: %m");
} else { } else {
if (i->flags & IMPORT_BTRFS_SUBVOL) if (i->flags & IMPORT_BTRFS_SUBVOL)
r = btrfs_subvol_make_fallback(AT_FDCWD, d, 0755); r = btrfs_subvol_make_fallback(AT_FDCWD, d, 0755);


@ -19,6 +19,7 @@
#include "import-util.h" #include "import-util.h"
#include "log.h" #include "log.h"
#include "main-func.h" #include "main-func.h"
#include "oci-util.h"
#include "os-util.h" #include "os-util.h"
#include "pager.h" #include "pager.h"
#include "parse-argument.h" #include "parse-argument.h"
@ -769,6 +770,61 @@ static int pull_raw(int argc, char *argv[], void *userdata) {
return transfer_image_common(bus, m); return transfer_image_common(bus, m);
} }
static int pull_oci(int argc, char *argv[], void *userdata) {
_cleanup_(sd_bus_message_unrefp) sd_bus_message *m = NULL;
_cleanup_free_ char *l = NULL;
const char *local, *remote;
sd_bus *bus = ASSERT_PTR(userdata);
int r;
r = settle_image_class();
if (r < 0)
return r;
remote = argv[1];
_cleanup_free_ char *image = NULL;
r = oci_ref_parse(remote, /* ret_registry= */ NULL, &image, /* ret_tag= */ NULL);
if (r == -EINVAL)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Ref '%s' is not valid.", remote);
if (r < 0)
return log_error_errno(r, "Failed to determine if ref '%s' is valid: %m", remote);
if (argc >= 3)
local = argv[2];
else {
r = path_extract_filename(image, &l);
if (r < 0)
return log_error_errno(r, "Failed to get final component of reference: %m");
local = l;
}
local = empty_or_dash_to_null(local);
if (local) {
if (!image_name_is_valid(local))
return log_error_errno(SYNTHETIC_ERRNO(EINVAL),
"Local name %s is not a suitable image name.",
local);
}
r = bus_message_new_method_call(bus, &m, bus_import_mgr, "PullOci");
if (r < 0)
return bus_log_create_error(r);
r = sd_bus_message_append(
m,
"ssst",
remote,
local,
image_class_to_string(arg_image_class),
(uint64_t) arg_import_flags & (IMPORT_FORCE|IMPORT_READ_ONLY));
if (r < 0)
return bus_log_create_error(r);
return transfer_image_common(bus, m);
}
static int list_transfers(int argc, char *argv[], void *userdata) { static int list_transfers(int argc, char *argv[], void *userdata) {
_cleanup_(sd_bus_error_free) sd_bus_error error = SD_BUS_ERROR_NULL; _cleanup_(sd_bus_error_free) sd_bus_error error = SD_BUS_ERROR_NULL;
_cleanup_(sd_bus_message_unrefp) sd_bus_message *reply = NULL; _cleanup_(sd_bus_message_unrefp) sd_bus_message *reply = NULL;
@ -1007,6 +1063,7 @@ static int help(int argc, char *argv[], void *userdata) {
"\n%3$sCommands:%4$s\n" "\n%3$sCommands:%4$s\n"
" pull-tar URL [NAME] Download a TAR container image\n" " pull-tar URL [NAME] Download a TAR container image\n"
" pull-raw URL [NAME] Download a RAW container or VM image\n" " pull-raw URL [NAME] Download a RAW container or VM image\n"
" pull-oci REF [NAME] Download an OCI container image\n"
" import-tar FILE [NAME] Import a local TAR container image\n" " import-tar FILE [NAME] Import a local TAR container image\n"
" import-raw FILE [NAME] Import a local RAW container or VM image\n" " import-raw FILE [NAME] Import a local RAW container or VM image\n"
" import-fs DIRECTORY [NAME] Import a local directory container image\n" " import-fs DIRECTORY [NAME] Import a local directory container image\n"
@ -1245,6 +1302,7 @@ static int importctl_main(int argc, char *argv[], sd_bus *bus) {
{ "export-tar", 2, 3, 0, export_tar }, { "export-tar", 2, 3, 0, export_tar },
{ "export-raw", 2, 3, 0, export_raw }, { "export-raw", 2, 3, 0, export_raw },
{ "pull-tar", 2, 3, 0, pull_tar }, { "pull-tar", 2, 3, 0, pull_tar },
{ "pull-oci", 2, 3, 0, pull_oci },
{ "pull-raw", 2, 3, 0, pull_raw }, { "pull-raw", 2, 3, 0, pull_raw },
{ "list-transfers", VERB_ANY, 1, VERB_DEFAULT, list_transfers }, { "list-transfers", VERB_ANY, 1, VERB_DEFAULT, list_transfers },
{ "cancel-transfer", 2, VERB_ANY, 0, cancel_transfer }, { "cancel-transfer", 2, VERB_ANY, 0, cancel_transfer },
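When no NAME argument is given, pull_oci() above falls back to the final path component of the image part of the reference (via path_extract_filename()). The following is only a minimal standalone sketch of that idea in plain C. The helper name sketch_default_local_name() is hypothetical and not part of systemd, and it assumes the registry and tag have already been split off, as oci_ref_parse() does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a systemd API): returns a newly allocated copy of
 * the last '/'-separated component of an image name such as "library/debian".
 * Returns NULL on allocation failure or if the name ends in '/'. */
static char *sketch_default_local_name(const char *image) {
        assert(image);

        /* Everything after the last slash, or the whole string if there is none */
        const char *slash = strrchr(image, '/');
        const char *last = slash ? slash + 1 : image;
        if (*last == '\0')
                return NULL;

        size_t n = strlen(last);
        char *copy = malloc(n + 1);
        if (!copy)
                return NULL;

        memcpy(copy, last, n + 1); /* includes the trailing NUL */
        return copy;
}
```

Under these assumptions "library/debian" yields "debian" and a short name such as "fedora" is returned unchanged; the caller owns the result and must free() it.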


@ -29,6 +29,7 @@
#include "json-util.h" #include "json-util.h"
#include "main-func.h" #include "main-func.h"
#include "notify-recv.h" #include "notify-recv.h"
#include "oci-util.h"
#include "os-util.h" #include "os-util.h"
#include "parse-util.h" #include "parse-util.h"
#include "path-lookup.h" #include "path-lookup.h"
@ -58,6 +59,7 @@ typedef enum TransferType {
TRANSFER_EXPORT_RAW, TRANSFER_EXPORT_RAW,
TRANSFER_PULL_TAR, TRANSFER_PULL_TAR,
TRANSFER_PULL_RAW, TRANSFER_PULL_RAW,
TRANSFER_PULL_OCI,
_TRANSFER_TYPE_MAX, _TRANSFER_TYPE_MAX,
_TRANSFER_TYPE_INVALID = -EINVAL, _TRANSFER_TYPE_INVALID = -EINVAL,
} TransferType; } TransferType;
@ -127,6 +129,7 @@ static const char* const transfer_type_table[_TRANSFER_TYPE_MAX] = {
[TRANSFER_EXPORT_RAW] = "export-raw", [TRANSFER_EXPORT_RAW] = "export-raw",
[TRANSFER_PULL_TAR] = "pull-tar", [TRANSFER_PULL_TAR] = "pull-tar",
[TRANSFER_PULL_RAW] = "pull-raw", [TRANSFER_PULL_RAW] = "pull-raw",
[TRANSFER_PULL_OCI] = "pull-oci",
}; };
DEFINE_PRIVATE_STRING_TABLE_LOOKUP_TO_STRING(transfer_type, TransferType); DEFINE_PRIVATE_STRING_TABLE_LOOKUP_TO_STRING(transfer_type, TransferType);
@ -497,6 +500,7 @@ static int transfer_start(Transfer *t) {
case TRANSFER_PULL_TAR: case TRANSFER_PULL_TAR:
case TRANSFER_PULL_RAW: case TRANSFER_PULL_RAW:
case TRANSFER_PULL_OCI:
cmd[k++] = SYSTEMD_PULL_PATH; cmd[k++] = SYSTEMD_PULL_PATH;
break; break;
@ -518,6 +522,10 @@ static int transfer_start(Transfer *t) {
cmd[k++] = "raw"; cmd[k++] = "raw";
break; break;
case TRANSFER_PULL_OCI:
cmd[k++] = "oci";
break;
case TRANSFER_IMPORT_FS: case TRANSFER_IMPORT_FS:
cmd[k++] = "run"; cmd[k++] = "run";
break; break;
@ -1053,7 +1061,7 @@ static int method_export_tar_or_raw(sd_bus_message *msg, void *userdata, sd_bus_
return 1; return 1;
} }
static int method_pull_tar_or_raw(sd_bus_message *msg, void *userdata, sd_bus_error *error) { static int method_pull_tar_or_raw_or_oci(sd_bus_message *msg, void *userdata, sd_bus_error *error) {
_cleanup_(transfer_unrefp) Transfer *t = NULL; _cleanup_(transfer_unrefp) Transfer *t = NULL;
ImageClass class = _IMAGE_CLASS_INVALID; ImageClass class = _IMAGE_CLASS_INVALID;
const char *remote, *local, *verify; const char *remote, *local, *verify;
@ -1078,7 +1086,24 @@ static int method_pull_tar_or_raw(sd_bus_message *msg, void *userdata, sd_bus_er
return 1; /* Will call us back */ return 1; /* Will call us back */
} }
if (endswith(sd_bus_message_get_member(msg), "Ex")) { if (streq(sd_bus_message_get_member(msg), "PullOci")) {
const char *sclass;
r = sd_bus_message_read(msg, "ssst", &remote, &local, &sclass, &flags);
if (r < 0)
return r;
class = image_class_from_string(sclass);
if (class < 0)
return sd_bus_error_setf(error, SD_BUS_ERROR_INVALID_ARGS,
"Image class '%s' not known", sclass);
if (flags & ~(IMPORT_FORCE|IMPORT_READ_ONLY))
return sd_bus_error_setf(error, SD_BUS_ERROR_INVALID_ARGS,
"Flags 0x%" PRIx64 " invalid", flags);
verify = NULL;
} else if (endswith(sd_bus_message_get_member(msg), "Ex")) {
const char *sclass; const char *sclass;
r = sd_bus_message_read(msg, "sssst", &remote, &local, &sclass, &verify, &flags); r = sd_bus_message_read(msg, "sssst", &remote, &local, &sclass, &verify, &flags);
@ -1106,9 +1131,21 @@ static int method_pull_tar_or_raw(sd_bus_message *msg, void *userdata, sd_bus_er
SET_FLAG(flags, IMPORT_FORCE, force); SET_FLAG(flags, IMPORT_FORCE, force);
} }
if (!http_url_is_valid(remote) && !file_url_is_valid(remote)) type = startswith(sd_bus_message_get_member(msg), "PullTar") ? TRANSFER_PULL_TAR :
startswith(sd_bus_message_get_member(msg), "PullRaw") ? TRANSFER_PULL_RAW :
streq(sd_bus_message_get_member(msg), "PullOci") ? TRANSFER_PULL_OCI : _TRANSFER_TYPE_INVALID;
assert(type >= 0);
if (type == TRANSFER_PULL_OCI) {
r = oci_ref_valid(remote);
if (r < 0)
return r;
if (r == 0)
return sd_bus_error_setf(error, SD_BUS_ERROR_INVALID_ARGS, return sd_bus_error_setf(error, SD_BUS_ERROR_INVALID_ARGS,
"URL %s is invalid", remote); "Reference '%s' is invalid", remote);
} else if (!http_url_is_valid(remote) && !file_url_is_valid(remote))
return sd_bus_error_setf(error, SD_BUS_ERROR_INVALID_ARGS,
"URL '%s' is invalid", remote);
if (isempty(local)) if (isempty(local))
local = NULL; local = NULL;
@ -1116,13 +1153,16 @@ static int method_pull_tar_or_raw(sd_bus_message *msg, void *userdata, sd_bus_er
return sd_bus_error_setf(error, SD_BUS_ERROR_INVALID_ARGS, return sd_bus_error_setf(error, SD_BUS_ERROR_INVALID_ARGS,
"Local image name %s is invalid", local); "Local image name %s is invalid", local);
if (isempty(verify)) if (type == TRANSFER_PULL_OCI)
v = _IMPORT_VERIFY_INVALID;
else if (isempty(verify))
v = IMPORT_VERIFY_SIGNATURE; v = IMPORT_VERIFY_SIGNATURE;
else else {
v = import_verify_from_string(verify); v = import_verify_from_string(verify);
if (v < 0) if (v < 0)
return sd_bus_error_setf(error, SD_BUS_ERROR_INVALID_ARGS, return sd_bus_error_setf(error, SD_BUS_ERROR_INVALID_ARGS,
"Unknown verification mode %s", verify); "Unknown verification mode %s", verify);
}
if (class == IMAGE_MACHINE) { if (class == IMAGE_MACHINE) {
r = image_setup_pool(m->runtime_scope, class, m->use_btrfs_subvol, m->use_btrfs_quota); r = image_setup_pool(m->runtime_scope, class, m->use_btrfs_subvol, m->use_btrfs_quota);
@ -1130,9 +1170,6 @@ static int method_pull_tar_or_raw(sd_bus_message *msg, void *userdata, sd_bus_er
return sd_bus_error_set_errnof(error, r, "Failed to set up machine pool: %m"); return sd_bus_error_set_errnof(error, r, "Failed to set up machine pool: %m");
} }
type = startswith(sd_bus_message_get_member(msg), "PullTar") ?
TRANSFER_PULL_TAR : TRANSFER_PULL_RAW;
if (manager_find(m, type, remote)) if (manager_find(m, type, remote))
return sd_bus_error_setf(error, BUS_ERROR_TRANSFER_IN_PROGRESS, return sd_bus_error_setf(error, BUS_ERROR_TRANSFER_IN_PROGRESS,
"Transfer for %s already in progress.", remote); "Transfer for %s already in progress.", remote);
@ -1613,51 +1650,52 @@ static const sd_bus_vtable manager_vtable[] = {
SD_BUS_PARAM(transfer_path), SD_BUS_PARAM(transfer_path),
method_export_tar_or_raw, method_export_tar_or_raw,
SD_BUS_VTABLE_UNPRIVILEGED), SD_BUS_VTABLE_UNPRIVILEGED),
SD_BUS_METHOD_WITH_NAMES("PullTar", SD_BUS_METHOD_WITH_ARGS("PullTar",
"sssb", SD_BUS_ARGS("s", url,
SD_BUS_PARAM(url) "s", local_name,
SD_BUS_PARAM(local_name) "s", verify_mode,
SD_BUS_PARAM(verify_mode) "b", force),
SD_BUS_PARAM(force), SD_BUS_RESULT("u", transfer_id,
"uo", "o", transfer_path),
SD_BUS_PARAM(transfer_id) method_pull_tar_or_raw_or_oci,
SD_BUS_PARAM(transfer_path),
method_pull_tar_or_raw,
SD_BUS_VTABLE_UNPRIVILEGED), SD_BUS_VTABLE_UNPRIVILEGED),
SD_BUS_METHOD_WITH_NAMES("PullTarEx", SD_BUS_METHOD_WITH_ARGS("PullTarEx",
"sssst", SD_BUS_ARGS("s", url,
SD_BUS_PARAM(url) "s", local_name,
SD_BUS_PARAM(local_name) "s", class,
SD_BUS_PARAM(class) "s", verify_mode,
SD_BUS_PARAM(verify_mode) "t", flags),
SD_BUS_PARAM(flags), SD_BUS_RESULT("u", transfer_id,
"uo", "o", transfer_path),
SD_BUS_PARAM(transfer_id) method_pull_tar_or_raw_or_oci,
SD_BUS_PARAM(transfer_path),
method_pull_tar_or_raw,
SD_BUS_VTABLE_UNPRIVILEGED), SD_BUS_VTABLE_UNPRIVILEGED),
SD_BUS_METHOD_WITH_NAMES("PullRaw", SD_BUS_METHOD_WITH_ARGS("PullRaw",
"sssb", SD_BUS_ARGS("s", url,
SD_BUS_PARAM(url) "s", local_name,
SD_BUS_PARAM(local_name) "s", verify_mode,
SD_BUS_PARAM(verify_mode) "b", force),
SD_BUS_PARAM(force), SD_BUS_RESULT("u", transfer_id,
"uo", "o", transfer_path),
SD_BUS_PARAM(transfer_id) method_pull_tar_or_raw_or_oci,
SD_BUS_PARAM(transfer_path),
method_pull_tar_or_raw,
SD_BUS_VTABLE_UNPRIVILEGED), SD_BUS_VTABLE_UNPRIVILEGED),
SD_BUS_METHOD_WITH_NAMES("PullRawEx", SD_BUS_METHOD_WITH_ARGS("PullRawEx",
"sssst", SD_BUS_ARGS("s", url,
SD_BUS_PARAM(url) "s", local_name,
SD_BUS_PARAM(local_name) "s", class,
SD_BUS_PARAM(class) "s", verify_mode,
SD_BUS_PARAM(verify_mode) "t", flags),
SD_BUS_PARAM(flags), SD_BUS_RESULT("u", transfer_id,
"uo", "o", transfer_path),
SD_BUS_PARAM(transfer_id) method_pull_tar_or_raw_or_oci,
SD_BUS_PARAM(transfer_path), SD_BUS_VTABLE_UNPRIVILEGED),
method_pull_tar_or_raw, SD_BUS_METHOD_WITH_ARGS("PullOci",
SD_BUS_ARGS("s", ref,
"s", local_name,
"s", class,
"t", flags),
SD_BUS_RESULT("u", transfer_id,
"o", transfer_path),
method_pull_tar_or_raw_or_oci,
SD_BUS_VTABLE_UNPRIVILEGED), SD_BUS_VTABLE_UNPRIVILEGED),
SD_BUS_METHOD_WITH_NAMES("ListTransfers", SD_BUS_METHOD_WITH_NAMES("ListTransfers",
NULL,, NULL,,
@ -1864,7 +1902,13 @@ static int vl_method_pull(sd_varlink *link, sd_json_variant *parameters, sd_varl
if (r != 0) if (r != 0)
return r; return r;
if (!http_url_is_valid(p.remote) && !file_url_is_valid(p.remote)) if (p.type == IMPORT_OCI) {
r = oci_ref_valid(p.remote);
if (r < 0)
return r;
if (r == 0)
return sd_varlink_error_invalid_parameter_name(link, "remote");
} else if (!http_url_is_valid(p.remote) && !file_url_is_valid(p.remote))
return sd_varlink_error_invalid_parameter_name(link, "remote"); return sd_varlink_error_invalid_parameter_name(link, "remote");
if (p.local && !image_name_is_valid(p.local)) if (p.local && !image_name_is_valid(p.local))
@ -1874,8 +1918,8 @@ static int vl_method_pull(sd_varlink *link, sd_json_variant *parameters, sd_varl
TransferType tt = TransferType tt =
p.type == IMPORT_TAR ? TRANSFER_PULL_TAR : p.type == IMPORT_TAR ? TRANSFER_PULL_TAR :
p.type == IMPORT_RAW ? TRANSFER_PULL_RAW : _TRANSFER_TYPE_INVALID; p.type == IMPORT_RAW ? TRANSFER_PULL_RAW :
p.type == IMPORT_OCI ? TRANSFER_PULL_OCI : _TRANSFER_TYPE_INVALID;
assert(tt >= 0); assert(tt >= 0);
if (manager_find(m, tt, p.remote)) if (manager_find(m, tt, p.remote))
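The D-Bus handler in this file now derives the transfer type from the method name: "PullTar" and "PullRaw" are matched as prefixes (so the "Ex" variants are covered), while "PullOci" is matched exactly. A rough standalone sketch of that dispatch follows. The enum and function names are hypothetical stand-ins for TransferType and the startswith()/streq() helpers, not the actual systemd definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the TransferType values relevant to pulling */
typedef enum SketchTransferType {
        SKETCH_PULL_TAR,
        SKETCH_PULL_RAW,
        SKETCH_PULL_OCI,
        SKETCH_TYPE_INVALID = -1,
} SketchTransferType;

/* Returns non-zero if s begins with prefix */
static int sketch_has_prefix(const char *s, const char *prefix) {
        return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Maps a D-Bus member name to a transfer type: prefix match for the tar/raw
 * methods (covers "PullTarEx"/"PullRawEx"), exact match for "PullOci". */
static SketchTransferType sketch_type_from_member(const char *member) {
        assert(member);

        if (sketch_has_prefix(member, "PullTar"))
                return SKETCH_PULL_TAR;
        if (sketch_has_prefix(member, "PullRaw"))
                return SKETCH_PULL_RAW;
        if (strcmp(member, "PullOci") == 0)
                return SKETCH_PULL_OCI;

        return SKETCH_TYPE_INVALID;
}
```

Because "PullOci" is compared exactly, a future "PullOciEx" member would not be picked up by this mapping without a further change.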


@ -18,8 +18,10 @@ executables += [
'dbus' : true, 'dbus' : true,
'sources' : files( 'sources' : files(
'importd.c', 'importd.c',
'oci-util.c',
), ),
'extract' : files( 'extract' : files(
'oci-util.c',
'import-common.c', 'import-common.c',
'import-compress.c', 'import-compress.c',
'qcow2-util.c', 'qcow2-util.c',
@ -30,12 +32,13 @@ executables += [
'name' : 'systemd-pull', 'name' : 'systemd-pull',
'public' : true, 'public' : true,
'sources' : files( 'sources' : files(
'curl-util.c',
'pull.c', 'pull.c',
'pull-common.c',
'pull-job.c',
'pull-oci.c',
'pull-raw.c', 'pull-raw.c',
'pull-tar.c', 'pull-tar.c',
'pull-job.c',
'pull-common.c',
'curl-util.c',
), ),
'objects' : ['systemd-importd'], 'objects' : ['systemd-importd'],
'dependencies' : common_deps + [ 'dependencies' : common_deps + [
@ -77,6 +80,8 @@ executables += [
'name' : 'importctl', 'name' : 'importctl',
'public' : true, 'public' : true,
'sources' : files('importctl.c'), 'sources' : files('importctl.c'),
'objects': ['systemd-importd'],
'dependencies' : common_deps,
}, },
generator_template + { generator_template + {
'name' : 'systemd-import-generator', 'name' : 'systemd-import-generator',
@ -92,7 +97,11 @@ executables += [
'dependencies' : common_deps, 'dependencies' : common_deps,
'type' : 'manual', 'type' : 'manual',
}, },
test_template + {
'sources' : files('test-oci-util.c'),
'objects': ['systemd-importd'],
'dependencies' : common_deps,
},
] ]
install_data('org.freedesktop.import1.conf', install_data('org.freedesktop.import1.conf',
@ -108,3 +117,28 @@ install_data('org.freedesktop.import1.policy',
install_data('import-pubring.pgp', install_data('import-pubring.pgp',
install_dir : libexecdir) install_dir : libexecdir)
# TODO: shouldn't this be in pkgdatadir? # TODO: shouldn't this be in pkgdatadir?
oci_registry_files = [
'registry.docker.io.oci-registry',
'registry.fedora.oci-registry',
]
oci_registry_symlinks = [
[ 'default.oci-registry', 'registry.' + get_option('default-oci-registry') + '.oci-registry' ],
[ 'image.alpine.oci-registry', 'registry.docker.io.oci-registry' ],
[ 'image.archlinux.oci-registry', 'registry.docker.io.oci-registry' ],
[ 'image.debian.oci-registry', 'registry.docker.io.oci-registry' ],
[ 'image.fedora-minimal.oci-registry', 'registry.fedora.oci-registry' ],
[ 'image.fedora.oci-registry', 'registry.fedora.oci-registry' ],
[ 'image.ubuntu.oci-registry', 'registry.docker.io.oci-registry' ],
]
foreach file : oci_registry_files
install_data('oci-registry' / file,
install_dir : libexecdir / 'oci-registry')
endforeach
foreach tuple: oci_registry_symlinks
install_symlink(tuple[0],
pointing_to : tuple[1],
install_dir : libexecdir / 'oci-registry')
endforeach


@ -0,0 +1,4 @@
{
"shortImagePrefix" : "library/",
"overrideRegistry" : "registry-1.docker.io"
}


@ -0,0 +1,3 @@
{
"defaultRegistry" : "registry.fedoraproject.org"
}
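To illustrate how registry files like the two above are meant to be applied, here is a simplified sketch of the rule order documented in oci_ref_normalize(): overrideRegistry always wins, defaultRegistry only fills in a missing registry, shortImagePrefix is prepended to names without a "/", and the tag defaults to "latest". The struct and function names are hypothetical, overrideImage and defaultProtocol are left out, and the output is written into a fixed caller-supplied buffer purely to keep the example short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced rule set mirroring some JSON keys; NULL means "not set" */
struct sketch_rules {
        const char *override_registry;
        const char *default_registry;
        const char *short_image_prefix;
        const char *default_tag;
};

/* Writes "<registry>/<image>:<tag>" into buf. Returns 0 on success, -1 if no
 * registry can be determined or the buffer is too small. */
static int sketch_normalize(const struct sketch_rules *rules,
                            const char *registry, /* may be NULL */
                            const char *image,
                            const char *tag,      /* may be NULL */
                            char *buf, size_t size) {
        assert(rules);
        assert(image);
        assert(buf);

        /* An override beats whatever was specified; a default only fills a gap */
        const char *reg = rules->override_registry ? rules->override_registry :
                          registry ? registry : rules->default_registry;
        if (!reg)
                return -1;

        /* The prefix applies only to "short" names, i.e. those without a slash */
        const char *prefix = rules->short_image_prefix && !strchr(image, '/') ?
                             rules->short_image_prefix : "";

        const char *t = tag ? tag : rules->default_tag ? rules->default_tag : "latest";

        int n = snprintf(buf, size, "%s/%s%s:%s", reg, prefix, image, t);
        return n >= 0 && (size_t) n < size ? 0 : -1;
}
```

With the Docker rules shown earlier, the short name "debian" would come out as "registry-1.docker.io/library/debian:latest", and with the Fedora rules "fedora" with tag "42" would become "registry.fedoraproject.org/fedora:42".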

401
src/import/oci-util.c Normal file

@ -0,0 +1,401 @@
/* SPDX-License-Identifier: LGPL-2.1-or-later */
#include <string.h>
#include "sd-json.h"
#include "alloc-util.h"
#include "architecture.h"
#include "constants.h"
#include "dns-domain.h"
#include "fd-util.h"
#include "fileio.h"
#include "hexdecoct.h"
#include "log.h"
#include "oci-util.h"
#include "parse-util.h"
#include "string-table.h"
#include "string-util.h"
bool oci_image_is_valid(const char *n) {
bool slash = true;
/* The OCI spec suggests validating this regex:
*
* [a-z0-9]+((\.|_|__|-+)[a-z0-9]+)*(\/[a-z0-9]+((\.|_|__|-+)[a-z0-9]+)*)*
*
* We implement a generalization of this, i.e. do not insist on the single ".", "_", "__", "-", "+"
* separator, but allow any number of them. And we refuse leading dots, since if used in the fs this
* would make the files hidden, and we probably don't want that.
*/
for (const char *p = n; *p; p++) {
if (*p == '/') {
if (slash)
return false;
slash = true;
continue;
}
if (!strchr(slash ? LOWERCASE_LETTERS DIGITS "_-+" :
"." LOWERCASE_LETTERS DIGITS "_-+", *p))
return false;
slash = false;
}
return !slash;
}
int oci_registry_is_valid(const char *n) {
int r;
if (!n)
return false;
const char *colon = strchr(n, ':');
if (!colon)
return dns_name_is_valid(n);
_cleanup_free_ char *s = strndup(n, colon - n);
if (!s)
return -ENOMEM;
r = dns_name_is_valid(s);
if (r <= 0)
return r;
uint16_t port;
return safe_atou16(colon + 1, &port) >= 0 && port != 0;
}
bool oci_tag_is_valid(const char *n) {
if (!n)
return false;
/* As per https://github.com/opencontainers/distribution-spec/blob/main/spec.md, accept the following regex:
*
* [a-zA-Z0-9_][a-zA-Z0-9._-]{0,127}
*/
if (n[0] == 0 || !strchr(LETTERS DIGITS "_", n[0]))
return false;
size_t l = strspn(n + 1, LETTERS DIGITS "._-");
if (l > 127)
return false;
if (n[1+l] != 0)
return false;
return true;
}
int oci_ref_parse(
const char *ref,
char **ret_registry,
char **ret_image,
char **ret_tag) {
int r;
assert(ref);
_cleanup_free_ char *without_tag = NULL, *tag = NULL;
const char *t = strrchr(ref, ':');
if (t) {
tag = strdup(t + 1);
if (!tag)
return -ENOMEM;
if (!oci_tag_is_valid(tag))
return log_debug_errno(SYNTHETIC_ERRNO(EINVAL), "OCI tag specification '%s' is not valid.", tag);
without_tag = strndup(ref, t - ref);
if (!without_tag)
return -ENOMEM;
ref = without_tag;
}
_cleanup_free_ char *image = NULL, *registry = NULL;
t = strchr(ref, '/');
if (t) {
registry = strndup(ref, t - ref);
if (!registry)
return -ENOMEM;
r = oci_registry_is_valid(registry);
if (r < 0)
return r;
if (r == 0)
return log_debug_errno(SYNTHETIC_ERRNO(EINVAL), "OCI registry specification '%s' is not valid.", registry);
image = strdup(t + 1);
} else
image = strdup(ref);
if (!image)
return -ENOMEM;
if (!oci_image_is_valid(image))
return log_debug_errno(SYNTHETIC_ERRNO(EINVAL), "OCI image specification '%s' is not valid.", image);
if (ret_registry)
*ret_registry = TAKE_PTR(registry);
if (ret_image)
*ret_image = TAKE_PTR(image);
if (ret_tag)
*ret_tag = TAKE_PTR(tag);
return 0;
}
int oci_ref_normalize(char **protocol, char **registry, char **image, char **tag) {
int r;
assert(protocol);
assert(registry);
assert(image && *image);
assert(tag);
/* OCI container references are supposed to have the form <registry>/<name>:<tag>. Except that it's
* all super messy, and for some registries the server name differs from the name people use in the
* references, and there are special rules for "short" container names (i.e. those which do not
* contain a "/"), and more. To deal with this, we devise a relatively simple scheme, to normalize
* such names. Specifically:
*
* If a registry is specified we look for
* /usr/lib/systemd/oci-registry/registry.<registry>.oci-registry for registry-specific rules, to
* enforce on the reference. If no registry is specified, we look for an
* /usr/lib/systemd/oci-registry/image.<name>.oci-registry file, which contains image-specific rules
* instead. If this is not found we load /usr/lib/systemd/oci-registry/default.oci-registry
* instead. The files are encoded in JSON.
*
* The rules we apply are relatively simple:
*
* defaultProtocol controls which protocol to use if none is known. This should always be https
* (since OCI images are authenticated purely via HTTPS), but for testing purposes "file" might be
* useful too.
*
* overrideRegistry encodes which registry server to actually use, overriding what might have been
* specified.
*
* overrideImage encodes which image name to actually use, overriding what might have been specified.
*
* shortImagePrefix encodes a name prefix to prepend to "short" container names. This has no effect
* if overrideImage is set too.
*
* defaultTag contains a tag to use as default, if none is specified. If not configured this
* defaults to "latest".
*/
_cleanup_free_ char *fn = NULL;
if (*registry) {
/* If a registry is specified, we'll always respect it, and use it as the only search key */
_cleanup_free_ char *e = urlescape(*registry);
if (!e)
return -ENOMEM;
fn = strjoin("registry.", e, ".oci-registry");
} else {
/* If no registry is specified, let's go by image name */
_cleanup_free_ char *e = urlescape(*image);
if (!e)
return -ENOMEM;
fn = strjoin("image.", e, ".oci-registry");
}
if (!fn)
return -ENOMEM;
_cleanup_fclose_ FILE *f = NULL;
_cleanup_free_ char *path = NULL;
r = search_and_fopen_nulstr(fn, "re", /* root= */ NULL, CONF_PATHS_NULSTR("systemd/oci-registry"), &f, &path);
if (r == -ENOENT)
r = search_and_fopen_nulstr("default.oci-registry", "re", /* root= */ NULL, CONF_PATHS_NULSTR("systemd/oci-registry"), &f, &path);
if (r < 0 && r != -ENOENT)
return log_debug_errno(r, "Failed to find suitable OCI registry file: %m");
/* if ENOENT is seen, we use the defaults below! */
struct {
const char *default_protocol;
const char *override_registry;
const char *default_registry;
const char *override_image;
const char *short_image_prefix;
const char *default_tag;
} data = {
.default_protocol = "https",
.default_tag = "latest",
};
_cleanup_(sd_json_variant_unrefp) sd_json_variant *v = NULL;
if (f) {
unsigned line = 0, column = 0;
r = sd_json_parse_file(f, path, /* flags= */ 0, &v, &line, &column);
if (r < 0)
return log_debug_errno(r, "Parse failure at %s:%u:%u: %m", path, line, column);
static const sd_json_dispatch_field table[] = {
{ "defaultProtocol", SD_JSON_VARIANT_STRING, sd_json_dispatch_const_string, voffsetof(data, default_protocol), 0 },
{ "overrideRegistry", SD_JSON_VARIANT_STRING, sd_json_dispatch_const_string, voffsetof(data, override_registry), 0 },
{ "defaultRegistry", SD_JSON_VARIANT_STRING, sd_json_dispatch_const_string, voffsetof(data, default_registry), 0 },
{ "overrideImage", SD_JSON_VARIANT_STRING, sd_json_dispatch_const_string, voffsetof(data, override_image), 0 },
{ "shortImagePrefix", SD_JSON_VARIANT_STRING, sd_json_dispatch_const_string, voffsetof(data, short_image_prefix), 0 },
{ "defaultTag", SD_JSON_VARIANT_STRING, sd_json_dispatch_const_string, voffsetof(data, default_tag), 0 },
{},
};
r = sd_json_dispatch(v, table, SD_JSON_ALLOW_EXTENSIONS, &data);
if (r < 0)
return r;
}
_cleanup_free_ char *new_protocol = NULL;
if (data.default_protocol && isempty(*protocol)) {
new_protocol = strdup(data.default_protocol);
if (!new_protocol)
return -ENOMEM;
}
_cleanup_free_ char *new_registry = NULL;
if (data.override_registry) {
if (!isempty(*registry))
log_debug("Overriding registry to '%s' (was '%s') based on OCI registry database.", data.override_registry, *registry);
new_registry = strdup(data.override_registry);
if (!new_registry)
return -ENOMEM;
} else if (data.default_registry && isempty(*registry)) {
new_registry = strdup(data.default_registry);
if (!new_registry)
return -ENOMEM;
}
_cleanup_free_ char *new_image = NULL;
if (data.override_image) {
log_debug("Overriding image to '%s' (was '%s') based on OCI registry database.", data.override_image, *image);
new_image = strdup(data.override_image);
if (!new_image)
return -ENOMEM;
} else if (data.short_image_prefix && !strchr(*image, '/')) {
new_image = strjoin(data.short_image_prefix, *image);
if (!new_image)
return -ENOMEM;
}
_cleanup_free_ char *new_tag = NULL;
if (data.default_tag && isempty(*tag)) {
new_tag = strdup(data.default_tag);
if (!new_tag)
return -ENOMEM;
}
if (!new_registry && isempty(*registry))
return log_debug_errno(SYNTHETIC_ERRNO(ENODATA), "No suitable registry found.");
if (new_protocol)
free_and_replace(*protocol, new_protocol);
if (new_registry)
free_and_replace(*registry, new_registry);
if (new_image)
free_and_replace(*image, new_image);
if (new_tag)
free_and_replace(*tag, new_tag);
return 0;
}
char* oci_digest_string(const struct iovec *iovec) {
assert(iovec);
_cleanup_free_ char *h = hexmem(iovec->iov_base, iovec->iov_len);
if (!h)
return NULL;
return strjoin("sha256:", h);
}
int oci_make_manifest_url(
const char *protocol,
const char *repository,
const char *image,
const char *tag,
char **ret) {
assert(protocol);
assert(repository);
assert(image);
assert(tag);
assert(ret);
_cleanup_free_ char *url = strjoin(protocol, "://", repository, "/v2/", image, "/manifests/", tag);
if (!url)
return -ENOMEM;
*ret = TAKE_PTR(url);
return 0;
}
int oci_make_blob_url(
const char *protocol,
const char *repository,
const char *image,
const struct iovec *digest,
char **ret) {
assert(protocol);
assert(repository);
assert(image);
assert(digest);
assert(ret);
_cleanup_free_ char *d = oci_digest_string(digest);
if (!d)
return -ENOMEM;
_cleanup_free_ char *url = strjoin(protocol, "://", repository, "/v2/", image, "/blobs/", d);
if (!url)
return -ENOMEM;
*ret = TAKE_PTR(url);
return 0;
}
/* OCI uses the Go architecture IDs */
static const char *const go_arch_table[_ARCHITECTURE_MAX] = {
[ARCHITECTURE_ARM] = "arm",
[ARCHITECTURE_ARM64] = "arm64",
[ARCHITECTURE_MIPS] = "mips",
[ARCHITECTURE_MIPS64] = "mips64",
[ARCHITECTURE_MIPS64_LE] = "mips64le",
[ARCHITECTURE_MIPS_LE] = "mipsle",
[ARCHITECTURE_PPC64] = "ppc64",
[ARCHITECTURE_PPC64_LE] = "ppc64le",
[ARCHITECTURE_S390X] = "s390x",
[ARCHITECTURE_X86] = "386",
[ARCHITECTURE_X86_64] = "amd64",
};
DEFINE_STRING_TABLE_LOOKUP_FROM_STRING(go_arch, Architecture);
char* urlescape(const char *s) {
size_t l = strlen_ptr(s);
_cleanup_free_ char *t = new(char, l * 3 + 1);
if (!t)
return NULL;
char *p = t;
for (; s && *s; s++) {
if (strchr(LETTERS DIGITS ".-_", *s))
*(p++) = *s;
else {
*(p++) = '%';
*(p++) = hexchar((uint8_t) *s >> 4);
*(p++) = hexchar((uint8_t) *s & 15);
}
}
*p = 0;
return TAKE_PTR(t);
}
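As a worked example of what oci_make_manifest_url() in this file assembles, the sketch below builds the same "<protocol>://<registry>/v2/<image>/manifests/<tag>" string using only the C standard library. The function name sketch_manifest_url() is hypothetical, and unlike the systemd helper it returns the string directly instead of using an errno-style return code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc()ed manifest URL of the form
 * "<protocol>://<registry>/v2/<image>/manifests/<tag>", or NULL on failure. */
static char *sketch_manifest_url(const char *protocol,
                                 const char *registry,
                                 const char *image,
                                 const char *tag) {
        assert(protocol);
        assert(registry);
        assert(image);
        assert(tag);

        static const char fmt[] = "%s://%s/v2/%s/manifests/%s";

        /* First pass: determine the required length without writing anything */
        int n = snprintf(NULL, 0, fmt, protocol, registry, image, tag);
        if (n < 0)
                return NULL;

        char *url = malloc((size_t) n + 1);
        if (!url)
                return NULL;

        /* Second pass: format into the exactly sized buffer */
        snprintf(url, (size_t) n + 1, fmt, protocol, registry, image, tag);
        return url;
}
```

For the normalized reference "registry-1.docker.io/library/debian:latest" over https, this yields "https://registry-1.docker.io/v2/library/debian/manifests/latest". The blob URL follows the same pattern with "/blobs/sha256:<hex>" in place of "/manifests/<tag>".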

31
src/import/oci-util.h Normal file

@ -0,0 +1,31 @@
/* SPDX-License-Identifier: LGPL-2.1-or-later */
#pragma once
#include "basic-forward.h"
bool oci_image_is_valid(const char *n);
int oci_registry_is_valid(const char *n);
bool oci_tag_is_valid(const char *n);
int oci_ref_parse(const char *ref, char **ret_registry, char **ret_image, char **ret_tag);
static inline int oci_ref_valid(const char *ref) {
int r;
r = oci_ref_parse(ref, NULL, NULL, NULL);
if (r == -EINVAL)
return false;
if (r < 0)
return r;
return true;
}
int oci_ref_normalize(char **protocol, char **registry, char **image, char **tag);
char* oci_digest_string(const struct iovec *iovec);
int oci_make_manifest_url(const char *protocol, const char *repository, const char *image, const char *tag, char **ret);
int oci_make_blob_url(const char *protocol, const char *repository, const char *image, const struct iovec *digest, char **ret);
Architecture go_arch_from_string(const char *s);
char* urlescape(const char *s);
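Of the validators declared above, the tag check is the easiest to illustrate in isolation. The sketch below is a plain C reading of the grammar quoted from the distribution spec, [a-zA-Z0-9_][a-zA-Z0-9._-]{0,127}: one leading character from the restricted set, followed by at most 127 further characters. The function name sketch_tag_is_valid() and the SKETCH_ALNUM macro are hypothetical; this is not the systemd implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_ALNUM "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

/* Hypothetical validator for [a-zA-Z0-9_][a-zA-Z0-9._-]{0,127} */
static bool sketch_tag_is_valid(const char *tag) {
        /* Reject NULL and the empty string up front: strchr() would otherwise
         * match the terminating NUL of the character set. */
        if (!tag || tag[0] == '\0')
                return false;

        /* First character: letters, digits or underscore only */
        if (!strchr(SKETCH_ALNUM "_", tag[0]))
                return false;

        /* Remaining characters: additionally '.' and '-' are allowed */
        size_t rest = strspn(tag + 1, SKETCH_ALNUM "._-");
        if (tag[1 + rest] != '\0')
                return false;

        /* {0,127}: at most 127 characters after the first one */
        return rest <= 127;
}
```

Under this reading a tag may be up to 128 characters long in total, and tags starting with "." or "-" are rejected.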


@ -106,6 +106,10 @@
send_interface="org.freedesktop.import1.Manager" send_interface="org.freedesktop.import1.Manager"
send_member="PullRawEx"/> send_member="PullRawEx"/>
<allow send_destination="org.freedesktop.import1"
send_interface="org.freedesktop.import1.Manager"
send_member="PullOci"/>
<allow send_destination="org.freedesktop.import1" <allow send_destination="org.freedesktop.import1"
send_interface="org.freedesktop.import1.Transfer" send_interface="org.freedesktop.import1.Transfer"
send_member="Cancel"/> send_member="Cancel"/>


@ -30,6 +30,10 @@ static int http_status_etag_exists(CURLcode status) {
return status == 304; return status == 304;
} }
static int http_status_need_authentication(CURLcode status) {
return status == 401;
}
void pull_job_close_disk_fd(PullJob *j) { void pull_job_close_disk_fd(PullJob *j) {
if (!j) if (!j)
return; return;
@ -60,10 +64,22 @@ PullJob* pull_job_unref(PullJob *j) {
iovec_done(&j->payload); iovec_done(&j->payload);
iovec_done(&j->checksum); iovec_done(&j->checksum);
iovec_done(&j->expected_checksum); iovec_done(&j->expected_checksum);
free(j->content_type);
if (j->free_userdata)
j->free_userdata(j->userdata);
free(j->description);
free(j->authentication_challenge);
return mfree(j); return mfree(j);
} }
static const char* pull_job_description(PullJob *j) {
assert(j);
return j->description ?: j->url;
}
static void pull_job_finish(PullJob *j, int ret) { static void pull_job_finish(PullJob *j, int ret) {
assert(j); assert(j);
@ -73,7 +89,7 @@ static void pull_job_finish(PullJob *j, int ret) {
if (ret == 0) { if (ret == 0) {
j->state = PULL_JOB_DONE; j->state = PULL_JOB_DONE;
j->progress_percent = 100; j->progress_percent = 100;
log_info("Download of %s complete.", j->url); log_info("Download of %s complete.", pull_job_description(j));
} else { } else {
j->state = PULL_JOB_FAILED; j->state = PULL_JOB_FAILED;
j->error = ret; j->error = ret;
@ -83,15 +99,19 @@ static void pull_job_finish(PullJob *j, int ret) {
j->on_finished(j); j->on_finished(j);
} }
static int pull_job_restart(PullJob *j, const char *new_url) { int pull_job_restart(PullJob *j, const char *new_url) {
int r; int r;
assert(j); assert(j);
assert(new_url);
/* If a URL is specified we retry the same request, just towards a different URL. If the URL is NULL,
* we'll fire the same request again (which is useful if some parameters have been changed). */
if (new_url) {
r = free_and_strdup(&j->url, new_url); r = free_and_strdup(&j->url, new_url);
if (r < 0) if (r < 0)
return r; return r;
}
j->state = PULL_JOB_INIT; j->state = PULL_JOB_INIT;
j->error = 0; j->error = 0;
@ -103,15 +123,17 @@ static int pull_job_restart(PullJob *j, const char *new_url) {
j->etag_exists = false; j->etag_exists = false;
j->mtime = 0; j->mtime = 0;
iovec_done(&j->checksum); iovec_done(&j->checksum);
j->content_type = mfree(j->content_type);
if (new_url) {
/* Reset expectations if the URL changes */
iovec_done(&j->expected_checksum); iovec_done(&j->expected_checksum);
j->expected_content_length = UINT64_MAX; j->expected_content_length = UINT64_MAX;
}
curl_glue_remove_and_free(j->glue, j->curl); curl_glue_remove_and_free(j->glue, j->curl);
j->curl = NULL; j->curl = NULL;
curl_slist_free_all(j->request_header);
j->request_header = NULL;
import_compress_free(&j->compress); import_compress_free(&j->compress);
if (j->checksum_ctx) { if (j->checksum_ctx) {
@ -192,6 +214,10 @@ void pull_job_curl_on_finished(CurlGlue *g, CURL *curl, CURLcode result) {
j->etag_exists = true; j->etag_exists = true;
r = 0; r = 0;
goto finish; goto finish;
} else if (http_status_need_authentication(status)) {
log_info("Access to image requires authentication.");
r = -ENOKEY;
goto finish;
} else if (status >= 300) { } else if (status >= 300) {
if (status == 404 && j->on_not_found) { if (status == 404 && j->on_not_found) {
@ -267,7 +293,7 @@ void pull_job_curl_on_finished(CurlGlue *g, CURL *curl, CURLcode result) {
goto finish; goto finish;
} }
log_debug("%s of %s is %s.", EVP_MD_CTX_get0_name(j->checksum_ctx), j->url, h); log_debug("%s of %s is %s.", EVP_MD_CTX_get0_name(j->checksum_ctx), pull_job_description(j), h);
} }
if (iovec_is_set(&j->expected_checksum) && if (iovec_is_set(&j->expected_checksum) &&
@ -331,7 +357,7 @@ void pull_job_curl_on_finished(CurlGlue *g, CURL *curl, CURLcode result) {
} }
} }
log_info("Acquired %s.", FORMAT_BYTES(j->written_uncompressed)); log_info("Acquired %s for %s.", FORMAT_BYTES(j->written_uncompressed), pull_job_description(j));
r = 0; r = 0;
@ -546,7 +572,7 @@ fail:
} }
static size_t pull_job_header_callback(void *contents, size_t size, size_t nmemb, void *userdata) { static size_t pull_job_header_callback(void *contents, size_t size, size_t nmemb, void *userdata) {
_cleanup_free_ char *length = NULL, *last_modified = NULL, *etag = NULL; _cleanup_free_ char *length = NULL, *last_modified = NULL, *etag = NULL, *ct = NULL;
size_t sz = size * nmemb; size_t sz = size * nmemb;
PullJob *j = ASSERT_PTR(userdata); PullJob *j = ASSERT_PTR(userdata);
CURLcode code; CURLcode code;
@ -568,6 +594,19 @@ static size_t pull_job_header_callback(void *contents, size_t size, size_t nmemb
goto fail; goto fail;
} }
if (http_status_need_authentication(status)) {
_cleanup_free_ char *challenge = NULL;
r = curl_header_strdup(contents, sz, "WWW-Authenticate:", &challenge);
if (r < 0) {
log_oom();
goto fail;
}
if (r > 0)
free_and_replace(j->authentication_challenge, challenge);
return sz;
}
if (http_status_ok(status) || http_status_etag_exists(status)) { if (http_status_ok(status) || http_status_etag_exists(status)) {
/* Check Etag on OK and etag exists responses. */ /* Check Etag on OK and etag exists responses. */
@ -614,7 +653,7 @@ static size_t pull_job_header_callback(void *contents, size_t size, size_t nmemb
goto fail; goto fail;
} }
log_info("Downloading %s for %s.", FORMAT_BYTES(j->content_length), j->url); log_info("Downloading %s for %s.", FORMAT_BYTES(j->content_length), pull_job_description(j));
} }
return sz; return sz;
@ -630,6 +669,16 @@ static size_t pull_job_header_callback(void *contents, size_t size, size_t nmemb
return sz; return sz;
} }
r = curl_header_strdup(contents, sz, "Content-Type:", &ct);
if (r < 0) {
log_oom();
goto fail;
}
if (r > 0) {
free_and_replace(j->content_type, ct);
return sz;
}
if (j->on_header) { if (j->on_header) {
r = j->on_header(j, contents, sz); r = j->on_header(j, contents, sz);
if (r < 0) if (r < 0)
@ -666,11 +715,11 @@ static int pull_job_progress_callback(void *userdata, curl_off_t dltotal, curl_o
log_info("Got %u%% of %s. %s left at %s/s.", log_info("Got %u%% of %s. %s left at %s/s.",
percent, percent,
j->url, pull_job_description(j),
FORMAT_TIMESPAN(left, USEC_PER_SEC), FORMAT_TIMESPAN(left, USEC_PER_SEC),
FORMAT_BYTES((uint64_t) ((double) dlnow / ((double) done / (double) USEC_PER_SEC)))); FORMAT_BYTES((uint64_t) ((double) dlnow / ((double) done / (double) USEC_PER_SEC))));
} else } else
log_info("Got %u%% of %s.", percent, j->url); log_info("Got %u%% of %s.", percent, pull_job_description(j));
j->progress_percent = percent; j->progress_percent = percent;
j->last_status_usec = n; j->last_status_usec = n;
@ -724,6 +773,27 @@ int pull_job_new(
return 0; return 0;
} }
int pull_job_add_request_header(PullJob *j, const char *hdr) {
assert(j);
assert(hdr);
if (j->request_header) {
struct curl_slist *l;
l = curl_slist_append(j->request_header, hdr);
if (!l)
return -ENOMEM;
j->request_header = l;
} else {
j->request_header = curl_slist_new(hdr, NULL);
if (!j->request_header)
return -ENOMEM;
}
return 0;
}
int pull_job_begin(PullJob *j) { int pull_job_begin(PullJob *j) {
int r; int r;
@ -747,19 +817,9 @@ int pull_job_begin(PullJob *j) {
if (!hdr) if (!hdr)
return -ENOMEM; return -ENOMEM;
if (!j->request_header) { r = pull_job_add_request_header(j, hdr);
j->request_header = curl_slist_new(hdr, NULL); if (r < 0)
if (!j->request_header) return r;
return -ENOMEM;
} else {
struct curl_slist *l;
l = curl_slist_append(j->request_header, hdr);
if (!l)
return -ENOMEM;
j->request_header = l;
}
} }
if (j->request_header) { if (j->request_header) {
@ -796,3 +856,30 @@ int pull_job_begin(PullJob *j) {
return 0; return 0;
} }
int pull_job_set_accept(PullJob *j, char * const *l) {
assert(j);
if (strv_isempty(l))
return 0;
_cleanup_free_ char *joined = strv_join(l, ", ");
if (!joined)
return -ENOMEM;
_cleanup_free_ char *f = strjoin("Accept: ", joined);
if (!f)
return -ENOMEM;
return pull_job_add_request_header(j, f);
}
int pull_job_set_bearer_token(PullJob *j, const char *token) {
assert(j);
_cleanup_free_ char *f = strjoin("Authorization: Bearer ", token);
if (!f)
return -ENOMEM;
return pull_job_add_request_header(j, f);
}
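
Reviewer sketch: pull_job_set_accept() above joins the media types with ", " and prefixes "Accept: " before passing the line to pull_job_add_request_header(). The same string construction in plain C, without systemd's strv_join()/strjoin(), might look roughly like this. make_accept_header() is a hypothetical name, not part of the patch. Unlike the real function, which returns 0 for an empty list without adding a header, this sketch returns NULL in that case.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for strv_join(l, ", ") + strjoin("Accept: ", joined).
 * Takes a NULL-terminated list and returns a newly allocated header line,
 * or NULL if the list is empty or allocation fails. */
static char *make_accept_header(const char *const *types) {
        static const char prefix[] = "Accept: ";
        size_t n = sizeof(prefix); /* prefix length plus trailing NUL */

        if (!types || !types[0])
                return NULL;

        for (size_t i = 0; types[i]; i++)
                n += strlen(types[i]) + (i > 0 ? 2 : 0); /* 2 for the ", " separator */

        char *s = malloc(n);
        if (!s)
                return NULL;

        strcpy(s, prefix);
        for (size_t i = 0; types[i]; i++) {
                if (i > 0)
                        strcat(s, ", ");
                strcat(s, types[i]);
        }

        return s;
}
```

Called with a list such as { "application/vnd.oci.image.manifest.v1+json", "application/vnd.oci.image.index.v1+json", NULL } (example values only), it yields a single "Accept: ..." line with both types separated by ", ". The caller frees the result.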


@ -36,6 +36,10 @@ typedef struct PullJob {
char *url; char *url;
void *userdata; void *userdata;
free_func_t free_userdata;
char *description;
PullJobFinished on_finished; PullJobFinished on_finished;
PullJobOpenDisk on_open_disk; PullJobOpenDisk on_open_disk;
PullJobHeader on_header; PullJobHeader on_header;
@ -67,6 +71,7 @@ typedef struct PullJob {
struct stat disk_stat; struct stat disk_stat;
usec_t mtime; usec_t mtime;
char *content_type;
ImportCompress compress; ImportCompress compress;
@ -82,6 +87,8 @@ typedef struct PullJob {
bool sync; bool sync;
bool force_memory; bool force_memory;
char *authentication_challenge;
} PullJob; } PullJob;
int pull_job_new(PullJob **ret, const char *url, CurlGlue *glue, void *userdata); int pull_job_new(PullJob **ret, const char *url, CurlGlue *glue, void *userdata);
@ -93,4 +100,10 @@ void pull_job_curl_on_finished(CurlGlue *g, CURL *curl, CURLcode result);
void pull_job_close_disk_fd(PullJob *j); void pull_job_close_disk_fd(PullJob *j);
int pull_job_add_request_header(PullJob *j, const char *hdr);
int pull_job_set_accept(PullJob *j, char * const *l);
int pull_job_set_bearer_token(PullJob *j, const char *token);
int pull_job_restart(PullJob *j, const char *new_url);
DEFINE_TRIVIAL_CLEANUP_FUNC(PullJob*, pull_job_unref); DEFINE_TRIVIAL_CLEANUP_FUNC(PullJob*, pull_job_unref);

1601
src/import/pull-oci.c Normal file

File diff suppressed because it is too large.

16
src/import/pull-oci.h Normal file

@ -0,0 +1,16 @@
/* SPDX-License-Identifier: LGPL-2.1-or-later */
#pragma once
#include "shared-forward.h"
#include "import-common.h"
typedef struct OciPull OciPull;
typedef void (*OciPullFinished)(OciPull *pull, int error, void *userdata);
int oci_pull_new(OciPull **ret, sd_event *event, const char *image_root, OciPullFinished on_finished, void *userdata);
OciPull* oci_pull_unref(OciPull *i);
DEFINE_TRIVIAL_CLEANUP_FUNC(OciPull*, oci_pull_unref);
int oci_pull_start(OciPull *i, const char *ref, const char *local, ImportFlags flags);


@ -2,6 +2,7 @@
#include "sd-daemon.h" #include "sd-daemon.h"
#include "sd-event.h" #include "sd-event.h"
#include "sd-varlink.h"
#include "alloc-util.h" #include "alloc-util.h"
#include "btrfs-util.h" #include "btrfs-util.h"
@ -276,6 +277,11 @@ static int tar_pull_make_local_copy(TarPull *p) {
if (r < 0) if (r < 0)
return r; return r;
_cleanup_(sd_varlink_unrefp) sd_varlink *mountfsd_link = NULL;
r = mountfsd_connect(&mountfsd_link);
if (r < 0)
return log_error_errno(r, "Failed to connect to mountfsd: %m");
/* Usually, tar_pull_job_on_open_disk_tar() would allocate ->tree_fd for us, but if /* Usually, tar_pull_job_on_open_disk_tar() would allocate ->tree_fd for us, but if
* already downloaded the image before, and are just making a copy of the original * already downloaded the image before, and are just making a copy of the original
* download, we need to open ->tree_fd now */ * download, we need to open ->tree_fd now */
@ -294,20 +300,35 @@ static int tar_pull_make_local_copy(TarPull *p) {
"Image tree '%s' is not owned by the foreign UID range, refusing.", "Image tree '%s' is not owned by the foreign UID range, refusing.",
p->final_path); p->final_path);
r = mountfsd_mount_directory_fd(directory_fd, p->userns_fd, DISSECT_IMAGE_FOREIGN_UID, &p->tree_fd); r = mountfsd_mount_directory_fd(
mountfsd_link,
directory_fd,
p->userns_fd,
DISSECT_IMAGE_FOREIGN_UID,
&p->tree_fd);
if (r < 0) if (r < 0)
return r; return log_error_errno(r, "Failed to mount directory via mountfsd: %m");
} }
_cleanup_close_ int directory_fd = -EBADF; _cleanup_close_ int directory_fd = -EBADF;
r = mountfsd_make_directory(t, MODE_INVALID, /* flags= */ 0, &directory_fd); r = mountfsd_make_directory(
mountfsd_link,
t,
MODE_INVALID,
/* flags= */ 0,
&directory_fd);
if (r < 0) if (r < 0)
return r; return log_error_errno(r, "Failed to make directory via mountfsd: %m");
_cleanup_close_ int copy_fd = -EBADF; _cleanup_close_ int copy_fd = -EBADF;
r = mountfsd_mount_directory_fd(directory_fd, p->userns_fd, DISSECT_IMAGE_FOREIGN_UID, &copy_fd); r = mountfsd_mount_directory_fd(
mountfsd_link,
directory_fd,
p->userns_fd,
DISSECT_IMAGE_FOREIGN_UID,
&copy_fd);
if (r < 0) if (r < 0)
return r; return log_error_errno(r, "Failed to mount directory via mountfsd: %m");
r = copy_tree_at_foreign(p->tree_fd, copy_fd, p->userns_fd); r = copy_tree_at_foreign(p->tree_fd, copy_fd, p->userns_fd);
if (r < 0) if (r < 0)
@ -611,14 +632,29 @@ static int tar_pull_job_on_open_disk_tar(PullJob *j) {
if (r < 0) if (r < 0)
return r; return r;
_cleanup_close_ int directory_fd = -EBADF; _cleanup_(sd_varlink_unrefp) sd_varlink *mountfsd_link = NULL;
r = mountfsd_make_directory(where, MODE_INVALID, /* flags= */ 0, &directory_fd); r = mountfsd_connect(&mountfsd_link);
if (r < 0) if (r < 0)
return r; return log_error_errno(r, "Failed to connect to mountfsd: %m");
r = mountfsd_mount_directory_fd(directory_fd, p->userns_fd, DISSECT_IMAGE_FOREIGN_UID, &p->tree_fd); _cleanup_close_ int directory_fd = -EBADF;
r = mountfsd_make_directory(
mountfsd_link,
where,
MODE_INVALID,
/* flags= */ 0,
&directory_fd);
if (r < 0) if (r < 0)
return r; return log_error_errno(r, "Failed to make directory via mountfsd: %m");
r = mountfsd_mount_directory_fd(
mountfsd_link,
directory_fd,
p->userns_fd,
DISSECT_IMAGE_FOREIGN_UID,
&p->tree_fd);
if (r < 0)
return log_error_errno(r, "Failed to mount directory via mountfsd: %m");
} else { } else {
if (p->flags & IMPORT_BTRFS_SUBVOL) if (p->flags & IMPORT_BTRFS_SUBVOL)
r = btrfs_subvol_make_fallback(AT_FDCWD, where, 0755); r = btrfs_subvol_make_fallback(AT_FDCWD, where, 0755);


@ -18,9 +18,11 @@
#include "iovec-util.h" #include "iovec-util.h"
#include "log.h" #include "log.h"
#include "main-func.h" #include "main-func.h"
#include "oci-util.h"
#include "parse-argument.h" #include "parse-argument.h"
#include "parse-util.h" #include "parse-util.h"
#include "path-util.h" #include "path-util.h"
#include "pull-oci.h"
#include "pull-raw.h" #include "pull-raw.h"
#include "pull-tar.h" #include "pull-tar.h"
#include "runtime-scope.h" #include "runtime-scope.h"
@ -244,6 +246,71 @@ static int pull_raw(int argc, char *argv[], void *userdata) {
return -r; return -r;
} }
static void on_oci_finished(OciPull *pull, int error, void *userdata) {
sd_event *event = userdata;
assert(pull);
if (error == 0)
log_info("Operation completed successfully.");
sd_event_exit(event, ABS(error));
}
static int pull_oci(int argc, char *argv[], void *userdata) {
int r;
const char *ref = argv[1];
_cleanup_free_ char *image = NULL;
r = oci_ref_parse(ref, /* ret_registry= */ NULL, &image, /* ret_tag= */ NULL);
if (r == -EINVAL)
return log_error_errno(r, "OCI ref '%s' is invalid.", ref);
if (r < 0)
return log_error_errno(r, "Failed to check if OCI ref '%s' is valid: %m", ref);
_cleanup_free_ char *l = NULL;
const char *local;
if (argc >= 3)
local = empty_or_dash_to_null(argv[2]);
else {
r = path_extract_filename(image, &l);
if (r < 0)
return log_error_errno(r, "Failed to extract final component of '%s': %m", image);
local = l;
}
_cleanup_free_ char *normalized = NULL;
r = normalize_local(local, ref, &normalized);
if (r < 0)
return r;
_cleanup_(sd_event_unrefp) sd_event *event = NULL;
r = import_allocate_event_with_signals(&event);
if (r < 0)
return r;
_cleanup_(oci_pull_unrefp) OciPull *pull = NULL;
r = oci_pull_new(&pull, event, arg_image_root, on_oci_finished, event);
if (r < 0)
return log_error_errno(r, "Failed to allocate puller: %m");
r = oci_pull_start(
pull,
ref,
normalized,
arg_import_flags & IMPORT_PULL_FLAGS_MASK_OCI);
if (r < 0)
return log_error_errno(r, "Failed to pull image: %m");
r = sd_event_loop(event);
if (r < 0)
return log_error_errno(r, "Failed to run event loop: %m");
log_info("Exiting.");
return -r;
}
static int help(int argc, char *argv[], void *userdata) { static int help(int argc, char *argv[], void *userdata) {
printf("%1$s [OPTIONS...] {COMMAND} ...\n" printf("%1$s [OPTIONS...] {COMMAND} ...\n"
@ -251,6 +318,7 @@ static int help(int argc, char *argv[], void *userdata) {
"\n%2$sCommands:%3$s\n" "\n%2$sCommands:%3$s\n"
" tar URL [NAME] Download a TAR image\n" " tar URL [NAME] Download a TAR image\n"
" raw URL [NAME] Download a RAW image\n" " raw URL [NAME] Download a RAW image\n"
" oci REF [NAME] Download an OCI image\n"
"\n%2$sOptions:%3$s\n" "\n%2$sOptions:%3$s\n"
" -h --help Show this help\n" " -h --help Show this help\n"
" --version Show package version\n" " --version Show package version\n"
@ -595,6 +663,7 @@ static int pull_main(int argc, char *argv[]) {
{ "help", VERB_ANY, VERB_ANY, 0, help }, { "help", VERB_ANY, VERB_ANY, 0, help },
{ "tar", 2, 3, 0, pull_tar }, { "tar", 2, 3, 0, pull_tar },
{ "raw", 2, 3, 0, pull_raw }, { "raw", 2, 3, 0, pull_raw },
{ "oci", 2, 3, 0, pull_oci },
{} {}
}; };


@ -0,0 +1,22 @@
/* SPDX-License-Identifier: LGPL-2.1-or-later */
#include "tests.h"
#include "oci-util.h"
static void test_urlescape_one(const char *s, const char *expected) {
_cleanup_free_ char *t = ASSERT_PTR(urlescape(s));
ASSERT_STREQ(t, expected);
}
TEST(urlescape) {
test_urlescape_one(NULL, "");
test_urlescape_one("", "");
test_urlescape_one("a", "a");
test_urlescape_one(" ", "%20");
test_urlescape_one("     ", "%20%20%20%20%20");
test_urlescape_one("foo\tfoo\aqux", "foo%09foo%07qux");
test_urlescape_one("müffel", "m%c3%bcffel");
}
DEFINE_TEST_MAIN(LOG_DEBUG);
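
Reviewer sketch: the expectations in the test above (NULL maps to an empty string, control characters and non-ASCII bytes become lowercase %xx sequences) are consistent with a straightforward percent-encoder. The version below is an assumption about the behavior, not the implementation in oci-util.c, which is not shown in this diff. In particular, the set of characters passed through unescaped (ASCII letters, digits and "-._~", the RFC 3986 unreserved set) is a guess, and urlescape_sketch() is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical percent-encoder matching the test expectations above.
 * Assumption: only RFC 3986 "unreserved" characters pass through unchanged. */
static char *urlescape_sketch(const char *s) {
        static const char hex[] = "0123456789abcdef";
        static const char unreserved[] = "-._~";

        if (!s)
                s = ""; /* NULL input yields an empty string */

        /* Worst case: every input byte expands to three output bytes. */
        char *out = malloc(strlen(s) * 3 + 1), *p = out;
        if (!out)
                return NULL;

        for (; *s; s++) {
                unsigned char c = (unsigned char) *s;

                if ((c >= 'a' && c <= 'z') ||
                    (c >= 'A' && c <= 'Z') ||
                    (c >= '0' && c <= '9') ||
                    strchr(unreserved, c))
                        *p++ = (char) c;
                else {
                        *p++ = '%';
                        *p++ = hex[c >> 4];
                        *p++ = hex[c & 0x0f];
                }
        }

        *p = 0;
        return out;
}
```

Escaping byte by byte is what makes the UTF-8 encoding of "ü" (0xc3 0xbc) come out as two escaped octets rather than one escaped code point.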


@ -480,7 +480,7 @@ static int method_create_or_register_machine(
root_directory, root_directory,
netif, netif,
n_netif, n_netif,
/* cid= */ 0, /* cid= */ VMADDR_CID_ANY,
/* ssh_address= */ NULL, /* ssh_address= */ NULL,
/* ssh_private_key_path= */ NULL, /* ssh_private_key_path= */ NULL,
ret, ret,


@ -662,6 +662,7 @@ static int vl_method_mount_image(
if (r < 0) if (r < 0)
return r; return r;
_cleanup_(sd_varlink_unrefp) sd_varlink *nsresource_link = NULL;
for (PartitionDesignator d = 0; d < _PARTITION_DESIGNATOR_MAX; d++) { for (PartitionDesignator d = 0; d < _PARTITION_DESIGNATOR_MAX; d++) {
DissectedPartition *pp = di->partitions + d; DissectedPartition *pp = di->partitions + d;
int fd_idx; int fd_idx;
@ -673,7 +674,14 @@ static int vl_method_mount_image(
continue; continue;
if (userns_fd >= 0) { if (userns_fd >= 0) {
r = nsresource_add_mount(userns_fd, pp->fsmount_fd);
if (!nsresource_link) {
r = nsresource_connect(&nsresource_link);
if (r < 0)
return r;
}
r = nsresource_add_mount(nsresource_link, userns_fd, pp->fsmount_fd);
if (r < 0) if (r < 0)
return r; return r;
} }
@ -1206,7 +1214,7 @@ static int vl_method_mount_directory(
} }
if (userns_fd >= 0) { if (userns_fd >= 0) {
r = nsresource_add_mount(userns_fd, mount_fd); r = nsresource_add_mount(/* vl= */ NULL, userns_fd, mount_fd);
if (r < 0) if (r < 0)
return r; return r;
} }

13
src/mstack/meson.build Normal file

@ -0,0 +1,13 @@
# SPDX-License-Identifier: LGPL-2.1-or-later
executables += [
executable_template + {
'name' : 'systemd-mstack',
'public' : true,
'sources' : files('mstack-tool.c'),
},
]
install_symlink('mount.mstack',
pointing_to : sbin_to_bin + 'systemd-mstack',
install_dir : sbindir)

437
src/mstack/mstack-tool.c Normal file

@ -0,0 +1,437 @@
/* SPDX-License-Identifier: LGPL-2.1-or-later */
#include <fcntl.h>
#include <getopt.h>
#include <unistd.h>
#include "argv-util.h"
#include "build.h"
#include "chase.h"
#include "errno-util.h"
#include "extract-word.h"
#include "fd-util.h"
#include "format-table.h"
#include "image-policy.h"
#include "main-func.h"
#include "mount-util.h"
#include "mountpoint-util.h"
#include "mstack.h"
#include "parse-argument.h"
#include "pretty-print.h"
#include "string-util.h"
static enum {
ACTION_INSPECT,
ACTION_MOUNT,
ACTION_UMOUNT,
} arg_action = ACTION_INSPECT;
static char *arg_what = NULL;
static char *arg_where = NULL;
static sd_json_format_flags_t arg_json_format_flags = SD_JSON_FORMAT_OFF;
static PagerFlags arg_pager_flags = 0;
static int arg_legend = true;
static MStackFlags arg_mstack_flags = 0;
static bool arg_rmdir = false;
static ImagePolicy *arg_image_policy = NULL;
static ImageFilter *arg_image_filter = NULL;
STATIC_DESTRUCTOR_REGISTER(arg_what, freep);
STATIC_DESTRUCTOR_REGISTER(arg_where, freep);
STATIC_DESTRUCTOR_REGISTER(arg_image_policy, image_policy_freep);
STATIC_DESTRUCTOR_REGISTER(arg_image_filter, image_filter_freep);
static int help(void) {
_cleanup_free_ char *link = NULL;
int r;
r = terminal_urlify_man("systemd-mstack", "1", &link);
if (r < 0)
return log_oom();
printf("%1$s [OPTIONS...] WHAT\n"
"%1$s [OPTIONS...] --mount WHAT WHERE\n"
"%1$s [OPTIONS...] --umount WHERE\n"
"\n%5$sInspect or apply mount stack.%6$s\n\n"
"%3$sOptions:%4$s\n"
" --no-pager Do not pipe output into a pager\n"
" --no-legend Do not print the column headers\n"
" --json=pretty|short|off Generate JSON output\n"
" -r --read-only Mount read-only\n"
" --mkdir Make mount directory before mounting, if missing\n"
" --rmdir Remove mount directory after unmounting\n"
" --image-policy=POLICY\n"
" Specify image dissection policy\n"
" --image-filter=FILTER\n"
" Specify image dissection filter\n"
"\n%3$sCommands:%4$s\n"
" -h --help Show this help\n"
" --version Show package version\n"
" -m --mount Mount the mstack to the specified directory\n"
" -M Shortcut for --mount --mkdir\n"
" -u --umount Unmount the image from the specified directory\n"
" -U Shortcut for --umount --rmdir\n"
"\nSee the %2$s for details.\n",
program_invocation_short_name,
link,
ansi_underline(), ansi_normal(),
ansi_highlight(), ansi_normal());
return 0;
}
static int parse_argv(int argc, char *argv[]) {
enum {
ARG_VERSION = 0x100,
ARG_NO_PAGER,
ARG_NO_LEGEND,
ARG_JSON,
ARG_MKDIR,
ARG_RMDIR,
ARG_IMAGE_POLICY,
ARG_IMAGE_FILTER,
};
static const struct option options[] = {
{ "help", no_argument, NULL, 'h' },
{ "version", no_argument, NULL, ARG_VERSION },
{ "no-pager", no_argument, NULL, ARG_NO_PAGER },
{ "no-legend", no_argument, NULL, ARG_NO_LEGEND },
{ "mount", no_argument, NULL, 'm' },
{ "umount", no_argument, NULL, 'u' },
{ "json", required_argument, NULL, ARG_JSON },
{ "read-only", no_argument, NULL, 'r' },
{ "mkdir", no_argument, NULL, ARG_MKDIR },
{ "rmdir", no_argument, NULL, ARG_RMDIR },
{ "image-policy", required_argument, NULL, ARG_IMAGE_POLICY },
{ "image-filter", required_argument, NULL, ARG_IMAGE_FILTER },
{}
};
int c, r;
assert(argc >= 0);
assert(argv);
while ((c = getopt_long(argc, argv, "hmMuUr", options, NULL)) >= 0) {
switch (c) {
case 'h':
return help();
case ARG_VERSION:
return version();
case ARG_NO_PAGER:
arg_pager_flags |= PAGER_DISABLE;
break;
case ARG_NO_LEGEND:
arg_legend = false;
break;
case ARG_JSON:
r = parse_json_argument(optarg, &arg_json_format_flags);
if (r <= 0)
return r;
break;
case 'r':
arg_mstack_flags |= MSTACK_RDONLY;
break;
case ARG_IMAGE_POLICY:
r = parse_image_policy_argument(optarg, &arg_image_policy);
if (r < 0)
return r;
break;
case ARG_IMAGE_FILTER: {
_cleanup_(image_filter_freep) ImageFilter *f = NULL;
r = image_filter_parse(optarg, &f);
if (r < 0)
return log_error_errno(r, "Failed to parse image filter expression: %s", optarg);
image_filter_free(arg_image_filter);
arg_image_filter = TAKE_PTR(f);
break;
}
case ARG_MKDIR:
arg_mstack_flags |= MSTACK_MKDIR;
break;
case ARG_RMDIR:
arg_rmdir = true;
break;
case 'm':
arg_action = ACTION_MOUNT;
break;
case 'M':
/* Shortcut combination of --mkdir + --mount */
arg_action = ACTION_MOUNT;
arg_mstack_flags |= MSTACK_MKDIR;
break;
case 'u':
arg_action = ACTION_UMOUNT;
break;
case 'U':
/* Shortcut combination of --rmdir + --umount */
arg_action = ACTION_UMOUNT;
arg_rmdir = true;
break;
case '?':
return -EINVAL;
default:
assert_not_reached();
}
}
switch (arg_action) {
case ACTION_INSPECT:
if (optind + 1 != argc)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Expected one argument.");
r = parse_path_argument(argv[optind], /* suppress_root= */ false, &arg_what);
if (r < 0)
return r;
break;
case ACTION_MOUNT:
if (optind + 2 != argc)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Expected two arguments.");
r = parse_path_argument(argv[optind], /* suppress_root= */ false, &arg_what);
if (r < 0)
return r;
r = parse_path_argument(argv[optind+1], /* suppress_root= */ false, &arg_where);
if (r < 0)
return r;
break;
case ACTION_UMOUNT:
if (optind + 1 != argc)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Expected one argument.");
r = parse_path_argument(argv[optind], /* suppress_root= */ false, &arg_where);
if (r < 0)
return r;
break;
default:
assert_not_reached();
}
return 1;
}
static int parse_argv_as_mount_helper(int argc, char *argv[]) {
const char *options = NULL;
bool fake = false;
int c, r;
/* Implements util-linux "external helper" command line interface, as per mount(8) man page. */
while ((c = getopt(argc, argv, "sfnvN:o:t:")) >= 0) {
switch (c) {
case 'f':
fake = true;
break;
case 'o':
options = optarg;
break;
case 't':
if (!streq(optarg, "mstack"))
log_debug("Unexpected file system type '%s', ignoring.", optarg);
break;
case 's': /* sloppy mount options */
case 'n': /* aka --no-mtab */
case 'v': /* aka --verbose */
log_debug("Ignoring option -%c, not implemented.", c);
break;
case 'N': /* aka --namespace= */
return log_error_errno(SYNTHETIC_ERRNO(EOPNOTSUPP), "Option -%c is not implemented, refusing.", c);
case '?':
return -EINVAL;
}
}
if (optind + 2 != argc)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL),
"Expected an image file path and target directory as only arguments.");
for (const char *p = options;;) {
_cleanup_free_ char *word = NULL;
r = extract_first_word(&p, &word, ",", EXTRACT_KEEP_QUOTE);
if (r < 0)
return log_error_errno(r, "Failed to extract mount option: %m");
if (r == 0)
break;
if (streq(word, "ro"))
SET_FLAG(arg_mstack_flags, MSTACK_RDONLY, true);
else if (streq(word, "rw"))
SET_FLAG(arg_mstack_flags, MSTACK_RDONLY, false);
else
return log_error_errno(SYNTHETIC_ERRNO(EINVAL),
"Unknown mount option '%s'.", word);
}
if (fake)
return 0;
r = parse_path_argument(argv[optind], /* suppress_root= */ false, &arg_what);
if (r < 0)
return r;
r = parse_path_argument(argv[optind+1], /* suppress_root= */ false, &arg_where);
if (r < 0)
return r;
arg_action = ACTION_MOUNT;
return 1;
}
static int inspect_mstack(void) {
_cleanup_(mstack_freep) MStack *mstack = NULL;
int r;
assert(arg_what);
r = mstack_load(arg_what, /* dir_fd= */ -EBADF, &mstack);
if (r < 0)
return log_debug_errno(r, "Failed to load .mstack/ directory '%s': %m", arg_what);
_cleanup_(table_unrefp) Table *t = NULL;
t = table_new("type", "name", "image", "what", "where", "sort");
if (!t)
return log_oom();
table_set_ersatz_string(t, TABLE_ERSATZ_DASH);
FOREACH_ARRAY(m, mstack->mounts, mstack->n_mounts) {
_cleanup_free_ char *w = NULL;
r = fd_get_path(m->what_fd, &w);
if (r < 0)
return log_error_errno(r, "Failed to get path of what file descriptor: %m");
r = table_add_many(
t,
TABLE_STRING, mstack_mount_type_to_string(m->mount_type),
TABLE_STRING, m->what,
TABLE_STRING, image_type_to_string(m->image_type),
TABLE_PATH, w,
TABLE_PATH, m->where ?: ((mstack->root_mount && mstack->root_mount != m) ? "/usr" : "/"),
TABLE_STRING, m->sort_key);
if (r < 0)
return table_log_add_error(r);
}
return table_print_with_pager(t, arg_json_format_flags, arg_pager_flags, arg_legend);
}
static int mount_mstack(void) {
int r;
assert(arg_what);
assert(arg_where);
r = mstack_apply(
arg_what,
/* dir_fd= */ -EBADF,
arg_where,
/* temp_mount_dir= */ NULL, /* auto-create temporary directory */
/* mountfsd_link= */ NULL,
/* userns_fd= */ -EBADF,
arg_image_policy,
arg_image_filter,
arg_mstack_flags,
/* ret_root_fd= */ NULL);
if (r < 0)
return log_error_errno(r, "Failed to apply .mstack/ directory '%s': %m", arg_what);
return 0;
}
static int umount_mstack(void) {
int r;
assert(arg_where);
_cleanup_free_ char *canonical = NULL;
_cleanup_close_ int fd = chase_and_open(arg_where, /* root= */ NULL, /* chase_flags= */ 0, O_DIRECTORY, &canonical);
if (fd == -ENOTDIR)
return log_error_errno(SYNTHETIC_ERRNO(ENOTDIR), "'%s' is not a directory", arg_where);
if (fd < 0)
return log_error_errno(fd, "Failed to resolve path '%s': %m", arg_where);
r = is_mount_point_at(fd, /* path= */ NULL, /* flags= */ 0);
if (r == 0)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "'%s' is not a mount point", canonical);
if (r < 0)
return log_error_errno(r, "Failed to determine whether '%s' is a mount point: %m", canonical);
fd = safe_close(fd);
r = umount_recursive(canonical, 0);
if (r < 0)
return log_error_errno(r, "Failed to unmount '%s': %m", canonical);
if (arg_rmdir) {
r = RET_NERRNO(rmdir(canonical));
if (r < 0)
return log_error_errno(r, "Failed to remove mount directory '%s': %m", canonical);
}
return 0;
}
static int run(int argc, char *argv[]) {
int r;
log_setup();
if (invoked_as(argv, "mount.mstack"))
r = parse_argv_as_mount_helper(argc, argv);
else
r = parse_argv(argc, argv);
if (r <= 0)
return r;
switch (arg_action) {
case ACTION_INSPECT:
return inspect_mstack();
case ACTION_MOUNT:
return mount_mstack();
case ACTION_UMOUNT:
return umount_mstack();
default:
assert_not_reached();
}
}
DEFINE_MAIN_FUNCTION(run);
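
Reviewer sketch: parse_argv_as_mount_helper() above walks the comma-separated -o string, lets a later "ro" or "rw" override an earlier one, and refuses anything else. A plain C approximation of that loop follows. parse_mount_options_sketch() is a hypothetical name, and the sketch splits on commas with strcspn() rather than using extract_first_word() with EXTRACT_KEEP_QUOTE, so quoting is not handled. It also assumes that empty words between consecutive commas are skipped.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical approximation of the -o parsing loop: the last "ro"/"rw" wins,
 * unknown options are rejected with -EINVAL, NULL or empty input is a no-op. */
static int parse_mount_options_sketch(const char *options, bool *read_only) {
        assert(read_only);

        for (const char *p = options; p && *p;) {
                size_t n = strcspn(p, ","); /* length of the next word */

                if (n == 0) {
                        p++; /* skip a separator (assumed to be coalesced) */
                        continue;
                }

                if (n == 2 && strncmp(p, "ro", 2) == 0)
                        *read_only = true;
                else if (n == 2 && strncmp(p, "rw", 2) == 0)
                        *read_only = false;
                else
                        return -EINVAL;

                p += n;
        }

        return 0;
}
```

With this, "ro,rw" leaves the read-only flag cleared while "rw,ro" sets it, mirroring the SET_FLAG() calls on MSTACK_RDONLY in the helper.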


@ -102,43 +102,43 @@ static int managed_interfaces_build_json(MetricFamilyContext *context, void *use
/* Keep metrics ordered alphabetically */ /* Keep metrics ordered alphabetically */
static const MetricFamily network_metric_family_table[] = { static const MetricFamily network_metric_family_table[] = {
{ {
.name = METRIC_IO_SYSTEMD_NETWORK_PREFIX "addressState", .name = METRIC_IO_SYSTEMD_NETWORK_PREFIX "AddressState",
.description = "Per interface metric: address state", .description = "Per interface metric: address state",
.type = METRIC_FAMILY_TYPE_STRING, .type = METRIC_FAMILY_TYPE_STRING,
.generate = link_address_state_build_json, .generate = link_address_state_build_json,
}, },
{ {
.name = METRIC_IO_SYSTEMD_NETWORK_PREFIX "adminState", .name = METRIC_IO_SYSTEMD_NETWORK_PREFIX "AdministrativeState",
.description = "Per interface metric: admin state", .description = "Per interface metric: administrative state",
.type = METRIC_FAMILY_TYPE_STRING, .type = METRIC_FAMILY_TYPE_STRING,
.generate = link_admin_state_build_json, .generate = link_admin_state_build_json,
}, },
{ {
.name = METRIC_IO_SYSTEMD_NETWORK_PREFIX "carrierState", .name = METRIC_IO_SYSTEMD_NETWORK_PREFIX "CarrierState",
.description = "Per interface metric: carrier state", .description = "Per interface metric: carrier state",
.type = METRIC_FAMILY_TYPE_STRING, .type = METRIC_FAMILY_TYPE_STRING,
.generate = link_carrier_state_build_json, .generate = link_carrier_state_build_json,
}, },
{ {
.name = METRIC_IO_SYSTEMD_NETWORK_PREFIX "ipv4AddressState", .name = METRIC_IO_SYSTEMD_NETWORK_PREFIX "IPv4AddressState",
.description = "Per interface metric: IPv4 address state", .description = "Per interface metric: IPv4 address state",
.type = METRIC_FAMILY_TYPE_STRING, .type = METRIC_FAMILY_TYPE_STRING,
.generate = link_ipv4_address_state_build_json, .generate = link_ipv4_address_state_build_json,
}, },
{ {
.name = METRIC_IO_SYSTEMD_NETWORK_PREFIX "ipv6AddressState", .name = METRIC_IO_SYSTEMD_NETWORK_PREFIX "IPv6AddressState",
.description = "Per interface metric: IPv6 address state", .description = "Per interface metric: IPv6 address state",
.type = METRIC_FAMILY_TYPE_STRING, .type = METRIC_FAMILY_TYPE_STRING,
.generate = link_ipv6_address_state_build_json, .generate = link_ipv6_address_state_build_json,
}, },
{ {
.name = METRIC_IO_SYSTEMD_NETWORK_PREFIX "managedInterfaces", .name = METRIC_IO_SYSTEMD_NETWORK_PREFIX "ManagedInterfaces",
.description = "Number of network interfaces managed by systemd-networkd", .description = "Number of network interfaces managed by systemd-networkd",
.type = METRIC_FAMILY_TYPE_GAUGE, .type = METRIC_FAMILY_TYPE_GAUGE,
.generate = managed_interfaces_build_json, .generate = managed_interfaces_build_json,
}, },
{ {
.name = METRIC_IO_SYSTEMD_NETWORK_PREFIX "operationalState", .name = METRIC_IO_SYSTEMD_NETWORK_PREFIX "OperationalState",
.description = "Per interface metric: operational state", .description = "Per interface metric: operational state",
.type = METRIC_FAMILY_TYPE_STRING, .type = METRIC_FAMILY_TYPE_STRING,
.generate = link_oper_state_build_json, .generate = link_oper_state_build_json,


@ -108,7 +108,10 @@ int create_subcgroup(
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to add process " PID_FMT " to cgroup %s: %m", pid->pid, payload); return log_error_errno(r, "Failed to add process " PID_FMT " to cgroup %s: %m", pid->pid, payload);
r = nsresource_add_cgroup(userns_fd, cgroup_fd); r = nsresource_add_cgroup(
/* vl= */ NULL,
userns_fd,
cgroup_fd);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to add cgroup %s to userns: %m", payload); return log_error_errno(r, "Failed to add cgroup %s to userns: %m", payload);
} else { } else {


@ -19,6 +19,7 @@
#include "sd-id128.h" #include "sd-id128.h"
#include "sd-netlink.h" #include "sd-netlink.h"
#include "sd-path.h" #include "sd-path.h"
#include "sd-varlink.h"
#include "alloc-util.h" #include "alloc-util.h"
#include "barrier.h" #include "barrier.h"
@ -69,6 +70,7 @@
#include "mkdir.h" #include "mkdir.h"
#include "mount-util.h" #include "mount-util.h"
#include "mountpoint-util.h" #include "mountpoint-util.h"
#include "mstack.h"
#include "namespace-util.h" #include "namespace-util.h"
#include "netlink-internal.h" #include "netlink-internal.h"
#include "notify-recv.h" #include "notify-recv.h"
@ -204,6 +206,7 @@ struct ether_addr arg_network_provided_mac = {};
static PagerFlags arg_pager_flags = 0; static PagerFlags arg_pager_flags = 0;
static unsigned long arg_personality = PERSONALITY_INVALID; static unsigned long arg_personality = PERSONALITY_INVALID;
static char *arg_image = NULL; static char *arg_image = NULL;
static char *arg_mstack = NULL;
static char *arg_oci_bundle = NULL; static char *arg_oci_bundle = NULL;
static VolatileMode arg_volatile_mode = VOLATILE_NO; static VolatileMode arg_volatile_mode = VOLATILE_NO;
static ExposePort *arg_expose_ports = NULL; static ExposePort *arg_expose_ports = NULL;
@ -272,6 +275,7 @@ STATIC_DESTRUCTOR_REGISTER(arg_network_bridge, freep);
STATIC_DESTRUCTOR_REGISTER(arg_network_zone, freep); STATIC_DESTRUCTOR_REGISTER(arg_network_zone, freep);
STATIC_DESTRUCTOR_REGISTER(arg_network_namespace_path, freep); STATIC_DESTRUCTOR_REGISTER(arg_network_namespace_path, freep);
STATIC_DESTRUCTOR_REGISTER(arg_image, freep); STATIC_DESTRUCTOR_REGISTER(arg_image, freep);
STATIC_DESTRUCTOR_REGISTER(arg_mstack, freep);
STATIC_DESTRUCTOR_REGISTER(arg_oci_bundle, freep); STATIC_DESTRUCTOR_REGISTER(arg_oci_bundle, freep);
STATIC_DESTRUCTOR_REGISTER(arg_property, strv_freep); STATIC_DESTRUCTOR_REGISTER(arg_property, strv_freep);
STATIC_DESTRUCTOR_REGISTER(arg_property_message, sd_bus_message_unrefp); STATIC_DESTRUCTOR_REGISTER(arg_property_message, sd_bus_message_unrefp);
@ -738,6 +742,7 @@ static int parse_argv(int argc, char *argv[]) {
ARG_BACKGROUND, ARG_BACKGROUND,
ARG_CLEANUP, ARG_CLEANUP,
ARG_NO_ASK_PASSWORD, ARG_NO_ASK_PASSWORD,
ARG_MSTACK,
}; };
static const struct option options[] = { static const struct option options[] = {
@ -817,6 +822,7 @@ static int parse_argv(int argc, char *argv[]) {
{ "background", required_argument, NULL, ARG_BACKGROUND }, { "background", required_argument, NULL, ARG_BACKGROUND },
{ "cleanup", no_argument, NULL, ARG_CLEANUP }, { "cleanup", no_argument, NULL, ARG_CLEANUP },
{ "no-ask-password", no_argument, NULL, ARG_NO_ASK_PASSWORD }, { "no-ask-password", no_argument, NULL, ARG_NO_ASK_PASSWORD },
{ "mstack", required_argument, NULL, ARG_MSTACK },
{} {}
}; };
@ -863,6 +869,14 @@ static int parse_argv(int argc, char *argv[]) {
arg_settings_mask |= SETTING_DIRECTORY; arg_settings_mask |= SETTING_DIRECTORY;
break; break;
case ARG_MSTACK:
r = parse_path_argument(optarg, false, &arg_mstack);
if (r < 0)
return r;
arg_settings_mask |= SETTING_DIRECTORY;
break;
case ARG_OCI_BUNDLE: case ARG_OCI_BUNDLE:
r = parse_path_argument(optarg, false, &arg_oci_bundle); r = parse_path_argument(optarg, false, &arg_oci_bundle);
if (r < 0) if (r < 0)
@ -1654,11 +1668,11 @@ static int verify_arguments(void) {
if (has_custom_root_mount(arg_custom_mounts, arg_n_custom_mounts)) if (has_custom_root_mount(arg_custom_mounts, arg_n_custom_mounts))
arg_read_only = true; arg_read_only = true;
if (arg_directory && arg_image) if (!!arg_directory + !!arg_image + !!arg_mstack > 1)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "--directory= and --image= may not be combined."); return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "--directory=, --image= and --mstack= may not be combined.");
if (arg_template && arg_image) if (arg_template && (arg_image || arg_mstack))
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "--template= and --image= may not be combined."); return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "--template= and --image=/--mstack= may not be combined.");
if (arg_template && !(arg_directory || arg_machine)) if (arg_template && !(arg_directory || arg_machine))
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "--template= needs --directory= or --machine=."); return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "--template= needs --directory= or --machine=.");
@ -1666,6 +1680,9 @@ static int verify_arguments(void) {
if (arg_ephemeral && arg_template) if (arg_ephemeral && arg_template)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "--ephemeral and --template= may not be combined."); return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "--ephemeral and --template= may not be combined.");
if (arg_ephemeral && arg_mstack)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "--ephemeral and --mstack= may not be combined.");
/* Permit --ephemeral with --link-journal=try-* to satisfy principle of the least astonishment /* Permit --ephemeral with --link-journal=try-* to satisfy principle of the least astonishment
* (by common sense, "try" means "do not fail if not possible") */ * (by common sense, "try" means "do not fail if not possible") */
if (arg_ephemeral && !IN_SET(arg_link_journal, LINK_NO, LINK_AUTO) && !arg_link_journal_try) if (arg_ephemeral && !IN_SET(arg_link_journal, LINK_NO, LINK_AUTO) && !arg_link_journal_try)
@ -3054,6 +3071,24 @@ static int pick_paths(void) {
arg_architecture = result.architecture; arg_architecture = result.architecture;
} }
if (arg_mstack) {
_cleanup_(pick_result_done) PickResult result = PICK_RESULT_NULL;
PickFilter filter = *pick_filter_image_mstack;
filter.architecture = arg_architecture;
r = path_pick_update_warn(
&arg_mstack,
&filter,
/* n_filters= */ 1,
PICK_ARCHITECTURE|PICK_TRIES,
&result);
if (r < 0)
return r;
arg_architecture = result.architecture;
}
if (arg_template) { if (arg_template) {
_cleanup_(pick_result_done) PickResult result = PICK_RESULT_NULL; _cleanup_(pick_result_done) PickResult result = PICK_RESULT_NULL;
PickFilter filter = *pick_filter_image_dir; PickFilter filter = *pick_filter_image_dir;
@ -3088,7 +3123,7 @@ static int determine_names(void) {
return log_oom(); return log_oom();
} }
if (!arg_image && !arg_directory) { if (!arg_image && !arg_directory && !arg_mstack) {
if (arg_machine) { if (arg_machine) {
_cleanup_(image_unrefp) Image *i = NULL; _cleanup_(image_unrefp) Image *i = NULL;
@ -3099,10 +3134,24 @@ static int determine_names(void) {
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to find image for machine '%s': %m", arg_machine); return log_error_errno(r, "Failed to find image for machine '%s': %m", arg_machine);
if (IN_SET(i->type, IMAGE_RAW, IMAGE_BLOCK)) switch (i->type) {
case IMAGE_RAW:
case IMAGE_BLOCK:
r = free_and_strdup(&arg_image, i->path); r = free_and_strdup(&arg_image, i->path);
else break;
case IMAGE_DIRECTORY:
case IMAGE_SUBVOLUME:
r = free_and_strdup(&arg_directory, i->path); r = free_and_strdup(&arg_directory, i->path);
break;
case IMAGE_MSTACK:
r = free_and_strdup(&arg_mstack, i->path);
break;
default:
assert_not_reached();
}
if (r < 0) if (r < 0)
return log_oom(); return log_oom();
@ -3114,31 +3163,40 @@ static int determine_names(void) {
return log_error_errno(r, "Failed to determine current directory: %m"); return log_error_errno(r, "Failed to determine current directory: %m");
} }
if (!arg_directory && !arg_image) if (!arg_directory && !arg_image && !arg_mstack)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Failed to determine path, please use -D or -i."); return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Failed to determine path, please use --directory=, --image= or --mstack=.");
} }
if (!arg_machine) { if (!arg_machine) {
if (arg_directory && path_equal(arg_directory, "/")) { if (arg_directory) {
if (path_equal(arg_directory, "/")) {
arg_machine = gethostname_malloc(); arg_machine = gethostname_malloc();
if (!arg_machine) if (!arg_machine)
return log_oom(); return log_oom();
} else if (arg_image) {
char *e;
r = path_extract_filename(arg_image, &arg_machine);
if (r < 0)
return log_error_errno(r, "Failed to extract file name from '%s': %m", arg_image);
/* Truncate suffix if there is one */
e = endswith(arg_machine, ".raw");
if (e)
*e = 0;
} else { } else {
r = path_extract_filename(arg_directory, &arg_machine); r = path_extract_filename(arg_directory, &arg_machine);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to extract file name from '%s': %m", arg_directory); return log_error_errno(r, "Failed to extract file name from '%s': %m", arg_directory);
} }
} else if (arg_image) {
r = path_extract_filename(arg_image, &arg_machine);
if (r < 0)
return log_error_errno(r, "Failed to extract file name from '%s': %m", arg_image);
/* Truncate suffix if there is one */
char *e = endswith(arg_machine, ".raw");
if (e)
*e = 0;
} else if (arg_mstack) {
r = path_extract_filename(arg_mstack, &arg_machine);
if (r < 0)
return log_error_errno(r, "Failed to extract file name from '%s': %m", arg_mstack);
char *e = endswith(arg_machine, ".mstack");
if (e)
*e = 0;
} else
assert_not_reached();
hostname_cleanup(arg_machine); hostname_cleanup(arg_machine);
if (!hostname_is_valid(arg_machine, 0)) if (!hostname_is_valid(arg_machine, 0))
@ -3873,6 +3931,7 @@ static int outer_child(
const char *directory, const char *directory,
int mount_fd, int mount_fd,
DissectedImage *dissected_image, DissectedImage *dissected_image,
MStack *mstack,
int fd_outer_socket, int fd_outer_socket,
int fd_inner_socket, int fd_inner_socket,
FDSet *fds, FDSet *fds,
@ -3928,6 +3987,7 @@ static int outer_child(
if (mount_fd >= 0) { if (mount_fd >= 0) {
assert(arg_directory); assert(arg_directory);
assert(!arg_image); assert(!arg_image);
assert(!arg_mstack);
if (move_mount(mount_fd, "", AT_FDCWD, directory, MOVE_MOUNT_F_EMPTY_PATH) < 0) if (move_mount(mount_fd, "", AT_FDCWD, directory, MOVE_MOUNT_F_EMPTY_PATH) < 0)
return log_error_errno(errno, "Failed to attach root directory: %m"); return log_error_errno(errno, "Failed to attach root directory: %m");
@ -3938,6 +3998,7 @@ static int outer_child(
} else if (dissected_image) { } else if (dissected_image) {
assert(!arg_directory); assert(!arg_directory);
assert(arg_image); assert(arg_image);
assert(!arg_mstack);
/* If we are operating on a disk image, then mount its root directory now, but leave out the /* If we are operating on a disk image, then mount its root directory now, but leave out the
* rest. We can read the UID shift from it if we need to. Further down we'll mount the rest, * rest. We can read the UID shift from it if we need to. Further down we'll mount the rest,
@ -3955,9 +4016,38 @@ static int outer_child(
(arg_start_mode == START_BOOT ? DISSECT_IMAGE_VALIDATE_OS : 0)); (arg_start_mode == START_BOOT ? DISSECT_IMAGE_VALIDATE_OS : 0));
if (r < 0) if (r < 0)
return r; return r;
} else if (arg_mstack) {
assert(!arg_directory);
assert(!arg_image);
assert(arg_mstack);
MStackFlags mstack_flags = arg_read_only ? MSTACK_RDONLY : 0;
/* This creates the needed overlayfs or tmpfs, owned by our target userns. Note that we pass
* the target mount dir as temporary mount dir here. We after all just need some dir here
* that definitely exists, and the temporary mounts on it are not going to be visible
* outside. */
r = mstack_make_mounts(
mstack,
/* temp_mount_dir= */ directory, /* !! */
mstack_flags);
if (r < 0)
return log_error_errno(r, "Failed to make .mstack/ mounts: %m");
/* And then attach all mounts to the directory */
r = mstack_bind_mounts(
mstack,
directory,
/* where_fd= */ -EBADF,
mstack_flags,
/* ret_root_fd= */ NULL);
if (r < 0)
return log_error_errno(r, "Failed to bind mount .mstack/ mounts: %m");
} else { } else {
assert(arg_directory); assert(arg_directory);
assert(!arg_image); assert(!arg_image);
assert(!arg_mstack);
r = mount_nofollow_verbose(LOG_ERR, arg_directory, directory, /* fstype= */ NULL, MS_BIND|MS_REC, /* options= */ NULL); r = mount_nofollow_verbose(LOG_ERR, arg_directory, directory, /* fstype= */ NULL, MS_BIND|MS_REC, /* options= */ NULL);
if (r < 0) if (r < 0)
@ -5023,6 +5113,10 @@ static int load_settings(void) {
r = file_in_same_dir(arg_directory, arg_settings_filename, &p); r = file_in_same_dir(arg_directory, arg_settings_filename, &p);
if (r < 0 && r != -EADDRNOTAVAIL) /* if directory is root fs, don't complain */ if (r < 0 && r != -EADDRNOTAVAIL) /* if directory is root fs, don't complain */
return log_error_errno(r, "Failed to generate settings path from directory path: %m"); return log_error_errno(r, "Failed to generate settings path from directory path: %m");
} else if (arg_mstack) {
r = file_in_same_dir(arg_mstack, arg_settings_filename, &p);
if (r < 0)
return log_error_errno(r, "Failed to generate settings path from mstack path: %m");
} }
if (p) { if (p) {
@ -5073,6 +5167,7 @@ static int run_container(
const char *directory, const char *directory,
int mount_fd, int mount_fd,
DissectedImage *dissected_image, DissectedImage *dissected_image,
MStack *mstack,
int userns_fd, int userns_fd,
FDSet *fds, FDSet *fds,
char veth_name[IFNAMSIZ], char veth_name[IFNAMSIZ],
@ -5223,6 +5318,7 @@ static int run_container(
directory, directory,
mount_fd, mount_fd,
dissected_image, dissected_image,
mstack,
fd_outer_socket_pair[1], fd_outer_socket_pair[1],
fd_inner_socket_pair[1], fd_inner_socket_pair[1],
fds, fds,
@ -5347,7 +5443,13 @@ static int run_container(
} else { } else {
_cleanup_free_ char *host_ifname = NULL; _cleanup_free_ char *host_ifname = NULL;
r = nsresource_add_netif_veth(userns_fd, child_netns_fd, /* namespace_ifname= */ NULL, &host_ifname, /* ret_namespace_ifname= */ NULL); r = nsresource_add_netif_veth(
/* vl= */ NULL,
userns_fd,
child_netns_fd,
/* namespace_ifname= */ NULL,
&host_ifname,
/* ret_namespace_ifname= */ NULL);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to add network interface to container: %m"); return log_error_errno(r, "Failed to add network interface to container: %m");
@ -5932,8 +6034,10 @@ static int run(int argc, char *argv[]) {
_cleanup_(rm_rf_subvolume_and_freep) char *snapshot_dir = NULL; _cleanup_(rm_rf_subvolume_and_freep) char *snapshot_dir = NULL;
_cleanup_(loop_device_unrefp) LoopDevice *loop = NULL; _cleanup_(loop_device_unrefp) LoopDevice *loop = NULL;
_cleanup_(dissected_image_unrefp) DissectedImage *dissected_image = NULL; _cleanup_(dissected_image_unrefp) DissectedImage *dissected_image = NULL;
_cleanup_(mstack_freep) MStack *mstack = NULL;
_cleanup_(sd_netlink_unrefp) sd_netlink *nfnl = NULL; _cleanup_(sd_netlink_unrefp) sd_netlink *nfnl = NULL;
_cleanup_(pidref_done) PidRef pid = PIDREF_NULL; _cleanup_(pidref_done) PidRef pid = PIDREF_NULL;
_cleanup_(sd_varlink_unrefp) sd_varlink *nsresource_link = NULL, *mountfsd_link = NULL;
log_setup(); log_setup();
@ -6036,13 +6140,28 @@ static int run(int argc, char *argv[]) {
if (arg_userns_mode == USER_NAMESPACE_MANAGED) { if (arg_userns_mode == USER_NAMESPACE_MANAGED) {
/* Let's allocate a 64K userns first, if managed mode is chosen */ /* Let's allocate a 64K userns first, if managed mode is chosen */
r = nsresource_connect(&nsresource_link);
if (r < 0) {
log_error_errno(r, "Failed to connect to nsresourced: %m");
goto finish;
}
r = mountfsd_connect(&mountfsd_link);
if (r < 0) {
log_error_errno(r, "Failed to connect to mountfsd: %m");
goto finish;
}
_cleanup_free_ char *userns_name = NULL; _cleanup_free_ char *userns_name = NULL;
if (asprintf(&userns_name, "nspawn-" PID_FMT "-%s", getpid_cached(), arg_machine) < 0) { if (asprintf(&userns_name, "nspawn-" PID_FMT "-%s", getpid_cached(), arg_machine) < 0) {
r = log_oom(); r = log_oom();
goto finish; goto finish;
} }
userns_fd = nsresource_allocate_userns(userns_name, NSRESOURCE_UIDS_64K); /* allocate 64K UIDs */ userns_fd = nsresource_allocate_userns(
nsresource_link,
userns_name,
NSRESOURCE_UIDS_64K); /* allocate 64K UIDs */
if (userns_fd < 0) { if (userns_fd < 0) {
r = log_error_errno(userns_fd, "Failed to allocate user namespace with 64K users: %m"); r = log_error_errno(userns_fd, "Failed to allocate user namespace with 64K users: %m");
goto finish; goto finish;
@ -6195,20 +6314,23 @@ static int run(int argc, char *argv[]) {
} }
} }
if (arg_userns_mode == USER_NAMESPACE_MANAGED) { if (userns_fd >= 0) {
r = mountfsd_mount_directory( r = mountfsd_mount_directory(
mountfsd_link,
arg_directory, arg_directory,
userns_fd, userns_fd,
determine_dissect_image_flags(), determine_dissect_image_flags(),
&mount_fd); &mount_fd);
if (r < 0) if (r < 0) {
log_error_errno(r, "Failed to mount directory via mountfsd: %m");
goto finish; goto finish;
} }
} else { }
} else if (arg_image) {
DissectImageFlags dissect_image_flags = DissectImageFlags dissect_image_flags =
determine_dissect_image_flags(); determine_dissect_image_flags();
assert(arg_image);
assert(!arg_template); assert(!arg_template);
r = chase_and_update(&arg_image, 0); r = chase_and_update(&arg_image, 0);
@ -6345,6 +6467,7 @@ static int run(int argc, char *argv[]) {
goto finish; goto finish;
} else { } else {
r = mountfsd_mount_image( r = mountfsd_mount_image(
mountfsd_link,
arg_image, arg_image,
userns_fd, userns_fd,
/* options= */ NULL, /* options= */ NULL,
@ -6352,9 +6475,11 @@ static int run(int argc, char *argv[]) {
&arg_verity_settings, &arg_verity_settings,
dissect_image_flags, dissect_image_flags,
&dissected_image); &dissected_image);
if (r < 0) if (r < 0) {
log_error_errno(r, "Failed to mount image via mountfsd: %m");
goto finish; goto finish;
} }
}
/* Now that we mounted the image, let's try to remove it again, if it is ephemeral */ /* Now that we mounted the image, let's try to remove it again, if it is ephemeral */
if (remove_image && unlink(arg_image) >= 0) if (remove_image && unlink(arg_image) >= 0)
@ -6362,8 +6487,41 @@ static int run(int argc, char *argv[]) {
if (arg_architecture < 0) if (arg_architecture < 0)
arg_architecture = dissected_image_architecture(dissected_image); arg_architecture = dissected_image_architecture(dissected_image);
} else if (arg_mstack) {
assert(!arg_template);
assert(!arg_ephemeral);
r = chase_and_update(&arg_mstack, CHASE_MUST_BE_DIRECTORY);
if (r < 0)
goto finish;
if (!IN_SET(arg_userns_mode, USER_NAMESPACE_NO, USER_NAMESPACE_MANAGED))
return log_error_errno(SYNTHETIC_ERRNO(EOPNOTSUPP), "--mstack= requires managed user namespacing, or user namespace turned off.");
MStackFlags mstack_flags = arg_read_only ? MSTACK_RDONLY : 0;
r = mstack_load(arg_mstack,
/* dir_fd= */ -EBADF,
&mstack);
if (r < 0) {
log_error_errno(r, "Failed to load .mstack/ directory '%s': %m", arg_mstack);
goto finish;
} }
r = mstack_open_images(
mstack,
mountfsd_link,
userns_fd,
arg_image_policy,
/* image_filter= */ NULL,
mstack_flags);
if (r < 0) {
log_error_errno(r, "Failed to open .mstack/ layer images '%s': %m", arg_mstack);
goto finish;
}
} else
assert_not_reached();
/* Create a temporary place to mount stuff. */ /* Create a temporary place to mount stuff. */
r = mkdtemp_malloc("/tmp/nspawn-root-XXXXXX", &rootdir); r = mkdtemp_malloc("/tmp/nspawn-root-XXXXXX", &rootdir);
if (r < 0) { if (r < 0) {
@ -6375,8 +6533,11 @@ static int run(int argc, char *argv[]) {
if (r < 0) if (r < 0)
goto finish; goto finish;
mountfsd_link = sd_varlink_unref(mountfsd_link);
nsresource_link = sd_varlink_unref(nsresource_link);
if (!arg_quiet) { if (!arg_quiet) {
const char *t = arg_image ?: arg_directory; const char *t = arg_mstack ?: arg_image ?: arg_directory;
_cleanup_free_ char *u = NULL; _cleanup_free_ char *u = NULL;
(void) terminal_urlify_path(t, t, &u); (void) terminal_urlify_path(t, t, &u);
@ -6412,9 +6573,11 @@ static int run(int argc, char *argv[]) {
rootdir, rootdir,
mount_fd, mount_fd,
dissected_image, dissected_image,
mstack,
userns_fd, userns_fd,
fds, fds,
veth_name, &veth_created, veth_name,
&veth_created,
&expose_args, &master, &expose_args, &master,
&pid, &ret); &pid, &ret);
if (r <= 0) if (r <= 0)


@ -6,6 +6,7 @@
#include "sd-bus.h" #include "sd-bus.h"
#include "sd-messages.h" #include "sd-messages.h"
#include "sd-varlink.h"
#include "bus-common-errors.h" #include "bus-common-errors.h"
#include "bus-error.h" #include "bus-error.h"
@ -470,7 +471,7 @@ static int portable_extract_by_path(
_cleanup_close_ int rfd = open(path, O_PATH|O_CLOEXEC); _cleanup_close_ int rfd = open(path, O_PATH|O_CLOEXEC);
if (rfd < 0) if (rfd < 0)
return log_error_errno(errno, "Failed to open '%s': %m", path); return log_debug_errno(errno, "Failed to open '%s': %m", path);
struct stat st; struct stat st;
if (fstat(rfd, &st) < 0) if (fstat(rfd, &st) < 0)
@ -480,17 +481,25 @@ static int portable_extract_by_path(
_cleanup_free_ char *image_name = NULL; _cleanup_free_ char *image_name = NULL;
r = path_extract_filename(path, &image_name); r = path_extract_filename(path, &image_name);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to extract image name from path '%s': %m", path); return log_debug_errno(r, "Failed to extract image name from path '%s': %m", path);
if (scope == RUNTIME_SCOPE_USER && uid_is_foreign(st.st_uid)) { if (scope == RUNTIME_SCOPE_USER && uid_is_foreign(st.st_uid)) {
_cleanup_close_ int userns_fd = nsresource_allocate_userns(/* name= */ NULL, NSRESOURCE_UIDS_64K); _cleanup_close_ int userns_fd = nsresource_allocate_userns(
/* vl= */ NULL,
/* name= */ NULL,
NSRESOURCE_UIDS_64K);
if (userns_fd < 0) if (userns_fd < 0)
return log_debug_errno(userns_fd, "Failed to allocate user namespace: %m"); return log_debug_errno(userns_fd, "Failed to allocate user namespace: %m");
_cleanup_close_ int mfd = -EBADF; _cleanup_close_ int mfd = -EBADF;
r = mountfsd_mount_directory_fd(rfd, userns_fd, DISSECT_IMAGE_FOREIGN_UID, &mfd); r = mountfsd_mount_directory_fd(
/* vl= */ NULL,
rfd,
userns_fd,
DISSECT_IMAGE_FOREIGN_UID,
&mfd);
if (r < 0) if (r < 0)
return log_debug_errno(r, "Failed to open '%s' via mountfsd: %m", path); return r;
_cleanup_close_pair_ int seq[2] = EBADF_PAIR; _cleanup_close_pair_ int seq[2] = EBADF_PAIR;
if (socketpair(AF_UNIX, SOCK_SEQPACKET|SOCK_CLOEXEC, 0, seq) < 0) if (socketpair(AF_UNIX, SOCK_SEQPACKET|SOCK_CLOEXEC, 0, seq) < 0)
@ -587,7 +596,7 @@ static int portable_extract_by_path(
/* root_hash_path= */ NULL, /* root_hash_path= */ NULL,
/* root_hash_sig_path= */ NULL); /* root_hash_sig_path= */ NULL);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to read verity artifacts for %s: %m", path); return log_debug_errno(r, "Failed to read verity artifacts for %s: %m", path);
if (verity.data_path) if (verity.data_path)
flags |= DISSECT_IMAGE_NO_PARTITION_TABLE; flags |= DISSECT_IMAGE_NO_PARTITION_TABLE;
@ -604,11 +613,15 @@ static int portable_extract_by_path(
return log_debug_errno(r, "Failed to create temporary directory: %m"); return log_debug_errno(r, "Failed to create temporary directory: %m");
if (scope == RUNTIME_SCOPE_USER) { if (scope == RUNTIME_SCOPE_USER) {
userns_fd = nsresource_allocate_userns(/* name= */ NULL, NSRESOURCE_UIDS_64K); userns_fd = nsresource_allocate_userns(
/* vl= */ NULL,
/* name= */ NULL,
NSRESOURCE_UIDS_64K);
if (userns_fd < 0) if (userns_fd < 0)
return log_debug_errno(userns_fd, "Failed to allocate user namespace: %m"); return log_debug_errno(userns_fd, "Failed to allocate user namespace: %m");
r = mountfsd_mount_image_fd( r = mountfsd_mount_image_fd(
/* vl= */ NULL,
rfd, rfd,
userns_fd, userns_fd,
/* options= */ NULL, /* options= */ NULL,
@ -1267,11 +1280,36 @@ void portable_changes_free(PortableChange *changes, size_t n_changes) {
} }
static const char *root_setting_from_image(ImageType type) { static const char *root_setting_from_image(ImageType type) {
return IN_SET(type, IMAGE_DIRECTORY, IMAGE_SUBVOLUME) ? "RootDirectory=" : "RootImage="; switch (type) {
case IMAGE_DIRECTORY:
case IMAGE_SUBVOLUME:
return "RootDirectory=";
case IMAGE_RAW:
case IMAGE_BLOCK:
return "RootImage=";
case IMAGE_MSTACK:
return "RootMStack=";
default:
return NULL;
}
} }
static const char *extension_setting_from_image(ImageType type) { static const char *extension_setting_from_image(ImageType type) {
return IN_SET(type, IMAGE_DIRECTORY, IMAGE_SUBVOLUME) ? "ExtensionDirectories=" : "ExtensionImages="; switch (type) {
case IMAGE_DIRECTORY:
case IMAGE_SUBVOLUME:
return "ExtensionDirectories=";
case IMAGE_RAW:
case IMAGE_BLOCK:
return "ExtensionImages=";
default:
return NULL;
}
} }
static int make_marker_text(const char *image_path, OrderedHashmap *extension_images, char **ret_text) { static int make_marker_text(const char *image_path, OrderedHashmap *extension_images, char **ret_text) {
@ -1401,6 +1439,8 @@ static int install_chroot_dropin(
Image *ext; Image *ext;
root_type = root_setting_from_image(type); root_type = root_setting_from_image(type);
if (!root_type)
return log_debug_errno(SYNTHETIC_ERRNO(EOPNOTSUPP), "Image type '%s' not supported as portable service.", image_type_to_string(type));
r = path_extract_filename(m->image_path ?: image_path, &base_name); r = path_extract_filename(m->image_path ?: image_path, &base_name);
if (r < 0) if (r < 0)
@ -1453,15 +1493,19 @@ static int install_chroot_dropin(
if (m->image_path && !path_equal(m->image_path, image_path)) if (m->image_path && !path_equal(m->image_path, image_path))
ORDERED_HASHMAP_FOREACH(ext, extension_images) { ORDERED_HASHMAP_FOREACH(ext, extension_images) {
_cleanup_free_ char *extension_base_name = NULL;
const char *extension_setting = extension_setting_from_image(ext->type);
if (!extension_setting)
return log_debug_errno(SYNTHETIC_ERRNO(EOPNOTSUPP), "Image type '%s' not supported for extensions: %m", image_type_to_string(ext->type));
_cleanup_free_ char *extension_base_name = NULL;
r = path_extract_filename(ext->path, &extension_base_name); r = path_extract_filename(ext->path, &extension_base_name);
if (r < 0) if (r < 0)
return log_debug_errno(r, "Failed to extract basename from '%s': %m", ext->path); return log_debug_errno(r, "Failed to extract basename from '%s': %m", ext->path);
if (!strextend(&text, if (!strextend(&text,
"\n", "\n",
extension_setting_from_image(ext->type), extension_setting,
ext->path, ext->path,
/* With --force tell PID1 to avoid enforcing that the image <name> and /* With --force tell PID1 to avoid enforcing that the image <name> and
* extension-release.<name> have to match. */ * extension-release.<name> have to match. */
@ -1777,33 +1821,46 @@ static int install_image(
if (flags & PORTABLE_MIXED_COPY_LINK) { if (flags & PORTABLE_MIXED_COPY_LINK) {
if (scope == RUNTIME_SCOPE_USER) { if (scope == RUNTIME_SCOPE_USER) {
_cleanup_close_ int userns_fd = nsresource_allocate_userns(/* name= */ NULL, NSRESOURCE_UIDS_64K); _cleanup_close_ int userns_fd = nsresource_allocate_userns(
/* vl= */ NULL,
/* name= */ NULL,
NSRESOURCE_UIDS_64K);
if (userns_fd < 0) if (userns_fd < 0)
return log_debug_errno(userns_fd, "Failed to allocate user namespace: %m"); return log_debug_errno(userns_fd, "Failed to allocate user namespace: %m");
_cleanup_close_ int fd = open(image_path, O_DIRECTORY|O_CLOEXEC); _cleanup_close_ int fd = open(image_path, O_DIRECTORY|O_CLOEXEC);
if (fd < 0) if (fd < 0)
return log_error_errno(errno, "Failed to open '%s': %m", image_path); return log_debug_errno(errno, "Failed to open '%s': %m", image_path);
struct stat st; struct stat st;
if (fstat(fd, &st) < 0) if (fstat(fd, &st) < 0)
return log_error_errno(errno, "Failed to stat '%s': %m", image_path); return log_debug_errno(errno, "Failed to stat '%s': %m", image_path);
_cleanup_(sd_varlink_unrefp) sd_varlink *mountfsd_link = NULL;
r = mountfsd_connect(&mountfsd_link);
if (r < 0)
return r;
_cleanup_close_ int tree_fd = -EBADF; _cleanup_close_ int tree_fd = -EBADF;
if (uid_is_foreign(st.st_uid)) { if (uid_is_foreign(st.st_uid)) {
r = mountfsd_mount_directory_fd(fd, userns_fd, DISSECT_IMAGE_FOREIGN_UID, &tree_fd); r = mountfsd_mount_directory_fd(
mountfsd_link,
fd,
userns_fd,
DISSECT_IMAGE_FOREIGN_UID,
&tree_fd);
if (r < 0) if (r < 0)
return r; return r;
} else } else
tree_fd = TAKE_FD(fd); tree_fd = TAKE_FD(fd);
_cleanup_close_ int directory_fd = -EBADF; _cleanup_close_ int directory_fd = -EBADF;
r = mountfsd_make_directory(target, MODE_INVALID, /* flags= */ 0, &directory_fd); r = mountfsd_make_directory(mountfsd_link, target, MODE_INVALID, /* flags= */ 0, &directory_fd);
if (r < 0) if (r < 0)
return r; return r;
_cleanup_close_ int copy_fd = -EBADF; _cleanup_close_ int copy_fd = -EBADF;
r = mountfsd_mount_directory_fd(directory_fd, userns_fd, DISSECT_IMAGE_FOREIGN_UID, &copy_fd); r = mountfsd_mount_directory_fd(mountfsd_link, directory_fd, userns_fd, DISSECT_IMAGE_FOREIGN_UID, &copy_fd);
if (r < 0) if (r < 0)
return r; return r;


@ -3053,7 +3053,7 @@ static bool shall_make_executable_absolute(void) {
if (!arg_root_directory && running_in_chroot() > 0) if (!arg_root_directory && running_in_chroot() > 0)
return false; return false;
FOREACH_STRING(f, "RootDirectory=", "RootImage=", "ExecSearchPath=", "MountImages=", "ExtensionImages=") FOREACH_STRING(f, "RootDirectory=", "RootImage=", "RootMStack=", "ExecSearchPath=", "MountImages=", "ExtensionImages=")
if (strv_find_startswith(arg_property, f)) if (strv_find_startswith(arg_property, f))
return false; return false;


@ -2468,6 +2468,7 @@ static const BusProperty execute_properties[] = {
{ "SELinuxContext", bus_append_string }, { "SELinuxContext", bus_append_string },
{ "RootImage", bus_append_string }, { "RootImage", bus_append_string },
{ "RootVerity", bus_append_string }, { "RootVerity", bus_append_string },
{ "RootMStack", bus_append_string },
{ "RuntimeDirectoryPreserve", bus_append_string }, { "RuntimeDirectoryPreserve", bus_append_string },
{ "Personality", bus_append_string }, { "Personality", bus_append_string },
{ "KeyringMode", bus_append_string }, { "KeyringMode", bus_append_string },


@ -11,6 +11,7 @@
#include "sd-json.h" #include "sd-json.h"
#include "sd-path.h" #include "sd-path.h"
#include "sd-varlink.h"
#include "alloc-util.h" #include "alloc-util.h"
#include "blockdev-util.h" #include "blockdev-util.h"
@ -36,6 +37,7 @@
#include "log.h" #include "log.h"
#include "loop-util.h" #include "loop-util.h"
#include "mountpoint-util.h" #include "mountpoint-util.h"
#include "mstack.h"
#include "namespace-util.h" #include "namespace-util.h"
#include "nsresource.h" #include "nsresource.h"
#include "nulstr-util.h" #include "nulstr-util.h"
@ -43,6 +45,7 @@
#include "path-lookup.h" #include "path-lookup.h"
#include "path-util.h" #include "path-util.h"
#include "process-util.h" #include "process-util.h"
#include "recurse-dir.h"
#include "rm-rf.h" #include "rm-rf.h"
#include "runtime-scope.h" #include "runtime-scope.h"
#include "stat-util.h" #include "stat-util.h"
@ -478,12 +481,67 @@ static int image_make(
on_mount_id = _mnt_id; on_mount_id = _mnt_id;
if (S_ISDIR(st->st_mode)) { if (S_ISDIR(st->st_mode)) {
unsigned file_attr = 0;
usec_t crtime = 0;
if (!ret) if (!ret)
return 0; return 0;
if (endswith(path, ".mstack")) {
usec_t crtime = 0;
r = fd_getcrtime(fd, &crtime);
if (r < 0)
log_debug_errno(r, "Unable to read creation time of '%s', ignoring: %m", path);
if (!pretty) {
r = extract_image_basename(
path,
image_class_suffix_to_string(c),
STRV_MAKE(".mstack"),
&pretty_buffer,
/* ret_suffix= */ NULL);
if (r < 0)
return r;
pretty = pretty_buffer;
}
_cleanup_(mstack_freep) MStack *mstack = NULL;
r = mstack_load(path, fd, &mstack);
if (r < 0) {
log_debug_errno(r, "Failed to load mstack '%s', ignoring: %m", path);
read_only = true;
} else if (!read_only) {
r = mstack_is_read_only(mstack);
if (r < 0)
log_debug_errno(r, "Failed to determine if mstack '%s' is read-only, assuming it is: %m", path);
read_only = r != 0;
}
r = image_new(IMAGE_MSTACK,
c,
pretty,
path,
read_only,
crtime,
/* mtime= */ 0,
fh,
on_mount_id,
(uint64_t) st->st_ino,
ret);
if (r < 0)
return r;
if (mstack) {
r = mstack_is_foreign_uid_owned(mstack);
if (r < 0)
log_debug_errno(r, "Failed to determine if mstack '%s' is foreign UID owned, assuming it is not: %m", path);
if (r > 0)
(*ret)->foreign_uid_owned = true;
}
return 0;
}
if (!pretty) { if (!pretty) {
r = extract_image_basename( r = extract_image_basename(
path, path,
@ -532,10 +590,12 @@ static int image_make(
} }
/* Get directory creation time (not available everywhere, but that's OK) */ /* Get directory creation time (not available everywhere, but that's OK) */
usec_t crtime = 0;
(void) fd_getcrtime(fd, &crtime); (void) fd_getcrtime(fd, &crtime);
/* If the IMMUTABLE bit is set, we consider the directory read-only. Since the ioctl is not /* If the IMMUTABLE bit is set, we consider the directory read-only. Since the ioctl is not
* supported everywhere we ignore failures. */ * supported everywhere we ignore failures. */
unsigned file_attr = 0;
(void) read_attr_fd(fd, &file_attr); (void) read_attr_fd(fd, &file_attr);
/* It's just a normal directory. */ /* It's just a normal directory. */
@ -767,7 +827,7 @@ static char** make_possible_filenames(ImageClass class, const char *image_name)
assert(image_name); assert(image_name);
FOREACH_STRING(v_suffix, "", ".v") FOREACH_STRING(v_suffix, "", ".v")
FOREACH_STRING(format_suffix, "", ".raw") { FOREACH_STRING(format_suffix, "", ".raw", ".mstack") {
_cleanup_free_ char *j = NULL; _cleanup_free_ char *j = NULL;
const char *class_suffix; const char *class_suffix;
@ -864,6 +924,12 @@ int image_find(RuntimeScope scope,
log_debug("Ignoring non-regular file '%s' with .raw suffix.", fname); log_debug("Ignoring non-regular file '%s' with .raw suffix.", fname);
continue; continue;
} }
} else if (endswith(fname, ".mstack")) {
if (!S_ISDIR(st.st_mode)) {
log_debug("Ignoring non-directory '%s' with .mstack suffix.", fname);
continue;
}
} else if (endswith(fname, ".v")) { } else if (endswith(fname, ".v")) {
@ -1088,7 +1154,7 @@ int image_discover(
r = extract_image_basename( r = extract_image_basename(
nov, nov,
image_class_suffix_to_string(class), image_class_suffix_to_string(class),
STRV_MAKE(".raw", ""), STRV_MAKE(".raw", ".mstack", ""),
&pretty, &pretty,
&suffix); &suffix);
if (r < 0) { if (r < 0) {
@ -1139,7 +1205,7 @@ int image_discover(
r = extract_image_basename( r = extract_image_basename(
fname, fname,
image_class_suffix_to_string(class), image_class_suffix_to_string(class),
/* format_suffixes= */ NULL, STRV_MAKE(".mstack", ""),
&pretty, &pretty,
/* ret_suffix= */ NULL); /* ret_suffix= */ NULL);
if (r < 0) { if (r < 0) {
@ -1210,23 +1276,36 @@ int image_discover(
return 0; return 0;
} }
static int unprivileged_remove(Image *i) { static int unpriv_remove_cb(
int r; RecurseDirEvent event,
const char *path,
int dir_fd,
int inode_fd,
const struct dirent *de,
const struct statx *sx,
void *userdata) {
assert(i); int r, userns_fd = PTR_TO_FD(userdata);
_cleanup_close_ int userns_fd = nsresource_allocate_userns(/* name= */ NULL, /* size= */ NSRESOURCE_UIDS_64K); assert(sx);
if (userns_fd < 0)
return log_debug_errno(userns_fd, "Failed to allocate transient user namespace: %m"); if (event == RECURSE_DIR_ENTER &&
S_ISDIR(sx->stx_mode) &&
uid_is_foreign(sx->stx_uid)) {
/* This is owned by the foreign UID range, and a dir, let's remove it via mountfsd userns
* shenanigans. */
_cleanup_close_ int tree_fd = -EBADF; _cleanup_close_ int tree_fd = -EBADF;
r = mountfsd_mount_directory( r = mountfsd_mount_directory_fd(
i->path, /* vl= */ NULL,
inode_fd,
userns_fd, userns_fd,
DISSECT_IMAGE_FOREIGN_UID, DISSECT_IMAGE_FOREIGN_UID,
&tree_fd); &tree_fd);
if (r < 0) if (r < 0)
return r; return r;
/* Fork off child that moves into userns and does the copying */ /* Fork off child that moves into userns and does the copying */
r = pidref_safe_fork_full( r = pidref_safe_fork_full(
"rm-tree", "rm-tree",
@ -1235,8 +1314,8 @@ static int unprivileged_remove(Image *i) {
FORK_RESET_SIGNALS|FORK_CLOSE_ALL_FDS|FORK_DEATHSIG_SIGTERM|FORK_WAIT|FORK_REOPEN_LOG, FORK_RESET_SIGNALS|FORK_CLOSE_ALL_FDS|FORK_DEATHSIG_SIGTERM|FORK_WAIT|FORK_REOPEN_LOG,
/* ret= */ NULL); /* ret= */ NULL);
if (r < 0) if (r < 0)
return log_debug_errno(r, "Process that was supposed to remove tree failed: %m"); log_debug_errno(r, "Process that was supposed to remove subtree '%s' failed, ignoring: %m", empty_to_root(path));
if (r == 0) { else if (r == 0) {
/* child */ /* child */
r = namespace_enter( r = namespace_enter(
@ -1258,13 +1337,52 @@ static int unprivileged_remove(Image *i) {
r = rm_rf_children(dfd, REMOVE_PHYSICAL|REMOVE_SUBVOLUME|REMOVE_CHMOD, /* root_dev= */ NULL); r = rm_rf_children(dfd, REMOVE_PHYSICAL|REMOVE_SUBVOLUME|REMOVE_CHMOD, /* root_dev= */ NULL);
if (r < 0) { if (r < 0) {
log_error_errno(r, "Failed to empty '%s' directory in foreign UID mode: %m", i->path); log_error_errno(r, "Failed to empty '%s' directory in foreign UID mode: %m", empty_to_root(path));
_exit(EXIT_FAILURE); _exit(EXIT_FAILURE);
} }
_exit(EXIT_SUCCESS); _exit(EXIT_SUCCESS);
} }
/* Don't descend further into this one, and delete it immediately */
return RECURSE_DIR_UNLINK_GRACEFUL;
}
/* Try to remove everything else */
if (event == RECURSE_DIR_LEAVE)
return RECURSE_DIR_UNLINK_GRACEFUL;
return RECURSE_DIR_CONTINUE;
}
static int unprivileged_remove(Image *i) {
int r;
assert(i);
/* We want this to work in complex .mstack/ hierarchies, where the main directory (and maybe a .v/
* directory or two below it) might be owned by the user themselves, while some subdirs might be owned by
* the foreign UID range. We deal with this by recursively descending down the tree, removing
* foreign-owned subtrees via userns shenanigans, and everything else directly. */
_cleanup_close_ int userns_fd = nsresource_allocate_userns(
/* vl= */ NULL,
/* name= */ NULL,
/* size= */ NSRESOURCE_UIDS_64K);
if (userns_fd < 0)
return log_debug_errno(userns_fd, "Failed to allocate transient user namespace: %m");
r = recurse_dir_at(
AT_FDCWD,
i->path,
/* statx_mask= */ STATX_TYPE|STATX_UID,
/* n_depth_max= */ UINT_MAX,
RECURSE_DIR_IGNORE_DOT|RECURSE_DIR_ENSURE_TYPE|RECURSE_DIR_SAME_MOUNT|RECURSE_DIR_INODE_FD|RECURSE_DIR_TOPLEVEL,
unpriv_remove_cb,
FD_TO_PTR(userns_fd));
if (r < 0)
return r;
return 0; return 0;
} }
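
The per-entry decision made by unpriv_remove_cb() above can be modelled in isolation. The following is only a sketch: the enum names, the sketch_* helpers and the UID range constants are hypothetical stand-ins chosen for illustration (the real code uses RecurseDirEvent, the RECURSE_DIR_* return values and uid_is_foreign()), and the actual emptying of the directory through mountfsd and a forked child is reduced to a single action value.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical range, for illustration only; the real check is uid_is_foreign(). */
#define SKETCH_FOREIGN_UID_BASE ((uid_t) 0x7FFE0000)
#define SKETCH_FOREIGN_UID_SIZE ((uid_t) 0x10000)

typedef enum SketchEvent {
        SKETCH_ENTER,   /* about to descend into a directory */
        SKETCH_LEAVE,   /* finished with a directory */
        SKETCH_ENTRY,   /* any non-directory entry */
} SketchEvent;

typedef enum SketchAction {
        SKETCH_CONTINUE,                      /* keep walking */
        SKETCH_UNLINK,                        /* try to remove the inode, tolerate failure */
        SKETCH_EMPTY_VIA_USERNS_THEN_UNLINK,  /* empty it from inside the userns, then remove it */
} SketchAction;

static bool sketch_uid_is_foreign(uid_t uid) {
        return uid >= SKETCH_FOREIGN_UID_BASE &&
                uid - SKETCH_FOREIGN_UID_BASE < SKETCH_FOREIGN_UID_SIZE;
}

/* Mirrors the branch structure of the callback: foreign-owned directories are
 * handled as a whole when entered, other directories are removed when left,
 * and everything else just continues the walk. */
static SketchAction sketch_decide(SketchEvent event, bool is_dir, uid_t uid) {
        if (event == SKETCH_ENTER && is_dir && sketch_uid_is_foreign(uid))
                return SKETCH_EMPTY_VIA_USERNS_THEN_UNLINK;

        if (event == SKETCH_LEAVE)
                return SKETCH_UNLINK;

        return SKETCH_CONTINUE;
}
```

Note that in this model, as in the callback, plain files owned by the user are not unlinked by the walk itself; image_remove() still performs its direct removal afterwards.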
@ -1305,6 +1423,9 @@ int image_remove(Image *i, RuntimeScope scope) {
/* Allow deletion of read-only directories */ /* Allow deletion of read-only directories */
(void) chattr_path(i->path, 0, FS_IMMUTABLE_FL); (void) chattr_path(i->path, 0, FS_IMMUTABLE_FL);
_fallthrough_;
case IMAGE_MSTACK:
/* If this is foreign owned, try an unprivileged remove first, but accept if that doesn't work, and do it directly either way, maybe it works */ /* If this is foreign owned, try an unprivileged remove first, but accept if that doesn't work, and do it directly either way, maybe it works */
if (i->foreign_uid_owned) if (i->foreign_uid_owned)
(void) unprivileged_remove(i); (void) unprivileged_remove(i);
@ -1559,13 +1680,22 @@ static int unprivileged_clone(Image *i, const char *new_path) {
assert(i); assert(i);
assert(new_path); assert(new_path);
_cleanup_close_ int userns_fd = nsresource_allocate_userns(/* name= */ NULL, /* size= */ NSRESOURCE_UIDS_64K); _cleanup_close_ int userns_fd = nsresource_allocate_userns(
/* vl= */ NULL,
/* name= */ NULL,
/* size= */ NSRESOURCE_UIDS_64K);
if (userns_fd < 0) if (userns_fd < 0)
return log_debug_errno(userns_fd, "Failed to allocate transient user namespace: %m"); return log_debug_errno(userns_fd, "Failed to allocate transient user namespace: %m");
_cleanup_(sd_varlink_unrefp) sd_varlink *link = NULL;
r = mountfsd_connect(&link);
if (r < 0)
return r;
/* Map original image */ /* Map original image */
_cleanup_close_ int tree_fd = -EBADF; _cleanup_close_ int tree_fd = -EBADF;
r = mountfsd_mount_directory( r = mountfsd_mount_directory(
link,
i->path, i->path,
userns_fd, userns_fd,
DISSECT_IMAGE_FOREIGN_UID, DISSECT_IMAGE_FOREIGN_UID,
@ -1576,6 +1706,7 @@ static int unprivileged_clone(Image *i, const char *new_path) {
/* Make new image */ /* Make new image */
_cleanup_close_ int new_fd = -EBADF; _cleanup_close_ int new_fd = -EBADF;
r = mountfsd_make_directory( r = mountfsd_make_directory(
link,
new_path, new_path,
MODE_INVALID, MODE_INVALID,
/* flags= */ 0, /* flags= */ 0,
@ -1586,6 +1717,7 @@ static int unprivileged_clone(Image *i, const char *new_path) {
/* Mount new image */ /* Mount new image */
_cleanup_close_ int target_fd = -EBADF; _cleanup_close_ int target_fd = -EBADF;
r = mountfsd_mount_directory_fd( r = mountfsd_mount_directory_fd(
link,
new_fd, new_fd,
userns_fd, userns_fd,
DISSECT_IMAGE_FOREIGN_UID, DISSECT_IMAGE_FOREIGN_UID,
@ -1593,6 +1725,8 @@ static int unprivileged_clone(Image *i, const char *new_path) {
if (r < 0) if (r < 0)
return r; return r;
link = sd_varlink_unref(link);
/* Fork off child that moves into userns and does the copying */ /* Fork off child that moves into userns and does the copying */
return copy_tree_at_foreign(tree_fd, target_fd, userns_fd); return copy_tree_at_foreign(tree_fd, target_fd, userns_fd);
} }
@ -2367,6 +2501,7 @@ static const char* const image_type_table[_IMAGE_TYPE_MAX] = {
[IMAGE_SUBVOLUME] = "subvolume", [IMAGE_SUBVOLUME] = "subvolume",
[IMAGE_RAW] = "raw", [IMAGE_RAW] = "raw",
[IMAGE_BLOCK] = "block", [IMAGE_BLOCK] = "block",
[IMAGE_MSTACK] = "mstack",
}; };
DEFINE_STRING_TABLE_LOOKUP(image_type, ImageType); DEFINE_STRING_TABLE_LOOKUP(image_type, ImageType);

@ -11,6 +11,7 @@ typedef enum ImageType {
IMAGE_SUBVOLUME, IMAGE_SUBVOLUME,
IMAGE_RAW, IMAGE_RAW,
IMAGE_BLOCK, IMAGE_BLOCK,
IMAGE_MSTACK,
_IMAGE_TYPE_MAX, _IMAGE_TYPE_MAX,
_IMAGE_TYPE_INVALID = -EINVAL, _IMAGE_TYPE_INVALID = -EINVAL,
} ImageType; } ImageType;

@ -200,8 +200,15 @@ int probe_sector_size_prefer_ioctl(int fd, uint32_t *ret) {
if (fstat(fd, &st) < 0) if (fstat(fd, &st) < 0)
return -errno; return -errno;
if (S_ISBLK(st.st_mode)) if (S_ISBLK(st.st_mode)) {
return blockdev_get_sector_size(fd, ret); int r;
r = blockdev_get_sector_size(fd, ret);
if (r < 0)
return r;
return 1; /* indicate we *did* find it, like probe_sector_size() does */
}
return probe_sector_size(fd, ret); return probe_sector_size(fd, ret);
} }
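
The return-value mismatch fixed in this hunk is easier to see in a reduced form. This is a minimal sketch with hypothetical stand-in functions (sketch_blockdev_size() and sketch_probe_prefer_ioctl() are not systemd APIs, and the 512/4096 values are illustrative): one helper returns 0 on success, while the wrapper's callers expect 1 for "found", 0 for "fell back to the default", and a negative errno on error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an ioctl-based query: 0 on success, negative errno on failure. */
static int sketch_blockdev_size(bool fail, uint32_t *ret) {
        assert(ret);

        if (fail)
                return -EIO;

        *ret = 512;
        return 0;
}

/* Wrapper that normalizes to: 1 = size determined, 0 = default used, <0 = error. */
static int sketch_probe_prefer_ioctl(bool is_blockdev, bool fail, uint32_t *ret) {
        assert(ret);

        if (is_blockdev) {
                int r = sketch_blockdev_size(fail, ret);
                if (r < 0)
                        return r;

                return 1; /* passing r (== 0) through would look like "not found" to callers */
        }

        *ret = 4096; /* illustrative fallback default */
        return 0;
}
```

A caller that tests `r > 0` to decide whether to trust the probed value only works for block devices once the wrapper returns 1 rather than forwarding the helper's 0.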
@ -4958,6 +4965,7 @@ int verity_dissect_and_mount(
return log_debug_errno(userns_fd, "Failed to open our own user namespace: %m"); return log_debug_errno(userns_fd, "Failed to open our own user namespace: %m");
r = mountfsd_mount_image( r = mountfsd_mount_image(
/* vl= */ NULL,
src_fd >= 0 ? FORMAT_PROC_FD_PATH(src_fd) : src, src_fd >= 0 ? FORMAT_PROC_FD_PATH(src_fd) : src,
userns_fd, userns_fd,
options, options,
@ -5125,7 +5133,30 @@ static void mount_image_reply_parameters_done(MountImageReplyParameters *p) {
#endif #endif
int mountfsd_connect(sd_varlink **ret) {
int r;
assert(ret);
_cleanup_(sd_varlink_unrefp) sd_varlink *vl = NULL;
r = sd_varlink_connect_address(&vl, "/run/systemd/io.systemd.MountFileSystem");
if (r < 0)
return log_debug_errno(r, "Failed to connect to mountfsd: %m");
r = sd_varlink_set_allow_fd_passing_input(vl, true);
if (r < 0)
return log_debug_errno(r, "Failed to enable varlink fd passing for read: %m");
r = sd_varlink_set_allow_fd_passing_output(vl, true);
if (r < 0)
return log_debug_errno(r, "Failed to enable varlink fd passing for write: %m");
*ret = TAKE_PTR(vl);
return 0;
}
int mountfsd_mount_image_fd( int mountfsd_mount_image_fd(
sd_varlink *vl,
int image_fd, int image_fd,
int userns_fd, int userns_fd,
const MountOptions *options, const MountOptions *options,
@ -5149,7 +5180,6 @@ int mountfsd_mount_image_fd(
_cleanup_(dissected_image_unrefp) DissectedImage *di = NULL; _cleanup_(dissected_image_unrefp) DissectedImage *di = NULL;
_cleanup_close_ int verity_data_fd = -EBADF; _cleanup_close_ int verity_data_fd = -EBADF;
_cleanup_(sd_varlink_unrefp) sd_varlink *vl = NULL;
_cleanup_free_ char *ps = NULL; _cleanup_free_ char *ps = NULL;
const char *error_id; const char *error_id;
int r; int r;
@ -5157,48 +5187,45 @@ int mountfsd_mount_image_fd(
assert(image_fd >= 0); assert(image_fd >= 0);
assert(ret); assert(ret);
r = sd_varlink_connect_address(&vl, "/run/systemd/io.systemd.MountFileSystem"); _cleanup_(sd_varlink_unrefp) sd_varlink *_vl = NULL;
if (!vl) {
r = mountfsd_connect(&_vl);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to connect to mountfsd: %m"); return r;
r = sd_varlink_set_allow_fd_passing_input(vl, true); vl = _vl;
if (r < 0) }
return log_error_errno(r, "Failed to enable varlink fd passing for read: %m");
r = sd_varlink_set_allow_fd_passing_output(vl, true);
if (r < 0)
return log_error_errno(r, "Failed to enable varlink fd passing for write: %m");
_cleanup_close_ int reopened_fd = -EBADF; _cleanup_close_ int reopened_fd = -EBADF;
image_fd = fd_reopen_condition(image_fd, O_CLOEXEC|O_NOCTTY|O_NONBLOCK|(FLAGS_SET(flags, DISSECT_IMAGE_MOUNT_READ_ONLY) ? O_RDONLY : O_RDWR), O_PATH, &reopened_fd); image_fd = fd_reopen_condition(image_fd, O_CLOEXEC|O_NOCTTY|O_NONBLOCK|(FLAGS_SET(flags, DISSECT_IMAGE_MOUNT_READ_ONLY) ? O_RDONLY : O_RDWR), O_PATH, &reopened_fd);
if (image_fd < 0) if (image_fd < 0)
return log_error_errno(image_fd, "Failed to reopen fd: %m"); return log_debug_errno(image_fd, "Failed to reopen fd: %m");
r = sd_varlink_push_dup_fd(vl, image_fd); r = sd_varlink_push_dup_fd(vl, image_fd);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to push image fd into varlink connection: %m"); return log_debug_errno(r, "Failed to push image fd into varlink connection: %m");
if (userns_fd >= 0) { if (userns_fd >= 0) {
r = sd_varlink_push_dup_fd(vl, userns_fd); r = sd_varlink_push_dup_fd(vl, userns_fd);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to push image fd into varlink connection: %m"); return log_debug_errno(r, "Failed to push image fd into varlink connection: %m");
} }
if (image_policy) { if (image_policy) {
r = image_policy_to_string(image_policy, /* simplify= */ false, &ps); r = image_policy_to_string(image_policy, /* simplify= */ false, &ps);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to format image policy to string: %m"); return log_debug_errno(r, "Failed to format image policy to string: %m");
} }
if (verity && verity->data_path) { if (verity && verity->data_path) {
verity_data_fd = open(verity->data_path, O_RDONLY|O_CLOEXEC); verity_data_fd = open(verity->data_path, O_RDONLY|O_CLOEXEC);
if (verity_data_fd < 0) if (verity_data_fd < 0)
return log_error_errno(errno, "Failed to open verity data file '%s': %m", verity->data_path); return log_debug_errno(errno, "Failed to open verity data file '%s': %m", verity->data_path);
r = sd_varlink_push_dup_fd(vl, verity_data_fd); r = sd_varlink_push_dup_fd(vl, verity_data_fd);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to push verity data fd into varlink connection: %m"); return log_debug_errno(r, "Failed to push verity data fd into varlink connection: %m");
} }
_cleanup_(sd_json_variant_unrefp) sd_json_variant *mount_options = NULL; _cleanup_(sd_json_variant_unrefp) sd_json_variant *mount_options = NULL;
@ -5220,7 +5247,7 @@ int mountfsd_mount_image_fd(
/* ret_values= */ NULL, /* ret_values= */ NULL,
&filtered); &filtered);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to filter mount options: %m"); return log_debug_errno(r, "Failed to filter mount options: %m");
if (isempty(filtered)) if (isempty(filtered))
continue; continue;
@ -5230,7 +5257,7 @@ int mountfsd_mount_image_fd(
&mount_options, &mount_options,
SD_JSON_BUILD_PAIR_STRING(partition_designator_to_string(i), filtered ?: o)); SD_JSON_BUILD_PAIR_STRING(partition_designator_to_string(i), filtered ?: o));
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to build mount options array: %m"); return log_debug_errno(r, "Failed to build mount options array: %m");
} }
sd_json_variant *reply = NULL; sd_json_variant *reply = NULL;
@ -5256,7 +5283,7 @@ int mountfsd_mount_image_fd(
r = sd_json_dispatch(reply, dispatch_table, SD_JSON_ALLOW_EXTENSIONS, &p); r = sd_json_dispatch(reply, dispatch_table, SD_JSON_ALLOW_EXTENSIONS, &p);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to parse MountImage() reply: %m"); return log_debug_errno(r, "Failed to parse MountImage() reply: %m");
log_debug("Effective image policy: %s", p.image_policy); log_debug("Effective image policy: %s", p.image_policy);
@ -5289,7 +5316,7 @@ int mountfsd_mount_image_fd(
r = sd_json_dispatch(i, partition_dispatch_table, SD_JSON_ALLOW_EXTENSIONS, &pp); r = sd_json_dispatch(i, partition_dispatch_table, SD_JSON_ALLOW_EXTENSIONS, &pp);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to parse partition data: %m"); return log_debug_errno(r, "Failed to parse partition data: %m");
if (pp.fsmount_fd_idx != UINT_MAX) { if (pp.fsmount_fd_idx != UINT_MAX) {
fsmount_fd = sd_varlink_take_fd(vl, pp.fsmount_fd_idx); fsmount_fd = sd_varlink_take_fd(vl, pp.fsmount_fd_idx);
@ -5302,11 +5329,11 @@ int mountfsd_mount_image_fd(
if (!di) { if (!di) {
r = dissected_image_new(/* path= */ NULL, &di); r = dissected_image_new(/* path= */ NULL, &di);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to allocated new dissected image structure: %m"); return log_debug_errno(r, "Failed to allocated new dissected image structure: %m");
} }
if (di->partitions[pp.designator].found) if (di->partitions[pp.designator].found)
return log_error_errno(SYNTHETIC_ERRNO(EBADMSG), "Duplicate partition data for '%s'.", partition_designator_to_string(pp.designator)); return log_debug_errno(SYNTHETIC_ERRNO(EBADMSG), "Duplicate partition data for '%s'.", partition_designator_to_string(pp.designator));
di->partitions[pp.designator] = (DissectedPartition) { di->partitions[pp.designator] = (DissectedPartition) {
.found = true, .found = true,
@ -5337,6 +5364,7 @@ int mountfsd_mount_image_fd(
} }
int mountfsd_mount_image( int mountfsd_mount_image(
sd_varlink *vl,
const char *path, const char *path,
int userns_fd, int userns_fd,
const MountOptions *options, const MountOptions *options,
@ -5352,10 +5380,10 @@ int mountfsd_mount_image(
_cleanup_close_ int image_fd = open(path, O_RDONLY|O_CLOEXEC); _cleanup_close_ int image_fd = open(path, O_RDONLY|O_CLOEXEC);
if (image_fd < 0) if (image_fd < 0)
return log_error_errno(errno, "Failed to open '%s': %m", path); return log_debug_errno(errno, "Failed to open '%s': %m", path);
_cleanup_(dissected_image_unrefp) DissectedImage *di = NULL; _cleanup_(dissected_image_unrefp) DissectedImage *di = NULL;
r = mountfsd_mount_image_fd(image_fd, userns_fd, options, image_policy, verity, flags, &di); r = mountfsd_mount_image_fd(vl, image_fd, userns_fd, options, image_policy, verity, flags, &di);
if (r < 0) if (r < 0)
return r; return r;
@ -5370,6 +5398,7 @@ int mountfsd_mount_image(
} }
int mountfsd_mount_directory_fd( int mountfsd_mount_directory_fd(
sd_varlink *vl,
int directory_fd, int directory_fd,
int userns_fd, int userns_fd,
DissectImageFlags flags, DissectImageFlags flags,
@ -5383,27 +5412,23 @@ int mountfsd_mount_directory_fd(
/* Pick one identity, not both, that makes no sense. */ /* Pick one identity, not both, that makes no sense. */
assert(!FLAGS_SET(flags, DISSECT_IMAGE_FOREIGN_UID|DISSECT_IMAGE_IDENTITY_UID)); assert(!FLAGS_SET(flags, DISSECT_IMAGE_FOREIGN_UID|DISSECT_IMAGE_IDENTITY_UID));
_cleanup_(sd_varlink_unrefp) sd_varlink *vl = NULL; _cleanup_(sd_varlink_unrefp) sd_varlink *_vl = NULL;
r = sd_varlink_connect_address(&vl, "/run/systemd/io.systemd.MountFileSystem"); if (!vl) {
r = mountfsd_connect(&_vl);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to connect to mountfsd: %m"); return r;
r = sd_varlink_set_allow_fd_passing_input(vl, true); vl = _vl;
if (r < 0) }
return log_error_errno(r, "Failed to enable varlink fd passing for read: %m");
r = sd_varlink_set_allow_fd_passing_output(vl, true);
if (r < 0)
return log_error_errno(r, "Failed to enable varlink fd passing for write: %m");
r = sd_varlink_push_dup_fd(vl, directory_fd); r = sd_varlink_push_dup_fd(vl, directory_fd);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to push directory fd into varlink connection: %m"); return log_debug_errno(r, "Failed to push directory fd into varlink connection: %m");
if (userns_fd >= 0) { if (userns_fd >= 0) {
r = sd_varlink_push_dup_fd(vl, userns_fd); r = sd_varlink_push_dup_fd(vl, userns_fd);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to push user namespace fd into varlink connection: %m"); return log_debug_errno(r, "Failed to push user namespace fd into varlink connection: %m");
} }
sd_json_variant *reply = NULL; sd_json_variant *reply = NULL;
@ -5430,17 +5455,18 @@ int mountfsd_mount_directory_fd(
unsigned fsmount_fd_idx = UINT_MAX; unsigned fsmount_fd_idx = UINT_MAX;
r = sd_json_dispatch(reply, dispatch_table, SD_JSON_ALLOW_EXTENSIONS, &fsmount_fd_idx); r = sd_json_dispatch(reply, dispatch_table, SD_JSON_ALLOW_EXTENSIONS, &fsmount_fd_idx);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to parse MountImage() reply: %m"); return log_debug_errno(r, "Failed to parse MountImage() reply: %m");
_cleanup_close_ int fsmount_fd = sd_varlink_take_fd(vl, fsmount_fd_idx); _cleanup_close_ int fsmount_fd = sd_varlink_take_fd(vl, fsmount_fd_idx);
if (fsmount_fd < 0) if (fsmount_fd < 0)
return log_error_errno(fsmount_fd, "Failed to take mount fd from Varlink connection: %m"); return log_debug_errno(fsmount_fd, "Failed to take mount fd from Varlink connection: %m");
*ret_mount_fd = TAKE_FD(fsmount_fd); *ret_mount_fd = TAKE_FD(fsmount_fd);
return 0; return 0;
} }
int mountfsd_mount_directory( int mountfsd_mount_directory(
sd_varlink *vl,
const char *path, const char *path,
int userns_fd, int userns_fd,
DissectImageFlags flags, DissectImageFlags flags,
@ -5451,12 +5477,13 @@ int mountfsd_mount_directory(
_cleanup_close_ int directory_fd = open(path, O_DIRECTORY|O_RDONLY|O_CLOEXEC|O_PATH); _cleanup_close_ int directory_fd = open(path, O_DIRECTORY|O_RDONLY|O_CLOEXEC|O_PATH);
if (directory_fd < 0) if (directory_fd < 0)
return log_error_errno(errno, "Failed to open '%s': %m", path); return log_debug_errno(errno, "Failed to open '%s': %m", path);
return mountfsd_mount_directory_fd(directory_fd, userns_fd, flags, ret_mount_fd); return mountfsd_mount_directory_fd(vl, directory_fd, userns_fd, flags, ret_mount_fd);
} }
int mountfsd_make_directory_fd( int mountfsd_make_directory_fd(
sd_varlink *vl,
int parent_fd, int parent_fd,
const char *name, const char *name,
mode_t mode, mode_t mode,
@ -5468,22 +5495,18 @@ int mountfsd_make_directory_fd(
assert(parent_fd >= 0); assert(parent_fd >= 0);
assert(name); assert(name);
_cleanup_(sd_varlink_unrefp) sd_varlink *vl = NULL; _cleanup_(sd_varlink_unrefp) sd_varlink *_vl = NULL;
r = sd_varlink_connect_address(&vl, "/run/systemd/io.systemd.MountFileSystem"); if (!vl) {
r = mountfsd_connect(&_vl);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to connect to mountfsd: %m"); return r;
r = sd_varlink_set_allow_fd_passing_input(vl, true); vl = _vl;
if (r < 0) }
return log_error_errno(r, "Failed to enable varlink fd passing for read: %m");
r = sd_varlink_set_allow_fd_passing_output(vl, true);
if (r < 0)
return log_error_errno(r, "Failed to enable varlink fd passing for write: %m");
r = sd_varlink_push_dup_fd(vl, parent_fd); r = sd_varlink_push_dup_fd(vl, parent_fd);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to push parent fd into varlink connection: %m"); return log_debug_errno(r, "Failed to push parent fd into varlink connection: %m");
sd_json_variant *reply = NULL; sd_json_variant *reply = NULL;
const char *error_id = NULL; const char *error_id = NULL;
@ -5507,11 +5530,11 @@ int mountfsd_make_directory_fd(
unsigned directory_fd_idx = UINT_MAX; unsigned directory_fd_idx = UINT_MAX;
r = sd_json_dispatch(reply, dispatch_table, SD_JSON_ALLOW_EXTENSIONS, &directory_fd_idx); r = sd_json_dispatch(reply, dispatch_table, SD_JSON_ALLOW_EXTENSIONS, &directory_fd_idx);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to parse MountImage() reply: %m"); return log_debug_errno(r, "Failed to parse MountImage() reply: %m");
_cleanup_close_ int directory_fd = sd_varlink_take_fd(vl, directory_fd_idx); _cleanup_close_ int directory_fd = sd_varlink_take_fd(vl, directory_fd_idx);
if (directory_fd < 0) if (directory_fd < 0)
return log_error_errno(directory_fd, "Failed to take directory fd from Varlink connection: %m"); return log_debug_errno(directory_fd, "Failed to take directory fd from Varlink connection: %m");
if (ret_directory_fd) if (ret_directory_fd)
*ret_directory_fd = TAKE_FD(directory_fd); *ret_directory_fd = TAKE_FD(directory_fd);
@ -5519,6 +5542,7 @@ int mountfsd_make_directory_fd(
} }
int mountfsd_make_directory( int mountfsd_make_directory(
sd_varlink *vl,
const char *path, const char *path,
mode_t mode, mode_t mode,
DissectImageFlags flags, DissectImageFlags flags,
@ -5529,18 +5553,18 @@ int mountfsd_make_directory(
_cleanup_free_ char *parent = NULL; _cleanup_free_ char *parent = NULL;
r = path_extract_directory(path, &parent); r = path_extract_directory(path, &parent);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to extract parent directory from '%s': %m", path); return log_debug_errno(r, "Failed to extract parent directory from '%s': %m", path);
_cleanup_free_ char *dirname = NULL; _cleanup_free_ char *dirname = NULL;
r = path_extract_filename(path, &dirname); r = path_extract_filename(path, &dirname);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to extract directory name from '%s': %m", path); return log_debug_errno(r, "Failed to extract directory name from '%s': %m", path);
_cleanup_close_ int fd = open(parent, O_DIRECTORY|O_CLOEXEC); _cleanup_close_ int fd = open(parent, O_DIRECTORY|O_CLOEXEC);
if (fd < 0) if (fd < 0)
return log_error_errno(r, "Failed to open '%s': %m", parent); return log_debug_errno(r, "Failed to open '%s': %m", parent);
return mountfsd_make_directory_fd(fd, dirname, mode, flags, ret_directory_fd); return mountfsd_make_directory_fd(vl, fd, dirname, mode, flags, ret_directory_fd);
} }
int copy_tree_at_foreign(int source_fd, int target_fd, int userns_fd) { int copy_tree_at_foreign(int source_fd, int target_fd, int userns_fd) {
@ -5600,6 +5624,7 @@ int remove_tree_foreign(const char *path, int userns_fd) {
_cleanup_close_ int tree_fd = -EBADF; _cleanup_close_ int tree_fd = -EBADF;
r = mountfsd_mount_directory( r = mountfsd_mount_directory(
/* vl= */ NULL,
path, path,
userns_fd, userns_fd,
DISSECT_IMAGE_FOREIGN_UID, DISSECT_IMAGE_FOREIGN_UID,
@ -5611,7 +5636,7 @@ int remove_tree_foreign(const char *path, int userns_fd) {
"rm-tree", "rm-tree",
/* stdio_fds= */ NULL, /* stdio_fds= */ NULL,
(int[]) { userns_fd, tree_fd }, 2, (int[]) { userns_fd, tree_fd }, 2,
FORK_RESET_SIGNALS|FORK_CLOSE_ALL_FDS|FORK_DEATHSIG_SIGTERM|FORK_LOG|FORK_REOPEN_LOG|FORK_WAIT, FORK_RESET_SIGNALS|FORK_CLOSE_ALL_FDS|FORK_DEATHSIG_SIGTERM|FORK_WAIT|FORK_REOPEN_LOG,
/* ret= */ NULL); /* ret= */ NULL);
if (r < 0) if (r < 0)
return r; return r;
@ -5625,19 +5650,19 @@ int remove_tree_foreign(const char *path, int userns_fd) {
userns_fd, userns_fd,
/* root_fd= */ -EBADF); /* root_fd= */ -EBADF);
if (r < 0) { if (r < 0) {
log_error_errno(r, "Failed to join user namespace: %m"); log_debug_errno(r, "Failed to join user namespace: %m");
_exit(EXIT_FAILURE); _exit(EXIT_FAILURE);
} }
_cleanup_close_ int dfd = fd_reopen(tree_fd, O_DIRECTORY|O_CLOEXEC); _cleanup_close_ int dfd = fd_reopen(tree_fd, O_DIRECTORY|O_CLOEXEC);
if (dfd < 0) { if (dfd < 0) {
log_error_errno(r, "Failed to reopen tree fd: %m"); log_debug_errno(r, "Failed to reopen tree fd: %m");
_exit(EXIT_FAILURE); _exit(EXIT_FAILURE);
} }
r = rm_rf_children(dfd, REMOVE_PHYSICAL|REMOVE_SUBVOLUME|REMOVE_CHMOD, /* root_dev= */ NULL); r = rm_rf_children(dfd, REMOVE_PHYSICAL|REMOVE_SUBVOLUME|REMOVE_CHMOD, /* root_dev= */ NULL);
if (r < 0) if (r < 0)
log_warning_errno(r, "Failed to empty '%s' directory in foreign UID mode, ignoring: %m", path); log_debug_errno(r, "Failed to empty '%s' directory in foreign UID mode, ignoring: %m", path);
_exit(EXIT_SUCCESS); _exit(EXIT_SUCCESS);
} }

@ -269,13 +269,23 @@ static inline const char* dissected_partition_fstype(const DissectedPartition *m
int get_common_dissect_directory(char **ret); int get_common_dissect_directory(char **ret);
int mountfsd_mount_image_fd(int image_fd, int userns_fd, const MountOptions *options, const ImagePolicy *image_policy, const VeritySettings *verity, DissectImageFlags flags, DissectedImage **ret); int mountfsd_connect(sd_varlink **ret);
int mountfsd_mount_image(const char *path, int userns_fd, const MountOptions *options, const ImagePolicy *image_policy, const VeritySettings *verity, DissectImageFlags flags, DissectedImage **ret);
int mountfsd_mount_directory_fd(int directory_fd, int userns_fd, DissectImageFlags flags, int *ret_mount_fd);
int mountfsd_mount_directory(const char *path, int userns_fd, DissectImageFlags flags, int *ret_mount_fd);
int mountfsd_make_directory_fd(int parent_fd, const char *name, mode_t mode, DissectImageFlags flags, int *ret_directory_fd); /* All the calls below take a 'vl' parameter, which may be an already established Varlink connection object
int mountfsd_make_directory(const char *path, mode_t mode, DissectImageFlags flags, int *ret_directory_fd); * towards systemd-mountfsd, previously created via mountfsd_connect(). This serves two purposes: first, it
* allows more efficient resource usage, as already allocated resources can be recycled across
* multiple calls. Second, the user credentials are pinned at the time of mountfsd_connect(), so the caller
* can drop privileges afterwards while keeping the connection open and still execute the relevant
* operations under the original identity, until the connection is closed. The 'vl' parameter may be passed
* as NULL, in which case a short-lived connection is created just to execute the requested operation. */
int mountfsd_mount_image_fd(sd_varlink *vl, int image_fd, int userns_fd, const MountOptions *options, const ImagePolicy *image_policy, const VeritySettings *verity, DissectImageFlags flags, DissectedImage **ret);
int mountfsd_mount_image(sd_varlink *vl, const char *path, int userns_fd, const MountOptions *options, const ImagePolicy *image_policy, const VeritySettings *verity, DissectImageFlags flags, DissectedImage **ret);
int mountfsd_mount_directory_fd(sd_varlink *vl, int directory_fd, int userns_fd, DissectImageFlags flags, int *ret_mount_fd);
int mountfsd_mount_directory(sd_varlink *vl, const char *path, int userns_fd, DissectImageFlags flags, int *ret_mount_fd);
int mountfsd_make_directory_fd(sd_varlink *vl, int parent_fd, const char *name, mode_t mode, DissectImageFlags flags, int *ret_directory_fd);
int mountfsd_make_directory(sd_varlink *vl, const char *path, mode_t mode, DissectImageFlags flags, int *ret_directory_fd);
int copy_tree_at_foreign(int source_fd, int target_fd, int userns_fd); int copy_tree_at_foreign(int source_fd, int target_fd, int userns_fd);
int remove_tree_foreign(const char *path, int userns_fd); int remove_tree_foreign(const char *path, int userns_fd);
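
The "borrowed or short-lived connection" convention described in the comment above can be sketched without any Varlink dependency. Everything here is hypothetical and only models the ownership rule: SketchConn, sketch_connect(), sketch_disconnect() and sketch_call() are illustrative names, not part of sd-varlink or of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical connection object that counts the calls made over it. */
typedef struct SketchConn {
        unsigned n_calls;
} SketchConn;

static unsigned sketch_n_connects = 0;

static int sketch_connect(SketchConn **ret) {
        assert(ret);

        SketchConn *c = calloc(1, sizeof(SketchConn));
        if (!c)
                return -ENOMEM;

        sketch_n_connects++;
        *ret = c;
        return 0;
}

static SketchConn *sketch_disconnect(SketchConn *c) {
        free(c);
        return NULL;
}

/* If the caller passes a connection it is borrowed and left open; if the caller
 * passes NULL a private connection is created and closed again before returning. */
static int sketch_call(SketchConn *c) {
        SketchConn *own = NULL;
        int r;

        if (!c) {
                r = sketch_connect(&own);
                if (r < 0)
                        return r;

                c = own;
        }

        c->n_calls++;

        sketch_disconnect(own); /* no-op when the connection was borrowed */
        return 0;
}
```

In the patch itself the same role is played by the `_cleanup_(sd_varlink_unrefp) sd_varlink *_vl` local, which releases only a connection the function created on its own.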

@ -125,6 +125,7 @@ int import_url_change_suffix(
static const char* const import_type_table[_IMPORT_TYPE_MAX] = { static const char* const import_type_table[_IMPORT_TYPE_MAX] = {
[IMPORT_RAW] = "raw", [IMPORT_RAW] = "raw",
[IMPORT_TAR] = "tar", [IMPORT_TAR] = "tar",
[IMPORT_OCI] = "oci",
}; };
DEFINE_STRING_TABLE_LOOKUP(import_type, ImportType); DEFINE_STRING_TABLE_LOOKUP(import_type, ImportType);

@ -6,6 +6,7 @@
typedef enum ImportType { typedef enum ImportType {
IMPORT_RAW, IMPORT_RAW,
IMPORT_TAR, IMPORT_TAR,
IMPORT_OCI,
_IMPORT_TYPE_MAX, _IMPORT_TYPE_MAX,
_IMPORT_TYPE_INVALID = -EINVAL, _IMPORT_TYPE_INVALID = -EINVAL,
} ImportType; } ImportType;

@ -132,6 +132,7 @@ shared_sources = files(
'module-util.c', 'module-util.c',
'mount-setup.c', 'mount-setup.c',
'mount-util.c', 'mount-util.c',
'mstack.c',
'net-condition.c', 'net-condition.c',
'netif-naming-scheme.c', 'netif-naming-scheme.c',
'netif-sriov.c', 'netif-sriov.c',

1203
src/shared/mstack.c Normal file

File diff suppressed because it is too large

64
src/shared/mstack.h Normal file

@ -0,0 +1,64 @@
/* SPDX-License-Identifier: LGPL-2.1-or-later */
#pragma once
#include "discover-image.h"
#include "shared-forward.h"
typedef enum MStackFlags {
MSTACK_MKDIR = 1 << 0, /* when mounting, create top-level inode to mount on top */
MSTACK_RDONLY = 1 << 1,
} MStackFlags;
typedef enum MStackMountType {
MSTACK_ROOT, /* optional "root" entry used as root, with the layer@/rw layers only used for /usr/ */
MSTACK_LAYER, /* "layer@…" entries that are the lower (read-only) layers of an overlayfs stack */
MSTACK_RW, /* "rw" entry that is the upper (writable) layer of an overlayfs stack (contains two subdirs: 'data' + 'work') */
MSTACK_BIND, /* "bind@…" entries that are (writable) bind mounted on top of the overlayfs */
MSTACK_ROBIND, /* "robind@…" similar, but read-only */
_MSTACK_MOUNT_TYPE_MAX,
_MSTACK_MOUNT_TYPE_INVALID = -EINVAL,
} MStackMountType;
typedef struct MStackMount {
MStackMountType mount_type;
char *what;
int what_fd;
int mount_fd;
char *sort_key;
char *where;
ImageType image_type;
DissectedImage *dissected_image;
} MStackMount;
typedef struct MStack {
char *path;
MStackMount *mounts;
size_t n_mounts;
bool has_tmpfs_root; /* If true, we need a throw-away tmpfs as root */
bool has_overlayfs; /* Indicates whether we need overlayfs (i.e. if there is more than a single layer) */
MStackMount *root_mount; /* If there's an MSTACK_BIND/MSTACK_ROBIND/MSTACK_ROOT mount, this points to it */
int root_mount_fd;
int usr_mount_fd;
} MStack;
#define MSTACK_INIT \
(MStack) { \
.root_mount_fd = -EBADF, \
.usr_mount_fd = -EBADF, \
}
MStack *mstack_free(MStack *mstack);
DEFINE_TRIVIAL_CLEANUP_FUNC(MStack*, mstack_free);
int mstack_load(const char *dir, int dir_fd, MStack **ret);
int mstack_open_images(MStack *mstack, sd_varlink *mountfsd_link, int userns_fd, const ImagePolicy *image_policy, const ImageFilter *image_filter, MStackFlags flags);
int mstack_make_mounts(MStack *mstack, const char *temp_mount_dir, MStackFlags flags);
int mstack_bind_mounts(MStack *mstack, const char *where, int where_fd, MStackFlags flags, int *ret_root_fd);
/* The four calls above in one */
int mstack_apply(const char *dir, int dir_fd, const char *where, const char *temp_mount_dir, sd_varlink *mountfsd_link, int userns_fd, const ImagePolicy *image_policy, const ImageFilter *image_filter, MStackFlags flags, int *ret_root_fd);
int mstack_is_read_only(MStack *mstack);
int mstack_is_foreign_uid_owned(MStack *mstack);
DECLARE_STRING_TABLE_LOOKUP_TO_STRING(mstack_mount_type, MStackMountType);
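
The entry names listed in the MStackMountType comments ("root", "layer@…", "rw", "bind@…", "robind@…") suggest a simple name-based classification. The real parser lives in src/shared/mstack.c, whose diff is not shown on this page, so the following is only a guess at the general shape: the sketch_* names are hypothetical, and details such as image suffixes or validation of the part after '@' are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical mirror of MStackMountType, for illustration only. */
typedef enum SketchMountType {
        SKETCH_ROOT,
        SKETCH_LAYER,
        SKETCH_RW,
        SKETCH_BIND,
        SKETCH_ROBIND,
        SKETCH_INVALID = -1,
} SketchMountType;

static bool sketch_has_prefix(const char *s, const char *prefix) {
        return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Classify a directory entry name by the naming scheme given in the header comments. */
static SketchMountType sketch_classify(const char *name) {
        assert(name);

        if (strcmp(name, "root") == 0)
                return SKETCH_ROOT;
        if (strcmp(name, "rw") == 0)
                return SKETCH_RW;
        if (sketch_has_prefix(name, "layer@"))
                return SKETCH_LAYER;
        if (sketch_has_prefix(name, "robind@"))
                return SKETCH_ROBIND;
        if (sketch_has_prefix(name, "bind@"))
                return SKETCH_BIND;

        return SKETCH_INVALID;
}
```

Each prefix test includes the '@' separator, so "robind@…" and "bind@…" cannot be confused regardless of the order in which they are checked.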

@ -57,8 +57,25 @@ static int make_pid_name(char **ret) {
return 0; return 0;
} }
int nsresource_allocate_userns(const char *name, uint64_t size) { int nsresource_connect(sd_varlink **ret) {
int r;
assert(ret);
_cleanup_(sd_varlink_unrefp) sd_varlink *vl = NULL; _cleanup_(sd_varlink_unrefp) sd_varlink *vl = NULL;
r = sd_varlink_connect_address(&vl, "/run/systemd/io.systemd.NamespaceResource");
if (r < 0)
return log_debug_errno(r, "Failed to connect to namespace resource manager: %m");
r = sd_varlink_set_allow_fd_passing_output(vl, true);
if (r < 0)
return log_debug_errno(r, "Failed to enable varlink fd passing for write: %m");
*ret = TAKE_PTR(vl);
return 0;
}
int nsresource_allocate_userns(sd_varlink *vl, const char *name, uint64_t size) {
_cleanup_close_ int userns_fd = -EBADF; _cleanup_close_ int userns_fd = -EBADF;
_cleanup_free_ char *_name = NULL; _cleanup_free_ char *_name = NULL;
const char *error_id; const char *error_id;
@ -77,13 +94,14 @@ int nsresource_allocate_userns(const char *name, uint64_t size) {
if (size <= 0 || size > UINT64_C(0x100000000)) /* Note: the server actually only allows allocating 1 or 64K right now */ if (size <= 0 || size > UINT64_C(0x100000000)) /* Note: the server actually only allows allocating 1 or 64K right now */
return -EINVAL; return -EINVAL;
r = sd_varlink_connect_address(&vl, "/run/systemd/io.systemd.NamespaceResource"); _cleanup_(sd_varlink_unrefp) sd_varlink *_vl = NULL;
if (!vl) {
r = nsresource_connect(&_vl);
if (r < 0) if (r < 0)
return log_debug_errno(r, "Failed to connect to namespace resource manager: %m"); return r;
r = sd_varlink_set_allow_fd_passing_output(vl, true); vl = _vl;
if (r < 0) }
return log_debug_errno(r, "Failed to enable varlink fd passing for write: %m");
userns_fd = userns_acquire_empty(); userns_fd = userns_acquire_empty();
if (userns_fd < 0) if (userns_fd < 0)
@ -113,8 +131,7 @@ int nsresource_allocate_userns(const char *name, uint64_t size) {
return TAKE_FD(userns_fd); return TAKE_FD(userns_fd);
} }
int nsresource_register_userns(const char *name, int userns_fd) { int nsresource_register_userns(sd_varlink *vl, const char *name, int userns_fd) {
_cleanup_(sd_varlink_unrefp) sd_varlink *vl = NULL;
_cleanup_close_ int _userns_fd = -EBADF; _cleanup_close_ int _userns_fd = -EBADF;
_cleanup_free_ char *_name = NULL; _cleanup_free_ char *_name = NULL;
const char *error_id; const char *error_id;
@ -138,13 +155,14 @@ int nsresource_register_userns(const char *name, int userns_fd) {
userns_fd = _userns_fd; userns_fd = _userns_fd;
} }
r = sd_varlink_connect_address(&vl, "/run/systemd/io.systemd.NamespaceResource"); _cleanup_(sd_varlink_unrefp) sd_varlink *_vl = NULL;
if (!vl) {
r = nsresource_connect(&_vl);
if (r < 0) if (r < 0)
return log_debug_errno(r, "Failed to connect to namespace resource manager: %m"); return r;
r = sd_varlink_set_allow_fd_passing_output(vl, true); vl = _vl;
if (r < 0) }
return log_debug_errno(r, "Failed to enable varlink fd passing for write: %m");
userns_fd_idx = sd_varlink_push_dup_fd(vl, userns_fd); userns_fd_idx = sd_varlink_push_dup_fd(vl, userns_fd);
if (userns_fd_idx < 0) if (userns_fd_idx < 0)
@ -169,8 +187,7 @@ int nsresource_register_userns(const char *name, int userns_fd) {
return 0; return 0;
} }
int nsresource_add_mount(int userns_fd, int mount_fd) { int nsresource_add_mount(sd_varlink *vl, int userns_fd, int mount_fd) {
_cleanup_(sd_varlink_unrefp) sd_varlink *vl = NULL;
_cleanup_close_ int _userns_fd = -EBADF; _cleanup_close_ int _userns_fd = -EBADF;
int r, userns_fd_idx, mount_fd_idx; int r, userns_fd_idx, mount_fd_idx;
const char *error_id; const char *error_id;
@ -185,21 +202,22 @@ int nsresource_add_mount(int userns_fd, int mount_fd) {
userns_fd = _userns_fd; userns_fd = _userns_fd;
} }
r = sd_varlink_connect_address(&vl, "/run/systemd/io.systemd.NamespaceResource"); _cleanup_(sd_varlink_unrefp) sd_varlink *_vl = NULL;
if (!vl) {
r = nsresource_connect(&_vl);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to connect to namespace resource manager: %m"); return r;
r = sd_varlink_set_allow_fd_passing_output(vl, true); vl = _vl;
if (r < 0) }
return log_error_errno(r, "Failed to enable varlink fd passing for write: %m");
userns_fd_idx = sd_varlink_push_dup_fd(vl, userns_fd); userns_fd_idx = sd_varlink_push_dup_fd(vl, userns_fd);
if (userns_fd_idx < 0) if (userns_fd_idx < 0)
return log_error_errno(userns_fd_idx, "Failed to push userns fd into varlink connection: %m"); return log_debug_errno(userns_fd_idx, "Failed to push userns fd into varlink connection: %m");
mount_fd_idx = sd_varlink_push_dup_fd(vl, mount_fd); mount_fd_idx = sd_varlink_push_dup_fd(vl, mount_fd);
if (mount_fd_idx < 0) if (mount_fd_idx < 0)
return log_error_errno(mount_fd_idx, "Failed to push mount fd into varlink connection: %m"); return log_debug_errno(mount_fd_idx, "Failed to push mount fd into varlink connection: %m");
sd_json_variant *reply = NULL; sd_json_variant *reply = NULL;
r = sd_varlink_callbo( r = sd_varlink_callbo(
@ -210,19 +228,18 @@ int nsresource_add_mount(int userns_fd, int mount_fd) {
SD_JSON_BUILD_PAIR_UNSIGNED("userNamespaceFileDescriptor", userns_fd_idx), SD_JSON_BUILD_PAIR_UNSIGNED("userNamespaceFileDescriptor", userns_fd_idx),
SD_JSON_BUILD_PAIR_UNSIGNED("mountFileDescriptor", mount_fd_idx)); SD_JSON_BUILD_PAIR_UNSIGNED("mountFileDescriptor", mount_fd_idx));
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to call AddMountToUserNamespace() varlink call: %m"); return log_debug_errno(r, "Failed to call AddMountToUserNamespace() varlink call: %m");
if (streq_ptr(error_id, "io.systemd.NamespaceResource.UserNamespaceNotRegistered")) { if (streq_ptr(error_id, "io.systemd.NamespaceResource.UserNamespaceNotRegistered")) {
log_notice("User namespace has not been allocated via namespace resource registry, not adding mount to registration."); log_debug("User namespace has not been allocated via namespace resource registry, not adding mount to registration.");
return 0; return 0;
} }
if (error_id) if (error_id)
return log_error_errno(sd_varlink_error_to_errno(error_id, reply), "Failed to mount image: %s", error_id); return log_debug_errno(sd_varlink_error_to_errno(error_id, reply), "Failed to mount image: %s", error_id);
return 1; return 1;
} }
int nsresource_add_cgroup(int userns_fd, int cgroup_fd) { int nsresource_add_cgroup(sd_varlink *vl, int userns_fd, int cgroup_fd) {
_cleanup_(sd_varlink_unrefp) sd_varlink *vl = NULL;
_cleanup_close_ int _userns_fd = -EBADF; _cleanup_close_ int _userns_fd = -EBADF;
int r, userns_fd_idx, cgroup_fd_idx; int r, userns_fd_idx, cgroup_fd_idx;
const char *error_id; const char *error_id;
@ -237,13 +254,14 @@ int nsresource_add_cgroup(int userns_fd, int cgroup_fd) {
userns_fd = _userns_fd; userns_fd = _userns_fd;
} }
r = sd_varlink_connect_address(&vl, "/run/systemd/io.systemd.NamespaceResource"); _cleanup_(sd_varlink_unrefp) sd_varlink *_vl = NULL;
if (!vl) {
r = nsresource_connect(&_vl);
if (r < 0) if (r < 0)
return log_debug_errno(r, "Failed to connect to namespace resource manager: %m"); return r;
r = sd_varlink_set_allow_fd_passing_output(vl, true); vl = _vl;
if (r < 0) }
return log_debug_errno(r, "Failed to enable varlink fd passing for write: %m");
userns_fd_idx = sd_varlink_push_dup_fd(vl, userns_fd); userns_fd_idx = sd_varlink_push_dup_fd(vl, userns_fd);
if (userns_fd_idx < 0) if (userns_fd_idx < 0)
@ -264,7 +282,7 @@ int nsresource_add_cgroup(int userns_fd, int cgroup_fd) {
if (r < 0) if (r < 0)
return log_debug_errno(r, "Failed to call AddControlGroupToUserNamespace() varlink call: %m"); return log_debug_errno(r, "Failed to call AddControlGroupToUserNamespace() varlink call: %m");
if (streq_ptr(error_id, "io.systemd.NamespaceResource.UserNamespaceNotRegistered")) { if (streq_ptr(error_id, "io.systemd.NamespaceResource.UserNamespaceNotRegistered")) {
log_notice("User namespace has not been allocated via namespace resource registry, not adding cgroup to registration."); log_debug("User namespace has not been allocated via namespace resource registry, not adding cgroup to registration.");
return 0; return 0;
} }
if (error_id) if (error_id)
@ -287,6 +305,7 @@ static void interface_params_done(InterfaceParams *p) {
} }
int nsresource_add_netif_veth( int nsresource_add_netif_veth(
sd_varlink *vl,
int userns_fd, int userns_fd,
int netns_fd, int netns_fd,
const char *namespace_ifname, const char *namespace_ifname,
@ -294,7 +313,6 @@ int nsresource_add_netif_veth(
char **ret_namespace_ifname) { char **ret_namespace_ifname) {
_cleanup_close_ int _userns_fd = -EBADF, _netns_fd = -EBADF; _cleanup_close_ int _userns_fd = -EBADF, _netns_fd = -EBADF;
_cleanup_(sd_varlink_unrefp) sd_varlink *vl = NULL;
int r, userns_fd_idx, netns_fd_idx; int r, userns_fd_idx, netns_fd_idx;
const char *error_id; const char *error_id;
@ -314,13 +332,14 @@ int nsresource_add_netif_veth(
netns_fd = _netns_fd; netns_fd = _netns_fd;
} }
r = sd_varlink_connect_address(&vl, "/run/systemd/io.systemd.NamespaceResource"); _cleanup_(sd_varlink_unrefp) sd_varlink *_vl = NULL;
if (!vl) {
r = nsresource_connect(&_vl);
if (r < 0) if (r < 0)
return log_debug_errno(r, "Failed to connect to namespace resource manager: %m"); return r;
r = sd_varlink_set_allow_fd_passing_output(vl, true); vl = _vl;
if (r < 0) }
return log_debug_errno(r, "Failed to enable varlink fd passing for write: %m");
userns_fd_idx = sd_varlink_push_dup_fd(vl, userns_fd); userns_fd_idx = sd_varlink_push_dup_fd(vl, userns_fd);
if (userns_fd_idx < 0) if (userns_fd_idx < 0)
@ -343,7 +362,7 @@ int nsresource_add_netif_veth(
if (r < 0) if (r < 0)
return log_debug_errno(r, "Failed to call AddNetworkToUserNamespace() varlink call: %m"); return log_debug_errno(r, "Failed to call AddNetworkToUserNamespace() varlink call: %m");
if (streq_ptr(error_id, "io.systemd.NamespaceResource.UserNamespaceNotRegistered")) { if (streq_ptr(error_id, "io.systemd.NamespaceResource.UserNamespaceNotRegistered")) {
log_notice("User namespace has not been allocated via namespace resource registry, not adding network to registration."); log_debug("User namespace has not been allocated via namespace resource registry, not adding network to registration.");
return 0; return 0;
} }
if (error_id) if (error_id)
@ -368,11 +387,11 @@ int nsresource_add_netif_veth(
} }
int nsresource_add_netif_tap( int nsresource_add_netif_tap(
sd_varlink *vl,
int userns_fd, int userns_fd,
char **ret_host_ifname) { char **ret_host_ifname) {
_cleanup_close_ int _userns_fd = -EBADF; _cleanup_close_ int _userns_fd = -EBADF;
_cleanup_(sd_varlink_unrefp) sd_varlink *vl = NULL;
int r, userns_fd_idx; int r, userns_fd_idx;
const char *error_id; const char *error_id;
@ -384,13 +403,14 @@ int nsresource_add_netif_tap(
userns_fd = _userns_fd; userns_fd = _userns_fd;
} }
r = sd_varlink_connect_address(&vl, "/run/systemd/io.systemd.NamespaceResource"); _cleanup_(sd_varlink_unrefp) sd_varlink *_vl = NULL;
if (!vl) {
r = nsresource_connect(&_vl);
if (r < 0) if (r < 0)
return log_debug_errno(r, "Failed to connect to namespace resource manager: %m"); return r;
r = sd_varlink_set_allow_fd_passing_output(vl, true); vl = _vl;
if (r < 0) }
return log_debug_errno(r, "Failed to enable varlink fd passing for write: %m");
r = sd_varlink_set_allow_fd_passing_input(vl, true); r = sd_varlink_set_allow_fd_passing_input(vl, true);
if (r < 0) if (r < 0)


@ -7,9 +7,19 @@
#define NSRESOURCE_UIDS_64K 0x10000U #define NSRESOURCE_UIDS_64K 0x10000U
#define NSRESOURCE_UIDS_1 1U #define NSRESOURCE_UIDS_1 1U
int nsresource_allocate_userns(const char *name, uint64_t size); int nsresource_connect(sd_varlink **ret);
int nsresource_register_userns(const char *name, int userns_fd);
int nsresource_add_mount(int userns_fd, int mount_fd); /* All the calls below take a 'link' parameter, that may be an already established Varlink connection object
int nsresource_add_cgroup(int userns_fd, int cgroup_fd); * towards systemd-nsresourced, previously created via nsresource_connect(). This serves two purposes: first
int nsresource_add_netif_veth(int userns_fd, int netns_fd, const char *namespace_ifname, char **ret_host_ifname, char **ret_namespace_ifname); * of all, it allows more efficient resource usage, as this allows recycling already allocated resources for
int nsresource_add_netif_tap(int userns_fd, char **ret_host_ifname); * multiple calls. Secondly, the user credentials are pinned at time of nsresource_connect(), and the caller
* hence can drop privileges afterwards while keeping open the connection and still execute relevant
* operations under the original identity, until the connection is closed. The 'link' parameter may be passed
* as NULL in which case a short-lived connection is created, just to execute the requested operation. */
int nsresource_allocate_userns(sd_varlink *vl, const char *name, uint64_t size);
int nsresource_register_userns(sd_varlink *vl, const char *name, int userns_fd);
int nsresource_add_mount(sd_varlink *vl, int userns_fd, int mount_fd);
int nsresource_add_cgroup(sd_varlink *vl, int userns_fd, int cgroup_fd);
int nsresource_add_netif_veth(sd_varlink *vl, int userns_fd, int netns_fd, const char *namespace_ifname, char **ret_host_ifname, char **ret_namespace_ifname);
int nsresource_add_netif_tap(sd_varlink *vl, int userns_fd, char **ret_host_ifname);
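
The "optional connection" convention described in the comment above can be illustrated with a minimal, self-contained sketch. It does not use the real sd_varlink API: `Conn`, `conn_open()`, `conn_closep()` and `do_operation()` are hypothetical stand-ins, and the cleanup attribute is a GCC/Clang extension (systemd's `_cleanup_` macros build on the same mechanism). The sketch only shows the control flow: reuse the caller's connection if one is passed, otherwise create a short-lived one that is released on return.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for an established connection object. */
typedef struct Conn {
        int n_calls; /* number of operations executed over this connection */
} Conn;

static int n_connects = 0; /* counts how many connections were opened in total */

static int conn_open(Conn **ret) {
        assert(ret);

        Conn *c = calloc(1, sizeof(Conn));
        if (!c)
                return -ENOMEM;

        n_connects++;
        *ret = c;
        return 0;
}

/* Cleanup helper, invoked automatically when an annotated variable goes out of scope. */
static void conn_closep(Conn **c) {
        free(*c);
}

static int do_operation(Conn *link) {
        /* Only set if we have to create a connection ourselves. */
        __attribute__((cleanup(conn_closep))) Conn *fallback = NULL;
        int r;

        if (!link) {
                /* Caller passed NULL: open a short-lived connection just for this call. */
                r = conn_open(&fallback);
                if (r < 0)
                        return r;

                link = fallback;
        }

        link->n_calls++; /* the actual method call would happen here */
        return 0;
}
```

Calling `do_operation(NULL)` opens and closes one connection per call, while passing a connection obtained once from `conn_open()` lets any number of calls share it. In the real code the shared connection additionally keeps the credentials that were in effect when it was established.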


@ -53,6 +53,7 @@ typedef struct Condition Condition;
typedef struct ConfigSection ConfigSection; typedef struct ConfigSection ConfigSection;
typedef struct ConfigTableItem ConfigTableItem; typedef struct ConfigTableItem ConfigTableItem;
typedef struct CPUSet CPUSet; typedef struct CPUSet CPUSet;
typedef struct DissectedImage DissectedImage;
typedef struct DnsAnswer DnsAnswer; typedef struct DnsAnswer DnsAnswer;
typedef struct DnsPacket DnsPacket; typedef struct DnsPacket DnsPacket;
typedef struct DnsQuestion DnsQuestion; typedef struct DnsQuestion DnsQuestion;
@ -64,6 +65,7 @@ typedef struct FDSet FDSet;
typedef struct Fido2HmacSalt Fido2HmacSalt; typedef struct Fido2HmacSalt Fido2HmacSalt;
typedef struct GroupRecord GroupRecord; typedef struct GroupRecord GroupRecord;
typedef struct Image Image; typedef struct Image Image;
typedef struct ImageFilter ImageFilter;
typedef struct ImagePolicy ImagePolicy; typedef struct ImagePolicy ImagePolicy;
typedef struct InstallChange InstallChange; typedef struct InstallChange InstallChange;
typedef struct InstallInfo InstallInfo; typedef struct InstallInfo InstallInfo;
@ -72,6 +74,7 @@ typedef struct LoopDevice LoopDevice;
typedef struct MachineBindUserContext MachineBindUserContext; typedef struct MachineBindUserContext MachineBindUserContext;
typedef struct MachineCredentialContext MachineCredentialContext; typedef struct MachineCredentialContext MachineCredentialContext;
typedef struct MountOptions MountOptions; typedef struct MountOptions MountOptions;
typedef struct MStack MStack;
typedef struct OpenFile OpenFile; typedef struct OpenFile OpenFile;
typedef struct Pkcs11EncryptedKey Pkcs11EncryptedKey; typedef struct Pkcs11EncryptedKey Pkcs11EncryptedKey;
typedef struct Table Table; typedef struct Table Table;


@ -147,8 +147,8 @@ static int open_inode_finalize(OpenInode *of) {
/* We adjust the UID/GID right before the mode, since doing this might affect the mode (drops /* We adjust the UID/GID right before the mode, since doing this might affect the mode (drops
* suid/sgid bits). * suid/sgid bits).
* *
* We adjust the mode only when leaving a dir, because if we are unpriv we might lose the * We adjust the mode only when leaving a dir, because if we are unprivileged we might lose
* ability to enter it once we do this. */ * the ability to enter it once we do this. */
if (uid_is_valid(of->uid) || gid_is_valid(of->gid) || of->mode != MODE_INVALID) { if (uid_is_valid(of->uid) || gid_is_valid(of->gid) || of->mode != MODE_INVALID) {
k = fchmod_and_chown_with_fallback(of->fd, /* path= */ NULL, of->mode, of->uid, of->gid); k = fchmod_and_chown_with_fallback(of->fd, /* path= */ NULL, of->mode, of->uid, of->gid);
@ -277,6 +277,91 @@ static int archive_unpack_regular(
return TAKE_FD(fd); return TAKE_FD(fd);
} }
static int overlayfs_fsetfattr(
const char *path, /* purely decorative, for log purposes */
int fd,
const char *name, /* xattr key name */
const char *value /* xattr value */) {
int r;
assert(fd >= 0);
assert(path);
assert(name);
assert(value);
/* overlayfs knows magic {user|trusted}.overlay.* xattrs for whiteouts and opaque directories. The
* 'user.overlay.*' ones are only checked if overlayfs is mounted with "userxattr". We only set that
* one because we want to operate unprivileged. Ideally, we'd set both here, to maximize the chance
* that things work both in privileged and unprivileged scenarios, but unfortunately this has the
* effect that the privileged ones are ignored (and visible in the overlayfs mount). */
_cleanup_free_ char *n = strjoin("user.overlay.", name);
if (!n)
return log_oom();
r = xsetxattr(fd, /* path= */ NULL, AT_EMPTY_PATH, n, value);
if (r < 0)
return log_error_errno(r, "Failed to set '%s' xattr on file '%s': %m", n, path);
return 0;
}
static int archive_unpack_whiteout(
struct archive *a,
struct archive_entry *entry,
int parent_fd,
const char *parent_path, /* Full path of 'parent_fd', purely decorative for log purposes */
const char *filename, /* Just the filename we are supposed to whiteout */
const char *path /* Full path of the whiteout file, purely decorative for log purposes */) {
int r;
assert(a);
assert(entry);
assert(parent_fd >= 0);
assert(parent_path);
assert(filename);
assert(path);
_cleanup_free_ char *tmp = NULL;
_cleanup_close_ int fd = open_tmpfile_linkable_at(parent_fd, filename, O_CLOEXEC|O_WRONLY, &tmp);
if (fd < 0)
return log_error_errno(fd, "Failed to create whiteout file for '%s': %m", path);
CLEANUP_TMPFILE_AT(parent_fd, tmp);
r = overlayfs_fsetfattr(path, fd, "whiteout", "y");
if (r < 0)
return r;
/* As per https://docs.kernel.org/filesystems/overlayfs.html also mark the parent */
r = overlayfs_fsetfattr(parent_path, parent_fd, "opaque", "x");
if (r < 0)
return r;
r = link_tmpfile_at(fd, parent_fd, tmp, filename, LINK_TMPFILE_REPLACE);
if (r < 0)
return log_error_errno(r, "Failed to install whiteout file '%s': %m", path);
tmp = mfree(tmp); /* disarm CLEANUP_TMPFILE_AT */
return 0; /* we do not return an fd here, because this kills an inode, and doesn't synthesize one */
}
static int archive_unpack_opaque(
struct archive *a,
struct archive_entry *entry,
int parent_fd,
const char *parent_path) {
assert(a);
assert(entry);
assert(parent_fd >= 0);
assert(parent_path);
/* we do not return an fd here either */
return overlayfs_fsetfattr(parent_path, parent_fd, "opaque", "y");
}
static int archive_unpack_directory( static int archive_unpack_directory(
struct archive *a, struct archive *a,
struct archive_entry *entry, struct archive_entry *entry,
@ -908,15 +993,45 @@ int tar_x(int input_fd, int tree_fd, TarFlags flags) {
switch (filetype) { switch (filetype) {
case S_IFREG: case S_IFREG:
if (FLAGS_SET(flags, TAR_OCI_WHITEOUTS)) {
if (streq(e, ".wh..wh..opq")) {
r = archive_unpack_opaque(a, entry, parent_fd, empty_to_root(parent_path));
if (r < 0)
return r;
/* NB: this does not create an inode! */
break;
}
const char *w = startswith(e, ".wh.");
if (w) {
if (!filename_is_valid(w))
return log_error_errno(SYNTHETIC_ERRNO(EBADMSG), "Invalid whiteout file entry '%s', refusing.", e);
r = archive_unpack_whiteout(a, entry, parent_fd, empty_to_root(parent_path), w, j);
if (r < 0)
return r;
/* NB: this does not create an inode! */
break;
}
}
fd = archive_unpack_regular(a, entry, parent_fd, e, j, fflags); fd = archive_unpack_regular(a, entry, parent_fd, e, j, fflags);
if (fd < 0)
return fd;
break; break;
case S_IFDIR: case S_IFDIR:
fd = archive_unpack_directory(a, entry, parent_fd, e, j, fflags); fd = archive_unpack_directory(a, entry, parent_fd, e, j, fflags);
if (fd < 0)
return fd;
break; break;
case S_IFLNK: case S_IFLNK:
fd = archive_unpack_symlink(a, entry, parent_fd, e, j); fd = archive_unpack_symlink(a, entry, parent_fd, e, j);
if (fd < 0)
return fd;
break; break;
case S_IFCHR: case S_IFCHR:
@ -924,6 +1039,8 @@ int tar_x(int input_fd, int tree_fd, TarFlags flags) {
case S_IFIFO: case S_IFIFO:
case S_IFSOCK: case S_IFSOCK:
fd = archive_unpack_special_inode(a, entry, parent_fd, e, j, filetype); fd = archive_unpack_special_inode(a, entry, parent_fd, e, j, filetype);
if (fd < 0)
return fd;
break; break;
default: default:
@ -931,9 +1048,6 @@ int tar_x(int input_fd, int tree_fd, TarFlags flags) {
SYNTHETIC_ERRNO(ENOTRECOVERABLE), SYNTHETIC_ERRNO(ENOTRECOVERABLE),
"Unexpected file type %i of '%s', refusing.", (int) filetype, j); "Unexpected file type %i of '%s', refusing.", (int) filetype, j);
} }
if (fd < 0)
return fd;
} else { } else {
/* This is some intermediary node in the path that we haven't opened yet. Create it with default attributes */ /* This is some intermediary node in the path that we haven't opened yet. Create it with default attributes */
fd = open_mkdir_at(parent_fd, e, O_CLOEXEC, 0700); fd = open_mkdir_at(parent_fd, e, O_CLOEXEC, 0700);
@ -949,6 +1063,7 @@ int tar_x(int input_fd, int tree_fd, TarFlags flags) {
* fully done with the inode (i.e. after creating further inodes inside of dir inodes * fully done with the inode (i.e. after creating further inodes inside of dir inodes
* for example), due to permission problems this might create or that the mtime * for example), due to permission problems this might create or that the mtime
* changes we do might still be affected by our changes. */ * changes we do might still be affected by our changes. */
if (fd >= 0) {
open_inodes[n_open_inodes++] = (OpenInode) { open_inodes[n_open_inodes++] = (OpenInode) {
.fd = TAKE_FD(fd), .fd = TAKE_FD(fd),
.path = TAKE_PTR(j), .path = TAKE_PTR(j),
@ -967,6 +1082,7 @@ int tar_x(int input_fd, int tree_fd, TarFlags flags) {
n_xa = 0; n_xa = 0;
} }
} }
}
r = open_inode_finalize_many(&open_inodes, &n_open_inodes); r = open_inode_finalize_many(&open_inodes, &n_open_inodes);
if (r < 0) if (r < 0)


@ -4,6 +4,7 @@
typedef enum TarFlags { typedef enum TarFlags {
TAR_SELINUX = 1 << 0, /* Include SELinux xattr in tarball, or unpack it */ TAR_SELINUX = 1 << 0, /* Include SELinux xattr in tarball, or unpack it */
TAR_SQUASH_UIDS_ABOVE_64K = 1 << 1, /* Squash UIDs/GIDs above 64K when packing/unpacking to the nobody user */ TAR_SQUASH_UIDS_ABOVE_64K = 1 << 1, /* Squash UIDs/GIDs above 64K when packing/unpacking to the nobody user */
TAR_OCI_WHITEOUTS = 1 << 2, /* Turn OCI/aufs whiteout inodes into overlayfs whiteouts */
} TarFlags; } TarFlags;
int tar_x(int input_fd, int tree_fd, TarFlags flags); int tar_x(int input_fd, int tree_fd, TarFlags flags);
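
As a rough illustration of what TAR_OCI_WHITEOUTS has to distinguish, the following self-contained sketch classifies an archive entry's filename the same way the `tar_x()` hunk above does: `.wh..wh..opq` marks the parent directory opaque, any other `.wh.<name>` entry whites out `<name>`, and everything else is a regular file. The names `EntryKind` and `classify_oci_entry()` are hypothetical, and the validity check is a simplified stand-in for `filename_is_valid()`, not its actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum EntryKind {
        ENTRY_REGULAR,  /* ordinary file, unpack as usual */
        ENTRY_OPAQUE,   /* ".wh..wh..opq": mark the parent directory opaque */
        ENTRY_WHITEOUT, /* ".wh.<name>": hide <name> from lower layers */
        ENTRY_INVALID,  /* ".wh." prefix, but no usable name follows */
} EntryKind;

/* Classifies a single filename (not a full path). On ENTRY_WHITEOUT, *ret_target
 * points at the name to white out, inside the input string. */
static EntryKind classify_oci_entry(const char *filename, const char **ret_target) {
        assert(filename);

        if (ret_target)
                *ret_target = NULL;

        /* The opaque marker must be checked first, since it also starts with ".wh." */
        if (strcmp(filename, ".wh..wh..opq") == 0)
                return ENTRY_OPAQUE;

        if (strncmp(filename, ".wh.", 4) != 0)
                return ENTRY_REGULAR;

        const char *target = filename + 4;

        /* Simplified validity check: non-empty, no slashes, not "." or ".." */
        if (target[0] == '\0' ||
            strchr(target, '/') ||
            strcmp(target, ".") == 0 ||
            strcmp(target, "..") == 0)
                return ENTRY_INVALID;

        if (ret_target)
                *ret_target = target;

        return ENTRY_WHITEOUT;
}
```

Neither special case creates an inode that needs later finalization, which is why the unpack loop above only records an open inode when `fd >= 0`.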


@ -409,6 +409,8 @@ static SD_VARLINK_DEFINE_STRUCT_TYPE(
SD_VARLINK_DEFINE_FIELD(RootDirectory, SD_VARLINK_STRING, SD_VARLINK_NULLABLE), SD_VARLINK_DEFINE_FIELD(RootDirectory, SD_VARLINK_STRING, SD_VARLINK_NULLABLE),
SD_VARLINK_FIELD_COMMENT("https://www.freedesktop.org/software/systemd/man"PROJECT_VERSION_STR"systemd.exec.html#RootImage="), SD_VARLINK_FIELD_COMMENT("https://www.freedesktop.org/software/systemd/man"PROJECT_VERSION_STR"systemd.exec.html#RootImage="),
SD_VARLINK_DEFINE_FIELD(RootImage, SD_VARLINK_STRING, SD_VARLINK_NULLABLE), SD_VARLINK_DEFINE_FIELD(RootImage, SD_VARLINK_STRING, SD_VARLINK_NULLABLE),
SD_VARLINK_FIELD_COMMENT("https://www.freedesktop.org/software/systemd/man"PROJECT_VERSION_STR"systemd.exec.html#RootMStack="),
SD_VARLINK_DEFINE_FIELD(RootMStack, SD_VARLINK_STRING, SD_VARLINK_NULLABLE),
SD_VARLINK_FIELD_COMMENT("https://www.freedesktop.org/software/systemd/man"PROJECT_VERSION_STR"systemd.exec.html#RootImageOptions="), SD_VARLINK_FIELD_COMMENT("https://www.freedesktop.org/software/systemd/man"PROJECT_VERSION_STR"systemd.exec.html#RootImageOptions="),
SD_VARLINK_DEFINE_FIELD_BY_TYPE(RootImageOptions, PartitionMountOptions, SD_VARLINK_ARRAY|SD_VARLINK_NULLABLE), SD_VARLINK_DEFINE_FIELD_BY_TYPE(RootImageOptions, PartitionMountOptions, SD_VARLINK_ARRAY|SD_VARLINK_NULLABLE),
SD_VARLINK_FIELD_COMMENT("https://www.freedesktop.org/software/systemd/man"PROJECT_VERSION_STR"systemd.exec.html#RootEphemeral="), SD_VARLINK_FIELD_COMMENT("https://www.freedesktop.org/software/systemd/man"PROJECT_VERSION_STR"systemd.exec.html#RootEphemeral="),


@ -798,3 +798,11 @@ const PickFilter pick_filter_image_dir[1] = {
.architecture = _ARCHITECTURE_INVALID, .architecture = _ARCHITECTURE_INVALID,
}, },
}; };
const PickFilter pick_filter_image_mstack[1] = {
{
.type_mask = UINT32_C(1) << DT_DIR,
.architecture = _ARCHITECTURE_INVALID,
.suffix = ".mstack",
},
};


@ -64,8 +64,10 @@ int path_uses_vpick(const char *path);
extern const PickFilter pick_filter_image_raw[1]; extern const PickFilter pick_filter_image_raw[1];
extern const PickFilter pick_filter_image_dir[1]; extern const PickFilter pick_filter_image_dir[1];
extern const PickFilter pick_filter_image_mstack[1];
#define pick_filter_image_any (const PickFilter[]) { \ #define pick_filter_image_any (const PickFilter[]) { \
pick_filter_image_raw[0], \ pick_filter_image_raw[0], \
pick_filter_image_mstack[0], \
pick_filter_image_dir[0], \ pick_filter_image_dir[0], \
} }


@ -11,7 +11,9 @@
#include "log.h" #include "log.h"
#include "main-func.h" #include "main-func.h"
#include "path-lookup.h" #include "path-lookup.h"
#include "path-util.h"
#include "socket-util.h" #include "socket-util.h"
#include "stat-util.h"
#include "string-util.h" #include "string-util.h"
#include "strv.h" #include "strv.h"
#include "time-util.h" #include "time-util.h"
@ -309,27 +311,65 @@ static int process_machine(const char *machine, const char *port) {
assert(port); assert(port);
_cleanup_(sd_json_variant_unrefp) sd_json_variant *result = NULL; _cleanup_(sd_json_variant_unrefp) sd_json_variant *result = NULL;
r = fetch_machine(machine, RUNTIME_SCOPE_USER, &result); RuntimeScope scope = RUNTIME_SCOPE_USER;
if (r == -ESRCH) r = fetch_machine(machine, scope, &result);
r = fetch_machine(machine, RUNTIME_SCOPE_SYSTEM, &result); if (r == -ESRCH) {
scope = RUNTIME_SCOPE_SYSTEM;
r = fetch_machine(machine, scope, &result);
}
if (r < 0) if (r < 0)
return r; return r;
uint32_t cid = VMADDR_CID_ANY; struct {
uint32_t cid;
static const sd_json_dispatch_field dispatch_table[] = { const char *class;
{ "vSockCid", SD_JSON_VARIANT_UNSIGNED, sd_json_dispatch_uint32, 0, 0 }, const char *service;
{} } p = {
.cid = VMADDR_CID_ANY,
}; };
r = sd_json_dispatch(result, dispatch_table, SD_JSON_ALLOW_EXTENSIONS, &cid); static const sd_json_dispatch_field dispatch_table[] = {
{ "vSockCid", SD_JSON_VARIANT_UNSIGNED, sd_json_dispatch_uint32, voffsetof(p, cid), 0 },
{ "class", SD_JSON_VARIANT_STRING, sd_json_dispatch_const_string, voffsetof(p, class), SD_JSON_MANDATORY },
{ "service", SD_JSON_VARIANT_STRING, sd_json_dispatch_const_string, voffsetof(p, service), 0 },
};
r = sd_json_dispatch(result, dispatch_table, SD_JSON_ALLOW_EXTENSIONS, &p);
if (r < 0) if (r < 0)
return log_error_errno(r, "Failed to parse Varlink reply: %m"); return log_error_errno(r, "Failed to parse Varlink reply: %m");
if (cid == VMADDR_CID_ANY) if (streq(p.class, "container")) {
_cleanup_free_ char *path = NULL;
if (!streq_ptr(p.service, "systemd-nspawn"))
return log_error_errno(SYNTHETIC_ERRNO(EMEDIUMTYPE), "Don't know how to SSH into '%s' container %s.", p.service, machine);
r = runtime_directory_generic(scope, "systemd/nspawn/unix-export", &path);
if (r < 0)
return log_error_errno(r, "Failed to determine runtime directory: %m");
if (!path_extend(&path, machine, "ssh"))
return log_oom();
r = is_socket(path);
if (r < 0)
return log_error_errno(r, "Failed to check if '%s' exists and is a socket: %m", path);
if (r == 0)
return log_error_errno(
SYNTHETIC_ERRNO(ENOENT),
"'%s' does not exist or is not a socket, are sshd and systemd-ssh-generator installed and enabled in the container?",
path);
return process_unix(path);
}
if (!streq(p.class, "vm"))
return log_error_errno(SYNTHETIC_ERRNO(EMEDIUMTYPE), "Don't know how to SSH into machine %s with class '%s'.", machine, p.class);
if (p.cid == VMADDR_CID_ANY)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Machine %s has no AF_VSOCK CID assigned.", machine); return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Machine %s has no AF_VSOCK CID assigned.", machine);
return process_vsock_cid(cid, port); return process_vsock_cid(p.cid, port);
} }
static char *startswith_sep(const char *s, const char *prefix) { static char *startswith_sep(const char *s, const char *prefix) {


@ -6,6 +6,7 @@
import os import os
import re import re
import sys import sys
from pathlib import Path
from typing import IO from typing import IO
@ -161,7 +162,10 @@ for dirpath, _, filenames in sorted(os.walk(sys.argv[2])):
for filename in sorted(filenames): for filename in sorted(filenames):
if not filename.endswith('.c') and not filename.endswith('.h'): if not filename.endswith('.c') and not filename.endswith('.h'):
continue continue
with open(os.path.join(dirpath, filename), 'r') as f: p = Path(dirpath) / filename
if p.is_symlink():
continue
with p.open('rt') as f:
process_source_file(f) process_source_file(f)
print(""" {} print(""" {}


@ -149,6 +149,7 @@ simple_tests += files(
'test-mkdir.c', 'test-mkdir.c',
'test-modhex.c', 'test-modhex.c',
'test-mountpoint-util.c', 'test-mountpoint-util.c',
'test-mstack.c',
'test-namespace-util.c', 'test-namespace-util.c',
'test-net-naming-scheme.c', 'test-net-naming-scheme.c',
'test-notify-recv.c', 'test-notify-recv.c',

src/test/test-mstack.c Normal file

@ -0,0 +1,132 @@
/* SPDX-License-Identifier: LGPL-2.1-or-later */
#include <sys/mount.h>
#include <sys/stat.h>
#include "capability-util.h"
#include "errno-util.h"
#include "fd-util.h"
#include "fs-util.h"
#include "mountpoint-util.h"
#include "mstack.h"
#include "path-util.h"
#include "process-util.h"
#include "rm-rf.h"
#include "tests.h"
#include "tmpfile-util.h"
#include "virt.h"
static bool overlayfs_set_fd_lowerdir_plus_supported(void) {
int r;
_cleanup_close_ int sb_fd = fsopen("overlay", FSOPEN_CLOEXEC);
if (sb_fd < 0 && (ERRNO_IS_NOT_SUPPORTED(errno) || errno == ENODEV))
return false;
ASSERT_OK_ERRNO(sb_fd);
_cleanup_close_ int layer_fd = open("/", O_DIRECTORY|O_CLOEXEC);
ASSERT_OK_ERRNO(layer_fd);
r = RET_NERRNO(fsconfig(sb_fd, FSCONFIG_SET_FD, "lowerdir+", /* value= */ NULL, layer_fd));
if (r < 0 && (ERRNO_IS_NEG_NOT_SUPPORTED(r) || r == -EINVAL))
return false;
ASSERT_OK_ERRNO(r);
return true;
}
TEST(mstack) {
_cleanup_(rm_rf_physical_and_freep) char *t = NULL;
_cleanup_close_ int tfd = -EBADF;
int r;
tfd = mkdtemp_open("/tmp/mstack-what-XXXXXX", O_PATH, &t);
ASSERT_OK(tfd);
ASSERT_OK_ERRNO(mkdirat(tfd, "rw", 0755));
ASSERT_OK_ERRNO(mkdirat(tfd, "rw/data", 0755));
ASSERT_OK_ERRNO(mkdirat(tfd, "rw/data/check1", 0755));
ASSERT_OK_ERRNO(mkdirat(tfd, "layer@0", 0755));
ASSERT_OK_ERRNO(mkdirat(tfd, "layer@0/check2", 0755));
ASSERT_OK_ERRNO(mkdirat(tfd, "layer@0/zzz", 0755));
ASSERT_OK_ERRNO(mkdirat(tfd, "layer@1", 0755));
ASSERT_OK_ERRNO(mkdirat(tfd, "layer@1/check3", 0755));
ASSERT_OK_ERRNO(mkdirat(tfd, "layer@0/yyy", 0755));
ASSERT_OK_ERRNO(mkdirat(tfd, "bind@zzz", 0755));
ASSERT_OK_ERRNO(mkdirat(tfd, "bind@zzz/check4", 0755));
ASSERT_OK_ERRNO(mkdirat(tfd, "robind@yyy", 0755));
ASSERT_OK_ERRNO(mkdirat(tfd, "robind@yyy/check5", 0755));
_cleanup_(mstack_freep) MStack *mstack = NULL;
ASSERT_OK(mstack_load(t, tfd, &mstack));
ASSERT_OK_ZERO(mstack_is_read_only(mstack));
ASSERT_OK_ZERO(mstack_is_foreign_uid_owned(mstack));
if (!have_effective_cap(CAP_SYS_ADMIN))
return (void) log_tests_skipped("not attaching mstack, lacking privs");
if (!mount_new_api_supported())
return (void) log_tests_skipped("kernel does not support new mount API, skipping mstack attachment test.");
if (!overlayfs_set_fd_lowerdir_plus_supported())
return (void) log_tests_skipped("overlayfs does not support FSCONFIG_SET_FD with lowerdir+, skipping mstack attachment test.");
if (running_in_chroot() > 0) /* we cannot disable mount prop if we are in a chroot without the root inode being a proper mount point */
return (void) log_tests_skipped("running in chroot(), skipping mstack attachment test.");
mstack = mstack_free(mstack);
/* Fork with a new mountns */
r = pidref_safe_fork("(mstack-test)", FORK_DEATHSIG_SIGTERM|FORK_LOG|FORK_WAIT|FORK_NEW_MOUNTNS|FORK_MOUNTNS_SLAVE, /* ret= */ NULL);
ASSERT_OK(r);
if (r == 0) {
MStackFlags flags = 0;
/* Close the original temporary fd, it still points to an inode of the original mountns,
* which we cannot use to generate mounts from */
tfd = safe_close(tfd);
{
ASSERT_OK(mstack_load(t, -EBADF, &mstack));
ASSERT_OK(mstack_open_images(
mstack,
/* mountfsd_link= */ NULL,
/* userns_fd= */ -EBADF,
/* image_policy= */ NULL,
/* image_filter= */ NULL,
flags));
_cleanup_(rmdir_and_freep) char *m = NULL;
ASSERT_OK(mkdtemp_malloc("/tmp/mstack-temporary-XXXXXX", &m));
ASSERT_OK(mstack_make_mounts(mstack, m, flags));
_cleanup_(rmdir_and_freep) char *w = NULL;
ASSERT_OK(mkdtemp_malloc("/tmp/mstack-where-XXXXXX", &w));
_cleanup_close_ int rfd = -EBADF;
ASSERT_OK(mstack_bind_mounts(mstack, w, /* where_fd= */ -EBADF, flags, &rfd));
_cleanup_close_ int ofd = open(w, O_PATH|O_CLOEXEC);
ASSERT_OK_ERRNO(ofd);
ASSERT_OK_ERRNO(faccessat(ofd, "check1", F_OK, AT_SYMLINK_NOFOLLOW));
ASSERT_OK_ERRNO(faccessat(ofd, "check2/", F_OK, AT_SYMLINK_NOFOLLOW));
ASSERT_OK_ERRNO(faccessat(ofd, "check3/", F_OK, AT_SYMLINK_NOFOLLOW));
ASSERT_OK_ERRNO(faccessat(ofd, "zzz/check4/", F_OK, AT_SYMLINK_NOFOLLOW));
ASSERT_OK_ERRNO(faccessat(ofd, "yyy/check5/", F_OK, AT_SYMLINK_NOFOLLOW));
_cleanup_free_ char *j = ASSERT_PTR(path_join(w, "zzz"));
ASSERT_OK_ERRNO(umount2(j, MNT_DETACH));
_cleanup_free_ char *jj = ASSERT_PTR(path_join(w, "yyy"));
ASSERT_OK_ERRNO(umount2(jj, MNT_DETACH));
ASSERT_OK_ERRNO(umount2(w, MNT_DETACH));
}
mstack = mstack_free(mstack);
_exit(EXIT_SUCCESS);
}
}
DEFINE_TEST_MAIN(LOG_INFO);

@@ -198,7 +198,6 @@ TEST(protect_kernel_logs) {
         static const NamespaceParameters p = {
                 .runtime_scope = RUNTIME_SCOPE_SYSTEM,
                 .protect_kernel_logs = true,
-                .root_directory_fd = -EBADF,
         };
         int r;

@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: LGPL-2.1-or-later */
+#include <fcntl.h>
 #include <stdlib.h>
 #include <unistd.h>
@@ -77,11 +78,20 @@ int main(int argc, char *argv[]) {
         else
                 log_info("Not chrooted");
+        _cleanup_(pinned_resource_done) PinnedResource pr = PINNED_RESOURCE_NULL;
+        if (root_directory) {
+                pr.directory_fd = open(root_directory, O_PATH|O_CLOEXEC|O_DIRECTORY);
+                assert_se(pr.directory_fd >= 0);
+                pr.directory = strdup(root_directory);
+                assert_se(pr.directory);
+        }
         NamespaceParameters p = {
                 .runtime_scope = RUNTIME_SCOPE_SYSTEM,
-                .root_directory = root_directory,
-                .root_directory_fd = -EBADF,
+                .rootfs = &pr,
                 .read_write_paths = (char**) writable,
                 .read_only_paths = (char**) readonly,

@@ -2,6 +2,8 @@
 #include <net/if.h>
+#include "sd-varlink.h"
 #include "errno-util.h"
 #include "fd-util.h"
 #include "namespace-util.h"
@@ -16,13 +18,18 @@ TEST(delegatetap) {
                 return (void) log_tests_skipped_errno(userns_fd, "User namespaces not available");
         ASSERT_OK(userns_fd);
-        r = nsresource_register_userns("foobar", userns_fd);
+        _cleanup_(sd_varlink_unrefp) sd_varlink *link = NULL;
+        r = nsresource_connect(&link);
         if (ERRNO_IS_NEG_DISCONNECT(r) || r == -ENOENT || ERRNO_IS_NEG_NOT_SUPPORTED(r))
                 return (void) log_tests_skipped_errno(r, "systemd-nsresourced cannot be reached");
+        r = nsresource_register_userns(link, "foobar", userns_fd);
+        if (ERRNO_IS_NEG_NOT_SUPPORTED(r))
+                return (void) log_tests_skipped_errno(r, "systemd-nsresourced does not work");
         ASSERT_OK(r);
         _cleanup_free_ char *ifname = NULL;
-        _cleanup_close_ int tap_fd = nsresource_add_netif_tap(userns_fd, &ifname);
+        _cleanup_close_ int tap_fd = nsresource_add_netif_tap(link, userns_fd, &ifname);
         if (ERRNO_IS_NEG_NOT_SUPPORTED(tap_fd))
                 return (void) log_tests_skipped_errno(tap_fd, "tap device support not available");
         ASSERT_OK(tap_fd);

@@ -12,6 +12,7 @@
 #include "sd-daemon.h"
 #include "sd-event.h"
 #include "sd-id128.h"
+#include "sd-varlink.h"
 #include "alloc-util.h"
 #include "architecture.h"
@@ -2041,11 +2042,16 @@ static int run_virtual_machine(int kvm_device_fd, int vhost_device_fd) {
                 if (asprintf(&userns_name, "vmspawn-" PID_FMT "-%s", getpid_cached(), arg_machine) < 0)
                         return log_oom();
-                r = nsresource_register_userns(userns_name, delegate_userns_fd);
+                _cleanup_(sd_varlink_unrefp) sd_varlink *nsresource_link = NULL;
+                r = nsresource_connect(&nsresource_link);
+                if (r < 0)
+                        return log_error_errno(r, "Failed to connect to nsresourced: %m");
+                r = nsresource_register_userns(nsresource_link, userns_name, delegate_userns_fd);
                 if (r < 0)
                         return log_error_errno(r, "Failed to register user namespace with systemd-nsresourced: %m");
-                tap_fd = nsresource_add_netif_tap(delegate_userns_fd, /* ret_host_ifname= */ NULL);
+                tap_fd = nsresource_add_netif_tap(nsresource_link, delegate_userns_fd, /* ret_host_ifname= */ NULL);
                 if (tap_fd < 0)
                         return log_error_errno(tap_fd, "Failed to allocate network tap device: %m");

@@ -88,14 +88,14 @@ testcase_multiple_features() {
         -p PrivateTmp=yes \
         -p PrivateDevices=yes \
         -p PrivateNetwork=yes \
-        -p PrivateUsersEx=self \
+        -p PrivateUsers=self \
         -p PrivateIPC=yes \
         -p ProtectHostname=yes \
         -p ProtectClock=yes \
         -p ProtectKernelTunables=yes \
         -p ProtectKernelModules=yes \
         -p ProtectKernelLogs=yes \
-        -p ProtectControlGroupsEx=private \
+        -p ProtectControlGroups=private \
         -p LockPersonality=yes \
         -p Environment=ABC=QED \
         -p DelegateNamespaces=yes \

@ -0,0 +1,145 @@
#!/usr/bin/env bash
# SPDX-License-Identifier: LGPL-2.1-or-later
# shellcheck disable=SC2016
set -eux
set -o pipefail
# shellcheck source=test/units/util.sh
. "$(dirname "$0")"/util.sh
if ! can_do_rootless_nspawn; then
echo "Skipping unpriv nspawn test"
exit 0
fi
# We need FSCONFIG_SET_FD support in overlayfs for .mstack to work. Let's skip
# this test on old kernels, that didn't have that yet. Ideally we'd check for
# the feature itself here, but I couldn't figure out a nice way to detect
# support for this from shell, hence let's do a version check instead.
if systemd-analyze condition 'ConditionVersion= < 6.13' ; then
echo "Kernel too old for FSCONFIG_SET_FD support on overlayfs, skipping pull-oci test."
exit 0
fi
export SYSTEMD_LOG_LEVEL=debug
export SYSTEMD_LOG_TARGET=journal
at_exit() {
rm -rf /var/tmp/pull-oci-test
rm -rf /home/testuser/.local/state/machines/ocibasic
rm -rf /home/testuser/.local/state/machines/ocilayer
}
trap at_exit EXIT
# Install a PK rule that allows 'testuser' user to register a machine even
# though they are not on an fg console, just for testing
mkdir -p /etc/polkit-1/rules.d
cat >/etc/polkit-1/rules.d/registermachinetest.rules <<'EOF'
polkit.addRule(function(action, subject) {
if (action.id == "org.freedesktop.machine1.register-machine" &&
subject.user == "testuser") {
return polkit.Result.YES;
}
});
EOF
run0 -u testuser mkdir -p .local/state/machines
create_dummy_container /home/testuser/.local/state/machines/ocibasic
cat >/home/testuser/.local/state/machines/ocibasic/sbin/init <<EOF
#!/usr/bin/env bash
cat /etc/waldo
EOF
chmod +x /home/testuser/.local/state/machines/ocibasic/sbin/init
systemd-dissect --shift /home/testuser/.local/state/machines/ocibasic foreign
run0 -u testuser mkdir -p .local/state/machines/ocilayer/etc
cat >/home/testuser/.local/state/machines/ocilayer/etc/waldo <<EOF
luftikus
EOF
systemd-dissect --shift /home/testuser/.local/state/machines/ocilayer foreign
loginctl enable-linger testuser
mkdir -p /var/tmp/pull-oci-test
run0 --pipe -u testuser importctl -m --user export-tar --format=gzip ocibasic - >/var/tmp/pull-oci-test/ocibasic.tar.gz
run0 --pipe -u testuser importctl -m --user export-tar --format=gzip ocilayer - >/var/tmp/pull-oci-test/ocilayer.tar.gz
OCIBASIC_SHA256="$(sha256sum /var/tmp/pull-oci-test/ocibasic.tar.gz | cut -d' ' -f1)"
OCIBASIC_SIZE="$(stat -c %s /var/tmp/pull-oci-test/ocibasic.tar.gz)"
OCILAYER_SHA256="$(sha256sum /var/tmp/pull-oci-test/ocilayer.tar.gz | cut -d' ' -f1)"
OCILAYER_SIZE="$(stat -c %s /var/tmp/pull-oci-test/ocilayer.tar.gz)"
# Let's now put together a simple, fake, static OCI registry that sits on
# file:// rather than https://, so that we don't have to spawn an HTTP
# server. After all we don't want to test the server side code, but only the
# client side code, and libcurl nicely abstracts https:// or ftp:// from us.
mkdir -p /var/tmp/pull-oci-test/v2/ocicombo/manifests
cat >/var/tmp/pull-oci-test/v2/ocicombo/manifests/latest <<EOF
{
"schemaVersion":2,
"mediaType":"application/vnd.oci.image.manifest.v1+json",
"layers":[
{
"mediaType" : "application/vnd.oci.image.layer.v1.tar+gzip",
"digest" : "sha256:$OCIBASIC_SHA256",
"size" : $OCIBASIC_SIZE
},
{
"mediaType" : "application/vnd.oci.image.layer.v1.tar+gzip",
"digest" : "sha256:$OCILAYER_SHA256",
"size" : $OCILAYER_SIZE
}
]
}
EOF
cat /var/tmp/pull-oci-test/v2/ocicombo/manifests/latest
jq < /var/tmp/pull-oci-test/v2/ocicombo/manifests/latest
cat > /usr/lib/systemd/oci-registry/registry.localfile.oci-registry <<EOF
{
"defaultProtocol" : "file",
"overrideRegistry" : "/var/tmp/pull-oci-test"
}
EOF
cat /usr/lib/systemd/oci-registry/registry.localfile.oci-registry
jq < /usr/lib/systemd/oci-registry/registry.localfile.oci-registry
mkdir /var/tmp/pull-oci-test/v2/ocicombo/blobs
ln -s /var/tmp/pull-oci-test/ocibasic.tar.gz /var/tmp/pull-oci-test/v2/ocicombo/blobs/sha256:"$OCIBASIC_SHA256"
ln -s /var/tmp/pull-oci-test/ocilayer.tar.gz /var/tmp/pull-oci-test/v2/ocicombo/blobs/sha256:"$OCILAYER_SHA256"
run0 -u testuser importctl -m --user pull-oci localfile/ocicombo:latest
ls -alR /home/testuser/.local/state/machines/ocicombo.mstack
run0 -u testuser systemd-mstack /home/testuser/.local/state/machines/ocicombo.mstack
systemd-mstack -M --read-only /home/testuser/.local/state/machines/ocicombo.mstack /tmp/ooo
test "$(cat /tmp/ooo/etc/waldo)" = "luftikus"
systemd-mstack -U /tmp/ooo
ls -alR /home/testuser/.local/state/machines/ocicombo.mstack
run0 -u testuser importctl list-images --user | grep -q ocicombo
ls -alR /home/testuser/.local/state/machines/ocicombo.mstack
run0 -u testuser systemd-nspawn -q --pipe -M ocicombo /sbin/init | grep -q luftikus
run0 -u testuser --pipe systemd-run -q --unit=fimpel --user -p PrivateUsers=managed -p RootMStack=/home/testuser/.local/state/machines/ocicombo.mstack --pipe /sbin/init | grep -q luftikus
run0 -u testuser machinectl list-images -a --user
run0 -u testuser machinectl --user remove ocibasic
run0 -u testuser machinectl --user remove ocilayer
run0 -u testuser machinectl --user remove ocicombo
run0 -u testuser machinectl --user remove .oci-sha256:"$OCIBASIC_SHA256"
run0 -u testuser machinectl --user remove .oci-sha256:"$OCILAYER_SHA256"
loginctl disable-linger testuser
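The manifest and blob layout that the test script above assembles by hand can be sketched in Python. This is only an illustration and not part of the commit: the helper names `layer_descriptor`, `build_manifest` and `blob_path` are hypothetical, while the media types and the `v2/<repository>/blobs/<digest>` layout are the ones the script itself uses.

```python
import hashlib
import json

# Media types copied from the heredoc in the test script.
LAYER_MEDIA_TYPE = "application/vnd.oci.image.layer.v1.tar+gzip"
MANIFEST_MEDIA_TYPE = "application/vnd.oci.image.manifest.v1+json"


def layer_descriptor(blob: bytes) -> dict:
    """Describe one layer blob by digest and size (hypothetical helper)."""
    return {
        "mediaType": LAYER_MEDIA_TYPE,
        # Same value the script derives with sha256sum.
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        # Same value the script derives with stat -c %s.
        "size": len(blob),
    }


def build_manifest(blobs: list[bytes]) -> str:
    """Return a JSON manifest listing the blobs in the order given."""
    manifest = {
        "schemaVersion": 2,
        "mediaType": MANIFEST_MEDIA_TYPE,
        "layers": [layer_descriptor(blob) for blob in blobs],
    }
    return json.dumps(manifest, indent=2)


def blob_path(registry_root: str, repository: str, digest: str) -> str:
    """Path under which the script symlinks a blob in its file-based registry."""
    return f"{registry_root}/v2/{repository}/blobs/{digest}"
```

Passing the contents of ocibasic.tar.gz and ocilayer.tar.gz, in that order, would yield a manifest of the same shape as the one written to manifests/latest.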

@@ -20,6 +20,7 @@
 # ELF dynamic relocations at runtime.
 # pylint: disable=attribute-defined-outside-init
+# mypy: untyped-calls-exclude=elftools
 import argparse
 import hashlib
@@ -39,19 +40,8 @@ from ctypes import (
     sizeof,
 )
-from elftools.elf.constants import SH_FLAGS
-from elftools.elf.elffile import ELFFile
-from elftools.elf.enums import (
-    ENUM_DT_FLAGS_1,
-    ENUM_RELOC_TYPE_AARCH64,
-    ENUM_RELOC_TYPE_ARM,
-    ENUM_RELOC_TYPE_i386,
-    ENUM_RELOC_TYPE_x64,
-)
-from elftools.elf.relocation import (
-    Relocation as ElfRelocation,
-    RelocationTable as ElfRelocationTable,
-)
+from elftools import elf
+from elftools.elf import elffile
 class PeCoffHeader(LittleEndianStructure):
@@ -81,7 +71,7 @@ class PeRelocationBlock(LittleEndianStructure):
     def __init__(self, PageRVA: int):
         super().__init__(PageRVA)
-        self.entries: typing.List[PeRelocationEntry] = []
+        self.entries: list[PeRelocationEntry] = []
 class PeRelocationEntry(LittleEndianStructure):
@@ -193,7 +183,7 @@ class PeSection(LittleEndianStructure):
         ("Characteristics", c_uint32),
     )
-    def __init__(self):
+    def __init__(self) -> None:
         super().__init__()
         self.data = bytearray()
@@ -248,7 +238,7 @@ def align_down(x: int, align: int) -> int:
     return x & ~(align - 1)
-def next_section_address(sections: typing.List[PeSection]) -> int:
+def next_section_address(sections: list[PeSection]) -> int:
     return align_to(sections[-1].VirtualAddress + sections[-1].VirtualSize,
                     SECTION_ALIGNMENT)
@@ -257,7 +247,7 @@ class BadSectionError(ValueError):
     "One of the sections is in a bad state"
-def iter_copy_sections(elf: ELFFile) -> typing.Iterator[PeSection]:
+def iter_copy_sections(file: elffile.ELFFile) -> typing.Iterator[PeSection]:
     pe_s = None
     # This is essentially the same as copying by ELF load segments, except that we assemble them
@@ -265,16 +255,16 @@ def iter_copy_sections(elf: ELFFile) -> typing.Iterator[PeSection]:
     # about so that there are no surprises.
     relro = None
-    for elf_seg in elf.iter_segments():
+    for elf_seg in file.iter_segments():
         if elf_seg["p_type"] == "PT_LOAD" and elf_seg["p_align"] != SECTION_ALIGNMENT:
             raise BadSectionError(f"ELF segment {elf_seg['p_type']} is not properly aligned"
                                   f" ({elf_seg['p_align']} != {SECTION_ALIGNMENT})")
         if elf_seg["p_type"] == "PT_GNU_RELRO":
             relro = elf_seg
-    for elf_s in elf.iter_sections():
+    for elf_s in file.iter_sections():
         if (
-            elf_s["sh_flags"] & SH_FLAGS.SHF_ALLOC == 0
+            elf_s["sh_flags"] & elf.constants.SH_FLAGS.SHF_ALLOC == 0
             or elf_s["sh_type"] in IGNORE_SECTION_TYPES
             or elf_s.name in IGNORE_SECTIONS
             or elf_s["sh_size"] == 0
@@ -286,9 +276,9 @@ def iter_copy_sections(elf: ELFFile) -> typing.Iterator[PeSection]:
             # FIXME: figure out why those sections are inserted
             print("WARNING: Non-empty .got section", file=sys.stderr)
-        if elf_s["sh_flags"] & SH_FLAGS.SHF_EXECINSTR:
+        if elf_s["sh_flags"] & elf.constants.SH_FLAGS.SHF_EXECINSTR:
             rwx = PE_CHARACTERISTICS_RX
-        elif elf_s["sh_flags"] & SH_FLAGS.SHF_WRITE:
+        elif elf_s["sh_flags"] & elf.constants.SH_FLAGS.SHF_WRITE:
             rwx = PE_CHARACTERISTICS_RW
         else:
             rwx = PE_CHARACTERISTICS_R
@@ -315,11 +305,14 @@ def iter_copy_sections(elf: ELFFile) -> typing.Iterator[PeSection]:
         yield pe_s
-def convert_sections(elf: ELFFile, opt: PeOptionalHeader) -> typing.List[PeSection]:
+def convert_sections(
+    file: elffile.ELFFile,
+    opt: PeOptionalHeader,
+) -> list[PeSection]:
     last_vma = (0, 0)
     sections = []
-    for pe_s in iter_copy_sections(elf):
+    for pe_s in iter_copy_sections(file):
         # Truncate the VMA to the nearest page and insert appropriate padding. This should not
         # cause any overlap as this is pretty much how ELF *segments* are loaded/mmapped anyways.
         # The ELF sections inside should also be properly aligned as we reuse the ELF VMA layout
@@ -357,18 +350,18 @@ def convert_sections(elf: ELFFile, opt: PeOptionalHeader) -> typing.List[PeSecti
 def copy_sections(
-    elf: ELFFile,
+    file: elffile.ELFFile,
     opt: PeOptionalHeader,
     input_names: str,
-    sections: typing.List[PeSection],
-):
+    sections: list[PeSection],
+) -> None:
     for name in input_names.split(","):
-        elf_s = elf.get_section_by_name(name)
+        elf_s = file.get_section_by_name(name)
         if not elf_s:
             continue
         if elf_s.data_alignment > 1 and SECTION_ALIGNMENT % elf_s.data_alignment != 0:
             raise BadSectionError(f"ELF section {name} is not aligned")
-        if elf_s["sh_flags"] & (SH_FLAGS.SHF_EXECINSTR | SH_FLAGS.SHF_WRITE) != 0:
+        if elf_s["sh_flags"] & (elf.constants.SH_FLAGS.SHF_EXECINSTR | elf.constants.SH_FLAGS.SHF_WRITE) != 0:
             raise BadSectionError(f"ELF section {name} is not read-only data")
         pe_s = PeSection()
@@ -383,11 +376,11 @@ def copy_sections(
 def apply_elf_relative_relocation(
-    reloc: ElfRelocation,
+    reloc: elf.relocation.Relocation,
     image_base: int,
-    sections: typing.List[PeSection],
+    sections: list[PeSection],
     addend_size: int,
-):
+) -> None:
     [target] = [pe_s for pe_s in sections
                 if pe_s.VirtualAddress <= reloc["r_offset"] < pe_s.VirtualAddress + len(pe_s.data)]
@@ -404,29 +397,29 @@ def apply_elf_relative_relocation(
 def convert_elf_reloc_table(
-    elf: ELFFile,
-    elf_reloc_table: ElfRelocationTable,
+    file: elffile.ELFFile,
+    elf_reloc_table: elf.relocation.RelocationTable,
     elf_image_base: int,
-    sections: typing.List[PeSection],
-    pe_reloc_blocks: typing.Dict[int, PeRelocationBlock],
-):
+    sections: list[PeSection],
+    pe_reloc_blocks: dict[int, PeRelocationBlock],
+) -> None:
     NONE_RELOC = {
-        "EM_386": ENUM_RELOC_TYPE_i386["R_386_NONE"],
-        "EM_AARCH64": ENUM_RELOC_TYPE_AARCH64["R_AARCH64_NONE"],
-        "EM_ARM": ENUM_RELOC_TYPE_ARM["R_ARM_NONE"],
+        "EM_386": elf.enums.ENUM_RELOC_TYPE_i386["R_386_NONE"],
+        "EM_AARCH64": elf.enums.ENUM_RELOC_TYPE_AARCH64["R_AARCH64_NONE"],
+        "EM_ARM": elf.enums.ENUM_RELOC_TYPE_ARM["R_ARM_NONE"],
         "EM_LOONGARCH": 0,
         "EM_RISCV": 0,
-        "EM_X86_64": ENUM_RELOC_TYPE_x64["R_X86_64_NONE"],
-    }[elf["e_machine"]]
+        "EM_X86_64": elf.enums.ENUM_RELOC_TYPE_x64["R_X86_64_NONE"],
+    }[file["e_machine"]]
     RELATIVE_RELOC = {
-        "EM_386": ENUM_RELOC_TYPE_i386["R_386_RELATIVE"],
-        "EM_AARCH64": ENUM_RELOC_TYPE_AARCH64["R_AARCH64_RELATIVE"],
-        "EM_ARM": ENUM_RELOC_TYPE_ARM["R_ARM_RELATIVE"],
+        "EM_386": elf.enums.ENUM_RELOC_TYPE_i386["R_386_RELATIVE"],
+        "EM_AARCH64": elf.enums.ENUM_RELOC_TYPE_AARCH64["R_AARCH64_RELATIVE"],
+        "EM_ARM": elf.enums.ENUM_RELOC_TYPE_ARM["R_ARM_RELATIVE"],
         "EM_LOONGARCH": 3,
         "EM_RISCV": 3,
-        "EM_X86_64": ENUM_RELOC_TYPE_x64["R_X86_64_RELATIVE"],
-    }[elf["e_machine"]]
+        "EM_X86_64": elf.enums.ENUM_RELOC_TYPE_x64["R_X86_64_RELATIVE"],
+    }[file["e_machine"]]
     for reloc in elf_reloc_table.iter_relocations():
         if reloc["r_info_type"] == NONE_RELOC:
@@ -436,7 +429,7 @@ def convert_elf_reloc_table(
             apply_elf_relative_relocation(reloc,
                                           elf_image_base,
                                           sections,
-                                          elf.elfclass // 8)
+                                          file.elfclass // 8)
             # Now that the ELF relocation has been applied, we can create a PE relocation.
             block_rva = reloc["r_offset"] & ~0xFFF
@@ -446,7 +439,7 @@ def convert_elf_reloc_table(
             entry = PeRelocationEntry()
             entry.Offset = reloc["r_offset"] & 0xFFF
             # REL_BASED_HIGHLOW or REL_BASED_DIR64
-            entry.Type = 3 if elf.elfclass == 32 else 10
+            entry.Type = 3 if file.elfclass == 32 else 10
             pe_reloc_blocks[block_rva].entries.append(entry)
             continue
@@ -455,21 +448,21 @@ def convert_elf_reloc_table(
 def convert_elf_relocations(
-    elf: ELFFile,
+    file: elffile.ELFFile,
     opt: PeOptionalHeader,
-    sections: typing.List[PeSection],
+    sections: list[PeSection],
     minimum_sections: int,
 ) -> typing.Optional[PeSection]:
-    dynamic = elf.get_section_by_name(".dynamic")
+    dynamic = file.get_section_by_name(".dynamic")
     if dynamic is None:
         raise BadSectionError("ELF .dynamic section is missing")
     [flags_tag] = dynamic.iter_tags("DT_FLAGS_1")
-    if not flags_tag["d_val"] & ENUM_DT_FLAGS_1["DF_1_PIE"]:
+    if not flags_tag["d_val"] & elf.enums.ENUM_DT_FLAGS_1["DF_1_PIE"]:
         raise ValueError("ELF file is not a PIE")
     # This checks that the ELF image base is 0.
-    symtab = elf.get_section_by_name(".symtab")
+    symtab = file.get_section_by_name(".symtab")
     if symtab:
         exe_start = symtab.get_symbol_by_name("__executable_start")
         if exe_start and exe_start[0]["st_value"] != 0:
@@ -492,16 +485,16 @@ def convert_elf_relocations(
         segment_offset = align_to(opt.SizeOfHeaders - sections[0].VirtualAddress,
                                   SECTION_ALIGNMENT)
-    opt.AddressOfEntryPoint = elf["e_entry"] + segment_offset
+    opt.AddressOfEntryPoint = file["e_entry"] + segment_offset
     opt.BaseOfCode += segment_offset
     if isinstance(opt, PeOptionalHeader32):
         opt.BaseOfData += segment_offset
-    pe_reloc_blocks: typing.Dict[int, PeRelocationBlock] = {}
+    pe_reloc_blocks: dict[int, PeRelocationBlock] = {}
     for reloc_type, reloc_table in dynamic.get_relocation_tables().items():
         if reloc_type not in ["REL", "RELA"]:
             raise BadSectionError(f"Unsupported relocation type {reloc_type}")
-        convert_elf_reloc_table(elf,
+        convert_elf_reloc_table(file,
                                 reloc_table,
                                 opt.ImageBase + segment_offset,
                                 sections,
@@ -545,11 +538,11 @@ def convert_elf_relocations(
 def write_pe(
-    file,
+    file: typing.IO[bytes],
     coff: PeCoffHeader,
     opt: PeOptionalHeader,
-    sections: typing.List[PeSection],
-):
+    sections: list[PeSection],
+) -> None:
     file.write(b"MZ")
     file.seek(0x3C, io.SEEK_SET)
     file.write(PE_OFFSET.to_bytes(2, byteorder="little"))
@@ -577,38 +570,38 @@ def write_pe(
     file.truncate(offset)
-def elf2efi(args: argparse.Namespace):
-    elf = ELFFile(args.ELF)
-    if not elf.little_endian:
+def elf2efi(args: argparse.Namespace) -> None:
+    file = elffile.ELFFile(args.ELF)
+    if not file.little_endian:
         raise ValueError("ELF file is not little-endian")
-    if elf["e_type"] not in ["ET_DYN", "ET_EXEC"]:
-        raise ValueError(f"Unsupported ELF type {elf['e_type']}")
+    if file["e_type"] not in ["ET_DYN", "ET_EXEC"]:
+        raise ValueError(f"Unsupported ELF type {file['e_type']}")
     pe_arch = {
         "EM_386": 0x014C,
         "EM_AARCH64": 0xAA64,
         "EM_ARM": 0x01C2,
-        "EM_LOONGARCH": 0x6232 if elf.elfclass == 32 else 0x6264,
-        "EM_RISCV": 0x5032 if elf.elfclass == 32 else 0x5064,
+        "EM_LOONGARCH": 0x6232 if file.elfclass == 32 else 0x6264,
+        "EM_RISCV": 0x5032 if file.elfclass == 32 else 0x5064,
         "EM_X86_64": 0x8664,
-    }.get(elf["e_machine"])
+    }.get(file["e_machine"])
     if pe_arch is None:
-        raise ValueError(f"Unsupported ELF architecture {elf['e_machine']}")
+        raise ValueError(f"Unsupported ELF architecture {file['e_machine']}")
     coff = PeCoffHeader()
-    opt = PeOptionalHeader32() if elf.elfclass == 32 else PeOptionalHeader32Plus()
+    opt = PeOptionalHeader32() if file.elfclass == 32 else PeOptionalHeader32Plus()
     # We relocate to a unique image base to reduce the chances for runtime relocation to occur.
     base_name = pathlib.Path(args.PE.name).name.encode()
     opt.ImageBase = int(hashlib.sha1(base_name).hexdigest()[0:8], 16)
-    if elf.elfclass == 32:
+    if file.elfclass == 32:
         opt.ImageBase = (0x400000 + opt.ImageBase) & 0xFFFF0000
     else:
         opt.ImageBase = (0x100000000 + opt.ImageBase) & 0x1FFFF0000
-    sections = convert_sections(elf, opt)
-    copy_sections(elf, opt, args.copy_sections, sections)
-    pe_reloc_s = convert_elf_relocations(elf, opt, sections, args.minimum_sections)
+    sections = convert_sections(file, opt)
+    copy_sections(file, opt, args.copy_sections, sections)
+    pe_reloc_s = convert_elf_relocations(file, opt, sections, args.minimum_sections)
     coff.Machine = pe_arch
     coff.NumberOfSections = len(sections)
@@ -616,7 +609,7 @@ def elf2efi(args: argparse.Namespace):
     coff.SizeOfOptionalHeader = sizeof(opt)
     # EXECUTABLE_IMAGE|LINE_NUMS_STRIPPED|LOCAL_SYMS_STRIPPED|DEBUG_STRIPPED
     # and (32BIT_MACHINE or LARGE_ADDRESS_AWARE)
-    coff.Characteristics = 0x30E if elf.elfclass == 32 else 0x22E
+    coff.Characteristics = 0x30E if file.elfclass == 32 else 0x22E
     opt.SectionAlignment = SECTION_ALIGNMENT
     opt.FileAlignment = FILE_ALIGNMENT
@@ -625,11 +618,11 @@ def elf2efi(args: argparse.Namespace):
     opt.MajorSubsystemVersion = args.efi_major
     opt.MinorSubsystemVersion = args.efi_minor
     opt.Subsystem = args.subsystem
-    opt.Magic = 0x10B if elf.elfclass == 32 else 0x20B
+    opt.Magic = 0x10B if file.elfclass == 32 else 0x20B
     opt.SizeOfImage = next_section_address(sections)
     # DYNAMIC_BASE|NX_COMPAT|HIGH_ENTROPY_VA or DYNAMIC_BASE|NX_COMPAT
-    opt.DllCharacteristics = 0x160 if elf.elfclass == 64 else 0x140
+    opt.DllCharacteristics = 0x160 if file.elfclass == 64 else 0x140
     # These values are taken from a natively built PE binary (although, unused by EDK2/EFI).
     opt.SizeOfStackReserve = 0x100000
@@ -703,7 +696,7 @@ def create_parser() -> argparse.ArgumentParser:
     return parser
-def main():
+def main() -> None:
     parser = create_parser()
     elf2efi(parser.parse_args())
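Two pieces of arithmetic visible in the hunks above, the split of a relocation target into a 4 KiB block plus offset and the derivation of the image base from the output file name, can be tried out in isolation. The following is a minimal sketch under stated assumptions rather than the tool's own code: `split_rva`, `group_relocations` and `derive_image_base` are hypothetical names, and plain dicts and tuples stand in for `PeRelocationBlock` and `PeRelocationEntry`.

```python
import hashlib

PAGE_MASK = 0xFFF  # each PE base relocation block covers one 4 KiB page


def split_rva(r_offset: int) -> tuple[int, int]:
    """Split a relocation target into (block page RVA, offset within the page)."""
    return r_offset & ~PAGE_MASK, r_offset & PAGE_MASK


def group_relocations(r_offsets: list[int], elfclass: int) -> dict[int, list[tuple[int, int]]]:
    """Group relocation targets per page as (offset, type) pairs."""
    # 3 is REL_BASED_HIGHLOW (32-bit), 10 is REL_BASED_DIR64 (64-bit).
    reloc_type = 3 if elfclass == 32 else 10
    blocks: dict[int, list[tuple[int, int]]] = {}
    for r_offset in r_offsets:
        page, offset = split_rva(r_offset)
        blocks.setdefault(page, []).append((offset, reloc_type))
    return blocks


def derive_image_base(pe_name: str, elfclass: int) -> int:
    """Derive a name-dependent, 64 KiB-aligned image base as elf2efi() does."""
    # The first 8 hex digits of the SHA-1 of the file name give a 32-bit seed.
    seed = int(hashlib.sha1(pe_name.encode()).hexdigest()[0:8], 16)
    if elfclass == 32:
        return (0x400000 + seed) & 0xFFFF0000
    return (0x100000000 + seed) & 0x1FFFF0000
```

Because the seed depends only on the file name, two binaries with different names will usually get different preferred bases, which is the stated goal of reducing the chance of runtime relocation.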

@ -1,85 +0,0 @@
#!/bin/bash
# SPDX-License-Identifier: LGPL-2.1-or-later
# Usage:
# tools/setup-musl-build.sh <build-directory> <options…>
# E.g.
# tools/setup-musl-build.sh build-musl -Dbuildtype=debugoptimized && ninja -C build-musl
set -eux
BUILD_DIR="${1:?}"
shift
SETUP_DIR="${BUILD_DIR}/extra"
LINKS=(
acl
archive.h
archive_entry.h
asm
asm-generic
audit-records.h
audit_logging.h
bpf
bzlib.h
curl
dwarf.h
elfutils
fido.h
gcrypt.h
gelf.h
gnutls
gpg-error.h
idn2.h
libaudit.h
libcryptsetup.h
libelf.h
libkmod.h
linux
lz4.h
lz4frame.h
lz4hc.h
lzma
lzma.h
microhttpd.h
mtd
openssl
pcre2.h
pwquality.h
qrencode.h
seccomp-syscalls.h
seccomp.h
security
selinux
sys/acl.h
tss2
xen
xkbcommon
zconf.h
zlib.h
zstd.h
zstd_errors.h
)
rm -rf "${SETUP_DIR}"
for t in "${LINKS[@]}"; do
[[ -e /usr/include/"$t" ]]
link="${SETUP_DIR}/usr/include/${t}"
mkdir -p "${link%/*}"
ln -s /usr/include/"$t" "$link"
done
# Use an absolute path so that when we chdir into the build directory,
# the path still works. This is easier than figuring out the relative path.
[[ "${SETUP_DIR}" =~ ^/ ]] || SETUP_DIR="${PWD}/${SETUP_DIR}"
CFLAGS="-idirafter ${SETUP_DIR}/usr/include"
set -x
env \
CC=musl-gcc \
CXX=musl-gcc \
CFLAGS="$CFLAGS" \
CXXFLAGS="$CFLAGS" \
meson setup --reconfigure -Ddbus-interfaces-dir=no -Dlibc=musl "${BUILD_DIR}" "${@}"