mirror of https://github.com/systemd/systemd synced 2025-10-08 21:24:45 +02:00

Compare commits

...

21 Commits

Author SHA1 Message Date
Lennart Poettering
5efbd0bf89
Merge pull request #19371 from poettering/repart-initrd-usr-only
two /sysusr/ changes for repart, split out of #19234
2021-04-20 23:46:17 +02:00
Lennart Poettering
0aa714778a
Merge pull request #19372 from poettering/repart-initrd-usr-begin
fstab-generator: mount.usr= handling changes, split out of #19234
2021-04-20 23:44:49 +02:00
Lennart Poettering
ac02dccabc
Merge pull request #19368 from poettering/loop-seqnum
loop-util: let's try harder to avoid loopback block device recycle issues
2021-04-20 23:43:57 +02:00
Lennart Poettering
3464514457 man: document new initrd-usr-fs.target 2021-04-20 19:11:07 +02:00
Lennart Poettering
632b551ca2 units: change order of settings to match order in other similar unit 2021-04-20 19:11:07 +02:00
Lennart Poettering
8f47e32a3e repart: use /sysusr/ as --root= default in initrd, if mounted 2021-04-20 18:53:15 +02:00
Lennart Poettering
a73b2ad041 repart: try harder to find OS prefix
This teaches repart to look for the root block device both as the
backing for /sysroot and for /sysusr/usr.

The latter is a new addition, and starts making more sense with the next
commit. It's about supporting systems that are shipped with only a /usr/
fs, but where a root fs is allocated and formatted on first boot via
systemd-repart (or a similar tool). In this case it's useful to be able
to mount the ultimate /usr/ early on without mounting the root fs
right away (simply because the rootfs might not exist yet, and we need
the repart data encoded in /usr/ to actually format it). Hence, instead
of requiring that we mount /sysroot/ first and /sysroot/usr/ second as
we did so far, let's rearrange things slightly:

1. We mount the /usr/ file system we discover to /sysusr/usr/
2. We mount the root file system we discover to /sysroot/
3. Once both are established we bind mount /sysusr/usr/ to /sysroot/usr/

And that's it. The first two steps can happen in either order, and we can
access /usr/ with or without a rootfs being around.

This commit implements nothing of the above. Instead, it teaches
systemd-repart to check both /sysroot/ and /sysusr/ for repart drop-ins,
and use the first of these hierarchies it finds populated. This way
systemd-repart can be spawned once /usr is mounted and it will work
correctly without root fs having to exist, or we can invoke it when the
root fs is already mounted, where it also will work correctly.
2021-04-20 18:53:15 +02:00
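The prefix selection described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `struct mount_state` and `pick_os_prefix()` are hypothetical names, and the mount-point checks are passed in as flags rather than queried from the kernel. The order (first /sysroot/, then /sysusr/) follows the find_os_prefix() hunk further down in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the real mount-point probes: the caller reports
 * what is currently mounted instead of this code asking the kernel. */
struct mount_state {
        bool in_initrd;
        bool sysroot_mounted;    /* is /sysroot a mount point? */
        bool sysusr_usr_mounted; /* is /sysusr/usr a mount point? */
};

/* Picks the directory below which the OS tree is found. Stores NULL in *ret
 * if no prefix is needed, and returns -ENOENT if neither hierarchy is mounted. */
static int pick_os_prefix(const struct mount_state *s, const char **ret) {
        assert(s);
        assert(ret);

        if (!s->in_initrd) {
                *ret = NULL; /* on the host, paths are used as they are */
                return 0;
        }

        if (s->sysroot_mounted) {
                *ret = "/sysroot";
                return 0;
        }

        if (s->sysusr_usr_mounted) {
                *ret = "/sysusr";
                return 0;
        }

        return -ENOENT;
}
```

With only /sysusr/usr mounted, the sketch yields "/sysusr", which is the case of a system that ships /usr/ but still has to create its root fs.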
Lennart Poettering
fa138f5e26 fstab-generator: properly order generated mount units before "post" target units
Let's make sure that our mount units are properly ordered before the
"post" target unit even if DefaultDependencies= is used on the target
unit.
2021-04-20 18:26:17 +02:00
Lennart Poettering
e19ae92af6 fstab-generator: extend logging a bit 2021-04-20 18:26:17 +02:00
Lennart Poettering
29a24ab28e fstab-generator: if usr= is specified, mount it to /sysusr/usr/ first
This changes the fstab-generator to handle mounting of /usr/ a bit
differently than before. Instead of immediately mounting the fs to
/sysroot/usr/ we'll first mount it to /sysusr/usr/ and then add a
separate bind mount that mounts it from /sysusr/usr/ to /sysroot/usr/.

This way we can access /usr via the /sysusr/ hierarchy, independently of
the root fs and without waiting for it to be mounted. This is useful for
invoking systemd-repart while a root fs doesn't exist yet and for
creating it, with partition data read from the /usr/ hierarchy.

This introduces a new generic target initrd-usr-fs.target that may be
used to generically order services against /sysusr/ to become available.
2021-04-20 18:26:17 +02:00
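As an illustration of the two-step layout described above, the pair of mounts could be modelled like this. It is a simplified sketch under assumptions: `struct planned_mount` and `plan_usr_mounts()` are hypothetical names, and the special cases handled elsewhere in the generator (gpt-auto, /dev/nfs) are left out. The paths and target names are the ones used in the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical description of one generated mount unit. */
struct planned_mount {
        const char *what;          /* source device or directory */
        const char *where;         /* mount point */
        const char *options;       /* mount options, NULL for defaults */
        const char *before_target; /* target the unit is ordered before */
};

/* Fills in the two mounts and returns how many entries were written:
 * 0 if no /usr/ source was configured, 2 otherwise. */
static size_t plan_usr_mounts(const char *usr_what, const char *usr_options,
                              struct planned_mount out[2]) {
        assert(out);

        if (!usr_what || strlen(usr_what) == 0)
                return 0;

        /* Step 1: the real file system goes to /sysusr/usr. */
        out[0] = (struct planned_mount) {
                .what = usr_what,
                .where = "/sysusr/usr",
                .options = usr_options,
                .before_target = "initrd-usr-fs.target",
        };

        /* Step 2: a bind mount makes it visible below /sysroot/ as well. */
        out[1] = (struct planned_mount) {
                .what = "/sysusr/usr",
                .where = "/sysroot/usr",
                .options = "bind",
                .before_target = "initrd-fs.target",
        };

        return 2;
}
```

Splitting the mount in two means the first entry has no dependency on /sysroot/, so it can be set up before a root fs exists.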
Lennart Poettering
6e1454b4b9 ci: drop test/TEST-50-DISSECT/deny-list-ubuntu-ci
Let's see if this makes the test stable on the CI.
2021-04-20 17:21:22 +02:00
Lennart Poettering
4a62257d68 dissect: ignore udev database entries from before the loopback attachment
This tries to shorten the race of device reuse a bit more: let's ignore
udev database entries that are older than the time where we started to
use a loopback device.

This doesn't fix the whole loopback device raciness mess, but it makes
the race window a bit shorter.
2021-04-20 17:20:38 +02:00
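The timestamp check described here boils down to a comparison like the one below. This is a sketch only: `struct db_entry` and `db_entry_is_current()` are hypothetical names, and `UINT64_MAX` stands in for the "boundary unknown" sentinel (the diff uses `USEC_INFINITY` for that).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of a udev database entry. */
struct db_entry {
        bool is_initialized;       /* has udev finished processing the device? */
        uint64_t usec_initialized; /* CLOCK_MONOTONIC time of that, in usec */
};

/* Returns true if the entry may belong to our use of the loopback device. */
static bool db_entry_is_current(const struct db_entry *e, uint64_t timestamp_not_before) {
        assert(e);

        if (!e->is_initialized)
                return false; /* nothing to compare against yet */

        if (timestamp_not_before == UINT64_MAX)
                return true; /* boundary unknown: cannot filter */

        /* Entries older than our attachment stem from an earlier use. */
        return e->usec_initialized >= timestamp_not_before;
}
```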
Lennart Poettering
8ede1e86b2 loop-util: track CLOCK_MONOTONIC timestamp immediately before attaching a loopback device
This is similar to the preceding work to store the uevent seqnum, but
this stores the CLOCK_MONOTONIC timestamp.

Why? This allows us to validate udev database entries, to determine if they
were created *after* we attached the device.

The uevent seqnum logic allows us to validate uevents, and the timestamp
lets us validate udev database entries, hence together we should be able to validate both
sources of truth for us.

(note that this is all racy, just a bit less racy, since we cannot
atomically attach loopback devices and get the timestamp for it, the
same way we can't get the uevent seqnum. Thus it shortens the race
window, but doesn't close it).
2021-04-20 17:20:38 +02:00
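Recording the two boundaries just before the attachment might look roughly like this. It is a sketch under assumptions: `struct attach_boundary`, `now_monotonic_usec()` and `attach_boundary_capture()` are hypothetical names, and the current uevent seqnum is passed in by the caller. As the commit message says, the capture and the attachment are not atomic, so this only narrows the race window.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in microseconds, or UINT64_MAX on failure. */
static uint64_t now_monotonic_usec(void) {
        struct timespec ts;

        if (clock_gettime(CLOCK_MONOTONIC, &ts) < 0)
                return UINT64_MAX;

        return (uint64_t) ts.tv_sec * UINT64_C(1000000) + (uint64_t) ts.tv_nsec / UINT64_C(1000);
}

/* Hypothetical pair of boundaries, UINT64_MAX meaning "unknown". */
struct attach_boundary {
        uint64_t uevent_seqnum_not_before; /* validates uevents */
        uint64_t timestamp_not_before;     /* validates udev database entries */
};

/* To be called immediately before the device is attached. */
static void attach_boundary_capture(struct attach_boundary *b, uint64_t current_seqnum) {
        assert(b);

        b->uevent_seqnum_not_before = current_seqnum;
        b->timestamp_not_before = now_monotonic_usec();
}
```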
Lennart Poettering
8626b43be4 sd-device: add API to query from when a udev database entry is
We already store a CLOCK_MONOTONIC timestamp for each device appearance,
let's make this queryable.

This is useful to determine whether a udev device database entry is from
a current appearance of the device or a previous one, by comparing it
with appropriately taken timestamps.
2021-04-20 17:14:10 +02:00
Lennart Poettering
75dc190d39 dissect: ignore old uevents when waiting for loopback partition scan
Let's drop all monitor uevents that were enqueued before we actually
started setting up the device.

This doesn't fix the race, but it makes the race window smaller: since
we cannot determine the uevent seqnum and the loopback attachment
atomically, there's a tiny window where uevents might be generated by
the device which we mistake for being associated with our use of the
loopback device.
2021-04-20 17:14:10 +02:00
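The filtering rule for monitor events can be sketched as a small predicate. The names `struct uevent` and `uevent_is_relevant()` are hypothetical. The `<=` comparison and the `UINT64_MAX` "unknown" sentinel follow the device_monitor_handler() hunk further down in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of a received uevent. */
struct uevent {
        uint64_t seqnum; /* kernel sequence number of the event */
        bool is_remove;  /* true for "remove" events */
};

/* Returns true if the event should be considered for partition matching. */
static bool uevent_is_relevant(const struct uevent *ev, uint64_t seqnum_not_before) {
        assert(ev);

        if (ev->is_remove)
                return false; /* a removal never announces our partition */

        if (seqnum_not_before == UINT64_MAX)
                return true; /* boundary unknown: accept everything */

        /* Anything up to and including the boundary predates our attachment. */
        return ev->seqnum > seqnum_not_before;
}
```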
Lennart Poettering
31c75fcc41 loop-util: read kernel's uevent seqnum right before attaching a loopback device
Later, this will allow us to ignore uevents from earlier attachments a
bit better, as we can compare uevent seqnums with this boundary. It's
not a full fix for the race though, since we cannot atomically determine
the uevent and attach the device, but it at least shortens the window a
bit.
2021-04-20 17:13:56 +02:00
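Reading the boundary value could look like the sketch below. It uses plain libc rather than systemd's internal helpers. `parse_uevent_seqnum()` and `read_uevent_seqnum()` are hypothetical names, while the path /sys/kernel/uevent_seqnum is the one the commit reads.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Parses a decimal sequence number, tolerating one trailing newline. */
static int parse_uevent_seqnum(const char *text, uint64_t *ret) {
        unsigned long long v;
        char *end = NULL;

        assert(text);
        assert(ret);

        if (text[0] < '0' || text[0] > '9')
                return -EINVAL; /* rejects empty input, signs and whitespace */

        errno = 0;
        v = strtoull(text, &end, 10);
        if (errno != 0)
                return -errno;

        if (*end == '\n')
                end++;
        if (*end != '\0')
                return -EINVAL; /* trailing garbage */

        *ret = (uint64_t) v;
        return 0;
}

/* Reads the kernel's current uevent sequence number from sysfs. */
static int read_uevent_seqnum(uint64_t *ret) {
        char buf[32];
        FILE *f;

        assert(ret);

        f = fopen("/sys/kernel/uevent_seqnum", "r");
        if (!f)
                return -errno;

        if (!fgets(buf, sizeof(buf), f)) {
                fclose(f);
                return -EIO;
        }

        fclose(f);
        return parse_uevent_seqnum(buf, ret);
}
```

A caller would invoke `read_uevent_seqnum()` right before attaching the loopback device and keep the result as the "not before" boundary.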
Lennart Poettering
79e8393a6a loop-util: initialize .devno in loop_device_open() too 2021-04-20 17:12:39 +02:00
Lennart Poettering
b0dbffd868 loop-util: port to random_u64_range()
Doesn't matter, but it's a bit easier to read I'd claim.
2021-04-20 17:12:39 +02:00
Lennart Poettering
38bd449f96 loop-util: make loop_device_make() return fd in all code paths
Previously, loop_device_make() would return the device fd in one success
code path, but not the other (where we'd just return 0).
loop_device_open() returns it in all cases.

Hence, let's clean this up, and make sure in all success code paths of
both functions we return it (even though it strictly speaking is
redundant, since we return it in LoopDevice anyway, and currently no one
actually relies on this).
2021-04-20 17:12:39 +02:00
Lennart Poettering
02ef01ade3 sd-device: use right clock when comparing initialization usec
We actually use CLOCK_MONOTONIC for the timestamp, hence when
comparing/subtracting it from the current time, also use
CLOCK_MONOTONIC.
2021-04-20 17:12:39 +02:00
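The subtraction only makes sense when both values come from the same clock. A minimal sketch of that computation follows. `usec_since()` is a hypothetical name, and the error codes mirror the ones visible in the sd_device_get_usec_since_initialized() hunk further down in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Computes how long ago 'then_usec' was, given 'now_usec'. Both values must
 * have been taken from the same clock (here: CLOCK_MONOTONIC). */
static int usec_since(uint64_t now_usec, uint64_t then_usec, uint64_t *ret) {
        assert(ret);

        if (then_usec == 0)
                return -ENODATA; /* no timestamp was recorded */

        if (now_usec < then_usec)
                return -EIO; /* "now" lies in the past: clocks do not match */

        *ret = now_usec - then_usec;
        return 0;
}
```

If the two values were taken from different clocks, the second check could trigger spuriously, or the result could be off by the difference between the clocks.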
Lennart Poettering
a156eb89c8 sd-device: use right type for usec_initialized 2021-04-20 17:11:21 +02:00
27 changed files with 354 additions and 63 deletions

View File

@ -46,6 +46,7 @@
<filename>initrd-fs.target</filename>,
<filename>initrd-root-device.target</filename>,
<filename>initrd-root-fs.target</filename>,
<filename>initrd-usr-fs.target</filename>,
<filename>kbrequest.target</filename>,
<filename>kexec.target</filename>,
<filename>local-fs-pre.target</filename>,
@ -372,12 +373,13 @@
<term><filename>initrd-fs.target</filename></term>
<listitem>
<para><citerefentry><refentrytitle>systemd-fstab-generator</refentrytitle><manvolnum>3</manvolnum></citerefentry>
automatically adds dependencies of type
<varname>Before=</varname> to
<filename>sysroot-usr.mount</filename> and all mount points
found in <filename>/etc/fstab</filename> that have
<option>x-initrd.mount</option> and not have
<option>noauto</option> mount options set.</para>
automatically adds dependencies of type <varname>Before=</varname> to
<filename>sysroot-usr.mount</filename> and all mount points found in
<filename>/etc/fstab</filename> that have the <option>x-initrd.mount</option> mount option set
and do not have the <option>noauto</option> mount option set. It is also indirectly ordered after
<filename>sysroot.mount</filename>. Thus, once this target is reached the
<filename>/sysroot/</filename> hierarchy is fully set up, in preparation for the transition to
the host OS.</para>
</listitem>
</varlistentry>
<varlistentry>
@ -396,11 +398,27 @@
<term><filename>initrd-root-fs.target</filename></term>
<listitem>
<para><citerefentry><refentrytitle>systemd-fstab-generator</refentrytitle><manvolnum>3</manvolnum></citerefentry>
automatically adds dependencies of type
<varname>Before=</varname> to the
<filename>sysroot.mount</filename> unit, which is generated
from the kernel command line.
</para>
automatically adds dependencies of type <varname>Before=</varname> to the
<filename>sysroot.mount</filename> unit, which is generated from the kernel command line's
<varname>root=</varname> setting (or equivalent).</para>
</listitem>
</varlistentry>
<varlistentry>
<term><filename>initrd-usr-fs.target</filename></term>
<listitem>
<para><citerefentry><refentrytitle>systemd-fstab-generator</refentrytitle><manvolnum>3</manvolnum></citerefentry>
automatically adds dependencies of type <varname>Before=</varname> to the
<filename>sysusr-usr.mount</filename> unit, which is generated from the kernel command line's
<varname>usr=</varname> switch. Services may order themselves after this target unit in order to
run once the <filename>/sysusr/</filename> hierarchy becomes available, on systems that come up
initially without a root file system, but with an initialized <filename>/usr/</filename> and need
to access that before setting up the root file system to ultimately switch to. On systems where
<varname>usr=</varname> is not used this target is ordered after
<filename>sysroot.mount</filename> and thus mostly equivalent to
<filename>initrd-root-fs.target</filename>. In effect on any system once this target is reached
the file system backing <filename>/usr/</filename> is mounted, though possibly at two different
locations, either below the <filename>/sysusr/</filename> or the <filename>/sysroot/</filename>
hierarchies.</para>
</listitem>
</varlistentry>
<varlistentry>

View File

@ -37,6 +37,7 @@
#define SPECIAL_INITRD_FS_TARGET "initrd-fs.target"
#define SPECIAL_INITRD_ROOT_DEVICE_TARGET "initrd-root-device.target"
#define SPECIAL_INITRD_ROOT_FS_TARGET "initrd-root-fs.target"
#define SPECIAL_INITRD_USR_FS_TARGET "initrd-usr-fs.target"
#define SPECIAL_REMOTE_FS_TARGET "remote-fs.target" /* LSB's $remote_fs */
#define SPECIAL_REMOTE_FS_PRE_TARGET "remote-fs-pre.target"
#define SPECIAL_SWAP_TARGET "swap.target"

View File

@ -1863,6 +1863,8 @@ int setup_namespace(
loop_device->fd,
&verity,
root_image_options,
loop_device->uevent_seqnum_not_before,
loop_device->timestamp_not_before,
dissect_image_flags,
&dissected_image);
if (r < 0)

View File

@ -781,6 +781,8 @@ static int run(int argc, char *argv[]) {
arg_image,
&arg_verity_settings,
NULL,
d->uevent_seqnum_not_before,
d->timestamp_not_before,
arg_flags,
&m);
if (r < 0)

View File

@ -433,6 +433,11 @@ static int add_mount(
if (r < 0)
return r;
/* Order the mount unit we generate relative to the post unit, so that DefaultDependencies= on the
* target unit won't affect us. */
if (post && !FLAGS_SET(flags, AUTOMOUNT) && !FLAGS_SET(flags, NOAUTO))
fprintf(f, "Before=%s\n", post);
if (passno != 0) {
r = generator_write_fsck_deps(f, dest, what, where, fstype);
if (r < 0)
@ -721,7 +726,7 @@ static int add_sysroot_mount(void) {
else
opts = arg_root_options;
log_debug("Found entry what=%s where=/sysroot type=%s", what, strna(arg_root_fstype));
log_debug("Found entry what=%s where=/sysroot type=%s opts=%s", what, strna(arg_root_fstype), strempty(opts));
if (is_device_path(what)) {
r = generator_write_initrd_root_device_deps(arg_dest, what);
@ -744,6 +749,10 @@ static int add_sysroot_mount(void) {
static int add_sysroot_usr_mount(void) {
_cleanup_free_ char *what = NULL;
const char *opts;
int r;
/* Returns 0 if we didn't do anything, > 0 if we either generated a unit for the /usr/ mount, or we
* know for sure something else did */
if (!arg_usr_what && !arg_usr_fstype && !arg_usr_options)
return 0;
@ -767,8 +776,23 @@ static int add_sysroot_usr_mount(void) {
return log_oom();
}
if (!arg_usr_what)
if (isempty(arg_usr_what)) {
log_debug("Could not find a usr= entry on the kernel command line.");
return 0;
}
if (streq(arg_usr_what, "gpt-auto")) {
/* This is handled by the gpt-auto generator */
log_debug("Skipping /usr/ directory handling, as gpt-auto was requested.");
return 1; /* systemd-gpt-auto-generator will generate a unit for this, hence report that a
* unit file is being created for the host /usr/ mount. */
}
if (path_equal(arg_usr_what, "/dev/nfs")) {
/* This is handled by the initrd (if at all supported, that is) */
log_debug("Skipping /usr/ directory handling, as /dev/nfs was requested.");
return 1; /* As above, report that NFS code will create the unit */
}
what = fstab_node_to_udev_node(arg_usr_what);
if (!what)
@ -781,17 +805,62 @@ static int add_sysroot_usr_mount(void) {
else
opts = arg_usr_options;
log_debug("Found entry what=%s where=/sysroot/usr type=%s", what, strna(arg_usr_fstype));
return add_mount(arg_dest,
what,
"/sysroot/usr",
NULL,
arg_usr_fstype,
opts,
is_device_path(what) ? 1 : 0, /* passno */
0,
SPECIAL_INITRD_FS_TARGET,
"/proc/cmdline");
/* When mounting /usr from the initrd, we add an extra level of indirection: we first mount the /usr/
* partition to /sysusr/usr/, and then afterwards bind mount that to /sysroot/usr/. We do this so
* that we can cover for systems that initially only have a /usr/ around and where the root fs needs
* to be synthesized, based on configuration included in /usr/, e.g. systemd-repart. Software like
* this should order itself after initrd-usr-fs.target and before initrd-fs.target; and it should
* look into both /sysusr/ and /sysroot/ for the configuration data to apply. */
log_debug("Found entry what=%s where=/sysusr/usr type=%s opts=%s", what, strna(arg_usr_fstype), strempty(opts));
r = add_mount(arg_dest,
what,
"/sysusr/usr",
NULL,
arg_usr_fstype,
opts,
is_device_path(what) ? 1 : 0, /* passno */
0,
SPECIAL_INITRD_USR_FS_TARGET,
"/proc/cmdline");
if (r < 0)
return r;
log_debug("Synthesizing entry what=/sysusr/usr where=/sysroot/usr opts=bind");
r = add_mount(arg_dest,
"/sysusr/usr",
"/sysroot/usr",
NULL,
NULL,
"bind",
0,
0,
SPECIAL_INITRD_FS_TARGET,
"/proc/cmdline");
if (r < 0)
return r;
return 1;
}
static int add_sysroot_usr_mount_or_fallback(void) {
int r;
r = add_sysroot_usr_mount();
if (r != 0)
return r;
/* OK, so we didn't write anything out for /sysusr/usr/ nor /sysroot/usr/. In this case, let's make
* sure that initrd-usr-fs.target is at least ordered after sysroot.mount so that services that order
* themselves get the guarantee that /usr/ is definitely mounted somewhere. */
return generator_add_symlink(
arg_dest,
SPECIAL_INITRD_USR_FS_TARGET,
"requires",
"sysroot.mount");
}
static int add_volatile_root(void) {
@ -953,7 +1022,7 @@ static int run(const char *dest, const char *dest_early, const char *dest_late)
if (in_initrd()) {
r = add_sysroot_mount();
r2 = add_sysroot_usr_mount();
r2 = add_sysroot_usr_mount_or_fallback();
r3 = add_volatile_root();
} else

View File

@ -672,6 +672,8 @@ static int enumerate_partitions(dev_t devnum) {
r = dissect_image(
fd,
NULL, NULL,
UINT64_MAX,
USEC_INFINITY,
DISSECT_IMAGE_GPT_ONLY|
DISSECT_IMAGE_NO_UDEV|
DISSECT_IMAGE_USR_NO_ROOT,

View File

@ -756,4 +756,5 @@ LIBSYSTEMD_249 {
global:
sd_device_monitor_filter_add_match_sysattr;
sd_device_monitor_filter_add_match_parent;
sd_device_get_usec_initialized;
} LIBSYSTEMD_248;

View File

@ -69,7 +69,7 @@ struct sd_device {
char *id_filename;
uint64_t usec_initialized;
usec_t usec_initialized;
mode_t devmode;
uid_t devuid;

View File

@ -1428,6 +1428,27 @@ _public_ int sd_device_get_is_initialized(sd_device *device) {
return device->is_initialized;
}
_public_ int sd_device_get_usec_initialized(sd_device *device, uint64_t *ret) {
int r;
assert_return(device, -EINVAL);
r = device_read_db(device);
if (r < 0)
return r;
if (!device->is_initialized)
return -EBUSY;
if (device->usec_initialized == 0)
return -ENODATA;
if (ret)
*ret = device->usec_initialized;
return 0;
}
_public_ int sd_device_get_usec_since_initialized(sd_device *device, uint64_t *usec) {
usec_t now_ts;
int r;
@ -1441,10 +1462,10 @@ _public_ int sd_device_get_usec_since_initialized(sd_device *device, uint64_t *u
if (!device->is_initialized)
return -EBUSY;
if (!device->usec_initialized)
if (device->usec_initialized == 0)
return -ENODATA;
now_ts = now(clock_boottime_or_monotonic());
now_ts = now(CLOCK_MONOTONIC);
if (now_ts < device->usec_initialized)
return -EIO;

View File

@ -5483,6 +5483,8 @@ static int run(int argc, char *argv[]) {
arg_image,
&arg_verity_settings,
NULL,
loop->uevent_seqnum_not_before,
loop->timestamp_not_before,
dissect_image_flags,
&dissected_image);
if (r == -ENOPKG) {

View File

@ -4318,8 +4318,18 @@ static int parse_argv(int argc, char *argv[]) {
if (arg_image && arg_root)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Please specify either --root= or --image=, the combination of both is not supported.");
else if (!arg_image && !arg_root && in_initrd()) {
/* Default to operation on /sysroot when invoked in the initrd! */
arg_root = strdup("/sysroot");
/* By default operate on /sysusr/ or /sysroot/ when invoked in the initrd. We prefer the
* former, if it is mounted, so that we have deterministic behaviour on systems where /usr/
* is vendor-supplied but the root fs formatted on first boot. */
r = path_is_mount_point("/sysusr/usr", NULL, 0);
if (r <= 0) {
if (r < 0 && r != -ENOENT)
log_debug_errno(r, "Unable to determine whether /sysusr/usr is a mount point, assuming it is not: %m");
arg_root = strdup("/sysroot");
} else
arg_root = strdup("/sysusr");
if (!arg_root)
return log_oom();
}
@ -4471,8 +4481,43 @@ static int acquire_root_devno(
return 0;
}
static int find_os_prefix(const char **ret) {
int r;
assert(ret);
/* Searches for the right place to look for the OS root. This is relevant in the initrd: in the
* initrd the host OS is typically mounted to /sysroot/ except in setups where /usr/ is a separate
* partition, in which case it is mounted to /sysusr/usr/ before being moved to /sysroot/usr/. */
if (!in_initrd()) {
*ret = NULL; /* no prefix */
return 0;
}
r = path_is_mount_point("/sysroot", NULL, 0);
if (r < 0 && r != -ENOENT)
log_debug_errno(r, "Failed to determine whether /sysroot/ is a mount point, assuming it is not: %m");
else if (r > 0) {
log_debug("/sysroot/ is a mount point, assuming it's the prefix.");
*ret = "/sysroot";
return 0;
}
r = path_is_mount_point("/sysusr/usr", NULL, 0);
if (r < 0 && r != -ENOENT)
log_debug_errno(r, "Failed to determine whether /sysusr/usr is a mount point, assuming it is not: %m");
else if (r > 0) {
log_debug("/sysusr/usr/ is a mount point, assuming /sysusr/ is the prefix.");
*ret = "/sysusr";
return 0;
}
return -ENOENT;
}
static int find_root(char **ret, int *ret_fd) {
const char *t;
const char *t, *prefix;
int r;
assert(ret);
@ -4513,12 +4558,16 @@ static int find_root(char **ret, int *ret_fd) {
* latter we check for cases where / is a tmpfs and only /usr is an actual persistent block device
* (think: volatile setups) */
r = find_os_prefix(&prefix);
if (r < 0)
return log_error_errno(r, "Failed to determine OS prefix: %m");
FOREACH_STRING(t, "/", "/usr") {
_cleanup_free_ char *j = NULL;
const char *p;
if (in_initrd()) {
j = path_join("/sysroot", t);
if (prefix) {
j = path_join(prefix, t);
if (!j)
return log_oom();

View File

@ -395,6 +395,8 @@ static int portable_extract_by_path(
r = dissect_image(
d->fd,
NULL, NULL,
d->uevent_seqnum_not_before,
d->timestamp_not_before,
DISSECT_IMAGE_READ_ONLY |
DISSECT_IMAGE_GENERIC_ROOT |
DISSECT_IMAGE_REQUIRE_ROOT |

View File

@ -1201,10 +1201,13 @@ int image_read_metadata(Image *i) {
r = dissect_image(
d->fd,
NULL, NULL,
d->uevent_seqnum_not_before,
d->timestamp_not_before,
DISSECT_IMAGE_GENERIC_ROOT |
DISSECT_IMAGE_REQUIRE_ROOT |
DISSECT_IMAGE_RELAX_VAR_CHECK |
DISSECT_IMAGE_USR_NO_ROOT, &m);
DISSECT_IMAGE_USR_NO_ROOT,
&m);
if (r < 0)
return r;

View File

@ -123,10 +123,6 @@ static int enumerator_for_parent(sd_device *d, sd_device_enumerator **ret) {
if (r < 0)
return r;
r = sd_device_enumerator_allow_uninitialized(e);
if (r < 0)
return r;
r = sd_device_enumerator_add_match_subsystem(e, "block", true);
if (r < 0)
return r;
@ -229,6 +225,7 @@ static int device_is_partition(sd_device *d, sd_device *expected_parent, blkid_p
static int find_partition(
sd_device *parent,
blkid_partition pp,
usec_t timestamp_not_before,
sd_device **ret) {
_cleanup_(sd_device_enumerator_unrefp) sd_device_enumerator *e = NULL;
@ -244,6 +241,18 @@ static int find_partition(
return r;
FOREACH_DEVICE(e, q) {
uint64_t usec;
r = sd_device_get_usec_initialized(q, &usec);
if (r == -EBUSY) /* Not initialized yet */
continue;
if (r < 0)
return r;
if (timestamp_not_before != USEC_INFINITY &&
usec < timestamp_not_before) /* udev database entry older than our attachment? Then it's not ours */
continue;
r = device_is_partition(q, parent, pp);
if (r < 0)
return r;
@ -260,6 +269,7 @@ struct wait_data {
sd_device *parent_device;
blkid_partition blkidp;
sd_device *found;
uint64_t uevent_seqnum_not_before;
};
static inline void wait_data_done(struct wait_data *d) {
@ -275,6 +285,20 @@ static int device_monitor_handler(sd_device_monitor *monitor, sd_device *device,
if (device_for_action(device, SD_DEVICE_REMOVE))
return 0;
if (w->uevent_seqnum_not_before != UINT64_MAX) {
uint64_t seqnum;
r = sd_device_get_seqnum(device, &seqnum);
if (r < 0)
goto finish;
if (seqnum <= w->uevent_seqnum_not_before) { /* From an older use of this loop device */
log_debug("Dropping event because seqnum too old (%" PRIu64 " <= %" PRIu64 ")",
seqnum, w->uevent_seqnum_not_before);
return 0;
}
}
r = device_is_partition(device, w->parent_device, w->blkidp);
if (r < 0)
goto finish;
@ -294,6 +318,8 @@ static int wait_for_partition_device(
sd_device *parent,
blkid_partition pp,
usec_t deadline,
uint64_t uevent_seqnum_not_before,
usec_t timestamp_not_before,
sd_device **ret) {
_cleanup_(sd_event_source_unrefp) sd_event_source *timeout_source = NULL;
@ -305,7 +331,7 @@ static int wait_for_partition_device(
assert(pp);
assert(ret);
r = find_partition(parent, pp, ret);
r = find_partition(parent, pp, timestamp_not_before, ret);
if (r != -ENXIO)
return r;
@ -336,6 +362,7 @@ static int wait_for_partition_device(
_cleanup_(wait_data_done) struct wait_data w = {
.parent_device = parent,
.blkidp = pp,
.uevent_seqnum_not_before = uevent_seqnum_not_before,
};
r = sd_device_monitor_start(monitor, device_monitor_handler, &w);
@ -343,7 +370,7 @@ static int wait_for_partition_device(
return r;
/* Check again, the partition might have appeared in the meantime */
r = find_partition(parent, pp, ret);
r = find_partition(parent, pp, timestamp_not_before, ret);
if (r != -ENXIO)
return r;
@ -492,6 +519,8 @@ int dissect_image(
int fd,
const VeritySettings *verity,
const MountOptions *mount_options,
uint64_t uevent_seqnum_not_before,
usec_t timestamp_not_before,
DissectImageFlags flags,
DissectedImage **ret) {
@ -744,7 +773,7 @@ int dissect_image(
if (!pp)
return errno_or_else(EIO);
r = wait_for_partition_device(d, pp, deadline, &q);
r = wait_for_partition_device(d, pp, deadline, uevent_seqnum_not_before, timestamp_not_before, &q);
if (r < 0)
return r;
@ -2579,6 +2608,8 @@ int dissect_image_and_warn(
const char *name,
const VeritySettings *verity,
const MountOptions *mount_options,
uint64_t uevent_seqnum_not_before,
usec_t timestamp_not_before,
DissectImageFlags flags,
DissectedImage **ret) {
@ -2593,7 +2624,7 @@ int dissect_image_and_warn(
name = buffer;
}
r = dissect_image(fd, verity, mount_options, flags, ret);
r = dissect_image(fd, verity, mount_options, uevent_seqnum_not_before, timestamp_not_before, flags, ret);
switch (r) {
case -EOPNOTSUPP:
@ -2701,7 +2732,7 @@ int mount_image_privately_interactively(
if (r < 0)
return log_error_errno(r, "Failed to set up loopback device: %m");
r = dissect_image_and_warn(d->fd, image, &verity, NULL, flags, &dissected_image);
r = dissect_image_and_warn(d->fd, image, &verity, NULL, d->uevent_seqnum_not_before, d->timestamp_not_before, flags, &dissected_image);
if (r < 0)
return r;
@ -2792,6 +2823,8 @@ int verity_dissect_and_mount(
loop_device->fd,
&verity,
options,
loop_device->uevent_seqnum_not_before,
loop_device->timestamp_not_before,
dissect_image_flags,
&dissected_image);
/* No partition table? Might be a single-filesystem image, try again */
@ -2800,7 +2833,9 @@ int verity_dissect_and_mount(
loop_device->fd,
&verity,
options,
dissect_image_flags|DISSECT_IMAGE_NO_PARTITION_TABLE,
loop_device->uevent_seqnum_not_before,
loop_device->timestamp_not_before,
dissect_image_flags | DISSECT_IMAGE_NO_PARTITION_TABLE,
&dissected_image);
if (r < 0)
return log_debug_errno(r, "Failed to dissect image: %m");

View File

@ -159,8 +159,8 @@ DEFINE_TRIVIAL_CLEANUP_FUNC(MountOptions*, mount_options_free_all);
const char* mount_options_from_designator(const MountOptions *options, PartitionDesignator designator);
int probe_filesystem(const char *node, char **ret_fstype);
int dissect_image(int fd, const VeritySettings *verity, const MountOptions *mount_options, DissectImageFlags flags, DissectedImage **ret);
int dissect_image_and_warn(int fd, const char *name, const VeritySettings *verity, const MountOptions *mount_options, DissectImageFlags flags, DissectedImage **ret);
int dissect_image(int fd, const VeritySettings *verity, const MountOptions *mount_options, uint64_t uevent_seqnum_not_before, usec_t timestamp_not_before, DissectImageFlags flags, DissectedImage **ret);
int dissect_image_and_warn(int fd, const char *name, const VeritySettings *verity, const MountOptions *mount_options, uint64_t uevent_seqnum_not_before, usec_t timestamp_not_before, DissectImageFlags flags, DissectedImage **ret);
DissectedImage* dissected_image_unref(DissectedImage *m);
DEFINE_TRIVIAL_CLEANUP_FUNC(DissectedImage*, dissected_image_unref);

View File

@ -53,6 +53,23 @@ static int loop_is_bound(int fd) {
return true; /* bound! */
}
static int get_current_uevent_seqnum(uint64_t *ret) {
_cleanup_free_ char *p = NULL;
int r;
r = read_full_virtual_file("/sys/kernel/uevent_seqnum", &p, NULL);
if (r < 0)
return log_debug_errno(r, "Failed to read current uevent sequence number: %m");
truncate_nl(p);
r = safe_atou64(p, ret);
if (r < 0)
return log_debug_errno(r, "Failed to parse current uevent sequence number: %s", p);
return 0;
}
static int device_has_block_children(sd_device *d) {
_cleanup_(sd_device_enumerator_unrefp) sd_device_enumerator *e = NULL;
const char *main_sn, *main_ss;
@ -114,11 +131,15 @@ static int loop_configure(
int fd,
int nr,
const struct loop_config *c,
bool *try_loop_configure) {
bool *try_loop_configure,
uint64_t *ret_seqnum_not_before,
usec_t *ret_timestamp_not_before) {
_cleanup_(sd_device_unrefp) sd_device *d = NULL;
_cleanup_free_ char *sysname = NULL;
_cleanup_close_ int lock_fd = -1;
uint64_t seqnum;
usec_t timestamp;
int r;
assert(fd >= 0);
@ -167,6 +188,17 @@ static int loop_configure(
}
if (*try_loop_configure) {
/* Acquire uevent seqnum immediately before attaching the loopback device. This allows
* callers to ignore all uevents with a seqnum before this one, if they need to associate
* uevent with this attachment. Doing so isn't race-free though, as uevents that happen in
* the window between this reading of the seqnum, and the LOOP_CONFIGURE call might still be
* mistaken as originating from our attachment, even though might be caused by an earlier
* use. But doing this at least shortens the race window a bit. */
r = get_current_uevent_seqnum(&seqnum);
if (r < 0)
return r;
timestamp = now(CLOCK_MONOTONIC);
if (ioctl(fd, LOOP_CONFIGURE, c) < 0) {
/* Do fallback only if LOOP_CONFIGURE is not supported, propagate all other
* errors. Note that the kernel is weird: non-existing ioctls currently return EINVAL
@ -224,10 +256,21 @@ static int loop_configure(
goto fail;
}
if (ret_seqnum_not_before)
*ret_seqnum_not_before = seqnum;
if (ret_timestamp_not_before)
*ret_timestamp_not_before = timestamp;
return 0;
}
}
/* Let's read the seqnum again, to shorten the window. */
r = get_current_uevent_seqnum(&seqnum);
if (r < 0)
return r;
timestamp = now(CLOCK_MONOTONIC);
/* Since kernel commit 5db470e229e22b7eda6e23b5566e532c96fb5bc3 (kernel v5.0) the LOOP_SET_STATUS64
* ioctl can return EAGAIN in case we change the lo_offset field, if someone else is accessing the
* block device while we try to reconfigure it. This is a pretty common case, since udev might
@ -252,9 +295,14 @@ static int loop_configure(
/* Sleep some random time, but at least 10ms, at most 250ms. Increase the delay the more
* failed attempts we see */
(void) usleep(UINT64_C(10) * USEC_PER_MSEC +
random_u64() % (UINT64_C(240) * USEC_PER_MSEC * n_attempts/64));
random_u64_range(UINT64_C(240) * USEC_PER_MSEC * n_attempts/64));
}
if (ret_seqnum_not_before)
*ret_seqnum_not_before = seqnum;
if (ret_timestamp_not_before)
*ret_timestamp_not_before = timestamp;
return 0;
fail:
@ -312,6 +360,8 @@ int loop_device_make(
bool try_loop_configure = true;
struct loop_config config;
LoopDevice *d = NULL;
uint64_t seqnum = UINT64_MAX;
usec_t timestamp = USEC_INFINITY;
struct stat st;
int nr = -1, r;
@ -354,6 +404,8 @@ int loop_device_make(
.node = TAKE_PTR(loopdev),
.relinquished = true, /* It's not allocated by us, don't destroy it when this object is freed */
.devno = st.st_rdev,
.uevent_seqnum_not_before = UINT64_MAX,
.timestamp_not_before = USEC_INFINITY,
};
*ret = d;
@ -401,7 +453,7 @@ int loop_device_make(
if (!IN_SET(errno, ENOENT, ENXIO))
return -errno;
} else {
r = loop_configure(loop, nr, &config, &try_loop_configure);
r = loop_configure(loop, nr, &config, &try_loop_configure, &seqnum, &timestamp);
if (r >= 0) {
loop_with_fd = TAKE_FD(loop);
break;
@ -422,8 +474,8 @@ int loop_device_make(
/* Wait some random time, to make collision less likely. Let's pick a random time in the
* range 0ms…250ms, linearly scaled by the number of failed attempts. */
(void) usleep(random_u64() % (UINT64_C(10) * USEC_PER_MSEC +
UINT64_C(240) * USEC_PER_MSEC * n_attempts/64));
(void) usleep(random_u64_range(UINT64_C(10) * USEC_PER_MSEC +
UINT64_C(240) * USEC_PER_MSEC * n_attempts/64));
}
if (fstat(loop_with_fd, &st) < 0)
@ -438,13 +490,20 @@ int loop_device_make(
.node = TAKE_PTR(loopdev),
.nr = nr,
.devno = st.st_rdev,
.uevent_seqnum_not_before = seqnum,
.timestamp_not_before = timestamp,
};
*ret = d;
return 0;
return d->fd;
}
int loop_device_make_by_path(const char *path, int open_flags, uint32_t loop_flags, LoopDevice **ret) {
int loop_device_make_by_path(
const char *path,
int open_flags,
uint32_t loop_flags,
LoopDevice **ret) {
_cleanup_close_ int fd = -1;
int r;
@ -567,6 +626,9 @@ int loop_device_open(const char *loop_path, int open_flags, LoopDevice **ret) {
.nr = nr,
.node = TAKE_PTR(p),
.relinquished = true, /* It's not ours, don't try to destroy it when this object is freed */
.devno = st.st_rdev,
.uevent_seqnum_not_before = UINT64_MAX,
.timestamp_not_before = USEC_INFINITY,
};
*ret = d;

View File

@ -2,6 +2,7 @@
#pragma once
#include "macro.h"
#include "time-util.h"
typedef struct LoopDevice LoopDevice;
@ -13,6 +14,8 @@ struct LoopDevice {
dev_t devno;
char *node;
bool relinquished;
uint64_t uevent_seqnum_not_before; /* uevent seqnum right before we attached the loopback device, or UINT64_MAX if we don't know */
usec_t timestamp_not_before; /* CLOCK_MONOTONIC timestamp taken immediately before attaching the loopback device, or USEC_INFINITY if we don't know */
};
int loop_device_make(int fd, int open_flags, uint64_t offset, uint64_t size, uint32_t loop_flags, LoopDevice **ret);
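
The two "not before" fields added to struct LoopDevice suggest how a consumer could discard uevents left over from a previous user of the same loop device. The following is a hedged sketch of that idea only: sketch_uevent_is_stale() and sketch_timestamp_is_stale() are hypothetical helpers, and the exact comparisons (<= for the sequence number, < for the timestamp) are assumptions derived from the field comments, not taken from the dissection code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t usec_t;
#define USEC_INFINITY ((usec_t) UINT64_MAX)

/* The recorded seqnum was sampled right before attaching, so an event carrying
 * that number or a lower one cannot describe the new attachment.
 * UINT64_MAX means "unknown", in which case nothing is filtered. */
static bool sketch_uevent_is_stale(uint64_t event_seqnum, uint64_t seqnum_not_before) {
        if (seqnum_not_before == UINT64_MAX)
                return false;

        return event_seqnum <= seqnum_not_before;
}

/* Same idea for timestamps: a device initialized before the recorded
 * CLOCK_MONOTONIC time belongs to an earlier use of the loop device.
 * USEC_INFINITY means "unknown", in which case nothing is filtered. */
static bool sketch_timestamp_is_stale(usec_t initialized_at, usec_t timestamp_not_before) {
        if (timestamp_not_before == USEC_INFINITY)
                return false;

        return initialized_at < timestamp_not_before;
}
```

Treating the sentinel values as "do not filter" matches the hunks where the loop device was not allocated by the caller and both fields are set to UINT64_MAX and USEC_INFINITY.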

@ -532,6 +532,8 @@ static int merge_subprocess(Hashmap *images, const char *workspace) {
img->path,
&verity_settings,
NULL,
d->uevent_seqnum_not_before,
d->timestamp_not_before,
flags,
&m);
if (r < 0)

@ -79,6 +79,7 @@ int sd_device_get_action(sd_device *device, sd_device_action_t *ret);
int sd_device_get_seqnum(sd_device *device, uint64_t *ret);
int sd_device_get_is_initialized(sd_device *device);
int sd_device_get_usec_initialized(sd_device *device, uint64_t *usec);
int sd_device_get_usec_since_initialized(sd_device *device, uint64_t *usec);
const char *sd_device_get_tag_first(sd_device *device);

@ -51,7 +51,7 @@ static void* thread_func(void *ptr) {
log_notice("Acquired loop device %s, will mount on %s", loop->node, mounted);
r = dissect_image(loop->fd, NULL, NULL, DISSECT_IMAGE_READ_ONLY, &dissected);
r = dissect_image(loop->fd, NULL, NULL, loop->uevent_seqnum_not_before, loop->timestamp_not_before, DISSECT_IMAGE_READ_ONLY, &dissected);
if (r < 0)
log_error_errno(r, "Failed to dissect loopback device %s: %m", loop->node);
assert_se(r >= 0);
@ -188,7 +188,7 @@ int main(int argc, char *argv[]) {
sfdisk = NULL;
assert_se(loop_device_make(fd, O_RDWR, 0, UINT64_MAX, LO_FLAGS_PARTSCAN, &loop) >= 0);
assert_se(dissect_image(loop->fd, NULL, NULL, 0, &dissected) >= 0);
assert_se(dissect_image(loop->fd, NULL, NULL, loop->uevent_seqnum_not_before, loop->timestamp_not_before, 0, &dissected) >= 0);
assert_se(dissected->partitions[PARTITION_ESP].found);
assert_se(dissected->partitions[PARTITION_ESP].node);
@ -212,7 +212,7 @@ int main(int argc, char *argv[]) {
assert_se(make_filesystem(dissected->partitions[PARTITION_HOME].node, "ext4", "home", id, true) >= 0);
dissected = dissected_image_unref(dissected);
assert_se(dissect_image(loop->fd, NULL, NULL, 0, &dissected) >= 0);
assert_se(dissect_image(loop->fd, NULL, NULL, loop->uevent_seqnum_not_before, loop->timestamp_not_before, 0, &dissected) >= 0);
assert_se(mkdtemp_malloc(NULL, &mounted) >= 0);

@ -1,2 +0,0 @@
Skip this test due to issue #17469
https://github.com/systemd/systemd/issues/17469

@ -10,9 +10,9 @@
[Unit]
Description=Initrd File Systems
Documentation=man:systemd.special(7)
AssertPathExists=/etc/initrd-release
OnFailure=emergency.target
OnFailureJobMode=replace-irreversibly
AssertPathExists=/etc/initrd-release
After=initrd-parse-etc.service
DefaultDependencies=no
Conflicts=shutdown.target

@ -0,0 +1,17 @@
# SPDX-License-Identifier: LGPL-2.1-or-later
#
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
[Unit]
Description=Initrd /usr File System
Documentation=man:systemd.special(7)
AssertPathExists=/etc/initrd-release
OnFailure=emergency.target
OnFailureJobMode=replace-irreversibly
DefaultDependencies=no
Conflicts=shutdown.target

@ -14,6 +14,6 @@ OnFailure=emergency.target
OnFailureJobMode=replace-irreversibly
AssertPathExists=/etc/initrd-release
Requires=basic.target
Wants=initrd-root-fs.target initrd-root-device.target initrd-fs.target initrd-parse-etc.service
After=initrd-root-fs.target initrd-root-device.target initrd-fs.target basic.target rescue.service rescue.target
Wants=initrd-root-fs.target initrd-root-device.target initrd-fs.target initrd-usr-fs.target initrd-parse-etc.service
After=initrd-root-fs.target initrd-root-device.target initrd-fs.target initrd-usr-fs.target basic.target rescue.service rescue.target
AllowIsolate=yes

@ -38,6 +38,7 @@ units = [
['initrd-switch-root.service', 'ENABLE_INITRD'],
['initrd-switch-root.target', 'ENABLE_INITRD'],
['initrd-udevadm-cleanup-db.service', 'ENABLE_INITRD'],
['initrd-usr-fs.target', 'ENABLE_INITRD'],
['initrd.target', 'ENABLE_INITRD'],
['kexec.target', ''],
['ldconfig.service', 'ENABLE_LDCONFIG',

@ -12,7 +12,7 @@ Description=Repartition Root Disk
Documentation=man:systemd-repart.service(8)
DefaultDependencies=no
Conflicts=shutdown.target
After=sysroot.mount
After=initrd-usr-fs.target
Before=initrd-root-fs.target shutdown.target
ConditionVirtualization=!container
ConditionDirectoryNotEmpty=|/usr/lib/repart.d

@ -12,7 +12,7 @@ Description=Enforce Volatile Root File Systems
Documentation=man:systemd-volatile-root.service(8)
DefaultDependencies=no
Conflicts=shutdown.target
After=sysroot.mount systemd-repart.service
After=sysroot.mount sysroot-usr.mount systemd-repart.service
Before=initrd-root-fs.target shutdown.target
AssertPathExists=/etc/initrd-release