Compare commits

...

17 Commits

Author SHA1 Message Date
Adrian Vovk cf612c5fd5
repart: Add tests for supplement partitions 2024-09-17 14:06:51 -04:00
Adrian Vovk 2cb9c68c3a
repart: Add SupplementFor= logic
This was designed to deal with $BOOT, as defined by the Boot Loader
Specification, but it was made a generic mechanism because it is useful
elsewhere too. See the updated man page for usage examples, motivation,
and an explanation of how this works.
2024-09-17 14:06:50 -04:00
Adrian Vovk 78e9059208
repart: Consider existing partitions when placing
Fixes an oversight in `context_allocate_partitions` that makes it
succeed in cases where it should fail. Essentially, there was nothing
actually enforcing SizeMinBytes= and PaddingMinBytes= for partitions
that exist, only for new partitions. This behavior is inconsistent with
the docs, which state that existing partitions will be grown to at least
the specified minimum size, and that "If the backing device does not
provide enough space to fulfill the constraints placing the partition
will fail".
2024-09-17 14:06:49 -04:00
Adrian Vovk e671bdc5c3
strv: Fixup STRV_FOREACH_PAIR macro
The macro didn't properly parenthesize a caller-controlled argument.
For example: `STRV_FOREACH_PAIR(a, b, something ?: something_else)`
would expand to `typeof(*something ?: something_else)`, which would
cause compile failures
2024-09-17 14:06:26 -04:00
PavlNekrasov d80a9042ca
Use correct error code in log message in output_waiting_jobs (#34404)
The error code `r` from the read function is being logged, but the error code `rc` from the table data insertion function should be logged instead.
2024-09-17 19:17:21 +09:00
Yu Watanabe a7afe5a3e7
Merge pull request #34443 from yuwata/network-sysctl-monitor-follow-ups
network/sysctl-monitor: several follow-ups and cleanups
2024-09-17 19:15:12 +09:00
Yu Watanabe a65b864835 docs: fix typo in filename: REATLIME -> REALTIME 2024-09-17 10:21:54 +02:00
Yu Watanabe 9959681a0d test/repart: fix mkfs checker
Follow-up for 27cacec939.
2024-09-17 10:15:21 +02:00
Daan De Meyer b3ebd480d6 Fix generator logging
log_setup() overrides the previously set log target again so we
can't use it in log_setup_generator().

Follow-up for aa976d8788
2024-09-17 15:10:39 +09:00
Arian van Putten 6695ff4c15 CONTROL_GROUP_INTERFACE: fix link to systemd-run code 2024-09-17 15:09:48 +09:00
Yu Watanabe 4d6ad22f8d network: drop unnecessary BPF related objects from Manager when disabled 2024-09-17 15:00:06 +09:00
Yu Watanabe 099ee34ca1 network/sysctl-monitor: do not allocate sysctl_shadow when eBPF is not supported
When eBPF is disabled, the hashmap will be never used. Let's not
allocate it.
2024-09-17 14:53:29 +09:00
Yu Watanabe a2fbe9f3f9 network/sysctl-monitor: fix use-after-free
Previously, manager_free() did not assign NULL to Manager.sysctl_shadow,
hence sysctl_clear_link_shadows() called by link_free() will causes
use-after-free. To fix the issue, this makes Manager.sysctl_shadow will be
set to NULL after it is freed,

Fixes a bug introduced by 6d9ef22acd.
2024-09-16 15:12:47 +09:00
Yu Watanabe 7c778cecdb network/sysctl: several cleanups for sysctl_add_monitor()
- rename rootcg -> root_cgroup_fd, to emphasize it is a fd,
- drop nested function call, and check error code.
2024-09-16 14:36:54 +09:00
Yu Watanabe 46718d344f bpf-link: introduce bpf_ring_buffer_free() and friends
Then, replace rb_free() in networkd.

Follow-up for 6d9ef22acd.
2024-09-16 14:36:54 +09:00
Yu Watanabe 9295c7ae09 network/sysctl: use wrapped free functions
No functional change, just refactoring.

Follow-up for 6d9ef22acd.
2024-09-16 14:36:54 +09:00
Yu Watanabe 41afafbf2a network/sysctl-monitor: fix sanity check in cut_last()
This also adds basic comment about the return code.

Follow-up for 6d9ef22acd.
2024-09-16 14:36:54 +09:00
19 changed files with 611 additions and 147 deletions

2
TODO
View File

@ -1860,7 +1860,7 @@ Features:
* fstab-generator: default to tmpfs-as-root if only usr= is specified on the kernel cmdline
* docs: bring https://systemd.io/MY_SERVICE_CANT_GET_REATLIME up to date
* docs: bring https://systemd.io/MY_SERVICE_CANT_GET_REALTIME up to date
* add a job mode that will fail if a transaction would mean stopping
running units. Use this in timedated to manage the NTP service

View File

@ -247,4 +247,4 @@ Note that scope units created by `machined`'s `CreateMachine()` call have this f
### Example
Please see the [systemd-run sources](http://cgit.freedesktop.org/systemd/systemd/plain/src/run/run.c) for a relatively simple example how to create scope or service units transiently and pass properties to them.
Please see the [systemd-run sources](https://github.com/systemd/systemd/blob/main/src/run/run.c) for a relatively simple example how to create scope or service units transiently and pass properties to them.

View File

@ -104,7 +104,7 @@ A: Use:
**Q: Whenever my service tries to acquire RT scheduling for one of its threads this is refused with EPERM even though my service is running with full privileges. This works fine on my non-systemd system!**
A: By default, systemd places all systemd daemons in their own cgroup in the "cpu" hierarchy. Unfortunately, due to a kernel limitation, this has the effect of disallowing RT entirely for the service. See [My Service Can't Get Realtime!](/MY_SERVICE_CANT_GET_REATLIME) for a longer discussion and what to do about this.
A: By default, systemd places all systemd daemons in their own cgroup in the "cpu" hierarchy. Unfortunately, due to a kernel limitation, this has the effect of disallowing RT entirely for the service. See [My Service Can't Get Realtime!](/MY_SERVICE_CANT_GET_REALTIME) for a longer discussion and what to do about this.
**Q: My service is ordered after `network.target` but at boot it is still called before the network is up. What's going on?**

View File

@ -76,16 +76,7 @@
<term><varname>Type=</varname></term>
<listitem><para>The GPT partition type UUID to match. This may be a GPT partition type UUID such as
<constant>4f68bce3-e8cd-4db1-96e7-fbcaf984b709</constant>, or an identifier.
Architecture specific partition types can use one of these architecture identifiers:
<constant>alpha</constant>, <constant>arc</constant>, <constant>arm</constant> (32-bit),
<constant>arm64</constant> (64-bit, aka aarch64), <constant>ia64</constant>,
<constant>loongarch64</constant>, <constant>mips-le</constant>, <constant>mips64-le</constant>,
<constant>parisc</constant>, <constant>ppc</constant>, <constant>ppc64</constant>,
<constant>ppc64-le</constant>, <constant>riscv32</constant>, <constant>riscv64</constant>,
<constant>s390</constant>, <constant>s390x</constant>, <constant>tilegx</constant>,
<constant>x86</constant> (32-bit, aka i386) and <constant>x86-64</constant> (64-bit, aka amd64).
</para>
<constant>4f68bce3-e8cd-4db1-96e7-fbcaf984b709</constant>, or an identifier.</para>
<para>The supported identifiers are:</para>
@ -237,7 +228,14 @@
</tgroup>
</table>
<para>This setting defaults to <constant>linux-generic</constant>.</para>
<para>Architecture specific partition types can use one of these architecture identifiers:
<constant>alpha</constant>, <constant>arc</constant>, <constant>arm</constant> (32-bit),
<constant>arm64</constant> (64-bit, aka aarch64), <constant>ia64</constant>,
<constant>loongarch64</constant>, <constant>mips-le</constant>, <constant>mips64-le</constant>,
<constant>parisc</constant>, <constant>ppc</constant>, <constant>ppc64</constant>,
<constant>ppc64-le</constant>, <constant>riscv32</constant>, <constant>riscv64</constant>,
<constant>s390</constant>, <constant>s390x</constant>, <constant>tilegx</constant>,
<constant>x86</constant> (32-bit, aka i386) and <constant>x86-64</constant> (64-bit, aka amd64).</para>
<para>Most of the partition type UUIDs listed above are defined in the <ulink
url="https://uapi-group.org/specifications/specs/discoverable_partitions_specification">Discoverable Partitions
@ -897,6 +895,59 @@
<xi:include href="version-info.xml" xpointer="v257"/></listitem>
</varlistentry>
<varlistentry>
<term><varname>SupplementFor=</varname></term>
<listitem><para>Takes a partition definition name, such as <literal>10-esp</literal>. If specified,
<command>systemd-repart</command> will avoid creating this partition and instead prefer to partially
merge the two definitions. However, depending on the existing layout of partitions on disk,
<command>systemd-repart</command> may be forced to fall back onto un-merging the definitions and
using them as originally written, potentially creating this partition. Specifically,
<command>systemd-repart</command> will fall back if this partition is found to already exist on disk,
or if the target partition already exists on disk but is too small, or if it cannot allocate space
for the merged partition for some other reason.</para>
<para>The following fields are merged into the target definition in the specified ways:
<varname>Weight=</varname> and <varname>PaddingWeight=</varname> are simply overwritten;
<varname>SizeMinBytes=</varname> and <varname>PaddingMinBytes=</varname> use the larger of the two
values; <varname>SizeMaxBytes=</varname> and <varname>PaddingMaxBytes=</varname> use the smaller
value; and <varname>CopyFiles=</varname>, <varname>ExcludeFiles=</varname>,
<varname>ExcludeFilesTarget=</varname>, <varname>MakeDirectories=</varname>, and
<varname>Subvolumes=</varname> are concatenated.</para>
<para>Usage of this option in combination with <varname>CopyBlocks=</varname>,
<varname>Encrypt=</varname>, or <varname>Verity=</varname> is not supported. The target definition
cannot set these settings either. A definition cannot simultaneously be a supplement and act as a
target for some other supplement definition. A target cannot have more than one supplement partition
associated with it.</para>
<para>For example, distributions can use this to implement <variable>$BOOT</variable> as defined in
the <ulink url="https://uapi-group.org/specifications/specs/boot_loader_specification/">Boot Loader
Specification</ulink>. Distributions may prefer to use the ESP as <variable>$BOOT</variable> whenever
possible, but to adhere to the spec XBOOTLDR must sometimes be used instead. So, they should create
two definitions: the first defining an ESP big enough to hold just the bootloader, and a second for
the XBOOTLDR that's sufficiently large to hold kernels and configured as a supplement for the ESP.
Whenever possible, <command>systemd-repart</command> will try to merge the two definitions to create
one large ESP, but if that's not allowable due to the existing conditions on disk a small ESP and a
large XBOOTLDR will be created instead.</para>
<para>As another example, distributions can also use this to seamlessly share a single
<filename>/home</filename> partition in a multi-boot scenario, while preferring to keep
<filename>/home</filename> on the root partition by default. Having a <filename>/home</filename>
partition separated from the root partition entails some extra complexity: someone has to decide how
to split the space between the two partitions. On the other hand, it allows a user to share their
home area between multiple installed OSs (i.e. via <citerefentry><refentrytitle>systemd-homed.service
</refentrytitle><manvolnum>8</manvolnum></citerefentry>). Distributions should create two definitions:
the first for a root partition that takes up some relatively small percentage of the disk, and the
second as a supplement for the first to create a <filename>/home</filename> partition that takes up
all the remaining free space. On first boot, if <command>systemd-repart</command> finds an existing
<filename>/home</filename> partition on disk, it'll un-merge the definitions and create just a small
root partition. Otherwise, the definitions will be merged and a single large root partition will be
created.</para>
<xi:include href="version-info.xml" xpointer="v257"/></listitem>
</varlistentry>
</variablelist>
</refsect1>

View File

@ -153,7 +153,7 @@ bool strv_overlap(char * const *a, char * const *b) _pure_;
_STRV_FOREACH_BACKWARDS(s, l, UNIQ_T(h, UNIQ), UNIQ_T(i, UNIQ))
#define _STRV_FOREACH_PAIR(x, y, l, i) \
for (typeof(*l) *x, *y, *i = (l); \
for (typeof(*(l)) *x, *y, *i = (l); \
i && *(x = i) && *(y = i + 1); \
i += 2)

View File

@ -36,23 +36,22 @@ struct str {
static long cut_last(u32 i, struct str *str) {
char *s;
/* Sanity check for the preverifier */
if (i >= str->l)
return 1; /* exit from the loop */
i = str->l - i - 1;
s = str->s + i;
/* Sanity check for the preverifier */
if (i >= str->l)
return 1;
if (*s == 0)
return 0;
return 0; /* continue */
if (*s == '\n' || *s == '\r' || *s == ' ' || *s == '\t') {
*s = 0;
return 0;
return 0; /* continue */
}
return 1;
return 1; /* exit from the loop */
}
/* Cut off trailing whitespace and newlines */

View File

@ -221,7 +221,7 @@ int link_set_ipv6ll_stable_secret(Link *link) {
}
return sysctl_write_ip_property(AF_INET6, link->ifname, "stable_secret",
IN6_ADDR_TO_STRING(&a), &link->manager->sysctl_shadow);
IN6_ADDR_TO_STRING(&a), manager_get_sysctl_shadow(link->manager));
}
int link_set_ipv6ll_addrgen_mode(Link *link, IPv6LinkLocalAddressGenMode mode) {
@ -232,7 +232,7 @@ int link_set_ipv6ll_addrgen_mode(Link *link, IPv6LinkLocalAddressGenMode mode) {
if (mode == link->ipv6ll_address_gen_mode)
return 0;
return sysctl_write_ip_property_uint32(AF_INET6, link->ifname, "addr_gen_mode", mode, &link->manager->sysctl_shadow);
return sysctl_write_ip_property_uint32(AF_INET6, link->ifname, "addr_gen_mode", mode, manager_get_sysctl_shadow(link->manager));
}
static const char* const ipv6_link_local_address_gen_mode_table[_IPV6_LINK_LOCAL_ADDRESS_GEN_MODE_MAX] = {

View File

@ -604,7 +604,9 @@ int manager_new(Manager **ret, bool test_mode) {
.duid_product_uuid.type = DUID_TYPE_UUID,
.dhcp_server_persist_leases = true,
.ip_forwarding = { -1, -1, },
#if HAVE_VMLINUX_H
.cgroup_fd = -EBADF,
#endif
};
*ret = TAKE_PTR(m);
@ -624,8 +626,6 @@ Manager* manager_free(Manager *m) {
HASHMAP_FOREACH(link, m->links_by_index)
(void) link_stop_engines(link, true);
hashmap_free(m->sysctl_shadow);
m->request_queue = ordered_set_free(m->request_queue);
m->remove_request_queue = ordered_set_free(m->remove_request_queue);

View File

@ -122,12 +122,14 @@ struct Manager {
/* sysctl */
int ip_forwarding[2];
#if HAVE_VMLINUX_H
Hashmap *sysctl_shadow;
sd_event_source *sysctl_event_source;
struct ring_buffer *sysctl_buffer;
struct sysctl_monitor_bpf *sysctl_skel;
struct bpf_link *sysctl_link;
int cgroup_fd;
#endif
};
int manager_new(Manager **ret, bool test_mode);
@ -150,4 +152,12 @@ int manager_set_timezone(Manager *m, const char *timezone);
int manager_reload(Manager *m, sd_bus_message *message);
static inline Hashmap** manager_get_sysctl_shadow(Manager *manager) {
#if HAVE_VMLINUX_H
return &ASSERT_PTR(manager)->sysctl_shadow;
#else
return NULL;
#endif
}
DEFINE_TRIVIAL_CLEANUP_FUNC(Manager*, manager_free);

View File

@ -987,7 +987,7 @@ static int ndisc_router_process_reachable_time(Link *link, sd_ndisc_router *rt)
}
/* Set the reachable time for Neighbor Solicitations. */
r = sysctl_write_ip_neighbor_property_uint32(AF_INET6, link->ifname, "base_reachable_time_ms", (uint32_t) msec, &link->manager->sysctl_shadow);
r = sysctl_write_ip_neighbor_property_uint32(AF_INET6, link->ifname, "base_reachable_time_ms", (uint32_t) msec, manager_get_sysctl_shadow(link->manager));
if (r < 0)
log_link_warning_errno(link, r, "Failed to apply neighbor reachable time (%"PRIu64"), ignoring: %m", msec);
@ -1021,7 +1021,7 @@ static int ndisc_router_process_retransmission_time(Link *link, sd_ndisc_router
}
/* Set the retransmission time for Neighbor Solicitations. */
r = sysctl_write_ip_neighbor_property_uint32(AF_INET6, link->ifname, "retrans_time_ms", (uint32_t) msec, &link->manager->sysctl_shadow);
r = sysctl_write_ip_neighbor_property_uint32(AF_INET6, link->ifname, "retrans_time_ms", (uint32_t) msec, manager_get_sysctl_shadow(link->manager));
if (r < 0)
log_link_warning_errno(link, r, "Failed to apply neighbor retransmission time (%"PRIu64"), ignoring: %m", msec);
@ -1057,7 +1057,7 @@ static int ndisc_router_process_hop_limit(Link *link, sd_ndisc_router *rt) {
if (hop_limit <= 0)
return 0;
r = sysctl_write_ip_property_uint32(AF_INET6, link->ifname, "hop_limit", (uint32_t) hop_limit, &link->manager->sysctl_shadow);
r = sysctl_write_ip_property_uint32(AF_INET6, link->ifname, "hop_limit", (uint32_t) hop_limit, manager_get_sysctl_shadow(link->manager));
if (r < 0)
log_link_warning_errno(link, r, "Failed to apply hop_limit (%u), ignoring: %m", hop_limit);

View File

@ -34,13 +34,7 @@ static struct sysctl_monitor_bpf* sysctl_monitor_bpf_free(struct sysctl_monitor_
return NULL;
}
static struct ring_buffer* rb_free(struct ring_buffer *rb) {
sym_ring_buffer__free(rb);
return NULL;
}
DEFINE_TRIVIAL_CLEANUP_FUNC(struct sysctl_monitor_bpf *, sysctl_monitor_bpf_free);
DEFINE_TRIVIAL_CLEANUP_FUNC(struct ring_buffer *, rb_free);
static int sysctl_event_handler(void *ctx, void *data, size_t data_sz) {
struct sysctl_write_event *we = ASSERT_PTR(data);
@ -99,10 +93,10 @@ static int on_ringbuf_io(sd_event_source *s, int fd, uint32_t revents, void *use
int sysctl_add_monitor(Manager *manager) {
_cleanup_(sysctl_monitor_bpf_freep) struct sysctl_monitor_bpf *obj = NULL;
_cleanup_(bpf_link_freep) struct bpf_link *sysctl_link = NULL;
_cleanup_(rb_freep) struct ring_buffer *sysctl_buffer = NULL;
_cleanup_close_ int cgroup_fd = -EBADF, rootcg = -EBADF;
_cleanup_(bpf_ring_buffer_freep) struct ring_buffer *sysctl_buffer = NULL;
_cleanup_close_ int cgroup_fd = -EBADF, root_cgroup_fd = -EBADF;
_cleanup_free_ char *cgroup = NULL;
int idx = 0, r;
int idx = 0, r, fd;
assert(manager);
@ -116,9 +110,9 @@ int sysctl_add_monitor(Manager *manager) {
if (r < 0)
return log_warning_errno(r, "Failed to get cgroup path, ignoring: %m.");
rootcg = cg_path_open(SYSTEMD_CGROUP_CONTROLLER, "/");
if (rootcg < 0)
return log_warning_errno(rootcg, "Failed to open cgroup, ignoring: %m.");
root_cgroup_fd = cg_path_open(SYSTEMD_CGROUP_CONTROLLER, "/");
if (root_cgroup_fd < 0)
return log_warning_errno(root_cgroup_fd, "Failed to open cgroup, ignoring: %m.");
obj = sysctl_monitor_bpf__open_and_load();
if (!obj) {
@ -133,21 +127,27 @@ int sysctl_add_monitor(Manager *manager) {
if (sym_bpf_map_update_elem(sym_bpf_map__fd(obj->maps.cgroup_map), &idx, &cgroup_fd, BPF_ANY))
return log_warning_errno(errno, "Failed to update cgroup map: %m");
sysctl_link = sym_bpf_program__attach_cgroup(obj->progs.sysctl_monitor, rootcg);
sysctl_link = sym_bpf_program__attach_cgroup(obj->progs.sysctl_monitor, root_cgroup_fd);
r = bpf_get_error_translated(sysctl_link);
if (r < 0) {
log_info_errno(r, "Unable to attach sysctl monitor BPF program to cgroup, ignoring: %m.");
return 0;
}
sysctl_buffer = sym_ring_buffer__new(
sym_bpf_map__fd(obj->maps.written_sysctls),
sysctl_event_handler, &manager->sysctl_shadow, NULL);
fd = sym_bpf_map__fd(obj->maps.written_sysctls);
if (fd < 0)
return log_warning_errno(fd, "Failed to get fd of sysctl maps: %m");
sysctl_buffer = sym_ring_buffer__new(fd, sysctl_event_handler, &manager->sysctl_shadow, NULL);
if (!sysctl_buffer)
return log_warning_errno(errno, "Failed to create ring buffer: %m");
fd = sym_ring_buffer__epoll_fd(sysctl_buffer);
if (fd < 0)
return log_warning_errno(fd, "Failed to get poll fd of ring buffer: %m");
r = sd_event_add_io(manager->event, &manager->sysctl_event_source,
sym_ring_buffer__epoll_fd(sysctl_buffer), EPOLLIN, on_ringbuf_io, sysctl_buffer);
fd, EPOLLIN, on_ringbuf_io, sysctl_buffer);
if (r < 0)
return log_warning_errno(r, "Failed to watch sysctl event ringbuffer: %m");
@ -163,23 +163,11 @@ void sysctl_remove_monitor(Manager *manager) {
assert(manager);
manager->sysctl_event_source = sd_event_source_disable_unref(manager->sysctl_event_source);
if (manager->sysctl_buffer) {
sym_ring_buffer__free(manager->sysctl_buffer);
manager->sysctl_buffer = NULL;
}
if (manager->sysctl_link) {
sym_bpf_link__destroy(manager->sysctl_link);
manager->sysctl_link = NULL;
}
if (manager->sysctl_skel) {
sysctl_monitor_bpf__destroy(manager->sysctl_skel);
manager->sysctl_skel = NULL;
}
manager->sysctl_buffer = bpf_ring_buffer_free(manager->sysctl_buffer);
manager->sysctl_link = bpf_link_free(manager->sysctl_link);
manager->sysctl_skel = sysctl_monitor_bpf_free(manager->sysctl_skel);
manager->cgroup_fd = safe_close(manager->cgroup_fd);
manager->sysctl_shadow = hashmap_free(manager->sysctl_shadow);
}
int sysctl_clear_link_shadows(Link *link) {
@ -222,13 +210,13 @@ static void manager_set_ip_forwarding(Manager *manager, int family) {
return; /* keep */
/* First, set the default value. */
r = sysctl_write_ip_property_boolean(family, "default", "forwarding", t, &manager->sysctl_shadow);
r = sysctl_write_ip_property_boolean(family, "default", "forwarding", t, manager_get_sysctl_shadow(manager));
if (r < 0)
log_warning_errno(r, "Failed to %s the default %s forwarding: %m",
enable_disable(t), af_to_ipv4_ipv6(family));
/* Then, set the value to all interfaces. */
r = sysctl_write_ip_property_boolean(family, "all", "forwarding", t, &manager->sysctl_shadow);
r = sysctl_write_ip_property_boolean(family, "all", "forwarding", t, manager_get_sysctl_shadow(manager));
if (r < 0)
log_warning_errno(r, "Failed to %s %s forwarding for all interfaces: %m",
enable_disable(t), af_to_ipv4_ipv6(family));
@ -273,7 +261,7 @@ static int link_update_ipv6_sysctl(Link *link) {
if (!link_ipv6_enabled(link))
return 0;
return sysctl_write_ip_property_boolean(AF_INET6, link->ifname, "disable_ipv6", false, &link->manager->sysctl_shadow);
return sysctl_write_ip_property_boolean(AF_INET6, link->ifname, "disable_ipv6", false, manager_get_sysctl_shadow(link->manager));
}
static int link_set_proxy_arp(Link *link) {
@ -286,7 +274,7 @@ static int link_set_proxy_arp(Link *link) {
if (link->network->proxy_arp < 0)
return 0;
return sysctl_write_ip_property_boolean(AF_INET, link->ifname, "proxy_arp", link->network->proxy_arp > 0, &link->manager->sysctl_shadow);
return sysctl_write_ip_property_boolean(AF_INET, link->ifname, "proxy_arp", link->network->proxy_arp > 0, manager_get_sysctl_shadow(link->manager));
}
static int link_set_proxy_arp_pvlan(Link *link) {
@ -299,7 +287,7 @@ static int link_set_proxy_arp_pvlan(Link *link) {
if (link->network->proxy_arp_pvlan < 0)
return 0;
return sysctl_write_ip_property_boolean(AF_INET, link->ifname, "proxy_arp_pvlan", link->network->proxy_arp_pvlan > 0, &link->manager->sysctl_shadow);
return sysctl_write_ip_property_boolean(AF_INET, link->ifname, "proxy_arp_pvlan", link->network->proxy_arp_pvlan > 0, manager_get_sysctl_shadow(link->manager));
}
int link_get_ip_forwarding(Link *link, int family) {
@ -341,7 +329,7 @@ static int link_set_ip_forwarding_impl(Link *link, int family) {
if (t < 0)
return 0; /* keep */
r = sysctl_write_ip_property_boolean(family, link->ifname, "forwarding", t, &link->manager->sysctl_shadow);
r = sysctl_write_ip_property_boolean(family, link->ifname, "forwarding", t, manager_get_sysctl_shadow(link->manager));
if (r < 0)
return log_link_warning_errno(link, r, "Failed to %s %s forwarding, ignoring: %m",
enable_disable(t), af_to_ipv4_ipv6(family));
@ -418,7 +406,7 @@ static int link_set_ipv4_rp_filter(Link *link) {
if (link->network->ipv4_rp_filter < 0)
return 0;
return sysctl_write_ip_property_int(AF_INET, link->ifname, "rp_filter", link->network->ipv4_rp_filter, &link->manager->sysctl_shadow);
return sysctl_write_ip_property_int(AF_INET, link->ifname, "rp_filter", link->network->ipv4_rp_filter, manager_get_sysctl_shadow(link->manager));
}
static int link_set_ipv6_privacy_extensions(Link *link) {
@ -438,7 +426,7 @@ static int link_set_ipv6_privacy_extensions(Link *link) {
if (val == IPV6_PRIVACY_EXTENSIONS_KERNEL)
return 0;
return sysctl_write_ip_property_int(AF_INET6, link->ifname, "use_tempaddr", (int) val, &link->manager->sysctl_shadow);
return sysctl_write_ip_property_int(AF_INET6, link->ifname, "use_tempaddr", (int) val, manager_get_sysctl_shadow(link->manager));
}
static int link_set_ipv6_accept_ra(Link *link) {
@ -448,7 +436,7 @@ static int link_set_ipv6_accept_ra(Link *link) {
if (!link_is_configured_for_family(link, AF_INET6))
return 0;
return sysctl_write_ip_property(AF_INET6, link->ifname, "accept_ra", "0", &link->manager->sysctl_shadow);
return sysctl_write_ip_property(AF_INET6, link->ifname, "accept_ra", "0", manager_get_sysctl_shadow(link->manager));
}
static int link_set_ipv6_dad_transmits(Link *link) {
@ -461,7 +449,7 @@ static int link_set_ipv6_dad_transmits(Link *link) {
if (link->network->ipv6_dad_transmits < 0)
return 0;
return sysctl_write_ip_property_int(AF_INET6, link->ifname, "dad_transmits", link->network->ipv6_dad_transmits, &link->manager->sysctl_shadow);
return sysctl_write_ip_property_int(AF_INET6, link->ifname, "dad_transmits", link->network->ipv6_dad_transmits, manager_get_sysctl_shadow(link->manager));
}
static int link_set_ipv6_hop_limit(Link *link) {
@ -474,7 +462,7 @@ static int link_set_ipv6_hop_limit(Link *link) {
if (link->network->ipv6_hop_limit <= 0)
return 0;
return sysctl_write_ip_property_int(AF_INET6, link->ifname, "hop_limit", link->network->ipv6_hop_limit, &link->manager->sysctl_shadow);
return sysctl_write_ip_property_int(AF_INET6, link->ifname, "hop_limit", link->network->ipv6_hop_limit, manager_get_sysctl_shadow(link->manager));
}
static int link_set_ipv6_retransmission_time(Link *link) {
@ -493,7 +481,7 @@ static int link_set_ipv6_retransmission_time(Link *link) {
if (retrans_time_ms <= 0 || retrans_time_ms > UINT32_MAX)
return 0;
return sysctl_write_ip_neighbor_property_uint32(AF_INET6, link->ifname, "retrans_time_ms", retrans_time_ms, &link->manager->sysctl_shadow);
return sysctl_write_ip_neighbor_property_uint32(AF_INET6, link->ifname, "retrans_time_ms", retrans_time_ms, manager_get_sysctl_shadow(link->manager));
}
static int link_set_ipv6_proxy_ndp(Link *link) {
@ -510,7 +498,7 @@ static int link_set_ipv6_proxy_ndp(Link *link) {
else
v = !set_isempty(link->network->ipv6_proxy_ndp_addresses);
return sysctl_write_ip_property_boolean(AF_INET6, link->ifname, "proxy_ndp", v, &link->manager->sysctl_shadow);
return sysctl_write_ip_property_boolean(AF_INET6, link->ifname, "proxy_ndp", v, manager_get_sysctl_shadow(link->manager));
}
int link_set_ipv6_mtu(Link *link, int log_level) {
@ -538,7 +526,7 @@ int link_set_ipv6_mtu(Link *link, int log_level) {
mtu = link->mtu;
}
return sysctl_write_ip_property_uint32(AF_INET6, link->ifname, "mtu", mtu, &link->manager->sysctl_shadow);
return sysctl_write_ip_property_uint32(AF_INET6, link->ifname, "mtu", mtu, manager_get_sysctl_shadow(link->manager));
}
static int link_set_ipv4_accept_local(Link *link) {
@ -551,7 +539,7 @@ static int link_set_ipv4_accept_local(Link *link) {
if (link->network->ipv4_accept_local < 0)
return 0;
return sysctl_write_ip_property_boolean(AF_INET, link->ifname, "accept_local", link->network->ipv4_accept_local > 0, &link->manager->sysctl_shadow);
return sysctl_write_ip_property_boolean(AF_INET, link->ifname, "accept_local", link->network->ipv4_accept_local > 0, manager_get_sysctl_shadow(link->manager));
}
static int link_set_ipv4_route_localnet(Link *link) {
@ -564,7 +552,7 @@ static int link_set_ipv4_route_localnet(Link *link) {
if (link->network->ipv4_route_localnet < 0)
return 0;
return sysctl_write_ip_property_boolean(AF_INET, link->ifname, "route_localnet", link->network->ipv4_route_localnet > 0, &link->manager->sysctl_shadow);
return sysctl_write_ip_property_boolean(AF_INET, link->ifname, "route_localnet", link->network->ipv4_route_localnet > 0, manager_get_sysctl_shadow(link->manager));
}
static int link_set_ipv4_promote_secondaries(Link *link) {
@ -579,7 +567,7 @@ static int link_set_ipv4_promote_secondaries(Link *link) {
* otherwise. The way systemd-networkd works is that the new IP of a lease is added as a
* secondary IP and when the primary one expires it relies on the kernel to promote the
* secondary IP. See also https://github.com/systemd/systemd/issues/7163 */
return sysctl_write_ip_property_boolean(AF_INET, link->ifname, "promote_secondaries", true, &link->manager->sysctl_shadow);
return sysctl_write_ip_property_boolean(AF_INET, link->ifname, "promote_secondaries", true, manager_get_sysctl_shadow(link->manager));
}
int link_set_sysctl(Link *link) {

View File

@ -6,6 +6,7 @@
#include "sd-daemon.h"
#include "bpf-dlopen.h"
#include "bpf-link.h"
#include "build-path.h"
#include "common-signal.h"
#include "env-util.h"
@ -141,8 +142,7 @@ Manager* manager_free(Manager *m) {
#if HAVE_VMLINUX_H
sd_event_source_disable_unref(m->userns_restrict_bpf_ring_buffer_event_source);
if (m->userns_restrict_bpf_ring_buffer)
sym_ring_buffer__free(m->userns_restrict_bpf_ring_buffer);
bpf_ring_buffer_free(m->userns_restrict_bpf_ring_buffer);
userns_restrict_bpf_free(m->userns_restrict_bpf);
#endif

View File

@ -404,6 +404,10 @@ typedef struct Partition {
PartitionEncryptedVolume *encrypted_volume;
char *supplement_for_name;
struct Partition *supplement_for, *supplement_target_for;
struct Partition *suppressing;
struct Partition *siblings[_VERITY_MODE_MAX];
LIST_FIELDS(struct Partition, partitions);
@ -411,6 +415,7 @@ typedef struct Partition {
#define PARTITION_IS_FOREIGN(p) (!(p)->definition_path)
#define PARTITION_EXISTS(p) (!!(p)->current_partition)
#define PARTITION_SUPPRESSED(p) ((p)->supplement_for && (p)->supplement_for->suppressing == (p))
struct FreeArea {
Partition *after;
@ -520,6 +525,28 @@ static Partition *partition_new(void) {
return p;
}
static void partition_unlink_supplement(Partition *p) {
assert(p);
assert(!p->supplement_for || !p->supplement_target_for); /* Can't be both */
if (p->supplement_target_for) {
assert(p->supplement_target_for->supplement_for == p);
p->supplement_target_for->supplement_for = NULL;
}
if (p->supplement_for) {
assert(p->supplement_for->supplement_target_for == p);
assert(!p->supplement_for->suppressing || p->supplement_for->suppressing == p);
p->supplement_for->supplement_target_for = p->supplement_for->suppressing = NULL;
}
p->supplement_for_name = mfree(p->supplement_for_name);
p->supplement_target_for = p->supplement_for = p->suppressing = NULL;
}
static Partition* partition_free(Partition *p) {
if (!p)
return NULL;
@ -563,6 +590,8 @@ static Partition* partition_free(Partition *p) {
partition_encrypted_volume_free(p->encrypted_volume);
partition_unlink_supplement(p);
return mfree(p);
}
@ -608,6 +637,8 @@ static void partition_foreignize(Partition *p) {
p->n_mountpoints = 0;
p->encrypted_volume = partition_encrypted_volume_free(p->encrypted_volume);
partition_unlink_supplement(p);
}
static bool partition_type_exclude(const GptPartitionType *type) {
@ -740,6 +771,10 @@ static void partition_drop_or_foreignize(Partition *p) {
p->dropped = true;
p->allocated_to_area = NULL;
/* If a supplement partition is dropped, we don't want to merge in its settings. */
if (PARTITION_SUPPRESSED(p))
p->supplement_for->suppressing = NULL;
}
}
@ -775,7 +810,7 @@ static bool context_drop_or_foreignize_one_priority(Context *context) {
}
static uint64_t partition_min_size(const Context *context, const Partition *p) {
uint64_t sz;
uint64_t sz, override_min;
assert(context);
assert(p);
@ -817,11 +852,13 @@ static uint64_t partition_min_size(const Context *context, const Partition *p) {
sz = d;
}
return MAX(round_up_size(p->size_min != UINT64_MAX ? p->size_min : DEFAULT_MIN_SIZE, context->grain_size), sz);
override_min = p->suppressing ? MAX(p->size_min, p->suppressing->size_min) : p->size_min;
return MAX(round_up_size(override_min != UINT64_MAX ? override_min : DEFAULT_MIN_SIZE, context->grain_size), sz);
}
static uint64_t partition_max_size(const Context *context, const Partition *p) {
uint64_t sm;
uint64_t sm, override_max;
/* Calculate how large the partition may become at max. This is generally the configured maximum
* size, except when it already exists and is larger than that. In that case it's the existing size,
@ -839,10 +876,11 @@ static uint64_t partition_max_size(const Context *context, const Partition *p) {
if (p->verity == VERITY_SIG)
return VERITY_SIG_SIZE;
if (p->size_max == UINT64_MAX)
override_max = p->suppressing ? MIN(p->size_max, p->suppressing->size_max) : p->size_max;
if (override_max == UINT64_MAX)
return UINT64_MAX;
sm = round_down_size(p->size_max, context->grain_size);
sm = round_down_size(override_max, context->grain_size);
if (p->current_size != UINT64_MAX)
sm = MAX(p->current_size, sm);
@ -851,13 +889,17 @@ static uint64_t partition_max_size(const Context *context, const Partition *p) {
}
static uint64_t partition_min_padding(const Partition *p) {
uint64_t override_min;
assert(p);
return p->padding_min != UINT64_MAX ? p->padding_min : 0;
override_min = p->suppressing ? MAX(p->padding_min, p->suppressing->padding_min) : p->padding_min;
return override_min != UINT64_MAX ? override_min : 0;
}
static uint64_t partition_max_padding(const Partition *p) {
assert(p);
return p->padding_max;
return p->suppressing ? MIN(p->padding_max, p->suppressing->padding_max) : p->padding_max;
}
static uint64_t partition_min_size_with_padding(Context *context, const Partition *p) {
@ -977,14 +1019,22 @@ static bool context_allocate_partitions(Context *context, uint64_t *ret_largest_
uint64_t required;
FreeArea *a = NULL;
/* Skip partitions we already dropped or that already exist */
if (p->dropped || PARTITION_EXISTS(p))
if (p->dropped || PARTITION_IS_FOREIGN(p) || PARTITION_SUPPRESSED(p))
continue;
/* How much do we need to fit? */
required = partition_min_size_with_padding(context, p);
assert(required % context->grain_size == 0);
/* For existing partitions, we should verify that they'll actually fit */
if (PARTITION_EXISTS(p)) {
if (p->current_size + p->current_padding < required)
return false; /* 😢 We won't be able to grow to the required min size! */
continue;
}
/* For new partitions, see if there's a free area big enough */
for (size_t i = 0; i < context->n_free_areas; i++) {
a = context->free_areas[i];
@ -1007,6 +1057,57 @@ static bool context_allocate_partitions(Context *context, uint64_t *ret_largest_
return true;
}
static bool context_unmerge_and_allocate_partitions(Context *context) {
assert(context);
/* This should only be called after plain context_allocate_partitions fails. This algorithm will
* try, in the order that minimizes the number of created supplement partitions, all combinations of
* un-suppressing supplement partitions until it finds one that works. */
/* First, let's try to un-suppress just one supplement partition and see if that gets us anywhere */
LIST_FOREACH(partitions, p, context->partitions) {
Partition *unsuppressed;
if (!p->suppressing)
continue;
unsuppressed = TAKE_PTR(p->suppressing);
if (context_allocate_partitions(context, NULL))
return true;
p->suppressing = unsuppressed;
}
/* Looks like not. So we have to un-suppress at least two partitions. We can do this recursively */
LIST_FOREACH(partitions, p, context->partitions) {
Partition *unsuppressed;
if (!p->suppressing)
continue;
unsuppressed = TAKE_PTR(p->suppressing);
if (context_unmerge_and_allocate_partitions(context))
return true;
p->suppressing = unsuppressed;
}
/* No combination of un-suppressed supplements made it possible to fit the partitions */
return false;
}
static uint32_t partition_weight(const Partition *p) {
assert(p);
return p->suppressing ? p->suppressing->weight : p->weight;
}
static uint32_t partition_padding_weight(const Partition *p) {
assert(p);
return p->suppressing ? p->suppressing->padding_weight : p->padding_weight;
}
static int context_sum_weights(Context *context, FreeArea *a, uint64_t *ret) {
uint64_t weight_sum = 0;
@ -1020,13 +1121,11 @@ static int context_sum_weights(Context *context, FreeArea *a, uint64_t *ret) {
if (p->padding_area != a && p->allocated_to_area != a)
continue;
if (p->weight > UINT64_MAX - weight_sum)
if (!INC_SAFE(&weight_sum, partition_weight(p)))
goto overflow_sum;
weight_sum += p->weight;
if (p->padding_weight > UINT64_MAX - weight_sum)
if (!INC_SAFE(&weight_sum, partition_padding_weight(p)))
goto overflow_sum;
weight_sum += p->padding_weight;
}
*ret = weight_sum;
@ -1091,7 +1190,6 @@ static bool context_grow_partitions_phase(
* get any additional room from the left-overs. Similar, if two partitions have the same weight they
* should get the same space if possible, even if one has a smaller minimum size than the other. */
LIST_FOREACH(partitions, p, context->partitions) {
/* Look only at partitions associated with this free area, i.e. immediately
* preceding it, or allocated into it */
if (p->allocated_to_area != a && p->padding_area != a)
@ -1099,11 +1197,14 @@ static bool context_grow_partitions_phase(
if (p->new_size == UINT64_MAX) {
uint64_t share, rsz, xsz;
uint32_t weight;
bool charge = false;
weight = partition_weight(p);
/* Calculate how much this space this partition needs if everyone would get
* the weight based share */
share = scale_by_weight(*span, p->weight, *weight_sum);
share = scale_by_weight(*span, weight, *weight_sum);
rsz = partition_min_size(context, p);
xsz = partition_max_size(context, p);
@ -1143,15 +1244,18 @@ static bool context_grow_partitions_phase(
if (charge) {
*span = charge_size(context, *span, p->new_size);
*weight_sum = charge_weight(*weight_sum, p->weight);
*weight_sum = charge_weight(*weight_sum, weight);
}
}
if (p->new_padding == UINT64_MAX) {
uint64_t share, rsz, xsz;
uint32_t padding_weight;
bool charge = false;
share = scale_by_weight(*span, p->padding_weight, *weight_sum);
padding_weight = partition_padding_weight(p);
share = scale_by_weight(*span, padding_weight, *weight_sum);
rsz = partition_min_padding(p);
xsz = partition_max_padding(p);
@ -1170,7 +1274,7 @@ static bool context_grow_partitions_phase(
if (charge) {
*span = charge_size(context, *span, p->new_padding);
*weight_sum = charge_weight(*weight_sum, p->padding_weight);
*weight_sum = charge_weight(*weight_sum, padding_weight);
}
}
}
@ -2155,7 +2259,9 @@ static int partition_finalize_fstype(Partition *p, const char *path) {
static bool partition_needs_populate(const Partition *p) {
assert(p);
return !strv_isempty(p->copy_files) || !strv_isempty(p->make_directories) || !strv_isempty(p->make_symlinks);
assert(!p->supplement_for || !p->suppressing); /* Avoid infinite recursion */
return !strv_isempty(p->copy_files) || !strv_isempty(p->make_directories) || !strv_isempty(p->make_symlinks) ||
(p->suppressing && partition_needs_populate(p->suppressing));
}
static int partition_read_definition(Partition *p, const char *path, const char *const *conf_file_dirs) {
@ -2196,6 +2302,7 @@ static int partition_read_definition(Partition *p, const char *path, const char
{ "Partition", "EncryptedVolume", config_parse_encrypted_volume, 0, p },
{ "Partition", "Compression", config_parse_string, CONFIG_PARSE_STRING_SAFE_AND_ASCII, &p->compression },
{ "Partition", "CompressionLevel", config_parse_string, CONFIG_PARSE_STRING_SAFE_AND_ASCII, &p->compression_level },
{ "Partition", "SupplementFor", config_parse_string, 0, &p->supplement_for_name },
{}
};
_cleanup_free_ char *filename = NULL;
@ -2320,6 +2427,18 @@ static int partition_read_definition(Partition *p, const char *path, const char
return log_syntax(NULL, LOG_ERR, path, 1, SYNTHETIC_ERRNO(EINVAL),
"DefaultSubvolume= must be one of the paths in Subvolumes=.");
if (p->supplement_for_name) {
if (!filename_part_is_valid(p->supplement_for_name))
return log_syntax(NULL, LOG_ERR, path, 1, SYNTHETIC_ERRNO(EINVAL),
"SupplementFor= is an invalid filename: %s",
p->supplement_for_name);
if (p->copy_blocks_path || p->copy_blocks_auto || p->encrypt != ENCRYPT_OFF ||
p->verity != VERITY_OFF)
return log_syntax(NULL, LOG_ERR, path, 1, SYNTHETIC_ERRNO(EINVAL),
"SupplementFor= cannot be combined with CopyBlocks=/Encrypt=/Verity=");
}
/* Verity partitions are read only, let's imply the RO flag hence, unless explicitly configured otherwise. */
if ((IN_SET(p->type.designator,
PARTITION_ROOT_VERITY,
@ -2626,6 +2745,58 @@ static int context_copy_from(Context *context) {
return 0;
}
static bool check_cross_def_ranges_valid(uint64_t a_min, uint64_t a_max, uint64_t b_min, uint64_t b_max) {
if (a_min == UINT64_MAX && b_min == UINT64_MAX)
return true;
if (a_max == UINT64_MAX && b_max == UINT64_MAX)
return true;
return MAX(a_min != UINT64_MAX ? a_min : 0, b_min != UINT64_MAX ? b_min : 0) <= MIN(a_max, b_max);
}
static int supplement_find_target(const Context *context, const Partition *supplement, Partition **ret) {
int r;
assert(context);
assert(supplement);
assert(ret);
LIST_FOREACH(partitions, p, context->partitions) {
_cleanup_free_ char *filename = NULL;
if (p == supplement)
continue;
r = path_extract_filename(p->definition_path, &filename);
if (r < 0)
return log_error_errno(r,
"Failed to extract filename from path '%s': %m",
p->definition_path);
*ASSERT_PTR(endswith(filename, ".conf")) = 0; /* Remove the file extension */
if (!streq(supplement->supplement_for_name, filename))
continue;
if (p->supplement_for_name)
return log_syntax(NULL, LOG_ERR, supplement->definition_path, 1, SYNTHETIC_ERRNO(EINVAL),
"SupplementFor= target is itself configured as a supplement.");
if (p->suppressing)
return log_syntax(NULL, LOG_ERR, supplement->definition_path, 1, SYNTHETIC_ERRNO(EINVAL),
"SupplementFor= target already has a supplement defined: %s",
p->suppressing->definition_path);
*ret = p;
return 0;
}
return log_syntax(NULL, LOG_ERR, supplement->definition_path, 1, SYNTHETIC_ERRNO(EINVAL),
"Couldn't find target partition for SupplementFor=%s",
supplement->supplement_for_name);
}
static int context_read_definitions(Context *context) {
_cleanup_strv_free_ char **files = NULL;
Partition *last = LIST_FIND_TAIL(partitions, context->partitions);
@ -2717,7 +2888,33 @@ static int context_read_definitions(Context *context) {
if (dp->minimize == MINIMIZE_OFF && !(dp->copy_blocks_path || dp->copy_blocks_auto))
return log_syntax(NULL, LOG_ERR, p->definition_path, 1, SYNTHETIC_ERRNO(EINVAL),
"Minimize= set for verity hash partition but data partition does not set CopyBlocks= or Minimize=.");
}
LIST_FOREACH(partitions, p, context->partitions) {
Partition *tgt = NULL;
if (!p->supplement_for_name)
continue;
r = supplement_find_target(context, p, &tgt);
if (r < 0)
return r;
if (tgt->copy_blocks_path || tgt->copy_blocks_auto || tgt->encrypt != ENCRYPT_OFF ||
tgt->verity != VERITY_OFF)
return log_syntax(NULL, LOG_ERR, p->definition_path, 1, SYNTHETIC_ERRNO(EINVAL),
"SupplementFor= target uses CopyBlocks=/Encrypt=/Verity=");
if (!check_cross_def_ranges_valid(p->size_min, p->size_max, tgt->size_min, tgt->size_max))
return log_syntax(NULL, LOG_ERR, p->definition_path, 1, SYNTHETIC_ERRNO(EINVAL),
"SizeMinBytes= larger than SizeMaxBytes= when merged with SupplementFor= target.");
if (!check_cross_def_ranges_valid(p->padding_min, p->padding_max, tgt->padding_min, tgt->padding_max))
return log_syntax(NULL, LOG_ERR, p->definition_path, 1, SYNTHETIC_ERRNO(EINVAL),
"PaddingMinBytes= larger than PaddingMaxBytes= when merged with SupplementFor= target.");
p->supplement_for = tgt;
tgt->suppressing = tgt->supplement_target_for = p;
}
return 0;
@ -3101,6 +3298,10 @@ static int context_load_partition_table(Context *context) {
}
}
LIST_FOREACH(partitions, p, context->partitions)
if (PARTITION_SUPPRESSED(p) && PARTITION_EXISTS(p))
p->supplement_for->suppressing = NULL;
add_initial_free_area:
nsectors = fdisk_get_nsectors(c);
assert(nsectors <= UINT64_MAX/secsz);
@ -3192,6 +3393,11 @@ static void context_unload_partition_table(Context *context) {
p->current_uuid = SD_ID128_NULL;
p->current_label = mfree(p->current_label);
/* A supplement partition is only ever un-suppressed if the existing partition table prevented
* us from suppressing it. So when unloading the partition table, we must re-suppress. */
if (p->supplement_for)
p->supplement_for->suppressing = p;
}
context->start = UINT64_MAX;
@ -4969,6 +5175,31 @@ static int add_exclude_path(const char *path, Hashmap **denylist, DenyType type)
return 0;
}
static int shallow_join_strv(char ***ret, char **a, char **b) {
_cleanup_free_ char **joined = NULL;
char **iter;
assert(ret);
joined = new(char*, strv_length(a) + strv_length(b) + 1);
if (!joined)
return log_oom();
iter = joined;
STRV_FOREACH(i, a)
*(iter++) = *i;
STRV_FOREACH(i, b)
if (!strv_contains(joined, *i))
*(iter++) = *i;
*iter = NULL;
*ret = TAKE_PTR(joined);
return 0;
}
static int make_copy_files_denylist(
Context *context,
const Partition *p,
@ -4977,6 +5208,7 @@ static int make_copy_files_denylist(
Hashmap **ret) {
_cleanup_hashmap_free_ Hashmap *denylist = NULL;
_cleanup_free_ char **override_exclude_src = NULL, **override_exclude_tgt = NULL;
int r;
assert(context);
@ -4996,13 +5228,26 @@ static int make_copy_files_denylist(
/* Add the user configured excludes. */
STRV_FOREACH(e, p->exclude_files_source) {
if (p->suppressing) {
r = shallow_join_strv(&override_exclude_src,
p->exclude_files_source,
p->suppressing->exclude_files_source);
if (r < 0)
return r;
r = shallow_join_strv(&override_exclude_tgt,
p->exclude_files_target,
p->suppressing->exclude_files_target);
if (r < 0)
return r;
}
STRV_FOREACH(e, override_exclude_src ?: p->exclude_files_source) {
r = add_exclude_path(*e, &denylist, endswith(*e, "/") ? DENY_CONTENTS : DENY_INODE);
if (r < 0)
return r;
}
STRV_FOREACH(e, p->exclude_files_target) {
STRV_FOREACH(e, override_exclude_tgt ?: p->exclude_files_target) {
_cleanup_free_ char *path = NULL;
const char *s = path_startswith(*e, target);
@ -5096,6 +5341,7 @@ static int add_subvolume_path(const char *path, Set **subvolumes) {
static int make_subvolumes_strv(const Partition *p, char ***ret) {
_cleanup_strv_free_ char **subvolumes = NULL;
Subvolume *subvolume;
int r;
assert(p);
assert(ret);
@ -5104,6 +5350,18 @@ static int make_subvolumes_strv(const Partition *p, char ***ret) {
if (strv_extend(&subvolumes, subvolume->path) < 0)
return log_oom();
if (p->suppressing) {
_cleanup_strv_free_ char **suppressing = NULL;
r = make_subvolumes_strv(p->suppressing, &suppressing);
if (r < 0)
return r;
r = strv_extend_strv(&subvolumes, suppressing, /* filter_duplicates= */ true);
if (r < 0)
return log_oom();
}
*ret = TAKE_PTR(subvolumes);
return 0;
}
@ -5114,18 +5372,22 @@ static int make_subvolumes_set(
const char *target,
Set **ret) {
_cleanup_strv_free_ char **paths = NULL;
_cleanup_set_free_ Set *subvolumes = NULL;
Subvolume *subvolume;
int r;
assert(p);
assert(target);
assert(ret);
ORDERED_HASHMAP_FOREACH(subvolume, p->subvolumes) {
r = make_subvolumes_strv(p, &paths);
if (r < 0)
return r;
STRV_FOREACH(subvolume, paths) {
_cleanup_free_ char *path = NULL;
const char *s = path_startswith(subvolume->path, target);
const char *s = path_startswith(*subvolume, target);
if (!s)
continue;
@ -5168,6 +5430,7 @@ static usec_t epoch_or_infinity(void) {
static int do_copy_files(Context *context, Partition *p, const char *root) {
_cleanup_strv_free_ char **subvolumes = NULL;
_cleanup_free_ char **override_copy_files = NULL;
int r;
assert(p);
@ -5177,11 +5440,17 @@ static int do_copy_files(Context *context, Partition *p, const char *root) {
if (r < 0)
return r;
if (p->suppressing) {
r = shallow_join_strv(&override_copy_files, p->copy_files, p->suppressing->copy_files);
if (r < 0)
return r;
}
/* copy_tree_at() automatically copies the permissions of source directories to target directories if
* it created them. However, the root directory is created by us, so we have to manually take care
* that it is initialized. We use the first source directory targeting "/" as the metadata source for
* the root directory. */
STRV_FOREACH_PAIR(source, target, p->copy_files) {
STRV_FOREACH_PAIR(source, target, override_copy_files ?: p->copy_files) {
_cleanup_close_ int rfd = -EBADF, sfd = -EBADF;
if (!path_equal(*target, "/"))
@ -5202,7 +5471,7 @@ static int do_copy_files(Context *context, Partition *p, const char *root) {
break;
}
STRV_FOREACH_PAIR(source, target, p->copy_files) {
STRV_FOREACH_PAIR(source, target, override_copy_files ?: p->copy_files) {
_cleanup_hashmap_free_ Hashmap *denylist = NULL;
_cleanup_set_free_ Set *subvolumes_by_source_inode = NULL;
_cleanup_close_ int sfd = -EBADF, pfd = -EBADF, tfd = -EBADF;
@ -5320,6 +5589,7 @@ static int do_copy_files(Context *context, Partition *p, const char *root) {
static int do_make_directories(Partition *p, const char *root) {
_cleanup_strv_free_ char **subvolumes = NULL;
_cleanup_free_ char **override_dirs = NULL;
int r;
assert(p);
@ -5329,7 +5599,13 @@ static int do_make_directories(Partition *p, const char *root) {
if (r < 0)
return r;
STRV_FOREACH(d, p->make_directories) {
if (p->suppressing) {
r = shallow_join_strv(&override_dirs, p->make_directories, p->suppressing->make_directories);
if (r < 0)
return r;
}
STRV_FOREACH(d, override_dirs ?: p->make_directories) {
r = mkdir_p_root_full(root, *d, UID_INVALID, GID_INVALID, 0755, epoch_or_infinity(), subvolumes);
if (r < 0)
return log_error_errno(r, "Failed to create directory '%s' in file system: %m", *d);
@ -5377,6 +5653,12 @@ static int make_subvolumes_read_only(Partition *p, const char *root) {
return log_error_errno(r, "Failed to make subvolume '%s' read-only: %m", subvolume->path);
}
if (p->suppressing) {
r = make_subvolumes_read_only(p->suppressing, root);
if (r < 0)
return r;
}
return 0;
}
@ -5496,6 +5778,38 @@ static int partition_populate_filesystem(Context *context, Partition *p, const c
return 0;
}
static int append_btrfs_subvols(char ***l, OrderedHashmap *subvolumes, const char *default_subvolume) {
Subvolume *subvolume;
int r;
assert(l);
ORDERED_HASHMAP_FOREACH(subvolume, subvolumes) {
_cleanup_free_ char *s = NULL, *f = NULL;
s = strdup(subvolume->path);
if (!s)
return log_oom();
f = subvolume_flags_to_string(subvolume->flags);
if (!f)
return log_oom();
if (streq_ptr(subvolume->path, default_subvolume) &&
!strextend_with_separator(&f, ",", "default"))
return log_oom();
if (!isempty(f) && !strextend_with_separator(&s, ":", f))
return log_oom();
r = strv_extend_many(l, "--subvol", s);
if (r < 0)
return log_oom();
}
return 0;
}
static int finalize_extra_mkfs_options(const Partition *p, const char *root, char ***ret) {
_cleanup_strv_free_ char **sv = NULL;
int r;
@ -5510,28 +5824,14 @@ static int finalize_extra_mkfs_options(const Partition *p, const char *root, cha
p->format);
if (partition_needs_populate(p) && root && streq(p->format, "btrfs")) {
Subvolume *subvolume;
r = append_btrfs_subvols(&sv, p->subvolumes, p->default_subvolume);
if (r < 0)
return r;
ORDERED_HASHMAP_FOREACH(subvolume, p->subvolumes) {
_cleanup_free_ char *s = NULL, *f = NULL;
s = strdup(subvolume->path);
if (!s)
return log_oom();
f = subvolume_flags_to_string(subvolume->flags);
if (!f)
return log_oom();
if (streq_ptr(subvolume->path, p->default_subvolume) && !strextend_with_separator(&f, ",", "default"))
return log_oom();
if (!isempty(f) && !strextend_with_separator(&s, ":", f))
return log_oom();
r = strv_extend_many(&sv, "--subvol", s);
if (p->suppressing) {
r = append_btrfs_subvols(&sv, p->suppressing->subvolumes, NULL);
if (r < 0)
return log_oom();
return r;
}
}
@ -8524,7 +8824,7 @@ static int determine_auto_size(Context *c) {
LIST_FOREACH(partitions, p, c->partitions) {
uint64_t m;
if (p->dropped)
if (p->dropped || PARTITION_SUPPRESSED(p))
continue;
m = partition_min_size_with_padding(c, p);
@ -8756,13 +9056,36 @@ static int run(int argc, char *argv[]) {
if (context_allocate_partitions(context, &largest_free_area))
break; /* Success! */
if (!context_drop_or_foreignize_one_priority(context)) {
r = log_error_errno(SYNTHETIC_ERRNO(ENOSPC),
"Can't fit requested partitions into available free space (%s), refusing.",
FORMAT_BYTES(largest_free_area));
determine_auto_size(context);
return r;
}
if (context_unmerge_and_allocate_partitions(context))
break; /* We had to un-suppress a supplement or few, but still success! */
if (context_drop_or_foreignize_one_priority(context))
continue; /* Still no luck. Let's drop a priority and try again. */
/* No more priorities left to drop. This configuration just doesn't fit on this disk... */
r = log_error_errno(SYNTHETIC_ERRNO(ENOSPC),
"Can't fit requested partitions into available free space (%s), refusing.",
FORMAT_BYTES(largest_free_area));
determine_auto_size(context);
return r;
}
LIST_FOREACH(partitions, p, context->partitions) {
if (!p->supplement_for)
continue;
if (PARTITION_SUPPRESSED(p)) {
assert(!p->allocated_to_area);
p->dropped = true;
log_debug("Partition %s can be merged into %s, suppressing supplement.",
p->definition_path, p->supplement_for->definition_path);
} else if (PARTITION_EXISTS(p))
log_info("Partition %s already exists on disk, using supplement verbatim.",
p->definition_path);
else
log_info("Couldn't allocate partitions with %s merged into %s, using supplement verbatim.",
p->definition_path, p->supplement_for->definition_path);
}
/* Now assign free space according to the weight logic */

View File

@ -31,7 +31,7 @@ int bpf_serialize_link(FILE *f, FDSet *fds, const char *key, struct bpf_link *li
return serialize_fd(f, fds, key, sym_bpf_link__fd(link));
}
struct bpf_link *bpf_link_free(struct bpf_link *link) {
struct bpf_link* bpf_link_free(struct bpf_link *link) {
/* If libbpf wasn't dlopen()ed, sym_bpf_link__destroy might be unresolved (NULL), so let's not try to
* call it if link is NULL. link might also be a non-null "error pointer", but such a value can only
* originate from a call to libbpf, but that means that libbpf is available, and we can let
@ -41,3 +41,10 @@ struct bpf_link *bpf_link_free(struct bpf_link *link) {
return NULL;
}
struct ring_buffer* bpf_ring_buffer_free(struct ring_buffer *rb) {
if (rb) /* See the comment in bpf_link_free(). */
sym_ring_buffer__free(rb);
return NULL;
}

View File

@ -12,5 +12,8 @@ bool bpf_can_link_program(struct bpf_program *prog);
int bpf_serialize_link(FILE *f, FDSet *fds, const char *key, struct bpf_link *link);
struct bpf_link *bpf_link_free(struct bpf_link *p);
struct bpf_link* bpf_link_free(struct bpf_link *p);
DEFINE_TRIVIAL_CLEANUP_FUNC(struct bpf_link *, bpf_link_free);
struct ring_buffer* bpf_ring_buffer_free(struct ring_buffer *rb);
DEFINE_TRIVIAL_CLEANUP_FUNC(struct ring_buffer *, bpf_ring_buffer_free);

View File

@ -968,7 +968,9 @@ void log_setup_generator(void) {
/* This effectively means: journal for per-user generators, kmsg otherwise */
log_set_target(LOG_TARGET_JOURNAL_OR_KMSG);
}
} else
log_set_target(LOG_TARGET_AUTO);
log_setup();
log_parse_environment();
log_open();
}

View File

@ -28,17 +28,16 @@ static int output_waiting_jobs(sd_bus *bus, Table *table, uint32_t id, const cha
while ((r = sd_bus_message_read(reply, "(usssoo)", &other_id, &name, &type, NULL, NULL, NULL)) > 0) {
_cleanup_free_ char *row = NULL;
int rc;
if (asprintf(&row, "%s %u (%s/%s)", prefix, other_id, name, type) < 0)
return log_oom();
rc = table_add_many(table,
r = table_add_many(table,
TABLE_STRING, special_glyph(SPECIAL_GLYPH_TREE_RIGHT),
TABLE_STRING, row,
TABLE_EMPTY,
TABLE_EMPTY);
if (rc < 0)
if (r < 0)
return table_log_add_error(r);
}

View File

@ -29,6 +29,9 @@ if ! systemd-detect-virt --quiet --container; then
udevadm control --log-level debug
fi
esp_guid=C12A7328-F81F-11D2-BA4B-00A0C93EC93B
xbootldr_guid=BC13C2FF-59E6-4262-A352-B275FD6F7172
machine="$(uname -m)"
if [ "${machine}" = "x86_64" ]; then
root_guid=4F68BCE3-E8CD-4DB1-96E7-FBCAF984B709
@ -1325,9 +1328,12 @@ testcase_compression() {
# TODO: add btrfs once btrfs-progs v6.11 is available in distributions.
for format in squashfs erofs; do
if ! command -v "mkfs.$format" && ! command -v mksquashfs >/dev/null; then
continue
fi
case "$format" in
squashfs)
command -v mksquashfs >/dev/null || continue ;;
*)
command -v "mkfs.$format" || continue ;;
esac
[[ "$format" == "squashfs" ]] && compression=zstd
[[ "$format" == "erofs" ]] && compression=lz4hc
@ -1429,6 +1435,82 @@ EOF
systemd-dissect -U "$imgs/mnt"
}
testcase_fallback_partitions() {
local workdir image defs
workdir="$(mktemp --directory "/tmp/test-repart.fallback.XXXXXXXXXX")"
# shellcheck disable=SC2064
trap "rm -rf '${workdir:?}'" RETURN
image="$workdir/image.img"
defs="$workdir/defs"
mkdir "$defs"
tee "$defs/10-esp.conf" <<EOF
[Partition]
Type=esp
Format=vfat
SizeMinBytes=10M
EOF
tee "$defs/20-xbootldr.conf" <<EOF
[Partition]
Type=xbootldr
Format=vfat
SizeMinBytes=100M
SupplementFor=10-esp
EOF
# Blank disk => big ESP should be created
systemd-repart --empty=create --size=auto --dry-run=no --definitions="$defs" "$image"
output=$(sfdisk -d "$image")
assert_in "${image}1 : start= 2048, size= 204800, type=${esp_guid}" "$output"
assert_not_in "${image}2" "$output"
# Disk with small ESP => ESP grows
sfdisk "$image" <<EOF
label: gpt
size=10M, type=${esp_guid}
EOF
systemd-repart --dry-run=no --definitions="$defs" "$image"
output=$(sfdisk -d "$image")
assert_in "${image}1 : start= 2048, size= 204800, type=${esp_guid}" "$output"
assert_not_in "${image}2" "$output"
# Disk with small ESP that can't grow => XBOOTLDR created
truncate -s 150M "$image"
sfdisk "$image" <<EOF
label: gpt
size=10M, type=${esp_guid},
size=10M, type=${root_guid},
EOF
systemd-repart --dry-run=no --definitions="$defs" "$image"
output=$(sfdisk -d "$image")
assert_in "${image}1 : start= 2048, size= 20480, type=${esp_guid}" "$output"
assert_in "${image}3 : start= 43008, size= 264152, type=${xbootldr_guid}" "$output"
# Disk with existing XBOOTLDR partition => XBOOTLDR grows, small ESP created
sfdisk "$image" <<EOF
label: gpt
size=10M, type=${xbootldr_guid},
EOF
systemd-repart --dry-run=no --definitions="$defs" "$image"
output=$(sfdisk -d "$image")
assert_in "${image}1 : start= 2048, size= 204800, type=${xbootldr_guid}" "$output"
assert_in "${image}2 : start= 206848, size= 100312, type=${esp_guid}" "$output"
}
OFFLINE="yes"
run_testcases