After 259125d53d, network interfaces
declared by .netdev files are created after systemd-networkd sends the READY
notification. So, even when networkd has started, the netdevs may not
have been created yet, and the 'ip' command may fail. Let's also check the
return code of the command.
This also
- drops stdout checks that never worked,
- makes the test fail if the interface is not created within the timeout.
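A minimal sketch of what such a check could look like in the Python test suite (the helper name and timeout below are illustrative, not the actual implementation):
```python
import subprocess
import time

def wait_for_netdev(ifname: str, timeout: float = 20.0) -> None:
    """Poll 'ip link show' until the interface exists or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Check the return code instead of only grepping stdout.
        ret = subprocess.run(["ip", "link", "show", "dev", ifname],
                             stdout=subprocess.DEVNULL,
                             stderr=subprocess.DEVNULL)
        if ret.returncode == 0:
            return
        time.sleep(0.1)
    raise TimeoutError(f"interface {ifname} was not created within {timeout}s")
```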
The wrong error code was logged.
But actually given that userns_mkdir() is fine with existing dirs, let's
drop the redundant conditionalization.
Follow-up for: a1fcaa1549
Those are historical names, but there is nothing wrong with them. The files on
/ (/fastboot, /forcefsck, and /forcequotacheck) are problematic because they
require a modification of the root file system. But the command-line parameters work
fine. They have the obvious advantage over our "modern" options that they
are much easier to type without looking up the spelling in the docs. Undeprecate
them to avoid unnecessary churn.
That file contains a bunch of entries of which only some are related to SysV.
The rest are just "traditional APIs" that need to stay. In particular,
/var/lock a.k.a. /run/lock is used by many programs (LVM, iscsi, alsactl).
Similarly, the README about /var/log is something that should stay as long as
we have people migrating from older systems or using the copious documentation
that mentions /var/log/messages.txt on the Internet.
/var/lock/subsys is only used by sysvinit, and our code to support /forcefsck,
/fastboot, and /forcequotacheck is conditionalized on HAVE_SYSV_COMPAT, so
conditionalize those here on HAVE_SYSV_COMPAT too.
Follow-up for 2b07a3211b.
Fixes the failure found in
https://autopkgtest.ubuntu.com/results/autopkgtest-noble-upstream-systemd-ci-systemd-ci/noble/amd64/s/systemd-upstream/20241115_182040_92382@/log.gz
Relevant logs:
```
Nov 16 02:48:36 systemd-networkd[2706]: veth99: Reconfiguring with /run/systemd/network/25-dhcp-client-ipv6-only.network.
Nov 16 02:48:36 systemd-networkd[2706]: veth99: NDISC: Started IPv6 Router Solicitation client
Nov 16 02:48:36 systemd-networkd[2706]: veth99: IPv6 Router Discovery is configured and started.
Nov 16 02:48:36 systemd-networkd[2706]: veth99: NDISC: Sent Router Solicitation, next solicitation in 3s
Nov 16 02:48:36 systemd-networkd[2706]: veth99: NDISC: Received Router Advertisement from fe80::1034:56ff:fe78:9abd: flags=0xc0(managed, other), preference=medium, lifetime=30min
Nov 16 02:48:36 systemd-networkd[2706]: veth99: NDISC: Invoking callback for 'router' event.
Nov 16 02:48:36 systemd-networkd[2706]: veth99: link_check_ready(): dynamic addressing protocols are enabled but none of them finished yet.
Nov 16 02:48:36 systemd-networkd[2706]: veth99: DHCPv6 client: Starting in Solicit mode
Nov 16 02:48:36 systemd-networkd[2706]: veth99: DHCPv6 client: State changed: stopped -> solicitation
Nov 16 02:48:36 systemd-networkd[2706]: veth99: Acquiring DHCPv6 lease on NDisc request
Nov 16 02:48:36 systemd-networkd[2706]: veth99: DHCPv6 client: Sent Solicit
Nov 16 02:48:36 systemd-networkd[2706]: veth99: DHCPv6 client: Next retransmission in 1s
Nov 16 02:48:37 systemd-networkd[2706]: veth99: DHCPv6 client: Sent Solicit
Nov 16 02:48:37 systemd-networkd[2706]: veth99: DHCPv6 client: Next retransmission in 1s
Nov 16 02:48:39 systemd-networkd[2706]: veth99: NDISC: Received Neighbor Advertisement from fe80::1034:56ff:fe78:9abd: Router=yes, Solicited=yes, Override=no
Nov 16 02:48:39 systemd-networkd[2706]: veth99: NDISC: Invoking callback for 'neighbor' event.
Nov 16 02:48:39 systemd-networkd[2706]: veth99: DHCPv6 client: Processed Reply message
Nov 16 02:48:39 systemd-networkd[2706]: veth99: DHCPv6 client: T1 expires in 50s
Nov 16 02:48:39 systemd-networkd[2706]: veth99: DHCPv6 client: T2 expires in 55s
Nov 16 02:48:39 systemd-networkd[2706]: veth99: DHCPv6 client: Valid lifetime expires in 2min
Nov 16 02:48:39 systemd-networkd[2706]: veth99: DHCPv6 client: State changed: solicitation -> bound
Nov 16 02:48:39 systemd-networkd[2706]: veth99: DHCPv6 address 2600::15/128 (valid for 1min 59s, preferred for 1min 59s)
Nov 16 02:48:41 systemd-networkd[2706]: veth99: Received updated DHCPv6 address (configured): 2600::15/128 (valid for 1min 58s, preferred for 1min 58s), flags: no-prefixroute, scope: global
Nov 16 02:48:41 systemd-networkd[2706]: veth99: DHCPv6 addresses and routes set.
Nov 16 02:48:41 systemd-networkd[2706]: veth99: link_check_ready(): IPv4LL:no DHCPv4:no DHCPv6:yes DHCP-PD:no NDisc:no
Nov 16 02:48:41 systemd-networkd[2706]: veth99: State changed: configuring -> configured
```
The interface veth99 entered the configured state after 5 seconds, but
at the same time, the `wait_online()` call in the test script considered
the test failed.
The function `wait_online()` first invokes
`systemd-networkd-wait-online` with `--timeout=20`, then checks the setup
states of the interfaces with a 5-second timeout. So, the failure suggests
that `systemd-networkd-wait-online` finished immediately, because the state
file had not been updated yet when it was invoked, and it therefore considered
the interface veth99 to already be in the configured state.
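Roughly, the two-phase check looks like this (a simplified sketch; the binary path and the exact output string matched are assumptions, and the real helper in the test script differs in details):
```python
import subprocess
import time

def wait_online(links: list[str], timeout: int = 20) -> None:
    # Phase 1: wait for systemd-networkd-wait-online to report readiness.
    subprocess.check_call(
        ["/usr/lib/systemd/systemd-networkd-wait-online", f"--timeout={timeout}"] +
        [f"--interface={link}" for link in links])
    # Phase 2: re-check the per-link setup state for up to 5 seconds.
    for link in links:
        for _ in range(50):
            output = subprocess.check_output(["networkctl", "status", link], text=True)
            if "configured" in output:
                break
            time.sleep(0.1)
        else:
            raise AssertionError(f"{link} did not reach the configured state:\n{output}")
```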
Otherwise, an automatically assigned nexthop ID may contain e.g. '300' as a substring, and then
===
AssertionError: '300' unexpectedly found in
'default nhid 3860882700 via fe80::1034:56ff:fe78:9a99 proto ra metric 512 expires 1798sec pref high\n
default nhid 2639230080 via fe80::1034:56ff:fe78:9a98 proto ra metric 2048 expires 1798sec pref low'
===
"nsenter -a" doesn't migrate the specified process into the target
cgroup (it really should). Thus the cgroup will remain in a cgroup
that is (due to cgroup ns) outside our visibility. The kernel will
report the cgroup path of such cgroups as starting with "/../". Detect
that and print a reasonably error message instead of trying to resolve
that.
```
$ systemd-cryptenroll /dev/vda3
SLOT TYPE
0 password
$ systemd-cryptenroll --wipe-slot 1 /dev/vda3
Failed to wipe slot 1, continuing: No such file or directory
```
We now distinguish two cases: where the list of self modifiable fields
is explicitly set to empty, and where the default is empty.
Let's display them differently in the output. When set explicitly to
empty let's mention the admin, otherwise just say "none".
For system users we should lock things down, hence generate an empty
list.
This is mostly a safety precaution, but also hides the really confusing
output of "userdbctl user" for a system user.
Follow-up for: a192250eda
As far as I understand, bsearch is searching in the wrong array. Or, if
it's the right one, then the size is wrong. In another commit I made the
arrays different by mistake and that triggered a SIGSEGV during tests.
"systemctl status systemd-logind" otherwise looks a bit weird, since the
tasks and the fdstore lines are so close to each other but formatted
quite differently when it comes to coloring.
Currently, get_fixed_user() employs USER_CREDS_SUPPRESS_PLACEHOLDER,
meaning home path is set to NULL if it's empty or root. However,
the path is also used for applying WorkingDirectory=~, and we'd
spuriously use the invoking user's home as fallback even if
User= is changed in that case.
Let's instead delegate such suppression to build_environment(),
so that home is properly initialized for use at other steps.
shell doesn't actually suffer from such a problem, but it's changed
too for consistency.
Alternative to #34789
Fixes IPv6 Core Conformance test failures reported at #33468.
https://www.ipv6ready.org/docs/Core_Conformance.pdf
Test v6LC.2.2.23 h and j: Processing Router Advertisement with Route
Information Option (Host Only)
When an RA contains a route option with the ::/0 prefix, then previously
that could contradict the default route requested by the RA header.
If the route option has zero lifetime, the existing default route should
be removed, and a new route based on the RA header should be configured.
If the route option has non-zero lifetime, the RA header should be
ignored.
So, we first need to process options with zero lifetime (not only the
route option, for similar reasons), then configure the default route
based on the RA header, and finally process options with non-zero lifetime.
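In pseudocode (Python-flavored, with hypothetical helper names), the intended ordering is:
```python
def process_router_advertisement(ra):
    # 1. Options with zero lifetime first: they may invalidate an existing
    #    default route (e.g. a Route Information Option for ::/0).
    for option in ra.options:
        if option.lifetime == 0:
            drop_configuration_for(option)

    # 2. Then the default route derived from the RA header itself.
    configure_default_route_from_header(ra)

    # 3. Finally, options with non-zero lifetime; a ::/0 route option here
    #    takes precedence, so the RA header is effectively ignored.
    for option in ra.options:
        if option.lifetime > 0:
            configure_from(option)
```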
This partially reverts 02eabaffe9.
As noted in https://github.com/systemd/systemd/pull/35211:
> The configuration parsing simply stores the string as-is, rather than
> creating the appropriate object
One way to fix the issue would be to store the "appropriate object", i.e.
actually the class. But that makes the code very verbose, with the conversion
being done in two places. And that still doesn't fix the issue, because we need
to map the class objects back to the original name in error messages.
So instead, store the setting as a string and only map it to the class much
later. This makes the code simpler and fixes the error messages too.
Resolves https://github.com/systemd/systemd/pull/35193
Add the `arm_fadvise64_64` syscall to the allow_list, in addition
to the existing `fadvise64` and `fadvise64_64` syscalls, as this is
the syscall actually defined for `arm` architecture. Adding it fixes
the syscall being rejected in arm32 containers.
Fixes #35194
* 51cd22f368 Update changelog for 257~rc2-3 release
* 5308c3b905 Backport patch to remove faulty unit test assertion
* b7d805151b Update changelog for 257~rc2-2 release
* 5afc23b288 Backport patch to fix FTBFS due to failing unit test
* 0ca89ce40c Update changelog for 257~rc2-1 release
* f27216d493 Update lintian override to ignore false positive typos
* 2caa74f473 d/rules: adjust blhc override to account for source files being moved
* 6b48328ead systemd-ukify: recommend systemd-repart
* 5e01b67f43 systemd-ukify: downgrade dependency on systemd, not mandatory
* 3a4dd59e41 Install new systemd-keyutil binary in the systemd-repart package
* e64cffab71 Drop all patches, merged upstream
* 0fcef228c7 Update upstream source from tag 'upstream/257_rc2'
* a01322bb29 d/t/control: add more packages to dummy hint-testsuite-triggers
* 7bd1d09f7f Change sysusers u! lines to u because we don't have support in rpm
* 943bd94cf6 Version 257~rc2
* 6162965002 Disable freezing of user sessions
* 0c236cedb9 Upload sources
* ea947ce068 Version 257~rc1
* 834ba50e79 Use %posttrans instead of %postun to restart services
* 8dafa3810b Disable OpenSSL v3 ENGINE on RHEL
* 8f44e8097d Add forgotten patch
* 86ca699d18 Backport user manager reexec changes
* 009c64d6a2 Use %systemd_preun in systemd-resolved
systemd-boot uses the existence of loader/keys/auto to determine
whether to auto-enroll secure boot keys or not, so only create the directory
if we're actually going to put auto-enroll signature lists in it.
The second check was searching for the symbols in the same array, but
using the size of the other one. This generated a SIGSEGV when they
occasionally mismatched.
lcov 2.1 introduced additional consistency checks [0] which make it trip
over our coverage results quite often:
Summary coverage rate:
source files: 915
lines.......: 36.9% (78950 of 214010 lines)
functions...: 53.3% (6906 of 12949 functions)
Message summary:
73 warning messages:
inconsistent: 73
lcov: ERROR: (corrupt) unable to read trace file '/var/tmp/systemd-test-TEST-04-JOURNAL/coverage-info.new': lcov: ERROR: (inconsistent) "/build/src/shutdown/umount.c":298: function 'umount_with_timeout' is not hit but line 317 is.
To skip consistency checks, see the 'check_data_consistency' section in man lcovrc(5).
(use "lcov --ignore-errors inconsistent ..." to bypass this error)
(use "lcov --ignore-errors corrupt ..." to bypass this error)
This is caused by coverage collected during shutdown which is a bit
unreliable, especially towards the final shutdown stage(s). Let's just
ignore the consistency errors for now.
[0] https://github.com/linux-test-project/lcov/releases/tag/v2.2
The proposal in https://github.com/systemd/systemd/pull/35091 suggests
that there are going to be more resources sooner or later that shall be
embeddable in a UKI, but are specific to some machine. The .hwids logic
as it is implemented right now is conceptually flexible enough to cover
that too (as long as the system has SMBIOS and thus CHIDs). Hence, let's
prepare the ground for a future (that might possibly never come, but
let's keep the door open) where the section can be reused for this
purpose.
The patch is really dumb ultimately. It just changes the initial field
in the "Device" struct to carry not just the size of it (as before) but
also a type indicator, which is for now fixed to 1, indicating DT blobs.
This breaks compatibility, hence this should get merged before we do the
v257 release, so that this is done properly before the first release
with .hwids.
We use the $WATCHDOG_USEC variable for two closely related purposes: as part of
the sd_watchdog_enabled() protocol for implementing service watchdogs,
and as part of the protocol between the service manager and
systemd-shutdown across the PID 1 execve() transition during shutdown.
Apparently some exitrd tools got confused by the latter use. Let's
address that by setting $WATCHDOG_PID to 1, in accordance with the
sd_watchdog_enabled() protocol, to make clear this is only intended for
PID 1 and nothing else.
Replaces: #35135
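For reference, a rough Python rendering of the check implied by the sd_watchdog_enabled() protocol (illustrative only, not the actual C implementation):
```python
import os

def watchdog_usec() -> int | None:
    """Honor $WATCHDOG_USEC only if $WATCHDOG_PID is unset or matches our own PID."""
    pid = os.environ.get("WATCHDOG_PID")
    if pid is not None and int(pid) != os.getpid():
        return None  # the variable is meant for some other process, e.g. PID 1
    usec = os.environ.get("WATCHDOG_USEC")
    return int(usec) if usec is not None else None
```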
Given how long it took to come to a conclusion of the discussions around
https://github.com/systemd/systemd/issues/35026, let's add a comment
that makes this easier to grok for the next time this comes up.
Follow-up for: 6e207b370e
libnvme 1.11 appears to require a kernel built with NVME TLS
kconfigs, and fails hard if it is not, as the expected
privileged keyring '.nvme' is not present. We cannot just
create it from userspace, as privileged keyrings can only
be created by the kernel itself (those starting with '.').
Skip the test if the library exactly matches this version.
https://github.com/linux-nvme/nvme-cli/issues/2573
Fixes https://github.com/systemd/systemd/issues/35130
Stub behavior will be as follows:
1. If there are no `.dtbauto` sections, then `.dtb` is used if present.
2. If there are `.dtbauto` sections and at least one of them matches
   (either against the firmware-provided DT or via `.hwids`), then it is
   used instead of the `.dtb`.
Based on #28959 and [dtbloader](https://github.com/TravMurav/dtbloader)
Closes #28959
Fixes #31946
While doing that, even if mknod() fails, we anyway try to fall back to
using a bind mount if arg_uid_shift == 0.
Mostly no functional change, just refactoring and preparation for later commit.
Follow-up for dc3223919f.
If nspawn is invoked with DevicePolicy= but DeviceAllow= does not
contain /dev/fuse, nspawn will fail to get fuse version with -EPERM.
Let's silence the warning in that case.
Outside of x86, some machines (e.g. Apple silicon, AMD Opteron A1100)
have physical memory mapped above 4GiB, meaning this allocation will
fail, causing the entire boot process to fail on these machines.
This commit makes it so that the below-4GB address space allocation
requirement is only set on x86 platforms, and not on other platforms
(that don't have the specific Linux x86 boot protocol), thereby fixing
boot on those that have no memory mapped below 4GiB in their address
space.
Tested on an Apple silicon M1 laptop and an AMD x86_64 desktop tower.
Fixes: #35026
Previously, when multiple routers sent RAs with the same preference,
the kernel merged the routes into a single multipath route:
===
default proto ra metric 1024 expires 595sec pref medium
nexthop via fe80::200:10ff:fe10:1060 dev enp0s9 weight 1
nexthop via fe80::200:10ff:fe10:1061 dev enp0s9 weight 1
===
This causes IPv6 Conformance Test v6LC.2.2.11 failure, as reported in #33470.
To avoid the coalescing issue, we can use nexthop, as suggested by Ido Schimmel:
https://lore.kernel.org/netdev/ZytjEINNRmtpadr_@shredder/
> BTW, you can avoid the coalescing problem by using the nexthop API.
> # ip nexthop add id 1 via fe80::200:10ff:fe10:1060 dev enp0s9
> # ip -6 route add default nhid 1 expires 600 proto ra
> # ip nexthop add id 2 via fe80::200:10ff:fe10:1061 dev enp0s9
> # ip -6 route append default nhid 2 expires 600 proto ra
> # ip -6 route
> fe80::/64 dev enp0s9 proto kernel metric 256 pref medium
> default nhid 1 via fe80::200:10ff:fe10:1060 dev enp0s9 proto ra metric 1024 expires 563sec pref medium
> default nhid 2 via fe80::200:10ff:fe10:1061 dev enp0s9 proto ra metric 1024 expires 594sec pref medium
Fixes #33470.
Suggested-by: Ido Schimmel <idosch@idosch.org>
With the previous commit, the configuration sources of addresses and routes
are saved on stop and restored on start. Hence, we can keep dynamic
configurations on stop.
Co-authored-by: Jian Zhang <zhangjian.3032@bytedance.com>
Currently, only configuration sources and providers of addresses and
routes are serialized/deserialized.
This should mostly not change behavior, as dynamic configurations (except
for DHCPv4) are dropped before stopping networkd, and for the DHCPv4
protocol we already have other logic to handle DHCPv4
configurations.
Preparation for later commits.
Follow-up for 1003093604.
If a netdev is detached for some reason, then previously the request
was simply cancelled, and the underlying interface never entered the
configured state, as the 'stacked_netdevs_created' flag was never set.
This makes the function decrement the counter manually and set the
flag, so the underlying interface can enter the configured state.
After PR #34909, networkd tries to update an existing netdev interface if
possible. But when .netdev files are loaded on start, we have not
enumerated interfaces yet, so we do not know whether the corresponding
interface exists or not. Let's delay processing the request a bit.
Follow-up for PR #34909.
This fixes an issue where network interfaces cannot join a master netdev,
like a bond or bridge, when the corresponding .netdev is reloaded.
With PR #34909, networkd supports reloading .netdev files. However,
when a .netdev file is modified and reloaded, the ifindex is copied from
the old NetDev object to the new one. Thus, even if the interface is
successfully updated, netdev_set_ifindex_impl() will return 0 and
netdev_enter_ready() will never be called. If the netdev is a kind of
master netdev, then port interfaces cannot join it, as
REQUEST_TYPE_SET_LINK_MASTER requires that the master netdev is
in the ready state.
Follow-up for 17c5337f7b.
Older kernels (older than v6.5) refuse RTM_NEWLINK messages with IFLA_ADDRESS
attribute when the netdev already exists and is running, even if the MAC
address is unchanged.
So, let's not set IFLA_ADDRESS or IFLA_MTU if they are unchanged, and
set the attributes only when we can update them.
* 48fabbd5d2 Install new sd-keyutil binary in sd-repart package
* 6dd9ab10fe Update changelog for 257~rc1-4 release
* 6dd325f04b Backport patch to fix TEST-07-PID1 integration test
* 5988cc60ee Update changelog for 257~rc1-3 release
* cf3a2f7ccc Backport another patch to fix test failure on buildd
* 5d6a226dbb Update changelog for 257~rc1-2 release
* ebe97c52c8 Backport patch to fix unit test failure on buildd
* 21f63b20bb Update changelog for 257~rc1-1 release
* 0dfec51bbb d/copyright: remove pattern for directory that is no longer present
* 337b3bb2dd Ignore Lintian warning dh-exec-script-without-dh-exec-features
* b680e6b448 List new libsystemd0 symbols
* 3c00aa000c gbp.conf: use --first-parent for dch to avoid upstream commits
* d53ecc7769 Install new files
* 546e8c9137 Drop all patches, merged upstream
* 6757597480 Update upstream source from tag 'upstream/257_rc1'
* 4b82805020 gbp.conf: switch upstream branch to full upstream history
* e60c637a95 gbp.conf: enable signing tags by default
* 2ad27b63c4 Update changelog for 256.7-3 release
* a212c36c54 systemd-boot: provide integration with shim
We now import the upstream tag in the debian repository, so
this explodes as it tries to walk all upstream commits. Use
--first-parent so that merges only get added via the merge
commit.
When kill_whom == _ALL, there can be two cases that lead to
ESRCH: the session expects no scope at all, or the scope is
not active. Let's distinguish the two cases.
The concept of synthetic errnos is about logging, which
is irrelevant w.r.t. bus errors, and we don't do any special
treatment in sd-bus for them, meaning the value propagated
would be spurious.
Follow-up for 972f1d17ab.
This fixes the logic of removing unnecessary routes configured by
previously received RAs. Previously, we wrongly handled the case where
existing routes could be updated, and unexpected routes would be kept.
The auditing subsystem is still not virtualized for containers, hence
the two values don't really make sense inside them, they will just leak
information from outside into the container. Hence don't make use of the
data if we detect we are run inside of a container.
This has visible effects: logind will no longer try to reuse the
auditing session ids as its own session ids when run inside a container.
While we are at it, modernize the calls in more ways:
1. switch to pidref behaviour, all but one of our uses are using pidref
anyway already.
2. use read_virtual_file() + proc_mounted()
3. reasonably distinguish ENOENT errors when reading the process proc
files: distinguish the case where /proc is not mounted, from the case
where the process is already gone, from where auditing is not enabled in
the kernel build.
Apparently some terminal emulators have problems with overly long
titles, hence truncate them at some safe length (128).
Also, when parsing ANSI sequences ourselves accept longer sequences
(192), after all we should be fine when parsing our own title sequences.
Fixes: #35104
The auditing subsystem is still not virtualized for containers, hence the two
values don't really make sense inside them, they will just leak
information from outside into the container. Hence don't make use of the
data if we detect we are run inside of a container.
This has visible effects: logind will no longer try to reuse the
auditing session ids as its own session ids when run inside a container.
While we are at it, modernize the calls in more ways:
1. switch to pidref behaviour, all but one of our uses are using pidref
anyway already.
2. use read_virtual_file() + proc_mounted()
3. reasonably distinguish ENOENT errors when reading the process proc
files: distinguish the case where /proc is not mounted, from the case
where the process is already gone, from where auditing is not enabled
in the kernel build.
- fix verifiers in test_router_preference() to make them actually check
if unnecessary routes are removed,
- stop radv in test_ndisc_vs_static_route() before checking if the static
route is preserved even when the router sends a RA with zero lifetime,
- make verifiers in NetworkdIPv6PrefixTests stricter.
Follow-up for 972f1d17ab.
This fixes the logic of removing unnecessary routes configured by
previously received RAs. Previously, we wrongly handled the case where
existing routes could be updated, and unexpected routes would be kept.
- drop unnecessary call of ndisc_set_route_priority() at the beginning,
as it is called later in the loop below,
- use RET_GATHER() and remove all possible routes even if one of them fails.
Some ambiguity (e.g., same-named man pages in multiple volumes)
makes it impossible to fully automate this, but the following
Python snippet (run inside the man/ directory of the systemd repo)
helped to generate the sed command lines (which were subsequently
manually reviewed, run and the false positives reverted):
from pathlib import Path

import lxml
from lxml import etree as ET

man2vol: dict[str, str] = {}
man2citerefs: dict[str, list] = {}

for file in Path(".").glob("*.xml"):
    tree = ET.parse(file, lxml.etree.XMLParser(recover=True))

    meta = tree.find("refmeta")
    if meta is not None:
        title = meta.findtext("refentrytitle")
        if title is not None:
            vol = meta.findtext("manvolnum")
            if vol is not None:
                man2vol[title] = vol

    citerefs = list(tree.iter("citerefentry"))
    if citerefs:
        man2citerefs[title] = citerefs

for man, refs in man2citerefs.items():
    for ref in refs:
        title = ref.findtext("refentrytitle")
        if title is not None:
            has = ref.findtext("manvolnum")
            try:
                should_have = man2vol[title]
            except KeyError:  # Non-systemd man page reference? Ignore.
                continue
            if has != should_have:
                print(
                    f"sed -i '\\|<citerefentry><refentrytitle>{title}"
                    f"</refentrytitle><manvolnum>{has}</manvolnum>"
                    f"</citerefentry>|s|<manvolnum>{has}</manvolnum>|"
                    f"<manvolnum>{should_have}</manvolnum>|' {man}.xml"
                )
Let's gather generic key/certificate operations in a new tool
systemd-keyutil instead of spreading them across various special purpose
tools.
Fixes #35087
When an interface goes down, IPv4 non-local routes are removed by the
kernel without any notification. Let's forget the routes in that case.
Fixes #35047.
When a nexthop is removed, routes that depend on the removed nexthop are
already removed. It is not necessary to remove them, as already
commented. Let's forget them without trying to remove them.
Previously, even if KeepConfiguration=dhcp or yes was specified in the
new .network file, dynamic configurations like the DHCP address and routes
were dropped when 'networkctl reconfigure INTERFACE' was invoked.
If the setting is specified, let's gracefully handle the dynamic
configurations. Then, 'networkctl reconfigure' can also be used for
an interface that has critical connections.
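For example, an illustrative .network fragment like the following keeps the DHCP-acquired address and routes across a 'networkctl reconfigure' of the matching interface:
```
[Match]
Name=eth*

[Network]
DHCP=yes
KeepConfiguration=dhcp
```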
Follow-up for dd6d53a8dc.
Unnecessary static configs will be dropped later anyway in
link_configure() -> link_drop_unmanaged_config(). Hence, even if we are
reconfiguring an interface cleanly, it is not necessary to drop static
configs here.
Unreachable routes are not owned by any interface, and their ifindex is
zero. Previously, if a non-upstream interface was reconfigured, all routes,
including unreachable routes configured by the upstream interface, were
removed.
This makes unreachable routes always be handled by the upstream interface,
and only removed when the delegated prefixes are changed or lost.
Follow-up for 451c2baf30.
With those commits, reloading .network files does not release a previously
acquired DHCP lease and friends if possible.
On a graceful reconfigure triggered by the reload, the interface may
acquire a new DHCPv4 lease earlier than the DHCPv6 lease. In that case,
the check will fail, as it is done with the new DHCPv4 lease and the old
DHCPv6 lease, which does not contain any IPv6 DNS servers or so.
So, when switching from no -> yes, we need to wait for a new lease with DNS
servers or so. To achieve that, we need to cleanly reconfigure the interface.
Follow-up for 451c2baf30.
With those commits, reloading .network files does not release a previously
acquired DHCP lease and friends if possible. If a DHCP client was previously
configured not to request DNS servers or so, then the previously
acquired lease might not contain any DNS servers. In that case, if the
new .network file enables UseDNS=, then the interface should enter the
configured state after a new lease is acquired. To achieve that, we need
to reset the flags.
With this change, the workaround applied to the test by the commit
451c2baf30 can be dropped.
Otherwise, even if a link enters the configuring state at the beginning
of link_configure(), link_check_ready() may be called before
link_drop_unmanaged_config() is called, and the link may enter the
configured state.
Fixes #35092.
`loginctl kill-session --kill-whom=leader <N>` (or the D-Bus equivalent)
doesn't work because logind ends up calling `KillUnit(..., "main", ...)`
on a scope unit and these don't have a `MainPID` property. Here, I just
make it send a signal to the `Leader` directly.
Fixes a couple of bugs with systemd-sysupdated's target enumeration. See
commit messages for details.
To keep aligned with the logic used in udev_rules_parse_file(), we should
also skip empty udev rules files while collecting the stats during
manager reload. Otherwise, all udev rules files will be parsed again whenever
the udev manager is reloaded with an empty udev rules file present. That is
time-consuming, and the following uevents will fail with a timeout.
A bit confusingly CONTAINER_UID_BASE_MAX is just the maximum *base* UID
for a container. Thus, with the usual 64K UID assignments, the last
actual container UID is CONTAINER_UID_BASE_MAX+0xFFFF.
To make this less confusing define CONTAINER_UID_MIN/MAX that add the
missing extra space.
Also adjust two uses where this was mishandled so far, due to this
confusion.
With this change the UID ranges we default to should properly match what
is documented on https://systemd.io/UIDS-GIDS/.
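Illustrating the arithmetic (the concrete base values below are assumptions for the example, not quoted from the code):
```python
# Hypothetical example values; the real constants are defined in the systemd sources.
CONTAINER_UID_BASE_MIN = 0x00080000
CONTAINER_UID_BASE_MAX = 0x6FFF0000

# Each container gets a 64K block of UIDs starting at its base UID, so the
# highest UID actually handed out is the base max plus 0xFFFF:
CONTAINER_UID_MIN = CONTAINER_UID_BASE_MIN
CONTAINER_UID_MAX = CONTAINER_UID_BASE_MAX + 0xFFFF

def uid_is_container(uid: int) -> bool:
    return CONTAINER_UID_MIN <= uid <= CONTAINER_UID_MAX
```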
Let's gather generic key/certificate operations in a new tool
systemd-keyutil instead of spreading them across various special
purpose tools.
Fixes #35087
In the troff output, this doesn't seem to make any difference. But in the
html output, the whitespace is sometimes preserved, creating an additional
gap before the following content. Drop it everywhere to avoid this.
This allows loading the X.509 certificate from an OpenSSL provider
instead of a file system path. This allows loading certificates directly
from hardware tokens instead of having to export them to a file on
disk first.
This verb writes a public key to stdout extracted from either a public key
path, from a certificate (path or provider) or from a private key (path,
engine, provider). We'll use this in ukify to get rid of the use of the
python cryptography module to convert a private key or certificate to a
public key.
This allows loading the X.509 certificate from an OpenSSL provider
instead of a file system path. This allows loading certificates directly
from hardware tokens instead of having to export them to a file on
disk first.
Most people are probably on stable releases, but we don't want to update the
minor version all the time, so just specify 256.x as a hint to fill in the
full version.
I very much dislike the approach in which we were mixing Linux and UEFI C code
in the same subdirectory. No code was shared between the two environments. This
layout was created in e7dd673d1e, with the
justification of "being more consistent with the rest of systemd", but I don't
see how it's supposed to be so.
Originally, when the C code was just a single bootctl.c file, this wasn't so
bad. But over time the userspace code grew quite a bit. With the moves done in
previous commits, the intermediate subdirectory is now empty except for the
efi/ subdir, and this additional subdirectory level doesn't have a good
justification. The component is called "systemd-boot", not "systemd-efi", and
we can remove one level of indentation.
We have other subdirectories with just a single C file. And I expect
that systemd-measure will only grow over time, adding new functionality.
It's nicer to give it its own subdirectory to maintain a consistent structure.
We'd log that we're skipping the target, but it would never actually get
removed from the manager's list. Thus, we'd advertise targets that don't
actually exist to clients.
In the original version of the sysupdated PR, this was handled by
removing the target from the manager's list in target_free, and using a
_cleanup_ attribute to free the target when skipping. However, this
changed at some point during review. So, this commit takes the
alternative approach.
Currently in mkosi and ukify we use sbsigntools to do secure boot
signing. This has multiple issues:
- sbsigntools is practically unmaintained, sbvarsign is completely
broken with the latest gnu-efi when built without -fshort-wchar and
upstream has completely ignored my bug report about this.
- sbsigntools only supports openssl engines and not the new providers
API.
- sbsigntools doesn't allow us to cache hardware token pins in the
kernel keyring like we do nowadays when we sign stuff ourselves in
systemd-repart or systemd-measure
There are alternative tools like sbctl and pesign but these do not
support caching hardware token pins in the kernel keyring either.
To get around the issues with sbsigntools, let's introduce our own
tool systemd-sbsign to do secure boot signing. This allows us to
take advantage of our own openssl infra so that hardware token pins
are cached in the kernel keyring as expected and we get openssl
provider support as well.
The section headers used quotes as if the strings were some constants. But
AFAICT, those are just normal plain-text titles. Also lowercase them, because
this is almost like a table and it's easier to read without capitalization.
We used both, in fact "Devicetree" was more common. But we have a general rule
that we capitalize all words in names and also we have a DeviceTree=
configuration setting, which we cannot change. If we use two different
spellings, this will make it harder for people to use the correct one in
config files. So use the "DeviceTree" spelling everywhere.
Since v256 we completely fail to boot if v1 is configured. Fedora 41 was just
released with v256.7 and this is probably the first major exposure of users to
this code. It turns out to not work very well. Fedora switched to v2 as default in
F31 (2019) and at that time some people added configuration to use v1 either
because of Docker or for other reasons. But it's been long enough ago that
people don't remember this and are now very unhappy when the system refuses to
boot after an upgrade.
Refusing to boot is also unnecessarily punishing to users. For machines that
are used remotely, this could mean somebody needs to physically access the
machine. For other users, the machine might be the only way to access the net
and help, and people might not know how to set kernel parameters without some
docs. And because this is in systemd, after an upgrade all boot choices are
affected, and it's not possible to e.g. select an older kernel for boot. And
crashing the machine doesn't really serve our goal either: we were giving a
hint how to continue using v1 and nothing else.
If the new override is configured, warn and immediately boot to v1.
If v1 is configured w/o the override, warn and wait 30 s and boot to v2.
Also give a hint how to switch to v2.
https://bugzilla.redhat.com/show_bug.cgi?id=2323323
https://bugzilla.redhat.com/show_bug.cgi?id=2323345
https://bugzilla.redhat.com/show_bug.cgi?id=2322467
https://www.reddit.com/r/Fedora/comments/1gfcyw9/refusing_to_run_under_cgroup_01_sy_specified_on/
The advice is to set systemd.unified_cgroup_hierarchy=1 (instead of removing
systemd.unified_cgroup_hierarchy=0). I think this is easier to convey. Users
who understand what is going on can just remove the option instead.
The caching is dropped in cg_is_legacy_wanted(). It turns out that the
order in which those functions are called during early setup is very fragile.
If cg_is_legacy_wanted() is called before we have set up the v2 hierarchy,
we incorrectly cache a true answer. The function is called just a handful
of times at most, so we don't really need to cache the response.
The text added for .dtbauto/.hwids was very hard to grok. This rewords it to be
proper English. No semantic changes are intended.
When updating this, I noticed that the interaction of multi-profile UKIs and
dtb autoselection is very unclear, a FIXME is added.
Currently in mkosi and ukify we use sbsigntools to do secure boot
signing. This has multiple issues:
- sbsigntools is practically unmaintained, sbvarsign is completely
broken with the latest gnu-efi when built without -fshort-wchar and
upstream has completely ignored my bug report about this.
- sbsigntools only supports openssl engines and not the new providers
API.
- sbsigntools doesn't allow us to cache hardware token pins in the
kernel keyring like we do nowadays when we sign stuff ourselves in
systemd-repart or systemd-measure
There are alternative tools like sbctl and pesign but these do not
support caching hardware token pins in the kernel keyring either.
To get around the issues with sbsigntools, let's introduce our own
tool systemd-sbsign to do secure boot signing. This allows us to
take advantage of our own openssl infra so that hardware token pins
are cached in the kernel keyring as expected and we get openssl
provider support as well.
Let's systematically make sure that we link up the D-Bus interfaces from
the daemon man pages once in prose and once in short form at the bottom
("See Also"), for all daemons.
Also, add reverse links at the bottom of the D-Bus API docs.
Fixes: #34996
We have this in a similar fashion for the other APIs libsystemd
provides. Add the same for sd-varlink. There isn't too much on it for
now, but at least it's a start.
Also link it up everywhere.
Processes can easily survive the first kill operation we execute, hence
we shouldn't make strong claims about them having exited already. Let's
just say "likely" hence.
Fixes: #15032
Let's emphasize the privilege thing with a <caution> section.
Let's also point out that other D-Bus libraries are less restrictive
than sd-bus by default regarding permission access.
Fixes: #34735
A .dtbauto section contains a DT blob, just like .dtb; the difference is
that multiple .dtbauto sections are allowed in a UKI and only one
is selected automatically.
Temporarily drop an assert_cc() check in systemd-measure to make it compilable before the next commit.
It is normal for DHCP leases not to have DNR options. We need to be less
verbose and more forgiving in these cases. Also, if either DHCP protocol does
not provide DNR options, make sure to still consider any DHCPv6/RA options.
Fixes: c7c9e3c7c0 (network: adjust log message about DNR)
While for engines we have ENGINE_ctrl() to set the UI method for the
second PIN prompt, for openssl providers we don't have such a feature
which means we get the default openssl UI for the second pin prompt.
Instead, let's set the default UI method which does get used for the
second pin prompt by the pkcs11 provider.
And use it when explicit reconfiguration is requested by the Reconfigure() D-Bus
method, or when networkd detects with certainty that the connected network has
changed. Otherwise, do not use the flag, especially when we come back from sleep mode.
E.g. when a .network file is updated but the DHCP setting is unchanged, it
is not necessary to drop the acquired DHCP lease.
So, let's not stop the DHCP client and friends in link_reconfigure_impl(),
but stop them later, when we know they are not necessary anymore.
Still, DHCP clients and friends are stopped and leases are dropped when
explicit reconfiguration is requested.
When a reconfiguration of an interface is triggered, previously we
call link_foreignize_config(), which sets all static configurations as
foreign, then later call link_drop_foreign_config(), which drops
unnecessary foreign configurations.
This commit merges these two steps into one, link_drop_unmanaged_config(),
which drops unnecessary static and foreign configurations.
Also, this renames link_drop_managed_configs() to
link_drop_static_config(), as it only drops static configurations.
Note that dynamically acquired configurations are dropped by
link_stop_engines().
Effectively no functional changes, just refactoring and preparation for
later changes.
- convert boolean flag 'force' to LinkReconfigurationFlag enum,
- merge link_reconfigure() and reconfigure_handler_on_bus_method_reload() as
link_reconfigure_full(),
- rename ReconfigureData -> LinkReconfigurationData,
- make the Reconfigure() D-Bus method wait for the reconfiguration to be
  started before sending the reply.
Currently translated at 90.9% (230 of 253 strings)
po: Translated using Weblate (German)
Currently translated at 89.3% (226 of 253 strings)
po: Translated using Weblate (German)
Currently translated at 88.9% (225 of 253 strings)
po: Translated using Weblate (German)
Currently translated at 88.1% (223 of 253 strings)
Co-authored-by: Weblate Translation Memory <noreply-mt-weblate-translation-memory@weblate.org>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/de/
Translation: systemd/main
Currently translated at 90.9% (230 of 253 strings)
po: Translated using Weblate (German)
Currently translated at 89.3% (226 of 253 strings)
po: Translated using Weblate (German)
Currently translated at 88.9% (225 of 253 strings)
po: Translated using Weblate (German)
Currently translated at 88.1% (223 of 253 strings)
Co-authored-by: Ettore Atalan <atalanttore@googlemail.com>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/de/
Translation: systemd/main
This new setting allows unsharing the pid namespace in a unit. Because
you have to fork to get a process into a pid namespace, we fork in
systemd-executor to get into the new pid namespace. The parent then
sends the pid of the child process back to the manager and exits while
the child process continues on with the rest of exec_invoke() and then
executes the actual payload.
Communicating the child pid is done via a new pidref socket pair that is
set up on manager startup.
We unshare the PID namespace right before the mount namespace so we
mount procfs correctly. Note PrivatePIDs=yes always implies MountAPIVFS=yes
to mount procfs.
When running unprivileged in a user session, user namespace is set up first
to allow for PID namespace to be unshared. However, when running in
privileged mode, we unshare the user namespace last to ensure the user
namespace does not own the PID namespace and cannot break out of the sandbox.
Note we disallow Type=forking services from using PrivatePIDs=yes since the
init process inside the PID namespace must not exit for other processes in
the namespace to exist.
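A minimal, hypothetical unit making use of the new setting could look like this:
```
[Service]
# Illustrative only; PrivatePIDs=yes implies MountAPIVFS=yes so that the
# mounted /proc matches the new PID namespace.
PrivatePIDs=yes
ExecStart=/usr/bin/sleep infinity
```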
Note Daan De Meyer did the original work for this commit with Ryan Wilson
addressing follow-ups.
Co-authored-by: Daan De Meyer <daan.j.demeyer@gmail.com>
In https://bugzilla.redhat.com/show_bug.cgi?id=2322937 we're getting
an error message:
Okt 29 22:21:03 fedora systemd-resolved[29311]: Could not create manager: Cannot allocate memory
I expect that this actually comes from dnstls_manager_init(), the
openssl version. But without real logs it's hard to know for sure.
Use EIO instead of ENOMEM, because the problem is unlikely to be actually
related to memory.
We need a sensible limit on the number of Encrypted DNS options allowed
so that the set of resolvers per link does not grow without bound.
Fixes: 0c90d1d2f2 ("ndisc: Parse RFC9463 encrypted DNS (DNR) option")
This allows a single tmpfiles snippet with lines to symlink directories
from /usr/share/factory to be shared across many different configurations
while making sure symlinks only get created if the source actually exists.
We enumerate interfaces first, then enumerate other configurations
like addresses and so on. If we are running in a container, previously
we started to configure the enumerated interfaces before enumerating the
other configurations.
Let's configure interfaces after all configurations are enumerated.
The previous commit removed the UINT_MAX check for the fd array. Let's
now re-add one, but at a better place, and with a more useful limit. As
it turns out the kernel does not allow passing more than 253 fds at the
same time, hence use that as limit. And do so immediately before
calculating the control buffer size, so that we catch multiplication
overflows.
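The same 253-fd ceiling is easy to demonstrate from userspace; a small sketch of an fd-passing helper that enforces it (SCM_MAX_FD is the kernel constant the text refers to):
```python
import array
import socket

SCM_MAX_FD = 253  # kernel limit on file descriptors per SCM_RIGHTS message

def send_fds(sock: socket.socket, fds: list[int]) -> None:
    """Pass file descriptors over a UNIX socket, refusing oversized batches."""
    if len(fds) > SCM_MAX_FD:
        raise ValueError(f"cannot pass more than {SCM_MAX_FD} fds in one message")
    sock.sendmsg([b"\0"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                            array.array("i", fds))])
```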
Let's move the helper from nss-resolve.c to generic code, as it's going
to be useful in #34640.
Also, let's tighten the rules, and refuse negative ifindexes, because
they are invalid.
We fucked that up in the original sd_listen() calls, and then we fixed
that in the newer flavours. But our internal common implementation
should of course use the full range of size_t.
This then allows us to drop a redundant range check.
This cleans up the handling of the "unset_environment" parameter to
sd_listen() and related calls: the man pages claim we operate on it on
error too. Hence, actually do so in strictly all error paths. Previously
we'd miss out on some, because wrapper functions mishandled them.
This was addressed before in 362dcfc5db
but some codepaths were missed. Complete the work now.
This establishes a common pattern: a function to unset the relevant env
vars, that is called from a goto section at the bottom on both success
and failure.
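In other words, something along these lines (a Python sketch of the intended pattern, not the C code):
```python
import os

def listen_fds(unset_environment: bool = True) -> int:
    """sd_listen_fds()-style helper: clean up $LISTEN_* on success and failure."""
    try:
        if os.environ.get("LISTEN_PID") != str(os.getpid()):
            return 0
        return int(os.environ.get("LISTEN_FDS", "0"))
    finally:
        # Equivalent of the "goto" cleanup section: runs on every return path.
        if unset_environment:
            for var in ("LISTEN_PID", "LISTEN_FDS", "LISTEN_FDNAMES"):
                os.environ.pop(var, None)
```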
Some kernel SAS drivers (e.g. smartpqi) expose ports with num_phys = 0. udev
shouldn't treat these ports as wide ports. SAS wide ports always have
num_phys > 1. See comments for sas_port_add_phy() in the kernel sources.
Sample data from a smartpqi system to illustrate the issue below.
Here the phy device is attached to port 0:0, which has no end devices attached
and the SAS end device (where sda is attached) is associated with SAS
port 0:1, which has no associated phy device. Thus num_phys for port-0:1 is 0.
This is arguably wrong, but it's how smartpqi has always set up its devices in
sysfs.
/sys/class/sas_phy/phy-0:0 -> ../../devices/pci0000:46/0000:46:02.0/0000:47:00.0/host0/scsi_host/host0/phy-0:0/sas_phy/phy-0:0
/sys/devices/pci0000:46/0000:46:02.0/0000:47:00.0/host0/scsi_host/host0/port-0:0/phy-0:0 -> ../phy-0:0
/sys/devices/pci0000:46/0000:46:02.0/0000:47:00.0/host0/scsi_host/host0/phy-0:0/port -> ../port-0:0
/sys/class/sas_device/end_device-0:1 -> ../../devices/pci0000:46/0000:46:02.0/0000:47:00.0/host0/scsi_host/host0/port-0:1/end_device-0:1/sas_device/end_device-0:1
/sys/class/block/sda -> ../../devices/pci0000:46/0000:46:02.0/0000:47:00.0/host0/scsi_host/host0/port-0:1/end_device-0:1/target0:0:0/0:0:0:0/block/sda
Signed-off-by: Martin Wilck <mwilck@suse.com>
In mkosi, we want to support signing via a hardware token. We already
support this in systemd-repart and systemd-measure. However, if the
hardware token is protected by a pin, the pin is asked as many as 20
times when building an image as the pin is not cached and thus requested
again for every operation.
Let's introduce a custom openssl ui when we use engines and providers
and plug systemd-ask-password into the process. With
systemd-ask-password, the pin can be cached in the kernel keyring,
allowing us to reuse it without querying the user again every time to
enter the pin.
We use the private key URI as the keyring identifier so that the cached
pin can be shared across multiple tools.
In mkosi, we want to support signing via a hardware token. We already
support this in systemd-repart and systemd-measure. However, if the
hardware token is protected by a pin, the pin is asked as many as 20
times when building an image as the pin is not cached and thus requested
again for every operation.
Let's introduce a custom openssl ui when we use engines and providers
and plug systemd-ask-password into the process. With systemd-ask-password,
the pin can be cached in the kernel keyring, allowing us to reuse it without
querying the user again every time to enter the pin.
We use the private key URI as the keyring identifier so that the cached pin
can be shared across multiple tools.
Note that if the private key is pin protected, openssl will prompt both when
loading the private key using the pkcs11 engine and when actually signing the
roothash. To make sure our custom UI is used when signing the roothash, we have
to also configure it with ENGINE_ctrl() which takes a non-owning pointer to
the UI_METHOD object and its userdata object which we have to keep alive so we
introduce a new AskPasswordUserInterface struct which we use to keep both objects
alive together with the EVP_PKEY object.
Because the AskPasswordRequest struct stores non-owning pointers to its fields,
we change repart to store the private key URI as a global variable again instead
of the EVP_PKEY object so that we can use the private key argument as the keyring
field of the AskPasswordRequest instance without running into lifetime issues.
No functional change, at least for now. Preparation for later commits.
But we are planning to extend KeepConfiguration= to also keep
addresses and so on assigned by other dynamic configuration protocols
like DHCPv6 or NDisc.
However, when link_free_engines() is called here, addresses and so
on acquired by NDisc will be removed, even if link_stop_engines() handles
restarting networkd or KeepConfiguration= gracefully.
So, let's not free the engines here, but free them later in link_free().
It is not necessary to call it here anyway.
The names of these conflict with macros from efi.h that we'll move
to efi-fundamental.h in a later commit. Let's avoid the conflict by
getting rid of these helpers. Arguably this also improves readability
by clearly indicating we're passing arbitrary strings and not constants
to the macros when we invoke them.
Currently, ask_password_auto() will always try to store the password in
the user keyring. Let's make this configurable, so that we can make
ask_password_auto() use the session keyring instead. This is required when
working with user namespaces, as the user keyring is namespaced by user
namespaces, which makes it impossible to share cached keys across user
namespaces, while this is possible with the session keyring.
With https://github.com/systemd/mkosi/pull/3164, we'll be able to run
arbitrary commands in the mkosi sandbox, which has /usr from the tools
tree if one is configured. Let's add the required packages to be able to
run meson to setup the integration tests. This allows running the integration
tests without having to install meson or other build dependencies on the
host system.
"""
mkosi sandbox meson setup build
mkosi sandbox meson compile -C build mkosi
mkosi sandbox env SYSTEMD_INTEGRATION_TESTS=1 meson test -C build ...
"""
Currently, bind-mounted directories within a user/mount namespace get
the uid/gid stored on their files. If the host creates a file in the
source directory, it will still show as root in the namespace.
Id-mapping is a filesystem feature that allows a mount namespace to show
a different uid than what is actually stored on a file. Add support for
id-mappings to exec directories, so that the files within the mount
namespace are owned by the unprivileged uid/gid.
Example:
Using unit:
```
[Unit]
Description=Sample service
[Service]
MountAPIVFS=yes
DynamicUser=yes
PrivateUsers=yes
TemporaryFileSystem=/run /var/opt /var/lib /vol
UMask=0000
ExecStart=/bin/bash -c 'while true; do echo "ping"; sleep 5; done'
StateDirectory=andresstatedir:sampleservice
[Install]
WantedBy=multi-user.target
```
In the host namespace, creating a file "test":
```
root@abeltran-test:/var/lib/andresstatedir# ls -lah
total 8.0K
drwxr-xr-x 2 root root 4.0K Aug 21 23:48 .
drwx------ 3 root root 4.0K Aug 21 23:47 ..
-rw-r--r-- 1 root root 0 Aug 21 23:48 test
```
Within the unit namespace:
```
root@abeltran-test:/var/lib/sampleservice# ls -lah
total 4.0K
drwxr-xr-x 2 63750 63750 4.0K Aug 21 23:48 .
drwxr-xr-x 3 root root 60 Aug 21 23:47 ..
-rw-r--r-- 1 63750 63750 0 Aug 21 23:48 test
```
```
root@abeltran-test:/# mount | grep and
/dev/sda1 on /var/lib/private/andresstatedir type ext4 (rw,nosuid,noexec,relatime,idmapped,discard,errors=remount-ro,commit=30)
```
The check for the old flag was not restored when the weak blocker was
added, add it back. Also skip polkit check for root for the weak
blocker, to keep compatibility with the previous behaviour.
Partially fixes https://github.com/systemd/systemd/issues/34091
Follow-up for 804874d26a
The check for the old flag was not restored when the weak
blocker was added, add it back. Also skip polkit check for
root for the weak blocker, to keep compatibility with the
previous behaviour.
Partially fixes https://github.com/systemd/systemd/issues/34091
Follow-up for 804874d26a
We're just running a language server so no need to put a writable
overlay on top of the build sources to prevent modifications. This
hopefully helps the language server track modifications to the source
files better.
Let's disable symlink following if we attach a container's mount tree to
our own mount namespace. We after all mount the tree to a different
location in the mount tree than where it was inside the container, hence
symlinks (if they exist) will all point to the wrong places (even if
relative, some might point to other places). And since symlink attacks
are a thing, and we let libdw operate on the tree, let's lock this down
as much as we can and simply disable symlink traversal entirely.
This makes use of the new TIOCGPTPEER pty ioctl() for directly opening a
PTY peer, without going via path names. This is nice because it closes a
race around allocating and opening the peer. And also has the nice
benefit that if we acquired an fd originating from some other
namespace/container, we can directly derive the peer fd from it, without
having to reenter the namespace again.
When an exec directory is shared between services, this allows one of the
service to be the producer of files, and the other the consumer, without
letting the consumer modify the shared files.
This will be especially useful in conjunction with id-mapped exec directories
so that fully sandboxed services can share directories in one direction, safely.
This allows an unprivileged user that is active at the console to change
the fields that are in the selfModifiable allowlists (introduced in a
previous commit) without authenticating as a system administrator.
Administrators can disable this behavior per-user by setting the
relevant selfModifiable allowlists, or system-wide by changing the
policy of the org.freedesktop.home1.update-home-by-owner Polkit action.
Let's disable symlink following if we attach a container's mount tree to
our own mount namespace. We after all mount the tree to a different
location in the mount tree than where it was inside the container, hence
symlinks (if they exist) will all point to the wrong places (even if
relative, some might point to other places). And since symlink attacks
are a thing, and we let libdw operate on the tree, let's lock this down
as much as we can and simply disable symlink traversal entirely.
In 68511cebe5 the ability to pass the
coredump's mount namespace fd from the coredump pattern handler was added
to systemd-coredump. For this the protocol was augmented, in an attempt to
provide both forward and backward compatibility.
The protocol as of v256: one or more datagrams with journal log fields
about the coredump are sent via an SOCK_SEQPACKET connection. It is
finished with a zero length datagram which carries the coredump fd (this
last datagram is called "sentinel" sometimes).
The protocol after 68511cebe5 is extended
so that after the sentinel a 2nd sentinel is sent, with a pair of fds:
the coredump fd *again* and a mount fd (acquired via open_tree()) of the
container's mount tree. It's a bit ugly to send the coredump fd a 2nd
time, but what's more important the implementation didn't work: since on
SOCK_SEQPACKET a zero sized datagram cannot be distinguished from EOF
(which is a Linux API design mistake), an early EOF would be
misunderstood as a zero size datagram lacking any fd, which resulted in
protocol termination.
Moreover, I think if we touch the protocol we should make the move to
pidfs at the same time.
All of the above is what this protocol rework addresses.
1. A pidfd is now sent as well
2. The protocol is now payload, followed by the coredump fd datagram (as
before). But now followed by a second empty datagram with a pidfd,
and a third empty datagram with the mount tree fd. Of these, the latter
two or just the last one are optional. Thus, it's now a stream of payload
datagrams with one, two or three fd-laden datagrams as sentinel. If
we read the 2nd or 3rd sentinel without an attached fd we assume this
is actually an EOF (whether it actually is one or not doesn't matter
here). This should provide nice up and down compatibility.
3. The mount_tree_fd is moved into the Context object. The pidfd is
placed there too, as a PidRef. Thus the data we pass around is now
the coredump fd plus the context, which is simpler and makes a lot
more semantical sense I think.
4. The "first" boolean is replaced by an explicit state engine enum
Fixes: https://github.com/systemd/systemd/issues/34130
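A rough sketch of the receiving side of the reworked protocol (Python for illustration only; the real implementation is C inside systemd-coredump):
```python
import array
import socket

def receive_coredump(sock: socket.socket):
    """Payload datagrams, then up to three fd-carrying sentinel datagrams
    (coredump fd, pidfd, mount tree fd). An empty datagram without an fd is
    treated as EOF."""
    payload, fds = [], []
    while True:
        data, ancdata, _flags, _addr = sock.recvmsg(1 << 16, socket.CMSG_SPACE(4))
        received = [fd for level, typ, buf in ancdata
                    if level == socket.SOL_SOCKET and typ == socket.SCM_RIGHTS
                    for fd in array.array("i", buf)]
        if received:
            fds += received        # sentinel datagram carrying an fd
        if not data and not received:
            break                  # empty datagram without any fd: treat as EOF
        if data:
            payload.append(data)   # journal fields describing the coredump
    return payload, fds
```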
Instead of passing a boolean picking the destruction method, just have
different functions. That's much nicer in the context of _cleanup_, and
closer to how we usually do things.
Use pidref to acquire some fields. This just makes use of the pidref
helpers we already have. We acquire a lot of other data via classic pids
still, but for that we first have to write race-free pidref getters,
hence leave that for another time.
In 68511cebe5 the ability to pass the
coredump's mount namespace fd from the coredump pattern handler was added
to systemd-coredump. For this the protocol was augmented, in an attempt to
provide both forward and backward compatibility.
The protocol as of v256: one or more datagrams with journal log fields
about the coredump are sent via an SOCK_SEQPACKET connection. It is
finished with a zero length datagram which carries the coredump fd (this
last datagram is called "sentinel" sometimes).
The protocol after 68511cebe5 is extended
so that after the sentinel a 2nd sentinel is sent, with a pair of fds:
the coredump fd *again* and a mount fd (acquired via open_tree()) of the
container's mount tree. It's a bit ugly to send the coredump fd a 2nd
time, but what's more important the implementation didn't work: since on
SOCK_SEQPACKET a zero sized datagram cannot be distinguished from EOF
(which is a Linux API design mistake), an early EOF would be
misunderstood as a zero size datagram lacking any fd, which resulted in
protocol termination.
Moreover, I think if we touch the protocol we should make the move to
pidfs at the same time.
All of the above is what this protocol rework addresses.
1. A pidfd is now sent as well
2. The protocol is now payload, followed by the coredump fd datagram (as
before), but now followed by a second empty datagram with a pidfd,
and a third empty datagram with the mount tree fd. Of these, the latter
two, or just the last one, are optional. Thus, it's now a stream of payload
datagrams with one, two or three fd-laden datagrams as sentinel. If
we read the 2nd or 3rd sentinel without an attached fd we assume this
is actually an EOF (whether it actually is one or not doesn't matter
here). This should provide nice up and down compatibility.
3. The mount_tree_fd is moved into the Context object. The pidfd is
placed there too, as a PidRef. Thus the data we pass around is now
the coredump fd plus the context, which is simpler and makes a lot
more semantic sense I think.
4. The "first" boolean is replaced by an explicit state engine enum
Fixes: #34130
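As a rough illustration of what an fd-carrying sentinel datagram looks like on the wire, here is a minimal sketch (not the actual systemd-coredump code; the helper name is invented): an empty SOCK_SEQPACKET datagram whose only content is an SCM_RIGHTS control message.
```
/* Minimal sketch (not the real systemd-coredump implementation): send one
 * empty SOCK_SEQPACKET datagram carrying a single fd via SCM_RIGHTS, as a
 * "sentinel" following the payload datagrams. */
#include <string.h>
#include <sys/socket.h>

static int send_fd_sentinel(int seqpacket_fd, int fd_to_send) {
        union {
                struct cmsghdr cmsghdr;
                char buf[CMSG_SPACE(sizeof(int))];
        } control = {};
        struct msghdr mh = {
                .msg_control = control.buf,
                .msg_controllen = sizeof(control.buf),
        };
        struct cmsghdr *cmsg;

        cmsg = CMSG_FIRSTHDR(&mh);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

        /* A zero-length datagram: the receiver recognizes the sentinel by the
         * attached fd, not by the (empty) payload. */
        if (sendmsg(seqpacket_fd, &mh, MSG_NOSIGNAL) < 0)
                return -1;

        return 0;
}
```
On the receiving side, a sentinel that arrives without an attached fd can then be treated as EOF, as described above.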
Let's rename this local variable, since we are not operating on the
coredump process here after all, but on the leader of the namespace the
coredump process is in, which is quite different. Hence let's make this
very clear via the name.
Follow-up for d2ebf5cc1d.
The detailed error response is already logged, hence it is not necessary to
log again with the errno converted from the error response, which is typically
less informative, e.g.
===
varlink-26-26: Setting state idle-server
varlink-26-26: Received message: {"method":"io.systemd.UserDatabase.GetUserRecord","parameters":{"service":""}}
varlink-26-26: Changing state idle-server → processing-method
varlink-26-26: Sending message: {"error":"io.systemd.UserDatabase.BadService","parameters":{}}
varlink-26-26: Changing state processing-method → processed-method
varlink-26-26: Callback for io.systemd.UserDatabase.GetUserRecord returned error: Invalid request descriptor
varlink-26-26: Changing state processed-method → idle-server
varlink-26-26: Got POLLHUP from socket.
===
systemd-sysupdated is still unstable and we'd like to make breaking
changes to it even after the v257 release, so we document it as such and
disable building it by default in release builds. Distros can still
opt in, and we still build it in developer mode so it has CI coverage.
This commit introduces a build-time option to enable/disable sysupdated
separately from sysupdate. 'auto' translates to enabled by default in
developer builds.
Setting this flag is a noop without a corresponding call to
posix_spawnattr_setsigdefault.
If we call posix_spawnattr_setsigdefault with a full signal set,
it causes glibc's posix_spawn implementation to call sigaction 63 times,
once for each signal. That seems wasteful.
This feature is really only useful for signals which have their
disposition set to SIG_IGN. Otherwise the disposition gets set to
SIG_DFL automatically, either by clone(CLONE_CLEAR_SIGHAND) or the
subsequent execve.
As far as I can tell, systemd does not have any signals set to SIG_IGN
under normal operating conditions.
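For context, a hedged sketch of how the two attribute calls pair up (illustrative helper, not systemd code): the POSIX_SPAWN_SETSIGDEF flag only has an effect if a signal set was also installed with posix_spawnattr_setsigdefault().
```
/* Sketch: POSIX_SPAWN_SETSIGDEF only matters together with
 * posix_spawnattr_setsigdefault(); with a full set glibc resets every
 * signal individually in the child, which is what the commit avoids. */
#include <signal.h>
#include <spawn.h>

extern char **environ;

static int spawn_with_default_sigs(pid_t *pid, char *const argv[]) {
        posix_spawnattr_t attr;
        sigset_t full;
        int r;

        sigfillset(&full);

        r = posix_spawnattr_init(&attr);
        if (r != 0)
                return r;

        /* Without the next call the flag below would be a no-op. */
        (void) posix_spawnattr_setsigdefault(&attr, &full);
        (void) posix_spawnattr_setflags(&attr, POSIX_SPAWN_SETSIGDEF);

        r = posix_spawnp(pid, argv[0], NULL, &attr, argv, environ);
        posix_spawnattr_destroy(&attr);
        return r;
}
```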
The previous behavior of systemctl --when= seems absurd, i.e.
if we fail to schedule the shutdown in the future it's performed
immediately. Let's instead hard fail, which also removes the need
to special-case certain errnos (preparation for later commits).
Those text sections had a trailing NUL byte. It's debatable whether this is a
good idea or not. Correctly written consumers will look at the section size so
they wouldn't need this. Shim doesn't use a trailing NUL, so let's follow suit.
Fixes https://github.com/systemd/systemd/issues/33731.
898e9edc46 reworked this code, but didn't actually
change the logic. We have always been appending the trailing zero by using a
NUL-terminated string as the section contents. (I checked this with v253.18
from before the elf2efi rework.)
.sdmagic contains a string like "#### LoaderInfo: systemd-boot 257~devel ####",
which changes with each version, so previous versions would compare unequal
anyway, so we don't need to worry about backwards compatibility.
OSC sequences can be closed with one of three terminators:
1. ASCII code 7, aka BEL, aka ^G, aka \x07, aka \a
2. ASCII code 156, aka \x9c
3. Pair of ASCII code 27 followed by ASCII code 92, aka \x1b\x5c
Of these, in some corner-case scenarios BEL causes problems (see #34604).
Hence switch away from that wherever we use it, and prefer \x1b\x5c
instead. That's preferable over \x9c, since the latter is also a valid
UTF-8 codepoint. See the discussion here for example:
https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda#the-escape-sequence
Fixes: #34604
This effectively reverts d2c1451b73.
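For illustration, emitting an OSC sequence with the ST terminator rather than BEL can look like this (a hypothetical helper, not systemd's terminal code):
```
/* Sketch: emit an OSC sequence terminated with ST ("\x1b\x5c") rather than
 * BEL ("\a"), e.g. to set the terminal window title. */
#include <stdio.h>

static void osc_set_title(FILE *f, const char *title) {
        /* OSC 2 sets the window title; the trailing "\x1b\\" is the
         * String Terminator preferred over "\a". */
        fprintf(f, "\x1b]2;%s\x1b\\", title);
        fflush(f);
}
```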
After the commit d2ebf5cc1d, sd_varlink_error()
returns negative errno, hence the function always returns negative errno
on failure.
Follow-up for 3cb72c7862.
The test container exits shortly after starting, hence when varlinkctl is called, the
container may have already terminated. Let's make the container live
indefinitely.
Also, with this change the os-release files are removed after the container is started.
Let's make sure that sd_varlink_error() always returns an error code, so
that we can use it in a style "return sd_varlink_error(…);" everywhere,
which has two effects: return a good error reply to clients, and exit
the current stack frame with a failure code.
Interestingly sd_varlink_error_invalid_parameter() already worked like
this in some cases, but sd_varlink_error() itself didn't.
This is an alternative to the error handling tweak proposed in #34882,
but I think is a lot more generically useful, since it establishes a
pattern.
I checked our codebase, and this change should generally be OK without
breaking callsites, since the current callers (with exception of the
machined case from #34882) called sd_varlink_error() in the outermost
varlink method call dispatch stack frame, where this behaviour change
does not alter anything.
This is similar btw, how sd_bus_error_setf() and friends always return
error codes too, synthesized from its parameters.
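The resulting call pattern then looks roughly like this (a sketch assuming the public sd-varlink method-callback signature; the method and error names are invented):
```
/* Sketch of a varlink method handler using the "return sd_varlink_error(…)"
 * pattern: the call both sends the error reply and yields a negative errno
 * to propagate up the dispatch stack. The method and error names here are
 * purely illustrative. */
#include <systemd/sd-varlink.h>

static int method_do_something(
                sd_varlink *link,
                sd_json_variant *parameters,
                sd_varlink_method_flags_t flags,
                void *userdata) {

        if (!parameters)
                return sd_varlink_error(link, "io.example.NoParameters", NULL);

        return sd_varlink_reply(link, NULL);
}
```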
All our public headers strive for C90 compatibility with a few
extensions, and thus avoided stdbool.h and bool.
The sd_json_format_enabled() helper seems like a poor place to start
requiring stdbool.h now.
Also drop __extension__ since we are not using it anywhere else in very
similar inline functions.
(And we probably should drop any _sd_const declarations on inline
functions. Given that the compiler always has the function implementation
around, because it's in the header, there's really no reason to
specify this manually; the compiler can trivially figure this out on its
own. But that's for another time.)
Various fixes:
1. Adds O_CLOEXEC for two socketpair()s where we forgot it.
2. Uses FORK_WAIT instead of manual wait_for_terminate_and_check()
invocations.
3. Prefixes opaque NULL/0 arguments with comments saying what they are.
4. Adds a bunch of assert()s, and changes flag validation in
open_terminal() to be an assert() (since flags mistakes are programming
errors, not runtime errors).
Then, when a .netdev file of a stacked netdev is modified, the netdev
can be reconfigured with the updated settings in something like the
following way:
```
ip link del vlan99
networkctl reload
```
Note, removing the vlan interface in the above example may not be necessary,
e.g. when only VLAN flags, egress mapping, or ingress mapping are updated.
But, it is necessary when VLAN ID is updated.
Closes #9627.
Closes #27177.
Closes #34907.
Replaces #22557.
Allows unifying the custom logic for the hostname and root shell. Root
password prompting remains separate as its logic is substantially
different from the other prompts.
In mkosi, we want an easy way to set the keyring timeout for every
tool we invoke that might use systemd-ask-password to query for a
password which is then stored in the kernel keyring. Let's make this
possible via a new $SYSTEMD_ASK_PASSWORD_KEYRING_TIMEOUT_SEC environment
variable.
Using an environment variable means we don't have to modify every separate
tool to add a CLI option for specifying the timeout. In mkosi specifically,
we'll set up a new session keyring for the mkosi process linked to the user keyring,
so that any pins in the user keyring are used if available; otherwise we'll query
for and store passwords in mkosi's session keyring with a zero timeout so that they stay
in the keyring until the mkosi process exits, at which point they're removed from the
keyring.
By default mount(8), umount(8), swapon(8) and swapoff(8) should run
with the SMACK label inherited from systemd rather than the default one
meant for services.
Fixes: aa5ae9711e
Follow-up-for: 20bbf5ee4c
Several netdevs cannot set IFLA_ADDRESS or IFLA_MTU attribute on update.
Currently, the vtable field is unused, as we do not support updating
existing netdevs. Preparation for later commits.
- network_load() is always called with an empty OrderedHashmap, hence rename the output
parameter to 'ret'.
- When netdev_load() is called on startup, the hashmap is NULL. When it is
called on reloading, the hashmap is not cleaned up.
Hence, these cleanups are always no-ops. Let's drop them.
This makes networkd process all queued remove requests when a
terminating or restarting signal is received. Otherwise, e.g. DHCPv4
address will not be removed on stop, especially when
KeepConfiguration=no.
Fixes a bug introduced by 85a6f300c1 and
its subsequent commits.
Fixes #34837.
Co-authored-by: Will Fancher <elvishjerricco@gmail.com>
Users will generally know what a qrcode is, so let's not treat them as dumb and
explain that it can be scanned. OTOH, we should say what the qrcode contains
and it is useful to give a hint why the users would want to scan it. Reword
messages accordingly.
(Also, don't say "to your phone", when somebody might be using a stolen phone,
or something else than a phone.)
People know what a qrcode is. We don't need to tell them to scan it.
Instead, we should say what the code contains.
While at it, rename "stream" to "f" in line with the usual style.
There are still some breaking changes we want to make to sysupdated, but
they'll potentially take months and we don't want to block the systemd
release for that long. So, we can instead mark sysupdate's API as
unstable.
Let's make ConfigurationDirectory= a bit less "special-casey", by hiding
the fact that it's the only per-service dir we do not do chown()ing for
inside of a new EXEC_DIRECTORY_TYPE_SHALL_CHOWN() helper.
When building distribution packages without building an image, the
distribution packages will only be located in mkosi.builddir/ now and
not in mkosi.output/, so update the documentation to reflect that.
Also add installation instructions for distributions other than
CentOS/Fedora while we're at it.
Print the times in seconds in the tooltip to remove the need to count
and try to follow the lines in the SVG diagram in order to see at
what times these events happen.
If an ifindex is specified, we are modifying the existing interface.
Hence, these flags should not be set. Otherwise, the request will be
refused with -EEXIST.
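In plain rtnetlink terms the distinction looks roughly like this (a generic sketch, not the networkd request code):
```
/* Sketch: choose rtnetlink header flags depending on whether we create a new
 * link or modify an existing one (identified by ifindex). With
 * NLM_F_CREATE|NLM_F_EXCL the kernel answers -EEXIST for existing links. */
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

static void init_link_request(struct nlmsghdr *nlh, struct ifinfomsg *ifi, int ifindex) {
        nlh->nlmsg_type = RTM_NEWLINK;
        nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;

        if (ifindex > 0)
                ifi->ifi_index = ifindex;                       /* modify the existing link in place */
        else
                nlh->nlmsg_flags |= NLM_F_CREATE | NLM_F_EXCL;  /* create a new link */
}
```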
Not all possible DNS names will survive serialization. Restrict the set
of valid dns names to LDH encoded names.
Fixes: 25c33e3500 (network: parse RFC9463 DHCPv4 DNR option, 2024-01-16)
Fixes: a07e83cc58 (network: Parse RFC9463 DHCPv6 DNR option, 2024-01-17)
Fixes: 0c90d1d2f2 (ndisc: Parse RFC9463 encrypted DNS (DNR) option, 2024-01-19)
Currently every progress update results in a new progress message
which is extremely verbose. Instead, let's use the progress bar infra
to draw a proper progress bar similar to what we do in systemd-repart
now.
This generates the Windows Terminal OSC sequences indicating progress.
This lets the terminal know that we are doing a slow operation, and how
we are progressing.
Windows Terminal uses this in two ways: it shows a circle in the tab
that completes, and it highlights the progress in the task bar.
I found no Linux terminal that currently supports it, but also none that
didn't like it. Thankfully most terminals correctly ignore unrecognized
OSC sequences.
I think we should just merge this, and see if this trips up too many
people, but I have reason to believe this shouldn't be too bad.
And yes, I do work from Windows Terminal sometimes, ssh into my Linux
build systems, and it is really cute seeing the progress animation
there.
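For reference, the sequence involved is, to my understanding, ConEmu's OSC 9;4 which Windows Terminal also understands; a minimal sketch of emitting it (hypothetical helpers) could look like:
```
/* Sketch: emit the ConEmu/Windows Terminal progress OSC ("OSC 9 ; 4 ; st ; pct ST").
 * st=0 clears the indicator, st=1 reports normal progress with pct in 0..100. */
#include <stdio.h>

static void emit_progress(FILE *f, unsigned percent) {
        fprintf(f, "\x1b]9;4;1;%u\x1b\\", percent > 100 ? 100 : percent);
        fflush(f);
}

static void clear_progress(FILE *f) {
        fprintf(f, "\x1b]9;4;0;0\x1b\\");
        fflush(f);
}
```
Note that this uses the ST terminator rather than BEL, consistent with the OSC terminator preference discussed earlier.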
Let's ramp up security for system user accounts, at least where
possible, by creating them fully locked (instead of just with an invalid
password). This matters when taking non-password (i.e. SSH) logins into
account.
Fixes: #13522
* e42eed4afd test_sysusers_defined: support new ! line flag for creating fully locked accounts
* 2c6a4e2f90 Version 256.7
* bedc0270e7 Move yum/dnf protection removal config file under /usr
* 5a82129a41 Reword some descriptions
* ce99022f7b Version 256.6
We should avoid unnecessary abbreviations for such messages, and this
puts a maximum limit on things, hence it should indicate this in the
name.
Moreover, matches is a bit confusing, since most people will probably
call "busctl monitor" without any match specification, i.e. zero
matches, but that's not what was meant here at all.
Also, add a brief switch for this (-N) since I figure in particular
"-N1" might be a frequent operation people might want to use.
Follow-up for: 989e843e75
See: #34048
The --timeout= logic was implemented incorrectly, as it would not put
a limit on the runtime of the operation, but only on the IO sleep.
However, spurious wakeups are possible, hence the timer would be reset
too often.
Fix that, by determining the absolute timestamp early, and checking
against that.
Follow-up for: 989e843e75
See: #34048
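The fix boils down to computing one absolute deadline up front and deriving every wait from it, roughly like this (sketch with invented helper names, not the actual code):
```
/* Sketch: compute an absolute deadline once, then derive the remaining wait
 * for each poll() iteration, so spurious wakeups cannot extend the total
 * runtime beyond --timeout=. */
#include <poll.h>
#include <time.h>

static long long now_ms(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1000LL + ts.tv_nsec / 1000000LL;
}

static int wait_with_deadline(struct pollfd *pfd, long long timeout_ms) {
        long long deadline = now_ms() + timeout_ms;

        for (;;) {
                long long left = deadline - now_ms();
                if (left <= 0)
                        return 0;               /* timed out */

                int r = poll(pfd, 1, (int) left);
                if (r > 0)
                        return 1;               /* fd is ready */
                if (r < 0)
                        return -1;              /* error (a real implementation would retry on EINTR) */
                /* r == 0: poll timed out; the loop re-checks the absolute deadline */
        }
}
```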
We are going to output a series of JSON objects, hence let's
automatically enable JSON-SEQ output mode, as we usually do.
"jq --seq" supports this natively, hence this should not really restrict
us.
Follow-up for: 67ea8a4c0e
Now that we have the mkosi.clangd script to run clangd from the mkosi
build script, it becomes clear that doing cleanup with mkosi.clean has
a big gap in that we always run the mkosi.clean script and thus we also
run it when we run the mkosi.clangd script, causing the previously built
packages to be removed when we run clangd without producing new ones.
In mkosi we're improving the situation by only running clean scripts when we
clean up the output directory and disallowing writing to the output directory
from build scripts.
Let's adapt systemd to these changes by moving the copying of packages to the
output directory to the postinst script.
When invoked on a running system, bsod would not print the qrcode.
The check for "color support" on stdout is pointless, since we're not
printing to stdout but to a terminal fd that is opened separately.
Before a339495b1d, update-utmp typically
connects to the public DBus socket when disconnected from the private DBus
socket, as the dbus service should be active even while PID 1 is being reexecuted.
However, after a339495b1d, update-utmp
tries to connect only to the private DBus socket, but reexecution of PID 1
may be slow, hence all attempts may fail when the reexecution is slow.
With this change, it now waits for 100ms to 2000ms, so in total it waits
about 37 seconds on average, compared to about 4 seconds previously.
This applies the existing SocketUser=/SocketGroup= options to units
defining a POSIX message queue, bringing them in line with UNIX
sockets and FIFOs. They are set on the file descriptor rather than
a file system path because the /dev/mqueue path interface is an
optional mount unit.
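Since a POSIX mqd_t is an ordinary file descriptor on Linux, ownership can be applied directly to it, conceptually like this (a sketch, not the socket-unit implementation):
```
/* Sketch: apply SocketUser=/SocketGroup=-style ownership to a POSIX message
 * queue via its file descriptor instead of a /dev/mqueue path. On Linux,
 * mqd_t is an ordinary file descriptor, so fchown()/fchmod() work on it. */
#include <fcntl.h>
#include <mqueue.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static int open_mq_with_owner(const char *name, uid_t uid, gid_t gid) {
        mqd_t q = mq_open(name, O_RDWR | O_CREAT, 0600, NULL);
        if (q == (mqd_t) -1)
                return -1;

        if (fchown((int) q, uid, gid) < 0) {
                mq_close(q);
                return -1;
        }

        return (int) q;
}
```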
This commit adds two new settings, private and strict, to
the ProtectControlGroups= property. Private will unshare the cgroup
namespace and mount a read-write private cgroup2 filesystem at /sys/fs/cgroup.
Strict does the same except the mount is read-only. Since the unit is
running in a cgroup namespace, the new root of /sys/fs/cgroup is the unit's
own cgroup.
We also add a new dbus property ProtectControlGroupsEx which accepts strings
instead of a boolean. This will allow users to use private/strict via dbus
and systemd-run in addition to service files.
Note private and strict fall back to no and yes respectively if the kernel
doesn't support cgroup2 or the system is not using the unified hierarchy.
Fixes: #34634
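Mechanically, private/strict boil down to something like the following inside the service's mount namespace (a simplified sketch of the idea, not the actual namespace setup code):
```
/* Sketch: what ProtectControlGroups=private/strict conceptually does inside
 * the service's mount namespace: unshare the cgroup namespace and mount a
 * fresh cgroup2 hierarchy over /sys/fs/cgroup, read-only for "strict". */
#define _GNU_SOURCE
#include <sched.h>
#include <sys/mount.h>

static int setup_private_cgroup_mount(int read_only) {
        unsigned long flags = MS_NOSUID | MS_NOEXEC | MS_NODEV;

        if (unshare(CLONE_NEWCGROUP) < 0)
                return -1;

        if (read_only)
                flags |= MS_RDONLY;

        /* Inside the new cgroup namespace, the mount root is the unit's own cgroup. */
        if (mount("cgroup2", "/sys/fs/cgroup", "cgroup2", flags, NULL) < 0)
                return -1;

        return 0;
}
```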
This commit refactors ProtectControlGroups= from using a boolean
in the dbus/execute backend to using an enum. There is no functional
change but this will allow adding new non-boolean values (e.g. strict,
private) a la PrivateHome.
The journal handles multi-line messages nicely, and they are easier
to read. Drop the recycling symbol, there is no circular process here,
we go from a to b and never back to a again.
We often used a pattern like if (!FLAGS_SET(flags, SD_JSON_FORMAT_OFF)),
which is rather verbose and also contains a double negative, which we try
to avoid. Add a little helper to avoid an explicit bit check.
This change clarifies an additional thing: in some cases we treated
SD_JSON_FORMAT_OFF as a flag (flags & SD_JSON_FORMAT_OFF), while in other cases
we treated it as an independent enum value (flags == SD_JSON_FORMAT_OFF).
In the first form, flags like SD_JSON_FORMAT_SSE do _not_ turn the json
output on, while in the second form they do. Let's use the first form
everywhere.
No functional change intended.
Initially I wasn't sure if this helper should be made public or just internal,
but it seems such a common pattern that if we expose the flags, we might just
as well expose it too, to make life easier for any consumers.
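The helper is essentially a one-liner along these lines (a sketch approximating the public sd_json_format_enabled(), shown here under a different name):
```
/* Sketch of the helper: JSON output is considered enabled whenever the
 * SD_JSON_FORMAT_OFF bit is not set, replacing the verbose
 * !FLAGS_SET(flags, SD_JSON_FORMAT_OFF) pattern at call sites. It returns
 * int rather than bool, in line with the C90-compatibility goal of the
 * public headers. */
#include <systemd/sd-json.h>

static inline int json_format_enabled(sd_json_format_flags_t flags) {
        return !(flags & SD_JSON_FORMAT_OFF);
}
```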
We would need to use pure if the function was getting pointers and
dereferencing them. But sd_id128_t is a structure and those functions
only access the parameters of the call.
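A compact illustration of the const-vs-pure distinction with hypothetical functions:
```
/* Sketch: __attribute__((const)) is fine for functions that only look at
 * their by-value arguments; functions that dereference pointers may at most
 * be __attribute__((pure)), since they read memory. */
#include <stdint.h>
#include <string.h>

typedef struct id128 { uint8_t bytes[16]; } id128;

__attribute__((const)) static int id128_is_null(id128 a) {
        for (size_t i = 0; i < sizeof(a.bytes); i++)
                if (a.bytes[i] != 0)
                        return 0;
        return 1;
}

__attribute__((pure)) static int id128_ptr_equal(const id128 *a, const id128 *b) {
        return memcmp(a->bytes, b->bytes, sizeof(a->bytes)) == 0;
}
```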
The default definition to add is `-D__loongarch64__`, which is not searched in [bpf_tracing.h](09b9e83102/src/bpf_tracing.h (L68))
This may avoid `error: Must specify a BPF target arch via __TARGET_ARCH_xxx` on loongarch64.
Signed-off-by: Zhou Qiankang <wszqkzqk@qq.com>
The goal of RestartMode=direct is to make restarts invisible
to dependents, so auto restart jobs shouldn't bring them down
at all. So far we only skipped going through failed/dead states
in service_enter_dead(), i.e. the unit would never be considered
dead. But when constructing the restart transaction, the stop job
would be propagated to dependents. Consider the following 2 units:
dependent.target:
[Unit]
BindsTo=a.service
After=a.service
a.service:
[Service]
ExecStart=bash -c 'sleep 100 && exit 1'
Restart=on-failure
RestartMode=direct
Before this commit, even though BindsTo= isn't triggered since
a.service never failed, when a.service auto-restarts, dependent.target
is also restarted. Let's suppress it by using JOB_REPLACE instead of
JOB_RESTART_DEPENDENCIES in service_enter_restart().
Fixes #34758
The example above is subtly different from the original report,
to illustrate that the new behavior makes sense for less exotic
use cases too.
Restart jobs are always run as stop jobs initially, and later get
converted to start jobs by the job engine. Hence UNIT_ATOM_PROPAGATE_STOP
should and does cover the restart case, as currently all dep types
with _RESTART also carry _STOP. Drop UNIT_ATOM_PROPAGATE_RESTART.
A colleague reported that when RootDirectory= does not exist, systemd reports an error like:
```
Failed to set up mount namespacing: No such file or directory
```
Unfortunately, with large spec files, it can be hard to diagnose which path systemd is talking
about. Thus, to make the error message more helpful and similar to mount error messages, we add
the root directory/image path into the error message like:
```
Failed to set up mount namespacing: /tmp/thisdoesnotexist: No such file or directory
```
No functional change, but let's print yes/no rather than on/off in systemd-analyze.
Similar to 2e8a581b9c and
edd3f4d9b7.
(Note, the commit messages of those commits are wrong, as
parse_boolean() supports on/off anyway.)
We must be prepared that systemd temporarily drops off the bus or
disconnects our direct connections (due to systemctl daemon-reexec or
so). Hence automatically reconnect when we watch the unit status, and
handle this case gracefully.
Fixes: #32906 #27204
Let's not confuse users with the login shell indicator and drop it from
the description. This means a run0 session will now usually show up with
a description of "[run0] /bin/bash" rather than "[run0] -/bin/bash".
I think we should try to communicate clearly if something is a run0
session, or a systemd-run invocation. Hence, let's initialize the
description so that the command is prefixed by
program_invocation_short_name.
Effectively this means that our run0 sessions now appear as services
with a description of "[run0] -/bin/bash"
The current logic for how systemd-run units are named is a bit complex. It
used to be just the unique ID of the dbus connection, which was nice,
since it's system-wide and uniquely assigned to us. But this didn't work
out well, due to direct connections to PID 1 and due to soft reboots.
We nowadays have a better ID to use though, with nicer properties: the
kernel manages a pidfd ID for every process after all, and it's globally
unique, for any process, and regardless of soft reboots. Hence use that
for naming preferably, and just keep one branch with a randomized name
as fallback.
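One way to obtain such an ID is sketched below (hypothetical helper; assumes a kernel where pidfds are backed by pidfs, i.e. 6.9 or newer, so the pidfd inode number uniquely identifies the process):
```
/* Sketch: derive a globally unique, reuse-free process identifier from a
 * pidfd. On kernels that back pidfds with pidfs (6.9+) the inode number is
 * unique per process; on older kernels this is not meaningful. */
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static int pidfd_unique_id(pid_t pid, uint64_t *ret) {
        int fd = (int) syscall(SYS_pidfd_open, pid, 0);
        if (fd < 0)
                return -1;

        struct stat st;
        if (fstat(fd, &st) < 0) {
                close(fd);
                return -1;
        }

        *ret = (uint64_t) st.st_ino;
        close(fd);
        return 0;
}
```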
This makes use of the infra introduced in 229d4a9806 to indicate visually on each prompt that we are in superuser mode temporarily.
When PAMName= is set this should be enough to go through our entire user
changing story, so that PAM is definitely run, and environment variables
definitely pulled in and so on.
Previously, it would happen that under some circumstances we might not do
this when transitioning from root to root itself even though PAM was
enabled.
Fixes: #34682
This adds some basic client-side user/group filtering to "userdbctl":
1. by uid/gid min/max
2. by user "disposition" (i.e. show only regular users with "userdbctl
user -R")
3. by fuzzy name (i.e. search by substring/levenshtein of user name,
real name, and other identifiers of the user/group record).
In the long run we also want to support this server side, but let's
start out with doing this client-side, since many backends won't support
server-side filtering anytime soon anyway, so we need it in either case.
Currently we need ukify with support for --profile and --join-profile
which isn't in an official release yet so mention that a local build
from source might be required.
Otherwise, ProtectHome=tmpfs makes /home/ and friends not read-only.
Also, mount options for /run/ specified in MountAPIVFS=yes are not
applied.
The function append_static_mounts() was introduced in
5327c910d2, but at that time, there were
neither .read_only nor .options in the struct. But when the struct was later
extended, the function was not updated and they were not
copied from the static table.
The fields have been used in static tables since
e4da7d8c79, and also in
94293d65cd.
Fixes#34825.
Currently, when comparing two DNS names for storing them in a
hashtable, and the DNS names are not actually valid, we'll compare the
error codes.
This is not very smart however, since this means two invalid DNS names
that happen to be equally "invalid" will be considered identical, even
if their strings are entirely different.
Let's find a better solution for this niche case: let's simply compare
the domains as strings.
This matters in case of DNS label compression: if we already added
an invalid DNS name into the label compression hash table, and lookup
any other invalid DNS name, this lookup will likely return what the
earlier one already returned, and that's confusing.
* 07a294d0c6 Do not mask systemd-gpt-auto-generator in upstream CI builds
* 5636398bf7 Backport patch to fix test failures with tzdata 2024b-1
* 354ded4946 Update changelog for 256.7-2 release
* e38c7c5345 Backport fixes for upstream autopkgtest suite
* 249676834c Disable utmp support, not y2038 safe
* 822d44da42 initramfs-tools: support missing /etc/udev/udev.conf
* ad71ebf700 systemd-boot: depend on systemd for kernel-install
* 5bf7008ef8 d/systemd.postinst: do not restart systemd-binfmt.service if masked
* 58d5aa1b41 d/rules: mask systemd-gpt-auto-generator on Ubuntu
* 481987d85c Update changelog for 256.7-1 release
* ce7f3d4b43 Revert "autopkgtest: skip TEST-64-UDEV-STORAGE due to qemu crash"
* 7007e73b22 Mark dependencies on clang and bpftool as :native
* 0e120cf704 Update upstream source from tag 'upstream/256.7'
|\
| * 914aae055c New upstream version 256.7
* fcea89cb00 d/t/upstream: honor /etc/apt configured by autopkgtest
service_enter_running() would re-arm timer for RuntimeMaxSec=,
hence it should be called instead of disabling timer completely
when live mount operation fails, in a similar fashion as
service_enter_reload_by_notify().
Clients should be able to know if the idle logic is available on a
session without secondary knowledge about the session class. Let's hence
expose a property for that.
Similar for the screen lock concept.
Fixes: #34844
Currently, when SD_JSON_FORMAT_OFF is set in verb_call, the json format
flags are set to SD_JSON_FORMAT_PRETTY_AUTO|SD_JSON_FORMAT_COLOR_AUTO,
rather than or'ing those flags in. This means that other flags that may
have been set, e.g. SD_JSON_FORMAT_SEQ when --more is set, will be
clobbered.
Fix this by masking SD_JSON_FORMAT_OFF out, and then or'ing the new
flags in.
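The fix is essentially the difference between assignment and masking, e.g. (sketch):
```
/* Sketch: preserve previously set bits such as SD_JSON_FORMAT_SEQ instead of
 * clobbering them when enabling pretty/colored output. */
#include <systemd/sd-json.h>

static void enable_pretty_output(sd_json_format_flags_t *flags) {
        /* before (clobbers other flags):
         *   *flags = SD_JSON_FORMAT_PRETTY_AUTO | SD_JSON_FORMAT_COLOR_AUTO; */
        *flags &= ~SD_JSON_FORMAT_OFF;
        *flags |= SD_JSON_FORMAT_PRETTY_AUTO | SD_JSON_FORMAT_COLOR_AUTO;
}
```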
CNAME doesn't exist at the zone apex. When we get an unsigned noerror
response to a direct query for a CNAME record, we don't yet know if this
name is zone apex. We already request the correct DS record in this
case, but previously skipped it at validation time, causing the answer
to appear bogus. Make sure to also consider the DS record for the query
name for negative replies.
Move the renaming function to reboot-util.h (since it writes out
/run/nologin at shutdown), and let's get rid of fileio-label.[ch] now
that it serves no purpose anymore.
This brings two benefits: we will label the created file only if it is
actually created, and we can correctly delete any file we create again
on failure.
Given that we have the LabelOps abstraction these days, we can teach
write_string_file() to use it, which means we can get rid of
fileio-label.[ch] as a separate concept.
(The only reason that fileio-label.[ch] exists independently of
fileio.[ch] was that the former potentially linked to libselinux, and
thus had to be in src/shared/ while the other always was in src/basic/.
But the LabelOps vtable provides us with a nice work-around.)
One of the big mistakes of Linux is that when you create a file with
open() and O_CREAT and the file already exists as a dangling symlink,
the symlink will be followed and the file it points to will be created.
This has resulted in many vulnerabilities, and triggered the creation of
the O_NOFOLLOW flag, addressing the problem.
O_NOFOLLOW is less than ideal in many ways, but in particular one: when
actually creating a file it makes sense to set, because it is a problem
to follow final symlinks in that case. But if the file already
exists, it actually does make sense to follow the symlinks. With
openat_report_new() we distinguish these two cases anyway (the whole
function exists only to distinguish the create and the exists-already
case after all), hence let's do something about this: let's simply never
create files "through symlinks".
This can be implemented very easily: just pass O_NOFOLLOW to the 2nd
openat() call, where we actually create files.
And then basically remove 0dd82dab91
again, because we don't need to care anymore, we already will see ELOOP
when we touch a symlink.
Note that this change means that openat_report_new() will thus start to
deviate from plain openat() behaviour in this one small detail: when
actually creating files we will *never* follow the symlink. That should
be a systematic improvement of security.
Fixes: #34088
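The two-step open described above can be sketched like this (simplified; the real helper handles more cases such as races and extra flags):
```
/* Sketch of the openat_report_new() idea: first try to open an existing file
 * (following symlinks as usual); only the second call, which actually creates
 * the file, passes O_NOFOLLOW so that new files are never created "through"
 * a (dangling) symlink. */
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>

static int open_or_create(int dirfd, const char *path, mode_t mode, bool *ret_created) {
        int fd;

        fd = openat(dirfd, path, O_RDWR | O_CLOEXEC);
        if (fd >= 0) {
                *ret_created = false;
                return fd;
        }
        if (errno != ENOENT)
                return -errno;

        fd = openat(dirfd, path, O_RDWR | O_CLOEXEC | O_CREAT | O_EXCL | O_NOFOLLOW, mode);
        if (fd < 0)
                return -errno;   /* a dangling symlink fails here instead of being followed */

        *ret_created = true;
        return fd;
}
```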
For SELinux it is essential that we reset the file creation label both
in the success and in the error path, hence do so.
Moreover, when calling the label post ops do it if possible with the
opened fd of the inode itself, rather than always going via its path,
simply to reduce the attack surface.
If openat_report_new() fails, then 'made_file' will be false, as no file
was created, hence there's no need to skip the unlinkat() explicitly
early, given that we check for 'made_file' anyway in the error path. The
extra error code checks are hence entirely redundant.
We have two distinct implementations of the post hook.
1. For SELinux we just reset the selinux label we told the kernel
earlier to use for new inodes.
2. For SMACK we might apply an xattr to the specified file.
The two calls are quite different: the first call we want to call in all
cases (failure or success), the latter only if we actually managed to
create an inode, in which case it is called on the inode.
We noticed some failures because we have code that connects to user
managers by setting DBUS_SESSION_BUS_ADDRESS without setting XDG_RUNTIME_DIR.
If that's the case, connect to the user session bus instead of the
private manager bus as we can't connect to the latter if XDG_RUNTIME_DIR
is not set.
This reverts the following commits:
- 180cc5421d
"sd-dhcp6-client: allow to request IA_PD on information requesting mode"
- cf7a403e47
"sd-dhcp6-lease: adjust information refresh time with lifetime of IA_PD"
- 1918eda30d
"network/dhcp6: process hostname and IA_PD on information requesting mode"
As per discussion in #34299,
https://github.com/systemd/systemd/issues/34299#issuecomment-2425153221
the offending commits violate RFC 8415 section 18.2.6:
> The client uses an Information-request message to obtain
> configuration information without having addresses and/or delegated
> prefixes assigned to it.
The structure of DNR options is considerably more complicated than most
DHCP options, and as a result the fuzzer has poor coverage of these code
paths.
This adds some DNR packets to the fuzzing corpus, not with the intent of
capturing some specific edge case, but with the intent to rapidly
improve the fuzzers' coverage of these codepaths by giving it a valid
example to begin with.
Also include an ndisc router advert with a few Encrypted DNS options,
for the same purpose.
Implement serialization/deserialization for DNR servers. This re-uses
the string format in place for user configuration of DoT servers, and as
a consequence non-DoT servers are discarded when recording the link
configuration, for correctness.
This also enables sd-resolved to use these servers as it would other DNS
servers.
This option will control the use of DNR for choosing DNS servers on the
link. Defaults to the value of UseDNS so that in most cases they will be
toggled together.
Now that mkosi supports generating UKI profiles, let's make use of
that to generate the UKI profiles required for the test instead of
doing it within the test itself.
Otherwise, when the test is executed on a system with signed PCRs,
cryptenroll will automatically pick up the public key from the UKI
which results in a volume that can't be unlocked because the pcrextend
tests append extra things to PCR 11.
Let's provide a mechanism to select the number of screen columns for
rebreaking comments in Varlink IDL even when not connected to a TTY, by
honouring the $COLUMNS env var then too. Previously we'd only honour it
when connected to a TTY, but it's also useful otherwise for rebreaking
ridiculously long comments, hence honour it in this case too.
Previously, we were using touch(), which usually works fine, because the
path should always refer to an existing directory, in which case it just
updates the timestamp. However, if the dir does not exist yet (which
shouldn't happen), it would be created as regular file, which is just
wrong.
Hence, let's instead create the dir as dir if it is missing, and then
update its timestamp.
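Roughly, the change goes from a bare touch() to something like this (sketch with an invented helper name):
```
/* Sketch: instead of touch()ing the path (which would create a regular file
 * if the directory is unexpectedly missing), create the directory if needed
 * and then just bump its timestamps. */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

static int mkdir_and_bump_timestamp(const char *path, mode_t mode) {
        if (mkdir(path, mode) < 0 && errno != EEXIST)
                return -errno;

        /* NULL times == set both atime and mtime to the current time. */
        if (utimensat(AT_FDCWD, path, NULL, 0) < 0)
                return -errno;

        return 0;
}
```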
Same change as https://github.com/systemd/systemd/pull/34583 but for
systemd-measure. Otherwise we end up with PCR policy digest mismatches
as systemd-stub will measure the full virtual size of the kernel image
after it has been loaded while systemd-measure will disregard the extra
size introduced by SizeOfImage.
While ideally the stub would only measure the data that's actually on
disk and not the uninitialized data introduced by VirtualSize > SizeOfRawData,
we want newer systemd-measure to work with older stubs so we have to fix
systemd-measure and can't fix this in the stub.
Previously a full packet was cached only if the CD bit was set, but this
no longer corresponds to the cases where bypass is enabled.
Update the cache to retain a full packet in the cases where it might
actually be useful.
This is useful for a validating resolver to indicate to a non-validating
resolver when checking was disabled for the query. This matches the
behavior of the major public resolvers in response to queries with CD but
not DO set.
Following 13e15dae9f, resolved does not forward the AD bit for bypass
queries, but resolved also didn't do its own validation, making these
replies appear to never be authentic. We should enable validation for
bypass queries.
Let's disable our own validation when processing a +cd query, and also
ensure that it skips the cache so that we don't accidentally fail to
return inauthentic replies from upstream.
Previously, when we had a bypass transaction without cd, a cached,
authenticated, reply with cd could be served, leaving the cd bit
erroneously set in the reply. Only reply with a CD bit if the client
requested it.
Fixes: 13e15dae9f (resolved: clear the AD bit for bypass packets)
Optional features allow distros to define sets of transfers that can
be enabled or disabled by the system administrator. This is useful for
situations where a distro may want to ship some resources version-locked
to the core OS, but many people have no need for the resource, such as:
development tools/compilers, drivers for specialized hardware, language
packs, etc.
We also rename sysupdate.d/*.conf -> sysupdate.d/*.transfer, because
now there is more than one type of definition in sysupdate.d/. For
backwards compat, we still load *.conf files as long as no *.transfer
files are found and the *.conf files don't try to declare themselves
as part of any features
Fixes https://github.com/systemd/systemd/issues/33343
Fixes https://github.com/systemd/systemd/issues/33344
This option is another way for DHCP servers to indicate preferred DNS
servers for the network, but includes more detailed info like the server
name, transport (DoT/DoH/DoQ etc.), and port.
Allow our DHCPv4 client to parse this option.
This type will be used to represent a "designated resolver", and the
necessary info for communicating with it. Beyond an address endpoint,
we may need to know the dns transport, authenticated domain name, DoH
path, etc.
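Conceptually, the new type carries fields along these lines (a sketch with illustrative names, not the actual struct definition):
```
/* Sketch of the information a "designated resolver" (DNR) record needs to
 * carry beyond a plain address: transport, authentication domain name (ADN),
 * port, and an optional DoH path. Field names are illustrative. */
#include <netinet/in.h>
#include <stdint.h>

typedef enum DNSTransport {
        DNS_TRANSPORT_DO53,
        DNS_TRANSPORT_DOT,
        DNS_TRANSPORT_DOH,
        DNS_TRANSPORT_DOQ,
} DNSTransport;

typedef struct DesignatedResolver {
        uint16_t priority;           /* lower value = preferred */
        char *auth_name;             /* authentication domain name (ADN) */
        int family;                  /* AF_INET or AF_INET6 */
        union {
                struct in_addr in;
                struct in6_addr in6;
        } address;
        uint16_t port;               /* 0 = transport default */
        DNSTransport transport;
        char *doh_path;              /* only meaningful for DoH */
} DesignatedResolver;
```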
The change was supposed to be about respecting inhibitors, but
it was extended to also error out when there are active user
sessions, which was not intentional. Previously systemctl skipped
all checks if the caller was root or root-equivalent. Restore the
previous behaviour and again avoid blocking systemctl reboot by root
if there are active sessions, as long as there are no active
inhibitors.
Fixes https://github.com/systemd/systemd/issues/34086
Follow-up for 804874d26a
The InhibitDelayMaxSec= setting in [logind.conf(5)](http://www.freedesktop.org/software/systemd/man/logind.conf.html) controls the timeout for this. This is intended to be used by applications which need a synchronous way to execute actions before system suspend but shall not be allowed to block suspend indefinitely.
This mode is only available for _sleep_ and _shutdown_ locks.
3. _block-weak_ that works as its non-weak counterpart, but that in addition may be ignored
automatically and silently under certain circumstances, unlike the former which is always respected.
Inhibitor locks are taken via the Inhibit() D-Bus call on the logind Manager object:
systemd 12 and newer support lightweight password agents which can be used to
query the user for system-level passwords or passphrases. These are
passphrases that are not related to a specific user, but to some kind of
hardware or service. This is used for encrypted hard-disk passphrases or to
query passphrases of SSL certificates at web server start-up time. The basic
idea is that a system component requesting a password entry can simply drop a
simple .ini-style file into `/run/systemd/ask-password/` which multiple
different agents may watch via `inotify()`, and query the user as necessary.
The answer is then sent back to the querier via an `AF_UNIX`/`SOCK_DGRAM`
socket. Multiple agents might be running at the same time in which case they
all should query the user and the agent which answers first wins. Right now
systemd ships with the following passphrase agents:
* A Plymouth agent used for querying passwords during boot-up
* A console agent used in similar situations if Plymouth is not available
* A GNOME agent which can be run as part of the normal user session which pops up a notification message and icon which when clicked receives the passphrase from the user.
This is useful and necessary in case an encrypted system hard-disk is plugged in when the machine is already up.
* A [`wall(1)`](https://man7.org/linux/man-pages/man1/wall.1.html) agent which sends wall messages as soon as a password shall be entered.
* A simple tty agent which is built into "`systemctl start`" (and similar commands) and asks passwords to the user during manual startup of a service
* A simple tty agent which can be run manually to respond to all queued passwords
## Implementing Agents
It is easy to write additional agents. The basic algorithm to follow looks like this:
* Create an inotify watch on `/run/systemd/ask-password/`, watch for `IN_CLOSE_WRITE|IN_MOVED_TO`
* Ignore all events on files in that directory that do not start with "`ask.`"
* As soon as a file named "`ask.xxxx`" shows up, read it. It's a simple `.ini` file that may be parsed with the usual parsers. The `xxxx` suffix is randomized.
* Make sure to ignore unknown `.ini` file keys in those files, so that we can easily extend the format later on.
* Make sure to hide a password query dialog as soon as a) the `ask.xxxx` file is deleted, watch this with inotify. b) the `NotAfter=` time elapses, if it is set `!= 0`.
* Access to the socket is restricted to privileged users.
To acquire the necessary privileges to send the answer back, consider using PolicyKit.
For convenience, a reference implementation is provided: "`/usr/bin/pkexec /usr/lib/systemd/systemd-reply-password 1 /path/to/socket`" or "`/usr/bin/pkexec /usr/lib/systemd/systemd-reply-password 0 /path/to/socket`" and writing the password to its standard input.
Use '`1`' as argument if a password was entered by the user, or '`0`' if the user canceled the request.
* If you do not want to use PK ensure to acquire the necessary privileges in some other way and send a single datagram
to the socket consisting of the password string either prefixed with "`+`" or with "`-`" depending on whether the password entry was successful or not.
You may but don't have to include a final `NUL` byte in your message.
Again, it is essential that you stop showing the password
box/notification/status icon if the `ask.xxxx` file is removed or when
`NotAfter=` elapses (if it is set `!= 0`)!
It may happen that multiple password entries are pending at the same time.
Your agent needs to be able to deal with that. Depending on your environment
you may either choose to show all outstanding passwords at the same time or
instead only one and as soon as the user has replied to that one go on to the
next one.
If you write a system level agent, a smart way to activate it is using systemd
`.path` units. This will ensure that systemd will watch the
`/run/systemd/ask-password/` directory and spawn the agent as soon as that
directory becomes non-empty. In fact, the console, wall and Plymouth agents
are started like this. If systemd is used to maintain user sessions as well
you can use a similar scheme to automatically spawn your user password agent as
well.
## Implementing Queriers
It's also easy to implement applications that want to query passwords this way
(i.e. client for the agents above). Simply bind an `AF_UNIX`/`SOCK_DGRAM`
socket somewhere (suggestion: you can do this in `/run/systemd/ask-password/`
under a randomized socket name, not beginning with `ask.`). Then, create an
`/run/systemd/ask-password/ask.xxxx` (replace the `xxxx` by some randomized
string) file, with the appropriate `Message=`, `PID=`, `Icon=`, `Echo=`,
`NotAfter=` fields in the `[Ask]` section. Most importantly, include `Socket=`
pointing to your socket entrypoint. Then, just wait until the password is
delivered to you on the socket. Finally, don't forget to remove the file and
the socket once done.
## Testing
You may test agents by manually invoking the "`systemd-ask-password`" tool from
a shell. Pass `--no-tty` to ensure the password is asked via the agent system.
You may test queriers by manually invoking the
"`systemd-tty-ask-password-agent`" from a shell.
## Unprivileged Per-User Password Agents
Starting with systemd v257 the scheme is extended to per-user password
agents. A second per-user directory `$XDG_RUNTIME_DIR/systemd/ask-password/` is
now available, with the same protocol as the system-wide
counterpart. Unprivileged, per-directory agents should watch this directory in
parallel to the system-wide one. Unprivileged queriers (i.e. clients to these
agents) should pick the per-user directory to place their password request
7. Update version number in `meson.version` (e.g. from `256~devel` to `256~rc1` or from `256~rc3` to `256`). Note that this uses a tilde (\~) instead of a hyphen (-) because tildes sort lower in version comparisons according to the [version format specification](https://uapi-group.org/specifications/specs/version_format_specification/), and we want `255~rc1` to sort lower than `255`.
8. Check dbus docs with `ninja -C build update-dbus-docs`
9. Check manpages list with `ninja -C build update-man-rules`
10. Update translation strings (`ninja -C build systemd-pot`, `ninja -C build systemd-update-po`) - drop the header comments from `systemd.pot` + re-add SPDX before committing. If the only change in a file is the 'POT-Creation-Date' field, then ignore that file.
11. Tag the release: `version="v$(sed 's/~/-/g' meson.version)" && git tag -s "${version}" -m "systemd ${version}"` (tildes are replaced with hyphens, because git doesn't accept the former).
12. Do `ninja -C build`
13. Make sure that the version string and package string match: `build/systemctl --version`
14. [FINAL] Close the github milestone and open a new one (https://github.com/systemd/systemd/milestones)
15. "Draft" a new release on github (https://github.com/systemd/systemd/releases/new), mark "This is a pre-release" if appropriate.
16. Check that announcement to systemd-devel, with a copy&paste from NEWS, was sent. This should happen automatically.
17. Update IRC topic (`/msg chanserv TOPIC #systemd Version NNN released | Online resources https://systemd.io/`)
19. [FINAL] Build and upload the documentation (on the -stable branch): `ninja -C build doc-sync`
20. [FINAL] Change the Github Pages branch to the newly created branch (https://github.com/systemd/systemd/settings/pages) and set the 'Custom domain' to 'systemd.io'
21. [FINAL] Update version number in `meson.version` to the devel version of the next release (e.g. from `256` to `257~devel`)
<member><ulink url="https://lists.freedesktop.org/archives/systemd-devel/2011-May/002526.html">More information on how the system clock and RTC interact</ulink></member>