@ -2,20 +2,21 @@ systemd System and Service Manager
CHANGES WITH 247 in spe:
* KERNEL API INCOMPATIBILTY: Linux 4.12 introduced two new uevents
* KERNEL API INCOMPATIBILI TY: Linux 4.12 introduced two new uevents
"bind" and "unbind" to the Linux device model. When this kernel
change was made, systemd-udevd was only minimally updated to handle
and propagate these new event types. The introduction of these new
uevents (which are typically generated for USB devices and devices
needing a firmware upload before being functional) resulted in a
number of software issues, we so far didn't address (mostly because
there was hope the kernel maintainers would themeselves address these
issues in some form – which did not happen). To handle them properly,
many (if not most) udev rules files shipped in various packages need
updating, and so do many programs that monitor or enumerate devices
with libudev or sd-device, or otherwise process uevents. Please note
that this incompatibility is not fault of systemd or udev, but caused
by an incompatible kernel change that happened back in Linux 4.12.
number of issues which we so far didn't address. We hoped the kernel
maintainers would themselves address these issues in some form, but
that did not happen. To handle them properly, many (if not most) udev
rules files shipped in various packages need updating, and so do many
programs that monitor or enumerate devices with libudev or sd-device,
or otherwise process uevents. Please note that this incompatibility
is not fault of systemd or udev, but caused by an incompatible kernel
change that happened back in Linux 4.12, but is becoming more and
more visible as the new uvents are generated by more kernel drivers.
To minimize issues resulting from this kernel change (but not avoid
them entirely) starting with systemd-udevd 247 the udev "tags"
@ -40,8 +41,8 @@ CHANGES WITH 247 in spe:
device. To accommodate for this a new automatic property CURRENT_TAGS
has been added that works similar to the existing TAGS property but
only lists tags set by the most recent uevent/database
update. Similar, the libudev/sd-device API has been updated with new
functions to enumerate these 'current' tags, in addition to the
update. Similarly , the libudev/sd-device API has been updated with
new functions to enumerate these 'current' tags, in addition to the
existing APIs that now enumerate the 'sticky' ones.
To properly handle "bind"/"unbind" on Linux 4.12 and newer it is
@ -53,12 +54,12 @@ CHANGES WITH 247 in spe:
ACTION=="remove",GOTO="xyz_end" instead, so that the
properties/tags they add are also applied whenever "bind" (or
"unbind") is seen. (This is most important for all physical device
types — as that's for which "bind" and "unbind" are currently
usually generated, for all other device types this change is still
types — those for which "bind" and "unbind" are currently
generated, for all other device types this change is still
recommended but not as important — but certainly prepares for
future kernel uevent type additions).
• Similar, all code monitoring devices that contains an 'if' branch
• Similarly , all code monitoring devices that contains an 'if' branch
discerning the "add" + "change" uevent actions from all other
uevents actions (i.e. considering devices only relevant after "add"
or "change", and irrelevant on all other events) should be reworked
@ -74,8 +75,8 @@ CHANGES WITH 247 in spe:
• Any code that uses device tags for deciding whether a device is
relevant or not most likely needs to be updated to use the new
udev_device_has_current_tag() API (or sd_device_has_current_tag()
in case sd-device is used), to check whether the tag is set
at the moment an uevent is seen (as opposed to the existing
in case sd-device is used), to check whether the tag is set at the
moment an uevent is seen (as opposed to the existing
udev_device_has_tag() API which checks if the tag ever existed on
the device, following the API concept redefinition explained
above).
@ -85,6 +86,12 @@ CHANGES WITH 247 in spe:
this is not caused by systemd/udev changes, but result of a kernel
behaviour change.
* The MountAPIVFS= service file setting now defaults to on if
RootImage= and RootDirectory= are used, which means that with those
two settings /proc/, /sys/ and /dev/ are automatically properly set
up for services. Previous behaviour may be restored by explicitly
setting MountAPIVFS=off.
* Since PAM 1.2.0 (2015) configuration snippets may be placed in
/usr/lib/pam.d/ in addition to /etc/pam.d/. If a file exists in the
latter it takes precedence over the former, similar to how most of
@ -97,9 +104,374 @@ CHANGES WITH 247 in spe:
packages' vendor versions of their PAM stack definitions from
/etc/pam.d/ to /usr/lib/pam.d/, but if such OS-wide migration is not
desired the location to which systemd installs its PAM stack
configuration file may be changed via the "pamconfdir" meson variable
at build time, optionally undoing this change of default paths
introduced with systemd 247.
configuration may be changed via the -Dpamconfdir Meson option.
* The runtime dependencies on libqrencode, libpcre2, libpwquality and
libcryptsetup have been changed to be based on dlopen(): instead of
regular dynamic library dependencies declared in the binary ELF
headers, these libraries are now loaded on demand only, if they are
available. If the libraries cannot be found the relevant operations
will fail gracefully, or a suitable fallback logic is chosen. This is
supposed to be useful for general purpose distributions, as it allows
minimizing the list of dependencies the systemd packages pull in,
permitting building of more minimal OS images, while still making use
of these "weak" dependencies should they be installed. Since many
package managers automatically synthesize package dependencies from
ELF shared library dependencies, some additional manual packaging
work has to be done now to replace those (slightly downgraded from
"required" to "recommended" or whatever is conceptually suitable for
the package manager). Note that this change does not alter build-time
behaviour: as before the build-time dependencies have to be installed
during build, even if they now are optional during runtime.
* sd-event.h gained a new call sd_event_add_time_relative() for
installing timers relative to the current time. This is mostly a
convenience wrapper around the pre-existing sd_event_add_time() call
which installs absolute timers.
* A new per-unit setting RootImageOptions= has been added which allows
tweaking the mount options for any file system mounted as effect of
the RootImage= setting.
* Another new per-unit setting MountImages= has been added, that allows
mounting additional disk images into the file system tree accessible
to the service.
* systemd-repart now generates JSON output when requested with the new
--json= switch.
* systemd-machined's OpenMachineShell() bus call will now pass
additional policy metadata data fields to the PolicyKit
authentication request.
* systemd-tmpfiles gained a new -E switch, which is equivalent to
--exclude-prefix=/dev --exclude-prefix=/proc --exclude=/run
--exclude=/sys. It's particularly useful in combination with --root=,
when operating on OS trees that do not have any of these four runtime
directories mounted, as this means no files below these subtrees are
created or modified, since those mount points should probably remain
empty.
* systemd-tmpfiles gained a new --image= switch which is like --root=,
but takes a disk image instead of a directory as argument. The
specified disk image is mounted inside a temporary mount namespace
and the tmpfiles.d/ drop-ins stored in the image are executed and
applied to the image. systemd-sysusers similarly gained a new
--image= switch, that allows the sysusers.d/ drop-ins stored in the
image to be applied onto the image.
* Similarly, the journalctl command also gained an --image= switch,
which is a quick one-step solution to look at the log data included
in OS disk images.
* journalctl's --output=cat option (which outputs the log content
without any metadata, just the pure text messages) will now make use
of terminal colors when run on a suitable terminal, similarly to the
other output modes.
* JSON group records now support a "description" string that may be
used to add a human-readable textual description to such groups. This
is supposed to match the user's GECOS field which traditionally
didn't have a counterpart for group records.
* The "systemd-dissect" tool that may be used to inspect OS disk images
and that was previously installed to /usr/lib/systemd/ has now been
moved to /usr/bin/, reflecting its updated status of an officially
supported tool with a stable interface. It gained support for a new
--mkdir switch which when combined with --mount has the effect of
creating the directory to mount the image to if it is missing
first. It also gained two new commands --copy-from and --copy-to for
copying files and directories in and out of an OS image without the
need to manually mount it. It also acquired support for a new option
--json= to generate JSON output when inspecting an OS image.
* The cgroup2 file system is now mounted with the
"memory_recursiveprot" mount option, supported since kernel 5.7. This
means that the MemoryLow= and MemoryMin= unit file settings now apply
recursively to whole subtrees.
* systemd-homed now defaults to using the btrfs file system — if
available — when creating home directories in LUKS volumes. This may
be changed with the DefaultFileSystemType= setting in homed.conf.
It's now the default file system in various major distributions and
has the major benefit for homed that it can be grown and shrunk while
mounted, unlike the other contenders ext4 and xfs, which can both be
grown online, but not shrunk (in fact xfs is the technically most
limited option here, as it cannot be shrunk at all).
* JSON user records managed by systemd-homed gained support for
"recovery keys". These are basically secondary passphrases that can
unlock user accounts/home directories. They are computer-generated
rather than user-chosen, and typically have greater entropy.
homectl's --recovery-key= option may be used to add a recovery key to
a user account. The generated recovery key is displayed as a QR code,
so that it can be scanned to be kept in a safe place. This feature is
particularly useful in combination with systemd-homed's support for
FIDO2 or PKCS#11 authentication, as a secure fallback in case the
security tokens are lost. Recovery keys may be entered wherever the
system asks for a password.
* systemd-homed now maintains a "dirty" flag for each LUKS encrypted
home directory which indicates that a home directory has not been
deactivated cleanly when offline. This flag is useful to identify
home directories for which the offline discard logic did not run when
offlining, and where it would be a good idea to log in again to catch
up.
* systemctl gained a new parameter --timestamp= which may be used to
change the style in which timestamps are output, i.e. whether to show
them in local timezone or UTC, or whether to show µs granularity.
* Alibaba's "pouch" container manager is now detected by
systemd-detect-virt, ConditionVirtualization= and similar constructs.
* systemd-nspawn has been reworked to use the /run/host/incoming/ as
place to use for propagating external mounts into the
container. Similarly /run/host/notify is now used as the socket path
for container payloads to communicate with the container manager
using sd_notify(). The container manager now uses the
/run/host/inaccessible/ directory to place "inaccessible" file nodes
of all relevant types which may be used by the container payload as
bind mount source to over-mount inodes to make them inaccessible.
/run/host/container-manager will now be initialized with the same
string as the $container environment variable passed to the
container's PID 1. /run/host/container-uuid will be initialized with
the same string as $container_uuid. This means the /run/host/
hierarchy is now the primary way to make host resources available to
the container. The Container Interface documents these new files and
directories:
https://systemd.io/CONTAINER_INTERFACE
* Support for the "ConditionNull=" unit file condition has been
deprecated and undocumented for 6 years. systemd started to warn
about its use 1.5 years ago. It has now been removed entirely.
* If the $SYSTEMD_LOG_SECCOMP=1 environment variable is set for
systemd-nspawn all system call filter violations will be logged by
the kernel (audit). This is useful for tracking down system calls
invoked by container payloads that are prohibited by the container's
system call filter policy.
* sd-bus.h gained a new API call sd_bus_error_has_names(), which takes
a sd_bus_error struct and a list of error names, and checks if the
error matches one of these names. It's a convenience wrapper that is
useful in cases where multiple errors shall be handled the same way.
* A new system call filter list "@known" has been added, that contains
all system calls known at the time systemd was built.
* Behaviour of system call filter allow lists has changed slightly:
system calls that are contained in @known will result in a EPERM by
default, while those not contained in it result in ENOSYS. This
should improve compatibility because known syscalls will thus be
communicated as prohibited, while unknown (and thus newer ones) will
be communicated as not implemented, which hopefully has the greatest
chance of triggering the right fallback code paths in client
applications.
* Two new unit file settings ProtectProc= and ProcSubset= have been
added that expose the hidepid= and subset= mount options of procfs.
All processes of the unit will only see processes in /proc that are
are owned by the unit's user. This is an important new sandboxing
option that is recommended to be set on all system services. All
long-running system services that are included in systemd itself set
this option now. This option is only supported on kernel 5.8 and
above, since the hidepid= option supported on older kernels was not a
per-mount option but actually applied to the whole PID namespace.
* Socket units gained a new boolean setting FlushPending=. If enabled
all pending socket data/connections are flushed whenever the socket
unit enters the "listening" state, i.e. after the associated service
exited.
* The unit file setting NUMAMask= gained a new "all" value: when used,
all existing NUMA nodes are added to the NUMA mask.
* A new "credentials" logic has been added to system services. This is
a simple mechanism to pass privileged data to services in a safe and
secure way. It's supposed to be used to pass per-service secret data
such as passwords or cryptographic keys but also associated less
private information such as user names, certificates, and similar to
system services. Each credential is identified by a short user-chosen
name and may contain arbitrary binary data. Two new unit file
settings have been added: SetCredential= and LoadCredential=. The
former allows setting a credential to a literal string, the latter
sets a credential to the contents of a file (or data read from a
user-chosen AF_UNIX stream socket). Credentials are passed to the
service via a special credentials directory, one file for each
credential. The path to the credentials directory is passed in a new
$CREDENTIALS_DIRECTORY environment variable. Since the credentials
are passed in the file system they may be easily referenced in
ExecStart= command lines too, thus no explicit support for the
credentials logic in daemons is required (though ideally daemons
would look for the bits they need in $CREDENTIALS_DIRECTORY
themselves automatically, if set). The $CREDENTIALS_DIRECTORY is
backed by unswappable memory if privileges allow it, immutable if
privileges allow it, is accessible only to the service's UID, and is
automatically destroyed when the service stops.
* systemd-nspawn supports the same credentials logic. It can both
consume credentials passed to it via the aforementioned
$CREDENTIALS_DIRECTORY protocol as well as pass these credentials on
to its payload. The service manager/PID 1 has been updated to match
this: it can also accept credentials from the container manager that
invokes it (in fact: any process that invokes it), and passes them on
to its services. Thus, credentials can be propagated recursively down
the tree: from a system's service manager to a systemd-nspawn
service, to the service manager that runs as container payload and to
the service it runs below. Credentials may also be added on the
systemd-nspawn command line, using new --set-credential= and
--load-credential= command line switches that match the
aforementioned service settings.
* systemd-repart gained new settings Format=, Encrypt=, CopyFiles= in
the partition drop-ins which may be used to format/LUKS
encrypt/populate any created partitions. The partitions are
encrypted/formatted/populated before they are registered in the
partition table, so that they appear atomically: either the
partitions do not exist yet or they exist fully encrypted, formatted,
and populated — there is no time window where they are
"half-initialized". Thus the system is robust to abrupt shutdown: if
the tool is terminated half-way during its operations on next boot it
will start from the beginning.
* systemd-repart's --size= operation gained a new "auto" value. If
specified, and operating on a loopback file it is automatically sized
to the minimal size the size constraints permit. This is useful to
use "systemd-repart" as an image builder for minimally sized images.
* systemd-resolved now gained a third IPC interface for requesting name
resolution: besides D-Bus and local DNS to 127.0.0.53 a Varlink
interface is now supported. The nss-resolve NSS module has been
modified to use this new interface instead of D-Bus. Using Varlink
has a major benefit over D-Bus: it works without a broker service,
and thus already during earliest boot, before the dbus daemon has
been started. This means name resolution via systemd-resolved now
works at the same time systemd-networkd operates: from earliest boot
on, including in the initrd.
* systemd-resolved gained support for a new DNSStubListenerExtra=
configuration file setting which may be used to specify additional IP
addresses the built-in DNS stub shall listen on, in addition to the
main one on 127.0.0.53:53.
* Name lookups issued via systemd-resolved's D-Bus and Varlink
interfaces (and thus also via glibc NSS if nss-resolve is used) will
now honour a trailing dot in the hostname: if specified the search
path logic is turned off. Thus "resolvectl query foo." is now
equivalent to "resolvectl query --search=off foo.".
* systemd-resolved gained a new D-Bus property "ResolvConfMode" that
exposes how /etc/resolv.conf is currently managed: by resolved (and
in which mode if so) or another subsystem. "resolvctl" will display
this property in its status output.
* The resolv.conf snippets systemd-resolved provides will now set "."
as the search domain if no other search domain is known. This turns
off the derivation of an implicit search domain by nss-dns for the
hostname, when the hostname is set to an FQDN. This change is done to
make nss-dns using resolv.conf provided by systemd-resolved behave
more similarly to nss-resolve.
* systemd-tmpfiles' file "aging" logic (i.e. the automatic clean-up of
/tmp/ and /var/tmp/ based on file timestamps) now looks at the
"birth" time (btime) of a file in addition to the atime, mtime, and
ctime.
* systemd-analyze gained a new verb "capability" that lists all known
capabilities by the systemd build and by the kernel.
* If a file /usr/lib/clock-epoch exists, PID 1 will read its mtime and
advance the system clock to it at boot if it is noticed to be before
that time. Previously, PID 1 would only advance the time to an epoch
time that is set during build-time. With this new file OS builders
can change this epoch timestamp on individual OS images without
having to rebuild systemd.
* systemd-logind will now listen to the KEY_RESTART key from the Linux
input layer and reboot the system if it is pressed, similarly to how
it already handles KEY_POWER, KEY_SUSPEND or KEY_SLEEP. KEY_RESTART
was originally defined in the Multimedia context (to restart playback
of a song or film), but is now primarily used in various embedded
devices for "Reboot" buttons. Accordingly, systemd-logind will now
honour it as such. This may configured in more detail via the new
HandleRebootKey= and RebootKeyIgnoreInhibited=.
* systemd-nspawn/systemd-machined will now reconstruct hardlinks when
copying OS trees, for example in "systemd-nspawn --ephemeral",
"systemd-nspawn --template=", "machinectl clone" and similar. This is
useful when operating with OSTree images, which use hardlinks heavily
throughout, and where such copies previously resulting in "exploding"
hardlinks.
* systemd-nspawn's --console= setting gained support for a new
"autopipe" value, which is identical to "interactive" when invoked on
a TTY, and "pipe" otherwise.
* systemd-networkd's .network files gained support for explicitly
configuring the multicast membership entries of bridge devices in the
[BridgeMDB] section. It also gained support for the PIE queuing
discipline in the [FlowQueuePIE] sections.
* systemd-networkd's .netdev files may now be used to create "BareUDP"
tunnels, configured in the new [BareUDP] setting. VXLAN tunnels may
now be marked to be independent of any underlying network interface
via the new Independent= boolean setting.
* systemctl gained support for two new verbs: "service-log-level" and
"service-log-target" may be used on services that implement the
generic org.freedesktop.LogControl1 D-Bus interface to dynamically
adjust the log level and target. All of systemd's long-running
services support this now, but ideally all system services would
implement this interface to make the system more uniformly
debuggable.
* The SystemCallErrorNumber= unit file setting now accepts the new
"kill" and "log" actions, in addition to arbitrary error number
specifications as before. If "kill" the the processes are killed on
the event, if "log" the offending syscall is audit logged.
* A new SystemCallLog= unit file setting has been added that accepts a
list of syscalls that shall be logged about (audit).
* The OS image dissection logic (as used by RootImage= in unit files or
systemd-nspawn's --image= switch) has gained support for identifying
and mounting explicit /usr/ partitions, which are now defined in the
discoverable partition specification. This should be useful for
environments where the root file system is
generated/formatted/populated dynamically on first boot and combined
with an immutable /usr/ tree that is supplied by the vendor.
* In the final phase of shutdown, within the systemd-shutdown binary
we'll now try to detach MD devices (i.e software RAID) in addition to
loopback block devices and DM devices as before. This is supposed to
be a safety net only, in order to increase robustness if things go
wrong. Storage subsystems are expected to properly detach their
storage volumes during regular shutdown already (or in case of
storage backing the root file system: in the initrd hook we return to
later).
* If the SYSTEMD_LOG_TID environment variable is set all systemd tools
will now log the thread ID in their log output. This is useful when
working with heavily threaded programs.
* If the SYSTEMD_RDRAND enviroment variable is set to "0", systemd will
not use the RDRAND CPU instruction. This is useful in environments
such as replay debuggers where non-deterministic behaviour is not
desirable.
* When building systemd the Meson option
-Dcompat-mutable-uid-boundaries may now be specified. If enabled,
systemd reads the system UID boundaries from /etc/login.defs, instead
of using the built-in values selected during build-time. This is an
option to improve compatibility for upgrades from old systems. It's
strongly recommended not to make use of this functionality on new
systems (or even enable it during build), as it makes something
runtime-configurable that is mostly an implementation detail of the
OS, and permits avoidable differences in deployments that create all
kinds of problems in the long run.
CHANGES WITH 246: