mirror of https://github.com/systemd/systemd synced 2026-04-26 17:04:50 +02:00

Compare commits


No commits in common. "6ef00eb846a89558ad46d2937addd8ea952b7062" and "a34ecd1c37b1d93062b08cbb5b0091ca047b8b5e" have entirely different histories.

7 changed files with 59 additions and 189 deletions

91
NEWS

@ -4,7 +4,7 @@ CHANGES WITH 251:
Backwards-incompatible changes: Backwards-incompatible changes:
* The minimum kernel version required has been bumped from 3.13 to 4.15, * The minimum kernel version required has been bumped from 3.13 to 3.15,
and CLOCK_BOOTTIME is now assumed to always exist. and CLOCK_BOOTTIME is now assumed to always exist.
* C11 with GNU extensions (aka "gnu11") is now used to build our * C11 with GNU extensions (aka "gnu11") is now used to build our
@ -204,19 +204,6 @@ CHANGES WITH 251:
similar to sd_id128_to_string() but formats the ID in RFC 4122 UUID similar to sd_id128_to_string() but formats the ID in RFC 4122 UUID
format instead of simple series of hex characters. format instead of simple series of hex characters.
* The sd-device API gained two new calls sd_device_new_from_devname()
and sd_device_new_from_path() which permit allocating an sd_device
object from a device node name or file system path.
* sd-device also gained a new call sd_device_open() which will open the
device node associated with a device for which an sd_device object
has been allocated. The call is supposed to address races around
device nodes being removed/recycled due to hotplug events, or media
change events: the call checks internally whether the major/minor of
the device node and the "diskseq" (in case of block devices) match
with the metadata loaded in the sd_device object, thus ensuring that
the device once opened really matches the provided sd_device object.
Changes in PID1, systemctl, and systemd-oomd: Changes in PID1, systemctl, and systemd-oomd:
* A new set of service monitor environment variables will be passed to * A new set of service monitor environment variables will be passed to
@ -293,32 +280,6 @@ CHANGES WITH 251:
necessary to fix this aspect. Absolute links are interpreted as necessary to fix this aspect. Absolute links are interpreted as
before, and it is still possible to create them via other means. before, and it is still possible to create them via other means.
* A new "taint" flag named "old-kernel" is introduced which is set when
the kernel systemd runs on is older than the current baseline version
(see above). The flag is shown in "systemctl status" output.
* Two additional taint flags "short-uid-range" and "short-gid-range"
have been added as well, which are set when systemd notices it is run
within a userns namespace that does not define the full 0…65535 UID
range.
* A new "unmerged-usr" taint flag has been added that is set whenever
running on systems where /bin/ + /sbin/ are *not* symlinks to their
counterparts in /usr/, i.e. on systems where the /usr/-merge has not been
completed.
* Generators invoked by PID 1 will now have a couple of useful
environment variables set describing the execution context a
bit. $SYSTEMD_SCOPE encodes whether the generator is called from the
system service manager, or from the per-user service
manager. $SYSTEMD_IN_INITRD encodes whether the generator is invoked
in initrd context or on the host. $SYSTEMD_FIRST_BOOT encodes whether
systemd considers the current boot to be a "first"
boot. $SYSTEMD_VIRTUALIZATION encodes whether virtualization is
detected and which type of hypervisor/container
manager. $SYSTEMD_ARCHITECTURE indicates which architecture the
kernel is built for.
Changes in systemd-journald: Changes in systemd-journald:
* The journal JSON export format has been added to the list of stable * The journal JSON export format has been added to the list of stable
@ -350,32 +311,6 @@ CHANGES WITH 251:
already-initialized devices, and only devices which haven't been already-initialized devices, and only devices which haven't been
initialized yet, respectively. initialized yet, respectively.
* udevadm gained a new "wait" command for safely waiting for a specific
device to show up in the udev device database. This is useful in
scripts that asynchronously allocate a block device (e.g. through
repartitioning, or allocating a loopback device or similar) and need
to synchronize on the creation to complete.
* udevadm gained a new "lock" command for locking one or more block
devices while formatting them or writing a partition table to them. It is
an implementation of https://systemd.io/BLOCK_DEVICE_LOCKING and
usable in scripts dealing with block devices.
* udevadm info will show a couple of additional device fields in its
output, and will now apply a limited set of coloring to line types.
* udevadm info --tree will now show a tree of objects (i.e. devices and
suchlike) in the /sys/ hierarchy.
* Block devices will now get a new set of device symlinks in
/dev/disk/by-diskseq/<nr>, which may be used to reference block
device nodes via the kernel's "diskseq" value. Note that opening a
device via such a symlink does not guarantee that the opened device
actually matches the specified diskseq value. To be safe against
races, the actual diskseq value of the opened device (BLKGETDISKSEQ
ioctl()) must still be compared with the one in the symlink path.
* .link files gained support for setting MDI/MDI-X on a link. * .link files gained support for setting MDI/MDI-X on a link.
* .link files gained support for [Match] Firmware= setting to match on * .link files gained support for [Match] Firmware= setting to match on
@ -442,10 +377,6 @@ CHANGES WITH 251:
used, to ensure that communication between CPU and discrete TPM chips used, to ensure that communication between CPU and discrete TPM chips
cannot be eavesdropped to acquire disk encryption keys. cannot be eavesdropped to acquire disk encryption keys.
* A new switch --fido2-credential-algorithm= has been added to
systemd-cryptenroll allowing selection of the credential algorithm to
use when binding encryption to FIDO2 tokens.
Changes in systemd-hostnamed: Changes in systemd-hostnamed:
* HARDWARE_VENDOR= and HARDWARE_MODEL= can be set in /etc/machine-info * HARDWARE_VENDOR= and HARDWARE_MODEL= can be set in /etc/machine-info
@ -456,9 +387,7 @@ CHANGES WITH 251:
hostnamed. hostnamed.
* hostnamed's D-Bus interface gained a new method GetHardwareSerial() * hostnamed's D-Bus interface gained a new method GetHardwareSerial()
for reading the hardware serial number, as reported by DMI. It also for reading the hardware serial number, as reported by DMI.
exposes a new D-Bus property FirmwareVersion that encodes the
firmware version of the system.
Changes in other components: Changes in other components:
@ -475,22 +404,6 @@ CHANGES WITH 251:
used to set the default shell for user records and nspawn shell used to set the default shell for user records and nspawn shell
invocations (instead of the default /bin/bash). invocations (instead of the default /bin/bash).
* systemd-timesyncd now provides a D-Bus API for receiving NTP server
information dynamically at runtime via IPC.
* The systemd-creds tool gained a new "has-tpm2" verb, which reports
whether a functioning TPM2 infrastructure is available, i.e. if
firmware, kernel driver and systemd all have TPM2 support enabled and
a device found.
* The systemd-creds tool gained support for generating encrypted
credentials that are using an empty encryption key. While this
provides neither integrity nor confidentiality, it is useful to implement
codeflows that work the same on TPM-ful and TPM2-less systems. The
service manager will only accept credentials "encrypted" that way if
a TPM2 device cannot be detected, to ensure that credentials
"encrypted" like that cannot be used to trick TPM2 systems.
Experimental features: Experimental features:
* sd-boot gained a new *experimental* setting "reboot-for-bitlocker" in * sd-boot gained a new *experimental* setting "reboot-for-bitlocker" in

18
TODO

@ -78,24 +78,6 @@ Janitorial Clean-ups:
Features: Features:
* TPM2: add auth policy for signed PCR values to make updates easy. i.e. do
what tpm2_policyauthorize tool does. To be truly useful scheme needs to be a
bit more elaborate though: policy probably must take some nvram based
generation counter into account that can only monotonically increase and can
be used to invalidate old PCR signatures. Otherwise people could downgrade to
old signed PCR sets whenever they want. Usecase: encrypt the rootfs with LUKS
with a key that can only be unlocked via a pristine pre-built Fedora
kernel+initrd.
* update HACKING.md to suggest developing systemd with the ideas from:
https://0pointer.net/blog/testing-my-system-code-in-usr-without-modifying-usr.html
https://0pointer.net/blog/running-an-container-off-the-host-usr.html
* add a clear concept how the initrd can make up credentials on their own to
pass to the system when transitioning into the host OS. usecase: things like
cloud-init/Ignition and similar can parameterize the host with data they
acquire.
* Add ConditionCredentialExists= or so, that allows conditionalizing services * Add ConditionCredentialExists= or so, that allows conditionalizing services
depending on whether a specific system credential is set. Usecase: a service depending on whether a specific system credential is set. Usecase: a service
similar to the ssh keygen service that installs any SSH host key supplied via similar to the ssh keygen service that installs any SSH host key supplied via


@ -29,36 +29,23 @@
<refsect1> <refsect1>
<title>Description</title> <title>Description</title>
<para><command>systemd-oomd</command> is a system service that uses cgroups-v2 and pressure stall <para><command>systemd-oomd</command> is a system service that uses cgroups-v2 and pressure stall information (PSI)
information (PSI) to monitor and take corrective action before an OOM occurs in the kernel space.</para> to monitor and take action on processes before an OOM occurs in kernel space.</para>
<para>You can enable monitoring and actions on units by setting <varname>ManagedOOMSwap=</varname> and <para>You can enable monitoring and actions on units by setting <varname>ManagedOOMSwap=</varname> and/or
<varname>ManagedOOMMemoryPressure=</varname> in the unit configuration, see <varname>ManagedOOMMemoryPressure=</varname> to the appropriate value. <command>systemd-oomd</command> will
<citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry>. periodically poll enabled units' cgroup data to detect when corrective action needs to occur. When an action needs
<command>systemd-oomd</command> retrieves information about such units from <command>systemd</command> to happen, it will only be performed on the descendant cgroups of the enabled units. More precisely, only cgroups with
when it starts and watches for subsequent changes.</para> <filename>memory.oom.group</filename> set to <constant>1</constant> and leaf cgroup nodes are eligible candidates.
Action will be taken recursively on all of the processes under the chosen candidate.</para>
<para>Cgroups of units with <varname>ManagedOOMSwap=</varname> or <para>See
<varname>ManagedOOMMemoryPressure=</varname> set to <option>kill</option> will be monitored. <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>
<command>systemd-oomd</command> periodically polls PSI statistics for the system and those cgroups to
decide when to take action. If the configured limits are exceeded, <command>systemd-oomd</command> will
select a cgroup to terminate, and send <constant>SIGKILL</constant> to all processes in it. Note that
only descendant cgroups are eligible candidates for killing; the unit with its property set to
<option>kill</option> is not a candidate (unless one of its ancestors set their property to
<option>kill</option>). Also only leaf cgroups and cgroups with <filename>memory.oom.group</filename> set
to <constant>1</constant> are eligible candidates; see <varname>OOMPolicy=</varname> in
<citerefentry><refentrytitle>systemd.service</refentrytitle><manvolnum>5</manvolnum></citerefentry>.
</para>
<para><citerefentry><refentrytitle>oomctl</refentrytitle><manvolnum>1</manvolnum></citerefentry> can
be used to list monitored cgroups and pressure information.</para>
<para>See <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>
for more information about the configuration of this service.</para> for more information about the configuration of this service.</para>
</refsect1> </refsect1>
<refsect1> <refsect1>
<title>System requirements and configuration</title> <title>Setup Information</title>
<para>The system must be running systemd with a full unified cgroup hierarchy for the expected cgroups-v2 features. <para>The system must be running systemd with a full unified cgroup hierarchy for the expected cgroups-v2 features.
Furthermore, memory accounting must be turned on for all units monitored by <command>systemd-oomd</command>. Furthermore, memory accounting must be turned on for all units monitored by <command>systemd-oomd</command>.
@ -66,25 +53,23 @@
is set to <constant>true</constant> in is set to <constant>true</constant> in
<citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para> <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>
<para>The kernel must be compiled with PSI support. This is available in Linux 4.20 and above.</para> <para>You will need a kernel compiled with PSI support. This is available in Linux 4.20 and above.</para>
<para>It is highly recommended for the system to have swap enabled for <command>systemd-oomd</command> to <para>It is highly recommended for the system to have swap enabled for <command>systemd-oomd</command> to function
function optimally. With swap enabled, the system spends enough time swapping pages to let optimally. With swap enabled, the system spends enough time swapping pages to let <command>systemd-oomd</command> react.
<command>systemd-oomd</command> react. Without swap, the system enters a livelocked state much more Without swap, the system enters a livelocked state much more quickly and may prevent <command>systemd-oomd</command>
quickly and may prevent <command>systemd-oomd</command> from responding in a reasonable amount of from responding in a reasonable amount of time. See
time. See <ulink url="https://chrisdown.name/2018/01/02/in-defence-of-swap.html">"In defence of swap: <ulink url="https://chrisdown.name/2018/01/02/in-defence-of-swap.html">"In defence of swap: common misconceptions"</ulink>
common misconceptions"</ulink> for more details on swap. Any swap-based actions on systems without swap for more details on swap. Any swap-based actions on systems without swap will be ignored. While
will be ignored. While <command>systemd-oomd</command> can perform pressure-based actions on such a <command>systemd-oomd</command> can perform pressure-based actions on a system without swap, the pressure increases
system, the pressure increases will be more abrupt and may require more tuning to get the desired will be more abrupt and may require more tuning to get the desired thresholds and behavior.</para>
thresholds and behavior.</para>
<para>Be aware that if you intend to enable monitoring and actions on <filename>user.slice</filename>, <para>Be aware that if you intend to enable monitoring and actions on <filename>user.slice</filename>,
<filename>user-$UID.slice</filename>, or their ancestor cgroups, it is highly recommended that your <filename>user-$UID.slice</filename>, or their ancestor cgroups, it is highly recommended that your programs be
programs be managed by the systemd user manager to prevent running too many processes under the same managed by the systemd user manager to prevent running too many processes under the same session scope (and thus
session scope (and thus avoid a situation where memory intensive tasks trigger avoid a situation where memory intensive tasks trigger <command>systemd-oomd</command> to kill everything under the
<command>systemd-oomd</command> to kill everything under the cgroup). If you're using a desktop cgroup). If you're using a desktop environment like GNOME, it already spawns many session components with the
environment like GNOME or KDE, it already spawns many session components with the systemd user manager. systemd user manager.</para>
</para>
</refsect1> </refsect1>
<refsect1> <refsect1>
@ -94,11 +79,11 @@
<filename>-.slice</filename>, and allowing all descendant cgroups to be eligible candidates may make the most <filename>-.slice</filename>, and allowing all descendant cgroups to be eligible candidates may make the most
sense.</para> sense.</para>
<para><varname>ManagedOOMMemoryPressure=</varname> tends to work better on the cgroups below the root <para><varname>ManagedOOMMemoryPressure=</varname> tends to work better on the cgroups below the root slice
slice. For units which tend to have processes that are less latency sensitive (e.g. <filename>-.slice</filename>. For units which tend to have processes that are less latency sensitive (e.g.
<filename>system.slice</filename>), a higher limit like the default of 60% may be acceptable, as those <filename>system.slice</filename>), a higher limit like the default of 60% may be acceptable, as those processes
processes can usually ride out slowdowns caused by lack of memory without serious consequences. However, can usually ride out slowdowns caused by lack of memory without serious consequences. However, something like
something like <filename>user@$UID.service</filename> may prefer a much lower value like 40%.</para> <filename>user@$UID.service</filename> may prefer a much lower value like 40%.</para>
</refsect1> </refsect1>
<refsect1> <refsect1>


@ -1108,24 +1108,24 @@ DeviceAllow=/dev/loop-control
<citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry> <citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>
will act on this unit's cgroups. Defaults to <option>auto</option>.</para> will act on this unit's cgroups. Defaults to <option>auto</option>.</para>
<para>When set to <option>kill</option>, the unit becomes a candidate for monitoring by <para>When set to <option>kill</option>, <command>systemd-oomd</command> will actively monitor this unit's
<command>systemd-oomd</command>. If the cgroup passes the limits set by cgroup metrics to decide whether it needs to act. If the cgroup passes the limits set by
<citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry> or <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry> or its
the unit configuration, <command>systemd-oomd</command> will select a descendant cgroup and send overrides, <command>systemd-oomd</command> will send a <constant>SIGKILL</constant> to all of the processes
<constant>SIGKILL</constant> to all of the processes under it. You can find more details on under the chosen candidate cgroup. Note that only descendant cgroups can be eligible candidates for killing;
candidates and kill behavior at the unit that set its property to <option>kill</option> is not a candidate (unless one of its ancestors set
their property to <option>kill</option>). You can find more details on candidates and kill behavior at
<citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry> <citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>
and and <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>. Setting
<citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para> either of these properties to <option>kill</option> will also automatically acquire
<para>Setting either of these properties to <option>kill</option> will also result in
<varname>After=</varname> and <varname>Wants=</varname> dependencies on <varname>After=</varname> and <varname>Wants=</varname> dependencies on
<filename>systemd-oomd.service</filename> unless <varname>DefaultDependencies=no</varname>.</para> <filename>systemd-oomd.service</filename> unless <varname>DefaultDependencies=no</varname>.
</para>
<para>When set to <option>auto</option>, <command>systemd-oomd</command> will not actively use this <para>When set to <option>auto</option>, <command>systemd-oomd</command> will not actively use this cgroup's
cgroup's data for monitoring and detection. However, if an ancestor cgroup has one of these data for monitoring and detection. However, if an ancestor cgroup has one of these properties set to
properties set to <option>kill</option>, a unit with <option>auto</option> can still be a candidate <option>kill</option>, a unit with <option>auto</option> can still be an eligible candidate for
for <command>systemd-oomd</command> to terminate.</para> <command>systemd-oomd</command> to act on.</para>
</listitem> </listitem>
</varlistentry> </varlistentry>


@ -1123,25 +1123,15 @@
<varlistentry> <varlistentry>
<term><varname>OOMPolicy=</varname></term> <term><varname>OOMPolicy=</varname></term>
<listitem><para>Configure the out-of-memory (OOM) kernel killer policy. Note that the userspace OOM <listitem><para>Configure the Out-Of-Memory (OOM) killer policy. On Linux, when memory becomes scarce
killer the kernel might decide to kill a running process in order to free up memory and reduce memory
<citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>
is a more flexible solution that aims to prevent out-of-memory situations for the userspace, not just
the kernel.</para>
<para>On Linux, when memory becomes scarce to the point that the kernel has trouble allocating memory
for itself, it might decide to kill a running process in order to free up memory and reduce memory
pressure. This setting takes one of <constant>continue</constant>, <constant>stop</constant> or pressure. This setting takes one of <constant>continue</constant>, <constant>stop</constant> or
<constant>kill</constant>. If set to <constant>continue</constant> and a process of the service is <constant>kill</constant>. If set to <constant>continue</constant> and a process of the service is
killed by the kernel's OOM killer this is logged but the service continues running. If set to killed by the kernel's OOM killer this is logged but the service continues running. If set to
<constant>stop</constant> the event is logged but the service is terminated cleanly by the service <constant>stop</constant> the event is logged but the service is terminated cleanly by the service
manager. If set to <constant>kill</constant> and one of the service's processes is killed by the OOM manager. If set to <constant>kill</constant> and one of the service's processes is killed by the OOM
killer the kernel is instructed to kill all remaining processes of the service too, by setting the killer the kernel is instructed to kill all remaining processes of the service, too. Defaults to the
<filename>memory.oom.group</filename> attribute to <constant>1</constant>; also see <ulink setting <varname>DefaultOOMPolicy=</varname> in
url="https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html">kernel documentation</ulink>.
</para>
<para>Defaults to the setting <varname>DefaultOOMPolicy=</varname> in
<citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry> <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>
is set to, except for services where <varname>Delegate=</varname> is turned on, where it defaults to is set to, except for services where <varname>Delegate=</varname> is turned on, where it defaults to
<constant>continue</constant>.</para> <constant>continue</constant>.</para>
@ -1152,9 +1142,9 @@
<citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry> for <citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry> for
details.</para> details.</para>
<para>This setting also applies to <command>systemd-oomd</command>, similar to the kernel OOM kills <para>This setting also applies to <command>systemd-oomd</command>, similar to kernel OOM kills
this setting determines the state of the service after <command>systemd-oomd</command> kills a cgroup this setting determines the state of the service after <command>systemd-oomd</command> kills a cgroup associated
associated with the service.</para></listitem> with the service.</para></listitem>
</varlistentry> </varlistentry>
</variablelist> </variablelist>


@ -180,13 +180,13 @@ finish:
return r; return r;
} }
/* Fill 'new_h' with 'path's descendant OomdCGroupContexts. Only include descendant cgroups that are possible /* Fill `new_h` with `path`'s descendent OomdCGroupContexts. Only include descendent cgroups that are possible
* candidates for action. That is, only leaf cgroups or cgroups with memory.oom.group set to "1". * candidates for action. That is, only leaf cgroups or cgroups with memory.oom.group set to "1".
* *
* This function ignores most errors in order to handle cgroups that may have been cleaned up while * This function ignores most errors in order to handle cgroups that may have been cleaned up while populating
* populating the hashmap. * the hashmap.
* *
* 'new_h' is of the form { key: cgroup paths -> value: OomdCGroupContext } */ * `new_h` is of the form { key: cgroup paths -> value: OomdCGroupContext } */
static int recursively_get_cgroup_context(Hashmap *new_h, const char *path) { static int recursively_get_cgroup_context(Hashmap *new_h, const char *path) {
_cleanup_free_ char *subpath = NULL; _cleanup_free_ char *subpath = NULL;
_cleanup_closedir_ DIR *d = NULL; _cleanup_closedir_ DIR *d = NULL;


@ -170,7 +170,7 @@ static int run(int argc, char *argv[]) {
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGTERM, SIGINT, -1) >= 0); assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGTERM, SIGINT, -1) >= 0);
if (arg_mem_pressure_usec > 0 && arg_mem_pressure_usec < 1 * USEC_PER_SEC) if (arg_mem_pressure_usec > 0 && arg_mem_pressure_usec < 1 * USEC_PER_SEC)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "DefaultMemoryPressureDurationSec= must be 0 or at least 1s"); log_error_errno(SYNTHETIC_ERRNO(EINVAL), "DefaultMemoryPressureDurationSec= must be 0 or at least 1s");
r = manager_new(&m); r = manager_new(&m);
if (r < 0) if (r < 0)