```
In file included from ../src/nsresourced/nsresourced-manager.c:9:
../src/shared/bpf-link.h:5:10: fatal error: bpf/libbpf.h: No such file or directory
5 | #include <bpf/libbpf.h>
| ^~~~~~~~~~~~~~
```
Follow-up for 46718d344f
The functionality was explicitly not included in 6.11 for some
unknown reason, so drop the logic from systemd-repart as well, so that
we don't release v257 with it included.
Otherwise, `<variable>$BOOT</variable>` is rendered:
```
[2548/2992] Generating man/repart.d.5 with a custom command
Element variable in namespace '' encountered in para, but no template matches.
Element variable in namespace '' encountered in para, but no template matches.
```
If `Delegate=` is configured for a service, the cgroup agent will never send out
any datagram, as a .control subcgroup is generated. Thus systemd will watch
all processes on the cgroup hierarchy for SIGCHLD to deal with unreliable
cgroup notifications.
With this change, systemd rewatches all processes when any SIGCHLD is
captured, not just SIGCHLD from the control pid or main pid.
SetShowStatus() was added in order to fix #11447. Recently, I ran into
the exact same problem that OP was experiencing in #11447. I wasn’t able
to figure out how to deal with the problem until I found #11447, and it
took me a while to find #11447.
This commit takes what I learned from reading #11447 and adds it to the
documentation. Hopefully, this will make it easier for other people who
run into the same problem in the future.
Previously, if a file could not be opened, e.g. due to its permissions,
config_parse_many() and friends did not log the error even if the CONFIG_PARSE_WARN
flag was set. This makes all error paths in these functions get logged,
with the log level controlled by the flag.
Prompted by #34436.
This was designed to deal with $BOOT, as defined by the Boot Loader
Specification, but it was made a generic mechanism because it is useful
elsewhere too. See the updated man page for usage examples, motivation,
and an explanation of how this works.
Fixes an oversight in `context_allocate_partitions` that makes it
succeed in cases where it should fail. Essentially, there was nothing
actually enforcing SizeMinBytes= and PaddingMinBytes= for partitions
that exist, only for new partitions. This behavior is inconsistent with
the docs, which state that existing partitions will be grown to at least
the specified minimum size, and that "If the backing device does not
provide enough space to fulfill the constraints placing the partition
will fail".
The macro didn't properly parenthesize a caller-controlled argument.
For example: `STRV_FOREACH_PAIR(a, b, something ?: something_else)`
would expand to `typeof(*something ?: something_else)`, which would
cause compile failures
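As an illustration (a hedged sketch of the parenthesization issue, not the actual macro body from the tree), the fix boils down to wrapping the caller-controlled argument in parentheses before applying operators to it:

```
/* Hypothetical sketch, not the real STRV_FOREACH_PAIR definition. */
#define ELEMENT_TYPE_BAD(l)  typeof(*l)    /* argument expands unparenthesized   */
#define ELEMENT_TYPE_GOOD(l) typeof(*(l))  /* caller-controlled argument wrapped */

/* ELEMENT_TYPE_BAD(a ?: b)  expands to typeof(*a ?: b),   which parses incorrectly,
 * ELEMENT_TYPE_GOOD(a ?: b) expands to typeof(*(a ?: b)), as intended.            */
```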
When an interface enters unmanaged state, there are two possibilities:
- no matching .network file found,
- found a matching .network with Unmanaged=yes.
When a matching .network file is found, networkd logs the filename.
Let's also log when no matching .network file is found.
This also slightly adjusts the log message emitted when a matching .network file
is found.
Closes #34436.
This tests the whole shebang:
1. That ukify can generate them properly
2. That systemd-boot can dissect them properly
3. That systemd-stub can accept profile selection properly
4. That the profile information ends up in /run/systemd/stub/ properly
5. That systemd-measure correctly calculates the expected PCR 11 values
for each profile and that we can unlock a public-key bound LUKS
volume with it
Previously, manager_free() did not assign NULL to Manager.sysctl_shadow,
hence sysctl_clear_link_shadows() called by link_free() could cause a
use-after-free. To fix the issue, this makes Manager.sysctl_shadow be
set to NULL after it is freed.
Fixes a bug introduced by 6d9ef22acd.
Wired and 2.4G dongle connectivity is covered by the general trackball rule,
but with Bluetooth connectivity the Kensington SlimBlade Pro uses the name
"SlimBlade Pro", which doesn't contain "[Tt]rack[Bb]all". We need to
process it specially.
Signed-off-by: Celeste Liu <CoelacanthusHex@gmail.com>
(This excludes any dirs that contain resources placed there by the user)
(I also didn't bother marking resources belonging to components that are
really not optional for us)
These are inspired by the existing commands that return the path to the
boot or ESP partitions. However, these new commands show the path to the
boot loader (systemd-boot) or UKI/stub (systemd-stub) that was used on
the current boot. This information is derived from EFI variables.
This introduces the 'i' prefix for match strings. When specified, the string or
pattern will match case-insensitively.
Closes #34359.
Co-authored-by: Ryan Wilson <ryantimwilson@meta.com>
Currently, trying to boot images with type 1 entries generated by mkosi
with qemu freezes in the kernel EFI stub. I'm not going to pretend I
understand what's going on, but when I reported a similar problem with
UKIs, the fix was to rework the code in combine_initrds() in the stub
to behave like it does today. It seems that same fix was never applied
to systemd-boot's combine_initrds() function, so let's do that now to
fix the freezes I've been seeing trying to boot images with type 1 entries
in qemu.
This tries to get rid of most manual sigprocmask() changes, in favour
of:
1. The SD_EVENT_SIGNAL_PROCMASK flag to sd_event_add_signal()
2. The sd_event_set_signal_exit() call for handling SIGTERM/SIGINT
3. Move masking of SIGWINCH into ptyfwd, out of nspawn/vmspawn/run
And while we are at it get rid of a bunch of event source fields whose
lifetime is bound to the sd_event object they belong to anyway, and make
use of the "floating" event source feature of sd-event instead.
When the spec was initially written, we didn't add good documentation of how to
display the notes, also because there was no good way to display the data
except manually extracting the section to a file and running 'jq' on that. But
the tools have improved, so let's show the users how easy it is to use this
data.
The configuration files required by ldconfig could be put into
place by systemd-confext.service (ldconfig only looks in /etc) so
let's order the service after systemd-confext.service to make sure
any config files are in place before the service runs.
The verb is not really specific to credential management; it was always a
bit misplaced. Hence move it to systemd-analyze, where we already have
some general TPM-related verbs such as "srk" and "pcrs".
* 2c9954fa51 mkosi-initrd: correct `--debug-shell` help output
* 671708a10b Merge pull request #2990 from behrmann/allthemanuals
|\
| * 2671849125 initrd: add --show-documentation option
| * e2238f5dc7 Move show_docs to its own module
| * e366093b1c doc: make documentation command take an argument
* | 9fcff08b34 Update documentation links
* | 113f7f67dd Only write to /etc/machine-id if /etc exists
|/
* 62a610c0e5 Merge pull request #3005 from DaanDeMeyer/mypy
|\
| * 9b569c93bb Don't delete reader in _tempfile() backport
| * 16f4c94930 Mark all class variables as Final
| * ca7021e9a7 Annotate two more variables that need it
| * fec368dd4d Move KeySource.Type out of KeySource
| * ff5f7b06b8 user: Drop lru_cache() for home() and name()
| * 8f7c7b366f Move code backported from cpython upstream to backport.py
| * f66212e9c2 Drop listify()
| * 4293866df2 mypy: Disable allow_redefinition
| * 2700337f11 Fix mypyc warnings in sandbox.py
|/
* 025483af04 sandbox: Use separate variable name when we change types
* b04800cd30 Merge pull request #3003 from DaanDeMeyer/initrd
|\
| * fd64be9b60 mkosi-initrd: Ignore gnupg subdirectory
| * 7a8a21f8f6 mkosi-initrd: Only set --cacheonly=metadata when running as root
| * 156880c398 mkosi-initrd: Add --debug-shell argument
|/
* a32c8f393a Merge pull request #3002 from DaanDeMeyer/cherry-pick
|\
| * 1d8bfabc97 news: add note to change where the manual pages are
| * 8917d65db1 initrd: flatten module into a single file
| * 76085b788a sandbox: flatten module into a single file
| * 9f48afa4a7 cli: add missing completion stubs to pyproject.toml
| * 6e21cceb03 doc: move man pages to resources/man
| * 25d1c6b579 cli: use ellipsis ligature instead of writing out ...
|/
* 013d9b5595 Move various functions to bootloader.py
* 508ad85475 Update NEWS.md
* f25b8dee6f Simplify package cache dir mirror key
* dce4c8af51 Merge pull request #2998 from DaanDeMeyer/ci
|\
| * f4934828f7 tests: Show debug messages on console
| * fa3ae22598 ci: Drop machine-id commit timeout drop-in
* dba01269de base64 encode mirror if we put it in package cache dir key
* 364b65f7bb Add 'login' to Debian/Ubuntu/Kali package list
* ee07b5b6d2 Bump github/codeql-action from 3.25.15 to 3.26.6
This is very similar to tools/fetch-distro.py. The idea is that we extend the
commit to update the mkosi hash with a git log --pretty=oneline output, so that
the reader can know what changes were actually included.
The motivation is that I'm always wondering what changed in mkosi when I see a
commit updating the hash, and it's nicer to have this information shown
directly in the commit.
The script does _not_ pull changes from upstream, on the assumption that the
person doing the commit always has a fresh checkout and that they tested with
that checkout.
This splits out the core part into a new function
pe_section_table_find().
pe_header_find_section() takes a PeHeader as input, while
pe_section_table_find() just takes the section table and its size.
This renames pe_read_section_data() to pe_read_section_data_by_name()
and makes pe_read_section_data() a bit more low-level: it takes a header
table entry directly, instead of searching it first by name.
systemd-stub provides the signing key for TPM2 signed PCR policies in a
file tpm2-pcr-public-key.pem to userspace. Hence, to clarify that this
is the same key as used when signing via "systemd-measure", let's rename
it in the docs like that.
Also rename the private key to tpm2-pcr-private-key.pem, to keep the
symmetry.
With this we should universally stick to this nomenclature:
1. tpm2-pcr-public-key.pem ← public part of signing key
2. tpm2-pcr-private-key.pem ← private part of signing key
3. tpm2-pcr-signature.json ← signature file made with key pair
Inspired by: #34069
Monitor the sysctl set by networkd for writes, if a sysctl is
overwritten with a different value than the one we set, emit a warning.
Writes are detected with an eBPF program attached as BPF_CGROUP_SYSCTL
which reports the sysctl writes only in net/.
The eBPF program only reports sysctl writes from a different cgroup than networkd.
To do this, it uses the `bpf_current_task_under_cgroup_proto()` helper,
which will be allowed in BPF_CGROUP_SYSCTL from kernel 6.12[1].
Loading a BPF_CGROUP_SYSCTL program requires the CAP_SYS_ADMIN capability,
so drop it just after the program load, whether it loads successfully or not.
Writes are logged but permitted; in the future the functionality can be
extended to also deny writes to managed sysctls.
[1] https://lore.kernel.org/bpf/20240819162805.78235-3-technoboy85@gmail.com/
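A minimal log-only sketch of such a BPF_CGROUP_SYSCTL program follows; this is not the actual program shipped by networkd, and the net/ filtering, cgroup check and reporting to userspace described above are omitted:

```
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("cgroup/sysctl")
int monitor_sysctl(struct bpf_sysctl *ctx)
{
        char name[128];

        if (!ctx->write)
                return 1; /* reads are of no interest, allow them */

        /* flags == 0: retrieve the full sysctl name, e.g. "net/ipv4/ip_forward" */
        bpf_sysctl_get_name(ctx, name, sizeof(name), 0);

        /* The real program additionally filters on net/, checks the writer's
         * cgroup and reports the event; here we only demonstrate the hook. */
        return 1; /* log-only: never deny the write */
}

char LICENSE[] SEC("license") = "GPL";
```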
This is a rework of e7a93e75219b22424bab95fe45982f5eef21d581: instead of
handling components with n_variants being zero at every step of the way, we instead
remove them from our list after loading all components, given that such a
component simply makes no sense for the rest of our logic.
If we operate in "offline" mode, i.e. know the device key, then we will
not have a TPM2 connection, hence don't try to read the PCR bank to use from
it.
We don't need it anyway, because we are not going to test-unseal anything.
Fixes: #33855
The /dev/zramN devices can be used as regular block devices. They are
typically used for swap areas, but it would be beneficial to have
LABEL and UUID in the udev database to make it more user-friendly for
tools such as lsblk or mount (if used with other filesystems).
Such a policy won't provide any protection, but it's still entirely fine
to have it like this in various contexts, for example at OS install
time, to allocate the nvindex and reference it in enrollments. However,
it does deserve mention, hence log about it at LOG_NOTICE level.
This is based on a similar patch by Arnaud Patard
<arnaud.patard@collabora.com> proposed at #33663.
It is not true that "no string" is written to journal; the binary
name is used when run via `systemd-cat command`, or `cat` is used
when run via `command | systemd-cat`.
TEST-64-UDEV-STORAGE is invoked with the subtest appended, so TEST_SKIP=TEST-64-UDEV-STORAGE
does not work. Fix it by using TEST_SKIP as a partial match.
Follow-up for ddc91af4ea
These variables closely mirror the existing
LoaderDevicePartUUID/LoaderImageIdentifier variables. But the Stub…
variables indicate the location of the stub/UKI (i.e. of systemd-stub),
while the Loader… variables indicate the location of the boot loader
(i.e. of systemd-boot). (Except of course, if there is no boot loader used,
in which case both sets point to the stub/UKI, as a special case).
This actually matters, as we support that sd-boot runs off the ESP,
while a UKI then runs off XBOOTLDR, i.e. two distinct partitions.
Let's always check if we have data to set *first*, and only then check
if an EFI var is already set.
Checking for the EFI var is more expensive after all.
First of all, these were always set, i.e. since sd-boot was merged into
our tree, i.e. v220. Let's say so explicitly.
Also, let's be more accurate regarding which partition this refers to:
it's usually "the" ESP, but given that you can make firmware boot from
arbitrary disks, it could be any other partition too. Hence, be
explicit on this.
Also, clarify that sd-stub will set this too, if sd-boot never set it.
If this is not done, and there are two images, image_1.raw and image_2.raw under
an image.raw.v folder, then the log will say "Using extensions image" instead of
using "Using extensions image_2.raw" which is the desired behavior for v-picked extensions.
Let's move copying out the PCR signature/key into its own tmpfiles
snippet.
And then let's add support for copying out the profile + os-release
information systemd-stub now places in the invoked initrd.
That way these four pieces of information are available even after the
initrd→host transition.
Now that we have multi-profile UKIs people likely want to stick more PE
sections into them than before. Hence, bump the number of available PE
section slots to 30 (up from 15). Also, make this configurable at build
time since some folks probably want even more, and others don't want
this at all.
(pre-allocating too many shouldn't matter too much btw, I'd advise
everyone to overshoot, except maybe on the tiniest of embedded boards)
Let's make use of libcryptsetup's new crypt_token_set_external_path()
API in place of the interposition stuff we have been doing before. Let's
kill it entirely, given that this was a developer feature only anyway
(and guarded by an appropriate ifdef).
Fixes: #30098
Login shells are supposed to be marked via a dash as the first char. We follow
that logic, but right now we simply overwrite the first char of the
shell. That might not be the right choice, given that this turns
"zsh" into "-sh", which suggests some bourne shell process.
Hence, let's correct things, and instead prefix a dash, which should be
safer.
Inspired by findings on https://github.com/systemd/systemd/issues/34153#issuecomment-2338104907
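A minimal sketch of the difference (the helper below is illustrative, not the actual code):

```
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build argv[0] for a login shell by *prefixing* a dash, rather than
 * overwriting the first character (which would turn "zsh" into "-sh"). */
static char *login_argv0(const char *shell_basename) {
        char *s = NULL;

        if (asprintf(&s, "-%s", shell_basename) < 0)
                return NULL;
        return s;
}

int main(void) {
        char *a = login_argv0("zsh");
        printf("%s\n", a);   /* prints "-zsh", not "-sh" */
        free(a);
        return 0;
}
```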
Then, it is not necessary to free NetDev.ifname when a conflicting
.netdev file is already loaded.
This also splits out netdev_detach_name() and netdev_detach_impl().
No functional change, just refactoring.
This adds the ability to add alternative sections of a specific type in
the same UKI. The primary usecase is for supporting multiple different
kernel cmdlines that are baked into a UKI.
The mechanism is relatively simple (I think), in order to make it robust.
1. A new PE section ".profile" is introduced, that is a lot like
".osrel", but contains information about a specific "profile" to
boot. The ".profile" section can appear multiple times in the same
PE, and acts as a delimiter indicating where a new profile starts.
Everything before the first ".profile" is called the "base profile",
and is shared among all other profiles, which can then override or
add additional PE sections on top.
2. A UKI's command line can be prefixed with an argument such as "@0" or
"@1" or "@2" which indicates the "profile" to boot. If no argument is
specified the default is profile 0. Also, a UKI that lacks any
.profile section is treated like one with only a profile 0, but with
no data in that profile section.
3. The stub will first search for its usual set of PE sections
(hereafter called "base sections"), and stop at the first .profile PE
section if any. It will then find the .profile matching the selected
profile by its index, and any sections found as part of that profile
on top of the base sections.
And that's already it.
Example: let's say a distro wants to provide a single UKI that can be
invoked in one of three ways:
1. The regular profile that just boots the system
2. A profile that boots into storagetm
3. A profile that initiates factory reset and reboots.
For this it would define a classic UKI with sections .linux, .initrd,
.cmdline, and whatever else it needs. The .cmdline section would contain
the kernel command line for the regular profile.
It would then insert one ".profile" section, with contents like the
following:
ID=regular
This is the .profile content for profile 0. It would immediately afterwards add
another ".profile" section:
ID=storagetm
TITLE=Boot into Storage Target Mode
This would then be followed by a .cmdline section that is just like the
basic one, but with "rd.systemd.unit=storage-target-mode.target"
suffixed. Then, another .profile section would be added:
ID=factory-reset
TITLE=Factory Reset
Which is then followed by one last PE section: a .cmdline one with
"systemd.unit=factory-reset.target" suffixed to te regular command line.
i.e. expressed in tabular form the above would be:
The base profile:
.linux
.initrd
.cmdline
.osrel
The regular boot profile:
.profile
The storagetm profile:
.profile
.cmdline
The factory reset profile:
.profile
.cmdline
You might wonder why the first .cmdline in the list above is placed in
the base profile rather than in the regular boot profile, given that it
is overridden in all other profiles anyway. And you are right. The only
reason I'd place it in the base profile is that it makes the UKI more
nicely extensible if later profiles are added that want to replace
something else instead of the .cmdline, for example .ucode or so. But it
really doesn't matter much.
While the primary usecase is of course multiple alternative command
lines, the concept is more powerful than that: for various usecases it
might be valuable to offer multiple choices of devicetree, ucode or
initrds.
The .profile contents are also passed to the invoked kernel as a file in
/.extra/profile (via a synthetic initrd). Thus, this functionality can
even be useful without overriding any section at all, simply by means of
reading that file from userspace.
Design choices:
1. On purpose, I used a special command line marker (i.e. the "@" thing,
which maybe we should call the "profile selector"), that doesn't look
like a regular kernel command line option. This is because this is
really not a regular kernel command line option – we process it in
the stub, then remove it as prefix, and measure the unprefixed
command line only after that. The kernel will not see the profile
selector either. I think these special semantics are best
communicated by making it look substantially different from regular
options.
2. This moves around measurements a bit. Previously we measured our UKI
sections right after finding them. Now we first parse the profile
number from the command line, then search for the profile's sections,
and only then measure the sections we actually end up using for this
profile. I think that this logic makes most sense: measure what we
are using, not what we are overriding. Or in other words, if you boot
profile @3, then we'll measure .cmdline (assuming it exists) of
profile 3, and *not* measure .cmdline of the base profile. Also note
that if the user passes in a custom kernel command line via command
line arguments we'll strip off the profile selector (i.e. the initial
"@X" thing) before we pass it on.
3. The .profile stuff is supposed to be generic and extensible. For
example we could use it in future to mark "dangerous" options such as
factory reset, so that boot menus can ask for confirmation before
booting into it. Or we could introduce match expressions against
SMBIOS or other system identifiers, to filter out profiles on
specific hw.
Note btw, that PE allows defining multiple sections that point to the
same offsets in the file. This allows sharing payload under different
names. For example, if profile @4 and @7 shall carry the same .ucode
section, they can define .ucode in each profile and then make it point to
the same offset.
Also note that one can even "mask" a base section in a profile, by
inserting an empty section. For example, if the base .dtb section should
not be used for profile @4, then add a section .dtb right after the
fourth .profile with a zero size to the UKI, and you will get your wish
fulfilled.
This code only contains changes to sd-stub. A follow-up commit will
teach sd-boot to also find these profile PE sections to synthesize
additional menu entries from a single UKI.
A later commit will add support for generating this via ukify.
Fixes: #24539
This adds helpers for:
1. Returning the PE section table of open PE files or memory
2. Scanning PE section tables for the sections that belong to a specific
profile
In mkosi, I want to add a sysupdate verb to wrap systemd-sysupdate.
The definitions will be picked up from mkosi.sysupdate/ and passed
to systemd-sysupdate. I want users to be able to write transfer
definitions that are independent of the output directory used by
mkosi. To make this possible, it should be possible to specify the
directory that transfer sources should be looked up in on the sysupdate
command line. Let's allow this via a new --transfer-source= option.
Additionally, transfer sources that want to take advantage of this
feature should specify PathRelativeTo=directory to indicate the configured
Path= is interpreted relative to the transfer source directory specified
on the CLI.
This allows for the following transfer definition to be put in
mkosi.sysupdate:
"""
[Transfer]
ProtectVersion=%A
[Source]
Type=regular-file
Path=/
PathRelativeTo=directory
MatchPattern=ParticleOS_@v.usr-%a.@u.raw
[Target]
Type=partition
Path=auto
MatchPattern=ParticleOS_@v
MatchPartitionType=usr
PartitionFlags=0
ReadOnly=1
"""
Linux kernel v4.18 (2018-08-12) added user-namespace support to FUSE, and
bumped the FUSE version to 7.27 (see: da315f6e0398 (Merge tag
'fuse-update-4.18' of
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse, Linus Torvalds,
2018-06-07)). This means that on such kernels it is safe to enable FUSE in
nspawn containers.
In outer_child(), before calling copy_devnodes(), check the FUSE version to
decide whether to enable (>=7.27) or disable (<7.27) FUSE in the container. We
look at the FUSE version instead of the kernel version in order to enable FUSE
support on older-versioned kernels that may have the mentioned patchset
backported ([as requested by @poettering][1]). However, I am not sure that
this is safe; user-namespace support is not a documented part of the FUSE
protocol, which is what FUSE_KERNEL_VERSION/FUSE_KERNEL_MINOR_VERSION are meant
to capture. While the same patchset
- added FUSE_ABORT_ERROR (which is all that the 7.27 version bump
is documented as including),
- bumped FUSE_KERNEL_MINOR_VERSION from 26 to 27, and
- added user-namespace support
these 3 things are not inseparable; it is conceivable to me that a backport
could include the first 2 of those things and exclude the 3rd; perhaps it would
be safer to check the kernel version.
Do note that our get_fuse_version() function uses the fsopen() family of
syscalls, which were not added until Linux kernel v5.2 (2019-07-07); so if
nothing has been backported, then the minimum kernel version for FUSE-in-nspawn
is actually v5.2, not v4.18.
Pass whether or not to enable FUSE to copy_devnodes(); have copy_devnodes()
copy in /dev/fuse if enabled.
Pass whether or not to enable FUSE back over fd_outer_socket to run_container()
so that it can pass that to append_machine_properties() (via either
register_machine() or allocate_scope()); have append_machine_properties()
append "DeviceAllow=/dev/fuse rw" if enabled.
For testing, simply check that /dev/fuse can be opened for reading and writing,
but that actually reading from it fails with EPERM. The test assumes that if
FUSE is supported (/dev/fuse exists), then the testsuite is running on a kernel
with FUSE >= 7.27; I am unsure how to go about writing a test that validates
that the version check disables FUSE on old kernels.
[1]: https://github.com/systemd/systemd/issues/17607#issuecomment-745418835
Closes #17607
Follow-up for 99aad9a2b9
The commit changed the lookup_paths_init_or_warn() call to
be fatal to manager_reload(), but invoke_main_loop()
assumes that manager_reload() would only return
recoverable errors, and puts the manager back to
MANAGER_OK in that case, which is spurious.
Looking at it more, it appears to be utterly unnecessary
to reinitialize LookupPaths here, given that nothing during
the reload process would change the search dirs. Let's drop
the path altogether hence.
Right now it mostly duplicates a test that already exists in
TEST-50-DISSECT.mountfsd.sh, but it serves as a template for more unprivileged
nspawn tests.
The comment says that it is still in the host's CLONE_NEWUSER namespace,
which is not true if !arg_privileged. Also, it says that the CLONE_NEWNS
namespace was created by clone(), but if !arg_privileged then it was
actually created by nsresource_allocate_userns() and switched into by
setns(). Fix those inaccuracies.
When trying to word it clearly, there are enough commas and nested clauses
that I think it's clearer to break it into a list/table.
The .cred suffix is stripped from a credential as it is imported from
the ESP, hence it should not be included in the credential name embedded
in the credential.
Fixes: #33497
This option is pretty simple: it allows specifying a UKI whose
sections to import first, and place at the beginning of the new UKI.
This is useful for generating multi-profile UKIs piecemeal: generate the
base UKI first, then append a profile, and another one and another one.
The sections imported this way are not included in any PCR signature,
the assumption is that that already happened before in the imported UKI.
Let's make clearer what we are going to use /dev/kmsg for: read/write or just
writing. This hopefully should avoid confusion, such as the one #33975
is a result of.
(Also while we are at it, add one extra debug message).
Fixes: #33975
So far you had to pick:
1. Use a signed PCR TPM2 policy to lock your disk to (i.e. UKI vendor
blesses your setup via signature)
or
2. Use a pcrlock policy (i.e. local system blesses your setup via
dynamic local policy stored in NV index)
It was not possible to combine these two, because TPM2 access policies do
not allow the combination of PolicyAuthorize (used to implement #1
above) and PolicyAuthorizeNV (used to implement #2) in a single policy,
unless one is "further upstream" (and can simply remove the other from
the policy freely).
This is quite limiting of course, since we actually do want to enforce
on each TPM object that both the OS vendor policy and the local policy
must be fulfilled, without the chance for the vendor or the local system
to disable the other.
This patch addresses this: instead of trying to find a way to come up
with some adventurous scheme to combine both policy into one TPM2
policy, we simply shard the symmetric LUKS decryption key: one half we
protect via the signed PCR policy, and the other we protect via the
pcrlock policy. Only if both halves can be acquired can the disk be
decrypted.
This means:
1. we simply double the unlock key in length in case both policies shall
be used.
2. We store two resulting TPM policy hashes in the LUKS token JSON, one
for each policy
3. We store two sealed TPM policy key blobs in the LUKS token JSON, for
both halves of the LUKS unlock key.
This patch keeps the "sharding" logic relatively generic (i.e. the low
level logic is actually fine with more than 2 shards), because I figure
sooner or later we might have to encode more shards, for example if we
add further TPM2-based access policies, for example when combining FIDO2
with TPM2, or implementing TOTP for this.
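Conceptually, the sharding works roughly like this (a sketch assuming a 32-byte shard size, not the actual cryptenroll/cryptsetup-plugin code):

```
#include <stdint.h>
#include <string.h>

/* The volume key is twice the usual size; each half ("shard") is sealed to a
 * different TPM2 policy, and both halves must be unsealed to reassemble it. */
#define SHARD_SIZE 32

static void split_key(const uint8_t key[2 * SHARD_SIZE],
                      uint8_t shard_signed_pcr[SHARD_SIZE],
                      uint8_t shard_pcrlock[SHARD_SIZE]) {
        memcpy(shard_signed_pcr, key, SHARD_SIZE);            /* sealed under the signed PCR policy        */
        memcpy(shard_pcrlock, key + SHARD_SIZE, SHARD_SIZE);  /* sealed under the pcrlock (NV index) policy */
}
```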
Apparently _PATH_UTMPX is a glibc'ism. UTMPX_FILE is the same thing and
what everyone else uses. Since they are otherwise equivalent, let's just
switch.
We generally use utmpx instead of utmp (both are actually identical on
Linux, but utmpx is POSIX, while utmp is not). Let's fix one left-over
case.
UT_NAMESIZE does not exist in utmpx world, it has no direct counterpart,
hence let's just sizeof_field() to determine the size of the actual
field. (which comes to the same result of course: 32).
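For illustration, the pattern looks like this (the macro definition is paraphrased, and the assertion assumes glibc, where the field is 32 bytes wide):

```
#include <utmpx.h>

/* Paraphrased sizeof_field() pattern, used instead of the non-POSIX UT_NAMESIZE. */
#define sizeof_field(struct_type, member) sizeof(((struct_type *) 0)->member)

_Static_assert(sizeof_field(struct utmpx, ut_user) == 32,
               "same result as glibc's UT_NAMESIZE");
```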
In 924453c225
ProtectHome was set to true for systemd-coredump in order to reduce risk, since an attacker could craft a malicious binary in order to compromise systemd-coredump.
At that point the object analysis was done in the main systemd-coredump process.
Because of this, systemd-coredump is unable to produce symbolicated call-stacks for binaries running under /home ("n/a" is shown instead of function names).
However, later in 61aea456c1 systemd-coredump was changed to do the object analysis in a forked process,
covering those security concerns.
Let's set ProtectHome to read-only so that systemd-coredump produces symbolicated call-stacks for processes running under /home.
This is the most basic preparatory work for supporting multi-profile
UKIs.
(This temporarily drops an assert_cc() check which we'll address in the
next commit)
The root directory is already mounted with a picked UID shift, hence
it is not necessary to remount with idmap. However, /usr/ is a bind-mount,
hence it must be remounted with idmap.
With this change, now '-U --volatile=yes' works fine.
Fixes #34254.
Previously, remount_idmap() failed as /var/ was already mounted, thus
remounting (strictly speaking, unmounting old root directory) failed
with -EBUSY.
As tmpfs /var/ is mounted with picked UID shift, it should not be
remounted with idmap, but needs to be mounted after the root directory
being remounted.
This makes '-U --volatile=state' work as expected.
This also
- renames the variable n -> address,
- uses log_syntax_parse_error() where applicable,
- adds one more assertion for lvalue in config_parse_address().
The identifier 'stdin' is reserved in C. It can be #defined to any
expression that evaluates to a FILE*. We do not want that for our field,
so change to a more descriptive name.
The RADOS Block Device (rbd) can be used as any other block device with
further layers on top of it, hence allow the common persistent storage
rules to apply, including watching for changes.
We typically want to deal in usec_t, hence let's change the prototype
accordingly, and do proper range checks. Also, make sure we are not
confused by negative times.
Do something similar for mktime_or_timegm().
This is a more comprehensive alternative to #34065
Replaces: #34065
An error message is already printed directly after, so the user already
knows that the validation failed. This also isn't done for the other
validation functions.
When sd-firstboot is run during the first boot of a new system, this missing
newline leads to a bootup message being appended on the same line as the
message instructing to press a key.
Given that debug_invocation is a Unit thing, make
service_set_debug_invocation() generic. Plus, don't
say "Service failed", as it would be spurious when
Restart=always.
When dealing with copying COW images, we have to make a tradeoff:
- Either we don't touch the NOCOW bit on the copied file (keeping it COW) and get
an instant copy because we're able to reflink, but we might get
reduced performance if the source file was COW, as COW files and lots
of random writes don't play well together.
- Or we force NOCOW for the copied file, which means we have to do a
full copy as reflinking from COW files to NOCOW files or vice versa
is not supported.
In exec-invoke.c, we've opted for the first option. In nspawn.c and
discover-image.c, we've opted for the second option.
In nspawn, this applies to the --ephemeral option to make ephemeral
copies. In discover-image.c, this applies to cloning images into
/var/lib/machines. Both these features might be used to run many
machines of the same original image. We really don't want to force
a full copy onto users in these scenarios when they're expecting
reflink behavior, leading to them running out of disk space. Instead,
degraded performance in their machines is a much less severe issue,
which they will discover on their own if it affects them, at which
point they can make their original image NOCOW at which point they'll
get both the reflinks and better performance.
Given the above reasoning, let's switch nspawn.c and discover-image.c
to use COPY_NOCOW_AFTER as well instead of enabling NOCOW upfront and
forcing a copy if the original source image is COW.
Unless otherwise requested, if we're going to copy a nocow file, make the
target file nocow as well.
Aside from keeping the performance characteristics of the cow or nocow file
intact, reflinking also only works from cow to cow or nocow to nocow files.
Reflinking from cow to nocow or nocow to cow files does not work and can
easily lead to unexpected copies for users, so by keeping the nocow bit
intact across copies by default we also make sure reflinks always work.
Instead of parsing the human readable output of apt-cache, let's
use apt patterns to figure out the dependencies.
We also filter out virtual packages as apt will fail and say we need
to install an implementation of the virtual package even if a package
that provides the virtual package is already installed.
Now that mkfs.btrfs is adding support for compressing the generated
filesystem (https://github.com/kdave/btrfs-progs/pull/882), let's
add general support for specifying the compression algorithm and
compression level to use.
We opt to not parse the specified compression algorithm and instead
pass it on as is to the mkfs tool. This has a few benefits:
- We support every compression algorithm supported by every tool
automatically.
- Users don't need to modify systemd-repart if a mkfs tool learns a
new compression algorithm in the future.
- We don't need to maintain a bunch of per-filesystem tables to map
from our generic compression algorithm enum to the filesystem-specific
names.
We don't add support for btrfs just yet until the corresponding PR
in btrfs-progs is merged.
Let's also rename the error slightly, since what happens here is that a
valid service RR name is CNAME'd onto an invalid one. That's an
inconsistency on the server side, which we really should report as such.
Follow-up for: b48ab08732
The original regex didn't cover the `run-unit-tests.py` script that
made the old framework pull in Python into the test image, which in turn
allowed the new TEST-69-SHUTDOWN Python script to get executed in the
old framework's image, causing unexpected failures with latest Python on
Rawhide.
If someone runs `updatectl update`, sysupdate will be running multiple
update jobs at the same time, which can make reasoning about the output
in the journal quite difficult. Especially if things go wrong: the error
messages didn't mention which job failed. Nor was there any link between
job ID and the PID of the worker process logging to the journal. This
is all fixed here!
Cuts out some `strdup`s, and also avoids a rather weird case of donating
memory to a function. Basically just duplicates the solution I just
implemented for sysupdate's callout handler.
Previously, if the callout binary (i.e. sd-pull, sd-import) failed
gracefully, we'd return its exit status from the event loop and thus
from run_callout(). Of course, exit status is a positive number in the
event of failure. Which means that we completely ignore the callout
binary failing, and instead continue using whatever it managed to
download before failing.
This is bad for obvious reasons, not the least of which is installing
a half-downloaded OS. This also means that we would completely ignore
failed signature checks 😬️
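A minimal sketch of the corrected propagation (names are illustrative, not the actual sysupdate code):

```
#include <errno.h>
#include <signal.h>
#include <systemd/sd-event.h>

/* Exit the event loop with a negative errno when the callout child fails,
 * instead of returning its (positive) exit status, which the caller would
 * otherwise treat as success. */
static int on_callout_exit(sd_event_source *s, const siginfo_t *si, void *userdata) {
        if (si->si_code != CLD_EXITED || si->si_status != 0)
                return sd_event_exit(sd_event_source_get_event(s), -EPROTO);

        return sd_event_exit(sd_event_source_get_event(s), 0);
}
```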
Force means force: we skip checks with PID1 for existing units, but
then bail out with EEXIST if the files are actually there. Overwrite
everything instead.
Currently, if for example a traffic control object already exists, networkd
will silently do nothing, even if the settings in the network file for the
traffic control object have changed. Let's instead replace the object if it
already exists so that new settings from the network file are applied as
expected.
Fixes #31226
Otherwise, if the same kind of tclass is already assigned, parameters
configured in .network file will not be used. So, let's first copy the
tclass and put it on Request, then on success generate a new copy based
on the netlink notification and store it to Link.
This is the same as 0a0c2672db,
65f5f58156, and friends, but for tclass.
Otherwise, if the same kind of qdisc is already assigned, parameters
configured in .network file will not be used. So, let's first copy the
qdisc and put it on Request, then on success generate a new copy based
on the netlink notification and store it to Link.
This is the same as 0a0c2672db,
65f5f58156, and friends, but for qdisc.
Preparation for fixing #31226.
This also makes all conf parsers defined in conf-parser.c return 1
on success, 0 on non-critical error.
Also, use free_and_strdup_warn() where applicable.
- use GREEDY_REALLOC() and FOREACH_ARRAY(),
- do not set an array with only terminating 'invalid' value.
Note, this macro is only used when parsing NamePolicy= and AlternativeNamesPolicy=
in .link files, and udevd correctly handles both an empty array and an
array with only 'invalid'. Hence, this does not change any behavior.
https://fedoraproject.org/wiki/Changes/RenameNobodyUser, 2018:
> Use "nobody:nobody" as the names for the kernel overflow UID:GID pair, and
> retire the old "nfsnobody" name and the old "nobody:nobody" pair with 99:99
> numbers.
The progress_bar functions do their own buffering: they reconfigure
stderr, then print, then flush and disable buffering on their own. In
situations where multiple progress bars are being drawn at a time (for
example, in updatectl), it's even more efficient to hoist the buffering
and flushing to the call site, and avoid drawing each progress bar
individually.
To that end, add new _unbuffered variants of the progress_bar functions, and
use them in updatectl.
This applies a couple of aesthetic changes to the way updatectl renders
progress information:
1. We invert from "ICON TARGET MESSAGE" to "TARGET: ICON MESSAGE" to
better fit in with the systemd progress bars, which look like
"TARGET [==========---------] XX%". The original version of the
sysupdated PR implemented its own progress bars that were oriented
differently: "[==========---------] TARGET XX%". When we swapped
the progress bar we didn't swap the status messages
2. When a target finishes updating, instead of leaving a 100% progress
bar on screen for potentially extended periods of time (which implies
to the user that the update isn't actually done...), we show a status
message saying the target is done updating.
3. Fixed a minor bug where an extra newline would be printed after the
total progress bar. At the top of the rendering function, we scroll
the terminal's scroll-back just enough to fit a line for each target,
and one for the total. This means that we should not print an
additional line after the total, or else it'll scroll the terminal's
buffer by an additional character. This bug was introduced at some
point during review
4. Clears the Total progress bar before quitting. By the time we're
quitting, that progress bar will be showing no useful status for the
user. Also, the fix in point 3 will cause the shell's prompt to
appear on the same line as the Total progress bar, partially
overwriting it and leaving the shell in a glitchy state.
This fixes a bug introduced during review of sysupdated. Originally,
we just returned EALREADY verbatim to signify that the target is
already up-to-date. Then we switched this to a proper error
(org.freedesktop.sysupdate1.NoCandidate) during review. But that now
maps to EIO, not EALREADY. Thus, whenever there's nothing to update,
updatectl would report I/O errors to the user, even though nothing
actually went wrong.
Otherwise, when merging multiple directory trees, the output becomes
unreproducible as the directory timestamps will be changed to the current
time when copying identical directories from the second tree.
We introduce a new copy flag to achieve this behavior.
If the policy hash is empty we shouldn't return "0" from
search_policy_hash(), because that is understood as slot index 0, but
that's unlikely to match the policy.
Hence, return -ENOENT instead, indicating that we can't find a matching
slot.
We want the logarithm of the next power of two, which is the same
as the mask + 1, so add one to the mask to make sure the size is
sufficient to fit all flags.
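In other words (a tiny illustration, not the actual code): for a mask of 0x7 the next power of two is mask + 1 = 0x8, and its logarithm, 3, is the number of bits needed to fit all flags.

```
#include <assert.h>

int main(void) {
        unsigned mask = 0x7;                       /* e.g. three flags defined */
        unsigned bits = __builtin_ctz(mask + 1);   /* log2 of the next power of two */

        assert(bits == 3);                         /* enough bits for any flag combination */
        return 0;
}
```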
Use these helpers whenever appropriate. Drop separate string checks,
since these helpers already do them anyway.
No actual code change, just a rework to make use of a nice helper we
have already.
* a67221c3f0 Always build ukify package
* abb115a905 Do not use patch to modify systemd-user pam config file
* 196ec98228 Drop %upstream conditionalization for patches
These operations might require slow I/O, and thus might block PID1's main
loop for an undetermined amount of time. Instead of performing them
inline, fork a worker process and stash away the D-Bus message, and reply
once we get a SIGCHLD indicating they have completed. That way we don't
break compatibility and callers can continue to rely on the fact that when
they get the method reply the operation either succeeded or failed.
To keep backward compatibility, unlike reload control processes, these
are run inside init.scope and not the target cgroup. Unlike ExecReload,
this is under our control and is not defined by the unit. This is necessary
because previously the operation also wasn't run from the target cgroup,
so suddenly forking a copy-on-write copy of pid1 into the target cgroup
will make memory usage spike, and if there is a MemoryMax= or MemoryHigh=
set and the cgroup is already close to the limit, it will cause an OOM
kill, where previously it would have worked fine.
Currently translated at 92.8% (235 of 253 strings)
po: Translated using Weblate (Swedish)
Currently translated at 92.4% (234 of 253 strings)
po: Translated using Weblate (Swedish)
Currently translated at 91.3% (231 of 253 strings)
po: Translated using Weblate (Swedish)
Currently translated at 90.9% (230 of 253 strings)
po: Translated using Weblate (Swedish)
Currently translated at 90.5% (229 of 253 strings)
po: Translated using Weblate (Swedish)
Currently translated at 90.1% (228 of 253 strings)
Co-authored-by: Göran Uddeborg <goeran@uddeborg.se>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/sv/
Translation: systemd/main
Currently translated at 92.8% (235 of 253 strings)
po: Translated using Weblate (Swedish)
Currently translated at 92.4% (234 of 253 strings)
po: Translated using Weblate (Swedish)
Currently translated at 91.3% (231 of 253 strings)
po: Translated using Weblate (Swedish)
Currently translated at 90.9% (230 of 253 strings)
po: Translated using Weblate (Swedish)
Currently translated at 90.5% (229 of 253 strings)
po: Translated using Weblate (Swedish)
Currently translated at 90.1% (228 of 253 strings)
po: Translated using Weblate (Swedish)
Currently translated at 89.7% (227 of 253 strings)
Co-authored-by: Weblate Translation Memory <noreply-mt-weblate-translation-memory@weblate.org>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/sv/
Translation: systemd/main
Supposedly they're never going to rewrite their git history again
so let's give src.opensuse.org another try given that code.opensuse.org
is down again.
login is now from util-linux so credentials are supported.
It also needs to be pulled in as it's Protected: yes rather than
Essential: yes.
Keep the old setting for Ubuntu as that still uses login from shadow.
* aa17b7ddf9 Fix stage1 build
* 2c13391e33 Update changelog for 256.5-1 release
* 7d13196926 autopkgtest: skip TEST-64-UDEV-STORAGE due to qemu crash
* 47769e8d7c Drop patch merged upstream
* 4e8e9315b5 Update upstream source from tag 'upstream/256.5'
|\
| * 71b885347d New upstream version 256.5
* 89a33e5408 d/e/checkout-upstream: undo quilt patches before switching debian branch
* 3c942ecb0d d/e/checkout-upstream: do not rebase on main when building stable branches
* 28076e6232 Only make python3-pillow Recommends on Fedora
* a9807c4486 Do not require grubby on CentOS Stream 9
* d38cacfd3a Version 256.5
* 38291e13c1 Disable integration of userdb in sshd
* 53118d2112 Backport patch to only read /proc/cmdline when not in container
* 903e8e0f88 Backport upstream patch to try more initrd variants in 90-loaderentry.install
* b29a66006c Version 256.4
* 1cdae03391 Update tmpfiles --destroy-data patch
* 4fd4ef72a6 Upload sources
* 3c3772150d Version 256.3
rpm upstream is going to imply --noprep when running with --build-in-place so let's do the same on older
versions of rpm (e0925ad6e3)
Also, to keep things consistent between distros, run with --noprepare
on Arch Linux as well (we already skip patches on Debian/Ubuntu).
To keep things working on Arch, we apply the one downstream patch
manually ourselves.
Follow-up for bab889c51e (#33032).
Currently, they unconditionally return EPOLLIN and USEC_INFINITY, respectively.
Just for consistency with sd-bus, sd-journal, sd-varlink, and so on. All
of them have _get_fd(), _get_events(), and _get_timeout().
Closes #34094.
This adds two more fields in 'udevadm info':
- J for device ID, e.g. b128:1, c10:1, n1, and so on.
- B for driver subsystem, e.g. pci, i2c, and so on.
These, especially the device ID field, may be useful to find the udev
database file under /run/udev/data for a device.
As worked around in fc0cbed2db, the pair of
subsystem and sysname is not unique. For example,
- /sys/bus/gpio and /sys/class/gpio, both have gpiochip%N. However, these point to different devpaths.
- /sys/bus/mdio_bus and /sys/class/mdio_bus,
- /sys/bus/mei and /sys/class/mei,
- /sys/bus/typec and /sys/class/typec, and so on.
Let's refuse to provide sd_device object in such cases.
To create the sd_device object of a driver, the function
sd_device_new_from_subsystem_sysname() requires "drivers" for subsystem
and e.g. "pci:iwlwifi" for sysname. Similarly, sd_device_new_from_device_id()
also requires driver subsystem. However, we have never provided a
way to get the driver subsystem ("pci" for the previous example) from
an existing sd_device object.
Let's introduce a way to get driver subsystem.
This partially reverts the commit 730b76bd2c.
Before the commit, the function returned 0 on success, but the commit
made the function always return 1 (if device->devtype is NULL, the
function already returns -ENOENT above).
Fortunately, udev_device_get_devtype() does not propagate any
non-negative value from sd_device_get_devtype(). Hence, hopefully we can
safely revert the change.
Commit 201e0d53bd ("stub: split out random seed part out of run()")
looks like refactoring, but apparently it completely changed the logic of when
the random seed in the ESP is refreshed. Previously, process_random_seed()
was called when either:
- sd-stub was not present (LoaderFeatures var is unset) OR
- sd-stub was present but EFI_LOADER_FEATURE_RANDOM_SEED flag was unset.
Post-change, refresh_random_seed() bails under the exact same conditions (no
sd-stub or EFI_LOADER_FEATURE_RANDOM_SEED is unset) and thus
process_random_seed() is NOT called.
Restore the original logic. efivar_get_uint64_le()'s return value doesn't
require checking: loader_features is initialized to 0 and in case of failure it
stays untouched.
One of the major pain points of managing fleets of headless nodes is
that when something fails at startup, unless debug level was already
enabled (which it usually isn't, as it's a firehose), one needs to manually
enable it and pray the issue can be reproduced, which often is really
hard and time consuming, just to get extra info. Usually the extra log
messages are enough to triage an issue.
This new option makes it so that when a service fails and is restarted
due to Restart=, log level for that unit is set to debug, so that all
setup code in pid1 and sd-executor logs at debug level, and also a new
DEBUG_INVOCATION=1 env var is passed to the service itself, so that it
knows it should start with a higher log level. Once the unit succeeds
or reaches the rate limit the original level is restored.
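For a service that wants to honor this, a minimal sketch could look like the following (the DEBUG_INVOCATION variable is from the description above; everything else is illustrative):

```
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
        /* DEBUG_INVOCATION=1 is passed when the unit is restarted after a
         * failure and this option is enabled; raise our own verbosity then. */
        const char *e = getenv("DEBUG_INVOCATION");
        bool debug = e && strcmp(e, "1") == 0;

        fprintf(stderr, "starting with %s logging\n", debug ? "debug" : "normal");
        return 0;
}
```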
When the bypass logic is invoked, such as for queries to the stub with
the DO bit set, be certain to clear the AD bit in the reply before
forwarding it if the answer is not known to be authentic.
So far we manually hardcoded $LISTEN_FDNAMES to "varlink" in various
varlink service units we ship, even though FileDescriptorName=varlink
is specified in associated socket units already, because
FileDescriptorName= is currently silently ignored when combined with
Accept=yes. Let's step away from this, which seems saner.
Note that this is technically a compat break, but a mostly negligible
one as there shall be few users setting FileDescriptorName= but
still expecting LISTEN_FDNAMES=connection in the actual executable.
Preparation for #34080
This prevents bisecting to figure out which commit broke something,
as when going backwards the git commit timestamp will be older, meaning
package managers will refuse to upgrade to the "older" version. Let's
make sure the release is always newer by using the current date unless
$SOURCE_DATE_EPOCH is set.
DefaultRoute is a D-Bus property, not a valid setting name in .network
files nor resolved.conf.
Whether a link is the default route or not is configured with
DNSDefaultRoute= setting in .network files.
This introduces config_parse_routing_policy_rule(), which wraps existing
conf parsers. With this, we can drop many custom conf parsers for
[RoutingPolicyRule], and can reuse generic conf parsers in conf-parser.[ch].
Typically, conf parsers will ignore most errors during parsing strings
and return 0. Let's return 1 on success. Otherwise it is hard to reuse
these functions in other conf parsers.
We currently search for 'bpf-gcc' and 'bpf-none-gcc'. Gentoo's
sys-devel/bpf-toolchain package uses 'bpf-unknown-none-gcc', as does Fedora's
cross-binutils. Search for this name too.
Inspired by #34098 → let's make it easier for users to understand and
correct the mistakes they made: let's refuse invalid
interface/method names early.
We usually do not set r = -1 when a functionality is disabled or not
supported. Even though the error code is not used, let's set a negative
errno in such a case.
No functional change, just refactoring.
Follow-up for 0a4ecc54cb.
I don't actually need this anymore since we're going with a
unit based approach for the containers stuff internally so
let's just revert it.
Fixes #34085
This reverts commit ce2291730d.
A previous commit made sysupdate recognize installed versions where some
transfers are missing. This commit teaches sysupdate how to correctly
repair these incomplete versions.
Previously, if you had an incomplete installation of the OS booted, and
ran sysupdate in an attempt to repair it, sysupdate would make things
worse by creating copies of the currently-booted partitions in the
inactive slots. Then at boot you have two identical partitions, with
identical labels and UUIDs, and end up with a mess.
With this commit, sysupdate is able to recognize situations where it can
simply download the missing transfers and leave the rest of the system
undisturbed.
Partial fix for https://github.com/systemd/systemd/issues/33339
When enumerating what versions exist for a given target, sysupdate would
completely throw out any version that's incomplete (where some of the
transfers in the target have that version installed or available, and
other transfers do not).
If we're trying to find what versions we can offer for download, this is
great behavior. If the server side is advertising a partial update to
download, we shouldn't present it to the user.
On the other hand, if we're enumerating what versions we have currently
installed, this is a bad behavior. It makes sysupdate fragile. For
example, if a sysext introduces a new .conf file into
/usr/lib/sysupdate.d, suddenly the currently-installed OS stops being a
version that we've enumerated. Since it's not enumerated, it's not
protected, and so sysupdate will wipe the booted OS.
So if we're looking for installed versions, we now loosen the
restrictions and enumerate incomplete installations.
Partial fix for https://github.com/systemd/systemd/issues/33339
The current implementation will never find a match, because in the event
of a match instance_cmp falls through to comparing paths and the key
we're matching against will always have a path of NULL.
So let's just use a separate compare function, just to make sure future
updates to instance_cmp don't break resource_find_instance again.
This partially reverts 52bcc872b5.
We explicitly support running without user manager,
hence only user-runtime-dir@.service should be
required.
Fixes#33405
We call similar other fields in main.c (notably: rlimit stuff, env vars) "saved",
rather than "original". Hence stick to that kind of naming here too.
Follow-up for: #32937
This has been a glaring omission in the docs: when people create
.user/.group/.user-privileged/.group-privileged drop-in files, they
should also create matching .membership files.
arg_root defaults to null, so if --root isn't given, this would try reading
etc/machine-info from the current working directory, which is likely to fail.
Fixes: 77db9ef2ab ("boot: Make sure we take --root into account everywhere.")
We usually configure a test rule with a unique priority. Hence, finding
rules by priority reduces the lines of output, and we can debug easily.
Also print short comments on check. That's helpful when the check is
called several times.
Otherwise, the other RoutingPolicyRule object may not have a valid
address family yet, and the existing rule may be wrongly handled as
not being requested by any interface, and may be removed.
Follow-up for 727235006a.
Fixes #34068.
This softens the behavior originally introduced in eded61e410 to apply
only to the fallback dns servers.
The intent is that the global FallbackDNS (instead of DNS) can now be
used in conjunction with the per-link dns, providing a fallback behavior
without introducing a scope overlap.
References: eded61e410 (resolved: demote the global unicast scope, 2024-08-19)
This expands the role of fallback servers so they are applied not only
when there are no dns servers configured, but also when all the configured
dns servers are configured only for non-default-route links.
This commit may have been a breaking change for sd-resolved foreign
resolv.conf mode, where a legacy network management daemon directly
modifies resolv.conf and sd-resolved consumes that.
This reverts commit eded61e410.
Follow-up for 7ac58157ca
With the mentioned commit, if we hit E2BIG we'd retry pidfd_spawn()
with POSIX_SPAWN_SETCGROUP disabled. However, the same strategy
should actually apply to EOPNOTSUPP/ENOSYS/EPERM too -
they can mean two things here: no clone3() or no CLONE_PIDFD.
Therefore, let's first try clone() + CLONE_PIDFD, and fall further back
to plain clone() (posix_spawn()) only as last resort. Plus, record
the fact so that we don't unnecessarily retry every single time
if CLONE_PIDFD is the one that's unavailable.
mkfs.btrfs has recently learned new options --subvol and --default-subvol
so let's stop failing when Subvolumes= and DefaultSubvolume= are used offline
and use the new --subvol and --default-subvol options instead to create subvolumes
in the generated root filesystem without root privileges or loop devices.
This is the command-line tool to manage systemd-sysupdated.
Co-authored-by: Tom Coldrick <thomas.coldrick@codethink.co.uk>
Co-authored-by: Abderrahim Kitouni <abderrahim.kitouni@codethink.co.uk>
EINVAL means the kernel doesn't support it, ENODEV means it's
already revoked or the device is no longer there, which has the same
effect anyway. For all others, let's print an error to the logs.
We can easily hit the assertions without checking the internal states of
the DHCP client before calling these functions. That's annoying.
Let's handle this more gracefully.
When an interface enters the failed state, even if the DHCP client is
stopped, the acquired DHCP lease is not unreferenced, as the callback
dhcp4_handler() does nothing in that case. When the failed interface is
being reconfigured after that, the DHCP client is stopped again
(though it is already stopped), and SD_DHCP_CLIENT_EVENT_STOP event is
triggered and sd_dhcp_client_send_release() is called, and the
assertion in the function is triggered.
E.g.
===
systemd-networkd[98588]: wlp59s0: DHCPv4 address 192.168.86.250/24, gateway 192.168.86.1 acquired from 192.168.86.1
systemd-networkd[98588]: wlp59s0: Could not set DHCPv4 route: Nexthop has invalid gateway. Network is unreachable
systemd-networkd[98588]: wlp59s0: Failed
systemd-networkd[98588]: wlp59s0: State changed: configuring -> failed
systemd-networkd[98588]: wlp59s0: The interface entered the failed state frequently, refusing to reconfigure it automatically.
systemd-networkd[98588]: wlp59s0: DHCPv4 client: STOPPED
systemd-networkd[98588]: wlp59s0: DHCPv4 client: State changed: bound -> stopped
systemd-networkd[98588]: Got message type=method_call sender=:1.449 destination=org.freedesktop.network1 path=/org/freedesktop/network1 interface=org.freedesktop.network1.Manager member=ReconfigureLink ...
systemd-networkd[98588]: wlp59s0: State changed: failed -> initialized
systemd-networkd[98588]: wlp59s0: found matching network '/etc/systemd/network/50-wifi.network'.
systemd-networkd[98588]: wlp59s0: Configuring with /etc/systemd/network/50-wifi.network.
systemd-networkd[98588]: wlp59s0: DHCPv4 client: STOPPED
systemd-networkd[98588]: Assertion 'sd_dhcp_client_is_running(client)' failed at src/libsystemd-network/sd-dhcp-client.c:2197, function sd_dhcp_client_send_release(). Aborting.
===
When the interface is in the failed state, link_getlink_handler_internal()
will do nothing and return zero, thus the interface will not be
reconfigured, especially when the reconfiguration is triggered in
link_enter_failed().
Follow-up for c2eb7753dd.
In some kernels (specifically, 5.4) even though the clone3 syscall is
supported, setting CLONE_INTO_CGROUP is not. The error message returned
in this case is E2BIG.
If posix_spawn_wrapper encounters this error, it does not retry, and
cannot spawn any programs in said kernels.
This commit adds a check for the E2BIG error and retries pidfd_spawn()
without the POSIX_SPAWN_SETCGROUP flag.
If we encounter an E2BIG error, and the pidfd_spawn() succeeds after
removing the POSIX_SPAWN_SETCGROUP flag, then we cache the result so
that we do not retry every time.
Originally, this issue was reported in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1077204.
Signed-off-by: Kornilios Kourtis <kornilios@gmail.com>
That indicates the interface name in 'iif' or 'oif' cannot be resolved
when the 'ip rule' command is invoked. That's natural when networkd fails to
remove a rule but the corresponding interface has already been removed.
To prevent the residual rules from interfering with subsequent test cases,
let's ignore the flag and actually remove unwanted rules.
Currently, these attributes are not configured by us, but there may be an
existing rule created manually by a user with one of these attributes.
To correctly manage such foreign rules, let's read these attributes.
The kernel does not distinguish rules with different flags in
rule_exists(), but the flags of an existing rule cannot be updated.
Let's remove rules that have conflicting flags, and configure new rules
later with requested flags.
When we fail to remove a rule, that mostly means the rule does not exist
in the kernel anymore, e.g. already removed manually and we have not
received notification about that yet.
Let's detach the rule in that case.
Otherwise, if we fail to configure the rule, then the manager will keep
a nonexistent rule forever. So, let's first copy the rule and put it on
Request, then on success generate a new copy based on the netlink
notification and store it to Manager.
This is the same as 0a0c2672db, but for
routing policy rule.
If it is already requested, the new request will anyway be silently refused by
link_queue_request_safe(), which returns 0 in that case. Let's return earlier.
There should be no functional change, just refactoring.
They are stored in Manager.rules set or Network.rules_by_section hashmap.
For safety, let's not edit them even temporarily.
No functional change, just refactoring.
Being connected to a TTY is not really enough to determine
interactivity in many cases. Let's also check if we have a controlling
TTY.
Inspired by #34016
This will greatly reduce the number of cases where the global unicast
scope overlaps with link scopes configured as default-route, making it
feasible to use the global DNS setting in conjunction with per-link dns
servers configured by the network.
This change is preferred over demoting links to default-route=no where
the user prefers to use the network provided DNS servers, and I expect
it is non-disruptive in that it should not degrade the efficacy of any
existing configuration.
Let's instead generate ENOTTY on our own. This is more correct with our
coding style (since we generally do not propagate errors via errno), and
also addresses #34039 as side effect. (#34039 really needs to be fixed
in musl though, too, this is just a work-around as side-effect).
Fixes: #34039
glibc returns EIO on ttys that are hung up. That's not really correct;
POSIX seems to disagree.
Work around this in our code, and turn this into a clean "1", since a
hung up tty doesn't stop being a tty just because it is hung up.
Background: https://github.com/systemd/systemd/pull/34039
In apply_one_mount(), in the MOUNT_EXTENSION_DIRECTORY case,
char **extension_release was used as a return pointer twice but only
cleaned up once in the end. Fix it by removing duplicate code that
was causing this issue.
Fixes issue introduced in 55ea4ef096.
Currently, only FIB_RULE_INVERT flag can be configurable, but for
simplicity and future extension, let's manage all flags.
No functional change, just refactoring.
The kernel parses FRA_SUPPRESS_PREFIXLEN as uint32_t, but internally it is
handled as a signed integer, with negative values meaning unset. Let's
explicitly specify the size of the variable.
No functional change, just refactoring.
Unprivileged users often make themselves root by unsharing a user namespace
and then mapping their current user to root which does not require privileges.
Let's make sure our tests don't fail in such an environment by adding checks
where required to see if we're not running in a user namespace with only a
single user.
Note, `systemd-analyze foo@.service --instance=hoge` is equivalent to
`systemd-analyze foo@hoge.service`. But, the option may be useful when
e.g. passing multiple template units that have restrictions on their
instance name:
```
$ ls
template_aaa@.service template_bbb@.service template_ccc@.service
$ systemd-analyze ./template_* --instance=hoge
```
Without the option, we need to embed an instance name into each unit
name, so we cannot use globs.
Prompted by #33681.
Before this commit, the "Cannot raise nice level" branch
is rather confusing, as we're actually lowering the nice value.
Also, it's better to log the final nice value
in both cases, no matter whether we need to clamp it to the limit
or not.
Follow-up for 6d2984d21b
The current semantics of "filtered" in unit_is_filtered()
are actually the contrary of ListUnitsFiltered(). Let's
make things consistent, i.e. return true when the unit
shall be included.
The credential mounts should be managed singlehandedly by pid1.
Preparation for the future introduction of RefreshOnReload=credential,
where refreshing creds will be properly supported on reload.
Perform some checks earlier to avoid pointless polkit auth.
Plus, a missing unit_get_exec_context() shall not be
a formalized error, as it's our internal representation
and should never happen in normal operation.
After 3976c43092 (#31423), IPMasquerade=
implies only per-interface IP forwarding. That means, nspawn users need
to manually enable IPv4/IPv6Forwarding= in networkd.conf when
--network-veth or friends are used. Even though the change was announced in
NEWS, it breaks backward compatibility and severely reduces
usability.
Let's make the setting imply the global setting again.
Fixes#34010.
When running unprivileged, checking /proc/1/root doesn't work because
it requires privileges. Instead, let's add an environment variable so
the process that chroots can tell (systemd) subprocesses whether
they're running in a chroot or not.
Previously the `_filter_units_by_property` completion function
produced its output with the [zsh parameter expansion flag] `g:o:`. This means
that the returned result is unescaped as if by the zsh builtin `echo`,
except that octal escapes don't take a leading zero. This seemed to
have worked back in the days when it was first introduced:
6c9414a700
But it now leads to incorrect over-unescaping; for example,
system-systemd\\x2djournald.slice (correct)
is incorrectly completed by zsh in commands such as
`systemctl kill`:
system-systemd-journald.slice (incorrect)
This commit fixes such problems by removing the `g:o:` flag.
See:
[zsh parameter expansion flag]: https://zsh.sourceforge.io/Doc/Release/Expansion.html#Parameter-Expansion-Flags
The net_id builtin only checked the of_node of a netdev's parent device,
not that of the netdev itself. While it is common that netdevs don't have
an OF node assigned themselves, as they are derived from some parent
device, this is not always the case. In particular when a single
controller provides multiple ports that can be referenced individually in
the Device Tree (both for aliases/MAC address assignment and phandle
references), the correct of_node will be that of the netdev itself, not
that of the parent, so it needs to be checked, too.
A new naming scheme flag NAMING_DEVICETREE_PORT_ALIASES is added to
allow selecting the new behavior.
Otherwise, several messages for the last invocation may not have been
stored to the journal yet.
Hopefully fixes the following race:
===
[ 603.037765] H systemd-run[10503]: Running as unit: invocation-id-test-26448.service; invocation ID: 1a49edeb05a641aaa2def72411134822
[ 603.099587] H bash[10504]: invocation 10 1a49edeb05a641aaa2def72411134822
[ 603.212069] H systemd[1]: invocation-id-test-26448.service: Deactivated successfully.
[ 603.225092] H systemd-run[10503]: Finished with result: success
[ 603.225163] H TEST-04-JOURNAL.sh[10506]: + journalctl --list-invocation -u invocation-id-test-26448.service
[ 603.225318] H systemd-run[10503]: Main processes terminated with: code=exited, status=0/SUCCESS
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: + tee /tmp/tmp.UzSmYamXyg/10
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: IDX INVOCATION ID FIRST ENTRY LAST ENTRY
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: -9 d6efabb546014027b6bd7ee3a78386d6 Wed 2024-08-14 22:12:16 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: -8 3e402b81c28d4a8fa2c5e8e31dffd9ee Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: -7 5ebd0ba07d4f4f52bc84275f55a3ee2e Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: -6 bc53c49d6ce24bb7acd438c3e61cfb23 Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: -5 24680907919e4839a75378117bb5a816 Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: -4 ec364ed7673c4a1fa22929f95ce7047b Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: -3 2e8a4dea43044d1a9faf922f7a2f3d42 Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: -2 ac610b6e6c9c4a29bf8947890685478b Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: -1 9b7d52c3620948f9831e323910f605f5 Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: 0 1a49edeb05a641aaa2def72411134822 Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.225823] H systemd-run[10503]: Service runtime: 174ms
[ 603.225866] H TEST-04-JOURNAL.sh[10508]: + journalctl --list-invocation -u invocation-id-test-26448.service --reverse
[ 603.226110] H systemd-run[10503]: CPU time consumed: 12ms
[ 603.226142] H TEST-04-JOURNAL.sh[10509]: + tee /tmp/tmp.UzSmYamXyg/10-r
[ 603.226378] H systemd-run[10503]: Memory peak: 1.4M (swap: 0B)
[ 603.230161] H TEST-04-JOURNAL.sh[10509]: IDX INVOCATION ID FIRST ENTRY LAST ENTRY
[ 603.230161] H TEST-04-JOURNAL.sh[10509]: 0 1a49edeb05a641aaa2def72411134822 Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:18 UTC
[ 603.230161] H TEST-04-JOURNAL.sh[10509]: -1 9b7d52c3620948f9831e323910f605f5 Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.230161] H TEST-04-JOURNAL.sh[10509]: -2 ac610b6e6c9c4a29bf8947890685478b Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.230161] H TEST-04-JOURNAL.sh[10509]: -3 2e8a4dea43044d1a9faf922f7a2f3d42 Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.230161] H TEST-04-JOURNAL.sh[10509]: -4 ec364ed7673c4a1fa22929f95ce7047b Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.230161] H TEST-04-JOURNAL.sh[10509]: -5 24680907919e4839a75378117bb5a816 Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.230161] H TEST-04-JOURNAL.sh[10509]: -6 bc53c49d6ce24bb7acd438c3e61cfb23 Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.230161] H TEST-04-JOURNAL.sh[10509]: -7 5ebd0ba07d4f4f52bc84275f55a3ee2e Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.230161] H TEST-04-JOURNAL.sh[10509]: -8 3e402b81c28d4a8fa2c5e8e31dffd9ee Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.230161] H TEST-04-JOURNAL.sh[10509]: -9 d6efabb546014027b6bd7ee3a78386d6 Wed 2024-08-14 22:12:16 UTC Wed 2024-08-14 22:12:17 UTC
===
This flag was added in db6aedab92 with the justification that locale
environment variables should be preserved by the user session. However,
the companion patch to drop the UnsetEnvironment= directive blocking
these variables was never merged, so the intended change was never
effected.
While the patch was ineffective toward its stated goal, the "-p" option
does have material negative consequences for the user session in
systemd: environment variables supporting the use of credentials and
memory pressure directives, such as $CREDENTIALS_DIRECTORY and
$MEMORY_PRESSURE_WATCH, which are now used directly by agetty and login,
get leaked into the user session, potentially breaking applications that
rely on these values.
E.g. systemd-ask-password fails from the tty when $CREDENTIALS_DIRECTORY
has been leaked from agetty, because it expects to be able to access
credentials in $CREDENTIALS_DIRECTORY.
This effectively reverts db6aedab92.
References: db6aedab92 (units: Tell login to preserve environment (#6023), 2017-05-24)
PTP device symlink creation rules are currently executed only when the
udev action is 'add'. If a user reloads the rules and runs the udevadm
trigger command to reapply changes, the symlink may be deleted, which
can prevent the chronyd service from restarting properly.
Signed-off-by: Chengen Du <chengen.du@canonical.com>
This fixes the following issues:
- We have a journal file, which contains entries of boot A and B. Let T
be the timestamp of the _last_ entry of boot A.
If sd_journal_seek_monotonic_usec() is called for boot A with a timestamp
_after_ T, following sd_journal_next() will provide the _first_ entry of
boot A, rather than the first entry of boot B.
- We have two journal files X and Y. The file X contains entries of boot A.
Let T be the timestamp of the _last_ entry of boot A in file X. The file Y
contains entries of boot A after timestamp T.
If sd_journal_seek_monotonic_usec() is called for boot A with a
timestamp _after_ T, following sd_journal_next() will provide the
_first_ entry of boot A, whose timestamp is of course earlier than T.
In the next commit, we'll introduce a varlink server for the user
manager. As preparation for that, let's introduce a new function to
initialize only the managed OOM connection whenever we send a managed
OOM update.
Follow-up for 329050c5e2
I don't particularly favor the duplicated strstrip()
and such, so let's ensure that, if we get fixed data, it's
only trimmed once. Subsequently we can benefit more
by making all copies reflinks.
We initially read from temp file, then strip it, and write
back to it. If the file suddenly disappeared during the process,
it indicates someone else is touching our temp file
behind our back. Let's not silently continue.
- Add missing assertions
- Close all fds before spawning editor
- Use FOREACH_STRING() + empty_to_null() where appropriate
Note that this slightly changes the behavior, in that
empty envvars would be treated as unset and we'd try
the next candidate. But the new behavior is better IMO.
* 6e0f4f74ba Update changelog for 256.4-3 release
* 4b142f9c37 Depend on new linux-bpf-dev package where available
* f5fe5ecf4d autopkgtest: use hint-testsuite-triggers to ensure other packages changes trigger our testsuite
* 407932845d autopkgtest: run upstream test last
* 31458d03c2 Stop installing legacy pkla file in upstream CI too
* 484643291a Use d/not-installed instead of manual removals
* 752bb4c34c Stop shipping empty /etc/init.d directory
* 174603ffc2 Use debian/clean instead of override in d/rules
* 9a355e5a51 Drop redundant pot build
* 3d249c88cb Update changelog for 256.4-2 release
The nice value is part of struct sched_attr, and consequently invoking
sched_setattr() after setpriority() would clobber the nice value with
the default (as we are not setting it in struct sched_attr).
It would be best to combine both calls, but for now simply invoke
setpriority() after sched_setattr() to make sure Nice= remains effective
when used together with CPUSchedulingPolicy=.
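A minimal sketch of the described ordering; since glibc provides no sched_setattr() wrapper, the struct and raw syscall below follow the layout documented in sched_setattr(2) (illustrative helper, not the actual code in the tree):
```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc provides no wrapper for sched_setattr(); the struct layout below is
 * the one documented in sched_setattr(2). */
struct sched_attr {
        uint32_t size;
        uint32_t sched_policy;
        uint64_t sched_flags;
        int32_t  sched_nice;
        uint32_t sched_priority;
        uint64_t sched_runtime;
        uint64_t sched_deadline;
        uint64_t sched_period;
};

static int apply_policy_then_nice(uint32_t policy, int nice_level) {
        struct sched_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.sched_policy = policy;
        /* attr.sched_nice is left at 0 here, i.e. this call resets the nice
         * value to the default... */
        if (syscall(SYS_sched_setattr, 0, &attr, 0) < 0)
                return -errno;

        /* ...hence apply the nice value only afterwards, so that Nice= stays
         * effective when combined with CPUSchedulingPolicy=. */
        if (setpriority(PRIO_PROCESS, 0, nice_level) < 0)
                return -errno;

        return 0;
}
```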
When unit_need_daemon_reload() calls unit_find_dropin_paths() to check
for new drop-in configs, the manager's unit path cache is used to limit
which directories are considered. If a new drop-in directory is created,
it may not be in the unit path cache, and hence unit_need_daemon_reload()
may return false, despite a new drop-in being present. However, if a
unit path cache is not given to unit_file_find_dropin_paths() at all,
then it behaves as if the target path was found in the unit path cache.
So, to fix this, adapt unit_find_dropin_paths() to take a boolean
argument indicating whether or not to pass along the unit path cache.
Set this to false in unit_need_daemon_reload().
Fixes#31752
When pid 1 crashes, the getty unit for the console will happily keep
running which means we end up with two shells competing for the same
tty. Let's call vhangup on /dev/console to kill every other process
attached to the console before we spawn the crash shell. The getty
units have Restart=always, but luckily for us pid 1 just crashed in fire
and flames so it isn't actually able to restart the getty unit.
We generally don't care about library debuginfo so let's just disable
debuginfod so it doesn't get in the way when debugging.
We use /root/.gdbinit as the systemwide gdbinit location is distribution
specific.
gcc 15 has -Wunterminated-string-initialization in -Wextra and
warns about string constants that are not null terminated, even though
the functions do not do out-of-bounds accesses.
Silence the warnings by simply not providing an explicit size.
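A tiny illustration of the pattern (a hypothetical table, not a specific one from the tree):
```c
/* Flagged by gcc 15's -Wunterminated-string-initialization: the literal's
 * NUL terminator does not fit into the 16 declared bytes. The code never
 * reads past index 15, but the warning fires anyway. */
static const char hexchars_warns[16] = "0123456789abcdef";

/* Silenced by dropping the explicit size: the array becomes 17 bytes and
 * now includes the terminating NUL. */
static const char hexchars_ok[] = "0123456789abcdef";
```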
This allows for example forcing to use /sbin/init instead of always
using /usr/lib/systemd/systemd if it exists. Or it allows using a
different path altogether.
When creating a user, check if the requested group name matches a user
name in the queue. If that matched user name is also going to be a group
name, then use it for the new user too. In other words, allow the
following:
u foo -
u bar -:foo
when both foo and bar are new users.
Fixes#33547
This fixes the following assertion:
===
SYSTEMD_LOG_LEVEL=debug systemctl --user -H foo --boot-loader-entry=help
Assertion 'transport != BUS_TRANSPORT_REMOTE || runtime_scope == RUNTIME_SCOPE_SYSTEM' failed at src/shared/bus-util.c:284, function bus_connect_transport(). Ignoring.
Failed to connect to bus: Operation not supported
===
Fixes a bug introduced by 97af80c5a7.
Fixes#33661.
Fixes oss-fuzz#70153.
Running the following commands:
# mkdir -p /var/lib/pcrlock.d/123-empty.pcrlock.d
# /usr/lib/systemd/systemd-pcrlock predict --pcr=1+2+3+4+5+16
Will result in:
...
Floating point exception
Running the following commands:
# mkdir -p /var/lib/pcrlock.d/123-empty.pcrlock.d
# /usr/lib/systemd/systemd-pcrlock make-policy --pcr=1+2+3+4+5+16
Will result in this (partial) log:
...
Predicted future PCRs in 133us.
[]
...
Written policy digest 0000000000000000000000000000000000000000000000000000000000000000 to NV index 0x1921da6
...
So, add missing checks to handle gracefully cases where there's no variant
inside the component.
Signed-off-by: Arnaud Patard <arnaud.patard@collabora.com>
The PrepareForShutdownWithMetadata signal was added via
e4aab5cf1a but a corresponding property
was not. A property has to be a single type, so the bool needs to be
one of the key/value pairs as 'ba{sv}' is not a valid property.
OpenSUSE's busybox has a bunch of Provides for basic tools that cause
it to get pulled into images unless the corresponding tool is explicitly
installed so let's add explicit tools to make sure we don't get busybox.
+ Scale the x-axis of the resulting plot by a factor (default 1.0)
+ Add activation timestamps to each bar
Signed-off-by: rajmohan r <rajmohan.r@kpit.com>
Rebuilding the integration test every time is very slow. Let's
introduce a way to iterate on an integration test without rebuilding
the image every time. By making a btrfs snapshot before we run the
integration test, we can then systemctl soft-reboot after running
the test to restore the rootfs to a pristine state before running
the test again.
As /run/nextroot will get nuked on reboot or soft-reboot, we introduce
a tmpfiles snippet to make sure it is recreated every (soft-)reboot
and adapt the existing tests to deal with this new symlink.
The next commit will introduce a way to iterate on integration
tests which depends on btrfs specific features.
We leave CentOS on ext4 as its kernel does not support btrfs.
- Add the required options to make the package managers non interactive
- Use apt-get instead of apt
- Remove --reinstall from apt-get command so we only install newer packages
- Add --needed to pacman command so we only install newer packages
As at this stage, a persistent journal file has already been opened, the
saved seqnum has been reset, and any later journal entries will be stored
to that file. Hence we should not open the runtime journal file by
server_system_journal_open() again.
The setting is about vacuuming archived journal files. It is not
necessary to rotate the current journal. Note, journal file rotation is
controlled by MaxFileSec=.
Fixes#31315.
Let's explicitly pass the value to -fstrict-flex-arrays. This does
not change behavior but it does (selfishly) make my editor not bug
out with an error saying -fstrict-flex-arrays does not exist.
Now that we track auto-restarts with a dedicated state,
there's no need for a separate variable for this.
I also took the chance to reorder some struct members.
unit_start() advertises that start requests don't get suppressed,
so that it could be used to manually speed up auto restarts.
However, service_start() so far rejected this, stating that
clients should issue a restart request in order to trigger
BindsTo=/OnFailure=.
That seems to be a red herring though, because for a long time
the service states between auto-restarts were buggy (#27594).
With the introduction of RestartMode=direct, the behavior
is sane again and customizable, hence I see no reason to refuse
this anymore. Whether those deps are triggered solely depends
on RestartMode= now.
Plus, filter out some intermediate states that should never
be seen in service_start().
Fixes#33890
Even if the glob pattern is valid, the pattern may match credentials
with invalid names. So, we need to check the names of the found
credentials.
Follow-up for 947c4d3952.
This fixes
commit 9b0688f491
Author: Yu Watanabe <watanabe.yu+github@gmail.com>
Date: Tue Jan 9 10:52:49 2024 +0900
virt: add Google Compute Engine support
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The s390x platform provides confidential VMs using the "Secure Execution"
technology, which is also referred to as "Protected Virtualization" or
just "prot virt" in Linux / QEMU.
This can be detected through a simple sysfs attribute.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
We have different impls of detect_confidential_virtualization per
architecture. The detection is cached in the x86_64 impl, and as we
add support for more targets, we want to use caching for all. It thus
makes sense to split caching out into an architecture independent
method.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Since at least the old framework checks for the presence of the file
at the end and marks the whole test as skipped if it exists.
Resolves: systemd/systemd-centos-ci#728
cg_kill_kernel_sigkill() has a narrow use case, and currently
no code really reaches that branch. Let's detach it from
cg_kill_recursive() hence, and call it explicitly later
where appropriate.
When removing a cgroup, we always want to eliminate subcgroups
first, i.e. use cg_trim(). And cg_rmdir() (along with
CGROUP_REMOVE flag) is simply unused. Kill it.
tcp reset / icmp port-unreachable are markedly different conditions than
packet loss. It doesn't make much sense to retry in this case. It's
actually not clear if there is any benefit at all retrying tcp
connections, which were presumably already retried as necessary by the
tcp stack.
To make it work without sd-event.
Prompted by recent chat:
> Hey all!
> reading man libudev, it says to use sd-device instead now. I've read that
> APIs header file and it seems it no longer has an equivalent to libudev's
> udev_monitor_get_fd, which AFAICT means I have to use sd-event to watch
> for events I'm interested in. I know I can "embed" sd-event in other event
> loops I might already have, but that seems overkill when I'm only interested
> in this one type of event and don't need sd-event for anything else.
Previously, the main process of systemd-udevd managed worker processes
with their sd_device_monitor objects to save the destination address.
Let's save only the destination address, and drop the workers'
sd_device_monitor objects.
Previously, device_monitor_enable_receiving() did two things:
- update the filter,
- bind the socket.
But binding the socket can be done when the socket is opened.
Let's remove device_monitor_enable_receiving() and bind the socket in
device_monitor_new_full().
Arch and Tumbleweed do not do EOLs but are still stable, so clarify the paragraph.
Also break the entry in paragraphs, to make it more readable when rendered.
The point was made on https://lists.debian.org/debian-ctte/2024/08/msg00005.html
that 'pre-release' sounds like an RC, i.e. something that will change
only very slightly in the released version. But this is not necessarily the case,
for example at the beginning of a Fedora Rawhide or Debian Testing release cycle,
so change it to a more generic 'development'.
Follow-up for 7102dc52e6
This is for experimental builds of the OS made to test some specific WIP
feature.
For example, let's say the distro in question is Asahi Linux and Apple
just released the M3 SoC. The Asahi developers will start porting to the
M3, and will quickly generate builds of Asahi Linux that can technically
boot but aren't ready for any kind of daily use. These images are marked
as experimental, and can be shared among the developers. If a user
somehow stumbles upon one of these images and tries to install it,
they'll be warned that they're about to install an experimental Apple M3
port of Asahi Linux. Eventually, once the Asahi developers think that
their M3 port is ready for a wider audience, they can merge it into the
mainline Asahi repos, where it will be distributed through the usual
nightly CI builds (where RELEASE_TYPE=pre-release; M3 support is no
longer experimental).
This will allow GUIs to customize their behavior a little based on the
type of release.
For example, an OS installer may display a warning/disclaimer if
RELEASE_TYPE=prerelease. The software updates app might be a bit more
insistent about upgrading to the next major release if
RELEASE_TYPE=stable than if RELEASE_TYPE=lts
Fixes the following error:
===
In file included from ../src/basic/macro.h:13,
from ../src/basic/dirent-util.h:8,
from ../src/journal/journalctl-misc.c:3:
../src/journal/journalctl-misc.c: In function 'show_log_ids':
../src/journal/journalctl-misc.c:107:22: error: comparison is always true due to limited range of data type [-Werror=type-limits]
107 | assert(n_ids < INT64_MAX);
| ^
../src/fundamental/macro-fundamental.h:70:44: note: in definition of macro '_unlikely_'
70 | #define _unlikely_(x) (__builtin_expect(!!(x), 0))
| ^
../src/basic/macro.h:165:22: note: in expansion of macro 'assert_message_se'
165 | #define assert(expr) assert_message_se(expr, #expr)
| ^~~~~~~~~~~~~~~~~
../src/journal/journalctl-misc.c:107:9: note: in expansion of macro 'assert'
107 | assert(n_ids < INT64_MAX);
| ^~~~~~
cc1: all warnings being treated as errors
===
Follow-up for 0a8c1f6212.
The --list-invocations command is similar to --list-boots, but shows
invocation IDs of the specified unit. This should be useful when showing
a specific invocation of a unit.
The --invocation option is similar to --boot, but takes an invocation ID
or an offset. The -I option is equivalent to --invocation=0.
The struct itself is generic, and can be used for other IDs.
Let's rename it to a more generic one.
No functional change, just refactoring and preparation for later
commits.
History (c068650fcf,
941a12dcba) has proven
that we're not good at keeping socket and service states
in sync. Instead, let's query the high-level unit_active_state()
first, and only hardcode the two special auto-restart
service states.
Additionally, allow returning to listening state on SERVICE_CLEANING.
Let's use the newly added credentials to only enable autologin for
/dev/console (systemd-nspawn) and /dev/hvc0 (qemu) instead of enabling
autologin for every tty.
This allows for "per-instance" credentials for units. The use case
is best explained with an example. Currently all our getty units
have the following stanzas in their unit file:
"""
ImportCredential=agetty.*
ImportCredential=login.*
"""
This means that setting agetty.autologin=root as a system credential
will make every instance of our all our getty units autologin as the
root user. This prevents us from doing autologin on /dev/hvc0 while
still requiring manual login on all other ttys.
To solve the issue, we introduce support for renaming credentials with
ImportCredential=. This will allow us to add the following to e.g.
serial-getty@.service:
"""
ImportCredential=tty.serial.%I.agetty.*:agetty.
ImportCredential=tty.serial.%I.login.*:login.
"""
which for serial-getty@hvc0.service will make the service manager read
all credentials of the form "tty.serial.hvc0.agetty.xxx" and pass them
to the service in the form "agetty.xxx" (same goes for login). We can
apply the same to each of the getty units to allow setting agetty and
login credentials for individual ttys instead of globally.
Credentials are written to a temporary file and renamed to the
destination with renameat(), which replaces existing files, so EEXIST
should not happen; drop the handling for EEXIST.
All messages logged from exec_spawn() are attributed to the unit
and as such we should set the log level to the unit's max log level
for the duration of the function.
Remove an early return that prevents --prompt-root-password or
--prompt-root-shell combined with systemd.firstboot=off from using credentials. In that case,
arg_prompt_root_password and arg_prompt_root_shell will be false, but the
prompt helpers still need to be called to read the credentials. Furthermore, if
only the root shell has been set, don't overwrite the root password.
If /etc/passwd and/or /etc/shadow exist but don't have an existing root entry,
one needs to be added. Previously this only worked if the files didn't exist.
With ambient capabilities being dropped at the start of process managers
(both system and user) as well as systemd-executor it isn't necessary
to drop them here. Moreover, at this point also the inheritable set can
be preserved. This makes it possible to assign a user session manager
inheritable capabilities which combined with file capabilities (ei sets)
of service executables enable running user services with capabilities
but only when started by the manager.
This reverts commit 943800f4e7.
Since the commit 963b6b906e ("core: drop ambient capabilities in
user manager") systemd running as the session manager has dropped ambient
capabilities retaining other sets allowing user services to be started
with elevated capabilities. This, worked fine until the introduction of
sd-executor. For a non-root process to be started with elevated
capabilities by a non-root parent it either needs file capabilities or
ambient capabilities in the parent process. Thus, systemd needs to allow
sd-executor to inherit its ambient capabilities and sd-executor should
drop them as systemd did before.
The ambient set is managed for both system and session managers, but
with the default set for PID#1 being empty, this code does not affect
operation of PID#1.
Fixes: bb5232b6a3 ("core: add systemd-executor binary")
Although locked and empty passwords in /etc/passwd are treated the same, in all
other cases the entry is configured to read the password from /etc/shadow.
A PE image's memory footprint might be larger than its file size due
to uninitialized memory sections. Normally all PE headers should be
parsed to check the actual required size, but the legacy EFI handover
protocol is only used for x86 Linux bzImages, so we know only the last
section will require extra memory. Use SizeOfImage from the PE header
and if it is larger than the file size, allocate and zero extra memory
before using it.
Fixes https://github.com/systemd/systemd/issues/33816
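A hedged userspace sketch of the idea (offsets per the PE/COFF spec; the real fix lives in the EFI stub code and looks different):
```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Given the raw bytes of a PE file, work out how much memory the image needs
 * once loaded (SizeOfImage), which may exceed the file size because of
 * uninitialized (.bss-like) section data. Offsets per the PE/COFF spec:
 * e_lfanew at 0x3C, optional header at e_lfanew + 24, SizeOfImage at offset
 * 56 into the optional header (same for PE32 and PE32+). */
static void *load_with_slack(const uint8_t *file, size_t file_size, size_t *ret_size) {
        uint32_t pe_off, size_of_image;

        if (file_size < 0x40)
                return NULL;
        memcpy(&pe_off, file + 0x3C, sizeof(pe_off));
        if ((size_t) pe_off + 24 + 60 > file_size)
                return NULL;
        memcpy(&size_of_image, file + pe_off + 24 + 56, sizeof(size_of_image));

        size_t alloc_size = size_of_image > file_size ? size_of_image : file_size;

        /* Allocate the full in-memory footprint and zero the tail beyond the
         * file contents, so the last section's uninitialized part is safe to
         * touch. */
        uint8_t *image = calloc(1, alloc_size);
        if (!image)
                return NULL;
        memcpy(image, file, file_size);

        *ret_size = alloc_size;
        return image;
}
```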
Otherwise, when an interface gained its carrier, the interface may not
have matching .network file yet, then link_reconfigure_impl() returns
zero, and link_handle_bound_by_list() is skipped.
Fixes#33837.
This reverts commit ffef01acdd.
Similar to 2d393b1b6d ("network: IPv6 Compliance: Router Advertisement
Processing, Reachable Time [v6LC.2.2.15]"),
Extract from: https://www.ietf.org/rfc/rfc4861.html#section-4.2, p.21,
first paragraph:
The Router Lifetime applies only to
the router's usefulness as a default router; it
does not apply to information contained in other
message fields or options.
So it does not make sense to prevent DHCPv6 when Router Lifetime is 0.
Fixes#33357.
The original CVM detection logic for TDX assumes that the guest can see
the standard TDX CPUID leaf. This was true in Azure when this code was
originally written, however, current Azure now blocks that leaf in the
paravisor. Instead it is required to use the same Azure specific CPUID
leaf that is used for SEV-SNP detection, which reports the VM isolation
type.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
As all callers do not care if the address has a peer address.
This also drops the prefixlen argument, as it is always zero.
Fixes a bug introduced by 42f8b6a808.
Fixes#31950.
IPv4 addresses are managed with local and peer addresses and prefix
length. So, potentially, the same address with different prefix length
can be assigned on a link, e.g. 192.168.0.1/24 and 192.168.0.1/26.
If one of the addresses is configured with ACD but the other is not,
then previously ACD might be unexpectedly disabled or enabled on them,
as we managed ACD engines with only local addresses.
This makes ACD engines managed with the corresponding Address objects.
Even if a timespan is specified in IgnoreCarrierLoss= for an interface,
when the interface loses its carrier, bound interfaces might be brought
down immediately.
Let's also postpone bringing down bound interfaces by the specified
timespan.
This is similar to what we do for veth interfaces in remove_veth_links().
When a container rebooted, macvlan interfaces created by the previous
boot may still exist in the kernel, and that causes -EADDRINUSE after
reboot.
Hopefully fixes#680.
Similar to the implementation of cgroup.kill in the kernel, let's
skip kernel threads in cg_kill_items() as trying to kill kernel
threads as an unprivileged process will fail with EPERM and doesn't
do anything when running privileged.
On CentOS/Fedora, dracut is configured to write the initrd to
/boot/initramfs-$KERNEL_VERSION...img so let's check for that as well
if no initrds were supplied.
If we're running from within a container, we're very likely not going
to want to use the kernel command line from /proc/cmdline, so let's add
a check to see if we're running from a container to decide whether we'll
use the kernel command line from /proc/cmdline.
- Improve wording for explanation when these variables are inherited
- Clarify that these variables are not placed in the process environment block,
so /proc/PID/environ cannot be used as a debugging tool
The new file, modules.weakdep, generated by depmod to get the weak
dependencies information, can be present
(05828b4a6e),
so remove it like the other similar files.
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
This probably rarely helped anyway, but it also in some cases interferes
with auxiliary dnssec queries where the authoritative nameserver does
not support EDNS0/DNSSEC.
Fixes: ac6844460c ("resolved: support RFC 8914 EDE error codes")
It means: a) the user cannot be created, something's wrong in the
test environment -> fail the test; b) the user already exists, and we shall not
continue and then delete a (foreign) user.
TEST-46-HOMED fails on ext4 because the filesystem is deemed too small
for activation by cryptsetup. Let's bump the minimal filesystem size for
ext4 a bit to be in the same ballpark as btrfs, to avoid weird
errors due to impossibly small filesystems.
Also use U64_MB while we're touching this.
Currently inhibitors are bypassed unless an explicit request is made to
check for them, or even in that case when the requestor is root or the
same uid as the holder of the lock.
But in many cases this makes it impractical to rely on inhibitor locks.
For example, in Debian there are several convoluted and archaic
workarounds that divert systemctl/reboot to some hacky custom scripts
to try and enforce blocking accidental reboots, when it's not expected
that the requestor will remember to specify the command line option
to enable checking for active inhibitor locks.
Also in many cases one wants to ensure that locks taken by a user are
respected by actions initiated by that same user.
Change logind so that inhibitor checks are not skipped in these
cases, and systemctl so that locks are checked in order to show a
friendly error message rather than "permission denied".
Add new block-weak and delay-weak modes that keep the previous
behaviour unchanged.
Currently, IS_SYNTHETIC_ERRNO() evaluates to true for all negative errnos,
because of the two's-complement negative value representation.
Subsequently, ERRNO= is not logged for most of our own code.
Let's fix this, by formatting all synthetic errnos as positive.
Then, treat all negative values as non-synthetic.
While at it, mark the evaluation order explicitly, and remove
unneeded comment.
Fixes#33800
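One way to encode this, roughly matching the behavior described above (the exact marker bit and macro names in systemd's headers may differ):
```c
#include <stdlib.h>

/* Synthetic errnos are formatted as positive values carrying a marker bit,
 * so that plain negative errnos (e.g. -EINVAL, whose two's-complement
 * representation happens to have high bits set) never test as synthetic. */
#define SYNTHETIC_ERRNO_FLAG (1 << 30)

#define SYNTHETIC_ERRNO(v)    (abs(v) | SYNTHETIC_ERRNO_FLAG)
#define IS_SYNTHETIC_ERRNO(v) (((v) & SYNTHETIC_ERRNO_FLAG) && (v) >= 0)
#define ERRNO_VALUE(v)        (abs(v) & ~SYNTHETIC_ERRNO_FLAG)
```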
Since the copy helpers now copy file attributes as well, let's not
explicitly disable copy-on-write anymore when we copy an image. If
the source already has copy-on-write disabled, the copy will have it
disabled as well. Otherwise, the copy will also have copy-on-write
enabled.
This makes sure that reflinks always work as reflink is only supported
if both source and target are copy-on-write or both source and target
are not copy-on-write.
COW on btrfs generally does not play well lots of random writes so
let's make the disk images generated by repart NOCOW by default on
btrfs like we do elsewhere across the codebase.
On btrfs, reflinks into a disk image that has copy-on-write disabled
only work if the source has copy-on-write disabled as well so let's
make sure that's the case if the disk image has copy-on-write disabled.
openat() will always resolve symlinks, except if O_NOFOLLOW is passed
or O_CREAT|O_EXCL is passed. This means that if a dangling symlink is
passed to openat_report_new(), the first call to openat() will always
fail with ENOENT and the second call to openat() will always fail with
EEXIST.
Let's catch this case explicitly and fallback to creating the file with
just O_CREAT and assume we're the ones that created the file. We can't
resolve the symlink with chase() because this function is itself called
by chase() so we could end up in weird recursive calls if we'd try to do
so.
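The underlying openat() semantics can be reproduced in isolation (hypothetical file names, not the openat_report_new() code itself):
```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
        /* Create a dangling symlink: "link" points at a file that does not exist. */
        (void) unlink("link");
        (void) unlink("target");
        if (symlink("target", "link") < 0)
                return 1;

        /* A plain open follows the symlink and fails: the target is missing. */
        if (openat(AT_FDCWD, "link", O_RDWR) < 0)
                printf("O_RDWR:         %s\n", strerror(errno));        /* ENOENT */

        /* O_CREAT|O_EXCL refuses to operate on any symlink and fails, too. */
        if (openat(AT_FDCWD, "link", O_RDWR|O_CREAT|O_EXCL, 0644) < 0)
                printf("O_CREAT|O_EXCL: %s\n", strerror(errno));        /* EEXIST */

        /* Plain O_CREAT follows the dangling symlink and creates its target. */
        int fd = openat(AT_FDCWD, "link", O_RDWR|O_CREAT, 0644);
        if (fd >= 0) {
                printf("O_CREAT:        created the symlink's target\n");
                close(fd);
        }

        return 0;
}
```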
This adds support in `systemd-analyze capability` for decoding
capability masks (sets), e.g.:
```console
$ systemd-analyze capability --mask 0000000000003c00
NAME NUMBER
cap_net_bind_service 10
cap_net_broadcast 11
cap_net_admin 12
cap_net_raw 13
```
This is intended as a convenience tool for pretty-printing capability
values as found in e.g. `/proc/$PID/status`.
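The decoding itself is straightforward; a hedged sketch using libcap's cap_to_name() rather than systemd's internal capability tables:
```c
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/capability.h>   /* libcap; link with -lcap */

/* Decode a hex capability mask as found in the CapEff/CapPrm/CapBnd
 * fields of /proc/PID/status. */
int main(int argc, char *argv[]) {
        if (argc != 2)
                return 1;

        uint64_t mask = strtoull(argv[1], NULL, 16);

        for (int i = 0; i < 64; i++) {
                if (!(mask & (UINT64_C(1) << i)))
                        continue;

                char *name = cap_to_name(i);
                printf("%-24s %d\n", name ? name : "(unknown)", i);
                cap_free(name);
        }

        return 0;
}
```
Run against the mask from the example above (e.g. a hypothetical `./capdecode 3c00`), this prints the same four cap_net_* entries.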
* c7138e0b87 Configure default DNS servers for upstream CI builds
* bc5d1afe1e Drop out-of-tree localed patch and use D-Bus policy instead
* b5f8ababde autopkgtest: set Release= in mkosi.local.conf to distinguish testing vs unstable
* 323afafd80 autopkgtest: add allow-stderr to timedated test
* 0291f361e3 Install varlinkctl zsh completion file
* f40b9eba02 d/t/control: add Depends: lib{systemd,udev}-dev for upstream
* 3def595de3 d/t/upstream: ensure correct ubuntu codename is used
* 531bb6817e d/t/boot-and-services: fix a couple python syntax warnings
* 963ac13b7d d/t/boot-and-services: skip test_tmp_cleanup if tmp.mount is overridden
Follow-up for 6906c028e8
The mentioned commit uses access() to check if varlink socket
already exists in the filesystem, but that isn't sufficient.
> Varlink sockets are not serialized until v252, so upgrading from
> v251 or older means we will not listen anymore on the varlink sockets.
>
> See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1074789
> for more details as this was found when updating from Debian Bullseye to a new version.
After this commit, the setup of varlink_server is effectively
split into two steps. manager_varlink_init_system(), which is
called after deserialization, would no longer skip listening
even if Manager.varlink_server is in place, but actually
check if we're listening on desired sockets.
Then, manager_deserialize() can be switched back to using
manager_setup_varlink_server().
Alternative to #33817
Co-authored-by: Luca Boccassi <bluca@debian.org>
We do this already in all string lookup tables. This way
it's guaranteed that iterators which end at _NAMESPACE_TYPE_MAX
won't overrun the array.
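The pattern in question, with an illustrative enum (not the actual namespace table):
```c
/* Illustrative enum, not the actual one: sizing the table with the _MAX
 * terminator means any loop of the form
 *     for (NamespaceType t = 0; t < _NAMESPACE_TYPE_MAX; t++)
 * can never index past the end of the array. */
typedef enum NamespaceType {
        NAMESPACE_MOUNT,
        NAMESPACE_NET,
        NAMESPACE_PID,
        _NAMESPACE_TYPE_MAX,
} NamespaceType;

static const char * const namespace_type_table[_NAMESPACE_TYPE_MAX] = {
        [NAMESPACE_MOUNT] = "mnt",
        [NAMESPACE_NET]   = "net",
        [NAMESPACE_PID]   = "pid",
};
```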
This test doesn't check the generated JSON data in detail, it simply
tests that round-tripping an RR key through the JSON representation
preserves its data.
The page was written when systemd-repart was primarily intended to be used on a
running system. But nowadays it's more often used to create images, so extend
that part of the description.
While at it, fix some whitespace issues and trim some overly complicated sentences.
At least on systemd 252, a configuration of
```
MemorySwapMax=40%
```
is supported but this was missing from the man page.
Only MemoryMax was documented as supporting a %.
It's important if the binary specified by the init= boot option is not systemd;
otherwise it confuses systemctl, which incorrectly assumes that systemd is still
the init system due to the presence of /run/systemd/system.
Also, some tools might check the presence of /run/systemd/private to test
if systemd is running as pid1.
When building packages of arbitrary commits of systemd-stable,
distributors might want to include a git sha of the exact commit
they're on. Let's extend vcs-tag a little to make this possible.
If we're on a commit matching a tag, don't generate a git sha at all.
If we're not on a commit matching a tag, generate a vcs tag as usual.
However, if we're not in developer mode, don't append a '^' if the tree
is dirty, to accommodate package builds applying various patches to the
tree which shouldn't be considered as "dirty" edits.
Let's rename the tool to tools/fetch-distro. It's useful to be able to fetch
the distro directly. But when that functionality is added, the old name is
confusing.
Now --update/-u must be specified to update the commits.
--reference-if-able is used to speed up the clone of debian.
It saves about 75% of the download.
If there is an error with the execv call in fork_agent the
program exits without any meaningful log message. Log the
command and errno so the user gets more information about
the failure.
Fixes: #33418
Signed-off-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
These tests identify a couple of problems with OPT pseudo-RR parsing.
First, any TTL value with the high bit set is replaced with zero before
checking the record type. For most types this is correct, since TTLs
have the range of signed int32. But for OPT records where the TTL is
repurposed to hold the extended RCODE, EDNS version and flags, it means
that the high bit cannot be used in extended RCODEs. Any RCODE with the
high bit set will be read as zero.
Second, the DNS_PACKET_RCODE() function bit-shifts the extended RCODE by
24 places instead of 20, so that it ends up forming the lower 8 bits of
a 12-bit RCODE, instead of the upper 8 bits as intended.
We intend to fix these issues in other pull requests.
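For reference, RFC 6891 packs the OPT record's TTL field as extended RCODE (top 8 bits), version (next 8 bits) and flags (low 16 bits); a sketch of the intended extraction, contrasted with the off-by-4 shift described above (not the current code):
```c
#include <stdint.h>

/* The full 12-bit RCODE is the OPT record's extended RCODE (the top 8 bits
 * of the repurposed TTL) combined with the 4-bit RCODE from the packet
 * header. Shifting the TTL right by 20 (i.e. ">> 24, << 4") puts the
 * extended bits into positions 4..11; shifting by 24 instead wrongly makes
 * them the low 8 bits, overlapping the header RCODE. */
static uint16_t dns_full_rcode(uint32_t opt_ttl, uint8_t header_rcode) {
        return (uint16_t) (((opt_ttl >> 20) & 0xFF0) | (header_rcode & 0x0F));
}
```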
* 00babccdea Simplify BFQ scheduler enablement
* ef8ddb130b Rebuilt for https://fedoraproject.org/wiki/Fedora_41_Mass_Rebuild
* 5b4a5461d6 Fix changelog
* a8c5c736f6 Only apply shorter shutdown timer changes on Fedora
* f4e284cd7a Merge #150 `Deal with systemd-timesyncd backport in EPEL`
|\
| * 9378a0733a Deal with systemd-timesyncd backport in EPEL
* | 12d1f05029 Don't claim /sbin/installkernel if building for CentOS Stream 9
|/
* 79828f2753 spec: use "positive" conditions in conditionals
* c5d3af1638 Add build dependency on rsync on CentOS Stream 9
* 8d080fb5cb Backport udma buffer access patch
* 6084453807 Add support for building from a specific branch
* cb9d631ca0 Update PR patch metadata
* 3889da947e In standalone subpackages, suggest coreutils-single
* b7800e3e66 Drop versions from Conflicts for standalone packages
Add a test for the new bridge netlink attributes IFLA_BR_FDB_N_LEARNED and
IFLA_BR_FDB_MAX_LEARNED.
Signed-off-by: Gregor Herburger <gregor.herburger@ew.tq-group.com>
Since Linux commit ddd1ad68826d ("net: bridge: Add netlink knobs for number
/ max learned FDB entries") [1] it is possible to limit the number of
dynamically learned fdb entries per bridge.
Add support to the systemd networkctl for the netlink bridge attributes
IFLA_BR_FDB_MAX_LEARNED and IFLA_BR_FDB_N_LEARNED.
[1] https://lore.kernel.org/all/20231016-fdb_limit-v5-0-32cddff87758@avm.de/
Signed-off-by: Gregor Herburger <gregor.herburger@ew.tq-group.com>
Since Linux commit ddd1ad68826d ("net: bridge: Add netlink knobs for number
/ max learned FDB entries") [1] it is possible to limit the number of
dynamically learned fdb entries per bridge.
Add support to the systemd netdev bridge for the new netlink attribute
IFLA_BR_FDB_MAX_LEARNED.
[1] https://lore.kernel.org/all/20231016-fdb_limit-v5-0-32cddff87758@avm.de/
Signed-off-by: Gregor Herburger <gregor.herburger@ew.tq-group.com>
These are now practically identical, with the only differences between
the two having no effect on the rpm builds we do with mkosi, so let's
cut out the middle man and just use the Fedora Rawhide spec for CentOS
as well.
Previously I thought it would make sense to allow running the build
scripts from within the VM/container to rebuild the packages. Instead
we ended up making it possible to rerun mkosi outside of the container/VM
to rebuild the packages, so let's switch back to $PKG_SUBDIR to tell the
build scripts where to look for the packaging sources.
When credentials are used with Type=simple + ExecStartPost=,
i.e. when multiple sd-executor instances are running in parallel
for a single service, the state of the final credential dir
might be unexpected wrt path_is_mount_point() and other
steps. So, let's imply Type=exec if not explicitly specified,
and emit a warning otherwise.
This commit makes systemd-cryptsetup exit with a successful status when
the volume gets unlocked outside of the current systemd-cryptsetup
process while it was executing. This can be easily reproduced by calling
systemd-cryptsetup, and while it waits for user to input a password/PIN,
unlock the volume in a second terminal. Then after entering the password
systemd-cryptsetup will exit with a non-zero status code.
pci_get_hotplug_slot() has the following limitations:
- if slots are not hotpluggable, they are not in /sys/bus/pci/slots.
- the address at /sys/bus/pci/slots/X/addr doesn't contain the function part,
so on some systems, 2 different slots with different _SUN end up with the same
hotplug_slot, leading to naming conflicts.
- it tries all parent devices until it finds a slot number, which is incorrect,
and what led to NAMING_BRIDGE_MULTIFUNCTION_SLOT being disabled.
The use of PCI hotplug to find the slot (ACPI _SUN) was introduced in
0035597a30
"udev: net_id - export PCI hotplug slot names" on 2012/11/26.
At the same time on the kernel side we got
bb74ac23b1
"ACPI: create _SUN sysfs file" on 2012/11/16.
Using PCI hotplug was the only way at the time, but now 12 years later we can use
firmware_node/sun sysfs file.
Looking at a small selection of server HW, for HPE (Gen10 DL325), the _SUN is attached
to the NIC device, whereas for Dell (R640/R6515/R6615) and Cisco (UCSC-C220-M5SX),
the _SUN is on the first parent pcieport.
We still fallback to pci_get_hotplug_slot() to handle the s390 case and
maybe some other corner cases (_SUN on a grandparent device that is not a
bridge?).
Currently, we have a bunch of Type=oneshot + RemainAfterExit=yes
services that make use of credentials. When those exit, the cred mounts
remain established, which is pointless and quite annoying. Let's
instead destroy the runtime data on SERVICE_EXITED, if no process
will be spawned for the unit again.
Previously, a warning regarding the dangers of setting the RTC to the local TZ was only shown when running plain `timedatectl`.
Now a similar warning is also shown for `set-local-rtc 1`.
If building with clang and clang does not support bpf, then enabling
-Dbpf-framework=enabled would silently drop the feature (even printing
bpf-framework: enabled in the meson build recap, and no message anywhere
that'd hint at the failure!)
This is unexpected, so add a check to fail hard in this case.
All other code paths (gcc, missing bpftool) properly check for the
option, but it is not as easy for a custom command, so check explicitly.
The current behavior is actually OK, since use_ex_prop = !arg_expand_environment,
but that's very implicit and using STRV_MAKE() this way feels icky.
Let's make this more readable, by using exec_command_flags_to_strv().
Correct redundant or mismatched tags and fill the argument field of
curcontext, because _regex_words does not do that for us.
The _complete_help text now looks much more reasonable most of the time:
$ varlinkctl call /run/systemd/resolve/io.systemd.Resolve ^Xh
tags in context :completion::complete:varlinkctl::
argument-rest (_arguments _varlinkctl)
tags in context :completion::complete:varlinkctl-call:method:
varlink-methods (_varlinkctl_cmd _varlinkctl_command _arguments _varlinkctl)
Fixes: af63b4b769 ("zsh: add varlinkctl completions")
We've been getting some integration test failures due to timeouts
on finding the root partition device. Let's bump the default device
timeout a little to see if it mitigates these failures.
In PID 1 we write status information to /dev/console regularly, but we
cannot keep it open continously, due to the kernel's SAK logic (which
would kill PID 1 if user hits SAK). But closing/reopening it all the
time really sucks for tty types that have no window size management
(such as serial terminals/hvc0 and suchlike), because it also means the
TTY is fully closed most of the time, and that resets the window sizes
to 0/0.
Now, we reinitialize the window size on every reopen, but that is a bit
expensive for simple status output. Hence, cache the window size in the
usual $COLUMNS/$ROWS environment variables. We don't inherit these to
our payloads anyway, hence these are free to us to use.
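A rough sketch of the caching idea, with hypothetical helper names (PID 1's real code goes through its own environment handling):
```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <termios.h>
#include <unistd.h>

/* After reopening /dev/console, restore the window size we cached in
 * $COLUMNS/$ROWS instead of reprobing/reinitializing it from scratch. */
static void console_restore_size(int fd) {
        const char *c = getenv("COLUMNS"), *r = getenv("ROWS");
        struct winsize ws = {};

        if (c && r) {
                ws.ws_col = (unsigned short) strtoul(c, NULL, 10);
                ws.ws_row = (unsigned short) strtoul(r, NULL, 10);
                if (ws.ws_col > 0 && ws.ws_row > 0) {
                        (void) ioctl(fd, TIOCSWINSZ, &ws);
                        return;
                }
        }

        /* Nothing cached yet: read the current size and remember it. */
        if (ioctl(fd, TIOCGWINSZ, &ws) >= 0 && ws.ws_col > 0 && ws.ws_row > 0) {
                char buf[16];
                snprintf(buf, sizeof(buf), "%u", ws.ws_col);
                setenv("COLUMNS", buf, 1);
                snprintf(buf, sizeof(buf), "%u", ws.ws_row);
                setenv("ROWS", buf, 1);
        }
}
```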
Various tty types come up with cols/rows not initialized (i.e. set to
zero). Let's detect these cases, and return a better error than EIO,
simply to make things easier to debug.
/dev/console is sometimes a symlink in container managers. Let's handle
that correctly, and resolve the symlink, and not consider the data from
/sys/ in that case.
It doesn't really make sense to have that in dev-setup.c, which is
mostly about setting up /dev/, creating device nodes and stuff.
Let's move it to the other stuff that deals with /dev/console's
peculiarities.
Let's always rely on our own TTY reset logic and tty disallocation/clear
screen logic, thus always pass --noclear and --noreset.
Also, bring the list of baud rates to try into sync for console-getty
and serial-getty (the former might or might not be connected to rs232,
we can't know, hence assume the worst, and copy what
serial-getty@.service does)
It's a bit confusing, but we actually initialize the terminal twice for
each service, potentially. One earlier time, where we might end up
firing vhangup() and vt_disallocate(), which is a pretty brutal way to
reset things, by disconnecting and possibly invalidating the tty
completely. When we do this we do not keep any fd open afterwards, since
it quite likely points to a dead connection of a tty.
The 2nd time we initialize things when we actually want to use it.
The first initialization is hence "destructive" (killing any left-overs
from previous uses) the 2nd one "constructive" (preparing things for our
new use), if you so will.
Let's document this distinction in comments, and let's also move both
initializations to exec_invoke(), so that they are easier to see in their
symmetric behaviour. Moreover, let's run the tty initialization after we
opened both input and output, since we need both for doing the fancy
dimension auto init stuff now.
Oh, and of course, one thing to mention: we nowadays initialize
terminals both with ioctl() and with ansi sequences. But the latter
means we need an fd that is open for *write* (since we are *writing*
those ansi sequences to the tty). Hence, resetting via the input fd is
conceptually wrong (it worked so far only because we had O_RDWR open mode
selected).
Let's make sure to first issue the non-destructive operations, then
issue the hangup (for which we need the fd), then try to disallocate the
device (for which we don't need it anymore).
And while we are at it, merge exec_context_determine_tty_size() +
exec_context_apply_tty_size().
Let's simplify things, and merge the two funcs, since the latter just
does one more call.
At the same time, let's make sure we actually allow passing separate
input/output fds.
We nowadays reset TTYs by writing ANSI sequences to them. This can only
work if we operate on an *output* fd, not an input fd. Hence switch
various cases where we erroneously used an input fd to use an output fd
instead.
Numerous fixes:
1. use vtnr_from_tty() to parse out VT number from tty path
2. open tty for write only when we want to output just ansi sequences
3. open tty in asynchronous mode, and apply a timeout, just to be safe
4. propagate error from writing (most callers ignore it anyway, might as
well pass it along correctly)
This is a lot of stuff, and sometimes quite wild, let's turn this into
its own header.
All stuff color-related that just generates sequences is now in
ansi-color.h (no .c file!), and everything more complex that
probes/interacts with terminals remains in terminal-util.[ch]
Let's update the commentary a bit. Also, use a time-out of 100ms rather
than 50ms for this, simply to unify on the same value used in
vt_disallocate() in a similar case.
Let's put "terminal_" as prefix, like with the other reset calls, and
let's make clear that this only encapsulates the ioctl-based reset
logic, not the ANSI sequence based reset logic.
ESC c is a (vaguely defined) "reset to initial state" ANSI sequence.
Many terminals clear the screen in this case, but that's a bit drastic I
think for most resets.
ESC c was added to the reset logic in
00bc83a275 (i.e. very recently), and I
don't think the effect was clear at that time.
Let's keep the ESC c in place however when we actually want to clear the
screen. Hence move it from reset_terminal_fd() into vt_disallocate().
Fixes: #33689
When we are talking to a serial terminal, quite commonly the dimensions
are not set properly, because the serial protocol has no handshake or
similar mechanism to transfer this information.
However, we can derive the dimensions via ANSI sequences too, which
should get us the right information, since ANSI sequences are
interpreted by the final terminal, rather than an intermediary local tty
driver (which is where TIOCGWINSZ is interpreted).
This adds a helper call that gets the dimensions this way.
Let's prefix these functions with the subsystem name, and clean them up
a bit. Specifically, drop the error logging, it's entirely duplicative,
since every single caller does it anyway.
Let's add an extra safety check: before issuing the ansi sequence to
query the bg color, let's make sure input and output fd actually
reference the same tty. because otherwise it's unlikely we'll be able to
read back the response from the tty driver.
This is mostly just paranoia.
If we only read partial information from the tty we ended up parsing it
again and again, confusing the state machine. Hence, return how much
data we actually processed and drop it from the buffer.
The tap network device should be called "vt-", so that the
80-vm-vt.network file we ship by default actually matches against it.
Also, turn off any qemu callout stuff, networkd is smart enough to
handle all this on its own, without ugly callouts.
Make the warning for oneshot services (where RuntimeMaxSec= has no
effect) more actionable by pointing to the directive people can use
instead to effectively limit their runtime.
polkitd by default just waves through requests from a root process.
A new POLKIT_CHECK_AUTHORIZATION_FLAGS_ALWAYS_CHECK flag was added
to main (will be part of v125 when it ships) that forces it to go
through the policy checks for root too. Previous versions will just
ignore it.
Change the flags handling slightly so that we pass this or the
interactive flags through, as the values match what polkit expects.
When we patch in a bg color we must make sure that when certain "reset"
sequences are transferred we fix up the bg color again.
Do so for \033[!p ("soft terminal reset") and \033c ("reset to initial
state" aka "full reset").
They might not be readable to the unprivileged user running the tests
and it shouldn't really matter what is used. OTOH, we need a real kernel
because we look at the header.
CentOS Stream 10 has a newer util-linux which means the terminal
gets correctly resized to the size specified by mkosi. This is a
much nicer experience than CentOS Stream 9 where you're stuck on
80x24 so let's make CentOS Stream 10 the default release to build.
Let's document in detail how to build the integration test image and run
the integration tests without building systemd. To streamline the process,
we stop automatically using binaries from build/ when invoking mkosi directly
and don't automatically use a tools tree anymore if systemd on the host is too
old. Instead, we document these options in HACKING.md and change the mkosi meson
target to automatically use the current build directory as an extra binary search
path for mkosi.
This is a common case, and nothing noteworthy at all. For example, if we
establish an enumerator for listing all devices tagged by some tag, then
the per-tag dir is not going to exist if there are currently no devices
tagged that way, but that's a really common case, and doesn't really
deserve any mention, not even at debug level.
It has bugged me for a while that we show the exact same welcome message
at boot twice: once in the initrd, and once after the initrd→host
transition. That's very confusing.
Let's change the text a bit, and tone down the initrd message a bit (by
removing the empty line before and after it), because it is the less
relevant one.
In previous commits, we've changed the JSON name mangling logic. This,
of course, will cause breaking changes to occur on anything that relied
on the JSON mangling logic.
This commit fixes those breaking changes by manually forcing the JSON
name back to what it was before.
First, when displaying JSON we convert dashes into underscores. We want
to avoid using dashes in JSON field names in new code, because some
JSON parsers don't support dashes very well.
Second, we make the first character of every word lower-case. This
better matches our JSON field name style, and makes the automatic
JSON name mangling a lot more useful for vertical tables, where fields
are given a display name. For example, "Foo Bar" would be converted into
"foo_bar" instead of "Foo_Bar", which much better matches our style.
We don't make the whole string lowercase to support cases like:
"fooBar" should stay as "fooBar".
Some situations don't behave quite perfectly, such as "Foo BarBaz"
being converted into "foo_barBaz", or all-caps headings being mangled
incorrectly. In these situations, the JSON field should be overridden
manually. In most cases, or at least more cases than before, this
heuristic does good enough.
Lets you conveniently set JSON field names in table_add_many. Especially
useful for vertical tables. For example:
table_add_many(t,
               TABLE_FIELD, "Display Name",
               TABLE_STRING, obj->display_name,
               TABLE_SET_JSON_FIELD_NAME, "displayName",
               TABLE_FIELD, "Timestamp",
               TABLE_TIMESTAMP, obj->timestamp,
               TABLE_SET_JSON_FIELD_NAME, "timestampUSec");
We already have selinux=0 in the default kernel command line so
enforcing=0 is redundant. Instead, pass in enforcing=0 when we
enable selinux in TEST-06-SELINUX.
Previously, unit_freezer_new_freeze() would only return a
UnitFreezer object if FreezeUnit() succeeds. This is not
ideal though, as a failed bus call doesn't mean the action
actually failed. E.g. a timeout might occur because pid1
is waiting for a cgroup event from the kernel, while the bus
call timeout was exceeded (#33269). In such a case, ThawUnit()
will never be called, resulting in frozen units remaining
frozen after resuming from sleep.
Therefore, let's get rid of unit_freezer_new_freeze(),
and make sure as long as unit freezer is involved, we'll
call ThawUnit() when we're done. This should make things
a lot more robust.
As per DPS the UUID for /var/ should be keyed by the local machine-id,
which is non-trivial to do in a script. Enhance 'systemd-id128' to
take 'var-partition-uuid' as a verb, and if so perform the
calculation.
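For example (a hypothetical invocation based on the description above; the
output is omitted since the resulting UUID depends on the local machine-id):
$ systemd-id128 var-partition-uuid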
In some cases userspace may need to create dma-buffers from userspace;
one such example is the software ISP part of libcamera, which needs to
allocate dma-buffers for the output of the software ISP.
At first the plan was to allow console users access to /dev/dma_heap/*,
this was discussed with various kernel folks here:
https://lore.kernel.org/all/bb372250-e8b8-4458-bc99-dd8365b06991@redhat.com/
Giving console users access to the dma_heap's was deemed a bad idea
because memory allocated this way is not accounted in cgroup limits.
Giving access to /dev/udmabuf OTOH was deemed acceptable so that
is what this patch adds.
Resolves: #32662
Prompted by #33737
The intention of b37e8184a5
is to expose sd_id128_get_app_specific() on the command line.
But combining that with the GPT type list makes little sense.
Same as the other aliases. Allows chaining commands like:
$ systemd-id128 show -P root-$(dpkg-architecture --query DEB_HOST_ARCH)
4f68bce3e8cd4db196e7fbcaf984b709
Let's make things a little more consistent and build the initrd
explicitly as a subimage as well instead of relying on mkosi building
it as part of the main image build.
We drop the opensuse initrd postinst script as we don't use erofs by
default anymore. We can always reintroduce it again later if needed.
Our usual rule is that we are more lenient towards misuse for public
users of our code than for ourselves. Or in other words: when validating
parameters of our public functions (those starting with sd_…) we prefer
assert_ret() over assert().
In C23 we can explicitly choose the integer type for an enum. Let's do
so to make our requirement for 64bit integers explicit. Previously,
we'd rely on a GNU extension that would size the enum to 64bit if at
least one value outside the 32bit range is in the enum. Let's keep that
too, for compat with older compilers.
(Also, add the support for older compilers to the definition of
sd_json_dispatch_flags_t, where it was forgotten so far)
It's time. sd-json was already done earlier in this cycle, let's now
make sd-varlink public too.
This is mostly just a search/replace job of epical proportions.
I left some functions internal (mostly IDL handling), and I turned some
static inline calls into regular calls.
Let's make sure we don't load libnss_systemd.so from bash as the
necessary environment variables aren't set to make that work when
we're running with sanitizers enabled.
We can't add a sanitizer wrapper for bash as the wrapper runs using
bash so you end up in a loop.
We use -fdebug-prefix-map= because debugedit doesn't work for us (for
a currently unknown reason since it's the most obtuse code I've ever
had the pleasure of reading). With all the unique macros enabled, the
destination directory we pass to -fdebug-prefix-map= includes the package
release. The release is either the timestamp of the current commit or
the current time if the working tree is dirty. This means it generally
changes every time we rerun the build script. However, meson only reads
compiler arguments the first time it is invoked or if --wipe is specified.
This means that on a rerun -fdebug-prefix-map= will be configured wrong
and the build will fail.
Let's prevent this from happening by disabling the unique debug source
names by overriding the --unique-debug-src-base option that is passed to
find-debuginfo.sh by rpm via the _find_debuginfo_opts macro.
We switch to the c10s-sig-hyperscale branch of the spec repository
as it will receive all the latest changes the earliest before they
end up in the c9s-sig-hyperscale branch.
This allows us to add CI for CentOS Stream 10 as EPEL 10 doesn't
exist yet and won't exist for quite some time.
CentOS Stream 10 will be enabled later as soon as
https://issues.redhat.com/browse/RHEL-46604 is resolved.
We want the exitrd image to be built with the latest systemd as well.
As the exitrd image is built as part of mkosi.images, and all subimages
are built before the main image, this implies the packages must be built
as a subimage in mkosi.images/ as well. So we introduce the build image and
move all logic related to building distribution packages there.
This also has the nice side effect of slimming down the main image as the
build dependencies are not installed into the main image anymore. It also
makes sure the packages are built in a "clean" chroot without any of the
other packages which we install in the main image available.
* a3524fc837 Use a more precise Recommends for libxkbcommon
* 980ede8c0f Drop machined revert
* d569018a92 Rebuilt for the bin-sbin merge
* 8881fa94ee Version 256.2
* 1cc4f83002 Link systemd-executor statically
* 0319e62d9c Update dracut workaround
* c96f54de22 Fix ELN build
* 3f68c5d802 Only exclude dracut conflicts on non-fedora on upstream builds
* 7db154308b Conditionalize dracut Conflicts more
This reverts commit 1bd5db86f5.
The `kxcjk-1013` driver in Linux will parse the rotation matrix
from ACPI. This quirk is not specific enough to exist without
causing issues on different variations.
Signed-off-by: Sean Rhodes <sean@starlabs.systems>
This is an analog of x-systemd.requires that adds a Wants dependency
instead. This is useful for filesystems that support mounting in
degraded states (such as multi-device filesystems).
When boot counting is enabled, adding a new loader entry or UKI can conflict
with an existing one that has booted successfully and therefore has its boot
counter removed. systemd-bless-boot will fail to bless the new successful boot,
since a file without a boot counter already exists. Since kernel-install will
clobber existing files without boot counting, we should therefore remove files
without a boot count as well, when we add a file with one.
Fixes: #33504
Prompted by #33650
Previously, if a user manually starts user@.service (which is
something we support), we'd track it as 'manager' session.
However, since user_get_state() ignores all non-pinning sessions,
if lingering is not enabled, the user state would always be
reported as 'closing', which is spurious.
Let's instead take gc_mode into consideration, and ignore
non-pinning sessions only if USER_GC_BY_PIN.
Follow-up for 19a44dfe45
If a drop-in is set from an upper level, e.g. global unit_type.d/,
even if a unit is masked, its dropin_paths would still be partially
populated. However, unit_need_daemon_reload() would always
compare u->dropin_paths with empty strv in case of masked units,
resulting in it always returning true. Instead, let's ignore
dropins entirely here.
Fixes #33672
Makes it possible to specify URLs to a changelog and an appstream
catalog XML in the sysupdate.d/*.conf files. This will be passed along
to the clients of systemd-sysupdated, which can then present this data.
This prevents sysupdate from going out to the network to enumerate
available instances. When combined with the list command, this lets us
query installed instances
We set up a NOTIFY_SOCKET to get download progress notifications from
each individual import helper. Along with the number of import jobs we
have to run, this gives an overall progress value which we report using
sd_notify
We use this at various places, let's unify this in one global constant.
This changes flags in crash-handler.c in a tiny irrelevant way: we ask
syscalls to be continued on signal arrival, which we previously didn't.
But that shouldn't change anything, the only thing we'll do in the
relevant process is call raise(), and that's it, hence there definitely
are no syscalls to restart or not to restart.
Stream sockets are stream sockets, i.e. they won't give us the full data
right away; we must buffer locally and read until we hit EOF. Hence do
so.
Moreover, make sure to close the fd once we are done, otherwise the
sender might block on us.
It's usually how we do this: make the functions robust to be called in
any context, and validate the context in the functions themselves early,
instead of in the caller.
We'll *always* hit ENOENT when iterating through SMBIOS type #11
fields, on the last one. It's very confusing to debug log about that,
so let's just not do it.
* 8c025c3bdf Accepting request 1184267 from Base:System
|\
| * 735f8c4ba4 - Import commit 5a8eadd0c021758337a020c423f25a353bdb9b3c (merge of v255.8) For a complete list of changes, visit: 603cd1d4d8...5a8eadd0c0 - Drop 5003-Revert-run-pass-the-pty-slave-fd-to-transient-servic.patch as v255.8 contains the workaround (commit 639c922ede9485) for the broken commit 28459ba1f4.
* | 37853fecc3 Accepting request 1183029 from Base:System
|/
* 638de11012 - Don't automatically clean unmodified config files up (bsc#1226415)
* 369c023c24 reorder one more time...
* ffa9f0ac80 reorder the runtime deps of the testsuite package so the format_spec_file thingy stop screwing up the spec file...
* 12c1190a79 fix rev 1529: the devel packages are really needed by the testsuite script to install the dlopened libs into the image
* ca8e7f54ce - systemd.spec: move a misplaced %endif in the testsuite sub-package.
* b7944f5b14 - Merge systemd-coredump back into the main package (bsc#1091684)
* 3fa0dea84a - Don't pull the devel packages in when installing the testsuite package.
- Stop installing the policy in the initramfs as it's not really
supported anyway (https://github.com/fedora-selinux/selinux-policy/issues/2221)
- Stop relabeling on first boot and prefer to do it at image build time
- Disable mkosi relabeling by default but enable it in CI
- Build image as root in CI so the SELinux relabeling works properly
The "systemd-mount" tool is the one outlier in our codebase to specify
upper case column names. And it's quite pointless given that our table
output logic uppercases this anyway on output. Hence, let's fix that.
(This would be a compat break, if we'd support JSON output of this
table, but we do not currently. JSON fields use the literal column
name after all.)
The mode switch from any to pin is currently done in create_session().
However, if no (pinning) session is created before (or after) linger
is disabled, the user will not be gc'd after that. Therefore, also
perform the mode switch when linger is being disabled.
- Breaks AYANEO AIR family into different entries as not all are mounted the same.
- Corrects AYANEO AIR mount matrix.
- Adds mount matrices for AYANEO device families: 2021, AYANEO 2, AYANEO GEEK, and AYANEO FLIP
- Adds mount matrix for GPD WinMax2
- Adds mount matrix for OrangePi NEO
In https://github.com/systemd/systemd/pull/33659 the commit was
updated to point to my fork without changing it back after the mkosi
PR was merged so let's change it back to point to the official
repository.
In https://github.com/systemd/mkosi/pull/2847, the '@' specifier is
removed, CLI arguments take priority over configuration files again
and the "main" image is defined at the top level instead of in
mkosi.images/. Additionally, not every setting from the top level
configuration is inherited by the images in mkosi.images/ anymore,
only settings which make sense to be inherited are inherited.
This commit gets rid of all the usages of '@', moves the "main" image
configuration from mkosi.images/system to the top level and gets rid
of various hacks we had in place to deal with quirks of the old
configuration parsing logic.
We also remove usages of Images= and --append as these options are
removed by the mentioned PR.
All mips variants of qemu-system default to malta.
Signed-off-by: Henry Chen <henry.chen@oss.cipunited.com>
Signed-off-by: Henry Chen <chenx97@aosc.io>
This update has been tested on the 2023 Chuwi Freebook N100. The hwdb entry has been verified using these commands:
cat /sys/`udevadm info -q path -n /dev/iio:device0`/../modalias
acpi:MDA6655:MDA6655:
cat /sys/class/dmi/id/modalias
dmi:bvnAmericanMegatrendsInternational,LLC.:bvrDNN20AV1.03:bd12/29/2023:br1.3:efr0.7:svnCHUWIInnovationAndTechnology(ShenZhen)co.,Ltd:pnFreeBook:pvrDefaultstring:rvnDefaultstring:rnDefaultstring:rvrDefaultstring:cvnDefaultstring:ct10:cvrDefaultstring:skuDefaultstring:
The correct offset orientation has been tested with:
monitor-sensor
Waiting for iio-sensor-proxy to appear
+++ iio-sensor-proxy appeared
=== Has accelerometer (orientation: normal)
=== No ambient light sensor
=== No proximity sensor
When watching a given pathspec, systemd unconditionally installs
IN_ATTRIB watches to track the link count of the resolved file. This
way, we are notified if the watched path disappears, even if the
resolved file inode is not removed.
Similarly, systemd installs inotify watches on each parent directory, to
be notified when the specified path appears. However, for these watches
IN_ATTRIB is an unnecessary addition to the mask. In inotify, IN_ATTRIB
on a directory is emitted whenever the attributes of any child changes,
which, for many paths, has the potential to cause a high number of
spurious wakeups in systemd. Let's remove IN_ATTRIB from the mask when
installing watches on the parent directories of the specified path.
22 characters in three columns + overhead slightly exceeds the available
width on terminals with 80 columns, causing each row to wrap to two lines.
Reduce the item width to 20 to fit even the list of ~600 timezones.
Let's not insist on btrfs everywhere. 93440db8b5
switched us back to btrfs as we wanted to rely on the fact it records
timestamps properly. Since we now prefer to do incremental builds on the host
with "mkosi -t none" we don't mind anymore that timestamps are not recorded
properly so we're not forced to use btrfs anymore.
This also increases test coverage as we'll now test with different root
filesystems.
* Remove extra period at end of unit description.
Having an extra period at the end of this unit description makes log entries pertaining to it appear weirdly, as it seems the default expectation is that there is not to be a period at the end of a unit description.
e.g.: `systemd[1]: Started Displays emergency message in full screen..`
I don't know why yet, but TEST-73-LOCALE can take more than 10
minutes. Until we figure out why, let's give it a higher priority
so it doesn't bottleneck the test run.
Otherwise fixfiles will try to relabel it which could potentially
lead to disaster. We also change the recommendation in HACKING.md
to set the default so that TEST-06-SELINUX can override it.
This ensures that even in case the distro repository has newer
versions, the locally built packages are preferred and installed,
even to the point of downgrading already installed ones.
This is needed especially for future stable branches, when the
distros will have a newer version.
* stub: mem fixes in devicetree addon handling
Two bugs here: The elements are of size `DevicetreeAddon`, not `size_t`,
and `[]` binds stronger than `*`. This means the first element is ok,
but the second corrupts the stack.
Found this while refactoring #32463
An SMBIOS object with no variable part is a special case, it's just
suffixed with two NUL bytes. Handle that properly.
This is inspired by a similar fix from https://github.com/systemd/systemd/pull/29726
Follow up for 8b3b01c4b7
We switch to PROJECT_VERSION instead of PROJECT_VERSION_FULL where
we report our version and which is likely being parsed to avoid
breaking compat. If we didn't, the output would change from systemd
255 to systemd 255.1 which could break various tools.
If the io.systemd.DynamicUser or io.systemd.Machine files exist,
but nothing is listening on them, the nss-systemd module returns
ECONNREFUSED and systemd-sysusers fails to create the user/group.
This is problematic when run by packaging scripts, as the package
assumes that after this has run, the user/group exist and can
be used. adduser does not fail in the same situation.
Change sysusers to print a loud warning but otherwise continue
when NSS returns an error.
The XDG base dir spec adopted ~/.local/state/ as a thing a while back,
and we updated our docs in b4d6bc63e6, but
forgot to update the table at the bottom to fully reflect the update.
Fix that.
This file doesn't document features of systemd, but is more of a
general description that generalizes/modernizes the FHS. As such, the items
listed in it weren't "added" in systemd versions, they simply reflect
general concepts independent of any specific systemd version. Hence
let's drop this misleading and confusing version info.
Or in other words, the man page currently claims under "/usr/": "Added
in version 215." – Which of course is rubbish, the directory existed
since time began.
This also rebreaks all paragraphs this touches.
No content changes.
The previous commit tries to extract a substring from the
extension-release suffix, but that is not right: it's only the
images that need to be versioned and extracted, so use the
extension-release suffix as-is. Otherwise, if it happens to contain a
prefix that matches the wrong image, it will be taken into account.
Follow-up for 37543971af
The GIT_VERSION is changed to use VERSION_TAG, but in case of a cross build
for src/boot/efi, it's not set, causing a build error because the compiler cannot
know it's a macro and thus treats it as some variable and errors out.
Signed-off-by: Chen Qi <Qi.Chen@windriver.com>
For MountImages=, if the source is a block device, it will most likely reside
in /dev. It should be also possible to mount a static device file system in
place of (or part of) /dev. So let's allow paths starting with /dev as an
exception for MountImages=.
Let's mention the new way to install the latest changes without
rebuilding the image. Let's also remove the duplicate info about
distribution packages that is already mentioned in its own section.
These are not actually needed or installed, so delete them from the
build directory, so that inside an image one can do:
apt install --reinstall /work/build/*.deb
Follow-up for 690a85b1d4
The new link-executor-shared option is similar to the existing
link-udev-shared: when set to false, we link to the static versions of our
internal libraries.
The resulting executor binary is fairly large, about as large as libsystemd-core
(14 MB without lto, 8 with lto).
This is intended as a workaround for the fuckup with the pinned executor
binary:
when an upgrade is performed, the package manager will install new version of
the libraries and new version of the code, and some time later reexecute the
managers. This creates a window when the pinned executor binary will fail to
execute. There are two factors which make the issue easier to hit:
- when the distribution uses a finely-grained shared-lib-tag. E.g. Fedora
uses version-release as the tag, which means that the issue occurs on
every package upgrade. This is the right thing to do, because the
ABI of our internal libraries is not stable at all, so replacing the
library from a different version in place creates a window where our
programs may crash or misbehave.
- when the distribution doesn't immediately reexec all the managers after
upgrade. In early versions of systemd, we used to hammer the machine during
upgrade, doing daemon-reexecs repeatedly. This works, but is ugly and
wasteful. Doing the reexecs while the upgrade is in progress also creates a
window where a mix of old and new configs is loaded. Users are
particularly annoyed by those reloads if there is some issue in the
configuration causing us to emit warnings on every reexec. Doing the
reexecs once after the new configuration and libraries have been put
in place is nicer.
The pinning of the executor binary breaks upgrades and in particular
it penalizes the distributions which make use of the features which
were previously added to avoid bugs and inefficiency during upgrades.
When the executor is linked statically, there is a smaller chance that it'll
fail to load libraries. The issue can still occur because other libraries, not
our own, are linked dynamically.
By itself, this is not useful. I'm making this a separate commit to
make debugging easier. It turns out that meson does static libraries
using references, so the "static library" is a tiny stub that refers
to the object files on disk, and this has negligible cost:
$ ls -lhd build/src/core/libsystemd-core-257.{a,so}
-rw-r--r-- 1 zbyszek zbyszek 36K Jul 3 16:54 build/src/core/libsystemd-core-257.a
-rwxr-xr-x 1 zbyszek zbyszek 6.1M Jul 3 16:54 build/src/core/libsystemd-core-257.so
Our variables for internal libraries are named 'libfoo' for the shared lib
variant, and 'libfoo_static' for the static lib variant. The only exception was
libbasic, because we didn't have a shared variant for it. But let's rename it
for consistency. This makes the build config easier to understand.
Previously, the order was quite chaotic, even sometimes interleaved with
entirely unrelated switches. Let's clean this up and use the same order
as in the spec.
This doesn't change anything real, but I think it's a worthy clean-up in
particular as this order is documented as the PCR measurement order of
these sections, hence there's actually a bit of relevance to always
communicate the same order everywhere.
The patch is originally from Brenton Simpson, I (Lennart) just added some
comments and rebased it.
I didn't test this, but the patch looks so obviously right to me, that
I think we should just merge it, instead of delaying this further. In
the worst case no one notices, in the best case this makes sd-boot work
reasonably nicely on devices that only have a hardware power key + volume
rocker.
Fixes: #30598
Replaces: #31135
At this point we have a clearer model:
* systemd-measure should be used for measuring UKIs on vendor build
systems, i.e. only cover stuff predictable by the OS vendor, and
identical on all systems. And that is pretty much only PCR 11.
* systemd-pcrlock should cover the other PCRs, which carry inherently
local information, and can only be predicted locally and not already
on vendor build systems.
Because of that, let's not bother with any PCRs except for 11 in
systemd-measure. This was added at a time where systemd-pcrlock didn't
exist yet, and hence it wasn't clear how this will play out in the end.
Let's simplify the code a bit, and parse Type 2 entries in a function of
its own, separate from the directory enumeration.
This closely follows a similar split we did a long time ago for Type 1.
This is just refactoring, no real code change.
While we write data to this parameter, it's not really a return
parameter, we after all do not fully set it, we just fill in some
fields. Hence it must be initialized beforehand.
According to our coding style only parameters that are purely used for
returning something should be named "ret_xyz", hence this one should not
be.
(We'll later rely on the current behaviour that it leaves array entries
for which we find no sections untouched, hence leave behaviour as is,
just rename the parameters to something more appropriate).
(Since we are dropping the "ret_" prefix of "ret_sections", let's rename
the old "section" parameter at the same time to "section_names", to make
clearer what it is about).
With the latest mkosi, mkosi -t none can be used to rerun the build
script without messing with a previously built image. This allows
one to run "mkosi -t disk -f qemu" in one terminal to build and boot
an image in qemu and then run "mkosi -t none" in another terminal to
rebuild the packages. If one then has "RuntimeBuildSources=yes" set
in their mkosi configuration, the build directory is mounted into the
virtual machine, which means that one can then run "dnf upgrade
/work/build/*.rpm" from within the VM to install the new packages.
This allows for quickly iterating on changes without having to rebuild
the image all the time.
We'll probably want to document this at some point, but let's start
with making it possible by copying the built packages to the build directory.
Currently if git merge-base fails we'll hide the error and exit with
exit status 0. Let's make sure we only exit early if git merge-base exits
with 1, which indicates the current commit is not on the target branch.
Any other error is considered fatal.
Two very similar devices, with two functions - a regular camera and IR.
The peculiarity of their infrared camera is that it uses a color image
format (YUYV), although it is essentially black and white.
The IR camera interface differs from the regular camera interface by name:
"HP Wide Vision FHD Camera: HP W" for the regular camera and
"HP Wide Vision FHD Camera: HP I" for an infrared camera
Therefore, glob *I is used to separate the IR camera
* f9fe17dbde Use vmlinux.h from kernel-devel
* 9cbad936a6 Pull in openssl-devel-engine
* 8ae009f929 Only add Requires on python3-zstd on Fedora
* 750e910c7c Drop BuildRequires on python3-zstd
``repart.d/*.conf`` files describe basic properties of partitions of block
devices of the local system. They may be used to declare types, names and sizes of partitions that shall
exist. The
:ref:`systemd-repart(8)`
service reads these files and attempts to add new partitions currently missing and enlarge existing
partitions according to these definitions. Operation is generally incremental, i.e. when applied, what
exists already is left intact, and partitions are never shrunk, moved or deleted.
These definition files are useful for implementing operating system images that are prepared and
delivered with minimally sized images (for example lacking any state or swap partitions), and which on
first boot automatically take possession of any remaining disk space following a few basic rules.
Currently, support for partition definition files is only implemented for GPT partition
tables.
Partition files are generally matched against any partitions already existing on disk in a simple
algorithm: the partition files are sorted by their filename (ignoring the directory prefix), and then
compared in order against existing partitions matching the same partition type UUID. Specifically, the
first existing partition with a specific partition type UUID is assigned the first definition file with
the same partition type UUID, and the second existing partition with a specific type UUID the second
partition file with the same type UUID, and so on. Any left-over partition files that have no matching
existing partition are assumed to define new partitions that shall be created. Such partitions are
appended to the end of the partition table, in the order defined by their names utilizing the first
partition slot greater than the highest slot number currently in use. Any existing partitions that have
no matching partition file are left as they are.
Note that these definitions may only be used to create and initialize new partitions or to grow
existing ones. In the latter case it will not grow the contained file systems however; separate
mechanisms, such as
:ref:`systemd-growfs(8)` may be
used to grow the file systems inside of these partitions. Partitions may also be marked for automatic
growing via the ``GrowFileSystem=`` setting, in which case the file system is grown on
first mount by tools that respect this flag. See below for details.
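A minimal definition file might look like this (the file name and values are illustrative only, not taken from this page)::

   # /usr/lib/repart.d/50-root.conf
   [Partition]
   Type=root
   Weight=1000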
[Partition] Section Options
===========================
``Type=``
---------
The GPT partition type UUID to match. This may be a GPT partition type UUID such as
``4f68bce3-e8cd-4db1-96e7-fbcaf984b709``, or an identifier.
Architecture specific partition types can use one of these architecture identifiers:
``alpha``, ``arc``, ``arm`` (32-bit),
``arm64`` (64-bit, aka aarch64), ``ia64``,
``loongarch64``, ``mips-le``, ``mips64-le``,
``parisc``, ``ppc``, ``ppc64``,
``ppc64-le``, ``riscv32``, ``riscv64``,
``s390``, ``s390x``, ``tilegx``,
``x86`` (32-bit, aka i386) and ``x86-64`` (64-bit, aka amd64).
The supported identifiers are:
..list-table:: GPT partition type identifiers
:header-rows:1
* - Identifier
- Explanation
* - ``esp``
- EFI System Partition
* - ``xbootldr``
- Extended Boot Loader Partition
* - ``swap``
- Swap partition
* - ``home``
- Home (``/home/``) partition
* - ``srv``
- Server data (``/srv/``) partition
* - ``var``
- Variable data (``/var/``) partition
* - ``tmp``
- Temporary data (``/var/tmp/``) partition
* - ``linux-generic``
- Generic Linux file system partition
* - ``root``
- Root file system partition type appropriate for the local architecture (an alias for an architecture root file system partition type listed below, e.g. ``root-x86-64``)
* - ``root-verity``
- Verity data for the root file system partition for the local architecture
* - ``root-verity-sig``
- Verity signature data for the root file system partition for the local architecture
* - ``root-secondary``
- Root file system partition of the secondary architecture of the local architecture (usually the matching 32-bit architecture for the local 64-bit architecture)
* - ``root-secondary-verity``
- Verity data for the root file system partition of the secondary architecture
* - ``root-secondary-verity-sig``
- Verity signature data for the root file system partition of the secondary architecture
* - ``root-{arch}``
- Root file system partition of the given architecture (such as ``root-x86-64`` or ``root-riscv64``)
* - ``root-{arch}-verity``
- Verity data for the root file system partition of the given architecture
* - ``root-{arch}-verity-sig``
- Verity signature data for the root file system partition of the given architecture
* - ``usr``
- ``/usr/`` file system partition type appropriate for the local architecture (an alias for an architecture ``/usr/`` file system partition type listed below, e.g. ``usr-x86-64``)
* - ``usr-verity``
- Verity data for the ``/usr/`` file system partition for the local architecture
* - ``usr-verity-sig``
- Verity signature data for the ``/usr/`` file system partition for the local architecture
* - ``usr-secondary``
- ``/usr/`` file system partition of the secondary architecture of the local architecture (usually the matching 32-bit architecture for the local 64-bit architecture)
* - ``usr-secondary-verity``
- Verity data for the ``/usr/`` file system partition of the secondary architecture
* - ``usr-secondary-verity-sig``
- Verity signature data for the ``/usr/`` file system partition of the secondary architecture
* - ``usr-{arch}``
- ``/usr/`` file system partition of the given architecture
* - ``usr-{arch}-verity``
- Verity data for the ``/usr/`` file system partition of the given architecture
* - ``usr-{arch}-verity-sig``
- Verity signature data for the ``/usr/`` file system partition of the given architecture
This setting defaults to ``linux-generic``.
Most of the partition type UUIDs listed above are defined in the `Discoverable Partitions Specification <https://uapi-group.org/specifications/specs/discoverable_partitions_specification>`_.
``Label=``
----------
The textual label to assign to the partition if none is assigned yet. Note that this
setting is not used for matching. It is also not used when a label is already set for an existing
partition. It is thus only used when a partition is newly created or when an existing one had no
label set (that is: an empty label). If not specified a label derived from the partition type is
automatically used. Simple specifier expansion is supported, see below.
..only:: html
..versionadded:: 245
``UUID=``
---------
The UUID to assign to the partition if none is assigned yet. Note that this
setting is not used for matching. It is also not used when a UUID is already set for an existing
partition. It is thus only used when a partition is newly created or when an existing one had an
all-zero UUID set. If set to "null", the UUID is set to all zeroes. If not specified
a UUID derived from the partition type is automatically used.
..only:: html
..versionadded:: 246
``Priority=``
-------------
A numeric priority to assign to this partition, in the range -2147483648…2147483647,
with smaller values indicating higher priority, and higher values indicating lower priority. This
priority is used in case the configured size constraints on the defined partitions do not permit
fitting all partitions onto the available disk space. If the partitions do not fit, the highest
numeric partition priority of all defined partitions is determined, and all defined partitions with
this priority are removed from the list of new partitions to create (which may be multiple, if the
same priority is used for multiple partitions). The fitting algorithm is then tried again. If the
partitions still do not fit, the now highest numeric partition priority is determined, and the
matching partitions removed too, and so on. Partitions of a priority of 0 or lower are never
removed. If all partitions with a priority above 0 are removed and the partitions still do not fit on
the device the operation fails. Note that this priority has no effect on ordering partitions, for
that use the alphabetical order of the filenames of the partition definition files. Defaults to
0.
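For example, a definition like the following sketch (values are illustrative) declares a swap partition that is among the first to be dropped if the configured partitions do not fit on the available disk space::

   [Partition]
   Type=swap
   Priority=10
   SizeMinBytes=1G
   SizeMaxBytes=4G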
..only:: html
..versionadded:: 245
``Weight=``
-----------
A numeric weight to assign to this partition in the range 0…1000000. Available disk
space is assigned to the defined partitions according to their relative weights (subject to the size
constraints configured with ``SizeMinBytes=``, ``SizeMaxBytes=``), so
that a partition with weight 2000 gets double the space of one with weight 1000, and a partition with
weight 333 a third of that. Defaults to 1000.
The ``Weight=`` setting is used to distribute available disk space in an
"elastic" fashion, based on the disk size and existing partitions. If a partition shall have a fixed
size use both ``SizeMinBytes=`` and ``SizeMaxBytes=`` with the same
value in order to fixate the size to one value, in which case the weight has no
effect.
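For example, with two definitions like this sketch (file names and values are illustrative), the root partition receives twice as much of the distributed free space as the home partition::

   # 50-root.conf
   [Partition]
   Type=root
   Weight=2000

   # 60-home.conf
   [Partition]
   Type=home
   Weight=1000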
..only:: html
..versionadded:: 245
``PaddingWeight=``
------------------
Similar to ``Weight=``, but sets a weight for the free space after the
partition (the "padding"). When distributing available space the weights of all partitions and all
defined paddings are summed, and then each partition and padding gets the fraction defined by its
weight. Defaults to 0, i.e. by default no padding is applied.
Padding is useful if empty space shall be left for later additions or a safety margin at the
end of the device or between partitions.
..only:: html
..versionadded:: 245
``SizeMinBytes=, SizeMaxBytes=``
--------------------------------
Specifies minimum and maximum size constraints in bytes. Takes the usual K, M, G, T,
… suffixes (to the base of 1024). If ``SizeMinBytes=`` is specified the partition is
created at or grown to at least the specified size. If ``SizeMaxBytes=`` is specified
the partition is created at or grown to at most the specified size. The precise size is determined
through the weight value configured with ``Weight=``, see above. When
``SizeMinBytes=`` is set equal to ``SizeMaxBytes=`` the configured
weight has no effect as the partition is explicitly sized to the specified fixed value. Note that
partitions are never created smaller than 4096 bytes, and since partitions are never shrunk the
previous size of the partition (in case the partition already exists) is also enforced as lower bound
for the new size. The values should be specified as multiples of 4096 bytes, and are rounded upwards
(in case of ``SizeMinBytes=``) or downwards (in case of
``SizeMaxBytes=``) otherwise. If the backing device does not provide enough space to
fulfill the constraints placing the partition will fail. For partitions that shall be created,
depending on the setting of ``Priority=`` (see above) the partition might be dropped
and the placing algorithm restarted. By default a minimum size constraint of 10M and no maximum size
constraint is set.
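For example, to give a partition a fixed size, set both constraints to the same value, as described above (a sketch; the size is illustrative)::

   [Partition]
   Type=esp
   SizeMinBytes=512M
   SizeMaxBytes=512M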
..only:: html
..versionadded:: 245
``PaddingMinBytes=, PaddingMaxBytes=``
--------------------------------------
Specifies minimum and maximum size constraints in bytes for the free space after the
partition (the "padding"). Semantics are similar to ``SizeMinBytes=`` and
``SizeMaxBytes=``, except that unlike partition sizes free space can be shrunk and can
be as small as zero. By default no size constraints on padding are set, so that only
``PaddingWeight=`` determines the size of the padding applied.
..only:: html
..versionadded:: 245
``CopyBlocks=``
---------------
Takes a path to a regular file, block device node, char device node or directory, or
the special value "auto". If specified and the partition is newly created, the data
from the specified path is written to the newly created partition, on the block level. If a directory
is specified, the backing block device of the file system the directory is on is determined, and the
data read directly from that. This option is useful to efficiently replicate existing file systems
onto new partitions on the block level — for example to build a simple OS installer or an OS image
builder. Specify ``/dev/urandom`` as value to initialize a partition with random
data.
If the special value "auto" is specified, the source to copy from is
automatically picked up from the running system (or the image specified with
``--image=`` — if used). A partition that matches both the configured partition type (as
declared with ``Type=`` described above), and the currently mounted directory
appropriate for that partition type is determined. For example, if the partition type is set to
"root" the partition backing the root directory (``/``) is used as
source to copy from — if its partition type is set to "root" as well. If the
declared type is "usr" the partition backing ``/usr/`` is used as
source to copy blocks from — if its partition type is set to "usr" too. The logic is
capable of automatically tracking down the backing partitions for encrypted and Verity-enabled
volumes. "CopyBlocks=auto" is useful for implementing "self-replicating" systems,
i.e. systems that are their own installer.
The file specified here must have a size that is a multiple of the basic block size 512 and not
be empty. If this option is used, the size allocation algorithm is slightly altered: the partition is
created at least as big as required to fit the data in, i.e. the data size is an additional minimum
size value taken into consideration for the allocation algorithm, similar to and in addition to the
``SizeMinBytes=`` value configured above.
This option has no effect if the partition it is declared for already exists, i.e. existing
data is never overwritten. Note that the data is copied in before the partition table is updated,
i.e. before the partition actually is persistently created. This provides robustness: it is
guaranteed that the partition either doesn't exist or exists fully populated; it is not possible that
the partition exists but is not or only partially populated.
This option cannot be combined with ``Format=`` or
``CopyFiles=``.
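A sketch of a "self-replicating" root partition definition as described above::

   [Partition]
   Type=root
   CopyBlocks=auto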
..only:: html
..versionadded:: 246
``Format=``
-----------
Takes a file system name, such as "ext4", "btrfs",
"xfs", "vfat", "erofs",
"squashfs" or the special value "swap". If specified and the partition
is newly created it is formatted with the specified file system (or as swap device). The file system
UUID and label are automatically derived from the partition UUID and label. If this option is used,
the size allocation algorithm is slightly altered: the partition is created at least as big as
required for the minimal file system of the specified type (or 4KiB if the minimal size is not
known).
This option has no effect if the partition already exists.
Similarly to the behaviour of ``CopyBlocks=``, the file system is formatted
before the partition is created, ensuring that the partition only ever exists with a fully
initialized file system.
This option cannot be combined with ``CopyBlocks=``.
..only:: html
..versionadded:: 247
``CopyFiles=``
--------------
Takes a pair of colon separated absolute file system paths. The first path refers to
a source file or directory on the host, the second path refers to a target in the file system of the
newly created partition and formatted file system. This setting may be used to copy files or
directories from the host into the file system that is created due to the ``Format=``
option. If ``CopyFiles=`` is used without ``Format=`` specified
explicitly, "Format=" with a suitable default is implied (currently
"vfat" for "ESP" and "XBOOTLDR" partitions, and
"ext4" otherwise, but this may change in the future). This option may be used
multiple times to copy multiple files or directories from host into the newly formatted file system.
The colon and second path may be omitted in which case the source path is also used as the target
path (relative to the root of the newly created file system). If the source path refers to a
directory it is copied recursively.
This option has no effect if the partition already exists: it cannot be used to copy additional
files into an existing partition, it may only be used to populate a file system created anew.
The copy operation is executed before the file system is registered in the partition table,
thus ensuring that a file system populated this way only ever exists fully initialized.
Note that ``CopyFiles=`` will skip copying files that aren't supported by the
target filesystem (e.g. symlinks, fifos, sockets and devices on vfat). When an unsupported file type
is encountered, ``systemd-repart`` will skip copying this file and write a log message
about it.
Note that ``systemd-repart`` does not change the UIDs/GIDs of any copied files
and directories. When running ``systemd-repart`` as an unprivileged user to build an
image of files and directories owned by the same user, you can run ``systemd-repart``
in a user namespace with the current user mapped to the root user to make sure the files and
directories in the image are owned by the root user.
Note that when populating XFS filesystems with ``systemd-repart`` and loop
devices are not available, populating XFS filesystems with files containing spaces, tabs or newlines
might fail on old versions of
:man-pages:`mkfs.xfs(8)`
due to limitations of its protofile format.
Note that when populating XFS filesystems with ``systemd-repart`` and loop
devices are not available, extended attributes will not be copied into generated XFS filesystems
due to limitations of :man-pages:`mkfs.xfs(8)`'s
protofile format.
This option cannot be combined with ``CopyBlocks=``.
When
:ref:`systemd-repart(8)` is
invoked with the ``--copy-source=`` command line switch the file paths are taken
relative to the specified directory. If ``--copy-source=`` is not used, but the
``--image=`` or ``--root=`` switches are used, the source paths are taken
relative to the specified root directory or disk image root.
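A sketch combining ``Format=`` and ``CopyFiles=`` (the source path is purely illustrative)::

   [Partition]
   Type=root
   Format=ext4
   # source path below is an example only
   CopyFiles=/some/source/tree:/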
..only:: html
..versionadded:: 247
``ExcludeFiles=, ExcludeFilesTarget=``
--------------------------------------
Takes an absolute file system path referring to a source file or directory on the
host. This setting may be used to exclude files or directories from the host from being copied into
the file system when ``CopyFiles=`` is used. This option may be used multiple times to
exclude multiple files or directories from host from being copied into the newly formatted file
system.
If the path is a directory and ends with "/", only the directory's
contents are excluded but not the directory itself. If the path is a directory and does not end with
"/", both the directory and its contents are excluded.
``ExcludeFilesTarget=`` is like ``ExcludeFiles=`` except that
instead of excluding the path on the host from being copied into the partition, we exclude any files
and directories from being copied into the given path in the partition.
When
:ref:`systemd-repart(8)`
is invoked with the ``--image=`` or ``--root=`` command line switches the
paths specified are taken relative to the specified root directory or disk image root.
..only:: html
..versionadded:: 254
``MakeDirectories=``
--------------------
Takes one or more absolute paths, separated by whitespace, each declaring a directory
to create within the new file system. Behaviour is similar to ``CopyFiles=``, but
instead of copying in a set of files this just creates the specified directories with the default
mode of 0755 owned by the root user and group, plus all their parent directories (with the same
ownership and access mode). To configure directories with different ownership or access mode, use
``CopyFiles=`` and specify a source tree to copy containing appropriately
owned/configured directories. This option may be used more than once to create multiple
directories. When ``CopyFiles=`` and ``MakeDirectories=`` are used
together the former is applied first. If a directory listed already exists no operation is executed
(in particular, the ownership/access mode of the directories is left as is).
The primary use case for this option is to create a minimal set of directories that may be
mounted over by other partitions contained in the same disk image. For example, a disk image where
the root file system is formatted at first boot might want to automatically pre-create
``/usr/`` in it this way, so that the "usr" partition may
over-mount it.
Consider using
:ref:`systemd-tmpfiles(8)`
with its ``--image=`` option to pre-create other, more complex directory hierarchies (as
well as other inodes) with fine-grained control of ownership, access modes and other file
attributes.
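A sketch pre-creating mount points in a newly formatted root file system, as described above (paths are illustrative)::

   [Partition]
   Type=root
   Format=ext4
   MakeDirectories=/usr /home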
..only:: html
..versionadded:: 249
``Subvolumes=``
---------------
Takes one or more absolute paths, separated by whitespace, each declaring a directory
that should be a subvolume within the new file system. This option may be used more than once to
specify multiple directories. Note that this setting does not create the directories themselves, that
can be configured with ``MakeDirectories=`` and ``CopyFiles=``.
Note that this option only takes effect if the target filesystem supports subvolumes, such as
"btrfs".
Note that due to limitations of "mkfs.btrfs", this option is only supported
when running with ``--offline=no``.
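A sketch marking directories as subvolumes on btrfs (the directories themselves are created here via ``MakeDirectories=``; paths are illustrative)::

   [Partition]
   Type=root
   Format=btrfs
   MakeDirectories=/home /srv
   Subvolumes=/home /srv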
..only:: html
..versionadded:: 255
``DefaultSubvolume=``
---------------------
Takes an absolute path specifying the default subvolume within the new filesystem.
Note that this setting does not create the subvolume itself, that can be configured with
``Subvolumes=``.
Note that this option only takes effect if the target filesystem supports subvolumes, such as
"btrfs".
Note that due to limitations of "mkfs.btrfs", this option is only supported
when running with ``--offline=no``.
..only:: html
..versionadded:: 256
``Encrypt=``
------------
Takes one of "off", "key-file",
"tpm2" and "key-file+tpm2" (alternatively, also accepts a boolean
value, which is mapped to "off" when false, and "key-file" when
true). Defaults to "off". If not "off" the partition will be
formatted with a LUKS2 superblock, before the blocks configured with ``CopyBlocks=``
are copied in or the file system configured with ``Format=`` is created.
The LUKS2 UUID is automatically derived from the partition UUID in a stable fashion. If
"key-file" or "key-file+tpm2" is used, a key is added to the LUKS2
superblock, configurable with the ``--key-file=`` option to
``systemd-repart``. If "tpm2" or "key-file+tpm2" is
used, a key is added to the LUKS2 superblock that is enrolled to the local TPM2 chip, as configured
with the ``--tpm2-device=`` and ``--tpm2-pcrs=`` options to
``systemd-repart``.
When used this slightly alters the size allocation logic as the implicit, minimal size limits
of ``Format=`` and ``CopyBlocks=`` are increased by the space necessary
for the LUKS2 superblock (see above).
This option has no effect if the partition already exists.
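A sketch of a TPM2-bound encrypted partition (values are illustrative)::

   [Partition]
   Type=home
   Format=ext4
   Encrypt=tpm2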
..only:: html
..versionadded:: 247
``Verity=``
-----------
Takes one of "off", "data",
"hash" or "signature". Defaults to "off". If set
to "off" or "data", the partition is populated with content as
specified by ``CopyBlocks=`` or ``CopyFiles=``. If set to
"hash", the partition will be populated with verity hashes from the matching verity
data partition. If set to "signature", the partition will be populated with a JSON
object containing a signature of the verity root hash of the matching verity hash partition.
A matching verity partition is a partition with the same verity match key (as configured with
``VerityMatchKey=``).
If not explicitly configured, the data partition's UUID will be set to the first 128
bits of the verity root hash. Similarly, if not configured, the hash partition's UUID will be set to
the final 128 bits of the verity root hash. The verity root hash itself will be included in the
output of ``systemd-repart``.
This option has no effect if the partition already exists.
Usage of this option in combination with ``Encrypt=`` is not supported.
For each unique ``VerityMatchKey=`` value, a single verity data partition
("Verity=data") and a single verity hash partition ("Verity=hash")
must be defined.
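A sketch of a matching pair of verity definitions sharing one ``VerityMatchKey=`` value (file names and the match key are illustrative)::

   # 20-root.conf
   [Partition]
   Type=root
   Verity=data
   VerityMatchKey=root

   # 21-root-verity.conf
   [Partition]
   Type=root-verity
   Verity=hash
   VerityMatchKey=root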
..only:: html
..versionadded:: 252
``VerityMatchKey=``
-------------------
Takes a short, user-chosen identifier string. This setting is used to find sibling
verity partitions for the current verity partition. See the description for
``Verity=``.
..only:: html
..versionadded:: 252
``VerityDataBlockSizeBytes=``
-----------------------------
Configures the data block size of the generated verity hash partition. Must be between 512 and
4096 bytes and must be a power of 2. Defaults to the sector size if configured explicitly, or the underlying
block device sector size, or 4K if systemd-repart is not operating on a block device.
..only:: html
..versionadded:: 255
``VerityHashBlockSizeBytes=``
-----------------------------
Configures the hash block size of the generated verity hash partition. Must be between 512 and
4096 bytes and must be a power of 2. Defaults to the sector size if configured explicitly, or the underlying
block device sector size, or 4K if systemd-repart is not operating on a block device.
..only:: html
..versionadded:: 255
``FactoryReset=``
-----------------
Takes a boolean argument. If specified the partition is marked for removal during a
factory reset operation. This functionality is useful to implement schemes where images can be reset
into their original state by removing partitions and creating them anew. Defaults to off.
..only:: html
..versionadded:: 245
``Flags=``
----------
Configures the 64-bit GPT partition flags field to set for the partition when creating
it. This option has no effect if the partition already exists. If not specified, the flags value is
set to all zeroes, except for the three bits that can also be configured via
``NoAuto=``, ``ReadOnly=`` and ``GrowFileSystem=``; see
below for details on the defaults for these three flags. Specify the flags value in hexadecimal (by
prefixing it with "0x"), binary (prefix "0b") or decimal (no
prefix).
.. only:: html
.. versionadded:: 249
``NoAuto=, ReadOnly=, GrowFileSystem=``
---------------------------------------
Configures the No-Auto, Read-Only and Grow-File-System partition flags (bits 63, 60
and 59) of the partition table entry, as defined by the `Discoverable Partitions Specification <https://uapi-group.org/specifications/specs/discoverable_partitions_specification>`_. Only
available for partition types supported by the specification. This option is a friendly way to set
bits 63, 60 and 59 of the partition flags value without setting any of the other bits, and may be set
via ``Flags=`` too, see above.
If ``Flags=`` is used in conjunction with one or more of
``NoAuto=``/``ReadOnly=``/``GrowFileSystem=`` the latter
control the value of the relevant flags, i.e. the high-level settings take precedence for those bits.
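As a sketch, the following shows two ways of setting the same two bits, once via the dedicated settings and once via the raw 64-bit flags value (bit 63 = no-auto, bit 60 = read-only):
```conf
# Friendly form
[Partition]
Type=root
NoAuto=yes
ReadOnly=yes

# Raw form: 0x8000000000000000 (bit 63) | 0x1000000000000000 (bit 60)
# [Partition]
# Type=root
# Flags=0x9000000000000000
```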
### Reading Accounting Information
Accounting information is available via the `MemoryCurrent`, `MemoryPeak`, `MemorySwapCurrent`, `MemorySwapPeak`, `MemoryZSwapCurrent`, `MemoryAvailable`, `EffectiveMemoryMax`, `EffectiveMemoryHigh`, `CPUUsageNSec`, `EffectiveCPUs`, `EffectiveMemoryNodes`, `TasksCurrent`, `EffectiveTasksMax`, `IPIngressBytes`, `IPIngressPackets`, `IPEgressBytes`, `IPEgressPackets`, `IOReadBytes`, `IOReadOperations`, `IOWriteBytes`, and `IOWriteOperations` D-Bus properties. To read this and other information directly from the cgroup tree, get the unit's cgroup path (relative to `/sys/fs/cgroup`) from the `ControlGroup` property, by calling [`sd_pid_get_cgroup()`](https://www.freedesktop.org/software/systemd/man/latest/sd_pid_get_cgroup.html), or by parsing `/proc/$PID/cgroup`.
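For example (the unit name is illustrative), a unit's current memory usage can be read either with `systemctl` or directly over D-Bus (note the escaped unit name in the object path):
```sh
$ systemctl show -p MemoryCurrent foo.service

$ busctl get-property org.freedesktop.systemd1 \
    /org/freedesktop/systemd1/unit/foo_2eservice \
    org.freedesktop.systemd1.Service MemoryCurrent
```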
If you want to collect the exit status and other runtime parameters of your transient scope or service unit after the processes in it have ended, set the `RemainAfterExit` boolean property when creating it. This has the effect that the unit will stay around even after all its processes have died, in the `SubState="exited"` state. Simply watch for state changes until this state is reached, then read the status details from the various properties you need, and finally terminate the unit via `StopUnit()` on the `Manager` object or `Stop()` on the `Unit` object itself.
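A minimal sketch of that workflow using `systemd-run` (unit and command are examples):
```sh
# Start a transient service that sticks around after its process exits
$ systemd-run --unit=myjob -p RemainAfterExit=yes /usr/bin/true

# Once SubState reaches "exited", read the result...
$ systemctl show myjob.service -p SubState,ExecMainCode,ExecMainStatus

# ...then clean up
$ systemctl stop myjob.service
```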
### Example
Please see the [systemd-run sources](https://github.com/systemd/systemd/blob/main/src/run/run.c) for a relatively simple example of how to create scope or service units transiently and pass properties to them.
**Q: Whenever my service tries to acquire RT scheduling for one of its threads this is refused with EPERM even though my service is running with full privileges. This works fine on my non-systemd system!**
A: By default, systemd places all systemd daemons in their own cgroup in the "cpu" hierarchy. Unfortunately, due to a kernel limitation, this has the effect of disallowing RT entirely for the service. See [My Service Can't Get Realtime!](/MY_SERVICE_CANT_GET_REALTIME) for a longer discussion and what to do about this.
**Q: My service is ordered after `network.target` but at boot it is still called before the network is up. What's going on?**
To make use of this, please install `mkosi` from the [GitHub repository](https://github.com/systemd/mkosi#running-mkosi-from-the-repository).
`mkosi` will build an image for the host distro by default.
First, run `mkosi genkey` to generate a key and certificate to be used for secure boot and verity signing.
After that is done, it is sufficient to type `mkosi` in the systemd project directory to generate a disk image you can boot either in `systemd-nspawn` or in a UEFI-capable VM:
```sh
$ mkosi qemu
```
Every time you rerun the `mkosi` command a fresh image is built,
incorporating all current changes you made to the project tree.
By default, a directory image is built.
This requires `virtiofsd` to be installed on the host.
To build a disk image instead, which does not require `virtiofsd`, add the following to `mkosi.local.conf`:
```conf
[Output]
Format=disk
```
To boot in UEFI mode instead of using QEMU's direct kernel boot, add the following to `mkosi.local.conf`:
```conf
[Host]
QemuFirmware=uefi
```
By default, the tools from your host system are used to build the image. To have
`mkosi` use the systemd tools from the `build/` directory, add the following to
`mkosi.local.conf`:
```conf
[Host]
ExtraSearchPaths=build/
```
If you want `mkosi` to build a tools image and use the tools from there
instead of looking for tools on the host, add the following to
`mkosi.local.conf`:
```conf
[Host]
ToolsTree=default
```
To avoid having to build a new image all the time when iterating on a patch,
add the following to `mkosi.local.conf`:
```conf
[Host]
RuntimeBuildSources=yes
```
After enabling this setting, the source and build directories will be mounted to
`/work/src` and `/work/build` respectively when booting the image as a container
or virtual machine. To build the latest changes and re-install, run
`meson install -C /work/build --only-changed` in the container or virtual machine.
Optionally, restart the daemon(s) you're working on with
`systemctl restart <units>`, run `systemctl daemon-reexec` if you're working on PID 1,
or run `systemctl soft-reboot` to restart everything.
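For example, a typical iteration inside the booted image might look like this (the unit name is just an example):
```sh
# Inside the container or VM: install the freshly built binaries
$ meson install -C /work/build --only-changed

# Restart the daemon under test, or re-exec PID 1 if that's what changed
$ systemctl restart systemd-networkd.service
$ systemctl daemon-reexec
```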
Aside from the image, the `mkosi.output` directory will also be populated with a
set of distribution packages. Assuming you're running the same distribution and
release as the mkosi image, you can install these packages on your host or test
system as well for any testing or debugging that cannot easily be performed in a
VM or container.
By default, no debuginfo packages are produced. To produce debuginfo packages,
run `mkosi` with the `WITH_DEBUG` environment variable set to `1`.
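A minimal invocation, assuming `mkosi`'s `--environment=` option for passing environment variables into the build, might look like:
```sh
# Force a fresh image build with debuginfo packages enabled
$ mkosi --environment=WITH_DEBUG=1 -f
```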
The InhibitDelayMaxSec= setting in [logind.conf(5)](http://www.freedesktop.org/software/systemd/man/logind.conf.html) controls the timeout for this. This is intended to be used by applications which need a synchronous way to execute actions before system suspend but shall not be allowed to block suspend indefinitely.
This mode is only available for _sleep_ and _shutdown_ locks.
3. _block-weak_ and _delay-weak_, which work like their non-weak counterparts, but which may in addition be ignored
automatically and silently under certain circumstances, unlike the non-weak varieties, which are always respected.
Inhibitor locks are taken via the Inhibit() D-Bus call on the logind Manager object:
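For illustration, the method's signature is `Inhibit(in s what, in s who, in s why, in s mode, out h pipe_fd)`, and it can be exercised from the shell roughly like this (note that `busctl` exits right away, so the returned file descriptor is closed and the lock is released immediately; a real application keeps the fd open for as long as the lock shall be held):
```sh
# Arguments are: what, who, why, mode
$ busctl call org.freedesktop.login1 /org/freedesktop/login1 \
    org.freedesktop.login1.Manager Inhibit ssss \
    "sleep" "my-app" "Flushing state to disk" "delay"
```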
18. [FINAL] Build and upload the documentation (on the -stable branch): `ninja -C build doc-sync`
20. [FINAL] Change the Github Pages branch to the newly created branch (https://github.com/systemd/systemd/settings/pages) and set the 'Custom domain' to 'systemd.io'
21. [FINAL] Update version number in `meson.version` to the devel version of the next release (e.g. from `v256` to `v257~devel`)
# Steps to a Successful Stable Release
1. Backport at least the commits from all PRs tagged with `needs-stable-backport` on Github with `git cherry-pick -x`. Any other commits that fix bugs, change documentation, tests, CI or mkosi can generally be backported as well. Since 256 the stable branches live [here](https://github.com/systemd/systemd/). Stable branches for older releases are available [here](https://github.com/systemd/systemd-stable/). Check each commit to see if it makes sense to backport and check the comments on the PR to see if the author indicated that only specific commits should be backported.
2. Update the version number in `meson.version` (e.g. from `256.2` to `256.3`) (only for 256-stable or newer)
3. Tag the release: `version="v$(cat meson.version)" && git tag -s "${version}" -m "systemd-stable ${version}"` (Fill in the version manually on releases older than 256)
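Steps 2 and 3, as a combined sketch (the version numbers and the commit message are illustrative):
```sh
$ echo 256.3 > meson.version
$ git add meson.version
$ git commit -m "Bump version to 256.3"
$ version="v$(cat meson.version)" && git tag -s "${version}" -m "systemd-stable ${version}"
```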
<entry>Persistent cache data of the package. If this directory is flushed, the application should work correctly on next invocation, though possibly slowed down due to the need to rebuild any local cache files. The application must be capable of recreating this directory should it be missing and necessary.</entry>
<entry>The first partition with this type UUID on the same disk as the root partition is mounted to <filename>/var/</filename> — under the condition its partition UUID matches the first 128 bit of the HMAC-SHA256 of the GPT type uuid of this partition keyed by the machine ID of the installation stored in <citerefentry><refentrytitle>machine-id</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</entry>