When mount points are stacked, bind_remount_recursive_with_mountinfo()
uses the existing mount options of the "lower" level mount (ie: the
first one that was mounted on a mount point). But the actual mount
point in use is the "top" one (ie: the last one that was mounted on a
mount point), so in practice if the mount options are different between
the layers, the bottom options are used by mistake on the top mount,
which is not what we want. This is because libmount returns the "bottom"
one first.
If the hashmap returns EEXIST, which means the same key (path) with different
value (options) is already present, update the hashmap instead of discarding
the result. This way, the last/top mount options are always used when
mounts are stacked on a mount point.
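A minimal sketch of the "update instead of discard" behaviour described above. The fixed-size table and the names table_find(), table_put() and record_mount() are hypothetical stand-ins for illustration, not the real hashmap API:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size path -> flags table; a stand-in for the real hashmap. */
#define MAX_ENTRIES 16

struct entry {
        const char *path;
        unsigned long flags;
};

struct table {
        struct entry entries[MAX_ENTRIES];
        size_t n;
};

static struct entry *table_find(struct table *t, const char *path) {
        for (size_t i = 0; i < t->n; i++)
                if (strcmp(t->entries[i].path, path) == 0)
                        return &t->entries[i];
        return NULL;
}

/* Returns 0 if stored (or already present with the same value),
 * -EEXIST if the path is present with a different value, -ENOSPC if full. */
static int table_put(struct table *t, const char *path, unsigned long flags) {
        struct entry *e = table_find(t, path);

        if (e)
                return e->flags == flags ? 0 : -EEXIST;
        if (t->n >= MAX_ENTRIES)
                return -ENOSPC;
        t->entries[t->n++] = (struct entry) { path, flags };
        return 0;
}

/* Entries arrive bottom-first, so on a conflict the newer (upper) one wins. */
static int record_mount(struct table *t, const char *path, unsigned long flags) {
        int r = table_put(t, path, flags);

        if (r == -EEXIST) {
                table_find(t, path)->flags = flags;
                r = 0;
        }
        return r;
}
```

Feeding the same path twice with different flags leaves a single entry holding the second (top) value.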
This was found to cause problems as LXC version 4.x stacks two /sys mounts,
the bottom one read-write and the top one read-only. systemd accidentally
remounts the top-one read-write, breaking various expectations since a
read-only /sys is the way we decide whether we are running in a container
or not (in this particular case, networkd tests are broken as networkd
expects to be able to modify network settings with a writable /sys).
Future versions of LXC will no longer do this double-stacking, but we
need to support running inside older versions too.
This was triggered by https://github.com/systemd/systemd/commit/6720e356c137
as that causes a recursive remount of '/', which processes '/sys' as one
of the submounts, from make_nosuid(). But it's likely that other combinations
of options could trigger this as well.
Before:
root@systemd-debug:/# systemd-run -t --wait --property ProtectSystem=yes findmnt
Running as unit: run-u9.service
Press ^] three times within 1s to disconnect TTY.
TARGET SOURCE FSTYPE OPTIONS
/ /dev/sda2[/var/lib/lxc/systemd-debug/rootfs]
│ ext4 ro,nosuid,relatime,errors=remount-ro,stripe=
├─/dev none tmpfs rw,nosuid,relatime,size=492k,mode=755
│ ├─/dev/.lxc/proc proc proc rw,nosuid,relatime
│ ├─/dev/.lxc/sys sys sysfs rw,nosuid,relatime
│ ├─/dev/console devpts[/2] devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptm
│ ├─/dev/pts devpts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptm
│ ├─/dev/ptmx devpts[/ptmx] devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptm
│ ├─/dev/tty1 devpts[/0] devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptm
│ ├─/dev/tty2 devpts[/1] devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptm
│ ├─/dev/tty3 devpts[/2] devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptm
│ ├─/dev/tty4 devpts[/3] devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptm
│ ├─/dev/shm tmpfs tmpfs rw,nosuid,nodev
│ ├─/dev/hugepages hugetlbfs hugetlbfs rw,nosuid,relatime,pagesize=2M
│ └─/dev/mqueue mqueue mqueue rw,nosuid,nodev,noexec,relatime
├─/proc proc proc rw,nosuid,nodev,noexec,relatime
│ ├─/proc/sys proc[/sys] proc ro,nosuid,nodev,noexec,relatime
│ │ ├─/proc/sys/net proc[/sys/net] proc rw,nosuid,nodev,noexec,relatime
│ │ └─/proc/sys/kernel/random/boot_id
│ │ none[/.lxc-boot-id] tmpfs ro,nosuid,nodev,noexec,relatime,size=492k,mo
│ └─/proc/sysrq-trigger proc[/sysrq-trigger] proc ro,nosuid,nodev,noexec,relatime
├─/sys sysfs sysfs rw,nosuid,nodev,noexec,relatime
│ └─/sys sysfs sysfs rw,nosuid,nodev,noexec,relatime
│ ├─/sys/devices/virtual/net sysfs sysfs rw,relatime
│ │ └─/sys/devices/virtual/net
│ │ sysfs[/devices/virtual/net] sysfs rw,nosuid,relatime
│ ├─/sys/fs/fuse/connections fusectl fusectl rw,nosuid,nodev,noexec,relatime
│ └─/sys/fs/cgroup cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate,m
├─/run tmpfs tmpfs ro,nosuid,nodev,size=4912348k,nr_inodes=8192
│ ├─/run/credentials tmpfs[/systemd/inaccessible/dir] tmpfs ro,nosuid,nodev,noexec,size=4912348k,nr_inod
│ └─/run/systemd/incoming tmpfs[/systemd/propagate/run-u9.service]
│ tmpfs ro,nosuid,nodev,size=4912348k,nr_inodes=8192
├─/tmp tmpfs tmpfs rw,nosuid,nodev,size=12280872k,nr_inodes=409
│ └─/tmp tmpfs[/systemd-private-b730df90da424397a3f246cb15dcdbb1-run-u9.service-K6EUwf/tmp]
│ tmpfs rw,nosuid,nodev,size=12280872k,nr_inodes=409
└─/var/tmp /dev/sda2[/var/lib/lxc/systemd-debug/rootfs/var/tmp/systemd-private-b730df90da424397a3f246cb15dcdbb1-run-u9.service-vEHyRi/tmp]
ext4 rw,nosuid,relatime,errors=remount-ro,stripe=
Finished with result: success
Main processes terminated with: code=exited/status=0
Service runtime: 14.249s
CPU time consumed: 37ms
After:
root@systemd-debug:/# systemd-run -t --wait --property ProtectSystem=yes findmnt
Running as unit: run-u3.service
Press ^] three times within 1s to disconnect TTY.
TARGET SOURCE FSTYPE OPTIONS
/ /dev/sda2[/var/lib/lxc/systemd-debug/rootfs]
│ ext4 rw,relatime,errors=remount-ro,stripe=32699
├─/dev none tmpfs rw,relatime,size=492k,mode=755
│ ├─/dev/.lxc/proc proc proc rw,relatime
│ ├─/dev/.lxc/sys sys sysfs rw,relatime
│ ├─/dev/console devpts[/2] devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode
│ ├─/dev/pts devpts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode
│ ├─/dev/ptmx devpts[/ptmx] devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode
│ ├─/dev/tty1 devpts[/0] devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode
│ ├─/dev/tty2 devpts[/1] devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode
│ ├─/dev/tty3 devpts[/2] devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode
│ ├─/dev/tty4 devpts[/3] devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode
│ ├─/dev/shm tmpfs tmpfs rw,nosuid,nodev
│ ├─/dev/hugepages hugetlbfs hugetlbfs rw,relatime,pagesize=2M
│ └─/dev/mqueue mqueue mqueue rw,nosuid,nodev,noexec,relatime
├─/proc proc proc rw,nosuid,nodev,noexec,relatime
│ ├─/proc/sys proc[/sys] proc ro,nosuid,nodev,noexec,relatime
│ │ ├─/proc/sys/net proc[/sys/net] proc rw,nosuid,nodev,noexec,relatime
│ │ └─/proc/sys/kernel/random/boot_id
│ │ none[/.lxc-boot-id] tmpfs ro,nosuid,nodev,noexec,relatime,size=492k,mode=75
│ └─/proc/sysrq-trigger proc[/sysrq-trigger] proc ro,nosuid,nodev,noexec,relatime
├─/sys sysfs sysfs rw,nosuid,nodev,noexec,relatime
│ └─/sys sysfs sysfs ro,nosuid,nodev,noexec,relatime
│ ├─/sys/devices/virtual/net sysfs sysfs rw,relatime
│ │ └─/sys/devices/virtual/net
│ │ sysfs[/devices/virtual/net] sysfs rw,nosuid,nodev,noexec,relatime
│ ├─/sys/fs/fuse/connections fusectl fusectl rw,nosuid,nodev,noexec,relatime
│ └─/sys/fs/cgroup cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate,memory
├─/run tmpfs tmpfs rw,nosuid,nodev,size=4912348k,nr_inodes=819200,mo
│ ├─/run/credentials tmpfs[/systemd/inaccessible/dir]
│ │ tmpfs ro,nosuid,nodev,noexec,size=4912348k,nr_inodes=81
│ └─/run/systemd/incoming tmpfs[/systemd/propagate/run-u3.service]
│ tmpfs ro,nosuid,nodev,size=4912348k,nr_inodes=819200,mo
├─/tmp tmpfs tmpfs rw,nosuid,nodev,size=12280872k,nr_inodes=409600
├─/boot /dev/sda2[/var/lib/lxc/systemd-debug/rootfs/boot]
│ ext4 ro,relatime,errors=remount-ro,stripe=32699
└─/usr /dev/sda2[/var/lib/lxc/systemd-debug/rootfs/usr]
ext4 ro,relatime,errors=remount-ro,stripe=32699
Finished with result: success
Main processes terminated with: code=exited/status=0
Service runtime: 14ms
CPU time consumed: 5ms
Host (LXC):
root@systemd-debug:/# findmnt
TARGET SOURCE FSTYPE OPTIONS
/ /dev/sda2[/var/lib/lxc/systemd-debug/rootfs]
│ ext4 rw,relatime,errors=remount-ro,stripe=32699
├─/run tmpfs tmpfs rw,nosuid,nodev,size=4912348k,nr_inodes=819200,mode=755
├─/tmp tmpfs tmpfs rw,nosuid,nodev,size=12280872k,nr_inodes=409600
├─/dev none tmpfs rw,relatime,size=492k,mode=755
│ ├─/dev/pts devpts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666,ma
│ ├─/dev/ptmx devpts[/ptmx] devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666,ma
│ ├─/dev/tty1 devpts[/0] devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666,ma
│ ├─/dev/tty2 devpts[/1] devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666,ma
│ ├─/dev/tty3 devpts[/2] devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666,ma
│ ├─/dev/tty4 devpts[/3] devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666,ma
│ ├─/dev/shm tmpfs tmpfs rw,nosuid,nodev
│ ├─/dev/hugepages hugetlbfs hugetlbfs rw,relatime,pagesize=2M
│ ├─/dev/mqueue mqueue mqueue rw,nosuid,nodev,noexec,relatime
│ ├─/dev/console devpts[/2] devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000
│ ├─/dev/.lxc/proc proc proc rw,relatime
│ └─/dev/.lxc/sys sys sysfs rw,relatime
├─/proc proc proc rw,nosuid,nodev,noexec,relatime
│ ├─/proc/sys proc[/sys] proc ro,nosuid,nodev,noexec,relatime
│ │ ├─/proc/sys/kernel/random/boot_id
│ │ │ none[/.lxc-boot-id] tmpfs ro,nosuid,nodev,noexec,relatime,size=492k,mode=755
│ │ └─/proc/sys/net proc[/sys/net] proc rw,nosuid,nodev,noexec,relatime
│ └─/proc/sysrq-trigger proc[/sysrq-trigger] proc ro,nosuid,nodev,noexec,relatime
└─/sys sysfs sysfs rw,nosuid,nodev,noexec,relatime
└─/sys sysfs sysfs ro,nosuid,nodev,noexec,relatime
├─/sys/devices/virtual/net sysfs sysfs rw,relatime
│ └─/sys/devices/virtual/net
│ sysfs[/devices/virtual/net]
│ sysfs rw,nosuid,nodev,noexec,relatime
├─/sys/fs/fuse/connections fusectl fusectl rw,nosuid,nodev,noexec,relatime
└─/sys/fs/cgroup cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recurs
Fixes https://github.com/systemd/systemd/issues/20032
This reverts commit cb0e818f7cc2499d81ef143e5acaa00c6e684711.
After this was merged, some design and implementation issues were discovered,
see the discussion in #18782 and #19385. They certainly can be fixed, but so
far nobody has stepped up, and we're nearing a release. Hopefully, this feature
can be merged again after a rework.
Fixes #19345.
This graphics chip doesn't have a DRM driver and falls back to the
vesa-framebuffer driver.
Without this patch, users of such chips suddenly see their GUI break without
any indication of what happened (no error message). Hence this
regression is nearly impossible for end users to troubleshoot.
Add ip protocol token to SocketBind{Allow|Deny}= property parser.
Use parse_socket_bind_item helper.
Replace int32_t with int in cgroup item for socket-bind as it was
requested in [0].
Update tests.
[0] https://github.com/systemd/systemd/pull/19942#discussion_r652150024
Parse address family, ip protocol and ports, any of them can be
optional. If none of them is specified, the special value 'any' is expected.
Helper is placed in shared to be reused in both fragment and dbus.
Add unit tests with valid and invalid examples.
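A rough sketch of what such a parser could look like. The `family:protocol:ports` ordering, the `ipv4`/`ipv6`/`tcp`/`udp` keywords, the rejection of port 0 and the names bind_item, parse_port() and parse_bind_item() are assumptions made for illustration; the real helper may differ:

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical result type; zero in a field means "not restricted". */
struct bind_item {
        int family;        /* AF_INET, AF_INET6 or AF_UNSPEC */
        int protocol;      /* IPPROTO_TCP, IPPROTO_UDP or 0 */
        uint16_t port_min;
        uint16_t nr_ports; /* 0 means any port */
};

/* Parses 1..65535 from the first len bytes of s. */
static int parse_port(const char *s, size_t len, unsigned *ret) {
        unsigned v = 0;

        if (len == 0 || len > 5)
                return -EINVAL;
        for (size_t i = 0; i < len; i++) {
                if (s[i] < '0' || s[i] > '9')
                        return -EINVAL;
                v = v * 10 + (unsigned) (s[i] - '0');
        }
        if (v == 0 || v > 65535)
                return -EINVAL;
        *ret = v;
        return 0;
}

static int parse_bind_item(const char *s, struct bind_item *ret) {
        struct bind_item it = { AF_UNSPEC, 0, 0, 0 };
        int stage = 0; /* 0: family next, 1: protocol next, 2: ports next, 3: done */

        if (strcmp(s, "any") == 0) {
                *ret = it;
                return 0;
        }

        for (const char *p = s;; ) {
                size_t len = strcspn(p, ":");

                if (stage == 0 && len == 4 && memcmp(p, "ipv4", 4) == 0) {
                        it.family = AF_INET;
                        stage = 1;
                } else if (stage == 0 && len == 4 && memcmp(p, "ipv6", 4) == 0) {
                        it.family = AF_INET6;
                        stage = 1;
                } else if (stage <= 1 && len == 3 && memcmp(p, "tcp", 3) == 0) {
                        it.protocol = IPPROTO_TCP;
                        stage = 2;
                } else if (stage <= 1 && len == 3 && memcmp(p, "udp", 3) == 0) {
                        it.protocol = IPPROTO_UDP;
                        stage = 2;
                } else if (stage <= 2) {
                        /* A single port or a "min-max" range; must be the last token. */
                        const char *dash = memchr(p, '-', len);
                        size_t lo_len = dash ? (size_t) (dash - p) : len;
                        unsigned lo, hi;

                        if (parse_port(p, lo_len, &lo) < 0)
                                return -EINVAL;
                        hi = lo;
                        if (dash && parse_port(dash + 1, len - lo_len - 1, &hi) < 0)
                                return -EINVAL;
                        if (hi < lo)
                                return -EINVAL;
                        it.port_min = (uint16_t) lo;
                        it.nr_ports = (uint16_t) (hi - lo + 1);
                        stage = 3;
                } else
                        return -EINVAL;

                if (p[len] == '\0')
                        break;
                p += len + 1;
        }

        *ret = it;
        return 0;
}
```

Tokens are accepted only in a fixed order, so something like "tcp:ipv4" is rejected rather than silently reinterpreted.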
Thin wrappers of ip_protocol_{from|to}_name targeting IPPROTO_TCP and
IPPROTO_UDP only.
Used to parse IP protocol configuration restricted only to TCP and UDP,
e.g. in SocketBind{Allow|Deny}= unit property.
These helpers are inspired by af_{from|to}_ipv4_ipv6 and potentially
extendable with other IP protocols if there is a use-case to expose
them.
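For illustration, a standalone sketch of helpers restricted to TCP and UDP. The names proto_from_tcp_udp() and proto_to_tcp_udp() are hypothetical, and unlike the wrappers described above this version does not call ip_protocol_{from|to}_name:

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: accepts only "tcp" and "udp", anything else is -EINVAL. */
static int proto_from_tcp_udp(const char *name) {
        if (!name)
                return -EINVAL;
        if (strcmp(name, "tcp") == 0)
                return IPPROTO_TCP;
        if (strcmp(name, "udp") == 0)
                return IPPROTO_UDP;
        return -EINVAL;
}

/* Hypothetical reverse mapping: NULL for any protocol other than TCP or UDP. */
static const char *proto_to_tcp_udp(int protocol) {
        switch (protocol) {
        case IPPROTO_TCP:
                return "tcp";
        case IPPROTO_UDP:
                return "udp";
        default:
                return NULL;
        }
}
```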
Lookup ip protocol in a socket address to allow or deny binding a socket
to the address.
Matching rule is extended with 'protocol' field. If its value is 0
(IPPROTO_IP) ip protocol comparison is omitted and matching is passed to
the next token which is ip ports.
Documentation is updated.
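The matching order described above can be sketched in plain userspace C as follows. The bind_rule layout and rule_matches() are hypothetical and only illustrate how a zero protocol skips the comparison and hands over to the port check:

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical rule layout; zero in a field means "match anything". */
struct bind_rule {
        int family;        /* AF_INET, AF_INET6 or AF_UNSPEC */
        int protocol;      /* IPPROTO_TCP, IPPROTO_UDP or 0 (IPPROTO_IP) */
        uint16_t port_min;
        uint16_t nr_ports; /* 0 means any port */
};

static bool rule_matches(const struct bind_rule *r, int family, int protocol, uint16_t port) {
        if (r->family != AF_UNSPEC && r->family != family)
                return false;

        /* Protocol 0 (IPPROTO_IP): skip the comparison and go on to the ports. */
        if (r->protocol != IPPROTO_IP && r->protocol != protocol)
                return false;

        if (r->nr_ports == 0)
                return true;

        return port >= r->port_min && port - r->port_min < r->nr_ports;
}
```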
dns_resource_record_copy() assumes that NSEC types bitmap is non-empty
which results in a null pointer dereference inside bitmap_copy() in some
cases. Fix this by calling bitmap_copy() conditionally.
socket_broadcast_group_unref() is only called in netlink_slot_disconnect(),
so the assertion should not be triggered as the match slot was
successfully created.
But we usually design `_ref()`/`_unref()` functions so that they can be called
with any input. So, let's follow that design rule here as well.
This effectively reverts the commit 2a394d0bf2f0afd8b9ed5faeb33f23459e3c6504.
But drop trailing '\r' of the read value, as sd_device_set_sysattr_value() drops it.
Fixes #20025.
We checked the wrong field, which was always NULL here, so we would always
reject the assignment. We would also print the wrong string in the error
message:
$ sudo systemd-run --socket-property ListenFIFO=/tmp/fifo3 cat
Failed to start transient socket unit: Invalid socket path: FIFO
By the "same logic as above...", we want to continue to fall back here,
but the break prohibits that.
This is a follow-up for ee1aa61c4710ae567a2b844e0f0bb8cb0456ab8c .
When an ExtensionImages= extension-release metadata does not match, the
log messages (unless debug level is set) are pretty much incomprehensible:
systemd[463]: run-u11.service: Failed to set up mount namespacing: /run/systemd/unit-extensions/0: Stale file handle
systemd[463]: run-u11.service: Failed at step NAMESPACE spawning /usr/bin/echo: Stale file handle
Add an explicit log message if we get ESTALE from the dissect code, to
make it clear what's happening without needing to enable debugging:
systemd[463]: Failed to mount image /tmp/app3.raw, extension-release metadata does not match the lower layer's: ID=debian VERSION_ID=11 SYSEXT_LEVEL=11
Previously, the value was first stringified and later parsed again,
which is completely redundant.
Follow-up for 1001167ca5e4cfdc6230562e4fb9029e5f624d53.
Replaces #20013.
Only treat interface names containing dots specially when resolvectl is
pretending to be resolvconf to fix
https://github.com/systemd/systemd/issues/20014 .
Move the special suffix-stripping behaviour of ifname_mangle out to the
new ifname_resolvconf_mangle to be called from resolvconf only.
The mount option has special meaning when SELinux is enabled. To make
NoNewPrivileges=yes not break SELinux enabled systems, let's not set the
mount flag on such systems.
This reverts commit 1753d3021564671fba3d3196a84da657d15fb632.
Let's re-enable that feature now. As reported when the original commit
was merged, this causes some trouble on SELinux enabled systems. So,
in the subsequent commit, the feature will be disabled when SELinux is enabled.
But, anyway, this commit just re-enables that feature unconditionally.
This fixes repart's, systemctl's, sysusers' and tmpfiles' specifier
expansion to honour the root dir specified with --root=. This is
relevant for specifiers such as %m, %o, … which are directly sourced
from files on disk.
This doesn't try to be overly smart: specifiers referring to runtime
concepts (i.e. boot ID, architecture, hostname) rather than files on the
medium are left as is. There's certainly a point to be made that they
should fail in case --root= is specified, but I am not entirely convinced
about that, and it's certainly something we can look into later if
there's reason to.
I wondered for a while how to hook this up best, but given that quite a
large number of specifiers resolve to data from files on disks, and most
of our tools need this, I ultimately decided to make the root dir a
first class parameter to specifier_printf().
Replaces: #16187
Fixes: #16183
Due to a little misunderstanding the last patch doesn't work as
expected, since test_create_image() is called only for the first image
(usually TEST-01-BASIC), and all subsequent images are then (possibly)
modified with test_append_files().
Follow-up to 179ca4d2b1b5579014773a128462475f99b7a91b.
If ActivationPolicy= is set to down, always-down, or manual, then any
matching link will delay boot (due to delaying network-online.target).
If RequiredForOnline= wasn't explicitly set, then default it to false
if ActivationPolicy= is down or manual. If ActivationPolicy=always-down,
then force RequiredForOnline=no.
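The defaulting rule above, sketched as a small decision function. The enum values and required_for_online() are hypothetical names, and the sketch assumes RequiredForOnline= otherwise defaults to yes:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum; only the values needed for the sketch. */
typedef enum {
        POLICY_UP,
        POLICY_MANUAL,
        POLICY_DOWN,
        POLICY_ALWAYS_DOWN,
} ActivationPolicy;

/* configured: negative if RequiredForOnline= was not set explicitly,
 * 0 for "no", positive for "yes". */
static bool required_for_online(ActivationPolicy policy, int configured) {
        /* always-down forces "no", even if the user asked for "yes". */
        if (policy == POLICY_ALWAYS_DOWN)
                return false;

        /* An explicit setting wins for every other policy. */
        if (configured >= 0)
                return configured > 0;

        /* Not set explicitly: default to "no" for down and manual. */
        return !(policy == POLICY_DOWN || policy == POLICY_MANUAL);
}
```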
We would always call path_simplify() before doing a lookup, which requires the
path key to be duplicated first. But the hashmap lookup doesn't require this…
So let's opportunistically skip the allocation if the key is already present.
Inspired by https://github.com/systemd/systemd/pull/19973.
The approach with function pointer was neat, but it gets in the way
when we want to resolve the symbol dynamically: static initialization
is not possible. It also makes the code more complicated than necessary.
In this case, a simple boolean is sufficient.
We warn when the operation fails, not when it succeeds. Hence this should be
"<do>_or_<handle failure>", not "<do>_and_<handle failure>". We *could* use
whatever convention we want, but rust and perl are rather consistent in using
the logical convention. We don't care about perl that much, but having a naming
convention inverted wrt. rust would be rather confusing.
Also, pretty much every implementation does similar steps, so add a nice
wrapper which combines opening of the library and loading of the symbols.
Also add missing sentinel attribute in dlopen_or_warn().
The goal is to move everything that requires selinux or smack
away from src/basic/. This means that src/basic/label.[ch] must move,
which implies btrfs-util.[ch], copy.[ch], and a bunch of other files
which form a cluster of internal use.
This is just moving text around, so there should be no functional difference.
test-blockdev-util is new, because path_is_encrypted() is moved to
blockdev-util.c, and so far we didn't have any tests for code there.
This was added in 88d775b734644f26fb490836769c2bc275498fde,
with the apparent intent of using in shared/ and the rest of our code.
It doesn't matter much for our code, since libdl is part of glibc anyway,
but moving it removes one linkage from libsystemd. (libshared was already
linking to libdl explicitly).
fd_duplicate_data_fd() is renamed to copy_data_fd(). This makes
the two functions have nicely similar names.
Now fd-util.[ch] is again about low-level file descriptor manipulations.
copy_data_fd() is a complex function that internally wraps the other
functions in copy.c. I want to move copy.c and the whole cluster of
related code from basic/ to shared/ later on, and this is a preparatory
step for that.
This makes the DHCP client ignore FORCERENEW requests, as unauthenticated
FORCERENEW requests cause a security issue (TALOS-2020-1142, CVE-2020-13529).
Let's re-enable this after RFC3118 (Authentication for DHCP Messages)
and/or RFC6704 (Forcerenew Nonce Authentication) are implemented.
Fixes #16774.
Strictly speaking, this breaks backward compatibility, as previously
`ENV{key}="val"` ignored the `string_escape=` option. But introducing
a new option such as `string_escape=hoge` sounds like overkill to me.
The default escape mode is `ESCAPE_UNSET`, so I hope this rarely breaks
existing rules.
It turns out the "supporting services" were run in _all_ tests if
TEST-01-BASIC was run as the first test (which is usually the case),
since with the original condition in test_create_image() we would skip
the masking and then propagate the change to the default image used by
other tests. This has been causing multiple bogus test timeouts
(especially when the hwdb was being rebuilt in tests with short
timeouts, like TEST-52-HONORFIRSTSHUTDOWN).
Let's "fix" this by making the call to mask_supporting_services()
unconditional and override the test_create_image() function in
TEST-01-BASIC to avoid the masking in this single case.
When checking the unit state after `systemctl freeze|thaw` we can be
"too fast" and get the intermediate state (freezing/thawing) which we're
not interested in. Let's wait a bit and try to get the state again in
such cases to avoid unnecessary flakiness.
```
[ 29.390203] testsuite-38.sh[218]: + state=thawing
[ 29.390203] testsuite-38.sh[218]: + '[' thawing = running ']'
[ 29.390203] testsuite-38.sh[218]: + echo 'error: unexpected freezer state, expected: running, actual: thawing'
[ 29.390203] testsuite-38.sh[218]: error: unexpected freezer state, expected: running, actual: thawing
[ 29.390203] testsuite-38.sh[218]: + exit 1
```
test-loop-block needs to run in qemu, so we are currently not
testing it in the CI. Run it by itself in a separate job from
TEST-02-UNITTESTS to avoid slowing that suite down.
Fixes https://github.com/systemd/systemd/issues/19966
Disable it in the bionic-* CI for now, as it's affected by
the same uevent ordering issue as TEST-50-DISSECT which makes
it flaky.
Fixes a bug introduced by cfea7618f28562c053a1ee194108feaa502081ff.
Before this commit:
mode=1777,size=10%,nr_inodes=400k,uid=496107520,gid=496107520,context=,sys.id:sys.role:systemd.nspawn.container.fs:s0,
After this commit:
mode=1777,size=10%,nr_inodes=400k,uid=496107520,gid=496107520,context=sys.id:sys.role:systemd.nspawn.container.fs:s0
Fixes #19976.
format_timestamp_relative currently returns the plural form of
years and months no matter the quantity, and in many cases (for
durations > 1 week) this is the same with days.
This patch changes this so that the function takes the quantity into account,
returning "1 month 1 week ago" instead of "1 months 1 weeks ago".
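A small sketch of the quantity-aware formatting. The helper format_months_weeks() is hypothetical and covers only the months/weeks case from the example above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: picks the singular form when the quantity is exactly 1. */
static int format_months_weeks(char *buf, size_t len, unsigned months, unsigned weeks) {
        return snprintf(buf, len, "%u %s %u %s ago",
                        months, months == 1 ? "month" : "months",
                        weeks, weeks == 1 ? "week" : "weeks");
}
```

With months and weeks both set to 1 this yields "1 month 1 week ago"; any other quantity keeps the plural.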
This is useful for provisioning initially empty secondary A/B root file
systems. We don't want those to ever be considered for automatic
mounting, for example in "systemd-nspawn --image=", hence we should
create them with the No-Auto flag turned on. Once a file system image is
dropped into the partition the flag may be turned off by the updater
tool, so that it is considered from then on.
The new option for this is called NoAuto. I dislike negated options
like this, but this is taken from the naming in the spec, which in turn
inherited the name from the same flag for Microsoft Data Partitions. To
minimize confusion, let's stick to the name hence.
The two are completely identical, only the return code is inverted.
Let's hence make it easy for the compiler to make it the same function
call even in lowest optimization modes.
In many CI runs I noticed a race where we check the "active" state a bit
too early where the unit is still in the "inactive" state, causing the
`is-failed` check to fail. Mitigate this by waiting even if the unit is
in the inactive state and introduce a "safety net" which checks whether
the unit is not restarting indefinitely or more than it should (as
described in the original issue #3166).
Example:
```
[ 5.757784] testsuite-11.sh[216]: + systemctl --no-block start fail-on-restart.service
[ 5.853657] testsuite-11.sh[222]: ++ systemctl show --value --property ActiveState fail-on-restart.service
[ 5.946044] testsuite-11.sh[216]: + active_state=inactive
[ 5.946044] testsuite-11.sh[216]: + [[ inactive == \a\c\t\i\v\a\t\i\n\g ]]
[ 5.946044] testsuite-11.sh[216]: + [[ inactive == \a\c\t\i\v\e ]]
[ 5.946044] testsuite-11.sh[216]: + systemctl is-failed fail-on-restart.service
[ 5.946816] systemd[1]: fail-on-restart.service: Passing 0 fds to service
[ 5.946913] systemd[1]: fail-on-restart.service: About to execute false
[ 5.947011] systemd[1]: fail-on-restart.service: Forked false as 228
[ 5.947093] systemd[1]: fail-on-restart.service: Changed dead -> start
[ 5.947172] systemd[1]: Starting Fail on restart...
[ 5.947272] systemd[228]: fail-on-restart.service: Executing: false
[ 5.960553] testsuite-11.sh[227]: activating
[ 5.965188] testsuite-11.sh[216]: + exit 1
[ 6.011838] systemd[1]: Received SIGCHLD from PID 228 (4).
[ 6.012510] systemd[1]: fail-on-restart.service: Main process exited, code=exited, status=1/FAILURE
[ 6.012638] systemd[1]: fail-on-restart.service: Failed with result 'exit-code'.
[ 6.012834] systemd[1]: fail-on-restart.service: Service will restart (restart setting)
[ 6.012963] systemd[1]: fail-on-restart.service: Changed running -> failed
[ 6.013081] systemd[1]: fail-on-restart.service: Unit entered failed state.
```
Text currently refers to `/etc/nsswitch.conf` where it should refer to `/etc/resolv.conf`.
This is in the context of defining a nameserver IP and search domains.
The three-argument match() is a GNU AWK extension, thus breaking the
compatibility with mawk (used on Ubuntu/Debian, for example). Let's
replace it with a (hopefully) more portable sed expression to drop the
inadvertently introduced gawk dependency.
Fixes: #19957
Show message "Deactivated successfully" in debug mode (when manager is
user) rather than in info mode. This message has low information value
for regular users and it might be a bit overwhelming on a system with
a lot of devices.
The general idea with users and groups created through sysusers is that an
appropriate number is picked when the allocation is made. The number that is
selected will be different on each system based on the order of creation of
users, installed packages, etc. Since system users and groups are not shared
between installations, this generally is not an issue. But it becomes a problem
for initrd: some file systems are shared between the initrd and the host (/run
and /dev are probably the only ones that matter). If the allocations are
different in the host and the initrd, and files survive switch-root, they will
have wrong ownership.
This makes the gids build-time-configurable for all groups and users where
state may survive the switch from initrd to the host.
In particular, all "hardware access" groups are like this: files in /dev will
be owned by them. Eventually the new udev would change ownership, but there
would be a moment where the files were owned by the wrong group. The
allocations are "soft-static" in the language of Fedora packaging guidelines:
the uid/gid will be used if possible, but we'll fall back to a different
one. TTY_GID is the exception, because the number is used directly.
Similarly, the possibility to configure "soft-static" uids is added for daemons
which may usefully run in the initramfs: systemd-network (lease information and
interface state is serialized to /run), systemd-resolve (stub files and
interface state), systemd-timesync (/run/systemd/timesync).
Journal files are owned by the group systemd-journal, and acls are granted
for wheel and adm.
systemd-oom and systemd-coredump are excluded from this patch: I assume that
oomd is not useful in the initrd, and coredump leaves no state (it only creates
a pipe in /run?).
The defaults are not changed: if nothing is configured, dynamic allocation will
be used. I looked at a Debian system, and the numbers are all different than
on Fedora.
For Fedora, see the list of uids and gids at https://pagure.io/setup/blob/master/f/uidgid.
In particular, systemd-network and systemd-resolve got soft-static numbers to
make it easy to transition from a non-host-specific initrd to a host system
already a few years back (https://bugzilla.redhat.com/show_bug.cgi?id=1102002).
I also requested static allocations for sgx, input, render in
https://pagure.io/packaging-committee/issue/1078,
https://pagure.io/setup/pull-request/27.
Support filtering by ip protocol (L4) in SocketBind{Allow|Deny}=
properties.
The signature of dbus methods must be finalized before new release is
cut, hence reserve a parameter for ip protocol.
Implementation will follow.
Closes https://github.com/systemd/systemd/issues/19891
The logic is that if the options are updated after boot, we *don't* use
the new value. But we still want to print out the changed contents in
bootctl so as not to confuse people.
Fixes #19597.
Also https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=988450.
$ build/bootctl systemd-efi-options
quiet
Note: SystemdOptions EFI variable has been modified since boot. New value: debug
The hint is printed to stderr, so scripts should not be confused.
Creating those string dynamically at runtime is slow and unnecessary.
Let's use static strings with a bit of macro magic and let the compiler
coalesce as much as possible.
$ size build/src/shared/libsystemd-shared-248.so{.old,}
text data bss dec hex filename
2813453 94572 4584 2912609 2c7161 build/src/shared/libsystemd-shared-248.so.old
2812309 94564 4584 2911457 2c6ce1 build/src/shared/libsystemd-shared-248.so
A nice side-effect is that the same form is used everywhere, so it's easier to
figure out all variables that are used, and where each specific variable is
used.
C.f. 2b0445262ad9be2a9bf49956ab8e886ea2e48a0a.
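To illustrate the general technique (not the actual macros from this patch), adjacent string literals can be glued together by the preprocessor so that the full string exists at compile time. The macro names, the path prefix and the all-zero suffix below are made up for the example:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical constants, for illustration only. */
#define EXAMPLE_SUFFIX "-00000000-0000-0000-0000-000000000000"
#define EXAMPLE_VARIABLE(name) name EXAMPLE_SUFFIX
#define EXAMPLE_VARIABLE_PATH(name) "/example/vars/" EXAMPLE_VARIABLE(name)

/* No allocation or formatting at runtime: this returns a string literal. */
static const char *example_path(void) {
        return EXAMPLE_VARIABLE_PATH("Options");
}
```

Since every user spells the variable through the same macro, a plain text search for the macro name finds all of them.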
Note: 'const char *foo = alloca(…);' seems OK. Our coding style document and
alloca(3) only warn against using alloca() in function invocations. Declaring
both stack variable and alloca at the same time should be fine: no matter in
which order they happen, i.e. if the pointer variable is above the contents,
or the contents are above the pointer, or even if the pointer is elided by the
compiler, everything should be fine.
In the light of https://lwn.net/Articles/859679/ let's drop
quotactl_path() again from the filter set list, as it got backed out
again in 5.13-rc3.
It's likely going to be replaced by quotactl_fd() eventually, but that
hasn't made its way into the tree yet, hence let's not replace the entry
for now.
This partially reverts 34254e599a28529bdb89f91571adeaf7c76d9f43.
So in theory UUID Variant 2 (i.e. Microsoft GUIDs) are supposed to be
displayed in native endian. That is of course a bad idea, and Linux
userspace generally didn't implement that, i.e. uuidd and similar.
Hence, let's not bother either, but let's document that we treat
everything the same as Variant 1, even if it declares something else.
Previously, when `link_request_queue()` was called in link_request_set_link(),
`SetLinkOperation` was cast with INT_TO_PTR(), and the value was assigned to
`void *object`. However, the value was then read directly through the member
`SetLinkOperation set_link_operation` of the union that `object`
belongs to. Thus, the value read was always 0 on big-endian systems.
Fixes the link configuration issue on s390x systems.
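A reduced sketch of the pitfall. The struct layout, the enum value and the accessor names are hypothetical, and the macros are simplified versions of INT_TO_PTR()/PTR_TO_INT(). The point is that a value stored through the pointer member must be read back through the pointer member (or stored and read through the enum member), never mixed:

```c
#include <assert.h>
#include <stdint.h>

#define INT_TO_PTR(i) ((void *) (intptr_t) (i))
#define PTR_TO_INT(p) ((int) (intptr_t) (p))

typedef enum {
        SET_LINK_EXAMPLE_OP = 3, /* hypothetical value */
} SetLinkOperation;

struct request {
        union {
                void *object;
                SetLinkOperation set_link_operation;
        };
};

/* Consistent pairing 1: store and load through the pointer member. */
static void request_set_via_pointer(struct request *req, SetLinkOperation op) {
        req->object = INT_TO_PTR(op);
}

static SetLinkOperation request_get_via_pointer(const struct request *req) {
        return (SetLinkOperation) PTR_TO_INT(req->object);
}

/* Consistent pairing 2: store and load through the enum member. */
static void request_set_via_member(struct request *req, SetLinkOperation op) {
        req->set_link_operation = op;
}

static SetLinkOperation request_get_via_member(const struct request *req) {
        return req->set_link_operation;
}

/* Mixing them, i.e. request_set_via_pointer() followed by
 * request_get_via_member(), reads only the first bytes of the pointer,
 * which hold the low bits on little-endian but the high (zero) bits on
 * 64-bit big-endian machines. */
```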
Debugging udev issues especially during the early boot is fairly
difficult. Currently, you need to enable (at least) debug logging and
start monitoring uevents, try to reproduce the issue and then analyze
and correlate two (usually) huge log files. This is not ideal.
This patch aims to provide a much more focused debugging tool:
tracepoints. More often than not we tend to have at least the basic idea
about the issue we are trying to debug further, e.g. we know it is
storage related. Hence all of the debug data generated for network
devices is useless, adds clutter to the log files and generally
slows things down.
Using this set of tracepoints you can start asking very specific
questions related to event processing for given device or subsystem.
Tracepoints can be used with various tracing tools but I will provide
examples using bpftrace.
Another important aspect to consider is that using tracepoints you can
debug production systems. There is no need to install test packages with
added logging, no debuginfo packages, etc...
Example usage (you might be asking such questions during the debug session):
Q: How can I list all tracepoints?
A: bpftrace -l 'usdt:/usr/lib/systemd/systemd-udevd:udev:*'
Q: What are the arguments for each tracepoint?
A: Look at the code and search for use of DEVICE_TRACE_POINT macro.
Q: How many times have we executed an external binary?
A: bpftrace -e 'usdt:/usr/lib/systemd/systemd-udevd:udev:spawn_exec { @cnt = count(); }'
Q: What binaries were executed while handling events for the "dm-0" device?
A: bpftrace -e 'usdt:/usr/lib/systemd/systemd-udevd:udev:spawn_exec / str(arg1) == "dm-0"/ { @cmds[str(arg4)] = count(); }'
Thanks to Thomas Weißschuh <thomas@t-8ch.de> for reviewing this patch
and contributions that allowed us to drop the dependency on dtrace tool
and made the resulting code much more concise.
This reverts commit 592d419ce6e283c443901be4a69c95984821ff06.
The commit makes journald unstable, and is just an optimization
for the size of journal. Hence, it is safe to revert the commit.
Fixes #19895.
The previous string was "unknown", but that's wrong, because we *do*
know what we are going to do with those partitions: we leave them
unmodified, hence say "unchanged" in the output, to be clearer.
The currently hardcoded value works with the default configuration, but
breaks when QEMU_MEM != 512M (in sanitizer runs, for example).
```
# QEMU_MEM=1G make -C test/TEST-36-NUMAPOLICY/ run
make: Entering directory '/home/fsumsal/repos/@systemd/systemd/test/TEST-36-NUMAPOLICY'
TEST-36-NUMAPOLICY RUN: test NUMAPolicy= and NUMAMask= options
+ /bin/qemu-kvm -smp 8 -net none -m 1G -nographic -kernel /boot/vmlinuz-5.12.5-300.fc34.x86_64 -drive format=raw'
qemu-kvm: total memory for NUMA nodes (0x20000000) should equal RAM size (0x40000000)
E: QEMU failed with exit code 1
```
Before 81107b8419c39f726fd2805517a5b9faab204e59, the compare functions
for the latest or earliest prioq did not take the ratelimited flag into account.
So, it was OK not to reshuffle the time prioq when changing the flag.
But now, those two compare functions also compare whether the source is
ratelimited or not. So, it is necessary to reshuffle the time prioq
after changing the ratelimited flag.
Hopefully fixes#19903.
This reverts commit d8e3c31bd8e307c8defc759424298175aa0f7001.
A poorly documented fact is that SELinux unfortunately uses nosuid mount flag
to specify that also a fundamental feature of SELinux, domain transitions, must
not be allowed either. While this could be mitigated case by case by changing
the SELinux policy to use `nosuid_transition`, such mitigations would probably
have to be added everywhere if systemd used automatic nosuid mount flags when
`NoNewPrivileges=yes` would be implied. This isn't very desirable from SELinux
policy point of view since also untrusted mounts in service's mount namespaces
could start triggering domain transitions.
Alternatively there could be directives to override this behavior globally or
for each service (for example, new directives `SUIDPaths=`/`NoSUIDPaths=` or
more generic mount flag applicators), but since there's little value of the
commit by itself (setting NNP already disables most setuid functionality), it's
simpler to revert the commit. Such new directives could be used to implement
the original goal.
This fixes the following spurious logs on enumerating links:
```
wlan0: Saved original MTU 1500 (min: 256, max: 2304)
wlan0: MTU is changed: 0 → 1500 (min: 256, max: 2304)
```
Most real network devices refuse to set the MAC address when their operstate
is not down. So, if setting the MAC address fails once, let's bring down
the interface and retry.
Closes #6696.
Previously (v248 or earlier), even if no static address was configured,
the link did not enter the configured state, as e.g. Link::static_addresses_configured
was false until the link gained its carrier.
But after commit 1187fc337577cecd685d331eeab656be186ba3b2, the
situation changed. Static addresses, routes, etc. are requested even
if the link does not have its carrier, and thus the link enters the configured
state when no static addresses etc. are specified.
This makes the link not enter the configured state before it gains its
carrier when at least one of the dynamic address assignment protocols (e.g.
DHCP) except for NDISC is enabled.
Note that, unfortunately, netplan always enables ConfigureWithoutCarrier=
for all virtual devices, e.g. bridge. See,
978e20f902
So, we need to support e.g. the following strange config:
```
[Network]
ConfigureWithoutCarrier=yes
DHCP=yes
```
Fixes #19855.
Say that r should be declared at the top of the function.
Don't say that fixed buffers result in truncation, right after saying that they
must only be used if size is known.
Adjust order of examples to be consistent.
Cgroups may be unnecessarily realized when they are not needed. This
happens, e.g. for mount units parsed from /proc/$PID/mountinfo, check
touch /run/ns_mount
unshare -n sh -c "mount --bind /proc/self/ns/net /run/ns_mount"
# no cgroup exists
file /sys/fs/cgroup/system.slice/run-ns_mount.mount
systemctl daemon-reload
# the vain cgroup exists
file /sys/fs/cgroup/system.slice/run-ns_mount.mount
. (Such cgroups can amount to a large number with many similar mounts.)
The code already accounts for "lazy" realization (see various checks for
Unit.cgroup_realized) but the unit_deserialize() in the reload/reexec
path performs unconditional realization.
Invalidate (and queue) the units for realization only if we know that
they were already realized in the past. This is a safe thing to do even
in the case the reload brings some new cgroup setting (controllers, BPF)
because units that aren't realized will use the updated setting when the
time for their realization comes. (It's not even needed to add a code
comment because the current formulation suggests the changed behavior.)
I wanted to see what is_path_read_only_fs() and is_path_temporary_fs() return
in a chroot, and various tests would fail. For most of our codebase, we can
assume that /proc and such are mounted, and it doesn't make sense to make the
tests work in a chroot. But let's do it here. (In general, it would be useful
for most stuff in src/basic/, since it's linked into libraries which might be
invoked in incorrectly set up environments and should not fail too badly.)
a70581ffb5c13c91c76ff73ba6f5f3ff59c5a915 added ExecRuntime.ipcns_storage_socket[], and
serialization in exec_runtime_serialize(), and deserialization in exec_runtime_deserialize_one(),
but also deserialization in exec_runtime_deserialize_compat(). exec_runtime_deserialize_compat()
is for deserializing ExecRuntime when it was serialized as part of the unit before
e8a565cb660a7a11f76180fe441ba8e4f9383771. There was never any code which would serialize
ExecRuntime.ipcns_storage_socket[] this way, so the deserialization attempts are pointless.
All unit types can be serialized. This function was really checking whether the
unit type has custom serialization/deserialization code. But we don't need a
function for this.
Also, the check that both .serialize() and .deserialize_item() are defined is
better written as an assert. Now we have a function which would skip
serialization/deserialization for the unit if we forgot to set either of the
fields.
Apparently people use such large key files. Specifically, people used 4M
key files, and we lowered the limit from 4M to 4M-1 back in 248.
This raises the limit to 64M for read_full_file() to avoid these
specific issues and give some non-trivial room beyond the 4M files seen
IRL.
Note that a 64M allocation in glibc is always immediately done via
mmap(), and is thus a lot slower than shorter allocations. This means
read_virtual_file() becomes ridiculously slow if we'd use the large
limit, since we use it all the time for reading /proc and /sys metadata,
and read_virtual_file() typically allocates the full size with malloc()
in advance. In fact it becomes so slow, that test-process-util kept
timing out on me all the time, once I blindly raised the limit.
This patch hence introduces two distinct limits for read_full_file() and
read_virtual_file(): the former is much larger than the latter and the
latter remains where it is. This is safe since the former uses an
exponentially growing realloc() loop while the latter uses the
aforementioned ahead-of-time full limit allocation.
Fixes: #19193
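The difference between the two allocation strategies can be sketched as
follows. This is a minimal Python illustration, not the C code: the function
names, the starting chunk size and the limits passed in are hypothetical, and
the real read_virtual_file() error handling is not modelled.

```python
import io

def read_growing(f, limit, start=16):
    # Grow the amount read exponentially as data arrives, so a large
    # limit costs nothing for small files.
    buf = bytearray()
    chunk = start
    while True:
        data = f.read(chunk)
        if not data:
            return bytes(buf)
        buf += data
        if len(buf) > limit:
            raise OverflowError("file exceeds limit")
        chunk *= 2

def read_preallocated(f, limit):
    # Allocate the full limit in advance and read into it; the cost of
    # the allocation is paid on every call, whatever the file size.
    buf = bytearray(limit)
    n = f.readinto(buf)
    return bytes(buf[:n])
```

With the second strategy the up-front allocation scales with the limit, which
is why only the first one can afford a large limit.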
When we have a unit which cannot be enabled:
# foo@.service:
...
[Install]
WantedBy=foo.target # there is no instance, so we don't know what to enable
we should throw an error when invoked directly with 'enable', but
not when doing 'preset' or 'preset-all'.
Fixes #19856.
Instead of ordering non-pending before pending we should order
"non-pending OR ratelimited" before "pending AND not-ratelimited".
This fixes a bug where ratelimited events were ordered at the end of the
priority queue and could be stuck there for an indeterminate amount of
time.
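The intended ordering can be sketched as a sort key. This is a minimal Python
illustration, not the sd-event code; the Source class and its fields are
hypothetical stand-ins for an event source.

```python
from dataclasses import dataclass

@dataclass
class Source:
    # Hypothetical stand-in for an event source; not the sd-event struct.
    name: str
    pending: bool
    ratelimited: bool

def sorts_late(s: Source) -> bool:
    # Only "pending AND not-ratelimited" sources are ordered last;
    # "non-pending OR ratelimited" sources are ordered first.
    return s.pending and not s.ratelimited

sources = [
    Source("a", pending=True, ratelimited=False),
    Source("b", pending=True, ratelimited=True),
    Source("c", pending=False, ratelimited=False),
]
# sorted() is stable and False sorts before True.
names = [s.name for s in sorted(sources, key=sorts_late)]
```

Here the ratelimited source "b" is no longer pushed to the end of the queue
just because it is also pending.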
Since those workarounds have been added, work has been done to tighten
up log_*() return values. Seems we get no warning with
gcc-11.1.1-1.fc34.x86_64 and -O0/-O2.
$ systemctl enable --root=/ serial-getty@.service
Failed to enable unit, unit getty.target is a non-template unit.
↓
Failed to enable serial-getty@.service, destination unit getty.target is a non-template unit.
This had some purpose back in the day, but right now I cannot see what
difference this makes. It's hard to keep the list of all possible errors up to
date. So let's remove this, hopefully nothing breaks.
It's hard to trigger the failure to exit the rate limit state in
isolation as it needs multiple event sources in order to show that it
gets stuck in the queue. Hence why this is an extended test.
There is no reason to tie the two together: in principle we may have
in the future a unit type which does not define .serialize/.deserialize_item,
but we would still want to call the compat deserialization code for it.
When suppressing duplicate fields between files we so far tried to reuse
the already known hash value of the data fields between files. This was
fine as long as we used the same hash function everywhere. However,
since addition of the keyed hash feature for journal files this doesn't
work anymore, since the hashes will be different for different files.
Fixes: #19172
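Why a hash value cannot be carried over between files once the hash is keyed
can be shown with any keyed hash. The sketch below uses BLAKE2b from the
Python standard library purely as a stand-in; it is not the hash function
used by journal files, and the keys and field are made up.

```python
import hashlib

def keyed_hash(data: bytes, file_key: bytes) -> bytes:
    # BLAKE2b is used only because it is a readily available keyed hash;
    # it is a stand-in, not what the journal file format uses.
    return hashlib.blake2b(data, key=file_key, digest_size=8).digest()

field = b"MESSAGE=hello"
h_a = keyed_hash(field, b"key-of-file-a")
h_b = keyed_hash(field, b"key-of-file-b")
# Same data, different per-file keys: the value computed for one file
# says nothing about where the field lives in the other file.
```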
This makes the following changes:
- reduce the scope of variables,
- drop unnecessary 'else',
- use the CLOSE_AND_REPLACE() macro,
- use strnull() for a possibly NULL string.
If we have BPF_F_ALLOW_MULTI support we can install the new program
before we drop the old (because we can install two programs at the same
time). Let's do that, and thus fully close the firewall
gap.
E.g. nexthop requires the IFF_UP flag, but the currently stored flag may be
outdated if we called link_down(). This makes such requests pending if
at least one of the flags is being updated.
On carrier loss, all requests which require carrier will not be
processed. They will be processed when the interface gains its
carrier again. So, it is not necessary to drop requests here.
Previously, if IPv6LinkLocalAddressGenerationMode= was not set, we
determined the address generation mode based on the result of reading
the stable_secret sysctl value. Now the mode is determined by whether
a secret address is specified in the new setting.
Closes #19622.
Mostly logging related: let's downgrade logging in dlopen_bpf() for
example, and remove duplicate logging at various places. Add %m to log
messages and so on.
These are so many runtime objects, let's add a bpf_firewall_close()
helper that destroys them all, and call that from unit_free(), simply as
an exercise of encapsulating more BPF code in bpf-firewall.c.
This also brings the destruction order and variable declaration order in
struct Unit into the same systematic order.
No change in behaviour just some minor refactoring.
In dns_server_unlink_marked() and dns_server_mark_all() the iteration was done recursively.
People might have dozens of servers defined, and it's better to avoid recursion
when a simple loop suffices.
dns_server_unlink_marked() would only unmark the first marked server.
Fixes #19651.
Journal files have space allocated in 8MiB-aligned increments.
This can add up to substantial wasted space as many archived journals
accumulate without using all the allocated space.
This commit introduces truncating to the offset a subsequent append
would get written at when archiving.
Fixes https://github.com/systemd/systemd/issues/17613
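As a worked example of the space involved, the following Python sketch models
allocation as simple rounding up to the next 8MiB boundary. That model is an
assumption made for illustration, and the function names are hypothetical.

```python
ALIGN = 8 * 1024 * 1024  # 8MiB allocation increment, as described above

def allocated_size(append_offset: int) -> int:
    # Simplified model: round the used size up to the next 8MiB boundary.
    return -(-append_offset // ALIGN) * ALIGN

def reclaimed_on_archive(append_offset: int) -> int:
    # Bytes freed by truncating the file to the offset at which the next
    # append would have been written.
    return allocated_size(append_offset) - append_offset
```

Under this model a journal archived with 9MiB in use gives back 7MiB, and the
saving repeats for every archived file.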
Not sure, but at the time the target partition device is created or
enumerated, some sysattrs or properties may not be ready.
So, let's try to find the partition on timeout. The device may be ready
by that time.
For "systemd-tmpfiles --cleanup", when the "Age" parameter
is specified, the criteria for deletion is determined from
the path's last modification timestamp ("mtime"), its last
access timestamp ("atime") and its last status change
timestamp ("ctime").
For instance, if one of the paths to be cleaned up is
opened, its "atime" is modified, which
results in the file system entry not being removed because the
default aging algorithm skips the entry.
Add an optional "age-by" argument by extending the "Age"
parameter to restrict the clean-up for a particular type
of file timestamp, which can be specified in "tmpfiles.d"
as follows:
[age-by:]cleanup-age, where age-by is "[abcmACBM]+"
For example:
d /foo/bar - - - abM:1m -
Would clean-up any files that were not accessed and created,
or directories that were not modified less than a minute ago
in "/foo/bar".
Fixes: #17002
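The "[age-by:]cleanup-age" syntax can be sketched as a small parser. This is
a Python illustration only, not the systemd-tmpfiles implementation; the
function name and return shape are hypothetical, the age itself is not
validated, and the letter meanings follow the description above (lowercase
for files, uppercase for directories).

```python
import re

# Letter -> timestamp; "b" is the creation (birth) time.
TIMESTAMPS = {"a": "atime", "b": "btime", "c": "ctime", "m": "mtime"}

def parse_age(field: str):
    """Split '[age-by:]cleanup-age' into (file stamps, dir stamps, age)."""
    m = re.fullmatch(r"(?:([abcmABCM]+):)?(.+)", field)
    if m is None:
        raise ValueError(f"invalid Age field: {field!r}")
    age_by, age = m.groups()
    if age_by is None:
        # No prefix: None stands for "use the default aging algorithm".
        return None, None, age
    files = {TIMESTAMPS[c] for c in age_by if c.islower()}
    dirs = {TIMESTAMPS[c.lower()] for c in age_by if c.isupper()}
    return files, dirs, age
```

For the "abM:1m" example this yields atime and btime for files, mtime for
directories, and an age of "1m".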
Add the '=' action modifier that instructs tmpfiles.d to check the file
type of a path and remove objects that do not match before trying to
open or create the path.
BUG=chromium:1186405
TEST=./test/test-systemd-tmpfiles.py "$(which systemd-tmpfiles)"
Change-Id: If807dc0db427393e9e0047aba640d0d114897c26
When using top level drop-ins it isn't immediately obvious that one can
make use of symlinking to disable a top-level drop in for a specific
unit.
Signed-off-by: Peter Morrow <pemorrow@linux.microsoft.com>
When e.g. tmp.mount is present in the initrd, and we serialize it, switch root,
and deserialize, the new systemd is confused because it thinks /tmp is mounted.
In general, it doesn't make sense to serialize anything that refers to paths in
the old root file system.
This fixes two errors for me:
1. tmp.mount was not mounted properly before local-fs.target. It would be
mounted at some point (I guess when we re-read /proc/self/mountinfo for some
other reason). In effect systemd-tmpfiles-setup.service would see one fs, and
some other units started later a different one. In particular gdm.service would
fail because the pre-created /tmp/.X11-unix with proper permissions would not
exist at the time it was started.
2. # systemd[1]: proc-sys-fs-binfmt_misc.automount: Got hangup/error on autofs pipe from kernel. Likely our automount point has been unmounted by someone or something else?
# systemd[1]: proc-sys-fs-binfmt_misc.automount: Failed with result 'unmounted'.
# systemd[1]: Mounting proc-sys-fs-binfmt_misc.mount...
# systemd[1]: Mounted proc-sys-fs-binfmt_misc.mount.
# systemd[1]: Starting systemd-binfmt.service...
# systemd[1]: Finished systemd-binfmt.service.
# systemd[1]: proc-sys-fs-binfmt_misc.automount: Path /proc/sys/fs/binfmt_misc is already a mount point, refusing start.
# systemd[1]: Failed to set up automount proc-sys-fs-binfmt_misc.automount.
# systemd[1]: proc-sys-fs-binfmt_misc.automount: Path /proc/sys/fs/binfmt_misc is already a mount point, refusing start.
# systemd[1]: Failed to set up automount proc-sys-fs-binfmt_misc.automount.
# systemd[1]: proc-sys-fs-binfmt_misc.automount: Path /proc/sys/fs/binfmt_misc is already a mount point, refusing start.
# systemd[1]: Failed to set up automount proc-sys-fs-binfmt_misc.automount.
# systemd[1]: Stopping systemd-binfmt.service...
# systemd[1]: systemd-binfmt.service: Deactivated successfully.
# systemd[1]: Stopped systemd-binfmt.service.
I couldn't understand the error here, but in retrospect the first line is entirely
correct: "someone or something else" was the old systemd unmounting the old root.
When /var/lib/systemd/coredump/ is backed by a tmpfs, all disk usage
will be accounted under the systemd-coredump process cgroup memory
limit.
If MemoryMax is set, this might cause systemd-coredump to be terminated
by the kernel oom handler when writing large uncompressed core files,
even if the compressed core would fit within the limits.
Detect if a tmpfs is used, and if so check MemoryMax from the process
and slice cgroups, and do not write uncompressed core files that are
greater than half the available memory. If the limit is breached,
stop writing and compress the written chunk immediately, then delete
the uncompressed chunk to free more memory, and resume compressing
directly from STDIN.
Example debug log when this situation happens:
systemd-coredump[737455]: Setting max_size to limit writes to 51344896 bytes.
systemd-coredump[737455]: ZSTD compression finished (51344896 -> 3260 bytes, 0.0%)
systemd-coredump[737455]: ZSTD compression finished (1022786048 -> 47245 bytes, 0.0%)
systemd-coredump[737455]: Process 737445 (a.out) of user 1000 dumped core.
Try to infer the unused memory that a unit can claim before the
memory.max limit is reached, including any limit set on any parent
slice above the unit itself.
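The two ideas above (the memory a unit can still claim, and the cap on
uncompressed core writes on tmpfs) can be sketched together. This is a
simplified Python model, not the actual implementation: the function names
are hypothetical, and the input is assumed to be (memory.max, memory.current)
pairs for the unit and each parent slice.

```python
import math

def available_memory(levels):
    """levels: (memory.max, memory.current) pairs in bytes for the unit and
    each parent slice; use math.inf for an unlimited memory.max."""
    # The unit can only claim what the tightest level still has free.
    return min(max(limit - current, 0) for limit, current in levels)

def uncompressed_write_limit(levels):
    # Per the description above: on tmpfs, do not write more than half
    # of the available memory as an uncompressed core file.
    avail = available_memory(levels)
    return math.inf if math.isinf(avail) else avail // 2
```

A limit on a parent slice can be the binding one even when the unit itself
has plenty of headroom, which is why every level is consulted.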
We were effectively doing all post-upgrade scripts twice in Fedora. We got this
wrong, so it's likely other people will get it wrong too. So let's explain
what is actually needed to make this work, but also when it's not useful.
It is not necessary to stop the whole configuration process until the MTU and
IPv6LL address generation mode are set. It is enough to set the
IPv6 MTU again after the MTU is set, and to drop the IPv6LL address after
setting the address generation mode.
The condition does not fix infinite loop of interface reset, as the
interface is reset after netlink reply is received, thus setting_mtu is
false.
See also #18738.
Previously, several failures in link_carrier_gained() made the link enter
the failed state, and other errors were ignored. Now, all failures in
link_carrier_gained(), and moreover in link_update(), are critical.
networkd already has all information about routes. It is not necessary
to re-read them by using local_gateways().
This also makes manager_find_uplink() take family.
Seems the assert should be placed just before the decrypted_key
pointer is passed to the libcryptsetup API.
The original placement would trigger an abort in case TPM2
hardware was not present in the system while being required
to activate crypt devices.
Fixes #19437.
As reported in the bug:
> # drkonqi-coredump-processor@.service
> ...
> [Install]
> WantedBy=systemd-coredump@.service
>
> The plan here is to have a systemd-coredump@ instance start the same %i for
> drkonqi-coredump-processor@. Works perfectly when creating the symlink manually
> ln -sv /usr/lib/systemd/system/drkonqi-coredump-processor@.service
> /etc/systemd/system/systemd-coredump@.service.wants/.
When DefaultInstance is set, we replace template references with
template@default-inst. But in this case we want to create a symlink for the
template name, so that systemd will fill in the instance from the
wanting/requiring unit. This is only possible for those units that actually
have an instance set, so we create the symlink only from .requires/ or .wants
of an instantiated unit (then this specific instance will be used), or a
template (then some instance will be inherited later).
Specifically:
...
[Install]
WantedBy=other@.service, fixed.service
DefaultInstance=inst
→ enable foo@.service creates other@.service.wants/foo@inst.service, and
other@a.service will want foo@inst.service, and other@b.service will want foo@inst.service,
and fixed.service will want foo@inst.service.
Without DefaultInstance,
→ enable foo@.service creates other@.service.wants/foo@.service, and
other@a.service would want foo@a.service, and other@b.service would want foo@b.service,
but enablement fails because no dependency can be created for fixed.service:
Failed to enable unit, unit fixed.service is a non-template unit.
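The rules above can be sketched as a small function that returns the symlinks
which would be created. This is an illustrative Python model of the described
behaviour, not the install code; the function name is hypothetical and only
WantedBy= is modelled.

```python
def install_symlinks(template, wanted_by, default_instance=None):
    """template: e.g. 'foo@.service'; returns '<target>.wants/<name>' strings."""
    prefix, suffix = template.split("@", 1)  # 'foo', '.service'
    links = []
    for target in wanted_by:
        if default_instance is not None:
            # The template reference is replaced by template@default-inst.
            name = f"{prefix}@{default_instance}{suffix}"
        elif "@" in target:
            # Instantiated or template target: link the bare template and
            # let the wanting unit supply the instance.
            name = template
        else:
            raise ValueError(
                f"Failed to enable unit, unit {target} is a non-template unit.")
        links.append(f"{target}.wants/{name}")
    return links
```

Calling it with and without default_instance="inst" reproduces the two cases
listed above, including the failure for fixed.service.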
Otherwise, the flag update becomes incomplete and the IFA_F_MANAGETEMPADDR flag
will not be stored, thus no temporary addresses will be removed when
networkd requests to remove the main address.
Follow-up for a8481354f0cd2c0855472193d0f57c7a77674969.
Fixes #13218.
Fixes #19838.
Initially I wanted to add ConditionPathExists=!/etc/initrd-release in various
units (ldconfig.service, systemd-sysusers.service, systemd-hwdb-update.service,
systemd-journal-catalog-update, systemd-update-done.service), but I think it's
better to just disable the mechanism in the initrd altogether. Initrd images
are put together in a very particular way, and there is no need to do
post-update steps on them. If a unit from some other package winds up in the
initrd, we wouldn't want to invoke it either.
Also, any modifications are ephemeral, so any update would happen on every
use. And finally, initrd images are all about speed, and we shouldn't invoke
any unneeded services.
Use the option name 'password-echo' instead of the generic term
'silent'.
Make the option take an argument for better control over echoing
behavior.
Related discussion in https://github.com/systemd/systemd/pull/19619
prepare_socket_bind_bpf() is called from two sites: socket_bind_supported() and
socket_bind_install_impl(). For the latter, when errors occur we certainly want
to log, since they'll be fatal for the unit. But for the former, we should be
quiet, at least on the "expected" errors like lack of permissions. I kept error
on map resizing and such, which should not fail, at log_warning(). They are not
fatal when called from socket_bind_supported(), but still a sign that
something is off.
Currently BPF filters can only be used by privileged users. Thus each systemd
--user will fail in socket_bind_supported(). With the patch, we only log this
at debug level.
https://lwn.net/ml/bpf/cover.1620499942.git.yifeifz2@illinois.edu/ gives some
hope that unprivileged access will be possible, so let's keep the code trying.
We might get lucky and get support for filters in user mode without any changes
on our side.
Some devices send CHANGE and REMOVE uevents simultaneously.
To support reading the udev database for such devices, let's copy the minimal
set of properties required to read the database.
Fixes #19788.
This makes the last 11 characters always be preserved in the hashed string.
So, it is hard to generate a path which conflicts with another path.
Fixes an issue demonstrated in the previous commit.
The commit e64943363a8dd8bd320c2b633478be8befd1af5c introduced a hashed
path at the end of the filename. But we can easily generate a path
which conflicts with another path. The issue will be fixed in a later commit.
The usual: bitfields make sense as a memory-saving measure when we have many
objects of a given type. When the object appears at most in a few copies, the
overhead of additional code to access bitfields is more than the savings.
With the previous commit, we would not complain about the not-found path, but
the check is still not useful. We use a libc function to resolve the glob, and
it has no notion of treating autofs specially. So we can't avoid touching
autofs when resolving globs. But usually the glob is found in the last
component of the path, so if we strip the glob part, we can still do a useful
check in many cases. (E.g. if /var/tmp is on autofs, something like
"/var/tmp/<glob>" is much more likely than "/var/<glob-that-matches-tmp>/<something>".)
With the system config in F34, we check the following prefixes:
/var/tmp/abrt/* → /var/tmp/abrt/
/run/log/journal/08a5690a2eed47cf92ac0a5d2e3cf6b0/*.journal* → /run/log/journal/08a5690a2eed47cf92ac0a5d2e3cf6b0/
/var/lib/systemd/coredump/.#core*.21e5c6c28c5747e6a4c7c28af9560a3d* → /var/lib/systemd/coredump/
/tmp/podman-run-* → /tmp/
/tmp/systemd-private-21e5c6c28c5747e6a4c7c28af9560a3d-*/tmp → /tmp/
/tmp/systemd-private-21e5c6c28c5747e6a4c7c28af9560a3d-* → /tmp/
/tmp/containers-user-* → /tmp/
/var/tmp/beakerlib-* → /var/tmp/
/var/tmp/dnf*/locks/* → /var/tmp/
/var/tmp/systemd-private-21e5c6c28c5747e6a4c7c28af9560a3d-*/tmp → /var/tmp/
/var/tmp/systemd-private-21e5c6c28c5747e6a4c7c28af9560a3d-* → /var/tmp/
/var/tmp/abrt/* → /var/tmp/abrt/
/var/tmp/beakerlib-* → /var/tmp/
/var/tmp/dnf*/locks/* → /var/tmp/
/tmp/podman-run-* → /tmp/
/tmp/containers-user-* → /tmp/
/tmp/systemd-private-21e5c6c28c5747e6a4c7c28af9560a3d-* → /tmp/
/tmp/systemd-private-21e5c6c28c5747e6a4c7c28af9560a3d-*/tmp → /tmp/
/var/tmp/systemd-private-21e5c6c28c5747e6a4c7c28af9560a3d-* → /var/tmp/
/var/tmp/systemd-private-21e5c6c28c5747e6a4c7c28af9560a3d-*/tmp → /var/tmp/
/var/lib/systemd/coredump/.#core*.21e5c6c28c5747e6a4c7c28af9560a3d* → /var/lib/systemd/coredump/
/run/log/journal/08a5690a2eed47cf92ac0a5d2e3cf6b0/*.journal* → /run/log/journal/08a5690a2eed47cf92ac0a5d2e3cf6b0/
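The prefix extraction shown in the list above can be sketched as follows.
This is a Python illustration, not the tmpfiles code: the function name is
hypothetical, paths are assumed to be absolute, and treating "*", "?" and "["
as the glob characters is an assumption of this sketch.

```python
GLOB_CHARS = set("*?[")

def glob_prefix(path: str) -> str:
    """Return the leading directory part that contains no glob characters."""
    kept = []
    for comp in path.split("/"):
        if GLOB_CHARS & set(comp):
            # First component with a glob: keep everything before it.
            return "/".join(kept) + "/"
        kept.append(comp)
    return path  # no glob component, nothing to strip
```

The returned prefix can then be checked for autofs without resolving the glob
itself.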
Lines in the dumps are ordered by some pseudo-random hashmap entry order, which
makes it hard to diff two outputs. This sorts the entries alphabetically, and
also sorts items within the entries, and suppresses timestamps and other fields
which always vary.
We could sort the output inside of systemd itself, but it'd make things more
complex, and we probably don't need output to be sorted in most cases. It also
wouldn't be enough, because timestamps and such would still need to be ignored
to do a nice diff. So I think doing the sorting and suppression in a python
helper is a better approach.
unit_serialize_item() was dropped in d68c645bd3323ae1f0dfcb8fd74ea6b19681db8a.
But "cannot be restored from other sources" is also not entirely true: for
example for mounts we may be able to figure out most state from /p/s/mountinfo.
So let's make the comment more oblique.
If the name of the old device didn't work for us, we don't have to clean
anything up, since we know for sure that there won't be a device unit
for it. Hence downgrade the log message about it.
We want to propagate errors here, since we want to make the creation of
the auxiliary device units dependent on the success of creating the main
device unit. Thus, if we suppress errors here we might end up, in exotic
corner cases, in a situation where we create the auxiliary ("following")
device units without the primary one.
This adds --visible=yes|no|asterisk which allows controlling the echo of
the password prompt in detail. The existing --echo switch is then made
an alias for --visible=yes (and a shortcut -e added for it too).
The value is set dynamically when sd_device_get_subsystem() is called
for the first time.
Fixes the following issue:
```
$ build/udevadm test /sys/class/block/dm-1
...
Assertion '_subsystem' failed at src/libsystemd/sd-device/sd-device.c:767, function device_set_subsystem(). Aborting.
Program received signal SIGABRT, Aborted.
```
systemd-tmpfiles[328]: Failed to determine whether '/run/cryptsetup' is below autofs, ignoring: No such file or directory
systemd-tmpfiles[328]: Failed to determine whether '/etc/resolv.conf' is below autofs, ignoring: No such file or directory
systemd-tmpfiles[328]: Failed to determine whether '/run/lock/subsys' is below autofs, ignoring: No such file or directory
systemd-tmpfiles[328]: Failed to determine whether '/run/setrans' is below autofs, ignoring: No such file or directory
systemd-tmpfiles[328]: Failed to determine whether '/run/console' is below autofs, ignoring: No such file or directory
systemd-tmpfiles[328]: Failed to determine whether '/run/faillock' is below autofs, ignoring: No such file or directory
systemd-tmpfiles[328]: Failed to determine whether '/run/sepermit' is below autofs, ignoring: No such file or directory
systemd-tmpfiles[328]: Failed to determine whether '/run/motd.d' is below autofs, ignoring: No such file or directory
systemd-tmpfiles[328]: Failed to determine whether '/run/motd.d' is below autofs, ignoring: No such file or directory
systemd-tmpfiles[328]: Failed to determine whether '/run/motd' is below autofs, ignoring: No such file or directory
systemd-tmpfiles[328]: Failed to determine whether '/run/nologin' is below autofs, ignoring: No such file or directory
systemd-tmpfiles[328]: Failed to determine whether '/var/lib/systemd/pstore' is below autofs, ignoring: No such file or directory
... and so on and so on.
I always found this a bit annoying.
With the patch:
$ SYSTEMD_LOG_LEVEL=debug build/udevadm test /sys/class/block/dm-1
...
Loaded timestamp for '/etc/systemd/network'.
Loaded timestamp for '/usr/lib/systemd/network'.
Parsed configuration file /usr/lib/systemd/network/99-default.link
Parsed configuration file /etc/systemd/network/10-eth0.link
Created link configuration context.
Loaded timestamp for '/etc/udev/rules.d'.
Loaded timestamp for '/usr/lib/udev/rules.d'.
...
We had:
systemd[1]: varlink-36: New incoming message: {"method":"io.systemd.UserDatabase.GetMemberships","parameters":{"userName":"gdm","service":"io.systemd.DynamicUser"},"more":true}
systemd[1]: varlink-36: varlink: changing state idle-server → processing-method-more
systemd[1]: varlink-36: Sending message: {"error":"io.systemd.UserDatabase.NoRecordFound","parameters":{}}
systemd[1]: varlink-36: varlink: changing state processing-method-more → processed-method
systemd[1]: varlink-36: varlink: changing state processed-method → idle-server
systemd[1]: varlink-36: Got POLLHUP from socket.
systemd[1]: varlink-36: varlink: changing state idle-server → pending-disconnect
systemd[1]: varlink-36: varlink: changing state pending-disconnect → processing-disconnect
systemd[1]: varlink-36: varlink: changing state processing-disconnect → disconnected
So let's drop the "varlink:" prefix and use capitalized sentences like in other messages.
For new connections, we log something like this:
systemd[1]: n/a: New incoming connection.
systemd[1]: n/a: Connections of user 997: 0 (of 1024 max)
systemd[1]: varlink-22: varlink: setting state idle-server
systemd[1]: varlink-22: New incoming message: ...
This "n/a" is not very pretty, and without context it would be hard to even
figure out this is a varlink connection.
There will likely be none, hence don't bother.
This fixes an issue in systemd-gpt-auto-generator where we'll try to
wait for the udev db for the partitions even though udev might
simply not be around and, via the DISSECT_IMAGE_NO_UDEV flag, we were
explicitly told not to bother.
Fixes: #19377
Without this parameter, we would allow user@ to start if the user
has no password (i.e. the password is "locked"). But when the user does have a password,
and it is marked as expired, we would refuse to start the service.
There are other authentication mechanisms and we should not tie this service to
the password state.
The documented way to disable an *account* is to call 'chage -E0'. With a disabled
account, user@.service will still refuse to start:
systemd[16598]: PAM failed: User account has expired
systemd[16598]: PAM failed: User account has expired
systemd[16598]: user@1005.service: Failed to set up PAM session: Operation not permitted
systemd[16598]: user@1005.service: Failed at step PAM spawning /usr/lib/systemd/systemd: Operation not permitted
systemd[1]: user@1005.service: Main process exited, code=exited, status=224/PAM
systemd[1]: user@1005.service: Failed with result 'exit-code'.
systemd[1]: Failed to start user@1005.service.
systemd[1]: Stopping user-runtime-dir@1005.service...
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1961746.
Note that this means EFI-systems with a manually added TPM device won't
be supported automatically, but given that the TPM2 trust model kinda
requires firmware support I doubt it matters supporting this. And in all
other cases it speeds things up a bit.
No need to benchmark pbkdf when asking for minimal values
anyway.
An iteration count of 1000 is the minimum for both LUKS1 and LUKS2
pbkdf2 keyslot parameters according to NIST SP 800-132, ch. 5.2.
The iteration count cannot be lower than the recommended minimum
when benchmarking is disabled. The time_ms member is ignored with
benchmarking disabled.
Code using libcryptsetup already sets the global log function if it uses
dlopen_cryptsetup(). Make sure we do the same for the three programs
that explicitly link against libcryptsetup and hence do not use
dlopen_cryptsetup().
So far we only set the per-crypt_device log functions, but some
libcryptsetup calls we invoke without a crypt_device object, and we
want those to redirect to our infra too.
We want user records to be extensible, hence we shouldn't complain about
fields we can't parse. In particular we want them to be extensible for
our own future extensions.
Some code already set the permissive flag when parsing the JSON data,
but most did not. Fix that. A few select cases remain where the bit is
not set: where we just generated the JSON data ourselves, and thus can be
reasonably sure that if we can't parse it it's our immediate programming
error and not just us processing a user record from some other tool or a
newer version of ourselves.
This catches up homed's FIDO2 support with cryptsetup's: we'll now store
the uv/up/clientPin configuration at enrollment in the user record JSON
data, and use it when authenticating with it.
This also adds explicit "uv" support: we'll only allow it to happen when
the client explicitly said it's OK. This is then used by clients to print
a nice message suggesting "uv" has to take place before retrying
allowing it this time. This is modelled after the existing handling for
"up".
Giving --echo to systemd-ask-password allows echoing the user input.
There's nothing secret, so do not show a lock and key emoji by default.
The behavior can be controlled with --emoji=yes|no|auto. The default is
auto, which defaults to yes, unless --echo is given.
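The resulting default can be sketched as a tiny decision function. This is a
simplified Python model with a hypothetical function name; it only covers the
rule stated above and ignores anything else the real "auto" mode may take
into account.

```python
def show_emoji(emoji: str = "auto", echo: bool = False) -> bool:
    # --emoji=yes|no|auto; "auto" means yes unless --echo was given.
    if emoji == "auto":
        return not echo
    return emoji == "yes"
```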
In https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34803, we fail with:
Assertion 'IN_SET(r, -ENOMEM, -EMFILE, -ENFILE)' failed at src/journal-remote/fuzz-journal-remote.c:69,
function int LLVMFuzzerTestOneInput(const uint8_t *, size_t)(). Aborting.
AddressSanitizer:DEADLYSIGNAL
Let's try to print the error, so maybe we can see what is going on.
With the previous commit we shouldn't print out anything.
Those are unexpected, so a user-visible message seems appropriate.
But they are not our errors, and to some extent we can recover from
them, so "warning" seems more appropriate than "error".
When fuzzing, the following happens:
- we parse 'data' and produce an argv array,
- one of the items in argv is assigned to arg_host,
- the argv array is subsequently freed by strv_freep(), and arg_host is left as a dangling pointer.
In normal use, argv is static, so arg_host can never become a dangling pointer.
In fuzz-systemctl-parse-argv, if we repeatedly parse the same array, we
have some dangling pointers while we're in the middle of parsing. If we parse
the same array a second time, at the end all the dangling pointers will have been
replaced again. But for a short time, if parsing one of the arguments uses another
argument, we would use a dangling pointer.
Such a case occurs when we have --host=… --boot-loader-entry=help. The latter calls
acquire_bus() which uses arg_host.
I'm not particularly happy with making the code more complicated just for
fuzzing, but I think it's better to resolve this, even if the issue cannot
occur in normal invocations, than to deal with fuzzer reports.
Should fix https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=31714.
This line is so long, that the end is usually not visible on
the terminal. The dot looks out of place, and dropping it saves one
column for more interesting content.
When looking at logs from a boot with an encrypted device, I see
(with irrelevant messages snipped):
[ 2.751692] systemd[1]: Started Dispatch Password Requests to Console.
[ 7.929199] systemd-cryptsetup[258]: Set cipher aes, mode xts-plain64, key size 512 bits for device /dev/disk/by-uuid/2d9b648a-15b1-4204-988b-ec085089f8ce.
[ 9.499483] systemd[1]: Finished Cryptography Setup for luks-2d9b648a-15b1-4204-988b-ec085089f8ce.
There is a huge gap in timing without any explanatory message. If I didn't type
in the password, there would be no way to figure out why things blocked from
this log, so let's log something to the log too.
0cf84693877f060254f04cf38120f52c2aa3059c added --console.
6af621248f2255f9ce50b0bafdde475305dc4e57 added an optional argument, but didn't
update the help texts.
Note that there is no ambiguity with the optional argument because no positional
arguments are allowed.
Let's show the touch emoji whenever the user is likely going to have to
interact with the security token. We had this at many but not all such
messages. Let's add it everywhere.
Also, upgrade all messages where the user is supposed to do something to
LOG_NOTICE. Previously some were at LOG_NOTICE and others at LOG_INFO.
These messages are more than informational after all, they require user
action, hence deserve the higher prio, in particular as that formats
them bold with our usual log coloring.
Always use the word "test" in log messages, instead of "check".
Finally, always use the same wording: "confirm presence on security
token" for "up" and "verify user on security token" for "uv"
Let's improve compatibility with systemd 248 enrollments of FIDO2 keys:
if we have no information about the up/uv/pin settings, let's try to
determine them automatically, i.e. use up and pin if needed.
This only has an effect on LUKS2 volumes where a FIDO2 key was enrolled
with systemd 248 and thus the JSON data lacks the up/uv/pin fields. It
also matters if the user configured FIDO2 parameters explicitly via
crypttab options, so that the JSON data is not used.
For newer enrollments we'll stick to the explicit settings, as that's
generally much safer and robust.
We really need to trust the feature set, since we are about to set
it in stone by storing the result in JSON, hence react a bit more
allergically to tokens that misadvertise the feature.
Note that I added this to be defensive, I am not aware of any token that
actually misadvertises this. Hence it should be safe to make this fatal,
and should this not work we can always revisit things.
Let's try to handle keys gracefully that do not implement all features
we ask for: simply turn the feature off, and continue.
This is in particular relevant since we enroll with PIN and UP by
default, and on devices that don't support that we should just work.
Replaces: #18509
This makes the functions handle "xx/" and "xx/." as equivalent.
Moreover, path_extract_directory() now returns a normalized path, that is,
one that contains no redundant "/" or "/./".
This also means path_compare() may now return an arbitrary integer, as it
simply passes on the result of strcmp() or memcmp().
This changes the behavior of path_extract_filename/directory() when
e.g. "/." or "/./" are input. But the change should be desired.
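To illustrate what treating "xx/" and "xx/." as equivalent means, here is
a rough Python sketch of a component-wise comparison. It is not the C
implementation; path_components() and this path_compare() are
illustrative stand-ins, and the ordering of unequal paths is an
assumption. As in the commit, callers should only rely on the sign of the
result, not on it being exactly -1, 0 or 1.

```python
def path_components(path):
    # Drop empty components (from "//" or a trailing "/") and "." ones.
    return [c for c in path.split("/") if c not in ("", ".")]


def path_compare(a, b):
    """Return 0 if equal, otherwise a negative or positive integer."""
    abs_a, abs_b = a.startswith("/"), b.startswith("/")
    if abs_a != abs_b:
        # An absolute and a relative path are never equal.
        return 1 if abs_a else -1
    ca, cb = path_components(a), path_components(b)
    return (ca > cb) - (ca < cb)
```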
Without waiting for online, there is a race condition between systemd-networkd
actually setting the new values and the test checking those values.
This also sets the link down before restarting systemd-networkd, to avoid
the wait for online being a no-op.
In commit d895e10a a test was introduced to validate that prefix is a
child of rootprefix. However, it only works when rootprefix is "/".
Since the test is ignored when rootprefix is equal to prefix, this is
only noticed if specifying both -Drootprefix= and -Dprefix=, e.g.:
$ meson foo -Drootprefix=/foo -Dprefix=/foo/bar
meson.build:111:8: ERROR: Problem encountered: Prefix is not below
root prefix (now rootprefix=/foo prefix=/foo/bar)
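A check that works for any rootprefix, not just "/", can be sketched as
follows. This is Python rather than meson, is_below() is a hypothetical
name, and treating prefix == rootprefix as acceptable is an assumption
based on the remark above that the test is skipped in that case.

```python
def is_below(rootprefix, prefix):
    """Return True if prefix equals rootprefix or lies underneath it."""
    # "/" becomes "", "/foo/" becomes "/foo", so that appending "/" below
    # yields exactly one separator.
    root = rootprefix.rstrip("/")
    return prefix == rootprefix or prefix.startswith(root + "/")
```

Comparing against root + "/" rather than root alone is what keeps
"/foobar" from being accepted as a child of "/foo".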
The generic string_hash_ops_free_free hash operations vtable currently
assumes the data pointer is of type char*. There's really no reason to
assume that though, we regularly store non-string data as value in a
hashmap. Hence, to accommodate that, use void* as pointer for the
value (and keep char* for the key, as that's what
string_hash_ops_free_free is for, after all).
This adds two things:
- A new switch --uuid is added to "udevadm trigger". If specified a
random UUID is associated with the synthetic uevent and it is printed
to stdout. It may then be used manually to match up uevents as they
propagate through the system.
- The UUID logic is now implicitly enabled if "udevadm trigger --settle"
is used, in order to wait for precisely the uevents we actually
trigger. Fallback support is kept for pre-4.13 kernels (where
requests for trigger uevents with UUIDs result in EINVAL).
Since kernel 4.13 the kernel allows passing a UUID to generated uevents.
Optionally do so via a new sd_device_trigger_with_uuid() call, and add
sd_device_get_trigger_uuid() as helper to retrieve the UUID from a
uevent we receive.
This is useful for tracking uevents through the udev system, and waiting
for specific triggers.
(Note that the 4.13 patch allows passing arbitrary meta-info into the
uevent as well. This does not add an API for that, because I am not
convinced it makes sense — as it conflicts with our general rule that
events are "stateless" if you so will — and it complicates the interface
quite a bit).
This replaces #13881 in a way, which added a similar infra, but which
stalled, and whose synchronous settling APIs are somewhat problematic
and probably not material to merge.
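The trigger-with-UUID flow, including the fallback for kernels that
reject the UUID with EINVAL, can be sketched like this. This is Python
for illustration only and not the sd-device API; the helper names are
hypothetical, and the "ACTION UUID" request format written to the sysfs
"uevent" file is my reading of the kernel 4.13 interface.

```python
import errno
import uuid


def synthetic_uevent_request(action, trigger_uuid=None):
    """Build the string written to a device's sysfs "uevent" file."""
    if trigger_uuid is None:
        return action                   # plain request, no UUID attached
    return f"{action} {trigger_uuid}"   # action followed by the UUID


def write_sysfs(path, data):
    # Hypothetical helper; writing to /sys/.../uevent normally needs root.
    with open(path, "w") as f:
        f.write(data)


def trigger(uevent_path, action, write=write_sysfs):
    """Trigger a uevent; return the UUID used, or None after falling back."""
    trigger_uuid = str(uuid.uuid4())
    try:
        write(uevent_path, synthetic_uevent_request(action, trigger_uuid))
        return trigger_uuid
    except OSError as e:
        if e.errno != errno.EINVAL:
            raise
    # The kernel rejected the UUID: retry with the plain action.
    write(uevent_path, action)
    return None
```

A caller that gets None back knows it cannot match uevents by UUID and
has to settle some other way.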
This is the case because the ID128 we generate are all marked as v4 UUID
which requires that some bits are zero and others are one. Let's
document this so that people can rely on SD_ID128_NULL being a special
value for "uninitialized" that is always distinguishable from generated
UUIDs.
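The argument can be checked mechanically: marking 16 bytes as a v4 UUID
forces the version nibble to 4 and the variant bits to binary 10, so the
result can never be all zeroes. A small Python sketch follows; make_v4()
and looks_like_v4() are illustrative names, not sd-id128 functions.

```python
import uuid

ID128_NULL = bytes(16)  # stand-in for SD_ID128_NULL


def make_v4(raw):
    """Mark 16 arbitrary bytes as a v4 (random) UUID."""
    b = bytearray(raw)
    b[6] = (b[6] & 0x0F) | 0x40  # version nibble: 4
    b[8] = (b[8] & 0x3F) | 0x80  # variant bits: 10xxxxxx
    return bytes(b)


def looks_like_v4(raw):
    return len(raw) == 16 and (raw[6] >> 4) == 0x4 and (raw[8] & 0xC0) == 0x80
```

Even if the random source returned all zeroes, make_v4() would produce a
value that differs from ID128_NULL.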
When `NoNewPrivileges=yes`, the service shouldn't have a need for any
setuid/setgid programs, so in case there will be a new mount namespace anyway,
mount the file systems with MS_NOSUID.
The code works differently than the docs, and the code is right here.
Fix the doc hence.
See VALID_CHARS in unit-name.c for details about allowed chars in unit
names, but keep in mind that "-" and "\" are special, since generated by
the escaping logic: they are OK to show up in unit names, but need to be
escaped when converting foreign strings to unit names to make sure
things remain reversible.
Fixes: #19623
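Why "-" and "\" need escaping even though they are valid in unit names
becomes clear from a round-trip sketch. This is Python written from the
description above, not a copy of unit-name.c; the VALID_CHARS set and the
handling of a leading "." are assumptions modelled on it.

```python
import string

# Assumed to mirror VALID_CHARS in unit-name.c.
VALID_CHARS = string.digits + string.ascii_letters + ":-_.\\"


def escape(s):
    """Convert a foreign string into a form usable inside a unit name."""
    out = []
    for i, b in enumerate(s.encode("utf-8")):
        c = chr(b)
        if c == "/":
            out.append("-")
        elif c in "-\\" or c not in VALID_CHARS or (i == 0 and c == "."):
            # "-" and "\" are produced by the escaping itself, so literal
            # ones must be hex-escaped to keep the mapping reversible.
            out.append("\\x%02x" % b)
        else:
            out.append(c)
    return "".join(out)


def unescape(e):
    """Invert escape()."""
    out = bytearray()
    i = 0
    while i < len(e):
        if e[i] == "-":
            out.append(ord("/"))
            i += 1
        elif e.startswith("\\x", i):
            out.append(int(e[i + 2:i + 4], 16))
            i += 4
        else:
            out.append(ord(e[i]))
            i += 1
    return out.decode("utf-8")
```

If a literal "-" were passed through unchanged, unescape() would turn it
into "/" and the conversion would no longer be reversible.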
Strictly speaking adding this is a compatibility break, given that
previously % weren't special. But I'd argue that was simply a bug, as
for the much more prominent Environment= service setting we always
resolved specifiers, and DefaultEnvironment= is explicitly listed as
being the default for that. Hence, let's fix that.
Replaces: #16787
This might be useful for CopyFiles=, to reference some subdir of $TMP in
a generic way. This allows us to use the new common
system_and_tmp_specifier_table[].
This moves the definition of the specifier table consisting only of
system and /tmp specifiers into generic code so that we can share it.
This patch only adds one user of it for now. Follow-up patches will add
more.
Otherwise things get very confusing since we mix up netns data from our
client side and from the data we retrieve from networkd.
In the long run we should teach networkctl some switch to operate safely
on other netns, and in that case also determine the right networkd
instance for that namespace.
Fixes: #19236
This is useful for clients to determine whether they are running in the
same network namespace as networkd.
Note that access to /proc/$PID/ns/ is restricted and only permitted to
equally privileged programs. This new bus property is primarily a way to
work around this, so that unprivileged clients can determine the
networkd netns, too.
The comment suggests we validate paths here, but we actually didn't, we
only validated filenames. Let's fix that.
(Note this still lets any kind of paths through, including those with
".." and stuff, this is not a normalization check after all)
Previously, we supported only "," as separator. This adds support for
"+" and makes it the documented choice.
This is to make specifying PCRs in crypttab easier, since commas are
already used there for separating volume options, and needless escaping
sucks.
"," continues to be supported, but in order to keep things minimal not
documented.
Fixes: #19205
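Accepting both separators when parsing a PCR list is straightforward. A
minimal Python sketch of turning such a list into a bit mask follows;
parse_pcr_mask() is a hypothetical name, and the limit of 24 PCRs as well
as the handling of the empty string are assumptions.

```python
import re

TPM2_PCRS_MAX = 24  # assumed number of PCRs


def parse_pcr_mask(s):
    """Parse e.g. "0+7" (or the legacy "0,7") into a bit mask."""
    if s == "":
        return 0
    mask = 0
    for item in re.split(r"[+,]", s):
        n = int(item)  # raises ValueError on empty or non-numeric items
        if not 0 <= n < TPM2_PCRS_MAX:
            raise ValueError(f"PCR index out of range: {n}")
        mask |= 1 << n
    return mask
```

With "+" as separator a value such as "0+7" can appear in a crypttab
option list without escaping the commas that separate volume options.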
When watching paths that contain symlinks in some element we so far
always only watched the inode they are pointing to, not the symlink
inode itself. Let's fix that and always watch both. We do this by simply
installing the inotify watch once with and once without IN_DONT_FOLLOW.
For non-symlink inodes this just overrides the same watch twice (where
the second one replaces the first), which effectively has no effect.
For symlinks it means we'll watch both source and destination.
Fixes: #17727
This moves all calls that shall do deferred work on detecting whether to
start/stop the unit or dependent units after a unit state change to the
end of the function, to make things easier to read.
So far, these calls were spread all over the function, and
conditionalized needlessly on MANAGER_RELOADING(). This is unnecessary,
since the queues are not dispatched while reloading anyway, and
immediately before acting on a queued unit we'll check if the suggested
operation really makes sense.
The only conditionalization we leave in is on checking the new unit
state itself, since we have that in a local variable anyway.
So far StopWhenUnneeded= handling and UpheldBy= handling was already
processed by a queue that is dispatched in a deferred mode of operation
instead of instantly. This changes BoundBy= handling to be processed the
same way.
This should ensure that all *event*-to-job propagation is done directly
from unit_notify(), while all *state*-to-job propagation is done from a
deferred work queue, quite systematically. The work queue is submitted
to by unit_notify() too.
Key really is the difference between event and state: some jobs shall be
queued one-time on events (think: OnFailure= + OnSuccess= and similar),
others shall be queued continuously when a specific state is in effect
(think: UpheldBy=). The latter cases are usually effect of the
combination of states of a few units (e.g. StopWhenUnneeded= checks
whether any of the Wants=/Requires=/… deps are still up before acting),
and hence it makes sense to trigger them to be run after an individual
unit's state changed, but process them on a queue that runs whenever
there's nothing else to do that ensures the decision on them is only
taken after all jobs/queued IO events are dispatched, and things
settled, so that it makes sense to come to a combined conclusion. If
we'd dispatch this work immediately inside of unit_notify() we'd always
act instantly, even though another event from another unit that is
already queued might make the work unnecessary or invalid.
This is mostly a commit to make things philosophically clean. It does
not add features, but it should make corner cases more robust.
Let's not consider a unit unneeded while it is reloading.
Unneeded should be a pretty weak concept: if there's any doubt that
something might be needed, then assume it is.
This is like a really strong version of Wants=, that keeps starting the
specified unit if it is ever found inactive.
This is an alternative to Restart= inside a unit, acknowledging the fact
that whether to keep restarting the unit is sometimes not a property of
the unit itself but the state of the system.
This implements a part of what #4263 requests. i.e. there's no
distinction between "always" and "opportunistic". We just dumbly
implement "always" and become active whenever we see no job queued for
an inactive unit that is supposed to be upheld.
This is similar to OnFailure= but is activated whenever a unit returns
into inactive state successfully.
I was always afraid of adding this, since it effectively allows building
loops and makes our engine Turing complete, but it pretty much already
was, it was just hidden.
Given that we have per-unit ratelimits as well as an event loop global
ratelimit I feel safe to add this finally, given it actually is useful.
Fixes: #13386
This takes inspiration from PropagatesReloadTo=, but propagates
stop jobs instead of restart jobs.
This is defined based on exactly two atoms: UNIT_ATOM_PROPAGATE_STOP +
UNIT_ATOM_RETROACTIVE_STOP_ON_STOP. The former ensures that when the
unit the dependency is originating from is stopped based on user
request, we'll propagate the stop job to the target unit, too. In
addition, when the originating unit suddenly stops from external causes
the stopping is propagated too. Note that this does *not* include the
UNIT_ATOM_CANNOT_BE_ACTIVE_WITHOUT atom (which is used by BoundBy=),
i.e. this dependency is purely about propagating "edges" and not
"levels", i.e. it's about propagating specific events, instead of
continuous states.
This is supposed to be useful for dependencies between .mount units and
their backing .device units. So far we either placed a BindsTo= or
Requires= dependency between them. The former gave a very clear binding
of the two units together, however was problematic if users establish
mounts manually with different block device sources than our
configuration defines, as we might then come to the conclusion that the
backing device was absent and thus we need to umount again what the user
mounted. By combining Requires= with the new StopPropagatedFrom= (i.e.
the inverse of PropagateStopTo=) we can get behaviour that matches BindsTo=
in every single atom but one: UNIT_ATOM_CANNOT_BE_ACTIVE_WITHOUT is
absent, and hence the level-triggered logic doesn't apply.
Replaces: #11340
Let's add an implicit reverse dep OnFailureOf=. This is exposed via the
bus to make things more debuggable: you can now ask systemd for which
units a specific unit is the failure handler.
OnFailure= was the only dependency type that had no inverse, this fixes
that.
Now that deps are a bit cheaper, it should be OK to add deps that only
serve debug purposes.
The slice a unit is assigned to is currently a UnitRef reference. Let's
turn it into a proper dependency, to simplify and clean up code a bit.
Now that new dep types are cheaper, deps should generally be preferable
over everything else, if the concept applies.
This brings one major benefit: we often have to iterate through all units
a slice contains. So far we iterated through all Before= dependencies of
the slice unit to achieve that, filtering out unrelated units, and
taking benefit of the fact that slice units are implicitly ordered
Before= the units they contain. By making Slice= a proper dependency,
and having an accompanying SliceOf= dependency type, this is much
simpler and nicer as we can directly enumerate the units a slice
contains.
The forward dependency is actually called InSlice internally, since we
already used the UNIT_SLICE name as UnitType field. However, since we
don't intend to expose the dependency to users as dep anyway (we already
have the regular Slice D-Bus property for this) this shouldn't matter.
The SliceOf= implicit dependency type (the reverse of Slice=/InSlice=)
is exported over the bus, to make things a bit nicer to debug and
discoverable.
In a later commit we intend to move the slice logic to use proper
dependencies instead of a "UnitRef" object. This preparatory commit
drops direct use of the slice UnitRef object for a static inline
function UNIT_GET_SLICE() that is both easier to grok, and allows us to
easily replace its internal implementation later on.
On Debian, bpftool is installed in /usr/sbin, which is not in $PATH for
non-root users by default, so finding it fails.
Add a secondary, hard-coded '/usr/sbin/bpftool' after 'bpftool' so that
meson can find it.
https://packages.debian.org/sid/amd64/bpftool/filelist
Previously, when a link was already in a numbered group, we could not
remove the link from the group.
This also fixes the range mentioned in the man page.
The manpage says that exiting 77 is the same as exiting 0,
then skipping all other hooks, but the behaviour heretofore
was to exit 0, skip all, and behave as if all hooks exited 0.
This is not very pretty, but the code in fs-util.c already provisions for
missing /proc. We ourselves are careful to set up /proc, but not everybody
is and it is important for sysusers to also work where shadow-utils would:
I would like to replace calls to useradd and groupadd in Fedora systemd rpm
scriptlets with a call to sysusers. It has a number of advantages:
- dogfooding
- we don't need to manually duplicate the information from our sysusers
files to scriptlets
- a dependency on shadow-utils is dropped, which transitively drops dependencies
on setup and fedora-repos and bunch of other stuff.
We could try to get 'dnf' and 'rpm --root' and such to be reworked,
but not in any reasonable timeframe. And even if this was done, we'd still
want to support older rpm/dnf versions.
I'm trying to use systemd-sysusers for systemd.rpm itself, and the invocation
in dnf chroot is failing like this:
...
Creating group input with gid 999.
Creating group kvm with gid 36.
Creating group render with gid 998.
Creating group sgx with gid 997.
Creating group systemd-journal with gid 190.
Creating group systemd-network with gid 192.
Creating user systemd-network (systemd Network Management) with uid 192 and gid 192.
Creating group systemd-oom with gid 996.
Creating user systemd-oom (systemd Userspace OOM Killer) with uid 996 and gid 996.
Creating group systemd-resolve with gid 193.
Creating user systemd-resolve (systemd Resolver) with uid 193 and gid 193.
Creating group systemd-timesync with gid 995.
Creating user systemd-timesync (systemd Time Synchronization) with uid 995 and gid 995.
Creating group systemd-coredump with gid 994.
Creating user systemd-coredump (systemd Core Dumper) with uid 994 and gid 994.
Failed to write files: Function not implemented
Let's add more info to make such failures easier to debug.
Add quotes around use of $env{MODALIAS} in rules.d/80-drivers.rules. The
modalias can contain whitespace, for example when it is dynamically generated
using device or vendor IDs.
There is nothing we can configure in udevd for loopback interfaces:
no ethtool configs can be applied, and the MAC address and interface name
should not be touched.
ethtool_set_glinksettings() already falls back to the ETHTOOL_GSET/ETHTOOL_SSET
commands when ETHTOOL_GLINKSETTINGS/ETHTOOL_SLINKSETTINGS are not
supported.
The atkbd device on the Lenovo Yoga 300-11IBR 2-in-1 sends unknown
keycodes when the touchpad is toggled on/off:
[ 1918.995562] atkbd serio0: Unknown key pressed (translated set 2, code 0x63 on isa0060/serio0).
[ 1918.995610] atkbd serio0: Use 'setkeycodes 63 <keycode>' to make it known.
[ 1919.032121] atkbd serio0: Unknown key released (translated set 2, code 0x63 on isa0060/serio0).
[ 1919.032135] atkbd serio0: Use 'setkeycodes 63 <keycode>' to make it known.
[ 1926.098414] atkbd serio0: Unknown key pressed (translated set 2, code 0x62 on isa0060/serio0).
[ 1926.098461] atkbd serio0: Use 'setkeycodes 62 <keycode>' to make it known.
[ 1926.146537] atkbd serio0: Unknown key released (translated set 2, code 0x62 on isa0060/serio0).
[ 1926.146583] atkbd serio0: Use 'setkeycodes 62 <keycode>' to make it known.
The "Ideapad extra buttons" driver already sends f22 / f23 key-events
when the touchpad is toggled on/off, so map the keycodes for the duplicate
atkbd events to unknown to silence these kernel warnings.
Instead of comparing strings everywhere, let's use the new enum. This
allows us to drop sleep_settings(), since the operation enum can be
directly used as index into the config settings.
Some minor other refactoring is done, but mostly just shifting things
around a bit, no actual change in behaviour.
Since d8f9686c0f1f276c0a687d9bd69f3adf33f15a95 we use the chattr +i flag
for marking containers in directories as read-only. But to do so we
need the cap for it, hence grant it.
Fixes: #19115
I'm working on building initramfs images directly from normal packages, and it
doesn't make sense for those units to be started. Pristine system rpms need to
behave correctly as much as possible also in the initrd, and those units are
enabled by the rpms. There usually isn't enough time for the timer to actually
fire, but starting it gives a line on the console and generally looks confusing
and sloppy. Flushing the journal means that it's actually lost, since the real
/var is not available yet.
Another approach would be to not enable those units, but right now they are
statically enabled, and changing that would be more work, and doesn't really
seem necessary, since the condition checks are very quick.
Checking for /etc/initrd-release is the standard condition that the initrd
units use, so let's do the same here.
Previously we'd pass all return values of read_virtual_file() to
log_info_errno() as error, but that makes no sense, given that we
sometimes return positive one, which means "not truncated", but which we'd
show as "Permission denied". Let's fix this, and log differently for success
and error.
This reverts a major part of: e17c95af8e450caacde692875b30675cea75211f
Using format strings for concatenating strings is pretty inefficient,
and using PATH_MAX buffers unpretty as well. Let's revert to using
strjoina() as before.
However, to fix the fuzz issue at hand, let's explicitly verify the two
input strings ensuring they are valid path names. This includes a length
check (to 2K each), thus making things prettier, faster and using less
memory again.
This is also not entirely obvious. I think the code I came
up with is pretty elegant ;] The final part of the code that makes
use of the parsed data is kept very similar to the shell code on purpose,
even though it could be written a bit more idiomatically.
Let's order the fields from the most general to least: os name, os variant, os
version, machine-parseable version details, metadata, special settings. I added
section headers to roughly group the settings. The division is not strict,
because for example CPE_NAME also includes the version, and PRETTY_NAME may
too, but it still makes it easier to find the right name.
Also split out Examples to separate paragraphs:
almost all descriptions had "Example:" at the end, where multiple
examples were listed. Splitting this out to separate paragraphs
makes the whole thing much easier to read.
Add missing markup and punctuation while at it.
About
- If not set, defaults to <literal>NAME=Linux</literal>.
+ If not set, a default of <literal>NAME=Linux</literal> may be used.
and similar changes: in many circumstances, if this is not set, no value should
be used. The fallback mostly makes sense when we need to present something to the
user. So let's reword this to not imply that the default is necessary.
The kernel can be compiled without support for any memory.swap.* files, or
it can be disabled at boot time with the 'swapaccount=0' boot parameter,
so if the file doesn't exist log a warning indicating the kernel doesn't
support the file and the user may need to try using the 'swapaccount=1'
boot param.
Note that the actual error from the call to fopen() is ENOENT, but
that is translated into ENODATA in cg_get_attribute_as_uint64()
The kernel still provides the /proc and cgroup pressure files even
if its psi support is disabled, so we need to actually read the files
to verify they don't return -EOPNOTSUPP
These macros will log a message at the specified level only the first time
they are called. On all later calls, if the specified level is debug, the
logs will be suppressed; otherwise the message will be logged at debug.
At every location where this macro is used, it will be true the first
time it's checked, then false each time after that.
This can be useful for things such as one-time logging.
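The interplay of the once-flag and the log-once helpers can be sketched
in Python. In C the flag lives in a static variable per call site; here
it is an explicit object. Once and log_once() are illustrative names, not
the macros themselves.

```python
import logging


class Once:
    """True on the first check, False on every later one."""

    def __init__(self):
        self._fired = False

    def first(self):
        if self._fired:
            return False
        self._fired = True
        return True


def log_once(logger, once, level, msg):
    if once.first():
        logger.log(level, msg)          # first call: requested level
    elif level != logging.DEBUG:
        logger.log(logging.DEBUG, msg)  # later calls: downgraded to debug
    # Later calls at debug level are suppressed entirely.
```

Repeated warnings thus remain visible when debug logging is enabled,
without flooding the log at the original level.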
If the journal file being processed is archived, seqnum_id will not be
initialized before being passed on, and coverity complains.
Initialize it to zero.
CID #1453235
This ensures that the fuzz test code is also built by default.
It also increases the test coverage a bit. Compiling the tests
*with* sanitizers is painfully slow, so this is not enabled. But
just compiling them sauté is hardly noticeable. Running the tests
increases the test count and runtime:
622 tests, 26 s
to
922 tests, 35 s
I think this is acceptable.
We use the `autologin` mkosi option (see
mkosi.default.d/10-systemd.conf), so the pexpect root login throws
a (harmless) error:
```
Arch Linux (built from systemd tree)
Kernel 5.4.0-1047-azure on an x86_64 (console)
image login: root (automatic login)
root
root
[root@image ~]# systemctl poweroff
root
-bash: root: command not found
[root@image ~]# systemctl poweroff
```
Let's introduce a somewhat ugly workaround for #19442 and retry
the systemd-nspawn image boot test up to three times in case it dies
with the dissect timeout. Since this issue occurs only in the Arch job,
limit the workaround to this job only.
Hardcoding major numbers sucks. And we generally don't do it, except
when determining whether something is a PTY. Thing though is that we
don't actually need to do that here either, hence don't.
This new option does three things for a host user specified via
--bind-user=:
1. Bind mount the home directory from the host directory into
/run/host/home/<username>
2. Install an additional user namespace UID/GID mapping, mapping the host
UID/GID of the host user to an unused one from the container in the range
60514…60577.
3. Synthesize a user/group record for the user/group under the same name
as on the host, with minimized information, and the UID/GID set to
the mapped UID/GID. This data is written to /run/host/userdb/ where
nss-systemd will pick it up.
This should make sharing users and home directories from host into the
container pretty seamless, under some conditions:
1. User namespacing must be used.
2. The host UID/GID of the user/group cannot be in the range assigned to
the container (kernel already refuses this, as this would mean two
host UIDs/GIDs might end up being mapped to the same container
UID/GID).
3. There's a free UID/GID in the aforementioned range in the container,
and the name of the user/group is not used in the container.
4. Container payload is new enough to include an nss-systemd version
that picks up records from /run/host/userdb/
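Conditions 2 and 3 above amount to a small allocation routine. The Python
sketch below is a simplification and not the nspawn code: the function
name is hypothetical, and picking the first free UID in the range is an
assumption (the real allocation strategy may differ).

```python
BIND_UID_MIN, BIND_UID_MAX = 60514, 60577  # range named above


def map_bound_user(host_uid, range_start, range_len, used_in_container):
    """Pick a container UID for a host user, or raise if impossible."""
    # Condition 2: the host UID must lie outside the container's range.
    if range_start <= host_uid < range_start + range_len:
        raise ValueError("host UID lies inside the container's UID range")
    # Condition 3: there must be a free UID in the dedicated range.
    for uid in range(BIND_UID_MIN, BIND_UID_MAX + 1):
        if uid not in used_in_container:
            return uid
    raise ValueError("no free UID left in the bind-user range")
```

The range holds 64 UIDs, so at most that many users can be bound into a
single container under this scheme.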
We recently started making more use of malloc_usable_size() and rely on
it (see the string_erase() story). Given that we don't really support
systems where malloc_usable_size() cannot be trusted beyond statistics
anyway, let's go fully in and rework GREEDY_REALLOC() on top of it:
instead of passing around and maintaining the currently allocated size
everywhere, let's just derive it automatically from
malloc_usable_size().
I am mostly after this for the simplicity this brings. It also brings
minor efficiency improvements I guess, but things become so much nicer
to look at if we can avoid these allocation size variables everywhere.
Note that the malloc_usable_size() man page says relying on it wasn't
"good programming practice", but I think it does this for reasons that
don't apply here: the greedy realloc logic specifically doesn't rely on
the returned extra size, beyond the fact that it is equal or larger than
what was requested.
(This commit was supposed to be a quick patch btw, but apparently we use
the greedy realloc stuff quite a bit across the codebase, so this ends
up touching *a*lot* of code.)
This is a wrapper around malloc_usable_size() but is typesafe, and
divides by the element size.
A test is also added, ensuring that what it does it does correctly.
It's a wrapper around malloc_usable_size() that is supposed to be
compatible with _FORTIFY_SOURCE=1, by taking the
__builtin_object_size() data into account, the same way as the
_FORTIFY_SOURCE=1 logic does.
Fixes: #19203
m4 is required to build the test SELinux module:
```
[ 31.321789] sh[483]: /bin/sh: line 1: m4: command not found
[ 31.882668] sh[488]: Compiling targeted systemd_test module
[ 32.120862] sh[492]: /bin/sh: line 1: m4: command not found
[ 32.159897] sh[458]: make: *** [/usr/share/selinux/devel/include/Makefile:156: tmp/systemd_test.mod] Error 127
```
... and /usr/bin/ path for a library package which provides an executable we
care about (libxslt).
This way the mkosi dependency list corresponds directly to the names which are
used in the dependency() and find_program() lines in meson.build. It also makes
the thing more resilient to package splits and renames.
In case the link online state is invalid, networkctl will print
"unknown", which is sufficiently neutral. The same goes for the overall
manager online state if there are no managed links, or if
RequiredForOnline=no for all managed links.
Example output:
$ networkctl status
● State: routable
Online state: partial
Address: 172.22.0.130 on wlan0
...
$ networkctl status wlan0
● 3: wlan0
Link File: /lib/systemd/network/99-default.link
Network File: /etc/systemd/network/50-wlan0.network
Type: wlan
State: routable (configured)
Online state: online
...
With new "online state" semantics in networkd, make the description of
RequiredFamilyForOnline= a little more broad. Some rewording has been
done to make the passage easier to understand.
Since networkd advertises a reliable online state, use it in
network_is_online(). If for some reason networkd does not know the
online state (e.g. it does not manage any of the network interfaces),
fall back to the original best-guess logic.
Add a new state of type LinkOnlineState which indicates whether a link
is online or not. The state is also used by networkd's manager to expose
the overall online state of the system.
The possible states are:
offline the link (or system) is offline
partial at least one required link is online (see below)
online all required links are online
For links, a link is defined to be "online" if:
- it is managed; and
- its operational state is within the range defined by
RequiredForOnline=; and
- it has an IPv4 address if RequiredFamilyForOnline=ipv4 or =both; and
- it has an IPv6 address if RequiredFamilyForOnline=ipv6 or =both.
A link is defined to be "offline" if:
- it is managed; and
- it is not online, i.e. its operational state is not within the range
defined by RequiredForOnline=, and/or it is missing an IP address in
a required address family.
Otherwise, the link online state is undefined (represented internally as
_LINK_ONLINE_STATUS_INVALID or -EINVAL). Put another way, networkd will
only offer a meaningful online state for managed links where
RequiredForOnline=yes.
For the manager, the online state is a function of the online state of
all links which are required for online, i.e. RequiredForOnline=yes. If
all required links are online, then the manager online state is defined
to be "online". If at least one of the required links is online, then
the manager online state is defined to be "partial". If none of
the required links are online, then the manager online state is defined
to be "offline". If there are no managed links, or RequiredForOnline=no
for all managed links, then the manager online state is undefined as
above.
The purpose of the "partial" state is analogous to the --any switch in
systemd-networkd-wait-online.service(8). For example, a required link
which lacks a carrier on boot will not force the overall (manager)
online state to "offline" if there is an alternative link available.
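The rules above can be sketched as a small model. This is only an
illustration of the semantics described in this message, not networkd's
implementation: the Link fields and the "any" family value are hypothetical
names chosen for the example, and an undefined state is represented as None.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Link:
    # All field names are hypothetical; they only mirror the prose above.
    managed: bool
    required_for_online: bool
    in_required_operstate_range: bool  # operstate within RequiredForOnline= range
    has_ipv4: bool = False
    has_ipv6: bool = False
    required_family: str = "any"  # "ipv4", "ipv6", "both" or "any"


def link_online_state(link: Link) -> Optional[str]:
    """Return "online", "offline", or None when the state is undefined."""
    if not link.managed or not link.required_for_online:
        return None
    if not link.in_required_operstate_range:
        return "offline"
    if link.required_family in ("ipv4", "both") and not link.has_ipv4:
        return "offline"
    if link.required_family in ("ipv6", "both") and not link.has_ipv6:
        return "offline"
    return "online"


def manager_online_state(links: Iterable[Link]) -> Optional[str]:
    """Combine the states of all required links into the overall state."""
    states = [s for s in map(link_online_state, links) if s is not None]
    if not states:
        return None  # no managed/required links: undefined
    if all(s == "online" for s in states):
        return "online"
    if any(s == "online" for s in states):
        return "partial"
    return "offline"
```

With one required link lacking a carrier and another one up, the model
yields "partial", which matches the --any analogy above.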
Recent meson versions include the directory name in the target name,
so there is no conflict for files with the same name in different
directories. But at least with meson-0.49.2 in buster we have a conflict
with sysusers.d/systemd.conf.
This doesn't matter too much, but makes things a bit more consistent.
A minor advantage is that the file is not a configuration file for meson
anymore, so:
a) It is not built unless pulled in by another target. Since
we don't usually build man pages by default, this saves a tiny
amount of work.
b) When the .in file is updated, meson does not reconfigure everything,
but just rebuilds the dependent targets.
Now that the conversion is finished, time for benchmarking:
a full build with default settings (and -Dstandalonebinaries=true), yields
before this pull request: 1687 targets, 148.13s user 35.17s system 317% cpu 57.697 total
with the full pull request: 1714 targets, 143.07s user 27.87s system 314% cpu 54.369 total
The difference doesn't seem significant. Partial rebuilds might be faster as
mentioned before.
We had two big 'configuration_data' objects in meson config. (There are in fact
more. One is added in this series, and there's one for efi… But those others
have only a handful of variables for specific purposes and don't matter). The two
sets are 'conf' and 'substs', and were inherited from the original autotools
system. In the past there was even a third set ('m4_defines'), but @yuwata
removed it in 348b44372f36010d48d9a7dda14ef67155753a71. And those two/three
systems had very similar data, but with different variable names, because of
historical reasons. They also used subtly different quoting (.set()
vs. .set10() vs. .set_quoted()), which was required because the templating
engines were not flexible enough. This meant we had more work when changing
things, and we needed to search for different variable names, etc.
With a more flexible templating engine we can do with just one
configuration_data object.
One stanza had "if install_sysconfdir_samples", while the other
"if install_sysconfdir", which looks like a mistake.
install_sysconfdir_samples is now used for both.
The naming of variables is very inconsistent. I tried to use more
modern style naming (UNDERSCORED_TITLE_CASE), but I didn't change existing
names too much. Only SYSTEM_DATA_UNIT_PATH is renamed to SYSTEM_DATA_UNIT_DIR
to match SYSTEM_CONFIG_UNIT_DIR.
I wanted to use jinja2 templating here too, but it's hard to get right:
custom_target() strips the executable bit by default (unlike configure_file
apparently). custom_target() has install_mode setting, but it was only added
in meson-0.47, so it can't be used while we support 0.46. And without the
executable bit the test is not invoked properly. For example, "root-unittests"
in the debian package calls test-* after installation, so the executable bit
there is necessary. It would be possible to adjust the file mode after the
fact, but it would make things more complicated.
So let's use the native meson substitutions here. We don't need anything more
fancy.
I want to stop using 'substs'. But in this case, configure_file() is nicer
than custom_target(), because it causes meson to immediately generate the
helpers after configuration, so it's possible to do
'meson build && build/man/man ...', without building anything first.
We only substitute one variable here, so let's use a custom configuration_data()
object.
m4 was hugely popular in the past, because autotools, automake, flex, bison and
many other things used it. But nowadays it is much less popular, and might not even
be installed in the buildroot. (m4 is small, so it doesn't make a big difference.)
(FWIW, Fedora dropped make from the buildroot now,
https://fedoraproject.org/wiki/Changes/Remove_make_from_BuildRoot. I think it's
reasonable to assume that m4 will be dropped at some point too.)
The main reason to drop m4 is that the syntax is not very nice, and we should
minimize the number of different syntaxes that we use. We still have two
(configure_file() with @FOO@ and jinja2 templates with {{foo}} and the
pythonesque conditional expressions), but at least we don't need m4 (with
m4_dnl and `quotes').
Jinja2 inserts an empty line after the first macro body, which I don't know how
to get rid of. Only the first macro causes problems: the other ones don't have
conditional statements at the end and the issue does not occur. As a work-around
I moved ProtectHostname to the end of the first macro.
Output is identical, except for horizontal whitespace and change in position of
ProtectHostname.
HAVE_SMACK_RUN_LABEL was dropped back in 348b44372f36010d48d9a7dda14ef67155753a71,
so one line in etc.conf was not rendered as expected ;(
Checking if names are defined is paying for itself!
The comment talks about upstream development steps and doesn't make
sense for users. We used special '## ' syntax to strip it out during
build, but it got inadvertently reformatted as a normal comment
in 3982becc92197b920d86f03c3c52ae085e26ca60.
We don't need two (and half) templating systems anymore, yay!
I'm keeping the changes minimal, to make the diff manageable. Some enhancements
due to a better templating system might be possible in the future.
For handling of '## ' — see the next commit.
m4 was nice in '85, but the syntax feels a bit dated. Since we use python for
meson, let's use a popular python templating engine to replace some m4 usage.
A little nicety is that typos are caught:
FAILED: sysusers.d/systemd-remote.conf
/usr/bin/meson --internal exe --capture sysusers.d/systemd-remote.conf -- /home/zbyszek/src/systemd/tools/meson-render-jinja2.py config.h ../sysusers.d/systemd-remote.conf.j2
Traceback (most recent call last):
File "/home/zbyszek/src/systemd/tools/meson-render-jinja2.py", line 28, in <module>
print(render(sys.argv[2], defines))
File "/home/zbyszek/src/systemd/tools/meson-render-jinja2.py", line 24, in render
return template.render(defines)
File "/usr/lib/python3.9/site-packages/jinja2/environment.py", line 1090, in render
self.environment.handle_exception()
File "/usr/lib/python3.9/site-packages/jinja2/environment.py", line 832, in handle_exception
reraise(*rewrite_traceback_stack(source=source))
File "/usr/lib/python3.9/site-packages/jinja2/_compat.py", line 28, in reraise
raise value.with_traceback(tb)
File "<template>", line 8, in top-level template code
jinja2.exceptions.UndefinedError: 'HAVE_MICROHTTP' is undefined
This checking mirrors what 349cc4a507c4d84fcadf61f42159ea6412717896 did for C defines.
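The fail-on-undefined behaviour can be illustrated without jinja2 installed.
The sketch below uses string.Template from the Python standard library as a
stand-in (it is not the engine used by meson-render-jinja2.py, and the
$NAME placeholder syntax differs from jinja2's): substituting a name that
is missing from the defines raises an error instead of silently rendering
an empty string.

```python
from string import Template

# Stand-in for the defines read from config.h
defines = {"HAVE_MICROHTTPD": "1"}


def render(text: str, defines: dict) -> str:
    # substitute() raises KeyError for any placeholder missing from defines
    return Template(text).substitute(defines)


good = render("microhttpd=$HAVE_MICROHTTPD", defines)

try:
    # Misspelled name, as in the traceback above
    render("microhttpd=$HAVE_MICROHTTP", defines)
    typo_caught = False
except KeyError:
    typo_caught = True
```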
Previously, when link_new() fails, `link_unref()` was called, so,
`Manager::links` may become dirty.
This introduces `link_drop_or_unref()` and it will be called on
failure.
When after_configure() for a request fails, then the request is not
removed from the queue at that time, and the link enters the failed
state. After that, if the link is reconfigured or its carrier is lost,
then the request is dropped from the queue, and the message_counter is
decreased. However, the counter is already or will be also decreased
when the corresponding netlink reply is received.
So, the counter is decreased twice.
User managers always pass their environment on to their children.
Make that clear in the description of ManagerEnvironment= which
states that none of those args will get passed to child processes of
service managers.
In the change set 6c045a999800c62368470938307951bb669f5afc the error
text for the old flag `--private-users-chown` was repurposed for the
new flag `--private-users-ownership=own` and while doing so the word
`may` was dropped leading to a grammatically incorrect error text.
When /etc/localtime is a symbolic link pointing to another symbolic
link, get_timezone will return -EINVAL instead of the timezone.
This issue can cause systemd-networkd DHCPServer to fail.
Instead of returning failure, log a warning indicating that
the timezone will not be sent.
modified: networkd-dhcp-server.c
Adds a crypttab option 'silent' that enables the AskPasswordFlag
ASK_PASSWORD_SILENT. This allows usage of systemd-cryptsetup to default
to silent mode, rather than requiring the user to press tab every time.
Printing stdout and stderr from a failed test makes it harder to
interpret what the specific problem was; instead let's print out
the lines in order as we got them when the test was run
Also save failed test output to file if ARTIFACT_DIRECTORY is defined
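One way to get the lines in the order they were emitted is to let stdout and
stderr share a single pipe. The sketch below only illustrates that idea; the
helper name and the child script are made up for the example, and the test
runner itself may do this differently.

```python
import subprocess
import sys

# A tiny child that alternates between the two streams
CHILD = (
    "import sys\n"
    "print('out 1', flush=True)\n"
    "print('err 1', file=sys.stderr, flush=True)\n"
    "print('out 2', flush=True)\n"
)


def run_interleaved(argv):
    # stderr=STDOUT makes both streams share one pipe, so the write order
    # of the child is preserved in the captured output
    proc = subprocess.run(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return proc.returncode, proc.stdout.splitlines()


rc, lines = run_interleaved([sys.executable, "-c", CHILD])
```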
In 0e0fd08fc832b8f42e567d722d388eba086da5ff I added reference counts to keep
track of the DnsQueryCandidate objects. Unfortunately, dns_query_unref_candidates()
was written as
while (q->candidates)
dns_query_candidate_unref(q->candidates);
i.e. it would keep dropping the reference count as many times as needed for it
to hit 0, making the patch less than fully effective.
dns_query_unref_candidates() is renamed to dns_query_detach_candidates() and
changed to drop exactly one reference from each of the linked candidates.
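The difference between the two loops can be modelled in a few lines. This is
a toy model with made-up class names, not the resolved code: the "buggy"
helper keeps unreffing until the object is freed, while the "detach" helper
unlinks each candidate and drops exactly one reference.

```python
class Query:
    def __init__(self):
        self.candidates = []


class Candidate:
    def __init__(self, query):
        self.n_ref = 1  # the reference held by the query's list
        self.query = query
        self.freed = False
        query.candidates.append(self)

    def ref(self):
        self.n_ref += 1
        return self

    def unref(self):
        assert not self.freed, "use after free"
        self.n_ref -= 1
        if self.n_ref == 0:
            self.freed = True
            if self.query is not None:
                self.query.candidates.remove(self)
                self.query = None


def unref_candidates_buggy(q):
    # Loops until the list is empty, i.e. until every candidate hits 0,
    # no matter who else still holds a reference
    while q.candidates:
        q.candidates[0].unref()


def detach_candidates(q):
    # Unlink each candidate and drop only the list's own reference
    for c in list(q.candidates):
        q.candidates.remove(c)
        c.query = None
        c.unref()
```

If a caller further up the stack took its own reference with ref(), the
buggy variant still frees the candidate, which corresponds to the invalid
read shown in the trace that follows.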
Example failure:
==463== Invalid read of size 8
==463== at 0x419C93: dns_query_candidate_go (resolved-dns-query.c:159)
==463== by 0x41A143: dns_query_candidate_notify (resolved-dns-query.c:304)
==463== by 0x434BD6: dns_transaction_complete (resolved-dns-transaction.c:437)
==463== by 0x436A0F: dns_transaction_process_dnssec (resolved-dns-transaction.c:976)
==463== by 0x4378C1: dns_transaction_process_reply (resolved-dns-transaction.c:1387)
==463== by 0x437CE9: on_dns_packet (resolved-dns-transaction.c:1444)
==463== by 0x4B2DC9B: source_dispatch (sd-event.c:3512)
==463== by 0x4B2FB1F: sd_event_dispatch (sd-event.c:4077)
==463== by 0x4B2FFFA: sd_event_run (sd-event.c:4138)
==463== by 0x4B301D6: sd_event_loop (sd-event.c:4159)
==463== by 0x464A24: run (resolved.c:92)
==463== by 0x464B3C: main (resolved.c:99)
==463== Address 0x5f409d0 is 32 bytes inside a block of size 72 free'd
==463== at 0x48410E4: free (vg_replace_malloc.c:755)
==463== by 0x418EDF: mfree (alloc-util.h:48)
==463== by 0x4197E8: dns_query_candidate_free (resolved-dns-query.c:67)
==463== by 0x4198B7: dns_query_candidate_unref (resolved-dns-query.c:70)
==463== by 0x41A2E3: dns_query_unref_candidates (resolved-dns-query.c:337)
==463== by 0x41C5FE: dns_query_cname_redirect (resolved-dns-query.c:1028)
==463== by 0x41CA04: dns_query_process_cname_one (resolved-dns-query.c:1128)
==463== by 0x41CA80: dns_query_process_cname_many (resolved-dns-query.c:1157)
==463== by 0x40C0BD: bus_method_resolve_hostname_complete (resolved-bus.c:198)
==463== by 0x41B312: dns_query_complete (resolved-dns-query.c:562)
==463== by 0x41C1AC: dns_query_accept (resolved-dns-query.c:922)
==463== by 0x41C2C4: dns_query_ready (resolved-dns-query.c:955)
==463== by 0x41A162: dns_query_candidate_notify (resolved-dns-query.c:314)
==463== by 0x434BD6: dns_transaction_complete (resolved-dns-transaction.c:437)
==463== by 0x438995: dns_transaction_prepare (resolved-dns-transaction.c:1728)
==463== by 0x43921D: dns_transaction_go (resolved-dns-transaction.c:1928)
==463== by 0x419C7C: dns_query_candidate_go (resolved-dns-query.c:163)
==463== by 0x41A143: dns_query_candidate_notify (resolved-dns-query.c:304)
==463== by 0x434BD6: dns_transaction_complete (resolved-dns-transaction.c:437)
==463== by 0x436A0F: dns_transaction_process_dnssec (resolved-dns-transaction.c:976)
==463== by 0x4378C1: dns_transaction_process_reply (resolved-dns-transaction.c:1387)
==463== by 0x437CE9: on_dns_packet (resolved-dns-transaction.c:1444)
==463== by 0x4B2DC9B: source_dispatch (sd-event.c:3512)
==463== by 0x4B2FB1F: sd_event_dispatch (sd-event.c:4077)
==463== by 0x4B2FFFA: sd_event_run (sd-event.c:4138)
==463== by 0x4B301D6: sd_event_loop (sd-event.c:4159)
==463== by 0x464A24: run (resolved.c:92)
==463== by 0x464B3C: main (resolved.c:99)
==463== Block was alloc'd at
==463== at 0x483E86F: malloc (vg_replace_malloc.c:380)
==463== by 0x418F81: malloc_multiply (alloc-util.h:96)
==463== by 0x419378: dns_query_candidate_new (resolved-dns-query.c:23)
==463== by 0x41B42C: dns_query_add_candidate (resolved-dns-query.c:582)
==463== by 0x41BB7A: dns_query_go (resolved-dns-query.c:762)
==463== by 0x40CE3A: bus_method_resolve_hostname (resolved-bus.c:464)
==463== by 0x4A84B86: method_callbacks_run (bus-objects.c:414)
==463== by 0x4A87961: object_find_and_run (bus-objects.c:1323)
==463== by 0x4A87FEE: bus_process_object (bus-objects.c:1443)
==463== by 0x4AA3434: process_message (sd-bus.c:2964)
==463== by 0x4AA3623: process_running (sd-bus.c:3006)
==463== by 0x4AA4110: bus_process_internal (sd-bus.c:3226)
==463== by 0x4AA41EF: sd_bus_process (sd-bus.c:3253)
==463== by 0x4AA5343: io_callback (sd-bus.c:3604)
==463== by 0x4B2DC9B: source_dispatch (sd-event.c:3512)
==463== by 0x4B2FB1F: sd_event_dispatch (sd-event.c:4077)
==463== by 0x4B2FFFA: sd_event_run (sd-event.c:4138)
==463== by 0x4B301D6: sd_event_loop (sd-event.c:4159)
==463== by 0x464A24: run (resolved.c:92)
==463== by 0x464B3C: main (resolved.c:99)
Fixes #19376.
This reverts commit a2031de849da52aa85b7e4326c0112ed7e5b5672.
The patch itself seems OK, but it exposes a bug in lxml or libxml2-2.9.12 which
was just released. This is being resolved in
https://gitlab.gnome.org/GNOME/libxml2/-/issues/255, but it might be a while. So
let's revert this for now to unbreak our CI.
Fixes #19601.
Old meson fails with:
Element not a string: [<Holder: <ExternalProgram 'sh' -> ['/bin/sh']>>, '-c', 'test -n "$DESTDIR" || /bin/journalctl --update-catalog']
I'm doing it as a revert so that it's easy to undo the revert when we require
newer meson. The effect is not so bad, maybe a dozen or so lines about finding
'sh'.
We obviously have lots of those, so even small savings add up.
Bitfields are dropped because they don't give any memory savings due to
alignment requirements (but would still require more complex code to access).
/* size: 184, cachelines: 3, members: 28 */
/* sum members: 172, holes: 1, sum holes: 4 */
/* sum bitfield members: 4 bits (0 bytes) */
/* padding: 7 */
/* bit_padding: 4 bits */
↓
/* size: 176, cachelines: 3, members: 28 */
The structure is rearranged to have fewer holes. Also fields in the union
are rearranged not to have holes (though most variants of the union still
have some padding at the end).
The full size does not decrease a lot, but the compiler should be able to
copy fewer bytes when it knows the specific type of the union.
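The effect of field ordering on holes can be reproduced with ctypes, which
follows the platform's C layout rules. The two structures below are invented
for the illustration and have nothing to do with the real structures touched
by this commit; exact sizes depend on the ABI, so none are asserted here.

```python
import ctypes


class Scattered(ctypes.Structure):
    # A small field before a large one forces alignment padding (a hole)
    _fields_ = [
        ("flag_a", ctypes.c_uint8),
        ("value", ctypes.c_uint64),
        ("flag_b", ctypes.c_uint8),
    ]


class Grouped(ctypes.Structure):
    # Same members, largest first, small ones packed together at the end
    _fields_ = [
        ("value", ctypes.c_uint64),
        ("flag_a", ctypes.c_uint8),
        ("flag_b", ctypes.c_uint8),
    ]


def holes(struct):
    """Sum of gaps between consecutive fields (trailing padding not counted)."""
    total, end = 0, 0
    for name, ctype in struct._fields_:
        offset = getattr(struct, name).offset
        total += offset - end
        end = offset + ctypes.sizeof(ctype)
    return total


if __name__ == "__main__":
    for s in (Scattered, Grouped):
        print(s.__name__, "size:", ctypes.sizeof(s), "holes:", holes(s))
```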
Bitfields are dropped because they don't give any memory savings due to
alignment requirements (but would still require more complex code to access).
The change from this and the previous commit:
/* size: 128, cachelines: 2, members: 13 */
/* sum members: 112, holes: 3, sum holes: 15 */
/* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */
↓
/* size: 112, cachelines: 2, members: 13 */
/* sum members: 108, holes: 1, sum holes: 4 */
Change from the last three commits:
/* size: 312, cachelines: 5, members: 46 */
/* sum members: 296, holes: 5, sum holes: 16 */
↓
/* size: 288, cachelines: 5, members: 46 */
/* sum members: 286, holes: 1, sum holes: 1 */
It's not a big difference, but we might have quite a few queries in flight,
so let's make this a bit more efficient.
Meson 0.58 has gotten quite bad with emitting a message every time
a quoted command is used:
Program /home/zbyszek/src/systemd-work/tools/meson-make-symlink.sh found: YES (/home/zbyszek/src/systemd-work/tools/meson-make-symlink.sh)
Program sh found: YES (/usr/bin/sh)
Program sh found: YES (/usr/bin/sh)
Program sh found: YES (/usr/bin/sh)
Program sh found: YES (/usr/bin/sh)
Program sh found: YES (/usr/bin/sh)
Program sh found: YES (/usr/bin/sh)
Program xsltproc found: YES (/usr/bin/xsltproc)
Configuring custom-entities.ent using configuration
Message: Skipping bootctl.1 because ENABLE_EFI is false
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Message: Skipping journal-remote.conf.5 because HAVE_MICROHTTPD is false
Message: Skipping journal-upload.conf.5 because HAVE_MICROHTTPD is false
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Message: Skipping loader.conf.5 because ENABLE_EFI is false
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
...
Let's suffer one message only for each command. Hopefully we can silence
even this when https://github.com/mesonbuild/meson/issues/8642 is
resolved.
Currently the label database is not reloaded with libselinux 3.2 on a
policy reload.
Since libselinux 3.2 avc_open(3) uses the SELinux status page instead of
a netlink socket to check for policy reloads.
The status page is also queried in mac_selinux_maybe_reload().
Thus calls to selinux_check_access(3) might consume an update, queried
by selinux_status_updated(3), leaving mac_selinux_maybe_reload() unable
to detect a policy reload.
Do not use selinux_status_updated(3), use selinux_status_policyload(3)
unconditionally.
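The race described above can be shown with a toy model. Nothing below calls
libselinux; the classes are hypothetical stand-ins that only mimic the
behaviour described in this message: an "updated" check that consumes the
change for every caller, versus a policy-load counter that each caller
compares against its own remembered value.

```python
class StatusPage:
    """Toy stand-in for the SELinux status page."""

    def __init__(self):
        self.sequence = 0    # bumped on any change
        self.policyload = 0  # number of policy loads so far

    def load_policy(self):
        self.sequence += 1
        self.policyload += 1


class Library:
    """Toy model of library-wide 'last seen' state shared by all callers."""

    def __init__(self, page):
        self.page = page
        self.last_sequence = page.sequence

    def status_updated(self):
        changed = self.page.sequence != self.last_sequence
        self.last_sequence = self.page.sequence  # the update is consumed here
        return changed

    def status_policyload(self):
        return self.page.policyload  # stateless: just reports the counter


class Reloader:
    """Keeps its own copy of the counter, so nobody else can consume it."""

    def __init__(self, lib):
        self.lib = lib
        self.last_policyload = lib.status_policyload()

    def maybe_reload(self):
        current = self.lib.status_policyload()
        if current == self.last_policyload:
            return False
        self.last_policyload = current
        return True
```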
Relevant libselinux commit: 05bdc03130
Debian Bullseye is going to ship libselinux 3.1, so stay compatible for
backports.
The settings were listed in a completely random order, also different
between the v4 and v6 sections. Order by "options sent", "options received",
"communication settings" in both sections.
Also minor formatting changes are done, e.g. "=" is added in various places.
When the `--json` option is specified, the "status" and "list" commands give
the same information, as originally "list" just gave partial
information of "status" in a different format.
systemd-run is documented as being able to connect and run on a
specific user bus with "--user --machine=lennart@.host" arguments.
This PR updates some logic that prevented this from working.
I occasionally do 'build/man/man systemd.directives' when working on man pages,
and it's annoyingly slow. By parallelizing the parsing of xml, we can make it a
bit faster.
This is still rather inefficient. Only the parsing part is parallelized; xml is
still produced serially at the end, which is hard to avoid.
$ ninja -C build man/systemd.directives.xml
before:
8.20s user 0.21s system 99% cpu 8.460 total
8.33s user 0.18s system 98% cpu 8.619 total
8.72s user 0.19s system 98% cpu 9.019 total
after:
13.99s user 0.73s system 345% cpu 4.262 total
14.15s user 0.35s system 348% cpu 4.161 total
14.33s user 0.35s system 339% cpu 4.321 total
I.e. it uses almost twice as much cpu, but cuts the wallclock time down (on a
2-core/4-thread cpu) to about half too, which is an overall win if you're just
trying to render the man page.
The change from list and .append() to set and .add() is something that could
have been done before too, but it's noticeable now. It cuts down on the
serialization/deserialization time (about .2s).
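The general shape of the change can be sketched as follows. This is not the
actual generator script: the function names and the sample pages are made up,
and a thread pool is used only to keep the example self-contained. For a real
speedup of CPU-bound parsing in CPython a process pool would be needed, at
which point the per-page results have to be serialized back to the parent,
which is where returning a set instead of a list pays off.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def parse_page(xml_text):
    """Return the set of <varname> texts found in one page."""
    root = ET.fromstring(xml_text)
    return {el.text for el in root.iter("varname") if el.text}


def collect(pages, executor):
    """Parse pages through the executor, then merge serially in the parent."""
    names = set()
    for found in executor.map(parse_page, pages):
        names |= found
    return names


# Made-up sample input standing in for man page sources
PAGES = [
    "<refentry><varname>Type=</varname><varname>User=</varname></refentry>",
    "<refentry><varname>User=</varname><varname>Group=</varname></refentry>",
]

with ThreadPoolExecutor() as pool:
    directives = collect(PAGES, pool)
```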
- Internet specifications give 1600 DPI @ 1000Hz for this sensor
- Confirmed experimentally via `mouse-dpi-tool`
- vid, pid, and name match string from `mouse-dpi-tool`
Apparently CAN links will show up in rtnetlink with very low MTUs. We
shouldn't consider them relevant if no IP is spoken over them, since
these MTUs are irrelevant for us then.
Hence, let's check if there's an address assigned to the link before
considering its MTU.
As additional safety net filter out MTUs smaller than the minimum DNS
packet size, too.
Finally, in case we don't find any suitable interface MTU, let's default
to 1500 as the generic Ethernet MTU.
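The selection logic can be sketched like this. The 512 byte minimum and the
choice of the smallest suitable MTU are assumptions made for the example (the
message only says "minimum DNS packet size" and does not say how several
suitable MTUs are combined); the Link type is hypothetical.

```python
from dataclasses import dataclass

DNS_MIN_PACKET_SIZE = 512  # assumed value for "minimum DNS packet size"
DEFAULT_MTU = 1500         # generic Ethernet MTU, per the text above


@dataclass
class Link:
    name: str
    mtu: int
    has_address: bool


def find_mtu(links):
    # Ignore links without an address (e.g. CAN) and implausibly small MTUs
    candidates = [
        l.mtu for l in links
        if l.has_address and l.mtu >= DNS_MIN_PACKET_SIZE
    ]
    # Assumption: the smallest suitable MTU wins; fall back to the default
    return min(candidates, default=DEFAULT_MTU)
```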
Fixes: #19396
This was a copy/paste mistake apparently, there's no "try_authtok" and
this was supposed to copy what Fedora uses, which uses "use_authtok"
correctly. Hence adjust this.
Fixes: #19369
This drops the "const" specifier from the opaque object parameters to
various functions in our API.
This effectively reverts #19292 and more.
Why drop this? Our public APIs should not leak too much information
about how stuff is implemented internally. In our public APIs we
shouldn't give too many guarantees we don't want to necessarily keep.
Specifically: in many cases it makes sense that getters actually
generate/parse/allocate data on the fly, storing/caching the result
internally, to speed things up, do things lazily or to track memory
allocations so that they can be freed later. Doing this means we need to
change the objects, even though the getters are semantically a read
operation.
We want to retain the freedom that we can change things around
internally. By exposing the objects as "const" we remove a good chunk of
that, for little gain.
See sd_bus_creds_get_description() for a real example of a getter that
implicitly caches and thus modifies the relevant object.
This removes the "const" decorators from sd-dhcp and sd-netlink, two
APIs that we intend to make public eventually even though they still are
not, leaving us the chance to still fix this before it becomes set in
stone.
Why is this necessary? Several examples below.
- When a route sets prefsrc, then the address must be already assigned
(see issue #19285), and also it must be ready if IPv6.
- When a route or nexthop sets gateway, then the address must be reachable.
- When a route sets nexthop ID, then the corresponding nexthop must be
assigned.
- When a route sets multipath routes on another interface, then the
interface must exist and be ready to configure.
- When configuring address, the same address must not be under removing
(see issue #18108).
Etc., etc.
So, this makes all requests about addresses, routes, and nexthops first get
stored in the queue, to be processed when they are ready to be configured.
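The idea of a queue whose entries are configured only once their
prerequisites hold can be sketched as below. networkd is event-driven and
does not loop like this; the Request class and the example prerequisite are
invented for the illustration.

```python
from collections import deque


class Request:
    def __init__(self, name, is_ready, configure):
        self.name = name
        self.is_ready = is_ready    # callable: are the prerequisites met?
        self.configure = configure  # callable: apply the configuration


def process_queue(queue):
    """Configure every ready request; leave the others queued.

    Repeats while at least one request made progress, since configuring
    one request may make another one ready. Returns the number configured.
    """
    done = 0
    progressed = True
    while progressed:
        progressed = False
        for _ in range(len(queue)):
            req = queue.popleft()
            if req.is_ready():
                req.configure()
                done += 1
                progressed = True
            else:
                queue.append(req)
    return done


# Example: a route with a preferred source must wait for that address
assigned = set()
order = []


def configure_address():
    assigned.add("192.0.2.1")
    order.append("address")


queue = deque([
    Request("route", lambda: "192.0.2.1" in assigned, lambda: order.append("route")),
    Request("address", lambda: True, configure_address),
])
configured = process_queue(queue)
```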
Fixes #18108 and #19285.
We usually call specifier_printf() and then check the validity of
the result. In many cases, validity checkers, e.g. path_is_valid(),
refuse too long strings. This makes specifier_printf() refuse such
long results earlier.
Moreover, unit_full_string() and description field in sysuser now
refuse results longer than LONG_LINE_MAX. config_parse() already
refuses the line longer than LONG_LINE_MAX. Hence, it should be ok
to set the same value as the maximum length of the resolved string.
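A minimal sketch of refusing over-long results during expansion rather than
afterwards. The function below is a simplified stand-in written for this
example, not the real specifier_printf(): the table format, the exception
type and the limit passed in are all assumptions.

```python
def expand_specifiers(text, table, max_length):
    """Expand %x specifiers from `table`, failing once the result is too long.

    `table` maps a specifier character to a callable returning its value.
    "%%" yields a literal percent sign.
    """
    out = []
    length = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "%" and i + 1 < len(text):
            nxt = text[i + 1]
            piece = "%" if nxt == "%" else table[nxt]()
            i += 2
        else:
            piece = ch
            i += 1
        length += len(piece)
        if length > max_length:
            # Refuse early instead of building a huge string first
            raise ValueError("resolved string exceeds maximum length")
        out.append(piece)
    return "".join(out)
```

For example, expand_specifiers("/run/%u/x", {"u": lambda: "alice"}, 64)
expands normally, while the same call with a limit of 5 raises.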
Let's make sure that our close handler cannot unref a connection again that
we are already unreffing a few stack frames up, by invalidating the pointer
first, and dropping the ref counter only after that.
Replaces: 39ad3f1c092b5dffcbb4b1d12eb9ca407f010a3c
Fixes: #18025
In most of our codebase when we referenced "ipv4" and "ipv6" on the
right-hand-side of an assignment, we lowercased it (on the
left-hand-side we used CamelCase, and thus "IPv4" and "IPv6"). In
particular all across the networkd codebase the various "per-protocol
booleans" use the lower-case spelling. Hence, let's use lower-case for
SocketBindAllow=/SocketBindDeny= too, just to make sure things feel like
they belong together better.
(This work is not included in any released version, hence let's fix this
now, before any fixes in this area would be API breakage)
Follow-up for #17655
Some motherboards convert the path to uppercase under certain circumstances
(e.g. after booting into the Boot Menu in the ASUS ROG STRIX B350-F GAMING).
I have also seen that the VIOS LTH17 has the exact same correction, and it's also a SIPODEV composite HID device, also connected through USB. In the D330 it is a detachable keyboard. It's possible that a very generic way to apply this to at least the affected SIPODEV keyboards could be found using the device ids, but more info is needed to do that and to ensure that all SIPODEV keyboards with the pertinent ids need it.
Signed-off-by: David Santamaría Rogado <howl.nsp@gmail.com>
We use pvr match for efifb pitch and drm orientation quirk and in touchpad toggle keymap. Also seems most consistent with the devices here.
While at it, correct a typo, 81H3 and 81MD are product names not numbers, my bad.
Signed-off-by: David Santamaría Rogado <howl.nsp@gmail.com>
This is a follow-up for #19514 which changed unit_name_to_instance() to
return ENOMEM as a UnitType enum, even though the enum didn't
necessarily have range for that.
Let's extend the range explicitly, so that we can cover the full errno
range in it.
The directory might not be created in the ESP but in the extended boot
loader partition, hence don't claim otherwise.
Also, give a brief reason why the concept exists at all.
Link up machine-id man page.
Follow-up for: 6a3fff75baad94d9ebff1a6c7d1fb35448c44a81
This fixes some line-break confusion introduced by #11199
(c6cecb744b53561efd329309af7d02a3f9979ed1). It also restores a test with
GID_INVALID that was dropped, presumably by accident.
FLAGS_SET() checks if *all* the bits are set. In this case we want to check
if *any* are. FLAGS_SET() was added in cde2f8605e0c3842f9a87785dd758f955f2d04ba,
but this was not a bug yet at that point, because with just one bit, both options are equivalent.
But when more bits were added later, this stopped being correct.
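The difference between the "all bits" and "any bit" checks, as a small
sketch. The helper names and the bit assignments are made up for the
example; only the all-versus-any semantics come from the text above.

```python
def flags_set(value, flags):
    """True only if every bit in `flags` is set in `value` ("all")."""
    return (value & flags) == flags


def any_flag_set(value, flags):
    """True if at least one bit in `flags` is set in `value` ("any")."""
    return (value & flags) != 0


# Hypothetical bit assignments, for illustration only
PIN = 1 << 0
UP = 1 << 1
UV = 1 << 2
```

With a single bit in the mask both helpers agree; with PIN | UP as the mask
and only PIN set, flags_set() is false while any_flag_set() is true.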
Fixup for cde2f8605e0c3842f9a87785dd758f955f2d04ba. Use PIN+PV because the
status quo ante was that we turned off "uv" and left "up" and "clientPin" in
its default values, which with yubikeys (i.e. the most popular hardware) meant
both "up" and "clientPin" were enabled by default.
Coverity CID#1453085.
Let's initialize this at the same place for any iterator allocated. (Yes
not all types of iterator objects need this, but it's still nice to
share this trivial code at one place).
Clearly communicate to callers that we didn't find a single varlink
service, when a lookup is attempted. Note that the fallbacks to NSS,
drop-ins and synthesis might eat up this error again, but we should
really make this case reasonably recognizable, in particular as our
various tools already handle this condition correctly and print a nice
message then.
To determine the network interface type for use in the `Type=` directive, it is more concise to use the `list` command. Whereas, the `status` command requires an interface parameter.
For example, on a RaspberryPi 4 the following shows that the `wlan0` interface type `wlan` is more conveniently listed by the `list` command.
```
root@raspberrypi4-64:~# networkctl list
IDX LINK TYPE OPERATIONAL SETUP
1 lo loopback carrier unmanaged
2 eth0 ether routable configured
3 wlan0 wlan off unmanaged
3 links listed.
```
Whereas the `networkctl status` command doesn't include this information.
```
root@raspberrypi4-64:~# networkctl status
● State: routable
Address: 192.168.1.141 on eth0
fd8b:8779:b7a4::f43 on eth0
fd8b:8779:b7a4:0:dea6:32ff:febe:d1ce on eth0
fe80::dea6:32ff:febe:d1ce on eth0
Gateway: 192.168.1.1 (CZ.NIC, z.s.p.o.) on eth0
DNS: 192.168.1.1
May 07 14:17:18 raspberrypi4-64 systemd-networkd[212]: eth0: Gained carrier
May 07 14:17:19 raspberrypi4-64 systemd-networkd[212]: eth0: Gained IPv6LL
May 07 14:17:19 raspberrypi4-64 systemd-networkd[212]: eth0: DHCPv6 address fd8b:8779:b7a4::f43/128 timeout preferred -1 valid -1
May 07 14:17:21 raspberrypi4-64 systemd-networkd[212]: eth0: DHCPv4 address 192.168.1.141/24 via 192.168.1.1
```
To get the interface type using the `status` command you need to specify an additional argument.
```
root@raspberrypi4-64:~# networkctl status wlan0
● 3: wlan0
Link File: /lib/systemd/network/99-default.link
Network File: n/a
Type: wlan
State: off (unmanaged)
Path: platform-fe300000.mmcnr
Driver: brcmfmac
HW Address: dc:a6:32:be:d1:cf (Raspberry Pi Trading Ltd)
MTU: 1500 (min: 68, max: 1500)
QDisc: noop
IPv6 Address Generation Mode: eui64
Queue Length (Tx/Rx): 1/1
```
During an update of RRs, the records of each DNS-SD service are
replaced with new ones. However the old RRs can only be removed from
the mDNS scopes as long as they remain accessible from the DnssdService
structures, otherwise they remain stuck there.
Therefore the removal must take place before the update.
Binutils for non-x86 architectures currently does not support PE binaries. Thus
linux.efi.stub is useless on those, as one cannot use any tooling to add
linux/cmdline/splash sections to it. In addition to PE linux.efi.stub also
install ELF linux.elf.stub, such that one can use objcopy ELF target to copy in
linux/cmdline/splash sections and then convert the result to a PE binary.
This ensures we not only synthesize regular passwd/group records of
userdb records, but shadow records as well. This should make sure that
userdb can be used as a comprehensive superset of the classic
passwd/group/shadow/gshadow functionality.
Setting the flags means we won't try to read the data from /etc/shadow
when reading a user record, thus slightly making conversion quicker and
reducing the chance of generating MAC faults, because we needlessly
access a privileged resource. Previously, passing the flag didn't
matter, when converting our JSON records to NSS since the flag only had
an effect on whether to use NSS getspnam() and related calls or not. But
given that we turn off NSS anyway as backend for this conversion (since
we want to avoid NSS loops, where we turn NSS data to our JSON user
records, and then to NSS forever and ever) it was unnecessary to pass
it.
This changed in one of the previous commits however, where we added
support for reading user definitions from drop-in files, with separate
drop-in files for the shadow data.
This adds two new values to --private-users-ownership=: "map" and
"auto".
"map" exposes the kernel 5.12 idmap feature pretty much 1:1. It fails if
the kernel or used file system doesn't support ID mapping.
"auto" is a bit smarter: if we can make ID mapping work, we'll use it,
otherwise revert back to classic chown()ing. We'll also use chown()ing
if we detect that an image is already ID shifted, both to increase
compatibility with the status quo ante, and to simplify our codepaths,
since the mappings become a lot simpler if we only have to map from zero
to something else, instead of from anything to anything else.
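The decision described above can be written down as a small function. The
function and parameter names are hypothetical and the capability checks are
passed in as booleans; how nspawn actually probes for ID mapping support or
detects an already shifted image is not covered here.

```python
def resolve_ownership(mode, idmap_supported, image_already_shifted):
    """Return the effective ownership mode: "off", "chown" or "map"."""
    if mode == "map":
        # "map" exposes the kernel feature 1:1 and fails without it
        if not idmap_supported:
            raise RuntimeError("ID-mapped mounts are not supported here")
        return "map"
    if mode == "auto":
        # Prefer ID mapping, but fall back to classic chown()ing when it
        # is unavailable or the image is already ID shifted
        if image_already_shifted or not idmap_supported:
            return "chown"
        return "map"
    return mode  # "off" and "chown" are used as given
```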
The short -U switch, and --private-users=pick will now imply
--private-users-ownership=auto instead of
--private-users-ownership=chown, since the new logic should be the much
better choice.
This makes use of the new kernel 5.12 APIs to add an idmap to a mount
point. It does so by cloning the mountpoint, changing it, and then
unmounting the old mountpoint, replacing it later with the new one.
This replaces --private-users-chown by an enum value
--private-users-ownership=off|chown. Changes otherwise very little.
This is mostly preparation for a follow-up commit adding a new "map"
mode, using kernel 5.12 UID mapping mounts.
Note that this does alter codeflow a bit: the new enum already knows
three different values instead of the old true/false pair. Besides "off"
and "chown" it knows -EINVAL, i.e. whenever the value wasn't set
explicitly. This value is changed to "off" or "chown" before use, thus
retaining compat to the status quo before, except it won't override
explicit configuration anymore. Thus, if you explicitly request
--private-users=pick you can now combine it with an explicit
--private-users-ownership=off if you like, which will give you a
container that runs under its own UID set, but the files will be owned
by the original image. Makes not much sense besides maybe debugging, but
if requested explicitly I think it's OK to implement.
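A minimal sketch of the "unset until finalized" pattern described here. All names are hypothetical, and the rule that an unset value becomes "chown" when UID picking was requested is an assumption derived from the status quo mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical names. The negative value marks "not set explicitly". */
typedef enum OwnershipSetting {
        OWNERSHIP_SETTING_OFF,
        OWNERSHIP_SETTING_CHOWN,
        OWNERSHIP_SETTING_UNSET = -EINVAL,
} OwnershipSetting;

/* Turn the unset value into a concrete one before use. An explicit
 * setting always wins, so "pick" plus an explicit "off" stays "off". */
static OwnershipSetting finalize_ownership(OwnershipSetting configured, bool pick_requested) {
        if (configured != OWNERSHIP_SETTING_UNSET)
                return configured;

        return pick_requested ? OWNERSHIP_SETTING_CHOWN : OWNERSHIP_SETTING_OFF;
}
```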
userns identity 1:1 mapping is a pretty useful concept since it isolates
capability sets between containers and hosts, even if it doesn't map
any uid ranges. Let's support it with an explicit concept.
(Note that this is identical to --private-users=0:65536 (which in turn
is identical to --private-users=0), but I think it makes sense to emphasize
this concept as a high-level one that makes sense to support.)
Some tokens support authorization via fingerprint or other biometric
ID. Add support for "user verification" to cryptenroll and cryptsetup.
Disable by default, as it is still quite uncommon.
In some cases user presence might not be required to get _a_
secret out of a FIDO2 device, but it might be required to
get the actual secret that was used to lock the volume.
Record whether we used it in the LUKS header JSON metadata.
Let the cryptenroll user ask for the feature, but bail out if it is
required by the token and the user disabled it.
Enabled by default.
Closes: https://github.com/systemd/systemd/issues/19246
Some FIDO2 devices allow the user to choose whether to use a PIN or not
and will HMAC with a different secret depending on the choice.
Some other devices (or some device-specific configuration) can instead
make it mandatory.
Allow the cryptenroll user to choose whether to use a PIN or not, but
fail immediately if it is a hard requirement.
Record the choice in the JSON-encoded LUKS header metadata so that the
right set of options can be used on unlock.
On headless setups, in case other methods fail, asking for a password/pin
is not useful as there are no users on the terminal, and generates
unwanted noise. Add a parameter to /etc/crypttab to skip it.
So far we basically had two ways to iterate through NSS records: one via
the varlink IPC and one via the userdb.[ch] infra, with slightly
different implementations.
Let's clean this up, and always use userdb.[ch] also when resolving via
userdbd. The different codepaths for the NameServiceSwitch and the
Multiplexer varlink service now differ only in the different flags
passed to the userdb lookup.
Behaviour shouldn't change by this. This is mostly refactoring, reducing
redundant codepaths.
Let's use "exclude" for flags that really exclude records from our
lookup. Let's use "avoid" for concepts that we won't use when the flag
is set, but for which we have a fallback path that should yield the same
result. Let's use "suppress" for suppressing partial info, even if we
return the record otherwise.
So far we used "avoid" for all these cases, which was confusing.
While we are at it, let's reassign the bits a bit, leaving some space
for bits follow-up commits are going to add.
This fixes the following error:
```
In file included from ../src/basic/af-list.h:6,
from ../src/basic/af-list.c:7:
../src/basic/string-util.h: In function 'char_is_cc':
../src/basic/string-util.h:133:19: error: comparison is always true due to limited range of data type [-Werror=type-limits]
133 | return (p >= 0 && p < ' ') || p == 127;
| ^~
cc1: all warnings being treated as errors
```
Fixes #19543.
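One way to avoid the warning is to compare as unsigned char, so that no "p >= 0" test is needed on platforms where plain char is unsigned. This is a sketch under that assumption, with a made-up function name, and not necessarily the exact change applied:

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true for ASCII control characters (0..31 and DEL). Bytes with the
 * high bit set become 128..255 after the cast and are therefore never
 * treated as control characters, whether char is signed or unsigned. */
static bool char_is_cc_sketch(char p) {
        return (unsigned char) p < (unsigned char) ' ' || p == 127;
}
```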
userdbd listens on "two" sockets, that are actually the same: one is a
real AF_UNIX socket in the fs, and the other is a symlink to it.
So far, when userdbd was started from the command line it would make one
a symlink and the other a real socket, but when invoked via unit files
they'd be swapped, i.e. the other would be a symlink and the one a real
socket.
Let's bring this in line.
Since the "io.systemd.Multiplexer" is our main interface, let's make it
the one exposed as socket, and then make "io.systemd.NameServiceSwitch"
a symlink to it. Or in other words, let's adjust the C code to match the
unit file.
Add SBAT support, when a -Dsbat-distro value is specified. One can use
-Dsbat-distro=auto for autodetection of all sbat options. Many meson configure
options are added to customize SBAT CSV values, but sensible defaults are
autodetected. SBAT support is required if shim v15+ is used to load the
systemd-boot binary or kernel.efi (Type II BootLoaderSpec).
Fixes #19247
When we are queried for membership lists on a system that has exactly
zero, then we'll return ESRCH immediately instead of at EOF. Which is
OK, but we need to handle this in various places, and not get confused
by it.
It's not going to be efficient if called in inner loops, but it's oh so
handy, and we have some code that does this:
asprintf(&p, "%s…", b, …);
free(b);
b = TAKE_PTR(p);
which can now be replaced by the quicker and easier to read:
strextendf(&b, "…", …);
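To illustrate the idea, here is a stand-in written with plain libc calls. The name str_appendf() and its return convention are hypothetical; this is not the strextendf() implementation itself.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: appends formatted text to *x, reallocating as
 * needed. Returns 0 on success and -1 on failure, in which case *x is left
 * untouched. *x may be NULL, which is treated as the empty string. */
static int str_appendf(char **x, const char *format, ...) {
        va_list ap;
        size_t old_len;
        char *buf;
        int n;

        assert(x);
        assert(format);

        /* First pass: measure how much space the formatted text needs. */
        va_start(ap, format);
        n = vsnprintf(NULL, 0, format, ap);
        va_end(ap);
        if (n < 0)
                return -1;

        old_len = *x ? strlen(*x) : 0;
        buf = realloc(*x, old_len + (size_t) n + 1);
        if (!buf)
                return -1;

        /* Second pass: format directly behind the existing contents. */
        va_start(ap, format);
        vsnprintf(buf + old_len, (size_t) n + 1, format, ap);
        va_end(ap);

        *x = buf;
        return 0;
}
```

The formatting runs twice and the string is rescanned with strlen() on every call, which is why this kind of helper is convenient but not something for inner loops.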
Let's add three defines for the 3 special cases of passwords.
Some of our tools used different values for the "locked"/"invalid" case,
let's settle on using "!*" which means the password is both locked *and*
invalid.
Other tools like to use "!!" for this case, which however is less than
ideal I think, since this could also be considered an entry with
an empty password, that can be enabled again by unlocking it twice.
This fixes an issue introduced by the commit 954c77c2510c0328fd98354a59f380945752c38c.
For some reasons, setting default ACL on $TESTDIR makes TEST-29-PORTABLE
fail. Let's drop the default ACL, and set ACL on saved results instead.
Fixes #19519.
D330-10IGM has been added because the 81H3 and 81MD product names belong to the same product version. Now that we know 81MD has the same transformation matrix as 81H3, we can just match on the product version and get rid of the product name.
Signed-off-by: David Santamaría Rogado <howl.nsp@gmail.com>
Quoting Documentation/driver-api/vfio.rst in Linux:
> note that /dev/vfio/vfio provides no capabilities on its own and is therefore
> expected to be set to mode 0666 by the system
If ":" was the last char in the string, we would call access() on ".../drivers/", which
would pass. It probably doesn't matter, but let's reject this anyway.
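A sketch of the kind of check meant here. The helper name and the assumption that the identifier has the form "subsystem:driver" are illustrative only and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: checks that an identifier of the assumed form
 * "subsystem:driver" has non-empty parts on both sides of the first colon. */
static bool driver_id_is_valid(const char *id) {
        const char *colon;

        assert(id);

        colon = strchr(id, ':');
        if (!colon)
                return false;
        if (colon == id)        /* empty subsystem */
                return false;
        if (colon[1] == '\0')   /* ':' is the last character: empty driver */
                return false;

        return true;
}
```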
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=33881.
Not only we would duplicate unknown input on the stack, we would do it
over and over. So let's first check that the input has reasonable length,
but also allocate just one fixed size buffer.
The ID_FFADO environment variable comes from external FFADO project.
Now we have comprehensive and self-contained rules instead of it.
Let's remove it.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
IEC 61883-1:1998 defines some values for AV/C devices with a vendor
unique command set. The current udev rule handles it
for video. However it brings an issue that the functions in AV/C device
are not distinguished just by the content of configuration ROM.
In former commit, hardware database was added to describe function type
of unit in the node, then udev rules are added to utilize the database.
However, we have a request to obsolete existent udev rules by putting
enough entries to the database. It should be done carefully.
This commit adds entry into hardware database just for backward
compatibility. The entry can match to some node and unit unexpectedly.
Therefore this commit modifies existent entries to invalidate the effect
from added entry.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Typical node of AV/C device has standard content of configuration ROM.
This is defined in documentation of 1394 Trading Association.
* Configuration ROM for AV/C Devices 1.0 (Dec. 12, 2000, 1394 Trading
Association, TA Document 1999027)
However, it brings an issue that the functions in AV/C device are not
distinguished just by the content of configuration ROM.
In former commit, hardware database was added to describe function type
of unit in the node, then udev rules are added to utilize the database.
However, we have a request to obsolete existent udev rules by putting
enough entries to the database. It should be done carefully.
This commit adds entry into hardware database just for backward
compatibility. The entry can match to some node and unit unexpectedly.
Therefore this commit modifies existent entries to invalidate the effect
from added entry.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
IIDC specification describes configuration ROM without model field, thus
it's not possible to match any entry with vendor ID and model ID.
Current entry for Cool Stream iSweet can match any node and unit of
IIDC.
This commit removes the entry. I note that this model uses Texas
Instruments MC680-DCC as all-in-one chipset for video function in
IEEE 1394 bus.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Point Grey Research, inc. shipped cameras to support IIDC, however some
of them are not necessarily compliant to IIDC specification in terms of the
value of software version field in unit directory of configuration ROM.
This commit adds entries for them.
Reviewed-by: Damien Douxchamps <damien@douxchamps.net>
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Instrumentation & Industrial Digital Camera (IIDC) specifications are
defined by 1394 Trading Association for camera device in IEEE 1394 bus.
IIDC2 specifications are defined by joint working group between Japan
Industrial Imaging Association (JIIA) and 1394 Trade Association as
bus-independent specification.
This commit adds entries for the specifications to remove existent udev
rules. Supported specifications are listed below:
* 1394-based Digital Camera Specification Version 1.04 (Aug. 9, 1996,
1394 Trading Association)
* 1394-based Digital Camera Specification Version 1.20 (Jul. 23, 1998,
1394 Trading Association)
* IIDC Digital Camera Control Specification Ver.1.30 (Jul. 25, 2000,
1394 Trading Association)
* IIDC Digital Camera Control Specification Ver.1.31 (Feb. 2, 2004,
1394 Trading Association, TA Document 2003017)
* IIDC Digital Camera Control Specification Ver.1.32 (Jul. 24, 2008,
1394 Trading Association, Document number 2007009)
* IIDC2 Digital Camera Control Specification Ver.1.0.0 (Jan 26th, 2012,
1394 Trading Association, TS2011001)
* IIDC2 Digital Camera Control Specification Ver.1.1.0 (May 19th, 2015,
1394 Trading Association, TS2015001)
Reviewed-by: Damien Douxchamps <damien@douxchamps.net>
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Linux kernel has firedtv kernel module as driver for Digital Everywhere
FloppyDTV and FireDTV. Although this driver works without any help of
userspace application, it's better to add entries to hardware database
for developer's convenience.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Zbigniew Jędrzejewski-Szmek points out that current entries are against the
convention of indentation. It should be indented by one space instead of
two.
This commit fixes current entries according to it.
Reported-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Fixes: 1b6d9a05b14a ("hwdb: add database entries for models with ASICs in BeBoB solution")
Fixes: 0db0564e957f ("hwdb: add database entries for models with Fireworks board module")
Fixes: 38338b302cb0 ("hwdb: add database entries for models with OXFW970/971 ASICs")
Fixes: c0d8b61f9385 ("hwdb: add database entries for models based on DICE ASICs with TCAT specification")
Fixes: a774b5099bce ("hwdb: add database entries for models based on DICE ASICs specialized to M-Audio")
Fixes: ff1cb7b9393a ("hwdb: add database entries for models based on DICE ASICs specialized to Weiss Engineering")
Fixes: 6f44dddbe20a ("hwdb: add database entries for models based on DICE ASICs specialized by Loud Technologies")
Fixes: 49ed0aad525b ("hwdb: add database entries for models based on DICE ASICs specialized by Harman Music Group")
Fixes: effbb4024b8b ("hwdb: add database entries for models based on DICE ASICs specialized by Solid State Logic")
Fixes: 4aaa093b5fb6 ("hwdb: add database entries for models of Digidesign Digi 00x family")
Fixes: c489e7f9d3c4 ("hwdb: add database entries for Tascam FireWire series")
Fixes: 650b8967a57b ("hwdb: add database entries for MOTU FireWire series")
Fixes: 51e9242b9b91 ("hwdb: add database entries for RME Fireface series")
Fixes: a90a6a9ae9f8 ("hwdb: add database entries for Yamaha mLAN 2nd generation")
Fixes: 41f2d0d393a4 ("hwdb: add database entries for Yamaha mLAN 3rd generation")
Fixes: 1d2ee962922f ("hwdb: add database entries for Focusrite Liquid Mix series")
Fixes: 0c20543835d6 ("hwdb: add database entries for TC Electronic PowerCore FireWire series")
Fixes: 8b4b76dc5021 ("hwdb: add database entry for node with single unit with video function")
Fixes: 12dd2404bee8 ("hwdb: add database entries for node with multiple units")
Fixes: dece0357e1c8 ("hwdb: add database entries for node with single unit for multiple functions")
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
When given no arguments, hwdb parser script seeks test target files by
glob pattern. Although I added a new file for IEEE 1394 unit functions,
the file is excluded as test target due to the pattern.
This commit fixes it.
Fixes: 7713f3fc6a2 ("hwdb: add parser grammar for IEEE 1394 unit function list")
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
The function returns non-negative UnitNameFlags on success, and negative
errno on error. In the past we kept the return type as int because of those
negative return values. But nowadays _UNIT_NAME_INVALID == -EINVAL. And if
we tried to actually return something that doesn't fit in the return type,
the compiler would throw an error. By changing to the "real" return type,
we allow the debugger to use symbolic representation for the variables.
I think quoting is more useful than not quoting. Without, arguments with
whitespace cannot be split correctly.
Unlike in coredump, "normal" quoting is used in those two cases. This output is
mostly for informational purposes, so the more readable quoting seems appropriate.
dbus GetProcesses:
$ busctl --user call org.freedesktop.systemd1 /org/freedesktop/systemd1/unit/run_2dr4450e1ae73944194bb6593fcfd255fbe_2eservice org.freedesktop.systemd1.Service GetProcesses
a(sus) 2
"/user.slice/user-1000.slice/user@1000.service/app.slice/run-r4450e1ae73944194bb6593fcfd255fbe.service" 131494 "/usr/bin/bash -c \"sleep 100; sleep 20\""
"/user.slice/user-1000.slice/user@1000.service/app.slice/run-r4450e1ae73944194bb6593fcfd255fbe.service" 131496 "sleep 100"
$ coredumpctl info |grep Command
Command Line: bash -c kill -SEGV $$ (before)
Command Line: bash -c "kill -SEGV \$\$" (road not taken, C quotes)
Command Line: bash -c $'kill -SEGV $$' (now, POSIX quotes)
Before we wouldn't use any quoting, making it impossible to figure out how the
command line was split into arguments. We could use "normal" quotes, but this
has the disadvantage that the commandline *looks* like it could be pasted into
the terminal and executed, but this is not true: various non-printable
characters cannot be expressed in this quoting style. (This is not visible in
this example). Thus, "POSIX quotes" are used, which should allow any command
line to be expressed accurately and pasted directly into a shell prompt to
reexecute.
I wonder if we should add another field in the coredump entry that simply shows the
original cmdline with embedded NULs, in the original /proc/*/cmdline
format. This would allow clients to format the data as they see fit. But I
think we'd want to keep the serialized form anyway, for backwards compatibility.
The new flag is not used, except in tests, so no functional change yet.
This way, the command as shown can be copied-and-pasted into the shell
in more cases. For simple cases, shell quoting with "" is enough. But
$'' is needed when there are control characters in the command.
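The choice between the quoting styles can be sketched like this. It is a simplified illustration with a made-up function name and an illustrative (not exhaustive) list of special characters; the real code differs in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Characters that force quoting; illustrative, not an exhaustive list. */
#define SHELL_SPECIAL " \"'\\$`*?[]{}()<>|&;#~"

/* Quote one argument for display: unchanged if harmless, "" if it contains
 * special characters, $'' if it contains control characters. Returns a
 * newly allocated string, or NULL on allocation failure. */
static char *quote_arg_sketch(const char *s) {
        bool need_quote, need_posix = false;
        char *buf, *t;

        assert(s);

        need_quote = s[0] == '\0';   /* an empty argument must stay visible */
        for (const char *p = s; *p; p++) {
                unsigned char c = (unsigned char) *p;

                if (c < ' ' || c == 127)
                        need_posix = true;
                else if (strchr(SHELL_SPECIAL, c))
                        need_quote = true;
        }

        /* Worst case: every byte becomes "\xNN", plus $'' and the NUL. */
        buf = malloc(strlen(s) * 4 + 4);
        if (!buf)
                return NULL;
        t = buf;

        if (need_posix) {
                *t++ = '$';
                *t++ = '\'';
                for (const char *p = s; *p; p++) {
                        unsigned char c = (unsigned char) *p;

                        if (c == '\n')
                                t += sprintf(t, "\\n");
                        else if (c == '\t')
                                t += sprintf(t, "\\t");
                        else if (c < ' ' || c == 127)
                                t += sprintf(t, "\\x%02x", (unsigned) c);
                        else {
                                if (c == '\'' || c == '\\')
                                        *t++ = '\\';
                                *t++ = (char) c;
                        }
                }
                *t++ = '\'';
        } else if (need_quote) {
                *t++ = '"';
                for (const char *p = s; *p; p++) {
                        if (strchr("\"\\$`", *p))
                                *t++ = '\\';
                        *t++ = *p;
                }
                *t++ = '"';
        } else {
                size_t n = strlen(s);

                memcpy(t, s, n);
                t += n;
        }

        *t = '\0';
        return buf;
}
```

Bytes above 127 are passed through untouched, so UTF-8 text stays readable.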
Significant time was spent in the getpid() measurement code, which is not very
important. So let's optimize this a bit by running the slower version fewer
times, and running both tests fewer times unless slow tests are enabled.
This gives better accuracy than before in slow mode, and still reasonable
accuracy in fast mode without a noticeable slowdown.
It makes little sense to always print the stuff that is fully deterministic
and verified by asserts. It can be opted-in with $SYSTEMD_LOG_LEVEL when
developing the tests or debugging a failure.
Since the new functionality is controlled by an option, this causes no change
in output yet, except tests.
The logic in the old branch of !(flags & PROCESS_CMDLINE_QUOTE) is essentially
unmodified. But there is an important difference in behaviour: instead of
unconditionally reading the whole virtual file, we now read only 'max_columns'
bytes. This makes our code to write process lists quite a bit more efficient
when there are processes with long command lines.
So far we would append "…" or "..." when the string was wider than the specified
output width. But let's add a mode where the caller knows that the string being
passed is already truncated.
The condition for jumping back in utf8_escape_non_printable_full() was
off-by-one. But we only jumped to that label after doing a check with a
stronger condition, so I think it didn't matter. Now it matters because we'd
output the forced ellipsis one column too early.
The comment in the code said that so far this didn't matter, but I want to use
shell quoting in more places where this will make a difference. So control
characters are now escaped. Normal utf-8 characters are passed through, it
is 2021 after all and pretty much everyone is (or should be) using utf-8.
While touching the code, change 'char *r' → 'char *buf', in line with modern
style.
shell_escape() is mostly used for mount paths and similar, where we assume
no newlines are present in the string. But if any were ever present, we
should escape them. So let's simplify the code by making this unconditional.
"systemd-testsuite" gets in the way when grepping for "testsuite-*.sh".
Also, the name doesn't matter for anything, so let's just use something
very short to save space.
Today this is v248 with 938bdfc0fa737d86eb3ecc70506e11e5f740e0dc, which,
if you don't know about the github webflow key fails to configure with
meson.build:724:8: ERROR: String "gpg: Signature made Tue 30 Mar 2021 22:59:02 CEST\ngpg: using RSA key 4AEE18F83AFDEB23\ngpg: Can't check signature: No public key\n1617137942\n" cannot be converted to int
or, if you do, with
meson.build:724:8: ERROR: String 'gpg: Signature made Tue 30 Mar 2021 22:59:02 CEST\ngpg: using RSA key 4AEE18F83AFDEB23\ngpg: Good signature from "GitHub (web-flow commit signing) <noreply@github.com>" [unknown]\ngpg: WARNING: This key is not certified with a trusted signature!\ngpg: There is no indication that the signature belongs to the owner.\nPrimary key fingerprint: 5DE3 E050 9C47 EA3C F04A 42D3 4AEE 18F8 3AFD EB23\n1617137942\n' cannot be converted to int
We had the following scenario:
under /etc/systemd/system/
- foo@.service
- bar@tty12.service → foo@tty12.service
- multi-user.target.wants/foo@tty12.service
Existing code did not "know" that foo@tty12.service has alias bar@tty12.service:
$ systemctl show -P Names foo@tty12.service
foo@tty12.service
Since multi-user.target is always loaded, we would load foo@tty12.service.
When trying to load bar@tty12.service, it would (correctly) detect that
bar@tty12.service is an alias for foo@tty12.service, and try to merge the
bar@tty12.service unit into the foo@tty12.service. This would fail, because
foo@tty12.service was already loaded, and only about-to-be-loaded units can
be merged.
With the patch we consider bar@tty12.service an alias of foo@tty12.service
immediately, so the issue does not occur:
$ systemctl show -P Names foo@tty12.service
foo@tty12.service bar@tty12.service
Fixes #19409.
This turned into a bigger rewrite. The logic to add "the main name and all
aliases" was implemented twice, slightly differently in both cases. I split
that part out to a new function. The result is about the same length, but
hopefully a bit easier to read.
Logging output is also improved a bit. Some left-over debug logs have been
removed or cleaned up.
This is a fairly big change, but (with the addition in the following commit),
we have pretty good coverage of this logic.
New glibc deprecated mallocinfo(), even newer glibc added mallocinfo2()
as replacement. Use it, if it exists.
Follow-up for 4b6f74f5a0943e0abfa8e6997811f8f7b7f00a15 and related
commits.
This fixes a bug introduced by 822be62fb23ed0ec1062ffd18057e53f6c2f8c01.
Before this, if the terminal was not wide enough, all subsequent lines
were included in the hyperlink.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1955475.
We would try to return a value that could be nonzero only if the kernel
reported writing more bytes than we gave to it, hopefully a rare occurrence.
Instead, assert that this doesn't happen.
Instead, return true if we got to the end of the iovec array. The caller
can use this information to know that the whole iovec array was written.
This allows one loop to be dropped in write_to_syslog().
Also drop _unlikely_: this function is called with very short arrays, and
it *is* likely that we trigger this condition. Let's just let the compiler
generate normal code without giving it a potentially false hint.
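The return convention can be illustrated with a small sketch. The helper name and exact structure are assumptions for illustration, not the function touched by this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical helper: account for k bytes having been written from the
 * array. Fully written entries end up with length zero, a partially written
 * entry is advanced. Returns true only if every entry was written
 * completely, i.e. the end of the array was reached. */
static bool iovec_advance(struct iovec *iovec, size_t iovlen, size_t k) {
        assert(iovec || iovlen == 0);

        for (size_t i = 0; i < iovlen; i++) {
                size_t sub;

                if (iovec[i].iov_len == 0)
                        continue;
                if (k == 0)
                        return false;   /* unwritten data remains */

                sub = k < iovec[i].iov_len ? k : iovec[i].iov_len;
                iovec[i].iov_len -= sub;
                iovec[i].iov_base = (char *) iovec[i].iov_base + sub;
                k -= sub;

                if (iovec[i].iov_len > 0)
                        return false;   /* this entry was only partly written */
        }

        /* The kernel must not report more bytes than we handed to it. */
        assert(k == 0);
        return true;
}
```

A caller can then loop on writev() and stop as soon as the helper returns true, without a separate pass to check whether anything is left.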
$ SYSTEMD_LOG_LEVEL=debug build/systemd --test --user
...
Failed to lookup RuntimeDirectory path: No such device or address <---- this line is new
Failed to allocate manager object: No such device or address
We would fail and only say "Failed to allocate manager object: ENODEV" which is
not entirely self-explanatory. Let's add a better log message.
When editing this function in 7bf20e48bd7d641a39a14a7feb749b7e8, I couldn't
decide whether to initialize ret at the top and only reset it on success, or
whether to assign a value in each branch. In the end I did neither ;( So if the
test finished without creating any of the result files, we would echo a
message, but return "success".
But there was bigger confusion with /failed: some tests create it empty, some
don't. I think we may want to do away with pre-creation of /failed completely,
and assume the test failed unless /testok is found. But I'm leaving that for
later rework. For now let's just make sure we report success only if /testok
or /skipped is found.
This commit applies the filtering imposed by LogLevelMax on a unit's
processes to messages logged by PID1 about the unit as well.
The target use case for this feature is a service that runs on a timer
many times an hour, where the system administrator decides that writing
a generic success message to the journal every few minutes or seconds
adds no diagnostic value and isn't worth the clutter or disk I/O.
This reverts commit 7c20dd4b6ef6e69862576722ac69b895d7a92dc9.
Debian has now been updated to patch the issue, so SemaphoreCI should
no longer fail. The fix has also been backported to the affected
stable branches.
Basically the same scenario as in
a33e2692e162671f0d97856ad2f49a2620a1ec10, where `awk` exits as soon
as it finds a match, thus sending SIGPIPE to `ldd` if it's not fast
enough. That, in combination with `set -o pipefail` causes random &
unexpected fails, like:
```
No journal files were found.
-rw-r----- 1 root root 16777216 Apr 30 10:31
/var/tmp/TEST-01-BASIC_sanitizers-nspawn/system.journal
TEST-01-BASIC RUN: Basic systemd setup [OK]
systemd is not linked against the ASan DSO
gcc does this by default, for clang compile with -shared-libasan
make: *** [Makefile:2: clean-again] Error 1
make: Leaving directory '/build/test/TEST-01-BASIC'
```
DMI vendor information fields do not provide enough information for us to
distinguish between Amazon EC2 virtual machines and bare-metal instances.
SMBIOS provides a BIOS Information
table (https://www.dmtf.org/sites/default/files/standards/documents/DSP0134_3.4.0.pdf
Ch. 7) that provides a field to indicate that the current machine is a virtual
machine. On EC2 virtual machine instances, this field is set, while bare-metal
instances leave this unset, so we inspect the field via the kernel's
/sys/firmware/dmi/entries interface.
Fixes #18929
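A sketch of the check on the raw table contents. It assumes the buffer holds an SMBIOS BIOS Information (type 0) structure, for example as read from /sys/firmware/dmi/entries/0-0/raw (the exact file is an assumption here), and that the virtual machine indication is bit 4 of BIOS Characteristics Extension Byte 2 at offset 0x13. The function name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the "virtual machine" bit is set, 0 if it is unset, and -1 if
 * the structure is too short to contain the field. */
static int smbios_is_vm(const uint8_t *raw, size_t size) {
        assert(raw || size == 0);

        /* raw[1] is the length of the formatted area of the structure. */
        if (size < 0x14 || raw[1] < 0x14)
                return -1;

        return (raw[0x13] & (1U << 4)) ? 1 : 0;
}
```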
Previously, the watch handle was saved in the udev database. But in most cases,
the handle saved in the database is not updated. Especially, when udevd
is restarted, the inotify watch is restarted, but the database is not
updated.
Moreover, it is not necessary to save the watch handle in the database, as
the handle only takes effect while udevd is running, and the value
is meaningless when udevd is restarted.
So, this saves the reverse map, from device ID to watch handle, in
/run/udev/watch as a symbolic link, and the handle is no longer saved in
the database.
Fixes #18525.
Some udev rule may erroneously set inotify watch on remove event.
For safety, silently ignore such an inotify watch enablement.
This also moves inotify watch enablement code to udev-event.c.
When udev rules are not applied correctly, the list of programs to run is
incomplete. So, udev_event_execute_run(), which comes later in
worker_process_device(), should not be called.
When manager_exit() or manager_free() is called, the global variable in
udev-watch.c is not set to '-1'. Of course, that is safe, as the event source
for the inotify fd is unref()ed in manager_exit() and manager_free().
But let's not store fd globally.
Now, RoutesToDNS= and RoutesToNTP= are enabled by default on DHCPv4
client. So, if DHCP server picks up DNS or NTP servers from uplink,
then the routes may break CI environment.
Hopefully fixes #19463.
Otherwise a coredump started at an inconvenient moment can stop
shutdown.target leaving the system in a halfway-down state:
Pulling in shutdown.target/start from systemd-poweroff.service/start
Added job shutdown.target/start to transaction.
...
Keeping job shutdown.target/start because of systemd-poweroff.service/start
...
[ OK ] Stopped target Remote File Systems.
shutdown.target: starting held back, waiting for: systemd-networkd.socket
sysinit.target: stopping held back, waiting for: remount_tmp.service
systemd-coredump.socket: Incoming traffic
...
systemd-coredump@0-243-0.service: Trying to enqueue job systemd-coredump@0-243-0.service/start/replace
Added job systemd-coredump@0-243-0.service/start to transaction.
Pulling in systemd-journald.socket/start from systemd-coredump@0-243-0.service/start
Added job systemd-journald.socket/start to transaction.
Pulling in system.slice/start from systemd-journald.socket/start
Added job system.slice/start to transaction.
Pulling in -.slice/start from system.slice/start
Added job -.slice/start to transaction.
Pulling in system-systemd\x2dcoredump.slice/start from systemd-coredump@0-243-0.service/start
Added job system-systemd\x2dcoredump.slice/start to transaction.
Pulling in system.slice/start from system-systemd\x2dcoredump.slice/start
Pulling in shutdown.target/stop from system-systemd\x2dcoredump.slice/start
Added job shutdown.target/stop to transaction.
...
Keeping job systemd-poweroff.service/stop because of umount.target/stop
Keeping job shutdown.target/stop because of systemd-coredump@0-243-0.service/start
The maximum allowed value of the sysfs device index entry was limited to
16383 (2^14-1) to avoid the generation of unreasonable onboard interface
names.
For s390 the index can assume a value of up to 65535 (2^16-1) which is
now allowed depending on the new naming flag NAMING_16BIT_INDEX.
Larger index values are considered unreasonable and remain to be
ignored.
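The resulting range check amounts to something like the following sketch. The macro and function names are hypothetical, and the boolean stands in for testing the NAMING_16BIT_INDEX flag.

```c
#include <assert.h>
#include <stdbool.h>

#define ONBOARD_14BIT_INDEX_MAX ((1UL << 14) - 1)   /* 16383 */
#define ONBOARD_16BIT_INDEX_MAX ((1UL << 16) - 1)   /* 65535 */

/* Returns true if the sysfs index may be used for an onboard interface name.
 * Values above the applicable limit are considered unreasonable. */
static bool onboard_index_is_reasonable(unsigned long idx, bool naming_16bit_index) {
        return idx <= (naming_16bit_index ? ONBOARD_16BIT_INDEX_MAX : ONBOARD_14BIT_INDEX_MAX);
}
```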
See kernel's rtm_to_fib_config() in net/ipv4/fib_frontend.c and
rtm_to_fib6_config() in net/ipv6/route.c.
Note that if both gateway and multipath routes are specified, then
kernel ignores gateway. So, strictly speaking, setting both gateway and
multipath routes is allowed by kernel. But such situation is mostly
user's misconfiguration. Let's refuse it.
Note that the conditions newly added in route_configure() are redundant,
as all static configurations are already verified in
route_section_verify(), and dynamic configurations do not set
nexthop_id or multipath routes. Just for safety.
Usually, removing non-existing addresses, routes, etc. is safe.
However, when multiple interfaces lose their carriers simultaneously,
manager_drop_routes() and manager_drop_nexthop() are called multiple
times. If a route with a blackhole nexthop is removed in that process,
the later removal requests of the route fail with -EINVAL, rather
than -ESRCH, as the corresponding nexthop does not exist anymore.
So, let's not remove routes which are managed by Manager more than once.
Hidden and backup files cannot be valid unit names (we reject anything
starting with a dot, and we require type suffixes). So let's not iterate
over those at all.
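A simplified sketch of such a filter. Treating a trailing "~" as the marker of a backup file is an assumption made for illustration; the real check may recognize more patterns, and the function name is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true for names that should be skipped while enumerating unit
 * files: hidden files (leading dot) and, as an assumed convention, editor
 * backup files (trailing tilde). */
static bool name_is_hidden_or_backup(const char *name) {
        size_t n;

        assert(name);

        n = strlen(name);
        if (n == 0)
                return false;

        return name[0] == '.' || name[n - 1] == '~';
}
```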
The ifdef pattern is the same for all syscalls, so most of the time, if one is
not defined, all others will too. So let's reduce the noise a bit and emit one
warning in case the support for the architecture is fully missing. (Current
template was copied over from before when we added numbers for each syscall by
hand and stopped making sense when we started generating the header from a
table that is expected to have all syscall numbers.)
This patch fixes the mapping of scancode 0x120001 to the key code F20 (micmute).
The previous scancode was not correct, which left the micmute
hotkey without function when testing the mic mute.
https://github.com/systemd/systemd/pull/19316 failed with:
[1065/1670] Linking target systemd-hwdb
--- command ---
14:28:29 /root/src/test/hwdb-test.sh
--- stdout ---
./systemd-hwdb does not exist, please build first
I'm not sure what is going on here… In principle meson says that tests may be
called from any directory, but in practice it was always the build directory.
So far we were relying on systemd-hwdb being present in '.', and this worked.
Either way, it's nicer to pass the exact path, so let's do that.
This allows to limit units to machines that run on a certain firmware
type. For device tree defined machines checking against the machine's
compatible is also possible.
We were duplicating setting flags for the message and a combination of
NLM_F_APPEND and NLM_F_CREATE which does not make sense. We should have
been using NLM_F_REPLACE and NLM_F_CREATE since the kernel can
dynamically create neighbors prior to us adding an entry. Otherwise, we
can end up with cases where the message will time out after ~25s even
though the neighbor still gets added. This delays the rest of the setup
of the interface even though the error is ultimately ignored.
Commit 65224c1d0e50667a87c2c4f840c49d4918718f80 renamed ShutdownWatchdogUsec
into RebootWatchdogUsec but left a reference of ShutdownWatchdogUsec in
system.conf.
Verify that service exited correctly if valid ports are passed to
SocketBind{Allow|Deny}=
Use `ncat` program starting a listening service binding to a specified
port, e.g.
"timeout --preserve-status -sSIGTERM 1s /bin/nc -l -p ${port} -vv"
Add supported and install unit interface for socket-bind feature.
supported verifies that
- unified cgroup hierarchy (cgroup v2) is used
- BPF_FRAMEWORK (libbpf + clang + llvm + bpftool) was available in
compile time
- kernel supports BPF_PROG_TYPE_CGROUP_SOCK_ADDR
- bpf programs can be loaded into kernel
- bpf link can be used
install:
- load bpf_object from bpf skeleton
- resize rules map to fit socket_bind_allow and socket_bind deny rules
from cgroup context
- populate cgroup-bpf maps with rules
- get bpf programs from bpf skeleton
- attach programs to unit cgroup using bpf link
- save bpf link in the unit
* Add `bpf-framework` feature gate with 'auto', 'true' and 'false' choices
* Add libbpf [0] dependency
* Search for clang, llvm-strip and bpftool binaries at compile time to
generate bpf skeleton.
For libbpf [0], make 0.2.0 [1] the minimum required version.
If libbpf is satisfied, set HAVE_LIBBPF config option to 1.
If the `bpf-framework` feature gate is set to 'auto', whether the bpf
feature is enabled or not is determined by the presence of all of
libbpf, clang, llvm and bpftool in the build environment.
With 'auto' all dependencies are optional.
If the gate is set to `true`, make all of the libbpf, clang and llvm
dependencies mandatory.
If it's set to `false`, set `BPF_FRAMEWORK` to false and make libbpf
dependency optional.
The libbpf dependency is dynamic, following the common pattern in systemd.
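The three gate values can be summarized with a small decision function. This is only a sketch in Python, not the meson logic itself; the function name, the `found` mapping and the exact list of required tools are assumptions for illustration.

```python
# Tools assumed to be needed for the BPF framework (illustrative list).
REQUIRED = ("libbpf", "clang", "llvm-strip", "bpftool")


def resolve_bpf_framework(gate, found):
    """Return whether BPF_FRAMEWORK ends up enabled.

    gate  -- 'auto', 'true' or 'false'
    found -- mapping of tool name to bool (was it detected?)
    """
    missing = [name for name in REQUIRED if not found.get(name, False)]
    if gate == "false":
        return False                      # feature disabled outright
    if gate == "true":
        if missing:                       # everything is mandatory
            raise RuntimeError("missing: " + ", ".join(missing))
        return True
    if gate == "auto":
        return not missing                # enabled only if all are present
    raise ValueError("unknown gate value: %r" % (gate,))
```

With 'auto' a missing tool quietly disables the feature, while with 'true' the same situation is a hard error.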
meson, bpf: add build rule for socket_bind program
Add a build script to compile bpf source code. A program in restricted
C is compiled into an object file. Object file is converted to BPF
skeleton [0] header file.
If built with the custom meson build rule, the target header will reside in
the build/ directory (not in the source tree), e.g. the path for socket_bind:
`build/src/core/bpf/socket_bind/socket-bind.skel.h`
Script runs the phases:
* clang to generate *.o from restricted C
* llvm-strip to remove useless DWARF info
* bpf skeleton generation with bpftool
These phases are logged to stderr for debug purposes.
To include BTF debug information, -g option is passed to clang.
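The three phases could be assembled roughly as follows. This is a hedged sketch in Python, not the actual build script: the helper name, the source file name and the flags other than -g (`-target bpf`, `-O2`, `-c`, and `-g` for llvm-strip) are assumptions. The commands are only built here, not executed.

```python
from pathlib import Path


def bpf_build_commands(source, build_dir):
    """Hypothetical helper: return (commands, skeleton_path).

    Phase 1 compiles restricted C to a BPF object (with -g for BTF),
    phase 2 strips DWARF info, phase 3 generates the skeleton header.
    bpftool writes the skeleton to stdout, so the caller is expected
    to redirect the output of the last command to skeleton_path.
    """
    src = Path(source)
    obj = Path(build_dir) / (src.stem + ".o")
    skel = Path(build_dir) / (src.stem + ".skel.h")
    commands = [
        ["clang", "-target", "bpf", "-O2", "-g", "-c", str(src), "-o", str(obj)],
        ["llvm-strip", "-g", str(obj)],
        ["bpftool", "gen", "skeleton", str(obj)],
    ]
    return commands, skel
```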
[0] https://lwn.net/Articles/806911/
Introduce BPF program compiled from BPF source code in
restricted C - socket-bind.
It addresses feature request [0].
The goal is to allow systemd services to bind(2) only to a predefined set
of ports. This prevents assigning socket address with unallowed port
to a socket and creating servers listening on that port.
This complements the firewalling feature present in systemd:
whereas cgroup/{egress|ingress} hooks act on packets, they don't
protect against an untrusted service or payload hijacking an important port.
While ports in the 0-1023 range are restricted to root only, the 1024-65535
range is not protected by any means.
Performance is another aspect of socket_bind feature since per-packet
cost can be eliminated for some port-based filtering policies.
The feature is implemented with cgroup/bind{4|6} hooks [1].
In contrast to the present systemd approach using raw bpf instructions,
this program is compiled from sources. Stretch goal is to
make bpf ecosystem in systemd more friendly for developer and to clear
path for more BPF programs.
[0] https://github.com/systemd/systemd/pull/13496#issuecomment-570573085
[1] https://www.spinics.net/lists/netdev/msg489054.html
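To illustrate the kind of decision the bind hook makes, here is a small Python model of port matching. It is a sketch only: the rule representation (inclusive port ranges) and the evaluation order (allow rules first, then deny rules, otherwise permit) are assumptions made for this example and are not taken from the BPF program.

```python
def port_allowed(port, allow, deny):
    """Hypothetical model of the bind(2) decision.

    allow, deny -- lists of (first_port, last_port) inclusive ranges.
    Assumed order: a matching allow rule wins, then a matching deny
    rule rejects, and anything unmatched is permitted.
    """
    def matches(rules):
        return any(first <= port <= last for first, last in rules)

    if matches(allow):
        return True
    if matches(deny):
        return False
    return True
```

For example, allowing only port 8080 and denying everything else lets a service bind to 8080 but not to 9090.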
Specifying the test number manually is tedious and prone to errors (as
recently proven). Since we have all the necessary data to work out the
test number, let's do it automagically.
We should test that both serialization and deserialization work properly.
But the serialization/deserialization code is deeply entwined with the
manager state, and I think quite a bit of refactoring will be required before
this is possible. But let's at least add this simple test for now.
After 4b30f2e135ee84041bb597edca7225858f4ef4fb, reading stable_secret
sysctl property fails with -ENOMEM, instead of -EIO.
This is because read_full_virtual_file() uses read() as the backend while
read_one_line_file() uses fgetc(), and each function returns a different
error on failure.
Anyway, the failure is harmless here. So, the log message and comment are
updated.
Closes one of the issues in #19410.
Previously, we'd generally attempt the operation first, without any
passwords, and only query for a password if that operation then fails
and asks for one. This is done to improve compatibility with
password-less authentication schemes, such as security tokens and
similar.
This patch modifies this slightly: if a password can be acquired cheaply
via the keyring password cache, the $CREDENTIALS_PATH credential store,
or the $PASSWORD/$PIN environment variables, acquire it *before* issuing
the first request.
This should save us a pointless roundtrip, and should never hurt.
We want to use the result in a shell pipeline hence use -P mode (pipe
mode) instead of -t mode (interactive tty mode) for systemd-run.
This shouldn't change much about the test, but is slightly more correct
(and quicker).
We have to invoke the tests as superuser, and not being able to read
the journal as the invoking user is annoying. I don't think there are
any security considerations here, since the invoking user can already
put arbitrary code in the Makefile and test scripts which get executed
with root privileges.
Let's enable this in all tools that intend to write to the OS images.
It's not conditionalized for now, as there already is conditionalization
in the existence or absence of the flag in the GPT partition table (and
it's opt-in), hence it should be OK to just enable this by default for
now if the flag is set.
systemd-repart can grow partitions dynamically at boot, but it won't
grow the file systems inside them. In /etc/fstab you can request that
via x-systemd.growfs. So far we didn't have a nice scheme for images
with GPT auto-discovery however, and that meant in particular in tools
such as systemd-nspawn the file systems couldn't be grown automatically.
Let's address this: let's define a new GPT partition flag that can be
set for our partition types. If set it indicates that the file system
should be grown to the partition size on mount.
This commit adds the flag and adds code to discover it when dissecting
images. There's no code yet to actually do something about it.
Let's rename MountpointsFlags → MountPointFlags. In most of our codebase
we name things mount_point/MountPoint rather than mountpoint/Mountpoint,
do so here too.
Also, prefix the enum values with "MOUNT_". The fact the enum values
weren't prefixed was pretty unique in our codebase, and pretty
surprising. Let's fix that.
This is just refactoring, no actual change in behaviour.
The logic to query test state was rather complex. I don't quite grok the point
of ret=$((ret+1))… But afaics, the precise result was always ignored by the
caller anyway.
We would remove stuff only if successful, so repeated invocations would
trivially fail.
Also drop "-f", so that if we expect to remove something, it must be there.
oomd works way better with swap, so let's make the test less flaky by
configuring a swap device for it. This also allows us to drop the ugly
`cat`s from the load-generating script.
I found myself often looking for a quick way to determine "the local IP
address", and then being lost in the "ip addr" output trying to find the
right one to use. This is supposed to help a bit with that. Let's
introduce a new special hostname "_outbound" with semantics similar to
"_gateway" that resolves to addresses that are the closest I could come
up with that maps to "the" local IP address.
This adds a small helper, similar in style to local_addresses() and
local_gateways() that determines the local "outbound" addresses.
What's an "outbound" address supposed to be? The local IP addresses that
are the most likely used for outbound communication. It's determined
by using connect() towards the default gws on a UDP socket, and then
reading the address of the socket this caused it to be bound to.
This is not the "public" or "external" IP address of the local system,
and is not supposed to be. It's just the local IP addresses that are
likely the ones going to be used by the local IP stack for
communication with other hosts.
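The connect()-then-read-back trick can be shown in a few lines of Python. This is a sketch of the technique only, not the helper added here; the function name and the port number are arbitrary assumptions. Connecting a UDP socket sends no packets, it merely makes the kernel pick a route and a local source address.

```python
import socket


def outbound_address(gateway, family=socket.AF_INET):
    """Return the local address the IP stack would use to reach gateway.

    Hypothetical helper: connect() on a datagram socket transmits
    nothing, it only binds the socket to the source address chosen by
    the routing decision, which getsockname() then reports.
    """
    with socket.socket(family, socket.SOCK_DGRAM) as s:
        s.connect((gateway, 53))  # port is arbitrary, nothing is sent
        return s.getsockname()[0]
```

Called with the address of a default gateway, this yields the local address of the interface that routes towards it.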
Currently, if [Install] section contains WantedBy=target that doesn't exist,
systemd creates the symlinks anyway. That is just user-unfriendly.
Let's be nice and warn about installing non-existent targets.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1835351.
Replaces: #15834
The kernel will send us a PARTN= uevent property with partition add
events, let's use it instead of going for the "partition" sysfs attr.
It's less racy that way and there are reports the sysfs attr shows up
after the device, which makes it even worse.
Cover the case where a service is recovered out of reloading state via
a Restart= configuration.
Signed-off-by: Peter Morrow <pemorrow@linux.microsoft.com>
If a service is in reloading state but has exited do not delay
the final exit until the service reload timer expires. Instead allow
the service to exit immediately since we can't expect the service to
ever transition out of reloading state.
For example if a service sent RELOADING=1 but crashed before it could
send READY=1 then it should be restarted if the service had
Restart= configured.
Signed-off-by: Peter Morrow <pemorrow@linux.microsoft.com>
It's OK to specify the root dir as target directory when copying
directories. However, in that case path_extract_filename() is going to
fail, because the root dir simply has no filename.
Let's address that by moving the call further down into the loop, when
we made sure that the target dir doesn't exist yet (the root dir always
exists, hence this check is sufficient).
Moreover, in the branch for copying regular files, also move the calls
down, and generate friendly error messages in case people try to
overwrite dirs with regular files (and the root dir is just a special
case of a dir).
Altogether this makes CopyFiles=/some/place:/ work, i.e. copying some
dir on the host into the root dir of the newly created fs. Previously
this would fail with an error about the inability to extract a filename
from "/", needlessly.
When we copy files into the freshly formatted file system, the mount
point prefix must be prepended to the *target* path, not the *source*
path. Not just in code but in the log message about it, too.
Under some conditions (particularly when mobile CPUs are going to sleep),
posix_fallocate(), which is called when a new journal file is allocated,
can fail with EINTR. This is counted as a fatal error, so journald
closes both the old and the new journal and simply throws away further
incoming events, because no log files are open.
Introduce posix_fallocate_loop() that restarts the function in the case
of EINTR. Also let's make code base more uniform by returning negative
values on error.
Fix assert in test-sigbus.c that incorrectly counted positive values as
success. After changing the function return values, that will actually work.
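The retry pattern looks roughly like this. It is a Python sketch of the idea, not the C helper: the `fallocate` parameter (injectable so the loop can be exercised without a real file) and the retry cap are assumptions. Note that Python's own os.posix_fallocate() already retries on EINTR, so the loop only matters for the injected callable here.

```python
import errno
import os


def posix_fallocate_loop(fd, offset, size, fallocate=None, max_retries=64):
    """Call fallocate, restarting on EINTR; return 0 or a negative errno.

    Hypothetical sketch: errors are reported as negative errno values,
    matching the "negative values on error" convention described above.
    """
    if fallocate is None:
        fallocate = os.posix_fallocate
    for _ in range(max_retries):
        try:
            fallocate(fd, offset, size)
            return 0
        except InterruptedError:
            continue                 # EINTR: simply try again
        except OSError as e:
            return -e.errno          # any other failure is final
    return -errno.EINTR              # gave up after too many interruptions
```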
Fixes: #19041
Signed-off-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
So far all file systems were checked by instances of
systemd-fsck@.service, with the exception of the root fs which was
covered by systemd-fsck-root.service. The special handling is necessary
to deal with ordering issues: we typically want the root fs to be
checked before all others, and — weirdly — allow mounting it before the
fsck is done (for compat with initrd-less boots).
This adds similar special handling for /usr: if the hierarchy is placed
on a separate file system check it with a special
systemd-fsck-usr.service instead of a regular systemd-fsck@.service
instance. Reason is again ordering: we want to allow mounting of /usr
without the root fs already being around in the initrd, to cover for
cases where the root fs is created on first boot and thus cannot be
mounted/checked before /usr.
Some interfaces require that the DHCPOFFER message is sent via broadcast
if they can't receive unicast messages before they've been configured
with an IP address.
E.g., s390 ccwgroup network interfaces operating in layer3 mode face
this limitation. This can prevent the interfaces from receiving an
IP address via DHCP, if they have been configured for layer3.
To allow DHCP over such interfaces, we're introducing a new device
property ID_NET_DHCP_BROADCAST which can be set for those.
The networkd DHCP client will check whether this property is set
for an interface, and if so will set the broadcast flag, unless
the network configuration for the interface has an explicit
RequestBroadcast setting.
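The precedence between the two sources can be sketched as follows, in Python. The function name, the representation of the properties as a dict and the assumption that the property carries the value "1" when set are all illustrative and not taken from the networkd code.

```python
from typing import Mapping, Optional


def want_broadcast(request_broadcast: Optional[bool],
                   udev_properties: Mapping[str, str]) -> bool:
    """Decide whether the DHCP client sets the broadcast flag.

    request_broadcast -- explicit RequestBroadcast setting, or None
                         if the network configuration does not set it
    udev_properties   -- device properties (hypothetical dict form)
    """
    if request_broadcast is not None:
        return request_broadcast      # explicit configuration wins
    # Assumption: the udev rule sets the property to "1".
    return udev_properties.get("ID_NET_DHCP_BROADCAST") == "1"
```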
Besides that, we're adding a udev rule to set this device property
for ccwgroup devices operating in layer3 mode, which is the case
if the ID_NET_DRIVER property is qeth_l3.
Supersedes #18829
Similar to sd_bus_error_has_names() that was added in
2b07ec316a0e25a3e10c270c7f6baee9e0187bf8.
It is made inline in the hope that the compiler will be able to optimize
all the va_args boilerplate away, and do an efficient comparison when
the arguments are all constants.
This PR builds on Lennart Poettering's work and fixes the failure of the LineMax= functionality.
Signed-off-by: Yangyang Shen <shenyangyang4@huawei.com>
Let's exit early if we are invoked to generate an fsck unit for the
rootfs or /usr of the initrd itself. The "systemd-fsck-root.service" and
"systemd-fsck-usr.service" units are after all for the host file
systems, and the initrd file hierarchy is from an unpacked cpio anyway.
Hence, this semantically doesn't really make sense, so quickly exit if
we detect this case. This allows us to remove some checks further down
the codepath.
Previously, if DUID-UUID was used, all configuration was delayed
until networkd got the product UUID of the machine.
With this change only DHCP clients are delayed, and other settings are
configured earlier.
In man pages, horizontal space is at a premium, and everything should
generally be indented with 2 spaces to make it more likely that the
examples fit on a user's screen.
C.f. 798d3a524ea57aaf40cb53858aaa45ec702f012d.
This teaches repart to look for the root block device both as the
backing for /sysroot and for /sysusr/usr.
The latter is a new addition, and starts making more sense with the next
commit. It's about supporting systems that are shipped with only a /usr/
fs, but where a root fs is allocated and formatted on first boot via
systemd-repart (or a similar tool). In this case it's useful to be able
to mount the ultimate /usr/ early on without mounting the root fs
right-away (simply because the rootfs might not exist yet, and we need
the repart data encoded in /usr/ to actually format it). Hence, instead
of requiring that we mount /sysroot/ first and /sysroot/usr/ second as
we did so far, let's rearrange things slightly:
1. We mount the /usr/ file system we discover to /sysusr/usr/
2. We mount the root file system we discover to /sysroot/
3. Once both are established we bind mount /sysusr/usr/ to /sysroot/usr/
And that's it. The first two steps can happen in either order, and we can
access /usr/ with or without a rootfs being around.
This commit implements nothing of the above. Instead, it teaches
systemd-repart to check both /sysroot/ and /sysusr/ for repart drop-ins,
and use the first of these hierarchies it finds populated. This way
systemd-repart can be spawned once /usr is mounted and it will work
correctly without root fs having to exist, or we can invoke it when the
root fs is already mounted, where it also will work correctly.
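The "first populated hierarchy wins" lookup can be sketched like this in Python. The function name, the drop-in subdirectory and the `*.conf` suffix check are assumptions for illustration, not the repart implementation.

```python
from pathlib import Path


def pick_repart_root(candidates=("/sysroot", "/sysusr"),
                     subdir="usr/lib/repart.d"):
    """Return the first hierarchy that contains repart drop-ins.

    Hypothetical helper: a hierarchy counts as populated if its
    drop-in directory exists and holds at least one *.conf file.
    Returns None if no candidate qualifies.
    """
    for root in candidates:
        dropin_dir = Path(root) / subdir
        if dropin_dir.is_dir() and any(dropin_dir.glob("*.conf")):
            return root
    return None
```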
This changes the fstab-generator to handle mounting of /usr/ a bit
differently than before. Instead of immediately mounting the fs to
/sysroot/usr/ we'll first mount it to /sysusr/usr/ and then add a
separate bind mount that mounts it from /sysusr/usr/ to /sysroot/usr/.
This way we can access /usr independently of the root fs via the
/sysusr/ hierarchy, without waiting for the root fs to be mounted. This is useful for
invoking systemd-repart while a root fs doesn't exist yet and for
creating it, with partition data read from the /usr/ hierarchy.
This introduces a new generic target initrd-usr-fs.target that may be
used to generically order services against /sysusr/ to become available.
This tries to shorten the race of device reuse a bit more: let's ignore
udev database entries that are older than the time where we started to
use a loopback device.
This doesn't fix the whole loopback device raciness mess, but it makes
the race window a bit shorter.
This is similar to the preceding work to store the uevent seqnum, but
this stores the CLOCK_MONOTONIC timestamp.
Why? This allows validating udev database entries, to determine if they
were created *after* we attached the device.
The uevent seqnum logic allows us to validate uevents, and the timestamp
database entries, hence together we should be able to validate both
sources of truth for us.
(note that this is all racy, just a bit less racy, since we cannot
atomically attach loopback devices and get the timestamp for it, the
same way we can't get the uevent seqnum. Thus it shortens the race
window, but doesn't close it).
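The two checks amount to simple comparisons against values recorded around the attach. A Python sketch, with hypothetical function and parameter names; the exact boundary handling (strictly greater for seqnums, greater-or-equal for timestamps) is an assumption.

```python
def uevent_is_ours(uevent_seqnum, seqnum_before_attach):
    """True if the uevent was generated after we started attaching.

    seqnum_before_attach is the uevent sequence number sampled just
    before the loopback device was set up (hypothetical name).
    """
    return uevent_seqnum > seqnum_before_attach


def db_entry_is_current(entry_usec, attach_usec):
    """True if a udev database entry is not older than our attach.

    Both values are CLOCK_MONOTONIC timestamps in microseconds.
    """
    return entry_usec >= attach_usec
```

Neither check is atomic with the attach itself, so both only narrow the window.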
We already store a CLOCK_MONOTONIC timestamp for each device appearance,
let's make this queryable.
This is useful to determine whether a udev device database entry is from
a current appearance of the device or a previous one, by comparing it
with appropriately taken timestamps.
Let's drop all monitor uevents that were enqueued before we actually
started setting up the device.
This doesn't fix the race, but it makes the race window smaller: since
we cannot determine the uevent seqnum and the loopback attachment
atomically, there's a tiny window where uevents might be generated by
the device which we mistake for being associated with our use of the
loopback device.
Later, this will allow us to ignore uevents from earlier attachments a
bit better, as we can compare uevent seqnums with this boundary. It's
not a full fix for the race though, since we cannot atomically determine
the uevent and attach the device, but it at least shortens the window a
bit.
Previously, loop_device_make() would return the device fd in one success
code path, but not the other (where we'd just return 0).
loop_device_open() returns it in all cases.
Hence, let's clean this up, and make sure in all success code paths of
both functions we return it (even though it strictly speaking is
redundant, since we return it in LoopDevice anyway, and currently no one
actually relies on this).
`man systemd.service` says:
Defaults to the setting DefaultOOMPolicy= in systemd-system.conf(5) is set to
but there is no such line in this config.
This is the default value I extracted from
systemctl show --property=DefaultOOMPolicy
Even if we set up a loopback device read-only and mount it read-only
this means nothing, ext4 will still write through to the backing storage
file.
Yes, I lost 6h debugging time on this.
Apparently, we have to specify "norecovery" when mounting such file
systems, to force them into truly read-only mode. Let's do so.
Let's make the GPT partition flags configurable when creating new
partitions. This is primarily useful for the read-only flag (which we
want to set for verity enabled partitions).
This adds two settings for this: Flags= and ReadOnly=, which strictly
speaking are redundant. The main reason to have both is that usually the
ReadOnly= setting is the one one wants to control, and it's more generic.
Moreover we might later on introduce inheriting of flags from CopyBlocks=
partitions, where one might want to control most flags as is except for
the RO flag and similar, hence let's keep them separate.
When using systemd-repart as an installer that replicates the install
medium on another medium it is useful to reference the root
partition/usr partition or verity data that is currently booted, in
particular in A/B scenarios where we have two copies and want to
reference the one we currently use. Let's add a CopyBlocks=auto for this
case: for a partition that uses that we'll copy a suitable partition
from the host.
CopyBlocks=auto finds the partition to copy like this: based on the
configured partition type uuid we determine the usual mount point (i.e.
for the /usr partition type we determine /usr/, and so on). We then
figure out the block device behind that path, through dm-verity and
dm-crypt if necessary. Finally, we compare the partition type uuid of
the partition found that way with the one we are supposed to fill and
only use it if it matches (the latter is primarily important on
dm-verity setups where a volume is likely backed by two partitions and
we need to find the right one).
This is particularly fun to use in conjunction with --image= (where
we'll restrict the device search to the specified device, for security
reasons), as this allows "duplicating" an image like this:
# systemd-repart --image=source.raw --empty=create --size=auto target.raw
If the right repart data is embedded into "source.raw" this will be able
to create and initialize a partition table on target.raw that carries
all needed partitions, and will stream the source's file systems onto it
as configured.
So far we already had the CopyFiles= option in systemd-repart drop-in
files, as a mechanism for populating freshly formatted file systems with
files and directories. This adds MakeDirectories= in similar style, and
creates simple directories as listed. The option is of course entirely
redundant, since the same can be done with CopyFiles= simply by copying
in a directory. It's kinda nice to encode the dirs to create directly in
the drop-in files however, instead of providing a directory subtree to
copy in somewhere, to make the files more self-contained, since often
just creating dirs is entirely sufficient.
The main usecase for this are GPT OS images that carry only a /usr/
tree, and for which a root file system is only formatted on first boot
via repart. Without any additional CopyFiles=/MakeDirectories=
configuration these root file systems are entirely empty of course
initially. To mount in the /usr/ tree, a directory inode for /usr/ to
mount over needs to be created. systemd-nspawn will do so automatically
when booting up the image, as will the initrd during boot. However, this
requires the image to be writable – which is OK for nspawn and
initrd-based boots, but there are plenty tools where read-only operation
is desirable after repart ran, before the image was booted for the first
time. Specifically, "systemd-dissect" opens the image in read-only to
inspect its contents, and this will only work if /usr/ can be properly
mounted. Moreover systemd-dissect --mount --read-only won't succeed
either if the fs is read-only.
Via MakeDirectories= we now provide a way that ensures that the image
can be mounted/inspected in a fully read-only way immediately after
systemd-repart completed. Specifically, let's consider a GPT disk image
shipping with a file usr/lib/repart.d/50-root.conf:
[Partition]
Type=root
Format=btrfs
MakeDirectories=/usr
MakeDirectories=/efi
With this in place systemd-repart will create a root partition when run,
and add /usr and /efi into it as directory inodes. This ensures that the
whole image can then be mounted truly read-only and /usr and /efi can be
overmounted by the /usr partition and the ESP.
libfdisk appears to return NULL when encountering an empty partition
label, let's handle this sanely, and treat NULL and "" for the current
label as the same, but for the new label as distinct: there NULL means
nothing is set, and "" means an actual empty label.
This is similar to the --image= switch in the other tools, like
systemd-sysusers or systemd-tmpfiles, i.e. it applies the configuration
from the image to the image.
This is particularly useful for downloading a minimized GPT image, and
then extending it to the desired size via:
# systemd-repart --image=foo.image --size=5G
Let's have one flag to request that when dissecting an image the
loopback device is made read-only, and another one to request that when
it is mounted to make it read-only. Previously both concepts were always
done read-only together.
(Of course, making the loopback device read-only but mounting it
read-write doesn't make too much sense, but the kernel should catch that
for us, no need to make restrictions from our side there)
Use-case for this: in systemd-repart we'd like to operate on images for
adding partitions. Thus we'd like to have the loopback device writable,
but if we read repart.d/ snippets from it, we want to do that read-only.
Add the Sierra Wireless EM7345-LTE modem to the list of USB devices which
can safely autosuspend. This helps the processor reach deeper PC# states
when idle.
This was tested on a ThinkPad8 tablet with such a modem builtin.
systemd-networkd.socket can re-start systemd-networkd.service in
shutdown and by doing this even stop shutdown.target leaving the
system in halfway-down state.
Fixes #4955.
This code was partially broken, since the firmware directory was
undefined. Also, some of the parts were dead code, since they relied
on code from the original dracut test suite.
`command -v <bin> | grep ...` can under certain conditions cause the
`command` to exit with SIGPIPE, which in combination with `set -o
pipefail` means that the tests sometimes randomly die during setup.
Let's avoid using pipes in such cases.
This breaks some existing loops which previously ignored if the piped
program exited with EC >0. Rewrite them to mitigate this (and also make
them more robust in some cases).
This fixes maybe-uninitialized warning:
```
../src/basic/fileio.c: In function ‘chase_symlinks_and_fopen_unlocked’:
../src/basic/fileio.c:1026:19: warning: ‘f’ may be used uninitialized in this function [-Wmaybe-uninitialized]
1026 | *ret_file = f;
| ~~~~~~~~~~^~~
```
Try to make this more manageable by reordering:
- dependencies / inputs
(with subcategory of compression libraries)
- major components / outputs
- optional features / conditionals that don't fit into the two above categories
The division isn't well defined, because libraries often correspond one-to-one
to features, but not always.
The test appears to be occasionally failing. It uses systemd-run to echo
'hello world' into a namespaced journal and then uses journalctl to look for it,
but it doesn't wait.
In the failed runs it can't find it, but the automated journal dump shows
the message at the end.
Use --wait to avoid races.
We were writing to the wrong buffer with a wrong offset :(
Bug present since the original introduction of the code in
04b28be1a306fd2ba454d3ee333d63df71aa3873.
As @yuwata correctly points out, this became broken when log_debug()
started returning -EIO. I wanted to preserve this pattern, but it turns
out it is not very widely used, and preserving it would make the whole
thing, already quite complicated, even more complex.
log_debug() is made like log_info() and friends, and returns void.
Let's assert if we ever happen to pass 0 to one of the log functions.
With the preceding commit to return -EIO from log_*(), passing 0 wouldn't
affect the return value any more, but it is still most likely an error.
The unit test code is an exception: we fairly often pass the return value
to print it, before checking what it is. So let's assert that we're not
passing 0 in non-test code. As with the previous check for %m, this is only
done in developer mode. We are depending on external code setting
errno correctly for us, which might not always be true, and which we can't
test, so we shouldn't assert, but just handle this gracefully.
I did a bunch of greps to try to figure out if there are any places where
we're passing 0 on purpose, and couldn't find any.
The one place that failed in tests is adjusted.
About "zerook" in the name: I wanted the suffix to be unambiguous. It's a
single "word" because each of the words in log_full_errno is also meaningful,
and having one term use two words would be confusing.
This is only done in developer mode. It is a pretty rare occurrence that we
make this kind of mistake. And even if it happens, the result is just a misleading
error message. So let's only do the check in non-release builds.
This silences some warnings where gcc thinks that some variables are
uninitialized. One particular case:
../src/journal/journald-server.c: In function 'cache_space_refresh':
../src/journal/journald-server.c:136:28: error: 'vfs_avail' may be used uninitialized in this function [-Werror=maybe-uninitialized]
136 | uint64_t vfs_used, vfs_avail, avail;
| ^~~~~~~~~
../src/journal/journald-server.c:136:18: error: 'vfs_used' may be used uninitialized in this function [-Werror=maybe-uninitialized]
136 | uint64_t vfs_used, vfs_avail, avail;
| ^~~~~~~~
cc1: all warnings being treated as errors
which is caused by
d = opendir(path);
if (!d)
return log_full_errno(errno == ENOENT ? LOG_DEBUG : LOG_ERR,
errno, "Failed to open %s: %m", path);
if (fstatvfs(dirfd(d), &ss) < 0)
return log_error_errno(errno, "Failed to fstatvfs(%s): %m", path);
For some reason on aarch64 gcc thinks we might return non-negative here. In
principle errno must be set in both cases, but it's hard to say for certain.
So let's make sure that our code flow is correct, even if somebody forgot to
set the global variable somewhere.
Using an enum is all nice and generic, but at this point it seems unlikely that
we'll add further build modes. But having an enum means that we need to include
the header file with the enumeration whenever the conditional is used. I want
to use the conditional in log.h, which makes it hard to avoid circular imports.
With some versions of the compiler, the _cleanup_ attr makes it think
the variable might be freed/closed when uninitialized, even though it
cannot happen. The added cost is small enough to be worth the benefit,
and optimized builds will help reduce it even further.
We intentionally do not inline initializations with definitions for
a bunch of _cleanup_ variables in tests, to ensure valgrind is triggered.
This triggers a lot of maybe-uninitialized false positives when -O2 and
-flto are used. Suppress them.
Commit 0b81225e5791f660506f7db0ab88078cf296b771 makes networkd
remove all foreign rules except those with "proto kernel".
But in some situations people may want to manage routing policy rules
with other tools, e.g. the 'ip' command. To support such situations,
this introduces the ManageForeignRoutingPolicyRules= boolean setting.
Closes #19106.
Otherwise rebase-helper thinks we are a dist-git repository,
replacing the generated git archive with PR changes with the tarball
found in the 'sources' file.
There are two ambiguities in the original description:
1. It will delay all RUN instructions, including builtins.
2. It will delay before running RUN, not each of the RUN{program} instructions.
The test occasionally fails as the umount is not yet completed when
cryptsetup close is invoked.
Both cryptsetup and losetup have supported deferred cleanup for some
time now, so use it instead to avoid races.
++ losetup -P --show --find /tmp/test-repart.dMOfYQ8UUF/zzz
+ LOOP=/dev/loop6
+ VOLUME=test-repart-11882
+ touch /tmp/test-repart.dMOfYQ8UUF/empty-password
+ cryptsetup open --type=luks2 --key-file=/tmp/test-repart.dMOfYQ8UUF/empty*** test-repart-11882
+ mkdir /tmp/test-repart.dMOfYQ8UUF/mount
+ mount -t ext4 /dev/mapper/test-repart-11882 /tmp/test-repart.dMOfYQ8UUF/mount
+ diff -r /tmp/test-repart.dMOfYQ8UUF/mount/def /tmp/test-repart.dMOfYQ8UUF/definitions
+ umount /tmp/test-repart.dMOfYQ8UUF/mount
+ cryptsetup close test-repart-11882
Device test-repart-11882 is still in use.
+ rm -rf /tmp/test-repart.dMOfYQ8UUF
There are tokens with dots (and other symbols) in PKCS11 URI:
pkcs11:model=Rutoken%20ECP;manufacturer=Aktiv%20Co.;serial=3xxxxxxb;token=livelace
pkcs11:model=PRO;manufacturer=Aladdin%20R.D.;serial=CC62FB25;token=val%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00;id=%33%32%31%30%33%61%36%37%36%65%32%34%35%62%32%31;type=private
- Handle BPFProgram= property in string format
"<bpf_attach_type>:<bpffs_path>", e.g. egress:/sys/fs/bpf/egress-hook.
- Add dbus getter to list foreign bpf programs attached to a cgroup.
- Pin trivial bpf programs to bpf filesystem, compose BPFProgram= option
string and pass it to a unit. Programs store `0` in r0 BPF register for
denying action, e.g. drop a packet.
- Load trivial BPF programs
- Test is skipped if not run as root or if it cannot lock enough
memory.
- For egress and ingress hooks, test the BPFProgram= option along
with IP{Egress|Ingress}FilterPath=, expected result should not depend on
which rule is executed first.
Expected results for BPF_CGROUP_INET_INGRESS:
5 packets transmitted, 0 received, 100% packet loss, time 89ms
For BPF_CGROUP_INET_SOCK_CREATE:
ping: socket: Operation not permitted
- Introduce support of cgroup-bpf programs managed (i.e. compiled,
loaded to and unloaded from kernel) externally. Systemd is only
responsible for attaching programs to unit cgroup hence the name
'foreign'.
Foreign BPF programs are identified by bpf program ID and attach type.
systemd:
- Gets kernel FD of BPF program;
- Makes a unique identifier of BPF program from BPF attach type and
program ID. Same program IDs mean the same program, i.e. the same
chunk of kernel memory. Even if the same program is passed multiple
times, identical (program_id, attach_type) instances are collapsed
into one;
- Attaches programs to unit cgroup.
- Store foreign bpf programs in cgroup context. A program is considered
foreign if it was loaded to a kernel by an entity external to systemd,
so systemd is responsible only for attach and detach paths.
- Support the case of pinned bpf programs: pinning to bpffs so a program
is kept loaded to the kernel even when program fd is closed by a user
application is a common way to extend program's lifetime.
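As an illustration only, the collapsing of identical (program_id, attach_type) instances described above could be sketched in Python as follows. The ForeignProgram type and its field names are hypothetical and are not taken from the systemd sources.

```python
from collections import namedtuple

# Hypothetical stand-in for a foreign program reference; the field names
# are assumptions made for this sketch only.
ForeignProgram = namedtuple("ForeignProgram", ["program_id", "attach_type", "fd"])


def collapse_foreign_programs(programs):
    """Keep the first instance of every (program_id, attach_type) pair."""
    unique = {}
    for prog in programs:
        key = (prog.program_id, prog.attach_type)
        # setdefault() leaves an already stored instance untouched
        unique.setdefault(key, prog)
    return list(unique.values())


programs = [
    ForeignProgram(7, "egress", 10),
    ForeignProgram(7, "egress", 11),   # same program, same attach type
    ForeignProgram(7, "ingress", 12),  # same program, different attach type
]
collapsed = collapse_foreign_programs(programs)
```

Only the key is compared, so two different FDs referring to the same program ID and attach type end up as a single entry.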
- Add linked list node struct with attach type and bpffs path
fields.
Introduce bpf_cgroup_attach_type_table with the customary attach type
names, as also used in bpftool.
Add bpf_cgroup_attach_type_{from|to}_string helpers to convert from|to
string representation of pinned bpf program, e.g.
"egress:/sys/fs/bpf/egress-hook" for
/sys/fs/bpf/egress-hook path and BPF_CGROUP_INET_EGRESS attach type.
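A rough Python sketch of such a from|to string conversion is shown below. It is not the C implementation: the table holds only three attach types, the short names are assumed to follow the bpftool spelling, and the function names are made up for this example.

```python
# Subset of attach types; the short names are assumed to match bpftool.
ATTACH_TYPE_BY_NAME = {
    "ingress": "BPF_CGROUP_INET_INGRESS",
    "egress": "BPF_CGROUP_INET_EGRESS",
    "sock_create": "BPF_CGROUP_INET_SOCK_CREATE",
}


def parse_bpf_program(value):
    """Split '<bpf_attach_type>:<bpffs_path>' into (attach type, path)."""
    name, sep, path = value.partition(":")
    if not sep or name not in ATTACH_TYPE_BY_NAME:
        raise ValueError("unknown attach type in %r" % value)
    if not path.startswith("/"):
        raise ValueError("expected an absolute path in %r" % value)
    return ATTACH_TYPE_BY_NAME[name], path


def format_bpf_program(attach_type, path):
    """Inverse of parse_bpf_program()."""
    for name, known_type in ATTACH_TYPE_BY_NAME.items():
        if known_type == attach_type:
            return "%s:%s" % (name, path)
    raise ValueError("unknown attach type %r" % attach_type)


parsed = parse_bpf_program("egress:/sys/fs/bpf/egress-hook")
```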
Add helpers to:
- Create new BPFProgram instance from a path in bpf
filesystem and bpf attach type;
- Pin a program to bpf fs;
- Get BPF program ID by BPF program FD.
Reduced version of [0].
Use BPF_F_ALLOW_MULTI attach flag for bpf-firewall if kernel supports
it.
Aside from addressing security issue in [0] attaching with 'multi'
allows further attaching of cgroup egress, ingress hooks specified by
BPFProgram=.
[0] 4e42210d40
lstat() returns the error in errno, not as return value. Let's propagate
this correctly.
This broke the bolt test suite, as @gicmo discovered.
Follow-up for acfc2a1d15560084e077ffb3be472cd117e9020a.
Let's define both an enum and a typedef named SpecialGlyph, the way we
usually do it.
Also, introduce an "invalid" special glyph, assigned to -EINVAL, also
like we always do it. (And handle it somewhat sanely in special_glyph().)
'! grep -v' does *not* test that there are no matching lines.
Instead, it checks whether there are any non-matching lines.
And of course, for the test to fail, '! grep' cannot be part of
an expression with &&.
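The difference can be modelled in a few lines of Python. This is only a model of grep's exit status (0 if at least one line is selected, 1 otherwise), not a call to grep itself, and the helper names are invented for the sketch.

```python
import re


def grep_status(pattern, lines, invert=False):
    """Model of grep's exit status: 0 if any line is selected, else 1.

    With invert=True the selection is inverted, as with 'grep -v'.
    """
    selected = [l for l in lines if bool(re.search(pattern, l)) != invert]
    return 0 if selected else 1


def negate(status):
    """Model of the shell's '!' operator."""
    return 0 if status != 0 else 1


lines = ["unrelated output"]  # the pattern 'hello' does not occur at all

# '! grep -v hello': grep -v selects the non-matching line and exits 0,
# so the negated check fails although no line matches.
wrong_check = negate(grep_status("hello", lines, invert=True))

# '! grep hello': grep selects nothing and exits 1, so the negated
# check succeeds, which is what "no matching lines" should mean.
right_check = negate(grep_status("hello", lines))
```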
We were grepping for 'hello world', and in the namespace we would
match on 'hello world', and outside, on 'echo "hello world"'. When
the condition check was fixed, the test gave a false positive.
We were invoking 'systemd-run bash', but the test invoked by bash
was not effective. When the result of that check is propagated, the
outer command fails.
create_fifo() was added in a2fc2f8dd30c17ad1e23a31fc6ff2aeba4c6fa27, and
would always ignore failure. The test was trying to fail in this case, but
we actually don't fail, which seems to be correct. We didn't notice before
because the test was ineffective.
To make things consistent, generally log at warning level, but don't propagate
the error. For symlinks, log at debug level, as before.
For 'e', failure is not propagated now. The test is adjusted to match.
I think warning is appropriate in most cases: we do not expect a device node to
be replaced by a different device node or even a non-device file. This would
most likely be an error somewhere. An exception is made for symlinks, which are
mismatched on purpose, for example /etc/resolv.conf. With this patch, we don't
get any warnings with any of the 74 tmpfiles.d files, which suggests that
increasing the warning levels will not cause too many unexpected warnings. If
it turns out that there are valid cases where people have expected mismatches
for non-symlink types, we can always decrease the log levels again.
Quoting of values differs between distros: Fedora doesn't quote the ID_
fields, but CentOS does.
Adjust the test checks to account for this.
Fixes #19242
Also add "system" in the messages, because we set the internal value,
and are just skipping the setting of the external value, so the message
could be confusing without that clarification.
We didn't document this behaviour one way or another, so I think it's
OK to change. All callers do the NULL check before calling this to avoid
the assert warning, so it seems reasonable to do it internally.
sd_bus_can_send() is similar, but there we expressly say that an
error is returned on NULL, so I didn't change it.
Also use standard error logging/return pattern.
Only cursory tested, by checking that with a simple config file
the array is the same before/after. Not tested with actual scsi
rules and devices, due to missing hardware.
Some static analyzers (lgtm) warn against using non-re-entrant functions.
Even though at the moment this code is not multi-threaded, just switch to
format_timestamp.
The warning was disabled in 8794164fed5f0142c34358613f92f4f761af4edd to avoid
false positives. But it is useful in finding errors, even if it sometimes
results in untrue warnings (c.f. 77fac974fe, da46a1bc3c).
After #19168, #19169, and #19175, there are no warnings with
-Dbuildtype=debug-optimized/-O2 and gcc-11.0.1-0.3.fc34.x86_64. Warnings
are re-enabled for -O[23].
-O0 is good for development, and -O2 is the default optimization level for
Fedora package builds. -Os, -O3, -O1, and -Og still generate some warnings. In
fact, with -Os the number of warnings seems completely hopeless. Dozens and
dozens.
gcc-11.0.1-0.3.fc34.x86_64 with -Og was complaining that 'r' might be
uninitialized. It cannot, but let's rework the code to use a goto instead of
conditionalizing on 'call' being unset, which I think is clearer and less error
prone. This silences the warning.
gcc-11.0.1-0.3.fc34.x86_64 was complaining that n might be unset with
--optimization=1. It was wrong, but let's rework the code to make it
obvious that it is always set.
"! test ..." does not cause the script to fail, even with set -e.
IIUC, bash treats this command as part of an expression line, as it
would if 'test ... && ...' was used. Failing expression lines do not
terminate the script.
This fixes the obvious cases by changing '! test' → 'test !'.
Then the inversion happens internally in test and bash will propagate
the failure.
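The behaviour can be reproduced with a short Python snippet that runs both variants under "set -e". It assumes a POSIX "sh" is available on $PATH and that it treats "!" the same way as bash does here; "test -e /" is used because it always succeeds, so both inverted checks are meant to fail.

```python
import subprocess


def run_sh(script):
    """Run a script with 'sh -c' and return (exit status, stdout)."""
    proc = subprocess.run(["sh", "-c", script], capture_output=True, text=True)
    return proc.returncode, proc.stdout


# Inversion done by the shell: the failure is ignored by 'set -e',
# and the script carries on to the echo.
inverted_by_shell = run_sh("set -e; ! test -e /; echo reached")

# Inversion done inside test: the command itself fails, and 'set -e'
# terminates the script before the echo.
inverted_by_test = run_sh("set -e; test ! -e /; echo reached")
```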
This also makes the function id be parsed as uint64_t. The kernel internally
uses uint32_t for the function id (see the definition of 'struct zpci_dev'),
but it may be extended in the future.
On Fedora /usr/bin/ld is a symlink managed via the "alternatives"
system. This unfortunately means the binary is not usable in
environments where /var or /etc are unpopulated. Let's address this by
redirecting "ld" to "ld.bfd" manually if such an environment is
detected, via $PATH.
This is useful for building systemd in mkosi with UsrOnly=1 set.
gcc 9.3.0 "cc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0" with --optimization=1 was
not able to figure out that all cases are covered because r is either set in
the switch or type < _TABLE_DATA_TYPE_MAX.
But for a human reader this might also not be obvious: the cases are not in
exactly the same order as enum definitions, and it's a long list. By using the
goto, there should be no doubt, and we avoid checking the condition a second
time.
So far when parsing /proc/cmdline we'd consider backslashes as
mechanisms for escaping whitespace or quotes. This changes things so that
they are retained as they are instead. The kernel itself doesn't allow such
escaping, and hence we shouldn't do so either (see lib/cmdline.c in the
kernel sources; it does support "" quotes btw).
This fix is useful to allow specifying backslash escapes in the "root="
cmdline option to be passed through to systemd-fstab-generator. Example:
root=/dev/disk/by-partlabel/Root\x20Partition
Previously we'd eat up the "\" so that we'd then look for a device
/dev/disk/by-partlabel/Rootx20Partition which never shows up.
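A simplified Python model of the new behaviour is sketched below. It is neither the systemd nor the kernel parser: it only splits on whitespace, treats "" quotes as grouping, and copies backslashes through unchanged. The function name is made up for the example.

```python
def split_cmdline(line):
    """Split a kernel command line into words, keeping backslashes as-is."""
    words, current = [], []
    in_quotes = have_word = False
    for ch in line:
        if ch == '"':
            # double quotes group whitespace, and are dropped from the word
            in_quotes = not in_quotes
            have_word = True
        elif ch.isspace() and not in_quotes:
            if have_word:
                words.append("".join(current))
            current, have_word = [], False
        else:
            # a backslash is an ordinary character here
            current.append(ch)
            have_word = True
    if have_word:
        words.append("".join(current))
    return words


words = split_cmdline(r"ro root=/dev/disk/by-partlabel/Root\x20Partition quiet")
```

With this model the root= word still contains the literal "\x20" sequence, so it can be passed on for later unescaping.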
In b9c19bc384fd41c173a8e453bd157544400af059, I added an assert to _setfv() and
_setf(), but I forgot to do the same in _set(). Let's do this for completeness.
While at it, restructure _set() to use the same style as _setfv().
This should make it easier to remove those warnings when the compiler
gets smarter. Not sure if I got them all...
Double space before the comment start to make it easier to separate from the
preceding line.
Avid Adrenaline and Mojo have a configuration ROM in which a single unit exists
in the root directory; however, the unit has both video and audio functions.
For this case, it's better to distinguish it from the case of a composite node.
This commit adds database entries for them.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
In IEEE 1394 bus, one node can include multiple units, which represent
certain functions such as video and audio. Although it's possible to
distinguish each unit, Linux FireWire character device corresponding to
the node can not have multiple group owners, therefore it's forced to
select one of the units as representative for function.
This commit adds database entries for units belonging to the same node.
The entries are aligned to inverse order of corresponding unit order
in configuration ROM to select the first unit as the representative.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Cool Stream shipped iSweet. This model has single unit for video function.
This commit adds database entry for it as sample of node with single unit
for video.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
TC Electronic had PowerCore platform for products of digital audio signal
processing. This platform consists of NXP PowerQUICC II Processor with PCI
interface (XPC8245, MPC8245), Xilinx Spartan-II FPGA (XC2S50), and some
NXP 24-Bit Audio Digital Signal Processor (DSP56367). The products for
IEEE 1394 bus have an additional TI OHCI 1.1, 1394a link layer controller
(TSB43AB23).
The content of configuration ROM has layout of standard of 1394 Trading
Association.
This commit adds database entries for the models. At present, no driver is
developed, thus this is just for convenience to developers.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Sintefex Audio Lda. designed Liquid Mix as OEM of Focusrite Audio
Engineering, Ltd. The models serve digital signal processing service via
asynchronous transaction in IEEE 1394 bus.
The content of configuration ROM is not standard of 1394 Trading
Association.
This commit adds a rule entry for the models. At present, no driver is
developed, thus this is just for convenience to developers.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
TC Applied Technologies designed DiceII ASIC to adapt to two protocols.
One of the protocols is mLAN, defined by Yamaha Corporation, and the other
is its own protocol. The DiceII ASIC adapted to the mLAN protocol was used in some
products by Yamaha and its child company, Steinberg.
The content of configuration ROM for the models has completely different
layout from the one defined by 1394 Trading Association.
This commit adds a udev rule for the models. At present, no driver is
developed, thus this is just for convenience to developers.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Yamaha Corporation designed mLAN protocol based on IEEE 1394
specification. Yamaha developed specific ICs for the purpose (mLAN-NC1
and mLAN-PH2), and shipped some products with them, as well as OEM.
The content of configuration ROM is completely different from standard
layout defined by 1394 Trading Association.
This commit adds database entries for the models. At present, two vendors
are known for models with mLAN IC. At present, no driver is developed
for the models, thus this is just for convenience to developers.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
RME GmbH shipped Fireface series. The configuration ROM in the models of
the series has some quirks and goes against the standard of the 1394 Trading Association.
This commit adds database entries for the models. ALSA fireface driver
supports them, but expects userspace application to control them.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Mark of the Unicorn (MOTU) shipped FireWire series. The configuration ROM
in the models of the series has some quirks and goes against the standard of
the 1394 Trading Association.
This commit adds database entries for the models. ALSA firewire-motu driver
supports them, but expects userspace application to control them.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
TEAC Corporation shipped FireWire series in its TASCAM brand. The
configuration ROM in the models of the series has some quirks and goes against
the standard of the 1394 Trading Association.
This commit adds database entries for the models. ALSA firewire-tascam
driver supports them, but expects userspace application to control them.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Avid Audio shipped Digi 00x family in its Digidesign brand. The
configuration ROM in the models of the family has some quirks and goes against
the standard of the 1394 Trading Association.
This commit adds database entries for the models. ALSA firewire-digi00x
driver supports them, but expects userspace application to control them.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Solid State Logic, Ltd. shipped some models based on DICE ASICs. The
content of configuration ROM has a quirk that the value of category
field is unique (0x51 or 0x52).
This commit adds database entries for the models. ALSA dice driver supports
them, but expects userspace application to control them.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Harman International Industries, Inc. shipped some models based on DICE
ASICs in its Lexicon brand. The content of configuration ROM has a quirk
that the value of category field is unique (0x20).
This commit adds database entries for the models. ALSA dice driver supports
them, but expects userspace application to control them.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
LOUD Audio, LLC (formerly known as LOUD Technologies, Inc.) shipped some
models based on DICE ASICs in its Mackie brand. The content of
configuration ROM has a quirk that the value of category field is unique
(0x10).
This commit adds database entries for the models. ALSA dice driver supports
them, but expects userspace application to control them.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Weiss Engineering Ltd. shipped some models based on DICE ASICs. The
content of configuration ROM has a quirk that the value of category
field is unique (0x00).
This commit adds database entries for the models. ALSA dice driver supports
them, but expects userspace application to control them.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
M-Audio shipped some models based on DICE ASICs. The content of
configuration ROM has a quirk that the value of version field in unit
directory is different from the one in TCAT specification (0x000001).
This commit adds database entries for the models. ALSA dice driver supports
them, but expects userspace application to control them.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
TC Applied Technologies designed the series of ASIC for audio and music
data transmission in several types of communication bus. It's named as
Digital Interface Communication Engine (DICE). Four ASICs are known in
the series for IEEE 1394 bus; Dice II, TCD2210 (Dice Jr.), TCD2220 (Dice
Mini), and TCD3070 (DiceIII).
The content of configuration ROM in products based on DICE ASICs is
known to go against the specification defined by 1394 Trading Association.
This commit adds database entries for models without any customization by
vendors. In TCAT specification, the value of GUID field is split into four
parts; 24-bit OUI, 8-bit category, 10-bit product ID, and 22-bit serial
number in the order. In the specification, the value of category field is
fixed to 0x04. The root directory includes leaf entries for vendor and
model names. Although the specifier_id field in unit directory differs
depending on vendors, the version field in unit directory is fixed to
0x000001. ALSA dice driver supports them, but expects userspace
application to control them.
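For illustration, the GUID layout described above (24-bit OUI, 8-bit category, 10-bit product ID, 22-bit serial number, from most to least significant bits) can be unpacked with plain bit operations. The sample values below are invented and do not correspond to any real device.

```python
def split_tcat_guid(guid):
    """Split a 64-bit GUID into the four fields of the TCAT layout."""
    return {
        "oui": (guid >> 40) & 0xFFFFFF,      # upper 24 bits
        "category": (guid >> 32) & 0xFF,     # next 8 bits
        "product_id": (guid >> 22) & 0x3FF,  # next 10 bits
        "serial": guid & 0x3FFFFF,           # lower 22 bits
    }


# Made-up sample: OUI 0x123456, category 0x04, product ID 0x21, serial 0x1234
sample_guid = (0x123456 << 40) | (0x04 << 32) | (0x21 << 22) | 0x1234
fields = split_tcat_guid(sample_guid)
```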
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Once Oxford Semiconductor designed FW970 and FW971 ASICs as Multi-Channel
Isochronous Streaming FireWire Audio Controller. Some vendors used them
in their products for audio and music units.
The content of configuration ROM has standard layout of 1394 Trading
Association with an additional Dependent Information directory.
This commit adds database entries for the known models. ALSA oxfw
driver supports them, but expects userspace application to control them.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Echo Audio Corporation designed Fireworks board module. The module is used
by several vendors for models.
The content of configuration ROM in the models has some quirks and goes against
the standard of 1394 Trading Association.
This commit adds database entries for the models. ALSA fireworks driver
supports them but expects userspace application to control them.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
ArchWave AG, formerly known as BridgeCo. AG, designed DM1000, DM1100, and
DM1500 ASICs for BridgeCo. Enhancement BreakOut Box (BeBoB) solution.
They were used for many models shipped by many vendors.
The content of configuration ROM has standard layout of 1394 Trading
Association with an additional Dependent Information directory.
This commit adds database entries for the known models. ALSA bebob
driver supports them, but expects userspace application to control them.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Although in IEEE 1394 unit function list I have a plan to use slash sign
in name of property, current implementation of parser doesn't allow it.
When parsing current entries in database excluded from parser testing, we
can find usage of slash sign in name of property.
This commit adds slash sign in allow list of the parser for my
convenience.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
In the added IEEE 1394 unit function list, I use a custom key to detect unit
entries in node context. Although the list is not widely used by most
systemd users, I would like to add parser grammar for testing, by
borrowing a bit of time in builders.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Current udev rules configure the group owner of the firewire character device
to the video group, for nodes in IEEE 1394 in the cases below:
1. the node with any unit for any minor version of IIDC version 1
specification defined by 1394 Trading Association
2. the node with any unit for specification defined by Point Grey Research
3. the node with any unit for AV/C device v1.0 defined by 1394 Trading
Association
4. the node with any unit for vendor-unique protocol defined by 1394
Trading Association
Nevertheless, cases 3 and 4 can cover the node with any unit for audio
function as well. In those cases, it's convenient to assign the audio group.
Additionally, some nodes are known to have layout different from
the specification defined by 1394 Trading Association. In the case,
it's required to add rules specific to them.
Furthermore, some nodes have no fields for vendor name and model name in
configuration ROM. In the case, it's required to add entries to hardware
database for users' convenience.
For the above reasons, this commit adds rules to use information in
hardware database for known units in IEEE 1394. One database entry
corresponds to one unit. Two types of key are used to match the unit;
customized key from node context, kernel modalias of unit context.
The entry has the type of function, at least. Supplementally, it has
vendor and model names.
For your information, below statements with Python pyparsing module are
expected to parse all of the custom key and module alias in the list:
```
import pyparsing as pp

subsystem_prefix = pp.Literal('ieee1394:').suppress()
hex_to_int = lambda a: int(a[0], 16)
node_prefix = pp.Literal('node:').suppress()
prefixed_lower_hex = pp.Combine(pp.Literal('0x') + pp.Word(pp.srange('[a-z0-9]'), exact=6)).setParseAction(hex_to_int)
ven_in_node = pp.dictOf(pp.Literal('ven'), prefixed_lower_hex)
mo_in_node = pp.dictOf(pp.Literal('mo'), prefixed_lower_hex)
unit_in_node = pp.Group(prefixed_lower_hex + pp.Literal(':').suppress() + prefixed_lower_hex)
units_in_node = pp.Group(pp.Literal('units') + pp.ZeroOrMore(pp.Literal('*')).suppress() + unit_in_node + pp.ZeroOrMore(pp.Literal('*')).suppress())
node_parser = subsystem_prefix + node_prefix + ven_in_node + pp.Optional(mo_in_node) + units_in_node
higher_hex = pp.Word(pp.srange('[A-Z0-9]'), exact=8).setParseAction(hex_to_int)
ven_in_unit = pp.dictOf(pp.Literal('ven'), higher_hex)
mo_literal_in_unit = pp.dictOf(pp.Literal('mo'), higher_hex)
mo_in_unit = pp.dictOf(pp.Literal('mo'), higher_hex ^ pp.Literal('*'))
sp_in_unit = pp.dictOf(pp.Literal('sp'), higher_hex)
ver_in_unit = pp.dictOf(pp.Literal('ver'), higher_hex)
unit_parser = subsystem_prefix + ven_in_unit + mo_in_unit + sp_in_unit + ver_in_unit
key_parser = node_parser ^ unit_parser
```
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Append 'package' and 'packageVersion' to the journal as discrete fields
COREDUMP_PKGMETA_PACKAGE and COREDUMP_PKGMETA_PACKAGEVERSION respectively,
and the full json blurb as COREDUMP_PKGMETA_JSON.
We forgot a call to dlopen_tpm2() in the unseal codepaths. As long as
automatic TPM2 device discovery was used that didn't matter, since in
that codepath we'd have another call to dlopen_tpm2(). But with an
explicitly configured TPM2 device things should work too, hence add the
missing call.
Fixes: #19206
Let's ensure our key sizes calculations are correct.
This doesn't actually change anything, just adds more safety checks.
Inspired by #19203, but not a fix.
The words and cword variables are not localized in all Bash completion
scripts that call _init_completion.
cur, prev, words, and cword (and split if using the -s flag) are all
variables that should be localized in Bash completion scripts before
calling _init_completion (even if they don't otherwise appear in the
calling script). This is done for cur and prev, but not for words and
cword. Letting words and cword remain unlocalized may clobber variables
the user is using for other purposes, which is bad.
This issue can be resolved by declaring words and cword as local
variables.
Resolves #19188.
Single-param LoadCredential= in units causes systemd v247/v248 to
assert when parsing. Disable it for now, until the fix is merged
in the stable trees, released and available (eg: in Debian
for the CI)
See: https://github.com/systemd/systemd/issues/19178
In some instances, particularly with swap on zram, swap used will be high
while there is still a lot of memory available. FB OOMD handles this by
thresholding kills to X% of total swap usage. Let's do the same thing here.
Anecdotally with these thresholds and my laptop which is exclusively swap
on zram I can sit at 0K / 4G swap free with most of memory free and
systemd-oomd doesn't kill anything.
Partially addresses aggressive kill behavior from
https://bugzilla.redhat.com/show_bug.cgi?id=1941170
The s390 PCI driver assigns the hotplug slot name from the
function_id attribute of the PCI device using a 8 char hexadecimal
format to match the underlying firmware/hypervisor notation.
Further, there's always a one-to-one mapping between a PCI
function and a hotplug slot, as individual functions can
be hot plugged even for multi-function devices.
As the generic matching code will always try to parse the slot
name in /sys/bus/pci/slots as a positive decimal number, either
a wrong value might be produced for ID_NET_NAME_SLOT if
the slot name consists of decimal numbers only, or none at all
if a character in the range from 'a' to 'f' is encountered.
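The mismatch is easy to see in a small Python sketch. The two helpers are hypothetical stand-ins for "parse as decimal" and "parse as 8 char hexadecimal", and the slot names are made-up examples.

```python
def parse_slot_decimal(name):
    """Mimic a generic parser that only accepts a decimal number."""
    if not (name.isascii() and name.isdigit()):
        return None  # e.g. a character in the range 'a' to 'f' was found
    return int(name, 10)


def parse_slot_hex(name):
    """Interpret the slot name as a hexadecimal function id."""
    return int(name, 16)


# Digits only: the decimal parser produces a value, but the wrong one.
decimal_value = parse_slot_decimal("00000010")
hex_value = parse_slot_hex("00000010")

# Contains a hex letter: the decimal parser produces nothing at all.
missing_value = parse_slot_decimal("0000001a")
```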
Additionally, the generic code assumes that two interfaces
share a hotplug slot, if they differ only in the function part
of the PCI address. E.g., for an interface with the PCI address
dddd:bb:aa.f, it will match the device to the first slot with
an address dddd:bb:aa. As more than one slot may have this address
for the s390 PCI driver, the wrong slot may be selected.
To resolve this we're adding a new naming schema version with the
flag NAMING_SLOT_FUNCTION_ID, which enables the correct matching
of hotplug slots if the device has an attribute named function_id.
The ID_NET_NAME_SLOT property will only be produced if there's
a file /sys/bus/pci/slots/<slotname> where <slotname> matches
the value of /sys/bus/pci/devices/.../function_id in 8 char
hex notation.
Fixes #19016
See also #19078
When this test is run in mkosi, the previously tested cgroup that we write
xattrs into and the root cgroup are the same.
Since the root cgroup is a live cgroup anyways (vs. the test cgroups which are
remade each time) let's generate the expected preference values from reading
the xattrs instead of assuming it will be NONE.
Since this is only changed the first time the limit is hit (and remains
set as long as the pressure remains over), I changed the name to better
reflect that.
Keeps consistent with "last_had_mem_reclaim" which is actually updated
every time there is reclaim activity.
systemd-oomd only monitors and kills within a selected cgroup subtree.
For memory pressure kills, this means it's unnecessary to get the
pgscan rate across all the monitored memory pressure cgroups.
The increase will show up whether we do a total sum or not, but since
we only care about the increase in the subtree we're about to target
for a kill, we can simplify the code a bit by not doing this total sum.
One thing that came out of the test week is that systemd-oomd needs to poll
more frequently so as not to race with the kernel oom killer in
situations where memory is eaten quickly. Memory pressure counters are
lagging so it isn't worthwhile to change the current read rate; however swap
is not lagging and can be checked more frequently.
So let's split these into 2 different timer events. As a result, swap
now also doesn't have to be subject to the post-action (post-kill) delay
that we need for memory pressure events.
Addresses some of the slowness to kill discussed in
https://bugzilla.redhat.com/show_bug.cgi?id=1941340
Fixes this error I got building on F33:
/usr/bin/ld: test-random-util.p/src_test_test-random-util.c.o: undefined
reference to symbol 'sqrt@@GLIBC_2.2.5'
/usr/bin/ld: /usr/lib64/libm.so.6: error adding symbols: DSO missing
from command line
Follow-up for 7117842657c0fc5a3446b6fe158615279cf2d650.
sd_device_monitor_filter_add_match_subsystem_devtype() now returns 1 to signify
that something was done, and 0 to signify that nothing was done, but
udev_monitor_filter_add_match_subsystem_devtype() needs to return 0 as documented.
udev_monitor_filter_add_match_tag() is adjusted to match.
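The return value convention can be modelled in Python as below. Both functions are stand-ins written for this sketch, not bindings to the real sd-device or libudev calls, and treating "filter already present" as the "nothing was done" case is an assumption.

```python
import errno


def sd_style_add_match(filters, subsystem):
    """New-style helper: 1 if something was done, 0 if not, negative on error."""
    if subsystem is None:
        return -errno.EINVAL
    if subsystem in filters:
        return 0  # assumed meaning of "nothing was done"
    filters.add(subsystem)
    return 1


def udev_style_add_match(filters, subsystem):
    """Legacy-style wrapper: 0 on success, negative on error."""
    r = sd_style_add_match(filters, subsystem)
    # Positive values must not leak out of the documented API
    return r if r < 0 else 0


filters = set()
first = udev_style_add_match(filters, "drm")
second = udev_style_add_match(filters, "drm")
failed = udev_style_add_match(filters, None)
```

A caller that checks for a non-zero result to detect failure would otherwise misread the new "1" as an error.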
This makes gdm start successfully here again.
Before, it would just not boot, with nothing very obvious in the logs:
gdm[1756]: Gdm: GdmDisplay: Session never registered, failing
Replaces #19171.
I want to tweak behaviour further, and that'll be easier when "style"
is converted to a bitfield.
Some callers used ESCAPE_BACKSLASH_ONELINE, and others not. But the
ones that didn't, simply didn't care, because the argument was assumed to
be one-line anyway (e.g. a service name). In environment-generator, this
could make a difference. But I think it's better to escape the newlines
there too. So newlines are now always escaped, to simplify the code and
the test matrix.
The issue was introduced in the refactoring in 775ae35403f8f3c01b7ac13387fe8aac1759993f.
We would pass a possibly uninitialized value to a helper function. We would only *use*
it if it was initialized. But the mere passing of an uninitialized variable is
UB, so let's not do that. This silences a gcc warning.
The old code was just fine, but gcc doesn't understand that max_brightness is
initialized. Let's rework it a bit to move some logic to the main function. Now
get_max_brightness() just retrieves and parses the attribute, and the main
function decides what to do with it.
gcc was very unhappy for some reason:
[988/1664] Compiling C object systemd-oomd.p/src_oom_oomd.c.o
In file included from ../src/basic/path-util.h:10,
from ../src/shared/pretty-print.c:14,
from ../src/oom/oomd.c:15:
../src/shared/pretty-print.c: In function ‘conf_files_cat’:
../src/basic/strv.h:123:32: warning: ‘prefixes’ may be used uninitialized [-Wmaybe-uninitialized]
123 | for ((s) = (l); (s) && *(s); (s)++)
| ^
In file included from ../src/oom/oomd.c:15:
../src/shared/pretty-print.c:283:16: note: ‘prefixes’ was declared here
283 | char **prefixes, **prefix;
| ^~~~~~~~
../src/shared/pretty-print.c:305:12: warning: ‘is_collection’ may be used uninitialized in this function [-Wmaybe-uninitialized]
305 | if (!is_collection) {
| ^
../src/shared/pretty-print.c:301:13: warning: ‘extension’ may be used uninitialized in this function [-Wmaybe-uninitialized]
301 | r = conf_files_list_strv(&files, extension, root, 0, (const char* const*) dirs);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Maybe this is caused by the static char** variables?
[1/429] Compiling C object src/shared/libsystemd-shared-248.a.p/bus-message-util.c.o
../src/shared/bus-message-util.c: In function ‘bus_message_read_dns_servers’:
../src/shared/bus-message-util.c:165:21: warning: ‘family’ may be used uninitialized in this function [-Wmaybe-uninitialized]
165 | r = in_addr_full_new(family, &a, port, 0, server_name, dns + n);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/shared/bus-message-util.c:165:21: warning: ‘port’ may be used uninitialized in this function [-Wmaybe-uninitialized]
../src/shared/bus-message-util.c:165:21: warning: ‘server_name’ may be used uninitialized in this function [-Wmaybe-uninitialized]
The warning would be there despite all the asserts in bus_error_setfv() and
sd_bus_error_set(). So let's add an explicit assert.
[2/3] Compiling C object test-capability.p/src_test_test-capability.c.o
../src/test/test-capability.c: In function ‘main’:
../src/test/test-capability.c:270:12: warning: ‘run_ambient’ may be used uninitialized in this function [-Wmaybe-uninitialized]
270 | if (run_ambient)
| ^
gcc-11.0.1-0.3.fc34.x86_64
[91/180] Compiling C object libsystemd.a.p/src_libsystemd_sd-event_sd-event.c.o
In file included from ../src/basic/macro.h:12,
from ../src/basic/alloc-util.h:9,
from ../src/libsystemd/sd-event/sd-event.c:11:
../src/libsystemd/sd-event/sd-event.c: In function ‘sd_event_wait’:
../src/fundamental/macro-fundamental.h:86:63: warning: ‘child_min_priority’ may be used uninitialized in this function [-Wmaybe-uninitialized]
86 | UNIQ_T(A, aq) < UNIQ_T(B, bq) ? UNIQ_T(A, aq) : UNIQ_T(B, bq); \
| ^
../src/libsystemd/sd-event/sd-event.c:3983:45: note: ‘child_min_priority’ was declared here
3983 | int64_t epoll_min_priority, child_min_priority;
| ^~~~~~~~~~~~~~~~~~
Alternative to #19159.
[59/655] Compiling C object src/shared/libsystemd-shared-248.a.p/varlink.c.o
../src/shared/varlink.c: In function ‘varlink_write’:
../src/shared/varlink.c:459:12: warning: ‘n’ may be used uninitialized in this function [-Wmaybe-uninitialized]
459 | if (n < 0) {
| ^
../src/shared/varlink.c: In function ‘varlink_process’:
../src/shared/varlink.c:541:12: warning: ‘n’ may be used uninitialized in this function [-Wmaybe-uninitialized]
541 | if (n < 0) {
| ^
../src/shared/varlink.c:486:17: note: ‘n’ was declared here
486 | ssize_t n;
| ^
It was one giant wall of text in pseudo-random order. Let's split it into
paragraphs that talk about one subject each.
And unfortunately, the description of what happens when the error is not
set was not correct. In general, various functions treat 0/NULL as
not-an-error, and return 0.
I was hoping it would help with the following gcc warning:
[35/657] Compiling C object src/shared/libsystemd-shared-248.a.p/bus-message-util.c.o
../src/shared/bus-message-util.c: In function ‘bus_message_read_dns_servers’:
../src/shared/bus-message-util.c:165:21: warning: ‘family’ may be used uninitialized in this function [-Wmaybe-uninitialized]
165 | r = in_addr_full_new(family, &a, port, 0, server_name, dns + n);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/shared/bus-message-util.c:165:21: warning: ‘port’ may be used uninitialized in this function [-Wmaybe-uninitialized]
../src/shared/bus-message-util.c:165:21: warning: ‘server_name’ may be used uninitialized in this function [-Wmaybe-uninitialized]
It actually doesn't, but the compiler has a point here: the code is specified
in sd_bus_error_map[], and it has no way of knowing that we want it to be a
positive value.
I think this should be an assert, because if this assumption fails, a
programming error has occurred, something that we'd want to catch.
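A minimal sketch of the idea, not the actual sd-bus code: the names `struct error_map_entry`, `example_map` and `lookup_errno()` are hypothetical. The codes live in a data table, so the compiler cannot prove they are positive; an explicit assert states the assumption and catches a bad table entry early.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an error map entry: a name and a positive errno-style code. */
struct error_map_entry {
        const char *name;
        int code;
};

static const struct error_map_entry example_map[] = {
        { "org.example.NotFound", 2  },
        { "org.example.Busy",     16 },
};

/* Returns the positive code for a known name, or 0 if the name is not in the map. */
static int lookup_errno(const char *name) {
        for (size_t i = 0; i < sizeof(example_map) / sizeof(example_map[0]); i++)
                if (strcmp(example_map[i].name, name) == 0) {
                        /* The table is data, so the compiler cannot prove this on its own. */
                        assert(example_map[i].code > 0);
                        return example_map[i].code;
                }

        return 0;
}
```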
[11/657] Compiling C object src/basic/libbasic.a.p/fileio.c.o
../src/basic/fileio.c: In function ‘write_string_stream_ts’:
../src/basic/fileio.c:167:21: warning: ‘fd’ may be used uninitialized in this function [-Wmaybe-uninitialized]
167 | if (futimens(fd, twice) < 0)
| ^~~~~~~~~~~~~~~~~~~
[59/1551] Compiling C object src/basic/libbasic.a.p/socket-util.c.o
../src/basic/socket-util.c: In function ‘socket_get_mtu’:
../src/basic/socket-util.c:1393:16: warning: ‘mtu’ may be used uninitialized in this function [-Wmaybe-uninitialized]
1393 | *ret = (size_t) mtu;
| ^~~~~~~~~~~~
Same motivation as in the parent commit: let's define variables later, ideally
right when they are first initialized, so it's easier to figure out that they
are properly initialized.
error_id and r_tuple* were previously initialized, but I don't see why they
would need to be.
No functional change intended.
Since the switch to varlink in 0c73f4f075a2d23f7cabe708b589f19f4bbbec37, the
code wasn't functional. The JSON_VARIANT_UNSIGNED/JSON_VARIANT_STRING mismatch
meant that we'd reject any reply. Once past that, the code would use
uninitialized 'c' and 'n' variables, so it's lucky we never got that far ;)
With -Wmaybe-uninitialized, gcc would warn.
I think that declaring the huge list of local variables with very short names
at the top of the function was making it harder to understand what is going on
in the function. So let's rename the variables a bit, and initialize them upon
declaration if possible.
$ build/test-nss-hosts resolve 1.1.1.1 1.0.0.1 10.38.5.41
======== resolve ========
_nss_resolve_gethostbyaddr2_r("1.1.1.1") → status=NSS_STATUS_SUCCESS
errno=999/--- h_errno=0/Resolver Error 0 (no error) ttl=0
"one.one.one.one"
AF_INET 1.1.1.1
_nss_resolve_gethostbyaddr_r("1.1.1.1") → status=NSS_STATUS_SUCCESS
errno=999/--- h_errno=0/Resolver Error 0 (no error)
"one.one.one.one"
AF_INET 1.1.1.1
_nss_resolve_gethostbyaddr2_r("1.0.0.1") → status=NSS_STATUS_SUCCESS
errno=999/--- h_errno=0/Resolver Error 0 (no error) ttl=0
"one.one.one.one"
AF_INET 1.0.0.1
_nss_resolve_gethostbyaddr_r("1.0.0.1") → status=NSS_STATUS_SUCCESS
errno=999/--- h_errno=0/Resolver Error 0 (no error)
"one.one.one.one"
AF_INET 1.0.0.1
_nss_resolve_gethostbyaddr2_r("10.38.5.41") → status=NSS_STATUS_SUCCESS
errno=999/--- h_errno=0/Resolver Error 0 (no error) ttl=0
"squid.redhat.com"
alias "squid.corp.redhat.com"
alias "squid2.corp.redhat.com"
alias "squid3.corp.redhat.com"
alias "squid4.corp.redhat.com"
alias "squid5.corp.redhat.com"
AF_INET 10.38.5.41
_nss_resolve_gethostbyaddr_r("10.38.5.41") → status=NSS_STATUS_SUCCESS
errno=999/--- h_errno=0/Resolver Error 0 (no error)
"squid.redhat.com"
alias "squid.corp.redhat.com"
alias "squid2.corp.redhat.com"
alias "squid3.corp.redhat.com"
alias "squid4.corp.redhat.com"
alias "squid5.corp.redhat.com"
AF_INET 10.38.5.41
(I have 10.38.5.41 squid.redhat.com squid.corp.redhat.com squid2.corp.redhat.com squid3.corp.redhat.com squid4.corp.redhat.com squid5.corp.redhat.com
in /etc/hosts for testing.)
RFC 6762 defines the top bit in RRs to mean cache flush (section 10.2),
and the top bit in questions to mean that a unicast reply is wanted
(section 5.4).
dns_packet_read_key() is used for parsing both questions and RRs.
When called from dns_packet_extract_question(), the top bit being set
should not result in the packet being rejected as invalid.
Fixes https://github.com/systemd/systemd/issues/17973
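As an illustration of the parsing rule, here is a small sketch with hypothetical names (`mdns_split_class()` is not a resolved function). It assumes the caller already has the raw 16-bit class field in host byte order, and leaves it to the caller to interpret the flag as "cache flush" (RR) or "unicast reply wanted" (question).

```c
#include <stdbool.h>
#include <stdint.h>

#define MDNS_CLASS_TOP_BIT UINT16_C(0x8000)
#define MDNS_CLASS_MASK    UINT16_C(0x7fff)

/* Splits the 16-bit class field of an mDNS question or RR into the actual
 * class and the top-bit flag. A set top bit is valid in both contexts. */
static uint16_t mdns_split_class(uint16_t raw, bool *ret_top_bit) {
        if (ret_top_bit)
                *ret_top_bit = (raw & MDNS_CLASS_TOP_BIT) != 0;

        return raw & MDNS_CLASS_MASK;
}
```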
Add an --extension parameter to portablectl, and new DBUS methods
to attach/detach/reattach/inspect.
This allows appending separate images on top of the root directory (os-release
will be searched in there) and mount the images using an overlay-like
setup (unit files will be searched in there) using the new ExtensionImages
service option.
Check the return code from gcrypt's functions. In some
cases just log, as it shouldn't really happen.
Fixes various Coverity issues:
CID #1444702
CID #1444704
CID #1444706
CID #1444711
CID #1444712
CID #1444713
$ sudo dnf remove --installroot=/var/tmp/img1 systemd-networkd
...
Running scriptlet: systemd-networkd-248~rc4-4.fc32.x86_64 1/1
Removed /etc/systemd/system/multi-user.target.wants/systemd-networkd.service.
Removed /etc/systemd/system/sockets.target.wants/systemd-networkd.socket.
Removed /etc/systemd/system/dbus-org.freedesktop.network1.service.
Removed /etc/systemd/system/network-online.target.wants/systemd-networkd-wait-online.service.
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
(Another option would be to make --now do nothing if systemd is not running.
But I think that's not too good. 'disable --now' doing nothing would be OK,
since if systemd is not running, the service is not running either, so we are
in the desired state. But that argument doesn't work for 'enable --now'. And
accepting 'disable --now' but not 'enable --now' seems overly complex. So I
think it is better to make the scriptlet handle this case explicitly.)
Also, let's reindent the file to 4 spaces. Very deeply nested scriptlets are
harder to read, and the triggers file is indented to 4 spaces already.
This specifies two new optional fields for /etc/os-release:
IMAGE_VERSION= and IMAGE_ID= that are supposed to identify the image of
the current booted system by name and version.
This is inspired by the versioning stuff in
https://github.com/systemd/mkosi/pull/683.
In environments where pre-built images are installed and updated as a
whole the existing os-release version/distro identifier are not
sufficient to describe the system's version, as they describe only the
distro an image is built from, but not the image itself, even if that
image is deployed many times on many systems, and even if that image
contains more resources than just the RPMs/DEBs.
In particular, "mkosi" is a tool for building disk images based on
distro RPMs with additional resources dropped in. The combination of all
of these together with their versions should also carry an identifier
and version, and that's what IMAGE_VERSION= and IMAGE_ID= are supposed to
be.
If we define a partition with CopyFiles=/efi/ this should just work.
However it previously didn't because basename() would return the
trailing slash.
Let's fix this by moving things to path_extract_{directory|filename}()
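To illustrate the trailing-slash handling, here is a sketch of a hypothetical helper, `last_component()`. It is not the path_extract_filename() implementation; it only shows how "/efi/" can yield "efi" once trailing slashes are skipped before looking for the last component.

```c
#include <stddef.h>
#include <string.h>

/* Copies the last component of 'path' into 'buf', ignoring trailing slashes.
 * Returns 0 on success, -1 if there is no component (e.g. "" or "/") or if
 * 'buf' is too small. */
static int last_component(const char *path, char *buf, size_t buf_size) {
        size_t end = strlen(path);

        while (end > 0 && path[end - 1] == '/')     /* skip trailing slashes */
                end--;

        size_t start = end;
        while (start > 0 && path[start - 1] != '/') /* walk back to the previous slash */
                start--;

        size_t len = end - start;
        if (len == 0 || len + 1 > buf_size)
                return -1;

        memcpy(buf, path + start, len);
        buf[len] = '\0';
        return 0;
}
```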
The advantage of stream compression is keeping a low memory profile,
but the lz4 stream compressor usage mmaps the whole file in memory.
Change it to read the input bit by bit, like the other stream compression
helpers.
This code is trying to do two things: when reading a file with working
st.st_size, detect when the file size changes between the fstat() and our
allocation of the buffer based on the returned size, and the subsequent read().
When reading a file without st.st_size, read up to READ_FULL_BYTES_MAX.
But this second scenario was partially broken: we'd start with size = 4095, and
double the size up to three times, i.e. up to 32767. But we want to read up to
READ_FULL_BYTES_MAX.
So let's disentangle the two cases a bit: if a file returns non-zero st.st_size,
proceed as before. But if we don't know the size, let's immediately allocate
the buffer of maximum size of READ_FULL_BYTES_MAX. I think that allocating 4MB
and 1MB is going to take pretty much the same time as long as the memory is not
written to, so by allocating 1MB, 2MB, and 4MB, we wouldn't really be saving
anything internally, but wasting time on repeated reads, if the file is long
enough.
Also, don't do the seek if we know we're going to return an error immediately
after.
This should fix reading of any files in /proc, which all have size == 0. In
particular, various files read by coredump might be larger than 32767.
What about /sys? The files there return a fake value, usually 4096. So we'll
allocate a small buffer and read that.
Improve the logging to only print if systemd-oomd killed something. And
also print which cgroup was targeted.
Demote general swap above/pressure above messages to debug.
[zjs: fix some issuelets found in review]
https://bugzilla.redhat.com/show_bug.cgi?id=1944171
This was in F33, systemd-246.13, but the logic in the code didn't change.
Thread 1 (Thread 0x7fb5f0341b80 (LWP 1974)):
№0 selabel_lookup_common (rec=0x0, translating=0, key=0x55f616ac4750 "/run/user/1000/systemd/units/invocation:systemd-tmpfiles-clean.service", type=40960) at label.c:167
'rec' is the handle that we passed.
№1 0x00007fb5f13ae87f in selabel_lookup_raw (rec=<optimized out>, con=con@entry=0x7fffef307380, key=key@entry=0x55f616ac4750 "/run/user/1000/systemd/units/invocation:systemd-tmpfiles-clean.service", type=type@entry=40960) at label.c:256
lr = <optimized out>
'rec' is passed through as is to selabel_lookup_common().
№2 0x00007fb5f1561b2d in selinux_create_file_prepare_abspath (abspath=0x55f616ac4750 "/run/user/1000/systemd/units/invocation:systemd-tmpfiles-clean.service", mode=40960) at ../src/basic/selinux-util.c:368
filecon = 0x0
r = <optimized out>
__PRETTY_FUNCTION__ = "selinux_create_file_prepare_abspath"
__func__ = "selinux_create_file_prepare_abspath"
№3 0x00007fb5f1561ec3 in mac_selinux_create_file_prepare (path=<optimized out>, mode=40960) at ../src/basic/selinux-util.c:431
r = 0
abspath = 0x55f616ac4750 "/run/user/1000/systemd/units/invocation:systemd-tmpfiles-clean.service"
__PRETTY_FUNCTION__ = "mac_selinux_create_file_prepare"
We checked label_hnd != NULL, but then we apparently called
avc_netlink_check_nb(), which reset label_hnd. Yay for global state!
№4 0x00007fb5f1549950 in symlink_atomic_label (from=0x55f6169d8b50 "69a8dcf7a7ac46b29306f2fddbed3edc", to=0x55f616ab8380 "/run/user/1000/systemd/units/invocation:systemd-tmpfiles-clean.service") at ../src/basic/label.c:55
r = <optimized out>
__PRETTY_FUNCTION__ = "symlink_atomic_label"
In the logs:
Mar 29 14:48:44 fedorapad.home systemd[1974]: selinux: avc: received policyload notice (seqno=2)
Mar 29 14:48:44 fedorapad.home systemd[1974]: Failed to initialize SELinux labeling handle: No such file or directory
Mar 29 14:48:44 fedorapad.home systemd[1974]: selinux: avc: received policyload notice (seqno=3)
Mar 29 14:48:44 fedorapad.home systemd[1974]: selinux: avc: received setenforce notice (enforcing=0)
With 8f20232fcb52dbe6255f3df6101fc057af90bcfa systemd-localed supports
generating locales when required. This fails if the locale directory is
read-only, so make it writable.
Closes #19138
LLD 13 and GNU ld 2.37 support -z start-stop-gc which allows garbage
collection of C identifier name sections despite the __start_/__stop_
references. Simply set the retain attribute so that GCC 11 (if
configure-time binutils is 2.36 or newer)/Clang 13 will set the
SHF_GNU_RETAIN section attribute to prevent garbage collection.
Without the patch, there are linker errors like the following with -z
start-stop-gc.
```
ld: error: undefined symbol: __start_SYSTEMD_BUS_ERROR_MAP
>>> referenced by bus-error.c:93 (../src/libsystemd/sd-bus/bus-error.c:93)
>>> sd-bus_bus-error.c.o:(bus_error_name_to_errno) in archive src/libsystemd/libsystemd_static.a
```
Let's make sure we set the "aa" bit in the stub only if we answer with
fully authoritative data. For this ensure:
1. Either all data is synthetic, including all CNAME/DNAME redirects
2. Or all data comes from the local trust anchor or the local zones
(i.e. not the network or the cache)
Follow-up for 4ad017cda57b04b9d65e7da962806cfcc50b5f0c
Coverity was complaining that we don't check the return value, which we stopped
doing in 772e0a76f34914f6f81205e912e4744c6b23f704.
But it seems that we don't want those calls at all. The test was originally
added with the call in a6ee01caf3409ba9820e8824b9262fbac31a9f77, but I don't
see why we should override this. If the user wants to execute the test with
mempool disabled, we shouldn't ignore that.
Coverity CID#1444464, CID#1444466.
We'd proceed rather inefficiently: the initial buffer size was LINE_MAX/2,
i.e. only 1k. We can read 4k at the same cost.
Also, we'd try to allocate 1025, 2049, 4097 bytes, i.e. always one higher than
the power-of-two size. Effectively the allocation would be bigger, and we'd
waste the additional space. So let's allocate aligned to the power-of-two size.
size=4095, 8191, 16383, so we allocate 4k, 8k, 16k.
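The size progression can be sketched as follows. The helper name `next_read_size()` is hypothetical, and the reason for the extra byte (a trailing NUL) is an assumption on my part; the point is only that keeping the requested size at 2^k - 1 keeps the size + 1 allocation at exactly 2^k.

```c
#include <stddef.h>

/* Next buffer size to try. The allocation is assumed to be size + 1 bytes
 * (one extra byte, e.g. for a trailing NUL), so a size of 2^k - 1 results in
 * an allocation of exactly 2^k bytes. */
static size_t next_read_size(size_t size) {
        return size * 2 + 1;
}
```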
We'd first assign a value up to SSIZE_MAX, and then immediately check if we
have a value bigger than READ_FULL_BYTES_MAX. This wasn't exactly wrong, but a
bit roundabout. Let's immediately assign the value from the appropriate range
or error out.
Coverity CID#1450973.
This adds generic support for the SetCredential=/LoadCredential= logic
to our password querying infrastructure: if a password is requested by a
program that has a credential store configured via
$CREDENTIALS_DIRECTORY we'll look in it for a password.
The "systemd-ask-password" tool is updated with an option to specify the
credential to look for.
Let's make use of our own credentials infrastructure in our tools: let's
hook up systemd-sysusers with the credentials logic, so that the root
password can be provisioned this way. This is really useful when working
with stateless systems, in particular nspawn's "--volatile=yes" switch,
as this works now:
# systemd-nspawn -i foo.raw --volatile=yes --set-credential=passwd.plaintext-password:foo
For the first time we have a nice, non-interactive way to provision the
root password for a fully stateless system from the container manager.
Yay!
Let's be a bit less strict when setting up credentials: if the service
manager didn't receive a cred, and we shall propagate it down via
LoadCredential= don't fail. Fail on all other errors though, as before,
and on explicitly listed paths.
This allows "LoadCredential=foo" to be used as shortcut for
"LoadCredential=foo:foo", i.e. it's a very short way to inherit a
credential under its original name from the service manager into a
service.
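The shorthand can be pictured with a small parser sketch. `parse_credential_spec()` is a hypothetical helper, not the service manager's actual parser; it only shows that a specification without a colon uses the credential name for both fields.

```c
#include <stddef.h>
#include <string.h>

/* Splits "ID[:PATH]" into id and path. If ":PATH" is missing, the path
 * defaults to the id itself. Returns 0 on success, -1 on an empty id or if
 * one of the buffers is too small. */
static int parse_credential_spec(const char *spec,
                                 char *id, size_t id_size,
                                 char *path, size_t path_size) {
        const char *colon = strchr(spec, ':');
        size_t id_len = colon ? (size_t) (colon - spec) : strlen(spec);
        const char *p = colon ? colon + 1 : spec;   /* shorthand: path is the id */

        if (id_len == 0 || id_len + 1 > id_size || strlen(p) + 1 > path_size)
                return -1;

        memcpy(id, spec, id_len);
        id[id_len] = '\0';
        strcpy(path, p);
        return 0;
}
```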
In bind_remount_one_with_mountinfo() let's handle mount failures
gracefully if the flags already match anyway. This isn't perfect, since
it mixes up superblock and mount point flags, but it's close enough.
And get rid of get_mount_flags() altogether.
(This drops the statvfs() fallback that get_mount_flags() did. That
fallback was incomplete however, and mostly hid errors. Our primary
avenue to get mount flags is /proc/self/mountinfo and we should trust
it, and fix bugs we might encounter with it, but not tape over it.
Dropping the fallback is relevant in particular as it actually returned
mount flags for any path, not just mount points, which was very icky.)
This replaces the "todo" set with a "todo" hash map that stores the
mount flags we found. This makes an explicit call to get_mount_flags()
unnecessary, since we have the flags handy right-away, and lowers our
work from O(n^2) to O(n). Nice!
The "done" set is also improved slightly: we'll use more modern ways to
allocate it, via set_ensure_consume(), and freeing-via-hash_ops.
Finally, failures on submount remounts are now handled gracefully:
there are just too many reasons why they might fail, given NFS, autofs,
FUSE with weird access controls, where even root might lack the privs
to do something.
Fixes: #16156
Instead of marking the bind mount read-only right-away, let's just
restart the loop, so that we'll pick it up like any other mount and then
remount like that.
Let's always query one property, check it, and then query the next,
preferring "cheap" ones over "slow" ones (i.e. cheap are the ones we can
check directly, and slow are the ones we need to check with some loop of
some kind).
The various flavours of stat() basically tell us for free if something
is a symlink. If it is, then it's definitely not a mount point. Use
that.
All other inode types can be mount point, just symlinks cannot.
This fixes resetting of initial_jitter_elapsed: the first time the timer
hits after initial_jitter_scheduled is set we need to mark things as
elapsed.
(Also improve log messages around this while we are at it)
This adds the same line to most of our .conf files.
Not for systemd/user.conf though, since we can't correctly display it right
now:
$ systemd-analyze cat-config --user systemd/user.conf
Option --user is not supported for cat-config right now.
For sysusers.d, tmpfiles.d, rules.d, etc, there is no single file. Maybe
we should add short READMEs in /usr/lib/sysusers.d, /usr/lib/tmpfiles.d, etc.?
Inspired by #19118.
When following CNAME/DNAME redirects in the stub we currently first
iterate through the packet and pick up what we can use (in
dns_stub_collect_answer_by_question() and friends), following all
CNAMEs/DNAMEs, and would then issue dns_query_process_cname() to move
the DnsQuery object forward too, where we'd then possibly restart
the query and pick things up again, as above.
There's one thought error in this though: dns_query_process_cname()
tries to be smart and will internally follow not just a single
CNAME/DNAME redirect, but a chain of them if they are contained inside
the same packet until we reach the point where the answer is not
included in the packet anymore, where we'd restart the query. This was
great as long as we only focussed on the D-Bus and Varlink resolver
APIs, since there the CNAME/DNAME chain in the middle doesn't actually
matter, we just return information about the final name of the RR and
its content, and aren't interested in the chain to it. For the DNS stub
this is different however: there we need to place the full CNAME/DNAME
chain (and all the appropriate metadata RRs) in the stub reply.
Hence rework this so that we build on the fact that the previous commit
split dns_query_process_cname() in two:
1. dns_query_process_cname_one() will do exactly one CNAME/DNAME
redirect step. This will be called by the stub, so that we can pick
up matching RRs for every single step along the way.
2. dns_query_process_cname_many() will follow a chain as long as that's
possible within the same packet. It's thus pretty much identical to
the old dns_query_process_cname() call. This is what we now use in
the D-Bus and Varlink APIs. dns_query_process_cname_many() is
basically just a loop around dns_query_process_cname_one().
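The relationship between the two functions can be sketched with a toy model. Everything here is hypothetical (`struct redirect`, `follow_one()`, `follow_many()`, the cap of 16, names modelled as integers); it only shows the "many" variant as a bounded loop around the "one" variant.

```c
#include <stddef.h>

#define CNAME_REDIRECT_MAX 16   /* hypothetical cap on the chain length */

/* Toy model of a CNAME record in a packet: 'from' is an alias for 'to'. */
struct redirect {
        int from;
        int to;
};

/* Follows exactly one redirect step. Returns 1 and updates *name if the
 * packet holds a redirect for *name, 0 if it does not. */
static int follow_one(const struct redirect *packet, size_t n, int *name) {
        for (size_t i = 0; i < n; i++)
                if (packet[i].from == *name) {
                        *name = packet[i].to;
                        return 1;
                }

        return 0;
}

/* Follows the chain for as long as the packet allows: a loop around
 * follow_one(). Returns the number of steps taken, or -1 if the chain
 * exceeds the cap (e.g. because of a redirect loop). */
static int follow_many(const struct redirect *packet, size_t n, int *name) {
        int steps = 0;

        while (follow_one(packet, n, name) > 0)
                if (++steps > CNAME_REDIRECT_MAX)
                        return -1;

        return steps;
}
```

A caller that needs the records for every hop would call `follow_one()` repeatedly and collect data after each step; a caller that only cares about the final name would call `follow_many()`.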
Any logic to follow and pick up RRs manually in the stub along the
CNAME/DNAME path is now dropped (i.e.
dns_stub_collect_answer_by_question() becomes trivially simple again),
we solely rely on dns_query_process_cname_one() to follow CNAME/DNAME
now: each step followed by a full call of dns_stub_assign_sections() to
copy out the RRs that matter.
Net result: things are a bit simpler again, as the only place we follow
CNAME/DNAME redirects is DnsQuery again, and stub answers are always
complete: they contain all CNAME/DNAME RRs on the way including all
their metadata we might pick up in the other sections.
This does some refactoring: the dns_query_process_cname() function
becomes two: dns_query_process_cname_one() and
dns_query_process_cname_many(). The former will process exactly one
CNAME chain element, the latter will follow a chain for as long as
possible within the current packet.
dns_query_process_cname_many() is mostly identical to the old
dns_query_process_cname(), and all existing code is moved over to using
that.
This is mostly preparation for the next commit, where we make direct use
of dns_query_process_cname_one().
This also renames the DNS_QUERY_RESTARTED return value to
DNS_QUERY_CNAME. That's because in the dns_query_process_cname_many()
case, as before, returning this means we restarted the query because we
reached the end of the chain without a conclusive answer. But
in dns_query_process_cname_one() we'll only go one step anyway, and
leave restarting if needed to the caller. Hence DNS_QUERY_RESTARTED is a
bit of a misnomer in that case.
This also gets rid of the weird tail recursion in
dns_query_process_cname() and replaces it with an explicit loop in
dns_query_process_cname_many(). The old recursion wasn't a security
issue since we put a limit on the number of CNAMEs we follow anyway, but
it's still icky to scale stack use by that.
Static analyzers need a hint that optval is not pointing
off the end of the msg_advertise array, since pos can go
up to the full length of it. The array is manually
constructed so we know this won't happen, but adding one
more assert should be enough to avoid false positives.
Coverity CID #1394277
Previously we'd stick all answer sections RRs we acquired into
the authoritative section if we didn't find them directly answering our
question. Let's put them into additional instead. The authoritative
section should hence only include what comes from the upstream
authoritative section, and nothing else.
Previously we'd iterate through the RRs of an mDNS reply and then find
exactly one matching transaction on our scope for it, and pass it as
reply to that. If multiple RRs of the same packet match we'd pass the
packet multiple times to the transaction even.
This all doesn't really work anymore since there can be multiple open
transactions for the same key (with different flags), and it's kinda
ugly anyway. Hence let's turn this around: let's iterate through the
transactions and check if any of the included RRs match it, and if so
pass the packet to that transaction exactly once.
This speeds up mDNS a bit, since previously we'd oftentimes fail to find
all suitable transactions for an mDNS reply (because there can be
multiple transactions for the same RR key with different flags, and we
checked exactly one flag combination). Which would then mean the
transaction would time out, and be retried – at which point the cache
would be populated and thus it would still succeed, but only after this
timeout. With this fix this is corrected: every transaction that matches
will get the reply, instantly as we get it.
This is inspired by a recent thread on fedora-devel: it's noteworthy
when we switch to the fallback servers, since it might (or might not)
indicate some configuration problem.
Fixes: #18788
This can happen if ifi fails to be read from the netlink message and the
error is ENODATA.
Fixes the following valgrind message when running netstat:
==164141== Conditional jump or move depends on uninitialised value(s)
==164141== at 0x524AE60: address_compare (local-addresses.c:29)
==164141== by 0x48BCC78: msort_with_tmp.part.0 (msort.c:105)
==164141== by 0x48BC9E4: msort_with_tmp (msort.c:45)
==164141== by 0x48BC9E4: msort_with_tmp.part.0 (msort.c:53)
==164141== by 0x48BCF85: msort_with_tmp (msort.c:45)
==164141== by 0x48BCF85: qsort_r (msort.c:297)
==164141== by 0x52500FC: UnknownInlinedFun (sort-util.h:47)
==164141== by 0x52500FC: local_gateways.constprop.0 (local-addresses.c:310)
==164141== by 0x5251C05: _nss_myhostname_gethostbyaddr2_r (nss-myhostname.c:456)
==164141== by 0x5252006: _nss_myhostname_gethostbyaddr_r (nss-myhostname.c:500)
==164141== by 0x498E7FE: gethostbyaddr_r@@GLIBC_2.2.5 (getXXbyYY_r.c:274)
==164141== by 0x498E560: gethostbyaddr (getXXbyYY.c:135)
==164141== by 0x121353: INET_rresolve.constprop.0 (inet.c:212)
==164141== by 0x1135B9: INET_sprint (inet.c:261)
==164141== by 0x121BFC: addr_do_one.constprop.0.isra.0 (netstat.c:1156)
Alternative title: Replace get_process_cmdline()'s fopen()/fread() with
read_full_virtual_file().
When RLIMIT_STACK is set to infinity:infinity, _SC_ARG_MAX will
return 4611686018427387903 (depending on the system, but definitely
something larger than most systems have). It's impractical to allocate this
in one go when most cmdlines are much shorter than that.
Instead use read_full_virtual_file() which seems to increase the buffer
depending on the size of the contents.
The generated string may include %, which will confuse both the
xprintf call, and the VA_FORMAT_ADVANCE macro.
Pass the generated string as an argument to a "%s" format string
instead.
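The pattern in general form, as a sketch with a hypothetical wrapper name: the generated text is never used as the format string itself, so a '%' inside it is copied verbatim.

```c
#include <stdio.h>

/* Formats an externally generated message safely: the message is passed as
 * an argument to "%s", so any '%' inside it is copied verbatim instead of
 * being interpreted as a conversion specification. */
static int format_message(char *buf, size_t size, const char *generated) {
        return snprintf(buf, size, "%s", generated);
}
```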
Inspired by https://bugzilla.redhat.com/show_bug.cgi?id=1929936.
This is similar to test-nss-hosts, but does users, groups, uid, gids.
Functions tested are:
_nss_*_getpwnam_r
_nss_*_getgrnam_r
_nss_*_getpwuid_r
_nss_*_getgrgid_r
Other entry points should be tested too, but it's not relevant to the bug
I was investigating, so I'm leaving that for later ;)
This reverts the gist of commit 798445ab84cff51bde7fcf936f0fb19c37cf858c.
Unfortunately the new syscall causes test-event to hang. 32 bit architectures
seem affected: i686 and arm32 in fedora koji. 32 bit build of test-event hangs
reliably under valgrind:
$ PKG_CONFIG_LIBDIR=/usr/lib/pkgconfig meson build-32 -Dc_args=-m32 -Dc_link_args=-m32 -Dcpp_args=-m32 -Dcpp_link_args=-m32 && ninja -C build-32 test-event && valgrind build/test-event
If I set epoll_pwait2_absent=true, so the new function is never called, then
the issue does not reproduce. It seems to be strictly tied to the syscall.
On amd64, the syscall is not used, at least with the kernel that Fedora
provides. The kernel patch 58169a52ebc9a733aeb5bea857bc5daa71a301bb says:
For timespec, only support this new interface on 2038 aware platforms
that define __kernel_timespec_t. So no CONFIG_COMPAT_32BIT_TIME.
And Fedora sets CONFIG_COMPAT_32BIT_TIME=y. I expect most other distros will too.
On amd64: epoll_wait_usec: epoll_pwait2: ret=-1 / errno=38
On i686 (same kernel): epoll_wait_usec: epoll_pwait2: ret=2 / errno=0
Is this some kind of emulation? Anyway, it seems that this is what is going wrong.
So let's disable the syscall until it becomes more widely available and the
kinks have been ironed out.
Fixes test-event issue in #19052.
FirewallContext is used by networkd and nspawn. Both allocate the
context only when it is really necessary. Hence, it is not necessary to
delay probing the backend.
Moreover, if the iptables backend is not enabled at build time, and
nftables is not supported by the kernel, `fw_nftables_init()` was
previously called every time we tried to configure masquerade or dnat.
This caused a significant performance loss.
Fixes test-firewall-util issue in #19052.
When trying to calculate the next firing of 'Sun *-*-* 01:00:00', we'd fall
into an infinite loop, because mktime() moves us "backwards":
Before this patch:
tm_within_bounds: good=0 2021-03-29 01:00:00 → 2021-03-29 00:00:00
tm_within_bounds: good=0 2021-03-29 01:00:00 → 2021-03-29 00:00:00
tm_within_bounds: good=0 2021-03-29 01:00:00 → 2021-03-29 00:00:00
...
We rely on mktime() normalizing the time. The man page does not say that it'll
move the time forward, but our algorithm relies on this. So let's catch this
case explicitly.
With this patch:
$ TZ=Europe/Dublin faketime 2021-03-21 build/systemd-analyze calendar --iterations=5 'Sun *-*-* 01:00:00'
Normalized form: Sun *-*-* 01:00:00
Next elapse: Sun 2021-03-21 01:00:00 GMT
(in UTC): Sun 2021-03-21 01:00:00 UTC
From now: 59min left
Iter. #2: Sun 2021-04-04 01:00:00 IST
(in UTC): Sun 2021-04-04 00:00:00 UTC
From now: 1 weeks 6 days left <---- note the 2 week jump here
Iter. #3: Sun 2021-04-11 01:00:00 IST
(in UTC): Sun 2021-04-11 00:00:00 UTC
From now: 2 weeks 6 days left
Iter. #4: Sun 2021-04-18 01:00:00 IST
(in UTC): Sun 2021-04-18 00:00:00 UTC
From now: 3 weeks 6 days left
Iter. #5: Sun 2021-04-25 01:00:00 IST
(in UTC): Sun 2021-04-25 00:00:00 UTC
From now: 1 months 4 days left
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1941335.
This reverts commit 6d18c13e79a0b3374599a3416a644a7837d5a1e6.
The syntax like "0666" is very unclear. It only makes sense for some subset of
people who do C programming. Let's use the much more sensible modern python
syntax instead.
I *think* it doesn't actually make any difference, because ":" will be ignored.
437f48a471f51ac9dd2697ee3b848a71b4f101df added prefixing with ":", but didn't
take into account the fact that we also use "" with a different meaning than
NULL here. But let's restore the original behaviour of specifying the empty
string.
The output is rather long and this makes it easier to jump to the right place.
Also use normal output routines and set_unset_env() to make things more
compact.
The scope of start & stop is narrowed down, and they are assigned only once.
No functional change, but I think the code is easier to read this way.
Also add a comment to make the code easier to read.
When checking if the responses we collected for a DnsQuery are
sufficient to complete it we previously only checked if one of the
collected response RRs matches at least one of the question RR keys.
This changes the logic to require that there must be at least one
response RR matching *each* of the question RR keys before considering
the answer complete.
Otherwise we might end up accepting an A reply as complete answer for an
A/AAAA query and vice versa, but we want to make sure we wait until we
get a reply on both types before returning this to the user in all
cases.
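The changed completeness check can be sketched as an "all of" test instead of an "any of" test. This is a toy model with a hypothetical function name, where keys are plain RR type numbers (1 = A, 28 = AAAA) rather than full resource keys.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true only if every question key has at least one matching answer
 * key. Keys are modelled as plain RR type numbers (e.g. 1 = A, 28 = AAAA). */
static bool answer_covers_all_keys(const int *question, size_t n_question,
                                   const int *answer, size_t n_answer) {
        for (size_t q = 0; q < n_question; q++) {
                bool found = false;

                for (size_t a = 0; a < n_answer && !found; a++)
                        found = answer[a] == question[q];

                if (!found)
                        return false;   /* e.g. only A arrived for an A/AAAA query */
        }

        return true;
}
```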
This has been broken for basically forever, but didn't surface until
b1eea703e01da1e280e179fb119449436a0c9b8e since until then we'd basically
ignore the auxiliary RRs included in CNAME/DNAME replies. Once that
commit was made we'd start using the auxiliary RRs included in
CNAME/DNAME replies but those typically included only A or only AAAA
which we then took for complete.
Fixes: #19049
This follows up on 0b1f3c768ce1bd1490a5e53f539976dcef8ca765, adding more places
where we should reopen the log after forking with FORK_CLOSE_ALL_FDS.
When immediately calling exec in the child, prefer to explicitly reopen the log
after exec fails. In other cases, just use FORK_REOPEN_LOG.
Commit 0b1f3c768ce1bd1490a5e53f539976dcef8ca765 has introduced log_open()
calls after exec fails post-fork. However, the log_open() call itself could
change the value of errno, which, for me, manifested in:
$ coredumpctl gdb
...
Failed to invoke gdb: Success
Fix this by using PROTECT_ERRNO in log_open().
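The effect can be illustrated with a plain save-and-restore sketch. This deliberately does not use the PROTECT_ERRNO macro itself, and `noisy_helper()` is a hypothetical stand-in for a call that clobbers errno as a side effect.

```c
#include <errno.h>

/* Stand-in for a call such as log_open() that may clobber errno as a side
 * effect, here by resetting it to 0 ("Success"). */
static void noisy_helper(void) {
        errno = 0;
}

/* Runs the helper but restores the caller's errno afterwards, so that a
 * later error message still reports the original failure. */
static void noisy_helper_protected(void) {
        int saved = errno;

        noisy_helper();
        errno = saved;
}
```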
We have a bug where we seem to enter an infinite loop when running in the
Europe/Dublin timezone. The timezone is "special" because it has negative SAVE
values. The handling of this should obviously be fixed, but let's use a
belt-and-suspenders approach, and gracefully fail if we fail to find an answer
within a specific number of attempts. The code in this function is rather
complex, and it's hard to rule out another bug in the future.
This fixes the --size= switch, i.e. where we grow a disk image: after
growing it we need to expand the partition table so that its idea of
the medium size matches the new reality. Otherwise our disk size
calculations in the subsequent steps might still use the original
ungrown size.
(This used to work, I guess this was borked when libfdisk learnt the
concept of "minimized" partition tables)
* for /dev/vsock a file permission of 0o666 was mentioned but 0666 is probably better understood, so let's use that
* correct non existing command 'ip dev'
* flag-set.cocci: perform the transformation only if the second
argument is a constant
* sd-journal/lookup3.c: skip the cocci completely for this file, since
it's not "ours"
* strjoina.cocci: skip the transformation on the "test_strjoina" test,
since it intentionally tests the "incorrect" expression we're trying to
transform (the same thing was already done in strjoin.cocci)
rr is asserted upon a few lines above, no need to check for null.
Coverity-found issue, CID 1450844
CID 1450844: Null pointer dereferences (REVERSE_INULL)
Null-checking "rr" suggests that it may be null, but it has already
been dereferenced on all paths leading to the check.
We aren't interested in the data previously read, hence free() followed
by malloc() is typically better since it means libc doesn't have to
restore the contained data needlessly.
If it is specified as NULL read_full_file() assumes the caller wants a C
string, and it looks for embedded NUL bytes to ensure that works. Given
we don't actually use the size argument here, let's drop it.
(in one case the size argument is used, but not for actually processing
the full returned data, but just as a shortcut to compare things with
the original string. Let's drop use of that there, too given the risk of
embedded NUL bytes in the data read.)
Wherever we read virtual files we should use
read_full_virtual_file(), to make sure we get a consistent response
given how weird the kernel's handling of partial reads on such file
systems is.
For pressure-based killing we want to target the cgroup with the highest
increase in pgscan from the previous interval (vs. the previous logic
which used raw pgscan). This will prevent biasing towards long running
cgroups as mentioned in #19007.
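A minimal sketch of selecting by pgscan increase rather than by raw pgscan. The struct and function names are hypothetical and do not mirror the actual oomd data structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-cgroup sample; field names are illustrative only. */
struct cgroup_sample {
        const char *path;
        uint64_t pgscan;       /* counter value in the current interval */
        uint64_t last_pgscan;  /* counter value in the previous interval */
};

/* Returns the index of the entry with the largest pgscan increase, or -1 if n == 0. */
static int pick_by_pgscan_delta(const struct cgroup_sample *c, size_t n) {
        int best = -1;
        uint64_t best_delta = 0;

        for (size_t i = 0; i < n; i++) {
                /* Treat a counter that went backwards as "no increase" so the subtraction cannot wrap. */
                uint64_t delta = c[i].pgscan >= c[i].last_pgscan ? c[i].pgscan - c[i].last_pgscan : 0;

                if (best < 0 || delta > best_delta) {
                        best = (int) i;
                        best_delta = delta;
                }
        }

        return best;
}
```

A long-running cgroup with a large but nearly flat counter loses to a newer cgroup whose counter grew more during the interval.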
When the test suite is being run in a foreign environment,
/sys/fs/cgroup might not be set up in a way that we recognize.
Returning ENOMEDIUM causes the tests to be skipped in this case.
Bug: https://bugs.gentoo.org/771819
Fixes an issue introduced by 73b49d433c2c8e6304c8b82538bd4231d070fce4.
When PrefixDelegationHint= is not set, dhcp6_option_append_pd() sets
wrong length for IA_PD option, as `r` is `-EINVAL`.
Fixes #19021.
The function already has a ridiculous amount of parameters, let's drop
one that is either not used at all or has a constant value and let's
pick it internally.
Previously, the flag did two things at once. First, it enabled support
for using a generic partition as root fs if there was only one, and
allowed use of partition-table-less images as root fs. Second, it
insisted that there was a root fs, and failed if not. Let's split these
into two separate options so that they can be used independently of each
other.
There are cases where one wants to use one without the other (i.e. when
inspecting things with systemd-dissect tool it should be OK to do so
even if image has no root fs), and it's cleaner anyway.
Let's add a very simple mechanism for doing A/B updating of disk images:
for root + /usr and their verity partitions let's use strverscmp() on the
label to determine which one to use when dissecting a disk image. That
way, if the root partition label contains a string such as "foo-0.15"
and another one "foo-0.16", the latter wins.
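As a small illustration of the comparison, assuming glibc's strverscmp() (a GNU extension); pick_newer_label() is a hypothetical helper, not the dissection code itself:

```c
#define _GNU_SOURCE
#include <string.h>

/* Returns whichever label sorts later by version; on a tie the first one is kept. */
static const char *pick_newer_label(const char *a, const char *b) {
        return strverscmp(a, b) >= 0 ? a : b;
}
```

Unlike strcmp(), this compares digit runs numerically, so "foo-0.10" also wins over "foo-0.9".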
For other partition types let's stick to the logic of "first partition
found wins", as before. Versioning makes sense for partitions that
typically and primarily may carry software packages, but the other
partition types usually don't.
Let's make use of the new dissection in all tools where this makes
sense, which are all tools that dissect images, except for those which
inherently operate on state/configuration and thus where an image
without state nor configuration is useless (e.g.
systemd-tmpfiles/systemd-firstboot/… --image= switch).
Let's add support for images that include an /usr/ file system but no
root fs. Mount a tmpfs as root for images like this, all controlled by a
new flag DISSECT_IMAGE_USR_NO_ROOT.
This is useful for entirely stateless images, that come up pristine on
every single boot.
Previously, when a process output something and exited just after
epoll_wait() but before process_child(), the IO event was ignored
even if the IO event had higher priority. See #18190.
This can be solved by checking epoll events again after process_child().
However, there is a possibility that another process outputs and
exits just after process_child() but before the second epoll_wait().
In that case the IO event is still processed, even when it has lower
priority than the child event.
So, this makes sure new epoll events and child events are checked in a
loop until no new event is detected. To prevent an infinite loop, the
maximum number of iterations is set to 10.
Fixes #18190.
When doing a CNAME/DNAME redirect let's first check if the answer we
already have fully answers the redirected question already. If so, let's
use that. If not, let's properly restart things.
This simply removes one call to dns_answer_reset() that was placed too
early: instead of resetting when we detect a CNAME/DNAME redirect, do so
only after checking if the answer we already have doesn't match the
reply, and then decide to *actually* follow it. Or in other words: rely
on the dns_answer_reset() call in dns_query_go() which we'll call to
actually begin with the redirected question.
This fixes an optimization path which was broken back in 7820b320eaa608748f66f8105621640cf80e483a.
(This doesn't really matter as much as one might think, since our cache
stepped in anyway and answered the questions before going back to the
network. However, this adds noise if RRs with very short TTLs are cached
– which some CDNs do – and is of course relevant when people turn off
the local cache.)
Previously by mistake we'd always match every single reply we get in a
CNAME chain to the original question from the stub client. That's
broken, we need to test it against the CNAME query we are currently
looking at.
The effect of this incorrect matching was that we'd assign the RRs to
the wrong section since we'd assume they'd be auxiliary answers instead
of primary answers.
Fixes: #18972
When responding from DNS cache, let's slightly tweak how the TTL is
lowered: as before let's round down when converting from our internal µs
to the external seconds. (This is preferable, since records should
better be cached too short instead of too long.) Let's avoid rounding
down to zero though, since that has special semantics in many cases (in
particular mDNS). Let's just use 1s in that case.
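The rounding rule can be sketched as follows. ttl_usec_to_sec() is a hypothetical helper, and it assumes the caller only passes in the remaining lifetime of entries that have not expired yet.

```c
#include <stdint.h>

#define USEC_PER_SEC UINT64_C(1000000)

/* Converts a remaining lifetime in µs to a TTL in seconds: round down, but never to 0. */
static uint32_t ttl_usec_to_sec(uint64_t left_usec) {
        uint64_t sec = left_usec / USEC_PER_SEC;

        if (sec == 0)
                return 1;               /* 0 has special semantics, use 1s instead */
        if (sec > UINT32_MAX)
                return UINT32_MAX;      /* clamp to the range of a wire-format TTL */

        return (uint32_t) sec;
}
```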
We nowadays cache full answer RRset combinations instead of just the
exact matching rrset. This means we should not cache RRs that are not
immediate answers to our question for longer than their own RRs. Or in
other words: let's determine the shortest TTL of all RRs in the whole
answer, and use that as cache lifetime.
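In its simplest form this is a minimum over the TTLs of the answer. The sketch below works on a plain array of TTL values; the function name and the array representation are assumptions for illustration, not the resolved cache API.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the smallest TTL among n records, or 0 if there are none. */
static uint32_t answer_min_ttl(const uint32_t *ttls, size_t n) {
        uint32_t m = UINT32_MAX;

        if (n == 0)
                return 0;

        for (size_t i = 0; i < n; i++)
                if (ttls[i] < m)
                        m = ttls[i];

        return m;
}
```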
When using hidepid=invisible on procfs, the kernel will check if the
gid of the process trying to access /proc is the same as the gid of
the process that mounted the /proc instance, or if it has the ptrace
capability:
https://github.com/torvalds/linux/blob/v5.10/fs/proc/base.c#L723
https://github.com/torvalds/linux/blob/v5.10/fs/proc/root.c#L155
Given we set up the /proc instance as root for system services, a
service that itself runs as root passes this check and is not restricted.
The same restriction applies to CAP_SYS_PTRACE: if a process runs with
it then hidepid=invisible has no effect.
ProtectProc effectively can only be used with User= or DynamicUser=yes,
without CAP_SYS_PTRACE.
Update the documentation to explicitly state these limitations.
Fixes #18997
These were added to eficonex.h in gnu-efi 3.0.13. Let's move them
to missing_efi.h behind an appropriate guard to fix the build with
recent versions of gnu-efi.
The integer overflow happens when utf8_encoded_valid_unichar() returns an error
code. The error code is a negative number: -22. This overflows when it is
assigned to `z` (type `size_t`). This can cause an infinite loop if the value
of `q` is 22 or larger.
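The hazard, and one way to avoid it, can be sketched like this. char_len() is a hypothetical stand-in for a decoder that returns a negative error code, and treating an undecodable byte as a single character is an assumption of this sketch, not necessarily what the actual fix does.

```c
#include <stddef.h>

/* Hypothetical decoder: byte length of a character on success, negative value on error. */
static int char_len(unsigned char c) {
        return c < 0x80 ? 1 : -22;
}

/* Returns the new buffer length after erasing the last character of a q-byte buffer. */
static size_t step_back(const unsigned char *buf, size_t q) {
        int r;

        if (q == 0)
                return 0;

        r = char_len(buf[q - 1]);
        if (r < 0)      /* check the sign while the value is still an int ... */
                r = 1;  /* ... because (size_t) -22 would be a huge positive number */

        return q - (size_t) r;
}
```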
To reproduce the bug, you need to run `systemd-ask-password` and enter an
invalid unicode character, followed by a backspace character.
GHSL-2021-052
Since f17bdf8264e231fa31c769bff2475ef698487d0b the test-repart was
effectively disabled, since `/dev/loop-control` is a character special
file, whereas `-f` works only on regular files. Even though we could use
`-c` to check specifically for character special files, let's use `-e`
just in case.
After all we are only interested in symlinks either in top-level config
directory or in .wants and .requires sub-directories.
As a bonus this should speed up ListUnitFiles() roughly 3-4x on systems
with a lot of units that use drop-ins (e.g. SSH jump hosts with a lot of
user session scopes).
Tables with only one column aren't really tables, they are lists. And if
each cell only consists of a single word, they are probably better
written in a single line. Hence, shorten the man page a bit, and list
boot loader spec partition types in a simple sentence.
Also, drop "root-secondary" from the list. When dissecting images we'll
upgrade "root-secondary" to "root" if we mount it, and do so only if
"root" doesn't exist. Hence never mention "root-secondary" as we never
will mount a partition under that id.
This makes sure nspawn's --volatile=yes switch works again: there we
have a read-only image that is overmounted by a tmpfs (with the
exception of /usr). Thus we need to mkdir all mount points even though
the image is read-only.
Hence, let's drop the optimization of avoiding mkdir() on images that are
read-only, it's wrong and misleading here, since the image itself might
be read-only but our mounts are not.
Previously handling of DISSECT_IMAGE_MKDIR was pretty weird and broken:
it would control both if we create the top-level mount point when
mounting an image, and the inner mount points for images that consist of
multiple file systems. However, the latter is redundant, since
1f0f82f1311e4c52152b8e2b6f266258709c137d does this too, a few lines
further up – unconditionally!
Hence, let's make the meaning of DISSECT_IMAGE_MKDIR more strict: it
shall be only about the top-level mount point, not about the inner ones
(where we'll continue to create what is missing anyway). Having a
separate flag for the top-level mount point is relevant, since the mount
point dir created by it will remain on the host fs – unlike the
directories we create inside the image, which will stay within the
image.
This slight change of meaning is in line with what the flag is
actually used for and how it is documented in systemd-dissect.
Apart from tests, the new argument isn't used anywhere, so there should be no
functional change. Note that the two arms of the big conditional are switched, so the
diff is artificially inflated. The actual code change is rather small. I dropped the
path which extracts ret_value manually, because it wasn't supporting unescaping of the
escape character properly.
… when not used to escape the separator (,) or the escape character (\).
This mostly restores behaviour from before 0645b83a40d1c782f173c4d8440ab2fc82a75006,
but still allows "," to be escaped.
Partially fixes #18952.
With EXTRACT_UNESCAPE_SEPARATORS, backslash is used to escape the separator.
But it wasn't possible to insert the backslash itself. Let's allow this and
add a test.
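A minimal sketch of this escaping rule, under the assumption that any backslash not followed by the separator or another backslash is copied verbatim. extract_field() is a hypothetical helper and is far simpler than extract_first_word().

```c
#include <stddef.h>

/* Copies the first field of `in` into `out` (n bytes), stopping at an unescaped `sep`.
 * "\<sep>" and "\\" are unescaped; any other backslash is copied verbatim.
 * Returns a pointer to the rest of the input (past the separator), or NULL if out is too small. */
static const char *extract_field(const char *in, char sep, char *out, size_t n) {
        size_t k = 0;

        if (n == 0)
                return NULL;

        for (; *in && *in != sep; in++) {
                if (*in == '\\' && (in[1] == sep || in[1] == '\\'))
                        in++;   /* drop the escape character, copy the escaped one below */
                if (k + 1 >= n)
                        return NULL;
                out[k++] = *in;
        }

        out[k] = '\0';
        return *in == sep ? in + 1 : in;
}
```

With ',' as separator, the input `a\,b\\c\d,next` yields the field `a,b\c\d` and leaves `next` as the remainder.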
A test for stripping of escaped backslashes without any flags was explicitly
added back in 4034a06ddb82ec9868cd52496fef2f5faa25575f. So it seems to be on
purpose, though I would say that this is at least surprising and hence deserves
a comment.
In test-extract-word, add tests for standalone EXTRACT_UNESCAPE_SEPARATORS.
Only behaviour combined with EXTRACT_CUNESCAPE was tested.
In the conversion from strv_split() to strv_split_full() done in
7bb553bb98a57b4e03804f8192bdc5a534325582, EXTRACT_DONT_COALESCE_SEPARATORS was
added. I think this was just by mistake… We never look for "empty options", so
whether we immediately ignore the extra separator or store the empty string in
strv, should make no difference.
When we failed to split the options (because of disallowed quoting syntax, which
might be a bug of its own), we would silently fail. Instead, let's emit a warning.
Since we ignore the value if we cannot parse it anyway, let's ignore this error
too.
Closes #18669.
This creates a "well known" group for sgx_enclave ownership. By doing this here we
avoid the risk that various projects making use of the device will provide
similar-but-slightly-incompatible installation instructions, in particular
using different group names.
ACLs are actually a better approach to grant access to users, but not in all
cases, so we want to provide a standard group anyway.
Mode is 0o660, not 0o666 because this is very new code and distributions are
likely to not want to give full access to all users. This might change in the
future, but being conservative is a good default in the beginning.
Rules for /dev/sgx_provision will be provided by libsg-ae-pce:
https://github.com/intel/linux-sgx/issues/678.
Skip printing the coredump info table when using the `debug` verb in
combination with the `-q/--quiet` option. Useful when trying to gather
coredump info non-interactively via scripted gdb commands.
Fixes: systemd/systemd#18935
otherwise udev complains about the file being world-writable:
systemd-udevd[228]: Configuration file /etc/udev/rules.d/00-set-LD_PRELOAD.rules is marked world-writable. Please remove world writability permission bits. Proceeding anyway.
Fixes: systemd/systemd-centos-ci#354
This reverts commit 876c75fe870846b09b54423a6b719d80bc879b27.
The patch seems to cause usb devices to get some attributes set from the parent
PCI device. 'hwdb' builtin has support for breaking iteration upwards on usb
devices. But when '--subsystem=foo' is specified, iteration is continued. I'm
sure it *could* be figured out, but it seems hard to get all the combinations
correct. So let's revert to functional status quo ante, even if does the lookup
more than once unnecessarily.
Fixes #18125.
When running TEST-22 under ASan, there's a chain of events which causes
`stat` to output an extraneous ASan error message, causing the following
failure:
```
+ test -d /tmp/d/1
++ stat -c %U:%G:%a /tmp/d/1
==82==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.
+ test = daemon:daemon:755
.//usr/lib/systemd/tests/testdata/units/testsuite-22.02.sh: line 24: test: =: unary operator expected
```
This is caused by `stat` calling nss which in Arch's configuration calls
the nss-systemd module, that pulls in libasan which causes the $LD_PRELOAD
error message, since `stat` is an uninstrumented binary.
The $LD_PRELOAD variable is explicitly unset for all testsuite-* services
since it causes various issues when calling uninstrumented libraries, so
setting it globally is not an option. Another option would be to set
$LD_PRELOAD for each `stat` call, but that would unnecessarily clutter
the test code.
So Linux has this (insane — in my opinion) "feature" that if you name a
network interface "foo%d" then it will automatically look for the
interface starting with "foo…" with the lowest number that is not used
yet and allocates that.
We should never clash with this "magic" handling of ifnames, hence
refuse this, since otherwise we never know what the name is we end up
with.
We should probably switch things from a deny list to an allow list
sooner or later and be much stricter. Since the kernel directly enforces
only very few rules on the names, we'd need to do some research what is
safe and what is not first, though.
We use 'unsigned' as the type, but netlink(7) says the type is 'int'.
It doesn't really matter, since they are both the same size. Let's use
our helper to shorten the code a bit.
Without this, privilege escalation through polkit does not work, because all
methods fail with permission errors.
Forgotten in 8885fed4e3a52cf1bf105e42043203c485ed9d92.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1933335.
PID1 already logs about the service being started, so this line isn't necessary
in normal use. Also, by the time it is emitted, the service has already
signalled readiness, so let's not say "starting" but "started".
Fixes #18025, https://bugzilla.redhat.com/show_bug.cgi?id=1931034.
We drop the reference stored in Manager.managed_oom_varlink_request in two code paths:
vl_disconnect() which is installed as a disconnect callback, and in manager_varlink_done().
But we also make a disconnect from manager_varlink_done(). So we end up with the following
call stack:
(gdb) bt
0 vl_disconnect (s=0x112c7b0, link=0xea0070, userdata=0xe9bcc0) at ../src/core/core-varlink.c:414
1 0x00007f1366e9d5ac in varlink_detach_server (v=0xea0070) at ../src/shared/varlink.c:1210
2 0x00007f1366e9d664 in varlink_close (v=0xea0070) at ../src/shared/varlink.c:1228
3 0x00007f1366e9d6b5 in varlink_close_unref (v=0xea0070) at ../src/shared/varlink.c:1240
4 0x0000000000524629 in manager_varlink_done (m=0xe9bcc0) at ../src/core/core-varlink.c:479
5 0x000000000048ef7b in manager_free (m=0xe9bcc0) at ../src/core/manager.c:1357
6 0x000000000042602c in main (argc=5, argv=0x7fff439c43d8) at ../src/core/main.c:2909
When we enter vl_disconnect(), m->managed_oom_varlink_request.n_ref==1.
When we exit from vl_disconnect(), m->managed_oom_varlink_request==NULL. But
varlink_close_unref() has a copy of the pointer in *v. When we continue executing
varlink_close_unref(), this pointer is dangling, and the call to varlink_unref()
is done with an invalid pointer.
Reading file '/usr/lib/systemd/ntp-units.d/80-systemd-timesync.list'
Failed to add NTP service "# This file is part of systemd.", ignoring: Invalid argument
Failed to add NTP service "# See systemd-timedated.service(8) for more information.", ignoring: Invalid argument
:(
Previously we'd just check if the ID was non-empty and no longer than
FILENAME_MAX. The latter was probably a mistake, given the comment next
to it. Instead of fixing that to check for NAME_MAX let's instead just
switch over to filename_is_valid() which does a similar check, plus
some minor additional checks. After all we do want valid EFI boot
menu entry ids to be usable as filenames.
This fixes two checks where we compare string sizes when validating with
FILENAME_MAX. In both cases the check apparently wants to check if the
name fits in a filename, but that's not actually what FILENAME_MAX can
be used for, as it — in contrast to what the name suggests — actually
encodes the maximum length of a path.
In both cases the stricter change doesn't actually change much, but the
use of FILENAME_MAX is still misleading and typically wrong.
Previously, if the hashmap was an allow-list and a new deny-listed syscall
was added, seccomp_parse_syscall_filter() simply dropped the new syscall
from the hashmap even if an error number was specified.
This makes 'allow-list' hashmap store two types of entries:
- allow-listed syscalls, which are stored with negative value (-1).
- deny-listed syscalls, which are stored with specified errno.
Fixes #18916.
parse_syscall_and_errno() does not check the validity of the syscall name or
syscall group name, it just splits its input into syscall name and errno.
So, it is not necessary to call it for SystemCallLog=.
Shouldn't make any difference, but let's first flush any pending messages, then
unref the reference-counted stuff, and only at the end do the direct free calls.
C.f. 9793530228.
We'd crash when trying to access an already-deallocated object:
Thread no. 1 (7 frames)
#2 log_assert_failed_realm at ../src/basic/log.c:844
#3 event_inotify_data_drop at ../src/libsystemd/sd-event/sd-event.c:3035
#4 source_dispatch at ../src/libsystemd/sd-event/sd-event.c:3250
#5 sd_event_dispatch at ../src/libsystemd/sd-event/sd-event.c:3631
#6 sd_event_run at ../src/libsystemd/sd-event/sd-event.c:3689
#7 sd_event_loop at ../src/libsystemd/sd-event/sd-event.c:3711
#8 run at ../src/home/homed.c:47
The source in question is an inotify source, and the messages are:
systemd-homed[1340]: /home/ moved or renamed, recreating watch and rescanning.
systemd-homed[1340]: Assertion '*_head == _item' failed at src/libsystemd/sd-event/sd-event.c:3035, function event_inotify_data_drop(). Aborting.
on_home_inotify() got called, then manager_watch_home(), which unrefs the
existing inotify_event_source. I assume that the source gets dispatched again
because it was still in the pending queue.
I can't reproduce the issue (timing?), but this should
fix #17824, https://bugzilla.redhat.com/show_bug.cgi?id=1899264.
Dell's new Privacy feature provides hardware-level privacy protection
for users.
This patch remaps scancode 0x120001 to key code F20 (micmute).
The old match string does not cover some other Dell products that have
the privacy feature, so expand it to all systems that can load the
privacy driver. The privacy driver already detects whether the system
supports this feature, so here we can safely just map scancode
0x120001 to the micmute key.
Signed-off-by: Perry Yuan <perry_yuan@dell.com>
This test would normally get stuck when trying to mount the verity image
due to:
systemd-udevd[299]: dm-0: '/usr/sbin/dmsetup udevflags 6293812'(err) '==371==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.'
systemd-udevd[299]: dm-0: Process '/usr/sbin/dmsetup udevflags 6293812' failed with exit code 1
...
systemd-udevd[299]: dm-0: '/usr/sbin/dmsetup udevcomplete 6293812'(err) '==372==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.'
systemd-udevd[299]: dm-0: Process '/usr/sbin/dmsetup udevcomplete 6293812' failed with exit code 1.
systemd-udevd[299]: dm-0: Command "/usr/sbin/dmsetup udevcomplete 6293812" returned 1 (error), ignoring.
so let's add a simple udev rule which sets $LD_PRELOAD for the block
subsystem.
Also, install the ASan library along with necessary dependencies into
the verity minimal image, to get rid of the annoying (yet harmless)
errors about missing library from $LD_LIBRARY.
If we synthesize a stub reply from multiple upstream packets (i.e. a
series of CNAME/DNAME redirects), it might happen that we add the same
RR to a different reply section at a different CNAME/DNAME redirect
chain element. Let's clean this up once we are about to send the reply
message to the client: let's remove RRs from "lower-priority"
sections when they are already listed in a "higher-priority" section.
In 2f4d8e577ca7bc51fb054b8c2c8dd57c2e188a41 I argued that following
CNAMEs in the stub is not necessary anymore. However, I think it's better
to revert to the status quo ante and follow it after all, given it is
easy for us and makes sure our D-Bus/varlink replies are more similar to
our DNS stub replies that way, and we save clients potential roundtrips.
Hence, whenever we hit a CNAME/DNAME redirect, let's restart the query
like we do for the D-Bus/Varlink case, and collect replies as we go.
www.netflix.com responds with a chain of CNAMEs in the same packet.
Let's handle that properly (so far we only followed CNAMEs a single step
when in the same packet)
Fixes: #18819
Let's refuse to consider CNAME/DNAME replies matching for RR types where
that is not really conceptually allowed (i.e. on CNAME/DNAME lookups
themselves).
(And add a similar check to dns_resource_key_match_cname_or_dname() too,
which implements a similar match)
Also use structured initialization in one more place, use '\0' for NUL bytes,
and move variable to the right block (the code was OK, but it is strange to
have 'char *value' defined in a different scope than 'size_t value_allocated').
The fuzzer seems to have no trouble with this sample. It seems that the
problem reported in the bug is not caused by the match parsing code. But
let's add the sample just in case.
https://bugzilla.redhat.com/show_bug.cgi?id=1935084
This fuzzer is based on test-bus-match. Even the initial corpus is
derived entirely from it.
https://bugzilla.redhat.com/show_bug.cgi?id=1935084 shows a crash
in bus_match_parse(). I checked the coverage stats on oss-fuzz, and
sadly existing fuzzing did not cover this code at all.
I'm getting the following error under valgrind:
==305970== Invalid free() / delete / delete[] / realloc()
==305970== at 0x483E9F1: free (vg_replace_malloc.c:538)
==305970== by 0x4012CD: mfree (alloc-util.h:48)
==305970== by 0x4012EF: freep (alloc-util.h:83)
==305970== by 0x4017F4: LLVMFuzzerTestOneInput (fuzz-bus-match.c:58)
==305970== by 0x401A58: main (fuzz-main.c:39)
==305970== Address 0x59972f0 is 0 bytes inside a block of size 8,192 free'd
==305970== at 0x483FCE4: realloc (vg_replace_malloc.c:834)
==305970== by 0x4C986F7: _IO_mem_finish (in /usr/lib64/libc-2.33.so)
==305970== by 0x4C8F5E0: fclose@@GLIBC_2.2.5 (in /usr/lib64/libc-2.33.so)
==305970== by 0x49D2CDB: fclose_nointr (fd-util.c:108)
==305970== by 0x49D2D3D: safe_fclose (fd-util.c:124)
==305970== by 0x4A4BCCC: fclosep (fd-util.h:41)
==305970== by 0x4A4E00F: bus_match_to_string (bus-match.c:859)
==305970== by 0x4016C2: LLVMFuzzerTestOneInput (fuzz-bus-match.c:58)
==305970== by 0x401A58: main (fuzz-main.c:39)
==305970== Block was alloc'd at
==305970== at 0x483FAE5: calloc (vg_replace_malloc.c:760)
==305970== by 0x4C98787: open_memstream (in /usr/lib64/libc-2.33.so)
==305970== by 0x49D56D6: open_memstream_unlocked (fileio.c:97)
==305970== by 0x4A4DEC5: bus_match_to_string (bus-match.c:859)
==305970== by 0x4016C2: LLVMFuzzerTestOneInput (fuzz-bus-match.c:58)
==305970== by 0x401A58: main (fuzz-main.c:39)
==305970==
So the fclose() which is called from _cleanup_fclose_ clearly reallocates the
buffer (maybe to save memory?). open_memstream(3) says:
The locations referred to by these pointers are updated each time the
stream is flushed (fflush(3)) and when the stream is closed (fclose(3)).
This seems to mean that we should close the stream first before grabbing the
buffer pointer.
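A minimal sketch of the safe ordering with open_memstream(3): close the stream first, then use the buffer. format_pair() is a hypothetical helper, not the bus-match code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Formats "key=value" into a newly allocated string; the caller frees the result. */
static char *format_pair(const char *key, const char *value) {
        char *buf = NULL;
        size_t size = 0;
        FILE *f;

        f = open_memstream(&buf, &size);
        if (!f)
                return NULL;

        fprintf(f, "%s=%s", key, value);

        /* Close first: fclose() may reallocate the buffer, and only afterwards are
         * buf and size final. A pointer grabbed before this point can be stale. */
        if (fclose(f) != 0) {
                free(buf);
                return NULL;
        }

        return buf;
}
```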
In PR #17431 we introduced a retry loop in link_update() in order to
maximize the chance that we end up with the correct target when there are
multiple contenders for a given symlink.
The number of iterations in the retry loop is either 1 or
LINK_UPDATE_MAX_RETRIES, depending on the value of the 'initialized' db
flag. When a device appears for the first time we need to set the
flag before calling link_update() via update_devnode() for the second
time to make sure we run the second invocation with the higher retry loop
counter.
When running integration tests under sanitizers D-Bus fails to
shutdown cleanly, causing unnecessary noise in the logs:
```
dbus-daemon[272]: ==272==LeakSanitizer has encountered a fatal error.
dbus-daemon[272]: ==272==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
dbus-daemon[272]: ==272==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)
```
Since we're not "sanitizing" D-Bus anyway let's disable LSan's at_exit
check for the dbus.service to get rid of this error.
NULLSTR_FOREACH expects two terminating NULs, but the joined string
for extension-release.d only had the canonical one.
Use a placeholder when joining and fix it manually.
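The double-NUL requirement can be seen in a hand-written iteration over such a list. This is a sketch of the convention only; nulstr_count() is a hypothetical helper and the NULSTR_FOREACH macro itself is not reproduced.

```c
#include <stddef.h>
#include <string.h>

/* Counts the entries of a NUL-separated list. The list must end with an empty
 * entry, i.e. the last item's NUL is followed by one more NUL. */
static size_t nulstr_count(const char *s) {
        size_t n = 0;

        for (const char *p = s; *p; p += strlen(p) + 1)
                n++;

        return n;
}
```

A C literal such as "a\0bc\0" satisfies this because the compiler appends the second NUL. A string built by joining at runtime only carries one, so the loop would read past the end of the buffer.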
Given these files are part of procfs, let's use the correct API calls
for reading them.
This changes one occasion of read_one_line_file() to
read_full_virtual_file(), which superficially is a different thing, but
shouldn't actually be a difference, since sysctls can't be longer than
4K anyway, and the piecemeal logic behind read_one_line_file() cannot
work with the special semantics of procfs anyway.
Back in v232 systemd-shutdown would log to /dev/console. However after
the addition of always_reopen_console (v233) it would log to STDERR.
This caused some debugging issues as container systemd-shutdown logs
weren't being logged to console as the arg `--log-target=console` suggested.
Since it appears that always_reopen_console was intended for pid1, set
it in systemd-shutdown as well so logs will go to /dev/console.
It's so similar to copy_access(), hence let's move it over and rename it
in similar style to the rest of the functions.
No change in behaviour, just moving things over.
This reverts behaviour of systemd-run's unit name generation to the
status quo ante of #18871: we chop off the ":1." prefix if we can.
However, to address the issue that the unique name can overrun we then
do what #18871 did as fallback: only chop off the ":" prefix.
This way we should have pretty names that look like they always looked
in the common case, but in the case of a unique name overrun we still
will have names that work.
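The two-step prefix handling can be sketched as below. unique_name_suffix() is a hypothetical helper that only shows the chopping, not how systemd-run builds the final unit name.

```c
#include <stddef.h>
#include <string.h>

/* Returns the part of a unique bus name to embed in a unit name,
 * or NULL if the name does not start with ':'. */
static const char *unique_name_suffix(const char *unique) {
        if (strncmp(unique, ":1.", 3) == 0)
                return unique + 3;      /* common case: keep the short, familiar form */

        if (unique[0] == ':')
                return unique + 1;      /* fallback: the first counter has moved past 1 */

        return NULL;
}
```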
Follow-up for #18871
REMOVE_CHMOD is necessary to remove files/dirs that are owned by us but
have an access mode that would not allow us to remove them. In generic
destructor calls for use with `_cleanup_` that are "fire-and-forget"
style we should make use of that, to maximize the chance we can actually
remove the files/dirs.
(Also, add in REMOVE_MISSING_OK. This is mostly cosmetic since we ignore the
return codes anyway, but it's a bit nicer to ignore a bit fewer errors.)
In https://bugzilla.redhat.com/show_bug.cgi?id=1933873 a keymap was set without
the package that provides it being installed (it2 is in kbd-legacy, which is
not installed by default). Setting a non-installed keymap is problematic,
because it results in nasty failures afterward (*). So let's do the same as
e.g. for locale data, and refuse a setting if the definition doesn't exist in
the filesystem.
The implementation using nftw() is not the most efficient, but I think it's OK
in this case. This is definitely not in any kind of hot path, and I prefer not
to duplicate the filename manipulation logic in a second function.
(*) If the keymap is not found, vconsole-setup.service will fail.
dracut-cmdline-ask.service has Requires=vconsole-setup.service, so it will also
fail, and this breaks boot. dracut-cmdline-ask.service having a hard dependency
is appropriate though: we sadly don't display what the keymap is, and with a wrong
keymap, any attempts to enter a password are likely to fail.
We would return a real error sometimes from the callback, and FTW_STOP other
times. Because of FTW_ACTIONRETVAL, everything except FTW_STOP would be
ignored. I don't think using FTW_ACTIONRETVAL is useful.
nftw() can only be used meaningfully with errno. Even if we return a proper
value ourselves from the callback, it will be propagated as a return value from
nftw(), but there is no way to distinguish this from a value generated by
nftw() itself, which is -1/-EPERM on error. So let's set errno ourselves so the
caller can at least look at that.
The code still ignores all errors.
Some code in systemd-run checks that a bus's unique name must start with
`:1.`. However the dbus specification on unique connection names only specifies
that it must begin with a colon. And the freedesktop/dbus implementation
allows unique names to go up to `:INT_MAX.INT_MAX`. So update the
current check to only look for a colon at the beginning.
Nominally, the bug was in unit_load_dropin(), which just took the last mtime
instead of calculating the maximum. But instead of adding code to wrap the
loop, this patch goes in the other direction.
All (correct) callers of config_parse() followed a very similar pattern to
calculate the maximum mtime. So let's simplify things by making config_parse()
assume that mtime is initialized and update it to the maximum. This makes all
the callers that care about mtime simpler and also fixes the issue in
unit_load_dropin().
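The difference between "last mtime" and "maximum mtime" comes down to a helper along these lines. mtime_update_max() is a hypothetical name used for illustration; the real change lives inside config_parse().

```c
#include <stdint.h>

/* Folds one file's mtime into a running maximum. The caller must have
 * initialized *mtime (e.g. to 0) before the first call. */
static void mtime_update_max(uint64_t *mtime, uint64_t file_mtime) {
        if (file_mtime > *mtime)
                *mtime = file_mtime;
}
```

Calling this once per parsed file keeps the newest timestamp even if an older drop-in happens to be parsed last.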
config_parse_many_nulstr() and config_parse_many() are different, because it
makes sense to call them just once, and the current ret_mtime behaviour makes sense.
Fixes #17730, https://bugzilla.redhat.com/show_bug.cgi?id=1933137.
70-uaccess.rules sets the uaccess tag on devices with ID_SMARTCARD_READER
set, but that property is only set later, in 99-systemd.rules.
Move this to a 60-*.rules which already matches USB CCID class, factorising
the matching, so 70-uaccess.rules sets up these devices as expected.
It's useful to be able to combine a regular /usr/ file system with a
tmpfs as root, for an OS that boots up in volatile mode on every single
boot. Let's add explicit support for this via root=tmpfs.
Note the relationship to the existing systemd.volatile= option:
1. The kernel command line "root=/dev/… systemd.volatile=yes" will mount
the specified root fs, and then hide everything at the top by
overmounting it with a tmpfs, except for the /usr subtree.
2. The kernel command line "root=tmpfs mount.usr=/dev/…" otoh will mount
a root fs at the top (just like the case above), but will then mount
the top-level dir of the fs specified in mount.usr= directly below
it.
Or to say this differently: in the first case /usr/ from the physical
storage fs is going to become /usr/ of the hierarchy ultimately booted,
while in the second case / from the physical storage fs is going to
become /usr of the hierarchy booted.
Philosophically I figure systemd.volatile= is more an option for
"one-off" boots, while root=tmpfs is something to have as default mode
of operation for suitable images.
This is currently hard to test reasonably, since Dracut refuses to
accept root=tmpfs. This needs to be addressed separately though.
Let's fine-tune the path_extract_filename() interface: on success return
O_DIRECTORY as indicator that the input path was slash-suffixed, and
regular 0 otherwise. This helps since in many cases it is useful to
filter out paths that must refer to dirs early on.
I opted for O_DIRECTORY instead of the following other ideas:
1. return -EISDIR: I think the function should return an extracted
filename even when referring to an obvious dir, so this is not an
option.
2. S_ISDIR, this was a strong contender, but I think O_DIRECTORY is a
tiny bit nicer since quite likely we will go on and open the thing,
maybe with openat(), and hence it's quite nice to be able to OR in
the return value into the flags argument of openat().
3. A new enum defined with two values "dont-know" and
"definitely-directory". But I figured this was unnecessary, given we
have other options too, that reuse existing definitions for very
similar purposes.
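To illustrate the return convention, here is a simplified sketch. It is a hypothetical re-implementation for illustration only; the real path_extract_filename() distinguishes more special cases and error codes:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical variant: returns O_DIRECTORY if the path had a
 * trailing slash, 0 otherwise, and a negative errno-style value on failure. */
static int extract_filename_sketch(const char *path, char **ret) {
        const char *end, *start;
        char *copy;

        assert(path);
        assert(ret);

        /* Skip trailing slashes */
        end = path + strlen(path);
        while (end > path && end[-1] == '/')
                end--;

        /* "" or a path made only of slashes: there is no filename to extract */
        if (end == path)
                return -EINVAL;

        /* Walk back to the start of the last component */
        start = end;
        while (start > path && start[-1] != '/')
                start--;

        copy = strndup(start, (size_t) (end - start));
        if (!copy)
                return -ENOMEM;

        *ret = copy;
        return *end == '/' ? O_DIRECTORY : 0;
}
```

Because the success value is either 0 or O_DIRECTORY, a caller can OR it straight into the flags of a later openat() call, e.g. `openat(dir_fd, name, O_RDONLY|O_CLOEXEC|r)`.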
These two together are a lot like dirname() + basename() but have the
benefit that they return clear errors when one passes a special case
path to them where the extraction doesn't make sense, i.e. "", "/",
"foo", "foo/" and so on.
Sooner or later we should probably port all our uses of
dirname()/basename() over to this, to catch these special cases more
safely.
Add the i2c subsystem to those that create by-path links.
i2c devices may not have IDs so we can't rely on the by-id links
but they (or some of them) should at least have a path that we can use.
BPF filtering accesses fields in the netlink header that are
only filled in by libudev, never by the kernel. Therefore adding
BPF filters for kernel monitors is pointless. Even false filtering
of kernel events might be possible; at least it's hard to prove that
it can't occur.
We generally operate on the assumption that a source is "gone" as soon
as we unref it. This is generally true because we have the only reference.
But if something else holds the reference, our unref doesn't really stop
the source and it could fire again.
In particular, on_query_timeout() is called with DnsQuery* as userdata, and
it calls dns_query_stop() which invalidates that pointer. If it was ever
called again, we'd be accessing already-freed memory.
I don't see what would hold the reference. sd-event takes a temporary reference,
but on the sd_event object, not on the individual sources. And our sources
are non-floating, so there is no reference from the sd_event object to the
sources.
For #18427.
This got moved under the systemd umbrella a long time ago.
Github redirects from the old path, so the link worked, but it's
nicer to use the real location.
PID 1 will now check upfront which v1 controller hierarchies are
available and modifiable and therefore it will not attempt to touch
them. If we get an EROFS failure then, it points to another
inconsistency so we will report it again. The revert also simplifies the
code a bit.
The systemd user instance assumed that the same controllers are available to it
as to PID 1. That is not true in general: in v1 (legacy, hybrid) we don't delegate
any controllers to anyone, and in v2 (unified) we may delegate only a subset of
controllers.
The user instance would fail silently when the controller cgroup cannot
be created or the controller cannot be enabled on the unified hierarchy.
The changes in 7b63961415 ("cgroup: Swap cgroup v1 deletion and
migration") caused some attempts of operating on non-delegated
controllers to be logged.
Make the user instance first check what controllers are available to it
and narrow operations only to these controllers. The original checks are
kept in place.
Note that daemon-reexec needs to be invoked in order to update the set
of enabled controllers after a change.
Fixes: #18047
Fixes: #17862
The function controller_is_accessible() doesn't really do much in case
of the unified hierarchy. Move common parts into cg_get_path_and_check
and make controller check v1 specific. This is refactoring only.
2021-02-11 11:51:59 +01:00
1613 changed files with 75403 additions and 30826 deletions
echo "deb http://archive.ubuntu.com/ubuntu $UBUNTU_RELEASE-backports main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/backports.list
@ -26,12 +26,14 @@ Information about build requirements is provided in the [README file](README).
Consult our [NEWS file](NEWS) for information about what's new in the most recent systemd versions.
Please see the [Code Map](docs/ARCHITECTURE.md) for information about this repository's layout and content.
Please see the [Hacking guide](docs/HACKING.md) for information on how to hack on systemd and test your modifications.
Please see our [Contribution Guidelines](docs/CONTRIBUTING.md) for more information about filing GitHub Issues and posting GitHub Pull Requests.
When preparing patches for systemd, please follow our [Coding Style Guidelines](docs/CODING_STYLE.md).
If you are looking for support, please contact our [mailing list](https://lists.freedesktop.org/mailman/listinfo/systemd-devel) or join our [IRC channel](irc://irc.freenode.org/%23systemd).
If you are looking for support, please contact our [mailing list](https://lists.freedesktop.org/mailman/listinfo/systemd-devel) or join our [IRC channel](irc://irc.libera.chat/%23systemd).
Stable branches with backported patches are available in the [stable repo](https://github.com/systemd/systemd-stable).
a seccomp option we don't have to set NNP. For that, change uid first while
keeping CAP_SYS_ADMIN, then apply seccomp, then drop cap.
* add a concept for automatically loading per-unit secrets off disk and
inserting them into the kernel keyring. Maybe SecretsDirectory= similar to
ConfigurationDirectory=.
* when no locale is configured, default to UEFI's PlatformLang variable
* bootctl,sd-boot: actually honour the "architecture" key
@ -565,13 +614,6 @@ Features:
output of "systemctl list-units" slightly by showing the tree structure of
the slices, and the units attached to them.
* the a-posteriori stopping of units bound to units that disappeared logic
should be reworked: there should be a queue of units, and we should only
enqueue stop jobs from a defer event that processes queue instead of
right-away when we find a unit that is bound to one that doesn't exist
anymore. (similar to how the stop-unneeded queue has been reworked the same
way)
* nspawn: make nspawn suitable for shell pipelines: instead of triggering a
hangup when input is finished, send ^D, which synthesizes an EOF. Then wait
for hangup or ^D before passing on the EOF.
@ -599,8 +641,6 @@ Features:
* add support for "portablectl attach http://foobar.com/waaa.raw (i.e. importd integration)
* add attach --enable and attach --now (for attach+enable+start)
* sync dynamic uids/gids between host+portable service (i.e. if DynamicUser=1 is set for a service, make sure that the
selected user is resolvable in the service even if it ships its own /etc/passwd)
@ -643,9 +683,6 @@ Features:
* add proper dbus APIs for the various sd_notify() commands, such as MAINPID=1
and so on, which would mean we could report errors and such.
* teach tmpfiles.d q/Q logic something sensible in the context of XFS/ext4
project quota
* introduce DefaultSlice= or so in system.conf that allows changing where we
place our units by default, i.e. change system.slice to something
else. Similar, ManagerSlice= should exist so that PID1's own scope unit could
@ -762,10 +799,6 @@ Features:
"systemd-gdb" for attaching to the start-up of any system service in its
natural habitat.
* gpt-auto logic: related to the above, maybe support a "secondary" root
partition, that is mounted to / and is writable, and where the actual root's
/usr is mounted into.
* gpt-auto logic: support encrypted swap, add kernel cmdline option to force it, and honour a gpt bit about it, plus maybe a configuration file
* drop nss-myhostname in favour of nss-resolve?
@ -798,13 +831,13 @@ Features:
on PID 1 with the relevant signals, and makes relevant files in /sys and
/proc (such as the sysrq stuff) unavailable
* Support ReadWritePaths/ReadOnlyPaths/InaccessiblePaths in systemd --user instances
via the new unprivileged Landlock LSM (https://landlock.io)
* make sure the ratelimit object can deal with USEC_INFINITY as way to turn off things
* journalctl: make sure -f ends when the container indicated by -M terminates
* mount: automatically search for "main" partition of an image if it has multiple
partitions
* in nss-systemd, if we run inside of RootDirectory= with PrivateUsers= set,
find a way to map the User=/Group= of the service to the right name. This way
a user/group for a service only has to exist on the host for the right
@ -852,6 +885,10 @@ Features:
* fstab-generator: default to tmpfs-as-root if only usr= is specified on the kernel cmdline
* initrd-parse-etc.service: can we skip daemon-reload if /sysroot/etc/fstab is missing?
Note that we start initrd-fs.target and initrd-cleanup.target there, so a straightforward
ConditionPathExists= is not enough.
* docs: bring http://www.freedesktop.org/wiki/Software/systemd/MyServiceCantGetRealtime up to date
* add a job mode that will fail if a transaction would mean stopping
@ -890,8 +927,6 @@ Features:
* firstboot: make it useful to be run immediately after yum --installroot to set up a machine. (most specifically, make --copy-root-password work even if /etc/passwd already exists
* maybe add support for specifier expansion in user.conf, specifically DefaultEnvironment=
* maybe allow timer units with an empty Units= setting, so that they
can be used for resuming the system but nothing else.
@ -208,9 +208,9 @@ On EFI, any such images shall be added to the list of valid boot entries.
Note that these configurations snippets do not need to be the only configuration source for a boot loader. It may extend this list of entries with additional items from other configuration files (for example its own native configuration files) or automatically detected other entries without explicit configuration.
To make this explicitly clear: this specification is designed with "free" operating systems in mind, starting Windows or MacOS is out of focus with these configuration snippets, use boot-loader specific solutions for that. In the text above, if we say "OS" we hence imply "free", i.e. primarily Linux (though this could be easily be extended to the BSDs and whatnot).
To make this explicitly clear: this specification is designed with "free" operating systems in mind, starting Windows or macOS is out of focus with these configuration snippets, use boot-loader specific solutions for that. In the text above, if we say "OS" we hence imply "free", i.e. primarily Linux (though this could be easily be extended to the BSDs and whatnot).
Note that all paths used in the configuration snippets use a Unix-style "/" as path separator. This needs to be converted to an EFI-style "\" separator in EFI boot loaders.
Note that all paths used in the configuration snippets use a Unix-style "/" as path separator. This needs to be converted to an EFI-style "\\" separator in EFI boot loaders.
| `8f461b0d-14ee-4e81-9aa9-049b6fb97abd` | _`/usr/` Verity Partition (x86)_ | Any native, optionally in LUKS | Similar semantics to root Verity partition, but just for the `/usr/` partition. |
| `8f461b0d-14ee-4e81-9aa9-049b6fb97abd` | _`/usr/` Verity Partition (x86)_ | A dm-verity superblock followed by hash data | Similar semantics to root Verity partition, but just for the `/usr/` partition. |
| `3b8f8425-20e0-4f3b-907f-1a25a76f98e8` | _Server Data Partition_ | Any native, optionally in LUKS | The first partition with this type UUID on the disk containing the root partition is automatically mounted to `/srv/`. If the partition is encrypted with LUKS, the device mapper file will be named `/dev/mapper/srv`. |
| `4d21b016-b534-45c2-a9fb-5c16e091fd2d` | _Variable Data Partition_ | Any native, optionally in LUKS | The first partition with this type UUID on the disk containing the root partition is automatically mounted to `/var/` — under the condition that its partition UUID matches the first 128 bit of `HMAC-SHA256(machine-id, 0x4d21b016b53445c2a9fb5c16e091fd2d)` (i.e. the SHA256 HMAC hash of the binary type UUID keyed by the machine ID as read from [`/etc/machine-id`](https://www.freedesktop.org/software/systemd/man/machine-id.html). This special requirement is made because `/var/` (unlike the other partition types listed here) is inherently private to a specific installation and cannot possibly be shared between multiple OS installations on the same disk, and thus should be bound to a specific instance of the OS, identified by its machine ID. If the partition is encrypted with LUKS, the device mapper file will be named `/dev/mapper/var`. |
| `7ec6f557-3bc5-4aca-b293-16ef5df639d1` | _Temporary Data Partition_ | Any native, optionally in LUKS | The first partition with this type UUID on the disk containing the root partition is automatically mounted to `/var/tmp/`. If the partition is encrypted with LUKS, the device mapper file will be named `/dev/mapper/tmp`. Note that the intended mount point is indeed `/var/tmp/`, not `/tmp/`. The latter is typically maintained in memory via <tt>tmpfs</tt> and does not require a partition on disk. In some cases it might be desirable to make `/tmp/` persistent too, in which case it is recommended to make it a symlink or bind mount to `/var/tmp/`, thus not requiring its own partition type UUID. |
| `0657fd6d-a4ab-43c4-84e5-0933c84b4f4f` | _Swap_ | Swap | All swap partitions on the disk containing the root partition are automatically enabled. |
| `0657fd6d-a4ab-43c4-84e5-0933c84b4f4f` | _Swap_ | Swap | All swap partitions on the disk containing the root partition are automatically enabled. This partition type predates the Discoverable Partitions Specification. |
| `0fc63daf-8483-4772-8e79-3d69d8477de4` | _Generic Linux Data Partitions_ | Any native, optionally in LUKS | No automatic mounting takes place for other Linux data partitions. This partition type should be used for all partitions that carry Linux file systems. The installer needs to mount them explicitly via entries in <tt>/etc/fstab</tt>. Optionally, these partitions may be encrypted with LUKS. This partition type predates the Discoverable Partitions Specification. |
| `c12a7328-f81f-11d2-ba4b-00a0c93ec93b` | _EFI System Partition_ | VFAT | The ESP used for the current boot is automatically mounted to `/efi/` (or `/boot/` as fallback), unless a different partition is mounted there (possibly via `/etc/fstab`, or because the Extended Boot Loader Partition — see below — exists) or the directory is non-empty on the root disk. This partition type is defined by the [UEFI Specification](http://www.uefi.org/specifications). |
| `bc13c2ff-59e6-4262-a352-b275fd6f7172` | _Extended Boot Loader Partition_ | Typically VFAT | The Extended Boot Loader Partition (XBOOTLDR) used for the current boot is automatically mounted to <tt>/boot/</tt>, unless a different partition is mounted there (possibly via <tt>/etc/fstab</tt>) or the directory is non-empty on the root disk. This partition type is defined by the [Boot Loader Specification](https://systemd.io/BOOT_LOADER_SPECIFICATION). |
| `0fc63daf-8483-4772-8e79-3d69d8477de4` | _Other Data Partitions_ | Any native, optionally in LUKS | No automatic mounting takes place for other Linux data partitions. This partition type should be used for all partitions that carry Linux file systems. The installer needs to mount them explicitly via entries in <tt>/etc/fstab</tt>. Optionally, these partitions may be encrypted with LUKS. |
Other GPT type IDs might be used on Linux, for example to mark software RAID or
LVM partitions. The definitions of those GPT types are outside of the scope of
@ -94,24 +94,48 @@ localized.
## Partition Flags
For the root, `/usr/`, server data, home, variable data, temporary data and swap
partitions, the partition flag bit 63 ("*no-auto*") may be used to turn off
auto-discovery for the specific partition. If set, the partition will not be
automatically mounted or enabled.
This specification defines three GPT partition flags that may be set for the
partition types defined above:
For the root, `/usr/`, server data, home, variable data and temporary data
partitions, the partition flag bit 60 ("*read-only*") may be used to mark a
partition for read-only mounts only. If set, the partition will be mounted
read-only instead of read-write. Note that the variable data partition and the
temporary data partition will generally not be able to serve their purpose if
marked read-only, since by their very definition they are supposed to be
mutable. (The home and server data partitions are generally assumed to be
mutable as well, but the requirement for them is not equally strong.) Because
of that, while the read-only flag is defined and supported, it's almost never a
good idea to actually use it for these partitions.
1. For the root, `/usr/`, Verity, home, server data, variable data, temporary data,
swap and extended boot loader partitions, the partition flag bit 63
("*no-auto*") may be used to turn off auto-discovery for the specific
partition. If set, the partition will not be automatically mounted or
enabled.
Note that these two flag definitions happen to map nicely to the ones used by
Microsoft Basic Data Partitions.
2. For the root, `/usr/`, Verity, home, server data, variable data, temporary
data and extended boot loader partitions, the partition flag bit 60
("*read-only*") may be used to mark a partition for read-only mounts only.
If set, the partition will be mounted read-only instead of read-write. Note
that the variable data partition and the temporary data partition will
generally not be able to serve their purpose if marked read-only, since by
their very definition they are supposed to be mutable. (The home and server
data partitions are generally assumed to be mutable as well, but the
requirement for them is not equally strong.) Because of that, while the
read-only flag is defined and supported, it's almost never a good idea to
actually use it for these partitions. Also note that Verity partitions are
by their semantics always read-only. The flag is hence of little effect for
them, and it is recommended to set it unconditionally for the Verity
partition types.
3. For the root, `/usr/`, home, server data, variable data, temporary data and
extended boot loader partitions, the partition flag bit 59
("*grow-file-system*") may be used to mark a partition for automatic growing
of the contained file system to the size of the partition when
mounted. Tools that automatically mount disk images with a GPT partition
table are suggested to implicitly grow the contained file system to the
partition size they are contained in. This flag is without effect on
partitions marked read-only.
Note that the first two flag definitions happen to map nicely to the ones used
by Microsoft Basic Data Partitions.
All three of these flags generally affect only auto-discovery and automatic
mounting of disk images. If partitions marked with these flags are mounted
using low-level commands like
[mount(8)](https://man7.org/linux/man-pages/man8/mount.8.html) or directly with
[mount(2)](https://man7.org/linux/man-pages/man2/mount.2.html), they typically
have no effect.
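As an illustration of how a tool might interpret these three flags, consider the following sketch. The bit positions are taken from the text above; the macro and function names are made up for this example and are not part of the specification:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as given in the specification text; names are hypothetical. */
#define FLAG_NO_AUTO   (UINT64_C(1) << 63)
#define FLAG_READ_ONLY (UINT64_C(1) << 60)
#define FLAG_GROWFS    (UINT64_C(1) << 59)

static_assert(FLAG_NO_AUTO == UINT64_C(0x8000000000000000), "bit 63 is the top bit of the 64-bit flags field");

/* Auto-discovery is skipped when "no-auto" is set. */
static bool partition_should_automount(uint64_t flags) {
        return !(flags & FLAG_NO_AUTO);
}

/* The partition is mounted read-only when "read-only" is set. */
static bool partition_mount_read_only(uint64_t flags) {
        return !!(flags & FLAG_READ_ONLY);
}

/* "grow-file-system" is without effect on partitions marked read-only. */
static bool partition_should_grow(uint64_t flags) {
        return (flags & FLAG_GROWFS) && !(flags & FLAG_READ_ONLY);
}
```

The last helper encodes the rule that the grow flag is ignored for read-only partitions, so a caller does not have to check both bits separately.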
## Suggested Mode of Operation
@ -162,7 +186,14 @@ partition is listed in `/etc/fstab` or with `root=` on the kernel command line,
it _must_ take precedence over automatically discovered partitions. If a
`/home/`, `/usr/`, `/srv/`, `/boot/`, `/var/`, `/var/tmp/`, `/efi/` or `/boot/`
directory is found to be populated already in the root partition, the automatic
discovery _must not_ mount any discovered file system over it.
discovery _must not_ mount any discovered file system over it. Optionally, in
case of the root, `/usr/` and their Verity partitions instead of strictly
mounting the first suitable partition an OS might choose to mount the partition
whose label compares the highest according to `strverscmp()` or a similar
logic, in order to implement a simple partition-based A/B versioning
scheme. The precise rules are left for the implementation to decide, but when
in doubt earlier partitions (by their index) should always win over later
partitions if the label comparison is inconclusive.
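One possible reading of this A/B selection rule, sketched with glibc's `strverscmp()`. The helper name and the label values in the usage note are hypothetical, and the specification explicitly leaves the precise rules to the implementation:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the index of the partition whose label
 * compares highest according to strverscmp(). A later partition only
 * replaces the current choice if it is strictly newer, so earlier
 * partitions win when the comparison is inconclusive. */
static size_t pick_newest_partition(const char *const labels[], size_t n) {
        size_t best = 0;

        assert(labels);
        assert(n > 0);

        for (size_t i = 1; i < n; i++)
                if (strverscmp(labels[i], labels[best]) > 0)
                        best = i;

        return best;
}
```

For labels such as `root_1.9`, `root_1.10`, `root_1.10` this picks index 1: version-aware comparison ranks 1.10 above 1.9, and the tie between the last two entries goes to the earlier one.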
A *container manager* should automatically discover and mount the root,
`/usr/`, `/home/`, `/srv/`, `/var/`, `/var/tmp/` partitions inside a container
@ -190,11 +221,11 @@ We are not. `/etc/fstab` always overrides automatic discovery and is indeed
mentioned in the specifications. We are simply trying to make the boot and
installation processes of Linux a bit more robust and self-descriptive.
### Why did you only define the root partition for x86, x86-64, ARM, ARM64, ia64?
### Why did you only define the root partition for x86, x86-64, ARM, ARM64, ia64, riscv32, riscv64?
The automatic discovery of the root partition is defined to operate on the disk
containing the current EFI System Partition (ESP). Since EFI only exists on
x86, x86-64, ia64, and ARM so far, we only defined root partition UUIDs for
x86, x86-64, ia64, ARM and RISC-V so far, we only defined root partition UUIDs for
these architectures. Should EFI become more common on other architectures, we
14. [FINAL] Push commits to stable, create an empty -stable branch: `git push systemd-stable origin/master:master origin/master:refs/heads/${version}-stable`, and change the default branch to latest release (https://github.com/systemd/systemd-stable/settings/branches).
15. [FINAL] Push commits to stable, create an empty -stable branch: `git push systemd-stable origin/master:master origin/master:refs/heads/${version}-stable`, and change the default branch to latest release (https://github.com/systemd/systemd-stable/settings/branches).