Compare commits

...

6 Commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek b239392ac8
Merge 9e3d6b193d into 514d9e1665 2024-11-08 13:17:00 +01:00
Franck Bui 514d9e1665 test: install integration-test-setup.sh in testdata/
integration-test-setup.sh is an auxiliary script that tests rely on at
runtime. As such, install the script in testdata/.

Follow-up for af153e36ae.
2024-11-08 12:37:40 +01:00
Lennart Poettering b480a4c15e update TODO 2024-11-08 10:10:11 +01:00
Lennart Poettering af3baf174a fs-util: add comment about XO_NOCOW 2024-11-08 09:21:25 +01:00
Ryan Wilson d8091e1281 Fix PrivatePIDs=yes integration test for kernels with no /proc/scsi 2024-11-08 13:38:35 +09:00
Zbigniew Jędrzejewski-Szmek 9e3d6b193d journal: again create user journals for users with high uids
This effectively reverts a change in 115d5145a2
'journald: move uid_for_system_journal() to uid-alloc-range.h', which slipped
in an additional check of uid_is_container(uid). The problem is that that change
is not backwards-compatible at all and very hard for users to handle.
There is no common agreement on mappings of high-range uids. Systemd declares
ownership of a large range for container uids in https://systemd.io/UIDS-GIDS/,
but this is only a recent change and various sites allocated those ranges
in a different way, in particular FreeIPA uses (used?) uids from this range
for human users. On big sites with lots of users changing uids is obviously a
hard problem. We generally assume that uids cannot be "freed" and/or changed
and/or reused safely, so we shouldn't demand the same from others.

This is somewhat similar to the situation with SYSTEM_ALLOC_UID_MIN /
SYSTEM_UID_MAX, which we tried to define to a fixed value in our code, causing
huge problems for existing systems with were created with a different
definition and couldn't be easily updated. For that case, we added a
configuration time switch and we now parse /etc/login.defs to actually use the
value that is appropriate for the local system.

Unfortunately, login.defs doesn't have a concept of container allocation ranges
(and we don't have code to parse and use those nonexistent names either), so we
can't tell users to adjust logind.defs to work around the changed definition.

login.defs has SUB_UID_{MIN,MAX}, but those aren't really the same thing,
because they are used to define where the add allocations for subuids, which is
generally a much smaller range. Maybe we should talk with other folks about
the appropriate allocation ranges and define some new settings in login.defs.
But this would require discussion and coordination with other projects first.

Actualy, it seems that this change was needed at all. The code in the container
does not log to the outside journal. It talks to its own journald, which does
journal splitting using its internal logic based on shifted uids. So let's
revert the change to fix user systems.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=2251843.
2024-07-11 15:05:11 +02:00
6 changed files with 33 additions and 33 deletions

25
TODO
View File

@ -129,6 +129,10 @@ Deprecations and removals:
Features:
* format-table: introduce new cell type for strings with ansi sequences in
them. display them in regular output mode (via strip_tab_ansi()), but
suppress them in json mode.
* machined: when registering a machine, also take a relative cgroup path,
relative to the machine's unit. This is useful when registering unpriv
machines, as they might sit down the cgroup tree, below a cgroup delegation
@ -217,12 +221,8 @@ Features:
services where mount propagation from the root fs is off, an still have
confext/sysext propagated in.
* support F_DUDFD_QUERY for comparing fds in same_fd (requires kernel 6.10)
* generic interface for varlink for setting log level and stuff that all our daemons can implement
* use pty ioctl to get peer wherever possible (TIOCGPTPEER)
* maybe teach repart.d/ dropins a new setting MakeMountNodes= or so, which is
just like MakeDirectories=, but uses an access mode of 0000 and sets the +i
chattr bit. This is useful as protection against early uses of /var/ or /tmp/
@ -253,8 +253,6 @@ Features:
* initrd: when transitioning from initrd to host, validate that
/lib/modules/`uname -r` exists, refuse otherwise
* tmpfiles: add "owning" flag for lines that limits effect of --purge
* signed bpf loading: to address need for signature verification for bpf
programs when they are loaded, and given the bpf folks don't think this is
realistic in kernel space, maybe add small daemon that facilitates this
@ -458,9 +456,6 @@ Features:
* introduce mntid_t, and make it 64bit, as apparently the kernel switched to
64bit mount ids
* use udev rule networkd ownership property to take ownership of network
interfaces nspawn creates
* mountfsd/nsresourced
- userdb: maybe allow callers to map one uid to their own uid
- bpflsm: allow writes if resulting UID on disk would be userns' owner UID
@ -647,6 +642,7 @@ Features:
- openpt_allocate_in_namespace()
- unit_attach_pid_to_cgroup_via_bus()
- cg_attach() requires new kernel feature
- journald's process cache
* ddi must be listed as block device fstype
@ -1470,9 +1466,6 @@ Features:
* in sd-id128: also parse UUIDs in RFC4122 URN syntax (i.e. chop off urn:uuid: prefix)
* DynamicUser= + StateDirectory= → use uid mapping mounts, too, in order to
make dirs appear under right UID.
* systemd-sysext: optionally, run it in initrd already, before transitioning
into host, to open up possibility for services shipped like that.
@ -1644,14 +1637,6 @@ Features:
* maybe add kernel cmdline params: to force random seed crediting
* introduce a new per-process uuid, similar to the boot id, the machine id, the
invocation id, that is derived from process creds, specifically a hashed
combination of AT_RANDOM + getpid() + the starttime from
/proc/self/status. Then add these ids implicitly when logging. Deriving this
uuid from these three things has the benefit that it can be derived easily
from /proc/$PID/ in a stable, and unique way that changes on both fork() and
exec().
* let's not GC a unit while its ratelimits are still pending
* when killing due to service watchdog timeout maybe detect whether target

View File

@ -1131,6 +1131,8 @@ int xopenat_full(int dir_fd, const char *path, int open_flags, XOpenFlags xopen_
* If O_CREAT is used with XO_LABEL, any created file will be immediately relabelled.
*
* If the path is specified NULL or empty, behaves like fd_reopen().
*
* If XO_NOCOW is specified will turn on the NOCOW btrfs flag on the file, if available.
*/
if (isempty(path)) {

View File

@ -127,5 +127,5 @@ bool uid_for_system_journal(uid_t uid) {
/* Returns true if the specified UID shall get its data stored in the system journal. */
return uid_is_system(uid) || uid_is_dynamic(uid) || uid == UID_NOBODY || uid_is_container(uid);
return uid_is_system(uid) || uid_is_dynamic(uid) || uid == UID_NOBODY;
}

View File

@ -142,11 +142,13 @@ endif
############################################################
if install_tests
foreach script : ['integration-test-setup.sh', 'run-unit-tests.py']
install_data(script,
install_mode : 'rwxr-xr-x',
install_dir : testsdir)
endforeach
install_data('run-unit-tests.py',
install_mode : 'rwxr-xr-x',
install_dir : testsdir)
install_data('integration-test-setup.sh',
install_mode : 'rwxr-xr-x',
install_dir : testdata_dir)
endif
############################################################

View File

@ -7,9 +7,9 @@ Before=getty-pre.target
[Service]
ExecStartPre=rm -f /failed /testok
ExecStartPre=/usr/lib/systemd/tests/integration-test-setup.sh setup
ExecStartPre=/usr/lib/systemd/tests/testdata/integration-test-setup.sh setup
ExecStart=@command@
ExecStopPost=/usr/lib/systemd/tests/integration-test-setup.sh finalize
ExecStopPost=/usr/lib/systemd/tests/testdata/integration-test-setup.sh finalize
Type=oneshot
MemoryAccounting=@memory-accounting@
StateDirectory=%N

View File

@ -132,10 +132,12 @@ testcase_unpriv() {
return 0
fi
# The kernel has a restriction for unprivileged user namespaces where they cannot mount a less restrictive
# instance of /proc/. So if /proc/ is masked (e.g. /proc/kmsg is over-mounted with tmpfs as systemd-nspawn does),
# then mounting a new /proc/ will fail and we will still see the host's /proc/. Thus, to allow tests to run in
# a VM or nspawn, we mount a new proc on a temporary directory with no masking to bypass this kernel restriction.
# IMPORTANT: For /proc/ to be remounted in pid namespace within an unprivileged user namespace, there needs to
# be at least 1 unmasked procfs mount in ANY directory. Otherwise, if /proc/ is masked (e.g. /proc/scsi is
# over-mounted with tmpfs), then mounting a new /proc/ will fail.
#
# Thus, to guarantee PrivatePIDs=yes tests for unprivileged users pass, we mount a new procfs on a temporary
# directory with no masking. This will guarantee an unprivileged user can mount a new /proc/ successfully.
mkdir -p /tmp/TEST-07-PID1-private-pids-proc
mount -t proc proc /tmp/TEST-07-PID1-private-pids-proc
@ -146,7 +148,16 @@ testcase_unpriv() {
umount /tmp/TEST-07-PID1-private-pids-proc
rm -rf /tmp/TEST-07-PID1-private-pids-proc
# Now verify the behavior with masking - units should fail as PrivatePIDs=yes has no graceful fallback.
# Now we will mask /proc/ by mounting tmpfs over /proc/scsi. This will guarantee that mounting /proc/ will fail
# for unprivileged users when using PrivatePIDs=yes. Now units should fail as PrivatePIDs=yes has no graceful
# fallback.
#
# Note some kernels do not have /proc/scsi so we verify the directory exists prior to running the test.
if [ ! -d /proc/scsi ]; then
echo "/proc/scsi does not exist, skipping unprivileged PrivatePIDs=yes test with masked /proc/"
return 0
fi
if [[ "$HAS_EXISTING_SCSI_MOUNT" == "no" ]]; then
mount -t tmpfs tmpfs /proc/scsi
fi