Compare commits

2 Commits

Author SHA1 Message Date
Ivan Kruglov 7f27c11c93 Merge aa601c4918 into 0df42ebcd6 2024-11-08 07:00:46 +09:00
Ivan Kruglov aa601c4918 machine: introduce io.systemd.Machine.Mount method 2024-11-07 14:54:14 +01:00
8 changed files with 82 additions and 147 deletions

TODO (25 lines changed)

@@ -129,10 +129,6 @@ Deprecations and removals:
 Features:
-* format-table: introduce new cell type for strings with ansi sequences in
-  them. display them in regular output mode (via strip_tab_ansi()), but
-  suppress them in json mode.
 * machined: when registering a machine, also take a relative cgroup path,
   relative to the machine's unit. This is useful when registering unpriv
   machines, as they might sit down the cgroup tree, below a cgroup delegation
@@ -221,8 +217,12 @@ Features:
   services where mount propagation from the root fs is off, an still have
   confext/sysext propagated in.
+* support F_DUDFD_QUERY for comparing fds in same_fd (requires kernel 6.10)
 * generic interface for varlink for setting log level and stuff that all our daemons can implement
+* use pty ioctl to get peer wherever possible (TIOCGPTPEER)
 * maybe teach repart.d/ dropins a new setting MakeMountNodes= or so, which is
   just like MakeDirectories=, but uses an access mode of 0000 and sets the +i
   chattr bit. This is useful as protection against early uses of /var/ or /tmp/
@@ -253,6 +253,8 @@ Features:
 * initrd: when transitioning from initrd to host, validate that
   /lib/modules/`uname -r` exists, refuse otherwise
+* tmpfiles: add "owning" flag for lines that limits effect of --purge
 * signed bpf loading: to address need for signature verification for bpf
   programs when they are loaded, and given the bpf folks don't think this is
   realistic in kernel space, maybe add small daemon that facilitates this
@@ -456,6 +458,9 @@ Features:
 * introduce mntid_t, and make it 64bit, as apparently the kernel switched to
   64bit mount ids
+* use udev rule networkd ownership property to take ownership of network
+  interfaces nspawn creates
 * mountfsd/nsresourced
   - userdb: maybe allow callers to map one uid to their own uid
   - bpflsm: allow writes if resulting UID on disk would be userns' owner UID
@@ -642,7 +647,6 @@ Features:
   - openpt_allocate_in_namespace()
   - unit_attach_pid_to_cgroup_via_bus()
   - cg_attach() requires new kernel feature
-  - journald's process cache
 * ddi must be listed as block device fstype
@@ -1466,6 +1470,9 @@ Features:
 * in sd-id128: also parse UUIDs in RFC4122 URN syntax (i.e. chop off urn:uuid: prefix)
+* DynamicUser= + StateDirectory= → use uid mapping mounts, too, in order to
+  make dirs appear under right UID.
 * systemd-sysext: optionally, run it in initrd already, before transitioning
   into host, to open up possibility for services shipped like that.
@@ -1637,6 +1644,14 @@ Features:
 * maybe add kernel cmdline params: to force random seed crediting
+* introduce a new per-process uuid, similar to the boot id, the machine id, the
+  invocation id, that is derived from process creds, specifically a hashed
+  combination of AT_RANDOM + getpid() + the starttime from
+  /proc/self/status. Then add these ids implicitly when logging. Deriving this
+  uuid from these three things has the benefit that it can be derived easily
+  from /proc/$PID/ in a stable, and unique way that changes on both fork() and
+  exec().
 * let's not GC a unit while its ratelimits are still pending
 * when killing due to service watchdog timeout maybe detect whether target
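
Editor's sketch for the per-process uuid item in the hunk above. It is not part of the commit and only illustrates how the three inputs named there could be combined. Everything in it is an assumption: the function names are hypothetical, FNV-1a stands in for whatever hash would really be used (it keeps the example dependency-free and is not cryptographic), and the start time is read from field 22 of /proc/self/stat. No UUID version or variant bits are set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/auxv.h>
#include <sys/types.h>
#include <unistd.h>

/* 64-bit FNV-1a over a buffer, continuing from state h (stand-in hash). */
static uint64_t fnv1a64(uint64_t h, const void *data, size_t len) {
        const uint8_t *p = data;

        for (size_t i = 0; i < len; i++) {
                h ^= p[i];
                h *= UINT64_C(0x100000001b3);
        }
        return h;
}

/* Hypothetical helper: derive a 128-bit ID from the three inputs. */
static void derive_process_id(const uint8_t at_random[16], pid_t pid, uint64_t starttime, uint8_t ret[16]) {
        /* Two passes with different initial states yield the two 64-bit halves. */
        const uint64_t seeds[2] = { UINT64_C(0xcbf29ce484222325), UINT64_C(0x84222325cbf29ce4) };
        uint64_t pid64 = (uint64_t) pid;

        for (size_t i = 0; i < 2; i++) {
                uint64_t h = seeds[i];

                h = fnv1a64(h, at_random, 16);
                h = fnv1a64(h, &pid64, sizeof pid64);
                h = fnv1a64(h, &starttime, sizeof starttime);
                memcpy(ret + i * 8, &h, 8);
        }
}

/* Read the start time of the calling process, in clock ticks since boot. */
static int read_self_starttime(uint64_t *ret) {
        char buf[1024], *p, *save = NULL;
        size_t n;
        FILE *f;

        f = fopen("/proc/self/stat", "r");
        if (!f)
                return -1;
        n = fread(buf, 1, sizeof buf - 1, f);
        fclose(f);
        buf[n] = 0;

        /* The comm field may contain spaces, so start after the last ')'. */
        p = strrchr(buf, ')');
        if (!p)
                return -1;

        /* The first token after ')' is field 3; advance until field 22. */
        p = strtok_r(p + 1, " ", &save);
        for (int field = 3; p && field < 22; field++)
                p = strtok_r(NULL, " ", &save);
        if (!p)
                return -1;

        *ret = strtoull(p, NULL, 10);
        return 0;
}

/* Convenience wrapper for the calling process. */
static int current_process_id(uint8_t ret[16]) {
        const uint8_t *at_random = (const uint8_t *) getauxval(AT_RANDOM);
        uint64_t starttime;

        if (!at_random || read_self_starttime(&starttime) < 0)
                return -1;

        derive_process_id(at_random, getpid(), starttime, ret);
        return 0;
}
```

In this sketch the ID stays the same for repeated calls within one process. A fork() changes the PID input, and an exec() makes the kernel supply fresh AT_RANDOM bytes, so either event yields a different ID.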

@@ -1131,8 +1131,6 @@ int xopenat_full(int dir_fd, const char *path, int open_flags, XOpenFlags xopen_
          * If O_CREAT is used with XO_LABEL, any created file will be immediately relabelled.
          *
          * If the path is specified NULL or empty, behaves like fd_reopen().
-         *
-         * If XO_NOCOW is specified will turn on the NOCOW btrfs flag on the file, if available.
          */
         if (isempty(path)) {

@@ -1808,50 +1808,40 @@ char* umount_and_unlink_and_free(char *p) {
         return mfree(p);
 }
-static int path_get_mount_info_at(
-                int dir_fd,
+static int path_get_mount_info(
                 const char *path,
                 char **ret_fstype,
                 char **ret_options) {
         _cleanup_(mnt_free_tablep) struct libmnt_table *table = NULL;
-        _cleanup_(mnt_free_iterp) struct libmnt_iter *iter = NULL;
-        int r, mnt_id;
-        assert(dir_fd >= 0 || dir_fd == AT_FDCWD);
-        r = path_get_mnt_id_at(dir_fd, path, &mnt_id);
-        if (r < 0)
-                return log_debug_errno(r, "Failed to get mount ID: %m");
-        r = libmount_parse("/proc/self/mountinfo", NULL, &table, &iter);
-        if (r < 0)
-                return log_debug_errno(r, "Failed to parse /proc/self/mountinfo: %m");
-        for (;;) {
-                struct libmnt_fs *fs;
-                r = mnt_table_next_fs(table, iter, &fs);
-                if (r == 1)
-                        break; /* EOF */
-                if (r < 0)
-                        return log_debug_errno(r, "Failed to get next entry from /proc/self/mountinfo: %m");
-                if (mnt_fs_get_id(fs) != mnt_id)
-                        continue;
         _cleanup_free_ char *fstype = NULL, *options = NULL;
+        struct libmnt_fs *fs;
+        int r;
+        assert(path);
+        table = mnt_new_table();
+        if (!table)
+                return -ENOMEM;
+        r = mnt_table_parse_mtab(table, /* filename = */ NULL);
+        if (r < 0)
+                return r;
+        fs = mnt_table_find_mountpoint(table, path, MNT_ITER_FORWARD);
+        if (!fs)
+                return -EINVAL;
         if (ret_fstype) {
                 fstype = strdup(strempty(mnt_fs_get_fstype(fs)));
                 if (!fstype)
-                                return log_oom_debug();
+                        return -ENOMEM;
         }
         if (ret_options) {
                 options = strdup(strempty(mnt_fs_get_options(fs)));
                 if (!options)
-                                return log_oom_debug();
+                        return -ENOMEM;
         }
         if (ret_fstype)
@@ -1860,29 +1850,21 @@ static int path_get_mount_info_at(
                 *ret_options = TAKE_PTR(options);
         return 0;
-        }
-        return log_debug_errno(SYNTHETIC_ERRNO(ESTALE), "Cannot find mount ID %i from /proc/self/mountinfo.", mnt_id);
 }
-int path_is_network_fs_harder_at(int dir_fd, const char *path) {
-        _cleanup_close_ int fd = -EBADF;
-        int r;
-        assert(dir_fd >= 0 || dir_fd == AT_FDCWD);
-        fd = xopenat(dir_fd, path, O_PATH | O_CLOEXEC | O_NOFOLLOW);
-        if (fd < 0)
-                return fd;
-        r = fd_is_network_fs(fd);
-        if (r != 0)
-                return r;
+int path_is_network_fs_harder(const char *path) {
         _cleanup_free_ char *fstype = NULL, *options = NULL;
-        r = path_get_mount_info_at(fd, /* path = */ NULL, &fstype, &options);
+        int r, ret;
+        assert(path);
+        ret = path_is_network_fs(path);
+        if (ret > 0)
+                return true;
+        r = path_get_mount_info(path, &fstype, &options);
         if (r < 0)
-                return r;
+                return RET_GATHER(ret, r);
         if (fstype_is_network(fstype))
                 return true;
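
Editor's sketch of the mount-table fallback that both versions of path_is_network_fs_harder() in the hunk above rely on: look up the file system type recorded for a mount point and compare it against known network file system names. It is not the commit's code. It swaps libmount for glibc's getmntent() so that it builds without extra libraries, it matches only exact mount point paths, the function names are hypothetical, and the type list is a short illustrative subset rather than systemd's actual list.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <mntent.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative subset only, not systemd's actual list. */
static bool fstype_looks_networked(const char *fstype) {
        static const char *const table[] = { "nfs", "nfs4", "cifs", "smb3", "ceph", "sshfs", "glusterfs" };

        /* FUSE file systems report "fuse.<name>"; compare the part after the prefix. */
        if (strncmp(fstype, "fuse.", 5) == 0)
                fstype += 5;

        for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
                if (strcmp(fstype, table[i]) == 0)
                        return true;
        return false;
}

/* Look up the file system type of an exact mount point in the mount table.
 * Returns 0 and a malloc()ed string on success, a negative errno otherwise. */
static int mount_point_fstype(const char *mount_point, char **ret_fstype) {
        struct mntent *me;
        char *found = NULL;
        FILE *f;

        f = setmntent("/proc/self/mounts", "r");
        if (!f)
                return errno > 0 ? -errno : -EIO;

        /* Later entries shadow earlier ones, so keep the last match. */
        while ((me = getmntent(f))) {
                if (strcmp(me->mnt_dir, mount_point) != 0)
                        continue;

                char *copy = strdup(me->mnt_type);
                if (!copy) {
                        free(found);
                        endmntent(f);
                        return -ENOMEM;
                }
                free(found);
                found = copy;
        }
        endmntent(f);

        if (!found)
                return -ENOENT;

        *ret_fstype = found;
        return 0;
}

/* > 0: network file system, 0: not one, < 0: lookup failed. */
static int mount_point_is_network_fs(const char *mount_point) {
        char *fstype = NULL;
        int r;

        r = mount_point_fstype(mount_point, &fstype);
        if (r < 0)
                return r;

        r = fstype_looks_networked(fstype);
        free(fstype);
        return r;
}
```

A caller in the style of the diff would try the cheaper statfs()-based check first and consult the mount table only when that check does not already report a network file system.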

@@ -181,7 +181,4 @@ int mount_credentials_fs(const char *path, size_t size, bool ro);
 int make_fsmount(int error_log_level, const char *what, const char *type, unsigned long flags, const char *options, int userns_fd);
-int path_is_network_fs_harder_at(int dir_fd, const char *path);
-static inline int path_is_network_fs_harder(const char *path) {
-        return path_is_network_fs_harder_at(AT_FDCWD, path);
-}
+int path_is_network_fs_harder(const char *path);

@@ -538,53 +538,9 @@ TEST(bind_mount_submounts) {
 }
 TEST(path_is_network_fs_harder) {
-        _cleanup_close_ int dir_fd = -EBADF;
-        int r;
-        ASSERT_OK(dir_fd = open("/", O_PATH | O_CLOEXEC));
-        FOREACH_STRING(s,
-                       "/", "/dev/", "/proc/", "/run/", "/sys/", "/tmp/", "/usr/", "/var/tmp/",
-                       "", ".", "../../../", "/this/path/should/not/exist/for/test-mount-util/") {
-                r = path_is_network_fs_harder(s);
-                log_debug("path_is_network_fs_harder(%s) → %i: %s", s, r, r < 0 ? STRERROR(r) : yes_no(r));
-                const char *q = path_startswith(s, "/") ?: s;
-                r = path_is_network_fs_harder_at(dir_fd, q);
-                log_debug("path_is_network_fs_harder_at(root, %s) → %i: %s", q, r, r < 0 ? STRERROR(r) : yes_no(r));
-        }
-        if (geteuid() != 0 || have_effective_cap(CAP_SYS_ADMIN) <= 0) {
-                (void) log_tests_skipped("not running privileged");
-                return;
-        }
-        _cleanup_(rm_rf_physical_and_freep) char *t = NULL;
-        assert_se(mkdtemp_malloc("/tmp/test-mount-util.path_is_network_fs_harder.XXXXXXX", &t) >= 0);
-        r = safe_fork("(make_mount-point)",
-                      FORK_RESET_SIGNALS |
-                      FORK_CLOSE_ALL_FDS |
-                      FORK_DEATHSIG_SIGTERM |
-                      FORK_WAIT |
-                      FORK_REOPEN_LOG |
-                      FORK_LOG |
-                      FORK_NEW_MOUNTNS |
-                      FORK_MOUNTNS_SLAVE,
-                      NULL);
-        ASSERT_OK(r);
-        if (r == 0) {
-                ASSERT_OK(mount_nofollow_verbose(LOG_INFO, "tmpfs", t, "tmpfs", 0, NULL));
-                ASSERT_OK_ZERO(path_is_network_fs_harder(t));
-                ASSERT_OK_ERRNO(umount(t));
-                ASSERT_OK(mount_nofollow_verbose(LOG_INFO, "tmpfs", t, "tmpfs", 0, "x-systemd-growfs,x-systemd-automount"));
-                ASSERT_OK_ZERO(path_is_network_fs_harder(t));
-                ASSERT_OK_ERRNO(umount(t));
-                _exit(EXIT_SUCCESS);
-        }
+        ASSERT_OK_ZERO(path_is_network_fs_harder("/dev"));
+        ASSERT_OK_ZERO(path_is_network_fs_harder("/sys"));
+        ASSERT_OK_ZERO(path_is_network_fs_harder("/run"));
 }
 DEFINE_TEST_MAIN(LOG_DEBUG);

@@ -142,13 +142,11 @@ endif
 ############################################################
 if install_tests
-        install_data('run-unit-tests.py',
+        foreach script : ['integration-test-setup.sh', 'run-unit-tests.py']
+                install_data(script,
                              install_mode : 'rwxr-xr-x',
                              install_dir : testsdir)
-        install_data('integration-test-setup.sh',
-                     install_mode : 'rwxr-xr-x',
-                     install_dir : testdata_dir)
+        endforeach
 endif
 ############################################################

@@ -7,9 +7,9 @@ Before=getty-pre.target
 [Service]
 ExecStartPre=rm -f /failed /testok
-ExecStartPre=/usr/lib/systemd/tests/testdata/integration-test-setup.sh setup
+ExecStartPre=/usr/lib/systemd/tests/integration-test-setup.sh setup
 ExecStart=@command@
-ExecStopPost=/usr/lib/systemd/tests/testdata/integration-test-setup.sh finalize
+ExecStopPost=/usr/lib/systemd/tests/integration-test-setup.sh finalize
 Type=oneshot
 MemoryAccounting=@memory-accounting@
 StateDirectory=%N

@@ -132,12 +132,10 @@ testcase_unpriv() {
         return 0
     fi
-    # IMPORTANT: For /proc/ to be remounted in pid namespace within an unprivileged user namespace, there needs to
-    # be at least 1 unmasked procfs mount in ANY directory. Otherwise, if /proc/ is masked (e.g. /proc/scsi is
-    # over-mounted with tmpfs), then mounting a new /proc/ will fail.
-    #
-    # Thus, to guarantee PrivatePIDs=yes tests for unprivileged users pass, we mount a new procfs on a temporary
-    # directory with no masking. This will guarantee an unprivileged user can mount a new /proc/ successfully.
+    # The kernel has a restriction for unprivileged user namespaces where they cannot mount a less restrictive
+    # instance of /proc/. So if /proc/ is masked (e.g. /proc/kmsg is over-mounted with tmpfs as systemd-nspawn does),
+    # then mounting a new /proc/ will fail and we will still see the host's /proc/. Thus, to allow tests to run in
+    # a VM or nspawn, we mount a new proc on a temporary directory with no masking to bypass this kernel restriction.
     mkdir -p /tmp/TEST-07-PID1-private-pids-proc
     mount -t proc proc /tmp/TEST-07-PID1-private-pids-proc
@@ -148,16 +146,7 @@ testcase_unpriv() {
     umount /tmp/TEST-07-PID1-private-pids-proc
     rm -rf /tmp/TEST-07-PID1-private-pids-proc
-    # Now we will mask /proc/ by mounting tmpfs over /proc/scsi. This will guarantee that mounting /proc/ will fail
-    # for unprivileged users when using PrivatePIDs=yes. Now units should fail as PrivatePIDs=yes has no graceful
-    # fallback.
-    #
-    # Note some kernels do not have /proc/scsi so we verify the directory exists prior to running the test.
-    if [ ! -d /proc/scsi ]; then
-        echo "/proc/scsi does not exist, skipping unprivileged PrivatePIDs=yes test with masked /proc/"
-        return 0
-    fi
+    # Now verify the behavior with masking - units should fail as PrivatePIDs=yes has no graceful fallback.
     if [[ "$HAS_EXISTING_SCSI_MOUNT" == "no" ]]; then
         mount -t tmpfs tmpfs /proc/scsi
     fi