Compare commits

...

8 Commits

Author SHA1 Message Date
Lennart Poettering d4ba26d61d
Merge 6fc93d269b into 5261c521e3 2024-11-08 12:40:08 +00:00
Yu Watanabe 5261c521e3 mount-util: make path_get_mount_info() work arbitrary inode
Follow-up for d49d95df0a.
Replaces 9a032ec55a.
Fixes #35075.
2024-11-08 13:25:17 +01:00
Franck Bui 514d9e1665 test: install integration-test-setup.sh in testdata/
integration-test-setup.sh is an auxiliary script that tests rely on at
runtime. As such, install the script in testdata/.

Follow-up for af153e36ae.
2024-11-08 12:37:40 +01:00
Lennart Poettering b480a4c15e update TODO 2024-11-08 10:10:11 +01:00
Lennart Poettering af3baf174a fs-util: add comment about XO_NOCOW 2024-11-08 09:21:25 +01:00
Lennart Poettering 6fc93d269b process-util: more gracefully handle oom adjust parsing/setting
Who knows what kind of mount shenanigans people employ, let's gracefully
handle parse failures of proc files, like we alway do otherwsie.
2024-11-07 22:20:45 +01:00
Lennart Poettering 1487b7407d audit-util: modernize use_audit() a bit
Use ERRNO_IS_xyz() macros where appropriate.

Also, reduce indentation a bit by inverted early check.

And log in more error codepaths.
2024-11-07 22:20:45 +01:00
Lennart Poettering de9d4db272 audit-util: return -ENODATA from audit_{session|loginuid}_from_pid() if invoked in a container
The auditing subsystem is still not virtualized for containers, hence the two
values don't really make sense inside them, they will just leak
information from outside into the container. Hence don't make use of the
data if we detect we are run inside of a container.

This has visible effects: logind will no longer try to reuse the
auditing session ids as its own session ids when run inside a container.

While are at it, modernize the calls in more ways:

1. switch to pidref behaviour, all but one of our uses are using pidref
   anyway already.
2. use read_virtual_file() + proc_mounted()
3. reasonable distinguish ENOENT errors when reading the process proc
   files: distinguish the case where /proc is not mounted, from the case
   where the process is already gone, from where auditing is not enabled
   in the kernel build.
2024-11-07 22:20:45 +01:00
16 changed files with 264 additions and 128 deletions

25
TODO
View File

@ -129,6 +129,10 @@ Deprecations and removals:
Features:
* format-table: introduce new cell type for strings with ansi sequences in
them. display them in regular output mode (via strip_tab_ansi()), but
suppress them in json mode.
* machined: when registering a machine, also take a relative cgroup path,
relative to the machine's unit. This is useful when registering unpriv
machines, as they might sit down the cgroup tree, below a cgroup delegation
@ -217,12 +221,8 @@ Features:
services where mount propagation from the root fs is off, an still have
confext/sysext propagated in.
* support F_DUDFD_QUERY for comparing fds in same_fd (requires kernel 6.10)
* generic interface for varlink for setting log level and stuff that all our daemons can implement
* use pty ioctl to get peer wherever possible (TIOCGPTPEER)
* maybe teach repart.d/ dropins a new setting MakeMountNodes= or so, which is
just like MakeDirectories=, but uses an access mode of 0000 and sets the +i
chattr bit. This is useful as protection against early uses of /var/ or /tmp/
@ -253,8 +253,6 @@ Features:
* initrd: when transitioning from initrd to host, validate that
/lib/modules/`uname -r` exists, refuse otherwise
* tmpfiles: add "owning" flag for lines that limits effect of --purge
* signed bpf loading: to address need for signature verification for bpf
programs when they are loaded, and given the bpf folks don't think this is
realistic in kernel space, maybe add small daemon that facilitates this
@ -458,9 +456,6 @@ Features:
* introduce mntid_t, and make it 64bit, as apparently the kernel switched to
64bit mount ids
* use udev rule networkd ownership property to take ownership of network
interfaces nspawn creates
* mountfsd/nsresourced
- userdb: maybe allow callers to map one uid to their own uid
- bpflsm: allow writes if resulting UID on disk would be userns' owner UID
@ -647,6 +642,7 @@ Features:
- openpt_allocate_in_namespace()
- unit_attach_pid_to_cgroup_via_bus()
- cg_attach() requires new kernel feature
- journald's process cache
* ddi must be listed as block device fstype
@ -1470,9 +1466,6 @@ Features:
* in sd-id128: also parse UUIDs in RFC4122 URN syntax (i.e. chop off urn:uuid: prefix)
* DynamicUser= + StateDirectory= → use uid mapping mounts, too, in order to
make dirs appear under right UID.
* systemd-sysext: optionally, run it in initrd already, before transitioning
into host, to open up possibility for services shipped like that.
@ -1644,14 +1637,6 @@ Features:
* maybe add kernel cmdline params: to force random seed crediting
* introduce a new per-process uuid, similar to the boot id, the machine id, the
invocation id, that is derived from process creds, specifically a hashed
combination of AT_RANDOM + getpid() + the starttime from
/proc/self/status. Then add these ids implicitly when logging. Deriving this
uuid from these three things has the benefit that it can be derived easily
from /proc/$PID/ in a stable, and unique way that changes on both fork() and
exec().
* let's not GC a unit while its ratelimits are still pending
* when killing due to service watchdog timeout maybe detect whether target

View File

@ -15,27 +15,42 @@
#include "parse-util.h"
#include "process-util.h"
#include "socket-util.h"
#include "stat-util.h"
#include "user-util.h"
#include "virt.h"
int audit_session_from_pid(pid_t pid, uint32_t *id) {
_cleanup_free_ char *s = NULL;
const char *p;
uint32_t u;
int audit_session_from_pid(const PidRef *pid, uint32_t *ret_id) {
int r;
assert(id);
if (!pidref_is_set(pid))
return -ESRCH;
/* We don't convert ENOENT to ESRCH here, since we can't
* really distinguish between "audit is not available in the
* kernel" and "the process does not exist", both which will
* result in ENOENT. */
/* Auditing is currently not virtualized for containers. Let's hence not use the audit session ID
* from now, it will be leaked in from the host */
if (detect_container() > 0)
return -ENODATA;
p = procfs_file_alloca(pid, "sessionid");
const char *p = procfs_file_alloca(pid->pid, "sessionid");
r = read_one_line_file(p, &s);
_cleanup_free_ char *s = NULL;
bool enoent = false;
r = read_virtual_file(p, SIZE_MAX, &s, /* ret_size= */ NULL);
if (r == -ENOENT) {
if (proc_mounted() == 0)
return -ENOSYS;
enoent = true;
} else if (r < 0)
return r;
r = pidref_verify(pid);
if (r < 0)
return r;
if (enoent) /* We got ENOENT, but /proc/ was mounted and the PID still valid? In that case it appears
* auditing is not supported by the kernel. */
return -ENODATA;
uint32_t u;
r = safe_atou32(s, &u);
if (r < 0)
return r;
@ -43,31 +58,53 @@ int audit_session_from_pid(pid_t pid, uint32_t *id) {
if (!audit_session_is_valid(u))
return -ENODATA;
*id = u;
if (ret_id)
*ret_id = u;
return 0;
}
int audit_loginuid_from_pid(pid_t pid, uid_t *uid) {
_cleanup_free_ char *s = NULL;
const char *p;
uid_t u;
int audit_loginuid_from_pid(const PidRef *pid, uid_t *ret_uid) {
int r;
assert(uid);
if (!pidref_is_set(pid))
return -ESRCH;
p = procfs_file_alloca(pid, "loginuid");
/* Auditing is currently not virtualized for containers. Let's hence not use the audit session ID
* from now, it will be leaked in from the host */
if (detect_container() > 0)
return -ENODATA;
r = read_one_line_file(p, &s);
const char *p = procfs_file_alloca(pid->pid, "loginuid");
_cleanup_free_ char *s = NULL;
bool enoent = false;
r = read_virtual_file(p, SIZE_MAX, &s, /* ret_size= */ NULL);
if (r == -ENOENT) {
if (proc_mounted() == 0)
return -ENOSYS;
enoent = true;
} else if (r < 0)
return r;
r = pidref_verify(pid);
if (r < 0)
return r;
if (enoent) /* We got ENOENT, but /proc/ was mounted and the PID still valid? In that case it appears
* auditing is not supported by the kernel. */
return -ENODATA;
uid_t u;
r = parse_uid(s, &u);
if (r == -ENXIO) /* the UID was -1 */
return -ENODATA;
if (r < 0)
return r;
*uid = u;
if (ret_uid)
*ret_uid = u;
return 0;
}
@ -113,33 +150,32 @@ bool use_audit(void) {
static int cached_use = -1;
int r;
if (cached_use < 0) {
int fd;
if (cached_use >= 0)
return cached_use;
fd = socket(AF_NETLINK, SOCK_RAW|SOCK_CLOEXEC|SOCK_NONBLOCK, NETLINK_AUDIT);
if (fd < 0) {
cached_use = !IN_SET(errno, EAFNOSUPPORT, EPROTONOSUPPORT, EPERM);
if (!cached_use)
log_debug_errno(errno, "Won't talk to audit: %m");
} else {
/* If we try and use the audit fd but get -ECONNREFUSED, it is because
* we are not in the initial user namespace, and the kernel does not
* have support for audit outside of the initial user namespace
* (see https://elixir.bootlin.com/linux/latest/C/ident/audit_netlink_ok).
*
* If we receive any other error, do not disable audit because we are not
* sure that the error indicates that audit will not work in general. */
r = try_audit_request(fd);
if (r < 0) {
cached_use = r != -ECONNREFUSED;
log_debug_errno(r, cached_use ?
"Failed to make request on audit fd, ignoring: %m" :
"Won't talk to audit: %m");
} else
cached_use = true;
safe_close(fd);
}
_cleanup_close_ int fd = socket(AF_NETLINK, SOCK_RAW|SOCK_CLOEXEC|SOCK_NONBLOCK, NETLINK_AUDIT);
if (fd < 0) {
cached_use = !ERRNO_IS_PRIVILEGE(errno) && !ERRNO_IS_NOT_SUPPORTED(errno);
if (cached_use)
log_debug_errno(errno, "Unexpected error while creating audit socket, proceeding with its use: %m");
else
log_debug_errno(errno, "Won't talk to audit, because feature or privilege absent: %m");
} else {
/* If we try and use the audit fd but get -ECONNREFUSED, it is because we are not in the
* initial user namespace, and the kernel does not have support for audit outside of the
* initial user namespace (see
* https://elixir.bootlin.com/linux/latest/C/ident/audit_netlink_ok).
*
* If we receive any other error, do not disable audit because we are not sure that the error
* indicates that audit will not work in general. */
r = try_audit_request(fd);
if (r < 0) {
cached_use = r != -ECONNREFUSED;
log_debug_errno(r, cached_use ?
"Failed to make request on audit fd, ignoring: %m" :
"Won't talk to audit: %m");
} else
cached_use = true;
}
return cached_use;

View File

@ -5,10 +5,12 @@
#include <stdint.h>
#include <sys/types.h>
#include "pidref.h"
#define AUDIT_SESSION_INVALID UINT32_MAX
int audit_session_from_pid(pid_t pid, uint32_t *id);
int audit_loginuid_from_pid(pid_t pid, uid_t *uid);
int audit_session_from_pid(const PidRef *pid, uint32_t *id);
int audit_loginuid_from_pid(const PidRef *pid, uid_t *uid);
bool use_audit(void);

View File

@ -1131,6 +1131,8 @@ int xopenat_full(int dir_fd, const char *path, int open_flags, XOpenFlags xopen_
* If O_CREAT is used with XO_LABEL, any created file will be immediately relabelled.
*
* If the path is specified NULL or empty, behaves like fd_reopen().
*
* If XO_NOCOW is specified will turn on the NOCOW btrfs flag on the file, if available.
*/
if (isempty(path)) {

View File

@ -1815,6 +1815,9 @@ int namespace_fork(
int set_oom_score_adjust(int value) {
char t[DECIMAL_STR_MAX(int)];
if (!oom_score_adjust_is_valid(value))
return -EINVAL;
xsprintf(t, "%i", value);
return write_string_file("/proc/self/oom_score_adj", t,
@ -1831,11 +1834,16 @@ int get_oom_score_adjust(int *ret) {
delete_trailing_chars(t, WHITESPACE);
assert_se(safe_atoi(t, &a) >= 0);
assert_se(oom_score_adjust_is_valid(a));
r = safe_atoi(t, &a);
if (r < 0)
return r;
if (!oom_score_adjust_is_valid(a))
return -ENODATA;
if (ret)
*ret = a;
return 0;
}

View File

@ -526,8 +526,8 @@ static void client_context_really_refresh(
client_context_read_basic(c);
(void) client_context_read_label(c, label, label_size);
(void) audit_session_from_pid(c->pid, &c->auditid);
(void) audit_loginuid_from_pid(c->pid, &c->loginuid);
(void) audit_session_from_pid(&PIDREF_MAKE_FROM_PID(c->pid), &c->auditid);
(void) audit_loginuid_from_pid(&PIDREF_MAKE_FROM_PID(c->pid), &c->loginuid);
(void) client_context_read_cgroup(s, c, unit_id);
(void) client_context_read_invocation_id(s, c);

View File

@ -1118,7 +1118,7 @@ int bus_creds_add_more(sd_bus_creds *c, uint64_t mask, PidRef *pidref, pid_t tid
}
if (missing & SD_BUS_CREDS_AUDIT_SESSION_ID) {
r = audit_session_from_pid(pidref->pid, &c->audit_session_id);
r = audit_session_from_pid(pidref, &c->audit_session_id);
if (r == -ENODATA) {
/* ENODATA means: no audit session id assigned */
c->audit_session_id = AUDIT_SESSION_INVALID;
@ -1131,7 +1131,7 @@ int bus_creds_add_more(sd_bus_creds *c, uint64_t mask, PidRef *pidref, pid_t tid
}
if (missing & SD_BUS_CREDS_AUDIT_LOGIN_UID) {
r = audit_loginuid_from_pid(pidref->pid, &c->audit_login_uid);
r = audit_loginuid_from_pid(pidref, &c->audit_login_uid);
if (r == -ENODATA) {
/* ENODATA means: no audit login uid assigned */
c->audit_login_uid = UID_INVALID;

View File

@ -1006,7 +1006,7 @@ static int create_session(
"Maximum number of sessions (%" PRIu64 ") reached, refusing further sessions.",
m->sessions_max);
(void) audit_session_from_pid(leader.pid, &audit_id);
(void) audit_session_from_pid(&leader, &audit_id);
if (audit_session_is_valid(audit_id)) {
/* Keep our session IDs and the audit session IDs in sync */

View File

@ -254,7 +254,7 @@ int session_set_leader_consume(Session *s, PidRef _leader) {
s->leader_fd_saved = true;
}
(void) audit_session_from_pid(s->leader.pid, &s->audit_id);
(void) audit_session_from_pid(&s->leader, &s->audit_id);
return 1;
}

View File

@ -1808,63 +1808,81 @@ char* umount_and_unlink_and_free(char *p) {
return mfree(p);
}
static int path_get_mount_info(
static int path_get_mount_info_at(
int dir_fd,
const char *path,
char **ret_fstype,
char **ret_options) {
_cleanup_(mnt_free_tablep) struct libmnt_table *table = NULL;
_cleanup_free_ char *fstype = NULL, *options = NULL;
struct libmnt_fs *fs;
int r;
_cleanup_(mnt_free_iterp) struct libmnt_iter *iter = NULL;
int r, mnt_id;
assert(path);
assert(dir_fd >= 0 || dir_fd == AT_FDCWD);
table = mnt_new_table();
if (!table)
return -ENOMEM;
r = mnt_table_parse_mtab(table, /* filename = */ NULL);
r = path_get_mnt_id_at(dir_fd, path, &mnt_id);
if (r < 0)
return r;
return log_debug_errno(r, "Failed to get mount ID: %m");
fs = mnt_table_find_mountpoint(table, path, MNT_ITER_FORWARD);
if (!fs)
return -EINVAL;
r = libmount_parse("/proc/self/mountinfo", NULL, &table, &iter);
if (r < 0)
return log_debug_errno(r, "Failed to parse /proc/self/mountinfo: %m");
if (ret_fstype) {
fstype = strdup(strempty(mnt_fs_get_fstype(fs)));
if (!fstype)
return -ENOMEM;
for (;;) {
struct libmnt_fs *fs;
r = mnt_table_next_fs(table, iter, &fs);
if (r == 1)
break; /* EOF */
if (r < 0)
return log_debug_errno(r, "Failed to get next entry from /proc/self/mountinfo: %m");
if (mnt_fs_get_id(fs) != mnt_id)
continue;
_cleanup_free_ char *fstype = NULL, *options = NULL;
if (ret_fstype) {
fstype = strdup(strempty(mnt_fs_get_fstype(fs)));
if (!fstype)
return log_oom_debug();
}
if (ret_options) {
options = strdup(strempty(mnt_fs_get_options(fs)));
if (!options)
return log_oom_debug();
}
if (ret_fstype)
*ret_fstype = TAKE_PTR(fstype);
if (ret_options)
*ret_options = TAKE_PTR(options);
return 0;
}
if (ret_options) {
options = strdup(strempty(mnt_fs_get_options(fs)));
if (!options)
return -ENOMEM;
}
if (ret_fstype)
*ret_fstype = TAKE_PTR(fstype);
if (ret_options)
*ret_options = TAKE_PTR(options);
return 0;
return log_debug_errno(SYNTHETIC_ERRNO(ESTALE), "Cannot find mount ID %i from /proc/self/mountinfo.", mnt_id);
}
int path_is_network_fs_harder(const char *path) {
int path_is_network_fs_harder_at(int dir_fd, const char *path) {
_cleanup_close_ int fd = -EBADF;
int r;
assert(dir_fd >= 0 || dir_fd == AT_FDCWD);
fd = xopenat(dir_fd, path, O_PATH | O_CLOEXEC | O_NOFOLLOW);
if (fd < 0)
return fd;
r = fd_is_network_fs(fd);
if (r != 0)
return r;
_cleanup_free_ char *fstype = NULL, *options = NULL;
int r, ret;
assert(path);
ret = path_is_network_fs(path);
if (ret > 0)
return true;
r = path_get_mount_info(path, &fstype, &options);
r = path_get_mount_info_at(fd, /* path = */ NULL, &fstype, &options);
if (r < 0)
return RET_GATHER(ret, r);
return r;
if (fstype_is_network(fstype))
return true;

View File

@ -181,4 +181,7 @@ int mount_credentials_fs(const char *path, size_t size, bool ro);
int make_fsmount(int error_log_level, const char *what, const char *type, unsigned long flags, const char *options, int userns_fd);
int path_is_network_fs_harder(const char *path);
int path_is_network_fs_harder_at(int dir_fd, const char *path);
static inline int path_is_network_fs_harder(const char *path) {
return path_is_network_fs_harder_at(AT_FDCWD, path);
}

View File

@ -46,6 +46,7 @@ simple_tests += files(
'test-alloc-util.c',
'test-architecture.c',
'test-argv-util.c',
'test-audit-util.c',
'test-barrier.c',
'test-bitfield.c',
'test-bitmap.c',

View File

@ -0,0 +1,35 @@
/* SPDX-License-Identifier: LGPL-2.1-or-later */
#include "audit-util.h"
#include "tests.h"
TEST(audit_loginuid_from_pid) {
_cleanup_(pidref_done) PidRef self = PIDREF_NULL, pid1 = PIDREF_NULL;
int r;
assert_se(pidref_set_self(&self) >= 0);
assert_se(pidref_set_pid(&pid1, 1) >= 0);
uid_t uid;
r = audit_loginuid_from_pid(&self, &uid);
assert_se(r >= 0 || r == -ENODATA);
if (r >= 0)
log_info("self audit login uid: " UID_FMT, uid);
assert_se(audit_loginuid_from_pid(&pid1, &uid) == -ENODATA);
uint32_t sessionid;
r = audit_session_from_pid(&self, &sessionid);
assert_se(r >= 0 || r == -ENODATA);
if (r >= 0)
log_info("self audit session id: %" PRIu32, sessionid);
assert_se(audit_session_from_pid(&pid1, &sessionid) == -ENODATA);
}
static int intro(void) {
log_show_color(true);
return EXIT_SUCCESS;
}
DEFINE_TEST_MAIN_WITH_INTRO(LOG_INFO, intro);

View File

@ -538,9 +538,53 @@ TEST(bind_mount_submounts) {
}
TEST(path_is_network_fs_harder) {
ASSERT_OK_ZERO(path_is_network_fs_harder("/dev"));
ASSERT_OK_ZERO(path_is_network_fs_harder("/sys"));
ASSERT_OK_ZERO(path_is_network_fs_harder("/run"));
_cleanup_close_ int dir_fd = -EBADF;
int r;
ASSERT_OK(dir_fd = open("/", O_PATH | O_CLOEXEC));
FOREACH_STRING(s,
"/", "/dev/", "/proc/", "/run/", "/sys/", "/tmp/", "/usr/", "/var/tmp/",
"", ".", "../../../", "/this/path/should/not/exist/for/test-mount-util/") {
r = path_is_network_fs_harder(s);
log_debug("path_is_network_fs_harder(%s) → %i: %s", s, r, r < 0 ? STRERROR(r) : yes_no(r));
const char *q = path_startswith(s, "/") ?: s;
r = path_is_network_fs_harder_at(dir_fd, q);
log_debug("path_is_network_fs_harder_at(root, %s) → %i: %s", q, r, r < 0 ? STRERROR(r) : yes_no(r));
}
if (geteuid() != 0 || have_effective_cap(CAP_SYS_ADMIN) <= 0) {
(void) log_tests_skipped("not running privileged");
return;
}
_cleanup_(rm_rf_physical_and_freep) char *t = NULL;
assert_se(mkdtemp_malloc("/tmp/test-mount-util.path_is_network_fs_harder.XXXXXXX", &t) >= 0);
r = safe_fork("(make_mount-point)",
FORK_RESET_SIGNALS |
FORK_CLOSE_ALL_FDS |
FORK_DEATHSIG_SIGTERM |
FORK_WAIT |
FORK_REOPEN_LOG |
FORK_LOG |
FORK_NEW_MOUNTNS |
FORK_MOUNTNS_SLAVE,
NULL);
ASSERT_OK(r);
if (r == 0) {
ASSERT_OK(mount_nofollow_verbose(LOG_INFO, "tmpfs", t, "tmpfs", 0, NULL));
ASSERT_OK_ZERO(path_is_network_fs_harder(t));
ASSERT_OK_ERRNO(umount(t));
ASSERT_OK(mount_nofollow_verbose(LOG_INFO, "tmpfs", t, "tmpfs", 0, "x-systemd-growfs,x-systemd-automount"));
ASSERT_OK_ZERO(path_is_network_fs_harder(t));
ASSERT_OK_ERRNO(umount(t));
_exit(EXIT_SUCCESS);
}
}
DEFINE_TEST_MAIN(LOG_DEBUG);

View File

@ -142,11 +142,13 @@ endif
############################################################
if install_tests
foreach script : ['integration-test-setup.sh', 'run-unit-tests.py']
install_data(script,
install_mode : 'rwxr-xr-x',
install_dir : testsdir)
endforeach
install_data('run-unit-tests.py',
install_mode : 'rwxr-xr-x',
install_dir : testsdir)
install_data('integration-test-setup.sh',
install_mode : 'rwxr-xr-x',
install_dir : testdata_dir)
endif
############################################################

View File

@ -7,9 +7,9 @@ Before=getty-pre.target
[Service]
ExecStartPre=rm -f /failed /testok
ExecStartPre=/usr/lib/systemd/tests/integration-test-setup.sh setup
ExecStartPre=/usr/lib/systemd/tests/testdata/integration-test-setup.sh setup
ExecStart=@command@
ExecStopPost=/usr/lib/systemd/tests/integration-test-setup.sh finalize
ExecStopPost=/usr/lib/systemd/tests/testdata/integration-test-setup.sh finalize
Type=oneshot
MemoryAccounting=@memory-accounting@
StateDirectory=%N