Compare commits

9 Commits

Author SHA1 Message Date
Lennart Poettering 9e7c8f64cf time-util: also use 32bit hack on EOVERFLOW
As per
https://github.com/systemd/systemd/issues/14362#issuecomment-566722686
let's also prepare for EOVERFLOW.
2019-12-19 12:46:24 +01:00
Lennart Poettering 17ef83b231
Merge pull request #14388 from anitazha/man_uid_updates
man: document uids for user journals
2019-12-19 12:45:59 +01:00
Lennart Poettering 222633b646
Merge pull request #13823 from anitazha/unpriv_privateusers
core: PrivateUsers=true for (unprivileged) user managers
2019-12-19 12:03:06 +01:00
Anita Zhang a1533ad73f [man] note which UID ranges will get user journals
Fixes #13926
2019-12-18 16:12:43 -08:00
Anita Zhang d59fc29bb7 [man] fix URL
2019-12-18 16:08:53 -08:00
Anita Zhang b6657e2c53 test: add test case for PrivateDevices=y and Group=daemon
For root, group enforcement needs to come after PrivateDevices=y set up
according to 096424d123. Add a test to
verify this is the case.
2019-12-18 11:09:30 -08:00
Anita Zhang e5f10cafe0 core: create inaccessible nodes for users when making runtime dirs
To support ProtectHome=y in a user namespace (which mounts the inaccessible
nodes), the nodes need to be accessible by the user. Create these paths and
devices in the user runtime directory so they can be used later if needed.
2019-12-18 11:09:30 -08:00
Filipe Brandenburger a49ad4c482 core: add test case for PrivateUsers=true in user manager
The test exercises that PrivateTmp=yes and ProtectHome={read-only,tmpfs}
directives work as expected when PrivateUsers=yes in a user manager.

Some code is also added to test-functions to help set up test cases that
exercise the user manager.
2019-12-18 11:09:30 -08:00
Anita Zhang 5749f855a7 core: PrivateUsers=true for (unprivileged) user managers
Let per-user service managers have user namespaces too.

For unprivileged users, user namespaces are set up much earlier
(before the mount, network, and UTS namespaces vs after) in
order to obtain capabilities in the new user namespace and enable use of
the other listed namespaces. However for privileged users (root), the
set up for the user namespace is still done at the end to avoid any
restrictions with combining namespaces inside a user namespace (see
inline comments).

Closes #10576
2019-12-18 11:09:30 -08:00
22 changed files with 367 additions and 81 deletions
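
The ordering rule and the ID mapping described in the last commit message can be sketched as follows. This is a minimal illustration, not the code from the diff: `userns_stage()` and `format_uid_map()` are hypothetical names, the capability checks are passed in as booleans instead of calling `have_effective_cap()`, and the validity test is a simplified stand-in for `uid_is_valid()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper names; only the conditions are taken from the diff. */
typedef enum {
        USERNS_NONE,  /* PrivateUsers= not requested */
        USERNS_EARLY, /* before the mount, network and UTS namespaces */
        USERNS_LATE,  /* after all other namespaces */
} UsernsStage;

/* Unprivileged managers need the user namespace first; privileged ones set it up last. */
static UsernsStage userns_stage(bool private_users, bool have_cap_sys_admin) {
        if (!private_users)
                return USERNS_NONE;
        return have_cap_sys_admin ? USERNS_LATE : USERNS_EARLY;
}

/* Writes uid_map content ("<inside> <outside> <count>\n" per line) into buf.
 * Returns 0 on success, -1 if buf is too small. The validity check is a
 * simplified stand-in for uid_is_valid(). */
static int format_uid_map(char *buf, size_t size, bool have_cap_setuid, uid_t ouid, uid_t uid) {
        int n;

        assert(buf);

        /* The original UID is always mapped to itself. */
        n = snprintf(buf, size, "%u %u 1\n", (unsigned) ouid, (unsigned) ouid);
        if (n < 0 || (size_t) n >= size)
                return -1;

        /* A second mapping line is only possible with CAP_SETUID. */
        if (have_cap_setuid && uid != ouid && uid != (uid_t) -1) {
                size_t off = strlen(buf);

                n = snprintf(buf + off, size - off, "%u %u 1\n", (unsigned) uid, (unsigned) uid);
                if (n < 0 || (size_t) n >= size - off)
                        return -1;
        }

        return 0;
}
```

For an unprivileged manager running as UID 1000 the sketch yields a single mapping line, which is why the test case further down expects `Group=daemon` to fail in a user manager.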


@ -110,8 +110,11 @@
 <listitem><para>Controls whether to split up journal files per user, either <literal>uid</literal> or
 <literal>none</literal>. Split journal files are primarily useful for access control: on UNIX/Linux access
 control is managed per file, and the journal daemon will assign users read access to their journal files. If
-<literal>uid</literal>, all regular users will each get their own journal files, and system users will log to
-the system journal. If <literal>none</literal>, journal files are not split up by user and all messages are
+<literal>uid</literal>, all regular users (with UID outside the range of system users, dynamic service users,
+and the nobody user) will each get their own journal files, and system users will log to the system journal.
+See <ulink url="https://systemd.io/UIDS-GIDS">Users, Groups, UIDs and GIDs on systemd systems</ulink>
+for more details about UID ranges.
+If <literal>none</literal>, journal files are not split up by user and all messages are
 instead stored in the single system journal. In this mode unprivileged users generally do not have access to
 their own log data. Note that splitting up journal files by user is only available for journals stored
 persistently. If journals are stored on volatile storage (see <varname>Storage=</varname> above), only a single


@ -200,8 +200,11 @@ systemd-tmpfiles --create --prefix /var/log/journal</programlisting>
 writable. Adding a user to this group thus enables them to read
 the journal files.</para>
-<para>By default, each logged in user will get their own set of
-journal files in <filename>/var/log/journal/</filename>. These
+<para>By default, each user, with a UID outside the range of system users,
+dynamic service users, and the nobody user, will get their own set of
+journal files in <filename>/var/log/journal/</filename>. See
+<ulink url="https://systemd.io/UIDS-GIDS">Users, Groups, UIDs and GIDs on systemd systems</ulink>
+for more details about UID ranges. These journal
 files will not be owned by the user, however, in order to avoid
 that the user can write to them directly. Instead, file system
 ACLs are used to ensure the user gets read access only.</para>


@ -830,7 +830,8 @@ CapabilityBoundingSet=~CAP_B CAP_C</programlisting>
 <para>Also note that some sandboxing functionality is generally not available in user services (i.e. services run
 by the per-user service manager). Specifically, the various settings requiring file system namespacing support
 (such as <varname>ProtectSystem=</varname>) are not available, as the underlying kernel functionality is only
-accessible to privileged processes.</para>
+accessible to privileged processes. However, most namespacing settings, that will not work on their own in user
+services, will work when used in conjunction with <varname>PrivateUsers=</varname><option>true</option>.</para>
 <variablelist class='unit-directives'>
@ -1251,6 +1252,13 @@ BindReadOnlyPaths=/var/lib/systemd</programlisting>
 such as <varname>CapabilityBoundingSet=</varname> will affect only the latter, and there's no way to acquire
 additional capabilities in the host's user namespace. Defaults to off.</para>
+<para>When this setting is set up by a per-user instance of the service manager, the mapping of the
+<literal>root</literal> user and group to itself is omitted (unless the user manager is root).
+Additionally, in the per-user instance manager case, the
+user namespace will be set up before most other namespaces. This means that combining
+<varname>PrivateUsers=</varname><option>true</option> with other namespaces will enable use of features not
+normally supported by the per-user instances of the service manager.</para>
 <para>This setting is particularly useful in conjunction with
 <varname>RootDirectory=</varname>/<varname>RootImage=</varname>, as the need to synchronize the user and group
 databases in the root directory and on the host is reduced, as the only users and groups who need to be matched
@ -1258,9 +1266,7 @@ BindReadOnlyPaths=/var/lib/systemd</programlisting>
 <para>Note that the implementation of this setting might be impossible (for example if user namespaces are not
 available), and the unit should be written in a way that does not solely rely on this setting for
-security.</para>
-<xi:include href="system-only.xml" xpointer="singular"/></listitem>
+security.</para></listitem>
 </varlistentry>
 <varlistentry>


@ -535,7 +535,7 @@ w- /proc/sys/vm/swappiness - - - - 10</programlisting></para>
 guaranteed to be resolvable during early boot. If this field references users/groups that only become
 resolveable during later boot (i.e. after NIS, LDAP or a similar networked directory service become
 available), execution of the operations declared by the line will likely fail. Also see <ulink
-url="https://systemd.io/UIDS-GIDS.html#notes-on-resolvability-of-user-and-group-names">Notes on
+url="https://systemd.io/UIDS-GIDS/#notes-on-resolvability-of-user-and-group-names">Notes on
 Resolvability of User and Group Names</ulink> for more information on requirements on system user/group
 definitions.</para>
 </refsect2>


@ -1514,7 +1514,7 @@ int time_change_fd(void) {
          * See: https://github.com/systemd/systemd/issues/14362 */
 #if SIZEOF_TIME_T == 8 && ULONG_MAX < UINT64_MAX
-        if (ERRNO_IS_NOT_SUPPORTED(errno)) {
+        if (ERRNO_IS_NOT_SUPPORTED(errno) || errno == EOVERFLOW) {
                 static const struct itimerspec its32 = {
                         .it_value.tv_sec = INT32_MAX,
                 };


@ -1900,7 +1900,7 @@ static bool exec_needs_mount_namespace(
         return false;
 }
-static int setup_private_users(uid_t uid, gid_t gid) {
+static int setup_private_users(uid_t ouid, gid_t ogid, uid_t uid, gid_t gid) {
         _cleanup_free_ char *uid_map = NULL, *gid_map = NULL;
         _cleanup_close_pair_ int errno_pipe[2] = { -1, -1 };
         _cleanup_close_ int unshare_ready_fd = -1;
@ -1909,38 +1909,43 @@ static int setup_private_users(uid_t uid, gid_t gid) {
         ssize_t n;
         int r;
-        /* Set up a user namespace and map root to root, the selected UID/GID to itself, and everything else to
+        /* Set up a user namespace and map the original UID/GID (IDs from before any user or group changes, i.e.
+         * the IDs from the user or system manager(s)) to itself, the selected UID/GID to itself, and everything else to
          * nobody. In order to be able to write this mapping we need CAP_SETUID in the original user namespace, which
          * we however lack after opening the user namespace. To work around this we fork() a temporary child process,
          * which waits for the parent to create the new user namespace while staying in the original namespace. The
          * child then writes the UID mapping, under full privileges. The parent waits for the child to finish and
-         * continues execution normally. */
+         * continues execution normally.
+         * For unprivileged users (i.e. without capabilities), the root to root mapping is excluded. As such, it
+         * does not need CAP_SETUID to write the single line mapping to itself. */
-        if (uid != 0 && uid_is_valid(uid)) {
+        /* Can only set up multiple mappings with CAP_SETUID. */
+        if (have_effective_cap(CAP_SETUID) && uid != ouid && uid_is_valid(uid))
                 r = asprintf(&uid_map,
-                             "0 0 1\n" /* Map root → root */
+                             UID_FMT " " UID_FMT " 1\n" /* Map $OUID → $OUID */
                              UID_FMT " " UID_FMT " 1\n", /* Map $UID → $UID */
-                             uid, uid);
+                             ouid, ouid, uid, uid);
-                if (r < 0)
-                        return -ENOMEM;
-        } else {
-                uid_map = strdup("0 0 1\n"); /* The case where the above is the same */
-                if (!uid_map)
-                        return -ENOMEM;
-        }
-        if (gid != 0 && gid_is_valid(gid)) {
-                r = asprintf(&gid_map,
-                             "0 0 1\n" /* Map root → root */
-                             GID_FMT " " GID_FMT " 1\n", /* Map $GID → $GID */
-                             gid, gid);
+        else
+                r = asprintf(&uid_map,
+                             UID_FMT " " UID_FMT " 1\n", /* Map $OUID → $OUID */
+                             ouid, ouid);
         if (r < 0)
                 return -ENOMEM;
-        } else {
-                gid_map = strdup("0 0 1\n"); /* The case where the above is the same */
-                if (!gid_map)
+        /* Can only set up multiple mappings with CAP_SETGID. */
+        if (have_effective_cap(CAP_SETGID) && gid != ogid && gid_is_valid(gid))
+                r = asprintf(&gid_map,
+                             GID_FMT " " GID_FMT " 1\n" /* Map $OGID → $OGID */
+                             GID_FMT " " GID_FMT " 1\n", /* Map $GID → $GID */
+                             ogid, ogid, gid, gid);
+        else
+                r = asprintf(&gid_map,
+                             GID_FMT " " GID_FMT " 1\n", /* Map $OGID -> $OGID */
+                             ogid, ogid);
+        if (r < 0)
                 return -ENOMEM;
-        }
         /* Create a communication channel so that the parent can tell the child when it finished creating the user
          * namespace. */
@ -2983,6 +2988,7 @@ static int exec_child(
         char **final_argv = NULL;
         dev_t journal_stream_dev = 0;
         ino_t journal_stream_ino = 0;
+        bool userns_set_up = false;
         bool needs_sandboxing, /* Do we need to set up full sandboxing? (i.e. all namespacing, all MAC stuff, caps, yadda yadda */
              needs_setuid, /* Do we need to do the actual setresuid()/setresgid() calls? */
              needs_mount_namespace, /* Do we need to set up a mount namespace for this kernel? */
@ -2997,6 +3003,8 @@ static int exec_child(
 #if HAVE_APPARMOR
         bool use_apparmor = false;
 #endif
+        uid_t saved_uid = getuid();
+        gid_t saved_gid = getgid();
         uid_t uid = UID_INVALID;
         gid_t gid = GID_INVALID;
         size_t n_fds;
@ -3418,6 +3426,30 @@ static int exec_child(
                 }
         }
+        if (needs_sandboxing) {
+#if HAVE_SELINUX
+                if (use_selinux && params->selinux_context_net && socket_fd >= 0) {
+                        r = mac_selinux_get_child_mls_label(socket_fd, command->path, context->selinux_context, &mac_selinux_context_net);
+                        if (r < 0) {
+                                *exit_status = EXIT_SELINUX_CONTEXT;
+                                return log_unit_error_errno(unit, r, "Failed to determine SELinux context: %m");
+                        }
+                }
+#endif
+                /* If we're unprivileged, set up the user namespace first to enable use of the other namespaces.
+                 * Users with CAP_SYS_ADMIN can set up user namespaces last because they will be able to
+                 * set up all of the other namespaces (i.e. network, mount, UTS) without a user namespace. */
+                if (context->private_users && !have_effective_cap(CAP_SYS_ADMIN)) {
+                        userns_set_up = true;
+                        r = setup_private_users(saved_uid, saved_gid, uid, gid);
+                        if (r < 0) {
+                                *exit_status = EXIT_USER;
+                                return log_unit_error_errno(unit, r, "Failed to set up user namespacing for unprivileged user: %m");
+                        }
+                }
+        }
         if ((context->private_network || context->network_namespace_path) && runtime && runtime->netns_storage_socket[0] >= 0) {
                 if (ns_type_supported(NAMESPACE_NET)) {
@ -3466,7 +3498,9 @@ static int exec_child(
 #endif
         }
-        /* Drop groups as early as possbile */
+        /* Drop groups as early as possible.
+         * This needs to be done after PrivateDevices=y setup as device nodes should be owned by the host's root.
+         * For non-root in a userns, devices will be owned by the user/group before the group change, and nobody. */
         if (needs_setuid) {
                 r = enforce_groups(gid, supplementary_gids, ngids);
                 if (r < 0) {
@ -3475,25 +3509,19 @@ static int exec_child(
                 }
         }
-        if (needs_sandboxing) {
-#if HAVE_SELINUX
-                if (use_selinux && params->selinux_context_net && socket_fd >= 0) {
-                        r = mac_selinux_get_child_mls_label(socket_fd, command->path, context->selinux_context, &mac_selinux_context_net);
-                        if (r < 0) {
-                                *exit_status = EXIT_SELINUX_CONTEXT;
-                                return log_unit_error_errno(unit, r, "Failed to determine SELinux context: %m");
-                        }
-                }
-#endif
-                if (context->private_users) {
-                        r = setup_private_users(uid, gid);
-                        if (r < 0) {
-                                *exit_status = EXIT_USER;
-                                return log_unit_error_errno(unit, r, "Failed to set up user namespacing: %m");
-                        }
-                }
-        }
+        /* If the user namespace was not set up above, try to do it now.
+         * It's preferred to set up the user namespace later (after all other namespaces) so as not to be
+         * restricted by rules pertaining to combining user namespaces with other namespaces (e.g. in the
+         * case of mount namespaces being less privileged when the mount point list is copied from a
+         * different user namespace). */
+        if (needs_sandboxing && context->private_users && !userns_set_up) {
+                r = setup_private_users(saved_uid, saved_gid, uid, gid);
+                if (r < 0) {
+                        *exit_status = EXIT_USER;
+                        return log_unit_error_errno(unit, r, "Failed to set up user namespacing: %m");
+                }
+        }
         /* We repeat the fd closing here, to make sure that nothing is leaked from the PAM modules. Note that we are
          * more aggressive this time since socket_fd and the netns fds we don't need anymore. We do keep the exec_fd


@ -536,7 +536,7 @@ int mount_setup(bool loaded_policy) {
         /* Also create /run/systemd/inaccessible nodes, so that we always have something to mount inaccessible nodes
          * from. */
-        (void) make_inaccessible_nodes(NULL, UID_INVALID, GID_INVALID);
+        (void) make_inaccessible_nodes("/run/systemd", UID_INVALID, GID_INVALID);
         return 0;
 }


@ -12,6 +12,7 @@
 #include "base-filesystem.h"
 #include "dev-setup.h"
 #include "fd-util.h"
+#include "format-util.h"
 #include "fs-util.h"
 #include "label.h"
 #include "loop-util.h"
@ -905,6 +906,7 @@ static int apply_mount(
                 const char *root_directory,
                 MountEntry *m) {
+        _cleanup_free_ char *inaccessible = NULL;
         bool rbind = true, make = false;
         const char *what;
         int r;
@ -916,6 +918,8 @@ static int apply_mount(
         switch (m->mode) {
         case INACCESSIBLE: {
+                _cleanup_free_ char *tmp = NULL;
+                const char *runtime_dir;
                 struct stat target;
                 /* First, get rid of everything that is below if there
@ -930,10 +934,20 @@ static int apply_mount(
                         return log_debug_errno(errno, "Failed to lstat() %s to determine what to mount over it: %m", mount_entry_path(m));
                 }
-                what = mode_to_inaccessible_node(target.st_mode);
-                if (!what)
+                if (geteuid() == 0)
+                        runtime_dir = "/run/systemd";
+                else {
+                        if (asprintf(&tmp, "/run/user/"UID_FMT, geteuid()) < 0)
+                                return log_oom();
+                        runtime_dir = tmp;
+                }
+                r = mode_to_inaccessible_node(runtime_dir, target.st_mode, &inaccessible);
+                if (r < 0)
                         return log_debug_errno(SYNTHETIC_ERRNO(ELOOP),
                                                "File type not supported for inaccessible mounts. Note that symlinks are not allowed");
+                what = inaccessible;
                 break;
         }


@ -6,6 +6,7 @@
 #include "sd-bus.h"
 #include "bus-error.h"
+#include "dev-setup.h"
 #include "fs-util.h"
 #include "format-util.h"
 #include "label.h"
@ -91,6 +92,8 @@ static int user_mkdir_runtime_path(
                 log_warning_errno(r, "Failed to fix label of \"%s\", ignoring: %m", runtime_path);
         }
+        /* Set up inaccessible nodes now so they're available if we decide to use them with user namespaces. */
+        (void) make_inaccessible_nodes(runtime_path, uid, gid);
         return 0;
 fail:


@ -883,8 +883,7 @@ static int mount_overlay(const char *dest, CustomMount *m) {
 }
 static int mount_inaccessible(const char *dest, CustomMount *m) {
-        _cleanup_free_ char *where = NULL;
-        const char *source;
+        _cleanup_free_ char *where = NULL, *source = NULL;
         struct stat st;
         int r;
@ -897,7 +896,9 @@ static int mount_inaccessible(const char *dest, CustomMount *m) {
                 return m->graceful ? 0 : r;
         }
-        assert_se(source = mode_to_inaccessible_node(st.st_mode));
+        r = mode_to_inaccessible_node("/run/systemd", st.st_mode, &source);
+        if (r < 0)
+                return m->graceful ? 0 : r;
         r = mount_verbose(m->graceful ? LOG_DEBUG : LOG_ERR, source, where, NULL, MS_BIND, NULL);
         if (r < 0)


@ -3252,6 +3252,7 @@ static int outer_child(
                 int netns_fd) {
         _cleanup_close_ int fd = -1;
+        const char *p;
         pid_t pid;
         ssize_t l;
         int r;
@ -3447,7 +3448,9 @@ static int outer_child(
                 return r;
         (void) dev_setup(directory, arg_uid_shift, arg_uid_shift);
-        (void) make_inaccessible_nodes(directory, arg_uid_shift, arg_uid_shift);
+        p = prefix_roota(directory, "/run/systemd");
+        (void) make_inaccessible_nodes(p, arg_uid_shift, arg_uid_shift);
         r = setup_pts(directory);
         if (r < 0)


@ -61,20 +61,20 @@ int make_inaccessible_nodes(const char *root, uid_t uid, gid_t gid) {
                 const char *name;
                 mode_t mode;
         } table[] = {
-                { "/run/systemd", S_IFDIR | 0755 },
-                { "/run/systemd/inaccessible", S_IFDIR | 0000 },
-                { "/run/systemd/inaccessible/reg", S_IFREG | 0000 },
-                { "/run/systemd/inaccessible/dir", S_IFDIR | 0000 },
-                { "/run/systemd/inaccessible/fifo", S_IFIFO | 0000 },
-                { "/run/systemd/inaccessible/sock", S_IFSOCK | 0000 },
+                { "", S_IFDIR | 0755 },
+                { "/inaccessible", S_IFDIR | 0000 },
+                { "/inaccessible/reg", S_IFREG | 0000 },
+                { "/inaccessible/dir", S_IFDIR | 0000 },
+                { "/inaccessible/fifo", S_IFIFO | 0000 },
+                { "/inaccessible/sock", S_IFSOCK | 0000 },
                 /* The following two are likely to fail if we lack the privs for it (for example in an userns
                  * environment, if CAP_SYS_MKNOD is missing, or if a device node policy prohibit major/minor of 0
                  * device nodes to be created). But that's entirely fine. Consumers of these files should carry
-                 * fallback to use a different node then, for example /run/systemd/inaccessible/sock, which is close
+                 * fallback to use a different node then, for example <root>/inaccessible/sock, which is close
                  * enough in behaviour and semantics for most uses. */
-                { "/run/systemd/inaccessible/chr", S_IFCHR | 0000 },
-                { "/run/systemd/inaccessible/blk", S_IFBLK | 0000 },
+                { "/inaccessible/chr", S_IFCHR | 0000 },
+                { "/inaccessible/blk", S_IFBLK | 0000 },
         };
         _cleanup_umask_ mode_t u;


@ -339,38 +339,72 @@ int repeat_unmount(const char *path, int flags) {
         }
 }
-const char* mode_to_inaccessible_node(mode_t mode) {
+int mode_to_inaccessible_node(const char *runtime_dir, mode_t mode, char **dest) {
         /* This function maps a node type to a corresponding inaccessible file node. These nodes are created during
          * early boot by PID 1. In some cases we lacked the privs to create the character and block devices (maybe
          * because we run in an userns environment, or miss CAP_SYS_MKNOD, or run with a devices policy that excludes
          * device nodes with major and minor of 0), but that's fine, in that case we use an AF_UNIX file node instead,
          * which is not the same, but close enough for most uses. And most importantly, the kernel allows bind mounts
          * from socket nodes to any non-directory file nodes, and that's the most important thing that matters. */
+        _cleanup_free_ char *d = NULL;
+        const char *node = NULL;
+        char *tmp;
+        assert(dest);
         switch(mode & S_IFMT) {
                 case S_IFREG:
-                        return "/run/systemd/inaccessible/reg";
+                        node = "/inaccessible/reg";
+                        break;
                 case S_IFDIR:
-                        return "/run/systemd/inaccessible/dir";
+                        node = "/inaccessible/dir";
+                        break;
                 case S_IFCHR:
-                        if (access("/run/systemd/inaccessible/chr", F_OK) == 0)
-                                return "/run/systemd/inaccessible/chr";
-                        return "/run/systemd/inaccessible/sock";
+                        d = path_join(runtime_dir, "/inaccessible/chr");
+                        if (!d)
+                                return log_oom();
+                        if (access(d, F_OK) == 0) {
+                                *dest = TAKE_PTR(d);
+                                return 0;
+                        }
+                        node = "/inaccessible/sock";
+                        break;
                 case S_IFBLK:
-                        if (access("/run/systemd/inaccessible/blk", F_OK) == 0)
-                                return "/run/systemd/inaccessible/blk";
-                        return "/run/systemd/inaccessible/sock";
+                        d = path_join(runtime_dir, "/inaccessible/blk");
+                        if (!d)
+                                return log_oom();
+                        if (access(d, F_OK) == 0) {
+                                *dest = TAKE_PTR(d);
+                                return 0;
+                        }
+                        node = "/inaccessible/sock";
+                        break;
                 case S_IFIFO:
-                        return "/run/systemd/inaccessible/fifo";
+                        node = "/inaccessible/fifo";
+                        break;
                 case S_IFSOCK:
-                        return "/run/systemd/inaccessible/sock";
+                        node = "/inaccessible/sock";
+                        break;
         }
-        return NULL;
+        if (!node)
+                return -EINVAL;
+        tmp = path_join(runtime_dir, node);
+        if (!tmp)
+                return log_oom();
+        *dest = tmp;
+        return 0;
 }
 #define FLAG(name) (flags & name ? STRINGIFY(name) "|" : "")
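
The lookup that this hunk introduces, together with the runtime directory selection added to `apply_mount()`, can be tried outside of systemd with a sketch like the one below. It is an approximation under stated assumptions: `runtime_dir_for()` and `inaccessible_node_path()` are hypothetical names, fixed-size caller buffers replace `path_join()` and `_cleanup_free_`, and `-ENAMETOOLONG` is an error code chosen for the sketch.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Root uses /run/systemd, everyone else their own /run/user/<uid>. */
static int runtime_dir_for(uid_t euid, char *dest, size_t size) {
        int n;

        assert(dest);

        if (euid == 0) {
                if (strlen("/run/systemd") >= size)
                        return -ENAMETOOLONG;
                strcpy(dest, "/run/systemd");
                return 0;
        }

        n = snprintf(dest, size, "/run/user/%u", (unsigned) euid);
        return n < 0 || (size_t) n >= size ? -ENAMETOOLONG : 0;
}

/* Maps a file type to "<runtime_dir>/inaccessible/<type>". Device nodes that
 * do not exist fall back to the socket node, as in the diff. */
static int inaccessible_node_path(const char *runtime_dir, mode_t mode, char *dest, size_t size) {
        const char *name;
        int n;

        assert(runtime_dir);
        assert(dest);

        switch (mode & S_IFMT) {
        case S_IFREG:  name = "reg";  break;
        case S_IFDIR:  name = "dir";  break;
        case S_IFCHR:  name = "chr";  break;
        case S_IFBLK:  name = "blk";  break;
        case S_IFIFO:  name = "fifo"; break;
        case S_IFSOCK: name = "sock"; break;
        default:       return -EINVAL; /* e.g. symlinks */
        }

        n = snprintf(dest, size, "%s/inaccessible/%s", runtime_dir, name);
        if (n < 0 || (size_t) n >= size)
                return -ENAMETOOLONG;

        if ((S_ISCHR(mode) || S_ISBLK(mode)) && access(dest, F_OK) < 0) {
                /* The device node could not be created earlier, use the socket node. */
                n = snprintf(dest, size, "%s/inaccessible/sock", runtime_dir);
                if (n < 0 || (size_t) n >= size)
                        return -ENAMETOOLONG;
        }

        return 0;
}
```

A caller would first resolve the directory with `runtime_dir_for(geteuid(), ...)` and then pass the result to `inaccessible_node_path()` along with the `st_mode` of the path that is to be hidden.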


@ -31,4 +31,4 @@ int mount_option_mangle(
                 unsigned long *ret_mount_flags,
                 char **ret_remaining_options);
-const char* mode_to_inaccessible_node(mode_t mode);
+int mode_to_inaccessible_node(const char *runtime_dir, mode_t mode, char **dest);


@ -20,7 +20,8 @@ int main(int argc, char *argv[]) {
         f = prefix_roota(p, "/run");
         assert_se(mkdir(f, 0755) >= 0);
-        assert_se(make_inaccessible_nodes(p, 1, 1) >= 0);
+        f = prefix_roota(p, "/run/systemd");
+        assert_se(make_inaccessible_nodes(f, 1, 1) >= 0);
         f = prefix_roota(p, "/run/systemd/inaccessible/reg");
         assert_se(stat(f, &st) >= 0);


@ -313,6 +313,7 @@ static void test_exec_privatedevices(Manager *m) {
         test(__func__, m, "exec-privatedevices-yes.service", can_unshare ? 0 : EXIT_FAILURE, CLD_EXITED);
         test(__func__, m, "exec-privatedevices-no.service", 0, CLD_EXITED);
         test(__func__, m, "exec-privatedevices-disabled-by-prefix.service", can_unshare ? 0 : EXIT_FAILURE, CLD_EXITED);
+        test(__func__, m, "exec-privatedevices-yes-with-group.service", can_unshare ? 0 : EXIT_FAILURE, CLD_EXITED);
         /* We use capsh to test if the capabilities are
          * properly set, so be sure that it exists */


@ -0,0 +1,9 @@
BUILD_DIR=$(shell ../../tools/find-build-dir.sh)
all setup run:
@basedir=../.. TEST_BASE_DIR=../ BUILD_DIR=$(BUILD_DIR) ./test.sh --$@
clean clean-again:
@basedir=../.. TEST_BASE_DIR=../ BUILD_DIR=$(BUILD_DIR) ./test.sh --clean
.PHONY: all setup run clean clean-again


@ -0,0 +1,46 @@
#!/bin/bash
set -e
TEST_DESCRIPTION="Test PrivateUsers=yes on user manager"
. $TEST_BASE_DIR/test-functions
test_setup() {
create_empty_image_rootdir
(
LOG_LEVEL=5
eval $(udevadm info --export --query=env --name=${LOOPDEV}p2)
setup_basic_environment
inst_binary stat
mask_supporting_services
usermod --root $initdir -d /home/nobody -s /bin/bash nobody
mkdir $initdir/home $initdir/home/nobody
# Ubuntu's equivalent is nogroup
chown nobody:nobody $initdir/home/nobody || chown nobody:nogroup $initdir/home/nobody
enable_user_manager nobody
nobody_uid=$(id -u nobody)
# setup the testsuite service
cat >$initdir/etc/systemd/system/testsuite.service <<EOF
[Unit]
Description=Testsuite service
After=systemd-logind.service user@$nobody_uid.service
[Service]
ExecStart=/testsuite.sh
Type=oneshot
EOF
cp testsuite.sh $initdir/
setup_testsuite
)
setup_nspawn_root
}
has_user_dbus_socket || exit 0
do_test "$@"


@ -0,0 +1,70 @@
#!/bin/bash
set -ex
set -o pipefail
systemd-analyze log-level debug
runas() {
declare userid=$1
shift
su "$userid" -c 'XDG_RUNTIME_DIR=/run/user/$UID "$@"' -- sh "$@"
}
runas nobody systemctl --user --wait is-system-running
runas nobody systemd-run --user --unit=test-private-users \
-p PrivateUsers=yes -P echo hello
runas nobody systemd-run --user --unit=test-private-tmp-innerfile \
-p PrivateUsers=yes -p PrivateTmp=yes \
-P touch /tmp/innerfile.txt
# File should not exist outside the job's tmp directory.
test ! -e /tmp/innerfile.txt
touch /tmp/outerfile.txt
# File should not appear in unit's private tmp.
runas nobody systemd-run --user --unit=test-private-tmp-outerfile \
-p PrivateUsers=yes -p PrivateTmp=yes \
-P test ! -e /tmp/outerfile.txt
# Confirm that creating a file in home works
runas nobody systemd-run --user --unit=test-unprotected-home \
-P touch /home/nobody/works.txt
test -e /home/nobody/works.txt
# Confirm that creating a file in home is blocked under read-only
runas nobody systemd-run --user --unit=test-protect-home-read-only \
-p PrivateUsers=yes -p ProtectHome=read-only \
-P bash -c '
test -e /home/nobody/works.txt
! touch /home/nobody/blocked.txt
'
test ! -e /home/nobody/blocked.txt
# Check that tmpfs hides the whole directory
runas nobody systemd-run --user --unit=test-protect-home-tmpfs \
-p PrivateUsers=yes -p ProtectHome=tmpfs \
-P test ! -e /home/nobody
# Confirm that /home, /root, and /run/user are inaccessible under "yes"
runas nobody systemd-run --user --unit=test-protect-home-yes \
-p PrivateUsers=yes -p ProtectHome=yes \
-P bash -c '
test "$(stat -c %a /home)" = "0"
test "$(stat -c %a /root)" = "0"
test "$(stat -c %a /run/user)" = "0"
'
# Confirm that changing groups fails: the user namespace has only a single
# mapping, because without CAP_SETGID in the parent namespace the additional
# mapping for the user-supplied group cannot be written, so switching to the
# unmapped group ID is not possible.
! runas nobody systemd-run --user --unit=test-group-fail \
-p PrivateUsers=yes -p Group=daemon \
-P true
systemd-analyze log-level info
echo OK > /testok
exit 0
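
Reviewer note: the runas helper in this script relies on a quoting pattern that is easy to misread. The single-quoted command string is expanded by the child shell, and the word after it (sh) fills $0 so that the caller's arguments arrive as "$@" with their word boundaries intact. The sketch below shows the same pattern without su so it can run unprivileged. The function name run_with_env and the variable DEMO_RUNTIME_DIR are hypothetical stand-ins.

```shell
#!/bin/sh
# Sketch only: run_with_env is a hypothetical analogue of runas that skips su.
run_with_env() {
    # The single-quoted script is expanded by the child shell, not by this one.
    # "sh" becomes $0 in the child; the caller's arguments become $1, $2, ...
    sh -c 'DEMO_RUNTIME_DIR=/tmp/demo "$@"' sh "$@"
}

# Arguments containing spaces survive as single words.
run_with_env printf '[%s]\n' "two words" single
```

In the original helper the same single quotes keep $UID from being expanded by the calling (root) shell, so it is resolved by the target user's shell instead.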

@@ -102,6 +102,7 @@ test_data_files = '''
test-execute/exec-privatedevices-no-capability-mknod.service
test-execute/exec-privatedevices-no-capability-sys-rawio.service
test-execute/exec-privatedevices-no.service
test-execute/exec-privatedevices-yes-with-group.service
test-execute/exec-privatedevices-yes-capability-mknod.service
test-execute/exec-privatedevices-yes-capability-sys-rawio.service
test-execute/exec-privatedevices-yes.service

@@ -0,0 +1,16 @@
[Unit]
Description=Test Group=group is applied after PrivateDevices=yes
[Service]
PrivateDevices=yes
Group=daemon
Type=oneshot
# Check the group applied
ExecStart=/bin/sh -x -c 'test "$$(id -n -g)" = "daemon"'
# Check that the namespace applied
ExecStart=/bin/sh -c 'test ! -c /dev/kmsg'
# Check that the owning group of a node is not daemon (should be the host root)
ExecStart=/bin/sh -x -c 'test ! "$$(stat -c %%G /dev/stderr)" = "daemon"'
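
Reviewer note: in the ExecStart= lines above, $$ and %% are systemd escapes for a literal $ and a literal %, so the shell that systemd starts sees an ordinary command substitution and a plain stat format string. The sketch below runs the unescaped forms directly. The function names are hypothetical, and stat -c assumes GNU coreutils.

```shell
#!/bin/sh
# Unescaped form of: test "$$(id -n -g)" = "daemon"
current_group() {
    id -n -g
}

# Unescaped form of: stat -c %%G /dev/stderr  (GNU stat assumed)
owner_group() {
    stat -c %G "$1"
}

echo "process group: $(current_group)"
echo "owner group of /dev/stderr: $(owner_group /dev/stderr)"
```

Outside a unit file the doubled characters must not be used: in a shell, $$ expands to the shell's process ID, which is why the commands look different when copied out of the service for debugging.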

@@ -787,7 +787,7 @@ install_libnss() {
install_dbus() {
inst $ROOTLIBDIR/system/dbus.socket
-# Newer Fedora versions use dbus-broker by default. Let's install it is available.
+# Newer Fedora versions use dbus-broker by default. Let's install it if it's available.
if [ -f $ROOTLIBDIR/system/dbus-broker.service ]; then
inst $ROOTLIBDIR/system/dbus-broker.service
inst_symlink /etc/systemd/system/dbus.service
@@ -809,6 +809,31 @@ install_dbus() {
done
}
install_user_dbus() {
inst $ROOTLIBDIR/user/dbus.socket
inst_symlink /usr/lib/systemd/user/sockets.target.wants/dbus.socket || inst_symlink /etc/systemd/user/sockets.target.wants/dbus.socket
# Append the After= dependency on dbus in case it isn't already set up
mkdir -p "$initdir/etc/systemd/system/user@.service.d/"
cat <<EOF >"$initdir/etc/systemd/system/user@.service.d/dbus.conf"
[Unit]
After=dbus.service
EOF
# Newer Fedora versions use dbus-broker by default. Let's install it if it's available.
if [ -f $ROOTLIBDIR/user/dbus-broker.service ]; then
inst $ROOTLIBDIR/user/dbus-broker.service
inst_symlink /etc/systemd/user/dbus.service
elif [ -f $ROOTLIBDIR/system/dbus-daemon.service ]; then
# Fedora rawhide replaced dbus.service with dbus-daemon.service
inst $ROOTLIBDIR/user/dbus-daemon.service
# Alias symlink
inst_symlink /etc/systemd/user/dbus.service
else
inst $ROOTLIBDIR/user/dbus.service
fi
}
install_pam() {
(
if [[ "$LOOKS_LIKE_DEBIAN" ]] && type -p dpkg-architecture &>/dev/null; then
@@ -879,6 +904,28 @@ install_terminfo() {
dracut_install -o ${_terminfodir}/l/linux
}
has_user_dbus_socket() {
if [ -f /usr/lib/systemd/user/dbus.socket ] || [ -f /etc/systemd/user/dbus.socket ]; then
return 0
else
echo "Per-user instances are not supported. Skipping..."
return 1
fi
}
enable_user_manager() {
has_user_dbus_socket || return 0
local _userid
[[ $# -gt 0 ]] || set -- nobody
mkdir -p "$initdir/var/lib/systemd/linger"
for _userid; do
touch "$initdir/var/lib/systemd/linger/$_userid"
done
dracut_install su
install_user_dbus
}
setup_testsuite() {
cp $TEST_BASE_DIR/testsuite.target $initdir/etc/systemd/system/
cp $TEST_BASE_DIR/end.service $initdir/etc/systemd/system/