mirror of https://github.com/systemd/systemd synced 2026-03-16 18:14:46 +01:00

Compare commits

11 Commits

Author SHA1 Message Date
Topi Miettinen
d8e3c31bd8 Mount all fs nosuid when NoNewPrivileges=yes
When `NoNewPrivileges=yes`, the service shouldn't have a need for any
setuid/setgid programs, so in case there will be a new mount namespace anyway,
mount the file systems with MS_NOSUID.
2021-05-26 17:42:39 +02:00
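A rough sketch of what "mount with MS_NOSUID" means at the mount(2) level, assuming Linux and glibc's `<sys/mount.h>`. `nosuid_remount_flags()` and `remount_nosuid()` are hypothetical helpers for illustration only; the actual patch goes through `bind_remount_recursive_with_mountinfo()` and `bind_remount_one_with_mountinfo()`. The sketch assumes the caller already knows the per-mount flags currently in effect and passes them in as `current`, so that a bind remount adds `MS_NOSUID` without dropping flags such as `MS_RDONLY`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>

/* Hypothetical helper: flags for a bind remount that adds MS_NOSUID while
 * keeping the per-mount flags already in effect (passed in as `current`). */
static unsigned long nosuid_remount_flags(unsigned long current) {
        const unsigned long keep = MS_RDONLY | MS_NOSUID | MS_NODEV | MS_NOEXEC;

        return MS_BIND | MS_REMOUNT | (current & keep) | MS_NOSUID;
}

/* Hypothetical helper: applies the remount to a single mount point.
 * Needs CAP_SYS_ADMIN in the owning user namespace; returns 0 or -errno. */
int remount_nosuid(const char *path, unsigned long current) {
        assert(path);

        if (mount(NULL, path, NULL, nosuid_remount_flags(current), NULL) < 0)
                return -errno;

        return 0;
}
```

Only the flag computation is free of side effects; calling `remount_nosuid()` changes the mount table of the current mount namespace.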
Lennart Poettering
aa6dc3ec33 man: fix list of escaped characters in unit names
The code works differently than the docs, and the code is right here.
Fix the doc hence.

See VALID_CHARS in unit-name.c for details about allowed chars in unit
names, but keep in mind that "-" and "\" are special, since generated by
the escaping logic: they are OK to show up in unit names, but need to be
escaped when converting foreign strings to unit names to make sure
things remain reversible.

Fixes: #19623
2021-05-26 17:27:24 +02:00
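The escaping rule as documented after this fix can be sketched roughly as follows. `escape_sketch()` is a hypothetical helper written for illustration, not systemd's implementation in unit-name.c, and it covers only the plain-string case, not the extended handling of absolute paths.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: escapes `in` into `out` following the documented rule.
 * Returns 0 on success, -1 if `out` is too small. */
static int escape_sketch(const char *in, char *out, size_t out_size) {
        static const char hex[] = "0123456789abcdef";
        size_t n = 0;

        assert(in);
        assert(out);

        if (out_size == 0)
                return -1;

        for (const char *p = in; *p; p++) {
                unsigned char c = (unsigned char) *p;
                /* ASCII alphanumerics, ':' and '_' pass through; so does '.',
                 * except when it is the very first character. */
                int keep = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                           (c >= '0' && c <= '9') || c == ':' || c == '_' ||
                           (c == '.' && p != in);

                /* Worst case per input byte: 4 bytes of "\xNN" plus the trailing NUL. */
                if (n + 5 > out_size)
                        return -1;

                if (c == '/')
                        out[n++] = '-';
                else if (keep)
                        out[n++] = (char) c;
                else {
                        out[n++] = '\\';
                        out[n++] = 'x';
                        out[n++] = hex[c >> 4];
                        out[n++] = hex[c & 15];
                }
        }

        out[n] = 0;
        return 0;
}
```

For example, `foo/bar-baz` becomes `foo-bar\x2dbaz`, and a leading `.` as in `.hidden` becomes `\x2ehidden`. Escaping a literal `-` is what keeps the mapping reversible, since `-` in the output otherwise stands for `/`.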
Lennart Poettering
36c357b486 Merge pull request #19729 from poettering/networkctl-netns-check
networkctl: check that client netns matches networkd netns
2021-05-26 17:26:34 +02:00
Lennart Poettering
3dfeb04491 hexdecoct: make return parameters of unbase64mem() and unhexmem() optional
Inspired by: #19059
2021-05-26 16:17:33 +02:00
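A minimal sketch of the optional-return-parameter pattern this commit introduces, assuming a simplified hex decoder. `unhex_sketch()` and `hexval()` are hypothetical names, not the systemd functions: unlike `unhexmem_full()`, the sketch takes a NUL-terminated string, has no `secure` mode, and rejects odd-length input as a simplification.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if invalid. */
static int hexval(char c) {
        if (c >= '0' && c <= '9')
                return c - '0';
        if (c >= 'a' && c <= 'f')
                return c - 'a' + 10;
        if (c >= 'A' && c <= 'F')
                return c - 'A' + 10;
        return -1;
}

/* Hypothetical simplified decoder. Both `ret` and `ret_len` may be NULL, in
 * which case the call only validates the input. Returns 0 or a negative errno. */
static int unhex_sketch(const char *p, void **ret, size_t *ret_len) {
        uint8_t *buf;
        size_t l, n;

        assert(p);

        l = strlen(p);
        if (l % 2 != 0)
                return -EINVAL;
        n = l / 2;

        buf = malloc(n + 1);
        if (!buf)
                return -ENOMEM;

        for (size_t i = 0; i < n; i++) {
                int a = hexval(p[2 * i]), b = hexval(p[2 * i + 1]);

                if (a < 0 || b < 0) {
                        free(buf);
                        return -EINVAL;
                }
                buf[i] = (uint8_t) (a << 4 | b);
        }
        buf[n] = 0; /* trailing NUL for convenience, not counted in the length */

        if (ret_len)
                *ret_len = n;
        if (ret)
                *ret = buf; /* ownership passes to the caller */
        else
                free(buf);  /* nobody wants the data, so release it here */

        return 0;
}
```

Callers that only need to know whether a string is valid hex can then write `unhex_sketch(s, NULL, NULL)` without supplying dummy output variables.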
Yu Watanabe
06043c7821 test-network: refuse RA if not necessary
2021-05-26 21:22:13 +09:00
Yu Watanabe
618da3e7d5 test-network: wait for the link to be in configuring state at the beginning
2021-05-26 21:13:56 +09:00
Lennart Poettering
205013c800 man: document udevadm info output prefixes
Fixes: #19663
2021-05-26 12:46:51 +01:00
Lennart Poettering
74c88a2520 man: try to clarify that nss-mymachines does not provide name resolution outside its own scope
Fixes: #18229
2021-05-26 12:45:20 +01:00
Lennart Poettering
7dbc38db50 man: explicitly say for priority/weight values whether more is more or less
Fixes: #17523
2021-05-26 12:42:13 +01:00
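The weight documentation touched by this commit says that available CPU time or I/O bandwidth is split among the units in one slice relative to their weights. As a rough worked example, assuming every sibling unit is contending for the resource at the same time (an idealization), a unit's share is its weight divided by the sum of all sibling weights. `weight_share()` is a hypothetical helper, not a systemd or kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fraction of a fully contended resource that sibling `i`
 * would receive, given the weights of all `n` siblings in the same slice. */
static double weight_share(const uint64_t *weights, size_t n, size_t i) {
        uint64_t sum = 0;

        assert(weights);
        assert(i < n);

        for (size_t k = 0; k < n; k++)
                sum += weights[k];
        assert(sum > 0); /* weights are documented as ranging from 1 to 10000 */

        return (double) weights[i] / (double) sum;
}
```

With weights 100, 200 and 100, the three units would get 25%, 50% and 25% under full contention. A unit left at the default weight of 100 is therefore not guaranteed any fixed share; its share depends on what its siblings are set to.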
Lennart Poettering
3b085db3b6 networkctl: politely refuse being called from a different netns than the networkd instance we talk to
Otherwise things get very confusing since we mix up netns data from our
client side and from the data we retrieve from networkd.

In the long run we should teach networkctl some switch to operate safely
on other netns, and in that case also determine the right networkd
instance for that namespace.

Fixes: #19236
2021-05-26 10:40:57 +02:00
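The check described here boils down to comparing two inode numbers: the one networkd reports for its network namespace and the one the client sees for `/proc/self/ns/net`. A minimal sketch of that comparison, leaving out the D-Bus query; `namespace_id_from_path()` and `netns_ids_match()` are hypothetical helpers, not functions from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical helper: returns the inode number of `path`, e.g. of
 * "/proc/self/ns/net", which identifies the caller's network namespace. */
static int namespace_id_from_path(const char *path, uint64_t *ret) {
        struct stat st;

        assert(path);
        assert(ret);

        if (stat(path, &st) < 0)
                return -errno;

        *ret = (uint64_t) st.st_ino;
        return 0;
}

/* Hypothetical helper mirroring the decision in the patch: an ID of 0 means
 * networkd could not determine its namespace, so the check is skipped. */
static int netns_ids_match(uint64_t networkd_id, uint64_t own_id) {
        if (networkd_id == 0)
                return 0;

        return networkd_id == own_id ? 0 : -EREMOTE;
}
```

A client would call `namespace_id_from_path("/proc/self/ns/net", &own)` and pass the result to `netns_ids_match()` together with the value of the `NamespaceId` bus property.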
Lennart Poettering
f2ef8b28a5 networkd: add bus property exposing network namespace ID we run in
This is useful for clients to determine whether they are running in the
same network namespace as networkd.

Note that access to /proc/$PID/ns/ is restricted and only permitted to
equally privileged programs. This new bus property is primarily a way to
work around this, so that unprivileged clients can determine the
networkd netns, too.
2021-05-26 10:37:18 +02:00
20 changed files with 220 additions and 42 deletions

View File

@ -39,6 +39,15 @@
Note that the name that is resolved is the one registered with <command>systemd-machined</command>, which
may be different than the hostname configured inside of the container.</para>
<para>Note that this NSS module only makes available names of the containers running immediately below
the current system context. It does not provide host name resolution for containers running side-by-side
with the invoking system context, or containers further up or down the container hierarchy. Or in other
words, on the host system it provides host name resolution for the containers running immediately below
the host environment. When used inside a container environment however, it will not be able to provide
name resolution for containers running on the host (as those are siblings and not children of the current
container environment), but instead only for nested containers running immediately below its own
container environment.</para>
<para>To activate the NSS module, add <literal>mymachines</literal> to the line starting with
<literal>hosts:</literal> in <filename>/etc/nsswitch.conf</filename>.</para>

View File

@ -675,9 +675,10 @@ CapabilityBoundingSet=~CAP_B CAP_C</programlisting>
<varname>SystemCallArchitectures=</varname>,
<varname>SystemCallFilter=</varname>, or
<varname>SystemCallLog=</varname> are specified. Note that even if this setting is overridden
-by them, <command>systemctl show</command> shows the original value of this setting. Also see
-<ulink url="https://www.kernel.org/doc/html/latest/userspace-api/no_new_privs.html">No New
-Privileges Flag</ulink>.</para></listitem>
+by them, <command>systemctl show</command> shows the original value of this setting. In case the
+service will be run in a new mount namespace anyway, all file systems are mounted with MS_NOSUID
+flag. Also see <ulink url="https://www.kernel.org/doc/html/latest/userspace-api/no_new_privs.html">
+No New Privileges Flag</ulink>.</para></listitem>
</varlistentry>
<varlistentry>
@ -1036,8 +1037,10 @@ CapabilityBoundingSet=~CAP_B CAP_C</programlisting>
<varlistentry>
<term><varname>Nice=</varname></term>
-<listitem><para>Sets the default nice level (scheduling priority) for executed processes. Takes an integer
-between -20 (highest priority) and 19 (lowest priority). See
+<listitem><para>Sets the default nice level (scheduling priority) for executed processes. Takes an
+integer between -20 (highest priority) and 19 (lowest priority). In case of resource contention,
+smaller values mean more resources will be made available to the unit's processes, larger values mean
+less resources will be made available. See
<citerefentry><refentrytitle>setpriority</refentrytitle><manvolnum>2</manvolnum></citerefentry> for
details.</para></listitem>
</varlistentry>
@ -1054,11 +1057,13 @@ CapabilityBoundingSet=~CAP_B CAP_C</programlisting>
<varlistentry>
<term><varname>CPUSchedulingPriority=</varname></term>
-<listitem><para>Sets the CPU scheduling priority for executed processes. The available priority range depends
-on the selected CPU scheduling policy (see above). For real-time scheduling policies an integer between 1
-(lowest priority) and 99 (highest priority) can be used. See
-<citerefentry project='man-pages'><refentrytitle>sched_setscheduler</refentrytitle><manvolnum>2</manvolnum></citerefentry> for
-details. </para></listitem>
+<listitem><para>Sets the CPU scheduling priority for executed processes. The available priority range
+depends on the selected CPU scheduling policy (see above). For real-time scheduling policies an
+integer between 1 (lowest priority) and 99 (highest priority) can be used. In case of CPU resource
+contention, smaller values mean less CPU time is made available to the service, larger values mean
+more. See <citerefentry
+project='man-pages'><refentrytitle>sched_setscheduler</refentrytitle><manvolnum>2</manvolnum></citerefentry>
+for details. </para></listitem>
</varlistentry>
<varlistentry>
@ -1122,11 +1127,13 @@ CapabilityBoundingSet=~CAP_B CAP_C</programlisting>
<varlistentry>
<term><varname>IOSchedulingPriority=</varname></term>
-<listitem><para>Sets the I/O scheduling priority for executed processes. Takes an integer between 0 (highest
-priority) and 7 (lowest priority). The available priorities depend on the selected I/O scheduling class (see
-above). If the empty string is assigned to this option, all prior assignments to both
-<varname>IOSchedulingClass=</varname> and <varname>IOSchedulingPriority=</varname> have no effect.
-See <citerefentry><refentrytitle>ioprio_set</refentrytitle><manvolnum>2</manvolnum></citerefentry> for
+<listitem><para>Sets the I/O scheduling priority for executed processes. Takes an integer between 0
+(highest priority) and 7 (lowest priority). In case of I/O contention, smaller values mean more I/O
+bandwidth is made available to the unit's processes, larger values mean less bandwidth. The available
+priorities depend on the selected I/O scheduling class (see above). If the empty string is assigned
+to this option, all prior assignments to both <varname>IOSchedulingClass=</varname> and
+<varname>IOSchedulingPriority=</varname> have no effect. See
+<citerefentry><refentrytitle>ioprio_set</refentrytitle><manvolnum>2</manvolnum></citerefentry> for
details.</para></listitem>
</varlistentry>

View File

@ -180,13 +180,14 @@
<term><varname>StartupCPUWeight=<replaceable>weight</replaceable></varname></term>
<listitem>
-<para>Assign the specified CPU time weight to the processes executed, if the unified control group hierarchy
-is used on the system. These options take an integer value and control the <literal>cpu.weight</literal>
-control group attribute. The allowed range is 1 to 10000. Defaults to 100. For details about this control
-group attribute, see <ulink
-url="https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html">Control Groups v2</ulink> and <ulink
-url="https://www.kernel.org/doc/html/latest/scheduler/sched-design-CFS.html">CFS Scheduler</ulink>.
-The available CPU time is split up among all units within one slice relative to their CPU time weight.</para>
+<para>Assign the specified CPU time weight to the processes executed, if the unified control group
+hierarchy is used on the system. These options take an integer value and control the
+<literal>cpu.weight</literal> control group attribute. The allowed range is 1 to 10000. Defaults to
+100. For details about this control group attribute, see <ulink
+url="https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html">Control Groups v2</ulink>
+and <ulink url="https://www.kernel.org/doc/html/latest/scheduler/sched-design-CFS.html">CFS
+Scheduler</ulink>. The available CPU time is split up among all units within one slice relative to
+their CPU time weight. A higher weight means more CPU time, a lower weight means less.</para>
<para>While <varname>StartupCPUWeight=</varname> only applies to the startup phase of the system,
<varname>CPUWeight=</varname> applies to normal runtime of the system, and if the former is not set also to
@ -435,13 +436,14 @@
<term><varname>StartupIOWeight=<replaceable>weight</replaceable></varname></term>
<listitem>
-<para>Set the default overall block I/O weight for the executed processes, if the unified control group
-hierarchy is used on the system. Takes a single weight value (between 1 and 10000) to set the default block
-I/O weight. This controls the <literal>io.weight</literal> control group attribute, which defaults to
-100. For details about this control group attribute, see <ulink
-url="https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#io-interface-files">IO Interface Files</ulink>.
-The available I/O bandwidth is split up among all units within one slice relative to their block
-I/O weight.</para>
+<para>Set the default overall block I/O weight for the executed processes, if the unified control
+group hierarchy is used on the system. Takes a single weight value (between 1 and 10000) to set the
+default block I/O weight. This controls the <literal>io.weight</literal> control group attribute,
+which defaults to 100. For details about this control group attribute, see <ulink
+url="https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#io-interface-files">IO
+Interface Files</ulink>. The available I/O bandwidth is split up among all units within one slice
+relative to their block I/O weight. A higher weight means more I/O bandwidth, a lower weight means
+less.</para>
<para>While <varname>StartupIOWeight=</varname> only applies
to the startup phase of the system,

View File

@ -273,10 +273,11 @@
objects in the file system hierarchy. Example: a device unit <filename>dev-sda.device</filename> refers to a device
with the device node <filename index="false">/dev/sda</filename> in the file system.</para>
-<para>The escaping algorithm operates as follows: given a string, any <literal>/</literal> character is replaced by
-<literal>-</literal>, and all other characters which are not ASCII alphanumerics or <literal>_</literal> are
-replaced by C-style <literal>\x2d</literal> escapes. In addition, <literal>.</literal> is replaced with such a
-C-style escape when it would appear as the first character in the escaped string.</para>
+<para>The escaping algorithm operates as follows: given a string, any <literal>/</literal> character is
+replaced by <literal>-</literal>, and all other characters which are not ASCII alphanumerics,
+<literal>:</literal>, <literal>_</literal> or <literal>.</literal> are replaced by C-style
+<literal>\x2d</literal> escapes. In addition, <literal>.</literal> is replaced with such a C-style escape
+when it would appear as the first character in the escaped string.</para>
<para>When the input qualifies as absolute file system path, this algorithm is extended slightly: the path to the
root directory <literal>/</literal> is encoded as single dash <literal>-</literal>. In addition, any leading,

View File

@ -186,6 +186,45 @@
<xi:include href="standard-options.xml" xpointer="help" />
</variablelist>
<para>The generated output shows the current device database entry in a terse format. Each line shown
is prefixed with one of the following characters:</para>
<table>
<title><command>udevadm info</command> output prefixes</title>
<tgroup cols='2' align='left' colsep='1' rowsep='1'>
<colspec colname="prefix" />
<colspec colname="meaning" />
<thead>
<row>
<entry>Prefix</entry>
<entry>Meaning</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>P:</literal></entry>
<entry>Device path in <filename>/sys/</filename></entry>
</row>
<row>
<entry><literal>N:</literal></entry>
<entry>Kernel device node name</entry>
</row>
<row>
<entry><literal>L:</literal></entry>
<entry>Device node symlink priority</entry>
</row>
<row>
<entry><literal>S:</literal></entry>
<entry>Device node symlink</entry>
</row>
<row>
<entry><literal>E:</literal></entry>
<entry>Device property</entry>
</row>
</tbody>
</tgroup>
</table>
</refsect2>
<refsect2><title>udevadm trigger

View File

@ -115,8 +115,6 @@ int unhexmem_full(const char *p, size_t l, bool secure, void **ret, size_t *ret_
uint8_t *z;
int r;
-assert(ret);
-assert(ret_len);
assert(p || l == 0);
if (l == SIZE_MAX)
@ -150,8 +148,10 @@ int unhexmem_full(const char *p, size_t l, bool secure, void **ret, size_t *ret_
*z = 0;
-*ret_len = (size_t) (z - buf);
-*ret = TAKE_PTR(buf);
+if (ret_len)
+*ret_len = (size_t) (z - buf);
+if (ret)
+*ret = TAKE_PTR(buf);
return 0;
@ -705,8 +705,6 @@ int unbase64mem_full(const char *p, size_t l, bool secure, void **ret, size_t *r
int r;
assert(p || l == 0);
-assert(ret);
-assert(ret_size);
if (l == SIZE_MAX)
l = strlen(p);
@ -802,8 +800,10 @@ int unbase64mem_full(const char *p, size_t l, bool secure, void **ret, size_t *r
*z = 0;
-*ret_size = (size_t) (z - buf);
-*ret = TAKE_PTR(buf);
+if (ret_size)
+*ret_size = (size_t) (z - buf);
+if (ret)
+*ret = TAKE_PTR(buf);
return 0;

View File

@ -3191,6 +3191,8 @@ static int apply_mount_namespace(
.protect_proc = context->protect_proc,
.proc_subset = context->proc_subset,
.private_ipc = context->private_ipc || context->ipc_namespace_path,
/* If NNP is on, we can turn on MS_NOSUID, since it won't have any effect anymore. */
.mount_nosuid = context->no_new_privileges,
};
} else if (!context->dynamic_user && root_dir)
/*

View File

@ -1464,6 +1464,27 @@ static int make_noexec(const MountEntry *m, char **deny_list, FILE *proc_self_mo
return 0;
}
static int make_nosuid(const MountEntry *m, FILE *proc_self_mountinfo) {
bool submounts = false;
int r = 0;
assert(m);
assert(proc_self_mountinfo);
submounts = !IN_SET(m->mode, EMPTY_DIR, TMPFS);
if (submounts)
r = bind_remount_recursive_with_mountinfo(mount_entry_path(m), MS_NOSUID, MS_NOSUID, NULL, proc_self_mountinfo);
else
r = bind_remount_one_with_mountinfo(mount_entry_path(m), MS_NOSUID, MS_NOSUID, proc_self_mountinfo);
if (r == -ENOENT && m->ignore)
return 0;
if (r < 0)
return log_debug_errno(r, "Failed to re-mount '%s'%s: %m", mount_entry_path(m),
submounts ? " and its submounts" : "");
return 0;
}
static bool namespace_info_mount_apivfs(const NamespaceInfo *ns_info) {
assert(ns_info);
@ -1660,6 +1681,17 @@ static int apply_mounts(
}
}
/* Fourth round, flip the nosuid bits without a deny list. */
if (ns_info->mount_nosuid)
for (MountEntry *m = mounts; m < mounts + *n_mounts; ++m) {
r = make_nosuid(m, proc_self_mountinfo);
if (r < 0) {
if (error_path && mount_entry_path(m))
*error_path = strdup(mount_entry_path(m));
return r;
}
}
return 1;
}

View File

@ -74,6 +74,7 @@ struct NamespaceInfo {
bool mount_apivfs;
bool protect_hostname;
bool private_ipc;
bool mount_nosuid;
ProtectHome protect_home;
ProtectSystem protect_system;
ProtectProc protect_proc;

View File

@ -2993,6 +2993,45 @@ static int networkctl_main(int argc, char *argv[]) {
return dispatch_verb(argc, argv, verbs, NULL);
}
static int check_netns_match(void) {
_cleanup_(sd_bus_error_free) sd_bus_error error = SD_BUS_ERROR_NULL;
_cleanup_(sd_bus_flush_close_unrefp) sd_bus *bus = NULL;
struct stat st;
uint64_t id;
int r;
r = sd_bus_open_system(&bus);
if (r < 0)
return log_error_errno(r, "Failed to connect system bus: %m");
r = sd_bus_get_property_trivial(
bus,
"org.freedesktop.network1",
"/org/freedesktop/network1",
"org.freedesktop.network1.Manager",
"NamespaceId",
&error,
't',
&id);
if (r < 0) {
log_debug_errno(r, "Failed to query network namespace of networkd, ignoring: %s", bus_error_message(&error, r));
return 0;
}
if (id == 0) {
log_debug("systemd-networkd.service not running in a network namespace (?), skipping netns check.");
return 0;
}
if (stat("/proc/self/ns/net", &st) < 0)
return log_error_errno(errno, "Failed to determine our own network namespace ID: %m");
if (id != st.st_ino)
return log_error_errno(SYNTHETIC_ERRNO(EREMOTE),
"networkctl must be invoked in same network namespace as systemd-networkd.service.");
return 0;
}
static void warn_networkd_missing(void) {
if (access("/run/systemd/netif/state", F_OK) >= 0)
@ -3010,6 +3049,10 @@ static int run(int argc, char* argv[]) {
if (r <= 0)
return r;
r = check_netns_match();
if (r < 0)
return r;
warn_networkd_missing();
return networkctl_main(argc, argv);

View File

@ -263,6 +263,34 @@ static int bus_method_describe(sd_bus_message *message, void *userdata, sd_bus_e
return sd_bus_send(NULL, reply, NULL);
}
static int property_get_namespace_id(
sd_bus *bus,
const char *path,
const char *interface,
const char *property,
sd_bus_message *reply,
void *userdata,
sd_bus_error *error) {
uint64_t id = 0;
struct stat st;
assert(bus);
assert(reply);
/* Returns our own network namespace ID, i.e. the inode number of /proc/self/ns/net. This allows
* unprivileged clients to determine whether they are in the same network namespace as us (note that
* access to that path is restricted, thus they can't check directly unless privileged). */
if (stat("/proc/self/ns/net", &st) < 0) {
log_warning_errno(errno, "Failed to stat network namespace, ignoring: %m");
id = 0;
} else
id = st.st_ino;
return sd_bus_message_append(reply, "t", id);
}
const sd_bus_vtable manager_vtable[] = {
SD_BUS_VTABLE_START(0),
@ -272,6 +300,7 @@ const sd_bus_vtable manager_vtable[] = {
SD_BUS_PROPERTY("IPv4AddressState", "s", property_get_address_state, offsetof(Manager, ipv4_address_state), SD_BUS_VTABLE_PROPERTY_EMITS_CHANGE),
SD_BUS_PROPERTY("IPv6AddressState", "s", property_get_address_state, offsetof(Manager, ipv6_address_state), SD_BUS_VTABLE_PROPERTY_EMITS_CHANGE),
SD_BUS_PROPERTY("OnlineState", "s", property_get_online_state, offsetof(Manager, online_state), SD_BUS_VTABLE_PROPERTY_EMITS_CHANGE),
SD_BUS_PROPERTY("NamespaceId", "t", property_get_namespace_id, 0, SD_BUS_VTABLE_PROPERTY_CONST),
SD_BUS_METHOD_WITH_ARGS("ListLinks",
SD_BUS_NO_ARGS,

View File

@ -4,3 +4,4 @@ Name=test1
[Network]
Address=192.168.10.30/24
Gateway=192.168.10.1
IPv6AcceptRA=no

View File

@ -3,6 +3,7 @@ Name=dummy98
[Network]
Address=192.168.20.20/24
IPv6AcceptRA=no
[NextHop]
Id=20

View File

@ -4,6 +4,7 @@ Name=wg98
[Network]
Address=192.168.123.123/24
Address=fd8d:4d6d:3ccb:0500::1/64
IPv6AcceptRA=no
# nat64 via 1
[Route]

View File

@ -3,3 +3,4 @@ Name=wg99
[Network]
Address=192.168.124.1/24
IPv6AcceptRA=no

View File

@ -1,9 +1,12 @@
[Match]
Name=client-peer
[Network]
Address=192.168.6.2/24
DHCPServer=yes
IPForward=ipv4
IPv6AcceptRA=no
[DHCPServer]
RelayTarget=192.168.5.1
BindToInterface=no

View File

@ -1,5 +1,7 @@
[Match]
Name=client
[Network]
DHCP=yes
IPForward=ipv4
IPv6AcceptRA=no

View File

@ -1,5 +1,7 @@
[Match]
Name=server-peer
[Network]
Address=192.168.5.2/24
IPForward=ipv4
IPv6AcceptRA=no

View File

@ -1,9 +1,11 @@
[Match]
Name=server
[Network]
Address=192.168.5.1/24
IPForward=ipv4
DHCPServer=yes
IPv6AcceptRA=no
[DHCPServer]
BindToInterface=no

View File

@ -2785,7 +2785,7 @@ class NetworkdNetworkTests(unittest.TestCase, Utilities):
for iteration in range(4):
with self.subTest(iteration=iteration, expect_up=expect_up):
operstate = 'routable' if expect_up else 'off'
-setup_state = 'configured' if expect_up else None
+setup_state = 'configured' if expect_up else ('configuring' if iteration == 0 else None)
self.wait_operstate('test1', operstate, setup_state=setup_state, setup_timeout=20)
if expect_up: