Compare commits

No commits in common. "da1b880a3a772aa46e023353bdfa4f558286fa5e" and "699b890bf331b690b8bab1fbbd50366345e3adbc" have entirely different histories.

13 changed files with 252 additions and 281 deletions

@ -6,19 +6,20 @@ layout: default
# Predictable Network Interface Names
Starting with v197 systemd/udev will automatically assign predictable, stable network interface names for all local Ethernet, WLAN and WWAN interfaces. This is a departure from the traditional interface naming scheme (`eth0`, `eth1`, `wlan0`, ...), but should fix real problems.
Starting with v197 systemd/udev will automatically assign predictable, stable network interface names for all local Ethernet, WLAN and WWAN interfaces. This is a departure from the traditional interface naming scheme ("eth0", "eth1", "wlan0", ...), but should fix real problems.
## Why?
The classic naming scheme for network interfaces applied by the kernel is to simply assign names beginning with `eth0`, `eth1`, ... to all interfaces as they are probed by the drivers. As the driver probing is generally not predictable for modern technology this means that as soon as multiple network interfaces are available the assignment of the names `eth0`, `eth1` and so on is generally not fixed anymore and it might very well happen that `eth0` on one boot ends up being `eth1` on the next. This can have serious security implications, for example in firewall rules which are coded for certain naming schemes, and which are hence very sensitive to unpredictable changing names.
The classic naming scheme for network interfaces applied by the kernel is to simply assign names beginning with "eth0", "eth1", ... to all interfaces as they are probed by the drivers. As the driver probing is generally not predictable for modern technology this means that as soon as multiple network interfaces are available the assignment of the names "eth0", "eth1" and so on is generally not fixed anymore and it might very well happen that "eth0" on one boot ends up being "eth1" on the next. This can have serious security implications, for example in firewall rules which are coded for certain naming schemes, and which are hence very sensitive to unpredictable changing names.
To fix this problem multiple solutions have been proposed and implemented. For a longer time udev shipped support for assigning permanent `ethX` names to certain interfaces based on their MAC addresses. This turned out to have a multitude of problems, among them: this required a writable root directory which is generally not available; the statelessness of the system is lost as booting an OS image on a system will result in changed configuration of the image; on many systems MAC addresses are not actually fixed, such as on a lot of embedded hardware and particularly on all kinds of virtualization solutions. The biggest of all however is that the userspace components trying to assign the interface name raced against the kernel assigning new names from the same `ethX` namespace, a race condition with all kinds of weird effects, among them that assignment of names sometimes failed. As a result support for this has been removed from systemd/udev a while back.
To fix this problem multiple solutions have been proposed and implemented. For a longer time udev shipped support for assigning permanent "ethX" names to certain interfaces based on their MAC addresses. This turned out to have a multitude of problems, among them: this required a writable root directory which is generally not available; the statelessness of the system is lost as booting an OS image on a system will result in changed configuration of the image; on many systems MAC addresses are not actually fixed, such as on a lot of embedded hardware and particularly on all kinds of virtualization solutions. The biggest of all however is that the userspace components trying to assign the interface name raced against the kernel assigning new names from the same "ethX" namespace, a race condition with all kinds of weird effects, among them that assignment of names sometimes failed. As a result support for this has been removed from systemd/udev a while back.
Another solution that has been implemented is `biosdevname` which tries to find fixed slot topology information in certain firmware interfaces and uses them to assign fixed names to interfaces which incorporate their physical location on the mainboard. In a way this naming scheme is similar to what is already done natively in udev for various device nodes via `/dev/*/by-path/` symlinks. In many cases, biosdevname departs from the low-level kernel device identification schemes that udev generally uses for these symlinks, and instead invents its own enumeration schemes.
Another solution that has been implemented is "biosdevname" which tries to find fixed slot topology information in certain firmware interfaces and uses them to assign fixed names to interfaces which incorporate their physical location on the mainboard. In a way this naming scheme is similar to what is already done natively in udev for various device nodes via /dev/*/by-path/ symlinks. In many cases, biosdevname departs from the low-level kernel device identification schemes that udev generally uses for these symlinks, and instead invents its own enumeration schemes.
Finally, many distributions support renaming interfaces to user-chosen names (think: `internet0`, `dmz0`, ...) keyed off their MAC addresses or physical locations as part of their networking scripts. This is a very good choice but does have the problem that it implies that the user is willing and capable of choosing and assigning these names.
Finally, many distributions support renaming interfaces to user-chosen names (think: "internet0", "dmz0", ...) keyed off their MAC addresses or physical locations as part of their networking scripts. This is a very good choice but does have the problem that it implies that the user is willing and capable of choosing and assigning these names.
We believe it is a good default choice to generalize the scheme pioneered by `biosdevname`. Assigning fixed names based on firmware/topology/location information has the big advantage that the names are fully automatic, fully predictable, that they stay fixed even if hardware is added or removed (i.e. no reenumeration takes place) and that broken hardware can be replaced seamlessly. That said, they admittedly are sometimes harder to read than the `eth0` or `wlan0` everybody is used to. Example: `enp5s0`
We believe it is a good default choice to generalize the scheme pioneered by "biosdevname". Assigning fixed names based on firmware/topology/location information has the big advantage that the names are fully automatic, fully predictable, that they stay fixed even if hardware is added or removed (i.e. no reenumeration takes place) and that broken hardware can be replaced seamlessly. That said, they admittedly are sometimes harder to read than the "eth0" or "wlan0" everybody is used to. Example: "enp5s0"
## What precisely has changed in v197?
@ -46,14 +47,14 @@ With this new scheme you now get:
* Stable interface names even if you have to replace broken ethernet cards by new ones
* The names are automatically determined without user configuration, they just work
* The interface names are fully predictable, i.e. just by looking at lspci you can figure out what the interface is going to be called
* Fully stateless operation, changing the hardware configuration will not result in changes in `/etc`
* Fully stateless operation, changing the hardware configuration will not result in changes in /etc
* Compatibility with read-only root
* The network interface naming now follows more closely the scheme used for aliasing block device nodes and other device nodes in `/dev` via symlinks
* The network interface naming now follows more closely the scheme used for aliasing block device nodes and other device nodes in /dev via symlinks
* Applicability to both x86 and non-x86 machines
* The same on all distributions that adopted systemd/udev
* It's easy to opt out of the scheme (see below)
Does this have any drawbacks? Yes, it does. Previously it was practically guaranteed that hosts equipped with a single ethernet card only had a single `eth0` interface. With this new scheme in place, an administrator now has to check first what the local interface name is before he can invoke commands on it where previously he had a good chance that `eth0` was the right name.
Does this have any drawbacks? Yes, it does. Previously it was practically guaranteed that hosts equipped with a single ethernet card only had a single "eth0" interface. With this new scheme in place, an administrator now has to check first what the local interface name is before he can invoke commands on it where previously he had a good chance that "eth0" was the right name.
## I don't like this, how do I disable this?
@ -61,9 +62,9 @@ Does this have any drawbacks? Yes, it does. Previously it was practically guaran
You basically have three options:
1. You disable the assignment of fixed names, so that the unpredictable kernel names are used again. For this, simply mask udev's .link file for the default policy: `ln -s /dev/null /etc/systemd/network/99-default.link`
1. You create your own manual naming scheme, for example by naming your interfaces `internet0`, `dmz0` or `lan0`. For that create your own `.link` files in `/etc/systemd/network/`, that choose an explicit name or a better naming scheme for one, some, or all of your interfaces. See [systemd.link(5)](http://www.freedesktop.org/software/systemd/man/systemd.link.html) for more information.
1. You pass the `net.ifnames=0` on the kernel command line
1. You create your own manual naming scheme, for example by naming your interfaces "internet0", "dmz0" or "lan0". For that create your own .link files in /etc/systemd/network/, that choose an explicit name or a better naming scheme for one, some, or all of your interfaces. See [[systemd.link(5)|http://www.freedesktop.org/software/systemd/man/systemd.link.html]] for more information.
1. You pass the net.ifnames=0 on the kernel command line
## How does the new naming scheme look like, precisely?
That's documented in detail the [systemd.net-naming-scheme(7)](https://www.freedesktop.org/software/systemd/man/systemd.net-naming-scheme.html) man page. Please refer to this in case you are wondering how to decode the new interface names.
That's documented in detail in a comment block [[the sources of the net_id built-in|https://github.com/systemd/systemd/blob/master/src/udev/udev-builtin-net_id.c#L20]]. Please refer to this in case you are wondering how to decode the new interface names.
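
As a rough illustration of how a name such as `enp5s0` decodes, here is a minimal C sketch. It assumes only the simplest PCI form: a two-letter type prefix (`en` for Ethernet, `wl` for WLAN, `ww` for WWAN) followed by `p<bus>s<slot>` in decimal. The type `PciIfname` and the functions `parse_number` and `decode_pci_ifname` are hypothetical and not part of systemd. The real scheme has more variants (PCI domains, functions, onboard indexes, MAC-based names), which the reference above covers.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical result type for illustration only, not a systemd type. */
typedef struct PciIfname {
        const char *type;   /* "ethernet", "wlan" or "wwan" */
        unsigned bus;       /* PCI bus number, the digits after 'p' */
        unsigned slot;      /* PCI slot number, the digits after 's' */
} PciIfname;

/* Parses one run of decimal digits and advances *p past it.
 * Overflow is not checked, to keep the sketch short. */
int parse_number(const char **p, unsigned *ret) {
        unsigned v = 0;

        if (!isdigit((unsigned char) **p))
                return -1;

        while (isdigit((unsigned char) **p)) {
                v = v * 10 + (unsigned) (**p - '0');
                (*p)++;
        }

        *ret = v;
        return 0;
}

/* Returns 0 on success, -1 if the name is not of the form <prefix>p<bus>s<slot>. */
int decode_pci_ifname(const char *name, PciIfname *ret) {
        const char *type, *p;
        unsigned bus, slot;

        assert(name);
        assert(ret);

        if (strncmp(name, "en", 2) == 0)
                type = "ethernet";
        else if (strncmp(name, "wl", 2) == 0)
                type = "wlan";
        else if (strncmp(name, "ww", 2) == 0)
                type = "wwan";
        else
                return -1;

        p = name + 2;
        if (*p++ != 'p' || parse_number(&p, &bus) < 0)
                return -1;
        if (*p++ != 's' || parse_number(&p, &slot) < 0)
                return -1;
        if (*p != '\0')
                return -1;   /* trailing parts such as f<function> are not handled here */

        *ret = (PciIfname) { .type = type, .bus = bus, .slot = slot };
        return 0;
}
```

Calling `decode_pci_ifname("enp5s0", &n)` fills `n` with type `ethernet`, bus 5 and slot 0. A classic kernel name such as `eth0` is rejected, because it does not follow this pattern.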

@ -262,7 +262,7 @@
<varlistentry>
<term><constant>v238</constant></term>
<listitem><para>This is the naming scheme that was implemented in systemd 238.</para></listitem>
<listitem><para>This is the naming naming that was implemented in systemd 238.</para></listitem>
</varlistentry>
<varlistentry>
@ -428,7 +428,8 @@ ID_NET_NAME_PATH=encf5f0</programlisting>
<para>
<citerefentry><refentrytitle>udev</refentrytitle><manvolnum>7</manvolnum></citerefentry>,
<citerefentry><refentrytitle>udevadm</refentrytitle><manvolnum>8</manvolnum></citerefentry>,
<ulink url="https://systemd.io/PREDICTABLE_INTERFACE_NAMES">Predictable Network Interface Names</ulink>
<ulink url="https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames">the
original page describing stable interface names</ulink>
</para>
</refsect1>

@ -10,16 +10,25 @@
#include "qdisc.h"
#include "string-util.h"
static int fair_queuing_controlled_delay_fill_message(Link *link, QDisc *qdisc, sd_netlink_message *req) {
FairQueuingControlledDelay *fqcd;
int fair_queuing_controlled_delay_new(FairQueuingControlledDelay **ret) {
FairQueuingControlledDelay *fqcd = NULL;
fqcd = new0(FairQueuingControlledDelay, 1);
if (!fqcd)
return -ENOMEM;
*ret = TAKE_PTR(fqcd);
return 0;
}
int fair_queuing_controlled_delay_fill_message(Link *link, const FairQueuingControlledDelay *fqcd, sd_netlink_message *req) {
int r;
assert(link);
assert(qdisc);
assert(fqcd);
assert(req);
fqcd = FQ_CODEL(qdisc);
r = sd_netlink_message_open_container_union(req, TCA_OPTIONS, "fq_codel");
if (r < 0)
return log_link_error_errno(link, r, "Could not open container TCA_OPTIONS: %m");
@ -48,7 +57,6 @@ int config_parse_tc_fair_queuing_controlled_delay_limit(
void *userdata) {
_cleanup_(qdisc_free_or_set_invalidp) QDisc *qdisc = NULL;
FairQueuingControlledDelay *fqcd;
Network *network = data;
int r;
@ -57,23 +65,18 @@ int config_parse_tc_fair_queuing_controlled_delay_limit(
assert(rvalue);
assert(data);
r = qdisc_new_static(QDISC_KIND_FQ_CODEL, network, filename, section_line, &qdisc);
if (r == -ENOMEM)
return log_oom();
r = qdisc_new_static(network, filename, section_line, &qdisc);
if (r < 0)
return log_syntax(unit, LOG_ERR, filename, line, r,
"More than one kind of queueing discipline, ignoring assignment: %m");
fqcd = FQ_CODEL(qdisc);
return r;
if (isempty(rvalue)) {
fqcd->limit = 0;
qdisc->fq_codel.limit = 0;
qdisc = NULL;
return 0;
}
r = safe_atou32(rvalue, &fqcd->limit);
r = safe_atou32(rvalue, &qdisc->fq_codel.limit);
if (r < 0) {
log_syntax(unit, LOG_ERR, filename, line, r,
"Failed to parse '%s=', ignoring assignment: %s",
@ -81,13 +84,8 @@ int config_parse_tc_fair_queuing_controlled_delay_limit(
return 0;
}
qdisc->has_fair_queuing_controlled_delay = true;
qdisc = NULL;
return 0;
}
const QDiscVTable fq_codel_vtable = {
.object_size = sizeof(FairQueuingControlledDelay),
.tca_kind = "fq_codel",
.fill_message = fair_queuing_controlled_delay_fill_message,
};
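
The limit parser in this file relies on `safe_atou32()`, which is internal to systemd. For readers unfamiliar with it, the sketch below shows roughly the same idea in standard C: parse the whole string as an unsigned 32-bit value and return a negative errno-style code on failure. The name `parse_u32_strict` is hypothetical, and the exact acceptance rules of the real helper may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a strict unsigned 32-bit parser.
 * Returns 0 on success, or a negative errno-style value on failure. */
int parse_u32_strict(const char *s, uint32_t *ret) {
        char *end = NULL;
        unsigned long long v;

        assert(s);
        assert(ret);

        /* strtoull() alone would accept leading whitespace and a sign,
         * so require the first character to be a digit. */
        if (!isdigit((unsigned char) s[0]))
                return -EINVAL;

        errno = 0;
        v = strtoull(s, &end, 10);
        if (errno != 0)
                return -errno;
        if (*end != '\0')
                return -EINVAL;   /* trailing garbage such as "12x" */
        if (v > UINT32_MAX)
                return -ERANGE;

        *ret = (uint32_t) v;
        return 0;
}
```

The output parameter is written only on success, so a caller can log the failure and keep the previous value, which matches the "ignoring assignment" behaviour of the config parser above.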

@ -2,15 +2,16 @@
* Copyright © 2019 VMware, Inc. */
#pragma once
#include "sd-netlink.h"
#include "conf-parser.h"
#include "qdisc.h"
#include "networkd-link.h"
typedef struct FairQueuingControlledDelay {
QDisc meta;
uint32_t limit;
} FairQueuingControlledDelay;
DEFINE_QDISC_CAST(FQ_CODEL, FairQueuingControlledDelay);
extern const QDiscVTable fq_codel_vtable;
int fair_queuing_controlled_delay_new(FairQueuingControlledDelay **ret);
int fair_queuing_controlled_delay_fill_message(Link *link, const FairQueuingControlledDelay *sfq, sd_netlink_message *req);
CONFIG_PARSER_PROTOTYPE(config_parse_tc_fair_queuing_controlled_delay_limit);

@ -13,19 +13,33 @@
#include "string-util.h"
#include "tc-util.h"
static int network_emulator_fill_message(Link *link, QDisc *qdisc, sd_netlink_message *req) {
int network_emulator_new(NetworkEmulator **ret) {
NetworkEmulator *ne = NULL;
ne = new(NetworkEmulator, 1);
if (!ne)
return -ENOMEM;
*ne = (NetworkEmulator) {
.delay = USEC_INFINITY,
.jitter = USEC_INFINITY,
};
*ret = TAKE_PTR(ne);
return 0;
}
int network_emulator_fill_message(Link *link, const NetworkEmulator *ne, sd_netlink_message *req) {
struct tc_netem_qopt opt = {
.limit = 1000,
};
NetworkEmulator *ne;
int r;
assert(link);
assert(qdisc);
assert(ne);
assert(req);
ne = NETEM(qdisc);
if (ne->limit > 0)
opt.limit = ne->limit;
@ -68,7 +82,6 @@ int config_parse_tc_network_emulator_delay(
_cleanup_(qdisc_free_or_set_invalidp) QDisc *qdisc = NULL;
Network *network = data;
NetworkEmulator *ne;
usec_t u;
int r;
@ -77,20 +90,15 @@ int config_parse_tc_network_emulator_delay(
assert(rvalue);
assert(data);
r = qdisc_new_static(QDISC_KIND_NETEM, network, filename, section_line, &qdisc);
if (r == -ENOMEM)
return log_oom();
r = qdisc_new_static(network, filename, section_line, &qdisc);
if (r < 0)
return log_syntax(unit, LOG_ERR, filename, line, r,
"More than one kind of queueing discipline, ignoring assignment: %m");
ne = NETEM(qdisc);
return r;
if (isempty(rvalue)) {
if (streq(lvalue, "NetworkEmulatorDelaySec"))
ne->delay = USEC_INFINITY;
qdisc->ne.delay = USEC_INFINITY;
else if (streq(lvalue, "NetworkEmulatorDelayJitterSec"))
ne->jitter = USEC_INFINITY;
qdisc->ne.jitter = USEC_INFINITY;
qdisc = NULL;
return 0;
@ -105,10 +113,11 @@ int config_parse_tc_network_emulator_delay(
}
if (streq(lvalue, "NetworkEmulatorDelaySec"))
ne->delay = u;
qdisc->ne.delay = u;
else if (streq(lvalue, "NetworkEmulatorDelayJitterSec"))
ne->jitter = u;
qdisc->ne.jitter = u;
qdisc->has_network_emulator = true;
qdisc = NULL;
return 0;
@ -128,7 +137,6 @@ int config_parse_tc_network_emulator_rate(
_cleanup_(qdisc_free_or_set_invalidp) QDisc *qdisc = NULL;
Network *network = data;
NetworkEmulator *ne;
uint32_t rate;
int r;
@ -137,20 +145,12 @@ int config_parse_tc_network_emulator_rate(
assert(rvalue);
assert(data);
r = qdisc_new_static(QDISC_KIND_NETEM, network, filename, section_line, &qdisc);
if (r == -ENOMEM)
return log_oom();
r = qdisc_new_static(network, filename, section_line, &qdisc);
if (r < 0)
return log_syntax(unit, LOG_ERR, filename, line, r,
"More than one kind of queueing discipline, ignoring assignment: %m");
ne = NETEM(qdisc);
return r;
if (isempty(rvalue)) {
if (streq(lvalue, "NetworkEmulatorLossRate"))
ne->loss = 0;
else if (streq(lvalue, "NetworkEmulatorDuplicateRate"))
ne->duplicate = 0;
qdisc->ne.loss = 0;
qdisc = NULL;
return 0;
@ -165,9 +165,9 @@ int config_parse_tc_network_emulator_rate(
}
if (streq(lvalue, "NetworkEmulatorLossRate"))
ne->loss = rate;
qdisc->ne.loss = rate;
else if (streq(lvalue, "NetworkEmulatorDuplicateRate"))
ne->duplicate = rate;
qdisc->ne.duplicate = rate;
qdisc = NULL;
return 0;
@ -187,7 +187,6 @@ int config_parse_tc_network_emulator_packet_limit(
_cleanup_(qdisc_free_or_set_invalidp) QDisc *qdisc = NULL;
Network *network = data;
NetworkEmulator *ne;
int r;
assert(filename);
@ -195,23 +194,18 @@ int config_parse_tc_network_emulator_packet_limit(
assert(rvalue);
assert(data);
r = qdisc_new_static(QDISC_KIND_NETEM, network, filename, section_line, &qdisc);
if (r == -ENOMEM)
return log_oom();
r = qdisc_new_static(network, filename, section_line, &qdisc);
if (r < 0)
return log_syntax(unit, LOG_ERR, filename, line, r,
"More than one kind of queueing discipline, ignoring assignment: %m");
ne = NETEM(qdisc);
return r;
if (isempty(rvalue)) {
ne->limit = 0;
qdisc->ne.limit = 0;
qdisc = NULL;
return 0;
}
r = safe_atou(rvalue, &ne->limit);
r = safe_atou(rvalue, &qdisc->ne.limit);
if (r < 0) {
log_syntax(unit, LOG_ERR, filename, line, r,
"Failed to parse 'NetworkEmulatorPacketLimit=', ignoring assignment: %s",
@ -222,9 +216,3 @@ int config_parse_tc_network_emulator_packet_limit(
qdisc = NULL;
return 0;
}
const QDiscVTable netem_vtable = {
.object_size = sizeof(NetworkEmulator),
.tca_kind = "netem",
.fill_message = network_emulator_fill_message,
};

@ -2,13 +2,13 @@
* Copyright © 2019 VMware, Inc. */
#pragma once
#include "sd-netlink.h"
#include "conf-parser.h"
#include "qdisc.h"
#include "networkd-link.h"
#include "time-util.h"
typedef struct NetworkEmulator {
QDisc meta;
usec_t delay;
usec_t jitter;
@ -17,8 +17,8 @@ typedef struct NetworkEmulator {
uint32_t duplicate;
} NetworkEmulator;
DEFINE_QDISC_CAST(NETEM, NetworkEmulator);
extern const QDiscVTable netem_vtable;
int network_emulator_new(NetworkEmulator **ret);
int network_emulator_fill_message(Link *link, const NetworkEmulator *ne, sd_netlink_message *req);
CONFIG_PARSER_PROTOTYPE(config_parse_tc_network_emulator_delay);
CONFIG_PARSER_PROTOTYPE(config_parse_tc_network_emulator_rate);

@ -13,94 +13,65 @@
#include "set.h"
#include "string-util.h"
const QDiscVTable * const qdisc_vtable[_QDISC_KIND_MAX] = {
[QDISC_KIND_FQ_CODEL] = &fq_codel_vtable,
[QDISC_KIND_NETEM] = &netem_vtable,
[QDISC_KIND_SFQ] = &sfq_vtable,
[QDISC_KIND_TBF] = &tbf_vtable,
};
static int qdisc_new(QDiscKind kind, QDisc **ret) {
static int qdisc_new(QDisc **ret) {
QDisc *qdisc;
if (kind == _QDISC_KIND_INVALID) {
qdisc = new(QDisc, 1);
if (!qdisc)
return -ENOMEM;
qdisc = new(QDisc, 1);
if (!qdisc)
return -ENOMEM;
*qdisc = (QDisc) {
.family = AF_UNSPEC,
.parent = TC_H_ROOT,
.kind = kind,
};
} else {
qdisc = malloc0(qdisc_vtable[kind]->object_size);
if (!qdisc)
return -ENOMEM;
qdisc->family = AF_UNSPEC;
qdisc->parent = TC_H_ROOT;
qdisc->kind = kind;
}
*qdisc = (QDisc) {
.family = AF_UNSPEC,
.parent = TC_H_ROOT,
};
*ret = TAKE_PTR(qdisc);
return 0;
}
int qdisc_new_static(QDiscKind kind, Network *network, const char *filename, unsigned section_line, QDisc **ret) {
int qdisc_new_static(Network *network, const char *filename, unsigned section_line, QDisc **ret) {
_cleanup_(network_config_section_freep) NetworkConfigSection *n = NULL;
_cleanup_(qdisc_freep) QDisc *qdisc = NULL;
QDisc *existing;
int r;
assert(network);
assert(ret);
assert(filename);
assert(section_line > 0);
assert(!!filename == (section_line > 0));
r = network_config_section_new(filename, section_line, &n);
if (r < 0)
return r;
if (filename) {
r = network_config_section_new(filename, section_line, &n);
if (r < 0)
return r;
existing = ordered_hashmap_get(network->qdiscs_by_section, n);
if (existing) {
if (existing->kind != _QDISC_KIND_INVALID &&
kind != _QDISC_KIND_INVALID &&
existing->kind != kind)
return -EINVAL;
qdisc = ordered_hashmap_get(network->qdiscs_by_section, n);
if (qdisc) {
*ret = TAKE_PTR(qdisc);
if (existing->kind == kind || kind == _QDISC_KIND_INVALID) {
*ret = existing;
return 0;
}
}
r = qdisc_new(kind, &qdisc);
r = qdisc_new(&qdisc);
if (r < 0)
return r;
if (existing) {
qdisc->family = existing->family;
qdisc->handle = existing->handle;
qdisc->parent = existing->parent;
qdisc->tca_kind = TAKE_PTR(existing->tca_kind);
qdisc_free(ordered_hashmap_remove(network->qdiscs_by_section, n));
}
qdisc->network = network;
qdisc->section = TAKE_PTR(n);
r = ordered_hashmap_ensure_allocated(&network->qdiscs_by_section, &network_config_hash_ops);
if (r < 0)
return r;
if (filename) {
qdisc->section = TAKE_PTR(n);
r = ordered_hashmap_put(network->qdiscs_by_section, qdisc->section, qdisc);
if (r < 0)
return r;
r = ordered_hashmap_ensure_allocated(&network->qdiscs_by_section, &network_config_hash_ops);
if (r < 0)
return r;
r = ordered_hashmap_put(network->qdiscs_by_section, qdisc->section, qdisc);
if (r < 0)
return r;
}
*ret = TAKE_PTR(qdisc);
return 0;
}
@ -145,6 +116,8 @@ static int qdisc_handler(sd_netlink *rtnl, sd_netlink_message *m, Link *link) {
int qdisc_configure(Link *link, QDisc *qdisc) {
_cleanup_(sd_netlink_message_unrefp) sd_netlink_message *req = NULL;
_cleanup_free_ char *tca_kind = NULL;
char *p;
int r;
assert(link);
@ -166,16 +139,49 @@ int qdisc_configure(Link *link, QDisc *qdisc) {
return log_link_error_errno(link, r, "Could not set tcm_handle message: %m");
}
if (QDISC_VTABLE(qdisc)) {
r = sd_netlink_message_append_string(req, TCA_KIND, QDISC_VTABLE(qdisc)->tca_kind);
if (qdisc->has_network_emulator) {
r = free_and_strdup(&tca_kind, "netem");
if (r < 0)
return log_link_error_errno(link, r, "Could not append TCA_KIND attribute: %m");
return log_oom();
r = QDISC_VTABLE(qdisc)->fill_message(link, qdisc, req);
r = network_emulator_fill_message(link, &qdisc->ne, req);
if (r < 0)
return r;
} else {
r = sd_netlink_message_append_string(req, TCA_KIND, qdisc->tca_kind);
}
if (qdisc->has_token_buffer_filter) {
r = free_and_strdup(&tca_kind, "tbf");
if (r < 0)
return log_oom();
r = token_buffer_filter_fill_message(link, &qdisc->tbf, req);
if (r < 0)
return r;
}
if (qdisc->has_stochastic_fairness_queueing) {
r = free_and_strdup(&tca_kind, "sfq");
if (r < 0)
return log_oom();
r = stochastic_fairness_queueing_fill_message(link, &qdisc->sfq, req);
if (r < 0)
return r;
}
if (qdisc->has_fair_queuing_controlled_delay) {
r = free_and_strdup(&tca_kind, "fq_codel");
if (r < 0)
return log_oom();
r = fair_queuing_controlled_delay_fill_message(link, &qdisc->fq_codel, req);
if (r < 0)
return r;
}
p = tca_kind ?:qdisc->tca_kind;
if (p) {
r = sd_netlink_message_append_string(req, TCA_KIND, p);
if (r < 0)
return log_link_error_errno(link, r, "Could not append TCA_KIND attribute: %m");
}
@ -191,6 +197,7 @@ int qdisc_configure(Link *link, QDisc *qdisc) {
}
int qdisc_section_verify(QDisc *qdisc, bool *has_root, bool *has_clsact) {
unsigned i;
int r;
assert(qdisc);
@ -200,8 +207,15 @@ int qdisc_section_verify(QDisc *qdisc, bool *has_root, bool *has_clsact) {
if (section_is_invalid(qdisc->section))
return -EINVAL;
if (QDISC_VTABLE(qdisc) && QDISC_VTABLE(qdisc)->verify) {
r = QDISC_VTABLE(qdisc)->verify(qdisc);
i = qdisc->has_network_emulator + qdisc->has_token_buffer_filter + qdisc->has_stochastic_fairness_queueing;
if (i > 1)
return log_warning_errno(SYNTHETIC_ERRNO(EINVAL),
"%s: TrafficControlQueueingDiscipline section has more than one type of discipline. "
"Ignoring [TrafficControlQueueingDiscipline] section from line %u.",
qdisc->section->filename, qdisc->section->line);
if (qdisc->has_token_buffer_filter) {
r = token_buffer_filter_section_verify(&qdisc->tbf, qdisc->section);
if (r < 0)
return r;
}
@ -246,7 +260,7 @@ int config_parse_tc_qdiscs_parent(
assert(rvalue);
assert(data);
r = qdisc_new_static(_QDISC_KIND_INVALID, network, filename, section_line, &qdisc);
r = qdisc_new_static(network, filename, section_line, &qdisc);
if (r < 0)
return r;

@ -3,65 +3,44 @@
#pragma once
#include "conf-parser.h"
#include "fq-codel.h"
#include "netem.h"
#include "networkd-link.h"
#include "networkd-network.h"
#include "networkd-util.h"
typedef enum QDiscKind {
QDISC_KIND_FQ_CODEL,
QDISC_KIND_NETEM,
QDISC_KIND_SFQ,
QDISC_KIND_TBF,
_QDISC_KIND_MAX,
_QDISC_KIND_INVALID = -1,
} QDiscKind;
#include "sfq.h"
#include "tbf.h"
typedef struct QDisc {
NetworkConfigSection *section;
Network *network;
Link *link;
int family;
uint32_t handle;
uint32_t parent;
char *tca_kind;
QDiscKind kind;
bool has_network_emulator:1;
bool has_token_buffer_filter:1;
bool has_stochastic_fairness_queueing:1;
bool has_fair_queuing_controlled_delay:1;
NetworkEmulator ne;
TokenBufferFilter tbf;
StochasticFairnessQueueing sfq;
FairQueuingControlledDelay fq_codel;
} QDisc;
typedef struct QDiscVTable {
size_t object_size;
const char *tca_kind;
int (*fill_message)(Link *link, QDisc *qdisc, sd_netlink_message *m);
int (*verify)(QDisc *qdisc);
} QDiscVTable;
extern const QDiscVTable * const qdisc_vtable[_QDISC_KIND_MAX];
#define QDISC_VTABLE(q) ((q)->kind != _QDISC_KIND_INVALID ? qdisc_vtable[(q)->kind] : NULL)
/* For casting a qdisc into the various qdisc kinds */
#define DEFINE_QDISC_CAST(UPPERCASE, MixedCase) \
static inline MixedCase* UPPERCASE(QDisc *q) { \
if (_unlikely_(!q || q->kind != QDISC_KIND_##UPPERCASE)) \
return NULL; \
\
return (MixedCase*) q; \
}
/* For casting the various qdisc kinds into a qdisc */
#define QDISC(q) (&(q)->meta)
void qdisc_free(QDisc *qdisc);
int qdisc_new_static(QDiscKind kind, Network *network, const char *filename, unsigned section_line, QDisc **ret);
int qdisc_new_static(Network *network, const char *filename, unsigned section_line, QDisc **ret);
int qdisc_configure(Link *link, QDisc *qdisc);
int qdisc_section_verify(QDisc *qdisc, bool *has_root, bool *has_clsact);
DEFINE_NETWORK_SECTION_FUNCTIONS(QDisc, qdisc_free);
CONFIG_PARSER_PROTOTYPE(config_parse_tc_qdiscs_parent);
#include "fq-codel.h"
#include "netem.h"
#include "sfq.h"
#include "tbf.h"
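
The structural difference between the two sides of this header is easier to see in isolation. The following is a minimal, self-contained sketch of the per-kind vtable pattern: an enum tag on the object indexes a table of function pointers, and a lookup helper returns NULL for objects with no specific kind. All names (`Kind`, `Shaper`, `ShaperVTable`, `shaper_vtable` and the functions) are hypothetical stand-ins, not the networkd types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kinds, standing in for the queueing discipline kinds. */
typedef enum Kind {
        KIND_NETEM,
        KIND_TBF,
        KIND_MAX,
        KIND_INVALID = -1,
} Kind;

typedef struct Shaper {
        Kind kind;
} Shaper;

/* One table of per-kind behaviour; optional hooks may be NULL. */
typedef struct ShaperVTable {
        const char *tca_kind;
        int (*verify)(const Shaper *s);
} ShaperVTable;

static int tbf_verify(const Shaper *s) {
        (void) s;   /* a real implementation would check the settings here */
        return 0;
}

static const ShaperVTable netem_vtable = {
        .tca_kind = "netem",
};

static const ShaperVTable tbf_vtable = {
        .tca_kind = "tbf",
        .verify = tbf_verify,
};

static const ShaperVTable * const shaper_vtable[KIND_MAX] = {
        [KIND_NETEM] = &netem_vtable,
        [KIND_TBF] = &tbf_vtable,
};

/* Returns the vtable for the object's kind, or NULL for KIND_INVALID. */
const ShaperVTable *shaper_get_vtable(const Shaper *s) {
        assert(s);
        return s->kind != KIND_INVALID ? shaper_vtable[s->kind] : NULL;
}

const char *shaper_kind_name(const Shaper *s) {
        const ShaperVTable *v = shaper_get_vtable(s);
        return v ? v->tca_kind : NULL;
}

/* Generic code dispatches through the table instead of testing one flag per kind. */
int shaper_verify(const Shaper *s) {
        const ShaperVTable *v = shaper_get_vtable(s);

        if (v && v->verify)
                return v->verify(s);
        return 0;
}
```

With this layout, adding a kind means adding one enum value and one table entry, and the generic code stays unchanged. The flag-based layout instead needs a new `has_...` field plus a new branch in every function that handles kinds.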

@ -11,17 +11,26 @@
#include "sfq.h"
#include "string-util.h"
static int stochastic_fairness_queueing_fill_message(Link *link, QDisc *qdisc, sd_netlink_message *req) {
StochasticFairnessQueueing *sfq;
int stochastic_fairness_queueing_new(StochasticFairnessQueueing **ret) {
StochasticFairnessQueueing *sfq = NULL;
sfq = new0(StochasticFairnessQueueing, 1);
if (!sfq)
return -ENOMEM;
*ret = TAKE_PTR(sfq);
return 0;
}
int stochastic_fairness_queueing_fill_message(Link *link, const StochasticFairnessQueueing *sfq, sd_netlink_message *req) {
struct tc_sfq_qopt_v1 opt = {};
int r;
assert(link);
assert(qdisc);
assert(sfq);
assert(req);
sfq = SFQ(qdisc);
opt.v0.perturb_period = sfq->perturb_period / USEC_PER_SEC;
r = sd_netlink_message_append_data(req, TCA_OPTIONS, &opt, sizeof(struct tc_sfq_qopt_v1));
@ -44,7 +53,6 @@ int config_parse_tc_stochastic_fairness_queueing_perturb_period(
void *userdata) {
_cleanup_(qdisc_free_or_set_invalidp) QDisc *qdisc = NULL;
StochasticFairnessQueueing *sfq;
Network *network = data;
int r;
@ -53,23 +61,18 @@ int config_parse_tc_stochastic_fairness_queueing_perturb_period(
assert(rvalue);
assert(data);
r = qdisc_new_static(QDISC_KIND_SFQ, network, filename, section_line, &qdisc);
if (r == -ENOMEM)
return log_oom();
r = qdisc_new_static(network, filename, section_line, &qdisc);
if (r < 0)
return log_syntax(unit, LOG_ERR, filename, line, r,
"More than one kind of queueing discipline, ignoring assignment: %m");
sfq = SFQ(qdisc);
return r;
if (isempty(rvalue)) {
sfq->perturb_period = 0;
qdisc->sfq.perturb_period = 0;
qdisc = NULL;
return 0;
}
r = parse_sec(rvalue, &sfq->perturb_period);
r = parse_sec(rvalue, &qdisc->sfq.perturb_period);
if (r < 0) {
log_syntax(unit, LOG_ERR, filename, line, r,
"Failed to parse '%s=', ignoring assignment: %s",
@ -77,13 +80,8 @@ int config_parse_tc_stochastic_fairness_queueing_perturb_period(
return 0;
}
qdisc->has_stochastic_fairness_queueing = true;
qdisc = NULL;
return 0;
}
const QDiscVTable sfq_vtable = {
.object_size = sizeof(StochasticFairnessQueueing),
.tca_kind = "sfq",
.fill_message = stochastic_fairness_queueing_fill_message,
};

@ -2,17 +2,16 @@
* Copyright © 2019 VMware, Inc. */
#pragma once
#include "sd-netlink.h"
#include "conf-parser.h"
#include "qdisc.h"
#include "time-util.h"
#include "networkd-link.h"
typedef struct StochasticFairnessQueueing {
QDisc meta;
usec_t perturb_period;
} StochasticFairnessQueueing;
DEFINE_QDISC_CAST(SFQ, StochasticFairnessQueueing);
extern const QDiscVTable sfq_vtable;
int stochastic_fairness_queueing_new(StochasticFairnessQueueing **ret);
int stochastic_fairness_queueing_fill_message(Link *link, const StochasticFairnessQueueing *sfq, sd_netlink_message *req);
CONFIG_PARSER_PROTOTYPE(config_parse_tc_stochastic_fairness_queueing_perturb_period);

@ -15,18 +15,27 @@
#include "tc-util.h"
#include "util.h"
static int token_buffer_filter_fill_message(Link *link, QDisc *qdisc, sd_netlink_message *req) {
int token_buffer_filter_new(TokenBufferFilter **ret) {
TokenBufferFilter *ne = NULL;
ne = new0(TokenBufferFilter, 1);
if (!ne)
return -ENOMEM;
*ret = TAKE_PTR(ne);
return 0;
}
int token_buffer_filter_fill_message(Link *link, const TokenBufferFilter *tbf, sd_netlink_message *req) {
uint32_t rtab[256], ptab[256];
struct tc_tbf_qopt opt = {};
TokenBufferFilter *tbf;
int r;
assert(link);
assert(qdisc);
assert(tbf);
assert(req);
tbf = TBF(qdisc);
opt.rate.rate = tbf->rate >= (1ULL << 32) ? ~0U : tbf->rate;
opt.peakrate.rate = tbf->peak_rate >= (1ULL << 32) ? ~0U : tbf->peak_rate;
@ -124,7 +133,6 @@ int config_parse_tc_token_buffer_filter_size(
_cleanup_(qdisc_free_or_set_invalidp) QDisc *qdisc = NULL;
Network *network = data;
TokenBufferFilter *tbf;
uint64_t k;
int r;
@ -133,28 +141,23 @@ int config_parse_tc_token_buffer_filter_size(
assert(rvalue);
assert(data);
r = qdisc_new_static(QDISC_KIND_TBF, network, filename, section_line, &qdisc);
if (r == -ENOMEM)
return log_oom();
r = qdisc_new_static(network, filename, section_line, &qdisc);
if (r < 0)
return log_syntax(unit, LOG_ERR, filename, line, r,
"More than one kind of queueing discipline, ignoring assignment: %m");
tbf = TBF(qdisc);
return r;
if (isempty(rvalue)) {
if (streq(lvalue, "TokenBufferFilterRate"))
tbf->rate = 0;
qdisc->tbf.rate = 0;
else if (streq(lvalue, "TokenBufferFilterBurst"))
tbf->burst = 0;
qdisc->tbf.burst = 0;
else if (streq(lvalue, "TokenBufferFilterLimitSize"))
tbf->limit = 0;
qdisc->tbf.limit = 0;
else if (streq(lvalue, "TokenBufferFilterMTUBytes"))
tbf->mtu = 0;
qdisc->tbf.mtu = 0;
else if (streq(lvalue, "TokenBufferFilterMPUBytes"))
tbf->mpu = 0;
qdisc->tbf.mpu = 0;
else if (streq(lvalue, "TokenBufferFilterPeakRate"))
tbf->peak_rate = 0;
qdisc->tbf.peak_rate = 0;
qdisc = NULL;
return 0;
@@ -169,18 +172,19 @@ int config_parse_tc_token_buffer_filter_size(
         }
         if (streq(lvalue, "TokenBufferFilterRate"))
-                tbf->rate = k / 8;
+                qdisc->tbf.rate = k / 8;
         else if (streq(lvalue, "TokenBufferFilterBurst"))
-                tbf->burst = k;
+                qdisc->tbf.burst = k;
         else if (streq(lvalue, "TokenBufferFilterLimitSize"))
-                tbf->limit = k;
+                qdisc->tbf.limit = k;
         else if (streq(lvalue, "TokenBufferFilterMPUBytes"))
-                tbf->mpu = k;
+                qdisc->tbf.mpu = k;
         else if (streq(lvalue, "TokenBufferFilterMTUBytes"))
-                tbf->mtu = k;
+                qdisc->tbf.mtu = k;
         else if (streq(lvalue, "TokenBufferFilterPeakRate"))
-                tbf->peak_rate = k / 8;
+                qdisc->tbf.peak_rate = k / 8;
+        qdisc->has_token_buffer_filter = true;
         qdisc = NULL;
         return 0;
@@ -200,7 +204,6 @@ int config_parse_tc_token_buffer_filter_latency(
         _cleanup_(qdisc_free_or_set_invalidp) QDisc *qdisc = NULL;
         Network *network = data;
-        TokenBufferFilter *tbf;
         usec_t u;
         int r;
@@ -209,17 +212,12 @@ int config_parse_tc_token_buffer_filter_latency(
         assert(rvalue);
         assert(data);
-        r = qdisc_new_static(QDISC_KIND_TBF, network, filename, section_line, &qdisc);
-        if (r == -ENOMEM)
-                return log_oom();
+        r = qdisc_new_static(network, filename, section_line, &qdisc);
         if (r < 0)
-                return log_syntax(unit, LOG_ERR, filename, line, r,
-                                  "More than one kind of queueing discipline, ignoring assignment: %m");
-        tbf = TBF(qdisc);
+                return r;
         if (isempty(rvalue)) {
-                tbf->latency = 0;
+                qdisc->tbf.latency = 0;
                 qdisc = NULL;
                 return 0;
@@ -233,52 +231,44 @@ int config_parse_tc_token_buffer_filter_latency(
                 return 0;
         }
-        tbf->latency = u;
+        qdisc->tbf.latency = u;
+        qdisc->has_token_buffer_filter = true;
         qdisc = NULL;
         return 0;
 }
-static int token_buffer_filter_verify(QDisc *qdisc) {
-        TokenBufferFilter *tbf = TBF(qdisc);
+int token_buffer_filter_section_verify(const TokenBufferFilter *tbf, const NetworkConfigSection *section) {
         if (tbf->limit > 0 && tbf->latency > 0)
                 return log_warning_errno(SYNTHETIC_ERRNO(EINVAL),
                                          "%s: Specifying both TokenBufferFilterLimitSize= and TokenBufferFilterLatencySec= is not allowed. "
                                          "Ignoring [TrafficControlQueueingDiscipline] section from line %u.",
-                                         qdisc->section->filename, qdisc->section->line);
+                                         section->filename, section->line);
         if (tbf->limit == 0 && tbf->latency == 0)
                 return log_warning_errno(SYNTHETIC_ERRNO(EINVAL),
                                          "%s: Either TokenBufferFilterLimitSize= or TokenBufferFilterLatencySec= is required. "
                                          "Ignoring [TrafficControlQueueingDiscipline] section from line %u.",
-                                         qdisc->section->filename, qdisc->section->line);
+                                         section->filename, section->line);
         if (tbf->rate == 0)
                 return log_warning_errno(SYNTHETIC_ERRNO(EINVAL),
                                          "%s: TokenBufferFilterRate= is mandatory. "
                                          "Ignoring [TrafficControlQueueingDiscipline] section from line %u.",
-                                         qdisc->section->filename, qdisc->section->line);
+                                         section->filename, section->line);
         if (tbf->burst == 0)
                 return log_warning_errno(SYNTHETIC_ERRNO(EINVAL),
                                          "%s: TokenBufferFilterBurst= is mandatory. "
                                          "Ignoring [TrafficControlQueueingDiscipline] section from line %u.",
-                                         qdisc->section->filename, qdisc->section->line);
+                                         section->filename, section->line);
         if (tbf->peak_rate > 0 && tbf->mtu == 0)
                 return log_warning_errno(SYNTHETIC_ERRNO(EINVAL),
                                          "%s: TokenBufferFilterMTUBytes= is mandatory when TokenBufferFilterPeakRate= is specified. "
                                          "Ignoring [TrafficControlQueueingDiscipline] section from line %u.",
-                                         qdisc->section->filename, qdisc->section->line);
+                                         section->filename, section->line);
         return 0;
 }
-const QDiscVTable tbf_vtable = {
-        .object_size = sizeof(TokenBufferFilter),
-        .tca_kind = "tbf",
-        .fill_message = token_buffer_filter_fill_message,
-        .verify = token_buffer_filter_verify
-};
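Both versions of the verify function in the hunk above apply the same five rules and differ only in where the section's filename and line number come from. As a rough, self-contained sketch of those rules: the `TbfSettings` struct and `tbf_settings_verify` function below are hypothetical stand-ins, the logging is omitted, and each failure is assumed to return `-EINVAL`, as the `SYNTHETIC_ERRNO(EINVAL)` arguments suggest.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in holding only the fields the checks read. */
typedef struct TbfSettings {
        uint64_t rate;
        uint64_t peak_rate;
        uint32_t burst;
        uint32_t mtu;
        uint64_t latency; /* microseconds; 0 means unset */
        size_t limit;     /* bytes; 0 means unset */
} TbfSettings;

/* Returns 0 if the settings are consistent, -EINVAL otherwise. */
static int tbf_settings_verify(const TbfSettings *s) {
        assert(s);

        /* Limit and latency are mutually exclusive ... */
        if (s->limit > 0 && s->latency > 0)
                return -EINVAL;
        /* ... but exactly one of them has to be given. */
        if (s->limit == 0 && s->latency == 0)
                return -EINVAL;
        /* Rate and burst are mandatory. */
        if (s->rate == 0)
                return -EINVAL;
        if (s->burst == 0)
                return -EINVAL;
        /* A peak rate is only meaningful together with an MTU. */
        if (s->peak_rate > 0 && s->mtu == 0)
                return -EINVAL;

        return 0;
}
```

Treating zero as "unset" mirrors how the parsers reset each field to 0 on an empty value. It also means an explicit zero cannot be told apart from a missing setting.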


@@ -2,13 +2,14 @@
  * Copyright © 2019 VMware, Inc. */
 #pragma once
+#include "sd-netlink.h"
 #include "conf-parser.h"
-#include "qdisc.h"
-#include "time-util.h"
+#include "networkd-link.h"
+#include "networkd-util.h"
+#include "tc-util.h"
 typedef struct TokenBufferFilter {
-        QDisc meta;
         uint64_t rate;
         uint64_t peak_rate;
         uint32_t burst;
@@ -18,8 +19,9 @@ typedef struct TokenBufferFilter {
         size_t mpu;
 } TokenBufferFilter;
-DEFINE_QDISC_CAST(TBF, TokenBufferFilter);
-extern const QDiscVTable tbf_vtable;
+int token_buffer_filter_new(TokenBufferFilter **ret);
+int token_buffer_filter_fill_message(Link *link, const TokenBufferFilter *tbf, sd_netlink_message *req);
+int token_buffer_filter_section_verify(const TokenBufferFilter *tbf, const NetworkConfigSection *section);
 CONFIG_PARSER_PROTOTYPE(config_parse_tc_token_buffer_filter_latency);
 CONFIG_PARSER_PROTOTYPE(config_parse_tc_token_buffer_filter_size);
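One side of this comparison embeds a `QDisc meta` member at the start of `TokenBufferFilter`, exposes a `tbf_vtable` with an `object_size`, and reaches the concrete type through `TBF(qdisc)`. The other side stores the settings inside the generic object and accesses them as `qdisc->tbf`. The definition of `DEFINE_QDISC_CAST` is not part of this diff, so the following is only a guess at the general "base struct first, checked downcast" pattern. All names in it (`Base`, `Kind`, `VTable`, `Tbf`, `base_new`, `TBF_CAST`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum Kind { KIND_TBF, KIND_OTHER } Kind;

/* Generic part shared by every kind of object. */
typedef struct Base {
        Kind kind;
} Base;

/* Per-kind description; object_size tells the allocator how much to reserve. */
typedef struct VTable {
        size_t object_size;
        const char *tca_kind;
} VTable;

/* Concrete type: the generic part must be the first member so that a
 * pointer to the whole object is also a valid pointer to Base. */
typedef struct Tbf {
        Base meta;
        uint64_t rate;
} Tbf;

static const VTable tbf_vt = {
        .object_size = sizeof(Tbf),
        .tca_kind = "tbf",
};

/* Allocate a zeroed object of the size the vtable asks for. */
static Base *base_new(Kind kind, const VTable *vt) {
        Base *b;

        assert(vt);
        assert(vt->object_size >= sizeof(Base));

        b = calloc(1, vt->object_size);
        if (!b)
                return NULL;

        b->kind = kind;
        return b;
}

/* Checked downcast: yields NULL unless the object really is a Tbf. */
static Tbf *TBF_CAST(Base *b) {
        if (!b || b->kind != KIND_TBF)
                return NULL;

        return (Tbf *) b;
}
```

With this layout, generic code only handles `Base *` and the vtable, while kind-specific code recovers its own struct through the cast. The alternative seen in the diff, a `tbf` member inside the generic object plus a `has_token_buffer_filter` flag, avoids the cast at the cost of the generic struct knowing about every kind.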


@@ -7,7 +7,7 @@
  * - physical/geographical location of the hardware
  * - the interface's MAC address
  *
- * https://systemd.io/PREDICTABLE_INTERFACE_NAMES
+ * http://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames
  *
  * When the code here is changed, man/systemd.net-naming-scheme.xml must be updated too.
  */