Compare commits

11 Commits

Author SHA1 Message Date
Lennart Poettering 9929fe8c95
Merge pull request #14252 from keszybz/growfs-port-resizefs
Port growfs over to resizefs
2019-12-06 08:55:30 +01:00
Lennart Poettering 5391dd7bc0
Merge pull request #14253 from keszybz/cleanups
Cleanups
2019-12-06 08:55:15 +01:00
Lennart Poettering 5face5a50a
Merge pull request #14167 from cpaelzer/fix-MemoryDenyWriteExecute-x86-s390-bug-1853852-UPSTREAM
Fix memory_deny_write_execute on x86 and s390 with libseccomp 2.4.2
2019-12-06 08:54:54 +01:00
Zbigniew Jędrzejewski-Szmek 5ebbb45bde TODO: remove obsolete entries
"introspect" is well established and OK. We shouldn't change it at this point.
2019-12-05 10:35:32 +01:00
Zbigniew Jędrzejewski-Szmek bddeb54cbb Fix use of uninitialized variable in error path
CID 1408478.
2019-12-05 10:31:34 +01:00
Zbigniew Jędrzejewski-Szmek d6f1e66076 growfs: port over to resize_fs() 2019-12-05 10:15:49 +01:00
Zbigniew Jędrzejewski-Szmek 2b82a99fe0 growfs: define main function through macro 2019-12-05 09:22:13 +01:00
Christian Ehrhardt 49219b5c2a
seccomp: mmap test results depend on kernel/libseccomp/glibc
As with shmat already, the actual results of the test
test_memory_deny_write_execute_mmap depend on the kernel/libseccomp/glibc
of the platform it is running on.

There are known-good platforms, but on the others do not assert success
(which would imply the test has actually failed, as no seccomp blocking was achieved);
instead, make the check depend on the success of the mmap call
on those platforms.

Finally, munmap on that valid pointer should return 0,
so that is what the check should assert when p != MAP_FAILED.

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
2019-12-05 07:19:12 +01:00
Christian Ehrhardt 5ef3ed97e3
seccomp: use per arch shmat_syscall
At the beginning of seccomp_memory_deny_write_execute architectures
can set individual filter_syscall, block_syscall, shmat_syscall values.
The former two are then used in the call to add_seccomp_syscall_filter
but shmat_syscall is not.

Right now all shmat_syscall values are the same, so the change is a
no-op, but if an architecture is ever added or modified this would be a
subtle source of mistakes, so fix it by using shmat_syscall later.

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
2019-12-05 07:19:12 +01:00
Christian Ehrhardt 903659e7b2
seccomp: ensure rules are loaded in seccomp_memory_deny_write_execute
If seccomp_memory_deny_write_execute failed fatally while loading rules, it
already returned an error.
But if adding any of the filters failed, it skipped the subsequent seccomp_load and
still returned 0, even if no rule was loaded at all.

Let's fix this by requiring that at least one rule set is loaded (without a fatal failure).

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
2019-12-05 07:19:12 +01:00
Christian Ehrhardt bed4668d1d
seccomp: fix multiplexed system calls
Since libseccomp 2.4.2, more architectures have shmat handled as a multiplexed
call. Adding the rule fails on those because seccomp_rule_add_exact would
need to add multiple rules [1].
See the discussion at https://github.com/seccomp/libseccomp/issues/193

The initial idea of falling back to the non-'_exact' version of the seccomp
rule-adding call was discussed and rejected [2][3]. The next option is to
handle the newly affected architectures (i386, s390, s390x) the same way as
ppc, which ignores and does not block shmat.

[1]: https://github.com/seccomp/libseccomp/issues/193
[2]: https://github.com/systemd/systemd/pull/14167#issuecomment-559136906
[3]: https://github.com/systemd/systemd/commit/469830d1
2019-12-05 07:19:07 +01:00
7 changed files with 72 additions and 140 deletions
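
The test change described in commit 49219b5c2a (assert that munmap returns 0 only when the mapping actually succeeded) can be sketched roughly as follows. This is a minimal illustration, not the systemd test itself: try_map_one_page() is a hypothetical helper, and no seccomp filter is installed here, so on most systems both calls simply succeed.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not part of systemd): try to map one anonymous page
 * with the given protection bits.
 * Returns 0 if mmap() was refused (errno is left as set by mmap()),
 *         1 if the page was mapped and munmap() returned 0,
 *        -1 if the page was mapped but munmap() failed. */
int try_map_one_page(int prot) {
        long page = sysconf(_SC_PAGESIZE);
        void *p;

        assert(page > 0);

        p = mmap(NULL, (size_t) page, prot, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
                return 0;   /* blocked or otherwise refused: nothing to unmap */

        /* Only a valid pointer is unmapped, and that must succeed with 0. */
        return munmap(p, (size_t) page) == 0 ? 1 : -1;
}
```

A caller that does not know whether the platform blocks writable and executable mappings would accept both 0 and 1 for PROT_READ|PROT_WRITE|PROT_EXEC, and treat only -1 as a failure.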

8
TODO

@@ -378,8 +378,6 @@ Features:
 * show whether a service has out-of-date configuration in "systemctl status" by
   using mtime data of ConfigurationDirectory=.
-* replace all remaining uses of fgets() + LINE_MAX by read_line()
 * Add AddUser= setting to unit files, similar to DynamicUser=1 which however
   creates a static, persistent user rather than a dynamic, transient user. We
   can leverage code from sysusers.d for this.
@@ -460,8 +458,6 @@ Features:
 * define gpt header bits to select volatility mode
-* ProtectKernelLogs= (drops CAP_SYSLOG, add seccomp for syslog() syscall, and DeviceAllow to /dev/kmsg) in service files
 * ProtectClock= (drops CAP_SYS_TIMES, adds seecomp filters for settimeofday, adjtimex), sets DeviceAllow o /dev/rtc
 * ProtectTracing= (drops CAP_SYS_PTRACE, blocks ptrace syscall, makes /sys/kernel/tracing go away)
@@ -519,7 +515,7 @@ Features:
 * when we detect that there are waiting jobs but no running jobs, do something
-* push CPUAffinity= also into the "cpuset" cgroup controller (only after the cpuset controller got ported to the unified hierarchy)
+* push CPUAffinity= also into the "cpuset" cgroup controller
 * PID 1 should send out sd_notify("WATCHDOG=1") messages (for usage in the --user mode, and when run via nspawn)
@@ -580,8 +576,6 @@ Features:
 * what to do about udev db binary stability for apps? (raw access is not an option)
-* man: maybe use the word "inspect" rather than "introspect"?
 * systemctl: if some operation fails, show log output?
 * systemctl edit: use equivalent of cat() to insert existing config as a comment, prepended with #.

@@ -1340,9 +1340,8 @@ int safe_fork_full(
                 }
         } else if (flags & FORK_STDOUT_TO_STDERR) {
                 if (dup2(STDERR_FILENO, STDOUT_FILENO) < 0) {
-                        log_full_errno(prio, r, "Failed to connect stdout to stderr: %m");
+                        log_full_errno(prio, errno, "Failed to connect stdout to stderr: %m");
                         _exit(EXIT_FAILURE);
                 }
         }
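
The one-line fix to safe_fork_full() logs errno instead of r because dup2() reports failure by returning -1 and setting errno; the local variable r holds no error code at that point. A minimal sketch of the same convention, where redirect_fd() is a hypothetical helper and not systemd code:

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper (not part of systemd): duplicate from_fd onto to_fd.
 * Returns 0 on success, or a negative errno-style value that is taken from
 * errno at the point of failure rather than from an unrelated variable. */
int redirect_fd(int from_fd, int to_fd) {
        assert(to_fd >= 0);

        if (dup2(from_fd, to_fd) < 0)
                return -errno;   /* dup2() returned -1 and set errno */

        return 0;
}
```

Calling redirect_fd(-1, 100) yields -EBADF, since -1 is never a valid file descriptor.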

@@ -18,57 +18,15 @@
 #include "fd-util.h"
 #include "format-util.h"
 #include "log.h"
-#include "missing_fs.h"
+#include "main-func.h"
 #include "mountpoint-util.h"
 #include "parse-util.h"
-#include "path-util.h"
 #include "pretty-print.h"
-#include "stat-util.h"
-#include "strv.h"
-#include "util.h"
+#include "resize-fs.h"
 static const char *arg_target = NULL;
 static bool arg_dry_run = false;
-static int resize_ext4(const char *path, int mountfd, int devfd, uint64_t numblocks, uint64_t blocksize) {
-        assert((uint64_t) (int) blocksize == blocksize);
-        if (arg_dry_run)
-                return 0;
-        if (ioctl(mountfd, EXT4_IOC_RESIZE_FS, &numblocks) != 0)
-                return log_error_errno(errno, "Failed to resize \"%s\" to %"PRIu64" blocks (ext4): %m",
-                                       path, numblocks);
-        return 0;
-}
-static int resize_btrfs(const char *path, int mountfd, int devfd, uint64_t numblocks, uint64_t blocksize) {
-        struct btrfs_ioctl_vol_args args = {};
-        int r;
-        assert((uint64_t) (int) blocksize == blocksize);
-        /* https://bugzilla.kernel.org/show_bug.cgi?id=118111 */
-        if (numblocks * blocksize < 256*1024*1024) {
-                log_warning("%s: resizing of btrfs volumes smaller than 256M is not supported", path);
-                return -EOPNOTSUPP;
-        }
-        r = snprintf(args.name, sizeof(args.name), "%"PRIu64, numblocks * blocksize);
-        /* The buffer is large enough for any number to fit... */
-        assert((size_t) r < sizeof(args.name));
-        if (arg_dry_run)
-                return 0;
-        if (ioctl(mountfd, BTRFS_IOC_RESIZE, &args) != 0)
-                return log_error_errno(errno, "Failed to resize \"%s\" to %"PRIu64" blocks (btrfs): %m",
-                                       path, numblocks);
-        return 0;
-}
 #if HAVE_LIBCRYPTSETUP
 static int resize_crypt_luks_device(dev_t devno, const char *fstype, dev_t main_devno) {
         _cleanup_free_ char *devpath = NULL, *main_devpath = NULL;
@@ -159,7 +117,7 @@ static int maybe_resize_slave_device(const char *mountpath, dev_t main_devno) {
                 return resize_crypt_luks_device(devno, fstype, main_devno);
 #endif
-        log_debug("Don't know how to resize %s of type %s, ignoring", devpath, strnull(fstype));
+        log_debug("Don't know how to resize %s of type %s, ignoring.", devpath, strnull(fstype));
         return 0;
 }
@@ -231,100 +189,64 @@ static int parse_argv(int argc, char *argv[]) {
         return 1;
 }
-int main(int argc, char *argv[]) {
+static int run(int argc, char *argv[]) {
         _cleanup_close_ int mountfd = -1, devfd = -1;
         _cleanup_free_ char *devpath = NULL;
-        uint64_t size, numblocks;
+        uint64_t size, newsize;
         char fb[FORMAT_BYTES_MAX];
-        struct statfs sfs;
         dev_t devno;
-        int blocksize;
         int r;
         log_setup_service();
         r = parse_argv(argc, argv);
-        if (r < 0)
-                return EXIT_FAILURE;
-        if (r == 0)
-                return EXIT_SUCCESS;
+        if (r <= 0)
+                return r;
         r = path_is_mount_point(arg_target, NULL, 0);
-        if (r < 0) {
-                log_error_errno(r, "Failed to check if \"%s\" is a mount point: %m", arg_target);
-                return EXIT_FAILURE;
-        }
-        if (r == 0) {
-                log_error_errno(r, "\"%s\" is not a mount point: %m", arg_target);
-                return EXIT_FAILURE;
-        }
+        if (r < 0)
+                return log_error_errno(r, "Failed to check if \"%s\" is a mount point: %m", arg_target);
+        if (r == 0)
+                return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "\"%s\" is not a mount point: %m", arg_target);
         r = get_block_device(arg_target, &devno);
-        if (r < 0) {
-                log_error_errno(r, "Failed to determine block device of \"%s\": %m", arg_target);
-                return EXIT_FAILURE;
-        }
+        if (r < 0)
+                return log_error_errno(r, "Failed to determine block device of \"%s\": %m", arg_target);
         r = maybe_resize_slave_device(arg_target, devno);
         if (r < 0)
-                return EXIT_FAILURE;
+                return r;
         mountfd = open(arg_target, O_RDONLY|O_CLOEXEC);
-        if (mountfd < 0) {
-                log_error_errno(errno, "Failed to open \"%s\": %m", arg_target);
-                return EXIT_FAILURE;
-        }
+        if (mountfd < 0)
+                return log_error_errno(errno, "Failed to open \"%s\": %m", arg_target);
         r = device_path_make_major_minor(S_IFBLK, devno, &devpath);
-        if (r < 0) {
-                log_error_errno(r, "Failed to format device major/minor path: %m");
-                return EXIT_FAILURE;
-        }
+        if (r < 0)
+                return log_error_errno(r, "Failed to format device major/minor path: %m");
         devfd = open(devpath, O_RDONLY|O_CLOEXEC);
-        if (devfd < 0) {
-                log_error_errno(errno, "Failed to open \"%s\": %m", devpath);
-                return EXIT_FAILURE;
-        }
+        if (devfd < 0)
+                return log_error_errno(errno, "Failed to open \"%s\": %m", devpath);
-        if (ioctl(devfd, BLKBSZGET, &blocksize) != 0) {
-                log_error_errno(errno, "Failed to query block size of \"%s\": %m", devpath);
-                return EXIT_FAILURE;
-        }
-        if (ioctl(devfd, BLKGETSIZE64, &size) != 0) {
-                log_error_errno(errno, "Failed to query size of \"%s\": %m", devpath);
-                return EXIT_FAILURE;
-        }
-        if (size % blocksize != 0)
-                log_notice("Partition size %"PRIu64" is not a multiple of the blocksize %d,"
-                           " ignoring %"PRIu64" bytes", size, blocksize, size % blocksize);
-        numblocks = size / blocksize;
-        if (fstatfs(mountfd, &sfs) < 0) {
-                log_error_errno(errno, "Failed to stat file system \"%s\": %m", arg_target);
-                return EXIT_FAILURE;
-        }
-        switch(sfs.f_type) {
-        case EXT4_SUPER_MAGIC:
-                r = resize_ext4(arg_target, mountfd, devfd, numblocks, blocksize);
-                break;
-        case BTRFS_SUPER_MAGIC:
-                r = resize_btrfs(arg_target, mountfd, devfd, numblocks, blocksize);
-                break;
-        default:
-                log_error("Don't know how to resize fs %llx on \"%s\"",
-                          (long long unsigned) sfs.f_type, arg_target);
-                return EXIT_FAILURE;
-        }
+        if (ioctl(devfd, BLKGETSIZE64, &size) != 0)
+                return log_error_errno(errno, "Failed to query size of \"%s\": %m", devpath);
+        log_debug("Resizing \"%s\" to %"PRIu64" bytes...", arg_target, size);
+        r = resize_fs(mountfd, size, &newsize);
         if (r < 0)
-                return EXIT_FAILURE;
-        log_info("Successfully resized \"%s\" to %s bytes (%"PRIu64" blocks of %d bytes).",
-                 arg_target, format_bytes(fb, sizeof fb, size), numblocks, blocksize);
-        return EXIT_SUCCESS;
+                return log_error_errno(r, "Failed to resize \"%s\" to %"PRIu64" bytes: %m",
+                                       arg_target, size);
+        if (newsize == size)
+                log_info("Successfully resized \"%s\" to %s bytes.",
+                         arg_target,
+                         format_bytes(fb, sizeof fb, newsize));
+        else
+                log_info("Successfully resized \"%s\" to %s bytes (%"PRIu64" bytes lost due to blocksize).",
+                         arg_target,
+                         format_bytes(fb, sizeof fb, newsize),
+                         size - newsize);
+        return 0;
 }
+DEFINE_MAIN_FUNCTION(run);

@@ -13,7 +13,7 @@
 #include "resize-fs.h"
 #include "stat-util.h"
-int resize_fs(int fd, uint64_t sz) {
+int resize_fs(int fd, uint64_t sz, uint64_t *ret_size) {
         struct statfs sfs;
         int r;
@@ -38,6 +38,9 @@ int resize_fs(int fd, uint64_t sz) {
                 if (ioctl(fd, EXT4_IOC_RESIZE_FS, &u) < 0)
                         return -errno;
+                if (ret_size)
+                        *ret_size = u * sfs.f_bsize;
         } else if (is_fs_type(&sfs, BTRFS_SUPER_MAGIC)) {
                 struct btrfs_ioctl_vol_args args = {};
@@ -49,12 +52,17 @@ int resize_fs(int fd, uint64_t sz) {
                 if (sz < BTRFS_MINIMAL_SIZE)
                         return -ERANGE;
+                sz -= sz % sfs.f_bsize;
                 r = snprintf(args.name, sizeof(args.name), "%" PRIu64, sz);
                 assert((size_t) r < sizeof(args.name));
                 if (ioctl(fd, BTRFS_IOC_RESIZE, &args) < 0)
                         return -errno;
+                if (ret_size)
+                        *ret_size = sz;
         } else if (is_fs_type(&sfs, XFS_SB_MAGIC)) {
                 xfs_fsop_geom_t geo;
                 xfs_growfs_data_t d;
@@ -73,6 +81,9 @@ int resize_fs(int fd, uint64_t sz) {
                 if (ioctl(fd, XFS_IOC_FSGROWFSDATA, &d) < 0)
                         return -errno;
+                if (ret_size)
+                        *ret_size = d.newblocks * geo.blocksize;
         } else
                 return -EOPNOTSUPP;

@@ -5,7 +5,7 @@
 #include "stat-util.h"
-int resize_fs(int fd, uint64_t sz);
+int resize_fs(int fd, uint64_t sz, uint64_t *ret_size);
 #define BTRFS_MINIMAL_SIZE (256U*1024U*1024U)
 #define XFS_MINIMAL_SIZE (14U*1024U*1024U)

@@ -1584,6 +1584,7 @@ assert_cc(SCMP_SYS(shmdt) > 0);
 int seccomp_memory_deny_write_execute(void) {
         uint32_t arch;
         int r;
+        int loaded = 0;
         SECCOMP_FOREACH_LOCAL_ARCH(arch) {
                 _cleanup_(seccomp_releasep) scmp_filter_ctx seccomp = NULL;
@@ -1593,22 +1594,23 @@ int seccomp_memory_deny_write_execute(void) {
                 switch (arch) {
+                /* Note that on some architectures shmat() isn't available, and the call is multiplexed through ipc().
+                 * We ignore that here, which means there's still a way to get writable/executable
+                 * memory, if an IPC key is mapped like this. That's a pity, but no total loss. */
                 case SCMP_ARCH_X86:
                 case SCMP_ARCH_S390:
                         filter_syscall = SCMP_SYS(mmap2);
                         block_syscall = SCMP_SYS(mmap);
-                        shmat_syscall = SCMP_SYS(shmat);
+                        /* shmat multiplexed, see above */
                         break;
                 case SCMP_ARCH_PPC:
                 case SCMP_ARCH_PPC64:
                 case SCMP_ARCH_PPC64LE:
+                case SCMP_ARCH_S390X:
                         filter_syscall = SCMP_SYS(mmap);
-                        /* Note that shmat() isn't available, and the call is multiplexed through ipc().
-                         * We ignore that here, which means there's still a way to get writable/executable
-                         * memory, if an IPC key is mapped like this. That's a pity, but no total loss. */
+                        /* shmat multiplexed, see above */
                         break;
                 case SCMP_ARCH_ARM:
@@ -1619,8 +1621,7 @@ int seccomp_memory_deny_write_execute(void) {
                 case SCMP_ARCH_X86_64:
                 case SCMP_ARCH_X32:
                 case SCMP_ARCH_AARCH64:
-                case SCMP_ARCH_S390X:
-                        filter_syscall = SCMP_SYS(mmap); /* amd64, x32, s390x, and arm64 have only mmap */
+                        filter_syscall = SCMP_SYS(mmap); /* amd64, x32 and arm64 have only mmap */
                         shmat_syscall = SCMP_SYS(shmat);
                         break;
@@ -1666,7 +1667,7 @@ int seccomp_memory_deny_write_execute(void) {
 #endif
                 if (shmat_syscall > 0) {
-                        r = add_seccomp_syscall_filter(seccomp, arch, SCMP_SYS(shmat),
+                        r = add_seccomp_syscall_filter(seccomp, arch, shmat_syscall,
                                                        1,
                                                        SCMP_A2(SCMP_CMP_MASKED_EQ, SHM_EXEC, SHM_EXEC));
                         if (r < 0)
@@ -1678,9 +1679,13 @@ int seccomp_memory_deny_write_execute(void) {
                         return r;
                 if (r < 0)
                         log_debug_errno(r, "Failed to install MemoryDenyWriteExecute= rule for architecture %s, skipping: %m", seccomp_arch_to_string(arch));
+                loaded++;
         }
-        return 0;
+        if (loaded == 0)
+                log_debug_errno(r, "Failed to install any seccomp rules for MemoryDenyWriteExecute=");
+        return loaded;
 }
 int seccomp_restrict_archs(Set *archs) {
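
The return-value change in seccomp_memory_deny_write_execute() lets a caller tell "no filter was loaded for any architecture" (0) apart from "at least one was loaded" (a positive count). A simplified sketch of that pattern without libseccomp follows. demo_install() and install_for_all() are hypothetical names, -ENOMEM merely stands in for a fatal error, and the sketch counts only architectures whose install step succeeded, following the wording of the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for building and loading one architecture's filter:
 * returns 0 on success, -ENOMEM for a "fatal" failure, -EINVAL otherwise. */
int demo_install(int arch) {
        if (arch < 0)
                return -ENOMEM;
        if (arch == 0)
                return -EINVAL;
        return 0;
}

/* Try every architecture in the list.
 * Returns a negative errno-style value on a fatal failure, otherwise the
 * number of architectures for which installation succeeded (possibly 0). */
int install_for_all(const int *archs, size_t n) {
        int loaded = 0;

        assert(archs || n == 0);

        for (size_t i = 0; i < n; i++) {
                int r = demo_install(archs[i]);

                if (r == -ENOMEM)   /* fatal: give up immediately */
                        return r;
                if (r < 0)          /* non-fatal: skip this architecture */
                        continue;

                loaded++;
        }

        return loaded;
}
```

A caller can then treat a return value of 0 as "the protection is not in effect" instead of silently assuming success.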

@@ -535,10 +535,11 @@ static void test_memory_deny_write_execute_mmap(void) {
 #if defined(__x86_64__) || defined(__i386__) || defined(__powerpc64__) || defined(__arm__) || defined(__aarch64__)
                 assert_se(p == MAP_FAILED);
                 assert_se(errno == EPERM);
-#else /* unknown architectures */
-                assert_se(p != MAP_FAILED);
-                assert_se(munmap(p, page_size()) >= 0);
 #endif
+                /* Depending on kernel, libseccomp, and glibc versions, other architectures
+                 * might fail or not. Let's not assert success. */
+                if (p != MAP_FAILED)
+                        assert_se(munmap(p, page_size()) == 0);
                 p = mmap(NULL, page_size(), PROT_WRITE|PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, -1,0);
                 assert_se(p != MAP_FAILED);