Recently Amazon.com introduced s2n as a new TLS implementation. The idea is to have a small and simplified TLS library.

Looking at it, I noticed it’s very Linux-centric. It cannot be compiled on Windows. There are patches to make it work on OS X. There is a report that it works on FreeBSD, but I didn’t look closely enough to determine whether patches were necessary. Amazon is positioning s2n as a replacement for OpenSSL, but it can’t work in nearly as many places as OpenSSL. I get the feeling this is partly why they were able to keep the code base so small compared to OpenSSL.

One of the aspects of the library that I was very interested in was the statement, “s2n uses operating system features to protect data from being swapped to disk or appearing in core dumps.” Preventing memory from being swapped and showing in core dumps is a really nice feature.

I decided to look into how they implemented this memory hardening. They’re using mlock and madvise. Incorrectly. In a way that can cause segfaults/aborts. Oh and not all of the protections they claim are actually enabled.

Using mlock

mlock prevents a region of memory from being written to on-disk swap space.

The issue with mlock is the limits set by the OS. RLIMIT_MEMLOCK (ulimit -l) limits the amount of memory that can be locked. munlock must be used (before or after free; testing showed the order didn’t matter) to reduce the amount of locked memory. munmap should implicitly unlock the memory as well, but in testing a simple free did not cause the memory to be unlocked.

Calling munlock is not enough to avoid hitting the limit. In simple or small applications and test cases it would function fine. However, a larger application that uses more memory will fail. Once the lock limit is reached, an out-of-memory error is returned.

On Ubuntu 14.04.2 the default RLIMIT_MEMLOCK is 64K. On some versions of Debian it was found to be 32K. A non-trivial application will quickly reach this limit.

Configuring the system to have a larger limit, or making the limit unlimited, may not alleviate this issue. For example, FreeBSD allows mlock use to be restricted to the super-user only.

Further, requiring system configuration in order to use a general-purpose library is unacceptable.

Test Case

#include <errno.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <unistd.h>
#include <sys/mman.h>

/* Demonstration of using mlock.
 *
 * Shows mlock limits can cause an out of memory condition.
 *
 * gcc mlock_fail.c -o mlock_fail
 */

int main(int argc, char **argv)
{
    char         *tmp;
    size_t        i;
    const size_t  size = 32;

    for (i=0; i<10000; i++) {
        tmp = malloc(size);

        if (tmp == NULL) {
            printf("%zu: allocation failed\n", i);
            return 2;
        }

        if (mlock(tmp, size) != 0) {
            printf("%zu: mlock failed: %s\n", i, strerror(errno));
            return 3;
        }
    }

    return 0;
}

Running the test

$ ./mlock_fail
1281: mlock failed: Cannot allocate memory

Failure after the limit was reached.

$ valgrind ./mlock_fail
641: mlock failed: Cannot allocate memory
==6137==
==6137== HEAP SUMMARY:
==6137==     in use at exit: 20,544 bytes in 642 blocks
==6137==   total heap usage: 642 allocs, 0 frees, 20,544 bytes allocated

The number of successful allocations under Valgrind is about half of the number without it. This is not surprising: Valgrind does its own memory tracking in order to function, so the limit is reached sooner.

Bug report.

As noted in the bug report, this issue is exacerbated in s2n because munlock is never called.

Using madvise With MADV_DONTDUMP

MADV_DONTDUMP is used to keep the marked memory out of core dumps.

On Linux, madvise requires the memory to be page-aligned. If it is not, madvise fails with errno set to EINVAL. Page alignment can easily cause the application to run out of address space.

To see the page size use the following:

$ getconf PAGESIZE
4096

In this (and many other) cases we have a 4096-byte boundary, meaning the address of the allocated data must be the address of a page boundary. There is 4K between each boundary. A large amount of data can be allocated there, but if only a small amount is allocated, a large amount of space is left unusable because the next allocation also needs to be on a 4K boundary.

Take the following allocations:

  • One: 8 bytes, page-aligned.
  • Two: 4 bytes, page-aligned.

Assume One and Two are allocated next to each other. One allocates 8 bytes. Two will be aligned to the 4K boundary after One. A total of 8K of memory is reserved because of this: only 12 bytes are actually needed, but 8K is reserved. Since memory is now aligned in 4K blocks, the total available memory space is greatly reduced. Not the amount of memory per se, but the number of possible allocations. If you were to allocate 4K every time, you would be able to use the full amount of memory. The issue arises when less than 4K (say, a few bytes) needs to be allocated.

Test Case

#include <errno.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <unistd.h>
#include <sys/mman.h>

/* Demonstration of using madvise.
 *
 * 1. Shows how using madvise on non page-aligned memory will fail.
 * 2. Shows how using madvise on page-aligned memory will succeed but
 *    reserve the alignment amount after the requested size making
 *    subsequent allocations start at the next page.
 * 3. Shows where subsequent allocations would start if not using
 *    page-alignment or madvise.
 *
 * gcc madvise_fail.c -o madvise_fail
 */

static void print_usage(const char *name)
{
    printf("usage: %s option [print_ptr_address]\n", name);
    printf("options:\n");
    printf("\t1 = no page-alignment, madvise\n");
    printf("\t2 = page-alignment, madvise\n");
    printf("\t3 = no page-alignment, no madvise\n");
    printf("print_ptr_address: Will print the pointer address after allocation (and madvise if in use)\n");
}

int main(int argc, char **argv)
{
    char         *tmp;
    size_t        i;
    int           flag      = 0;
    int           print_ptr = 0;
    const size_t  size      = 8;
    size_t        page_size;

    if (argc < 2) {
        print_usage(argv[0]);
        return 1;
    }

    if (strcasecmp(argv[1], "1") == 0) {
        flag = 1;
    } else if (strcasecmp(argv[1], "2") == 0) {
        flag = 2;
    } else if (strcasecmp(argv[1], "3") == 0) {
        flag = 3;
    } else {
        print_usage(argv[0]);
        return 1;
    }

    if (argc >= 3)
        print_ptr = 1;

    page_size = sysconf(_SC_PAGESIZE);
    printf("page_size=%zu\n", page_size);
    for (i=0; i<10000; i++) {
        if (flag == 2) {
            /* tmp is unspecified if posix_memalign fails, so check the return value. */
            if (posix_memalign((void **)&tmp, page_size, size) != 0)
                tmp = NULL;
        } else {
            tmp = malloc(size);
        }

        if (tmp == NULL) {
            printf("%zu: allocation failed\n", i);
            return 2;
        }

        if (flag == 1 || flag == 2) {
            if (madvise(tmp, size, MADV_DONTDUMP) != 0) {
                printf("%zu: madvise failed: %s\n", i, strerror(errno));
                return 3;
            }
        }

        if (print_ptr) {
            printf("%p\n", (void *)tmp);
        }
    }

    return 0;
}

Test 1: no page-alignment, madvise

$ ./madvise_fail 1
page_size=4096
0: madvise failed: Invalid argument

Immediate failure. On Linux, per the documentation, madvise must be given page-aligned memory.

Test 2: page-alignment, madvise

$ valgrind ./madvise_fail 2
page_size=4096
==5899==
==5899== HEAP SUMMARY:
==5899==     in use at exit: 80,000 bytes in 10,000 blocks
==5899==   total heap usage: 10,000 allocs, 0 frees, 80,000 bytes allocated

$ ./madvise_fail 2 1
page_size=4096
...
0x2dad000
0x2dae000
0x2daf000
0x2db0000
0x2db1000
0x2db2000
0x2db3000
0x2db4000

All are 4096 bytes apart.

Test 3: no page-alignment, no madvise

$ valgrind ./madvise_fail 3
page_size=4096
==5900==
==5900== HEAP SUMMARY:
==5900==     in use at exit: 80,000 bytes in 10,000 blocks
==5900==   total heap usage: 10,000 allocs, 0 frees, 80,000 bytes allocated

$ ./madvise_fail 3 1
page_size=4096
...
0x20cb110
0x20cb130
0x20cb150
0x20cb170
0x20cb190
0x20cb1b0
0x20cb1d0
0x20cb1f0

All are 32 bytes apart.

  1. Using non page-aligned memory you get a failure.
  2. Using page-aligned memory you get success, but the memory pointers are separated by 4K (the page size on my system). Everything within each 4K block, between the end of the requested allocation and the end of the block, is unusable.
  3. Not using page-alignment and not using madvise succeeds with only 32 bytes between each pointer.

Valgrind on 2 and 3 shows 80,000 bytes in 10,000 blocks. However, the size of the block is what matters: 4K blocks will run out faster than 32-byte blocks. On an embedded system (a phone with 512 MB of memory or a router with 16 to 64 MB of memory), address space for allocations will be depleted very quickly.

Bug report.

As noted in the bug report, the implementation of madvise in s2n does not work at all.

  1. The madvise check prevents the madvise code from building. This is why no errors are detected.
  2. The memory passed to madvise is not page-aligned.
  3. The error code returned if madvise fails, S2N_ERR_MADVISE, isn’t declared. This would cause a build failure.

Conclusion

s2n claims memory protection using the above but it either won’t work in the real world or isn’t even enabled.

s2n claims, “To date there have been two external code-level reviews of s2n, including one by a commercial security vendor.” I’d expect a review at this level to have severely scrutinized the memory hardening. I have no idea how these issues could have been overlooked, especially the glaringly obvious fact that the madvise section isn’t compiled and cannot compile. If I were them, I’d ask the commercial security vendor for my money back.

To me this looks like a case where someone read about some security functions but didn’t really understand them.

The biggest issue here is the fact that mlock and madvise are both used for all memory allocations. They are not meant for this type of use. They can really only be used for securing small portions of memory where it is critical to do so because the data is known to be sensitive. Otherwise the above problems will be encountered.