Recently Amazon.com introduced s2n as a new TLS implementation. The idea is to have a small and simplified TLS library.
Looking at it, I noticed it’s very Linux-centric. It cannot be compiled on Windows. There are patches to make it work on OS X. There is a report that it works on FreeBSD, but I didn’t look closely enough to determine whether patches were necessary. Amazon is positioning s2n as a replacement for OpenSSL, but it can’t work in nearly as many places as OpenSSL. I get the feeling this is partly why they were able to keep the code base so small compared to OpenSSL.
One of the aspects of the library that I was very interested in was the statement, “s2n uses operating system features to protect data from being swapped to disk or appearing in core dumps.” Preventing memory from being swapped to disk or appearing in core dumps is a really nice feature.
I decided to look into how they implemented this memory hardening. They’re using mlock and madvise. Incorrectly. In a way that can cause segfaults/aborts. Oh, and not all of the protections they claim are actually enabled.
Using mlock
mlock prevents a range of memory from being written out to swap space on disk.
The issue with mlock is the limit set by the OS. RLIMIT_MEMLOCK (ulimit -l) limits the amount of memory that can be locked. munlock must be called (before or after freeing the memory; testing showed it didn’t matter) to reduce the locked amount. munmap implicitly unlocks memory as well, but in testing a simple free did not unlock it, since free does not necessarily unmap the memory.
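To make the lock/unlock pairing concrete, here is a minimal sketch of helpers that lock a small heap buffer and release it by wiping, unlocking, then freeing it. The names locked_alloc and locked_free are hypothetical and are not s2n’s code. One caveat from the Linux mlock documentation: locks work on whole pages and do not stack, so unlocking one small buffer also unlocks any other locked buffer that shares its page.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical helpers (not from s2n): lock a small heap buffer into RAM
 * and release it by wiping, unlocking, then freeing. */
static void *locked_alloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        return NULL;
    if (mlock(p, size) != 0) {
        int saved = errno;   /* free() may change errno */
        free(p);
        errno = saved;       /* caller sees mlock's error, e.g. ENOMEM */
        return NULL;
    }
    return p;
}

static void locked_free(void *p, size_t size)
{
    volatile unsigned char *v = p;
    size_t i;

    if (p == NULL)
        return;
    /* Wipe through a volatile pointer so the stores are not optimized away. */
    for (i = 0; i < size; i++)
        v[i] = 0;
    /* free() does not release the lock, so unlock explicitly. Locks do not
     * stack: this unlocks the whole page, including any other locked
     * buffer that happens to share it. */
    munlock(p, size);
    free(p);
}
```

Even with every locked_alloc paired with a locked_free, the total locked at any one time still has to fit under RLIMIT_MEMLOCK.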
Even using munlock is not enough to avoid hitting the limit. In simple or small applications and test cases, everything works fine. However, a larger application that keeps more memory locked at the same time will fail: once the lock limit is reached, mlock returns an out of memory error (ENOMEM).
On Ubuntu 14.04.2 the default RLIMIT_MEMLOCK is 64K. On some versions of Debian it was found to be 32K. A non-trivial application will reach this limit quickly.
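To see which limit a process actually runs under, it can query RLIMIT_MEMLOCK with getrlimit. A minimal sketch (print_memlock_limit is a hypothetical name); it also divides the soft limit by the page size, because mlock operates on whole pages, so the limit is effectively a page budget.

```c
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

/* Print this process's RLIMIT_MEMLOCK soft and hard limits in bytes
 * (bash's `ulimit -l` shows the soft limit in KiB), plus how many pages
 * the soft limit allows to be locked. Returns 0 on success, -1 on error. */
static int print_memlock_limit(void)
{
    struct rlimit rl;
    long page = sysconf(_SC_PAGESIZE);

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        printf("soft limit: unlimited\n");
    } else {
        printf("soft limit: %llu bytes (%llu pages of %ld bytes)\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_cur / (unsigned long long)page,
               page);
    }
    if (rl.rlim_max == RLIM_INFINITY)
        printf("hard limit: unlimited\n");
    else
        printf("hard limit: %llu bytes\n", (unsigned long long)rl.rlim_max);
    return 0;
}
```

With the 64K Ubuntu default and 4096-byte pages, that is 16 pages in total, no matter how small the individual locked buffers are.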
Configuring the system to have a larger limit, or making the limit unlimited, may not alleviate this issue. For example, FreeBSD allows mlock use to be restricted to the super-user only.
Further, requiring system configuration in order to use a general purpose library is unacceptable.
Test Case
    #include <errno.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/mman.h>

    /* Demonstration of using mlock.
     *
     * Shows mlock limits can cause an out of memory condition.
     *
     * gcc mlock_fail.c -o mlock_fail
     */
    int main(void)
    {
        char *tmp;
        size_t i;
        const size_t size = 32;

        for (i = 0; i < 10000; i++) {
            /* Allocations are intentionally never unlocked or freed, so the
             * locked amount only grows until RLIMIT_MEMLOCK is reached. */
            tmp = malloc(size);
            if (tmp == NULL) {
                printf("%zu: allocation failed\n", i);
                return 2;
            }
            if (mlock(tmp, size) != 0) {
                printf("%zu: mlock failed: %s\n", i, strerror(errno));
                return 3;
            }
        }
        return 0;
    }
Running the test
$ ./mlock_fail
1281: mlock failed: Cannot allocate memory
Failure after the limit was reached.
$ valgrind ./mlock_fail
641: mlock failed: Cannot allocate memory
==6137==
==6137== HEAP SUMMARY:
==6137== in use at exit: 20,544 bytes in 642 blocks
==6137== total heap usage: 642 allocs, 0 frees, 20,544 bytes allocated
The number of allocations before failure is lower under Valgrind than without it: 641 versus 1281, about half. Valgrind does some memory tracking of its own in order to function, so it’s not surprising that the number is lower.
As noted in the bug report, this issue is exacerbated in s2n because munlock is never called.
Using madvise With MADV_DONTDUMP
This is used to exclude the marked memory from core dumps.
On Linux, madvise requires the memory to be page-aligned. If the memory is not page-aligned, madvise returns a failure with errno set to EINVAL. Page-alignment, in turn, can easily cause the application to run out of address space.
To see the page size use the following:
$ getconf PAGESIZE
4096
In this case (and many others) the boundary is 4096 bytes, meaning the address of the allocated data must fall on a page boundary, and the boundaries are 4K apart. A large amount of data can be allocated there, but if only a small amount is allocated, the rest of the page is unusable because the next allocation must also start on a 4K boundary.
Take the following allocations:
- One: 8 bytes, page-aligned.
- Two: 4 bytes, page-aligned.
Assume One and Two are allocated next to each other. One allocates 8 bytes. Two will be aligned to the 4K boundary after One. A total of 8K of memory is reserved because of this, even though only 12 bytes are actually needed. Since memory is now handed out in 4K blocks, the number of possible allocations is greatly reduced (not the amount of memory per se). If you always allocated 4K at a time, you would be able to use the full amount of memory; the issue arises when less than 4K (say, a few bytes) needs to be allocated.
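The arithmetic for One and Two can be written out as a small sketch. It follows the simplified model described above, where each page-aligned allocation occupies whole pages and the next one starts on the following boundary; round_up_to_page and bytes_reserved are hypothetical names.

```c
#include <stddef.h>

/* Round n up to a multiple of page (page must be a power of two). */
static size_t round_up_to_page(size_t n, size_t page)
{
    return (n + page - 1) & ~(page - 1);
}

/* Simplified model: each page-aligned allocation occupies whole pages,
 * so the next one starts on the following page boundary. */
static size_t bytes_reserved(const size_t *sizes, size_t count, size_t page)
{
    size_t total = 0, i;

    for (i = 0; i < count; i++)
        total += round_up_to_page(sizes[i], page);
    return total;
}
```

For sizes {8, 4} and a 4096-byte page, bytes_reserved returns 8192, while only 12 bytes were requested.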
Test Case
    #include <errno.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    #include <strings.h>
    #include <unistd.h>
    #include <sys/mman.h>

    /* Demonstration of using madvise.
     *
     * 1. Shows how using madvise on non page-aligned memory will fail.
     * 2. Shows how using madvise on page-aligned memory will succeed but
     *    reserve the alignment amount after the requested size making
     *    subsequent allocations start at the next page.
     * 3. Shows where subsequent allocations would start if not using
     *    page-alignment or madvise.
     *
     * gcc madvise_fail.c -o madvise_fail
     */
    static void print_usage(const char *name)
    {
        printf("usage: %s option [print_ptr_address]\n", name);
        printf("options:\n");
        printf("\t1 = no page-alignment, madvise\n");
        printf("\t2 = page-alignment, madvise\n");
        printf("\t3 = no page-alignment, no madvise\n");
        printf("print_ptr_address: Will print the pointer address after allocation (and madvise if in use)\n");
    }

    int main(int argc, char **argv)
    {
        void *tmp;
        size_t i;
        int flag = 0;
        int print_ptr = 0;
        const size_t size = 8;
        size_t page_size;

        if (argc < 2) {
            print_usage(argv[0]);
            return 1;
        }

        if (strcasecmp(argv[1], "1") == 0) {
            flag = 1;
        } else if (strcasecmp(argv[1], "2") == 0) {
            flag = 2;
        } else if (strcasecmp(argv[1], "3") == 0) {
            flag = 3;
        } else {
            print_usage(argv[0]);
            return 1;
        }

        if (argc >= 3)
            print_ptr = 1;

        page_size = (size_t)sysconf(_SC_PAGESIZE);
        printf("page_size=%zu\n", page_size);

        /* Allocations are intentionally never freed, so the printed
         * addresses show how consecutive allocations are laid out. */
        for (i = 0; i < 10000; i++) {
            if (flag == 2) {
                /* posix_memalign reports failure through its return value
                 * and does not guarantee tmp is set, so reset it. */
                if (posix_memalign(&tmp, page_size, size) != 0)
                    tmp = NULL;
            } else {
                tmp = malloc(size);
            }
            if (tmp == NULL) {
                printf("%zu: allocation failed\n", i);
                return 2;
            }
            if (flag == 1 || flag == 2) {
                if (madvise(tmp, size, MADV_DONTDUMP) != 0) {
                    printf("%zu: madvise failed: %s\n", i, strerror(errno));
                    return 3;
                }
            }
            if (print_ptr) {
                printf("%p\n", tmp);
            }
        }
        return 0;
    }
Test 1: no page-alignment, madvise
$ ./madvise_fail 1
page_size=4096
0: madvise failed: Invalid argument
Immediate failure. Per the documentation, on Linux madvise must be given page-aligned memory.
Test 2: page-alignment, madvise
$ valgrind ./madvise_fail 2
page_size=4096
==5899==
==5899== HEAP SUMMARY:
==5899== in use at exit: 80,000 bytes in 10,000 blocks
==5899== total heap usage: 10,000 allocs, 0 frees, 80,000 bytes allocated
$ ./madvise_fail 2 1
page_size=4096
...
0x2dad000
0x2dae000
0x2daf000
0x2db0000
0x2db1000
0x2db2000
0x2db3000
0x2db4000
All are 4096 bytes apart.
Test 3: no page-alignment, no madvise
$ valgrind ./madvise_fail 3
page_size=4096
==5900==
==5900== HEAP SUMMARY:
==5900== in use at exit: 80,000 bytes in 10,000 blocks
==5900== total heap usage: 10,000 allocs, 0 frees, 80,000 bytes allocated
$ ./madvise_fail 3 1
page_size=4096
...
0x20cb110
0x20cb130
0x20cb150
0x20cb170
0x20cb190
0x20cb1b0
0x20cb1d0
0x20cb1f0
All are 32 bytes apart.
- Using non page-aligned memory you get a failure.
- Using page-aligned memory you get success but memory pointers are separated by 4K bytes (page-alignment on my system) blocks. Everything within the 4K block, between the end of the requested allocation and the end of the block, is unusable.
- Not using page-alignment and not using madvise succeeds with only 32 bytes between each pointer.
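Valgrind reports 80,000 bytes in 10,000 blocks for both test 2 and test 3, because it counts only the requested sizes. What matters is the space each block actually occupies: 10,000 page-aligned blocks span roughly 41 MB (10,000 × 4,096 = 40,960,000 bytes), versus 320,000 bytes (10,000 × 32) without alignment. 4K blocks run out far faster than 32-byte blocks. On an embedded system (a phone with 512 MB of memory or a router with 16 to 64 MB of memory) the space available for allocations will be depleted very quickly.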
As noted in the bug report, the madvise implementation in s2n does not work at all.
- The madvise check prevents the madvise code from being built. This is why no errors are detected.
- The memory passed to madvise is not page-aligned.
- The error code returned if madvise fails, S2N_ERR_MADVISE, isn’t declared. This would cause a build failure.
Conclusion
s2n claims memory protection using the above, but it either won’t work in the real world or isn’t even enabled.
s2n claims, “To date there have been two external code-level reviews of s2n, including one by a commercial security vendor.” I’d expect reviews at this level to have scrutinized the memory hardening closely. I have no idea how these issues could have been overlooked, especially the glaringly obvious fact that the madvise section isn’t compiled and can’t compile. If I were them I’d ask the commercial security vendor for my money back.
To me this looks like a case where someone read about some security functions but didn’t really understand them.
The biggest issue here is the fact that mlock and madvise are both used for all memory allocations. They are not meant for this type of use. They can really only be used for securing small portions of memory where it is critical to do so because the data is known to be sensitive. Otherwise the above problems will be encountered.