Constant Time String Comparison in C

Comparing strings in C is typically handled with strncmp. This is fine in most cases but if you need to compare sensitive information, such as a message digest, it’s a really bad choice. strncmp is susceptible to timing attacks because it will stop comparing once the first difference is encountered.

The overall design of constant time comparison is pretty well known. The OR XOR combo has been reviewed and vetted by crypto researchers. At least I hope it has… It’s used by a number of large security focused projects so… Well, my focus is on how to implement it specifically with strings in C. Most of the C versions I’ve seen, assume bytes and take the comparison length as an argument. Since I care about strings I want the length calculation handled securely by the function itself.

Here’s the code, so lets go into detail.

str_iseq.c

#include <stdbool.h>
#include <stdlib.h>

bool str_iseq(const char *s1, const char *s2)
{
    int             m = 0;
    volatile size_t i = 0;
    volatile size_t j = 0;
    volatile size_t k = 0;

    if (s1 == NULL || s2 == NULL)
        return false;

    while (1) {
        m |= s1[i]^s2[j];

        if (s1[i] == '\0')
            break;
        i++;

        if (s2[j] != '\0')
            j++;
        if (s2[j] == '\0')
            k++;
    }

    return m == 0;
}

One of the big things I want to avoid is leaking the length of the known string. I’m treating s1 as the string provided by the user/attacker/outside and s2 and the internal, known string that’s being checked for a match.

You might think the first thing we should do is check if the lengths match but this is a bad idea. This will leak the length because an attacker can start with 1 character and keep adding characters until the comparison starts. Not to mention, we’d need to get the lengths using strlen and that will leak the length because strlen is a O(n) operation. Instead we’ll make the while loop doing the comparison length aware.

I have a NULL input check which does leak length but you really shouldn’t be trying to compare NULL in the first place… Additionally I could check if the two pointers are the same but this is a constant time comparison function after all. So we’ll still compare them.

if (s1[i] == '\0')
    break;

We’re always going to go through every character in s1 no matter the length. Other than the strings not matching, the only other thing an attacker should be able to learn is the length of s1. But that’s not a problem because they passed in s1 to begin with. You’d expect them to know the length of their input.

This check and break happens after the comparison to ensure the trailing NULL is part of the comparison. We absolutely need to compare the NULL terminator in order to prevent a sub string match. Even if the s1 is shorter than s2 a match won’t be generated because the NULL in s1 won’t match the character in s2.

if (s2[j] != '\0')
    j++;
if (s2[j] == '\0')
    k++;

This is the key to not overrun s2 if it’s shorter than s1. As long as we haven’t hit the end of s2 we increment j to move forward in the comparison. If we’ve hit the end of s2 we switch to incrementing k. We’ll keep comparing the NULL terminator in s2 to the remaining characters in s1. The only reason for k is to keep operations balanced. What I mean is each iteration of the loop we’re incrementing j to move forward in s2. If we stop incrementing j the timing of the function changes. k ensures there is always an increment operation in every iteration of the loop.

There are two if statements instead of an if else for a very good reason. Modern processors, as well as compilers, will often optimize for the true branch. Again, this leads to unbalanced operations and a difference in timing. By making it two separate checks we’re, hopefully, preventing it from being optimized. The volatile keyword is used to help prevent the k++ from being optimized out.

m |= s1[i]^s2[j];

This is the “industry” standard comparison algorithm I eluded to earlier and not something I came up with. It works by XORing the two characters together (obviously). If they’re the same it will produce 0. If they’re different it will produce something non-zero. That’s saved using an OR. What ends up happening is m starts as 0 and if every character matches, ORing 0 with 0 will keep it as 0. As soon as there isn’t a match m become non-zero. Using an OR means m can never go back to zero once there isn’t a match.

test.c

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char *a = strdup("ABC");
    char *b = strdup("ABC");

    printf("%s\n", str_iseq(a, b)?"true":"false");
    printf("%s\n", str_iseq("ABC", "AB")?"true":"false");
    printf("%s\n", str_iseq("AB", "ABC")?"true":"false");

    free(a);
    free(b);
    return 0;
}

This very simple test app has two allocated strings, a and b to be 100% sure we’re comparing two different strings. When compiled any constants will be put into the executable once and referenced where needed. Which means that if we set a and b to a constant “ABC” then they would point to literally the same data. This isn’t a huge problem because str_iseq always does a comparison even if the pointers match but if a pointer check were added we might get into a situation where the comparison is skipped during testing. So, allocated strings are used to be 100% sure two different strings with the same characters are compared.

The last two checks are to demonstrate that substrings won’t cause a match. It’s not the most exhaustive set of tests but it’s a good enough demonstration.