Introduction

Awhile back I wrote a constant time string comparison function. I briefly mentioned how the compiler can optimize away some of what makes the function constant time. Specifically, the k++ counter used to balance the increment when the forward scan of s2 stops. The volatile keyword was used in the code to prevent key operations from being removed.

What I neglected to go into was showing that volatile works to keep the function constant time. So, let’s look into this now.

Test App

We’ll use the comparison function from the constant time string comparison post, and this slimmed down test app.

int main(int argc, char **argv)
{
    printf("%s\n", str_iseq("ABCDEFGHIJKLMNOPQRSTUVWXYZ", "123")?"true":"false");
    return 0;
}

How to Inspect the Compilation

First, let’s talk about how we can validate that volatile is preventing optimizations. If we compile the test app we can inspect the .o (object) file and simply validate the k++ is still present. To do this we will use the objdump application. If you’re using Linux you’ll want to invoke it like this: objdump -d -M intel -S main.o. On macOS you’ll use: objdump -source main.o.

Also, when we compile the test app we will need to compile with debug symbols (-g). Otherwise, we won’t be able to see the source lines that belong to each assembly block. While not strictly necessary, this makes it easier to see what assembly corresponds to which block of C code.

What we want to look for in the objdump output is the line k++. If there is assembly code after that line, then it’s there and will be run. If k++ isn’t in the output or if it is but there isn’t any assembly after, then it’s been optimized out. That said, it’s a bit more complex than this because optimizations might reorder some operations so they might not line up exactly with the source output. At least I’ve seen this with objdump output on Linux.

Inspecting the Compilation

All we want to validate is the volatile keyword is working properly. To do this, we’ll compile with -O1 once without the keyword and once with. That should be enough to show us that it’s doing its job. We could use any optimization level we want but I’ve found that without volatile, k++ is optimized out as soon as optimizations are enabled. -O1 provides the easiest output to read so that’s why I’m using it now.

Now that we know what we’re doing, we can start looking at some compiler output.

Without Volatile

$ clang -g -O1 -c main.c
$ objdump -source main.o 

main.o:    file format Mach-O 64-bit x86-64

Disassembly of section __TEXT,__text:
_str_iseq:
; {
       0:    55               pushq    %rbp
       1:    48 89 e5         movq     %rsp, %rbp
       4:    31 c0            xorl     %eax, %eax
; if (s1 == NULL || s2 == NULL)
       6:    48 85 ff         testq    %rdi, %rdi
       9:    74 4e            je       78 <_str_iseq+0x59>
       b:    48 85 f6         testq    %rsi, %rsi
       e:    74 49            je       73 <_str_iseq+0x59>
; m |= s1[i]^s2[j];
      10:    8a 17             movb    (%rdi), %dl
      12:    44 8a 06          movb    (%rsi), %r8b
      15:    44 89 c0          movl    %r8d, %eax
      18:    30 d0             xorb    %dl, %al
      1a:    0f be c0          movsbl  %al, %eax
; if (s1[i] == '\0')
      1d:    84 d2             testb    %dl, %dl
      1f:    74 33             je       51 <_str_iseq+0x54>
; i++;
      21:    48 ff c7          incq     %rdi
      24:    31 d2             xorl     %edx, %edx
      26:    66 2e 0f 1f 84 00 00 00 00 00     nopw    %cs:(%rax,%rax)
; if (s2[j] != '\0')
      30:    41 80 f8 01        cmpb    $1, %r8b
      34:    48 83 da ff        sbbq    $-1, %rdx
; m |= s1[i]^s2[j];
      38:    44 0f b6 0f        movzbl  (%rdi), %r9d
      3c:    44 0f b6 04 16     movzbl  (%rsi,%rdx), %r8d
      41:    44 89 c1           movl    %r8d, %ecx
      44:    44 30 c9           xorb    %r9b, %cl
      47:    0f be c9           movsbl  %cl, %ecx
      4a:    09 c8              orl     %ecx, %eax
; if (s1[i] == '\0')
      4c:    48 ff c7           incq    %rdi
      4f:    45 84 c9           testb   %r9b, %r9b
      52:    75 dc              jne     -36 <_str_iseq+0x30>
; return m == 0;
      54:    85 c0              testl   %eax, %eax
      56:    0f 94 c0           sete    %al
; }

Looking at this, we can see that there is no assembly that corresponds to k++. So, we’ve confirmed that without volatile, the if (s2[j] == '\0') check and k++ are both missing from this compilation. No volatile, no balanced operations.

With Volatile

$ clang -g -O1 -c main.c
$ objdump -source main.o

main.o:    file format Mach-O 64-bit x86-64

Disassembly of section __TEXT,__text:
_str_iseq:
; {
       0:    55                       pushq   %rbp
       1:    48 89 e5                 movq    %rsp, %rbp
; volatile size_t i = 0;
       4:    48 c7 45 f0 00 00 00 00  movq    $0, -16(%rbp)
; volatile size_t j = 0;
       c:    48 c7 45 f8 00 00 00 00  movq    $0, -8(%rbp)
; volatile size_t k = 0;
      14:    48 c7 45 e8 00 00 00 00  movq    $0, -24(%rbp)
      1c:    31 c0                    xorl    %eax, %eax
; if (s1 == NULL || s2 == NULL)
      1e:    48 85 ff                 testq   %rdi, %rdi
      21:    74 62                    je      98 <_str_iseq+0x85>
      23:    48 85 f6                 testq   %rsi, %rsi
      26:    74 5d                    je      93 <_str_iseq+0x85>
; m |= s1[i]^s2[j];
      28:    48 8b 45 f0              movq    -16(%rbp), %rax
      2c:    8a 04 07                 movb    (%rdi,%rax), %al
      2f:    48 8b 4d f8              movq    -8(%rbp), %rcx
      33:    32 04 0e                 xorb    (%rsi,%rcx), %al
      36:    0f be c0                 movsbl  %al, %eax
      39:    eb 19                    jmp     25 <_str_iseq+0x54>
      3b:    0f 1f 44 00 00           nopl    (%rax,%rax)
      40:    48 8b 4d f0              movq    -16(%rbp), %rcx
      44:    0f b6 0c 0f              movzbl  (%rdi,%rcx), %ecx
      48:    48 8b 55 f8              movq    -8(%rbp), %rdx
      4c:    32 0c 16                 xorb    (%rsi,%rdx), %cl
      4f:    0f be c9                 movsbl  %cl, %ecx
      52:    09 c8                    orl     %ecx, %eax
; if (s1[i] == '\0')
      54:    48 8b 4d f0              movq    -16(%rbp), %rcx
      58:    80 3c 0f 00              cmpb    $0, (%rdi,%rcx)
      5c:    74 22                    je      34 <_str_iseq+0x80>
; i++;
      5e:    48 ff 45 f0              incq    -16(%rbp)
; if (s2[j] != '\0')
      62:    48 8b 4d f8              movq    -8(%rbp), %rcx
      66:    80 3c 0e 00              cmpb    $0, (%rsi,%rcx)
      6a:    74 04                    je      4 <_str_iseq+0x70>
; j++;
      6c:    48 ff 45 f8              incq    -8(%rbp)
; if (s2[j] == '\0')
      70:    48 8b 4d f8              movq    -8(%rbp), %rcx
      74:    80 3c 0e 00              cmpb    $0, (%rsi,%rcx)
      78:    75 c6                    jne     -58 <_str_iseq+0x40>
; k++;
      7a:    48 ff 45 e8              incq    -24(%rbp)
      7e:    eb c0                    jmp     -64 <_str_iseq+0x40>
; return m == 0;
      80:    85 c0                    testl   %eax, %eax
      82:    0f 94 c0                 sete    %al
; }

Even though -O1 did some level of optimizations, it did not optimize out the important parts of the algorithm. As we’d expect from volatile, if (s2[j] == '\0') and k++ are still present in the assembly. The operations are still balanced keeping str_iseq operating in constant time.