Introduction
Awhile back I wrote a constant time string comparison function.
I briefly mentioned how the compiler can optimize away some of what makes
the function constant time. Specifically, the k++
counter used to balance
the increment when the forward scan of s2
stops. The volatile
keyword was used in
the code to prevent key operations from being removed.
What I neglected to go into was showing that volatile
works to keep
the function constant time. So, let’s look into this now.
Test App
We’ll use the comparison function from the constant time string comparison post, and this slimmed down test app.
int main(int argc, char **argv)
{
printf("%s\n", str_iseq("ABCDEFGHIJKLMNOPQRSTUVWXYZ", "123")?"true":"false");
return 0;
}
How to Inspect the Compilation
First, let’s talk about how we can validate that volatile
is preventing optimizations.
If we compile the test app we can inspect the .o (object) file and simply
validate the k++
is still present. To do this we will use the objdump
application. If you’re using Linux you’ll want to invoke it like this:
objdump -d -M intel -S main.o
. On macOS you’ll use: objdump -source main.o
.
Also, when we compile the test app we will need to compile with debug symbols
(-g
). Otherwise, we won’t be able to see the source lines that belong
to each assembly block. While not strictly necessary, this makes it easier
to see what assembly corresponds to which block of C code.
What we want to look for in the objdump
output is the line k++
. If there
is assembly code after that line, then it’s there and will be run. If k++
isn’t in the output or if it is but there isn’t any assembly after, then
it’s been optimized out. That said, it’s a bit more complex than this because
optimizations might reorder some operations so they might not line up exactly
with the source output. At least I’ve seen this with objdump
output on Linux.
Inspecting the Compilation
All we want to validate is the volatile
keyword is working properly. To
do this, we’ll compile with -O1
once without the keyword and once with.
That should be enough to show us that it’s doing its job. We could use any
optimization level we want but I’ve found that without volatile
, k++
is
optimized out as soon as optimizations are enabled. -O1
provides the easiest
output to read so that’s why I’m using it now.
Now that we know what we’re doing, we can start looking at some compiler output.
Without Volatile
$ clang -g -O1 -c main.c
$ objdump -source main.o
main.o: file format Mach-O 64-bit x86-64
Disassembly of section __TEXT,__text:
_str_iseq:
; {
0: 55 pushq %rbp
1: 48 89 e5 movq %rsp, %rbp
4: 31 c0 xorl %eax, %eax
; if (s1 == NULL || s2 == NULL)
6: 48 85 ff testq %rdi, %rdi
9: 74 4e je 78 <_str_iseq+0x59>
b: 48 85 f6 testq %rsi, %rsi
e: 74 49 je 73 <_str_iseq+0x59>
; m |= s1[i]^s2[j];
10: 8a 17 movb (%rdi), %dl
12: 44 8a 06 movb (%rsi), %r8b
15: 44 89 c0 movl %r8d, %eax
18: 30 d0 xorb %dl, %al
1a: 0f be c0 movsbl %al, %eax
; if (s1[i] == '\0')
1d: 84 d2 testb %dl, %dl
1f: 74 33 je 51 <_str_iseq+0x54>
; i++;
21: 48 ff c7 incq %rdi
24: 31 d2 xorl %edx, %edx
26: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)
; if (s2[j] != '\0')
30: 41 80 f8 01 cmpb $1, %r8b
34: 48 83 da ff sbbq $-1, %rdx
; m |= s1[i]^s2[j];
38: 44 0f b6 0f movzbl (%rdi), %r9d
3c: 44 0f b6 04 16 movzbl (%rsi,%rdx), %r8d
41: 44 89 c1 movl %r8d, %ecx
44: 44 30 c9 xorb %r9b, %cl
47: 0f be c9 movsbl %cl, %ecx
4a: 09 c8 orl %ecx, %eax
; if (s1[i] == '\0')
4c: 48 ff c7 incq %rdi
4f: 45 84 c9 testb %r9b, %r9b
52: 75 dc jne -36 <_str_iseq+0x30>
; return m == 0;
54: 85 c0 testl %eax, %eax
56: 0f 94 c0 sete %al
; }
Looking at this, we can see that there is no assembly that corresponds to k++
.
So, we’ve confirmed that without volatile
, the if (s2[j] == '\0')
check
and k++
are both missing from this compilation. No volatile
, no balanced
operations.
With Volatile
$ clang -g -O1 -c main.c
$ objdump -source main.o
main.o: file format Mach-O 64-bit x86-64
Disassembly of section __TEXT,__text:
_str_iseq:
; {
0: 55 pushq %rbp
1: 48 89 e5 movq %rsp, %rbp
; volatile size_t i = 0;
4: 48 c7 45 f0 00 00 00 00 movq $0, -16(%rbp)
; volatile size_t j = 0;
c: 48 c7 45 f8 00 00 00 00 movq $0, -8(%rbp)
; volatile size_t k = 0;
14: 48 c7 45 e8 00 00 00 00 movq $0, -24(%rbp)
1c: 31 c0 xorl %eax, %eax
; if (s1 == NULL || s2 == NULL)
1e: 48 85 ff testq %rdi, %rdi
21: 74 62 je 98 <_str_iseq+0x85>
23: 48 85 f6 testq %rsi, %rsi
26: 74 5d je 93 <_str_iseq+0x85>
; m |= s1[i]^s2[j];
28: 48 8b 45 f0 movq -16(%rbp), %rax
2c: 8a 04 07 movb (%rdi,%rax), %al
2f: 48 8b 4d f8 movq -8(%rbp), %rcx
33: 32 04 0e xorb (%rsi,%rcx), %al
36: 0f be c0 movsbl %al, %eax
39: eb 19 jmp 25 <_str_iseq+0x54>
3b: 0f 1f 44 00 00 nopl (%rax,%rax)
40: 48 8b 4d f0 movq -16(%rbp), %rcx
44: 0f b6 0c 0f movzbl (%rdi,%rcx), %ecx
48: 48 8b 55 f8 movq -8(%rbp), %rdx
4c: 32 0c 16 xorb (%rsi,%rdx), %cl
4f: 0f be c9 movsbl %cl, %ecx
52: 09 c8 orl %ecx, %eax
; if (s1[i] == '\0')
54: 48 8b 4d f0 movq -16(%rbp), %rcx
58: 80 3c 0f 00 cmpb $0, (%rdi,%rcx)
5c: 74 22 je 34 <_str_iseq+0x80>
; i++;
5e: 48 ff 45 f0 incq -16(%rbp)
; if (s2[j] != '\0')
62: 48 8b 4d f8 movq -8(%rbp), %rcx
66: 80 3c 0e 00 cmpb $0, (%rsi,%rcx)
6a: 74 04 je 4 <_str_iseq+0x70>
; j++;
6c: 48 ff 45 f8 incq -8(%rbp)
; if (s2[j] == '\0')
70: 48 8b 4d f8 movq -8(%rbp), %rcx
74: 80 3c 0e 00 cmpb $0, (%rsi,%rcx)
78: 75 c6 jne -58 <_str_iseq+0x40>
; k++;
7a: 48 ff 45 e8 incq -24(%rbp)
7e: eb c0 jmp -64 <_str_iseq+0x40>
; return m == 0;
80: 85 c0 testl %eax, %eax
82: 0f 94 c0 sete %al
; }
Even though -O1
did some level of optimizations, it did not optimize out the
important parts of the algorithm. As we’d expect from volatile, if (s2[j] == '\0')
and k++
are still present in the assembly. The operations are still
balanced keeping str_iseq
operating in constant time.