Some weeks ago, a colleague and I wondered whether a const array in a function should be marked as static.
Consider the following, heavily simplified example of the type of function we were looking at:
int prime(int n) {
    const int Primes[] = { 2, 3, 5, 7, 11, 13, 17, 19 };
    return Primes[n];
}
The real code was obviously a lot more complicated, with a number of arguments instead of just an integer.
The function calculated an index based on these arguments, looked up some data in one array, used the value to calculate the next index, and so on.
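To make that description a bit more concrete, here is a rough sketch of such a chained lookup. The real code is not shown in this post, so the function name lookup_chain, both tables, and the way the first index is derived are all made up for illustration.

```c
/* Hypothetical sketch: the names, tables and index formula are assumptions,
 * not the real code. It only mirrors the shape described in the text. */
int lookup_chain(int a, int b) {
    const int First[]  = { 3, 0, 2, 1 };      /* yields the next index */
    const int Second[] = { 10, 20, 30, 40 };  /* yields the final value */

    /* Derive an index in 0..3 from the arguments (unsigned avoids
     * negative remainders). */
    unsigned i = ((unsigned)a + (unsigned)b) % 4u;

    /* The value found in the first table is the index into the second. */
    int j = First[i];
    return Second[j];
}
```

Like the simplified prime function, both tables here are const but not static, which is exactly the situation the question is about.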
So, in the prime example above, should Primes be marked as static to give it “static storage duration”?
Or will the compiler optimize the function anyway and both variants result in the same code?
In case of doubt, it is best to verify what the compiler actually does. A great tool for this is godbolt.org, which we also used in this case. For the sake of having everything in this post, here is the x86-64 assembly code for the snippet above as produced by the latest GCC, version 12.2.0 (the full invocation is given in the footnote):
        .file "prime.c"
        .text
        .globl prime
        .type prime, @function
prime:
        pushq %rbp
        movq %rsp, %rbp
        movl %edi, -36(%rbp)
        movl $2, -32(%rbp)
        movl $3, -28(%rbp)
        movl $5, -24(%rbp)
        movl $7, -20(%rbp)
        movl $11, -16(%rbp)
        movl $13, -12(%rbp)
        movl $17, -8(%rbp)
        movl $19, -4(%rbp)
        movl -36(%rbp), %eax
        cltq
        movl -32(%rbp,%rax,4), %eax
        popq %rbp
        ret
        .size prime, .-prime
        .ident "GCC: (GNU) 12.2.0"
        .section .note.GNU-stack,"",@progbits
As you might notice, all these movl instructions construct the constant array on the stack, every time the function is called.
But wait, we compiled without optimizations, so the compiler cannot do its magic.
Here is the code generated with -O2 (and let me tell you that -O3 produces exactly the same):
        .file "prime.c"
        .text
        .p2align 4
        .globl prime
        .type prime, @function
prime:
        movdqa .LC0(%rip), %xmm0
        movslq %edi, %rdi
        movaps %xmm0, -40(%rsp)
        movdqa .LC1(%rip), %xmm0
        movaps %xmm0, -24(%rsp)
        movl -40(%rsp,%rdi,4), %eax
        ret
        .size prime, .-prime
        .section .rodata.cst16,"aM",@progbits,16
        .align 16
.LC0:
        .long 2
        .long 3
        .long 5
        .long 7
        .align 16
.LC1:
        .long 11
        .long 13
        .long 17
        .long 19
        .ident "GCC: (GNU) 12.2.0"
        .section .note.GNU-stack,"",@progbits
It is a bit harder to see what is going on here due to vectorization with SSE.
However, at the end of the day, the pairs of movdqa and movaps still construct the array on every invocation: each movdqa loads 16 bytes (four entries) from the read-only constants .LC0 and .LC1, and each movaps stores them to the stack.
This may not sound like a big problem here, but keep in mind that this example is extremely simplified.
Real functions may have more than one array, and each of them may have many more than eight entries.
Copying them around on every call can be a significant waste of performance.
So, what changes if we add the static keyword to the array?
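The only change on the source side is the added keyword. A minimal sketch of the modified function, assuming nothing else in the snippet was touched:

```c
/* Same function as before; the array now has static storage duration,
 * so it exists once for the whole program instead of once per call. */
int prime(int n) {
    static const int Primes[] = { 2, 3, 5, 7, 11, 13, 17, 19 };
    return Primes[n];  /* as in the original, n is not range-checked */
}
```

The assembly for this variant: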
        .file "prime.c"
        .text
        .p2align 4
        .globl prime
        .type prime, @function
prime:
        movslq %edi, %rdi
        leaq Primes.0(%rip), %rax
        movl (%rax,%rdi,4), %eax
        ret
        .size prime, .-prime
        .section .rodata
        .align 32
        .type Primes.0, @object
        .size Primes.0, 32
Primes.0:
        .long 2
        .long 3
        .long 5
        .long 7
        .long 11
        .long 13
        .long 17
        .long 19
        .ident "GCC: (GNU) 12.2.0"
        .section .note.GNU-stack,"",@progbits
This looks a lot better: the leaq instruction loads the address of the Primes array, which now lives in the .rodata section, and movl just loads the one entry it needs.
In principle, the compiler could optimize the original example to the same code because there is no observable difference in this case (the famous as-if rule). GCC developers seem to agree on this, and there is an open bug report for GCC. As mentioned in there, the Clang compiler does perform this optimization, and it still works, at least for this example, with version 14.0.6. Note, however, that the optimization is not allowed if a difference can be observed. In fact, older versions of Clang had a bug where the compiler would incorrectly optimize constant arrays.
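One way such a difference can become observable is through the array's address. The sketch below is my own illustration, not taken from the bug reports mentioned above, and the function names are made up. In C, a recursive call creates a new instance of every automatic object, and two objects that are alive at the same time must have different addresses. A static array, in contrast, is one single object shared by all calls.

```c
#include <stdbool.h>
#include <stddef.h>

/* Automatic storage: each live invocation owns its own array object. */
bool auto_shares_address(const int *outer) {
    const int Local[] = { 2, 3, 5, 7 };
    if (outer == NULL)
        return auto_shares_address(Local);  /* recurse once with our array */
    return outer == Local;  /* caller's array vs. ours: must differ */
}

/* Static storage: one array object shared by all invocations. */
bool static_shares_address(const int *outer) {
    static const int Local[] = { 2, 3, 5, 7 };
    if (outer == NULL)
        return static_shares_address(Local);
    return outer == Local;  /* same object, so the pointers are equal */
}
```

Called with NULL, the first function has to return false and the second true, so a compiler may only move the automatic array into read-only data if it can show that nobody compares addresses like this.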
So there you have it: add the static keyword to your constant arrays in a function!
(And if you are in C++, maybe make them constexpr as well?)
Footnote: The full invocation was gcc -S -fno-asynchronous-unwind-tables -fno-stack-protector prime.c to get rid of some noise.
You do not need to agree with my opinions expressed in this blog post, and I'm fine with different views on certain topics. However, if there is a technical fault please send me a message so that I can correct it!