Some weeks ago, a colleague and I wondered whether a const array in a function should be marked as static.
Consider the following, heavily simplified example of the type of function we were looking at:
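A minimal sketch of such a function could look like this. The array name Primes and its eight entries follow the description in this post; the function name get_prime and the concrete values are assumptions made for illustration.

```c
/* Sketch of the simplified example. The name get_prime and the exact
 * values are assumptions; the post only fixes the array name and size. */
int get_prime(int i)
{
    /* const array with automatic storage duration (no static keyword) */
    const int Primes[] = {2, 3, 5, 7, 11, 13, 17, 19};

    /* The caller must pass an index in the range 0..7. */
    return Primes[i];
}
```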
The real code was obviously a lot more complicated, with a number of arguments instead of just an integer.
The function calculated an index based on these arguments, looked up some data in one array, used the value to calculate the next index, and so on.
So, in the example above, should Primes be marked as static to give it “static storage duration”?
Or will the compiler optimize the function anyway and both variants result in the same code?
In case of doubt, it is best to verify what the compiler actually does. A great tool for this is godbolt.org, which we also used in this case. For the sake of having everything in this post, here is the x86 assembly code for the snippet above as produced by the latest GCC 12.2.0 (the full invocation is noted at the end of this post):
As you might notice, all these movl instructions construct the constant array on the stack, every time the function is called.
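Expressed back in C, the unoptimized code behaves roughly like the following sketch. This is an illustration rather than compiler output, and the function name get_prime_spelled_out as well as the values are hypothetical.

```c
/* Rough C equivalent of what the unoptimized code does (a sketch,
 * not compiler output). Function name and values are hypothetical. */
int get_prime_spelled_out(int i)
{
    int primes[8]; /* fresh stack array for this call */

    /* One store per element, executed again on every call. */
    primes[0] = 2;
    primes[1] = 3;
    primes[2] = 5;
    primes[3] = 7;
    primes[4] = 11;
    primes[5] = 13;
    primes[6] = 17;
    primes[7] = 19;

    /* Only now is the single requested entry read back. */
    return primes[i];
}
```

Eight stores are paid for a single load, which is the overhead discussed in this post.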
But wait, we compiled without optimizations so the compiler cannot do its magic.
Here is the code generated with -O2 (and let me just tell you that -O3 produces exactly the same code):
It is a bit harder to see what is going on here due to vectorization with SSE.
However, at the end of the day, the pairs of movaps still construct the array on every invocation.
This may not sound like a big problem here, but keep in mind that this example is extremely simplified.
Real functions may have more than one array, and each of them may have more than just eight entries.
Constructing them anew on every call can waste a lot of performance.
So, what changes if we add the static keyword to the array?
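On the source side, the only change is the added keyword. A sketch, again assuming a hypothetical function name (get_prime_static) and illustrative values:

```c
/* Sketch of the static variant. The name get_prime_static and the
 * values are assumptions made for illustration. */
int get_prime_static(int i)
{
    /* static: the array has static storage duration, so it exists once
     * for the whole program instead of being rebuilt on each call. */
    static const int Primes[] = {2, 3, 5, 7, 11, 13, 17, 19};

    /* The caller must pass an index in the range 0..7. */
    return Primes[i];
}
```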
The generated code looks a lot better: the leaq instruction loads the address of the Primes array, and movl just accesses the one entry it needs to load.
In principle, the compiler could optimize the original example to the same code because there is no observable difference in this case (the famous as-if rule). GCC developers seem to agree on this, and there is an open bug report for GCC. As mentioned in there, the Clang compiler does this optimization, and it still works at least for this example with version 14.0.6. Note, however, that the optimization is not allowed if differences can be observed. In fact, older versions of Clang had a bug where the compiler would incorrectly optimize constant arrays.
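One way a difference can become observable is through the array's address. The following sketch uses hypothetical function names and is not taken from the bug reports: when a function calls itself while its array is still alive, the two invocations hold distinct objects, which must have distinct addresses. With static there is only a single object.

```c
#include <stdbool.h>
#include <stddef.h>

/* Pass NULL to start. The outer call hands the address of its own array
 * to the inner call, which compares it with the address of its array. */
bool sees_distinct_arrays(const int *outer)
{
    const int local[] = {2, 3, 5, 7}; /* one object per live invocation */

    if (outer == NULL) {
        return sees_distinct_arrays(local);
    }
    /* Both arrays are alive here, so their addresses must differ. */
    return outer != local;
}

/* Same logic, but the array now has static storage duration. */
bool sees_distinct_arrays_static(const int *outer)
{
    static const int local[] = {2, 3, 5, 7}; /* a single shared object */

    if (outer == NULL) {
        return sees_distinct_arrays_static(local);
    }
    /* Both invocations refer to the same object. */
    return outer != local;
}
```

A conforming compiler has to return true from sees_distinct_arrays(NULL) and false from sees_distinct_arrays_static(NULL), so in a function like this it cannot silently treat the first array as if it were static.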
So there you have it: add the static keyword to your constant arrays in a function!
(And if you are in C++, maybe make them constexpr as well?)
The full invocation was gcc -S -fno-asynchronous-unwind-tables -fno-stack-protector prime.c to get rid of some noise.