Const and Static Const Arrays in C and C++
2022-10-30

Some weeks ago, a colleague and I wondered whether a const array in a function should be marked as static. Consider the following, heavily simplified example of the type of function we were looking at:

int prime(int n) {
  const int Primes[] = { 2, 3, 5, 7, 11, 13, 17, 19 };
  return Primes[n];
}

The real code was obviously a lot more complicated, with a number of arguments instead of just an integer. The function calculated an index based on these arguments, looked up some data in one array, used the value to calculate the next index, and so on. So, in the example above, should Primes be marked as static to give it “static storage duration”? Or will the compiler optimize the function anyway, so that both variants result in the same code?
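To make the question concrete, here are the two variants side by side. This is only a sketch: the names prime_auto and prime_static are made up for this comparison and do not appear in the real code.

```c
/* Variant 1: the array has automatic storage duration, so conceptually
 * a fresh array object exists for every call. */
int prime_auto(int n) {
  const int Primes[] = { 2, 3, 5, 7, 11, 13, 17, 19 };
  return Primes[n];
}

/* Variant 2: the array has static storage duration, so there is a single
 * object that lives for the whole program run. */
int prime_static(int n) {
  static const int Primes[] = { 2, 3, 5, 7, 11, 13, 17, 19 };
  return Primes[n];
}
```

Both functions return the same value for every valid index from 0 to 7; the question is only what code the compiler generates for each of them.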

In case of doubt, it is best to verify what the compiler actually does. A great tool for this is godbolt.org, which we also used in this case. For the sake of having everything in this post, here is the x86-64 assembly code for the snippet above as produced by the latest GCC 12.2.0 [1]:

	.file	"prime.c"
	.text
	.globl	prime
	.type	prime, @function
prime:
	pushq	%rbp
	movq	%rsp, %rbp
	movl	%edi, -36(%rbp)
	movl	$2, -32(%rbp)
	movl	$3, -28(%rbp)
	movl	$5, -24(%rbp)
	movl	$7, -20(%rbp)
	movl	$11, -16(%rbp)
	movl	$13, -12(%rbp)
	movl	$17, -8(%rbp)
	movl	$19, -4(%rbp)
	movl	-36(%rbp), %eax
	cltq
	movl	-32(%rbp,%rax,4), %eax
	popq	%rbp
	ret
	.size	prime, .-prime
	.ident	"GCC: (GNU) 12.2.0"
	.section	.note.GNU-stack,"",@progbits

As you might notice, all these movl instructions construct the constant array on the stack, and they do so every time the function is called. But wait, we compiled without optimizations, so the compiler cannot do its magic. Here is the code generated with -O2 (and just let me tell you that -O3 produces exactly the same code):

	.file	"prime.c"
	.text
	.p2align 4
	.globl	prime
	.type	prime, @function
prime:
	movdqa	.LC0(%rip), %xmm0
	movslq	%edi, %rdi
	movaps	%xmm0, -40(%rsp)
	movdqa	.LC1(%rip), %xmm0
	movaps	%xmm0, -24(%rsp)
	movl	-40(%rsp,%rdi,4), %eax
	ret
	.size	prime, .-prime
	.section	.rodata.cst16,"aM",@progbits,16
	.align 16
.LC0:
	.long	2
	.long	3
	.long	5
	.long	7
	.align 16
.LC1:
	.long	11
	.long	13
	.long	17
	.long	19
	.ident	"GCC: (GNU) 12.2.0"
	.section	.note.GNU-stack,"",@progbits

It is a bit harder to see what is going on here due to vectorization with SSE. However, at the end of the day, the pairs of movdqa and movaps still construct the array on every invocation. This may not sound like a big problem here, but keep in mind that this example is extremely simplified. Real functions may have more than one array, and each of them may have more than just eight entries. Essentially copying them around on every call is a huge waste of performance.
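Written back as C, the optimized code behaves roughly like the sketch below. The names PrimesInit and prime_copy are hypothetical, and the memcpy stands in for the two 16-byte SSE copies; this is an approximation of the generated code, not something GCC emits literally.

```c
#include <string.h>

/* Stands in for the constants .LC0 and .LC1 in read-only data. */
static const int PrimesInit[8] = { 2, 3, 5, 7, 11, 13, 17, 19 };

int prime_copy(int n) {
  int Primes[8];
  /* The whole array is copied to the stack on every call ... */
  memcpy(Primes, PrimesInit, sizeof Primes);
  /* ... although only a single element is read afterwards. */
  return Primes[n];
}
```

The cost of the copy grows with the size of the array, while the lookup itself stays a single load.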

So, what changes if we add the static keyword to the array?

	.file	"prime.c"
	.text
	.p2align 4
	.globl	prime
	.type	prime, @function
prime:
	movslq	%edi, %rdi
	leaq	Primes.0(%rip), %rax
	movl	(%rax,%rdi,4), %eax
	ret
	.size	prime, .-prime
	.section	.rodata
	.align 32
	.type	Primes.0, @object
	.size	Primes.0, 32
Primes.0:
	.long	2
	.long	3
	.long	5
	.long	7
	.long	11
	.long	13
	.long	17
	.long	19
	.ident	"GCC: (GNU) 12.2.0"
	.section	.note.GNU-stack,"",@progbits

This looks a lot better: the leaq instruction loads the address of the Primes array, and movl just accesses the one entry it needs to load.

In principle, the compiler could optimize the original example to the same code because there is no observable difference in this case (the famous as-if rule). GCC developers seem to agree on this and there is an open bug report for GCC. As mentioned in there, the Clang compiler does this optimization, and it still works at least for this example with version 14.0.6. Note, however, that the optimization is not allowed if differences can be observed. In fact, older versions of Clang had a bug where the compiler would incorrectly optimize constant arrays.
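One way such a difference becomes observable is recursion: while an outer call is still active, its array and the array of a nested call are two distinct objects and must have different addresses. The following sketch makes this visible by comparing the addresses. The helper names are hypothetical and the code is not taken from the bug reports.

```c
#include <stdbool.h>
#include <stddef.h>

/* Call with NULL: the outer call passes its array to one nested call,
 * which reports whether both invocations see the same address. */
bool same_address_auto(const int *outer) {
  const int Primes[] = { 2, 3, 5, 7, 11, 13, 17, 19 };
  if (outer == NULL)
    return same_address_auto(Primes);
  return outer == Primes;
}

/* Same check, but with a single array of static storage duration. */
bool same_address_static(const int *outer) {
  static const int Primes[] = { 2, 3, 5, 7, 11, 13, 17, 19 };
  if (outer == NULL)
    return same_address_static(Primes);
  return outer == Primes;
}
```

A conforming compiler has to make same_address_auto(NULL) return false, because both arrays are alive at the same time, whereas same_address_static(NULL) returns true. Silently turning the automatic array into a static one is therefore only valid when the compiler can prove that nobody can tell the difference.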

So there you have it, add the static keyword to your constant arrays in a function! (And if you are in C++, maybe make them constexpr as well?)

  1. The full invocation was gcc -S -fno-asynchronous-unwind-tables -fno-stack-protector prime.c to get rid of some noise. 
