The code looks 100% identical except for the namespace prefixes. It must be something particular about the GitHub runner setup, because on my machine (GCC 15.2.1 / Clang 20.1.8 / Ryzen 5 5600X) the run times are indistinguishable. Interestingly, with default flags plus -O3 Clang is 30% slower, while with the flags from the script (-s -static -flto $MARCH_FLAG -mtune=native -fomit-frame-pointer -fno-signed-zeros -fno-trapping-math -fassociative-math) Clang is a bit faster.
A nitpick is that benchmarking C/C++ with $MARCH_FLAG -mtune=native and fast-math magic is kinda unfair to Zig/Julia (Nim seems to support those) - unless you are running Gentoo, those flags are unlikely to be used for real applications.
I suspect this is it. Any benchmark that takes less than a second to run should have its iteration count increased until it takes at least a second, and preferably 5+ seconds, to run. Otherwise CPU scheduling, network processing, etc. perturb everything.
BenchExec "uses the cgroups feature of the Linux kernel to correctly handle groups of processes and uses Linux user namespaces to create a container that restricts interference of [each program] with the benchmarking host."
Certainly better, but you’re always going to be better off maximizing the runtime to a level where it just swamps any of the other effects. Then do multiple runs and take an average.
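The advice in the last two comments could be sketched roughly like this (not from the thread; `workload`, the starting iteration count, and the 200 ms threshold are all placeholders - in practice you'd aim for 5+ seconds per run, as suggested above):

```shell
# Placeholder workload standing in for the real benchmark body.
workload() { seq 1 "$1" | awk '{s+=$1} END{print s}' >/dev/null; }

# Grow the iteration count until a single run takes long enough
# to swamp scheduling noise.
iters=1000
min_ms=200           # illustrative; use 5000+ in practice
while :; do
    t0=$(date +%s%N)
    workload "$iters"
    ms=$(( ($(date +%s%N) - t0) / 1000000 ))
    [ "$ms" -ge "$min_ms" ] && break
    iters=$(( iters * 10 ))
done

# Then do multiple runs at that count and take an average.
total=0
for run in 1 2 3; do
    t0=$(date +%s%N)
    workload "$iters"
    total=$(( total + ($(date +%s%N) - t0) / 1000000 ))
done
echo "iters=$iters avg_ms=$(( total / 3 ))"
```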
(I have spent a good amount of time hacking on the LLVM pass pipeline for my personal project, so if there were a significant difference I probably would have seen it by now.)
You are correct, that was an uneducated guess on my part.
I just glanced at the IR, which differed in some attributes (nounwind vs. mustprogress norecurse), but the resulting assembly is 100% identical at every optimization level.
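For anyone wanting to repeat that kind of check, here is a minimal sketch (the source file and function are made up for illustration): dump the optimized IR, where attribute differences like nounwind/mustprogress show up, alongside the final assembly at each optimization level.

```shell
# Hypothetical test function; the real comparison would use the
# benchmark's own translation units.
cat > f.c <<'EOF'
int add(int a, int b) { return a + b; }
EOF

# Assumes clang is on PATH; skip the dump otherwise.
if command -v clang >/dev/null; then
    for opt in -O0 -O1 -O2 -O3; do
        clang "$opt" -S -emit-llvm -o "f$opt.ll" f.c  # IR, with attributes
        clang "$opt" -S            -o "f$opt.s"  f.c  # final assembly
    done
    grep -h 'attributes #' f-O2.ll  # shows nounwind, mustprogress, etc.
fi
```

The `.s` files from two variants can then be diffed directly; IR-level attribute differences often wash out by the time code is emitted.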