I haven’t read the article yet, but back when I tried to get to over 100 GB/s IO...

eqvinox · 2025-03-01T06:43:38 1740811418

I don't think these things are related; this is talking about the LSU (load-store unit) right inside the core. I'd also expect oscillations if there were a thermal problem like the one you're describing: the core clocks up when the IO hub delivers data, the IO hub stalls, which causes the core to stall as well, then the IO hub can run again and deliver data, and the cycle repeats.

(Then again, boost clocks are an intentional oscillation anyway…)

tanelpoder · 2025-03-01T07:58:33 1740815913

OK, I've just read through the article. As I understand it, their tests were designed to run entirely on data in the local cores' caches? I only see L1d mentioned there.

eqvinox · 2025-03-01T08:35:24 1740818124

Yes, that's my understanding of "Zen 5 also doubles L1D load bandwidth, and I’m exercising that by having each FMA instruction source an input from the data cache." Also, considering the author's other work, I'm pretty sure they can isolate load-store performance from both cache performance and memory-interface performance.