
ksj1

macrumors 6502
Original poster
Jul 17, 2018
294
535
Interesting news that the Downfall vulnerability fix can reduce performance on Intel processors by up to 39% in some workloads. It will be interesting to see what that does to comparisons with AS.
 

ArkSingularity

macrumors 6502a
Mar 5, 2022
928
1,130
Apparently the vulnerability is exploited via speculative execution of AVX instructions, and the mitigations involve microcode patches that reduce the performance of those workloads. If that's true, it shouldn't have a particularly huge impact on non-vectorized workloads, but AVX workloads would take a fairly substantial hit.

Apparently this affects 6th gen through 11th gen processors too (though I could be wrong on this). That's not particularly good news for Skylake, which was already hit harder than earlier processors because of Retbleed (which significantly increased the performance cost of mitigations compared to Broadwell, Haswell, and earlier).

I have a feeling that the real-world impact will probably be much less than 39%, but if AVX instructions are crippled, heavily vectorized workloads will certainly feel the pain.
 

bobcomer

macrumors 601
May 18, 2015
4,949
3,699
So what, Intel will stop pushing Cinebench as the mother of all benchmarks now?
Most people don't concern themselves with benchmarks.

If most people notice the bad performance, they'll get bad press. If it's just a few, most people won't care. Me, I've never been a fan of speculative execution, but if it slows down my workflow (not likely much), I'll be ticked.
 

dmccloud

macrumors 68040
Sep 7, 2009
3,138
1,899
Anchorage, AK
Interesting news that the Downfall vulnerability fix can reduce performance on Intel processors by up to 39% in some workloads. It will be interesting to see what that does to comparisons with AS.

The Spectre mitigation also induced a massive performance hit. It seems to be the pattern rather than the exception in the 2020s...
 
  • Like
Reactions: Chuckeee

unrigestered

Suspended
Jun 17, 2022
879
840
It will be interesting to see what that does to comparisons with AS.
of course this is not great news for those affected, but doesn't this particular flaw "only" affect roughly gen 6 through gen 11?
they are currently on gen 13, with gen 14 almost around the corner, so gen 12 onwards should, as far as i understand, not be affected by this, and thus won't look worse in comparison to the Apple chips
 
  • Like
Reactions: ArkSingularity

dmccloud

macrumors 68040
Sep 7, 2009
3,138
1,899
Anchorage, AK
Most people don't concern themselves with benchmarks.

If most people notice the bad performance, they'll get bad press. If it's just a few, most people won't care. Me, I've never been a fan of speculative execution, but if it slows down my workflow (not likely much), I'll be ticked.

Sites such as Tom's Hardware, Anandtech, LTT, etc. will be running benchmarks on systems with and without the mitigations applied to compare the actual impacts on performance. It will be interesting to see what the actual numbers are compared to the preliminary/theoretical numbers. With 12th and 13th gen CPUs not being affected by either Downfall or the microcode fix, this probably will get less overall publicity than either Spectre or Retbleed.
 
  • Like
Reactions: MRMSFC and bobcomer

leman

macrumors Core
Oct 14, 2008
19,518
19,668
If it's just a few, most people won't care. Me, I've never been a fan of speculative execution,

Didn't you once say you care about performance? Speculative execution is what makes CPUs go fast in the first place. If you don't want speculative execution, you might as well use an Atom and give up 70% of the performance.
 

bobcomer

macrumors 601
May 18, 2015
4,949
3,699
Didn't you once say you care about performance? Speculative execution is what makes CPUs go fast in the first place. If you don't want speculative execution, you might as well use an Atom and give up 70% of the performance.
Yes, I still care about performance. And yes, I understand how it speeds things up.

My dislike of it is partly because of this kind of issue, but really, it seems so inefficient -- like there ought to be a better way. I don't really have a technical reason. Remember, I'm just a software guy...
 

leman

macrumors Core
Oct 14, 2008
19,518
19,668
My dislike of it is partly because of this kind of issue, but really, it seems so inefficient -- like there ought to be a better way. I don't really have a technical reason.

Inefficient? Do you start packing your suitcase the moment the taxi arrives to take you to the airport or do you do it the evening before? Speculation is all about the efficient use of available resources instead of standing there doing nothing until the last possible moment.

Remember, I'm just a software guy...

Hardware is a leaky abstraction. Can't write fast software without understanding how the hardware works and playing to it. That's why there are all these O(N) algorithms that end up slower than quadratic ones in practice.
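A toy illustration in C, since this comes up constantly (a sketch of my own, not a rigorous benchmark; absolute times depend entirely on your machine). Both loops below are O(N) and add the same numbers, but the array walk streams through memory so the prefetcher hides the latency, while the randomized linked list forces a pointer chase where each step can stall on a cache miss:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 10000000  /* 10M elements */

struct node { long value; struct node *next; };

/* Small xorshift64 PRNG so the shuffle behaves the same everywhere
   (rand() has a tiny range on some platforms). */
static unsigned long long rng = 88172645463325252ULL;
static unsigned long long xorshift64(void) {
    rng ^= rng << 13; rng ^= rng >> 7; rng ^= rng << 17;
    return rng;
}

int main(void) {
    long *arr = malloc(N * sizeof *arr);
    struct node *nodes = malloc(N * sizeof *nodes);
    size_t *perm = malloc(N * sizeof *perm);
    if (!arr || !nodes || !perm) return 1;

    for (size_t i = 0; i < N; i++) { arr[i] = (long)i; perm[i] = i; }

    /* Fisher-Yates shuffle, then link the nodes in that random order so
       traversal jumps around memory like a typical heap-allocated list. */
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64() % (i + 1));
        size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
    for (size_t i = 0; i < N; i++) {
        nodes[perm[i]].value = (long)i;
        nodes[perm[i]].next = (i + 1 < N) ? &nodes[perm[i + 1]] : NULL;
    }
    struct node *head = &nodes[perm[0]];

    clock_t t0 = clock();
    long sum1 = 0;
    for (size_t i = 0; i < N; i++) sum1 += arr[i];      /* sequential scan */
    clock_t t1 = clock();

    long sum2 = 0;
    for (struct node *p = head; p; p = p->next) sum2 += p->value;  /* pointer chase */
    clock_t t2 = clock();

    printf("array: sum=%ld  %.3fs\n", sum1, (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("list:  sum=%ld  %.3fs\n", sum2, (double)(t2 - t1) / CLOCKS_PER_SEC);
    free(arr); free(nodes); free(perm);
    return 0;
}
```

Same asymptotic complexity, wildly different wall-clock time. That's the leak in the abstraction.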
 

throAU

macrumors G3
Feb 13, 2012
9,198
7,346
Perth, Western Australia
of course this is not great news for those affected, but doesn't this particular flaw "only" affect roughly gen 6 through gen 11?
they are currently on gen 13, with gen 14 almost around the corner, so gen 12 onwards should, as far as i understand, not be affected by this, and thus won't look worse in comparison to the Apple chips

This particular flaw only goes up to 11th gen, but for the datacenter that's still pretty current. Ice Lake was still current in 2022, and that's 10th gen cores... Intel Xeon only got onto 12th gen cores this year (shipping to OEMs March 2023). So basically every Intel Xeon deployment more than a couple of months old (if that) is impacted by this; even after Intel launches a product, servers you can actually buy take some time to adopt it.

Intel datacenter parts are generally 1-2 generations behind desktop in terms of architecture (they just have way more of the same cores, more cache, etc.). Additionally, datacenter/cloud hosting is where you're more worried about malicious VMs trying to break your stuff.
 
Last edited:

bobcomer

macrumors 601
May 18, 2015
4,949
3,699
Inefficient? Do you start packing your suitcase the moment the taxi arrives to take you to the airport or do you do it the evening before? Speculation is all about the efficient use of available resources instead of standing there doing nothing until the last possible moment.
Yeah, inefficient: too many wrong paths, too many state changes. For suitcases, it's more efficient to pack the night before and have it all ready, with no changing things at the last minute. :)

I get your point and am not really arguing against it; I just think there has to be a better way. And no, I don't know of a better way, nor am I likely to figure one out.

Hardware is a leaky abstraction. Can't write fast software without understanding how the hardware works and playing to it.
I agree.
 
  • Like
Reactions: throAU

ArkSingularity

macrumors 6502a
Mar 5, 2022
928
1,130
I get your point and am not really arguing against it; I just think there has to be a better way. And no, I don't know of a better way, nor am I likely to figure one out.
Intel actually tried something like this with the Itanium, which used a VLIW (very long instruction word) architecture instead of the normal x86 way of doing things. It was a good idea on paper, but it turned out to be a flop and a bit of a disastrous product for Intel, as it wasn't really feasible for compilers to optimize code well enough to make it run the way it was expected to.

The idea was to have the compiler schedule and explicitly parallelize workloads instead of relying on the CPU to re-order instructions on the fly, but that turned out to be a very impractical problem to solve. Even with how much work goes into designing modern out-of-order CPUs, they are still a far more practical solution than the VLIW alternatives, which didn't really hold up over the long term.

Part of the reason that speculative execution works is the nature of the instructions being executed. CPUs operate on very small, atomic operations, much finer-grained than the typical statements of a high-level programming language.

It's a lot of "Load this memory address into a register. Load this other memory address into a second register. Add those two registers and store the result in a third register. Load a new memory address into a fourth register. Compare the first two registers and put the result in a fifth register. Branch somewhere else if the fifth register is not zero. Load another value from memory into a sixth register. Store the value of the second register to a new memory address. Add that value to the third register and store it in the second register..."

A lot of that is very easy to parallelize. It doesn't make much sense to sit on those loads when you already have a bunch of in-flight instructions and you know what they're loading. Furthermore, when a memory access can cost several hundred CPU cycles, you want to start it as early as possible, and you want other in-flight instructions ready to run in case you touch an address that isn't in cache yet.
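To make that concrete, here's a small C sketch (a toy example of my own; the function names are made up, not from any vendor docs). Both functions do the same O(N) work, but the first is one long dependency chain where every add waits on the previous one, while the second keeps four independent accumulators that out-of-order hardware can keep in flight simultaneously:

```c
#include <stddef.h>

/* One serial dependency chain: each add must wait for the previous
   result, so the core's extra execution units sit idle. */
double sum_serial(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Four independent accumulators: the adds within one iteration don't
   depend on each other, so an out-of-order core can overlap them. */
double sum_unrolled(const double *x, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; i++)          /* leftover elements */
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}
```

(Floating-point addition isn't associative, so the two functions can differ in the last bits; that's exactly why compilers won't do this reordering for you without flags like -ffast-math, and why the hardware's ability to overlap independent work matters so much.)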

CPUs that do little or no speculation do exist (very small in-order cores like the Cortex-A53, for example), but their performance is roughly 1/20 that of modern speculative out-of-order cores. We could have a world without speculative execution, but the cost would be relying on computers not much faster than a Pentium 3 (incidentally, even the Pentium 3 was an out-of-order processor, although we've learned a lot since then and can now design in-order processors that roughly match its performance).
 

bobcomer

macrumors 601
May 18, 2015
4,949
3,699

Nice explanation, thanks!

The Itanium way is probably more how I would try it, but I know the Itanium was a failure. I still feel there should be a better way, but maybe not.

I'll leave it to you hardware guys, I'm better at software. :)
 

ArkSingularity

macrumors 6502a
Mar 5, 2022
928
1,130

Nice explanation, thanks!

The Itanium way is probably more how I would try it, but I know the Itanium was a failure. I still feel there should be a better way, but maybe not.

I'll leave it to you hardware guys, I'm better at software. :)
Yeah, speculative execution is a necessary evil in the modern computing world. There would simply be no way to get anywhere near the levels of performance we're used to without it.

Another reason that our standard superscalar out-of-order processors are much more viable than their VLIW counterparts is that processors and pipelines change and evolve. If you wanted to design a VLIW processor where the compiler did all of the parallelization work, you'd have a problem any time you released a new version of the processor with additional execution units, ALUs, etc. With VLIW, code compiled for a previous processor couldn't properly take advantage of the new one (if the compiler assumed "there are 3 ALUs in this CPU, so I can schedule three arithmetic operations simultaneously" and you added a 4th ALU to the next version of the CPU, anything previously compiled wouldn't be optimized for it and wouldn't take full advantage of it).

Letting the CPU's hardware do this work allows the processor to figure out how best to execute any given stream of instructions on its own hardware, no matter how the compiler compiled it.

Technically, speculative execution itself is not the same thing as out-of-order execution, although they usually go hand in hand. Speculative execution is actually pretty important even for the simplest of processors because of how things are pipelined. I'm oversimplifying a bit, but modern CPUs work much like an assembly line: each stage in the pipeline has its own "job," so to speak, and they all hand work off to the next stage clock by clock. The CPU's job is to keep that pipeline as full as possible, and modern pipelines run about 10-20 stages, depending on the core design.

The difficulty is branches, since you don't know what to execute next until the previous instructions have evaluated the data that determines where to jump. If CPUs didn't have branch prediction and speculative execution, the pipeline would spend the majority of its time stalled: average executable code contains roughly one branch every six instructions, and each one would force the pipeline to sit idle while the branch resolved.
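You can actually see the branch predictor at work from plain C with the classic sorted-vs-unsorted demo (a sketch of my own; exact timings vary by CPU, and it's best compiled without heavy optimization, e.g. gcc -O1, since some compilers turn the branch into a conditional move and hide the effect). The loop body is identical in both passes, but with random data the branch is unpredictable and mispredictions flush the pipeline constantly, while sorted data lets the predictor settle in:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void) {
    enum { N = 1 << 24 };                 /* ~16M elements */
    int *data = malloc(N * sizeof *data);
    if (!data) return 1;
    srand(42);
    for (int i = 0; i < N; i++) data[i] = rand() % 256;

    long sum = 0;
    clock_t t0 = clock();
    for (int i = 0; i < N; i++)
        if (data[i] >= 128) sum += data[i];   /* ~50/50 branch: unpredictable */
    clock_t t1 = clock();

    qsort(data, N, sizeof *data, cmp_int);    /* same data, now predictable */

    clock_t t2 = clock();
    for (int i = 0; i < N; i++)
        if (data[i] >= 128) sum += data[i];   /* identical loop body */
    clock_t t3 = clock();

    printf("unsorted: %.3fs   sorted: %.3fs   (sum=%ld)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t3 - t2) / CLOCKS_PER_SEC, sum);
    free(data);
    return 0;
}
```

Same instructions, same data, typically a several-fold difference, purely because one branch pattern is learnable and the other isn't.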
 
Last edited:
  • Like
Reactions: MRMSFC and bobcomer

throAU

macrumors G3
Feb 13, 2012
9,198
7,346
Perth, Western Australia
The issue does not affect the latest generations of Intel CPUs, nor will it affect the new chips.

i.e., Intel will make MORE use of Cinebench to try to point out how great the new processors are when, in reality, outside of cherry-picked workloads you're talking about the now-Intel-standard 5% generational improvement.
 

MRMSFC

macrumors 6502
Jul 6, 2023
371
381
I get your point and am not really arguing against it; I just think there has to be a better way. And no, I don't know of a better way, nor am I likely to figure one out.
It only seems inefficient prima facie; a lot of stuff in technology is actually more efficient and faster despite what "common sense" would tell you.

Serial vs. parallel buses come to mind, and how serial ends up being faster than parallel due to various factors (at high clock rates, keeping many parallel lines in sync becomes the bottleneck).
 

ArkSingularity

macrumors 6502a
Mar 5, 2022
928
1,130
Very interesting. Apparently it's specifically an AVX-512 instruction that is affected? If that is the only instruction requiring mitigations (and assuming the mitigations cause no performance regressions for other instructions), then this vulnerability wouldn't impact the vast majority of everyday consumers, who either don't have AVX-512-capable CPUs or don't use software that relies on AVX-512.
 

Icelus

macrumors 6502
Nov 3, 2018
421
574
Very interesting. Apparently it's specifically an AVX-512 instruction that is affected? If that is the only instruction requiring mitigations (and assuming the mitigations cause no performance regressions for other instructions), then this vulnerability wouldn't impact the vast majority of everyday consumers, who either don't have AVX-512-capable CPUs or don't use software that relies on AVX-512.
No, both AVX2 and AVX-512 have "gather" instructions.

(You can highlight/filter SIMD instructions here: https://www.officedaytime.com/simd512e/)
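For anyone curious what those look like in practice, here's a minimal C sketch using the AVX2 intrinsic (needs a gather-capable CPU and something like gcc -O2 -mavx2; the table values are just for illustration). The single _mm256_i32gather_epi32 call compiles to VPGATHERDD, one instruction performing eight loads from scattered addresses, which is exactly the class of load Downfall exploits and the microcode mitigation slows down:

```c
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    int table[64];
    for (int i = 0; i < 64; i++) table[i] = i * 10;

    /* Gather table[3], table[7], ..., table[31] in one instruction. */
    __m256i idx  = _mm256_setr_epi32(3, 7, 11, 15, 19, 23, 27, 31);
    __m256i vals = _mm256_i32gather_epi32(table, idx, 4);  /* scale = sizeof(int) */

    int out[8];
    _mm256_storeu_si256((__m256i *)out, vals);
    for (int i = 0; i < 8; i++) printf("%d ", out[i]);  /* 30 70 110 ... 310 */
    printf("\n");
    return 0;
}
```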
 
  • Like
Reactions: ArkSingularity

dmccloud

macrumors 68040
Sep 7, 2009
3,138
1,899
Anchorage, AK
i.e., Intel will make MORE use of Cinebench to try to point out how great the new processors are when, in reality, outside of cherry-picked workloads you're talking about the now-Intel-standard 5% generational improvement.

I think too many people put excessive emphasis on Cinebench scores. 99% of actual user workloads won't even tax the system in the same way as CB, so it's really just a stress test with numbers.
 