
ksj1

macrumors 6502
Original poster
Jul 17, 2018
294
535
Interesting news that the Downfall vulnerability fix can reduce performance on Intel processors by up to 39% in some workloads. It will be interesting to see what that does to comparisons with AS.
 

ArkSingularity

macrumors 6502a
Mar 5, 2022
928
1,130
Apparently the vulnerability is exploited via speculative execution of AVX instructions, and the mitigations involve microcode patches that reduce the performance of those workloads. If that's true, it shouldn't have a particularly huge impact on non-vectorized workloads, but AVX workloads would take a fairly substantial hit.

Apparently this affects 6th gen through 11th gen processors too (though I could be wrong on this). That's not particularly good news for Skylake, which was already hit harder than earlier processors because of Retbleed (which significantly increased the performance cost of mitigations compared to Broadwell, Haswell, and earlier).

I have a feeling that the real-world impact will probably be much less than 39%, but if AVX instructions are crippled, heavily vectorized workloads will certainly feel the pain.
 

bobcomer

macrumors 601
May 18, 2015
4,949
3,699
So what, Intel will stop pushing Cinebench as the mother of all benchmarks now?
Most people don't concern themselves with benchmarks.

If most people notice the bad performance, they'll get bad press. If it's just a few, most people won't care. Me, I've never been a fan of speculative execution, but if it slows down my workflow (not likely much), I'll be ticked.
 

dmccloud

macrumors 68040
Sep 7, 2009
3,138
1,899
Anchorage, AK
Interesting news that the Downfall vulnerability fix can reduce performance on Intel processors by up to 39% in some workloads. It will be interesting to see what that does to comparisons with AS.

The Spectre mitigation also induced a massive performance hit. It seems to be the pattern rather than the exception in the 2020s...
 
  • Like
Reactions: Chuckeee

unrigestered

Suspended
Jun 17, 2022
879
840
It will be interesting to see what that does to comparisons with AS.
of course this is not great news for those affected, but doesn't this particular flaw "only" affect roughly gen 6 through gen 11?
they are currently on gen 13, with gen 14 almost around the corner, so gen 12 onwards should, as far as i understand, not be affected by this, and thus won't look worse in comparison to the Apple chips
 
  • Like
Reactions: ArkSingularity

dmccloud

macrumors 68040
Sep 7, 2009
3,138
1,899
Anchorage, AK
Most people don't concern themselves with benchmarks.

If most people notice the bad performance, they'll get bad press. If it's just a few, most people won't care. Me, I've never been a fan of speculative execution, but if it slows down my workflow (not likely much), I'll be ticked.

Sites such as Tom's Hardware, Anandtech, LTT, etc. will be running benchmarks on systems with and without the mitigations applied to compare the actual impacts on performance. It will be interesting to see what the actual numbers are compared to the preliminary/theoretical numbers. With 12th and 13th gen CPUs not being affected by either Downfall or the microcode fix, this probably will get less overall publicity than either Spectre or Retbleed.
 
  • Like
Reactions: MRMSFC and bobcomer

leman

macrumors Core
Oct 14, 2008
19,518
19,668
If it's just a few, most people won't care. Me, I've never been a fan of speculative execution,

Didn't you once say you care about performance? Speculative execution is what makes CPUs go fast in the first place. If you don't want speculative execution, you might as well use an Atom and give up 70% of the performance.
 

bobcomer

macrumors 601
May 18, 2015
4,949
3,699
Didn't you once say you care about performance? Speculative execution is what makes CPUs go fast in the first place. If you don't want speculative execution, you might as well use an Atom and give up 70% of the performance.
Yes, I still care about performance. And yes, I understand how it speeds things up.

My dislike of it is partly because of this kind of issue, but really, it seems so inefficient -- like there ought to be a better way. I don't really have a technical reason. Remember, I'm just a software guy...
 

leman

macrumors Core
Oct 14, 2008
19,518
19,668
My dislike of it is partly because of this kind of issue, but really, it seems so inefficient -- like there ought to be a better way. I don't really have a technical reason.

Inefficient? Do you start packing your suitcase the moment the taxi arrives to take you to the airport or do you do it the evening before? Speculation is all about the efficient use of available resources instead of standing there doing nothing until the last possible moment.

Remember, I'm just a software guy...

Hardware is a leaky abstraction. Can't write fast software without understanding how the hardware works and playing to it. That's why there are all these O(N) algorithms that end up slower than quadratic ones in practice.
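A toy illustration in C, since this comes up constantly (a sketch of my own, not a rigorous benchmark; absolute times depend entirely on your machine). Both loops below are O(N) and add the same numbers, but the array walk streams through memory so the prefetcher hides the latency, while the randomized linked list forces a pointer chase where each step can stall on a cache miss:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 10000000  /* 10M elements */

struct node { long value; struct node *next; };

/* Small xorshift64 PRNG so the shuffle behaves the same everywhere
   (rand() has a tiny range on some platforms). */
static unsigned long long rng = 88172645463325252ULL;
static unsigned long long xorshift64(void) {
    rng ^= rng << 13; rng ^= rng >> 7; rng ^= rng << 17;
    return rng;
}

int main(void) {
    long *arr = malloc(N * sizeof *arr);
    struct node *nodes = malloc(N * sizeof *nodes);
    size_t *perm = malloc(N * sizeof *perm);
    if (!arr || !nodes || !perm) return 1;

    for (size_t i = 0; i < N; i++) { arr[i] = (long)i; perm[i] = i; }

    /* Fisher-Yates shuffle, then link the nodes in that random order so
       traversal jumps around memory like a typical heap-allocated list. */
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64() % (i + 1));
        size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
    for (size_t i = 0; i < N; i++) {
        nodes[perm[i]].value = (long)i;
        nodes[perm[i]].next = (i + 1 < N) ? &nodes[perm[i + 1]] : NULL;
    }
    struct node *head = &nodes[perm[0]];

    clock_t t0 = clock();
    long sum1 = 0;
    for (size_t i = 0; i < N; i++) sum1 += arr[i];      /* sequential scan */
    clock_t t1 = clock();

    long sum2 = 0;
    for (struct node *p = head; p; p = p->next) sum2 += p->value;  /* pointer chase */
    clock_t t2 = clock();

    printf("array: sum=%ld  %.3fs\n", sum1, (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("list:  sum=%ld  %.3fs\n", sum2, (double)(t2 - t1) / CLOCKS_PER_SEC);
    free(arr); free(nodes); free(perm);
    return 0;
}
```

Same asymptotic complexity, wildly different wall-clock time. That's the leak in the abstraction.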
 

throAU

macrumors G3
Feb 13, 2012
9,198
7,346
Perth, Western Australia
of course this is not great news for those affected, but doesn't this particular flaw "only" affect roughly gen 6 through gen 11?
they are currently on gen 13, with gen 14 almost around the corner, so gen 12 onwards should, as far as i understand, not be affected by this, and thus won't look worse in comparison to the Apple chips

This particular flaw only goes up to 11th gen, but for the datacenter that's still pretty current. Ice Lake was still current in 2022, and that's 10th gen cores... Intel Xeon only got onto 12th gen cores this year (shipping to OEMs March 2023). So basically every Intel Xeon deployment more than a couple of months old (if that) is impacted by this; even after Intel launches a product, servers you can actually buy take some time to adopt it.

Intel datacenter parts are generally 1-2 generations behind desktop in terms of architecture (they just have way more of the same cores, more cache, etc.). Additionally, datacenter/cloud hosting is where you're more worried about malicious VMs trying to break your stuff.
 
Last edited:

bobcomer

macrumors 601
May 18, 2015
4,949
3,699
Inefficient? Do you start packing your suitcase the moment the taxi arrives to take you to the airport or do you do it the evening before? Speculation is all about the efficient use of available resources instead of standing there doing nothing until the last possible moment.
Yeah, inefficient: too many wrong paths, too many state changes. For suitcases, it's more efficient to pack the night before and have it all ready, with no changing things at the last minute. :)

I get your point and am not really arguing against it; I just think there has to be a better way. And no, I don't know of a better way, nor am I likely to figure one out.

Hardware is a leaky abstraction. Can't write fast software without understanding how the hardware works and playing to it.
I agree.
 
  • Like
Reactions: throAU

ArkSingularity

macrumors 6502a
Mar 5, 2022
928
1,130
I get your point and am not really arguing against it; I just think there has to be a better way. And no, I don't know of a better way, nor am I likely to figure one out.
Intel actually tried something like this with the Itanium, which used a VLIW (very long instruction word) architecture instead of the normal x86 way of doing things. It was a good idea on paper, but it turned out to be a flop and a bit of a disastrous product for Intel, as it wasn't really feasible for compilers to optimize code well enough to make it run the way it was expected to.

The idea was to have the compiler schedule and explicitly parallelize workloads instead of relying on the CPU to re-order instructions on the fly, but that turned out to be a very impractical problem to solve. Even with how much work goes into designing modern out-of-order CPUs, they are still a far more practical solution than the VLIW alternatives, which didn't really hold up over the long term.

Part of the reason that speculative execution works is the nature of the instructions being executed. CPUs operate on very small, atomic operations, much finer-grained than the typical statements of a high-level programming language.

It's a lot of "Load this memory address into a register. Load this other memory address into a second register. Add those two registers and store the result in a third register. Load a new memory address into a fourth register. Compare the first two registers and put the result in a fifth register. Branch somewhere else if the fifth register is not zero. Load another value from memory into a sixth register. Store the value of the second register to a new memory address. Add that value to the third register and store it in the second register..."

A lot of that is very easy to parallelize. It doesn't make much sense to sit on those loads when you already have a bunch of in-flight instructions and you know what they're loading. Furthermore, when a memory access can cost several hundred CPU cycles, you want to start it as early as possible, and you want other in-flight instructions ready to run in case you touch an address that isn't in cache yet.
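To make that concrete, here's a small C sketch (a toy example of my own; the function names are made up, not from any vendor docs). Both functions do the same O(N) work, but the first is one long dependency chain where every add waits on the previous one, while the second keeps four independent accumulators that out-of-order hardware can keep in flight simultaneously:

```c
#include <stddef.h>

/* One serial dependency chain: each add must wait for the previous
   result, so the core's extra execution units sit idle. */
double sum_serial(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Four independent accumulators: the adds within one iteration don't
   depend on each other, so an out-of-order core can overlap them. */
double sum_unrolled(const double *x, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; i++)          /* leftover elements */
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}
```

(Floating-point addition isn't associative, so the two functions can differ in the last bits; that's exactly why compilers won't do this reordering for you without flags like -ffast-math, and why the hardware's ability to overlap independent work matters so much.)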

CPUs that do little or no speculation do exist (very small in-order cores like the Cortex-A53, for example), but their performance is roughly 1/20 that of modern speculative out-of-order cores. We could have a world without speculative execution, but the cost would be relying on computers not much faster than a Pentium 3 (incidentally, even the Pentium 3 was an out-of-order processor, although we've learned a lot since then and can now design in-order processors that roughly match its performance).
 

bobcomer

macrumors 601
May 18, 2015
4,949
3,699

Nice explanation, thanks!

The Itanium way is probably more how I would try it, but I know the Itanium was a failure. I still feel there should be a better way, but maybe not.

I'll leave it to you hardware guys, I'm better at software. :)
 

ArkSingularity

macrumors 6502a
Mar 5, 2022
928
1,130

Nice explanation, thanks!

The Itanium way is probably more how I would try it, but I know the Itanium was a failure. I still feel there should be a better way, but maybe not.

I'll leave it to you hardware guys, I'm better at software. :)
Yeah, speculative execution is a necessary evil in the modern computing world. There would simply be no way to get anywhere near the levels of performance we're used to without it.

Another reason that our standard superscalar out-of-order processors are much more viable than their VLIW counterparts is that processors and pipelines change and evolve. If you wanted to design a VLIW processor where the compiler did all of the parallelization work, you'd have a problem any time you released a new version of the processor with additional execution units, ALUs, etc. With VLIW, code compiled for a previous processor couldn't properly take advantage of the new one (if the compiler assumed "there are 3 ALUs in this CPU, so I can schedule three arithmetic operations simultaneously" and you added a 4th ALU to the next version of the CPU, anything previously compiled wouldn't be optimized for it and wouldn't take full advantage of it).

Letting the CPU's hardware do this work allows the processor to figure out how best to execute any given stream of instructions on its own hardware, no matter how the compiler compiled it.

Technically, speculative execution itself is not the same thing as out-of-order execution, although they usually go hand in hand. Speculative execution is actually pretty important even for the simplest of processors because of how things are pipelined. I'm oversimplifying a bit, but modern CPUs work much like an assembly line: each stage in the pipeline has its own "job," so to speak, and they all hand work off to the next stage clock by clock. The CPU's job is to keep that pipeline as full as possible, and modern pipelines run about 10-20 stages, depending on the core design.

The difficulty is branches, since you don't know what to execute next until the previous instructions have evaluated the data that determines where to jump. If CPUs didn't have branch prediction and speculative execution, the pipeline would spend the majority of its time stalled: average executable code contains roughly one branch every six instructions, and each one would force the pipeline to sit idle while the branch resolved.
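You can actually see the branch predictor at work from plain C with the classic sorted-vs-unsorted demo (a sketch of my own; exact timings vary by CPU, and it's best compiled without heavy optimization, e.g. gcc -O1, since some compilers turn the branch into a conditional move and hide the effect). The loop body is identical in both passes, but with random data the branch is unpredictable and mispredictions flush the pipeline constantly, while sorted data lets the predictor settle in:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void) {
    enum { N = 1 << 24 };                 /* ~16M elements */
    int *data = malloc(N * sizeof *data);
    if (!data) return 1;
    srand(42);
    for (int i = 0; i < N; i++) data[i] = rand() % 256;

    long sum = 0;
    clock_t t0 = clock();
    for (int i = 0; i < N; i++)
        if (data[i] >= 128) sum += data[i];   /* ~50/50 branch: unpredictable */
    clock_t t1 = clock();

    qsort(data, N, sizeof *data, cmp_int);    /* same data, now predictable */

    clock_t t2 = clock();
    for (int i = 0; i < N; i++)
        if (data[i] >= 128) sum += data[i];   /* identical loop body */
    clock_t t3 = clock();

    printf("unsorted: %.3fs   sorted: %.3fs   (sum=%ld)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t3 - t2) / CLOCKS_PER_SEC, sum);
    free(data);
    return 0;
}
```

Same instructions, same data, typically a several-fold difference, purely because one branch pattern is learnable and the other isn't.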
 
Last edited:
  • Like
Reactions: MRMSFC and bobcomer

throAU

macrumors G3
Feb 13, 2012
9,198
7,346
Perth, Western Australia
The issue does not affect the latest generations of Intel CPUs, nor will it affect the new chips.

i.e., Intel will make MORE use of Cinebench to try to point out how great the new processors are when, in reality, outside of cherry-picked workloads you're talking about the now-Intel-standard 5% generational improvement.
 

MRMSFC

macrumors 6502
Jul 6, 2023
371
381
I get your point and am not really arguing against it; I just think there has to be a better way. And no, I don't know of a better way, nor am I likely to figure one out.
It only seems inefficient prima facie; a lot of stuff in technology is actually more efficient and faster despite what "common sense" would tell you.

Serial vs. parallel buses come to mind, and how serial ends up being faster than parallel due to various factors (at high clock rates, keeping many parallel lines in sync becomes the bottleneck).
 

ArkSingularity

macrumors 6502a
Mar 5, 2022
928
1,130
Very interesting. Apparently it's specifically an AVX-512 instruction that is affected? If that is the only instruction requiring mitigations (and assuming the mitigations cause no performance regressions for other instructions), then this vulnerability wouldn't impact the vast majority of everyday consumers, who either don't have AVX-512-capable CPUs or don't use software that relies on AVX-512.
 

Icelus

macrumors 6502
Nov 3, 2018
421
574
Very interesting. Apparently it's specifically an AVX-512 instruction that is affected? If that is the only instruction requiring mitigations (and assuming the mitigations cause no performance regressions for other instructions), then this vulnerability wouldn't impact the vast majority of everyday consumers, who either don't have AVX-512-capable CPUs or don't use software that relies on AVX-512.
No, both AVX2 and AVX-512 have "gather" instructions.

(You can highlight/filter SIMD instructions here: https://www.officedaytime.com/simd512e/)
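For anyone curious what those look like in practice, here's a minimal C sketch using the AVX2 intrinsic (needs a gather-capable CPU and something like gcc -O2 -mavx2; the table values are just for illustration). The single _mm256_i32gather_epi32 call compiles to VPGATHERDD, one instruction performing eight loads from scattered addresses, which is exactly the class of load Downfall exploits and the microcode mitigation slows down:

```c
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    int table[64];
    for (int i = 0; i < 64; i++) table[i] = i * 10;

    /* Gather table[3], table[7], ..., table[31] in one instruction. */
    __m256i idx  = _mm256_setr_epi32(3, 7, 11, 15, 19, 23, 27, 31);
    __m256i vals = _mm256_i32gather_epi32(table, idx, 4);  /* scale = sizeof(int) */

    int out[8];
    _mm256_storeu_si256((__m256i *)out, vals);
    for (int i = 0; i < 8; i++) printf("%d ", out[i]);  /* 30 70 110 ... 310 */
    printf("\n");
    return 0;
}
```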
 
  • Like
Reactions: ArkSingularity

dmccloud

macrumors 68040
Sep 7, 2009
3,138
1,899
Anchorage, AK
i.e., Intel will make MORE use of Cinebench to try to point out how great the new processors are when, in reality, outside of cherry-picked workloads you're talking about the now-Intel-standard 5% generational improvement.

I think too many people put excessive emphasis on Cinebench scores. 99% of actual user workloads won't even tax the system in the same way as CB, so it's really just a stress test with numbers.
 