No it doesn’t. It helps *x86* CPUs with multicore performance, because in x86 CPUs it’s difficult to keep the pipelines full (because of narrow issue, small register files, and difficulty with instruction reordering caused by too small a look-ahead window due to the difficulty of decoding).
That's not really the core issue.
We have already seen that M1 does a remarkable job of keeping the pipelines full. So if the cores are already completely busy in M1, which they essentially are (when running a big multi-core job), how can HT add anything further? The processor would need to stop a running thread to substitute in another, even though the first thread hasn’t hit a bubble.
That is in part true because the M1 simply avoids the workloads where SMT ("hyperthreading") has more traction, rather than being vastly more immune or inherently better. Apple also has power gating, so they can just turn off units that aren't being filled.
HT is a solution to problems caused by difficulties caused by CISC, and is of little benefit to CPUs that have heterogeneous cores that already run at very high IPC.
"Hyperthreading" is a marketing name that Intel created partially to offset the reality that they didn't invent SMT.
The initial research into SMT was done on a modified DEC Alpha design. From the first paper's abstract:
"...
The increase in component density on modern microprocessors has led to a substantial increase in on-chip parallelism. In particular, modern superscalar RISCs can issue several instructions to independent functional units each cycle. However, the benefit of such superscalar architectures is ultimately limited by the parallelism available in a single thread.
This paper examines simultaneous multithreading, a technique permitting several independent threads to issue instructions to a superscalar's multiple functional units in a single cycle. In the most general case, the binding between thread and functional unit is completely dynamic. We present several models of simultaneous multithreading and compare them with wide superscalar, fine-grain multithreaded, and single-chip, multiple-issue multiprocessing architectures. To perform these evaluations, we simulate a simultaneous multithreaded architecture based on the DEC Alpha 21164 design, and execute code generated by the Multiflow trace scheduling compiler. ..."
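To make the abstract's point concrete, here is a deliberately crude toy model (all numbers are invented round figures, not measurements of any real core): a wide core wastes issue slots whenever its one thread stalls, and a second thread can fill many of those slots.

```python
import random

random.seed(0)

ISSUE_WIDTH = 4       # instructions the core can issue per cycle (invented)
CYCLES = 100_000      # simulated cycles
P_STALL = 0.05        # chance per cycle that a running thread hits a long stall
STALL_CYCLES = 30     # cycles a stalled thread cannot issue (think: cache miss)

def slot_utilisation(num_threads):
    """Fraction of issue slots filled when num_threads share one core."""
    stall_left = [0] * num_threads      # remaining stall cycles per thread
    used = 0
    for _ in range(CYCLES):
        runnable = False
        for t in range(num_threads):
            if stall_left[t] > 0:
                stall_left[t] -= 1      # still waiting on its miss
            else:
                runnable = True         # this thread could issue this cycle
                if random.random() < P_STALL:
                    stall_left[t] = STALL_CYCLES
        if runnable:
            used += ISSUE_WIDTH         # some thread fills the slots
    return used / (CYCLES * ISSUE_WIDTH)

print(f"1 thread : {slot_utilisation(1):.0%} of issue slots filled")
print(f"2 threads: {slot_utilisation(2):.0%} of issue slots filled")
```

Note what the toy model also shows: if the single thread already keeps the slots nearly full (a low stall rate), the second thread adds almost nothing, which is the M1 argument quoted above; if it stalls a lot, the second thread adds a great deal.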
So the notion that this is some CISC versus RISC thing is mostly revisionist history. Every RISC implementation that stuck around in the "big iron" market into the current century picked up SMT (Power, for example). Neither CISC nor RISC code has inherently more single-thread parallelism in it. There are limits to the out-of-order execution and prediction that can be done on both, and the gap between the two isn't as material as the order-of-magnitude gaps in memory access latency between the various levels of the memory hierarchy.
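A rough back-of-the-envelope (again with invented but order-of-magnitude numbers) shows why: one load that goes all the way to DRAM can cost more cycles than hundreds of instructions' worth of issue, no matter how those instructions are encoded.

```python
# Invented round numbers, purely to show the order of magnitude involved.
peak_ipc = 4            # instructions per cycle when nothing misses
miss_every = 200        # one load goes all the way to DRAM every N instructions
dram_latency = 300      # cycles to service that miss, with nothing to overlap it

cycles_per_block = miss_every / peak_ipc + dram_latency   # 50 + 300 cycles
effective_ipc = miss_every / cycles_per_block

print(f"effective IPC ≈ {effective_ipc:.2f} against a peak of {peak_ipc}")
# ≈ 0.57: the memory hierarchy, not CISC-vs-RISC decode, sets the ceiling here
```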
Apple systems tend to have one and only one storage drive attached. The iOS devices never had SATA storage drives, and the Macs have for the most part purged them. APFS runs slowly on HDDs and eschews any notion of RAID or large-volume management. Apple's macOS is now also pretty aggressive about caching persistent storage in memory, taking that additional memory-hierarchy level out of the picture for average workloads.
SMT's utility often doesn't show up in tech-porn benchmarks whose working sets are easily sucked into the L3 cache (and out of RAM entirely), so Apple gets some wins where aspects of their target market are driven by those. A single user with a single, largely sequential data stream is also a low-traction area (versus a high number of users with aggregate random data streams).
Arm put SMT on their E1 server baseline design.
"... Simultaneous multithread (SMT) enables Neoverse E1 to execute two threads concurrently resulting in improved aggregate throughput performance. ..."
https://www.arm.com/products/silicon-ip-cpu/neoverse/neoverse-e1
There is no huge instruction set change between N1 and E1. There is a targeted workload change. That workload focus is the key issue. Not the instruction sets.
With the N2 they aren't chasing SMT so much, because the core has a relatively small implementation area. So just more, but lower-power-consuming, cores are an offset to not having SMT.
"... In fact, Arm says the ratio is around 3:1 in terms of replacing traditional SMT threads with cores, power-wise, which allows a large core-count-based Neoverse N2 SoC to compete well against traditional x86 SoCs with comparable thread count. ..."
Arm launches its next-generation server CPUs - Neoverse N2 and Neoverse V1 (formerly Perseus and Zeus). Targeting high-performance servers and the HPC market, the new cores bring 1.4-1.5x higher IPC, SVE support, BFloat16, and the ARMv9 architecture. (fuse.wikichip.org)
On some workloads that will pan out. On others that are effectively more random-access, it may not.
IBM Power has an SMT8 mode option. Is that a CISC processor? Not even close.
Intel has it because they need to stretch their baseline x86_64 microarchitecture over a far wider set of CPU products than Apple does. Apple has sub-64-core limits woven into their kernel, and they don't even compete in some markets where x86_64 is doing essential workloads every day.
However, to save implementation area and power, Intel's Alder Lake uses a relatively high number of E-cores (Gracemont, "Atom"-class). Those don't have SMT, but they also save on those two factors. If the Windows and Linux schedulers can be made to use those cores in the most appropriate places, that will probably keep Intel "in the game" on generic consumer workloads until they can sort out their fab process issues.
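As a hedged illustration of what "using those cores in the most appropriate places" means, here is a minimal Linux-only sketch that pins work to a chosen subset of CPUs with Python's os.sched_setaffinity. The P-core/E-core index sets are hypothetical; on a real Alder Lake machine you would read the split from lscpu or /sys/devices/system/cpu/ rather than hard-coding it, and you would need at least eight logical CPUs for these indices to exist.

```python
import os

# Hypothetical core indices for an 8-thread hybrid part; a real machine's
# P/E split should be read from lscpu or /sys/devices/system/cpu/.
P_CORES = {0, 1, 2, 3}   # performance cores (assumed indices)
E_CORES = {4, 5, 6, 7}   # efficiency cores (assumed indices)

def run_on(cpus, work):
    """Restrict this process to the given CPUs, run work, then restore."""
    previous = os.sched_getaffinity(0)   # Linux-only call
    os.sched_setaffinity(0, cpus)
    try:
        return work()
    finally:
        os.sched_setaffinity(0, previous)

# Latency-sensitive work to the P-cores, background churn to the E-cores.
run_on(P_CORES, lambda: sum(range(2_000_000)))
run_on(E_CORES, lambda: sum(range(2_000_000)))
```

The whole point of a hybrid-aware scheduler is that applications shouldn't have to do this by hand.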
SMT is not on Apple's path far more because they aren't trying to be an "everything for everybody" CPU implementation than because of anything about the instruction set. Apple is quite content to detour around some workloads that they don't consider to be important.