You're Apple. You just built a 40-core SoC for Mac Pro. Now what?

hefeglass · Nov 17, 2021

Gnattu said:
Not quite. The A14/M1 design is not scalable at all, but I'm looking forward to seeing Apple's approach to making larger chips.

This comment didn't age well.

deconstruct60 · Nov 17, 2021

senttoschool said:
Just go to the official website.

View attachment 1910752

"In as little as 4 hrs" is not all that good a service level agreement for a fair number of enterprises. A more salient metric would be "in no more than X hrs". That "as little as" allows them to cherry-pick a business that happens to be down the street from an Apple Store or depot location.

Note the footnote caveat at the bottom: if you happen to be in a specific city, maybe we can provide the service. Covering only the top 15 major population centers would leave out a very large number of businesses.

deconstruct60 · Nov 17, 2021

Gnattu said:
M1 Pro/Max is a completely different design with fundamental changes. The M1 Pro/Max is designed to be scalable even to a multi-die system, and this is simply not possible with A14/M1.

The M1 Max isn't designed to scale up at all. There is little to no evidence of inter-chip connectivity on that specific die.

Apple might do another die that can scale, but the M1 Max (Jade) isn't it. In terms of scale-up limitations it is in the exact same class as the M1: it doesn't scale.

Joe The Dragon · Nov 17, 2021

deconstruct60 said:
"In as little as 4 hrs" is not all that good a service level agreement for a fair number of enterprises. A more salient metric would be "in no more than X hrs". That "as little as" allows them to cherry-pick a business that happens to be down the street from an Apple Store or depot location.

Note the footnote caveat at the bottom: if you happen to be in a specific city, maybe we can provide the service. Covering only the top 15 major population centers would leave out a very large number of businesses.

parts in 4hrs or parts on order?

the old IRP program does not let shops stock parts at all.

deconstruct60 · Nov 17, 2021

zakarhino said:
...
He has constantly been leaking extremely accurate information about AMD and among that info he's been talking about AMD internal docs referencing "competition with Apple server chips" as one of their main upcoming competitors. This isn't the first time he's alluded to Apple making server chips.

MLID is the real deal. You might be one of those "well if it hasn't come out of Tim Cook's mouth it isn't true" sorta people but for the rest of us the certainty of a figure like MLID is enough of a confirmation. I'm looking forward to seeing what Apple does with their server chips.

Server chips for what? That Apple would be using Macs as servers? Duh! Apple said they were going to roll out an Xcode Cloud service. What do folks think that will run on over the long term other than M-series chips? So is this claim about tasking a future Mini (or rack Mac Pro) as a "server" and saying that the 'M n ABC' chip is a server chip because it plays a server role? That is just clickbait sizzle. Tim Cook (and Apple) essentially already said this was going to happen. Cloaking it in some cryptic clickbait sizzle probably makes more money for MLID, though. The 40-core (Jade4C) is also really non-news for anyone looking at more than just MLID leaks. Again, sizzle to be monetized.

Or is this about the much larger cloud services that Apple runs? In that case it could easily be about a chip that is not a "ground up" Apple design. It was Apple/ARM that was being cited as the competitive threat. Bergamo is labeled "Zen 4c" (c for cloud). Apple doing a chip to run disk backups, Apache serving, etc. doesn't really match up with what the M-series is optimized for. Nothing substantive of that runs on macOS now, so an M-series derivative doesn't really bring much value add (these are mostly Linux and commercial Unix workloads). The Supermicro-like vendor or Open Compute server board modules they'd be going for don't really have Mac-like issues or needs.

If there were certain knowledge of this path, there would be a certain backstory as to what the foundation is. Apple could easily order up some mildly semi-custom Neoverse N2 and deploy those. However, that too wouldn't really be the "clickbait" story that is being spun here. Or take a future semi-custom solution largely driven by Ampere and their 'fork' of ARM server cores that gets an Apple logo painted on it.

There is little to nothing in MLID's reporting that indicates that Apple is going "toe to toe" here in the hard-core server space with a "ground up, 100% custom" offering. Nothing. If it was not Xcode Cloud he simply could have said that. He did not.

The age of the slide he is referencing suggests that "Apple" there could be replaced with "Nuvia" with little change in substantive meaning (at that time Nuvia was going after the server market with an "Apple-like" core). Apple there is more of a knowledge/experience-transfer competitive threat to AMD than a sign that Apple itself was going to take large cloud hardware revenue away from AMD.

deconstruct60 · Nov 17, 2021

Joe The Dragon said:
parts in 4hrs or parts on order?

the old IRP program does not let shops stock parts at all.

Apple's new "self repair" program kicks off in Spring 2022 also

iFixit Lauds Apple's New Self Service Repair Program, Calls It a 'Remarkable Concession'

Apple this morning surprised the world with an unexpected "Self Service Repair" program, which is designed to allow customers access to...

www.macrumors.com

I suspect Apple is also going to roll out a modified parts logistics infrastructure for a limited set of certified shops, and perhaps feed parts through some of that new infrastructure, coupled with some caching of parts at the repair dispatch site. Use a part for a repair and Apple sends a replacement. Send the replaced part back and get a discount to make the replacement service viable.

That way Apple probably gets some 3rd-party sites to absorb the distributed warehousing costs. And they do the same with some of the stores and small-scale repair depots: charge the inventory costs back to the stores so that, at the larger corporate level, overall inventory costs stay about the same (and likely outsource to some suppliers/logistics companies to offload some of it too).

Gnattu · Nov 17, 2021

deconstruct60 said:
The M1 Max isn't designed to scale up at all. There is little to no evidence of inter-chip connectivity on that specific die.

Apple might do another die that can scale, but the M1 Max (Jade) isn't it. In terms of scale-up limitations it is in the exact same class as the M1: it doesn't scale.

M1 Max has a second set of registers, and its IRQ controller is clearly designed for a multi-die configuration. You can soft-generate an IRQ on that block and get the IRQ delivered with a die-id of 1 in the event register. The interconnect part is more likely to have been intentionally removed rather than not designed. That means the Jade die in the MacBook Pros still has some part chopped off at the bottom edge.

senttoschool · Nov 17, 2021

deconstruct60 said:
Just absurdly not true. Apple's store-in-a-store with Best Buy is necessary to plug the gaps.

".. The electronics retailer already serviced Apple products at about 225 stores and now does so at all of its 992 stores nationwide, according to Reuters, which is good news for customers who reside in states without any Apple Stores, including Montana, North Dakota, South Dakota, Vermont, West Virginia, and Wyoming. ... "

All of Best Buy's Nearly 1,000 Stores Now Offer Apple-Certified Repairs in the United States

Apple today announced that every Best Buy store across the United States now offers certified repairs and service for Apple products. The...

www.macrumors.com

News flash: there actually are businesses in those six states. There are actually businesses that are not near hipster, trendy malls in other states too. Apple has "sherlocked" a number of VARs in some highly wealthy cities and locations, but that has little to do with the geographic distribution of where businesses are located.

It seems true to me.

1. Apple Stores are near enterprises/medium-sized businesses and probably most small businesses that would actually use Macs.

2. Authorized stores like Best Buy plug the gap.

senttoschool · Nov 17, 2021

deconstruct60 said:
"In as little as 4 hrs" is not all that good a service level agreement for a fair number of enterprises. A more salient metric would be "in no more than X hrs". That "as little as" allows them to cherry-pick a business that happens to be down the street from an Apple Store or depot location.

Note the footnote caveat at the bottom: if you happen to be in a specific city, maybe we can provide the service. Covering only the top 15 major population centers would leave out a very large number of businesses.

Instead of nitpicking on the fact that Apple's business services don't cover everywhere and everything yet, you should focus on the fact that Apple clearly wants small business, medium business, and enterprise customers. This is just a start and a clear indication that they're serious about it.

Also, I predicted something like this in this thread. But anyone with half a brain cell could have predicted this. It's so obvious. There are only so many large markets that would make a difference for a 2.5 trillion-dollar company. Businesses and enterprises are clearly one of them.

senttoschool · Dec 18, 2022

senttoschool said:
You're Tim Cook, sitting in his nice office, looking at how much money you just spent to make this giant SoC for a relatively small market. In fact, you have to do this every year or every two years to keep the Mac Pro relevant. How do you recoup some of this money spent?

The higher-end model with the M2 Extreme chip would have been available with up to a 48-core CPU and up to a 152-core GPU, according to Gurman, but he believes that this configuration was scrapped due to cost and manufacturing complexities.

"Based on Apple's current pricing structure, an M2 Extreme version of a Mac Pro would probably cost at least $10,000 — without any other upgrades — making it an extraordinarily niche product that likely isn't worth the development costs, engineering resources and production bandwidth it would require," he wrote.

Nailed it.

This SoC never made much sense for a highly niche product. In order to profitably produce and provide frequent updates for this chip, Apple would have to make a cloud version too.

I.e., AMD and Intel's ultra high-end chips have a highly lucrative server market to support them. No "Extreme" Apple Silicon would have that, so Apple is likely to always make a loss on it. It'd only serve as an expensive "halo" product.

I'm still holding out hope that Apple would produce an M3 "Extreme" chip but it'll always be one of the first things to get canceled when the finance people want to cut costs.

Gurman: All-New Mac Pro Still in Testing, But 'M2 Extreme' Chip Likely Canceled

Apple continues to test an all-new Mac Pro with an M2 Ultra chip, but the company has likely abandoned plans to release a higher-end configuration...

www.macrumors.com

Xiao_Xi · Dec 18, 2022

senttoschool said:
This SoC never made much sense for a highly niche product. In order to profitably produce and provide frequent updates for this chip, Apple would have to make a cloud version too.

What would an Apple server chip look like? The M1 Ultra doesn't look like ARM server chips at all.

Processors - Ampere Altra and AmpereOne

Cloud Native Processors reduce data center power consumption and physical space requirements, while delivering unprecedented performance and cost savings.

amperecomputing.com

Zest28 · Dec 18, 2022

AMD's latest Zen 4 server CPUs can have 300 cores (if you combine 3 of them) and 6 TB of DDR5 RAM. This is a totally different market from what Apple can offer.

Besides, Google and Amazon have their own custom ARM server CPUs built for their own needs. So I'm not sure what Apple would try to sell to them.

leman · Dec 18, 2022

Xiao_Xi said:
What would an Apple server chip look like? The M1 Ultra doesn't look like ARM server chips at all.

It won't look like anything. That's a very different market.

It's entirely possible that we will see Apple transitioning to be a mobile-only company, without high-end desktop products (just some ultracompact desktops for everyday use). It would definitely be the "easy" path to take and still be profitable. Not sure whether it would be a good long-term strategy though...

Analog Kid · Dec 19, 2022

How reasonable would it be to design Mac Pro as a cluster-in-a-box? Rather than try to scale the SoC up and figure out how to put more unified memory into the package, or try to build out another memory layer to add RAM that would have to extend the unified memory off package or page into it, would it work to just create a local cluster architecture with some number of Mx Ultras and a high speed interconnect between them?

20 cores and 128GB per Ultra isn't a bad ratio to build on. If you wanted 1TB of RAM, you'd need 8 Ultras in your cluster, which would give 160 CPU cores, 512 GPU cores, and 256 Neural Engine cores.
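As a rough sanity check of those numbers, here is a minimal Python sketch. The per-Ultra figures (20 CPU cores, 64 GPU cores, 32 Neural Engine cores, 128GB) are the top M1 Ultra configuration; the `SoC` class and the `cluster_for_ram` helper are hypothetical names made up for illustration, not anything Apple ships.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SoC:
    """Per-package resources of one SoC in the hypothetical cluster."""
    cpu_cores: int
    gpu_cores: int
    neural_cores: int
    ram_gb: int


# Top M1 Ultra configuration (assumed as the building block).
M1_ULTRA = SoC(cpu_cores=20, gpu_cores=64, neural_cores=32, ram_gb=128)


def cluster_for_ram(soc: SoC, target_ram_gb: int) -> dict:
    """Return the node count needed to reach target_ram_gb and the cluster totals."""
    nodes = math.ceil(target_ram_gb / soc.ram_gb)
    return {
        "nodes": nodes,
        "cpu_cores": nodes * soc.cpu_cores,
        "gpu_cores": nodes * soc.gpu_cores,
        "neural_cores": nodes * soc.neural_cores,
        "ram_gb": nodes * soc.ram_gb,
    }


# 1TB target: 8 nodes, 160 CPU cores, 512 GPU cores, 256 Neural Engine cores.
totals = cluster_for_ram(M1_ULTRA, 1024)
```

Note that the memory is still 8 separate 128GB pools, not one 1TB address space, which is the catch with any cluster design.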

There may be some workloads that would prefer to pack all the CPU or GPU cores into one processor, but I'd also imagine that many workloads would benefit from having that much more memory bandwidth to distribute the processing over. Keeping the cluster in one box would mean the interconnect could presumably be both higher throughput and lower latency than racking up separate machines.

Of course once you hit the 8 or 16 SoC limit in a single box, you could probably design them to rack up into larger clusters.

ondioline · Dec 19, 2022

Analog Kid said:
would it work to just create a local cluster architecture with some number of Mx Ultras and a high speed interconnect between them?

Multiple processors have been possible for a very long time with hardware interconnects; it’s how the dual-CPU Mac Pros worked. The hardware component was the “Intel QuickPath Interconnect”, which handled NUMA on the processors. It was exposed to XNU as a single interleaved node (of memory).

Of course you’d still end up with 2 CPUs and 2 GPUs in the OS, so it’s not magic like the M1 Ultra. Your app would need to be able to schedule work on them. Although there are plenty of things that can use 2 GPUs, it’s been 10 years since there was a multi-processor Mac, so who knows? Maybe macOS is smarter these days considering the P and E core setup.

I doubt they’d bother though.

Xiao_Xi · Dec 19, 2022

leman said:
It's entirely possible that we will see Apple transitioning to be a mobile-only company, without high-end desktop products (just some ultracompact desktops for everyday use). It would definitely be the "easy" path to take and still be profitable. Not sure whether it would be a good long-term strategy though...

Wouldn't a chiplet design like AMD's or Intel's allow Apple to scale up and down SoCs more cost-effectively?

leman · Dec 19, 2022

Xiao_Xi said:
Wouldn't a chiplet design like AMD's or Intel's allow Apple to scale up and down SoCs more cost-effectively?

In some ways, maybe. AMD currently uses multi-chip packages to drive the cost down, but the tradeoff is performance and power consumption. AMD still uses monolithic SoCs on mobile to avoid the power consumption overhead. It's not quite clear how Intel's solution will work.

Apple Silicon, on the other hand, has very low baseline power consumption while using wide memory interfaces. The way Apple has been doing it so far, with specialised monolithic chips plus the MCM M1 Ultra, makes sense from that perspective, but it limits the configurations they can produce and is anything but cheap. The expensive RAM setup doesn't make things better either.

They could use entirely different technology for their desktop products (like AMD does), but it is not obvious that such a strategy will be profitable for them. They could also "split up" the current prosumer SoC into more specialised clusters (CPU-heavy vs. GPU-heavy) and then combine those, but there are probably drawbacks to this approach as well — connecting chips is never going to be as efficient as using a single monolithic one. Regardless, memory is going to remain a key problem as well. And either way, these are solutions that will require considerable time and effort to bring to production.

What might be more attractive for them is to try to make chips that can run at higher clocks (at the expense of higher power consumption obviously). And then use the same chips across mobile and desktop but clock the desktop ones significantly faster. But it is not clear that this approach would scale well or is even possible without sacrificing efficiency.

PauloSera · Dec 19, 2022

senttoschool said:
Nailed it.

This SoC never made much sense for a highly niche product. In order to profitably produce and provide frequent updates for this chip, Apple would have to make a cloud version too.

I.e., AMD and Intel's ultra high-end chips have a highly lucrative server market to support them. No "Extreme" Apple Silicon would have that, so Apple is likely to always make a loss on it. It'd only serve as an expensive "halo" product.

I'm still holding out hope that Apple would produce an M3 "Extreme" chip but it'll always be one of the first things to get canceled when the finance people want to cut costs.

Gurman: All-New Mac Pro Still in Testing, But 'M2 Extreme' Chip Likely Canceled

Apple continues to test an all-new Mac Pro with an M2 Ultra chip, but the company has likely abandoned plans to release a higher-end configuration...

www.macrumors.com

We can't even get modern Apple to produce an M2 iMac or Mac mini because it isn't a cost effective decision.

deconstruct60 · Dec 19, 2022

Analog Kid said:
How reasonable would it be to design Mac Pro as a cluster-in-a-box? Rather than try to scale the SoC up and figure out how to put more unified memory into the package, or try to build out another memory layer to add RAM that would have to extend the unified memory off package or page into it, would it work to just create a local cluster architecture with some number of Mx Ultras and a high speed interconnect between them?

20 cores and 128GB per Ultra isn't a bad ratio to build on. If you wanted 1TB of RAM, you'd need 8 Ultras in your cluster, which would give 160 CPU cores, 512 GPU cores, and 256 Neural Engine cores.

Cluster in a box has some problems.

First, it is doubtful you could get anything like 8 Ultras in a single box about the same size as the rack MP 2019 model. If you tilt the cooling block of the Ultra 90 degrees, it gets pretty close to the width of a full-sized MPX module (closer to 4 slots wide). So 8 of those would be 32 slot widths.

That is not far off from just putting 8 Studio Ultras on their sides on a Mini-like custom rack shelf.
And if Apple shipped an M2 Pro in the classic Mini case, 20 of those would get a bigger CPU core count (but need a bigger local cluster switch).

Analog Kid said:
There may be some workloads that would prefer to pack all the CPU or GPU cores into one processor, but I'd also imagine that many workloads would benefit from having that much more memory bandwidth to distribute the processing over.

'May be some workloads'? Most apps don't come with a 'dispatch work to remote computer' menu option. If chopping problems up into separate memory address spaces worked insanely great for most apps, the Mac Pro 2013 would have seen more uptake of its compute GPU. Some apps used it, but even Apple said that was a slow-growth category.

Analog Kid said:
Keeping the cluster in one box would mean the interconnect could presumably be both higher throughput and lower latency than racking up separate machines.

The Ultra would need a substantially better PCI-e provisioning allocation than what the M1 Ultra got. If there were x16 PCI-e v4 'in/out' on each Ultra 'blade card', then with a virtual Ethernet driver on both sides you could run standard software that has a 'run job on remote computer on LAN' feature without much modification. No 40GbE (or better) switch needed.

There would be a small crowd of happier folks, but likely not a large one. Virtual Ethernet drivers shouldn't cost much, though. (Some of Intel's Xeon Phi cards had that as the interface between the host and a card running lightweight Linux.)
So the small market wouldn't have 'big costs'. It would have the threshold of the custom "Mac on a PCI-e card" development. Apple could also sell them as Macs to slot into Windows PCs (the reverse of the old Windows compatibility cards). Effectively it is just a Mac without an external case. An M2 Pro would likely be easier to do as a card with no Molex power adapters. An Ultra is going to blow the bus power budget limit.

[ Apple already has ways to use the Thunderbolt sockets to create a local cluster LAN with just point-to-point links and no switch. Basically similar. ]

Analog Kid said:
Of course once you hit the 8 or 16 SoC limit in a single box, you could probably design them to rack up into larger clusters.

If using just the 'dispatch to computer on LAN' mechanism, the subnets might be different (a super-fast internal one vs. a general LAN with DNS), but the code pretty much wouldn't have to change. It is a bit more involved to name the nodes the cluster network can broadcast to and to keep all the node addresses straight. For 10-30 nodes it wouldn't be that hard to set up (even if not point-and-click simple).

deconstruct60 · Dec 19, 2022

Xiao_Xi said:
Wouldn't a chiplet design like AMD's or Intel's allow Apple to scale up and down SoCs more cost-effectively?

Something closer to the AMD GPU chiplet approach, perhaps.

The Intel Xeon Max (Sapphire Rapids) is likely close to what Apple tried to do and collapsed on costs: four tiles meeting at a four-corners intersection in a close-packed configuration.

AMD's desktop approach likely gives up too much Perf/W for Apple to buy in. It doesn't work well for GPU cores (even AMD switched gears when applying the general technique to GPUs).

What Apple could do is something closer to the AMD chiplet model where the cores and critical compute are left monolithic. The M1 Max is largely a GPU die with some smaller CPU core area 'strapped on'. In AMD's approach the caches and memory interfaces are moved off. (The caches are probably not going to scale with TSMC N3, N3E, or much smaller nodes after that for a very long while.) You will lose Perf/W, but if you are trying to control costs, it doesn't make much sense to use a $20K wafer to make something the exact same size as a $16K wafer would. There is no way of controlling costs there. If that were a different fab manufacturer, you'd leave and go with the more affordable vendor.
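To put rough numbers on the wafer-cost point, here is a back-of-the-envelope Python sketch. The $20K and $16K wafer prices are the figures from the paragraph above; the 300 mm wafer, the 400 mm² die area, and the function names are assumptions for illustration only, and the dies-per-wafer formula is a common gross approximation that ignores yield, scribe lines, and edge exclusion.

```python
import math


def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Common approximation: wafer area / die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2
    area_term = math.pi * radius ** 2 / die_area_mm2
    edge_term = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(area_term - edge_term)


def cost_per_die(wafer_cost_usd: float, die_area_mm2: float) -> float:
    """Wafer cost spread over gross dies; yield is deliberately ignored."""
    return wafer_cost_usd / gross_dies_per_wafer(die_area_mm2)


DIE_AREA_MM2 = 400.0  # hypothetical cache-heavy die that does not shrink on the newer node

dies = gross_dies_per_wafer(DIE_AREA_MM2)
older_node = cost_per_die(16_000, DIE_AREA_MM2)
newer_node = cost_per_die(20_000, DIE_AREA_MM2)
premium = newer_node / older_node  # same die count, so this is just 20/16
```

If the die does not get smaller on the newer node, the die count per wafer is identical and the whole 25% wafer price increase lands on every die, which is the argument for moving non-scaling blocks such as caches onto cheaper silicon.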

If they had a computation-core-focused die with UltraFusion on all four sides, it would be much easier to make a square of squares out of that building block than out of the monolithic, laptop-optimized shape they have now. However, I'm not sure TSMC's packaging technology can handle that.

They still have the problem that their "poor man's HBM" isn't going to scale as well as the central monolithic computation die complex shrinks. They are not going to be able to follow that all the way down to small central complexes.

If the M2 Max, Ultra, and Extreme are on N3, then this may be more a lack of wafer allocation because the timing conflicts with other chips. For a fixed number of wafers, Apple, what do you want to make: lots of smaller dies or some super-expensive low-volume stuff?

If they are still on the old N5P, they may be running into similar issues, where every other major player's usage of the N5/N4 family has spiked and there is no longer an 'infinite' supply of wafers. There is also die bloat because this is the third iteration on the N5 family, which makes an already chunky chiplet even worse.

If Apple took the mix of cores that would have gone into a two-die M1 Max solution and made that a smaller monolithic compute core, that could work. Pragmatically it would need N3 to make that work (and kick some stuff off the die). Then wrap about 10 chiplets around that. [ It would still be awkward to throw heterogeneous DDR5 DIMMs into that mix and not start to spiral out of cost control again. ]

Xiao_Xi · Dec 19, 2022

deconstruct60 said:
The Intel Xeon Max (Sapphire Rapids) is likely close to what Apple tried to do and collapsed on costs.

I was thinking of Meteor Lake.

Hot Chips 34 – Intel’s Meteor Lake Chiplets, Compared to AMD’s

During a presentation at Hot Chips 34, Intel detailed how their upcoming Meteor Lake processors employ chiplets. Like AMD, Intel is seeking to get the modularity and lower costs associated with usi…

chipsandcheese.com

deconstruct60 · Dec 20, 2022

Xiao_Xi said:
I was thinking of Meteor Lake.

Hot Chips 34 – Intel’s Meteor Lake Chiplets, Compared to AMD’s

During a presentation at Hot Chips 34, Intel detailed how their upcoming Meteor Lake processors employ chiplets. Like AMD, Intel is seeking to get the modularity and lower costs associated with usi…

chipsandcheese.com

Meteor Lake doesn't scale any better than the M1 Ultra does.

Meteor Lake is more ultra-hyper disaggregation than a 'scaling chiplet' strategy. Intel doesn't have the production capacity to make Meteor Lake internally at volume as a monolithic chip on Intel 4. So they have to split the production of those SoCs over multiple fabs. That is the primary driver there.

If you want more GPU cores, you have to build a bigger GPU die. If you want more CPU cores ... again, a bigger CPU die is needed. It isn't really going to scale well past the approximate sizes they start off with for the GPU and CPU dies. Meteor Lake is not going to become the "big workstation chip" solution. It is mainly laptop-focused and a scramble not to fall too far behind in the laptop iGPU race.

senttoschool · Jan 29, 2023

leman said:
It won't look like anything. That's a very different market.

It's entirely possible that we will see Apple transitioning to be a mobile-only company, without high-end desktop products (just some ultracompact desktops for everyday use). It would definitely be the "easy" path to take and still be profitable. Not sure whether it would be a good long-term strategy though...

It really doesn't have to look different. It depends on what Apple wants its Apple Silicon Cloud to do. If they want it as a virtual Mac in the cloud, they don't have to change the SoCs. If they want to compete against AWS running Linux servers, then they'd have to completely change everything and they should have kept the Nuvia folks around.

I'm guessing they just want to take the virtual-Macs-in-the-cloud approach and then do direct integration between local macOS and cloud SoCs.

sam_dean · Jan 29, 2023

I'd love to see 520 weeks into the future and see which chip/fab will sport the highest performance per watt at a node that is sub-1nm.

Imagine Apple Watch chips that let the watch run for a month without charging, or draw power from your body, and are as powerful as an M1 Ultra.

adib · Jan 29, 2023

deconstruct60 said:
'May be some workloads'? Most apps don't come with a 'dispatch work to remote computer' menu option. If chopping problems up into separate memory address spaces worked insanely great for most apps, the Mac Pro 2013 would have seen more uptake of its compute GPU. Some apps used it, but even Apple said that was a slow-growth category.

Nevertheless, macOS apps have been multi-process for a long time. For example, Safari runs each website in a distinct renderer process. Spotlight runs its workers as separate processes. By default, processes have their own address spaces. All Apple needs to do on the software side is to make XPC services transparent between machine nodes.
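To illustrate the idea in the paragraph above (this is not XPC itself, just a minimal Python sketch of the same pattern), here is a caller handing work to a worker in a separate process with its own address space, using only serialized messages. The JSON-lines message format, `WORKER_SOURCE`, and `run_in_worker` are hypothetical names invented for this example. Because the caller only ever sees messages, the pipe could in principle be swapped for a network socket to another node without changing the calling code's shape.

```python
import json
import subprocess
import sys

# Source of a tiny worker: reads one JSON request per line, replies with one JSON line.
WORKER_SOURCE = """
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    result = sum(x * x for x in request["values"])
    print(json.dumps({"id": request["id"], "result": result}), flush=True)
"""


def run_in_worker(requests: list) -> list:
    """Send requests to a separate worker process and collect its replies.

    The worker shares no memory with the caller; everything crosses the
    boundary as serialized messages over stdin/stdout.
    """
    payload = "".join(json.dumps(r) + "\n" for r in requests)
    completed = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=payload,
        capture_output=True,
        text=True,
        check=True,
    )
    return [json.loads(line) for line in completed.stdout.splitlines()]


replies = run_in_worker([
    {"id": 1, "values": [1, 2, 3]},
    {"id": 2, "values": [10, 20]},
])
```

The hard parts of doing this across machines (latency, partial failure, moving large buffers such as GPU textures) are exactly what a local pipe hides, so "transparent" is carrying a lot of weight.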
