
steve123

macrumors 65816
Original poster
Aug 26, 2007
Today's announcement at WWDC that the Apple Intelligence Private Cloud will be powered by Apple Silicon is likely the most significant announcement. Nvidia has pretty much been the sole supplier of AI hardware, and that limits any company's ability to adopt AI. Apple's decision to use its own silicon suggests they will be better able to accelerate their plunge into AI because they will have a larger supply of AI chips. Moreover, it bodes well that Apple will continue to aggressively innovate on the ANE hardware to be competitive with Nvidia.

This is fantastic news.
 
Too bad we may never know what SoCs Apple is using in these new data centers... It'd be interesting to see how quickly these evolve over time, or what impact their design has on consumer SoCs (or vice versa).
 
Apple uses plenty of Nvidia devices for training. I can see them running inference on Mac hardware; Nvidia is too expensive to run for inference given that they limit memory on most of their GPUs unless you spend ridiculous money.
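For a sense of what inference on Apple Silicon unified memory looks like from the user side, here is a minimal sketch using the third-party mlx-lm package and an example community model; it has nothing to do with Apple's internal server stack:

```python
# Minimal sketch of LLM inference on Apple Silicon unified memory using the
# open-source mlx-lm package (pip install mlx-lm). The model ID below is just
# an example from the mlx-community Hugging Face org; this says nothing about
# how Apple's own servers run inference.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")
response = generate(
    model,
    tokenizer,
    prompt="Why does unified memory help with large-model inference?",
    verbose=True,
)
print(response)
```

The point of the sketch is that the whole quantized model sits in the same unified memory pool as everything else, so model size is bounded by system RAM rather than by a discrete GPU's VRAM.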
 
We heard the rumors a while ago that Apple was building its own ML cloud, and I have been wondering what the advantages of such a move would be. After all, Apple hardware currently lacks the performance and scalability to be really viable against Nvidia's datacenter solutions.

I think the WWDC keynote gives the answer to that: privacy. Apple likely runs a custom, secure version of the OS, something they would not be able to do with external solutions. Their mention of auditable server software also sounds very intriguing, and I can't wait to learn more about it. Also, even if this first generation of Apple ML servers is lacking compared to other options, it gives Apple the opportunity to build experience and culture around managing this infrastructure, which is likely to be a vital factor for future independence.
 
Interested to find out about their datacentre hardware. I asked a friend who is an Apple SRE, and they said the org is so carefully partitioned that even they didn't know anything about this until yesterday either.
 
Today's announcement at WWDC that the Apple Intelligence Private Cloud will be powered by Apple Silicon is likely the most significant announcement. Nvidia has pretty much been the sole supplier of AI hardware, and that limits any company's ability to adopt AI. Apple's decision to use its own silicon suggests they will be better able to accelerate their plunge into AI because they will have a larger supply of AI chips. Moreover, it bodes well that Apple will continue to aggressively innovate on the ANE hardware to be competitive with Nvidia.

This is fantastic news.
Nvidia isn't the sole supplier. They're just far and away the best supplier.

AMD, Intel, Cerebras, Amazon, Microsoft, Google, Meta, and the Chinese tech giants all have their own custom AI chips. Apple has been late to server AI chips.
 
Too bad we may never know what SoCs Apple is using in these new data centers... It'd be interesting to see how quickly these evolve over time, or what impact their design has on consumer SoCs (or vice versa).
I think they will reveal it at a later time.

My guess is that they really did use M2 Ultras because they were caught with their pants down when the GenAI wave hit. They couldn't buy enough Nvidia GPUs, and they didn't have their own dedicated server NPU chips like Google, Meta, Microsoft, and Amazon do. They had to use what they already had, which is M2 Ultras.

I'm guessing they will reveal it once they have a dedicated server NPU. I also expect them to use a similar architecture to the Neural Engine so there is compatibility between local and server inference.
 
"...once they have a dedicated server NPU. I also expect them to use a similar architecture to the Neural Engine so there is compatibility between local and server inference."

Ready to go in M5 Mac Pros, Studios et al, announced (as per Gurman) on or after WWDC 2025???
 
We heard the rumors a while ago that Apple was building its own ML cloud, and I have been wondering what the advantages of such a move would be. After all, Apple hardware currently lacks the performance and scalability to be really viable against Nvidia's datacenter solutions.

I think the WWDC keynote gives the answer to that: privacy. Apple likely runs a custom, secure version of the OS, something they would not be able to do with external solutions. Their mention of auditable server software also sounds very intriguing, and I can't wait to learn more about it. Also, even if this first generation of Apple ML servers is lacking compared to other options, it gives Apple the opportunity to build experience and culture around managing this infrastructure, which is likely to be a vital factor for future independence.
Came to post the same link @SteveOm posted above, Apple's article Introducing Private Cloud Compute nodes: "custom-built server hardware... paired with a new operating system..."

Products drive the development of Apple silicon, and Apple Intelligence is a new product. It's got the two hallmarks of an Apple product category: custom hardware and a new OS variant.

The secure iOS foundation and the A17 Pro requirement together seem like they could be a clue. I'd lean away from assuming M2 Ultra and toward some kind of custom A17/M3 engine...
 
I agree. It suggests rack-mounted AS hardware, possibly with multiple SoCs per blade. Perhaps NPU cards will become available for the Mac Pro.
It's Mac Pros or Studios, likely the rack-mountable Pros.

M2 Ultra production ramped up last quarter in a way that didn’t correspond to sales, so they’re pumping them out for internal use.

 
I think they will reveal it at a later time.

My guess is that they really did use M2 Ultras because they were caught with their pants down when the GenAI wave hit. They couldn't buy enough Nvidia GPUs, and they didn't have their own dedicated server NPU chips like Google, Meta, Microsoft, and Amazon do. They had to use what they already had, which is M2 Ultras.

I'm guessing they will reveal it once they have a dedicated server NPU. I also expect them to use a similar architecture to the Neural Engine so there is compatibility between local and server inference.
The irony in all this is that a portion of the Nuvia team (the founder included) left the Apple silicon design group because they weren't given the opportunity to work on an ARM server processor for data centers. Apple could have kept that silicon design talent in-house, in hindsight.
 
The irony in all this is that a portion of the Nuvia team (the founder included) left the Apple silicon design group because they weren't given the opportunity to work on an ARM server processor for data centers. Apple could have kept that silicon design talent in-house, in hindsight.
I agree that it was a mistake to deny that team the chance to make server chips. Now Apple has to compete with Qualcomm's Oryon and misses out on that talent.
 
I think they will reveal it at a later time.

My guess is that they really did use M2 Ultras because they were caught with their pants down when the GenAI wave hit. They couldn't buy enough Nvidia GPUs, and they didn't have their own dedicated server NPU chips like Google, Meta, Microsoft, and Amazon do. They had to use what they already had, which is M2 Ultras.

I'm guessing they will reveal it once they have a dedicated server NPU. I also expect them to use a similar architecture to the Neural Engine so there is compatibility between local and server inference.
How can they have been caught with their pants down when they've been working toward this holistic offering since before the M1 was announced?

Critical thinking, people; it's time to stop just repeating the "common wisdom" of the CNBCs and "analysts" of the world.

None of this was just cobbled together in the last one, two, or even three years…
 
The irony in all this is that a portion of the Nuvia team (the founder included) left the Apple silicon design group because they weren't given the opportunity to work on an ARM server processor for data centers. Apple could have kept that silicon design talent in-house, in hindsight.

I agree that it was a mistake to deny that team the chance to make server chips. Now Apple has to compete with Qualcomm's Oryon and misses out on that talent.
I made the exact same comment a while back on another forum, and the response from someone who knows a few of these people (or at least has in the past) was that this was unlikely to be the reason they left, but rather the opportunity they saw in leaving. Basically, they wanted to leave Apple for other reasons and also saw that making server chips was an opportunity they could explore and make a potentially big splash in. I can't vouch personally for this, and the other person didn't expand on it and didn't pretend that they could, so take that how you will.
 
Came to post the same link @SteveOm posted above, Apple's article Introducing Private Cloud Compute nodes: "custom-built server hardware... paired with a new operating system..."

The software looks far more 'custom' than the hardware.

"...
We paired this hardware with a new operating system: a hardened subset of the foundations of iOS and macOS tailored to support Large Language Model (LLM) inference workloads while presenting an extremely narrow attack surface. This allows us to take advantage of iOS security technologies such as Code Signing and sandboxing.

On top of this foundation, we built a custom set of cloud extensions with privacy in mind. We excluded components that are traditionally critical to data center administration, such as remote shells and system introspection and observability tools
..."

Hardware that only boots into iOS... Apple makes a lot of that already. What they have here is more of an 'appliance' than a general-purpose compute server. The data doesn't persist. It's closer to a very long-distance eGPU, only for very specific workloads. (A GPU has some local resources, but the persistent data is completely off the dGPU compute module.)

M-series chips run iPadOS just fine. This appears to be even more stripped down in most ways than iPadOS is. They don't need 'new' hardware to run a thinned-down OS.
 
This is really good for Mac Pro and Mac Studio customers because it creates an incentive for Apple to care about its highest-end chips even though the external market for them is quite small.
 
This is really good for Mac Pro and Mac Studio customers because it creates an incentive for Apple to care about its highest-end chips even though the external market for them is quite small.
Indeed. The economics for a lower-cost Mac Studio and Mac Pro could improve substantially, since there will be an enormous number of chips consumed by the servers if they use the same chips.
 
Indeed. The economics for a lower-cost Mac Studio and Mac Pro could improve substantially, since there will be an enormous number of chips consumed by the servers if they use the same chips.

Not likely going to lower the costs at all. Folks said that about dropping Intel ("oh, that evil Intel tax will go away")... it didn't. Even with these server units added in, Apple has about an order of magnitude smaller pool to amortize costs over. Nvidia, at the very height of the hype frenzy, is shipping about 3M units into a vastly larger market. The best hope for the MS/MP is that perhaps the replacements won't be as slow as the MP's track record over the last decade. The server deployment may not be on 12-month update cycles, but it should create a more regular schedule (not the Rip-van-Winkle approach to the MP over the last decade).

Apple isn't charging for "Apple Intelligence"... which means no user is paying for these servers, at least directly, which means the 'recovery of costs' on these servers is low. More people paying MORE money in the aggregate might drive down costs; people not paying at all won't (Apple has to fund this with indirect fees, e.g., iMessage-style overhead chargebacks, etc.). Apple is definitely going to be looking for at least some folks to pay for the SoCs upfront.

[Pretty good chance that Apple will toss these SoCs into servers as 'hand me down' processors, after something else (Studio/MP) has paid for lots (if not all) of the investment and overhead in creating them. The freeloaders are the ones more likely to get a 'discount'.]
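To make the amortization point concrete with purely hypothetical numbers (neither company's real costs or volumes), the same fixed R&D bill spread over a pool ten times smaller is roughly ten times the burden per unit:

```python
# Hypothetical R&D amortization, illustration only -- not real Apple/Nvidia figures.
fixed_rnd_cost = 1_000_000_000  # assume a $1B development bill for a big chip

pools = {
    "large pool (~3M units/yr)": 3_000_000,
    "pool ~10x smaller (~300K units/yr)": 300_000,
}

for label, units in pools.items():
    print(f"{label}: ~${fixed_rnd_cost / units:,.0f} of R&D to recover per unit")

# large pool:  ~$333 per unit
# small pool:  ~$3,333 per unit -- some internal server volume helps, but it
# does not by itself close an order-of-magnitude gap in the amortization base.
```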
 
I think they will reveal it at a later time.

My guess is that they really did use M2 Ultras because they were caught with their pants down when the GenAI wave hit. They couldn't buy enough Nvidia GPUs, and they didn't have their own dedicated server NPU chips like Google, Meta, Microsoft, and Amazon do. They had to use what they already had, which is M2 Ultras.

I'm guessing they will reveal it once they have a dedicated server NPU. I also expect them to use a similar architecture to the Neural Engine so there is compatibility between local and server inference.
This post makes no sense. How was Apple caught with their pants down? Nvidia is a great investment for training but not so much for inference. There is a reason Groq, Amazon, and Google are all investing in custom chips for inference. Apple does use Nvidia for training, but they can run inference on custom hardware. Forget Apple; there are users who connect a few Mac Studio Ultras and run distributed models for inference. Very cost effective compared to Nvidia.
Apple's models are running most of the tasks locally on device. A few people have started playing with Apple's models on the iPhone 15 Pro, and it looks like these models consume around 4GB of memory. It will be interesting to see if Apple bumps RAM on all iPhone 16 models or restricts it to Pro models with more memory.
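As a rough sanity check on that ~4GB figure (back-of-the-envelope only; the parameter count and quantization widths below are assumptions, not Apple's published specs), the weight footprint scales simply with parameters times bits per weight:

```python
# Back-of-the-envelope weight footprint for a small on-device language model.
# 3B parameters and the bit widths are assumptions for illustration only.
def weights_gib(params_billion: float, bits_per_weight: int) -> float:
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

for bits in (16, 8, 4):
    print(f"3B params @ {bits}-bit: ~{weights_gib(3, bits):.1f} GiB of weights")

# ~5.6 GiB at 16-bit, ~2.8 GiB at 8-bit, ~1.4 GiB at 4-bit. Add KV cache and
# runtime overhead and needing a few GB of free RAM per request is plausible,
# which is why device RAM (not compute) looks like the gating factor.
```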
 
I think they will reveal it at a later time.

My guess is that they really did use M2 Ultras because they were caught with their pants down when the GenAI wave hit. They couldn't buy enough Nvidia GPUs, and they didn't have their own dedicated server NPU chips like Google, Meta, Microsoft, and Amazon do. They had to use what they already had, which is M2 Ultras.

Apple Intelligence only runs on M-series and A17 Pro. Apple has been working toward generative inference all along; they have been adding NPUs, AMX, and faster GPU compute for several years. The only thing they were 'caught with their pants down' on is being miserly with memory (RAM).

Inference on the customer's hardware and electricity is going to be a lot easier to deliver for 'free', in addition to the privacy aspects.

Where Apple has come up a bit 'short' is in shrinking/compressing the models. This 'punt extra compute to the cloud' approach is structured (layered on top of a thinned-out iOS) so that as Apple Silicon devices get a RAM uplift, more and more of the compute can relatively easily migrate to the client devices (possibly with almost no changes at all except for where the threshold to punt to the cloud is set).
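A minimal sketch of what that 'threshold to the cloud' could look like in principle; the budget numbers and the routing function are invented for illustration and are not Apple's actual policy or API:

```python
# Illustration only: route a request on-device when the model fits within the
# device's memory budget, otherwise punt it to a server. All numbers invented.
from dataclasses import dataclass

@dataclass
class Device:
    total_ram_gb: float
    reserved_for_system_gb: float = 4.0  # assumption: keep the OS/apps usable

    @property
    def inference_budget_gb(self) -> float:
        return max(self.total_ram_gb - self.reserved_for_system_gb, 0.0)

def route(model_size_gb: float, device: Device) -> str:
    return "on-device" if model_size_gb <= device.inference_budget_gb else "cloud"

print(route(6.0, Device(total_ram_gb=8)))   # -> cloud
print(route(6.0, Device(total_ram_gb=16)))  # -> on-device

# As client RAM grows, the same rule sends more work on-device with no change
# beyond where the effective threshold sits -- which is the point being made.
```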

Where Apple missed was the hype train of making the models as large as possible as quickly as possible (the 'piled higher and deeper is better' mania).


I'm guessing they will reveal it once they have a dedicated server NPU. I also expect them to use a similar architecture to the Neural Engine so there is compatibility between local and server inference.

Pretty good chance not. Decent chance that it's RAM that they are missing. If you have 30 million M1s with 8GB and need a 6GB model to do some edge-case inference on a set of image files... kicking that to a machine with 128GB of RAM that can consolidate 16 jobs onto one device would be a good 'force multiplier' (and still leave substantive RAM for file cache for the session(s)). What Apple has is tons of users who are paying no money for using the servers (so they're going to need workload consolidation).
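Spelling out that consolidation arithmetic with the same illustrative numbers used above:

```python
# Consolidation arithmetic using the post's own illustrative numbers.
node_ram_gb = 128
job_model_gb = 6
jobs = 16

ceiling = node_ram_gb // job_model_gb   # hard limit if RAM held nothing but models
used = jobs * job_model_gb
print(f"Hard ceiling: {ceiling} x {job_model_gb}GB jobs on a {node_ram_gb}GB node")
print(f"At {jobs} jobs: {used}GB for models, {node_ram_gb - used}GB of headroom")

# Ceiling is 21 jobs; at 16 jobs that's 96GB for models and 32GB left for file
# cache and runtime -- the 'force multiplier' consolidation described above.
```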

They already have the overhead of shipping all the data needed for the inference up to the cloud, so it isn't 'speed latency' they're trying to minimize. Cost is a substantive factor being targeted here. M2 Ultras whose R&D is 'already paid for', passed to servers in 'hand me down' status, would fit that bill.

An Ultra has a bigger NPU than what the plain Mn, Pro, or Max have. They've got 'bigger NPU' covered.



P.S. If they put this 'extra', higher-latency compute on a PCIe card, they could sell them to MP customers who want to avoid the cloud round trip and the data center space expansion footprint (but most users won't be able to afford that, so it wouldn't be the primary direction).

P.P.S. Ultra Studios/MPs that happen to have a large excess of unused local RAM would see no 'cloud compute' latency at all. (I don't think there is any big upside for Apple to create models that completely overflow the scope of the Ultra. There can always be a punt to the 3rd-party cloud option at the top end of the scale; Apple doesn't have to cover that with their hardware at all.)
 