
MacRumors

macrumors bot
Original poster


Apple plans to start using the M4 chip in its Apple Intelligence servers next year, according to a Nikkei Asia report this week, citing TrendForce analyst Frank Kung. Apple Intelligence servers are currently powered by the M2 Ultra chip, per previous reports.


The report claims that Apple has approached its largest manufacturing partner, Foxconn, about building additional Apple Intelligence servers in Taiwan.

It is unclear if the new servers will be equipped with the standard M4 chip, or a higher-end variant like the M4 Pro, M4 Max, or yet-to-be-announced M4 Ultra. It is also unclear if the existing servers with the M2 Ultra will be immediately upgraded to M4 chips.

Apple's plan to use M4 chips in servers was previously revealed by Haitong analyst Jeff Pu.

While some Apple Intelligence features rely entirely on on-device processing, Apple says requests that "require more processing power" rely on Private Cloud Compute models that are stored on the Apple Intelligence servers. When using Private Cloud Compute, Apple says that a user's data is never stored or shared with the company.

iOS 18.1 was released last month with the first Apple Intelligence features on the iPhone, such as writing tools and notification summaries. iOS 18.2 will be released to the public in December with additional Apple Intelligence features, including Genmoji for custom emoji, Image Playground for image generation, ChatGPT integration for Siri, and more.

Article Link: Apple Intelligence Servers Expected to Start Using M4 Chips Next Year After M2 Ultra This Year
 
I'd assume they're already using at least some M4-based units as well; it just might not have been disclosed publicly yet. I hope so. Even in a server use case, it would still provide, and continue to provide, lots of insight into the M4 for Apple's own internal use and for finding bottlenecks. Hopefully that was done and the data was used to improve things before the chips went into production Macs.
 
Given the scale of Nvidia GPU sales, I'd like to hope that Apple gets back into the server game, since there is a lot of money to be made on high-memory inference hardware. Assuming the M4 Ultra can come with 384 GB of memory each, it would be quite competitive for 400B+ parameter models.
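
Rough back-of-the-envelope memory math (my own sketch, not anything from the report) for how a 400B-parameter model fits into 384 GB at different weight precisions:

// Weight memory for a 400B-parameter model at a few precisions
// (ignores KV cache and activation overhead).
let parameters = 400_000_000_000.0
let bytesPerGB = 1_000_000_000.0

for (label, bytesPerParam) in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)] {
    let gigabytes = parameters * bytesPerParam / bytesPerGB
    print("\(label): \(Int(gigabytes)) GB of weights")
}
// FP16: 800 GB, INT8: 400 GB, INT4: 200 GB -- so a 384 GB box only holds
// a 400B model once the weights are quantized below 8 bits.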
 
requests that "require more processing power" rely on Private Cloud Compute models
I've seen this hypothetical situation referred to many times. Does anyone have some examples of AI requests that might have to leave a user's device? Not being snarky: is it possible that requests from my base M1 MBA might cross that threshold due to its 8GB of RAM? I'm not worried because I don't currently plan to use AI, but I'm intellectually curious.

M4 chips seem like a nice sweet spot for Apple, and good reason for some people to (finally) upgrade from M1.
 
Have there been any reports indicating the relative number of M2 Ultra chips that Apple has put into its AI servers vs. the Mac Studio and Mac Pro?
 
But seriously, the base M4 NPU outperforms the M2 Ultra. The upgrade is well deserved.
The base M4 does not outperform the M2 Ultra in NPU function, though if you're not familiar with how TOPS ratings are determined, it's easy to misunderstand. The short of it is that you need to know which operation the TOPS rating is measuring when comparing.

M1 - M3 Neural Engines were measured using FP16 operations, whereas the M4 chips (and the A17 and A18) are measured using INT8 operations. An FP16 operation handles about twice as much data as an INT8 operation. They're not entirely interchangeable, but 20 FP16 operations would equate to roughly 40 INT8 operations.

The M2 Ultra is rated at 31.6 TOPS in FP16, which would equate to roughly 62-64 TOPS in INT8. The M4 is rated at 38 TOPS in INT8.

Similar confusion occurred with the M3. Apple measured the M3 Neural Engine with FP16, but the corresponding A17 Neural Engine was measured with INT8 for whatever reason, thus making it seem that the A17 had a faster NPU than the M3 when they were essentially the same. The M4 looks like a huge leap over the M3 on paper because of the TOPS figure, but it's actually only about 5-10% faster. The M2 was actually the biggest boost to NPU performance in the four generations of M chips, about 40% faster than the M1.

For the record, this is not Apple being sneaky. They made the change because AMD, Intel, and other companies coming out with NPU hardware are measuring in INT8, and it has become something of the de facto standard benchmark for NPUs. Apple, with good reason, didn't want its NPU specs to look worse for a reason like that.
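
If it helps, here's that conversion spelled out in code (just the arithmetic on the figures above, nothing official from Apple):

// Compare Apple's quoted NPU TOPS figures on a common INT8 basis,
// treating one FP16 op as roughly two INT8 ops.
func int8EquivalentTOPS(fromFP16 fp16: Double) -> Double {
    return fp16 * 2.0
}

let m2UltraFP16TOPS = 31.6   // Apple's FP16 rating for the M2 Ultra Neural Engine
let m4INT8TOPS = 38.0        // Apple's INT8 rating for the base M4 Neural Engine

print(int8EquivalentTOPS(fromFP16: m2UltraFP16TOPS))  // ~63 INT8-equivalent TOPS
print(m4INT8TOPS)                                      // 38 INT8 TOPS
// On this basis, the M2 Ultra's Neural Engine still rates well above the base M4's.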
 
I oughta say, the M4 feels like the very first actual successor to the M1.
I think the problem was that the M1 was actually "too good" for a first-run chip release. It made the more incremental M2/M3 updates seem trivial in comparison. The slight generational bumps in CPU or GPU performance paled next to the monumental leap that was the M1 vs. its predecessors and competitor offerings.

With that said, Apple also did a rather lackluster job of differentiating or innovating on product iterations beyond the slight jump in processing capabilities (e.g., the M2 and M3 iPad Pros, MacBooks, etc.). Overall, it left the product lineup feeling rather stale for years.
 
This will probably be about thermal efficiency more than anything.
The M4 probably delivers more compute per watt than an M2 Ultra.
The only remaining advantage the M2 Ultra has over the M4 is GPU power; the M4 is better on NPU and single-core scores.
If Apple is trying to be carbon neutral, it will be keen to keep power consumption to a minimum, and cooling servers is often more of a problem than powering them.
 
Are these processors socketed?
Of course not, unless you count the connectors on the main board as "sockets."

Even in the old days of Intel processors, a socket only allowed interchanging the CPU with others in the same family or generation.

Today, with our faster signals and smaller computers, a socket or connector is what we might call an "electrical speed bump."
 
Given the scale of Nvidia GPU sales, I'd like to hope that Apple gets back into the server game, since there is a lot of money to be made on high-memory inference hardware. Assuming the M4 Ultra can come with 384 GB of memory each, it would be quite competitive for 400B+ parameter models.
They will not re-enter that space. Not because you are wrong, but because it would take far too many resources to revamp, rebuild, and maintain enterprise-level customers, especially for AI/LLM.
Will they make capable hardware? Yes, and we can all buy it and do whatever we want with it. The M-series will likely soon be complemented by some L-series or something like that (LLM is my hint), and those chips/GPUs will be specialized units to support even higher GFLOPS.
 
Given the scale of Nvidia GPU sales, I'd like to hope that Apple gets back into the server game, since there is a lot of money to be made on high-memory inference hardware. Assuming the M4 Ultra can come with 384 GB of memory each, it would be quite competitive for 400B+ parameter models.
Waste of time. The M4 Ultra doesn't touch EPYC 5 and never will. Neither Apple nor Intel will ever touch EPYC's roadmap, and Nvidia will keep living off compute with its GPGPUs, since its own Arm chips are absolute trash.

Apple is using small language models in the cloud to extend functionality within its ecosystem, not wasting tens of billions to make no dent in the enterprise markets.

When we were at NeXT, Steve promised that our enterprise services expertise would be critical in rebuilding Apple. He scrapped that within his first year as full-time CEO.

It was made worse by gutting a lot of capabilities and releasing a crippled OS X Server product that they left to rot.
 
The base M4 does not outperform the M2 Ultra in NPU function, though if you're not familiar with how TOPS ratings are determined, it's easy to misunderstand. The short of it is that you need to know which operation the TOPS rating is measuring when comparing.

M1 - M3 Neural Engines were measured using FP16 operations, whereas the M4 chips (and the A17 and A18) are measured using INT8 operations. An FP16 operation handles about twice as much data as an INT8 operation. They're not entirely interchangeable, but 20 FP16 operations would equate to roughly 40 INT8 operations.

The M2 Ultra is rated at 31.6 TOPS in FP16, which would equate to roughly 62-64 TOPS in INT8. The M4 is rated at 38 TOPS in INT8.

Similar confusion occurred with the M3. Apple measured the M3 Neural Engine with FP16, but the corresponding A17 Neural Engine was measured with INT8 for whatever reason, thus making it seem that the A17 had a faster NPU than the M3 when they were essentially the same. The M4 looks like a huge leap over the M3 on paper because of the TOPS figure, but it's actually only about 5-10% faster. The M2 was actually the biggest boost to NPU performance in the four generations of M chips, about 40% faster than the M1.

For the record, this is not Apple being sneaky. They made the change because AMD, Intel, and other companies coming out with NPU hardware are measuring in INT8, and it has become something of the de facto standard benchmark for NPUs. Apple, with good reason, didn't want its NPU specs to look worse for a reason like that.
Except that only the M4/A17 support 2x speed for INT8 operations; the M2 does not.
There are also other changes to the ANE that boost performance.

M2 Ultra: FP16 26742, INT8 30153
M4: FP16 36345, INT8 51123
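
For what it's worth, here's how I'd read those scores (assuming they're FP16 vs. INT8 ANE benchmark runs; the exact benchmark isn't named above):

import Foundation

// How much each chip gains from INT8 relative to FP16 on the quoted scores.
let scores: [(chip: String, fp16: Double, int8: Double)] = [
    ("M2 Ultra", 26742, 30153),
    ("M4",       36345, 51123),
]

for s in scores {
    let gain = s.int8 / s.fp16
    print("\(s.chip): INT8 score is \(String(format: "%.2f", gain))x its FP16 score")
}
// M2 Ultra: ~1.13x, M4: ~1.41x -- the M4's ANE benefits far more from INT8,
// consistent with it having faster INT8 paths that the M2 lacks.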
 
With the M4 Ultra possibly reaching RTX 4090 performance at a fraction of the power draw, and without the cost of water cooling or the gaudy LED light setups, Apple may save a lot of loot.

The savings from Apple doing it all itself and not getting gouged by Nvidia, coupled with the energy savings, should leave enough for Apple to deck out its server farms with some of the gaudiest LED setups of every teen gamer's dreams.
 