
MacRumors

macrumors bot
Original poster


Apple plans to start using the M4 chip in its Apple Intelligence servers next year, according to a Nikkei Asia report this week, citing TrendForce analyst Frank Kung. Apple Intelligence servers are currently powered by the M2 Ultra chip, per previous reports.


The report claims that Apple has approached its largest manufacturing partner, Foxconn, about building additional Apple Intelligence servers in Taiwan.

It is unclear if the new servers will be equipped with the standard M4 chip, or a higher-end variant like the M4 Pro, M4 Max, or yet-to-be-announced M4 Ultra. It is also unclear if the existing servers with the M2 Ultra will be immediately upgraded to M4 chips.

Apple's plan to use M4 chips in servers was previously revealed by Haitong analyst Jeff Pu.

While some Apple Intelligence features rely entirely on on-device processing, Apple says requests that "require more processing power" rely on Private Cloud Compute models that are stored on the Apple Intelligence servers. When using Private Cloud Compute, Apple says that a user's data is never stored or shared with the company.

iOS 18.1 was released last month with the first Apple Intelligence features on the iPhone, such as writing tools and notification summaries. iOS 18.2 will be released to the public in December with additional Apple Intelligence features, including Genmoji for custom emoji, Image Playground for image generation, ChatGPT integration for Siri, and more.

Article Link: Apple Intelligence Servers Expected to Start Using M4 Chips Next Year After M2 Ultra This Year
 
I'd assume they're already using at least some M4-based units as well; it just might not have been disclosed publicly yet. I hope so. Even in a server use case, it would still provide, and continue to provide, lots of insight into the M4 for Apple's own internal use and for finding bottlenecks. Hopefully that was done and the data was used to improve things before the chips went into production Macs.
 
Given the scale of Nvidia GPU sales, I'd like to hope that Apple gets back into the server game, since there is a lot of money to be made on high-memory inference hardware. Assuming the M4 Ultra can come with 384 GB of memory each, it would be quite competitive for 400B+ parameter models.
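
Rough back-of-the-envelope memory math (my own sketch, not anything from the report) for how a 400B-parameter model fits into 384 GB at different weight precisions:

// Weight memory for a 400B-parameter model at a few precisions
// (ignores KV cache and activation overhead).
let parameters = 400_000_000_000.0
let bytesPerGB = 1_000_000_000.0

for (label, bytesPerParam) in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)] {
    let gigabytes = parameters * bytesPerParam / bytesPerGB
    print("\(label): \(Int(gigabytes)) GB of weights")
}
// FP16: 800 GB, INT8: 400 GB, INT4: 200 GB -- so a 384 GB box only holds
// a 400B model once the weights are quantized below 8 bits.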
 
requests that "require more processing power" rely on Private Cloud Compute models
I've seen this hypothetical situation referred to many times. Does anyone have some examples of AI requests that might have to leave a user's device? Not being snarky: is it possible that requests from my base M1 MBA might cross that threshold due to its 8GB of RAM? I'm not worried because I don't currently plan to use AI, but I'm intellectually curious.

M4 chips seem like a nice sweet spot for Apple, and good reason for some people to (finally) upgrade from M1.
 
Have there been any reports indicating the relative number of M2 Ultra chips that Apple has put into its AI servers vs. the Mac Studio and Mac Pro?
 
But seriously, the base M4 NPU outperforms the M2 Ultra. The upgrade is well deserved.
The base M4 does not outperform the M2 Ultra in NPU function, though if you're not familiar with how TOPS ratings are determined, it's easy to misunderstand. The short of it is that you need to know which operation the TOPS rating is measuring when comparing.

M1 - M3 Neural Engines were measured using FP16 operations, whereas the M4 chips (and the A17 and A18) are measured using INT8 operations. An FP16 operation handles about twice as much data as an INT8 operation. They're not entirely interchangeable, but 20 FP16 operations would equate to roughly 40 INT8 operations.

The M2 Ultra is rated at 31.6 TOPS in FP16, which would equate to roughly 62-64 TOPS in INT8. The M4 is rated at 38 TOPS in INT8.

Similar confusion occurred with the M3. Apple measured the M3 Neural Engine with FP16, but the corresponding A17 Neural Engine was measured with INT8 for whatever reason, thus making it seem that the A17 had a faster NPU than the M3 when they were essentially the same. The M4 looks like a huge leap over the M3 on paper because of the TOPS figure, but it's actually only about 5-10% faster. The M2 was actually the biggest boost to NPU performance in the four generations of M chips, about 40% faster than the M1.

For the record, this is not Apple being sneaky. They made the change because AMD, Intel, and other companies coming out with NPU hardware are measuring in INT8, and it has become something of the de facto standard benchmark for NPUs. Apple, with good reason, didn't want its NPU specs to look worse for a reason like that.
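
If it helps, here's that conversion spelled out in code (just the arithmetic on the figures above, nothing official from Apple):

// Compare Apple's quoted NPU TOPS figures on a common INT8 basis,
// treating one FP16 op as roughly two INT8 ops.
func int8EquivalentTOPS(fromFP16 fp16: Double) -> Double {
    return fp16 * 2.0
}

let m2UltraFP16TOPS = 31.6   // Apple's FP16 rating for the M2 Ultra Neural Engine
let m4INT8TOPS = 38.0        // Apple's INT8 rating for the base M4 Neural Engine

print(int8EquivalentTOPS(fromFP16: m2UltraFP16TOPS))  // ~63 INT8-equivalent TOPS
print(m4INT8TOPS)                                      // 38 INT8 TOPS
// On this basis, the M2 Ultra's Neural Engine still rates well above the base M4's.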
 
I oughta say, the M4 feels like the very first actual successor to the M1.
I think the problem was that the M1 was actually "too good" for a first-run chip release. It made the more incremental M2/M3 updates seem trivial in comparison. The slight generational bumps in CPU or GPU performance paled next to the monumental leap that was the M1 vs. its predecessors and competitor offerings.

With that said, Apple also did a rather lackluster job of differentiating or innovating on product iterations beyond the slight jump in processing capabilities (e.g., the M2 and M3 iPad Pros, MacBooks, etc.). Overall, it left the product lineup feeling rather stale for years.
 
This will probably be about thermal efficiency more than anything.
The M4 probably delivers more compute per watt than an M2 Ultra.
The only remaining advantage the M2 Ultra has over the M4 is GPU power; the M4 is better on NPU and single-core scores.
If Apple is trying to be carbon neutral, it will be keen to keep power consumption to a minimum, and cooling servers is often more of a problem than powering them.
 
Are these processors socketed?
Of course not, unless you count the connectors on the main board as "sockets."

Even in the old days of Intel processors, a socket only allowed interchanging the CPU with others in the same family or generation.

Today, with our faster signals and smaller computers, a socket or connector is what we might call an "electrical speed bump."
 
Given the scale of Nvidia GPU sales, I'd like to hope that Apple gets back into the server game, since there is a lot of money to be made on high-memory inference hardware. Assuming the M4 Ultra can come with 384 GB of memory each, it would be quite competitive for 400B+ parameter models.
They will not re-enter that space. Not because you are wrong, but because it would take far too many resources to revamp, rebuild, and maintain enterprise-level customers, especially for AI/LLM.
Will they make capable hardware? Yes, and we can all buy it and do whatever we want with it. The M-series will likely soon be complemented by some L-series or something like that (LLM is my hint), and those chips/GPUs will be specialized units to support even higher GFLOPS.
 
Given the scale of Nvidia GPU sales, I'd like to hope that Apple gets back into the server game, since there is a lot of money to be made on high-memory inference hardware. Assuming the M4 Ultra can come with 384 GB of memory each, it would be quite competitive for 400B+ parameter models.
Waste of time. The M4 Ultra doesn't touch EPYC 5 and never will. Neither Apple nor Intel will ever touch EPYC's roadmap, and Nvidia will keep living off compute with its GPGPUs, since its own Arm chips are absolute trash.

Apple is using small language models in the cloud to extend functionality within its ecosystem, not wasting tens of billions to make no dent in the enterprise markets.

When we were at NeXT, Steve promised that our enterprise services expertise would be critical in rebuilding Apple. He scrapped that within his first year as full-time CEO.

It was made worse by gutting a lot of capabilities and releasing a crippled OS X Server product that they left to rot.
 
The base M4 does not outperform the M2 Ultra in NPU function, though if you're not familiar with how TOPS ratings are determined, it's easy to misunderstand. The short of it is that you need to know which operation the TOPS rating is measuring when comparing.

M1 - M3 Neural Engines were measured using FP16 operations, whereas the M4 chips (and the A17 and A18) are measured using INT8 operations. An FP16 operation handles about twice as much data as an INT8 operation. They're not entirely interchangeable, but 20 FP16 operations would equate to roughly 40 INT8 operations.

The M2 Ultra is rated at 31.6 TOPS in FP16, which would equate to roughly 62-64 TOPS in INT8. The M4 is rated at 38 TOPS in INT8.

Similar confusion occurred with the M3. Apple measured the M3 Neural Engine with FP16, but the corresponding A17 Neural Engine was measured with INT8 for whatever reason, thus making it seem that the A17 had a faster NPU than the M3 when they were essentially the same. The M4 looks like a huge leap over the M3 on paper because of the TOPS figure, but it's actually only about 5-10% faster. The M2 was actually the biggest boost to NPU performance in the four generations of M chips, about 40% faster than the M1.

For the record, this is not Apple being sneaky. They made the change because AMD, Intel, and other companies coming out with NPU hardware are measuring in INT8, and it has become something of the de facto standard benchmark for NPUs. Apple, with good reason, didn't want its NPU specs to look worse for a reason like that.
Except that only the M4/A17 support 2x speed for INT8 operations; the M2 does not.
There are also other changes to the ANE that boost performance.

M2 Ultra: FP16 26742, INT8 30153
M4: FP16 36345, INT8 51123
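
For what it's worth, here's how I'd read those scores (assuming they're FP16 vs. INT8 ANE benchmark runs; the exact benchmark isn't named above):

import Foundation

// How much each chip gains from INT8 relative to FP16 on the quoted scores.
let scores: [(chip: String, fp16: Double, int8: Double)] = [
    ("M2 Ultra", 26742, 30153),
    ("M4",       36345, 51123),
]

for s in scores {
    let gain = s.int8 / s.fp16
    print("\(s.chip): INT8 score is \(String(format: "%.2f", gain))x its FP16 score")
}
// M2 Ultra: ~1.13x, M4: ~1.41x -- the M4's ANE benefits far more from INT8,
// consistent with it having faster INT8 paths that the M2 lacks.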
 
With the M4 Ultra possibly reaching RTX 4090 performance at a fraction of the power draw, and without the cost of water cooling or the gaudy LED light setups, Apple may save a lot of loot.

The savings from Apple doing it all itself and not getting gouged by Nvidia, coupled with the energy savings, should leave enough for Apple to deck out its server farms with some of the gaudiest LED setups of every teen gamer's dreams.
 