I've long theorized that Apple could justify the R&D cost of an "Extreme" SoC by also deploying it in servers. In 2024, the world has changed a lot: LLMs and GenAI have taken over and are starving for compute. Bloomberg is reporting that Apple plans to use its own Apple Silicon in the cloud to do AI inference.
> Apple’s plan to use its own chips and process AI tasks in the cloud was hatched about three years ago, but the company accelerated the timeline after the AI craze — fueled by OpenAI’s ChatGPT and Google’s Gemini — forced it to move more quickly.
Bloomberg is reporting the usage of M2/M4 Ultra thus far.
Here are my thoughts:
- Apple needs to use the M2/M4 Ultra because their dedicated AI inference chips aren't ready.
- Using a full Ultra/Extreme chip for inference isn't scalable; Apple has billions of users (see the rough numbers after this list). But because of how fast LLMs/GenAI are developing, Apple has no choice but to use what it has now.
- Apple is probably planning to break the Neural Engine out into its own chip and scale it up for server use long term. All the big tech companies have their own in-house server NPUs deployed: Amazon has Inferentia, Microsoft just announced Maia 100, and Google has had TPUs for many years.
- There might still be use cases where Apple needs the full SoC to emulate a local Mac for some sort of cloud service. I think Apple might customize these SoCs for servers as well. For example, fewer CPU cores, more GPU cores, no display controllers.
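To make the scalability point in the second bullet concrete, here's a rough back-of-envelope sketch. Every number in it is a hypothetical assumption picked only to show the order of magnitude, not an Apple figure or a benchmark:

```python
# Back-of-envelope estimate of how many Ultra-class SoCs cloud inference would need.
# All inputs are hypothetical assumptions for illustration, not real Apple numbers.

users = 1_000_000_000            # assumed active users of a cloud AI feature
requests_per_user_per_day = 5    # assumed average cloud-inference requests per user per day
tokens_per_request = 500         # assumed tokens generated per request
tokens_per_chip_per_sec = 50     # assumed sustained LLM throughput of one Ultra-class SoC

total_tokens_per_day = users * requests_per_user_per_day * tokens_per_request
chip_tokens_per_day = tokens_per_chip_per_sec * 86_400   # seconds in a day

chips_needed = total_tokens_per_day / chip_tokens_per_day
print(f"SoCs needed at perfect utilization: {chips_needed:,.0f}")
# -> roughly 579,000 full SoCs, before accounting for peak load, redundancy, or bigger models.
```

Even under these generous assumptions you end up with hundreds of thousands of full SoCs, most of whose CPU cores and display hardware inference never touches, which is why a dedicated, scaled-up NPU makes more sense long term.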