I think they will reveal it at a later time.
My guess is that they really did use M2 Ultras because they were caught with their pants down when the GenAI wave hit. They couldn't buy enough Nvidia GPUs and they didn't have their own dedicated server NPU chips like Google, Meta, Microsoft, and Amazon do. They had to use what they already had, which is M2 Ultras.
Apple Intelligence only runs on M-series and A17 Pro. Apple has been working toward generative inference all along. They have been adding NPUs, AMX, and faster GPU compute for several years. The only place they are 'caught with their pants down' is being miserly on memory (RAM).
Inference on the customer's hardware and electricity is going to be a lot easier to deliver for 'free', in addition to the privacy aspects.
Where Apple has come up a bit 'short' is in shrinking/compressing the models. This 'punt extra compute to cloud' is structured (layered on top of a thinned-out iOS) so that as the Apple Silicon devices get a RAM uplift, more and more of the compute can relatively easily migrate to the client devices. (Possibly with almost no changes at all, except for where the threshold-to-cloud point is set.)
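Roughly, that 'threshold to cloud' decision could look like the sketch below. All the names here are made up for illustration; this is not Apple's actual routing code, just the shape of the idea: the only per-device knob is a memory threshold, so a RAM bump shifts more requests onto the local path without restructuring anything else.

```swift
import Foundation

// Hypothetical sketch of a local-vs-cloud routing decision.
struct InferenceRequest {
    let modelFootprintBytes: UInt64   // RAM the model needs when loaded
}

enum ExecutionTarget {
    case onDevice
    case privateCloud
}

func routeRequest(_ request: InferenceRequest,
                  availableMemoryBytes: UInt64,
                  headroomBytes: UInt64 = 2 << 30) -> ExecutionTarget {
    // Keep some headroom so the OS and foreground apps aren't squeezed out.
    if request.modelFootprintBytes + headroomBytes <= availableMemoryBytes {
        return .onDevice
    }
    return .privateCloud
}

// A ~3GB model on an 8GB device with ~4GB free gets punted to the cloud;
// the same request on a device with ~10GB free stays local.
let request = InferenceRequest(modelFootprintBytes: 3 << 30)
print(routeRequest(request, availableMemoryBytes: 4 << 30))   // privateCloud
print(routeRequest(request, availableMemoryBytes: 10 << 30))  // onDevice
```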
Where Apple missed was the hype train of making the models as large as possible as quickly as possible (the 'piled higher and deeper is better' mania).
I'm guessing they will reveal it once they have a dedicated server NPU. I also expect them to use a similar architecture to the Neural Engine so there is compatibility between local and server inference.
Pretty good chance not. Decent chance that it is RAM that they are missing. If you have 30 million M1s with 8GB and need a 6GB model to do some edge-case inference on a set of image files ... kicking that to a machine with 128GB of RAM that can consolidate 16 jobs onto one device would be a good 'force multiplier' (and still have substantive RAM for file cache for the session(s)). What Apple has is tons of users who are paying no money for using the servers (so they are going to need workload consolidation).
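Rough back-of-envelope on that consolidation, reading the example as each job needing its own ~6GB working set (illustrative numbers only, not anything Apple has published):

```swift
// Consolidation math for the 128GB-server scenario above (illustrative figures only).
let serverRAMGB = 128.0
let modelFootprintGB = 6.0       // per-job working set from the example
let consolidatedJobs = 16.0      // jobs packed onto one server

let ramUsedGB = consolidatedJobs * modelFootprintGB   // 96 GB
let leftForCacheGB = serverRAMGB - ramUsedGB          // 32 GB left over for file cache
print("jobs use \(ramUsedGB) GB, leaving \(leftForCacheGB) GB for file cache")
```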
They already have the overhead of shipping all the data needed for the inference up to the cloud, so it isn't 'speed latency' that they are trying to minimize. Cost is a substantive factor that is being targeted here. M2 Ultras whose R&D is 'already paid for', passed to servers in 'hand-me-down' status, would fit that bill.
An Ultra has a bigger NPU than what the plain Mn, Pro, or Max have. They have 'bigger NPU' covered.
P.S. If they put this 'extra', higher-latency compute on a PCIe card, they could sell those to MP customers who want to avoid the cloud 'round trip' and the data center space expansion footprint (but most users won't be able to afford that, so it wouldn't be the primary direction).
P.P.S. Ultra Studio/MP machines that happen to have a 'large excess' of unused local RAM would see no 'cloud compute' latency at all. (I don't think there is any big upside for Apple to create models that completely 'overflow' the scope of the Ultra. There can always be a punt to the 3rd-party cloud option at the top end of the scale; Apple doesn't have to cover that with their hardware at all.)