That's a huge advantage over the M3 NPU in perf/watt. I wonder if it's real.
I wouldn't trust any of these claims (for QC, for Intel, for Apple) right now.
The basic problem is I have no idea what's in "Procyon AI".
CPUs have been around long enough, and are understood well enough, that we kinda know what to expect from a CPU benchmark (though even there a coder can write code that will, or will not, be mapped well by the compiler onto SIMD...).
GPUs take us into a world where coders are much less aware of the best practices for a given family of devices, so even things like the choice of algorithm or tuning (how many registers to use? how large should threadgroups be? ...) can have a large effect.
And with NPUs it's the wild west. The people writing these benchmarks have no clue what goes on inside the NPU, no clue what flexibility exists for tweaking a net to get the same results with better performance on hardware X, and I'm not even convinced they (or anyone!) really know the appropriate balance of functionality to be testing. Should they be testing mostly fully-connected layers or convolution layers? If convolutions, small (3x3) or larger (17x17)? Which activation function: good old ReLU, or one of the various fancy alternatives? With or without quantization? Straight-line nets, branched nets, or conditional nets (like mixture of experts)? And so on.
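To make concrete how little the benchmark author actually controls, here's a minimal Swift sketch (the model URL and input are placeholders, and I have no idea what Procyon AI actually does internally): even on Apple's own stack, Core ML only takes a hint about which compute units to use, and then decides layer by layer where the net really runs.

    import CoreML

    // Minimal sketch: load some benchmark net and *request* the Neural Engine.
    // The compiled-model URL and input features are placeholders, not anything
    // from a real benchmark suite.
    func runPreferringNPU(modelURL: URL,
                          input: MLFeatureProvider) throws -> MLFeatureProvider {
        let config = MLModelConfiguration()
        // A hint, not a guarantee: Core ML decides per-layer whether the CPU,
        // GPU, or Neural Engine executes it, depending on layer types,
        // quantization, and OS version.
        config.computeUnits = .cpuAndNeuralEngine
        let model = try MLModel(contentsOf: modelURL, configuration: config)
        return try model.prediction(from: input)
    }

If that much is opaque within a single vendor's stack, a canned cross-vendor number tells you even less about what the NPU is really doing.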
I suspect the whole thinking behind this is misguided.
When QuickTime started, it wasn't clear how things would play out, and for the first few years the most visible face of QuickTime (not technically the most interesting, but publicly the most visible) was that you could plug in different codecs for audio, video, and images. But after a few years, smarter people than me within Apple realized this was sub-optimal: now that the world of codecs had settled down somewhat, it made more sense to choose a few blessed codecs, optimize the hell out of those, and ignore the rest. So we went from a smorgasbord, to Sorenson as the blessed video codec (for a few years), to MPEG-4, and then to the world of today (H.264, then H.265, with optimized hardware for both encode and decode).
My point is that while the rest of the world is still excited about plug-in neural nets (as in: I download a random net, the equivalent of a random codec, and just start running it on my hardware), I suspect Apple is looking to a future where there are a few blessed nets (probably different on each platform) that are the workhorses for that platform, and which will be the primary targets of HW and SW optimization. Maybe MobileOne for vision, OpenELM for language.
It's just like how you can run Dirac (or whatever the darling open-source codec is this year) on your Apple hardware, but the experience will suck compared to just doing what Apple tells you and using H.265. So I suspect what will matter going forward is APPLE's vision and language networks. You will hook into those using the Vision and Language APIs, with the ability to tweak things for various scenarios (e.g. adding vocabulary), but those are where Apple will focus their attention. You can bring your own network, and if it's for something minor (like classifying exercise) it will be fine; but if it's something major (like language) it will clearly suck compared to using the Apple built-in.
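To make the "blessed API vs. bring-your-own net" split concrete, here's a minimal Swift sketch using today's Vision framework (image classification as a stand-in; whatever the language-side equivalent ends up looking like is pure speculation on my part, and the custom model is hypothetical):

    import Vision
    import CoreML

    // Path 1: Apple's built-in classifier. You never see or choose the network;
    // Apple is free to swap in whatever model its hardware runs best.
    func classifyWithBuiltIn(imageURL: URL) throws -> [VNClassificationObservation] {
        let request = VNClassifyImageRequest()
        let handler = VNImageRequestHandler(url: imageURL)
        try handler.perform([request])
        return request.results ?? []
    }

    // Path 2: bring your own Core ML model (a hypothetical "myModel").
    // Fine for niche tasks, but now *you* must keep it competitive with
    // whatever Apple ships and optimizes for.
    func classify(imageURL: URL, with myModel: MLModel) throws -> [VNClassificationObservation] {
        let vnModel = try VNCoreMLModel(for: myModel)
        let request = VNCoreMLRequest(model: vnModel)
        let handler = VNImageRequestHandler(url: imageURL)
        try handler.perform([request])
        return (request.results as? [VNClassificationObservation]) ?? []
    }

The first path is where Apple's optimization effort (and the NPU) will actually get spent; the second is the "run Dirac on an iPad" path.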
So I SUSPECT that this QC talk is just nonsense. It's like boasting that you have hardware that runs Dirac way more efficiently than an iPad – perhaps true, but also utterly irrelevant. The job to be done is not "run Dirac", it's "provide a video codec and its ecosystem". Likewise for the average person, the job to be done is not "run <random AI network> efficiently", it is "provide Vision functionality and Language functionality".
It took a few years for the PC ecosystem to pick up this particular change in codecs (inevitable, given how fragmented PC hardware is), and it will be the same for ML. I expect that three years from now QC, Intel, and AMD will be making similar boasts, and it simply will not matter on the Apple side: by then the built-in Vision and Language APIs will be well entrenched, sane developers will not be adding their own large nets, and no one in the Apple world will care how Procyon AI performs.