The whole world is going all-in on AI, and you all think doubling the Neural Engine's performance is bad? And that's apart from the redesigned GPU, which will probably accelerate ML performance even more.
Just think about the possibilities of running a decently performing, lightweight ChatGPT-style model in your pocket, without disclosing any of your personal data.
Now picture this scaling up to an M3 Ultra. If the current M2 Ultra already hits 12 tokens/sec with a LLaMA model, rivaling a single A100 (Nvidia's best-performing AI GPU), we could potentially have the equivalent of two A100s (or more) running in a small-form-factor desktop machine.
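For context, those tokens/sec numbers come from running the model locally and timing the generation. Here's a minimal sketch of how you'd measure it yourself, assuming the llama-cpp-python bindings and a quantized GGUF model file (the model path below is hypothetical):

```python
# Minimal sketch: measuring local LLaMA decode speed on Apple Silicon.
# Assumes llama-cpp-python is installed and a quantized GGUF model is on disk.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-7b.Q4_K_M.gguf",  # hypothetical local model file
    n_gpu_layers=-1,  # offload all layers to the GPU (Metal on M-series chips)
)

start = time.perf_counter()
out = llm("Explain the Neural Engine in one paragraph.", max_tokens=128)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tokens/sec")
```

Everything stays on-device, which is the whole point: the prompt and the output never leave your machine.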