They have probably already optimized it as much as they can while still maintaining the accuracy of the output. Apple has published lots of research papers on this, and it has released some AI models on Hugging Face if you are curious. Quantization is one method used to reduce the size of AI models.

Maybe so. But Apple should be able to use a smaller model for older devices like the iPhone 14 Pro.
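To make the quantization point a bit more concrete, here is a minimal sketch of one common, generic scheme: symmetric per-tensor int8 quantization. It is not Apple's method, and the function names (`quantize_int8`, `dequantize_int8`) are hypothetical. It stores each weight in 1 byte instead of 4 (float32). The trade-off is a small rounding error of at most about half the scale factor per weight, which is the "accuracy" cost mentioned above.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (illustrative scheme only)."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0  # map [-max_abs, max_abs] onto [-127, 127]
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return array("b", q), scale  # "b" = signed 1-byte integers


def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]


# Toy "model weights" stored as 4-byte floats ("f").
weights = array("f", [0.12, -0.5, 0.33, 0.9, -0.07, 0.0, 0.45, -0.88])
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

print("float32 bytes:", weights.itemsize * len(weights))
print("int8 bytes:   ", q.itemsize * len(q))
print("max error:    ", max(abs(a - b) for a, b in zip(weights, restored)))
```

Real on-device models use more elaborate variants (per-channel scales, 4-bit weights, calibration), but the size saving works the same way: fewer bits per weight.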