Apple isn't the one making the drivers for the Intel/nVidia/AMD parts in their desktops/laptops though. That's the part you keep overlooking. Not only is it "really an argument", it's the most significant roadblock to implementing Metal on the desktop/notebook aside from the x86 vs ARM architecture differences. The GPUs in the iOS devices are based to some extent on the PowerVR GPU. That is a far different part from the GPUs from Intel, AMD, and nVidia, which aren't based on ARM/PowerVR at all. At the assembly code level (where these drivers ultimately make the system calls), you can't simply use x86 assembly on ARM or vice versa.
I wasn't talking about assembly. Right now, OpenGL on OS X works like this: Apple makes the frontend, which implements the basic OpenGL API and provides hooks for a pluggable driver. The driver is then written by the IHV (Nvidia, AMD etc.) in cooperation with Apple. I strongly suspect that Apple writes its Intel drivers itself, but I can't be sure.
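To make the frontend/driver split concrete, here is a toy sketch in Python. It only illustrates the idea of a platform-owned API layer calling into a vendor-supplied plug-in. Every class and method name is made up for illustration, and none of this is Apple's actual driver interface.

```python
# Toy model of a front end with a pluggable vendor driver.
# All names are hypothetical; this is not a real driver interface.
from abc import ABC, abstractmethod


class VendorDriver(ABC):
    """The part an IHV would supply."""

    @abstractmethod
    def submit(self, command: str) -> str:
        """Turn an API-level command into something the hardware understands."""


class FakeNvidiaDriver(VendorDriver):
    def submit(self, command: str) -> str:
        return f"nvidia-hw:{command}"


class FakePowerVRDriver(VendorDriver):
    def submit(self, command: str) -> str:
        return f"powervr-hw:{command}"


class GraphicsFrontend:
    """The part the platform vendor would supply: public API plus validation."""

    def __init__(self, driver: VendorDriver) -> None:
        self.driver = driver

    def draw_triangles(self, count: int) -> str:
        # Validation lives in the shared front end, not in each driver.
        if count < 0:
            raise ValueError("count must be non-negative")
        return self.driver.submit(f"draw_triangles({count})")
```

The application only ever talks to `GraphicsFrontend`, so swapping `FakeNvidiaDriver` for `FakePowerVRDriver` changes nothing on the API side. That is the sense in which the GPU vendor does not matter to the API.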
Whether a GPU is a PowerVR one or not does not make any difference. The IHV still has to write their driver bit. And there is nothing about Metal which makes it PowerVR-specific. It's just a low-overhead 3D API (similar to AMD's Mantle) with a high-level shading language (similar to Nvidia's Cg).
One big issue with OpenGL is the shading language. Often, each driver has its own parser/optimizer stack for the shading language, which results in subtle incompatibilities and performance issues. Compare this to how DirectX handles things: the shader language is parsed by the DirectX framework itself, which also performs high-level (hardware-independent) optimisations. The result is optimised intermediate code, which then gets passed to the actual driver. The driver can then translate that intermediate shader code into native GPU code and perform hardware-specific optimisations. This makes the drivers much simpler to implement because they don't have to deal with the high-level stuff. It also makes the shader language behave more predictably across different vendors. Again, there is a good chance that Apple already does something like this with its current OpenGL stack, but I am not sure.
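A rough sketch of that division of labour, in Python. The tiny "shader language" (lines of the form `dst = a OP b`), the intermediate format and both "vendor" instruction sets are invented for illustration and have nothing to do with real HLSL or DirectX bytecode. The point is only that parsing and constant folding happen once, in shared code, and each driver just lowers the result.

```python
# Toy shader pipeline: the framework parses and optimises once,
# each (hypothetical) vendor driver only lowers the intermediate code.
import operator

OPS = {"+": ("add", operator.add), "*": ("mul", operator.mul)}


def is_number(token):
    try:
        float(token)
        return True
    except ValueError:
        return False


def framework_compile(source):
    """Shared front end: parse 'dst = a OP b' lines and fold constants."""
    ir = []
    for line in source.strip().splitlines():
        dst, _, a, op, b = line.split()
        name, fn = OPS[op]
        if is_number(a) and is_number(b):
            # Hardware-independent optimisation, done once for every vendor.
            ir.append(("const", dst, fn(float(a), float(b))))
        else:
            ir.append((name, dst, a, b))
    return ir


def vendor_a_driver(ir):
    """Lowers the intermediate code to made-up vendor A instructions."""
    out = []
    for instr in ir:
        if instr[0] == "const":
            out.append(f"A.LOADK {instr[1]}, {instr[2]}")
        else:
            out.append(f"A.{instr[0].upper()} {instr[1]}, {instr[2]}, {instr[3]}")
    return out


def vendor_b_driver(ir):
    """Lowers the same intermediate code to made-up vendor B instructions."""
    return [" ".join(["b_" + instr[0]] + [str(x) for x in instr[1:]]) for instr in ir]


ir = framework_compile("t = 2 * 3\nout = t + x")
print(vendor_a_driver(ir))
print(vendor_b_driver(ir))
```

Neither driver contains a parser, so both see exactly the same folded `t = 6.0`. With a per-driver parser, each vendor could disagree on what the source even means.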
In any case, Metal's shader compiler is based on LLVM. LLVM is designed to be modular and pluggable: you have front-ends which parse specific programming languages and transform them to a language-independent code representation, a set of optimiser modules which work on that code representation, and a backend which translates the code representation to the particular machine code. You can write an LLVM backend for a new CPU and all of the LLVM languages will then work on that CPU. With Metal, this means all the driver needs to do is provide a hardware-specific code generator for LLVM. This is much less work than maintaining a full compiler.
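Here is a minimal Python sketch of that front-end / optimiser / backend layout. It does not use the real LLVM API. The two toy "languages", the tuple-based code representation and the `toy_cpu` / `toy_gpu` targets are all assumptions made up for the example.

```python
# Toy LLVM-style layout: front ends -> shared IR -> optimiser passes -> back ends.
# Nothing here is the real LLVM API; all names are hypothetical.

def parse_infix(src):
    """Front end for a toy language written as 'a + b'."""
    lhs, op, rhs = src.split()
    if op != "+":
        raise ValueError("only + is supported")
    return ("add", lhs, rhs)


def parse_prefix(src):
    """Front end for a toy language written as '(+ a b)'."""
    op, lhs, rhs = src.strip("()").split()
    if op != "+":
        raise ValueError("only + is supported")
    return ("add", lhs, rhs)


def fold_constants(ir):
    """Optimiser pass that works on the shared IR, whatever the source language."""
    if ir[0] == "add" and ir[1].isdigit() and ir[2].isdigit():
        return ("const", str(int(ir[1]) + int(ir[2])))
    return ir


def emit_cpu(ir):
    """Back end for a made-up CPU."""
    if ir[0] == "const":
        return [f"mov r0, {ir[1]}"]
    return [f"mov r0, {ir[1]}", f"add r0, {ir[2]}"]


FRONTENDS = {"infix": parse_infix, "prefix": parse_prefix}
PASSES = [fold_constants]
BACKENDS = {"toy_cpu": emit_cpu}


def compile_for(language, target, src):
    ir = FRONTENDS[language](src)
    for optimise in PASSES:
        ir = optimise(ir)
    return BACKENDS[target](ir)


# A "vendor" adds one code generator for a made-up GPU...
def emit_gpu(ir):
    if ir[0] == "const":
        return [f"LOADK v0, {ir[1]}"]
    return [f"ADD v0, {ir[1]}, {ir[2]}"]


BACKENDS["toy_gpu"] = emit_gpu

# ...and both existing languages now target it without any other change.
print(compile_for("infix", "toy_gpu", "a + b"))
print(compile_for("prefix", "toy_gpu", "(+ a b)"))
```

Registering `emit_gpu` is the only vendor-specific work in this sketch. The parsers and the folding pass are reused as they are.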