I do want to be careful to phrase this based on my understanding. The way OpenGL seems to run is that it first asks the OS/driver what the hardware supports; once that's validated, it gets the relevant functions and runs what it can handle. That doesn't seem much better than the Java model of "Write Once, Run Everywhere". Yes, Java will run on everything that has the runtime, but it comes at the expense of having the bytecode run through the Java Virtual Machine and then running against the result.
My first reaction was to write 'no, that is not how it works', but it's also possible that your understanding is essentially correct. Well, let me elaborate. OpenGL has a specification, and everything exposed by the specification must be supported; it's actually rather strict about this. E.g. if I target OpenGL 4.0, I know that I can use certain things (like tessellation shaders). The same goes for the various extensions offered via OpenGL: I can test for an extension and, if it's supported, switch to a certain code path. These tests happen only once though, at the start of the application, so they do not have any performance impact.
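To make the 'test once at startup' point concrete, here is a minimal sketch in C. It assumes an OpenGL 3.0+ context is already current and that a loader such as GLAD has resolved the function pointers; the has_gl_extension, init_renderer and use_tessellation names are just for this illustration.

```c
/* Minimal sketch of the "test once at startup" pattern. Assumes an
 * OpenGL 3.0+ context is already current (created e.g. via GLFW or SDL)
 * and that a loader such as GLAD has resolved the function pointers. */
#include <stdbool.h>
#include <string.h>
#include <glad/glad.h>

static bool has_gl_extension(const char *name)
{
    GLint count = 0;
    glGetIntegerv(GL_NUM_EXTENSIONS, &count);
    for (GLint i = 0; i < count; ++i) {
        const char *ext = (const char *)glGetStringi(GL_EXTENSIONS, (GLuint)i);
        if (ext && strcmp(ext, name) == 0)
            return true;
    }
    return false;
}

/* Decided once at startup; the render loop only branches on this flag. */
static bool use_tessellation;

void init_renderer(void)
{
    GLint major = 0;
    glGetIntegerv(GL_MAJOR_VERSION, &major);

    /* Tessellation shaders are core in 4.0 and also exposed as an extension. */
    use_tessellation = (major >= 4) ||
                       has_gl_extension("GL_ARB_tessellation_shader");
}
```

The result is cached in a flag, so the per-frame code never pays for the query.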
A completely different question is whether a certain piece of functionality is actually supported by the underlying hardware, i.e. whether it is fast. E.g. a driver might support tessellation shaders, but it could actually fall back to slow CPU-based emulation if you use them. This is a big problem, simply because there is no way to tell which feature is fast and which is not.
This is not an OpenGL-specific problem; every API which aims to communicate with particular hardware has it, but OpenGL is quite prone to it because of its complexity. The specification is very big and, consequently, it's quite difficult to write a driver which fully supports it and does it really well. This is why Microsoft has had such good success with DirectX: they provide most of the functionality (like the shading language compiler), and the hardware vendor just needs to implement a fairly simple interface to plug in their driver. This makes things easier for both the driver developer (as they don't need to care about a very complex specification) and the software developer (because they have less stress in working around the idiosyncrasies of particular implementations).
What Metal, at least to me, appears to do is far more assumptive. The A7 is the baseline. If you program to Metal, you know that the full graphics stack of the A7 is there. It runs with fewer intermediate steps. Once the calls are actually run, I'd imagine there is very, very little difference, but all of the other steps build up latency.
I don't really see a fundamental difference here. Sure, Metal assumes some things; but so does OpenGL. They both assume that the graphics processing happens in a pipeline (the pipeline model of the two is essentially identical), that the output is produced by rasterising triangles, that there are framebuffer operations, that there are certain areas of memory called 'textures' which have to be accessed in a specific way, and so on.
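As an illustration of that shared model, here is a minimal, OpenGL-flavoured sketch in C of a single frame; every call in it has a direct Metal counterpart. It assumes the program, vao and texture objects were created during initialisation and that a context is current; the draw_frame name is just for this sketch.

```c
#include <glad/glad.h>

/* One frame, expressed in the pipeline model both APIs share.
 * Assumes an OpenGL 3.3+ context is current and that program, vao and
 * texture were created during initialisation. */
void draw_frame(GLuint program, GLuint vao, GLuint texture)
{
    /* Framebuffer operation: clear the render target. */
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    /* Programmable pipeline stages (vertex + fragment shaders). */
    glUseProgram(program);

    /* 'Texture' memory, only accessible through the API's rules. */
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, texture);

    /* Vertex input layout, then triangles handed to the rasteriser. */
    glBindVertexArray(vao);
    glDrawArrays(GL_TRIANGLES, 0, 36);
}
```

In Metal the same frame would be roughly a render command encoder with setRenderPipelineState, setFragmentTexture and drawPrimitives, which is the point: the underlying model is the same.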
It is true that when using Metal you can assume that the full A7 stack is available, but only because iOS on the A7 is the only platform where Metal is actually implemented. If someone were to, say, write a Metal library which translates it into OpenGL calls, you could link your application against it and it would still work.
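Purely as a thought experiment, such a shim could look roughly like this in C. Everything Metal-sounding here is invented for the illustration (the real Metal API is an Objective-C/Swift interface, not this C one); only the OpenGL call on the inside is real.

```c
#include <glad/glad.h>

/* Hypothetical shim type: a Metal-style primitive enum. */
typedef enum { MTL_SHIM_PRIM_TRIANGLE } mtl_shim_primitive;

/* The application thinks it is encoding a Metal-style draw call... */
void mtl_shim_draw_primitives(mtl_shim_primitive type,
                              int vertex_start, int vertex_count)
{
    /* ...but the shim simply forwards it to the equivalent OpenGL call. */
    if (type == MTL_SHIM_PRIM_TRIANGLE)
        glDrawArrays(GL_TRIANGLES, vertex_start, vertex_count);
}
```

As long as the shim honours the API contract, the application would not notice the difference, which is why 'full A7 stack' is a property of the current implementation rather than of the API itself.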
I can't stress enough that I'm thinking about this in terms of OpenGL running more like Java (though OpenGL doesn't suck as much as Java on most platforms), and Metal running more like native code compiled specifically against the hardware.
While your Java analogy has some merit, it applies to any API to more or less the same degree. The purpose of an API is to provide an abstraction layer over the hardware, so there is always some translation involved. APIs like Metal and DirectX 12 aim to be 'thin' by avoiding design mistakes that can lead to reduced performance in certain scenarios. APIs like OpenGL do not really care about that because they have a certain historical background from the times when the hardware was different. Both APIs make some assumptions about the underlying hardware, and neither of them has anything that makes it 'like native code'. The Metal shading language, for example, is in no way similar to the actual native code used by the GPU.