Quick Geekbench benchmarks for the new A15 iPad mini.
Geekbench 5.4.1
Single core: 1594
Multi core: 4550
Metal compute: 13679
My new iPhone arrived and I just wrote a simple Metal App that takes 50k triangles and renders them to a 16k x 16k texture, and then renders the 50k triangles again directly to the frame buffer using that 16k x 16k render target as a texture map. Would be interesting to benchmark the effects…
The bright side is that Proton is Steam-exclusive tech. Consoles and non-Steam platforms don’t use it. I am actually worried about Vulkan’s future. Paradoxically enough, the progress of Proton makes it easy for devs to just stick to DX12 and Windows and get Linux support for free, and I expect that Vulkan in the mobile space will be superseded by WebGPU for all practical needs. Where would that leave Vulkan? As an API GPU vendors have to implement so that DX emulation works on Linux?
I’m very far behind on reading all of this, so forgive me for saying something someone else said. Here’s a random thought - could the A15 cores be, essentially, simply revs of Firestorm and Icestorm, with the same microarchitecture and some minor improvements? Maybe they increased the sizes of the caches and SLC - that could account for, say, 10% of the increased transistor count. Then you have the extra GPU core, for another 5% - maybe more if the GPUs are redesigned. Maybe the neural engines were completely redone. Tough to figure out how to add up to 3 billion more transistors.

But, anyway, perhaps they’ve decided to decouple the i-device and Mac CPUs, so the M2 is not using the same cores as the A15? It would make a certain amount of sense - the A-series now goes only into phones and lower-end iPads, where CPU performance has more or less become “more than good enough” and where the focus is on other functional blocks (GPU, neural engine, image processing, etc.) to support things like the upcoming VR goggles, etc.
Anyway, just random thoughts. I’ve certainly worked on a lot of chips where we got 10-15% improvement without ripping up the core microarchitecture. And if there was an entirely new design, I feel like Apple might have spent a bunch of time talking about how great it is. Now I feel like we will get our next CPU brag from Apple in a few weeks at the Mac announcement.
The difference in transistor count is too big to just be caches and GPU, by a lot. Something else must also be going on.
I’d expect the actual microarchitecture of the E/P cores to be largely the same between the A and M series. There may be minor deviations, but the general architectures are probably almost all the same.
Apple directly called out various system blocks being revised, plus the new GPU core.
That is where the increase in size comes from.
I think I read that the caches were also bigger… but, because reasons, I probably won’t be taking the new iPhones apart when they come in.
I’m curious, are caches more or less transistor dense (i.e. transistors per die area) than the cores?
More dense. Each bit is 6 transistors, then you have a certain number for each row and column (for the read/write ports, sense amps, etc., around the edges).
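As a rough back-of-the-envelope illustration of how that adds up against the transistor-count question above (the cache sizes and the A14/A15 totals below are reported figures and assumptions for illustration, not anything confirmed by a die shot):

```swift
// Back-of-the-envelope SRAM transistor count: ~6 transistors per bit,
// ignoring the row/column periphery mentioned above.
func sramTransistors(megabytes: Double) -> Double {
    megabytes * 1024 * 1024 * 8 * 6
}

// Hypothetical example: doubling a 16 MB system-level cache to 32 MB
// would add roughly 0.8 billion transistors.
let extraSLC = sramTransistors(megabytes: 32) - sramTransistors(megabytes: 16)

// Against a reported jump of roughly 11.8B (A14) to ~15B (A15) transistors,
// that doubled SLC alone would only cover about a quarter of the delta.
let delta = 15.0e9 - 11.8e9
print(extraSLC, extraSLC / delta)   // ≈ 8.05e8, ≈ 0.25
```

So even a generous SLC bump leaves most of the extra transistors unexplained, which is consistent with the point that something else must also be going on.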
My new iPhone arrived and I just wrote a simple Metal App that takes 50k triangles and renders them to a 16k x 16k texture, and then renders the 50k triangles again directly to the frame buffer using that 16k x 16k render target as a texture map.
If the render target is lossless it's about 131 ms per frame, and if the render target is lossy it's 121 ms, with the fragment shaders taking 56 ms and 49 ms respectively. That's a pretty nice boost.
Obviously it's a slightly exaggerated test because nobody is going to render to a 16k render target, but I think it shows it's a pretty nice feature, and it's a GPU performance improvement that will (most likely) be invisible to Geekbench scores.
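For anyone curious what that setup looks like, here is a minimal sketch of the two passes with the render target's compression toggled between lossless and lossy. It is not the poster's actual code: the pipeline states, vertex buffer, drawable, pass descriptor, and vertex counts are hypothetical placeholders for whatever the real app provides.

```swift
import Metal

// Sketch of the two-pass setup described above (illustrative only).
// All parameters are hypothetical stand-ins, not the poster's app.
func renderTwoPasses(device: MTLDevice,
                     queue: MTLCommandQueue,
                     trianglePipeline: MTLRenderPipelineState,  // plain triangle shading
                     texturedPipeline: MTLRenderPipelineState,  // samples the offscreen target
                     vertexBuffer: MTLBuffer,
                     drawable: MTLDrawable,
                     drawablePass: MTLRenderPassDescriptor,
                     useLossyCompression: Bool) {
    // 16k x 16k offscreen render target, later sampled as a texture map.
    let desc = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba8Unorm,
                                                        width: 16384,
                                                        height: 16384,
                                                        mipmapped: false)
    desc.usage = [.renderTarget, .shaderRead]
    desc.storageMode = .private
    // Toggle the lossy texture compression that the timings above compare
    // (available on newer Apple GPUs, iOS 15+).
    desc.compressionType = useLossyCompression ? .lossy : .lossless

    guard let target = device.makeTexture(descriptor: desc),
          let commandBuffer = queue.makeCommandBuffer() else { return }

    // Pass 1: render the 50k triangles into the offscreen target.
    let offscreenPass = MTLRenderPassDescriptor()
    offscreenPass.colorAttachments[0].texture = target
    offscreenPass.colorAttachments[0].loadAction = .clear
    offscreenPass.colorAttachments[0].clearColor = MTLClearColorMake(0, 0, 0, 1)
    offscreenPass.colorAttachments[0].storeAction = .store
    if let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: offscreenPass) {
        encoder.setRenderPipelineState(trianglePipeline)
        encoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
        encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 50_000 * 3)
        encoder.endEncoding()
    }

    // Pass 2: render the same triangles to the frame buffer, sampling the 16k target.
    if let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: drawablePass) {
        encoder.setRenderPipelineState(texturedPipeline)
        encoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
        encoder.setFragmentTexture(target, index: 0)
        encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 50_000 * 3)
        encoder.endEncoding()
    }

    commandBuffer.present(drawable)
    commandBuffer.commit()
}
```

The only difference between the two runs being compared is that single `compressionType` flag on the render target; everything else in the frame stays the same.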
Looks like we can expect Anandtech results soon and some possibly different conclusions about the A15 than reached by others.
Especially the suggestion that the CPU cores’ perf increase is really good … well, that should be interesting! Looking forward to it. Possibly on more memory-intensive workloads, or in efficiency?
We'll wait for results then. Geekbench is not a very intensive test.
Cinebench however doesn’t utilize cores to their full potential
Cinebench does utilize all the available CPU resources; it’s a good indicator when comparing multithreaded performance.
Cinebench is raytracing a real-world scene: some rays won't bounce at all and others will bounce many, many times, all going through a bounding volume hierarchy that probably isn't balanced. There is going to be a massive amount of branching and cache hits and misses, so it would be incredibly difficult to load-balance it to utilise all cores all of the time.
SPEC does far simpler computations repeatedly, with probably very little branching, which is always going to get far better core utilization.
Cinebench is probably far more representative of actual real-world performance (because very few real-world tasks are able to fully saturate all cores all the time), but it's not going to be useful for measuring power draw and other things where you want to test the CPU going all out.
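To make the branching point concrete, here is a toy sketch (not Cinebench's actual code) of the per-ray BVH traversal a ray tracer does. How many nodes each ray visits, and which way every slab test branches, depends entirely on the ray and the scene, which is why the per-ray work is so uneven and hard to load-balance. All the type and function names are made up for illustration.

```swift
import simd

// Toy bounding-volume-hierarchy traversal (illustrative only).
struct AABB {
    var lower: SIMD3<Float>
    var upper: SIMD3<Float>
}

struct BVHNode {
    var bounds: AABB
    var left: Int           // index of left child, or -1 if this is a leaf
    var right: Int          // index of right child, or -1 if this is a leaf
    var triangleIndex: Int  // only meaningful for leaves
}

struct Ray {
    var origin: SIMD3<Float>
    var invDirection: SIMD3<Float>  // 1 / direction, precomputed
}

// Slab test: a data-dependent branch - some rays miss a box immediately,
// others keep descending.
func intersects(_ box: AABB, _ ray: Ray) -> Bool {
    let t0 = (box.lower - ray.origin) * ray.invDirection
    let t1 = (box.upper - ray.origin) * ray.invDirection
    let tNear = simd_min(t0, t1).max()
    let tFar = simd_max(t0, t1).min()
    return tNear <= tFar && tFar >= 0
}

// Depth-first traversal: unpredictable branching plus pointer-chasing through
// the node array, so per-ray cost varies wildly.
func visitLeaves(nodes: [BVHNode], root: Int, ray: Ray, hit: (Int) -> Void) {
    var stack = [root]
    while let index = stack.popLast() {
        let node = nodes[index]
        guard intersects(node.bounds, ray) else { continue }  // early-out branch
        if node.left < 0 {
            hit(node.triangleIndex)       // leaf: report candidate triangle
        } else {
            stack.append(node.left)       // interior: descend into both children
            stack.append(node.right)
        }
    }
}
```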
@leman
Don’t know what’s happening there, or if this is something specific to some CPUs.
Some renderers can be more efficient than others, for sure, but overall, render-engine-based benchmarks have always been good tests when comparing multithreaded performance. Benchmarks like Cinebench, Corona, V-Ray Benchmark, etc. are based on real render tasks, and in every test I see all the cores running at 100%, like when rendering a 3D scene in real life. As said, I’m not sure about AS, but on Intel/AMD-based systems you can download the benchmarks and see for yourself.
I would say the problem with Cinebench is that it's a single repetitive task. Traversing BVHs and performing ray intersections over and over shows how well a CPU does on these particular tasks. The fact that it doesn't draw as much power as possible is not a fault in my opinion as it shows how good the architecture is at executing code without wasting clock cycles. So yes it shows real-world performance but the CPU might perform worse at other, different use cases. I would rather trust a benchmark that cycles through many different tasks.
Personally, when I compare system performance, I check how long my CPU will take to complete the job. I live in the real world and I care about actual performance using render engines, compositing software, photogrammetry software, etc. (all maximize CPU usage at some point of the workflow). It’s totally possible that some tasks are more efficient than others when running at full speed, but I don’t care whether a synthetic benchmark consumes more power; I care about completing the job in my software.

What we’re talking about is a little different, and it applies to x86 systems as well (@leman linked to those results in his posts). A core running at “100%” isn’t actually a good metric for the utilization of the core’s resources; power draw is literally the measure of how much energy a core is spending executing a task, and one can see that Cinebench doesn’t cause cores to draw the kind of power that other workloads do - that’s true for x86 as well as AS.