New cores (A15)

jdb8167 · Sep 24, 2021

Quick Geekbench benchmarks for the new A15 iPad mini.

Geekbench 5.4.1
Single core: 1594
Multi core: 4550
Metal compute: 13679

jmho · Sep 24, 2021

leman said:
Would be interesting to benchmark the effects…

My new iPhone arrived and I just wrote a simple Metal App that takes 50k triangles and renders them to a 16k x 16k texture, and then renders the 50k triangles again directly to the frame buffer using that 16k x 16k render target as a texture map.

If the render target is lossless it's about 131 ms per frame, and if the render target is lossy it's 121 ms, with the fragment shaders taking 56 ms and 49 ms respectively. That's a pretty nice boost.

Obviously it's a slightly exaggerated test because nobody is going to render to a 16k render target, but I think it shows it's a pretty nice feature, and it's a GPU performance improvement that will (most likely) be invisible to Geekbench scores.

JMacHack · Sep 24, 2021

leman said:
I am actually worried about Vulcan‘s future. Paradoxically enough, progress of Proton makes it easy for the devs to just stick to DX12 and windows and get Linux support for free, and I expect that Vulkan in the mobile space will be superseded by WebGPU for all practical needs. Where would it leave Vulkan? As an API GPU vendors have to implement so that DX emulation works on Linux?

The bright side is that Proton is Steam-exclusive tech. Consoles and non-steam platforms don’t use it.

Kpjoslee · Sep 24, 2021

Quick test from iPhone 13 Pro Max
Geekbench 5.4.1
Single Core - 1723
Multi Core - 4600
Metal Score - 14135

vigilant · Sep 24, 2021

cmaier said:
Here’s a random thought - could the A15 cores be, essentially, simply revs of Firestorm and Icestorm, with the same microarchitecture and some minor improvements? Maybe they increased the sizes of the caches and SLC - that could account for, say, 10% of the increased transistor count. Then you have the extra GPU core, for another 5% - maybe more if the GPUs are redesigned. Maybe the neural engines were completely redone. Tough to figure out how to add up to 3 billion more transistors. But, anyway, perhaps they’ve decided to decouple the i-device and Mac CPUs, so the M2 is not using the same cores as the A15? It would make a certain amount of sense - the A-series now go only into phones and lower-end ipads, where the CPU performance has more or less become “more than good enough” and where the focus is on other functional blocks (GPU, neural engine, image processing, etc.) to support things like the upcoming VR goggles, etc.

Anyway, just random thoughts. I’ve certainly worked on a lot of chips where we got 10-15% improvement without ripping up the core microarchitecture. And if there was an entirely new design, I feel like Apple might have spent a bunch of time talking about how great it is. Now I feel like we will get our next CPU brag from Apple in a few weeks at the mac announcement.

I’m very far behind on reading all of this so forgive me for saying something someone else said.

I’d expect the actual micro design of each E/P cores to be largely the same between A/M. There maybe deviations, that are minor, but, the generalized architectures are probably almost all the same.

Apple directly called out various system blocks being revised, plus the new GPU core.

That is where the increase comes from the size.

I think I read that the caches were also bigger…. But because reasons I probably won’t be taking the new iPhone apart when they come in.

cmaier · Sep 24, 2021

vigilant said:
I’m very far behind on reading all of this so forgive me for saying something someone else said.

I’d expect the actual micro design of each E/P cores to be largely the same between A/M. There maybe deviations, that are minor, but, the generalized architectures are probably almost all the same.

Apple directly called out various system blocks being revised, plus the new GPU core.

That is where the increase comes from the size.

I think I read that the caches were also bigger…. But because reasons I probably won’t be taking the new iPhone apart when they come in.

The difference in transistor count is too big to just be caches and GPU, by a lot. Something else must also be going on.

altaic · Sep 24, 2021

cmaier said:
The difference in transistor count is too big to just be caches and GPU, by a lot. Something else must also be going on.

I’m curious, are caches more or less transistor dense (i.e. transistors per die area) than the cores?

cmaier · Sep 24, 2021

altaic said:
I’m curious, are caches more or less transistor dense (i.e. transistors per die area) than the cores?

More dense. Each bit is 6 transistors, then you have a certain number for each row and column (for the read/write ports, sense amps, etc., around the edges).

leman · Sep 25, 2021

jmho said:
My new iPhone arrived and I just wrote a simple Metal App that takes 50k triangles and renders them to a 16k x 16k texture, and then renders the 50k triangles again directly to the frame buffer using that 16k x 16k render target as a texture map.

If the render target is lossless it's about 131 ms per frame, and if the render target is lossy it's 121 ms, with the fragment shaders taking 56 ms and 49 ms respectively. That's a pretty nice boost.

Obviously it's a slightly exaggerated test because nobody is going to render to a 16k render target, but I think it shows it's a pretty nice feature, and it's a GPU performance improvement that will (most likely) be invisible to Geekbench scores.

Very cool! It sounds like it could be a nice boost for things like reflection maps (among others).

leman · Sep 25, 2021

cmaier said:
The difference in transistor count is too big to just be caches and GPU, by a lot. Something else must also be going on.

My hope is that they have included SVE support and it’s just not enabled in the iPhone right now.

Erasmus · Sep 25, 2021

Something I'm hoping for is that the more advanced Macs (i.e. M1X/P1/whatever it's called in bigger MBPs) get the significant increase in A15 Neural Engine performance.

I also hope the number of NE cores are scaled up by 2x-4x in line with the rumoured CPU/GPU core count increases, and that Apple spends a significant amount of effort making sure that developers, and other programming languages, can take full advantage of them with ease.

To me, the NE accelerator seems to be the most interesting part of the Apple Silicon to Mac transition, purely because it's potentially a huge amount of processing power, that could open up capabilities on the Mac that aren't possible on current or near future Intel/AMD machines, but what those capabilities are aren't clear yet.

crazy dave · Sep 29, 2021

Looks like we can expect Anandtech results soon and some possibly different conclusions about the A15 than reached by others.

https://twitter.com/x/status/1443362561089490944

Especially saying the CPU cores perf increase is really good is … well should be interesting! Looking forward to it. Possibly on more memory intensive workloads or efficiency?

thunng8 · Sep 29, 2021

crazy dave said:
Looks like we can expect Anandtech results soon and some possibly different conclusions about the A15 than reached by others.

https://twitter.com/x/status/1443362561089490944

Especially saying the CPU cores perf increase is really good is … well should be interesting! Looking forward to it. Possibly on more memory intensive workloads or efficiency?

We'll wait for results then. Geekbench is not a very intensive test.

crazy dave · Sep 30, 2021

thunng8 said:
We'll wait for results then. Geekbench is not a very intensive test.

It depends … Geekbench is fine on compute workloads - especially for measuring peak performance. Cinebench however doesn’t utilize cores to their full potential but again it depends on what you are doing the test *for* - something that will emulate a program needing to use all the cores’ resources flat out or something that reflects a particular use case. Anyway wrt GB, Anandtech prefers SPEC but they use GB too. SPEC has a different set of workloads - some of which are more memory intensive that might be sensitive to the A15’s big cache.

However I have a feeling that the surprise in the A15 CPU results might be in power efficiency. It’s not always easy to measure that (especially on iPhones) and in fact even Andrei in an earlier post has said they’ve had to revise earlier A14 peak GPU power estimates based on a better testing method they recently started.

It could also be in some of the lower level tests that Anandtech perform which show promise and gains beyond what most benchmarks readily test.

While speculation is fun, as you say, we’ll just have to wait for the results.

sirio76 · Sep 30, 2021

crazy dave said:
Cinebench however doesn’t utilize cores to their full potential

Cinebench do utilize all the available CPU resources, it’s a good indicator when comparing multithreaded performance.

leman · Sep 30, 2021

sirio76 said:
Cinebench do utilize all the available CPU resources, it’s a good indicator when comparing multithreaded performance.

It really doesn't. For example, M1 only draws 70% of it's peak possible power when running R23 [1]. Similarly, Zen3 has been demonstrated to run on a fairly low 5.6W at 4.4ghz [2]. Power draw when running industry-standard benchmarks such as SPEC is significantly higher which suggests a fundamental problem with Cinebench.

My suspicion is that Cinebench has poor utilization of CPU's ALU, which would also explain why SMT (hyper threading) gives it such a big boost. Conversely, Cinebench M1 scores are somewhat disappointing because M1 has a wide execution backend that Cinebench apparently is not able to take advantage of.

[1]

https://twitter.com/x/status/1328777333512278020

[2] https://forums.anandtech.com/threads/apple-a15-announced.2597187/post-40596651

jmho · Sep 30, 2021

Cinebench is raytracing a real world scene, some rays won't bounce at all and others will bounce many many times all going through a bounding volume hierarchy that probably isn't balanced. There is going to be a massive amount of branching and cache hits and misses that it would be incredibly difficult to load-balance it to utilise all cores all of the time.

SPEC does far simpler computations repeatedly with probably very little branching which is always going to get far better core utilization.

Cinebench is probably far more representative of actual real-world performance (because very few real-world tasks are able to fully saturate all cores all the time), but it's not going to be useful for measuring power draw and other things where you want to test the CPU going all out.

leman · Sep 30, 2021

jmho said:
Cinebench is raytracing a real world scene, some rays won't bounce at all and others will bounce many many times all going through a bounding volume hierarchy that probably isn't balanced. There is going to be a massive amount of branching and cache hits and misses that it would be incredibly difficult to load-balance it to utilise all cores all of the time.

SPEC does far simpler computations repeatedly with probably very little branching which is always going to get far better core utilization.

Cinebench is probably far more representative of actual real-world performance (because very few real-world tasks are able to fully saturate all cores all the time), but it's not going to be useful for measuring power draw and other things where you want to test the CPU going all out.

It’s not about saturating all cores at all time, its about saturating the execution resources of a single core. I agree that mispredicted branches and cache misses are a likely culprit in this particular scenario, but if this is the case then we should stop treating Cinebench as a CPU performance benchmark and instead treat it as a memory hierarchy/branch prediction benchmark. The point is that Apple Silicon excels in branch heavy code such as compilers or interpreters. But not Cinebench. There is something wrong here.

sirio76 · Sep 30, 2021

@leman
Don’t know what’s happening there or if this is something specific to some CPU.
Some renderer can be more efficient than other for sure but overall render engine based benchmarks have been always good tests when comparing multithread performance. Benchmarks like Cinebench, Corona, Vray benchmark etc are based on a real render tasks and in every test I see all the core running @ 100%, like when rendering a 3D scene in real life. As said I’m not sure about the AS but on Intel/AMD based systems you can download the benchmarks and see by yourself

crazy dave · Sep 30, 2021

sirio76 said:
@leman
Don’t know what’s happening there or if this is something specific to some CPU.
Some renderer can be more efficient than other for sure but overall render engine based benchmarks have been always good tests when comparing multithread performance. Benchmarks like Cinebench, Corona, Vray benchmark etc are based on a real render tasks and in every test I see all the core running @ 100%, like when rendering a 3D scene in real life. As said I’m not sure about the AS but on Intel/AMD based systems you can download the benchmarks and see by yourself

What we’re talking about is a little different and apply to x86 systems as well (@leman linked to those results in his posts). The core running at “100%” isn’t actually a good metric for the utilization of the core’s resources, power draw is literally the metric of how much energy a core is spending executing a task and one can see that Cinebench doesn’t cause cores to draw the kind of power that other workloads do - that’s true for x86 as well as AS.

GubbyMan · Sep 30, 2021

jmho said:
Cinebench is raytracing a real world scene, some rays won't bounce at all and others will bounce many many times all going through a bounding volume hierarchy that probably isn't balanced. There is going to be a massive amount of branching and cache hits and misses that it would be incredibly difficult to load-balance it to utilise all cores all of the time.

SPEC does far simpler computations repeatedly with probably very little branching which is always going to get far better core utilization.

Cinebench is probably far more representative of actual real-world performance (because very few real-world tasks are able to fully saturate all cores all the time), but it's not going to be useful for measuring power draw and other things where you want to test the CPU going all out.

I would say the problem with Cinebench is that it's a single repetitive task. Traversing BVHs and performing ray intersections over and over shows how well a CPU does on these particular tasks. The fact that it doesn't draw as much power as possible is not a fault in my opinion as it shows how good the architecture is at executing code without wasting clock cycles. So yes it shows real-world performance but the CPU might perform worse at other, different use cases. I would rather trust a benchmark that cycles through many different tasks.

leman · Sep 30, 2021

sirio76 said:
@leman
Don’t know what’s happening there or if this is something specific to some CPU.
Some renderer can be more efficient than other for sure but overall render engine based benchmarks have been always good tests when comparing multithread performance. Benchmarks like Cinebench, Corona, Vray benchmark etc are based on a real render tasks and in every test I see all the core running @ 100%, like when rendering a 3D scene in real life. As said I’m not sure about the AS but on Intel/AMD based systems you can download the benchmarks and see by yourself

These benchmarks have always been popular, but it doesn’t mean that they are good. And it’s only the last year or so that experts started voicing concerns regarding reliability and relevance of Cinebench.

To be clear: I don’t think that Cinebench is useless. It definitely represents a specific class of workloads and it’s a way to compare how different hardware runs this workload. It’s also useful to study throttling behavior and as a limited stress tool. But you should not use Cinebench as a main predictor of CPU performanc.

Quick note: 100% CPU utilization is not what you think it is. It’s just the relative time that the CPU spends running a user thread, as reported by the OS. It has no relation to how well the CPU core is actually utilized.

leman · Sep 30, 2021

GubbyMan said:
I would say the problem with Cinebench is that it's a single repetitive task. Traversing BVHs and performing ray intersections over and over shows how well a CPU does on these particular tasks. The fact that it doesn't draw as much power as possible is not a fault in my opinion as it shows how good the architecture is at executing code without wasting clock cycles. So yes it shows real-world performance but the CPU might perform worse at other, different use cases. I would rather trust a benchmark that cycles through many different tasks.

I agree. There is a related question though: is it possible to code BVH traversal in a way that results in higher CPU utilization? I would be surprised if there it none. It all depends on how Cinebench does these things.

The bottomline is that Cinebench does not measure how fast a CPU can do stuff. It measures how fast a CPU can run Cinebench. There are benchmarks however that indeed measure how fast a CPU can do stuff. One just needs to be very aware of what one is doing.

crazy dave · Sep 30, 2021

jmho said:
Cinebench is raytracing a real world scene, some rays won't bounce at all and others will bounce many many times all going through a bounding volume hierarchy that probably isn't balanced. There is going to be a massive amount of branching and cache hits and misses that it would be incredibly difficult to load-balance it to utilise all cores all of the time.

SPEC does far simpler computations repeatedly with probably very little branching which is always going to get far better core utilization.

Cinebench is probably far more representative of actual real-world performance (because very few real-world tasks are able to fully saturate all cores all the time), but it's not going to be useful for measuring power draw and other things where you want to test the CPU going all out.

It depends … SPEC is a collection of different tasks (including raytracing) and has a bunch of different workloads that stress different parts of the CPU. It would be a mistake to say that Cinebench is necessarily more indicative of real world performance than SPEC but it isn’t necessarily wrong either - it depends on the context: what workload you actually care about, what test it is, and the correlation between the two.

sirio76 · Sep 30, 2021

crazy dave said:
What we’re talking about is a little different and apply to x86 systems as well (@leman linked to those results in his posts). The core running at “100%” isn’t actually a good metric for the utilization of the core’s resources, power draw is literally the metric of how much energy a core is spending executing a task and one can see that Cinebench doesn’t cause cores to draw the kind of power that other workloads do - that’s true for x86 as well as AS.

Personally when I compare system performance I do check how fast my CPU will take to complete the job. I live in a real world and I do care about actual performance using render engine, compositing software, photogrammetry software etc (all maximize CPU usage at some point of the workflow), it’s totally possible that some tasks may be more efficient than other when running at full speed, but I do not care if a synthetic benchmark consume more power, I do care about completing the job on my software

New cores (A15)

macrumors 601

macrumors 6502a

Suspended

macrumors 6502

macrumors 6502a

Suspended

macrumors 6502a

Suspended

macrumors Core

macrumors Core

macrumors 68030

macrumors 68000

macrumors 65816

macrumors 68000

macrumors 6502a

macrumors Core

macrumors 6502a

macrumors Core

macrumors 6502a

macrumors 68000

macrumors 6502

macrumors Core

macrumors Core

macrumors 68000

macrumors 6502a

Our Staff