New cores (A15)

jmho · Sep 30, 2021

crazy dave said:
It depends … SPEC is a collection of different tasks (including raytracing) and has a bunch of different workloads that stress different parts of the CPU. It would be a mistake to say that Cinebench is necessarily more indicative of real world performance than SPEC but it isn’t necessarily wrong either - it depends on the context: what workload you actually care about, what test it is, and the correlation between the two.

Yeah, you're correct. I thought SPEC was much simpler. Looks like they're using POVRay to actually raytrace a chessboard.

It's still very likely to have a lot more cache coherence and less branching given that it's going to be a simpler engine with a simpler and more balanced scene with fewer materials.

crazy dave · Sep 30, 2021

sirio76 said:
Personally, when I compare system performance I check how long my CPU takes to complete the job. I live in the real world and I care about actual performance in render engines, compositing software, photogrammetry software etc. (all of which max out the CPU at some point in the workflow). It’s entirely possible that some tasks are more efficient than others when running at full speed, but I do not care whether a synthetic benchmark consumes more power; I care about completing the job in my software.

This is orthogonal to the discussion at hand.

The original comment I responded to mentioned that Geekbench doesn’t stress the CPU that much. I responded that’s not actually true, but that Cinebench *is* an example of a benchmark that doesn’t stress the CPU. You responded to me saying that wasn’t true either even though it actually is true. Cinebench is not terribly stressful for the CPU - any CPU. That doesn’t make Cinebench necessarily bad, it depends on what you want to know.

All benchmarks are - at some level - synthetic and yes you should care most about those that reflect your own workloads as closely as possible. (Btw SPEC has workloads for everything you mentioned - eg one of the tests is just running blender).

crazy dave · Sep 30, 2021

jmho said:
Yeah, you're correct. I thought SPEC was much simpler. Looks like they're using POVRay to actually raytrace a chessboard.

It's still very likely to have a lot more cache coherence and less branching given that it's going to be a simpler engine with a simpler and more balanced scene with fewer materials.

Absolutely I can believe that and povray has much higher core utilization - it really pushes the CPU. How much is the scene and how much is differences between the engines is beyond my ken.

jmho · Sep 30, 2021

leman said:
I agree. There is a related question though: is it possible to code BVH traversal in a way that results in higher CPU utilization? I would be surprised if there is none. It all depends on how Cinebench does these things.

You'd probably get far higher CPU utilisation just dropping the BVH and brute forcing through a giant contiguous array of bounding boxes. (But obviously worse real-world performance)

The main problem with a BVH is that you're jumping all over memory and doing a single simple calculation and then jumping again, which makes filling the cache very difficult.
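To make the trade-off concrete, here is a minimal sketch of both approaches. It is not how Cinebench or POV-Ray work; the slab test, the median-split builder, the dict-based nodes and all names are assumptions made for illustration. The flat scan tests every box in storage order, while the BVH hops from node to node and does one small test at each stop.

```python
import random

def ray_hits_box(origin, inv_dir, lo, hi):
    """Slab test against the axis-aligned box [lo, hi].
    Assumes every direction component is non-zero (inv_dir = 1 / direction)."""
    t_near, t_far = 0.0, float("inf")
    for axis in range(3):
        t0 = (lo[axis] - origin[axis]) * inv_dir[axis]
        t1 = (hi[axis] - origin[axis]) * inv_dir[axis]
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
    return t_near <= t_far

def brute_force(boxes, origin, inv_dir):
    """Test every box in order: lots of work, but a predictable linear walk."""
    return [i for i, (lo, hi) in enumerate(boxes)
            if ray_hits_box(origin, inv_dir, lo, hi)]

def build_bvh(boxes, indices):
    """Hypothetical builder: split at the median along the widest axis."""
    lo = tuple(min(boxes[i][0][a] for i in indices) for a in range(3))
    hi = tuple(max(boxes[i][1][a] for i in indices) for a in range(3))
    if len(indices) <= 2:
        return {"lo": lo, "hi": hi, "leaf": indices}
    axis = max(range(3), key=lambda a: hi[a] - lo[a])
    ordered = sorted(indices, key=lambda i: boxes[i][0][axis])
    mid = len(ordered) // 2
    return {"lo": lo, "hi": hi,
            "left": build_bvh(boxes, ordered[:mid]),
            "right": build_bvh(boxes, ordered[mid:])}

def traverse(root, boxes, origin, inv_dir):
    """Stack-based traversal: one cheap test per node, then a jump to a child."""
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        if not ray_hits_box(origin, inv_dir, node["lo"], node["hi"]):
            continue  # the whole subtree is skipped
        if "leaf" in node:
            hits += [i for i in node["leaf"]
                     if ray_hits_box(origin, inv_dir, *boxes[i])]
        else:
            stack += [node["left"], node["right"]]
    return hits

# Made-up scene: 256 unit boxes scattered in a 10 x 10 x 10 volume.
random.seed(0)
boxes = []
for _ in range(256):
    c = [random.uniform(0.0, 10.0) for _ in range(3)]
    boxes.append((tuple(v - 0.5 for v in c), tuple(v + 0.5 for v in c)))

origin = (-1.0, -1.0, -1.0)
direction = (1.0, 0.9, 1.1)  # all components non-zero
inv_dir = tuple(1.0 / d for d in direction)

flat_hits = brute_force(boxes, origin, inv_dir)
root = build_bvh(boxes, list(range(len(boxes))))
bvh_hits = sorted(traverse(root, boxes, origin, inv_dir))
```

Both paths find the same boxes; the difference is only in the access pattern. Python hides the actual memory layout, so this shows the control flow rather than the cache behaviour.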

leman · Sep 30, 2021

jmho said:
You'd probably get far higher CPU utilisation just dropping the BVH and brute forcing through a giant contiguous array of bounding boxes. (But obviously worse real-world performance)

The main problem with a BVH is that you're jumping all over memory and doing a single simple calculation and then jumping again, which makes filling the cache very difficult.

You don’t have to use a naive binary tree for your BVH. There are more complex data structures that allow you to fetch and test entire cache lines in parallel. It’s an active area of research, and there are algorithms that allow efficient BVH traversal on GPUs with SIMD width as high as 32.

sirio76 · Sep 30, 2021

crazy dave said:
Absolutely I can believe that and povray has much higher core utilization - it really pushes the CPU. How much is the scene and how much is differences between the engines is beyond my ken.

The difference is that povray is not a renderer used in the real world, while the Cinema4D renderer is.

Theoretical performance is one thing; what you get in real life is a different story, and I need to rely on what I work with.

leman · Sep 30, 2021

sirio76 said:
The difference is that povray is not a renderer used in the real world, while the Cinema4D renderer is. Theoretical performance is one thing; what you get in real life is a different story, and I need to rely on what I work with.

You are entirely correct! In fact, all these synthetic benchmarks are quite pointless when it comes to choosing a computer for real work; one should always test using one's own workflows.

But much of this thread is about the question "how fast are these new cores as far as CPU cores go", and tests like Cinebench simply do not give a full answer.

crazy dave · Sep 30, 2021

sirio76 said:
The difference is that povray is not a renderer used in the real world, while the Cinema4D renderer is. Theoretical performance is one thing; what you get in real life is a different story, and I need to rely on what I work with.

Sorry I’m not necessarily assigning a moral value to how far a benchmark pushes the cpu - a power virus will push a cpu core to its max but you wouldn’t benchmark a CPU’s computational performance using a power virus.

I keep trying to say that yes, I agree: it depends on your workflow and on which benchmarks correlate best with it (near 1-1, one would hope, if it is the same program). Not every workflow has great benchmarks for it, and some workflows are quite varied. That’s the reason why SPEC and GB use a large array of different tests to stress the CPU in different ways, using a mixture of low level and real world applications. For instance, you mention renderers: SPEC uses blender and povray, which in the real world can be used together, but SPEC tests them separately to get a more fine grained measure of the performance of each. This is also why the top level numbers for each (SPEC and GB), the weighted averages, while easy to digest, are not actually as informative (even in a comparative sense to other CPUs) as the individual scores of each sub-test. And even then, while both SPEC and GB endeavor to make their various sub-tests applicable to as wide a range of workloads as possible, that breadth invariably means that more targeted benchmarks, if they exist, will likely be better for you.

Edit: it should be noted that CPU designers like @cmaier have said they generally don’t use any of these benchmarks (except maybe SPEC) to judge the designs of their cores during development
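As a toy illustration of why a top level number can hide what the sub-tests show, here is a sketch with entirely made-up scores and weights. The sub-test names are hypothetical, and a plain weighted arithmetic mean is assumed for simplicity; the real suites aggregate their results in their own ways.

```python
def composite(scores, weights):
    """Weighted arithmetic mean of sub-test scores (assumed aggregation)."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight

# Hypothetical sub-tests and scores, invented purely for illustration.
weights = {"render": 1.0, "compile": 1.0, "compress": 1.0}
cpu_a = {"render": 9.0, "compile": 6.0, "compress": 6.0}
cpu_b = {"render": 6.0, "compile": 9.0, "compress": 6.0}

composite_a = composite(cpu_a, weights)
composite_b = composite(cpu_b, weights)

# Per-test ratio of CPU A to CPU B: this is where the two chips differ.
per_test_ratio = {name: cpu_a[name] / cpu_b[name] for name in cpu_a}

print(composite_a, composite_b)
print(per_test_ratio)
```

The two invented CPUs tie on the composite, yet someone who only renders would see a large gap in one direction and someone who only compiles would see the same gap in the other.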

jmho · Oct 3, 2021

I just spent the evening coding with the new A15 shuffle and fill instructions because I was curious as to how much faster they are.

I made a simple app that takes a 5k wallpaper and performs a 5x5 gaussian blur on it. Using the normal 25 samples per pixel method it takes between ~10-18 milliseconds to process the image and using shuffle and fill it takes only ~5-12 ms.

A really nice speed boost for post-processing / image-processing workflows (but obviously only if devs actually make specific shaders for the A15 and later)
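For anyone unfamiliar with the "25 samples per pixel" baseline, here is a rough CPU-side sketch of it in Python. It is not the shader from the app above and it does not use shuffle and fill at all; the binomial weights, the clamped edges and the function names are assumptions made for illustration.

```python
def gaussian_kernel_5x5():
    """5x5 kernel from the binomial row [1, 4, 6, 4, 1], normalised to sum to 1."""
    row = [1, 4, 6, 4, 1]
    return [[a * b / 256 for b in row] for a in row]

def blur_5x5(image):
    """Naive blur of a 2D grayscale image (list of rows of floats).
    Every output pixel reads all 25 neighbours; edges are clamped."""
    height, width = len(image), len(image[0])
    kernel = gaussian_kernel_5x5()
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for ky in range(5):
                for kx in range(5):
                    sy = min(max(y + ky - 2, 0), height - 1)
                    sx = min(max(x + kx - 2, 0), width - 1)
                    total += kernel[ky][kx] * image[sy][sx]
            out[y][x] = total
    return out

# Tiny made-up input: a single bright pixel in the middle of a 5x5 image.
impulse = [[0.0] * 5 for _ in range(5)]
impulse[2][2] = 1.0
blurred = blur_5x5(impulse)
```

Neighbouring output pixels read almost the same 25 inputs, so most of those reads are redundant. That overlap is the kind of thing a data-sharing approach can exploit.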

Buntschwalbe · Oct 4, 2021

There's now an article from Anandtech about the A15:

The Apple A15 SoC Performance Review: Faster & More Efficient

www.anandtech.com

I'm interested in your comments and knowledge...

leman · Oct 4, 2021

I’d say, exactly as expected. They decided to further improve energy efficiency rather than focusing on performance alone, which was the right decision IMO. The reworked cache means that M2 is going to be an absolute beast. Hopefully this means good things for the prosumer silicon as well.

Lack of back-end or other architectural improvements can be seen as a disappointment, but then again, we were getting a bit spoiled. I still think that the cache, GPU and efficiency upgrades are a very solid year-to-year improvement. So far, I don’t really see any signs that Apple’s chip design is slowing down.

EntropyQ3 · Oct 4, 2021

Excellent data from Andrei, as usual.
The A15 turns in a better than expected overall picture. I can't help wondering to what extent Apple will scale the size of the caches with the number of performance cores in the proverbial M1X.

leman · Oct 4, 2021

By the way, I just had a look at the last couple of generations of Apple Silicon and I think there is a clear pattern emerging starting from A12. The A12 seems in many ways to be the "modern Apple SoC", featuring a substantial redesign and improvements in all areas. A13 brought mostly cache improvements (as well as adopted the die layout Apple is still using). A14 had new CPU execution engines. And now A15 has new cache as well as improved E-cores.

This leads me to the following observations:

- the odd A- releases focus on cache, while the even A- releases focus on CPU internals
- every single A- release has significant changes in some of its internals (if it's not the CPU then it's the GPU or the video pipeline or the NPU)
- Apple is still on track to deliver consistent performance improvements; this year they are just going for a mix of improved performance and decreased power consumption rather than a pure 20% performance boost as before

There were some claims that Apple's momentum is slowing down after some of its engineers have left. Frankly, I don't see it. It is clear that Apple is on some sort of two-year cadence here (with odd and even releases focusing on different big things), but we still get more progress every year than an average x86 CPU arch goes through in four or five. The improvements of A15 over A14 are at least comparable in scale to A13 over A12. Sure, the P-core execution engine didn't seem to change at all, but we got larger caches all the way, significantly faster E-cores, faster NPU, a new video pipeline, significant GPU tweaks and probably other things.

jdb8167 · Oct 4, 2021

Buntschwalbe said:
There's now an article from Anandtech about the A15:

The Apple A15 SoC Performance Review: Faster & More Efficient

www.anandtech.com

I'm interested in your comments and knowledge...

What I got from that nice detailed analysis is that Apple doesn’t design targeting consumer benchmarks. Outside of Wildlife Extreme the usual benchmarks don’t show much improvement but Anandtech’s partial SPEC benchmarks and power analysis show good improvements. How many other phone/mobile CPU makers would do that?

The GPU improvements also bode well for the upcoming pro MacBooks.

Colstan · Oct 4, 2021

I found this particular tidbit notable:

On an adjacent note, with a score of 7.28 in the integer suite, Apple’s A15 P-core is on equal footing with AMD’s Zen3-based Ryzen 5950X with a score of 7.29, and ahead of M1 with a score of 6.66.

I have to wonder if Apple is so far ahead with P-Core, that they felt the need to concentrate on other areas. This bodes well for future Apple Silicon on Mac. It'll be interesting where they take the M1X/M2 that are expected to be announced later this month.

Also:

In the end, it seems like Apple’s SoC team has executed well after all.

From Andrei's review, it sounds like @cmaier is correct that his old colleagues are still doing a bang up job at Apple, and that the talent loss to Nuvia (or other companies) has been widely overblown.

crazy dave · Oct 4, 2021

Colstan said:
I found this particular tidbit notable:

I have to wonder if Apple is so far ahead with P-Core, that they felt the need to concentrate on other areas. This bodes well for future Apple Silicon on Mac. It'll be interesting where they take the M1X/M2 that are expected to be announced later this month.

Also:

From Andrei's review, it sounds like @cmaier is correct that his old colleagues are still doing a bang up job at Apple, and that the talent loss to Nuvia (or other companies) has been widely overblown.

In the comments he posts the other averages too:

Comparative subsets would be 5950X 7.29 int / 9.79 fp, 11900K 6.61 int / 9.58 fp, versus 7.28 / 10.15 on A15.

The reason he gives for not including the AMD/Intel results is that the efficiency differences are so big that they would skew the charts and make them hard to read.

As an aside, I like his new bubble charts.

cmaier · Oct 4, 2021

Colstan said:
I found this particular tidbit notable:

I have to wonder if Apple is so far ahead with P-Core, that they felt the need to concentrate on other areas. This bodes well for future Apple Silicon on Mac. It'll be interesting where they take the M1X/M2 that are expected to be announced later this month.

Also:

From Andrei's review, it sounds like @cmaier is correct that his old colleagues are still doing a bang up job at Apple, and that the talent loss to Nuvia (or other companies) has been widely overblown.

yep, the performance improvement was undersold, and the efficiency improvement is fantastic.

I wonder if Apple will, at some point, go to three tiers of cores. Would be interesting to profile some workloads and figure out if there’s even more efficiency to be gained somehow in the future.

crazy dave · Oct 4, 2021

cmaier said:
yep, the performance improvement was undersold, and the efficiency improvement is fantastic.

I wonder if Apple will, at some point, go to three tiers of cores. Would be interesting to profile some workloads and figure out if there’s even more efficiency to be gained somehow in the future.

Yeah this is what I was hoping for but I have to admit on the efficiency front it seemed dire with that one user’s results. I wonder what they did wrong in their measurements … because those initial not-done-by-Anandtech results showed no improvement on that front. Thankfully not the case.

cmaier · Oct 4, 2021

crazy dave said:
Yeah this is what I was hoping for but I have to admit on the efficiency front it seemed dire with that one user’s results. I wonder what they did wrong in their measurements … because those initial not-done-by-Anandtech results showed no improvement on that front. Thankfully not the case.

Hard to say, but it’s always been the case that cpu designers are looking at different benchmarks - and a lot more of them - than any particular internet poster or publication is likely to test. Not surprising that every once in a while there is an outlier measurement.

Serban55 · Oct 4, 2021

leman said:
There were some claims that Apple's momentum is slowing down after some of its engineers have left.

Agree. Only those who don't know how a hardware business works say that.
Again, until the SoC brain leaves Apple, there are no worries at all. Johny Srouji knows what engineers he needs on his long silicon projects. When Johny Srouji leaves, then I would worry. Everywhere he has been, that company was on top of its game.

cmaier · Oct 4, 2021

Serban55 said:
Agree. Only those who don't know how a hardware business works say that.
Again, until the SoC brain leaves Apple, there are no worries at all. Johny Srouji knows what engineers he needs on his long silicon projects. When Johny Srouji leaves, then I would worry. Everywhere he has been, that company was on top of its game.

Even if he leaves, I wouldn’t worry. Plenty of people ready to step up.

cmaier · Oct 4, 2021

Colstan said:
I found this particular tidbit notable:

I have to wonder if Apple is so far ahead with P-Core, that they felt the need to concentrate on other areas. This bodes well for future Apple Silicon on Mac. It'll be interesting where they take the M1X/M2 that are expected to be announced later this month.

Also:

From Andrei's review, it sounds like @cmaier is correct that his old colleagues are still doing a bang up job at Apple, and that the talent loss to Nuvia (or other companies) has been widely overblown.

BTW, I’m being stalked on LinkedIn by folks from that other company - I keep getting weekly notices that they are looking at my profile.

Serban55 · Oct 4, 2021

cmaier said:
Even if he leaves, I wouldn’t worry. Plenty of people ready to step up.

Yes, of course, Apple will not shut down their silicon segment, so someone has to step up. But wherever Srouji goes next, that company will be on top afterwards. Then again, that's a lot of ifs; maybe he will end his career at Apple.
I love this guy and his passion for silicon, from his time at Intel more than a decade ago through his time at Apple. It's impressive.

crazy dave · Oct 4, 2021

cmaier said:
Hard to say, but it’s always been the case that cpu designers are looking at different benchmarks - and a lot more of them - than any particular internet poster or publication is likely to test. Not surprising that every once in a while there is an outlier measurement.

Given the stark difference between a 12-20% P/W improvement and a 25-33% P/J improvement vs … 0 … I’m going with that person not normalizing something correctly like accounting for differences in display power.

Edit: actually going back it’s worse … the 0% difference was IPC (which people seem to generally agree on), P/W was measured to be worse on the new A15 chips. Something is wrong somewhere.
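For reference, here is one way the P/W and P/J figures can relate, as a sketch only. It assumes performance is 1 / runtime for a fixed workload, P/W is performance divided by average power, and P/J is performance divided by the total energy of the run. The runtimes and wattages are made-up inputs, not measurements from any review.

```python
def efficiency_gains(old_runtime_s, old_power_w, new_runtime_s, new_power_w):
    """Relative gains (new / old) for a fixed workload.
    Performance is taken as 1 / runtime; energy is average power * runtime."""
    perf_gain = old_runtime_s / new_runtime_s
    # perf/W = (1 / runtime) / power
    perf_per_watt_gain = perf_gain * old_power_w / new_power_w
    # perf/J = (1 / runtime) / (power * runtime)
    old_energy_j = old_power_w * old_runtime_s
    new_energy_j = new_power_w * new_runtime_s
    perf_per_joule_gain = perf_gain * old_energy_j / new_energy_j
    return perf_gain, perf_per_watt_gain, perf_per_joule_gain

# Hypothetical inputs: the new chip finishes sooner and draws slightly less power.
perf, ppw, ppj = efficiency_gains(
    old_runtime_s=100.0, old_power_w=4.0,
    new_runtime_s=90.0, new_power_w=3.8,
)
print(f"perf x{perf:.3f}, perf/W x{ppw:.3f}, perf/J x{ppj:.3f}")
```

Under these assumptions the P/J gain is the P/W gain multiplied by the performance gain, because a faster chip also spends less time drawing power. That is why a P/J figure comes out larger than the P/W figure whenever performance goes up.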

leman · Oct 4, 2021

crazy dave said:
Given the stark difference between a 12-20% P/W improvement and a 25-33% P/J improvement vs … 0 … I’m going with that person not normalizing something correctly like accounting for differences in display power.

Edit: actually going back it’s worse … the 0% difference was IPC (which people seem to generally agree on), P/W was measured to be worse on the new A15 chips. Something is wrong somewhere.

Andrei's review is very clear that P/W on A15 is significantly better. There was one earlier review that claimed the opposite, but frankly, I think I will trust an established expert who has been doing in-depth reviews of Apple chips for years (and has the methodology).
