You think adding a GPU core was more power efficient than bumping the GPU clock up 10-20%?
That would be the obvious conclusion, no?
I guess so; they have increased CPU clocks instead of adding (P) cores. I just think it is interesting that adding more GPU cores is cheaper than increasing clocks. Does Apple have differing clock domains like AMD does for the frontend and shaders in their GPU, or is it clocked monolithically like Nvidia (which used to have separate frontend/shader clock domains)?
Perhaps the GPU cores scale better with core count than with frequency? They were initially designed for mobile power consumption.
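For a rough sense of why wider-but-slower tends to win: dynamic power scales roughly with capacitance x V^2 x f, and voltage generally has to rise along with frequency. A toy Swift comparison under those textbook assumptions (the 20% figures and the "voltage tracks frequency" scaling are illustrative, not Apple-specific data):

    // Crude dynamic-power model: P ~ C * V^2 * f, with the simplifying
    // assumption that supply voltage scales in proportion to frequency.
    func relativePower(clockScale f: Double, coreScale n: Double) -> Double {
        let v = f                      // assumed: V tracks f
        return n * v * v * f
    }

    // +20% GPU cores at the same clock vs. +20% clock on the same cores.
    let wider  = (throughput: 1.2, power: relativePower(clockScale: 1.0, coreScale: 1.2))
    let faster = (throughput: 1.2, power: relativePower(clockScale: 1.2, coreScale: 1.0))
    print(wider)   // ~1.2x throughput for ~1.2x power
    print(faster)  // ~1.2x throughput for ~1.73x power

Under those crude assumptions, extra cores buy throughput at roughly linear power cost, while a clock bump of the same size costs far more, which would fit Apple adding a GPU core rather than raising clocks.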
> Regarding the GPU, I did a quick test and the A17 Pro delivers 2 TFLOPS of compute throughput. That gives us a GPU clock of 1.3 GHz, same as the A15.
As an aside, I seem to remember that the Mac Pro 2013 had the same 2 teraflops of compute with its D300 GPUs.
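For anyone checking the arithmetic behind that 1.3 GHz figure: assuming the A17 Pro's six GPU cores each have 128 FP32 ALUs and one FMA counts as 2 FLOPs (common estimates, not Apple-published numbers), the implied clock falls out directly:

    // 2 TFLOPS measured; FLOPs available per clock = cores * ALUs * 2 (FMA).
    let gpuCores = 6.0               // A17 Pro GPU core count
    let alusPerCore = 128.0          // assumption: FP32 lanes per core
    let flopsPerClock = gpuCores * alusPerCore * 2.0
    let measuredTFLOPS = 2.0
    let impliedClockGHz = measuredTFLOPS * 1000.0 / flopsPerClock
    print(impliedClockGHz)           // ~1.3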
> - 16 core NPU. It's not widely remarked upon how (very unlike the GPU or CPU) the NPU core count remains the same from A to Max. Kinda strange that, and not sure what it means. Maybe that, for now anyway, Apple see this as mainly about facilitating language UI, and there's only one person talking whether you have an iPhone or a Mac Studio?
The M1 Max seemed to have a second ANE on the bottom left that wasn't utilized, and it was seemingly omitted from the M2 Max. So maybe at first they were planning to scale the ANE by 2x for the Max and Ultra variants, but late in the M1 Max design process decided to use that die area for other features in later generations?
Following up on all this, it's remarkable how quickly we all settle into an idea that none of us imagined 3 years ago (Pro/Max/Ultra design) and start to assume it will be THE design going forward always...
> Regarding the GPU, I did a quick test and the A17 Pro delivers 2 TFLOPS of compute throughput. ...
May I know what app you used to test the TFLOPS?
> It's a simple compute shader I wrote.
I see. Can your program see the number of ALUs and the amount of RAM?
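That shader isn't shared in the thread, so purely as an illustrative sketch (the kernel body, thread count, and iteration count are assumptions, not the poster's actual code), a minimal FLOPS probe in Swift + Metal could look roughly like this:

    import Foundation
    import Metal

    // Kernel: each thread runs a long chain of FMAs so arithmetic, not memory
    // traffic, dominates the runtime. Results are written out so nothing is
    // optimized away.
    let source = """
    #include <metal_stdlib>
    using namespace metal;
    kernel void fmaLoop(device float *out [[buffer(0)]],
                        uint tid [[thread_position_in_grid]]) {
        float a = float(tid) * 1e-9f + 1.0f;
        float b = 1.0000001f;
        float c = 0.9999999f;
        for (uint i = 0; i < 4096; ++i) {
            a = fma(a, b, c);
            b = fma(b, c, a);
            c = fma(c, a, b);
        }
        out[tid] = a + b + c;
    }
    """

    let device = MTLCreateSystemDefaultDevice()!
    let library = try! device.makeLibrary(source: source, options: nil)
    let pipeline = try! device.makeComputePipelineState(function: library.makeFunction(name: "fmaLoop")!)
    let queue = device.makeCommandQueue()!

    let threadCount = 1 << 20
    let buffer = device.makeBuffer(length: threadCount * MemoryLayout<Float>.stride,
                                   options: .storageModeShared)!

    let commands = queue.makeCommandBuffer()!
    let encoder = commands.makeComputeCommandEncoder()!
    encoder.setComputePipelineState(pipeline)
    encoder.setBuffer(buffer, offset: 0, index: 0)
    encoder.dispatchThreads(MTLSize(width: threadCount, height: 1, depth: 1),
                            threadsPerThreadgroup: MTLSize(width: pipeline.threadExecutionWidth,
                                                           height: 1, depth: 1))
    encoder.endEncoding()
    commands.commit()
    commands.waitUntilCompleted()

    // 3 FMAs per iteration, 2 FLOPs per FMA, 4096 iterations per thread.
    let flops = Double(threadCount) * 4096.0 * 3.0 * 2.0
    let seconds = commands.gpuEndTime - commands.gpuStartTime
    print("~\(flops / seconds / 1e12) TFLOPS (rough sustained FMA rate)")

Note that a probe like this only measures throughput; it doesn't report how many ALUs or how much RAM the device has.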
> Now multiple mix and match options exist. At the low end just a standard Pro. In the mid-range, if I mainly want CPU, I can join two Pros together. If I mainly want GPU, I can join a Pro and a GPU. At the high end, I can get anything from four Pros to a Pro and three GPUs.
I think we've talked about this before. I wonder if this would actually work. You also say:
> You can raise the boring obvious complaint about geometry (how do you lay this out? where does the memory go?) but honestly that's just not interesting! There are at least two alternatives. One is on the back of the package (as is done with the A16, and maybe A17 - haven't seen any photos), the other is via the memory-spine design I have described on a few occasions.
Are you saying that instead of shoreline they'd use the back of the package for memory busses? I imagine that would make BSPD harder once they get there in a couple more nodes. But regardless, how do you distribute all those memory busses across the chiplets without spending a ton of energy (and delay) shipping all those bits constantly between the chiplets?
I imagine it would come down to how local memory can be for GPU work patterns, and I don't know enough about GPU code to have an opinion.
The memory-spine thing is fascinating. I don't think it fully eliminates the problems of possible nonlocality though, from a performance/efficiency standpoint at least (it obviates NUMA issues, if I remember right).
> Maybe you're right. Notebookcheck mentioned this behavior in an article, and this Reddit post said the same; I should have verified it further. It seemed plausible to me because 4 E-cores should have (roughly) a multicore performance score in that ballpark.
No, I have to verify my hypothesis as well; your hypothesis sounds just as plausible as mine, if not more so. I'll comment again if I have any more evidence. FWIW, I saw Geekerwan say the A17 changes the strategy of the low power mode, turning off the P-cores, which might explain the much lower power/performance point for the low power mode compared with older chips in their charts, if true.
> I think we've talked about this before. ...
(a) I referred to the A16. Don't you think I made that reference for a REASON?
Right, but I was thinking more about nonlocality. Basically the design you're describing (1 Mx + 3 GPU chiplets) is similar to the Zen 1 generation Epyc chips: each chiplet has some memory busses, and that gives you NUMA. AMD chose to move memory to the I/O chiplet in succeeding generations to avoid the problems that presented. Sure, Apple may be smarter, but those definitely are issues that would need to be addressed. And what I was asking was: if you had that NUMA architecture with GPU chiplets, would that likely be a bigger or smaller issue than NUMA is with CPU chiplets? That is, is the GPU workload more or less sensitive to NUMA? I have no feel for that at all.
It's certainly possible that mounting memory this way results in lower energy transfers than mounting it on the side. The details would depend on things like the substrate (unclear whether it is silicon, "real" glass, standard thin ABF, or something else altogether) and the capacitance of the TSVs. But the fact that it was adopted for the A16 certainly suggests both that it works (the technology is reliable) and that it is low power.
Link: "Analyzing the A16 Bionic, the heart of the iPhone 14 Pro" (eetimes.itmedia.co.jp) - an analysis of the A16 Bionic in the iPhone 14 Pro, released September 16, 2022; the A16 Bionic is used only in the iPhone 14 Pro.
BSPD only becomes an issue if you mount the memory on the BSPD side. There are two sides, so you don't have to do that...
> Right, but I was thinking more about nonlocality. ...
I feel like I am talking out of turn, but for GPU rasterization they tend to care more about bandwidth than latency. For GPGPU, I've read that whether it is latency-sensitive depends on the workload.
> Apple are willing to play along with this [gamer] fantasy to the extent that it doesn't get in the way of real customers, but no more than that. Which means they care a LOT about making sure the A17 works well when playing Candy Crush. But they are not going to increase weight, cost, or anything else, PURELY to make Genshin Impact behave differently.
They just did with hardware ray tracing on the A17 Pro. It's a significant multiyear investment, and it burns extra battery life compared to simply deactivating a graphics feature, which is just pure unnecessary eye candy for triple-A games. Apple is all about advancing their own platform in ways no one else can. Next to programming their own OS, designing their own chips is a major vector to achieve this uniqueness. And a gaming GPU is just one subplot in this chip-design endeavour. It doesn't even need to be that useful on a phone. The same GPU technology will go into iPads, MacBooks, Apple TVs, and iMacs, where eventually it will be properly cooled. It doesn't matter whether console gaming on a phone stays a fantasy for now. Maybe it will help push iPad sales, maybe it will take off after the next die shrink. The groundwork has been laid for others to build upon.
> They just did with hardware ray tracing on the A17 Pro. It's a significant multiyear investment ...
You can use that for DLSS, so "free" performance; and if Nvidia is going that route, it's because it is more efficient.
> So...
> ...???
> - M3 Max/Ultra/Extreme
> - High Power Mode
> - CPU @ 4.5GHz
> - GPU @ 2.0GHz
4.5 GHz is too big of a jump; I would say 4.2 max, probably 4.1. The actual M2 Max/Ultra is 3.7 GHz.
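For scale, and just as arithmetic on the 3.7 GHz figure quoted above, the proposed clocks would be roughly 11%, 14%, and 22% generational jumps:

    import Foundation

    let m2MaxClockGHz = 3.7                          // figure quoted above
    for candidate in [4.1, 4.2, 4.5] {
        let jump = (candidate / m2MaxClockGHz - 1.0) * 100.0
        print(String(format: "%.1f GHz is a ~%.0f%% jump over 3.7 GHz", candidate, jump))
    }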
> They just did with hardware ray tracing on the A17 Pro. It's a significant multiyear investment, and it burns extra battery life compared to simply deactivating a graphics feature, which is just pure unnecessary eye candy for triple-A games. ...
The fact that you think ray tracing exists as "pure unnecessary eye candy for triple-A games" says volumes about how little you know of where Apple is going.
> Ray tracing is an essential element of a variety of graphics algorithms that are relevant to AR, for example casting realistic shadows... I suspect that is vastly more important in Apple's calculations than game eye-candy, which is just a minor side benefit.
I think you're *probably* right about this. But there are a few counter-indications that Apple may finally be thinking about someday, maybe, getting serious about gaming. I wouldn't bet on it, but it no longer seems entirely insane to think so.