"Apple silicon magic strikes again."

I think you need to cool down and understand the context of the discussion.
"On the subject at hand, I think LPDDR5 is unlikely for the M1X."
I think you'll be right in this case. The M1/M1X probably only has an LPDDR4X memory controller built in; we'll probably have to wait for the next-gen M-series SoCs for a DDR5/LPDDR5 memory controller.
Most likely, the M1X, if it comes out, will use LPDDR4X with more channels. It really depends on whether Apple thinks that will be enough to beat the performance of the models they are replacing with the M1X, so they can claim 2-3x performance increases, especially in the GPU department.
"HUrRY uP IwAnT 16 inCh M2 nOW!!"

This but unironically, it's like being blueballed.
So what's more likely: that Apple preserves the UMA and uses the same RAM for the CPU and GPU, or that they go back to a traditional RAM arrangement for their higher-end machines?
I think the main reason we don't see LPDDR5 in Apple products yet is that supply is still very limited. The controller itself probably supports it. I believe there is some hope that we will see LPDDR5 in higher-end, lower-volume Macs.
Maybe, but even with that logic, I just don't see the M1X Macs as having it. The rumored models getting the M1X are the Mac mini, the 14/16-inch MacBook Pro, and the low-end iMac, and these are still among the higher-selling models. By the time Apple gets to the really low-volume Macs, the SoC will probably be the M2 generation, and hopefully LPDDR5 will be standard across Apple Silicon. But who knows? I'll be happy to be wrong about it coming earlier.
I don't think the mini will get an M1X. Traditionally, the mini has been a parts-bin Mac, and it will live off the excess M1s and LPDDR4X until the supply runs out or demand begins to wane. Then, when the Air is ready for the next low-power Apple Silicon chip, the mini will also get it.
Would it be possible to have 16/32GB of on-chip UMA and have it supplemented with a DIMM-style memory pool? Effectively making the on-chip memory a super cache.
Whether the rumoured M1X would get LPDDR5 would depend on just how cheap Apple would want to go.
The rumoured configuration of 8 performance CPU cores and twice as many GPU cores as the M1 suggests that an M1X would need twice the bandwidth for performance to scale with computational capabilities vis-à-vis the M1.
I can see four cheap scenarios (rough bandwidth arithmetic sketched after the list):
* 128-bit LPDDR4X, same as the M1. This is a possibility, but it would suck.
* 128-bit LPDDR5. This would offer a 50% (in some scenarios better) improvement in bandwidth over the M1. It would be as cheap to implement as the M1 solution, and could offer up to 32GB if I read Samsung correctly. Max performance would be compromised relative to linear scaling.
* 256-bit LPDDR4X, twice the M1. Straightforward on all fronts, and a pretty reasonable configuration. Using the same parts as other devices probably makes procurement easier/cheaper, not that these devices should be very sensitive to such concerns; we are talking small money here.
* 256-bit LPDDR5, three times the M1 nominally, and better under certain conditions. This would allow superlinear scaling with computational resources vs. the M1, and provide a great hike in performance vs. the M1 with minimum expenditure of 5nm SoC area. Would also provide a wider range of memory configurations.
These are all cheap, minimum-engineering-effort variations of what Apple is already doing, and thus trivial to predict. So if any of this shows up in a rumour, it doesn't lend any credence to that rumour, much as just doubling the performance cores and GPU cores is a trivial extrapolation of the M1 that any internet bot could come up with. That doesn't mean it's wrong, but there is nothing there to suggest anything other than speculation as a source.
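For what it's worth, here's the rough peak-bandwidth arithmetic behind those four options, as a sketch (LPDDR4X-4266 is what the M1 actually uses; the LPDDR5-6400 speed grade is my assumption):

```swift
import Foundation

// Rough peak-bandwidth arithmetic for the four scenarios.
// Peak GB/s = (bus width in bits / 8) * data rate in GT/s.
let scenarios: [(name: String, busBits: Double, gtps: Double)] = [
    ("128-bit LPDDR4X", 128, 4.266),  // same as M1
    ("128-bit LPDDR5",  128, 6.4),    // assumed LPDDR5-6400 speed grade
    ("256-bit LPDDR4X", 256, 4.266),
    ("256-bit LPDDR5",  256, 6.4),
]
let m1 = 128.0 / 8.0 * 4.266  // ~68 GB/s baseline

for s in scenarios {
    let gbps = s.busBits / 8.0 * s.gtps
    print(String(format: "%@: %3.0f GB/s (%.1fx M1)", s.name, gbps, gbps / m1))
}
// 128-bit LPDDR4X:  68 GB/s (1.0x M1)
// 128-bit LPDDR5:  102 GB/s (1.5x M1)
// 256-bit LPDDR4X: 137 GB/s (2.0x M1)
// 256-bit LPDDR5:  205 GB/s (3.0x M1)
```

The outputs line up with the 50%, 2x and 3x figures in the list above.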
I'm kind of thinking option 3, hoping not option 1.
For what it's worth (potentially very little), the M1X "leak" on CPU Monkey said it was LPDDR4X memory.
I waited to see if the thread would generate some more traffic. Honestly, I think we have exhausted the subject in the absence of new information, but I wanted to state somewhere that I'd really like to see a tile-based deferred renderer with some serious hardware grunt, and the backing and support of a strong company such as Apple. Preferably priced within the reach of mere mortals (such as me), so that it benefits as many people as possible.
I’ll pay for it out of pure technical curiosity, and I’ll pay for games that are actually coded targeting the architecture in preference to buying them on other platforms available to me.
If anyone working at Apple who feels the same reads this, know that there are people on the sidelines cheering you on!
At this point we've all seen the Bloomberg report about future ASi chips (refresher) which gets into a bit of detail about the core count variations of both the CPU and GPU that Apple is testing. But what do we speculate about other aspects of it? Will there be clock speed increases? Ray tracing hardware? Will the neural engine be the same across all? What kind of RAM will they use? How will the chips be packaged?
We are still talking about an M1/A14 variant, so I wouldn't expect any microarchitectural changes. The memory bus will be doubled to 256-bit, that's almost certain, with twice as many memory controllers as the M1 and four RAM chips instead of two. Packaging I'd expect to stay the same, just with two more RAM chips on the other side. No changes to the Neural Engine from the M1. Maybe LPDDR5. More Thunderbolt controllers. That's about it.
Doubling the memory bus width would also double the memory bandwidth to 136 GB/s, assuming the memory technology used is still LPDDR4X. It'll be interesting to see if doing that alone will double the M1's throughput for all processing cores (CPU, GPU, Neural Engine, etc.).
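As a sanity check on those figures, a minimal sketch assuming the M1's commonly reported layout of eight 16-bit LPDDR4X-4266 channels (the channel breakdown is an assumption, not something from this thread):

```swift
// M1: eight 16-bit LPDDR4X channels at 4266 MT/s (commonly reported, not official).
let channels = 8.0
let bitsPerChannel = 16.0
let megaTransfersPerSec = 4266.0

// Bytes moved per transfer across the whole bus, times the transfer rate.
let m1GBps = (channels * bitsPerChannel / 8) * megaTransfersPerSec / 1000
print(m1GBps)       // ~68.3 GB/s
print(m1GBps * 2)   // ~136.5 GB/s with a doubled (256-bit) bus
```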
Tests conducted by AnandTech on the M1 Macs show that a single Firestorm core is enough to nearly saturate the 68 GB/s of memory bandwidth. So it would seem that the M1 Macs are severely bandwidth constrained, with more potential yet to be unleashed?
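That isn't AnandTech's exact methodology, but here's a minimal sketch of the kind of single-threaded streaming test that shows it: sum a buffer much larger than the caches and divide bytes moved by elapsed time (a serious version would use unrolled, vectorized loads and multiple streams):

```swift
import Foundation

// Stream over a buffer far larger than the SoC's caches so reads
// must come from DRAM, then report effective read bandwidth.
let bytes = 1 << 28                                  // 256 MB
let count = bytes / MemoryLayout<Double>.stride
let buffer = [Double](repeating: 1.0, count: count)

var sum = 0.0
let start = Date()
for value in buffer { sum += value }                 // one read-only pass
let seconds = Date().timeIntervalSince(start)

print("\(Double(bytes) / seconds / 1e9) GB/s (checksum \(sum))")
```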
With UMA, I would think that the M1's system interconnect fabric would have to implement some sort of fair-share arbitration for each of the processing cores to prevent data starvation. So the 68 GB/s of bandwidth provided by the 128-bit LPDDR4X memory could not be allocated fully to any single processing core's use.
According to Apple, the M1's GPU can perform 2.6 TFLOPS (presumably FP32). From my limited understanding, 68 GB/s is nowhere near enough to keep the M1's 7/8 GPU cores fed to achieve 2.6 TFLOPS.
For iMacs and Mac Pros, I would think it's unlikely Apple will go with higher-bandwidth memory, e.g. HBM2, as it'll be too cost-prohibitive to implement for consumer products. What I think would be likely is that HBM2 or equivalent (costly) memory tech will be used solely for the GPU, and DDR5/LPDDR5 will be used for main memory, with the GPU sitting on a separate die/board with its own memory, but with custom circuitry to ensure memory coherency with main memory so as to preserve the UMA architecture. The 68000 Macs used to have proprietary bus slots (if memory serves) for such purposes, so Apple may go back to custom designs instead of using PCIe.
I am probably completely off, though.
Thoughts?
If you increase the number of processing cores, you have to increase the memory bandwidth. The GPU in the M1 is already likely bandwidth-limited; if one wants 16 cores, one needs to at least double the bandwidth.
That's interesting, right? Bandwidth to individual cores is usually constrained, but not with Apple's design. I wouldn't say that the M1 CPU is bandwidth constrained, more that it's able to utilize all available bandwidth. As to what maximal bandwidth the internal fabric can support, we can only guess.
I was able to get pretty much exactly 2.6 TFLOPS using long chains of fused multiply-adds. The FP16 performance is identical to FP32 (which is a big difference from the A14, which has half the FP32 throughput). As to bandwidth... no GPU or CPU has enough of it. The assumption is that you do a bunch of calculations between loads and stores, or your ALUs are running empty.
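For reference, that 2.6 number falls out of the usual peak-FLOPS formula. A minimal sketch, assuming the commonly reported (not Apple-confirmed) 128 FP32 ALUs per GPU core and a ~1.278 GHz GPU clock:

```swift
// Theoretical FP32 peak = cores * ALUs per core * 2 FLOPs per FMA * clock.
let gpuCores = 8.0
let alusPerCore = 128.0   // widely reported for M1, not confirmed by Apple
let flopsPerFMA = 2.0     // one fused multiply-add counts as 2 FLOPs
let clockGHz = 1.278      // approximate M1 GPU clock

print(gpuCores * alusPerCore * flopsPerFMA * clockGHz / 1000, "TFLOPS")  // ~2.6
```

Long FMA chains are the standard way to measure this, since each FMA retires 2 FLOPs with no memory traffic.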
I think we will see "real" unified memory. Ensuring coherency as you describe is really complicated, and I don't think the design purists at Apple would be happy with it. Maybe not HBM, but multi-channel stacked DDR5 (8 to 16 channels should provide plenty of bandwidth). And yeah, it's costly, but still cheaper than buying Xeons. And Apple is the only company that can afford it.
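Back-of-the-envelope numbers for that idea, assuming 64-bit DDR5-6400 channels (both the channel width and the speed grade are assumptions; DDR5 actually splits each DIMM into 2 x 32-bit subchannels, which doesn't change the totals):

```swift
// One 64-bit channel at 6400 MT/s moves 8 bytes * 6.4 GT/s = 51.2 GB/s.
let perChannelGBps = 64.0 / 8.0 * 6.4
for channels in [8.0, 16.0] {
    print("\(Int(channels)) channels: \(channels * perChannelGBps) GB/s")
}
// 8 channels:  409.6 GB/s
// 16 channels: 819.2 GB/s
```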
I would think that with Apple's experience with high-performance systems (e.g. Mac Pros, Xserves, etc.), their internal fabric would be designed to handle really high bandwidth. Like you said, Apple's pockets are deep enough to go really wild as far as SoC design is concerned.
Wow! I suspect, though, that what you saw were probably calculations performed wholly out of the SoC's caches? FP32 values are 32 bits long. Sustaining 2.6 TFLOPS with each data item 32 bits long means we'd need over 10 TB/s of bandwidth in steady state, notwithstanding the other processing cores' need for memory bandwidth. I'm sure my calculation oversimplifies the scenario, but I somehow think that simply doubling the bandwidth will double the M1's 8 GPU cores' performance in real-world use.
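Spelling that arithmetic out, together with the inverse view (how many FLOPs the GPU must perform per byte of DRAM traffic to stay busy at 68 GB/s):

```swift
// If every FP32 operand had to come from DRAM:
let peakFLOPS = 2.6e12          // M1 GPU peak, per Apple
let bytesPerOperand = 4.0       // FP32 is 4 bytes
print(peakFLOPS * bytesPerOperand / 1e12, "TB/s")   // 10.4 TB/s, vs. 68 GB/s available

// Equivalently: FLOPs needed per byte of DRAM traffic to reach peak.
print(peakFLOPS / 68e9, "FLOPs per byte")           // ~38, hence caches and registers
```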
I think you'll be right that most likely they'll go with multi-channel DDR5. I still can't reconcile how Apple will implement it for the iMacs and Mac Pros, though. Using soldered memory in notebooks may be fine, but I don't think Apple would want to manufacture a bunch of iMacs and Mac Pros with soldered memory and find themselves stuck with unmovable inventory. It'll be interesting to see how Apple is going to address this.
Also, I'm not sure if there are designs where the internal fabric connects to two types of memory controllers (i.e. HBM2 and DDR5/LPDDR4X). If possible, the fixed HBM2 memory would be used for the GPUs, while the slower DDR5/LPDDR4X could come via DIMM slots for the Mac Pros and iMacs, maybe even the 16" MBP. The drivers would have the smarts to delineate the memory regions for the various processing cores' use. If anyone can do it, it'll probably be Apple.
I think and hope that the issue is likely driver-related instead of the actual silicon. Intel's CPUs and their north/south bridge chipsets are mature with equally mature drivers, while the M1 has yet to be battle-tested, so to speak. So I'm hopeful existing issues will be resolved with future Big Sur updates.
That could be how Apple replaces the models with dGPUs (16" MBP, iMac), but I don't think you'll see that type of setup in a MBA or sub-16" Pro. There's a reason dGPUs (even the ones used in Macs) run GDDR instead of DDR, so I think you'd still have the SoC with its CPU and GPU cores, then a dedicated GPU with its own memory that connects via a (likely proprietary) high-speed bus to the SoC.
I don't think there's anything left to say. RTX 3080 Mobile has the following specs:
[attachment: RTX 3080 Mobile spec sheet]
The M1 Max's 32-core GPU offers 10.4 TFLOPS of compute, with 327 GTexels/s texture and 165 GPixels/s pixel fill rates.
Texture and pixel fill rates exceed the RTX 3080 Mobile's, but compute performance is half as much. (Wonder why that is.)
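The "why" is mostly FP32 ALU count times clock. A rough sketch using commonly cited figures; the 3080 Mobile's clock varies a lot with configured TGP, and none of these numbers come from the attachment:

```swift
// Peak FP32 = shader ALUs * 2 FLOPs per FMA * clock.
let m1MaxALUs = 32.0 * 128.0       // 32 GPU cores, ~128 FP32 ALUs each (reported)
let m1MaxClockGHz = 1.27           // approximate
print(m1MaxALUs * 2 * m1MaxClockGHz / 1000, "TFLOPS")   // ~10.4

// RTX 3080 Mobile (GA104): 6144 CUDA cores; Ampere counts both FP32
// datapaths per SM, which is largely where the ~2x compute gap comes from.
let ga104ALUs = 6144.0
let boostClockGHz = 1.7            // varies with configured TGP
print(ga104ALUs * 2 * boostClockGHz / 1000, "TFLOPS")   // ~21
```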
"I'm kind of thinking option 3, hoping not option 1."

It's funny how Apple actually went with 512-bit LPDDR5.
So... what do you guys now think of the M1 Pro and M1 Max? They seem to be closer to an RTX 3080M but with 100W less power, and the 400GB/s transfer rate is way beyond typical LPDDR5 RAM.

I'm definitely impressed by the RAM bandwidth, though I guess it's mainly from them moving to LPDDR5 when I expected them to stay with LPDDR4X. A bit sad we didn't get the A15 cores here, which probably means next year's Mac Pro will be boasting 2020 cores.

What do you mean? Why would we have A15 cores, considering it's their M1 version?

It was just my hope that it wouldn't be A14 cores but A15 cores. Oh well.

The bandwidth is in line with LPDDR5 on a wider interface. It's probably a 512-bit LPDDR5-6400 bus: (512 bits / 8) x 6.4 GT/s is roughly 410 GB/s.

I can't find this to be normal either, considering GPUs normally use GDDR6(X), which has more latency compared to LPDDR5.

Well, it's probably cheaper to go with GDDR6 vs. a very wide LPDDR5 interface, but Apple has money to spend.

From what I know, GDDR memory runs too hot and is too power-hungry, so it'll likely never be considered. It also has 2-3 times the access latency.