
Appletoni

Suspended
Mar 26, 2021
443
177
How does one calculate total memory bandwidth? I found this table from https://www.rambus.com/blogs/get-ready-for-ddr5-dimm-chipsets/#3

So does this mean your total memory bandwidth with standard DDR5 would be?:

ECC:
4.8–6.4 Gbps/bit x 32 bits/channel for data x 2 channels/DIMM x number of DIMMs ÷ 8 bits/byte
NON-ECC:
4.8–6.4 Gbps/bit x 40 bits/channel for data x 2 channels/DIMM x number of DIMMs ÷ 8 bits/byte

Thus with 8 DIMMs, that would give you:
ECC: 307–410 GBps
NON-ECC: 384–512 GBps

And what would be the corresponding calculation for Apple's current M1, and for the expected next-gen AS?

8x DDR5 RAM = 512 GB RAM that’s fine.
 

leman

macrumors Core
Oct 14, 2008
19,523
19,680
I'm thinking that with the M1 vs the GTX 1050/1060, which seem roughly equivalent in terms of performance, the M1 is trailing the GTX in GPU-to-dataset bandwidth by 2:1 but still matching the performance delivered.

That's not that simple though. The M1 has 16MB of last-level cache, the GTX 1050 has 1MB of last-level cache; that's a massive difference. The math is not necessarily linear in this case. On the entry-level end of the GPU performance spectrum, things might scale faster than on the high end. Let's take a hypothetical 64-core Apple GPU (8192 compute units); that's 8x more compute capability than the M1. Does it need 8x the bandwidth of the M1 to show optimal performance? What does 8x mean in this case? Can we just increase the cache to 128MB and keep a relatively low memory bandwidth (e.g. 4-channel DDR5 at 200GB/s), or do we also need to scale the RAM bandwidth accordingly (to 500GB/s)? Or is it something else entirely? And let's not forget that we have a large CPU and NPU clusters in the mix as well — those will need some of that bandwidth too.

I don't really have a clear picture here, nor do I have any education or experience to predict these things. But my intuition tells me that while Apple can use advanced technologies such as large caches and transparent memory compression to compensate for around 100GB/s on the GPU side, I am not sure they will be able to compensate for 500GB/s as easily on the high end. Their bandwidth optimizations are a great thing for laptops, as they will be able to deliver thin and light machines with excellent battery life and unprecedented performance in that class. But for a high-end desktop, you still need raw power, bandwidth, etc.
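A crude way to put numbers on that trade-off, as a back-of-the-envelope sketch (assuming the M1's roughly 68GB/s of aggregate LPDDR4X bandwidth, and treating "8x the compute" as naively wanting 8x the bytes per second):

Code:
# Naive bandwidth scaling for a hypothetical 64-core Apple GPU (8x the M1's GPU).
# Assumption: ~68 GB/s aggregate LPDDR4X-4266 bandwidth for the M1.
M1_BANDWIDTH_GBS = 68
COMPUTE_SCALE = 8

naive_need = M1_BANDWIDTH_GBS * COMPUTE_SCALE   # ~544 GB/s if nothing else changes
for option_gbs in (200, 500):                   # the two scenarios mentioned above
    share = option_gbs / naive_need
    print(f"{option_gbs} GB/s supplies {share:.0%} of naive 8x scaling; "
          f"caches/compression would have to hide the rest")

That puts the 200GB/s vs 500GB/s question in perspective: the former leans heavily on cache and compression, the latter is close to straight-line scaling.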

In the case of game rendering, for example, high-end GPU cards have gobs of high-speed VRAM because they need to pre-fill it with textures before rendering; sending textures to VRAM on demand would kill performance due to the PCIe bottleneck. I think that's why you need to pair high-end GPU cards with a beefy CPU: besides processing game logic, I would think a fair chunk of the CPU time is spent determining when to shuttle data to the GPU's VRAM, and sending gobs of data to VRAM before it's needed for the next scene, for example.

I think you are making this more complicated than it is. Games have a lot of ways of hiding memory latency, and games don't have to produce correct results all the time — it's all fine as long as the user does not notice. What one often does is preload a low-resolution mipmap for the usual textures and then load the full texture on demand when the engine determines that a higher resolution is needed. While the texture is uploaded, the image in certain areas might be slightly blurry here or there for a split second, but your eye is likely not to register it at all. There are a lot of things engines can play with, for example the speed with which the camera and the game objects can move, which puts a limit on how quickly new textures need to become available. Texture streaming is a problem with well-established, robust solutions that are well optimized for the limited bandwidth of the PCIe interface.

Where it gets a bit more critical is with things like dynamic geometry generation, or, in general, complex geometry changes. You can draw an image if the high-quality texture is not yet in GPU memory (it will just be slightly blurry for a frame or two); you can't really draw anything if the geometry is not there. That's why games tend not to do geometry generation, and that's why Nvidia has been pushing mesh shaders as a fast way to do geometry generation on a dedicated GPU. Apple GPUs are much more flexible in this regard.
 
  • Like
Reactions: altaic

leman

macrumors Core
Oct 14, 2008
19,523
19,680
How does one calculate total memory bandwidth? I found this table from https://www.rambus.com/blogs/get-ready-for-ddr5-dimm-chipsets/#3

So does this mean your total memory bandwidth with standard DDR5 would be?:

ECC:
4.8–6.4 Gbps/bit x 32 bits/channel for data x 2 channels/DIMM x number of DIMMs ÷ 8 bits/byte
NON-ECC:
4.8–6.4 Gbps/bit x 40 bits/channel for data x 2 channels/DIMM x number of DIMMs ÷ 8 bits/byte

Thus with 8 DIMMs, that would give you:
ECC: 307–410 GBps
NON-ECC: 384–512 GBps

That's a bit confusing, but the main idea is that DDR RAM transfers 8 bytes per request. This is the same for DDR4 or DDR5. So when you look at the number next to the DDR specification, you can multiply it by 8 to get the bandwidth in MB/s. For example, DDR5-4800 (the slowest speed mandated by the spec) has 38.4 GB/s of bandwidth. The high-speed Hynix DDR5-8400 has 67.2 GB/s of bandwidth.

There are some more subtle differences, e.g. DDR5 having two 32-bit memory channels instead of DDR4's one 64-bit channel — same amount of data transferred, but with more flexibility, as you can make two independent data requests where DDR4 was forced to load more data than it might have needed. This is similar to how LPDDR operates (LPDDR4X as configured in the M1, for example, uses 16-bit memory channels for even more flexibility). As far as I am aware, ECC and non-ECC RAM have the same bandwidth; it's just that ECC RAM needs to use a wider bus to transfer the error-correction codes.
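If it helps, here is a minimal sketch of that rule of thumb (peak figures only; it ignores ECC check bits, command/refresh overhead, and whether you can actually saturate the bus):

Code:
# Peak data bandwidth: transfer rate (MT/s) x 8 bytes per transfer per DIMM.
def dimm_peak_gbs(mt_per_s):
    """Peak bandwidth of one 64-bit (data) DIMM in GB/s."""
    return mt_per_s * 8 / 1000

for speed in (4800, 6400, 8400):
    per_dimm = dimm_peak_gbs(speed)
    print(f"DDR5-{speed}: {per_dimm:.1f} GB/s per DIMM, {8 * per_dimm:.0f} GB/s for 8 DIMMs")

# DDR5-4800: 38.4 GB/s per DIMM, 307 GB/s for 8 DIMMs
# DDR5-6400: 51.2 GB/s per DIMM, 410 GB/s for 8 DIMMs
# DDR5-8400: 67.2 GB/s per DIMM, 538 GB/s for 8 DIMMs

Those 307-410 GB/s figures are the same ones computed for 8 DIMMs above.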
 
  • Like
Reactions: theorist9

theorist9

macrumors 68040
May 28, 2015
3,882
3,061
That's a bit confusing, but the main idea is that DDR RAM transfers 8 bytes per request. This is the same for DDR4 or DDR5. So when you look at the number next to the DDR specification, you can multiply it by 8 to get the bandwidth in MB/s. For example, DDR5-4800 (the slowest speed mandated by the spec) has 38.4 GB/s of bandwidth. The high-speed Hynix DDR5-8400 has 67.2 GB/s of bandwidth.

There are some more subtle differences, e.g. DDR5 having two 32-bit memory channels instead of DDR4's one 64-bit channel — same amount of data transferred, but with more flexibility, as you can make two independent data requests where DDR4 was forced to load more data than it might have needed. This is similar to how LPDDR operates (LPDDR4X as configured in the M1, for example, uses 16-bit memory channels for even more flexibility). As far as I am aware, ECC and non-ECC RAM have the same bandwidth; it's just that ECC RAM needs to use a wider bus to transfer the error-correction codes.
Thanks for the correction on the non-ECC bandwidth. My ECC numbers were the same as yours: For DDR5-4800, 38.4 GB/s/DIMM => 307 GB/s for 8 DIMMs. I cleaned up my post accordingly, crediting you for the non-ECC correction.

What would be the corresponding calculation for the M1?
 
  • Like
Reactions: polyphenol

leman

macrumors Core
Oct 14, 2008
19,523
19,680
What would be the corresponding calculation for the M1?

Same principle. M1 uses LPDDR4X-4266, that's 4266 mega-transfers per second (for 8 bytes per transfer) or around 34GB/s for a single LPDDR module. M1 uses two modules with the aggregate bandwidth of around 68GB/s.

The details are obviously more complicated, since LPDDR4X (as configured by M1) operates with small 16-bit memory channels that can — from what I gather — transmit varying amounts of data on request, and frankly, I am getting lost at this point. Luckily enough, for computing the maximal bandwidth these details don't matter — you can treat LPDDR as a "normal" DDR chip with a single 64-bit memory channel for this purpose. Just keep in mind that having a bandwidth and actually saturating it are two different things.
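Spelled out the same way, a sketch under that same simplification (each LPDDR package treated as one 64-bit module at 8 bytes per transfer):

Code:
# M1: LPDDR4X-4266, two packages, treated as 64-bit modules at 8 bytes/transfer.
MT_PER_S = 4266
BYTES_PER_TRANSFER = 8
MODULES = 2

per_module_gbs = MT_PER_S * BYTES_PER_TRANSFER / 1000   # ~34.1 GB/s
total_gbs = per_module_gbs * MODULES                    # ~68.3 GB/s aggregate
print(f"{per_module_gbs:.1f} GB/s per module, {total_gbs:.1f} GB/s total")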
 
  • Like
Reactions: altaic

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
Same principle. M1 uses LPDDR4X-4266, that's 4266 mega-transfers per second (for 8 bytes per transfer) or around 34GB/s for a single LPDDR module. M1 uses two modules with the aggregate bandwidth of around 68GB/s.

Apple isn't using standard off-the-shelf LPDDR4 packages/modules (the physical thing you can see glued to the logic board or multi-chip-module board). Each one of the packages has 4 data paths, and there are separate stacked RAM dies inside the package. Some LPDDR packages allow those to be on the same bank (the controller talks to die a, b, or c inside the package).

The M1's internal structure suggests that there is a custom matching between the M1 and the specific memory modules Apple is buying. So the above would be the bandwidth to 1/4 of their physical package.

The DDR5 DIMM standard does something similar.

[Image: DDR5 DIMM channel architecture diagram]


"...The big change here is that, similar to what we’ve seen in other standards like LPDDR4 and GDDR6, a single DIMM is being broken down into 2 channels. Rather than one 64-bit data channel per DIMM, DDR5 will offer two independent 32-bit data channels per DIMM (or 40-bit when factoring in ECC). Meanwhile the burst length for each channel is being doubled from 8 bytes (BL8) to 16 bytes (BL16), meaning that each channel will deliver 64 bytes per operation. ... "


There is not one 64-bit path. There are two 32-bit channels, each with its own command/instruction stream.

I think there is a bit of losing the forest for the trees going on here. Maximum effective bandwidth for a single-thread drag race looks different than maximum effective bandwidth where there are 100 different core requests in play. It isn't just a matter of a "bigger pipe" if there are dozens of different directions to go. (Or a 100 mph, one-lane freeway versus a 45 mph, 8-lane freeway. Single car versus 1,000 cars; the real-life performance on those two roads will be different between those two contexts.)
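To put the channel split in concrete terms, a small sketch based on the figures quoted above (BL8 on one 64-bit channel vs BL16 on two 32-bit channels):

Code:
# Bytes delivered per burst, per the AnandTech figures quoted above.
DDR4_BURST = (64 // 8) * 8    # 64-bit channel x BL8  = 64 bytes, one command stream per DIMM
DDR5_BURST = (32 // 8) * 16   # 32-bit channel x BL16 = 64 bytes, per independent channel

print(DDR4_BURST, DDR5_BURST)  # 64 64: same cache-line-sized chunk either way,
                               # but a DDR5 DIMM can service two unrelated requests at once

So the headline GB/s number is the same for one big pipe or two smaller independent ones; what changes is how many different requesters can be serviced concurrently.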
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
That's a bit confusing, but the main idea is that DDR RAM transfers 8 bytes per request. This is the same for DDR4 or DDR5. So when you look at the number next to the DDR specification, you can multiply it by 8 to get the bandwidth in MB/s. For example, DDR5-4800 (the slowest speed mandated by the spec) has 38.4 GB/s of bandwidth. The high-speed Hynix DDR5-8400 has 67.2 GB/s of bandwidth.


While request A has the DIMM locked up in a burst, what bandwidth do requests B, C, D get while waiting further back in the queue? How long did it take request A to get to the front of the line?

That's why these theoretical max bandwidth calculations typically skip the burst-length factor. They answer what bandwidth is available to service the aggregate requests. (Reads and writes possibly taking different amounts of time aren't factored in either.) The push is to get a nice simple number that can be more easily marketed.
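As a toy illustration of that queueing point (a made-up in-order model, not how a real controller schedules; one DDR5-4800 32-bit subchannel delivering 64 bytes per BL16 burst):

Code:
# One DDR5-4800 subchannel: 4800e6 transfers/s, 4 bytes wide, BL16 = 64 bytes per burst.
TRANSFERS_PER_S = 4800e6
PEAK_GBS = TRANSFERS_PER_S * 4 / 1e9        # ~19.2 GB/s for the subchannel
BURST_TIME_NS = 16 / TRANSFERS_PER_S * 1e9  # ~3.3 ns the channel is locked per burst

for requesters in (1, 4, 16, 64):
    # Every requester still gets its 64 bytes; the last one just waits behind the others.
    print(f"{requesters:3d} requesters: ~{PEAK_GBS / requesters:5.2f} GB/s each, "
          f"last in line waits ~{requesters * BURST_TIME_NS:6.1f} ns")

The aggregate number never changes; what each individual requester sees does.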
 

EntropyQ3

macrumors 6502a
Mar 20, 2009
718
824
The push is to get a nice simple number that can be more easily marketed.
”Bandwidth” is probably not a sellable metric to the masses in the first place, never mind the intricacies. It’s for the techies.
You certainly have a point when you bring up that a complex SoC has a large number of not only independent threads but also functional units accessing and competing for this shared resource. And their access patterns may differ and step on each other's toes, evicting cached data and so on, interacting in ways that are far more complex (and prone to cause hiccups) than a simple benchmark test.
For the most part though, having more bandwidth helps with those problems as well. It’s a game of compromises and cost vs. benefit. The trend for ages now has been that bandwidth increases far slower than the capabilities of the processing elements to chew through data. John McCalpin used to produce nice graphs showing the ever increasing disparity between FLOPs and bandwidth, illustrating why the memory hierarchy is so important, and increasingly so over time.

By contrast, benchmarking by its very nature strives to isolate the performance of specific subsystems, be it CPU cores, the GPU or, these days, the NPU. And bandwidth, and bandwidth-dependent code, was never particularly popular in benchmarking anyway, since it doesn't lend itself to marketing. Most tests today are predominantly cache-resident and, in the case of multithreaded benchmarks, typically confined to the levels of cache that are private to the individual cores.

It will be very interesting to see where Apple goes with their higher-performance (non-laptop) solutions, for a lot of reasons. It is not at all clear how they will prioritize; we don't have access to either their data on how these systems are used today or their vision for how that use will evolve in the future.
 
  • Like
Reactions: altaic

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
1) I understand that was your experience in your business, but that was just one company. Recall that, in designing the latest Mac Pro, Apple brought on a "Pro Workflow Team" (https://techcrunch.com/2018/04/05/apples-2019-imac-pro-will-be-shaped-by-workflows/ ) so that their pro users could tell them just what their needs were. And a key input from those pros is that they needed the machine to be modular, so that it could be upgraded as their needs changed. From Tom Boger, senior director of Mac Hardware Product Marketing:

".... modular was inherently ... a real need for our customers and that’s the direction we’re going."

And from Apple's white paper on the Mac Pro (https://www.apple.com/mac-pro/pdf/Mac_Pro_White_Paper_Feb_2020.pdf):

"The Mac Pro is engineered to provide unprecedented levels of access and capability. Every aspect of the hardware is designed to be flexible and accommodate change. The graphics, storage, and memory modules are easily expandable and configurable." [emphasis mine]

And it's not just Apple. The workstations produced by HP, Dell, Boxx, etc. are all modular and upgradeable. So essentially what you're arguing here is that what Apple, HP, Dell, and Boxx think many of their pro customers want, and what many of their pro customers are actually saying they want, isn't what they really want!

2) The pro community is quite diverse, so again, just because that was your experience doesn't mean it's applicable generally. Here's a quote from John Ternus, Apple's VP of Hardware Engineering (from the TechCrunch article linked above):

“We said in the meeting last year that the pro community isn’t one thing. It’s very diverse. There’s many different types of pros and obviously they go really deep into the hardware and software and are pushing everything to its limit."

The objective Apple stated doesn't supersede other, higher-priority strategic directives that Apple has. Apple takes the feedback from these supervised Pro Workflow engagements as suggestions, but that feedback isn't completely driving the design bus.


1. No long-term semi-public/public roadmaps... nope.

2. In 2017-2018 there was a recurring theme that Apple was going to chuck Thunderbolt, or "monkey see, monkey do" copy the Windows PC market's add-in-card approach to Thunderbolt.
Didn't happen. It is modular, but modular in a way where only Apple has modules.

3. Modular default boot SSD? Largely no. The SSD is soldered onto the MP 2019 motherboard and isn't open-market modular in any way (so Security/Privacy supersedes modularity). Yes, the "brainless" SSD NAND modules are on daughter cards, but those are only a subset of a complete SSD, so a split decision there. The SSD controller being soldered to the board is a Security/Privacy thing. The data always being encrypted 100% of the time is a Security/Privacy thing.

Do 3rd party operating systems get to see the T2 SSD? Not really.

4. Nvidia GPU add-in cards. No drivers signed. (Nvidia isn't on Apple's satisfactory-partner list, so that isn't happening.)

(Dumped subcontractors don't get support.)

5. 3.5" HDDs being a necessity. Yes it is optionally there. No it isn't in any of the BTO optoins for the system. The Intel chipset provided it anyway ( so Apple was buying something with it along with the rest of the components). [ Apple's stated strategic future is SSDs. ]

The MP 2009-2012 had 4 HDD drive sleds. If you buy a 3rd party bracket you can have two drive sleds in the MP 2019. Apple swapped in more PCIe sockets somewhat as an exchange for more SSD add-in cards and faster SAN/NAS storage. If you want more than two HDDs, the more modular foundation they provide is buying that expansion yourself.

At no point in Apple's story about the glories of modularity was the claim that modularity is about making things more affordable over time or lowering system costs. The entry-level price on the Mac Pro 2019 jumped 100% from Mac Pro 2013 levels. "Pros need options to buy more affordable options over time" isn't Apple's approach to modularity.




So for macOS on M-series so far there have been a couple of things.

1. So far, no 3rd party GPU drivers. So macOS running native iPhone apps at full speed is extremely likely a higher-priority thing. As a coupled issue, 100% modular control over the default GPU probably supersedes the "modular objective" also. (The default MPX 580 drew complaints as the entry option, but you could pull it after you paid for it.)

[I think they will slightly reverse course on this later rather than sooner, after macOS 13, where all PCIe drivers that reside in the kernel are banned. Probably some compromise pops up that bans iPhone apps from the 3rd-party-driven displays, or Apple comes up with a backchannel framebuffer-copy workaround. Or they get greedy and cover a large subset of the market. The new GPUs 3-4 years down the road just being faster is going to be hard for them to escape from.]


2. Apple embedded the T2 into the primary SoC. The same propagation of the security priority will solder the base CPU cores (and GPU cores) to the motherboard, as they are now entangled with the higher-priority objective.

Moving from 12 CPU cores to 28 CPU cores later is very probably gone. The modularity won't escape the SoC 'black hole' effect.


Will Apple do a 100% PCIe-slotless "half sized" Mac Pro? Probably not. There are several modularity vectors that don't get tripped up over entanglements with the higher-priority goals: A/V capture cards, Afterburner, non-default primary boot drives, more USB sockets, more FireWire sockets, an external SAN/NAS interface card, etc.

Are those going to get you a 300W aux power port connector on the motherboard? Maybe not.

Is Apple going to put a discrete SATA controller on the logic board? I wouldn't bet the farm on that. (Even more so if there are 2-3 open-standard PCIe slots. In that case, Apple will just point at the "modularity" they provided with the open slots as the more modular solution.)



3) You wrote "Should your demands change in an unpredictable manner, upgradeability does not help. It is never the case that you go "oh, I have misjudged how much RAM I need, I should have bought 64GB instead of 16GB."

On the contrary, when I was doing my Ph.D. I specced out a G5 tower that was fine for my computational needs for the first two years (both local computation, and development work for programs I would then send to the university's clusters). But then my needs expanded (research, after all, can take you in unexpected directions), and I recall having to increase both the RAM and HD size. It was good I could do that, because my PI didn't have the budget to buy me a new computer.

Apple hasn't particularly locked down persistent storage max capacity in any substantive way since the advent of Thunderbolt 2. There are some narrow edge-case folks, but an MBA or iMac isn't in the "needs to be tossed" state if you need to store more stuff.

The RAM sizing on the Power Mac G5 2.3GHz DP (PCI-X) went from 512MB to 8GB. Apple's increments for M1 memory configurations go in 8GB steps. The Mac Pro 2019 gaps are even larger: 32 -> 48 (+16GB, +50%), 48 -> 96 (+48GB, +100%), 96 -> 192 (+96GB, +100%).

Apple used to sell almost "bare bones" boxes. They would find the smallest possible , and fewest possible DIMMs and toss those into the box at the lowest configuration. Now they have lifted the upper tier Pro options into the range of starting much higher. So if someone needs 10-15 GB of working space they can grow at 10-15% per year growth and still be under 32GB in five years.

The notion that Apple is selling tiny, short runways on capacity doesn't really match with their approaches decades ago.
 

Bug-Creator

macrumors 68000
May 30, 2011
1,785
4,717
Germany
The RAM sizing on the Power Mac G5 2.3GHz DP (PCI-X) went from 512MB to 8GB. Apple's increments for M1 memory configurations go in 8GB steps. The Mac Pro 2019 gaps are even larger: 32 -> 48 (+16GB, +50%), 48 -> 96 (+48GB, +100%), 96 -> 192 (+96GB, +100%).

I'm really trying to understand what you are complaining about here.

PowerMac to MacPro, G3 to Xeon, always had a sensible number of RAM sockets that most of the time allowed for a max config beyond what was considered sensible at the time.
Sometimes there were restrictions as to how to populate these to get max performance, and for sure the DIMMs that newer models take only start at certain sizes.

32GB in the 2019 is a stupid config, as you won't get full performance. If you don't need at least 48GB you probably should not be looking at a MacPro in the 1st place.
If you go even further you just have to accept that you can't mix different sizes and that small ones just won't work.
 

theorist9

macrumors 68040
May 28, 2015
3,882
3,061
The RAM sizing on the Power Mac G5 2.3GHz DP (PCI-X) went from 512MB to 8GB. Apple's increments for M1 memory configurations go in 8GB steps. The Mac Pro 2019 gaps are even larger: 32 -> 48 (+16GB, +50%), 48 -> 96 (+48GB, +100%), 96 -> 192 (+96GB, +100%).

Apple used to sell almost "bare bones" boxes. They would find the smallest possible , and fewest possible DIMMs and toss those into the box at the lowest configuration. Now they have lifted the upper tier Pro options into the range of starting much higher. So if someone needs 10-15 GB of working space they can grow at 10-15% per year growth and still be under 32GB in five years.

The notion that Apple is selling tiny, short runways on capacity doesn't really match with their approaches decades ago.
I'm really trying to understand what you are complaining about here.

PowerMac to MacPro, G3 to Xeon, always had a sensible number of RAM sockets that most of the time allowed for a max config beyond what was considered sensible at the time.
Sometimes there were restrictions as to how to populate these to get max performance, and for sure the DIMMs that newer models take only start at certain sizes.

32GB in the 2019 is a stupid config, as you won't get full performance. If you don't need at least 48GB you probably should not be looking at a MacPro in the 1st place.
If you go even further you just have to accept that you can't mix different sizes and that small ones just won't work.

Yeah, I didn't understand how deconstruct60's post was responding to what I wrote either.

I was responding to a poster who said the following:

"Should your demands change in an unpredictable manner, upgradeability does not help. It is never the case that you go "oh, I have misjudged how much RAM I need, I should have bought 64GB instead of 16GB". If you find yourself in such a situation, it's not just more RAM that you need. You likely need a bigger system overall."

I provided a counterexample from my own experience. I bought a G5 in 2004 with 1 GB of RAM, which was sufficient for my computational needs for the next few years. But then my research took a new direction, which meant a different computational approach that required larger arrays that could no longer fit into 1 GB of RAM (less the RAM used by the OS and other apps). So I tripled it to 3 GB and, voilà, problem solved.

Scaling that example up to today, and to a machine with many more cores, that would be like purchasing a Mac Pro with, say, 128 GB of RAM, finding it's fine for a few years, but then moving in a different direction where you now need 384 GB of RAM.
 
Last edited:

leman

macrumors Core
Oct 14, 2008
19,523
19,680
I provided a counterexample from my own experience. I bought a G5 in 2004 with 1 GB of RAM, which was sufficient for my computational needs for the next few years. But then my research took a new direction, which meant a different computational approach that required larger arrays that could no longer fit into 1 GB of RAM (less the RAM used by the OS and other apps). So I tripled it to 3 GB and, voilà, problem solved.

I think your example worked very well 20 years ago, when RAM was expensive and adding a single GB or two would double the available memory. In fact, much of the "upgradeability myth" (as I like to call it) comes from those times 10-20 years ago, when simple upgrades would make a huge difference.

Today however? Computers already come with very fast components and a humongous amount of RAM. Adding a single GB won't do anything (and is not even technically possible), and doubling the RAM amount is a completely different category. I simply can't imagine a situation where you can benefit from doubling or tripling a RAM today without being bottlenecked by your current CPU/mainboard etc. Unless of course you made a mistake when buying the computer in the first place and ordered way too little RAM.

Scaling that example up to today, and to a machine with many more cores, that would be like purchasing a Mac Pro with, say, 128 GB of RAM, finding it's fine for a few years, but then moving in a different direction where you now need 384 GB of RAM.

I have difficulty imagining how this would play out in a practical scenario. What kind of workload do you have in mind for this example? Maybe if you want to retire your workstation as a ramdisk database server... Which would be a very contrived niche case.
 

BigPotatoLobbyist

macrumors 6502
Dec 25, 2020
301
155
So now the rumor is a 10-core chip with only two efficiency cores. If so, the efficiency cores must be quite different than the M1 efficiency cores. I continue to stand by my (not at all informed by any wine I swear) claim that these will be M2 and not M1x.

Or the rumor (or I, nobody I swear) could be wrong, since a 4:1 ratio seems a little odd.
edit: never mind, my response was mistaken.
 

Unregistered 4U

macrumors G4
Jul 22, 2002
10,610
8,632
It’s always been my understanding and assumption that the Mac Pro going forward will still be modular, for both the reasons you stated, but also because I don’t see Apple spending years in R&D for a completely new modular Mac Pro design that gets discontinued two/three years later.
See, I feel that the price of the MacPro was such that Apple could break even with R&D or make a reasonable profit over the product’s 2/3 years lifetime. Once they make that back, then they could drop the whole concept. I mean, the prior MacPro was similarly a “one and done” situation.

I think Apple’s alienated enough of the Pros that wanted solutions Apple’s not willing to offer. So, many of the ones that are left may simply be looking for the fastest way to run Final Cut Pro X or Logic Pro. If modularity is no longer required for an effective workflow for that smaller set of users, then modularity could be headed out the window.
 

JMacHack

Suspended
Mar 16, 2017
1,965
2,424
See, I feel that the price of the MacPro was such that Apple could break even with R&D or make a reasonable profit over the product’s 2/3 years lifetime. Once they make that back, then they could drop the whole concept. I mean, the prior MacPro was similarly a “one and done” situation.

I think Apple’s alienated enough of the Pros that wanted solutions Apple’s not willing to offer. So, many of the ones that are left may simply be looking for the fastest way to run Final Cut Pro X or Logic Pro. If modularity is no longer required for an effective workflow for that smaller set of users, then modularity could be headed out the window.
That makes zero sense. They made a Mac Pro 7,1 to appease buyers that they plan to abandon? Even if they priced it solely to recoup the R&D cost, there’s no guarantee it would sell in the first place.

If Apple planned to abandon that market then they would have abandoned it with the 6,1. They would not have taken the expense of making an entire supply chain and custom design on a product that might not sell on the bet that they would break even.

The scenario you suggest is taking a massive hit on the off chance that they recoup their investment, and then drop that product segment entirely. That’s a guaranteed loss.

No, if Apple wanted to drop the Mac Pro, they would have never committed the resources to making the 7,1.

Rumors indicate that the m1 transition was planned as early as 2016, a year before the big “mea culpa” meeting where they announced they’d be working on the 7,1 and Pro Display. If Apple was going to drop the MP with the transition they would have not made that meeting at all.
 

Lemon Olive

Suspended
Nov 30, 2020
1,208
1,324
That makes zero sense. They made a Mac Pro 7,1 to appease buyers that they plan to abandon? Even if they priced it solely to recoup the R&D cost, there’s no guarantee it would sell in the first place.

If Apple planned to abandon that market then they would have abandoned it with the 6,1. They would not have taken the expense of making an entire supply chain and custom design on a product that might not sell on the bet that they would break even.

The scenario you suggest is taking a massive hit on the off chance that they recoup their investment, and then drop that product segment entirely. That’s a guaranteed loss.

No, if Apple wanted to drop the Mac Pro, they would have never committed the resources to making the 7,1.

Rumors indicate that the m1 transition was planned as early as 2016, a year before the big “mea culpa” meeting where they announced they’d be working on the 7,1 and Pro Display. If Apple was going to drop the MP with the transition they would have not made that meeting at all.
I think you're a little confused. No one is saying they may still abandon the concept of a Pro-level Mac; that includes Bloomberg, who has already reported on the specs of said machine.

But making yet another "new" Mac Pro that does not have the modularity of Intel model is completely possible, and extremely likely, and probably exactly what is going to happen.
 

Unregistered 4U

macrumors G4
Jul 22, 2002
10,610
8,632
They made a Mac Pro 7,1 to appease buyers that they plan to abandon? Even if they priced it solely to recoup the R&D cost, there’s no guarantee it would sell in the first place.
Those folks that bought the Mac Pro… still have their Mac Pro. And, they’ll have that Mac Pro long after Apple’s no longer making that Mac Pro with the Intel processor. And, if their workflows include massive amounts of third party plug-ins that have not been OR won’t be made native, then whatever the next Mac Pro IS does not concern those folks in the least.

If Apple planned to abandon that market then they would have abandoned it with the 6,1. They would not have taken the expense of making an entire supply chain and custom design on a product that might not sell on the bet that they would break even.
It was guaranteed to sell precisely at the numbers they expected it to. The Mac Pro was not made in a vacuum, they invited professionals across the industry to discuss their workflows… to know what type of device would be best suited for the several thousands of folks that would ever buy them. (Remember, the Mac Pro sales are, at best, a low single percentage of all Mac revenues.) When your target market is VERY focused, and VERY small, you can make VERY good estimations on what will sell.

Rumors indicate that the m1 transition was planned as early as 2016, a year before the big “mea culpa” meeting where they announced they’d be working on the 7,1 and Pro Display. If Apple was going to drop the MP with the transition they would have not made that meeting at all.
I didn’t say they were going to drop the Mac Pro. But you have to admit that, from one Mac Pro to the next, the past 3 iterations had little relation to each prior one. I’m sure a LOT of R&D went into the trash can Pro, but it was used for 1 iteration, minor speed bumps over that time, and the next Pro had very little in common with it.

I’m just suggesting that the same could be true here, where what is released as a Pro may not have a lot in common with what came before.
 

JMacHack

Suspended
Mar 16, 2017
1,965
2,424
I think you're a little confused. No one is saying they may still abandon the concept of a Pro-level Mac; that includes Bloomberg, who has already reported on the specs of said machine.

But making yet another "new" Mac Pro that does not have the modularity of the Intel model is completely possible, extremely likely, and probably exactly what is going to happen.
If I’m confused I apologize, but I’ve seen the argument that Apple will abandon the pro market plenty of times and it never holds water.

As to the level of modularity I don’t know, but if I were a betting man I’d at least guess there will be some sort of PCIe slots. Maybe not as many as the 7,1, but I don’t think it will be sealed like the 6,1.

I’m just suggesting that the same could be true here, where what is released as a Pro may not have a lot in common with what came before.
I think that’s a given considering Apple Silicon’s touted architectural advantages over Intel. I don’t believe anyone expects the 8,1 Mac Pro to be a carbon copy of the 7,1 with an Apple processor instead of an Intel one.
 

theorist9

macrumors 68040
May 28, 2015
3,882
3,061
I think your example worked very well 20 years ago, when RAM was expensive and adding a single GB or two would double the available memory. In fact, much of the "upgradeability myth" (as I like to call it) comes from those times 10-20 years ago, when simple upgrades would make a huge difference.

Today however? Computers already come with very fast components and a humongous amount of RAM. Adding a single GB won't do anything (and is not even technically possible), and doubling the RAM amount is a completely different category. I simply can't imagine a situation where you can benefit from doubling or tripling a RAM today without being bottlenecked by your current CPU/mainboard etc. Unless of course you made a mistake when buying the computer in the first place and ordered way too little RAM.



I have difficulty imagining how this would play out in a practical scenario. What kind of workload do you have in mind for this example? Maybe if you want to retire your workstation as a ramdisk database server... Which would be a very contrived niche case.
I was doing computational work on large, complex molecules that required doing repeated operations on very large arrays. In order for the program to run efficiently, the entire array needed to be resident in RAM.

So suppose you're buying a 24-core MacPro to do such computations today. Each computation runs on its own core (the program is hard to parallelize). Thus you can run computations on 24 molecules at a time. You know how big your molecules are, and how finely you want to coarse-grain them, so you determine the maximum array size needed per molecule is 3 GB and, to give yourself some cushion, get 4 GB/core, i.e. 96 GB (after all, why overbuy?).

Here are three scenarios in which you wouldn't need to buy a new machine, but would need to double the RAM:

1) You'd like to run computations on larger molecules, but that's not practical, because the algorithm takes too long. But a few years down the road you, or someone else, develops a more efficient algorithm that allows you to process molecules that are much larger, where each molecule now requires a larger array (say, 7 GB). Thus, keeping the same cushion, you now need 8 GB/core, i.e., double the RAM.

2) You don't move to looking at larger molecules, but instead switch to a different computational approach that, while it still can be handled without issue by your CPU's (indeed, this might even be a faster computation), requires larger array sizes (again, say 7 GB/molecule instead of 3 GB/molecule). Again, double the RAM.

3) Your research takes you in an entirely different direction that requires you to look at a new class of molecules you never envisioned, for which you need larger array sizes and thus more RAM. Etc., etc.

In sum, I think the problem is that you're looking at the huge increase in RAM since 2004, but what you're missing is that there's also been a huge increase in core count. And many of these computational programs are written by scientists, not computer specialists, and thus they're typically single-core (parallelizing computational programs can be very tricky). Thus what matters for these programs isn't total RAM, it's RAM/core. And RAM/core obviously hasn't risen as much as total RAM.

And if you think 96 GB (4 GB/core) is clearly underbuying for a 24-core machine, that's not an issue—I was just giving sample numbers; of course what I've written above works equally well if you scale everything up by two.
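A minimal sketch of that sizing arithmetic, using the hypothetical numbers above (one single-core job per molecule, the whole array resident, plus a 1 GB/core cushion):

Code:
import math

def workstation_ram_gb(cores, array_gb_per_job, cushion_gb=1.0):
    """Total RAM needed when every core runs one job with its array fully resident."""
    return math.ceil(cores * (array_gb_per_job + cushion_gb))

print(workstation_ram_gb(24, 3))   # 96 GB  -- the original purchase
print(workstation_ram_gb(24, 7))   # 192 GB -- after the algorithm or problem changes

The total RAM requirement tracks cores x per-job working set, which is why RAM/core, not total RAM, is the number that matters here.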
 
Last edited:

Wotcher

macrumors member
May 3, 2005
42
25
If I’m confused I apologize, but I’ve seen the argument that Apple will abandon the pro market plenty of times and it never holds water.

I've no proof, but I get the feeling those who say that are those who'd been clamoring for xMac or some upgradeable format and have been disappointed for many years that Apple never gives them what they want.
 
  • Like
Reactions: JMacHack

Unregistered 4U

macrumors G4
Jul 22, 2002
10,610
8,632
I think that’s a given considering Apple Silicon’s touted architectural advantages over Intel. I don’t believe anyone expects the 8,1 Mac Pro to be a carbon copy of the 7,1 with an Apple processor instead of an Intel one.
Right, that was the point of my initial post on the topic. There shouldn’t be an assumption that the next Mac Pro is going to be as modular as the current one just because the current one is.
 

leman

macrumors Core
Oct 14, 2008
19,523
19,680
I was doing computational work on large, complex molecules that required doing repeated operations on very large arrays. In order for the program to run efficiently, the entire array needed to be resident in RAM.

So suppose you're buying a 24-core MacPro to do such computations today. Each computation runs on its own core (the program is hard to parallelize). Thus you can run computations on 24 molecules at a time. You know how big your molecules are, and how finely you want to coarse-grain them, so you determine the maximum array size needed per molecule is 3 GB and, to give yourself some cushion, get 4 GB/core, i.e. 96 GB (after all, why overbuy?).

Here are three scenarios in which you wouldn't need to buy a new machine, but would need to double the RAM:

1) You'd like to run computations on larger molecules, but that's not practical, because the algorithm takes too long. But a few years down the road you, or someone else, develops a more efficient algorithm that allows you to process molecules that are much larger, where each molecule now requires a larger array (say, 7 GB). Thus, keeping the same cushion, you now need 8 GB/core, i.e., double the RAM.

2) You don't move to looking at larger molecules, but instead switch to a different computational approach that, while it still can be handled without issue by your CPU's (indeed, this might even be a faster computation), requires larger array sizes (again, say 7 GB/molecule instead of 3 GB/molecule). Again, double the RAM.

3) Your research takes you in an entirely different direction that requires you to look at a new class of molecules you never envisioned, for which you need larger array sizes and thus more RAM. Etc., etc.

In sum, I think the problem is that you're looking at the huge increase in RAM since 2004, but what you're missing is that there's also been a huge increase in core count. And many of these computational programs are written by scientists, not computer specialists, and thus they're typically single-core (parallelizing computational programs can be very tricky). Thus what matters for these programs isn't total RAM, it's RAM/core. And RAM/core obviously hasn't risen as much as total RAM.

And if you think 96 GB (4 GB/core) is clearly underbuying for a 24-core machine, that's not an issue—I was just giving sample numbers; of course what I've written above works equally well if you scale everything up by two.

Fair enough. HPC workloads are like that. Of course, if that is the kind of work you do, it would probably be much more efficient - and cheaper - to do it on a supercomputer. I developed the code for my thesis on my laptop; the final simulation ran on a cluster with hundreds of CPUs and TBs of RAM.
 

CWallace

macrumors G5
Aug 17, 2007
12,528
11,545
Seattle, WA
I’m sure a LOT of R&D went into the trash can Pro, but it was used for 1 iteration, minor speed bumps over that time, and the next Pro had very little in common with it.

I’m just suggesting that the same could be true here, where what is released as a Pro may not have a lot in common with what came before.

MacPro 6,1 was a dead-end in terms of its overall design. While they mention the "thermal corner" issue, if that was all that mattered, Mac Pro 7,1 would have been a "sealed box" with no internal expansion and better thermal management.

Instead, MacPro 7,1 was like MacPro 5,1 - excellent thermal management and excellent internal expansion. It seems very unlikely to me Apple will decide MacPro 9,1 should return to a "sealed box" with no internal expansion.

  • Will it have 8 PCIe slots? Probably not, since the rumors say it could be a small form-factor tower. But it will have them (probably 4).
  • Will it support 1.5TB of RAM? I am guessing no (especially if it uses on-package). But I expect it to support at least 256GB and maybe 512GB.
  • Will it have SATA ports and room for internal SATA disks? Could be, though if it does, I think they could be limited to 2.5" form factors instead of the current 3.5" and two instead of four.
 

JouniS

macrumors 6502a
Nov 22, 2020
638
399
Fair enough. HPC workloads are like that. Of course, if that is the kind of work you do, it would probably be much more efficient - and cheaper - to do it on a supercomputer. I developed the code for my thesis on my laptop; the final simulation ran on a cluster with hundreds of CPUs and TBs of RAM.
A supercomputer is more cost-effective, assuming that you can meaningfully test your code on a laptop. If you can't, it's better to use a workstation or a dedicated server, because most HPC environments are not suitable for iterated testing.

In the work I do, 256-512 GB has been the sweet spot for RAM for around a decade. I have always ended up using reserved HPC nodes or cloud instances for testing, which is a bit inconvenient but avoids the latency issues with HPC schedulers. Luckily, things are finally changing. Today you can get 128 GB RAM in a cheap consumer desktop, which is often but not always enough for my purposes. The next desktop I'll get after this 2020 iMac will probably be the first computer I've ever had with enough memory. And the one I'll get after that will probably be the first with an option to upgrade memory if my requirements change.
 

theorist9

macrumors 68040
May 28, 2015
3,882
3,061
Fair enough. HPC workloads are like that. Of course, if that is the kind of work you do, it would probably be much more efficient - and cheaper - to do it on a supercomputer. I developed the code for my thesis on my laptop; the final simulation ran on a cluster with hundreds of CPUs and TBs of RAM.
I did the same thing. I needed the extra RAM so I could do development work on my G5 before sending it off to the cluster.

Except:
1) Development work can sometimes be sped up by doing many different test runs simultaneously, so it would be nice to have a powerful multi-core machine for that. You generally don't want to do your development work on the cluster itself.

2) The other problem with relying on a cluster is that they can get very heavy use. At my university, that typically meant a max allocation of 8 CPU's/user account (though occasionally it went down to 4 during especially heavy use; OTOH, I got an awful lot done by submitting my jobs over the winter holidays, during which I was allocated 64-128 CPU's, which was glorious). Thus many groups that have to do heavy computing develop their own "mini-clusters". One (or a few) high-core-count workstations could be such a cluster. Typically those would be Linux, but if the scientist is a Mac user, a Mac Pro would be very nice.

So "just send it to the cluster" is only a solution if you've done all your dev work, and you have reasonable cluster resources (which our university did not, even though we were, globally, a top-10 research university). I think there were enough experimental groups with a lot of money that anyone who needed them just bought their own cores. We were a bunch of theorists, and theorists tend not to get a lot of funding, so we had to make do with the university's "general-use" cores).
 
Last edited: