Is there a good compilation of places to buy these 16GB sticks at present? I'm trying to shop around and so far:

 
I'd just go with Superbiiz. Other than lacking a 3rd-party warranty, you can't really go wrong with Hynix/Samsung modules. Of course I'm in the UK and I don't worry about my parts failing too much as we have good consumer laws here and I am happy to push them on a supplier when they **** me about.
 
I hear what each of you are saying: "Geekbench only shows a slight slowdown when DIMM sockets are left empty, so it won't have a large impact in the real world."

This perspective is rational and has merit if Geekbench's top-line numbers can be generalized to overall system performance.

Unfortunately, they do not generalize.

In fact, it is worse than that. The design of Geekbench, and the sole purpose for which it was built in the first place, ensure that it is a very bad proxy for measuring the influence of memory on system performance.

Maybe I'm missing something, but looking at the final Geekbench numbers in single- and multi-core, it looks like there is only a 3-5% difference between 2 and 4 channels used.

Clearly in memory performance there is a difference.. but it looks like it's about a 4% difference perceivable by the user.

That comparison clearly shows that the additional DIMMs don't matter for most of the tests - just a couple.

Those tests are with a dodeca system; a quad would show even less difference.

Each of you claims that system performance will only be impacted by a few percent by under-populating the memory channels. You base this on Geekbench gross scores. The problem with this is that Geekbench is a CPU-level benchmark, not a system-level benchmark.

What does it measure?

It measures only three classes of thing.

It measures Integer performance, Floating point performance, and Memory performance of CPUs in isolation from the rest of the system.

The Integer and Floating point groups of tests are designed to fit entirely in on-die caches and never touch main memory. These tests together assess the raw number-crunching and processing power of the CPU under test. None of these tests read or write data during operation.

The Memory performance section gives perspective on how fast the CPU can perform calculations that rely on external data. Some of these tests also perform significant computation, but each of the algorithms has dependencies which are designed to miss the on-die cache and force data to be read or written during operation.
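
To make that distinction concrete, here is a rough sketch (mine, not Geekbench's actual code) of the same reduction run over a working set that fits in on-die cache versus one that cannot; the array sizes are assumptions:

# Rough illustration only: the same reduction over a cache-resident working
# set vs. one far too large for any on-die cache, so the second run is
# dominated by main-memory traffic. Array sizes are assumptions.
import time
import numpy as np

def effective_gbps(n_floats, repeats=20):
    a = np.ones(n_floats, dtype=np.float64)
    a.sum()                                  # warm the caches once
    t0 = time.perf_counter()
    for _ in range(repeats):
        a.sum()                              # streams 8 * n_floats bytes per pass
    dt = time.perf_counter() - t0
    return 8 * n_floats * repeats / dt / 1e9

print("cache-resident (1 MiB)  : %.1f GB/s" % effective_gbps(128 * 1024))
print("memory-bound  (512 MiB) : %.1f GB/s" % effective_gbps(64 * 1024 * 1024))

The first figure barely depends on the DIMM configuration; the second kind of figure is what the memory subtests are built to capture.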

As you can see, the bulk of the tests performed by Geekbench are specifically designed to avoid touching memory in any way. They are entirely irrelevant to assessing the memory subsystem because they were designed to be irrelevant. Any access that had to go off-chip would stall the thread and destroy the thing being measured (oops).

Only the memory performance subtests touch memory in any way during operation. Thus, they are the only ones which are relevant to compare when evaluating different memory configurations.

Can you see that?

This system has 4 separate memory channels between DIMMs and the memory controller.

By interleaving data addresses across channels, access to and from memory can occur in parallel on 4 sets of data lines.

The controller can handle 2, 3, and 4 way interleaving limited only by how many channels contain matching DIMMs. DIMMs match by sharing the properties {RDIMM/UDIMM, voltage, speed, size, and maybe internal rank}.

So by choosing to install 1 DIMM you are limiting memory bandwidth to 25% of optimal performance. Install 2, you get 50%. 3 = 75%.
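
To put rough numbers on the above (a back-of-the-envelope sketch; 14.9 GB/s per channel is just DDR3-1866 x 8 bytes, and the line-by-line round robin is a simplification of what the controller really does):

# Back-of-the-envelope sketch of the channel scaling described above.
PER_CHANNEL_GBPS = 1.866 * 8              # ~14.9 GB/s per DDR3-1866 channel

def channel_for(addr, channels, line=64):
    # simplified interleave: consecutive 64-byte lines rotate across channels
    return (addr // line) % channels

def peak_bandwidth(dimms):
    channels = min(dimms, 4)              # one matched DIMM per channel, 4 channels
    return channels * PER_CHANNEL_GBPS

print([channel_for(a, 4) for a in range(0, 512, 64)])   # -> [0, 1, 2, 3, 0, 1, 2, 3]
for n in (1, 2, 3, 4):
    bw = peak_bandwidth(n)
    print(f"{n} DIMM(s): {bw:5.1f} GB/s peak ({bw / peak_bandwidth(4):.0%} of optimal)")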

Back to Geekbench.

The memory test subsection posted here clearly shows that there is a difference. We expect bandwidth to suffer with 2 DIMMs compared to 4 DIMMs: the number of channels has been cut in half, so we have 50% fewer data lines involved. There is latency, setup, and prefetch queuing in the mix, and the algorithms perform nontrivial work on the data as well, so we expect to see less than a 50% difference and do see about a 40% difference.

Does this mean that the general system will be 40-50% slower? No. It only means the memory subsystem will be up to 50% slower when actually transferring data between the memory and the controller. This does not happen at peak continuously. But it does happen from time to time. Sometimes as frequently as 1.866 billion times per second.
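
To spell out why those two numbers differ, here is the arithmetic with made-up workload mixes (the memory-bound fractions are purely hypothetical):

# Hypothetical numbers, just to show why a 40-50% slower memory subsystem
# does not make the whole system 40-50% slower: only the fraction of time a
# workload actually spends waiting on main-memory transfers gets stretched.
def overall_slowdown(mem_fraction, mem_penalty=2.0):
    # mem_fraction: share of runtime bound by memory bandwidth (assumed)
    # mem_penalty: 2.0 ~ half the channels, i.e. up to 2x slower transfers
    return (1 - mem_fraction) + mem_fraction * mem_penalty

for f in (0.02, 0.10, 0.30, 0.60):
    print(f"{f:4.0%} memory-bound -> {overall_slowdown(f):.2f}x runtime")

Geekbench's Integer and Floating point subtests live near the top of that list; STREAM-like workloads live near the bottom.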

Every time that memory is read or written on your system it will happen on a data path that is determined by how the data is interleaved. By choosing to use fewer channels you set a lower bandwidth.

If you do plan on 16GB and still need a number around 4% to compare with something, how about this one? 100 bucks is less than 4% of the base model sticker price. You'll have your 16GB and it will boost your memory bandwidth by 33% in the process.
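
For what it's worth, the arithmetic behind those two percentages (I'm assuming the $2,999 base configuration is the sticker price in question):

# Checking the figures above; the $2,999 base price is my assumption about
# which "sticker price" is meant.
base_price = 2999.0
extra_dimm = 100.0
print(f"extra cost: {extra_dimm / base_price:.1%} of the base price")   # ~3.3%
print(f"3 -> 4 channels: {(4/3 - 1):.0%} more peak bandwidth")          # ~33%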
 
I hear what each of you are saying: "Geekbench only shows a slight slowdown when DIMM sockets are left empty, so it won't have a large impact in the real world." ...

So by choosing to install 1 DIMM you are limiting memory bandwidth to 25% of optimal performance. Install 2, you get 50%. 3 = 75%.

In summary: If you buy a nMP and only populate one DIMM, you're nuts! :p :D
 
Is there a good compilation of places to buy these 16GB sticks at present? I'm trying to shop around and so far:


Your best bet will be to shop by part numbers.

I do not know of a list of part numbers for the Mac Pro specifically, or for the E5-16xx v2 series CPUs.

However, Intel has published the results of DIMMs which have passed system level validation tests in quad channel 3 slot server boards for the E5-2600 v2 CPUs. I know of no reason why the UDIMM and RDIMM parts they have certified would not work for us, but cannot vouch for any personally for at least another month. :(

The list of current reports is on Intel's page titled:
"DDR3 Platform Memory Validation, Specifications, and Results"

http://www.intel.com/content/www/us/en/platform-memory/platform-memory.html

link to the UDIMM report is here:
http://www.intel.com/content/www/us...mm-ecc-xeon-e5-2600v2-validation-results.html

link to the RDIMM report is here:
http://www.intel.com/content/www/us...-rdimm-xeon-e5-2600v2-validation-results.html

Here is Apple's official memory specification:
http://support.apple.com/kb/HT6064

I hope this will prove helpful, but since it may make it easier for some to make unwise decisions, I want to add a bit of advice.

Based on experience with prior systems and personal opinion:
4 matched DIMMs in this system is optimal.
3 matched DIMMs will result in acceptably lower performance.
2 DIMMs will work, but are probably a Very Bad Idea.
1 DIMM will work, but is probably a Profoundly Bad Idea.

Enjoy

p.s. Gotta love italic. Caps lock but less shouty.

----------

In summary: If you buy a nMP and only populate one DIMM, you're nuts! :p :D

It was one last attempt at reason before casting aspersions.

Your summary is perfect, however. Concise and complete.
 
In summary: If you buy a nMP and only populate one DIMM, you're nuts! :p :D

Or "crazy like a fox".

Note that the original idea was to buy one 16 GiB now, and get more when the price settles - rather than buying a 4 GiB that would be discarded.

Still sounds like a good idea.

Look at the tasks in the GeekBench report - most of them showed very little difference in performance. Real-world tasks like JPEG compression, Zip compression, SHA1/SHA2 hashing, Twofish encryption, sharpen filter, blur filter, FFTs, ray tracing - very little difference.

The cache sizes on the chips were chosen so that "real world" problems would be helped by the cache.

What was hurt? AES encryption - did you know that it scales perfectly with hyper-threading, so you had 24 threads in play? Dijkstra path routing, also hit. STREAM - well, STREAM is a "memory bandwidth virus" program that does nothing useful, other than measure sustained memory bandwidth.
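
For anyone curious what STREAM actually does, here is a rough numpy stand-in for its "triad" kernel - not the official C benchmark, and the array size is just my guess at "much bigger than any cache":

# Approximation of the STREAM "triad" kernel a = b + s*c; real STREAM is C
# with carefully sized arrays. This just shows why it is pure bandwidth.
import time
import numpy as np

N = 32 * 1024 * 1024                      # 3 arrays x 256 MiB - far beyond any cache
b = np.random.rand(N)
c = np.random.rand(N)
a = np.empty_like(b)
s = 3.0

t0 = time.perf_counter()
np.multiply(c, s, out=a)                  # a = s*c
np.add(a, b, out=a)                       # a = b + s*c
dt = time.perf_counter() - t0
bytes_moved = 3 * N * 8                   # ideal triad traffic: read b, read c, write a
print(f"triad: roughly {bytes_moved / dt / 1e9:.1f} GB/s sustained")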

If your workflow is mostly similar to AES/Dijkstra/STREAM, you'll possibly notice the difference with more channels.

If your workflow is more like the other 49 components of GeekBench, you probably won't notice.
 
Or "crazy like a fox".

Note that the original idea was to buy one 16 GiB now, and get more when the price settles - rather than buying a 4 GiB that would be discarded.

Still sounds like a good idea.

Look at the tasks in the GeekBench report - most them showed very little difference in performance. Real world tasks like JPEG compression, Zip compression, SHA1/SHA2 hashing, Twofish encryption, sharpen filter, blur filter, FFTs, ray tracing - very little difference.

The cache sizes on the chips were chosen so that "real world" problems would be helped by the cache.

What was hurt? AES encryption - did you know that it scales perfectly with hyper-threading, so you had 24 cores in play? Dijkstra path routing, also hit. STREAM - well, STREAM is a "memory bandwidth virus" program that does nothing useful, other than measure sustained memory bandwidth.

If your workflow is mostly similar to AES/Dijkstra/STREAM, you'll possibly notice the difference with more channels.

If your workflow is more like the other 49 components of GeekBench, you probably won't notice.

On one hand I agree with you, but on the other hand, if you're buying a nMP... really?! At least spend the extra $160 for a second stick :confused: :eek: It just seems like nonsense to me. You could use the exact same argument to talk yourself into downgrading everything until you were running a Pentium II from 1999. :p
 
On one hand I agree with you, but on the other hand, if you're buying a nMP... really?! At least spend the extra $160 for a second stick :confused: :eek: It just seems like nonsense to me. You could use the exact same argument to talk yourself into downgrading everything until you were running a Pentium II from 1999. :p

Where are they $160? Crucial and OWC are $215-$220 per DIMM.

Some people are on tight budgets, and need to consider every couple of hundred bucks.

Although back to the OP - another option would be to use the 12 GiB until the prices drop, then buy 2x16GiB.
 

They do have good prices, but somewhat inconsistent inventory judging from a few other things I checked (no Seagate 4TB NAS drives, for example).


On one hand I agree with you, but on the other hand, if you're buying a nMP... really?! At least spend the extra $160 for a second stick :confused: :eek: It just seems like nonsense to me. You could use the exact same argument to talk yourself into downgrading everything until you were running a Pentium II from 1999. :p

And I could use your argument to ask why anyone would stop with a hexa/D500/32GiB instead of springing for the dodeca/D700/64GiB.

If you're buying an nMP... really? ;)
 
And I could use your argument to ask why anyone would stop with a hexa/D500/32GiB instead of springing for the dodeca/D700/64GiB.

If you're buying an nMP... really? ;)

That is not a fair characterization of his argument.

It is baffling why someone would buy a Mac Pro - a system with a powerful processor and beefy GPUs - and then deliberately cripple it by giving it memory bandwidth more suitable for a Core Duo processor. You are bringing the memory performance down to about the level of a 2008 iMac or a 2009 white MacBook.

Seriously, that is not a sensible thing to do.
 
That is not a fair characterization of his argument.

It is baffling why someone would buy a Mac Pro - a system with a powerful processor and beefy GPUs - and then deliberately cripple it by giving it memory bandwidth more suitable for a Core Duo processor. You are bringing the memory performance down to about the level of a 2008 iMac or a 2009 white MacBook.

Seriously, that is not a sensible thing to do.

Are you seriously suggesting that a system with one PC3-14900 DIMM and 10 MiB L3 cache has the same memory performance as a system with one PC2-5300 DIMM and 2 MiB cache?

And the OP is looking at it as a temporary budgeting measure. The nMP with one DIMM will still be much faster than his previous system, and he plans to add more DIMMs soon.

To me, that seems more sensible than paying credit card interest for a DIMM that might not give any noticeable benefit for many programs, when the prices of DIMMs most likely will go down as the volume of PC3-14900 DIMMs increases.

BTW, I just ordered three 20-core/40-thread ProLiants - each with eight PC3-14900 to fully populate all eight memory channels. These systems will be doing AES encryption/decryption, so they'll need the bandwidth.
 
Are you seriously suggesting that a system with one PC3-14900 DIMM and 10 MiB L3 cache has the same memory performance as a system with one PC2-5300 DIMM and 2 MiB cache?

No, I was referring to the systems with 2 channels of PC2-6400, with aggregate bandwidth of 12.8 GB/s. The memory subsystem on the 6,1 Mac Pro pushes about 14.9 GB/s per channel (with worse latency than DDR2). So, yeah, I think it is an apt comparison.

That older memory system was about 14% less beefy, but it was feeding a tiny fraction of the demand that this Mac Pro will throw at it. When the memory subsystem and CPU are in balance, work can proceed with minimal delays. When memory cannot keep up, you get stalls. It doesn't really matter how fast your silicon is; if you cannot keep it fed, it will go dark until the data arrives.
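
Worked out, in case that comparison looks like a stretch (peak per-channel figures only; latency ignored):

# The comparison above, in numbers: dual-channel PC2-6400 vs a single
# channel of PC3-14900 (the one-DIMM nMP case).
pc2_6400_dual = 2 * 6.4    # GB/s, 2008 iMac-class memory subsystem
pc3_14900_one = 1 * 14.9   # GB/s, one DIMM in the 6,1 Mac Pro
print(f"old dual-channel DDR2 : {pc2_6400_dual:.1f} GB/s")
print(f"one-DIMM 6,1 Mac Pro  : {pc3_14900_one:.1f} GB/s")
print(f"difference: {(1 - pc2_6400_dual / pc3_14900_one):.0%}")   # ~14%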

Each Xeon core has the ability to complement its directly accessible cache with that of other cores, but doing so still causes the thread to stall while a fetch is made across the shared ring bus to another cache. So I don't like lumping these caches together and talking about 10 or 12 MB caches as if they were monolithic. On short time scales cache is very effective at keeping a core fed, but it is not magical. All data still needs to come to and from the memory subsystem to populate the caches in the first place.

And it's not just CPU cores involved. 40 lanes' worth of PCIe 3 devices, and the PCH via the DMI interface, also talk to the home controller and the memory controller.

All subsystems in the computer rely on the memory subsystem. So the performance of the memory subsystem is tuned to meet the aggregate demands. It is the circulatory system on which all other subsystems depend.

It is also no use arguing that the peak memory bandwidth of 59.7 GB/s will rarely come into play. With 4 channels interleaved it will be utilized at full tilt for each request of 64 bytes or more. Most accesses will be driven by page faults, and the native page size of Mach is 4 KiB. Deliberately choosing to slow down memory access by a factor of 4 is a big deal. It will affect the lowest levels of the system for each page fetch or store.
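
As a rough sense of scale (peak transfer rate only - this ignores the latency, setup, and queuing mentioned above):

# Rough service time for one 4 KiB page worth of data at each channel count
# (peak transfer rate only; ignores latency, queuing, and prefetch effects).
PAGE = 4096
PER_CHANNEL = 14.9e9          # bytes/s per DDR3-1866 channel

for channels in (1, 2, 3, 4):
    ns = PAGE / (channels * PER_CHANNEL) * 1e9
    print(f"{channels} channel(s): ~{ns:5.1f} ns per 4 KiB at peak")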

And the OP is looking at it as a temporary budgeting measure. The nMP with one DIMM will still be much faster than his previous system, and he plans to add more DIMMs soon.

To me, that seems more sensible than paying credit card interest for a DIMM that might not give any noticeable benefit for many programs, when the prices of DIMMs most likely will go down as the volume of PC3-14900 DIMMs increases.

Paying interest on a non-productive asset is a sucker's game. For the sake of argument, this "temporary budgeting" has to last at least several weeks to a month to make any difference at all. Put it on plastic and pay in full at the next billing cycle. But it sounds as if this delay is both longer and uncertain. There was a glut of DRAM supply causing the cancellation of several memory fabs even before the Hynix fire (which led to the current capacity shortfall). The recent Q3 and projected Q1 memory markets show rising commercial demand for DRAM. So it may take 3-6 months to see significant drops in retail prices.

That seems like a long time to gimp a system.

Look, I know I am showing my own biases here. I do have a chip on my shoulder. 8) I have had to resolve too many cluster****s on production systems over the years caused by people naively screwing up memory configurations. Maybe it was a bean counter sourcing mixed lots of similar but unmatched sets. Sometimes it was some similarly well-intentioned schmuck in the PC or database groups who thought that under-populating now to save money in the future was a good idea. In either case, I was the guy called in at 3 a.m. months later to find out why the production schedule was blown.

BTW, I just ordered three 20-core/40-thread ProLiants - each with eight PC3-14900 to fully populate all eight memory channels. These systems will be doing AES encryption/decryption, so they'll need the bandwidth.

Sounds sweet. 20 Cores - that's a lot of mouths to feed!
 
Some actual data....

https://forums.macrumors.com/posts/18745317/

'analog guy' was kind and generous enough to do a test with one to four 16 GiB DIMMs on GeekBench, and post the results.

They confirm that except for AES and Dijkstra (and of course STREAM), there's virtually no performance difference between 1 DIMM and 4 DIMMs on the components of GeekBench.

In fact, the SHA1 and SHA2 tests were faster with one 16 GiB DIMM than with four! (Although clearly within the sampling error....)

So, the OP can buy one DIMM and not worry about general performance while he waits for prices to drop, and I'm happy that I've ordered 8 DIMMs for each of my AES-NI application servers.
 
Helpful findings as a followup to this spirited discussion! Thanks AidenShaw for your observations, and I appreciate your tests, analog guy.
 
Helpful findings as a followup to this spirited discussion! Thanks AidenShaw for your observations, and I appreciate your tests, analog guy.

happy to help!

got my nMP late last week and figured i would post up some information that could help people with decisions. right now there is a dearth of good information out there on various configs. we have some benchmarks, but they tend to be for a single configuration. (e.g., i didn't see any tests on the same config with 16/32/48/64GB, so i posted mine; i don't see any tests yet on the same config with d300 vs d500 vs d700, etc.)

happy to read this discussion. i learn from it.
 