If this is purely a Windows vs. macOS comparison, I think we can use PassMark PerformanceTest. That should be better than using two different programs, because we don't know whether each program really measures the true maximum performance. With the same software from the same company, we can at least assume it uses the same or a similar method to measure performance on the different platforms, so the results are more comparable.

It's free software and has a Memory Test Suite. All we need to run should be just the "Memory Latency" and "Memory Threaded" tests.

Maybe run it three times, check for consistency, and take the average of the results.

Then we can compare the difference between Windows and macOS.

Of course, we can use this method to compare the performance of different memory configs as well.
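As a rough sketch of the "run it three times, check for consistency, average" idea, something like the following could be used. The numbers are made-up placeholders, not results from this thread, and the function name is only illustrative.

```python
from statistics import mean, stdev

def summarize_runs(samples):
    """Return (average, relative spread in percent) for repeated benchmark runs."""
    avg = mean(samples)
    # Sample standard deviation relative to the mean; 0 if there is only one run.
    spread = stdev(samples) / avg * 100 if len(samples) > 1 else 0.0
    return avg, spread

# Hypothetical threaded-read results in MB/s (placeholders, not real measurements).
windows_runs = [20100.0, 19950.0, 20250.0]
macos_runs = [15200.0, 15050.0, 15350.0]

win_avg, win_spread = summarize_runs(windows_runs)
mac_avg, mac_spread = summarize_runs(macos_runs)

print(f"Windows: {win_avg:.0f} MB/s (spread {win_spread:.1f}%)")
print(f"macOS:   {mac_avg:.0f} MB/s (spread {mac_spread:.1f}%)")
print(f"macOS relative to Windows: {(mac_avg / win_avg - 1) * 100:+.1f}%")
```

If the spread between runs is about as large as the gap between the two systems, the difference is probably just noise and more runs would be needed.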
 
Where do you get these timings and the "true" speed from? My memory, 6 sticks of 32GB, 192GB in total, "claims" to run at 1333MHz. But I guess this is just what I entered into the config.plist. Is there any way to check what the speed really is in macOS?
CPU-Z and AIDA64 Extreme both provide memory stats on Windows. In CPU-Z, there are two tabs: one shows the actual negotiated memory speed and timings, typically determined by the motherboard; the other, SPD, shows the specs of the RAM sticks. The config.plist is a way to provide the SPD information to macOS via SMBIOS. Given that the actual speed is set by the firmware, I think Windows and macOS will operate at the same frequency and timings.

Unfortunately, I haven't found a way to check the "real" speed inside macOS, but I think you can compare the single-thread speeds and make an educated guess. Using AmorphousMemoryMark: SEQ1M T1 Read is 9.66GB/s with 192GB and 8.43GB/s with 256GB; SEQ1M T1 Write is 7.38GB/s with 192GB and 4.74GB/s with 256GB. The theoretical peak transfer rate per channel is 10.6GB/s at 1333MHz, 8.5GB/s at 1066MHz, and 6.4GB/s at 800MHz.
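For anyone who wants to redo that guess with their own numbers, here is a small sketch of the arithmetic. It assumes a 64-bit (8-byte) DDR3 channel and follows the reasoning above that a single-thread sequential read stays under the peak of one channel; with channel interleaving that assumption may not hold, so treat the result as a hint only. The function names are just for illustration.

```python
BUS_WIDTH_BYTES = 8  # one DDR3 channel is 64 bits wide

def peak_gb_per_s(mega_transfers_per_s):
    """Theoretical peak of a single channel in GB/s (decimal gigabytes)."""
    return mega_transfers_per_s * BUS_WIDTH_BYTES / 1000

def slowest_grade_that_fits(measured_gb_s, grades=(800, 1066, 1333)):
    """Lowest speed grade whose single-channel peak is at least the measured rate."""
    for grade in sorted(grades):
        if peak_gb_per_s(grade) >= measured_gb_s:
            return grade
    return None  # measured rate exceeds every single-channel peak listed

for grade in (1333, 1066, 800):
    print(f"{grade}: {peak_gb_per_s(grade):.2f} GB/s")

# Single-thread read figures quoted in the post above.
print("192GB read fits:", slowest_grade_that_fits(9.66))
print("256GB read fits:", slowest_grade_that_fits(8.43))
```

With the read figures from the post, 9.66GB/s only fits under the 1333 ceiling, while 8.43GB/s already fits under the 1066 ceiling.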

What surprises me more, in the end, is how Windows can keep the throughput that high while triple channel is definitely lost. Or there is no real difference in throughput between triple and dual channel in Windows.

I found that PassMark's PerformanceTest can do memory tests on both Windows 10 and macOS, so now I can do more comparable testing. The results are below.

Screen Shot 2023-07-23 at 3.18.32 pm.png


My conclusion is that under Windows 10, going to 256GB is a worthwhile upgrade, as it doesn't impact performance much. But under macOS, threaded memory read degrades a lot.

Based on this, I think the performance degradation might be more about how the kernel accesses memory than just triple channel going to single channel. Some of the keywords I found are interleaving, NUMA, and far memory vs. near memory, which go a bit too far down the rabbit hole for me.

For completeness and the benefit of others, I am also including the threaded memory graphs under Windows for 192GB and 256GB below. My interpretation is that for a workload involving fewer than three threads, there shouldn't be an apparent real-world impact from using 256GB under macOS.

Windows 10 192GB PassMark v11 Threaded.PNG
Windows 10 256GB PassMark v11 Threaded.PNG
 
Thanks for providing a full picture of what's happening.

Yeah, it seems the pure memory performance is a bit off when running 256GB RAM in macOS. However, in most real-world situations, it's the "cached memory performance" that matters (apart from benchmarking software, no software should disable the cache). And apparently, that's not affected.

So I think those who prefer more memory don't need to worry much about this effect. The cached memory read performance is still the same, which means the performance of most software shouldn't be affected significantly.
 
I also changed the product definition in config.plist to MacPro6,1 and MacPro7,1. RAM performance was much worse with 256GB. I am unsure if it's because of different CPU behaviour (e.g. dual CPUs vs. single CPU), or because I didn't do the RAM's bank configuration properly.

There might be some config.plist settings that could improve the 256GB RAM performance with a different product definition.
 
I am not sure if this helps at all.

But most likely DeviceLocator can be changed to DIMM 1 through DIMM 8 (the native cMP format), rather than the existing Channelx-DIMMx format.

Also, when there is no memory spoofing:
BankLocator should be Not Specified
AssetTag is actually Asset Tag:

Manufacturer and part number can most likely be read by dmidecode in Windows.
Manufacturer should be something like 0x802C and
PartNumber usually looks like 0x33364B534632473732505A2D314734453120

I haven't tried dmidecode in Windows, but you could see if it gives you all the native memory info that can be set in the config.plist. Then see if those settings make any difference in real-world memory performance.
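In case it helps with reading those values: the PartNumber above looks like plain ASCII written out as hex, so it can be converted back and forth with a few lines. This is only a sketch with illustrative helper names; it does not apply to the Manufacturer value, since 0x80 is outside ASCII and that field appears to be a numeric ID rather than text.

```python
def hex_field_to_text(value):
    """Decode a hex string such as '0x3336...' into the ASCII text it encodes."""
    digits = value[2:] if value.lower().startswith("0x") else value
    return bytes.fromhex(digits).decode("ascii")

def text_to_hex_field(text):
    """Encode ASCII text (e.g. copied from a DIMM label) as a '0x...' hex string."""
    return "0x" + text.encode("ascii").hex().upper()

part_hex = "0x33364B534632473732505A2D314734453120"
part_text = hex_field_to_text(part_hex)

# repr() makes the trailing space padding visible.
print(repr(part_text))
print(text_to_hex_field(part_text) == part_hex)
```

The example value decodes to a readable part number followed by one padding space, and encoding that text again gives back the original hex string.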
 
My current settings are already as suggested, i.e. DIMM 1 to DIMM 8 as DeviceLocator, plus others. I did it based on Linux dmidecode. I also notice many other data fields are not supported in OpenCore, but not sure if those matter much.
 
I found that PassMark's PerformanceTest can do memory tests on both Windows 10 and macOS, so now I can do more comparable testing.
Nice tool! Thanks for pointing me to it.

...and here are my results:

passmark.png

I get slightly higher values for 192GB, running Monterey 12.6.7.

Maybe, for some reason, the RAM really does operate at 1333MHz.

Edit: Most likely not. I did some calculations, and in any case the difference between the results is smaller than the difference between 1333 and 1066.
 
I entered the manufacturer as well as the part and serial numbers as text instead of hex, taking the numbers straight from the sticks' labels. Can this do any harm? (Aside from entering wrong values, as the parts can of course be mislabeled.)
 
From my own test, you can enter whatever you want. That item is very cosmetic.
 
You are losing triple channel when "upgrading" from 192 to 256. So the fourth socket should always be left open if performance is your goal. 256 is just for an "impressive" "About This Mac".
As easy as that!
Not true. I have all 8 slots filled to make 80GB of RAM, and it runs in triple channel according to Windows 10. Mixed sizes too.
 
Yes, sure! If you populate the first two slots with sticks of the same size and the other two with sticks of half that size, you still have triple channel. But as soon as you max out the amount with four sticks of the same size, triple channel is gone.
 
How do you know you still have triple channel with 80GB RAM? If it is based on CPU-Z, I think it is just displaying the capability of the CPU. If you only have one stick, you still get "triple channel" in CPU-Z.

If you benchmark RAM speed, you might see a difference under proper triple channel vs. other combinations.
 
Yeah, CPU-Z shows me that. I suppose I could run some tests and pop 2 sticks out to really know whether it is or isn't. Is there another tool that may be more accurate than CPU-Z?
 
One way to verify is to try with one stick and see if CPU-Z still displays "triple channel". If the CPU-Z reading changes, then we know it is telling the truth.

Other than that, I do not know of any tools that get to that low level of detail inside Windows. I suppose MemTest86+ might give more detail, but it is OS-independent, and I haven't tried it.

I think a speed test is still a better way to verify the real impact. For example, I found that 192GB (32GB x4 + 16GB x4) is almost equivalent to 192GB (32GB x6) for my purposes, which means I could fill two Mac Pros to 192GB each, rather than 256GB and 128GB, and the total performance is better.
 
CPU-Z shows single, dual, then triple channel as I add more RAM, so that works. My testing shows the opposite of what you're saying: I'm pulling faster speeds with each pair of RAM I add, up to the 8th slot. I see more speed with 8 sticks than with 6, using some benchmark exe program I found in the M$ Apple store.
 
That's interesting.

Did you use 4Rx4 RAM? If you read the technical document I have attached, you can see how the expected behaviour should change. One of the memory configurations in this document is the same as the Mac Pro's. Basically, each channel can only hold a certain total number of ranks; once you exceed that, the memory speed decreases. Using 2R RAM doesn't breach that cap, while 4R seems to.

Also, what is the speed you are referring to? I think Memory Write and Memory Read Uncached seem to be more indicative of how many channels the system uses. Threaded memory speed seems to be related to how the operating system kernel deals with memory access.

My conclusion is that Windows always does better, or at least no worse, than macOS on the cMP.
 

Attachments

  • Quad Rank RAM Performance Penalty White Paper.pdf
Ahh, you're a very wise man :)
Yes, correct, I am using 2Rx4 sticks in all 8 slots, so I guess I'm not breaching the cap you mention.

Here are some photos I took for you showing my throughput speed tests in each mode:
 

Attachments

  • IMG_9794.JPG
  • IMG_9793.JPG
  • IMG_9791.JPG
  • IMG_9790.JPG
  • IMG_9789.JPG
  • IMG_9787.JPG
  • IMG_9786.JPG
  • IMG_9785.JPG
  • IMG_9792.JPG
What app are you using to measure this? And why are all the values swapped? E.g. the "analog" gauge shows 164.671 for Write, but below it says "Read. 164.671 MB/s".

Also, these values appear "a bit" high to me... compared to what I get.

memory.png
 
Thank you for the test results.

Good to know CPU-Z is showing the correct channel mode. I can see a different write speed for each channel mode, so that helps as a benchmark.

Actually, you are correct: the values look wrong, too high by several orders of magnitude compared to the actual technical capabilities of the RAM and CPU.
 
Also, at least in my case, the memory seems to run at 1066 instead of 1333. I checked CPU-Z in Windows, and it shows 532-something MHz for the DRAM frequency. But this doesn't seem to harm overall performance.

With the 16GB modules I also had this when using quad-rank DIMMs. And Geekbench in some cases even showed (slightly) better memory scores for the sticks running at the lower frequency (due to the shorter latency of 7 vs. 9?). Altogether, it looks more like a cosmetic issue to me.
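To put rough numbers on that, here is a quick sketch. It assumes the "7 vs. 9" refers to CAS latency in clock cycles, and it uses the fact that DDR memory transfers data twice per clock, which is why CPU-Z's DRAM frequency is about half the rated speed. The function names are only illustrative.

```python
def dram_clock_to_data_rate(clock_mhz):
    """DDR transfers data on both clock edges, so the data rate is twice the clock."""
    return clock_mhz * 2

def cas_latency_ns(cas_cycles, data_rate_mt_s):
    """Time spent waiting for CAS, in nanoseconds, at the given data rate."""
    clock_mhz = data_rate_mt_s / 2
    return cas_cycles / clock_mhz * 1000

# CPU-Z reading from the post: roughly 532 MHz DRAM frequency.
print("Effective data rate:", dram_clock_to_data_rate(532))

# Assumed timings: CL7 at 1066 versus CL9 at 1333.
print(f"CL7 @ 1066: {cas_latency_ns(7, 1066):.2f} ns")
print(f"CL9 @ 1333: {cas_latency_ns(9, 1333):.2f} ns")
```

Under these assumptions the 1066 sticks come out slightly ahead on CAS latency in absolute time, which would fit the marginally better access scores, while the 1333 sticks keep the higher peak transfer rate.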

The 32GB-sticks i am using are these HYNIX-parts:

Hynix 32GB.jpg


Also quad rank (4R). Are there even dual rank dimms in 32GB size?

Edit: So the 1333 MHz that "About This Mac" comes up with is most likely just what I typed into the OC config.plist myself. Maybe I should give 4200 a try to get a nice screenshot! 😎
 
I even get 7200 MHz! 🤣

ram_speeds.png


So it's now confirmed that whatever the Mac shows in "About..." and System Info is just taken from the Memory key in OC's config.plist!

Doesn't really matter to me as long as the full amount can be used (which seems to be the case, judging by what tools like PassMark or Rember show).
 
You should try it the other way: set the speed to something like 667MHz and see if the benchmark result gets worse.
 
I'm quite sure this will not change anything either.

The RAM will keep running at 1066, and the results will almost certainly be as I experienced with the different 16GB modules: the dual-rank 1333 sticks have slightly better transfer, the quad-rank 1066 sticks have slightly faster access. Overall memory performance between the two was identical within a very small margin.
 
I speculate that the speed data sets the maximum possible speed the system could attempt to use but falls back to the actual physical limitation. So when you increase it, nothing happens, but if you decrease it, it may cap the actual speed. Just a hypothesis that I can't be bothered to test.
 
Ok, i can give this a try!
 