
leman

macrumors Core
Oct 14, 2008
19,520
19,671
We do know that RAM errors happen, and that they cause random crashes and data corruption. Another benefit of ECC RAM is that you can get an advance warning before the RAM fails outright, i.e. while it is starting to fail and producing a higher rate of errors.

And what if it isn't a crash but silent data corruption? Why is that only bad if it's on a server? A businessman's Excel sheet, a software developer's source code, your Amazon order and online banking transaction, are they not important?

The same persistent data that passes through RAM at least once on its way to and from persistent storage? ECC is not rocket science. We are investing so much into nicer screens, faster drives, faster everything actually, why not invest just a little bit to make your system more reliable, too?

I totally understand what you mean. However, at the end of the day this is about cost–benefit analysis. The main question for me is how real the chance of "proper" data corruption is, as opposed to a crash. I mean, if it costs 5–10% of performance at every given moment to avoid something that will probably not happen in my lifetime… is it really a must-have feature? But if instead one can demonstrate that data corruption is real and happens with a high enough frequency (even if it’s one file in a few months), then the ECC tradeoffs are definitely worth it.
 

Basic75

macrumors 68020
May 17, 2011
2,101
2,447
Europe
I totally understand what you mean. However, at the end of the day this is about cost–benefit analysis. The main question for me is how real the chance of "proper" data corruption is, as opposed to a crash. I mean, if it costs 5–10% of performance at every given moment to avoid something that will probably not happen in my lifetime… is it really a must-have feature? But if instead one can demonstrate that data corruption is real and happens with a high enough frequency (even if it’s one file in a few months), then the ECC tradeoffs are definitely worth it.
Even though Intel marketing has tried to brainwash everybody into believing that only servers need ECC, RAM errors are not that uncommon.

 

leman

macrumors Core
Oct 14, 2008
19,520
19,671
Even though Intel marketing has tried to brainwash everybody into believing that only servers need ECC, RAM errors are not that uncommon.


Yeah, I know that study. I don’t find it convincing. The distribution of errors is too skewed, suggesting that a lot of the errors were confined to specific hardware.
 

Analog Kid

macrumors G3
Mar 4, 2003
9,360
12,603
I totally understand what you mean. However, at the end of the day this is about cost–benefit analysis. The main question for me is how real the chance of "proper" data corruption is, as opposed to a crash. I mean, if it costs 5–10% of performance at every given moment to avoid something that will probably not happen in my lifetime… is it really a must-have feature? But if instead one can demonstrate that data corruption is real and happens with a high enough frequency (even if it’s one file in a few months), then the ECC tradeoffs are definitely worth it.
Yeah, I think the reason most consumer equipment has gotten away without it is because most failures (like most genetic mutations) are generally fatal or corrected by internal consistency checks when saving a file or something. If the system crashes, it's annoying, but most people blame Windows or Tim Cook (this never would have happened when Steve was alive!), reboot and move on.

It's more important in scientific computing and such, where there's a risk that a dataset is corrupted in memory, leading to an invalid outcome. If you are spending hours or days processing terabytes of data, there's a reasonable chance that a random event will occur in that time that impacts your results.
 

leman

macrumors Core
Oct 14, 2008
19,520
19,671
Yeah, I think the reason most consumer equipment has gotten away without it is because most failures (like most genetic mutations) are generally fatal or corrected by internal consistency checks when saving a file or something. If the system crashes, it's annoying, but most people blame Windows or Tim Cook (this never would have happened when Steve was alive!), reboot and move on.

It's more important in scientific computing and such, where there's a risk that a dataset is corrupted in memory, leading to an invalid outcome. If you are spending hours or days processing terabytes of data, there's a reasonable chance that a random event will occur in that time that impacts your results.

Precisely. And that’s why those domains use ECC memory. But it’s not something I need when doing MCMC Bayesian estimation for example. I’m already running thousands of estimates, I don’t care if one of those gets corrupted once in a while.
 

Basic75

macrumors 68020
May 17, 2011
2,101
2,447
Europe
But it’s not something I need when doing MCMC Bayesian estimation for example. I’m already running thousands of estimates, I don’t care if one of those gets corrupted once in a while.
What if the corruption is in the code and affects all of these thousands of estimates?

I'm sorry, I really can not fathom why anybody would resist using a readily available technology that increases the reliability and stability of our digital devices!

The price would come down if it were deployed ubiquitously.

The only reason against it seems to be "I feel like I don't need it", even though we know that RAM cells are not perfect, that they occasionally produce errors, and that the error rate can increase at some point in the life of a RAM chip, which is another thing ECC would help detect early enough.

Nobody knows how many faults they have encountered on their non-ECC machines because they often wouldn't notice! Not every fault leads to the equivalent of a blue screen, and even then you'd probably blame MS or Apple and not your RAM because you don't know what happened.



"The only reason Intel says 'ECC is for servers and embedded' is because Intel marketing people have convinced the powers that be that they can sell otherwise inferior chips for a higher price by enabling ECC functionality,"

And some people here have listened to Intel too much.

DRAM cells keep shrinking without getting much more reliable, while RAM sizes are ever increasing, so the probability of an error somewhere in a device rises over the years.
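
To put a rough shape on the capacity argument, here is a minimal sketch, assuming bit errors arrive independently at a constant rate per GiB (a simple Poisson model). The function name and the rate constant are hypothetical placeholders, not measured figures; the only point is that with a fixed per-GiB rate, the chance of seeing at least one error grows with installed capacity.

```python
import math


def p_at_least_one_error(capacity_gib: float, hours: float,
                         errors_per_gib_hour: float) -> float:
    """Chance of one or more memory errors under a Poisson model."""
    expected_errors = capacity_gib * hours * errors_per_gib_hour
    # 1 - exp(-x), written with expm1 for accuracy when x is tiny
    return -math.expm1(-expected_errors)


# ASSUMPTION: illustrative rate only, not taken from any study or datasheet.
HYPOTHETICAL_RATE = 1e-6  # errors per GiB per hour

for capacity in (8, 32, 128):
    p = p_at_least_one_error(capacity, hours=24 * 365,
                             errors_per_gib_hour=HYPOTHETICAL_RATE)
    print(f"{capacity:>4} GiB: {p:.2%} chance of at least one error per year")
```

Whether the real per-GiB rate is high enough to matter is exactly what is being argued in this thread; the sketch only shows how the probability scales once a rate is assumed.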
 

Basic75

macrumors 68020
May 17, 2011
2,101
2,447
Europe
I think the reason most consumer equipment has gotten away without it is because most failures (like most genetic mutations) are generally fatal or corrected by internal consistency checks when saving a file or something.
I think the reason most consumer equipment has gotten away without ECC is because more often than not the cheaper option wins. And what consistency checks? Are applications performing hashes over the user data every time it changes in memory? That would be quite a hassle.
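
For illustration, this is roughly what such an application-level check would involve: a minimal sketch using CRC-32 from Python's standard `zlib` module. The helper names and the sample buffer are made up for the example. It simulates a single flipped bit and shows the checksum mismatch.

```python
import zlib


def checksum(buf: bytearray) -> int:
    """CRC-32 over the whole buffer."""
    return zlib.crc32(buf)


def flip_bit(buf: bytearray, bit_index: int) -> None:
    """Simulate a single-bit memory error in place."""
    buf[bit_index // 8] ^= 1 << (bit_index % 8)


# Hypothetical user data held in memory
data = bytearray(b"quarterly totals: 1024, 2048, 4096")
stored = checksum(data)  # must be recomputed after every legitimate edit

flip_bit(data, 37)  # one bit silently changes
corrupted = checksum(data) != stored
print("corruption detected:", corrupted)
```

CRC-32 catches any single-bit flip, but it only detects the error and cannot repair it, the stored checksum sits in the same unprotected RAM, and the application has to rehash on every change. That is the hassle in question.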
 

leman

macrumors Core
Oct 14, 2008
19,520
19,671

So Torvalds's PC broke, what's your point? It's not like ECC RAM doesn't break. I don't see how ECC RAM would have prevented this. Sure, he might have realised that he has a problem a bit earlier. The end effect is the same: he has to wait until the new RAM arrives.


I'm sorry, I really can not fathom why anybody would resist using a readily available technology that increases the reliability and stability of our digital devices!

The price would come down if it were deployed ubiquitously.

I'm not resisting anything. I am simply not convinced of the usefulness. You are trying to sell me a load of bulky protective gear for my light hike. Maybe this gear would protect me from bruising my knee once in a while. But for now I'd prefer to take the risk so that I don't have to sweat under all this stuff.

Don't get me wrong, please. If the next revision of Apple Silicon comes with error correction, I'd be delighted. I'd even take a RAM capacity hit for that (those codes have to be stored somewhere). But not if it will cut the performance by 20% or increase the price by 20%.
 

Basic75

macrumors 68020
May 17, 2011
2,101
2,447
Europe
I'd even take a RAM capacity hit for that (those codes have to be stored somewhere). But not if it will cut the performance by 20% or increase the price by 20%.
Implementations of ECC add extra RAM chips, ECC does not cut into the capacity. Next, ECC RAM is like 2% slower, not 20%, which should be hardly noticeable given today's cache sizes. And that one extra RAM chip for every eight is 12.5% more; whether that would be added to Apple's already ridiculously overpriced RAM prices remains to be seen.
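
The "one extra chip for every eight" figure lines up with the usual 72-bit ECC word (64 data bits plus 8 check bits). As a sketch of where the 8 comes from, assuming a Hamming-style SECDED code (single-error correction, double-error detection): the function name is made up for the example, and real memory controllers may use different codes.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits needed by a Hamming SECDED code for data_bits of payload."""
    r = 0
    # Hamming condition for single-error correction: 2**r >= data_bits + r + 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1  # plus one overall parity bit for double-error detection


for width in (8, 16, 32, 64, 128):
    check = secded_check_bits(width)
    print(f"{width:>3} data bits + {check} check bits "
          f"-> {check / width:.1%} overhead")
```

For a 64-bit word this gives 8 check bits, i.e. the 12.5% mentioned above. The relative overhead shrinks as the protected word gets wider.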
 

theluggage

macrumors G3
Jul 29, 2011
8,011
8,444
Implementations of ECC add extra RAM chips, ECC does not cut into the capacity.
Whichever way you cut it, you need to pay for an extra RAM chip per module.

Moreover, Apple Silicon uses LPDDR RAM, which would probably use inline ECC, and that does grab part of the RAM capacity to store the ECC data.


...this would be a problem for Apple Silicon since the SoCs we've seen so far already have low-ish maximum RAM sizes, a trade off for the power/speed advantages of having LPDDR RAM mounted directly on the package. Part of that is down to memory bus width, but some of it is physically how many RAM dies you can fit on the package... Adding ECC would eat ~20% of that. Even with "regular" ECC RAM you have to find space for the extra RAM chips somewhere.

I don't think there's any real demand or justification for ECC below "Mac Pro" serious workstation level - and the main area where it starts to become essential (along with redundant PSUs, parity-checked RAID for storage etc.) is in the server/high-density computing market, and Apple haven't had a dog in that race since they dropped the Xserve.

As for the Mac Pro, either Apple are going to create a new processor with expandable RAM and shedloads of GPU-capable PCIe lanes to create something comparable to the 2019 MP (In which case it would probably have ECC) or it's going to be a souped-up Mac Studio based on the next version of the M1 Max/Ultra, in which case people complaining about lack of ECC will have to queue up behind those who want upgradeable RAM, AMD GPUs and 64 lanes of PCIe...

I'm sorry, I really can not fathom why anybody would resist using a readily available technology that increases the reliability and stability of our digital devices!

There's always one more thing that you can pay for to make your digital device a bit more reliable. It comes down to cost/benefit and weighing up vs. all the other things that can cause code to crash or produce bad results. Why not have RAID 6 storage, military EMP/solar-flare-proof chips and multiple redundancy for everything? Those things all exist and for some applications they are worthwhile but, like ECC, they cost extra.
 

leman

macrumors Core
Oct 14, 2008
19,520
19,671
Implementations of ECC add extra RAM chips, ECC does not cut into the capacity. Next, ECC RAM is like 2% slower, not 20%, which should be hardly noticeable given today's cache sizes. And that one extra RAM chip for every eight is 12.5% more; whether that would be added to Apple's already ridiculously overpriced RAM prices remains to be seen.

As @theluggage writes above, you seem to be talking about side-band ECC, which widens the RAM bus (and RAM itself) to send error correction codes alongside the data. But LPDDR5 uses inline ECC, where checksums are stored in the same RAM and communicated with a separate request. This eats into both bandwidth and latency. I wasn't able to find any hard numbers that describe the performance penalty, but it's fairly clear that it's going to be non-trivial, at least when using LPDDR5. Maybe Apple will roll their own totally custom DDR that can do it all. Who knows.
 

sunny5

macrumors 68000
Jun 11, 2021
1,837
1,706
You're literally saying that speed is more important than correctness. To me that's an insane point of view.
All ECC RAMs have reduced performance for better stability compared to the same gen normal RAM so I guess he knows nothing about ECC RAM.
 

sam_dean

Suspended
Sep 9, 2022
1,262
1,091
Intel's lack of ECC memory support on the Core line was a conscious effort to differentiate their Xeon line from "consumer" chips, as much or more so than simply a cost consideration.
With Intel out of the picture, would Apple make a point to move to ECC memory on Apple Silicon Macs?

The added assurance of stability can be a selling point for Pro workflows, and it's hard to argue that it wouldn't benefit The Rest Of Us at the same time.
Added cost has never held Apple back before... but what about the extra space required?
I think whatever Apple is doing is head and shoulders above what other PC makers do. So I do not think about something that is out of my hands.
 

Basic75

macrumors 68020
May 17, 2011
2,101
2,447
Europe
I don't think there's any real demand or justification for ECC below "Mac Pro" serious workstation level - and the main area where it starts to become essential (along with redundant PSUs, parity-checked RAID for storage etc.) is in the server/high-density computing market, and Apple haven't had a dog in that race since they dropped the Xserve.
 

leman

macrumors Core
Oct 14, 2008
19,520
19,671
The RTW discussion is very high quality and should be followed by anyone interested in this topic. After having read the very informative posts, I am even more convinced that the only way for Apple to add ECC support is to design a custom RAM solution.
 