Since this is still going on, I felt I should take some time this week to watch three of the videos (I spared myself the "Apple fans say mean things" video, which I expect is just an effort to monetize the disagreement he found here). Even in the videos that are meant to be technical, the information per unit time is painfully low, so I've only gone through the three below. Most are 30 minutes or longer, and most of the discussion is non-technical ranting or self-aggrandizement, but there are a few points that can form a basis for discussion of the actual topic at hand.
The three videos I’m referencing are:
LR1: $3000 laptop turned to paperweight due to ISL9240 unavailability. :'(
LR2: Apple's soldered-in SSDs are engineered in the WORST way possible!
LR3: Horrible design of Apple's soldered in SSDs; it's WORSE than you thought!
If you don't want to spend the time watching, the first two follow a common pattern:
-- Here’s a busted MacBook
-- Drown the board in flux and heat gun the hell out of it
-- The MacBook is still busted
-- Must be Apple’s fault, **** Apple
The third one does away with the bench top content and is mostly "**** Apple, Extended Edition".
He says "If you're writing a lot to an SSD for 5 years, you can expect to use up a good portion of an SSD's life in that time."
I'm still looking for anything that looks like broad statistical data, but I've got a couple of sources that undermine that assertion and probably better align with most people's real-world experience:
techreport.com
First drive failed after >2700 full drive writes.
That is rewriting the full drive, every day, for seven and a half years.
The first TLC drive failed after 3600 full drive writes. That's a full write every day for almost 10 years; scaled to a terabyte drive, it works out to roughly 3.6 petabytes written over the life of the drive.
And those were the first two failures; the remaining drives kept going two or three times longer.
This test shows many more results consistent with that, and more often far better (a Reddit discussion of the Russian primary source):
Reddit Link
I wouldn't read this as an indication of how any particular model will perform; there's only one sample of each. But taken together, it indicates that even under heavy load an SSD is likely to far exceed the published warranty specs we see on drives and last 7-10 years, often far longer.
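Since those numbers involve some unit conversion, here is the arithmetic spelled out. The 1 TB capacity and the 100 GB/day workload are assumptions for illustration, not measurements from any particular machine:

```python
# SSD endurance arithmetic, spelled out. The capacity and daily workload
# below are illustrative assumptions, not data from any specific drive.

capacity_tb = 1.0          # assume a 1 TB drive
full_drive_writes = 3600   # endurance seen by the first TLC failure above

total_written_pb = capacity_tb * full_drive_writes / 1000
print(f"Written before failure: ~{total_written_pb:.1f} PB")         # ~3.6 PB

daily_writes_tb = 0.1      # assume a heavy 100 GB/day workload
years_to_wear_out = capacity_tb * full_drive_writes / daily_writes_tb / 365
print(f"Years to wear out at 100 GB/day: ~{years_to_wear_out:.0f}")  # ~99 years
```

At that kind of margin, the "use up a good portion of an SSD's life in 5 years" framing doesn't hold up for typical use.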
He describes an alternate world where you could have an M.2 drive installed.
Whether it's socketed or not, it won't be a standard M.2 SSD. A standard module includes a controller; these do not. The SSD controller is part of the Apple Silicon now.
There are a few other problems with this argument. For one, there is a space concern, as I showed before. This is the 14" logic board:
Apple has room for 8 NAND chips in 4 groups of 2 around the board, running in a RAID configuration to increase performance in the top-spec models. A module would mean finding one rectangular space to hold all 8 chips on a single board. There's also a height challenge: stacking the main PCB, a connector, another PCB, and its components within the lower housing.
The reason for 8 is that each chip has its own PCIe lane, which is probably why these are custom parts and what allows them to be spaced away from the controller. I don't see anything to support his claim that each NAND part has its own embedded ARM core running custom firmware "sitting between the PCIe bus and the T2"-- he points to a random post on LTT, which points to a NotebookCheck article, which points to a tweet that no longer exists, either because it was retracted or for any of the reasons tweets and accounts have been disappearing lately. I find the idea that every part has an ARM core running wear leveling hard to credit. He's definitely got the placement wrong (the PCIe bus is the communication path to the T2, so it could only sit between the PCIe bus and the NAND array), and I suspect the game of telephone between the now-gone tweet and the video scrambled other details.
Anyway, this all means the SSD runs over 8 PCIe lanes, while an M.2 module only supports 4. That is likely why the Studio has two bays for modules. So you need to find space for two connectors and two daughter cards sitting above the motherboard. Cards and connectors take space, and forcing a rectangular module takes more space. And now you've turned a one-connector problem into a two-connector problem.
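For a rough sense of why the lane count matters, here is a back-of-the-envelope throughput comparison. The PCIe generation and per-lane rate are assumptions for illustration; I'm not claiming to know what Apple actually clocks the NAND links at:

```python
# Back-of-the-envelope PCIe throughput comparison. The generation and per-lane
# rate are assumed for illustration, not a statement of Apple's actual link speed.

GBPS_PER_LANE_GEN3 = 0.985   # ~0.985 GB/s usable per PCIe 3.0 lane after encoding

m2_lanes = 4                 # a standard M.2 NVMe socket tops out at x4
apple_lanes = 8              # one lane per NAND package in the top configuration

print(f"M.2 x4 ceiling:    ~{m2_lanes * GBPS_PER_LANE_GEN3:.1f} GB/s")     # ~3.9 GB/s
print(f"8-lane NAND array: ~{apple_lanes * GBPS_PER_LANE_GEN3:.1f} GB/s")  # ~7.9 GB/s
# Hence two x4 bays (two connectors, two cards) to match one 8-lane soldered array.
```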
And I know people don't want to believe connectors are a common point of failure because it doesn't feel truthy enough for their guts, but they are.
People get kind of neurotic about heat and their SSDs, but heat is a legitimate concern. Most will put some sort of heat sink on top of their drives if they have space for it. For a laptop, the typical approach is a heat spreader, which pulls heat from the hottest parts and spreads it over a larger surface to radiate and to warm more air that can be blown or convected away.
But the controller is the hottest part of the drive, so when you take it out of the thermally managed SoC package and put it on an M.2 module next to the NAND chips, the effect is to take the controller's heat and spread it into the NAND chips themselves:
That's in contrast to the MacBook Pro approach of putting the NAND chips on a cool area of the board, right where cool air gets drawn in by the fans:
So Apple's design approach here is actually quite SSD- and performance-friendly.
He goes into a rant about how the SSD is a "wear part" that will eventually fail. That's true. Over time SSDs develop problems around floating gate maintenance, charge accumulation, oxide degradation, etc. You wind up with bits that fail validation on write or erase, bad sectors, and so on.
But this is an entirely new thing for me:
LR2 (2:31) "My personal favorite, is when the NAND fails by shorting to ground. Many of these fail by shorting to ground entirely. Not just like a dead SSD where the data went poof, I mean when the actual NAND chips fail and bring down the power line completely because there is a 6 to 0 ohm short to ground on the NAND's main power line."
Ok, so what about Rossmann’s argument that these parts are all shorting to ground?
Here's where Rossmann's credibility is so important to the discussion: he claims that half of the repairs he does on A2141s are because of SSD shorts to ground. That's a rather incredible number, and one that I can't find any independent confirmation of. Rossmann claims this has been happening since at least 2019. Four years is a lot of time for Apple to work with their suppliers, of which they apparently have four: Samsung, Hynix, Kioxia, and Western Digital. It seems odd that this would be a failure point across so many vendors for so long without a correction.
Can a digital IC like this show a short? Sure. But typically it happens on the I/O ring because of damage inflicted on the chip, not as a result of wear. One major culprit would be ESD or some other voltage surge blowing out the protection diodes on the pins. Wear would imply a short in a particular NAND cell, which would be a rather bizarre mechanism for sinking that much current. The I/O pins at least are beefy transistors with (relatively) fat bond wires attached; the internal cells are delicate structures with nm-scale metal layers that you'd expect to vaporize and open-circuit when passing that much current.
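For some perspective on what a hard short on that rail would involve, here is a quick Ohm's-law estimate. The rail voltage and the resistance range come from the numbers quoted in the videos; treating the rail as a stiff source that can deliver the current is an assumption for illustration:

```python
# Ohm's-law sanity check on a hard short to ground on the NAND power rail.
# The 0.9 V rail and the "6 to 0 ohm" range come from the video's own numbers;
# assuming the rail can actually source this current is illustrative.

rail_v = 0.9

for r_short in (6.0, 1.0, 0.1):          # 0.1 ohm stands in for "near 0"
    current_a = rail_v / r_short
    power_w = rail_v * current_a          # dissipated in whatever is shorting
    print(f"{r_short:4.1f} ohm: {current_a:5.2f} A, {power_w:5.2f} W")

# At the low end that's amps of current and watts of heat concentrated in the
# failed structure -- the kind of load you'd expect nm-scale internal metal to
# fuse open under, rather than sit there as a stable short.
```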
And can someone, anyone, explain this statement to me: LR2 (30:42) “It's more likely here that the NAND is what's bad. [...] If the .9V rail has all these little solder balls moving around that means that most likely the NAND is shorted and it is placing a lot of tension on all of the power rails that are powering the SSD which is why there is solder balls there which… **** my life…” WT-actual-F is he going on about? Is he saying the solder balls are formed by "tension" in the power rail caused by a short in a NAND chip?!? Is it somehow getting mechanically squeezed out of the capacitor?
None of the MacBooks Rossmann opens in those videos show any actual evidence of the NANDs shorting to ground. What we see, if you trust his measurement methods, is that something may have shorted to ground and in most cases it’s hard to know what by the time he’s finished throwing chips out and reworking component after component.
And finally, what about the fact that you can't boot from an external drive without a working internal SSD? How much of a problem that presents comes down to what "working" means. He goes through a sloppy interpretation of an iBoff video that needs some disentangling.
For one, he says that T2 Macs don't have UEFI.
That's wrong. They do:
The UEFI firmware sits in the main SSD array rather than in a separate SPI-connected flash chip, and the T2 acts as an eSPI client to the Intel chip, which reads the UEFI firmware through the T2. So, yes, there is still UEFI firmware.
The T2 also has its own SPI-connected flash that holds the secure iBoot procedure; this is the part that gets updated when you go into DFU mode. iBoot validates the UEFI; the UEFI is needed to access the ports on the machine, which eventually allows access to an external drive, and it verifies that you aren't booting from an external drive to roll back to an insecure version of the firmware.
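To make that dependency chain concrete, here is a minimal sketch of the boot path as described above. The function and step names are made up for illustration; this is in no way Apple's actual logic:

```python
# Minimal sketch of the T2-era boot dependency chain described above.
# Step names and checks are illustrative only -- not Apple's actual code.

def can_boot_external(t2_spi_flash_ok, internal_ssd_readable,
                      uefi_signature_valid, external_is_rollback):
    """Each step gates the next; the internal SSD sits in the middle of the chain."""
    if not t2_spi_flash_ok:
        return False, "iBoot unavailable (T2's own SPI flash)"
    if not internal_ssd_readable:
        return False, "UEFI lives on the internal SSD and can't be read"
    if not uefi_signature_valid:
        return False, "iBoot refuses to hand off to an unvalidated UEFI image"
    if external_is_rollback:
        return False, "UEFI refuses to boot older, insecure firmware"
    return True, "ports up, external boot allowed"

# A worn-but-readable internal SSD still gets you to an external boot;
# a dead rail or an unreadable SSD does not.
print(can_boot_external(True, True, True, False))
print(can_boot_external(True, False, True, False))
```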
If you have a board failure that blows up your power rails, can you boot from an external SSD? No. But I don't see how that's relevant to the discussion. It would be no different with a modular SSD-- you have a board failure and a power supply failure; you can't expect anything to work. If you held the UEFI in a separate SPI NOR flash somewhere, it would still rely on a power supply and a working flash chip, and the same failure modes that apply to the main SSD apply to that part as well.
If you wear out your SSD, are you unable to boot from an external drive? Frankly, it's not clear. You only need to read the UEFI code, and those aren't likely to be the sectors that wear, because you don't update firmware that often. If the SSD reaches end of life, does it lock out your ability to read? It's not clear there's a reason to-- you can't erase or program any longer, but I don't see why the system wouldn't still be able to read the younger cells. The inode table would likely go corrupt, but as I understand it the UEFI is in a separate partition, which means a separate and clean inode table. I haven't read about enough drives failing to know for sure-- which is kind of the point: this isn't a common failure.
One thing that is absolutely true here, though: you must have a backup to boot from an external drive. For everyone saying that Apple is horrible for how they made their SSDs because it's unreasonable to expect people to back up their data: you need another drive to boot from, folks.
The takeaway here, though, is that nothing would be different if the SSD weren't soldered down. There was a failure somewhere in the system that led to a power line failure. This is why the board is replaced as a unit-- once you have that kind of failure, there's no confidence in the performance of anything anymore.
Let's say a buck converter blows but the SSD was in a module. What difference does it make? The drive is still lost-- for all Rossmann's repeating that the customer really cares about their data, it would still be gone-- and everything connected to that rail is now suspect. If the NANDs failed because their I/O ring took a surge, the SoC itself is now suspect too, since it sits at the other end of that NAND interface. And what caused the power supply to fail in the first place? The converter isn't necessarily the root cause either.
Simply put, soldered or not makes no difference here. I just watched Rossmann manhandle a number of boards and none recovered— maybe he has better luck in other videos but I can’t be bothered to keep watching. And the methods he’s using to diagnose and repair jeopardize the rest of the system anyway, so if it wasn’t at risk before he opened it, it is now.
As far as I can tell, this is an entirely hypothetical problem being promoted by people with an agenda.