The big things in my mind are:
* Snapshots
* Copy on Write
* Containers instead of Partitions

Some quick things to add to this list: native sparse file support, fast file copy (via cloning), optimised for latency.
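For anyone curious what "native sparse file support" means in practice, here's a rough C sketch (the file name and sizes are made up for illustration): it seeks far past the end of a new file, writes one byte, and compares the logical size to the space actually allocated. On a filesystem with sparse file support like APFS, the hole shouldn't be backed by real storage.

Code:
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("sparse.bin", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Seek 1 GiB into the file and write a single byte. On a filesystem with
       native sparse file support (e.g. APFS) the hole is not backed by storage. */
    if (lseek(fd, 1024L * 1024 * 1024, SEEK_SET) < 0 || write(fd, "x", 1) != 1) {
        perror("lseek/write"); return 1;
    }

    struct stat st;
    fstat(fd, &st);
    printf("logical size: %lld bytes, allocated: %lld bytes\n",
           (long long)st.st_size, (long long)st.st_blocks * 512);
    close(fd);
    return 0;
}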

Snapshots are probably the least useful day to day, but they back the sealed system volume functionality, since you are booting signed snapshots. A malicious actor wanting to modify the system volume has to recreate the seal, which is extra work. If I recall correctly, Time Machine has benefited from snapshots as well; either way, they're a good way to capture the state you want to back up without requiring exclusive access to the files you're backing up.

Snapshots are the foundation of how Apple does backups these days. Instead of copying the files like in the good old days, Time Machine now only compares the metadata and copies over the changed blocks. This enables much, much faster and more stable backups.
 
Some quick things to add to this list: native sparse file support, fast file copy (via cloning), optimised for latency.
Fast file copy is part of copy on write. ;)

Snapshots are the foundation of how Apple does backups these days. Instead of copying the files like in the good old days, Time Machine now only compares the metadata and copies over the changed blocks. This enables much, much faster and more stable backups.
I figured they had to be leaning on it, I just wasn't sure, so didn't want to claim it was. Thanks for clarifying that point.
 
Fast file copy is part of copy on write. ;)

If I understand it correctly, there are some details in regard to the fast file copy functionality. Since the usual file copying algorithms rely on reads and writes, copying that way will actually create "real" copies (duplicated data) even on COW filesystems. So if you use the usual copy APIs or commands, you will still incur a real copy. COW filesystems offer additional APIs/flags to avoid full copies.
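If I've got that right, the distinction looks roughly like this at the C level (a sketch; the file names are made up, and clonefile(2) only succeeds on filesystems that support clones, such as APFS):

Code:
#include <stdio.h>
#include <sys/clonefile.h>

int main(void) {
    /* clonefile(2) creates "clone.dat" sharing data blocks with "original.dat";
       new storage is only allocated when either copy is later modified. */
    if (clonefile("original.dat", "clone.dat", 0) != 0) {
        perror("clonefile");   /* e.g. ENOTSUP on volumes that cannot clone */
        return 1;
    }
    puts("clone created without duplicating data blocks");
    return 0;
}

A plain read()/write() loop over the same file would still allocate a second full copy of the data, which is the distinction I was getting at.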
 
If I understand it correctly, there are some details in regard to the fast file copy functionality. Since the usual file copying algorithms rely on reads and writes, copying that way will actually create "real" copies (duplicated data) even on COW filesystems. So if you use the usual copy APIs or commands, you will still incur a real copy. COW filesystems offer additional APIs/flags to avoid full copies.

So I think we are getting a bit muddled in the minutae here, but the copy on write functionality is what makes clones feasible as a filesystem feature in the first place. For the clone to take up no space (other than the file node), you need to be able to share the data blocks, which is what copy on write enables.

Although to be fair, there's a lot that copy on write helps enable. Clones, checkpoints, snapshots: they all benefit from the copy on write behavior. So there could be some disagreement on whether something is "part" of what someone considers copy on write versus something important enough to stand on its own. I also wasn't intimately familiar with the terminology Apple uses here (hence me talking about journaling behaviors rather than checkpoints), which probably didn't help.

In the case of the Foundation APIs, Apple seems to use cloning automatically when available, at least so long as you aren't implementing your own file copy routines on Apple's platforms (and if someone reading this does, please don't):

Code:
The copyItem(at:to:) and copyItem(atPath:toPath:) methods of FileManager automatically create a clone for Apple File System volumes, as shown in the listing below.
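That listing isn't reproduced here, but below Foundation the same clone-when-possible behaviour is roughly what copyfile(3) with the COPYFILE_CLONE flag gives you. A hedged C sketch (file names are made up):

Code:
#include <copyfile.h>
#include <stdio.h>

int main(void) {
    /* COPYFILE_CLONE asks for a clone and quietly falls back to a regular
       copy (data plus metadata) if the volume cannot clone. */
    if (copyfile("source.txt", "destination.txt", NULL, COPYFILE_CLONE) != 0) {
        perror("copyfile");
        return 1;
    }
    return 0;
}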
 
Can someone ELI5: is this anything to be concerned about for average users?
Two different issues:

1) Basically, some UNIX software written with both Linux and macOS in mind might not send the correct command on macOS to ensure that your data is actually written safely to the SSD. Correctly written macOS software should be fine, but the default is not to be that secure most of the time (i.e. in the case of a power outage on a desktop with no battery backup, recent data is more likely to be lost because it is more likely to be sitting in the SSD's volatile cache rather than on the SSD itself).

2) When performing the full "no really, this is important to save, write it to the SSD now" command, the internal Apple SSD is much, much slower than it should be. Tests indicate this slowness is a problem with Apple's firmware rather than the SSD hardware itself, so it is fixable. However, most of the time this doesn't matter, because the default is not to issue that command. That is probably why the issue was not caught or, if caught, not cared about: the command is the exception rather than the rule, and the faster-but-less-safe default is often fine because most of Apple's most popular products have batteries, so a sudden power outage doesn't really matter.

In short: a user is more likely to get data loss on desktops during power outages; poorly written multi-platform programs that want to save data safely might not know the correct macOS command to flush it all the way to the SSD; and programs that frequently invoke that safe-save command (things like database programs) will suffer from poor performance.
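For the curious, the difference in code terms looks roughly like this (a sketch with a made-up file name and no real error handling):

Code:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("important.dat", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    write(fd, "critical data\n", 14);

    /* fsync() only guarantees the data has been handed to the drive, which may
       still be holding it in its volatile cache. */
    fsync(fd);

    /* F_FULLFSYNC additionally asks the drive to flush its own cache to
       permanent storage; this is the request that is slow on Apple's SSDs. */
    if (fcntl(fd, F_FULLFSYNC) < 0)
        perror("fcntl(F_FULLFSYNC)");   /* not every filesystem/drive supports it */

    close(fd);
    return 0;
}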
 
poorly written multi-platform programs that want to save data safely might not know the correct macOS command to flush it all the way to the SSD
How can we know if an application (e.g. SQLite) is well/poorly written?
 
A lot of this benefits from SSDs. Containers are a form of thin provisioning of volumes. So I don't have to cut a 2TB drive into two 1TB partitions, but rather I can let them share the free space on the drive and each container only uses what it needs. But this means a heavily fragmented drive in the end. Containers make the sealed system volume possible without overhead of resizing partitions/etc.
I thought APFS Containers are in fact fixed size, like HFS+ partitions. APFS volumes can share the space within a Container, as you describe.
 
How can we know if an application (e.g. SQLite) is well/poorly written?
I believe SQLite uses the correct flags; F_FULLFSYNC and F_BARRIERFSYNC are the flags Apple's guide recommends for flushing (someone posted that earlier). If you can't inspect the code, you can use a program that measures SSD write speed: on the Mac it'll slow to a crawl when an F_FULLFSYNC or barrier hits. If it's always fast, then it's never doing a true flush. Finally, you can perform a test similar to the one Hector is proposing for SSDs that cheat on benchmarks: cut power to the SSD/computer after the "flush" and see if you experience data loss.

It should be noted that by correctly/poorly written I mean the developer intends the write to be safe when it isn't, as opposed to the developer intending it to be fast and so not caring that a power outage will result in data loss. That may sound odd, but it may be preferable most of the time, for most users, in most applications.
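If you want to try the speed test yourself, the idea looks roughly like this in C (file name and iteration count are arbitrary; run it on the volume you care about):

Code:
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Time a burst of small writes, each followed by either a plain fsync() or a
   full F_FULLFSYNC flush. On the affected Apple SSDs the second run is
   dramatically slower. */
static double bench(int fd, int full_flush, int iterations) {
    char buf[4096] = {0};
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++) {
        write(fd, buf, sizeof buf);
        if (full_flush)
            fcntl(fd, F_FULLFSYNC);   /* force the drive to commit to storage */
        else
            fsync(fd);                /* may stop at the drive's cache */
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
}

int main(void) {
    int fd = open("bench.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }
    printf("fsync only:  %.2f s\n", bench(fd, 0, 100));
    printf("F_FULLFSYNC: %.2f s\n", bench(fd, 1, 100));
    close(fd);
    unlink("bench.tmp");
    return 0;
}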
 
I thought APFS Containers are in fact fixed size, like HFS+ partitions. APFS volumes can share the space within a Container, as you describe.

As I’ve stated before, I’m not intimately familiar with the terminology around APFS and what maps to what. You are correct though. Container != volume, and I used container in place of volume in this case.
 
As I’ve stated before, I’m not intimately familiar with the terminology around APFS and what maps to what. You are correct though. Container != volume, and I used container in place of volume in this case.
Sorry to nit-pick, but while we are getting it right: a container does not equal a volume. An APFS container contains APFS volumes; a container is more equivalent to an HFS+ partition.
 
I figured they had to be leaning on it, I just wasn't sure, so didn't want to claim it was. Thanks for clarifying that point.
I remember watching the WWDC session a few years ago where they talked about the first step, which was migrating Mobile Time Machine to use APFS atomic snapshots. The old design used some kind of crazy multi-daemon thing that they didn't describe in great detail but sounded like a pile of special-case hacks. Rebuilding on top of APFS snapshots let them delete tens of thousands of lines of code, and block level CoW meant it could back up anything (the old Mobile TM was limited to relatively small files).
 
Can someone ELI5: is this anything to be concerned about for average users?

Not really.

Two different issues:

1) Basically, some UNIX software written with both Linux and macOS in mind might not send the correct command on macOS to ensure that your data is actually written safely to the SSD. Correctly written macOS software should be fine, but the default is not to be that secure most of the time (i.e. in the case of a power outage on a desktop with no battery backup, recent data is more likely to be lost because it is more likely to be sitting in the SSD's volatile cache rather than on the SSD itself).

2) When performing the full "no really, this is important to save, write it to the SSD now" command, the internal Apple SSD is much, much slower than it should be. Tests indicate this slowness is a problem with Apple's firmware rather than the SSD hardware itself, so it is fixable. However, most of the time this doesn't matter, because the default is not to issue that command. That is probably why the issue was not caught or, if caught, not cared about: the command is the exception rather than the rule, and the faster-but-less-safe default is often fine because most of Apple's most popular products have batteries, so a sudden power outage doesn't really matter.

In short: a user is more likely to get data loss on desktops during power outages; poorly written multi-platform programs that want to save data safely might not know the correct macOS command to flush it all the way to the SSD; and programs that frequently invoke that safe-save command (things like database programs) will suffer from poor performance.

A few comments on this:

1) Software that relies on the platform-agnostic fsync() to ensure that the data is really written is inherently faulty, as POSIX does not guarantee that fsync() does anything at all. So I do not really see 1) as a problem: buggy software is buggy software, after all. The behaviour of these APIs on the various platforms has been well documented and consistent for decades. Apple's F_FULLFSYNC closes the gap in the fsync() specification and provides consistent, reliable behaviour.
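For illustration, a correctly written cross-platform program typically hides this behind a small helper along these lines (a sketch; full_sync and the file name are made-up names, not a standard API):

Code:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Flush for durability: prefer F_FULLFSYNC on Apple platforms, fall back to
   plain fsync() elsewhere (or if the device does not support a full flush). */
static int full_sync(int fd) {
#if defined(__APPLE__)
    if (fcntl(fd, F_FULLFSYNC) == 0)
        return 0;
#endif
    return fsync(fd);
}

int main(void) {
    int fd = open("journal.dat", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }
    write(fd, "commit record\n", 14);
    full_sync(fd);   /* data should now survive a power loss */
    close(fd);
    return 0;
}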

2) Also hardly a problem, since full "no really, this is important to save, write it to the SSD now" requests are extremely rare in practice, all things considered. This is not an API that a regular application will use or need: a computer crash is an extreme, catastrophic situation, and no consumer software is designed to guarantee that your data is safe the moment you press the save button if your computer crashes a few seconds later. I think there is a lot of misunderstanding about what fsync() actually does. It is mostly used by databases to ensure atomicity: you don't want your database to become corrupted in case of a critical failure. But even databases batch their writes in a way that the diminished performance won't be noticeable unless you are running a write-heavy production database on your machine (which you really should not, btw).

How can we know if an application (e.g. SQLite) is well/poorly written?

Use good-quality, properly engineered software. But yeah, you can't really know. Disk stuff is extremely tricky to order correctly, which is why most people rely on standard, high-quality implementations. SQLite does it correctly.
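As a concrete example, SQLite even exposes the choice to the application: its fullfsync pragma controls whether it issues F_FULLFSYNC on systems that support it, and it defaults to off. A rough C sketch (database name is made up; link with -lsqlite3):

Code:
#include <sqlite3.h>
#include <stdio.h>

int main(void) {
    sqlite3 *db;
    if (sqlite3_open("example.db", &db) != SQLITE_OK) return 1;

    /* Ask SQLite to use F_FULLFSYNC for its durability barriers (off by default). */
    sqlite3_exec(db, "PRAGMA fullfsync = ON;", NULL, NULL, NULL);
    sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS t(x); INSERT INTO t VALUES (1);",
                 NULL, NULL, NULL);

    sqlite3_close(db);
    return 0;
}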

I remember watching the WWDC session a few years ago where they talked about the first step, which was migrating Mobile Time Machine to use APFS atomic snapshots. The old design used some kind of crazy multi-daemon thing that they didn't describe in great detail but sounded like a pile of special-case hacks. Rebuilding on top of APFS snapshots let them delete tens of thousands of lines of code, and block level CoW meant it could back up anything (the old Mobile TM was limited to relatively small files).

Yep, it is a great example of how a smart low-level feature can radically simplify a user-facing system. And simpler code means safer, more robust code. My backups over WiFi now take a mere minute or two on average, which is a huge step up from earlier Time Machine.
 
I find the argument that Apple's own SSDs do not need checksums questionable. I lack the knowledge to judge it, but it is arrogant: people also want to connect external SSDs, and those are not from Apple.

ZFS is absolutely robust. I'll go so far as to say that anyone who says otherwise is a huge liar or a storyteller.
There are also many myths about Btrfs that are years old but are still told as if they were current; that has not been true for a long time. Some Btrfs features are not yet stable and should not be used (e.g. RAID56), but many features are stable and cause no problems in practice, like a Btrfs RAID1 (not to be confused with the RAID technology from the last century, which is complete junk compared to ZFS and Btrfs).


Other topic:

I once read an article or comment claiming that the statement that APFS is SSD-optimized is not true at all. Unfortunately, I can't find the source anymore, so I can't judge to what extent it is true.
The gist was that there is no "SSD-optimized" in that sense; rather, Apple forgot (or deliberately ignored) HDDs. You can build a file system that is "optimized" for SSDs and still runs on HDDs without problems.
If that is true, I would consider it another arrogant view from Apple.
- iMacs with HDDs were still sold for a long time, and many people still use such Macs today.
- There are people who want to connect HDDs to their Macs.

As I said: I read that somewhere, and I lack the knowledge to judge it. Maybe the article was technically wrong; I cannot tell.
 
I find the argument that Apple's own SSDs do not need checksums questionable. I lack the knowledge to judge it, but it is arrogant: people also want to connect external SSDs, and those are not from Apple.

Yes, this is the main issue. There is no reason to doubt Apple's claims that their physical storage is robust, but it does make APFS a less attractive format for non-Apple media.

I once read an article or comment claiming that the statement that APFS is SSD-optimized is not true at all. Unfortunately, I can't find the source anymore, so I can't judge to what extent it is true.
The gist was that there is no "SSD-optimized" in that sense; rather, Apple forgot (or deliberately ignored) HDDs. You can build a file system that is "optimized" for SSDs and still runs on HDDs without problems.
If that is true, I would consider it another arrogant view from Apple.
- iMacs with HDDs were still sold for a long time, and many people still use such Macs today.
- There are people who want to connect HDDs to their Macs.

As I said: I read that somewhere, and I lack the knowledge to judge it. Maybe the article was technically wrong; I cannot tell.

Not an expert either, but I am fairly sure that there is such a thing as an SSD-optimized file system. SSDs and HDDs have very different access characteristics that justify different algorithmic approaches, e.g. SSDs have very fast random access and asymmetric read/write latency. Also, you can get the best performance and longevity out of an SSD by batching writes to minimize cell rewrites, which again requires a very different approach if you want it done properly.
 
2) Also hardly a problem, since full "no really, this is important to save, write it to the SSD now" requests are extremely rare in practice, all things considered. This is not an API that a regular application will use or need: a computer crash is an extreme, catastrophic situation, and no consumer software is designed to guarantee that your data is safe the moment you press the save button if your computer crashes a few seconds later.
I believe the SSD is flushed even after a kernel panic. This should only be a problem in the case of a sudden power failure.
 
Not an expert either, but I am fairly sure that there is such a thing as an SSD-optimized file system. SSDs and HDDs have very different access characteristics that justify different algorithmic approaches, e.g. SSDs have very fast random access and asymmetric read/write latency. Also, you can get the best performance and longevity out of an SSD by batching writes to minimize cell rewrites, which again requires a very different approach if you want it done properly.

As I understand it, one of the reasons folks don't recommend it for external HDDs is that, between thinly provisioned volumes and files fragmenting from regular writes due to copy on write, the design of the filesystem depends on fast random access to keep performance from tanking in the long term. APFS on an HDD would be fine initially, but would generally get worse over time as the drive fragments and accesses become less sequential. Whether it's actually any worse than HFS+ in the long haul depends a lot on how you use the drive.

I’ll also point out that Apple is not one of those entities saying not to use APFS on HDDs.

The gist was that there is no "SSD-optimized" in that sense; rather, Apple forgot (or deliberately ignored) HDDs. You can build a file system that is "optimized" for SSDs and still runs on HDDs without problems.

I can kinda see this argument, but I don't agree with it. Filesystems have generally always "ignored" HDDs and instead required tools to clean up the mess (defragmentation). But filesystems like APFS (and ZFS) can fragment a drive's files faster than filesystems like HFS+ or EXT4, due to the copy on write functionality. It depends a lot on the use case.

When Apple says it is optimized for SSDs, they are really stating that they take advantage of the different performance profile of SSDs, as leman says. This means being more aggressive in the use of snapshots, clones, and the like, knowing that file fragmentation from heavy use of copy on write isn't going to be a problem with modified files.

ZFS is absolutely robust. I'll go so far as to say that anyone who says otherwise is a huge liar or a storyteller.

I'm not really sure anyone tried to argue the opposite here. It's just that not being at ZFS's level doesn't automatically make something bad. You're not going to catch me claiming APFS is on ZFS's level, but it's still a step in the right direction, and a win considering users didn't have to lift a finger to migrate to it and get the benefits, meaning less technically savvy users got something more modern on their devices. Apple's approach to the migration was ambitious, and it's still a rather impressive feat given the history of file systems. And that experience means they have a blueprint for how to do it again when it's needed.

Ext4 is still quite popular and the default on many Linux distros, making users go out of their way to use btrfs or OpenZFS. Android uses Ext4 by default these days, last I checked. Windows still relies on NTFS. My NAS won’t even let me use anything other than ext4. QNAP now has an OS with ZFS, but my model won’t support it, and I’d have to start from scratch to use it if it did.

Meanwhile Apple has made APFS the default on every machine or device that could be migrated to it.
 