MP 7,1 OpenZFS woes, on a maxed-out Mac Pro with Catalina 10.15.3

deconstruct60 · Feb 8, 2020

mward333 said:
Is anybody out there using OpenZFS on a Mac Pro with Catalina 10.15.3 ??
I'm using OpenZFS 1.9.3.1 (https://openzfsonosx.org/) on a Mac Pro with 1.5 TB of RAM, 8 TB (Apple-installed) SSD, and 96 TB of spinning disks, namely: six of the 16 TB Seagate Exos drives (2 in a Pegasus J2i and 4 in a Pegasus R4i). The spinning disks are setup in one zpool, in a very basic way:
sudo zpool create -f -o ashift=12 tank disk2 disk3 disk4 disk5 disk6 disk7
....

another diagnostic step would be to put the ZFS volume onto just one of these storage interfaces. [ Yes, spreading over multiple storage interface instances should work because not a single point of failure , but .... ]

Apple has heavily oversubscribed the PCH in the Mac Pro 2019. More than a good chance that the access times to there two systems is not uniform at all. ( R4i being on a 'clean' x16 PCI-e bus .... absolutely gross overkill for 4 HDD drives) and the the SATA J2i interface being throttled by the Apple T2 SSD at full blast ( read and write different directions but if other stuff going on to SSD. )

Snow Tiger · Feb 8, 2020

deconstruct60 said:
another diagnostic step would be to put the ZFS volume onto just one of these storage interfaces. [ Yes, spreading over multiple storage interface instances should work because not a single point of failure , but .... ]

Apple has heavily oversubscribed the PCH in the Mac Pro 2019. More than a good chance that the access times to there two systems is not uniform at all. ( R4i being on a 'clean' x16 PCI-e bus .... absolutely gross overkill for 4 HDD drives) and the the SATA J2i interface being throttled by the Apple T2 SSD at full blast ( read and write different directions but if other stuff going on to SSD. )

This is actually a very good idea .

The R4i and the J2i might have identical SATA rusty spinners , but the interfaces are different and have different characteristics .

I wonder by how much the IOPS of the SATA interface ( J2i ) lag behind the IOPS of the PCIe interface ( R4i ) ?

I can't seem to find any IOPs ratings on the two interface specifications . Are they drive generated instead ?

knweiss · Feb 9, 2020

mward333 said:
I hope that we can fine-tune the parameters so that OpenZFS will work for me. I want to do some high I/O from the Apple-installed SSD to/from the OpenZFS pool, but the reading/writing keeps getting throttled. Almost every single time.... Argh......

I wonder if you see any relevant warning/error messages in the system log ("Console" app) when your test case throttles?

How does the "Memory" view of Activity Monitor look like when it happens?

edgerider · Feb 9, 2020

mward333 said:
@edgerider thanks for chiming in! I'm imagining your setup in my mind's eye now! This is awesome.
I'm a professor at a major research university, so I don't need to put shelves in my basement with racks of storage. We are fortunate to have crazy amounts of storage on our computing clusters at work.
For example, we just added another petabyte of storage for the undergraduate students in my class this summer. Seriously. I can't make this stuff up.

so cool!
then 100 or 40 Gbe is your best friend, because no matter how fast your internal storage is , on the long run software raid will never be able to be as safe and reliable as a dedicated hba offloading all parity calculation to a dedicated raid engine.

if you have petabyte of storage in your organisation, it is almost sure that the servers are up to 100gbe so, for you it would be so much easier just to put a ATTO 100gbe hba in your mac and get 4200 to 6000 Mb/ second, nearly zero latency, safe, reliable, and no brainer...

mward333 · Feb 9, 2020

edgerider said:
so cool!
then 100 or 40 Gbe is your best friend, because no matter how fast your internal storage is , on the long run software raid will never be able to be as safe and reliable as a dedicated hba offloading all parity calculation to a dedicated raid engine.

if you have petabyte of storage in your organisation, it is almost sure that the servers are up to 100gbe so, for you it would be so much easier just to put a ATTO 100gbe hba in your mac and get 4200 to 6000 Mb/ second, nearly zero latency, safe, reliable, and no brainer...

@edgerider, well, in practice, the enormous computations occur *entirely* on the cluster (i.e., we do all the computing on the cluster and the data stays on the cluster). The Mac Pro is just for prototyping, at this scale! Like I mentioned earlier in the thread, a couple years ago, I ran a job on our clusters at our university that rendered 72 petabytes of data, and used 37 years of computing time on the cluster (of course, the computational tasks occurred in parallel).

ytk · Feb 9, 2020

edgerider said:
so cool!
no matter how fast your internal storage is , on the long run software raid will never be able to be as safe and reliable as a dedicated hba offloading all parity calculation to a dedicated raid engine.

Nonsense. Software RAID is WAY more reliable than hardware. Hardware RAID is, by nature, proprietary. That means that you are fundamentally dependent on a black box. If that black box goes south, well, you don't have a lot of options. If you're lucky, you can find an identical RAID card, plug all of your drives in in the exact same pattern, and pray that everything works (when something as minor as a firmware difference could easily cause everything to stop functioning). If you're not lucky, you've just discovered the meaning of “single point of failure”.

Hardware RAID is a dying technology. It has very, very few use cases anymore that wouldn't be better served by a software solution.

AidenShaw · Feb 9, 2020

ytk said:
Hardware RAID is a dying technology. It has very, very few use cases anymore that wouldn't be better served by a software solution.

Getting performance, availability and low latency is one of those use cases.

deconstruct60 · Feb 9, 2020

ytk said:
Nonsense. Software RAID is WAY more reliable than hardware. Hardware RAID is, by nature, proprietary. That means that you are fundamentally dependent on a black box. If that black box goes south, well, you don't have a lot of options.

In vast majority of cases , "Hardware" RAID is really software RAID also. The difference is on which processor the software ( called firmware in the embedded context ) runs on. A subset of RAID computations will probably be accelerated on the custom processor hardware, but stuff like juggling caches , SATA metadata interactions , S.M.A.R.T / healthy monitoring , and even mode settings probably have a software component to them.

The Software RAID that runs on the host CPU and deals with a variety of storage access paths and drives is usually more robust in dealing with a wider variety of drives. On the other hand it is usually bigger ( more lines of code and more complexity that isn't pushed into fixed function hardware ). If apply the same fixed sum of money to validating and verifying the software in both context the smaller firmware can be moved to higher state of being "bug free". [ In real life though there are different teams with different budgets and resources (programmer skill , time to focus on task , etc. ) ]

Being reliable has to do with being consistent and the absence of errors. Not being locked into a single implementation from a single vendor is a substantively different dimension. If the there is no vendor lock and the Software RAID has errors then it isn't more reliable. Recovery from hardware failure to another system with the same drives is a different issue.

Hardware RAID is a dying technology. It has very, very few use cases anymore that wouldn't be better served by a software solution.

Hardware RAID isn't so much as dying as being relegated to a relatively narrower niche. Hardware RAID doesn't scale to modern PB levels. Running that many drives into a single controller is a "single point of failure" issue also even if do go through massive gyrations to try to make it fit.

In the 2-18 drive zone it still have traction but that isn't the limit of the storage pool problems these days for a growing set of users. So addresses a smaller and smaller piece of the pie. SSDs also mean don't in any way always require more drives to go faster. There is "RAID" like handling there but it being done across chip dies and packages in a single device. So there is growth on both sides ( smaller (single drives for speed or high capacity ) and bigger collections (wider fanout ... bigger 'arrays" ). )

Thirdly, RAID 10 , 1 don't leave much for hardware to do. ( RAID 0 even less ). It is only the 4,5,6 (and associated variants where there is much significant work to actually offload from a modern CPU and host OS file system (and it s cache). )

The bigger nail also is use of more whole computer devices as SAN/NAS servers. A server that just primarily runs ZFS can be the "RAID" array. So it is only limited , direct attached storage (DAS). Inside of higher capacity, multiple device DAS, it is still holding on.

deconstruct60 · Feb 9, 2020

AidenShaw said:
Getting performance, availability and low latency is one of those use cases.

Performance: SSD have that 'crown' now. If want higher capacity with "good" performance then hardware is an option. If looking for best performance then largely no. RAID 0 , 1 , 10 with HDDs

Availability : if want to solely want to point the finger at individual storage modules then HW RAID allows slower preforming recovering from that. But the RAID controller itself is very often a single point of failure. Multiple HW cards may have a failover mechanism but that is implemented in what? Yep software.

Latency ... similar stuff of which corner context are in.

edgerider · Feb 9, 2020

wow!😂 so much back fire...
areca raid array are just the contrary of single point of failure: if the raid card burns, you just replace the controler...
and netapp have two hotswap controller and if you want maximum availability you just run them in a active-active configuration, once again as long as you dont change the drive from emplacements in the hard drive bay you are good.
I have even replaced a areca 1680 to a 1880 and then to a 1882, in the mean time I went from 8 x4 to 12x4 to 16 x4 and now I have one areca 24 port in a supermicro chassis with 4tb in raid 6 for pure archival purpose, and two netapp ds4346 with 24x2tb in raid 60 for speed...
on the twin netapp volume i get almost 3000mb/s R/W with a very low latency on a 72tb volume, and it has cost me less tha 1500€....
I would a hundred time fear more about the single psu go south on a ssd zfs array than having an issue on my 4 psu, twin active active controller and easy replacement areca controller ...

areca 1880xi 24 can be have for 300$ used on ebay and 1880ix can be have for under 200$... I have a spare for each...
I paid 500$ for my ds4346 loaded with 24x 2tb hgst drives,
So i’m curious on how the hell you manage to do a 72tb raid 60 array with redondant point of failure, 3000mb/s R/W and the ability to loose up to 6 drive per array without compromising data, and still acces to them for 1500$ with ssd....😇

as I say : I wouldn’t trust a zfs ssd array for anything else than cache and temporary media copy for work.

over the network my 5.1 access the 72 Tb data at over 1,2 to 1,4 gb/s with the two atto nt12...

if it is fast and reliable enough to work daily with 8tb 5,5k timelapse video projects, it might be good for 90% of us...

anyhow I am happy with what I have , and just wait for my MP7.1 to arrive and I will be good for the next two years, and by then I will buy more ds4346 shelfs... and eventually go to 40gbe if i can!

I just wanted to pass around my experience to the one who are thinking about spending 32 grand on a promise vtrack, or 5k on a synology....

areca controller have been dam reliable to me and alway kept my data safe.

and if anything go south , my most important data are on a separate shelf at a friend house... just need to go pick it up and in no time i am back on track...

If you guys are happy with theirs zfs software ssd array, I am happy for you!
i just hope you will never experience a power outage without a dedicated memory backup battery...like all areca card have... or have to rebuit a zfs pool with greater than 2 tb sata drive

any how, zfs, software, raid, jbod, if tour data are not on 3 different place, they dont exist.

it is not for no reason that millitary go for redundancy...
[automerge]1581278983[/automerge]
and by the way, if you hit anything close to 1200mb/s on an internal zfs 8x8tb drive array , consider this as a miracle, and then when a drive will go bad I hope you will have a backup, because rebuilding a zfs pool on 7 desktop 8tb sata drive will never happen... just google zfs over 2 tb sata....
[automerge]1581279142[/automerge]

mward333 said:
@edgerider, well, in practice, the enormous computations occur *entirely* on the cluster (i.e., we do all the computing on the cluster and the data stays on the cluster). The Mac Pro is just for prototyping, at this scale! Like I mentioned earlier in the thread, a couple years ago, I ran a job on our clusters at our university that rendered 72 petabytes of data, and used 37 years of computing time on the cluster (of course, the computational tasks occurred in parallel).

way out of my league!!!😂😂😂
but in essence I would really like apple to revive XGrid, that was sick!

edgerider · Feb 9, 2020

ytk said:
Nonsense. Software RAID is WAY more reliable than hardware.

Hardware RAID is a dying technology. It has very, very few use cases anymore that wouldn't be better served by a software solution.

you realise that 99% of banking data are on either netapp raid array or some kind of harware raid...

if hardware raid is dying, so why do all major brand offer raid applyance?

see how linus tech tips software ssd raid array has turned out...😂😂😂

AidenShaw · Feb 9, 2020

deconstruct60 said:
Being reliable has to do with being consistent and the absence of errors.

(Agree that in the end, it's all software.

)

Hardware RAID with a battery-backed (or capacitor-backed) persistent writeback cache is pretty much a requirement for RAID-5/6/50/60. Pure host-based software RAID can be reliable or fast - not both. If it's fast, it is likely to corrupt the disk if a host failure occurs during parity generation. When you get into the petabyte range, distributed filesystems like HDFS that are resilient to dozens of drive failures per day are the norm.

I have about 4 PB in my lab, but few of the filesystems are much over 100TB, and RAID-60 with hot spares on hardware RAID with hot spares is the norm.

ZombiePhysicist · Feb 9, 2020

What do you guys think will happen with SSDs. With 15TB SSDs finally coming down in price from insanity to just ludicrous levels, I wonder if it changes how you balance redundancy.

AidenShaw · Feb 9, 2020

ZombiePhysicist said:
What do you guys think will happen with SSDs. With 15TB SSDs finally coming down in price from insanity to just ludicrous levels, I wonder if it changes how you balance redundancy.

In my lab, we don't use parity RAID with SSDs. We'll use hardware RAID-0 with SSDs if we need a filesystem larger than the SSD, but never parity RAID.

Mirrors and parity RAID have never, ever been a substitute for backups. Mirrors and parity RAID protect you from disk failures. While SSDs do fail, the failure rate is so far less than spinners we just don't use redundant RAID for SSDs.

We do, however, run nightly backups. Think enterprise level Time Machine for hundreds of systems with about 200 TB RAID-60 storage for compressed, single-instance store backups. A low level of SIS compression is 95%. (That means that the 200 TB is backing up 4 PB.)

Backups will protect you from the "OMG, I screwed up an edit on 3 January and everything since then is garbage" situations. IOW, wetware failures.

Schismz · Feb 9, 2020

ytk said:
Nonsense. Software RAID is WAY more reliable than hardware.

[...]

Hardware RAID is a dying technology. It has very, very few use cases anymore that wouldn't be better served by a software solution.

Unless it's SoftRAID which has gone downhill off a cliff. <Raising hand> Happy to take almost anybody's "proprietary black box" vs. that mess.

Flint Ironstag · Feb 10, 2020

Schismz said:
Unless it's SoftRAID which has gone downhill off a cliff. <Raising hand> Happy to take almost anybody's "proprietary black box" vs. that mess.

I've been using SoftRAID since ~2012 and never had an issue. About to buy a couple more SSD Thunderbays from OWC. What have you experienced?

shaunp · Feb 10, 2020

Here's an alternative idea. I see from your original post that you have the disks spread across a couple of different thunderbolt arrays. How are the disks configured here - JBOD or have you implemented some form of RAID here? If you are using RAID, try JBOD instead.

I also can't help thinking I wouldn't bother with the Pegasus arrays (sell them) and just have a PC with a couple of 10GigE ports and Linux. Install your drives here and then present ZFS volumes via NFS over the 10GigE to your Mac. This will also remove the compression/dedupe overhead from running directly on your Mac. Ebay would probably be your friend here for getting a PC that's up to the task.

I don't know if that will give you enough throughput, but as you only have 6 SATA disks I'm guessing it will - you are more concerned with the compression functionality.

mward333 · Feb 10, 2020

shaunp said:
Here's an alternative idea. I see from your original post that you have the disks spread across a couple of different thunderbolt arrays. How are the disks configured here - JBOD or have you implemented some form of RAID here? If you are using RAID, try JBOD instead.

I also can't help thinking I wouldn't bother with the Pegasus arrays (sell them) and just have a PC with a couple of 10GigE ports and Linux. Install your drives here and then present ZFS volumes via NFS over the 10GigE to your Mac. This will also remove the compression/dedupe overhead from running directly on your Mac. Ebay would probably be your friend here for getting a PC that's up to the task.

I don't know if that will give you enough throughput, but as you only have 6 SATA disks I'm guessing it will - you are more concerned with the compression functionality.

I have the disks in the zfs pool configured using this setup:
zpool create -f -o ashift=12 -O compression=lz4 -O casesensitivity=insensitive -O atime=off -O normalization=formD tank raidz2 disk2 disk3 disk4 disk5 disk6 disk7

You mentioned the possibility of doing the work externally on a PC running Linux. We have thousands of Linux machines at our university on our clusters (with petabytes of storage), and I do use these resources extensively to do my computational work.

shaunp · Feb 11, 2020

mward333 said:
I have the disks in the zfs pool configured using this setup:
zpool create -f -o ashift=12 -O compression=lz4 -O casesensitivity=insensitive -O atime=off -O normalization=formD tank raidz2 disk2 disk3 disk4 disk5 disk6 disk7

You mentioned the possibility of doing the work externally on a PC running Linux. We have thousands of Linux machines at our university on our clusters (with petabytes of storage), and I do use these resources extensively to do my computational work.

Just use the PC to effectively build a Linux-based ZFS NAS. It just becomes a storage platform that you are more than familiar with and it won't cost you a fortune to do - you could reclaim most of the costs by selling the Pegasus arrays.
[automerge]1581416368[/automerge]

mward333 said:
I have the disks in the zfs pool configured using this setup:
zpool create -f -o ashift=12 -O compression=lz4 -O casesensitivity=insensitive -O atime=off -O normalization=formD tank raidz2 disk2 disk3 disk4 disk5 disk6 disk7

You mentioned the possibility of doing the work externally on a PC running Linux. We have thousands of Linux machines at our university on our clusters (with petabytes of storage), and I do use these resources extensively to do my computational work.

I get your ZFS setup on the MAC, but how have you configured the disks on the Pegasus arrays? You could be causing contention if you are putting ZFS on top of RAID 5 for example. If they are JBOD then that's okay.

mward333 · Feb 11, 2020

shaunp said:
you could reclaim most of the costs by selling the Pegasus arrays.

I can't sell the Pegasus arrays. This machine is owned by my university.

shaunp said:
I get your ZFS setup on the MAC, but how have you configured the disks on the Pegasus arrays? You could be causing contention if you are putting ZFS on top of RAID 5 for example. If they are JBOD then that's okay.

They are JBOD. I'm not causing contention by using ZFS on top of RAID 5 or any other kind of RAID for that matter.

HDFan · Feb 12, 2020

AidenShaw said:
While SSDs do fail, the failure rate is so far less than spinners we just don't use redundant RAID for SSDs.

Even with the lower expected lifetimes of SSDs? Although the time to replace is much harder to predict with HDs.

shaunp said:
I also can't help thinking I wouldn't bother with the Pegasus arrays (sell them) and just have a PC with a couple of 10GigE ports and Linux

I've found a dedicated Pegasus to be significantly faster than my other 10 GigE storage devices. No contention, or ethernet overhead.

AidenShaw · Feb 12, 2020

HDFan said:
Even with the lower expected lifetimes of SSDs? Although the time to replace is much harder to predict with HDs.

With RAID-1, you're writing the same amount of data to both drives, so you should "expect" them to fail together.

The disk controllers monitor the SSDs via S.M.A.R.T. and other metrics, and will flag when an SSD is approaching its limit.

atonaldenim · May 23, 2020

Sorry to hear about the Catalina woes. ZFS on a Mac Pro is a very sexy concept.

On a less bleeding edge mac, how functional is OpenZFS on OS X on Mojave or High Sierra?

I’ve put FreeNAS on a spare HP server with a basic RAIDZ1 array and 10Gbe and some Noctua fans to try to civilize it. But man it would be pretty sweet to move that RAIDZ1 inside my Mac Pro 5,1 and not have to run that extra server at all. Can OpenZFS on OS X be a daily driver on a more mature version of OS X? Open source SoftRAID alternative?

There’s even a Mac GUI of sorts... ZetaWatch

mward333 · May 26, 2020

atonaldenim said:
On a less bleeding edge mac, how functional is OpenZFS on OS X on Mojave or High Sierra?

I never tried OpenZFS on Mojave or High Sierra.

I finally gave up on OpenZFS on my maxed-out Mac Pro. I worked on it for many days and I finally grew tired. I think it simply has to do with the fact that I have 1.5 TB of RAM in my Mac Pro.

In the three months since my posts, I've been using the Mac Pro and its 6 drives (16 TB each) with no difficulties at all. I wish I had OpenZFS working on it, but I don't.... and I have work that I need to accomplish.... so I just moved on!

MP 7,1 OpenZFS woes, on a maxed-out Mac Pro with Catalina 10.15.3

macrumors G5

macrumors 6502a

macrumors newbie

macrumors 6502

macrumors 6502a

macrumors 6502

macrumors P6

macrumors G5

macrumors G5

macrumors 6502

macrumors 6502

macrumors P6

Suspended

macrumors P6

macrumors 6502

macrumors 65816

Cancelled

macrumors 6502a

Cancelled

macrumors 6502a

Contributor

macrumors P6

macrumors 6502

macrumors 6502a

Our Staff