This has been an absurdly long day for me, starting at about 4am local time. I really shouldn't be spending time on this, but I got caught up in it; here are some results that may or may not be useful.
Base system:
- Mac Pro 4,1 flashed to 5,1, BootROM 144.0.0.0.0, dual X5675 (3.06 GHz)
- 64 GB 1066 MHz ECC (8x8GB)
- NO WIFI CARD
- Factory Bluetooth card
- PowerColor Radeon RX 570 4GB (which has worked perfectly in my MP3,1 and this MP for years) in slot 1
- For testing, I used an old Apple keyboard and mouse (from a G5)
I did a fresh install of Big Sur 11.3 (20E232) using OCLP 0.1.1.
For the install, I removed all drives except for a blank 120GB SATA SSD.
Installation went fairly smoothly, although it's about the longest fresh install I can remember (there were a LOT of reboots before I got the final setup screens).
With the system as shown above, I rebooted four times without incident, using both "restart" and "shut down."
I then added my Sonnet Allegro (ASM1142-based USB3 card) and got two more successful boots, one with the card in slot 3 and one with it in slot 2.
I then removed the ASM1142 card and installed my NVMe drive in slot 3: a Micro Connectors M.2 NVMe PCIe x4 adapter with heat sink, holding an Inland Premium 256GB 3D NAND NVMe SSD (Phison E12 controller, firmware ECFM22.7). This combination has run perfectly in this system under both Mojave and Catalina.
The first boot was successful. Over the next 40 boots with the NVMe installed, only about 15% succeeded (I was also varying other conditions; see below). I interspersed another 20 boots without the NVMe, and all of them succeeded.
Based solely on my setup and the various combinations of components I tried (i.e., don't extrapolate too much without more testing), I can make the following observations (your mileage may vary):
- Timing of the boot (delay between OC appearing and me pressing ENTER to boot) appears to make no difference.
- Presence or absence of my ASM1142 card appears to make no difference. (This was a surprise.)
- Presence or absence of additional drives (SSDs or spinners) appears to make no difference, and neither does which port they're on (ODD SATA or backplane).
- Being connected or disconnected from Ethernet appears to make no difference.
- Which slot the NVMe is in (2, 3, or 4) appears to make no difference. I had successful boots with the NVMe in each of those slots, along with a great many unsuccessful ones.
- Presence or absence of USB devices (mouse, keyboard, hub, flash drive), and which USB2 ports they're in, appears to make little or no difference. (I include "little" here because on two occasions, inserting a flash drive mid-boot yielded a successful boot; however, since six other attempts to do the same thing failed, that's probably just a coincidence.)
- On my system, set up as noted, I get (so far) a 100% success rate booting with no NVMe device installed, regardless of which other devices (SSDs, the USB3 card) are present, but only about a 15% success rate with the NVMe installed. That suggests that, on my system at least, the NVMe is what triggers the problem.
- Despite the ridiculous number of reboots I've done today, I haven't attempted 10 consecutive reboots with the NVMe card removed, so I can't claim to have passed @startergo's test. That said, from what I've seen so far, I have no reason to think my system couldn't pass it.
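For anyone running a similar series of test boots, it may help to log each outcome and let the shell do the arithmetic (for reference, about 15% of 40 boots is roughly 6 successes). This is only a sketch: it assumes a hypothetical hand-kept log with one line per boot in the form "<result> <config>", where result is ok or hang and config is any label you choose (e.g. nvme-slot3, no-nvme). Neither that format nor the boot_tally name comes from any existing tool.

```shell
# boot_tally: hypothetical helper that summarizes a hand-kept boot log.
# Assumed log format (not from any tool): "<result> <config>" per line,
# e.g. "ok no-nvme" or "hang nvme-slot3". Reads the files given as
# arguments, or stdin if none, and prints "<config> <ok>/<total> <percent>%".
boot_tally() {
  awk '
    NF >= 2 {
      total[$2]++                 # count every boot for this config
      if ($1 == "ok") ok[$2]++    # count only the successful ones
    }
    END {
      for (cfg in total)
        printf "%s %d/%d %.1f%%\n", cfg, ok[cfg], total[cfg], 100 * ok[cfg] / total[cfg]
    }
  ' "$@" | sort
}
```

Usage would be something like boot_tally ~/boot-log.txt. Keying the counts on a free-form config label keeps the no-NVMe and per-slot NVMe runs side by side without having to decide the categories in advance.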
Now that I have an installed copy of Big Sur 11.3, I can start analyzing the code itself.
I've barely started on that, but I can make one observation:
Two NVMe assert() failures that frequently show up on the verbose boot screen, just before a hang, are:
Code:
AppleNVMe Assert failed: ( 0 != data ) ReleaseIDNode file: {...path...}/IONVMeController.cpp line: 5669
AppleNVMe Assert failed: 0 == (status) Exit file: {...path...}/IONVMeController.cpp line: 5718
The first one (line 5669) comes from IONVMeController::GetChipInfo(), which tries to read the "chip-id" property of the IORegistryEntry at "IODeviceTree:/chosen". When that lookup fails, we see the assert message.
The second one (line 5718) is a direct result of the first; when ::GetChipInfo() returns its error code, its caller (in this case, IONVMeController::CheckWorkaround()) displays its own assert failure message.
It's completely unclear whether this is significant or just another meaningless (to us) debug message.
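To see which asserts fire, and how often, across several boots, a small filter can reduce lines in the format quoted above to file:line counts. This is a sketch: nvme_asserts is a made-up name, and it relies only on the "AppleNVMe Assert failed: ... file: ... line: N" shape of those messages. On a hang the messages only appear on screen, so they would have to be typed into a text file by hand; whether they also reach the unified log on boots that succeed is an untested assumption.

```shell
# nvme_asserts: hypothetical filter. Reads text from the named files (or stdin),
# keeps lines containing "AppleNVMe Assert failed", and prints each distinct
# "<source file>:<line>" followed by how many times it appeared.
nvme_asserts() {
  grep -h 'AppleNVMe Assert failed' "$@" |
    sed -n -E 's|.*file: (.*/)?([^ /]+) line: ([0-9]+).*|\2:\3|p' |
    sort | uniq -c |
    awk '{ print $2, $1 }'   # reorder uniq -c output to "<file>:<line> <count>"
}
```

Possible usage: nvme_asserts hang-notes.txt for hand-copied screens, or, if the assumption above holds, log show --last boot --predicate 'eventMessage CONTAINS "AppleNVMe Assert"' | nvme_asserts.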
I haven't had time to dig much deeper yet, but out of curiosity, could someone with a genuine Apple NVMe device run the following and see whether a chip-id property appears anywhere in the output?
ioreg -l -p IODeviceTree
A quick way to check is:
ioreg -l -p IODeviceTree | grep "chip"
If that turns something up, the full output of the first command would be helpful (you might want to redact any personal info that appears in it, though).
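For anyone answering, a small wrapper can turn this into a yes/no check. It is a sketch only: has_chip_id is a made-up name, and it assumes ioreg -l prints the property as a quoted key, e.g. "chip-id" = <...>, which is how ioreg normally renders property names. Unlike the broader grep "chip", it matches only the exact quoted key.

```shell
# has_chip_id: hypothetical helper. Reads ioreg text on stdin and prints any
# lines containing a quoted "chip-id" key. Returns 0 if at least one is found;
# otherwise prints a note to stderr and returns 1.
has_chip_id() {
  matches=$(grep '"chip-id"')
  if [ -n "$matches" ]; then
    printf '%s\n' "$matches"
    return 0
  fi
  echo 'no "chip-id" property found' >&2
  return 1
}
```

Usage: ioreg -l -p IODeviceTree | has_chip_id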
EDIT: I forgot to mention something odd that may or may not be relevant. On Mojave and Catalina, Blackmagic Disk Speed Test pretty consistently shows my NVMe drive at 1400+ MB/s read and 1300+ MB/s write. Under Big Sur 11.3, it showed anything from 250 MB/s to 1450 MB/s, with basically no consistency. This is highly anecdotal, since I only ran it on a whim, but I wonder whether it's related to the underlying NVMe issues (or maybe I just need a nap).