Status
The first post of this thread is a WikiPost and can be edited by anyone with the appropriate permissions. Your edits will be public.
Status
Not open for further replies.
I don't know your hardware setup, so I can't say whether you will have boot issues. Please stop asking if it will work for you. Install to a separate drive and see for yourself.
 
I tested my 4.1/5.1 rig with 11.3.1 and @Dayo's RefindPlus in a boot loop and got more than a dozen successful boots.

That was with all of my USB hardware connected (Cinema Display hub, USB keyboard hub).

I know RP does nothing special for BS, but it's another data point for this race condition.

Booting via RP is native; I could also boot with just the firmware and -no_compat_check.

But the timing is different, as RP loads first.
 
Why do people use OCLP to start with? We should always use the OC install guide offered in these forums, with a good knowledge of the documentation, so we know what we are actually doing to our machines. I thought we were all using the same EFI setup.

#220

🙏
 
I tested my 4.1/5.1 rig with 11.3.1 and RefindPlus in a boot loop and got more than a dozen successful boots.
Interesting
  1. Which Version of RP?
    1. Believe @startergo had previously tested and didn't get the same results
    2. Although presumably not latest RP (v0.13.2.AD - shouldn't make any difference but need to rule out)
  2. Are you running DEBUG or RELEASE version of RP?
    1. Is there any difference between the two?
  3. What happens when chain-loading OC?
    1. Any difference?
  4. What happens when all the connected USB stuff is disconnected?
    1. Any change?
  5. Presumably not booting from drive on PCIe Slot
    1. Does this have an impact?
  6. Does having another Mac OS instance present make a difference?
    1. That is, present on another drive but not being booted
    2. Assume most tests to date have this condition.
  7. Is this the same unit you have been testing before?
    1. Presumably was failing when going through OC directly if so.
    2. If not the same, needs running directly through OC.
  8. Why would RP loading before be different to OC loading before?
    1. Needs more tests to rule out random luck.
    2. Basically needs another tester to reproduce the same results
 
Interesting
  1. Which Version of RP?
0.13.2.AD
    1. Believe @startergo had previously tested and didn't get the same results
    2. Although presumably not latest RP (v0.13.2.AD - shouldn't make any difference but need to rule out)
  2. Are you running DEBUG or RELEASE version of RP?
    1. Is there any difference between the two?
Debug
  3. What happens when chain-loading OC?
    1. Any difference?
Not in the first 5 tries; OC boots my 11.3.1, too.
  4. What happens when all the connected USB stuff is disconnected?
    1. Any change?
I did my "stress test": I added 5 USB thumb drives and cold booted, and it hung on Intel 82574L. When I pulled the AHCI blade, it hung on USB drivers.
  5. Presumably not booting from drive on PCIe Slot
    1. Does this have an impact?
  6. Does having another Mac OS instance present make a difference?
    1. That is, present on another drive but not being booted
    2. Assume most tests to date have this condition.
Mojave is on the AHCI blade. It's just sitting there to fill a PCI slot.
  7. Is this the same unit you have been testing before?
    1. Presumably was failing when going through OC directly if so.
    2. If not the same, needs running directly through OC.
Always the same unit, same SSD, same installation on my bench - just different bootloaders on the same EFI partition.
  8. Why would RP loading before be different to OC loading before?
    1. Needs more tests to rule out random luck.
    2. Basically needs another tester to reproduce the same results
It's not easy to tell, because testing all those circumstances takes a lot of time. I can't sit at the bench rebooting all day. Last weekend I had plenty of spare time, but that was an exception.
 
I decided to upgrade my HDD (bay 4) from 11.4b1 to 11.4b2, and it didn't go very well. I'm using Martin Lo's new OpenCore 0.6.9 package.

The upgrade hung here, requiring a power cycle, then continuing with the installer.
1620174088571.png


Then this happened after the install resumed - never seen this before. Seemed like several screens of this scrolled by before the installer continued.
1620174195261.png


It got to this point and hung again.
1620174245588.png


Then it hung here, permanently. After several power cycles it kept hanging here, so I booted into recovery and am performing a re-install, which seems to be going well - 3 minutes remaining. I'll do the usual multiple reboots, but I'm not expecting anything new. I never had this HDD give me so much trouble throughout the 11.3 betas.
1620174355659.png
 
The re-install from recovery hasn't gone well either. It hung here, and I have yet to see an 11.4b2 login screen.
1620176224914.png


After a power cycle, and a lot more verbose text, I finally got an Apple logo with progress bar, which always means it's going to give me a login screen. Sure enough, I can finally log into 11.4b2. **EDIT** The re-install from recovery re-installed 11.4b1, although a couple of times in the past it installed the OS I was trying to upgrade to. Ah well, I'll start over and attempt to install 11.4b2 again.

I haven't watched a lot of verbose text go by before, and it usually scrolls by pretty fast, but has it always claimed unsupported CPU and unsupported PCH? I sure hope this issue isn't the Northbridge.

1620176690195.png
 
It's not easy to tell, because testing all those circumstances takes a lot of time. I can't sit at the bench rebooting all day.
Unfortunately, that's what is required.

Anyway, this looks like it is not RP related and that angle can be discounted.
 
I haven't tried to update my Mac Pro 5,1 to 11.3.x yet. However, I found it next to impossible to update a Parallels 11.2.3 virtual machine to 11.3 (two series of attempts, with repeated kernel panics), so I had to create a new virtual 11.3 machine from scratch, and that wasn't that smooth either, as it required at least one virtual machine reset for the installation procedure to end successfully. The update to 11.3.1 was kind of easier, but it involved at least two kernel panics.

Now, in my experience, ALL Big Sur installations/updates using both VMware Fusion (in Catalina) and Parallels Desktop (in Big Sur) have been fraught with difficulty, and this includes 11.0, 11.1 and 11.2.x. In all cases, once the new/newly updated virtual machine got to the Big Sur Desktop for the first time, later boot processes went smoothly, with no kernel panics. Considering the virtual machines themselves have no NVMe/PCIe components, I'm wondering if the 11.3.x and 11.4.x problems people are experiencing on their 5,1s might not be similar to the problems with which these same versions, and even earlier releases of Big Sur, have been plaguing virtual machines. In other words, if we persist in our efforts to install/update 11.3.x and 11.4.x until the Big Sur Desktop is reached for the first time on an actual Mac, will future boot processes be stable enough?

EDIT: I went ahead and installed 11.3.1 on an HDD that had no operating system. The initial, partial install took a long time (roughly two hours). After about ten reboots with and without OpenCore (0.6.9) the install hadn't finished. Perhaps I should have persisted in this effort, but I'm inclined to believe something is definitely wrong with 11.3.x as far as the Mac Pro 5,1 is concerned. One word of caution: I did NOT remove my Titan-Ridge Thunderbolt PCIe card or my relatively new Wi-Fi/Bluetooth module. Perhaps if I were to remove at least the Thunderbolt card the install would eventually finish.
 
...In other words, if we persist in our efforts to install/update 11.3.x and 11.4.x until the Big Sur Desktop is reached for the first time on an actual Mac, will future boot processes be stable enough?
If you read all posts in this thread there is nothing stable about 11.3 or 11.4. Both could hang or panic during boot regardless of whether their installation was (relatively) painless or not, or panic after they boot regardless of what hardware you have installed.
 
If you read all posts in this thread there is nothing stable about 11.3 or 11.4. Both could hang or panic during boot regardless of whether their installation was (relatively) painless or not, or panic after they boot regardless of what hardware you have installed.
Yes, thank you. I've been reading the thread with great interest. That's why I've refrained from updating my main Big Sur installation and took the extra step(s) of updating only a clone, just in case it worked reasonably well. It doesn't seem to have worked at all.
 
I haven't tried to update my Mac Pro 5,1 to 11.3.x yet. However, I found it next to impossible to update a Parallels 11.2.3 virtual machine to 11.3 (two series of attempts, with repeated kernel panics)
By "kernel panic", do you mean the VM panicked, or the host system panicked? Mine took down the host system in a spectacular crash/reboot.
 
While we've tested setting cpus to 1, is there a way to reduce CPU cores to 1? Is that a worthwhile test, and would it help identify if this is a race condition?
 
By "kernel panic", do you mean the VM panicked, or the host system panicked? Mine took down the host system in a spectacular crash/reboot.
No, the host system never panicked. Only the virtual machine did. My Mac Pro 5,1 froze today, though, repeatedly, when I installed 11.3.1 to an HDD clone of my main Big Sur disk and tried to boot from that clone.
 
While we've tested setting cpus to 1, is there a way to reduce CPU cores to 1? Is that a worthwhile test, and would it help identify if this is a race condition?
The boot-arg "cpus=N" sets the number of logical processors, so "cpus=1" means "use only one logical processor." If you're on a dual-CPU system with 12 total cores and hyperthreading, making a total of 24 logical processors, "cpus=" can range from 1 to 24. In that case, "cpus=1" uses only one core, without hyperthreading.
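
As a quick sanity check on that arithmetic, here is a small Python sketch. The helper name max_cpus_value is made up for illustration, and os.cpu_count() only reports what the currently running system sees, so it will already be reduced if you booted with cpus= set.

```python
import os


def max_cpus_value(total_cores, hyperthreading=True):
    """Upper bound for cpus=N: one logical processor per core, two with hyperthreading.

    Hypothetical helper, only meant to mirror the arithmetic in the post.
    """
    return total_cores * (2 if hyperthreading else 1)


# Dual-CPU system with 12 total cores and hyperthreading, as in the example above.
print(max_cpus_value(12))   # 24

# Logical processors visible to the OS that is running this script.
print(os.cpu_count())
```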



While I have the floor, here are a few notes (from the kernel source) to anyone who's playing with boot-args:
  • Traditional command-line programs look for specific arguments, such as "-r" or "--disable-xyz", and issue errors or warnings if they find unknown arguments. MacOS does not behave that way. boot-args can contain anything (or nothing) at all - you could set boot-args="MeinHundHatKeineNase=WieGeriechtEr?Schrecklich!" and there would be no error (although the kernel might quietly groan). The kernel (and kexts) simply scan the boot-args for whatever they're interested in at the time, and ignore everything else.

    The point of this is to be careful what you type, because typos/errors will most likely be silently ignored. If you accidentally type "cpu=1" or "numcpus=1", MacOS will happily ignore that and never let you know. (The exception to this is if you type a genuine boot-arg but give it an invalid value, the kernel/kext may issue a complaint about the value, then most likely ignore it.)
  • All boot-args that accept a numeric value can understand decimal (123), hex (0x7b), octal (0173), or binary (0b1111011). The minimum length is one digit. All numeric values are parsed as 64-bit, although specific arguments may actually be smaller (32-bit, 16-bit, 8-bit, or boolean). In addition, values can be negative (leading '-'), and can be terminated with 'k' or 'K' (meaning * 1024), 'm' or 'M' (meaning * 1024 * 1024), or 'g' or 'G' (meaning * 1024 * 1024 * 1024).
    So, "amfi_get_out_of_my_way=1" is identical to "amfi_get_out_of_my_way=0x00001", "amfi_get_out_of_my_way=0b1", etc. "debug=0x400" is identical to "debug=1k", although the hex version is clearer about which bits are being set (of course, binary is also good for that, if less compact - "debug=0b10000000000" is also identical). "zsize=12582912" is identical to "zsize=12M".
    (For the non-C-programmers among you: if a number starts with 0 and contains only the digits 0..7, it's considered octal (base-8) - for example, nbuf=17 means "17", while nbuf=017 means "15". That means you don't want to use any leading zeros on decimal values.)

    An interesting side-effect of the way arguments are parsed is that you can use the 'k'/'m'/'g' suffixes with each of the bases - so, "16k" is identical to "0x10k" is identical to "0b10000K". (Just in case you're feeling the urge to confuse yourself later.)
  • Like most things from the Unix world, boot-args are case-sensitive.
  • boot-args need to be separated by one or more spaces. (Technically, you can also use tabs, but that's both difficult and pointless). Commas, semicolons, and other common separators are treated as part of the argument, not as separators.
  • When looking for a boot-arg, MacOS scans the entire boot-args string from left to right. That means if an argument appears more than once, the last one takes precedence. For example, if you had boot-args="debug=0x100 -no_compat_check debug=0", MacOS would effectively only see "-no_compat_check debug=0". If there are duplicates, the last (right-most) one "wins."
  • Arguments beginning with '-' are considered boolean flags, and don't accept values. While the parser won't complain if you use something like boot-args="-no_compat_check=999", the "=999" just gets ignored (all boolean flags are assigned the value 1 if they're present). Arguments that don't begin with '-' are considered variables, and are expected to be immediately followed by '=' and a value; however, if there is no "=value", they are assigned a value of 1 (so, hypothetically, "amfi_get_out_of_my_way=1" should be identical to just "amfi_get_out_of_my_way" (I haven't actually tested that, but the kernel source suggests it should be so)).
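
For anyone who finds code easier to follow than prose, the rules above can be modelled in a few lines of Python. This is only a sketch of the behaviour as described in this post, not the kernel's actual parser: the names parse_numeric and lookup_boot_arg are made up for illustration, and the 64-bit width limit is not modelled.

```python
# Illustrative model of the boot-args rules described above.
# NOT the kernel's parser; parse_numeric and lookup_boot_arg are hypothetical names.

SUFFIXES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_numeric(text):
    """Parse values such as '123', '0x7b', '0173', '0b1111011', '-5' or '12M'."""
    negative = text.startswith("-")
    if negative:
        text = text[1:]
    multiplier = 1
    if text and text[-1].lower() in SUFFIXES:
        multiplier = SUFFIXES[text[-1].lower()]
        text = text[:-1]
    lowered = text.lower()
    if lowered.startswith("0x"):
        value = int(lowered[2:], 16)
    elif lowered.startswith("0b"):
        value = int(lowered[2:], 2)
    elif len(lowered) > 1 and lowered.startswith("0"):
        value = int(lowered[1:], 8)      # a leading zero means octal
    else:
        value = int(lowered, 10)
    value *= multiplier
    return -value if negative else value


def lookup_boot_arg(boot_args, name):
    """Return the value seen for `name`, or None if it is absent (no error is raised)."""
    found = None
    for token in boot_args.split():      # only whitespace separates arguments
        if token.startswith("-"):
            key = token.split("=", 1)[0]  # any "=value" on a flag is ignored
            value = 1
        else:
            key, sep, raw = token.partition("=")
            if not sep:
                value = 1                # variable without "=value"
            else:
                try:
                    value = parse_numeric(raw)
                except ValueError:
                    value = raw          # non-numeric values stay as strings
        if key == name:                  # case-sensitive comparison
            found = value                # keep scanning: the right-most match wins
    return found


print(lookup_boot_arg("debug=0x100 -no_compat_check debug=0", "debug"))  # 0
print(lookup_boot_arg("cpu=1", "cpus"))                                  # None
print(parse_numeric("0x10k") == parse_numeric("16k"))                    # True
```

The second call shows the typo case: "cpu=1" is simply never matched when something asks for "cpus", which is why a misspelled argument disappears without a trace.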
 
The boot-arg "cpus=N" sets the number of logical processors, so "cpus=1" means "use only one logical processor." If you're on a dual-CPU system with 12 total cores and hyperthreading, making a total of 24 logical processors, "cpus=" can range from 1 to 24. In that case, "cpus=1" uses only one core, without hyperthreading.
I decided to test cpus=1, and found something interesting. While booting 11.3.1 would freeze at random places before, setting cpus=1 freezes at exactly the same place, every time - 10x reboots. I don't think I've ever seen it freeze at !BSD, but it's 100% repeatable with cpus=1.

1620251889942.png


EDIT: 11.4b2 gets 2 lines past that point (with cpus=1).

1620252290445.png
 
I decided to test cpus=1, and found something interesting. While booting 11.3.1 would freeze at random places before, setting cpus=1 freezes at exactly the same place, every time - 10x reboots. I don't think I've ever seen it freeze at !BSD, but it's 100% repeatable with cpus=1.
Any predictable behavior in this situation is definitely interesting! Does cpus=2 change where the hang occurs? Also, perhaps the BSD patch could help with cpus=1.
 
Any predictable behavior in this situation is definitely interesting! Does cpus=2 change where the hang occurs? Also, perhaps the BSD patch could help with cpus=1.
10x reboot iterations of cpus=1 hang at exactly the same place (!BSD).

5x reboot iterations of cpus=2 hang at exactly the same place (Intel82574L). That's not enough iterations to call it 100% repeatable, but it was rare to see 2x freezes in the same location before, and I don't think I ever had 3x in a row, so it's strong evidence. Here's where cpus=2 hangs every time:

1620256611839.png


I set cpus=3 and got 4x login screens in a row. This is from my SATA SSD on PCIe. However, the 4th reboot hung at Intel82574L. The 5th reboot hung at the USB2 point, and resulted in the "prohibited" icon and screen mess, so it seems that cpus=3 is back to random.
 
Some instructions for the kernel debug kit above:
 
I have something I'd like to try, but I've hosed my BS 11.3 installation and I won't have time to reinstall until this weekend. Perhaps some kind (or curious, or desperate) soul will give this a try and report back:

First, you need to determine the Volume/Disk/Partition GUID of your Big Sur 11.3 (or 11.4 beta) boot volume.
In Terminal, run diskutil list and identify your boot volume - it should show up as "/dev/diskN (synthesized):", where N is some number, partition 0 is "APFS Container Scheme", there's a line that says "Physical Store diskXsY" (where X and Y are numbers), then partition 1 is an APFS Volume with a recognizable name, presumably something like "Big Sur". There will be two such partitions, one named with the base name (e.g. "Big Sur"), the other with " - Data" appended to it (e.g. "Big Sur - Data"). Note the disk number ("/dev/diskN (synthesized)") and which partition number contains the base name WITHOUT " - Data" appended.
(Here's a concrete example:
Code:
/dev/disk5 (synthesized):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      APFS Container Scheme -                      +239.9 GB   disk5
                                 Physical Store disk3s2
   1:                APFS Volume Big Sur 11_1 - Data     5.3 GB     disk5s1
   2:                APFS Volume Preboot                 248.6 MB   disk5s2
   3:                APFS Volume Recovery                613.6 MB   disk5s3
   4:                APFS Volume VM                      1.1 MB     disk5s4
   5:                APFS Volume Big Sur 11_1            16.3 GB    disk5s5
   6:                APFS Volume Update                  413.7 KB   disk5s6
In this example, the disk number is 5 (from "/dev/disk5"), partition 1 is the data partition ("Big Sur 11_1 - Data"), and partition 5 is the base ("Big Sur 11_1")).
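
If you'd rather not eyeball the listing, here is a rough Python sketch that picks the base volume out of text shaped like the example above. It assumes the column layout shown in this post (other macOS versions may format the listing differently), the helper names are made up for illustration, and the sample text is simply the listing above pasted in.

```python
import re

# The example listing from above, pasted in as sample input.
SAMPLE = """\
/dev/disk5 (synthesized):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      APFS Container Scheme -                      +239.9 GB   disk5
                                 Physical Store disk3s2
   1:                APFS Volume Big Sur 11_1 - Data     5.3 GB     disk5s1
   2:                APFS Volume Preboot                 248.6 MB   disk5s2
   3:                APFS Volume Recovery                613.6 MB   disk5s3
   4:                APFS Volume VM                      1.1 MB     disk5s4
   5:                APFS Volume Big Sur 11_1            16.3 GB    disk5s5
   6:                APFS Volume Update                  413.7 KB   disk5s6
"""

# "<index>:  APFS Volume <name>  <size> <unit>  <identifier>"
VOLUME_LINE = re.compile(
    r"^\s*(\d+):\s+APFS Volume\s+(.+?)\s+[\d.]+ [KMGT]B\s+(disk\d+s\d+)\s*$"
)


def list_apfs_volumes(listing):
    """Return {volume name: device identifier} for every APFS Volume line."""
    volumes = {}
    for line in listing.splitlines():
        match = VOLUME_LINE.match(line)
        if match:
            volumes[match.group(2)] = match.group(3)
    return volumes


def find_base_volume(listing):
    """Return (name, identifier) of the volume that has a matching ' - Data' twin."""
    volumes = list_apfs_volumes(listing)
    for name, identifier in volumes.items():
        if name + " - Data" in volumes:
            return name, identifier
    return None


print(find_base_volume(SAMPLE))   # ('Big Sur 11_1', 'disk5s5')
```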

Once you know the disk number and base partition number, run diskutil info diskNsY where N is your disk number and Y is the base partition number. You should see something like this:
Code:
   Device Identifier:         disk5s5
   Device Node:               /dev/disk5s5
   Whole:                     No
   Part of Whole:             disk5

   Volume Name:               Big Sur 11_1
   Mounted:                   Yes
   Mount Point:               /Volumes/Big Sur 11_1

   Partition Type:            41504653-0000-11AA-AA11-00306543ECAC
   File System Personality:   APFS
   Type (Bundle):             apfs
   Name (User Visible):       APFS
   Owners:                    Enabled

   OS Can Be Installed:       Yes
   Booter Disk:               disk5s2
   Recovery Disk:             disk5s3
   Media Type:                Generic
   Protocol:                  SATA
   SMART Status:              Verified
   Volume UUID:               D5711839-E42A-4782-9172-E0DF14B21CDF
   Disk / Partition UUID:     D5711839-E42A-4782-9172-E0DF14B21CDF

   Disk Size:                 239.9 GB (239855427584 Bytes) (exactly 468467632 512-Byte-Units)
   Device Block Size:         4096 Bytes

   Volume Total Space:        239.9 GB (239855427584 Bytes) (exactly 468467632 512-Byte-Units)
   Volume Used Space:         22.6 GB (22643900416 Bytes) (exactly 44226368 512-Byte-Units) (9.4%)
   Volume Free Space:         217.2 GB (217211527168 Bytes) (exactly 424241264 512-Byte-Units) (90.6%)
   Allocation Block Size:     4096 Bytes

   Read-Only Media:           No
   Read-Only Volume:          No

   Device Location:           Internal
   Removable Media:           Fixed

   Solid State:               Yes
   Hardware AES Support:      No
   Device Location:           "Bay 1"
The values for "Volume UUID" and "Disk / Partition UUID" should be identical; copy that string (in this example, D5711839-E42A-4782-9172-E0DF14B21CDF).
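
To avoid copy/paste slips, the UUID can also be pulled out of saved diskutil info text with a short script. This is a sketch that assumes the "Key: value" layout shown above; parse_info and boot_uuid are hypothetical helper names, and the sample is a trimmed copy of the output in this post.

```python
import re

# Trimmed copy of the diskutil info output shown above.
SAMPLE_INFO = """\
   Device Identifier:         disk5s5
   Device Node:               /dev/disk5s5
   Volume Name:               Big Sur 11_1
   Volume UUID:               D5711839-E42A-4782-9172-E0DF14B21CDF
   Disk / Partition UUID:     D5711839-E42A-4782-9172-E0DF14B21CDF
"""

UUID_RE = re.compile(r"^[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}$")


def parse_info(text):
    """Turn 'Key:   value' lines into a dict, skipping blank or value-less lines."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def boot_uuid(text):
    """Return the UUID to use, checking that both UUID fields agree as the post expects."""
    fields = parse_info(text)
    volume = fields.get("Volume UUID")
    partition = fields.get("Disk / Partition UUID")
    if volume is None or volume != partition:
        raise ValueError("Volume UUID and Disk / Partition UUID are missing or differ")
    if not UUID_RE.match(volume):
        raise ValueError("not a UUID: " + volume)
    return volume


print(boot_uuid(SAMPLE_INFO))   # D5711839-E42A-4782-9172-E0DF14B21CDF
```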

Now, set your boot-args to include the following:
Code:
rd=uuid boot-uuid=The_UUID_you_just_copied
(using the diskutil output from above, our example would be rd=uuid boot-uuid=D5711839-E42A-4782-9172-E0DF14B21CDF). If you're running OpenCore, you probably need to edit config.plist and change boot-args there. (Remember to keep any debugging-relevant boot-args, such as -v or debug=0x100)
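
Here is a small Python sketch of how the new arguments can be merged into an existing boot-args string without losing the debugging flags. The helper name with_boot_uuid is made up for illustration; it only builds the string, and you still have to put the result wherever your setup keeps boot-args.

```python
def with_boot_uuid(existing_boot_args, uuid):
    """Return boot-args with rd=uuid boot-uuid=<uuid> added and everything else kept.

    Hypothetical helper: any earlier rd= or boot-uuid= entries are dropped so the
    string contains each of them only once.
    """
    kept = [
        token
        for token in existing_boot_args.split()
        if not token.startswith(("rd=", "boot-uuid="))
    ]
    return " ".join(kept + ["rd=uuid", "boot-uuid=" + uuid])


print(with_boot_uuid("-v debug=0x100 -no_compat_check",
                     "D5711839-E42A-4782-9172-E0DF14B21CDF"))
# -v debug=0x100 -no_compat_check rd=uuid boot-uuid=D5711839-E42A-4782-9172-E0DF14B21CDF
```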

Once you have rd=uuid boot-uuid=Some_UUID in your boot-args, try booting Big Sur 11.3 and see if it makes any more progress than usual.

(For those who are interested: the first half or more of 11.3's IOFindBSDRoot() is identical to the binary of 11.1 and the source code for 11.2, but then it diverges. With limited time available, I'm hypothesizing that it's getting stuck trying to boot from either a RAMdisk or a network device, and by forcing it to use the correct disk partition, we might get past this particular hurdle. If setting the boot-uuid doesn't do it, there are still some other things to try.)

Thanks, and good luck. I should be able to check back in tomorrow.

IMPORTANT REMINDER: be sure you have a natively bootable partition available to you; if this (or any similar) experiment goes badly, you'll need to do a PRAM reset to get rid of the problematic boot-args, after which your OpenCore setup may or may not boot correctly.
(EDIT: even if this works, you'll still want to do a PRAM reset, since with rd=uuid boot-uuid=... in the boot-args, the only MacOS disk you'll be able to boot will be the BS 11.3 disk...)
 