@Syncretic Amazing work with the research, documentation, and patch sets! I updated OpenCore Legacy Patcher to support the SurPlus patchset; this will land in the 0.3.0 release, or it can be grabbed via our nightly links right now. Additionally, there's a setting in the Patcher to remove the MaxKernel limit for those who want to experiment in newer OSes (screenshot: Screen Shot 2021-09-27 at 9.29.43 AM.png).
 
@Syncretic, I successfully booted macOS 11.6. Thank you very much. I also have something to report. I built OpenCore 0.7.3 on a USB flash drive, created a config.plist following @cdf's guide, and installed it. On macOS 11.6, after the boot sound, the screen remains black and no Apple logo appears. The Boot Picker does not start even if I press the Esc key. In the same environment, Mojave boots and the Boot Picker works. Next, I moved OpenCore to the EFI partition of the internal SATA SSD and blessed it, and it booted. So on my Mac, booting macOS 11.6 from a blessed USB flash drive fails and the Boot Picker does not work either; the machine shuts down on its own after the black screen. Mojave has no such problem.
 
After thinking the MP5.1 was dead earlier this year and starting to look for alternatives, I got it (partly) working with latebloom but always had random "unsupported" boot failures, more frequently after a soft reboot... Now with @Syncretic's fix it feels like it came back to life, with reliable booting similar to pre-11.3 !!! This is fantastic for all of us hanging on to our trusty MP5.1s... thank you @Syncretic for all your hard work!
 
As a counter - I boot off a USB stick for OpenCore and am working fine with SurPlus and 11.6.
 
Reading through the explanation on github quickly I am surprised that this deadlock is not present on the newer Apple devices (or the newer PC hardware).
Thanks to some insight from @vit9696, it appears that newer Macs aren't affected because those CPUs support the rdrand instruction, meaning they don't need floating-point access during early boot. Meaning, from Apple's perspective, this is not a bug at all, since supported Macs will never encounter it. (I have updated my writeup on github to reflect this.)
 
This little program, by the way, will tell you if your processor has the rdrand function:


which is useful to see if other systems you might have need this patch (my MacBookPro 9,1 has the function, so does not need this patch).
 
Ivy Bridge and newer CPUs support rdrand.
 
I had to write my own debugger; the boot hangs generally happened before the boot was far enough along for MacOS' remote debugging facility to work, so drastic measures were required.
Can you describe what your debugger does and how it works?

One method for debugging might be to use a virtual machine to run macOS.

I understand that two machine debugging of the kernel using FireWire and Ethernet doesn't start early enough, but what about serial?

In the serial_init function of pe_serial.c, there are three options for a serial port: Legacy, MMIO, and PCIe.
Legacy uses the COM1 port address 0x3f8.
MMIO uses MMIO Config space 0xFE036000 or Legacy MMIO Config space 0xFE034000 but you can use a boot-arg "mmio_uart" to specify a base address.
PCIe uses PCIe MMIO base 0xFE410000 but you can use a boot-arg "pcie_mmio_uart" to change that.

Macs don't usually have a serial port but a Hackintosh can (a COM port). My GA-Z170X-Gaming 7 motherboard has a COM port for example.

I wonder if the PCIe option would work? Install a 16x50 compatible PCIe card, and set the boot-args. To check if it could work, boot into a UEFI Shell and probe the addresses with the mm command using the same offsets and values that the probe functions in pe_serial.c use.

A PCIe card in a Thunderbolt enclosure might also work if you have a Mac with no PCIe slots. But the PCIe addresses might change during PCIe enumeration?

Does setting DB_HALT (bit 0) in the debug boot-arg cause the kernel to wait for debugger attach during early boot or is another method required?
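
To make the boot-arg idea above concrete, here is a minimal Python sketch that only assembles a boot-args string from the names mentioned in this post (debug, mmio_uart, pcie_mmio_uart). The helper name build_boot_args is hypothetical, and it is an assumption that the kernel accepts hexadecimal values for these arguments. Whether DB_HALT actually stops the kernel early enough remains the open question.

```python
DB_HALT = 0x1  # bit 0 of the debug boot-arg, as described in the post

# Default base addresses quoted in the post from pe_serial.c.
DEFAULT_UART_BASES = {
    "legacy_com1": 0x3F8,
    "mmio_uart": 0xFE036000,
    "pcie_mmio_uart": 0xFE410000,
}

# Only these two names are described in the post as overridable by boot-arg.
OVERRIDABLE = ("mmio_uart", "pcie_mmio_uart")


def build_boot_args(debug_flags, uart_overrides):
    """Build a boot-args string such as 'debug=0x1 pcie_mmio_uart=0xFE410000'."""
    parts = [f"debug=0x{debug_flags:x}"]
    for name, base in uart_overrides.items():
        if name not in OVERRIDABLE:
            raise ValueError(f"{name} is not a boot-arg named in the post")
        parts.append(f"{name}=0x{base:X}")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_boot_args(DB_HALT, {"pcie_mmio_uart": DEFAULT_UART_BASES["pcie_mmio_uart"]}))
```

The base address passed here is just the default quoted above; a real PCIe card would need the address found by probing in the UEFI Shell.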
 
it appears that newer Macs aren't affected because those CPUs support the rdrand instruction

It would be an interesting exercise to spoof out RDRAND (bit 30 in ECX) on a supported Mac and see if the race condition can be triggered.

This little program, by the way, will tell you if your processor has the rdrand function:

An alternative:

sysctl -a | grep machdep.cpu.features
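
For anyone who wants to script this check, here is a minimal Python sketch. It assumes an Intel Mac where the machdep.cpu.features sysctl key exists and lists RDRAND as a whitespace-separated token; the function names are hypothetical. The ECX helper only decodes a CPUID leaf 1 ECX value you already have (bit 30, as mentioned above), since Python cannot execute CPUID itself.

```python
import subprocess

RDRAND_ECX_BIT = 30  # CPUID leaf 1, ECX bit 30


def ecx_has_rdrand(ecx):
    """Decode the RDRAND flag from a CPUID leaf 1 ECX register value."""
    return bool((ecx >> RDRAND_ECX_BIT) & 1)


def features_have_rdrand(features):
    """Check a machdep.cpu.features string for the RDRAND token."""
    return "RDRAND" in features.upper().split()


def read_cpu_features():
    """Return machdep.cpu.features, or '' if sysctl or the key is unavailable."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", "machdep.cpu.features"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return ""
    return result.stdout


if __name__ == "__main__":
    features = read_cpu_features()
    if not features:
        print("machdep.cpu.features not available on this system")
    else:
        print("RDRAND supported:", features_have_rdrand(features))
```

An empty result means the key could not be read, which is not the same as the CPU lacking rdrand.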
 
Interesting that Macs with Sandy Bridge CPUs, which do not yet have the rdrand instruction, are much less susceptible to the race condition. Maybe single-core speed also has some influence on this?
 
All of that is probably valid, correct, and do-able. Knowing that the problem appeared inconsistently, and not knowing what might be causing the problem or what caused the occasional successes, I decided to stick with a real Mac and no VM, just to rule out all the variables that a Hackintosh and/or VM would introduce. I don't have a serial port, and didn't want to invest the time in acquiring one and getting it to work, so I did it the hard way. (That seems to be a habit of mine, unfortunately.)

Can you describe what your debugger does and how it works?

It's quite limited in scope, since it was tailored for this project. It basically grabs control as soon as the display is usable, lets me examine memory, step through the thread list, dump and trace the stack of each thread, etc. It's crude, but effective enough to get the job done. The very first time the debugger ran successfully (trust me, I've generated more kernel panics than all other MacRumors members combined), I got a great adrenaline rush because that boot triggered the deadlock, and I got a clean stack dump that laid out the roadmap that eventually led to the solution. That was a good day.

(Before anybody asks: I have no plans to release this debugger. It was a quick and dirty (and, frankly, ugly) hack that was custom-built for this purpose. With any luck, it will never be used again.)
 
All I can say is this: Syncretic you're a true star! 11.6 installed without a hitch and running like a charm! Thank you. Donation will soon follow!
👏👏👏👏👏👏👏👏👏👏👏👏
 
WILL APPLE FIX THIS?

I've posted my findings about the MacOS 11.3+ "race condition" bug, along with a patch I'm calling SurPlus, in a github repository.

My comments after skimming through the technical details.

The race condition exists for all Macs, not just unsupported ones. It is related to the corecrypto module of the kernel, but does not depend on the instruction set available. corecrypto will use SSE3 or AES-NI, if AVX is not available.

My analysis: There was a deadlock in the kernel code as Apple first wrote it. They then implemented a workaround, but did not make sure it is always selected. SurPlus makes sure the workaround is always used by eliminating the deadlocking branch.

Technical details: The corecrypto and zalloc (memory zone allocation) threads of the kernel are in a conflict. If they call each other before both are properly initialized, the kernel will hang in a deadlock. zalloc needs random numbers to implement a new security feature. It gets them from corecrypto.

To avoid a conflict during initialization, Apple has implemented a function named early_random(). In a successful boot, early_random() uses its own SHA1 random number generator. This is what it should always do during initialization.

The bug is in the early_random() function. Instead of generating its own random numbers, it checks to see if corecrypto has been initialized. If so, it calls corecrypto. corecrypto will always fail and deadlock because it needs zalloc.

SurPlus fixes the bug by simply removing the conditional branch to corecrypto in the zalloc initialization code. I believe Apple should do the same.
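
A toy Python model of the branch described above may make the argument easier to follow. It is not XNU code: the class, function, and field names are invented for illustration, and it only encodes the claims in this post (early_random() either uses its own generator or defers to corecrypto, and corecrypto hangs if zalloc is not ready).

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class BootState:
    corecrypto_ready: bool
    zalloc_ready: bool


def early_random_source(state, patched):
    """Return which path this toy early_random() takes."""
    if not patched and state.corecrypto_ready:
        # Unpatched branch: defer to corecrypto, which (per the post) needs zalloc.
        return "corecrypto" if state.zalloc_ready else "deadlock"
    # Patched, or corecrypto not yet initialized: use the private early generator.
    return "early generator"


if __name__ == "__main__":
    for cc, za in product([False, True], repeat=2):
        state = BootState(corecrypto_ready=cc, zalloc_ready=za)
        print(state,
              "| unpatched:", early_random_source(state, patched=False),
              "| patched:", early_random_source(state, patched=True))
```

In this model the only state that hangs is the one where corecrypto is ready and zalloc is not, and removing the branch makes every state take the early generator.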


P.S.
I assume early_random() is only called by the initialization code for zalloc. If not, then another way for Apple to fix early_random() would be for it to check that both corecrypto and zalloc are initialized. But this would provide little benefit and would be an ugly kludge.

Randomizing memory addresses is generally used to mitigate side-channel attacks.
 
@Syncretic, thanks for the update, we should really update the patch to affect the performance less though. Their randomisation can be killed without any serious security issue: its purpose is to make heap spraying and exploitation less reliable, and is aimed to make the exploit writing harder rather than protect a particular machine.
To preserve rand performance for other subsystems try returning 1 from _vm_pool_low…
 
I tried a clean install of Big Sur 11.6. It completed without any problems, as if the previous issue had never happened.
It's like magic. I sincerely thank you for your efforts, hard work, and elegant and beautiful solution that breathed new life into my Old Lady, still doing her job wonderfully even after 10 years have passed.
 
Forcing _vm_pool_low() to always return true has broader implications - it affects the scheduler (via the compute_zone_working_set_size method in sched_average[]) and callers to zalloc_ext() (via zalloc_item_slow()). It also only provides protection against the specific instance of zalloc initialization using FP too early; I'm concerned that kexts (notably, APFS) might also do so, a condition which the SurPlus patch covers. I'd rather not have to go through this exercise again if another early FP access shows up.

My goal was to solve the boot hang problem with minimal impact to other parts of the system. Based on my own testing, I have not seen boot times adversely affected by the patch (granted, that's a very small sample size). Since booting only happens occasionally, I think a small performance hit is acceptable here. (If we were talking about part of the main kernel, performance would be a priority.)
 
Also tested, works like a charm. 🔥🔥🔥

@Syncretic Many thanks! It's so cool. Many thanks go to you, the people of OpenCore and all the kind people here in the forum.
 
Can I double-check exactly which array this is (Kernel - Patch)? I'm not sure I'm looking at the right one:

Code:
<key>Kernel</key>
    <dict>
        <key>Add</key>
        <array>
            <dict>
                <key>Arch</key>
                <string>x86_64</string>
                <key>BundlePath</key>
                <string>Lilu.kext</string>
                <key>Comment</key>
                <string>Do not touch this setting</string>
                <key>Enabled</key>
                <true/>
                <key>ExecutablePath</key>
                <string>Contents/MacOS/Lilu</string>
                <key>MaxKernel</key>
                <string></string>
                <key>MinKernel</key>
                <string>16.0.0</string>
                <key>PlistPath</key>
                <string>Contents/Info.plist</string>
            </dict>


Or am I looking at one starting with this:


Code:
<key>Patch</key>
        <array>
            <dict>
                <key>Arch</key>
                <string>Any</string>
                <key>Base</key>
                <string></string>
                <key>Comment</key>
                <string>IONVMeFamily Patch#External</string>
                <key>Count</key>
                <integer>0</integer>
                <key>Enabled</key>
                <true/>
                <key>Find</key>
                <data>RXh0ZXJuYWw=</data>
                <key>Identifier</key>
                <string>com.apple.iokit.IONVMeFamily</string>
                <key>Limit</key>
                <integer>0</integer>
                <key>Mask</key>
                <data></data>
                <key>MaxKernel</key>
                <string></string>
                <key>MinKernel</key>
                <string>17.0.0</string>
                <key>Replace</key>
                <data>SW50ZXJuYWw=</data>
                <key>ReplaceMask</key>
                <data></data>
                <key>Skip</key>
                <integer>0</integer>
            </dict>
        </array>

Looking for the right place to put the new additions.
Under 'Patch'. It is a kernel patch, not a kext, so it goes in the Kernel - Patch array rather than the Kernel - Add array.
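
If you would rather script the edit than hand-edit the XML, here is a minimal Python sketch using the standard plistlib module. The helper name append_kernel_patches and the placeholder entry are hypothetical; the real SurPlus entries should be copied from the github repository, not from here. The sample plist is a cut-down version of the Patch entry quoted above, kept only to show where the array lives and what its base64 data fields decode to.

```python
import plistlib

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>Kernel</key>
    <dict>
        <key>Add</key>
        <array/>
        <key>Patch</key>
        <array>
            <dict>
                <key>Comment</key>
                <string>IONVMeFamily Patch#External</string>
                <key>Find</key>
                <data>RXh0ZXJuYWw=</data>
                <key>Replace</key>
                <data>SW50ZXJuYWw=</data>
            </dict>
        </array>
    </dict>
</dict>
</plist>
"""


def append_kernel_patches(config, new_patches):
    """Append patch dicts to Kernel -> Patch, leaving Kernel -> Add untouched."""
    patch_array = config.setdefault("Kernel", {}).setdefault("Patch", [])
    patch_array.extend(new_patches)
    return config


config = plistlib.loads(SAMPLE)
existing = config["Kernel"]["Patch"][0]
# plistlib decodes <data> fields from base64 to bytes.
print(existing["Find"], existing["Replace"])  # b'External' b'Internal'

# Hypothetical placeholder, NOT a real SurPlus patch entry.
placeholder = {"Comment": "placeholder for a new patch entry", "Enabled": False}
append_kernel_patches(config, [placeholder])
print([p["Comment"] for p in config["Kernel"]["Patch"]])
```

For a real config.plist you would read the file with plistlib.load, append the entries, and write it back with plistlib.dump, after making a backup.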
 
I wasn't able to get 11.6 working on my 5,1 with this.

I rebooted warm, rebooted cold and also did 5 PRAM resets and still won't boot.

Clearly I didn't put it in the correct place. I added the latest to the same place that latebloom was added but at the end of the kernel section.

Here's what I captured on boot.

EDIT: I confirmed that I didn't put it in the correct place. I don't have time right now to go back and fix it, but will before the end of the day and confirm booting or not then
 

Attachment: IMG_8707.jpeg