Status
Not open for further replies.

PeterHolbrook

macrumors 68000
Sep 23, 2009
1,625
441
Did you check IONVRAM.cpp?

There is a reference to a race condition there, and I've found IONVRAM-FORCESYNCNOW-PROPERTY and IONVRAM-SYNCNOW-PROPERTY used before.

Code:
  // <rdar://problem/9529235> race condition possible between
  // IODTNVRAM and IONVRAMController (restore loses boot-args)


Does that discovery bring us any closer to a solution?
 

lotusmac

macrumors newbie
Jul 30, 2018
9
6
Another note: I've opened the config file in both the Martin Lo EFI folder (0.6.9) and the OCLP-generated folder, and both have AvoidRuntimeDefrag set to false. The OpenCore install guide (0.6.9) over here says Big Sur may require this Booter quirk to be enabled (set to true): it states that Big Sur requires the APIC table to be present, otherwise an early kernel panic occurs, so the quirk is recommended for Big Sur users.
I've tried tweaking the Martin Lo EFI folder with that single change, but it didn't work (something else might need to be changed for it to work with this setting).
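
A minimal sketch for checking the setting mentioned above, assuming the usual OpenCore layout in which the value sits under Booter -> Quirks -> AvoidRuntimeDefrag in config.plist. The function name and the in-memory sample are made up for illustration; only Python's standard plistlib is used.

```python
import plistlib


def read_booter_quirk(config_bytes: bytes, quirk: str = "AvoidRuntimeDefrag"):
    """Return the value of a Booter quirk, or None if the key is absent."""
    config = plistlib.loads(config_bytes)
    return config.get("Booter", {}).get("Quirks", {}).get(quirk)


# Illustrative stand-in for a real config.plist (not anyone's actual file).
sample = plistlib.dumps({"Booter": {"Quirks": {"AvoidRuntimeDefrag": False}}})
print(read_booter_quirk(sample))  # prints False
```

For a real file, read it in binary mode (open(path, "rb").read()) and pass the bytes to the same function.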

Going through the OpenCore install guide, I've also noticed that the Martin Lo EFI folder (which had been working wonders until 11.2.3) differs in at least 11 settings from what the OpenCore team suggests for Nehalem and Westmere CPUs (including some NVRAM settings and the Mac Pro 6,1 SMBIOS suggested for Catalina and above).
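
A comparison like the one described above can be automated. The sketch below flattens two already-loaded configs into path/value pairs and lists every leaf that differs. The helper names and the two sample dictionaries are hypothetical, and the sample values are invented rather than taken from any real EFI folder.

```python
def flatten(node, prefix=""):
    """Flatten nested dicts/lists into {"A/B/0/C": value} pairs.

    Empty dicts and lists produce no entries.
    """
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return {prefix: node}
    flat = {}
    for key, value in items:
        path = f"{prefix}/{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


def diff_configs(a, b):
    """Return {path: (value_in_a, value_in_b)} for every differing leaf."""
    fa, fb = flatten(a), flatten(b)
    missing = "<absent>"
    return {
        path: (fa.get(path, missing), fb.get(path, missing))
        for path in sorted(fa.keys() | fb.keys())
        if fa.get(path, missing) != fb.get(path, missing)
    }


# Invented sample data standing in for two loaded config.plist files.
config_a = {"Booter": {"Quirks": {"AvoidRuntimeDefrag": False}},
            "PlatformInfo": {"Automatic": False}}
config_b = {"Booter": {"Quirks": {"AvoidRuntimeDefrag": True}},
            "PlatformInfo": {"Automatic": False}}

for path, (left, right) in diff_configs(config_a, config_b).items():
    print(f"{path}: {left} -> {right}")
```

Real files can be loaded with plistlib.load on a file opened in binary mode before being passed to diff_configs.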

I might try to build a fresh OC EFI folder and config file later on, but I might fail to set up properly some of the custom files that work so well with the legacy cMP logic board (SSDT USB mapping, audio kexts and so on). I will try different SMBIOS profiles with different PCI port counts. If there's any OpenCore expert over here, I suggest you try a fresh one as well and play around with it, as there seems to be some divergence between the slightly different approaches (OpenCore guide suggestions, Martin Lo, OCLP team). Just my 2 cents.

@tsialex Do you happen to know the BIOS (or SPI flash) chip size of the Hackintosh systems reported to have the same Big Sur crashes as the cMP (the PC motherboards with a Nehalem/Westmere socket)? Is it possible that the PC motherboards used bigger BIOS chips and still had the same crashing problems (which would be a way of testing your hypothesis)?

 

cdf

macrumors 68020
Jul 27, 2012
2,256
2,583
For educational purposes, you might want to take a look at this as well, which allows you to manually build upon a minimal config from scratch:
 

Bmju

macrumors 6502a
Dec 16, 2013
702
768

The main OpenCore documentation states this quirk is not needed on Apple hardware: https://github.com/acidanthera/OpenCorePkg/raw/master/Docs/Configuration.pdf
 

tsialex

Contributor
Jun 13, 2016
13,455
13,601
PC motherboards of this era usually have even less NVRAM space than Macs; only some servers have the same or more.

MacPro5,1 NVRAM size is a real problem nowadays, but it's not the cause of the crashes here, or you wouldn't even get 11.2.3 booting reliably. This is not the issue: the MP3,1 BootROM is half the size and it boots much more reliably.
 

tsialex

Contributor
Jun 13, 2016
13,455
13,601
Does that discovery bring us any closer to a solution?
My bet is that Apple probably removed a quirk somewhere that is needed for successful PCI enumeration, or something similar. We are looking for any race conditions; the IONVRAM one is clearly stated in the source code, but that doesn't mean it is the real culprit.
 

socamx

macrumors 6502
Oct 7, 2004
360
16
the pale blue dot
I had two episodes of data corruption recently and unfortunately discovered that having your data on separate disks is not enough, the crashes can corrupt other disks.
Is there somewhere I can find more information about this? I don't want to muddle up this thread with questions but to see such a report from someone like yourself who has immense knowledge on the cMPs is worrisome. Up until now I was okay dealing with some minor quirks of Big Sur and 11.2.3 but seeing this message is genuinely making me consider going back to Catalina.
 

tsialex

Contributor
Jun 13, 2016
13,455
13,601
Is there somewhere I can find more information about this? I don't want to muddle up this thread with questions but to see such a report from someone like yourself who has immense knowledge on the cMPs is worrisome. Up until now I was okay dealing with some minor quirks of Big Sur and 11.2.3 but seeing this message is genuinely making me consider going back to Catalina.
Don't forget that I run many more tests than the average reader here, so you probably won't end up with a corrupt disk, but it's definitely a possible outcome of using Big Sur with a version >11.2.3. Since a lot of security issues have been found in past Big Sur releases, going back to Catalina is a must for anyone who uses it for work. If you have multiple backups of your data, and having to nuke your disks and reinstall from scratch or restore a clone won't disrupt your workflow, it's not too big an issue, but it's something that only you can evaluate.

Catalina works perfectly with a MacPro5,1 and the last Security Update took care of the security problems.
 

socamx

macrumors 6502
Oct 7, 2004
360
16
the pale blue dot
Don't forget that I run many more tests than the average reader here, so you probably won't end up with a corrupt disk
Yeah I was thinking about that fact too, I've been following this thread closely from the start and seeing how much everyone restarts, reinstalls and so forth.

I'm not too concerned yet, I've got Time Machine backups going back well over 3 or 4 years now, I clone my drives about once a week with CCC, and I have everything shoot up to Backblaze.

11.2.3 has been mostly okay for me on it, especially with all the Open Core tweaks from the official thread. But I feel like it's quickly approaching end of life for me, which is unfortunate because I hate retiring a computer that still works.
 

tsialex

Contributor
Jun 13, 2016
13,455
13,601
A lot of eyes are on this issue now. I have to admit that finding the real culprit is extremely difficult, but we are also making progress, with several unrelated bugs squashed and improvements to OpenCore arriving with each new release.

The probable outcome is that the issue will be mitigated over time. If we get the MacPro5,1 as reliable as a MacPro3,1 is right now, for most people it will be good enough to use Big Sur as a daily driver. A lot of info is being disseminated too, people are learning best practices, and it's a very interesting endeavour.

Anyway, even if we don't find it, Catalina will have active support from Apple until October 2022, so it's not the end yet and we can still use our Mac Pros reliably and "securely".
 

panjandrum

macrumors 6502a
Sep 22, 2009
732
919
United States
Well, this is certainly a bummer. I've got my own fully upgraded 4,1 -> 5,1, which I was going to move to OpenCore and Big Sur eventually, simply to continue using it for as long as possible since, with 12 cores at 3.46 GHz, it's still somewhat faster than the M1 machines in terms of CPU power (and well over 2x faster when it comes to the GPU). Plus, of course, there's the vast quantity of cheap storage inside it; it's simply a beast not easily replaced by anything Apple currently sells.

More importantly, I have 22+ donated to my school in the K-8 educational sector that I've fully upgraded myself: new CPUs (to avoid the 4,1 -> 5,1 audio glitch problem) and home-grown Fusion drives in every one of them. Keeping them working as long as humanly possible would be great, but we will run into incompatibilities relatively soon if we can't match the Big Sur OS that's now moving into the school in the form of M1 MacBook Airs. For example, a student who opens their personal Photos library on a Big Sur machine would never be able to open it again on anything running an older OS, so maintaining a single OS across the board in the school would be ideal if possible. It's not absolutely necessary, but it makes life a lot easier for the students.

I'll keep a close eye on this thread and see if things work out by the end of this summer when I would have to roll this out.
 

JohnD

macrumors regular
Jun 2, 2005
150
97
Los Angeles, California
Thanks for sharing, but is there a helpful testing report in there somewhere?
 

panjandrum

macrumors 6502a
Sep 22, 2009
732
919
United States
Thanks for sharing, but is there a helpful testing report in there somewhere?
No, but I'm planning to pull several machines soon after the school year is over and give it a go. It will be my first experience with OpenCore, so I expect it will take me a while just to learn that part of the entire process, but I do plan to try Big Sur at some point and will be keeping an eye on this thread and will report what my testing reveals. I've got a significant number of machines with several different hardware configurations; maybe something will reveal itself that hasn't thus far. First I'll be trying to conquer OpenCore and a Catalina install however.
 

ldmfd

macrumors newbie
May 24, 2021
1
3
As a former system integrator (aka UEFI/DXE and Linux kernel hacker) at Intel who also owns a Mac Pro 5,1, I've found this a very interesting thread to follow. I used to do system bring-up and debugging, working on issues very similar to this one. Unfortunately, the Darwin kernel is very different from the Linux kernel, so no ready solution from me :)

Machines have too many PCI devices to query and not finishing fast enough to create the boot hangs and/or errors

Looking into the boot logs of the failed boots, the number of PCI devices to be configured looks pretty high, always in the 50s, with bridges always above ten. On the other hand, this seems synchronous, and configuration ends before awaiting the root device.

I have not updated my Mac Pro yet, but it would be interesting to first check from the IO registry what all those devices are, if they also show up in 11.2.3, and then start to disable/mask devices either in OC or via ACPI.

Whether that would make a difference probably depends on whether it is just the number of devices that causes problems, or whether one specific device is the culprit.
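
A rough sketch of the comparison suggested above, assuming two saved text dumps of ioreg output (one per macOS version) in which each node line looks like "+-o NAME  <class CLASSNAME, ...>". That line format, the helper names, and the sample device names are assumptions made for illustration, not real logs.

```python
import re

# Assumed ioreg node line: "+-o NAME  <class CLASSNAME, id ..., ...>"
NODE = re.compile(r"\+-o (.+?)\s+<class (\w+)")


def pci_devices(dump: str) -> set:
    """Collect the names of IOPCIDevice nodes found in a text dump."""
    return {
        match.group(1)
        for match in NODE.finditer(dump)
        if match.group(2) == "IOPCIDevice"
    }


def compare_dumps(old_dump: str, new_dump: str):
    """Return (devices only in old, devices only in new), each sorted."""
    old, new = pci_devices(old_dump), pci_devices(new_dump)
    return sorted(old - new), sorted(new - old)


# Invented sample dumps; real ones would be far longer.
dump_11_2_3 = """
+-o PCI0@0  <class IOACPIPlatformDevice, id 0x100000110, registered, matched, active, busy 0, retain 40>
  +-o SATA@1F,2  <class IOPCIDevice, id 0x100000120, registered, matched, active, busy 0, retain 12>
  +-o NVME@0  <class IOPCIDevice, id 0x100000130, registered, matched, active, busy 0, retain 9>
"""
dump_11_4 = """
+-o PCI0@0  <class IOACPIPlatformDevice, id 0x100000110, registered, matched, active, busy 0, retain 40>
  +-o SATA@1F,2  <class IOPCIDevice, id 0x100000120, registered, matched, active, busy 0, retain 12>
"""

only_old, only_new = compare_dumps(dump_11_2_3, dump_11_4)
print("PCI devices counted:", len(pci_devices(dump_11_2_3)), "vs", len(pci_devices(dump_11_4)))
print("only in first dump:", only_old)
print("only in second dump:", only_new)
```

Devices that appear in only one dump would be the first candidates to disable or mask when testing whether the count or one specific device matters.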
 

Macschrauber

macrumors 68030
Dec 27, 2015
2,981
1,487
Germany
In the current state, it is safest to stay on 11.2.3, and to ensure in some way (with csr local bit?) that OTA updates cannot be loaded.
 

Laurentfr45

macrumors newbie
Jun 6, 2021
3
2
Hello everyone. I have a Mac Pro 4,1 upgraded to 5,1: 3.33 GHz 6-core Intel Xeon, 16 GB of 1333 MHz DDR3, Radeon RX 580 8 GB. I was on Big Sur 11.3.2 and had the same crash at startup as everyone else (about 1 boot in 10, I would say). I made a clone of my SSD and went to 11.4, and for the past two days the system has been more stable and faster; time will tell. I am on OpenCore 0.6.8, unmodified, with the configuration from here. I tried to go to 0.6.9, but it was impossible to boot with that version. I will report back later on whether I have other problems or whether it remains stable.
 

khronokernel

macrumors 6502
Sep 30, 2020
278
1,425
Alberta, Canada
Thought I'd add an extra data point: a MacBookPro8,3 with an ExpressCard slot to NVMe adapter experiences boot hangs and KPs as well. So this means Sandy Bridge is not exempt from this issue.

So we could theoretically get a MacPro6,1, load it up with 6 NVMe drives over Thunderbolt and reproduce the problem. If this is possible, then we would be able to file a report with Apple on a supported machine. Whether engineers internally would acknowledge such a configuration as a valid reason to resolve XNU's race condition is another question.
 

Dayo

macrumors 68020
Dec 21, 2018
2,257
1,279
This is the "Apple Internal" bit but in any case, that information is outdated.

If you set that bit, it will be cleared automatically by Big Sur on real Macs ... so not setting it makes no difference for such. This is why @startergo, who tested things, could not see the reported problems with OTA.

The reason for the confusion was that the feature devs appear to be on Hacks (where it is an issue) and not on real Macs (where it seems not to be):

We always have to remember that, despite OC's increased accommodation of real Macs over the last few months, it is primarily a tool for Hacks. This is why the @cdf config and its variants remain important. I suppose we are moving closer to the Hacks with each release, but real Macs, even unsupported exotics such as ours, still seem to have specific things to consider.
 

borp99

macrumors regular
Jun 25, 2020
139
151
Hello everyone. I have a Mac Pro 4,1 upgraded to 5,1: 3.33 GHz 6-core Intel Xeon, 16 GB of 1333 MHz DDR3, Radeon RX 580 8 GB. I was on Big Sur 11.3.2 and had the same crash at startup as everyone else (about 1 boot in 10, I would say). I made a clone of my SSD and went to 11.4, and for the past two days the system has been more stable and faster; time will tell. I am on OpenCore 0.6.8, unmodified, with the configuration from here. I tried to go to 0.6.9, but it was impossible to boot with that version. I will report back later on whether I have other problems or whether it remains stable.
Can you please post the exact link to the configuration file you are using?
 

JohnD

macrumors regular
Jun 2, 2005
150
97
Los Angeles, California
Thought I'd add an extra data point: a MacBookPro8,3 with an ExpressCard slot to NVMe adapter experiences boot hangs and KPs as well. So this means Sandy Bridge is not exempt from this issue.

So we could theoretically get a MacPro6,1, load it up with 6 NVMe drives over Thunderbolt and reproduce the problem. If this is possible, then we would be able to file a report with Apple on a supported machine. Whether engineers internally would acknowledge such a configuration as a valid reason to resolve XNU's race condition is another question.
The 2013 6,1 Mac Pro uses Ivy Bridge (v2 Xeons) - I assume those are affected also. I just saved a 6,1 from e-waste - it just needs a new capacitor (ordered) and is expected to work fine after reassembly. I don't own a single NVMe drive, however, let alone 6 of them, nor Thunderbolt enclosures. Does it need to be 6 Thunderbolt enclosures, or 6 NVMe drives in 1-2 enclosures? Assuming the repair is successful, I'd be happy to test - there's just that lack of NVMe drives... :p We have a LOT of 6,1s at work, most with Thunderbolt HDD RAIDs, but none with NVMe drives. If you can think of any other way to validate, let me know.

PS - I assume U.2 drives would suffice as well - we do have a bunch at work, but no Thunderbolt enclosures.
 

Stex

macrumors 6502
Jan 18, 2021
280
189
NYC
Hello everyone. I have a Mac Pro 4,1 upgraded to 5,1: 3.33 GHz 6-core Intel Xeon, 16 GB of 1333 MHz DDR3, Radeon RX 580 8 GB. I was on Big Sur 11.3.2 and had the same crash at startup as everyone else (about 1 boot in 10, I would say). I made a clone of my SSD and went to 11.4, and for the past two days the system has been more stable and faster; time will tell. I am on OpenCore 0.6.8, unmodified, with the configuration from here. I tried to go to 0.6.9, but it was impossible to boot with that version. I will report back later on whether I have other problems or whether it remains stable.

My tests with 11.4 a few pages back in this thread were done with OC 0.6.8, and they were unsuccessful. See if you can replicate the number of boots I did in those tests without any panic. If that succeeds, then double the number of boots to see if you can match JohnD's record. Otherwise, unfortunately, it is hard to believe that your system has no (potential) boot issues at this time.
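
To put a number on why a couple of clean days proves little, here is a small worked sketch. It assumes every boot is independent and panics with a fixed probability, which is a simplification for a race condition; the 1-in-10 rate is the figure quoted above, and the function names are made up.

```python
import math


def chance_of_clean_run(panic_rate: float, boots: int) -> float:
    """Probability of `boots` panic-free boots in a row at the given rate."""
    return (1.0 - panic_rate) ** boots


def clean_boots_needed(panic_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of consecutive clean boots whose probability, if the
    per-boot panic rate were still `panic_rate`, falls below 1 - confidence."""
    if not 0.0 < panic_rate < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("panic_rate and confidence must be between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - panic_rate))


print(round(chance_of_clean_run(0.10, 5), 3))  # prints 0.59
print(clean_boots_needed(0.10))                # prints 29
```

Under these assumptions, five clean boots would still happen about 59% of the time on an unfixed machine, and it takes 29 clean boots in a row before an unchanged 1-in-10 panic rate becomes unlikely at the 95% level.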

EDIT: ...and here more 11.4 tests with SIP disabled
 

khronokernel

macrumors 6502
Sep 30, 2020
278
1,425
Alberta, Canada
The 2013 6,1 Mac Pro uses Ivy Bridge (v2 Xeons) - I assume those are affected also. I just saved a 6,1 from e-waste - it just needs a new capacitor (ordered) and is expected to work fine after reassembly. I don't own a single NVMe drive, however, let alone 6 of them, nor Thunderbolt enclosures. Does it need to be 6 Thunderbolt enclosures, or 6 NVMe drives in 1-2 enclosures? Assuming the repair is successful, I'd be happy to test - there's just that lack of NVMe drives... :p We have a LOT of 6,1s at work, most with Thunderbolt HDD RAIDs, but none with NVMe drives. If you can think of any other way to validate, let me know.

PS - I assume U.2 drives would suffice as well - we do have a bunch at work, but no Thunderbolt enclosures.
Perfect. The main hypothesis for triggering the race condition is adding more dedicated PCI devices to the device tree for XNU to build off of. Thunderbolt devices will qualify; USB drives, however, mostly wouldn't, because IOPCIFamily isn't involved much beyond the initial XHCI/EHCI controller setup on the MacPro6,1. So Thunderbolt HDD setups should work great; I mainly said NVMe as that's the quickest way to trigger this race condition.

As with the MacPro3,1 and others, the race condition will likely be difficult to replicate quickly. However, at least with my personal MacPro3,1, I can replicate it much faster after several consecutive reboots as well as OS installations.
 