The first post of this thread is a WikiPost and can be edited by anyone with the appropriate permissions. Your edits will be public.
Status
Not open for further replies.
Might be worth using that as it would load the driver version consistent with the BS version.

You can't rule out issues coming from using an older version otherwise.
 
So Alex, when you built my BootROM, you added the more up-to-date ROM drivers from the 2012 era. Is it possible that those who are having these problems are all using the 2010-era ones? Just asking. I won't be trying any of this because I don't have a spare machine that can be taken offline for this kind of experimentation.
It's not BootROM related in any way - you can make it worse with a corrupted/full NVRAM, but you can't make it better.

All early-2009 to mid-2012 Mac Pros have the same PCIe-related crashes, whether the BootROM is pristine or not. You can flash the generic firmware upgrade (MP51.fd) and the crashes still happen.
 
The BootROM chip itself:

every reboot writes to the NVRAM section of the firmware in that chip.

I don't recommend letting the box reboot all night long because of the wear.
The 25L3205D is rated for 100,000 P/E cycles - that's rewriting the entire chip 100,000 times. Assuming every reboot rewrites the entire chip (which it doesn't), a 2009 Mac Pro would have had to be rebooted 23 times every single day for the past 12 years to exhaust that chip.
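The arithmetic behind the 23-reboots-a-day figure can be checked in a few lines. This is a back-of-the-envelope sketch assuming, as the post does, one full erase/program cycle per reboot and 365-day years:

```python
# Back-of-the-envelope check of the "23x a day" figure.
# Assumption (as in the post): one full P/E cycle per reboot.
RATED_PE_CYCLES = 100_000   # endurance rating quoted for the 25L3205D
YEARS = 12                  # roughly 2009 to the time of the post
DAYS = YEARS * 365          # leap days ignored

reboots_per_day = RATED_PE_CYCLES / DAYS
print(f"{reboots_per_day:.1f} reboots per day")  # 22.8 reboots per day
```

22.8 rounds up to the 23 quoted above.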

You can also buy a blank chip for $10 if you know someone with an SPI programmer, or buy one already flashed off eBay (with someone else's, likely blacklisted, serial number), solder it on, then flash your backup onto it.

But still, not a good idea to leave it in a reboot cycle all night. :)
 
You understood it wrong: it's not the whole chip that can be rewritten 100K times. Each flash cell/sector (it's a sectored chip) is rated for an endurance of over 100K erase/rewrite cycles (JEDEC A117).

Since there is no wear levelling in SPI flash memories of this era, the NVRAM volume always sits in the same sectors, and that area can easily reach 100K cycles after all these years. That's why so many early-2009/mid-2010 Mac Pros are having dead backplanes.
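To make the per-sector point concrete, here is a rough sketch comparing how much total erase work a chip can absorb when the wear is pinned to a few NVRAM sectors versus ideally spread across the whole chip. The sector counts are illustrative assumptions (a 32 Mbit part split into 4 KiB sectors, and a hypothetical 16-sector NVRAM region), not measured Mac Pro values:

```python
RATED_PE_CYCLES = 100_000    # per-sector endurance rating quoted in the thread

def total_erases_until_worn(sectors_sharing_wear: int) -> int:
    """Total sector erases the chip absorbs before the busiest sector reaches
    its rating, assuming erases are spread evenly over the given sectors."""
    return RATED_PE_CYCLES * sectors_sharing_wear

# Illustrative assumptions, not measured Mac Pro values:
CHIP_SECTORS = 1024    # 32 Mbit (4 MiB) part divided into 4 KiB sectors
NVRAM_SECTORS = 16     # hypothetical size of the NVRAM region

pinned = total_erases_until_worn(NVRAM_SECTORS)    # no wear levelling: same sectors every time
levelled = total_erases_until_worn(CHIP_SECTORS)   # ideal levelling across the whole chip
print(pinned, levelled, levelled // pinned)  # 1600000 102400000 64
```

Whatever the real numbers are, the ratio is simply chip sectors divided by NVRAM sectors: without wear levelling, the NVRAM area reaches its rating that many times sooner than a whole-chip calculation suggests.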

This Infineon paper "Endurance and Data Retention Characterization of Infineon Flash Memory" is more comprehensible than the JEDEC A117 standard:


The endurance specification of a flash device should be evaluated in terms of the projected in-system rate of erasure for any given sector. The sectors used for data logging may rapidly accumulate erase cycles depending on the frequency and size of the data being captured. Such use may ultimately lead to those sectors failing first. As such, the shorter the Program/Erase interval time between Program/Erase cycles, the worse the data retention. Longer interval times between Program/Erase cycles can de-trap the excess trapped electrons between Program/Erase cycles, resulting in better data retention. Figure 8 shows an example of the retention lifetime over a variety of interval times, assuming 20 years retention lifetime after 10k Program/Erase cycles at an average of 55 degree Celsius cycling, under JEDEC test conditions.

The quote above describes exactly the use case of the Mac Pro NVRAM volume inside the SPI flash memory that stores the BootROM image.
 
Read the threads from @tsialex; he explained it all in detail.

The flash chips die, whether from wear, age, or the continuous rewriting of the same cells in the NVRAM area. They die.

I have repaired more than a dozen boards with dead firmware chips, some of them still readable at a lower frequency when pulled off the board and read out with an external programmer.

I just wanted to give a warning: don't overstress the firmware chip with excessive reboots, such as rebooting all night long.
 
Might be worth using that as it would load the driver version consistent with the BS version.

You can't rule out issues coming from using an older version otherwise.
I tried supply_apfs and the EFI driver alone; with both, the BS hangs with USB hardware and ancient firmware.

 
Some of the active ones would have noticed this obviously sooner or later, but could there be some kind of a correlation
I wouldn't think so, as Syncretic discovered the issue to be Apple using entirely new bootstrap code in 11.3b3, which is the version we started having issues with. It's highly unlikely (but possible) that Apple would include that code in previous OS updates, and especially not in Mojave, as our 5,1 Mac Pros are still supported under Mojave. That would be GREAT if they did, however. :)
 
Some of the active ones would have noticed this obviously sooner or later, but could there be some kind of a correlation here:

I read that and thought the same: aha, another race condition.

If that bootstrap code is timing-critical, it could affect other supported Macs in special, untested configurations.

Why not? Then the ball goes back to the mothership to fix their code.

At least, I hope so.
 
That post is for a Catalina security update - I highly doubt Apple would update the bootstrap code for the previous OS in a security update, but I suppose you never know.
 
Another quick update - good news/bad news. Among the changes to the startup code, Apple reorganized some (not all) things from a linear procedure to a priority list. Basically, instead of code saying "do A, then do B, then do C...", there's an array of functions to call and arguments to pass, each one with a subsystem and a rank assignment. (I didn't catch this at first, because this list is stored in its own separate data segment (__KLDDATA,__init_entry_set), so my initial disassembly missed it). Anyway, that list gets sorted at runtime by subsystem and rank, then as each of the subsystems gets initialized, all of the associated functions in the list get called in order of their rank. From a programming perspective, it's much more elegant and flexible; from a reverse-engineering perspective, it's a pain in the ass - and the eventual source code release is unlikely to help, because 11.1 uses this same mechanism (to a much smaller extent) and the 11.1/11.2 source did not include this part.
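A minimal model of the priority-list mechanism described above, assuming each entry carries a subsystem ID, a rank, a function and an argument. This is not XNU's code: the entry names and numbers are hypothetical, and the sketch only shows how the table is sorted by (subsystem, rank) and why changing a single rank value moves a function earlier or later:

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class StartupEntry:
    subsystem: int                 # hypothetical subsystem ID; lower IDs initialize first
    rank: int                      # hypothetical rank; lower ranks run first within a subsystem
    func: Callable[[Any], None]    # initializer to call
    arg: Any                       # argument passed to the initializer

def kernel_startup(entries: List[StartupEntry]) -> None:
    """Sort the table once, then bring up each subsystem, calling its entries by rank."""
    ordered = sorted(entries, key=lambda e: (e.subsystem, e.rank))
    for subsystem in sorted({e.subsystem for e in ordered}):
        # (a real kernel would do subsystem-specific setup here)
        for e in ordered:
            if e.subsystem == subsystem:
                e.func(e.arg)

calls: List[str] = []
log = calls.append
table = [
    StartupEntry(2, 1, log, "pci_probe"),
    StartupEntry(1, 2, log, "timers"),
    StartupEntry(2, 0, log, "iommu_init"),
    StartupEntry(1, 1, log, "locks"),
]
kernel_startup(table)
print(calls)  # ['locks', 'timers', 'iommu_init', 'pci_probe']

# A tiny patch: change one rank value and the call order changes.
calls.clear()
table[2].rank = 5
kernel_startup(table)
print(calls)  # ['locks', 'timers', 'pci_probe', 'iommu_init']
```

The declaration order of the table no longer matters, only the (subsystem, rank) pair, which is what makes the reordering hard to follow in a disassembly but cheap to patch once the culprit is known.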

I was initially intrigued by the runtime sorting of the list, because they use qsort. Some variants of qsort use random pivots, which would result in slightly different outcomes with every run (sound familiar?). Unfortunately, after tracing through the qsort they used, it's deterministic (meaning it should generate the same output every time).
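Why a random pivot would matter at all: sorting only gives different outputs from run to run when several entries compare equal (here, the same subsystem and rank), because an unstable sort such as quicksort may then place those ties in any order. With distinct keys, every correct sort produces the same result. A sketch using a hypothetical toy quicksort (Lomuto partition), not Apple's qsort:

```python
import random
from typing import List, Optional, Tuple

Entry = Tuple[int, str]   # (sort key, label); labels are hypothetical

def quicksort(items: List[Entry], rng: Optional[random.Random] = None) -> List[Entry]:
    """Unstable quicksort comparing keys only.
    rng=None: always use the middle element as pivot (deterministic).
    Otherwise: pick a random pivot."""
    a = list(items)

    def sort(lo: int, hi: int) -> None:
        if lo >= hi:
            return
        p = rng.randint(lo, hi) if rng is not None else (lo + hi) // 2
        a[p], a[hi] = a[hi], a[p]          # move the pivot to the end (Lomuto partition)
        pivot_key = a[hi][0]
        i = lo
        for j in range(lo, hi):
            if a[j][0] < pivot_key:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]          # pivot lands in its final position
        sort(lo, i - 1)
        sort(i + 1, hi)

    sort(0, len(a) - 1)
    return a

entries = [(1, "a"), (0, "b"), (1, "c"), (1, "d"), (0, "e")]  # keys with ties

# Middle-element pivot: the same order on every run.
det = {tuple(quicksort(entries)) for _ in range(10)}
print(len(det))  # 1

# Random pivots: keys always come out sorted, but the tied entries
# may be ordered differently from seed to seed.
rnd = {tuple(quicksort(entries, random.Random(seed))) for seed in range(50)}
print(len(rnd))
```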

My current working theory is that by reorganizing the code in this manner, one or more functions ended up being executed earlier or later than they did before this change, resulting in either creating or exposing a race condition. The good news is that because they used a priority list, if we could identify the function(s) that need to be re-ordered, it would only take a one-byte (or perhaps a few-byte) patch to fix it. The bad news is that in 11.3, there are 2315 of these function/argument pairs to analyze in order to make that determination.

I'm going to take a break from this and ponder how I might automate at least part of this process; there's no way I'm going to slog through 2315 functions just to find the needle in this haystack.
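One way part of this might be automated, sketched under the assumption that the effective call order of the init functions can be extracted from both 11.2 and 11.3 (for example from the disassembly): list every pair of functions whose relative order flipped between versions. Functions that exist in only one version are skipped. This is a hypothetical helper with made-up function names, not a tool from this thread:

```python
from itertools import combinations
from typing import List, Tuple

def flipped_pairs(old_order: List[str], new_order: List[str]) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where a ran before b in old_order but after b in new_order.
    Only functions present in both orders are compared."""
    new_pos = {name: i for i, name in enumerate(new_order)}
    common = [name for name in old_order if name in new_pos]   # keeps the old relative order
    # combinations() yields (a, b) with a earlier than b in `common`, i.e. in the old order
    return [(a, b) for a, b in combinations(common, 2) if new_pos[a] > new_pos[b]]

old = ["locks", "timers", "iommu_init", "pci_probe", "usb_init"]
new = ["locks", "iommu_init", "timers", "usb_init", "pci_probe", "new_in_11_3"]
print(flipped_pairs(old, new))  # [('timers', 'iommu_init'), ('pci_probe', 'usb_init')]
```

With 2315 entries this compares at most about 2.7 million pairs, which is quick. The output is only a shortlist of candidates to test, not proof of which reordering causes the hang.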
 
Please see my post #2,516, as I don't want to repeat posts across multiple threads.

For some reason unbeknownst to me, my flashed 2009 cMP at work is doing just fine with no new issues on 11.3.
 
Thinking backwards a bit, is there any way to make a supported machine exhibit this race condition? Then it would be an official bug report that Apple would need to fix. I don't have a supported machine, or I'd be hard at work trying every combination to make it fail right now.
 
@Syncretic, you can use macOS DEBUG or DEVELOPMENT kernels for easier disassembling. For me, IDA extracts the dSYM and makes things quite readable in Hex-Rays out of the box. DEBUG builds are much better, but Apple stopped providing them after 11.0b1 or so, at least for the time being.

I partly reverse-engineered the sysctl initialisation code in 11.3 for my own needs, and the only change I remember was the sysctl init code being moved to kernel_startup_initialize_upto (https://github.com/apple/darwin-xnu/blob/xnu-7195.81.3/osfmk/kern/startup.c), in addition to all the other functions that were moved earlier in 11.x. I hadn't noticed anything unusual in it, as Apple has been restructuring their constructor code since the beginning of Big Sur. Also, the code is pretty much open; I'm not sure what in particular you mean regarding closed source.
 
Thinking backwards a bit, is there any way to make a supported machine exhibit this race condition? Then it would be an official bug report that Apple would need to fix. I don't have a supported machine, or I'd be hard at work trying every combination to make it fail right now.
I think I still have one 2018 Mac mini at the shop that hasn't been upgraded yet. If it's a slow day today, I can give it a try, although the other two that were upgraded aren't showing any signs of issues yet.
 
@VitaminK You mention NVRAM and bricking the Mac Pro in your post. Out of curiosity, this is the same brick that requires one to then purchase a MATT card correct? Does merely resetting the NVRAM too many times have the same effect?

Sorry it's a bit unrelated, but my biggest fear is accidentally bricking my Mac Pro. :p
 
@VitaminK You mention NVRAM and bricking the Mac Pro in your post.
The warning on the first post was written by me, please follow the link, read the instructions and check it.
Out of curiosity, this is the same brick that requires one to then purchase a MATT card correct?
Yes.
Does merely resetting the NVRAM too many times have the same effect?

Sorry it's a bit unrelated, but my biggest fear is accidentally bricking my Mac Pro. :p
This is a lot more complicated than it appears. NVRAM on Intel Macs is not battery-backed SRAM like on PPC Macs, where you just remove the power and it's factory fresh.

What's called an "NVRAM reset" is a triggered/forced garbage collection inside the NVRAM volume that is stored in the BootROM. The NVRAM is not really erased when you reset it.

A lot of the info inside the NVRAM volume is permanent, while some is almost permanent and some is transient. The NVRAM reset procedure removes the transient variables (like the default boot device, default sound volume, etc.). The "deep NVRAM reset", when it works, removes the transient variables and some of the almost permanent ones (like the MemoryConfig variables). But you never get a pristine NVRAM volume from an NVRAM reset; you never get it back factory fresh the way you can with a PPC Mac (the exception is a BootROM reconstruction, where a firmware engineer recreates the never-booted image of your Mac Pro BootROM).

I've read that someone forced hundreds of NVRAM resets overnight using an Arduino. This is beyond crazy, and doing it will kill the SPI flash memory (read post #755).
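The permanent/almost-permanent/transient split and the "a reset is really a garbage collection" point can be illustrated with a toy model. This is a hypothetical sketch assuming an append-only variable store: it does not reproduce Apple's actual NVRAM volume format, the variable names are made up, and each reset here compacts immediately:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

PERMANENT, ALMOST_PERMANENT, TRANSIENT = "permanent", "almost-permanent", "transient"

@dataclass
class NvramVolume:
    """Toy append-only variable store (hypothetical, not Apple's on-flash format)."""
    capacity: int                                                      # records the region can hold
    records: List[Tuple[str, str, str]] = field(default_factory=list)  # (name, value, kind)
    erase_count: int = 0                                               # erases of the region so far

    def write(self, name: str, value: str, kind: str) -> None:
        if len(self.records) >= self.capacity:
            self.garbage_collect(drop=set())       # full: compact live records first
            if len(self.records) >= self.capacity:
                raise RuntimeError("NVRAM volume full")
        self.records.append((name, value, kind))   # the old copy stays until the next GC

    def live(self) -> Dict[str, Tuple[str, str]]:
        latest: Dict[str, Tuple[str, str]] = {}
        for name, value, kind in self.records:     # later writes override earlier ones
            latest[name] = (value, kind)
        return latest

    def garbage_collect(self, drop: Set[str]) -> None:
        kept = [(n, v, k) for n, (v, k) in self.live().items() if k not in drop]
        self.erase_count += 1                      # compaction = erase + rewrite of the same region
        self.records = kept

    def free_space(self) -> int:
        return self.capacity - len(self.records)

    def nvram_reset(self) -> None:                 # drops transient variables only
        self.garbage_collect(drop={TRANSIENT})

    def deep_nvram_reset(self) -> None:            # also drops almost-permanent ones
        self.garbage_collect(drop={TRANSIENT, ALMOST_PERMANENT})


vol = NvramVolume(capacity=8)
vol.write("serial-number", "XXXX", PERMANENT)
vol.write("MemoryConfig", "v1", ALMOST_PERMANENT)
for level in ("30", "40", "50"):
    vol.write("audio-volume", level, TRANSIENT)
vol.write("default-boot-device", "disk0", TRANSIENT)
print(vol.free_space())                                      # 2
vol.nvram_reset()
print(sorted(vol.live()), vol.free_space())                  # ['MemoryConfig', 'serial-number'] 6
vol.deep_nvram_reset()
print(sorted(vol.live()), vol.free_space(), vol.erase_count)  # ['serial-number'] 7 2
```

Permanent entries survive every reset, and each compaction erases and rewrites the same region, which is why forcing resets in a loop wears out exactly those sectors.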
 
As usual, you're a fount of information! Thanks.

I really ought to get the information pulled off of my NVRAM at some point, just to be safe in case it happens.
 
In


you will see that garbage collection has run between the 3rd and 4th screenshots; look at Free Space.


The script used for dumping the firmware and doing some basic NVRAM analysis can be downloaded here:



If you want to start from a healthy state, at least force a garbage collection (four NVRAM resets in a row).
 
Maybe this helps a little in understanding the early boot process for 11.3.

Even though this is for M1 Macs, it should have a lot in common:


The linked PDF is also very informative.

Start reading at page 41.

 

Intel-based Mac computers without a T2 chip

An Intel-based Mac without a T2 chip doesn’t support secure boot. Therefore the UEFI firmware loads the macOS booter (boot.efi) from the file system without verification, and the booter loads the kernel (prelinkedkernel) from the file system without verification. To protect the integrity of the boot chain, users should enable all of the following security mechanisms:

• System Integrity Protection (SIP): Enabled by default, this protects the booter and kernel against malicious writes from within a running macOS.

• FileVault: This can be enabled in two ways: by the user or by a mobile device management (MDM) administrator. This protects against a physically present attacker using Target Disk Mode to overwrite the booter.

• Firmware Password: This can be enabled in two ways: by the user or by an MDM administrator. This protects a physically present attacker from launching alternate boot modes such as recoveryOS, Single User Mode, or Target Disk Mode from which the booter can be overwritten. This also prevents booting from alternate media, by which an attacker could run code to overwrite the booter.
 
FileVault. Has anyone tried booting ≥11.3 with it? I wonder if the booting issue would be any different.
 
It looks like the prohibitory sign is probably due to FileVault not being able to unlock the drive, so Apple's suggestion is actually to disable it:
So I was on the phone with Apple support today and my adviser said the error messages in my boot log are very similar to those you would see when FileVault is not able to unlock your hard drive while booting.
 