MP 1,1-5,1 Latebloom - An experimental workaround for the 11.3+ race condition

tsialex · Jul 13, 2021

HuRR said:
Are you keeping the delay the same or will you change any of the parameters before installing?

I'll start with the same delay and adjust as needed over time. Btw, upgrades are one of the best assurance tests anyone can do: KPs are very common during the several warm reboots an upgrade needs, and before latebloom several people reported volume corruption from KPs during the upgrade.

HuRR · Jul 13, 2021

tsialex said:
I'll start with the same delay and adjust as needed over time. Btw, upgrades are one of the best assurance tests anyone can do: KPs are very common during the several warm reboots an upgrade needs, and before latebloom several people reported volume corruption from KPs during the upgrade.

Yup. I think I'll install as well in a few. I'll start with my current delay and see what happens.

tsialex · Jul 13, 2021

Just to remind everyone, Apple will almost certainly remove 11.2.3 from the download servers with the release of 11.5, so if you still don't have it, the time to download it is right now.

http://swcdn.apple.com/content/downloads/12/32/071-14766-A_Q2H6ELXGVG/zx8saim8tei7fezrmvu4vuab80m0e8a5ll/InstallAssistant.pkg

HuRR · Jul 13, 2021

11.5 is giving me issues so far. Thought it had installed, which I think it did but now it's hanging on a lot of different values.

Testing to continue. May need to go back to 11.4 if anything.

@tsialex any updates on your end?

tsialex · Jul 13, 2021

HuRR said:
11.5 is giving me issues so far. Thought it had installed, which I think it did but now it's hanging on a lot of different values.

Testing to continue. May need to go back to 11.4 if anything.

@tsialex any updates on your end?

Still downloading, I'll probably only get it late tonight. My 500 Mbps fiber connection never gets me more than ~800 KB/s from Apple for any new downloads, since I'm physically distant from the Apple CDNs that initially start delivering the content.

macnrolla · Jul 13, 2021

tsialex said:
I've been testing latebloom with my Mac Pros at home/office for some time now (most of my tests were with 0.16 and 0.17) and found that its efficacy depends directly on the Mac Pro hardware configuration, so the latebloom value needs to be tailored to the hardware config. More PCIe devices installed, dual processors, or a slower processor all mean more delay is needed to overcome the KPs.

While I did most of my tests with my home test Mac Pro (early-2009 with single X5680 and no PCIe switches), these are the values that I'm using for each config trying to achieve the best reliability possible and not the fastest boot time:

| Year model + CPU | GPU | PCIe AHCI/NVMe | USB | AirPort Extreme | latebloom value |
| --- | --- | --- | --- | --- | --- |
| early-2009 with dual X5680 | eVGA GTX 680 Mac Edition | 1 SM951-512 | Orico FL1100 | BCM94322MC | 200 |
| early-2009 with single X5680 | eVGA GTX 680 Mac Edition | 1 SM951-256 | Orico FL1100 | BCM94322MC | 80 |
| mid-2010 with single W3680 | Sapphire Pulse RX 560 | 1 PM961-256 | Caldigit with PCIe v2.0 switch | BCM94360CD | 175 |
| mid-2012 with single X5680 | Sapphire Pulse RX 580 | SSD7101A-1 v101 with 4 PM961-256 blades + 980 PRO | Caldigit with PCIe v2.0 switch | BCM94360CD | 250 |

One thing to note: in normal usage I still have very occasional KPs when cold booting my mid-2012 with single X5680 and my early-2009 with dual X5680, while for the other two, with fewer PCIe devices installed, it's almost perfect.

If you have slower Xeons, you'll probably have to increase the value.

Have you set any value for `lb_range`, and what are your experiences with it? My current parameters are `latebloom=248 lb_range=124`; it is the most "stable" variant so far after much testing. I have a single X5690, with the same GPU and PCIe AHCI/NVMe similar to the last Mac Pro in your list.

tsialex · Jul 13, 2021

macnrolla said:
Have you set any value for `lb_range`, and what are your experiences with it? My current parameters are `latebloom=248 lb_range=124`; it is the most "stable" variant so far. I have a single X5690, with the same GPU and similar PCIe AHCI/NVMe.

Most of my tests were with the v0.15 and v0.16 releases, which don't have lb_range; it's a v0.17 setting. I only started testing the current release Sunday night, so I don't have a baseline yet.

macnrolla · Jul 13, 2021

tsialex said:
Most of my tests were with the v0.15 and v0.16 releases, which don't have lb_range; it's a v0.17 setting. I only started testing the current release Sunday night, so I don't have a baseline yet.

Thanks for your answer. Just wondering whether 50% of `latebloom` as `lb_range` is way too much and I have simply been lucky so far.

Edit: Updated the `lb_range` value back to `0`.

tsialex · Jul 13, 2021

macnrolla said:
Thanks for your answer. Just wondering whether 50% of `latebloom` as `lb_range` is way too much and I have simply been lucky so far.

Still don’t know, but right now my idea is to identify the delay at which KPs start to appear over time, then go back to the setting that is most reliable and add an lb_range value that is less than the difference between the two.
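As a rough illustration of that idea, here is a minimal Python sketch. The function name and the example numbers are hypothetical, and it assumes the KPs show up at a lower delay than the reliable setting, so the random window must stay above the failing point.

```python
def max_lb_range(reliable_delay: int, failing_delay: int) -> int:
    """Largest lb_range that is still smaller than the gap between the two delays.

    Both arguments are latebloom values in milliseconds found by testing;
    nothing here comes from latebloom itself.
    """
    gap = abs(reliable_delay - failing_delay)
    return max(gap - 1, 0)


# Hypothetical test results: 100 ms boots reliably, 80 ms starts to KP.
lb_range = max_lb_range(100, 80)
print(lb_range)                        # 19
print(100 - lb_range, 100 + lb_range)  # 81 119
```

With those made-up numbers the random delays would stay between 81 and 119 ms, never dropping to the 80 ms point where KPs were seen.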

macnrolla · Jul 13, 2021

Syncretic said:
lb_range=N sets a range for random delays. By default, every delay is exactly the same (either the latebloom=N value or the default 60ms). If lb_range is set, the delay for each PCI bus probe is random in the range (latebloom +/- lb_range) milliseconds. For example, latebloom=90 lb_range=20 results in random delays between 70..110 ms (90 +/- 20). If lb_range=N is not present, it defaults to 0 (no random changes, always use the same delay value). Using random delays may help avoid deadlocks.

@Syncretic Wondering if the value for `lb_range` should always be added and never subtracted.

HuRR · Jul 13, 2021

HuRR said:
11.5 is giving me issues so far. Thought it had installed, which I think it did but now it's hanging on a lot of different values.

Testing to continue. May need to go back to 11.4 if anything.

@tsialex any updates on your end?

11.5 RC did not work for me. Not sure there is much else I can do. Hoping others had better success. Formatting and restoring until further notice.

EDIT: I can get into 11.5 recovery perfectly fine but no luck with booting into the system.

Ausdauersportler · Jul 13, 2021

macnrolla said:
@Syncretic Wondering if the value for `lb_range` should always be added and never subtracted. Or is "sometimes less" also good against deadlocks?

What is exactly the difference between 100+[0...10] and 105+-5 ?

sfalatko · Jul 13, 2021

Is anyone seeing different results (in terms of boot success/failure) between cold boot and warm reboot?

Right now it is only anecdotal, but I have had several hangs when warm booting (rebooting) with 100, while cold boots succeed.

I haven't had time to do more of a "test" but was curious if anyone else noticed similar differences

tsialex · Jul 13, 2021

sfalatko said:
Is anyone seeing different results (in terms of boot success/failure) between cold boot and warm reboot?

Right now it is only anecdotal, but I have had several hangs when warm booting (rebooting) with 100, while cold boots succeed.

I haven't had time to do more of a "test" but was curious if anyone else noticed similar differences

From the start we noticed that latebloom is a lot more effective at overcoming KPs on cold boots. Warm reboots are much more susceptible to KPs.

Syncretic · Jul 13, 2021

macnrolla said:
@Syncretic Wondering if the value for `lb_range` should always be added and never subtracted. Or is "sometimes less" also good against deadlocks?

For deadlocks, "more" or "less" aren't relevant, "random" is (at least when you're not in control of the whole system).

The classic example of a deadlock situation is the Dining Philosophers problem. Imagine a circular table with some number of seats. In front of each seat is a plate of spaghetti, and between each plate is a fork (thus there are the same number of forks as plates). Seated in each seat is a philosopher, who must eat some, then think for a while, then eat some more. In order to eat, a philosopher must have a fork in each hand, left and right - without two forks, the philosopher may not eat. A fork can only be used by one philosopher at a time.

If every philosopher at the table behaves by the same rules, each of them would reach to their right and pick up a fork, then look to their left and find no remaining forks, so none of them could eat - deadlock. If they then all lowered their right forks and picked up the one to their left, the same thing happens - no one can eat because none can get the necessary resources (two forks). What can they do to avoid starving?

They could choose to wait between attempts to pick up a fork - but again, if every one of them works by the same rules, they'd all wait the same amount of time and simultaneously repeat the one-fork pickup.

There are several possible solutions to this problem; many of them involve adding elements to the scenario or adding/changing the rules to adapt. A simple one is to introduce randomness - a philosopher will pick up either a left or right fork (randomly), then try to get the other fork; failing that, the philosopher will put down the fork and think for a random amount of time (presumably with some boundaries) before trying again. While this is not the most efficient system, it breaks the deadlock and allows the philosophers to eat. Because the delays are random, there are fewer cases where philosophers are "in sync," and when it does happen, they quickly get back out of sync.
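For anyone who wants to see the effect, here is a toy Python simulation of the table, written only as a sketch. It makes several simplifying assumptions that are not part of the classic problem statement: everyone acts in lockstep rounds, forks go back on the table at the end of each round, ties are resolved by seat order, and there are at least three seats. The names `simulate` and `backoff` are made up for illustration.

```python
import random


def backoff(rng: random.Random, randomized: bool) -> int:
    """Rounds to think before the next attempt: fixed, or random in 1..3."""
    return rng.randint(1, 3) if randomized else 1


def simulate(n: int, rounds: int, randomized: bool, seed: int = 0) -> int:
    """Count meals eaten over `rounds` lockstep rounds at a table of `n` seats."""
    rng = random.Random(seed)
    wait = [0] * n  # rounds each philosopher still spends thinking
    meals = 0
    for _ in range(rounds):
        taken = [False] * n  # every fork starts the round on the table
        wants = {}           # philosopher -> second fork still needed
        # Phase 1: every ready philosopher reaches for a first fork.
        for p in range(n):
            if wait[p] > 0:
                wait[p] -= 1
                continue
            left, right = p, (p + 1) % n
            first, second = right, left  # fixed rule: right fork first
            if randomized and rng.random() < 0.5:
                first, second = left, right
            if taken[first]:
                wait[p] = backoff(rng, randomized)
            else:
                taken[first] = True
                wants[p] = second
        # Phase 2: whoever holds one fork tries for the other.
        for p, second in wants.items():
            if not taken[second]:
                taken[second] = True
                meals += 1
            wait[p] = backoff(rng, randomized)  # think after eating or failing
    return meals


print("same rules for everyone:", simulate(5, 100, randomized=False))
print("random choice and delay:", simulate(5, 100, randomized=True))
```

With identical rules every philosopher grabs the right fork, finds the left one gone, waits the same time, and repeats, so the first count stays at zero. With random choices and random waits the lockstep breaks and meals happen.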

In our case, it's unclear whether the random element is truly helpful, or just wishful thinking. While latebloom does mitigate the race condition for most systems, it's not addressing the underlying condition (it's an aspirin that eases the headache without affecting the brain tumor). If there is a deadlock happening, it's unclear whether or not it's directly related to the PCI bus probe, so randomly modulating the timing of those probes may or may not help with the deadlock. Assuming reasonable values are chosen, though, I don't think random values would have any negative effects. (In a nutshell, "it can't hurt.")

(My personal theory is that the real problem is in the APFS kext; however, manipulating that is far more difficult than this PCI hack, due to lack of both documentation and source code, plus the added risk of serious data corruption if there are any errors in the hack.)

sfalatko said:
Is anyone seeing different results (in terms of boot success/failure) between cold boot and warm reboot?

Right now it is only anecdotal but I have several hangs warm booting (rebooting) with 100 while success cold booting.

I haven't had time to do more of a "test" but was curious if anyone else noticed similar differences

From post #1: "Note that while it's effective on either warm (restart) or cold (power-up) boots, testing shows it to be much more effective on cold boots, for reasons currently unknown. I welcome more test results to see if we can figure out why, and perhaps do something about it."

macnrolla · Jul 13, 2021

Ausdauersportler said:
What is exactly the difference between 100+[0...10] and 105+-5 ?

~~Maybe not much. But let's say the `latebloom` value is `100` and `lb_range` has the value of `20`: in the worst case, the delay is only `80`, and that's maybe too short, and so KP.~~

Edit: @Syncretic has written a really cool explanation above.

macnrolla · Jul 13, 2021

Syncretic said:
For deadlocks, "more" or "less" aren't relevant, "random" is (at least when you're not in control of the whole system).

The classic example of a deadlock situation is the Dining Philosophers problem. Imagine a circular table with some number of seats. In front of each seat is a plate of spaghetti, and between each plate is a fork (thus there are the same number of forks as plates). Seated in each seat is a philosopher, who must eat some, then think for a while, then eat some more. In order to eat, a philosopher must have a fork in each hand, left and right - without two forks, the philosopher may not eat. A fork can only be used by one philosopher at a time.

Thanks for this cool explanation

and your work and time.

Ausdauersportler · Jul 13, 2021

macnrolla said:
Maybe not much. But let's say the `latebloom` value is `100` and `lb_range` has the value of `20`: in the worst case, the delay is only `80`, and that's maybe too short, and so KP.

You do not get my point:

From a math point of view there is exactly no difference (given you use the same flat distribution for the random number generator). So there is absolutely no need to change the implementation, because you can get the same result by choosing the correct values yourself (as you can check yourself).

There is no known optimal value for the start point yet. So it is ridiculous to claim that subtracting from an arbitrary start point or value is worse than adding to the same or a different arbitrary value.
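The arithmetic can be checked with a few lines of Python. This is only a sketch: the function names are invented for illustration, and the "add only" scheme is the hypothetical alternative being discussed, not something latebloom offers.

```python
from typing import Tuple


def centered_window(latebloom: int, lb_range: int) -> Tuple[int, int]:
    """Delay window as described in the thread: latebloom +/- lb_range (ms)."""
    return latebloom - lb_range, latebloom + lb_range


def additive_window(base: int, spread: int) -> Tuple[int, int]:
    """Hypothetical 'add only' scheme: base + [0..spread] (ms)."""
    return base, base + spread


def additive_to_centered(base: int, spread: int) -> Tuple[int, int]:
    """Convert an add-only window into equivalent (latebloom, lb_range) values."""
    if spread % 2:
        raise ValueError("an odd spread has no exact integer midpoint")
    return base + spread // 2, spread // 2


print(additive_window(100, 10))                         # (100, 110)
print(centered_window(*additive_to_centered(100, 10)))  # (100, 110)
```

So 100+[0...10] is the same window as `latebloom=105 lb_range=5`, provided the random numbers are drawn from the same flat distribution.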

w1z · Jul 13, 2021

tsialex said:
Still downloading, I'll probably only get it late tonight. My 500 Mbps fiber connection never gets me more than ~800 KB/s from Apple for any new downloads, since I'm physically distant from the Apple CDNs that initially start delivering the content.

Try changing your dns to google’s, cloudflare’s or 9.9.9.9

I get pretty good download speeds from Apple CDN, which I believe is Akamai, using NextDNS so you can try their service too (it’s free).

Macschrauber · Jul 13, 2021

Ausdauersportler said:
You do not get my point:

From a math point of view there is exactly no difference (given you use the same flat distribution for the random number generator). So there is absolutely no need to change the implementation, because you can get the same result by choosing the correct values yourself (as you can check yourself).

There is no known optimal value for the start point yet. So it is ridiculous to claim that subtracting from an arbitrary start point or value is worse than adding to the same or a different arbitrary value.

I was unsure about the random thing, so I asked:

Question: is the random delay the same for every item it loads ?

like lb_range=10

a)
Boot 1:
PCI Probe 1 100ms
PCI Probe 2 100ms
PCI Probe 3 100ms

Boot 2:
PCI Probe 1 105ms
PCI Probe 2 105ms
PCI Probe 3 105ms

or like

b)
Boot1
PCI Probe 1 95ms
PCI Probe 2 100ms
PCI Probe 3 97ms

Boot 2:
PCI Probe 1 96ms
PCI Probe 2 101ms
PCI Probe 3 98ms

Answer is b)
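In other words, each probe gets its own draw. A small Python sketch of behaviour b), for illustration only: latebloom's actual random generator and distribution are not shown in this thread, so the uniform integer draw and the name `probe_delays` are assumptions.

```python
import random
from typing import List


def probe_delays(latebloom: int, lb_range: int, probes: int,
                 rng: random.Random) -> List[int]:
    """Draw one independent delay (ms) per PCI probe from latebloom +/- lb_range."""
    low, high = latebloom - lb_range, latebloom + lb_range
    return [rng.randint(low, high) for _ in range(probes)]


rng = random.Random()  # unseeded, so every run (every "boot") differs
for boot in (1, 2):
    print(f"Boot {boot}:", probe_delays(100, 10, 3, rng))
```

With `lb_range=0` every entry collapses to the plain latebloom value, which matches the default described earlier in the thread.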

cdf · Jul 13, 2021

HuRR said:
11.5 RC did not work for me. Not sure there is much else I can do. Hoping others had better success. Formatting and restoring until further notice.

Did you check if latebloom found the code to hook in this case?

Ausdauersportler · Jul 13, 2021

Macschrauber said:
I was unsure about the random thing, so I asked:

Question: is the random delay the same for every item it loads ?

like lb_range=10

How can a random added delay be the same?
What is your definition of random?

I bet the author found a way to change the seed of the generator on every cold boot to avoid a static (random) sequence of delays.

EDIT:
It will be like answer b, but since I have no access to the source I can only assume it. At least I would have used a new random value each time the delay is applied. This comes close to the "add some noise to the boot process" approach.

Macschrauber · Jul 13, 2021

Ausdauersportler said:
How can a random added delay be the same?
What is your definition of random?

I bet the author found a way to change the seed of the generator on every cold boot to avoid a static (random) sequence of delays.

I was quite sure it must be b) but I wanted to know exactly. Hope that brings a little more clarity for some who asked.

StarPlayrX · Jul 13, 2021

Syncretic said:
From post #1: "Note that while it's effective on either warm (restart) or cold (power-up) boots, testing shows it to be much more effective on cold boots, for reasons currently unknown. I welcome more test results to see if we can figure out why, and perhaps do something about it."

When testing USB-C Cards on Mac Pro 3,1, on macOS 11.1, cold starts would usually recognize the PCIe card properly and it would load the correct driver. Warm starts would usually lead to loading the wrong driver and the card would often get recognized as a USB 3.0 card instead.

I've also found a race condition in 11.2.3 and earlier on the Mac Pro 3,1. It just happens way less than 11.3 and later. It would usually spit out Airport channels to the console and later end with FireWire GUID <ptr> is invalid! repeating over and over.

The race conditions / deadlocks, I have found from my Mac Pro 3,1 in 11.3 or later including Monterey usually involve Crypto and/or USB HID devices.

---

My first cold start was a success on Mac Pro 3,1. 4 PCIe cards (8G RX580, USB 3.0, Two OWC PCIe 2.5" SSD Cards)
Using BigMac2, I decided to load `/Library/Apple/System/Library/Extensions/latebloom.kext` and have it pretend to be an Apple extension with the `com.apple.AAA.LoadEarly.latebloom` bundle ID. It seemed to take pretty quickly. I set `chown -R` to `0:0` and `chmod -R 755`, then rebuilt the kc's using BigMac2.

My boot-args:
latebloom=100 lb_range=20 lb_debug kext-dev-mode=1 -v

(easier to see if loaded with grep)
kmutil showloaded | grep latebloom

this command is useful. It will point out if permissions are not set correctly.
kmutil print-diagnostics -p /Library/Apple/System/Library/Extensions/latebloom.kext

Testing is on Monterey Beta 2. Two successful cold starts in a row. Could see it working well in verbose mode.

Great work especially for a beta. @Syncretic

---

Update: I am having very good luck with Monterey Dev Beta 2 with latebloom. I am not having any luck with 11.4, even after trying different settings. As for most users, 100 +/- 20 is a good sweet spot. I even tried 110 and 10, but it was not as good. 11.4 does not want to cooperate; I'll wait for 11.5 to go public and see if it matches up to Monterey.

Noteworthy: Before LB, on Monterey B2, Preview would crash. After LB, Preview now behaves and does not crash.

---

HuRR · Jul 13, 2021

cdf said:
Did you check if latebloom found the code to hook in this case?

Sorry for the bad quality. I had to use slow mo mode.

EDIT: Something tells me that the 11.5 install got corrupted. I increased the delay to 260 and it went much further, then verbose output switched to the Apple boot screen and it stopped right at the beginning. I have an NVMe enclosure coming tomorrow. I'll pull that drive out and try to run it off another computer, then put it back into the Mac Pro.

EDIT 2: Formatted the Data partition. Ran recovery. Installed 11.4 smoothly with latebloom. It's running perfectly fine. Now going to try to install the 11.5 delta update again from 11.4. If that doesn't work, I'll probably just pull the drive, enclose it, and then run the installer on a MBP.

EDIT 3: I had a clean 11.4 partition and was able to install 11.5. I'm currently using migration assistant and transferring all my stuff from my 11.4 backup. This will probably last overnight. I ended up increasing the delay and the range and it ended up working. I most likely had a bad install which can happen. Curious to see once everything is migrated. I've had this issue in the past but only like 2 out of 4 USB 3 ports work. I'm assuming this has to do with USB mapping?

EDIT 4: Back up with everything transferred. Just did 6 warm boots and they loaded each and every time on 11.5. I know it's a small sample size but it's pretty damn good so far. This is what I used to boot. Don't ask me why I used that. It just ended up working. I'm sure the delay can be reduced but I have not had an issue, knock on wood, so far using these values.

Code:

latebloom=280 lb_range=50 lb_debug=1
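To see what delay window a given boot-args string implies, a small Python helper can do the arithmetic. This is a sketch, not part of latebloom: the function names are hypothetical, bare flags such as `lb_debug` or `-v` are simply stored as `True`, and the defaults (60 ms delay, `lb_range` of 0) are taken from the description quoted earlier in the thread.

```python
from typing import Dict, Tuple, Union


def parse_boot_args(boot_args: str) -> Dict[str, Union[str, bool]]:
    """Split a boot-args string into key/value pairs; bare flags map to True."""
    parsed: Dict[str, Union[str, bool]] = {}
    for token in boot_args.split():
        key, sep, value = token.partition("=")
        parsed[key] = value if sep else True
    return parsed


def delay_window(boot_args: str) -> Tuple[int, int]:
    """Return the (minimum, maximum) per-probe delay in ms for these boot-args."""
    args = parse_boot_args(boot_args)
    latebloom = int(args.get("latebloom", 60))  # default delay per the thread
    lb_range = int(args.get("lb_range", 0))     # default: no randomness
    return latebloom - lb_range, latebloom + lb_range


print(delay_window("latebloom=280 lb_range=50 lb_debug=1"))  # (230, 330)
```

So the settings above give each probe a random delay somewhere between 230 and 330 ms.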
