Wonder whether the kext can be blocked and replaced by the one from the earlier working MacOS version in OpenCore.
Feedback is that it is possible to do this in OpenCore but chances of a successful outcome are low since there are likely multiple kexts linked to AppleFSCompressionTypeZlib and the OpenCore Block/Insert process would break such links.

In other words, it would most likely need filesystem item changes to implement the replacement.
 
Unfortunately I cannot replicate the crash locally, but I've implemented a downgrade for the Zlib kext in a new branch of OCLP.
The binary below is from the zlib-test branch (commit 67a78ec). You can verify that the kext was properly replaced by checking whether com_apple_AppleFSCompression_NoAVXFSCompressionTypeZlib appears in IOService. If it applied correctly, you'll see that and not the original AppleFS Zlib entry.
Additionally a copy of the NoAVX variant for those who want to install it manually:

Verified personally on a Macmini4,1 that the old Zlib is in effect, however would appreciate more tests to see if this fix resolved the issue
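
For anyone who wants to script the IOService check described above, here is a minimal Python sketch. The NoAVX class name is the one quoted in this post; the stock class name (STOCK_CLASS) is my assumption and is not confirmed anywhere in this thread, and both helper function names are made up for illustration.

```python
import subprocess

# Class name quoted in the post above (the downgraded, non-AVX kext).
NOAVX_CLASS = "com_apple_AppleFSCompression_NoAVXFSCompressionTypeZlib"
# ASSUMPTION: IOService class name of the stock kext; not stated in the thread.
STOCK_CLASS = "com_apple_AppleFSCompression_AppleFSCompressionTypeZlib"


def zlib_kext_state(ioreg_text):
    """Classify an ioreg dump by which Zlib compression class it mentions."""
    has_noavx = NOAVX_CLASS in ioreg_text
    has_stock = STOCK_CLASS in ioreg_text
    if has_noavx and not has_stock:
        return "replaced"   # the outcome the post describes as correct
    if has_noavx and has_stock:
        return "both"       # unexpected: old and new entries together
    if has_stock:
        return "stock"      # replacement did not take effect
    return "unknown"        # neither class found (kext may not be loaded)


def read_ioservice():
    """Dump the IOService plane with properties (macOS only)."""
    result = subprocess.run(
        ["ioreg", "-p", "IOService", "-l", "-w", "0"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On the Mac itself, `zlib_kext_state(read_ioservice())` should report "replaced" if the downgrade applied as described.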

 
It appears the saga has resumed as the v12.4 release appears to use another unsupported CPU instruction, AVX in this case apparently, that causes random kernel panics.

Wonder whether this can be similarly patched. I suppose more and more instances will keep cropping up until the patches, if even possible/desirable, become as long as one's arm.

I expect the next MacOS version will be filled with similar instructions but let's live in hope!
Doesn’t really make sense that 12.4 would require AVX as Apple Silicon does not support AVX. Strange.
 
Apple Silicon and Intel are two different architectures. Apple needs to build binaries containing code for both architectures, which generally makes the binary files twice as large as usual (like back in the PPC/Intel days or the Intel 32 bit/64 bit days). I suppose it would be possible to create a binary that has 4 or 5 architectures (PPC 32/64 bit, Intel 32/64 bit, ARM)

It may be that the compiler used to generate the x86 binary chose to use AVX as an optimization.
Or it may be that Apple chose to use an AVX optimized library.
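
To make the multi-architecture point concrete: a universal ("fat") Mach-O file starts with a small big-endian header listing one slice per architecture, so the x86_64 slice can contain AVX code while the arm64 slice has nothing to do with it. The sketch below reads that header in Python. It handles only the 32-bit fat header format; the function name and the lookup table are mine, for illustration only.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # magic number of a 32-bit fat header (big-endian)

# Mach-O CPU type constants (CPU_TYPE_* from <mach/machine.h>).
CPU_NAMES = {
    7: "i386",
    0x01000007: "x86_64",
    12: "arm",
    0x0100000C: "arm64",
    18: "ppc",
    0x01000012: "ppc64",
}


def list_architectures(data):
    """Return the architecture names listed in a fat Mach-O header."""
    magic, count = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a 32-bit fat (universal) header")
    names = []
    for index in range(count):
        # Each fat_arch record: cputype, cpusubtype, offset, size, align.
        cputype, _subtype, _offset, _size, _align = struct.unpack_from(
            ">iiIII", data, 8 + 20 * index
        )
        names.append(CPU_NAMES.get(cputype, hex(cputype)))
    return names
```

Feeding it the first few hundred bytes of a universal binary should list its slices; a thin (single-architecture) file raises ValueError instead.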
 
An interesting thing for the future is that Intel has apparently decided to drop AVX512 instruction support. Unless Apple decides to release a new Intel Mac with Alder Lake or newer, it could simply continue to use AVX512 until the transition to ARM is completed, since the current Macs would have support. This would create issues for hackintoshes with newer CPUs.
 
I had a bright idea in the shower this morning, and I got to spend about 5 minutes looking at the 12.4 AppleFSCompressionTypeZlib code again. My idea came to nothing, but I did notice something odd (and possibly foreboding): Apple seems to be using all of the YMM registers as arguments. The (pseudocode) sequence of events is:
Code:
compression_decode_buffer:
   (function call)
      (function call)
         Save all YMM registers on stack
         (manipulate those values on the stack)
         Restore all YMM registers from (manipulated) stack
         return
(EDIT 28may22 - the following was a misunderstanding of the code. Look here for corrected details.) This suggests that compression_decode_buffer() is expecting 512 bytes of input in the YMM registers, and is then providing 512 bytes of output in the same YMM registers. This doesn't fit into any of the ABI models I've seen; not only are they using all of the YMM registers (not just the prescribed subset), they seem to be passing an entire buffer therein (and since all of the compression_decode_buffer() arguments seem to be accounted for, it's unclear to me what exactly is being passed via the YMM registers; more analysis is required). (It's also conceivable that they're trying to use the YMM registers as persistent storage, but that makes no sense, since the YMM registers are considered volatile by the ABI (at least the ABI I thought was in use).)

If they're using all of the YMM registers as arguments, that means anything that calls compression_decode_buffer() will also use the YMM registers (if only for argument passing), which means there's probably more AVX code lurking out there. If this becomes the norm, patching will become untenable.

(As I'm typing this, I'm wondering if this isn't already a bug - the reports I've seen show the #UD occurring at the first vmovdqa in compression_decode_buffer(), which means compression_decode_buffer() got called - but that also means the caller didn't previously load the YMM registers with data, because in that case, the #UD would have occurred there instead. I'm now guessing that whatever's calling compression_decode_buffer() is using registers based on what CPUID says is available, but compression_decode_buffer() was compiled to just always use YMM. If you're on an AVX-capable system, this is not an issue, since the caller will see (and use) the YMM registers. On our older systems, the caller is (presumably) using XMM or other registers (or a buffer), but compression_decode_buffer() always tries to use the YMM registers. And having written all of that, I'm now hopeful that a patch might be possible. Maybe tomorrow's shower will be more productive. ;-)


Kudos to @khronokernel for his quick kext substitution workaround. I haven't had time to try it, but assuming it works, that may become the mid-to-long-term solution...

Hopefully, I can squeeze out some time this weekend to run through all of the 12.4 code and look for red flags. It's too early to be alarmist, but 12.4 may mark the beginning of the end for pre-MP6,1 MacOS upgrades. As expected, they're demonstrating no hesitation in using AVX, and indifference to legacy systems...

EDIT: It just occurred to me that if this hypothesis is correct, then injecting a pre-12.4 AppleFSCompressionTypeZlib kext should have a reasonably high probability of success on a MP5,1 or older, but it should break on any system that natively supports AVX. If someone with an AVX-capable system and time on their hands could test that, I'd love to hear the results.

EDIT2 (from the road) - I’ve rethought this hypothesis, and now have a completely different one that I’ll try to look into tonight or tomorrow. In the meantime, I retract my guess that an AVX system will fail with the older kext. And if my new hypothesis turns out to be correct, a patch is definitely workable (if a bit ugly).
 
Wow, cutting edge guys. I’m on 12.4 for a few days with no kp’s yet. I’m on a 3,1 and don’t use siri/voice control though.
 
Maybe my experience will be useful for OpenCore developers:
MacOS 12.4 TBOLT sleep/wakeup panic
https://forums.macrumors.com/threads/testing-tb3-aic-with-mp-5-1.2143042/post-31110390
MacOS 12.4 AVAST panic
https://forums.macrumors.com/threads/activate-amd-hardware-acceleration.2180095/post-31115198
MacOS 12.4 VoiceControl panic
https://forums.macrumors.com/threads/opencore-on-the-mac-pro.2207814/post-31114807
MacOS 12.4 TBOLT wakeUp New panic
https://forums.macrumors.com/threads/opencore-on-the-mac-pro.2207814/post-31124142
My current OC 0.8.0 with edited cdf config +TBOLT3+radeonsensor+noAVX
My CMP 4.1/5.1 dual cpu xeon 5680, 96gb RAM, radeonpro w5700, nvme crucial 2tb, evo plus 1tb, TB3 alpine ridge, oc 0.8.0, mac os 12.4
 

Attachments

  • cMPpanic.rtf.zip
    2.6 KB · Views: 187
  • TBOLT_TRUEkernel.zip
    4.6 KB · Views: 184
I just tried activating Voice Control and it failed to dl voice from internet. I am connected. No kp.
 
Now I’m concerned: is it possible to get data corruption even when the system has no issues or KPs?
 
OK, being the compulsive fool I am, I stole minutes throughout the day, accessing my Mac Pro remotely through my iPhone, and managed to run some of the tools I built while chasing the race condition last year. I've come up with a patch (see below, and be sure to read the disclaimer first!). (As an aside, it's just amazing to be able to do something like this from a handheld device that could literally be anywhere. It's easy to take some of this technology for granted, but I still find it cool.)

Everything below was done remotely, and in a hurry, so none of it should be considered complete or exhaustive (or maybe even correct...).

There are at least two kexts that contain (nearly?) identical instances of _compression_decode_buffer, which contains unfiltered AVX code. The first one, we've previously identified as AppleFSCompressionTypeZlib. The other, AppleDiskImagesUDIFDiskImage, is a plugin for IOHDIXController. The latter seems to be referenced when handling some types of .DMG files.

_compression_decode_buffer is referenced by parts of the Metal subsystem, numerous dyld_shared libraries, corespeechd (likely the source of the Siri/Voice Control issues), the XPC Services Disk Image Controller, mediaanalysisd, the ContextKit Context Service, some filesystems, and various other subsystems. Also, the unfiltered AVX code in _compression_decode_buffer only gets executed under certain circumstances; this, along with the number of varied references above, might explain the odd and inconsistent constellation of symptoms we've seen surrounding 12.4.

I also found AVX code in apfs and corecrypto (both of which we knew about, and have to date been well-behaved), as well as AppleHV, AppleMesaLib, and OSvKernDSPLib. I haven't examined any of those to look for possible problems; for now, they've just joined the "usual suspects" for examination sometime in the future.

The following patch tries to deal with both kexts (AppleFSCompressionTypeZlib and AppleDiskImagesUDIFDiskImage). It should be inserted into the Kernel/Patch section of your OpenCore config.plist (that's the same section as where the SurPlus patch went). Before anyone points this out, yes - the patches are big and ugly. (They can't all be elegant... ;-)

IMPORTANT: This patch was devised and constructed remotely, under less-than-ideal conditions. I personally have NOT tested it; I haven't had either the time or the access. @cdf was kind/brave enough to test it (thank you so much!), and he reports success. Even so, THIS PATCH SHOULD BE CONSIDERED AN "ALPHA TEST" FOR THE TIME BEING. (EDIT 28may22 - I have now tested the patch, and I consider it to be "beta" now; I'll update that status after more feedback from others.) Consider the pros and cons before applying it to a production system, or one containing precious data.

Good luck!
(@cdf - if you'd be so kind, please compare the patch below to what you tested, just to be sure I didn't munge it while posting here. Thanks in advance!)

The patch:
Code:
            <dict>
                <key>Arch</key>
                <string>x86_64</string>
                <key>Base</key>
                <string>_lzvn_decode_buffer</string>
                <key>Comment</key>
                <string>AVXpel - part 1 of 6</string>
                <key>Count</key>
                <integer>1</integer>
                <key>Enabled</key>
                <true/>
                <key>Find</key>
                <data>
                xMF9fwfEwX1/TyDEwX1/V0DEwX1/X2DEwX1/p4AAAADEwX1/r6AAAADEwX1/t8AAAADEwX1/v+AAAADEQX1/hwABAADEQX1/jyABAADEQX1/l0ABAADEQX1/n2ABAADEQX1/p4ABAADEQX1/r6ABAADEQX1/t8ABAADEQX1/v+ABAAA=
                </data>
                <key>Identifier</key>
                <string>com.apple.AppleFSCompression.AppleFSCompressionTypeZlib</string>
                <key>Limit</key>
                <integer>7168</integer>
                <key>Mask</key>
                <data>
                </data>
                <key>MaxKernel</key>
                <string>21.5.0</string>
                <key>MinKernel</key>
                <string>21.5.0</string>
                <key>Replace</key>
                <data>
                ZkEPfwdmQQ9/TxBmQQ9/VyBmQQ9/XzBmQQ9/Z0BmQQ9/b1BmQQ9/d2BmQQ9/f3BmRQ9/h4AAAABmRQ9/j5AAAABmRQ9/l6AAAABmRQ9/n7AAAABmRQ9/p8AAAABmRQ9/r9AAAABmRQ9/t+AAAABmRQ9/v/AAAACQkJCQkJCQkJCQkJA=
                </data>
                <key>ReplaceMask</key>
                <data>
                </data>
                <key>Skip</key>
                <integer>0</integer>
            </dict>
            <dict>
                <key>Arch</key>
                <string>x86_64</string>
                <key>Base</key>
                <string>_lzvn_decode_buffer</string>
                <key>Comment</key>
                <string>AVXpel - part 2 of 6</string>
                <key>Count</key>
                <integer>4</integer>
                <key>Enabled</key>
                <true/>
                <key>Find</key>
                <data>
                xMF9bwfEwX1vTyDEwX1vV0DEwX1vX2DEwX1vp4AAAADEwX1vr6AAAADEwX1vt8AAAADEwX1vv+AAAADEQX1vhwABAADEQX1vjyABAADEQX1vl0ABAADEQX1vn2ABAADEQX1vp4ABAADEQX1vr6ABAADEQX1vt8ABAADEQX1vv+ABAAA=
                </data>
                <key>Identifier</key>
                <string>com.apple.AppleFSCompression.AppleFSCompressionTypeZlib</string>
                <key>Limit</key>
                <integer>7168</integer>
                <key>Mask</key>
                <data>
                </data>
                <key>MaxKernel</key>
                <string>21.5.0</string>
                <key>MinKernel</key>
                <string>21.5.0</string>
                <key>Replace</key>
                <data>
                ZkEPbwdmQQ9vTxBmQQ9vVyBmQQ9vXzBmQQ9vZ0BmQQ9vb1BmQQ9vd2BmQQ9vf3BmRQ9vh4AAAABmRQ9vj5AAAABmRQ9vl6AAAABmRQ9vn7AAAABmRQ9vp8AAAABmRQ9vr9AAAABmRQ9vt+AAAABmRQ9vv/AAAACQkJCQkJCQkJCQkJA=
                </data>
                <key>ReplaceMask</key>
                <data>
                </data>
                <key>Skip</key>
                <integer>0</integer>
            </dict>
            <dict>
                <key>Arch</key>
                <string>x86_64</string>
                <key>Base</key>
                <string>_lzvn_decode_buffer</string>
                <key>Comment</key>
                <string>AVXpel - part 3 of 6</string>
                <key>Count</key>
                <integer>2</integer>
                <key>Enabled</key>
                <true/>
                <key>Find</key>
                <data>
                xf1vAMX9b0ggxf1vUEDF/W9YYMX9b6CAAAAAxf1vqKAAAADF/W+wwAAAAMX9b7jgAAAAxX1vgAABAADFfW+IIAEAAMV9b5BAAQAAxX1vmGABAADFfW+ggAEAAMV9b6igAQAAxX1vsMABAADFfW+44AEAAA==
                </data>
                <key>Identifier</key>
                <string>com.apple.AppleFSCompression.AppleFSCompressionTypeZlib</string>
                <key>Limit</key>
                <integer>7168</integer>
                <key>Mask</key>
                <data>
                </data>
                <key>MaxKernel</key>
                <string>21.5.0</string>
                <key>MinKernel</key>
                <string>21.5.0</string>
                <key>Replace</key>
                <data>
                Zg9vAGYPb0gQZg9vUCBmD29YMGYPb2BAZg9vaFBmD29wYGYPb3hwZkQPb4CAAAAAZkQPb4iQAAAAZkQPb5CgAAAAZkQPb5iwAAAAZkQPb6DAAAAAZkQPb6jQAAAAZkQPb7DgAAAAZkQPb7jwAAAAkJCQkA==
                </data>
                <key>ReplaceMask</key>
                <data>
                </data>
                <key>Skip</key>
                <integer>0</integer>
            </dict>
            <dict>
                <key>Arch</key>
                <string>x86_64</string>
                <key>Base</key>
                <string>_lzbitmap_decode</string>
                <key>Comment</key>
                <string>AVXpel - part 4 of 6</string>
                <key>Count</key>
                <integer>1</integer>
                <key>Enabled</key>
                <true/>
                <key>Find</key>
                <data>
                xMF9fwfEwX1/TyDEwX1/V0DEwX1/X2DEwX1/p4AAAADEwX1/r6AAAADEwX1/t8AAAADEwX1/v+AAAADEQX1/hwABAADEQX1/jyABAADEQX1/l0ABAADEQX1/n2ABAADEQX1/p4ABAADEQX1/r6ABAADEQX1/t8ABAADEQX1/v+ABAAA=
                </data>
                <key>Identifier</key>
                <string>com.apple.driver.DiskImages.UDIFDiskImage</string>
                <key>Limit</key>
                <integer>7168</integer>
                <key>Mask</key>
                <data>
                </data>
                <key>MaxKernel</key>
                <string>21.5.0</string>
                <key>MinKernel</key>
                <string>21.5.0</string>
                <key>Replace</key>
                <data>
                ZkEPfwdmQQ9/TxBmQQ9/VyBmQQ9/XzBmQQ9/Z0BmQQ9/b1BmQQ9/d2BmQQ9/f3BmRQ9/h4AAAABmRQ9/j5AAAABmRQ9/l6AAAABmRQ9/n7AAAABmRQ9/p8AAAABmRQ9/r9AAAABmRQ9/t+AAAABmRQ9/v/AAAACQkJCQkJCQkJCQkJA=
                </data>
                <key>ReplaceMask</key>
                <data>
                </data>
                <key>Skip</key>
                <integer>0</integer>
            </dict>
            <dict>
                <key>Arch</key>
                <string>x86_64</string>
                <key>Base</key>
                <string>_lzbitmap_decode</string>
                <key>Comment</key>
                <string>AVXpel - part 5 of 6</string>
                <key>Count</key>
                <integer>4</integer>
                <key>Enabled</key>
                <true/>
                <key>Find</key>
                <data>
                xMF9bwfEwX1vTyDEwX1vV0DEwX1vX2DEwX1vp4AAAADEwX1vr6AAAADEwX1vt8AAAADEwX1vv+AAAADEQX1vhwABAADEQX1vjyABAADEQX1vl0ABAADEQX1vn2ABAADEQX1vp4ABAADEQX1vr6ABAADEQX1vt8ABAADEQX1vv+ABAAA=
                </data>
                <key>Identifier</key>
                <string>com.apple.driver.DiskImages.UDIFDiskImage</string>
                <key>Limit</key>
                <integer>7168</integer>
                <key>Mask</key>
                <data>
                </data>
                <key>MaxKernel</key>
                <string>21.5.0</string>
                <key>MinKernel</key>
                <string>21.5.0</string>
                <key>Replace</key>
                <data>
                ZkEPbwdmQQ9vTxBmQQ9vVyBmQQ9vXzBmQQ9vZ0BmQQ9vb1BmQQ9vd2BmQQ9vf3BmRQ9vh4AAAABmRQ9vj5AAAABmRQ9vl6AAAABmRQ9vn7AAAABmRQ9vp8AAAABmRQ9vr9AAAABmRQ9vt+AAAABmRQ9vv/AAAACQkJCQkJCQkJCQkJA=
                </data>
                <key>ReplaceMask</key>
                <data>
                </data>
                <key>Skip</key>
                <integer>0</integer>
            </dict>
            <dict>
                <key>Arch</key>
                <string>x86_64</string>
                <key>Base</key>
                <string>_lzbitmap_decode</string>
                <key>Comment</key>
                <string>AVXpel - part 6 of 6</string>
                <key>Count</key>
                <integer>2</integer>
                <key>Enabled</key>
                <true/>
                <key>Find</key>
                <data>
                xf1vAMX9b0ggxf1vUEDF/W9YYMX9b6CAAAAAxf1vqKAAAADF/W+wwAAAAMX9b7jgAAAAxX1vgAABAADFfW+IIAEAAMV9b5BAAQAAxX1vmGABAADFfW+ggAEAAMV9b6igAQAAxX1vsMABAADFfW+44AEAAA==
                </data>
                <key>Identifier</key>
                <string>com.apple.driver.DiskImages.UDIFDiskImage</string>
                <key>Limit</key>
                <integer>7168</integer>
                <key>Mask</key>
                <data>
                </data>
                <key>MaxKernel</key>
                <string>21.5.0</string>
                <key>MinKernel</key>
                <string>21.5.0</string>
                <key>Replace</key>
                <data>
                Zg9vAGYPb0gQZg9vUCBmD29YMGYPb2BAZg9vaFBmD29wYGYPb3hwZkQPb4CAAAAAZkQPb4iQAAAAZkQPb5CgAAAAZkQPb5iwAAAAZkQPb6DAAAAAZkQPb6jQAAAAZkQPb7DgAAAAZkQPb7jwAAAAkJCQkA==
                </data>
                <key>ReplaceMask</key>
                <data>
                </data>
                <key>Skip</key>
                <integer>0</integer>
            </dict>
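
For readers who want to see what the base64 blobs in the patch actually contain, here is a minimal Python sketch that decodes the Find/Replace pair from part 3 (identical to part 6) and checks two properties an in-place binary patch needs: both sides are the same length, and the shorter replacement code is padded with NOPs (0x90). The helper name is made up for illustration; the two strings are copied from the patch above.

```python
import base64

# Find/Replace data from "AVXpel - part 3 of 6" above.
FIND_B64 = (
    "xf1vAMX9b0ggxf1vUEDF/W9YYMX9b6CAAAAAxf1vqKAAAADF/W+wwAAAAMX9b7jgAAAA"
    "xX1vgAABAADFfW+IIAEAAMV9b5BAAQAAxX1vmGABAADFfW+ggAEAAMV9b6igAQAA"
    "xX1vsMABAADFfW+44AEAAA=="
)
REPLACE_B64 = (
    "Zg9vAGYPb0gQZg9vUCBmD29YMGYPb2BAZg9vaFBmD29wYGYPb3hw"
    "ZkQPb4CAAAAAZkQPb4iQAAAAZkQPb5CgAAAAZkQPb5iwAAAA"
    "ZkQPb6DAAAAAZkQPb6jQAAAAZkQPb7DgAAAAZkQPb7jwAAAA"
    "kJCQkA=="
)

find = base64.b64decode(FIND_B64)
replace = base64.b64decode(REPLACE_B64)


def trailing_nops(code):
    """Count the 0x90 (NOP) bytes padding the end of a code sequence."""
    return len(code) - len(code.rstrip(b"\x90"))


# An in-place patch must not change the size of the code it overwrites.
print("find bytes:   ", len(find))
print("replace bytes:", len(replace))
print("NOP padding:  ", trailing_nops(replace))
print("first find insn:   ", find[:4].hex())     # a VEX-encoded load
print("first replace insn:", replace[:4].hex())  # its legacy-SSE counterpart
```

The same check can be run on the other parts by swapping in their strings.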
 
@Syncretic, the patch indeed corresponds to what I tested earlier. Again: absolutely amazing work!

For those concerned about the patch's fugliness (which really shouldn't be an issue given this great achievement), try passing your config through the following command:

plutil -convert xml1 config.plist

It will fix the formatting and create nice standard size blocks for the patch's find and replace data.

By the way: "AVXpel". Clever!
 
I have inserted the new patch in the Kernel - Patch array of my OpenCore config.plist file (OCLP 0.4.5) (attached below).

I have checked it:

"plutil -convert xml1 config.plist && plutil config.plist" :
config.plist: OK

I cannot boot. I get a frozen apple with no progress bar.

I have installed OCLP 0.4.6 and Monterey 12.4 is working again with no issues (as before, with 0.4.5). I don’t use voice control.


Maybe I did something wrong

EDITED: It was my mistake inserting the patch. Read below!

Thank you @Syncretic again!
 

Attachments

  • config.plist.zip
    8.4 KB · Views: 406
Try to put the patches into the Kernel->Patch section (side by side with the SurPlus patches), not directly into the list of kernel extensions ...

Either use 0.4.5 with these patches or the solution @khronokernel delivered.
 
Now I see my mistake. Thank you!
I have just inserted the new patch in the Kernel - Patch array, just below the SurPlus Patch of my OpenCore config.plist file (OCLP 0.4.5) (attached below).

My Monterey 12.4 has booted correctly and I don't see any issue. Everything seems to work well.
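
The mistake described in this exchange (patch dicts pasted into the kext list instead of Kernel->Patch) passes plutil, because plutil only validates plist syntax. A small Python sketch along these lines could catch it before rebooting. I am assuming the kext list is the Kernel->Add array of config.plist and that anything in it carrying a Find key is a misplaced patch; the function names are hypothetical.

```python
import plistlib


def misplaced_patches(config_bytes):
    """Return the comments of Kernel->Add entries that look like patches."""
    config = plistlib.loads(config_bytes)
    kernel = config.get("Kernel", {})
    return [
        entry.get("Comment", "<no comment>")
        for entry in kernel.get("Add", [])
        # ASSUMPTION: a kext entry never has a Find key, a patch always does.
        if isinstance(entry, dict) and "Find" in entry
    ]


def patch_comments(config_bytes):
    """Return the comments of everything in Kernel->Patch, in order."""
    config = plistlib.loads(config_bytes)
    patches = config.get("Kernel", {}).get("Patch", [])
    return [patch.get("Comment", "") for patch in patches]
```

Usage would be something like `misplaced_patches(open("config.plist", "rb").read())`; an empty list means no patch ended up among the kexts, and `patch_comments` should then show the six AVXpel entries.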
 

Attachments

  • config.plist.zip
    8.4 KB · Views: 282
...
(They're using the YMM registers as temporary storage for a 512-byte chunk of something; there are 7 sections of code moving data in and out of all of the YMM registers. They're not doing any manipulations, just shoveling data in and out.) For the short term (and possibly the long term), someone should investigate the feasibility of using an older version - I had the 12.2 version at hand, and it does not use AVX.
...
When I can find some time, a patch for this may be possible, but it will be a bit tricky because the patch will not only need to modify the code, it will need to create (at minimum) a new 512-byte data space as well. The problem is, this may well just be the tip of the iceberg, since Apple no longer has any reason to avoid using AVX.
...

Is there code in QEMU/UTM emulator that could be studied for emulation solutions in the long run?
 
I upgraded to 12.4 the day it became available and have not yet once had an issue. I've been watching this issue (kernel panics, AppleFSCompressionTypeZlib, avx). Has anyone determined why some people seem to have the issue and others do not? I'm trying to decide if I should add Syncretic's patch.
 
You can probably wait if you want. It's being included in the next OCLP release. I've also not had any issues with 12.4.
 
I had a kernel panic caused by AVAST antivirus, before using either the "noAVX" kext or Syncretic's patch.
AVAST support wrote: "Based on the attached kernel panic log, it seems this issue is related to Avast Security (antivirus)."
I deleted Avast antivirus, and the system worked as usual without any panic.
A few days ago I added the noAVX kext: no panic.
Yesterday I added the patch made by Syncretic: no panic for 24 hours.
This morning I reinstalled AVAST antivirus: no panic.
EDITED
I found a strange thing in the Dock & Menu Bar: when I booted the system, Safari randomly disappeared several times, and today Chrome disappeared.
I had to add them to the Dock again.
I noticed this only after I installed the kext; before that I did not observe such behavior.
 
OK, I've had a chance to review and test my 12.4 patch, and I think it's safe to move it to "beta" status; once some more people confirm that it's stable and has no side-effects, we can remove the "beta" label and just call it a patch.

This patch should have zero side-effects, so any observed differences are almost certainly unrelated to this patch.

Note that the kernel version in the patch is currently limited to only Monterey 12.4 (Kernel 21.5.0). This problem seems likely to persist into future kernels, but the compiler could easily choose different registers, even if the source code is identical. That means each iteration of MacOS will need to be examined, and might (or might not) need a separate patch. If no change is required, the MaxKernel value can simply be increased appropriately. If each released version of Monterey uses a slightly different register set, a config.plist capable of booting various versions of Monterey will be littered with these patches (ugh!).
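
As a rough illustration of how the MinKernel/MaxKernel pair gates the patch, here is a simplified Python model. This is my own sketch of the matching rule, not OpenCore's actual code: I am assuming versions compare component by component and that an empty string means "no bound".

```python
def parse_darwin(version):
    """Turn a Darwin version string such as "21.5.0" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def patch_applies(kernel, min_kernel="", max_kernel=""):
    """Simplified model: is the running kernel inside [MinKernel, MaxKernel]?"""
    running = parse_darwin(kernel)
    # ASSUMPTION: an empty bound matches everything on that side.
    low = parse_darwin(min_kernel) if min_kernel else (0,)
    high = parse_darwin(max_kernel) if max_kernel else (float("inf"),)
    return low <= running <= high


# The patch as posted: MinKernel and MaxKernel are both 21.5.0.
print(patch_applies("21.5.0", "21.5.0", "21.5.0"))
# A hypothetical later kernel is skipped until MaxKernel is raised.
print(patch_applies("21.6.0", "21.5.0", "21.5.0"))
print(patch_applies("21.6.0", "21.5.0", "21.6.0"))
```

Under this model, raising MaxKernel is the only change needed when a later kernel keeps the same byte sequences; if the bytes differ, the Find pattern no longer matches and a new patch entry is required.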

Note that as far as I can tell, @khronokernel's NoAVX kext substitution solution is equally effective, and probably has the same amount of MacOS version-dependency as my patch (i.e. future MacOS releases may or may not work with the older kext that @khronokernel is using). The difference there is that my patch is adaptable to future MacOS changes, but the older kext most likely is not. Regardless, use the solution that best fits your needs.

Is there code in QEMU/UTM emulator that could be studied for emulation solutions in the long run?

Emulation is straightforward. My AVX emulator has been complete for months; the problem I encountered is integrating it with MacOS. Surprisingly, the integration is more complicated than the emulation itself. I've been stuck trying to solve the integration issues for a long time now...

I upgraded to 12.4 the day it became available and have not yet once had an issue. I've been watching this issue (kernel panics, AppleFSCompressionTypeZlib, avx). Has anyone determined why some people seem to have the issue and others do not? I'm trying to decide if I should add Syncretic's patch.

As I noted in my earlier post, the AVX code only gets executed under the right (or, perhaps more accurately, wrong) circumstances. If your use case somehow consistently avoids those circumstances, more power to you - but if/when it happens, you'll get a kernel panic, and probably lose whatever you were working on at the time. The patch I posted should be safe, whether you encounter the problem or not. (The choice of whether or not to apply the patch is entirely yours, of course.)



The rest of this post offers a bit of technical insight to the problem. If you're not interested in the gory details, feel free to skip to the next post in the thread, or to something more entertaining. Maybe even go outside for a while.

The function _compression_decode_buffer looks at the Zlib strategies in use, and if LZ is involved, it eventually invokes _lzbitmap_decode, which contains unfiltered AVX instructions. As noted earlier, both the AppleFSCompressionTypeZlib and AppleDiskImagesUDIFDiskImage kexts contain apparently-identical copies of _compression_decode_buffer, but the AppleDiskImagesUDIFDiskImage instance contains more public symbols (e.g. the AppleFSCompressionTypeZlib instance doesn't contain the symbol _lzbitmap_decode, despite having the same code).

The YMM code confused me for a while, causing me to overlook the most obvious reason for its existence. The YMM save/restore code isn't there for argument-passing or any other processing, it's simply doing a "callee-save" on the XMM registers that _lzbitmap_decode uses. I'm guessing that the YMM save/restore is either a macro or an inlined library function. At the beginning of _lzbitmap_decode, all 16 YMM registers are saved on the stack, then in no fewer than six separate locations, all 16 YMM registers are restored, either from [R15] or [RAX]. What initially looked to me like manipulation of the values on the stack turned out not to involve the YMM storage area, and both R15 and RAX get loaded with the address of the YMM storage area prior to restoring the YMM registers, so all of the instructions involving the YMM registers are purely for save/restore, and really only for the XMM (low 128 bit) portion of the registers, since that's all the code uses.
Code:
// Original [RAX] restore code (AVX) (115 bytes):
c5fd6f00                vmovdqa (%rax), %ymm0
c5fd6f4820              vmovdqa 0x20(%rax), %ymm1
c5fd6f5040              vmovdqa 0x40(%rax), %ymm2
c5fd6f5860              vmovdqa 0x60(%rax), %ymm3
c5fd6fa080000000        vmovdqa 0x80(%rax), %ymm4
c5fd6fa8a0000000        vmovdqa 0xa0(%rax), %ymm5
c5fd6fb0c0000000        vmovdqa 0xc0(%rax), %ymm6
c5fd6fb8e0000000        vmovdqa 0xe0(%rax), %ymm7
c57d6f8000010000        vmovdqa 0x100(%rax), %ymm8
c57d6f8820010000        vmovdqa 0x120(%rax), %ymm9
c57d6f9040010000        vmovdqa 0x140(%rax), %ymm10
c57d6f9860010000        vmovdqa 0x160(%rax), %ymm11
c57d6fa080010000        vmovdqa 0x180(%rax), %ymm12
c57d6fa8a0010000        vmovdqa 0x1a0(%rax), %ymm13
c57d6fb0c0010000        vmovdqa 0x1c0(%rax), %ymm14
c57d6fb8e0010000        vmovdqa 0x1e0(%rax), %ymm15

A straightforward patch would be to simply replace all the VMOVDQA YMM# instructions with their MOVDQA XMM# counterparts. However, the VEX prefixes used by AVX instructions are more compact than the older XMM prefixes, and if we use the same addresses for saving/restoring, the putative patch code is actually larger than the AVX code, and won't fit.
Code:
// Direct replacement (123 bytes):
660f6f00                movdqa  (%rax), %xmm0
660f6f4820              movdqa  0x20(%rax), %xmm1
660f6f5040              movdqa  0x40(%rax), %xmm2
660f6f5860              movdqa  0x60(%rax), %xmm3
// Starting at offset 0x80, switches to 32-bit offsets
660f6fa080000000        movdqa  0x80(%rax), %xmm4
660f6fa8a0000000        movdqa  0xa0(%rax), %xmm5
660f6fb0c0000000        movdqa  0xc0(%rax), %xmm6
660f6fb8e0000000        movdqa  0xe0(%rax), %xmm7
// Starting with XMM8, an additional prefix byte is required
66440f6f8000010000      movdqa  0x100(%rax), %xmm8
66440f6f8820010000      movdqa  0x120(%rax), %xmm9
66440f6f9040010000      movdqa  0x140(%rax), %xmm10
66440f6f9860010000      movdqa  0x160(%rax), %xmm11
66440f6fa080010000      movdqa  0x180(%rax), %xmm12
66440f6fa8a0010000      movdqa  0x1a0(%rax), %xmm13
66440f6fb0c0010000      movdqa  0x1c0(%rax), %xmm14
66440f6fb8e0010000      movdqa  0x1e0(%rax), %xmm15

However, we can improve the situation - we don't need to store the XMM registers at the 32-byte offsets used by the YMM registers, we can store them at 16-byte offsets (using only half of the reserved stack space). Doing this allows 4 additional MOVDQA instructions to use 8-bit offsets, resulting in code that's smaller than the original AVX code (and thus requires some NOPs at the end to fill out the space).
Code:
// Condensed replacement (111 bytes):
660f6f00                movdqa  (%rax), %xmm0
660f6f4810              movdqa  0x10(%rax), %xmm1
660f6f5020              movdqa  0x20(%rax), %xmm2
660f6f5830              movdqa  0x30(%rax), %xmm3
// XMM4-7 still using 8-bit offsets
660f6f6040              movdqa  0x40(%rax), %xmm4
660f6f6850              movdqa  0x50(%rax), %xmm5
660f6f7060              movdqa  0x60(%rax), %xmm6
660f6f7870              movdqa  0x70(%rax), %xmm7
// Starting with offset 0x80/XMM8, 32-bit offset + extra prefix byte
66440f6f8080000000      movdqa  0x80(%rax), %xmm8
66440f6f8890000000      movdqa  0x90(%rax), %xmm9
66440f6f90a0000000      movdqa  0xa0(%rax), %xmm10
66440f6f98b0000000      movdqa  0xb0(%rax), %xmm11
66440f6fa0c0000000      movdqa  0xc0(%rax), %xmm12
66440f6fa8d0000000      movdqa  0xd0(%rax), %xmm13
66440f6fb0e0000000      movdqa  0xe0(%rax), %xmm14
66440f6fb8f0000000      movdqa  0xf0(%rax), %xmm15
(Of course, if we were really pressed for code space, and wanted to get fancy, we could just use fxsave64 and fxrstor64, since we have a 512-byte area on the stack to work with. However, that would involve more analysis, since fxsave64/fxrstor64 are considered floating-point instructions, which might cause complications.)
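
The byte counts quoted for the three [RAX] listings above (115, 123 and 111) can be reproduced with a little arithmetic. The Python sketch below models only the encodings shown in those listings: a 2-byte VEX prefix for the AVX loads, and a 66 prefix plus an optional REX byte for the legacy loads, with RAX as the base register. It is not a general x86 length calculator, and the function names are made up for illustration.

```python
def disp_bytes(offset):
    """Displacement size with RAX as base: none, 8-bit, or 32-bit."""
    if offset == 0:
        return 0
    return 1 if -128 <= offset <= 127 else 4


def vex_load_len(reg, offset):
    """vmovdqa off(%rax), %ymmN: 2-byte VEX + opcode + ModRM + disp."""
    return 2 + 1 + 1 + disp_bytes(offset)


def sse_load_len(reg, offset):
    """movdqa off(%rax), %xmmN: 66 + optional REX + 0f 6f + ModRM + disp."""
    rex = 1 if reg >= 8 else 0  # xmm8-15 need the extra prefix byte
    return 1 + rex + 2 + 1 + disp_bytes(offset)


def block_len(insn_len, stride):
    """Total size of 16 loads, register N living at offset N * stride."""
    return sum(insn_len(reg, reg * stride) for reg in range(16))


avx_original = block_len(vex_load_len, 32)   # original YMM restore
sse_direct = block_len(sse_load_len, 32)     # same offsets, too large
sse_condensed = block_len(sse_load_len, 16)  # 16-byte slots, fits

print(avx_original, sse_direct, sse_condensed)
print("NOPs needed:", avx_original - sse_condensed)
```

The difference between the first and last totals is the NOP padding at the end of the [RAX] replacement. The [R15] variants use a 3-byte VEX prefix and a REX byte on every legacy instruction, so their totals differ, but the same reasoning applies.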

Ultimately, the patch(es - technically, there are six of them per kext, and two kexts' worth of patches) just switches from saving/restoring YMM registers to saving/restoring XMM registers. One nice thing we don't have to worry about is preserving the upper 128 bits of the YMM registers - when using AVX instructions on an XMM register, the upper bits of the corresponding YMM/ZMM register are zeroed (much like manipulating the 32-bit registers in 64-bit mode, e.g. movl $9,%eax clears the upper 32 bits of %RAX). However, when using non-AVX instructions, the upper bits of the YMM/ZMM registers are left undisturbed. Thus, we can safely save/restore only the XMM registers without worrying about the YMM/ZMM register bits.
 
As I noted in my earlier post, the AVX code only gets executed under the right (or, perhaps more accurately, wrong) circumstances. If your use case somehow consistently avoids those circumstances, more power to you - but if/when it happens, you'll get a kernel panic, and probably lose whatever you were working on at the time. The patch I posted should be safe, whether you encounter the problem or not. (The choice of whether or not to apply the patch is entirely yours, of course.)
Ok, I always intended on implementing your fix, just was under the impression it was still "alpha" and was trying to decide if adding it was more or less risky than not at this point as I was not experiencing the issue.

I went ahead and added it. You seem pretty confident in it and I trust you. If it wasn't for your work I probably would have finally retired my cMPs and bought another mac! =)
 