One nice thing we don't have to worry about is preserving the upper 128 bits of the YMM registers - when using AVX instructions on an XMM register, the upper bits of the corresponding YMM/ZMM register are zeroed (much like manipulating the 32-bit registers in 64-bit mode, e.g. movl $9,%eax clears the upper 32 bits of %RAX). However, when using non-AVX instructions, the upper bits of the YMM/ZMM registers are left undisturbed. Thus, we can safely save/restore only the XMM registers without worrying about the YMM/ZMM register bits.
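To make the upper-bits behaviour concrete, here is a toy Python model (not real register access; the function names are made up for illustration) that treats a YMM register as a 256-bit integer and contrasts a legacy SSE write with a VEX-encoded 128-bit write:

```python
MASK128 = (1 << 128) - 1  # low 128 bits = the XMM part of a YMM register


def write_xmm_legacy_sse(ymm_value, new_low128):
    """Legacy (non-VEX) SSE write: the upper YMM bits are left undisturbed."""
    return (ymm_value & ~MASK128) | (new_low128 & MASK128)


def write_xmm_vex(ymm_value, new_low128):
    """VEX-encoded (AVX) 128-bit write: the upper YMM bits are zeroed."""
    return new_low128 & MASK128


# Hypothetical register contents: 0xDEAD in the upper half, 0x1234 in XMM.
ymm = (0xDEAD << 128) | 0x1234
after_sse = write_xmm_legacy_sse(ymm, 0x5678)
after_vex = write_xmm_vex(ymm, 0x5678)
```

In this model, a save/restore of only the XMM half done with legacy SSE writes leaves the upper half exactly as it was, which is the property the patch relies on.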
What an excellent piece of reverse engineering. I really admire your persistence and the motivation to even explain the stuff in detail to a wider community.
The concern about the upper part of the YMM/ZMM registers is not a real one in my mind. We would only want to apply this patch to machines without AVX and to my understanding those only contain the XMM registers anyway.

Regarding the emulation of AVX I would have thought the integration would just be about changing the address of the routine responding to the exception in the table of interrupt vectors. That would be below the OS level. However with all the security features and complexity of the modern x86 architecture, there are obviously things getting in the way.
 
OK, I always intended to implement your fix; I was just under the impression that it was still "alpha", and I was trying to decide whether adding it was more or less risky than leaving it out, since I was not experiencing the issue.

I went ahead and added it. You seem pretty confident in it and I trust you. If it wasn't for your work I probably would have finally retired my cMPs and bought another Mac! =)

After reading your comment, I realize that what I wrote sounds somewhat snarky or sarcastic. That wasn't my intent - I apologize. I appreciate your kind words; I hope you weren't offended by my careless ones.

You are correct that non-AVX systems only contain the XMM registers and don't really care about the YMM/ZMM bits. What I had in mind (but failed to say; I really must improve my proof-reading of these posts) is that this patch should be safe to apply even on an AVX system (e.g. if one moves a preconfigured OC setup to an AVX-compatible system and forgets/doesn't know to remove the patch, it shouldn't matter).

My AVX emulator (and its progenitor, MouSSE) work much as you describe. They hook the hardware #UD vector (below the OS) and interpret the missing instructions on the fly. MouSSE is very straightforward, since it only has to implement simple instructions (take some data from a register on the interrupt stack, manipulate it, put the result in another register on the interrupt stack, fix the stack so it looks like the #UD never occurred, then Bob's your uncle).

Handling AVX is complicated by the fact that AVX involves registers that don't physically exist on the processor. That means keeping a set of YMM (or, ultimately, ZMM) registers in memory, and keeping them in sync with the physical XMM registers. Though it is complex, that should be fairly straightforward, since the OS context switches also rely on emulated instructions (XSAVE64/XRSTOR64), so all relevant YMM changes should occur within the emulator.

The problems are manifold, though: first, parts of the MacOS context switch mechanism rely on the CPUID-reported features to choose instructions, while other parts choose instructions based on flags saved within the context itself. The emulator can easily spoof the CPUID bits, but since MacOS achieves some degree of initialization (some number of threads/processes exist) before the emulator gets initialized, there are "legacy threads" that contain the wrong flags. That's messy to clean up after the fact.

The bigger problem, though, is that parts of MacOS "cheat" and manipulate saved contexts. Ideally, a context switch happens at a low level, and is effectively invisible to processes and threads. Logically, you just save one thread's state (including registers), then load another thread's state, then continue processing. However, some other pieces of the kernel will examine and modify the saved states of some threads outside of a context switch, and that wreaks havoc with the emulator.
Trying to accommodate each of those occurrences is like plugging leaks with your fingers and toes, and what should have been a very narrowly-hooked emulation turns into an ugly mutant octopus with tentacles reaching all over the place (and still failing to maintain register integrity). And addressing those problems is often MacOS-version-specific, so creating general-purpose fixes becomes quite complex - every dot-rev of MacOS might potentially require its own tweaks, since not everything can be generalized. It's a challenge...
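For anyone trying to picture the basic mechanism described above, the following is a rough Python toy, not the actual emulator. All names (`TrapFrame`, `Emulator`, the `"vpxor256"` pseudo-instruction and its tuple layout) are invented for illustration; a real handler decodes machine code and works on the interrupt stack in kernel context.

```python
from dataclasses import dataclass, field

MASK128 = (1 << 128) - 1


@dataclass
class TrapFrame:
    """Hypothetical saved state at the moment of the #UD exception."""
    rip: int
    xmm: list  # the 16 physical 128-bit registers, held as integers


@dataclass
class Emulator:
    # Upper 128 bits of each YMM register, kept in memory because the
    # hardware has no such registers.
    ymm_hi: list = field(default_factory=lambda: [0] * 16)

    def read_ymm(self, frame, n):
        return (self.ymm_hi[n] << 128) | frame.xmm[n]

    def write_ymm(self, frame, n, value):
        # Keep the in-memory upper half in sync with the physical XMM half.
        frame.xmm[n] = value & MASK128
        self.ymm_hi[n] = (value >> 128) & MASK128

    def handle_ud(self, frame, insn):
        """Emulate one toy instruction, then resume after it."""
        op, dst, src1, src2, length = insn
        if op != "vpxor256":
            raise NotImplementedError(op)
        result = self.read_ymm(frame, src1) ^ self.read_ymm(frame, src2)
        self.write_ymm(frame, dst, result)
        frame.rip += length  # step past the instruction, as if it had executed


frame = TrapFrame(rip=0x1000, xmm=[0] * 16)
emu = Emulator()
emu.write_ymm(frame, 1, (0xAA << 128) | 0x0F)
emu.write_ymm(frame, 2, (0x55 << 128) | 0xF0)
emu.handle_ud(frame, ("vpxor256", 0, 1, 2, 4))
```

The fragile part is visible even in the toy: `ymm_hi` is only correct as long as every change to the saved register state goes through the emulator. Anything that edits a saved context behind its back leaves the two halves out of sync.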

There are additional complications, but I think you see the difficulties. MacOS makes the reasonable assumption that the hardware is consistent (i.e. whatever CPU capabilities are present at power-on exist unchanged until power-off), and trying to change those capabilities mid-stream without hardware support is tricky, to say the least. I'm still hoping to get there (although Apple's shift away from Intel may make the entire project moot before then).
 
Unfortunately, I cannot replicate the crash locally, but I've implemented a downgrade for the Zlib kext in a new branch of OCLP.
The binary below is from the zlib-test branch, commit 67a78ec. You can verify that the kext was properly replaced by checking whether com_apple_AppleFSCompression_NoAVXFSCompressionTypeZlib appears in IOService. If it applied correctly, you'll see that entry and not the original AppleFS Zlib entry.
Additionally, here is a copy of the NoAVX variant for those who want to install it manually:

I have personally verified on a Macmini4,1 that the old Zlib is in effect; however, I would appreciate more tests to see whether this fix resolves the issue.

View attachment 2009214
Thank you so much for providing OCLP 0.46 so quickly. I have had no issues since I installed it last Thursday morning. The system is perfectly stable, with no more crashes on macOS 12.5 beta and my iMac11,3.
 
Yesterday Apple introduced macOS Ventura to the public and made the first Beta available to developers. After some installation attempts the current consensus is this:

One needs AVX2 support and a Haswell+ CPU to boot Ventura beta.

This covers (IMHO) Apple Mac systems from Late 2013 onward. The journey goes on and the roadblocks become bigger. Just redistributing information here…
 
One needs AVX2 support and a Haswell+ CPU to boot Ventura beta.
That's quite a claim. To avoid duplication of effort, I think further substantiation would be fitting here. In particular: is this consensus simply based on which unsupported Macs Ventura currently seems installable on? If that's the case (and I hope it is), then maybe there's another (more encouraging) explanation.
 
Just citing here: the dyld cache for pre-Haswell CPUs is gone.
Screen_Shot_2022-06-06_at_10.27.30_PM.png

P.S.:
Usually I am not the guy pressing the panic button early on. But I would really love to be corrected here.
Installations have failed on all pre-Haswell systems so far; I have not yet been able to investigate my own verbose output in detail. Since Haswell introduced more CPU instructions than just AVX2, we might not get away with emulating only those.
 
As usual, I'm juggling 17 things plus a wedding this weekend, so I don't have time to address this properly yet. However, I can report that my preliminary scans (which are still running as I type this) of the Ventura Beta 1 are finding a lot of AVX+ code - more than in previous releases, and spread among more components (including preboot, video drivers (including AMD), Finder, Dock, and other essential areas). I haven't yet looked at whether AVX is really being exercised or if it's just "bogus" usage (like we recently saw in AppleFSCompressionTypeZlib) that might be easily patched; I haven't looked to see how much of this code is filtered (doesn't need to be patched) or unfiltered (would need work). I also haven't looked to see if the code in question is AVX, AVX2, AVX512, BMI1, BMI2, F16C, FMA, or any of the other newer instruction set extensions. And as always, it will take a while to understand the conditions under which the code gets executed, meaning there may be anecdotal cases of success where everyone else sees failure, or vice-versa. First-blush, it looks like a minefield to me.

I probably won't get to spend much time on this for at least another week or two; hopefully, others will make advances in the meantime. In fact, given how the Monterey betas went, I may just wait for a much later beta or the GA release before giving it detailed scrutiny.

For those of you who are just itching to get Ventura running on your cMP, I suggest you temper your expectations, and prepare to be very patient. This one looks like it will be an adventure (possibly even making Monterey the "final frontier" for the cMP)...

EDIT: My preliminary scan just ended. It found 1,680,140 AVX+ instructions in 172 files. These early numbers are imprecise (might be a bit low or high), but they give you a sense of the magnitude of this problem.
 
How feasible would it be to create an AVX-like kext that will fool the Mac into thinking it has an AVX-compatible processor and then intercept each AVX+ call and translate it into its non-AVX, more complex, equivalent? If feasible at all, would the negative impact of such a live translation render the desire to keep a cMP working under Ventura impractical?
 
As I noted previously, such emulation is not extremely difficult (it's already essentially complete), but there are issues with integrating it into MacOS. And those issues are present when trying to integrate with a relatively "friendly" version of MacOS, where the kernel and kexts are not using unfiltered AVX (i.e. they're well-behaved, and won't use AVX unless/until the "AVX available" CPUID bits get set). From the very brief glance I got at Ventura, it would be exceedingly difficult to ensure that a kext-based solution would be fully initialized prior to any kernel/kext AVX instructions being executed; without that, we'll see panics and hangs. And timing aside, the same integration issues persist (and might even be exacerbated) in MacOS 13.
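As a rough illustration of what "filtered" code means here, the sketch below decodes CPUID feature bits and picks a code path accordingly. The bit positions are the documented CPUID ones (leaf 1 ECX and leaf 7 sub-leaf 0 EBX); the function names and the `"avx"`/`"sse"` labels are invented for the example, and the register values are passed in as plain integers rather than read from hardware.

```python
# Documented CPUID feature-bit positions.
LEAF1_ECX = {"FMA": 12, "XSAVE": 26, "OSXSAVE": 27, "AVX": 28, "F16C": 29}
LEAF7_EBX = {"BMI1": 3, "AVX2": 5, "BMI2": 8}


def decode_features(leaf1_ecx, leaf7_ebx):
    """Return the set of feature names whose bits are set."""
    feats = {name for name, bit in LEAF1_ECX.items() if (leaf1_ecx >> bit) & 1}
    feats |= {name for name, bit in LEAF7_EBX.items() if (leaf7_ebx >> bit) & 1}
    return feats


def pick_routine(features):
    """'Filtered' code: take the AVX path only when CPUID says it is usable.

    A complete check would also read XCR0 via XGETBV to confirm that the OS
    saves YMM state; that step is omitted from this sketch.
    """
    if {"AVX", "OSXSAVE"} <= features:
        return "avx"
    return "sse"


# Hypothetical register values: a CPU without AVX, and one with AVX + AVX2.
no_avx = decode_features(0, 0)
with_avx = decode_features((1 << 28) | (1 << 27), 1 << 5)
```

"Unfiltered" code is the same routine without the `if`: it executes the AVX path unconditionally, which is what raises #UD on a CPU that lacks the instructions.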

I'm going to take the Ventura dumps with me as I travel, and if I can find some uninterrupted downtime, maybe I can at least narrow the scope of this. It's conceivable that a few well-placed patches could solve a lot of problems, but I'm not (yet) optimistic. And I'm currently fairly pessimistic about reaching a point where we can say "effectively all of Ventura will run on a MP5,1." Time will tell. (If only Amazon would deliver those 36-hour days I ordered... ;-)
 
Hi @Syncretic,
Considering the level at which you hook the kernel's trap functions (i.e. trampoline setup is done in your kext itself, no Lilu reliance), it may be worth looking into registering your kernel extension as a KEC (Kernel External Component) so that its kmod routines run very near the kernel startup (at the same time as e.g. CoreCrypto). This ties your kext to being injected via OpenCore, because KernelManagement will not allow the kext; however, it is the best option as far as I know.
 
Can the emulation be added before macOS loads, for example by creating an EFI runtime driver like the one OpenCore uses for NVRAM protection? I don't know what the kernel does to the exception vectors when it loads.
 
I wonder whether AVX patching is also needed for com.apple.inputmethod.SCIM and other IME-related components. It constantly throws EXC_BAD_INSTRUCTION (SIGILL), and the IME is basically unusable at the moment in 14.2: it crashes from time to time after typing one or two characters.
Installing a third-party IME would do the trick, but personally I don't like such workarounds.
 
OK, being the compulsive fool I am, I stole minutes throughout the day, accessing my Mac Pro remotely through my iPhone, and managed to run some of the tools I built while chasing the race condition last year. I've come up with a patch (see below, and be sure to read the disclaimer first!). (As an aside, it's just amazing to be able to do something like this from a handheld device that could literally be anywhere. It's easy to take some of this technology for granted, but I still find it cool.)

Everything below was done remotely, and in a hurry, so none of it should be considered complete or exhaustive (or maybe even correct...).

There are at least two kexts that contain (nearly?) identical instances of _compression_decode_buffer, which contains unfiltered AVX code. The first one, we've previously identified as AppleFSCompressionTypeZlib. The other, AppleDiskImagesUDIFDiskImage, is a plugin for IOHDIXController. The latter seems to be referenced when handling some types of .DMG files.

_compression_decode_buffer is referenced by parts of the Metal subsystem, numerous dyld_shared libraries, corespeechd (likely the source of the Siri/Voice Control issues), the XPC Services Disk Image Controller, mediaanalysisd, the ContextKit Context Service, some filesystems, and various other subsystems. Also, the unfiltered AVX code in _compression_decode_buffer only gets executed under certain circumstances; this, along with the number of varied references above, might explain the odd and inconsistent constellation of symptoms we've seen surrounding 12.4.

I also found AVX code in apfs and corecrypto (both of which we knew about, and have to date been well-behaved), as well as AppleHV, AppleMesaLib, and OSvKernDSPLib. I haven't examined any of those to look for possible problems; for now, they've just joined the "usual suspects" for examination sometime in the future.

The following patch tries to deal with both kexts (AppleFSCompressionTypeZlib and AppleDiskImagesUDIFDiskImage). It should be inserted into the Kernel/Patch section of your OpenCore config.plist (that's the same section as where the SurPlus patch went). Before anyone points this out, yes - the patches are big and ugly. (They can't all be elegant... ;-)
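For readers who have not used Kernel/Patch entries before, here is a rough Python model of how a masked find/replace behaves, as I understand OpenCore's `Find`, `Replace`, `Mask`, `ReplaceMask` and `Count` fields. It is only an illustration: the bytes are arbitrary placeholders, not the patch itself, and `apply_patch` is a made-up helper, not part of OpenCore.

```python
def apply_patch(data, find, replace, mask=None, replace_mask=None, count=0):
    """Model of a masked binary patch; count=0 means patch every match."""
    if len(find) != len(replace):
        raise ValueError("find and replace must have the same length")
    n = len(find)
    mask = mask or b"\xff" * n                  # compare every bit by default
    replace_mask = replace_mask or b"\xff" * n  # overwrite every bit by default
    out = bytearray(data)
    applied = 0
    i = 0
    while i + n <= len(out):
        window = out[i:i + n]
        # Bits cleared in the mask are ignored during the comparison.
        if all((w & m) == (f & m) for w, f, m in zip(window, find, mask)):
            for j, (r, rm) in enumerate(zip(replace, replace_mask)):
                # Only bits set in the replace mask are overwritten.
                out[i + j] = (out[i + j] & ~rm & 0xFF) | (r & rm)
            applied += 1
            if count and applied >= count:
                break
            i += n
        else:
            i += 1
    return bytes(out), applied


# Placeholder bytes: match AA ?? CC and rewrite it to 11 ?? 33,
# leaving the middle byte untouched.
patched, hits = apply_patch(
    bytes.fromhex("00aa5ccc00"),
    find=bytes.fromhex("aabbcc"),
    replace=bytes.fromhex("112233"),
    mask=bytes.fromhex("ff00ff"),
    replace_mask=bytes.fromhex("ff00ff"),
)
```

A model like this also shows why such patches are sensitive to the exact bytes in the target kext: if the sequence differs even slightly, nothing matches and nothing is changed.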

IMPORTANT: This patch was devised and constructed remotely, under less-than-ideal conditions. I personally have NOT tested it; I haven't had either the time or the access. @cdf was kind/brave enough to test it (thank you so much!), and he reports success. Even so, THIS PATCH SHOULD BE CONSIDERED AN "ALPHA TEST" FOR THE TIME BEING. (EDIT 28may22 - I have now tested the patch, and I consider it to be "beta" now; I'll update that status after more feedback from others.) Consider the pros and cons before applying it to a production system, or one containing precious data.

Good luck!
(@cdf - if you'd be so kind, please compare the patch below to what you tested, just to be sure I didn't munge it while posting here. Thanks in advance!)
Maybe this will be useful for Syncretic.
a. With OC 0.8.1, the mds process has been indexing continuously for an hour.
CPU performance dropped by 20% due to the continuous indexing.
The standard time for indexing was 3 min or less.
b. When opening my Archicad project, Archicad hung for about 1 minute (Activity Monitor showed Archicad hanging), and only after that did it start opening the file.
I found what's wrong:
I removed Syncretic's patch and inserted the NoAVX kext, and everything started working as it should.
Probably the patch does not work correctly with my system disk, a Crucial P2 2TB NVMe SSD.
(Earlier, I had shrunk all the Apple_APFS containers by 10% on each disk to get raw unallocated space.)


I tested different configurations (to make sure it's not a syntax error):
1. 0.8.0 cdf with patch
- mds indexing time 3 min
- Archicad slow to open
2. 0.8.0 Martin's with patch
- mds indexing time 3 min
- Archicad slow to open
3. 0.8.0 cdf with kext: OK
4. 0.8.0 Martin's with kext: OK
5. 0.8.1 cdf with patch
- mds indexing continuously
- Archicad slow to open
6. 0.8.1 cdf with kext: OK
My configs are in the attachment (I renamed each config for clarity).
SOLVED
Thanks to Dayo for pointing me in the right direction.
I reverted all my SSDs to the standard APFS layout (no more shrinking).
Syncretic's patch is now working in its full version.
Archicad also loads projects normally.

My CMP 4.1/5.1 dual cpu xeon 5680, 96gb RAM, radeonpro w5700, nvme crucial 2tb, evo plus 1tb, TB3 alpine ridge, oc 0.8.0, mac os 12.4
 

Attachments

  • Config.zip
So, is this an AVX patch that will allow programs to run that need an AVX-compatible processor?
 
There's a patch and a kext, both of which try to 'fix' Monterey 12.4 so that it does not require AVX. The patch patches the kernel; the kext replaces a system library with an older version. (It is all discussed in this thread, particularly here and here.)
 
Can the patch be used with Big Sur? Unfortunately, I have software that isn't Monterey-compatible yet. Many audio programs lag behind by a year or two.
 
The AVXpel patch is in two (2) parts across six (6) individual patches.
  1. Patches 1 to 3 work on AppleFSCompressionTypeZlib.kext
  2. Patches 4 to 6 work on UDIFDiskImage.kext
Your issue appears to be linked to the patched UDIFDiskImage.kext. @Syncretic mentioned that UDIFDiskImage.kext is used when loading some DMG images. So either the CAD program or something else loads these and somehow doesn't like the patching of the kext, or shrinking the APFS containers makes this happen where something else would otherwise have been done.

You can test things a bit by removing the old AppleFSCompressionTypeZlib.kext (AKA the NoAVX kext) and reinserting only the first three (3) patches from AVXpel (or inserting them all and disabling Patches 4 to 6).
 
I disabled three parts of the patch (4, 5 and 6), but it didn't work out well.
Archicad did begin to load correctly, but an unknown kernel task load of an additional 50% appeared
(before that, I had a kernel task at 45% due to an unknown Navi USB controller in my Radeon Pro W5700 GPU).
Now the kernel task is at 90%, which reduces CPU performance. This shows not only in the speed of working with programs (scene loading time) but also in the Geekbench 5 CPU scores:
- 0.8.1 with Syncretic's patch, with the kernel task due to the Radeon Pro W5700: 5983
- 0.8.0 (with my other configs, Radeon Pro): about 6400
- 0.8.1 with the reduced Syncretic patch (Radeon Pro): 6100
- Mojave with RX 570: 6686
- Monterey 12.4 with RX 570: 6756
screenshot attached
GB5scores.jpg


Even returning to my old configurations did not remove this new problem. Resetting the SMC and NVRAM did not help, and replacing the card with the RX 570 did not help either: the kernel task was present with all my old configs.
Probably the three parts of the patch changed something in the system.
I had to reboot from a spare Mojave disk with the RX 570 card.
Then the kernel task disappeared, and after returning to OpenCore 0.8.1 with the NoAVX kext, I no longer had a kernel task with the RX 570.
But, as expected, after swapping back to the Radeon Pro, I still have my standard 45% kernel task, which I have been working with for almost a year (due to the unknown Navi USB controller).

Still, for reasons unknown to me, Syncretic's patch does not work correctly with my shrunken Crucial P2 2TB NVMe SSD.

 

Attachments

  • illustration.pdf.zip
patch does not work correctly with my shrinked Crucial P2 2Tb NVME ssd
Thanks.

Can't say I understood much of what you wrote (especially the kernel task stuff and whatever this is that apparently survives reboots into different configs, etc.; could be indexing going on, I suppose, which would settle later), but the summation above is clear enough.

Wouldn't have expected the shrinkage to affect such patching, but maybe it does: perhaps the bytes are somehow different after shrinking, which throws the patching off, or something along those lines.

Perhaps @Syncretic can chip in at some point, or maybe someone else with a similarly shrunken APFS container can. This could help confirm whether that is in fact a factor.

Best to stick with the kext swap, though, since that appears to work for you.
 
If the shrink is useless or harmful, I'd rather try to revert the SSD to the standard APFS layout.
My original post also has an attached file with explanations and illustrations of what the kernel task is. If you are interested, you can look through my research results, which I have been working on for almost two years, since OpenCore 0.5.9 and macOS Catalina.
SOLVED
Thanks to Dayo for pointing me in the right direction.
I reverted all my SSDs to the standard APFS layout (no more shrinking).
Syncretic's patch is now working in its full version.
Archicad also loads projects normally.
 
Even if there is a way to patch the os for / emulate AVX, won't those parts of the OS or apps that require it (AVX) run excruciatingly slowly?
 
What I'd expect is that we would not see the performance gain that is typically advertised as one of the benefits of updating to a new version of the code. We'd remain roughly on par with the machine's current performance, with perhaps a slight decrease depending on how the AVX emulation is achieved.
 