It's amazing to me that the CPU architects that came from Apple and are designing a brand new processor from scratch didn't bother implementing something like Apple's TSO mode. Are we sure they didn't?
Several odd assumptions built into the above.
First, it very likely isn't a primary 'feature'. Microsoft isn't going to tweak their compatibility solution just for one vendor's CPU implementation. If Arm specified it as an optional feature, then maybe. But as a proprietary fork from the Arm standard, it doesn't do much to create a competitive Arm implementation environment for Windows.
The Arm Neoverse implementations that Amazon, Ampere Computing, etc. have released have zero TSO and yet have taken substantive share away from x86 servers. Amazon is closing in on 20+% of deployed servers with no TSO at all. The hype that they have to have this feature to win is mostly just Apple 'fan boy' talk. There is already concrete evidence that it isn't necessary. It is a 'nice to have', but it isn't necessary in the general market.
Apple's policy of 'all or nothing' coupled with 'burn the bridge after marching the army over the river' makes it a more necessary feature in the macOS space. That is largely driven by Apple killing off x86 as an option for macOS in the intermediate future. Arm is not going to kill off x86 in Windows (or Linux). The hard-core, 'legacy x86 code at maximum speed' folks in the Windows space are just going to buy x86 Windows systems. Apple doesn't even give that choice as an option. (The Mac space is about having at least an order of magnitude fewer system choice options.)
Second, the 'from scratch' is likely not entirely true. Part of the Arm lawsuit against Nuvia/Qualcomm is about the assistance, in data and IP, that Arm provided to Nuvia to jumpstart their efforts. That additionally grounds the reality here that these cores started off aimed at a different target market. Being 'repurposed' into an alternative target market leaves even less time/effort/resources available for non-critical features. If they end up slower than x86 options on general code, then TSO isn't going to 'buy' Qualcomm/Nuvia anything. And deploying something slower (or less competitive) than Neoverse V3 would be even worse. They need to get something out the door with as few distractions as possible.
Third, since Nuvia was going to launch with a decent-sized block of ex-Apple engineers, it was about 100% certain that Nuvia was going to get sued. If they tried to copy a feature that only Apple implemented, then Apple would be digging around in their company for far, far, far longer than they budgeted for (Apple piling on lawyers means you also need lawyers). Apple's TSO came out of lots of data they collected on x86 processors. If Nuvia takes the data, then they are shooting themselves in the foot. If they spend gobs of time recreating the data... they are burning money that they have relatively little of to spare.
If not it calls into question how expert these ex-Apple engineers really are.
No, it doesn't at all. An 'expert' is a person who picks the correct solution to a specifically posed problem. An expert is not someone who simply pulls out the same hammer every time and mainly just regurgitates what they have already done over and over again.
In the Linux market (i.e., high overlap with servers) lots of the applications are 'open source', which means it isn't all that hard to just recompile them and deploy on a new platform. (E.g., that's why Amazon and the other hyperscalers deployed relatively rapidly once the foundational Linux work was done in the server space: drivers, kernel, boot, common utilities, etc.) The Android market is a minor variation on Linux. Same issue, different day.
The extremely relevant core issue here is that Nuvia wasn't going to build a CPU for running macOS/iOS clones. Windows already had a conversion tool long before Nuvia even sat down in their brand new, 'from scratch' offices. Running the Microsoft solution faster is the primary point. Not solving the problem they might 'wish they had', but solving the issue they did have.
Same thing with UEFI boot (because Windows is grounded in it). Better know something about Pluton... because Windows is heading in that direction. Apple can burn all of that up by burning bridges behind them (they have the iPhone/iPad ecosystem to lean on). Nuvia/Qualcomm cannot.
To not have reliable x86 and x86-64 emulation without heavy use of memory fences that slows down execution is borderline incompetent.
Windows on Arm's primary problem for the last 3-5 years has been native Arm performance, not 'emulation'.
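On what 'heavy use of memory fences' actually means: x86's TSO model hands every plain load acquire semantics and every plain store release semantics for free, so a binary translator that cannot prove a memory access is thread-private has to emit an ordered instruction (or an explicit fence) for essentially every load and store. Below is a minimal sketch of that conservative mapping in C11 atomics; tso_load/tso_store are hypothetical names for illustration, not anything from Microsoft's actual translator.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Conservative x86-on-Arm mapping: treat every translated load as an
 * acquire and every translated store as a release, because under x86
 * TSO plain mov instructions already carried those guarantees. */

static inline uint64_t tso_load(const _Atomic uint64_t *p)
{
    /* compiles to a single ldar on AArch64 */
    return atomic_load_explicit(p, memory_order_acquire);
}

static inline void tso_store(_Atomic uint64_t *p, uint64_t v)
{
    /* compiles to a single stlr on AArch64 */
    atomic_store_explicit(p, v, memory_order_release);
}
```

Paying that per-access ordering cost in every hot loop is where the translated-code slowdown comes from, and it is exactly the cost a hardware TSO mode makes close to free.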
A substantial amount of the information about where fencing is/is not necessary is in the source code. Leverage that, and you don't need to rely on rewriting binaries where most of that information has been thrown away. More performance will pull more software developers into doing the compile to Arm themselves instead of having a re-compile done behind their backs. (Emulation has the wrong connotation here. Both paths end up running Arm binaries; they just vary in when, and with how much time/effort/resources, the recompile gets done.)
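To make that source-code point concrete, a minimal sketch using C11 atomics (the names are made up for illustration): compiled from this source for Arm, only the two marked accesses get ordered instructions, and everything else stays a plain load/store. Compiled for x86 first, all four accesses become ordinary mov instructions, because TSO provided the ordering implicitly; a translator rewriting that binary has lost the distinction and must be conservative about all of them.

```c
#include <stdatomic.h>

static atomic_int ready;   /* publication flag (illustrative) */
static int payload;

void publish(int v)
{
    payload = v;                      /* plain store, no barrier needed */
    /* release store: a single stlr on AArch64 */
    atomic_store_explicit(&ready, 1, memory_order_release);
}

int consume(void)
{
    /* acquire load: a single ldar on AArch64 */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    return payload;                   /* plain load, no barrier needed */
}
```

That asymmetry is the whole argument: a recompile from source pays for ordering only where the programmer asked for it, while a binary rewrite has to guess.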
And Apple is likely also 'spraying' memory fences... just behind the instruction decoder. That is why Rosetta cannot mix and match code.