First, it very likely isn't a primary 'feature'. Microsoft isn't going to tweak their compatibility solution just for one vendor's CPU implementation. If Arm specified it as an optional feature, then maybe. But as a proprietary fork from the Arm standard, it doesn't do much to create a competitive Arm implementation environment for Windows.
It should be a pretty easy tweak. If Microsoft detects a TSO mode, turn it on and set the memory compatibility setting as low as it can go.
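Roughly this shape, as a hedged sketch; every name in it (HasTsoMode, EnableTso, SetMemoryOrderingCompat) is hypothetical, since Microsoft's translator internals aren't public:

```cpp
#include <cstdio>

// Hedged sketch only: HasTsoMode, EnableTso, and SetMemoryOrderingCompat
// are invented names for illustration. Microsoft's actual translator
// knobs are not public.
enum class MemoryCompat { Strict, Relaxed, Off };

bool HasTsoMode() { return true; }            // assumption: some OS/CPU query exists
void EnableTso() {}                           // assumption: flips a TSO control bit
void SetMemoryOrderingCompat(MemoryCompat) {} // assumption: translator setting

void ConfigureTranslator() {
    if (HasTsoMode()) {
        // Hardware enforces x86-like ordering; the translator can stop
        // injecting barriers into translated code.
        EnableTso();
        SetMemoryOrderingCompat(MemoryCompat::Off);
        std::puts("TSO available: barrier injection disabled");
    } else {
        // No TSO: keep the conservative, barrier-heavy setting.
        SetMemoryOrderingCompat(MemoryCompat::Strict);
        std::puts("no TSO: strict memory compatibility");
    }
}

int main() { ConfigureTranslator(); }
```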
The Arm Neoverse implementations that Amazon, Ampere Computing, etc. have released have zero TSO and yet have taken substantive share away from x86 servers. Amazon alone is closing in on 20+% of deployed servers with no TSO at all. The hype that this feature is required to win is mostly just Apple 'fan boy' talk. There is already concrete evidence that it isn't necessary. It is a 'nice to have', but it isn't necessary in the general market.
Server products have little reason to do x86 binary translation. This Qualcomm SoC is meant for consumer Windows PCs, and Microsoft's OEMs have a hard time selling them because x86 compatibility isn't that great.
Apple's policy of 'all or nothing', coupled with 'burn the bridge after marching the army over the river', makes it a more necessary feature in the macOS space. That is largely driven by Apple killing x86 off as an option for macOS in the intermediate future. Arm is not going to kill off x86 in Windows (or Linux). The hard-core, 'legacy x86 code at maximum speed' folks in the Windows space are just going to buy x86 Windows systems. Apple doesn't even give that choice as an option. (The Mac space is about having at least an order of magnitude fewer system choices.)
I find it hard to believe that Microsoft doesn't consider the highest x86 compatibility a top priority. This is Microsoft, which has always valued backwards compatibility more than Apple.
Second, the 'from scratch' is likely not entirely true. Part of the Arm lawsuit against Nuvia/Qualcomm is about the assistance, data, and IP that Arm provided to Nuvia to jumpstart their efforts. That additionally grounds the reality that these cores started off aimed at a different target market. Being 'repurposed' into an alternative target market leaves even less time/effort/resources available for non-critical features. If they end up slower than x86 options on general code, then TSO isn't going to 'buy' Qualcomm/Nuvia anything. And deploying something slower (or less competitive) than Neoverse V3 would be even worse. They need to get something out the door with as few distractions as possible.
Third, since they were going to leave with a decent-sized block of Apple engineers, it was about 100% certain that Nuvia was going to get sued. If they tried to copy a feature that only Apple implemented, then Apple would be digging around in their company for far, far, far longer than they budgeted for (Apple piling on lawyers means you also need lawyers). Apple's TSO came out of lots of data they collected on x86 processors. If Nuvia takes that data, then they are shooting themselves in the foot. If they spend gobs of time recreating the data, they are burning money that they have relatively little of to spare.
This is a good point. Nuvia was building a server chip, and I already acknowledged that x86 compatibility isn't particularly important there. I guess I don't know enough about the process that Nuvia and then Qualcomm went through to get a final product. If the lawsuit continues, then we may find out in the future.
In the Linux market (i.e., high overlap with servers), lots of the applications are open source, which means it isn't all that hard to just recompile them and deploy on a new platform. (That is why Amazon and the other hyperscalers deployed relatively rapidly once the foundational Linux work was done in the server space: drivers, kernel, boot, common utilities, etc.) The Android market is a minor variation on Linux. Same issue, different day.
Linux really isn't relevant to this. The Qualcomm SoC is mostly targeted at Windows.
Windows on Arm's primary problem for the last 3-5 years has been native Arm performance, not 'emulation'.
A substantial amount of information about where fencing is and is not necessary is in the source code. Leverage that, and you don't need to rely on rewriting binaries where most of that information has been thrown away. More native performance will pull more software developers into compiling to Arm themselves instead of having a re-compile done behind their back. (Emulation has the wrong connotation here. Both paths end up running Arm binaries; what varies is when, and with how much effort, the recompilation happens.)
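A minimal sketch of that point: in source form the ordering requirements are explicit, so a native Arm compile only pays for barriers where the programmer asked for them, while a translator looking at an x86 binary (where atomic and plain accesses have both been lowered to ordinary mov instructions) cannot tell them apart.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

int payload = 0;                    // plain data: no ordering implied in source
std::atomic<bool> ready{false};     // the ordering requirement is explicit here

void producer() {
    payload = 42;                                  // ordinary store: no fence needed
    ready.store(true, std::memory_order_release);  // AArch64: stlr (ordered store)
}

int consumer() {
    while (!ready.load(std::memory_order_acquire)) // AArch64: ldar (ordered load)
        ;
    return payload;  // safe: the acquire/release pair orders the plain access
}

int main() {
    std::thread t(producer);
    std::printf("%d\n", consumer());  // always prints 42
    t.join();
}
```

Compiled natively for Arm, only the two atomic operations carry ordering semantics; a binary translator, seeing only movs, has to guess.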
And Apple is likely also 'spraying' memory fences ... just behind the instruction decoder. That is why Rosetta cannot mix-and-match code.
Again, the issue is x86 compatibility, not porting to ARM. (And yes, emulation is the wrong word, but I was using it as a shortcut instead of saying binary translation.) I doubt Microsoft is going to be any more successful at getting ARM ports than they have been in the past. I guess we'll see.
As for Apple spraying memory fences, I doubt it. Most acknowledge that Rosetta 2 is faster and more reliable than Microsoft's solution. Apple added TSO specifically so that no changes to the memory model of the x86 binaries were necessary.
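To illustrate the claim (my reading of it, not anything from Apple documentation): in an x86 binary both stores below are plain movs, and it is x86's TSO hardware ordering that keeps them visible in order. A translator targeting a weakly ordered Arm core has to inject barriers conservatively to preserve that; a core running in a hardware TSO mode gives the ordering for free.

```cpp
// What the translator sees: two plain stores, no fences in the binary.
// x86 TSO guarantees another core never observes ready==1 before payload==42.
int payload;
int ready;

void producer_as_compiled_for_x86() {
    payload = 42;   // x86: mov [payload], 42
    ready   = 1;    // x86: mov [ready], 1  -- kept ordered by TSO hardware
}
// Without hardware TSO, a conservative x86-to-Arm translator must follow
// (nearly) every such store with a barrier, e.g. "dmb ish", because it
// cannot tell which plain stores the original program relies on for ordering.
// With a TSO mode enabled, the untouched Arm stores already match x86
// ordering, so no barriers need to be injected.
```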