Consider it this way: maybe they'd revamp the scheduler to use more than 64 threads now, in case they make a CPU with more than 64 cores in the future.
Why would Apple give that case high priority when they are on a differentiation path with the rest of the SoC?
First, Apple has put an extremely high priority on the "unified, everything on the SoC package" differentiation for the M-series (and always has, for the A-series, the S-series (watch), etc.). This means CPU cores are going to have to compete with other function units for transistor budget. Apple is pursuing bleeding-edge fab processes to give itself a bigger allotment of transistors, but that is so it can share more. 'Unified' is on their "want to do" list.
There are also lots of specialized units, not just GPU cores: AMX (matrix) units, a Neural Engine (NPU), video en/decode units. The pressing issue is that, in their specific areas of expertise, they beat the slop out of general CPU cores in terms of Perf/Watt. And Perf/Watt is also an explicit, openly stated goal Apple has said they are pursuing. That is on their "want to do" list.
Apple's third openly stated objective is to have the fastest iGPU among all competitors. That is going to apply pressure to assign the GPU a disproportionate share of any budget increase. If they throw in displacing upper-midrange discrete GPUs as well (e.g., getting rid of the dGPU in the iMac 27" class), then even more so.
Apple made an Afterburner card to accelerate ProRes RAW 8K decode. A bigger budget could see a single-stream 8K decode unit being pushed into the SoC. That isn't an explicit want, but adding a "value add" for a custom Apple format across the whole Mac lineup... why would they not "want" to do that? That is just deeper ecosystem glue (or tarpit, depending upon viewpoint).
Second, there are explicit "want to do" directions that Apple has publicly stated. Moving FP16 and FP32 computational kernels to the GPU cores through Metal compute: that is a "want to do". That goes back to the desire for better Perf/Watt. The complaint from some that putting computation on the GPU loses access to main RAM? Buzz, dead, because of the unified memory priority.
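To make that concrete, here is a minimal sketch of what "move an FP32 kernel to the GPU via Metal compute" looks like in practice; the kernel, the array size, and the scale factor are made-up illustrative values, not anything Apple actually ships.

```swift
import Metal

// Minimal sketch: an FP32 kernel dispatched to the GPU via Metal compute.
// Kernel name, sizes, and the scale factor are placeholder example values.
let source = """
#include <metal_stdlib>
using namespace metal;
kernel void scale(device float *data     [[buffer(0)]],
                  constant float &factor [[buffer(1)]],
                  uint id [[thread_position_in_grid]]) {
    data[id] *= factor;                 // FP32 math runs on the GPU cores
}
"""

let device   = MTLCreateSystemDefaultDevice()!
let library  = try! device.makeLibrary(source: source, options: nil)
let pipeline = try! device.makeComputePipelineState(function: library.makeFunction(name: "scale")!)

var input: [Float] = Array(repeating: 2.0, count: 1 << 20)
var factor: Float  = 0.5

// Unified memory: .storageModeShared means CPU and GPU touch the same pages, no copies.
let buffer = device.makeBuffer(bytes: &input,
                               length: input.count * MemoryLayout<Float>.stride,
                               options: .storageModeShared)!

let queue = device.makeCommandQueue()!
let cmd   = queue.makeCommandBuffer()!
let enc   = cmd.makeComputeCommandEncoder()!
enc.setComputePipelineState(pipeline)
enc.setBuffer(buffer, offset: 0, index: 0)
enc.setBytes(&factor, length: MemoryLayout<Float>.stride, index: 1)
enc.dispatchThreads(MTLSize(width: input.count, height: 1, depth: 1),
                    threadsPerThreadgroup: MTLSize(width: pipeline.threadExecutionWidth,
                                                   height: 1, depth: 1))
enc.endEncoding()
cmd.commit()
cmd.waitUntilCompleted()
```

The `.storageModeShared` buffer is the unified-memory part of the argument: CPU and GPU read and write the same pages, so there is no "lost access to main RAM" and no copy back.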
Apple uses foundation libraries like Accelerate and AVFoundation to push computations to custom units when they are available. That is seamless access to the NPU, AMX, and video en/decode units. Apple spends lots of time and effort trying to get better utilization out of their significantly better Perf/Watt specialized units. Pragmatically that amounts to generic CPU core offload.
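A rough illustration of that "seamless" part, assuming nothing beyond the public Accelerate API: the caller writes plain array math and the framework decides where it actually runs (on current Apple Silicon, heavy vDSP/BLAS work is widely reported to land on the AMX units). The sizes and values here are arbitrary example data.

```swift
import Accelerate

// Sketch of leaning on Accelerate instead of a hand-rolled loop.
let n = 1_000_000
let a = [Float](repeating: 1.5, count: n)
let b = [Float](repeating: 2.0, count: n)

// One library call per operation; the framework picks the execution path.
let product = vDSP.multiply(a, b)          // element-wise a * b
let shifted = vDSP.add(1.0, product)       // element-wise + 1.0
let dot     = vDSP.dot(a, b)               // reduction across the whole vector

print(shifted[0], dot)
```

Nothing in that code names a unit or a core, which is the point: the offload happens below the API, not in the caller's scheduling decisions.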
The 'icing on the cake' is that offloading from the CPU also reduces the pressure to change the kernel schedulers: those specialized units are not scheduled directly by the OS. Apple keeps a single unified kernel from the HomePod to the Mac Pro. Does Apple "want to" spend more money on kernel development overhead? Probably not. APFS is great at single-SSD setups; high multiples of storage drives (via RAID and/or multi-drive volume management) is a path Apple deprecated. Laptops/mobile lead the charge for the file system, and that is highly likely to show up in other major subsystems of the OS as well.
Apple also explicitly stated that they want to natively run iOS apps on the Mac. Pragmatically that is an underlined emphasis on the unified approach (i.e., GPUs are going to get a substantive transistor budget). This is somewhat similar to why Intel/AMD are hesitant to drop old x86: the 32-bit software inertia keeps that 32-bit hardware functionality in their CPU packages. The huge inertia from iOS apps that Apple derives large revenues from is going to keep the GPUs in the M-series SoCs. (Unlike 32-bit apps, iOS apps are generally moving forward, so the chances of this inertia dying out over time are extremely low for the foreseeable future.)
Third, Apple doesn't place a high priority on being in the server business. There is no deep, driving "want" there. macOS Server is a bundle of some software, activating some cron jobs, and opening some ports. There are no substantive changes at the kernel level at all. Hardware-wise, the approach is just to press the Mac mini and Mac Pro into a solution. There are no 1U or 2U boxes. Apple walked away from that business over a decade ago.
Apple's foray into Xcode Cloud is likely going to follow the same path other macOS cloud vendors take: lots of minis and a relatively smaller percentage of Mac Pros. As long as the other cloud providers are constrained by the same end-user hosting density, Apple will continue to just sell more Macs. (Apple probably wants to sell more Macs rather than fewer.)
Apple does want to be in the single-user workstation business. But they don't necessarily have to take a server CPU to do that. When they make their own, they can choose their own priorities. For the last decade Apple has taken the single-CPU-package track: choosing Xeon E5 over E7, choosing Xeon W-3200 over Xeon SP, and if Apple is waiting on W-3300 for a deferred launch instead of going forward on Xeon SP Gen 3, that is basically the same want. Trading off slower time to market for higher single-threaded performance. There isn't a higher priority being applied to the ultimate possible core count.
Same transistor budget allocation impact. If single-threaded performance is an equal or higher objective than multi-threaded performance, then bigger caches (branch target, L1/L2/L3) are going to take transistor budget away from more cores (with smaller caches). Similarly for the caching that the AMX, NPU, and/or GPU cores need to be more effective. There is probably little want to sacrifice that just to win some CPU core count 'war'. Similar issues are why we probably won't get SMT later either (plus Apple cares very little about SATA storage performance).
Apple probably wants a more balanced SoC at the top of their offerings. The combination of multiple dimensions it does very well is the 'win' they likely want. If they had a 56-64 core (56P + 8E) SoC with the same (or higher) top-end single-threaded performance as the 4-8 core offerings, that would be something the "hand-me-down" server x86 packages don't really do (at least not without crazy-high TDP, which Apple has zero desire to go anywhere near; see the Perf/Watt objective). Apple will position that as a "we are doing a better job, so we replaced them" win.
Apple having a myopic 'want' to win the maximum general CPU core count war is deeply suspect.
Finally, one can point to increasingly narrow corner cases where the general CPU cores have to be used but the workload is still embarrassingly parallel. The problem there is that Apple explicitly doesn't want to be everything for everybody. Tackling a targeted, "good enough" coverage of workloads with more specialized cores is more Perf/Watt effective. The workloads inside that target keep getting larger performance wins as the designs evolve.
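For reference, the corner case being talked about looks roughly like the sketch below: independent chunks of CPU-only work farmed out with `DispatchQueue.concurrentPerform`, where throughput scales almost directly with general core count. The chunk count and the per-chunk math are placeholder values.

```swift
import Dispatch

// Sketch of an embarrassingly parallel, CPU-only workload: independent chunks,
// no shared state, scales with the number of general-purpose cores.
let chunks = 64
var partialSums = [Double](repeating: 0, count: chunks)

partialSums.withUnsafeMutableBufferPointer { buffer in
    DispatchQueue.concurrentPerform(iterations: chunks) { i in
        var sum = 0.0
        // Each iteration walks its own stride of the input; no cross-chunk communication.
        for x in stride(from: Double(i), to: 1_000_000, by: Double(chunks)) {
            sum += x.squareRoot()      // stand-in for the real per-chunk work
        }
        buffer[i] = sum                // each chunk writes only its own slot
    }
}

print(partialSums.reduce(0, +))
```

This is exactly the shape of workload where a maximum-core-count x86 package pays off, and exactly the kind of case the paragraph above argues Apple is content to cover only "well enough".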