Agreed. Indeed, if you're just at the start of production for a new process (as is the case for the N3), it makes sense for lower-volume products to get the new process first (giving the process time to ramp up in production before it's directed to the higher-volume products, i.e., the iPhones).
Not all low volume is the same. A car maker that needs 800 wafers for relatively cheap car backup-camera processors is not a good candidate for a pipe cleaner.
You select a pipe cleaner die that can afford to take the hit on lots of defective dies. Either a relatively low-volume die with a very high price tag (and high margins to pay for more 'dead' defective silicon), or a much lower-priced die (with decent margins) whose run rate will be long enough to spread that 'dead die' cost over one, two, or three orders of magnitude more dies.
1,000 dies with a $1,500 profit margin yield the same total profit as 100,000 dies with a $15 profit margin.
TSMC gets paid for the wafer whether zero dies come out working or not. An N5 (and up) wafer runs around $17K. The more defective dies on that wafer, the more 'defect die' overhead each of the working dies has to carry. A $17K wafer that produces 10 working dies has a cost basis of $1,700/die. At 300 working dies per wafer, that drops to $57. If those 300 dies are being sold to end users at $47 ... that isn't a good pipe cleaner (unless there is some humongous cost recovery later when the cost structure shifts). In the 10-working-dies-per-wafer case, if the profit margin is $2K per chip, then the customer is still making $300 profit per die. That works as a pipe cleaner (both TSMC and the customer are making money. The customer could be making lots more, but the customer is not paying for TSMC's temporary yield issues.)
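The per-die math above can be sketched in a few lines. This is a hedged illustration only: the $17K wafer price and die counts come from the post itself, and are rough figures, not actual TSMC quotes.

```python
# Sketch of the 'pipe cleaner' economics: the fab is paid per wafer,
# so each working die absorbs the cost of its defective neighbors.
# Numbers are illustrative, taken from the post, not real fab pricing.

WAFER_PRICE = 17_000  # rough N5-class wafer price in USD (per the post)

def cost_basis_per_die(wafer_price: float, working_dies: int) -> float:
    """Cost each working die must carry when the wafer price is fixed."""
    return wafer_price / working_dies

# Early, low-yield run: 10 working dies per wafer
print(cost_basis_per_die(WAFER_PRICE, 10))    # 1700.0 per die

# Mature yield: 300 working dies per wafer
print(round(cost_basis_per_die(WAFER_PRICE, 300)))  # 57 per die

# A chip with a $2,000 margin still clears money even at 10 working
# dies per wafer; a $47 chip at that yield would be deep underwater.
print(2_000 - cost_basis_per_die(WAFER_PRICE, 10))  # 300.0 profit/die
```

This is why a high-margin, low-volume die works as a pipe cleaner: the margin pays for the temporary yield losses while the process matures.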
If the volume is too low, the wafer run rate will be too small to generate statistically significant data to drive process improvements. If one customer comes in, orders 4 wafers, and then disappears for 2 months, that isn't going to help much. A sustained, extended feedback loop is what cleans the 'pipe', not some short burst like pushing an actual pipeline 'pig' through a real physical pipe.
Conversely, it makes sense for the products with the simplest chips (the iPhones) to get the new microarchitecture first, and then for Apple to roll it out to successively more complicated chips*: M#, M# Pro/Max, M# Ultra—and, lastly, the M# "Extreme" (as it has been called here on MR, though I'd prefer M# Garuda). That's what they've been doing thus far.
First, that is different from 'pipe cleaning'. That is more about which one gets to tape out first and which gets the first set of extremely low-volume first-silicon dies for verification testing. Second, SoCs are far more modular than that. Currently the P-core complex consists of 4 cores clustered around a common L2 cache area. Getting to more P cores just means putting more 4-core complexes onto the design. Each group of cores really doesn't work significantly differently from the others. It is largely just more. (There are extremely narrow things like interrupt vectors that need to get longer with more cores/complexity, but that can be done in a significantly modular fashion too if you plan ahead in the design.)
The P (and E) clusters can also be 'chopped'. If planned ahead of time, it should be straightforward to create a 2-core P/E complex from a working 4-core complex, or to scale up from a working 2-core to a 4-core complex.
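The 'chop and scale' idea above can be pictured as composing a single verified building block. The sketch below is a toy model only: the class names, core counts, and cache sizes are illustrative assumptions, not Apple's actual design hierarchy.

```python
# Toy model of modular core complexes: a verified 4-core block is
# 'chopped' down for a phone SoC, or instantiated multiple times for
# bigger SoCs. All names and numbers here are hypothetical.

from dataclasses import dataclass

@dataclass(frozen=True)
class CoreComplex:
    cores: int          # cores sharing one L2 slice
    shared_l2_mb: int   # size of the shared L2

    def chop(self) -> "CoreComplex":
        """Derive a smaller (e.g. 2-core) complex from a working 4-core one."""
        return CoreComplex(self.cores // 2, self.shared_l2_mb // 2)

@dataclass(frozen=True)
class SoC:
    name: str
    p_complexes: tuple  # one or more instances of the same verified block

    @property
    def p_cores(self) -> int:
        return sum(c.cores for c in self.p_complexes)

P4 = CoreComplex(cores=4, shared_l2_mb=16)     # the verified building block
phone  = SoC("A-class",  (P4.chop(),))         # chopped 2-core variant
laptop = SoC("M-class",  (P4,))                # one full complex
big    = SoC("M-class big", (P4, P4, P4))      # just more of the same block

for soc in (phone, laptop, big):
    print(soc.name, soc.p_cores, "P cores")
```

The point of the sketch: scaling up is mostly instantiation of an already-verified block, not a redesign, which is why the bigger chips don't have to wait for lessons from the smaller ones.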
They likely have major sections of the SoC blocked off, with tons of design verification and bug testing done on those individual blocks before putting it all together. Get your parts working, then put the highly working parts together into a SoC and test that (which ideally surfaces mainly integration 'bugs' and not deep-seated problems in the individual building blocks).
There are more "uncore" parts in an Ultra than in a plain A-series. Those can be worked on in parallel because the SoC has a modular infrastructure.
So lots of this is done all at once, in a fashion very similar to the way the design of the A-series is pipelined: multiple SoC generations in R&D stages at the same time.
*That's because I envision the designers starting by laying out the A# chip microarchitecture and then, once that's completed, tackling the larger ones, rather than designing them all at once. Specifically, I assume the understanding they get by completing the A# chip microarchitecture informs their design of the M#, which in turn informs their design of the M# Pro/Max, and so on. But maybe that's not how it works....
Actively managing the complexity EARLY in the design process leads to better outcomes. If you have working (highly correct), functional 'lego' building blocks, you can build stuff much more easily than if you try to build bigger things out of brand-new (less tested), even bigger blocks (with lots of redundant functionality).
The A-series and M-series sharing the same building 'blocks' saves time, effort, and money. You can't really do extremely highly shared blocks if you completely ignore the eventual contexts where the blocks are going to need to go.