I have no opinion on the rest of your post, because I don't know if 7% area saved would outweigh the costs of making two dies.
It almost certainly wouldn't. One thing I only obliquely referenced in that post is that die have to be packed or "tiled" onto a wafer, and this has consequences.
Like everyone else, TSMC uses 300mm dia. wafers. That works out to about 70,000mm^2 of area (pi * 150^2 is roughly 70,686), and I'll treat all of it as usable.
So how many M3 Max can you fit on one wafer? That's a hard question to answer because I haven't seen anyone reveal the actual measured size. I'm going to assume it's 500mm^2 because it should be a bit larger than past Max chips, and the M1 Max was supposedly about 430mm^2. Also, 500 makes the numbers for the first round of calculations nicer.
Speaking of, we'll ignore that the wafer is circular while the die is a rectangle. So we just divide 70000 by 500 and get a neat round number of 140 M3 Max per wafer. Let's further assume that the M3 Max is square. (It actually almost is, according to pixel counting, though the hidden Ultra Fusion interconnect will likely push it slightly more rectangular.) We have to lay the 140 die out in a grid, and since sqrt(140) is not a nice integer, let's bump the assumed wafer size slightly. That puts our baseline as 144 die in a 12*12 grid on a 72,000mm^2 square wafer. (Spherical cow, square wafer, same kinda thing...)
With that 144-die wafer as our baseline, how many extra die do we get once we chop 7% of the height?
Well... the answer is actually a big fat zero. A 500mm^2 die that's perfectly square is 22.36mm x 22.36mm, and the chop gives us 22.36 x 20.8 - we're saving 1.56mm of height. The total height saved is 12 rows * 1.56mm = 18.72mm. But we'd need to free up 20.8mm to add another row, so chopping 1.56mm per die gained us nothing.
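If you want to check that arithmetic yourself, here's a quick Python sketch of the square-wafer toy model. The function name and the tiny rounding guard are mine; the 500mm^2 die, the 12*12 grid and the 7% chop are just the assumptions from this post, not measured values.

```python
import math

def dies_on_square_wafer(side_mm, die_w_mm, die_h_mm):
    """Count whole dies on an idealized square wafer (no kerf, no edge loss)."""
    # The 1e-9 guard keeps floating-point noise from turning 12.0 into 11.999...
    cols = math.floor(side_mm / die_w_mm + 1e-9)
    rows = math.floor(side_mm / die_h_mm + 1e-9)
    return rows, cols, rows * cols

DIE_AREA_MM2 = 500.0   # assumed M3 Max area (a guess, see above)
CHOP = 0.07            # assumed fraction of die height removed

die_side = math.sqrt(DIE_AREA_MM2)   # about 22.36 mm for a square die
wafer_side = 12 * die_side           # the made-up 72,000 mm^2 square wafer

full = dies_on_square_wafer(wafer_side, die_side, die_side)
chopped = dies_on_square_wafer(wafer_side, die_side, die_side * (1 - CHOP))

# Smallest chop that frees up a whole 13th row in this toy model
min_chop = 1 - 12 / 13

print("full die    (rows, cols, total):", full)
print("chopped die (rows, cols, total):", chopped)
print(f"chop needed for a 13th row: {min_chop:.2%}")
```

In this toy model the 13th row only appears once the chop reaches 1 - 12/13 of the die height, which is a little under 7.7%. So a 7% chop lands just short of the threshold.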
Now obviously in the real world you're packing rectangles (not squares) onto a circle (not a square). So you lose a lot of area at the edges to the whole rectangle-packed-on-circle thing. There's also saw kerf laid out between the rows and columns because at some point you gotta singulate the wafer (cut it apart with a diamond saw), and so forth. Many complications I papered over. However, the basic geometric principle remains true: when you pack relatively large objects into a fixed area, slightly shrinking those objects may not increase the number you can fit at all.
(As die size goes down, the gains you get by reducing die size do start to more closely approximate what you'd naively expect just by dividing wafer area by die area.)
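For anyone who wants to play with the round-wafer version, here's a rough brute-force sketch, again in Python. It lays a regular grid of dies over a circle, tries a handful of grid offsets, and counts the dies that fit entirely inside. Everything in it is my own simplification: the function name, the kerf parameter, and the offset search are hypothetical, it ignores edge exclusion, notches, and reticle constraints, and it is not how a fab actually plans a wafer map.

```python
import math

def dies_on_round_wafer(wafer_d_mm, die_w_mm, die_h_mm, kerf_mm=0.0, steps=20):
    """Best whole-die count over a coarse search of grid offsets (toy model)."""
    r = wafer_d_mm / 2
    pitch_x = die_w_mm + kerf_mm   # die plus saw street, horizontally
    pitch_y = die_h_mm + kerf_mm   # die plus saw street, vertically
    nx = int(r // pitch_x) + 2     # enough columns to cover the wafer
    ny = int(r // pitch_y) + 2     # enough rows to cover the wafer
    best = 0
    for i in range(steps):
        for j in range(steps):
            off_x = pitch_x * i / steps
            off_y = pitch_y * j / steps
            count = 0
            for a in range(-nx, nx):
                for b in range(-ny, ny):
                    x0 = off_x + a * pitch_x
                    y0 = off_y + b * pitch_y
                    x1 = x0 + die_w_mm
                    y1 = y0 + die_h_mm
                    # A die fits if its corner farthest from the wafer
                    # center is still inside the circle.
                    fx = max(abs(x0), abs(x1))
                    fy = max(abs(y0), abs(y1))
                    if fx * fx + fy * fy <= r * r:
                        count += 1
            best = max(best, count)
    return best

# Assumed dimensions from this post: square 500 mm^2 die, then the 7% chop
full = dies_on_round_wafer(300, 22.36, 22.36)
chopped = dies_on_round_wafer(300, 22.36, 20.8)
naive = math.pi * 150 ** 2 / 500   # plain wafer area divided by die area

print("full:", full, "chopped:", chopped, "naive area/area:", round(naive, 1))
```

Try it with a much smaller die and a nonzero kerf_mm to see the count get closer to the naive area-divided-by-area figure, which is the point of the parenthetical above.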
But this specific argument makes no sense. There is no reason Apple couldn't sell both chopped M3 Maxes, and full M3 Maxes with some cores and a memory bus disabled, as 30-core M3 Maxes.
I think you're being "technically correct" here. I'm making economic / practical arguments, not talking about whether it's physically possible to do as you say.
The yield of perfect 40/40 core Maxes is bound to be fairly low as it's such an enormous die. They're going to have plenty of supply of binned-down 30c GPU Maxes just by running the wafers required to supply them with 40c GPU Maxes, so why spend all that extra money developing and manufacturing a second mask set? Especially if the chop can only be a very small one, meaning it doesn't even yield them many (or any) extra die per wafer?