I know, I know! But ... you falsely assume that, just because the A17 is 3nm and a new design, it must realize the total max possible 18% speed increase from the die shrink plus some extra speed increase from design changes. You do not take into account that new GPU features and power reduction are also valuable design targets.
That really isn't the 'design rules'. The design rules are more like constraints on layout and timing. The A17 has a higher-level, more abstract definition. That gets mapped down into actual gate layouts in a specific floorplan (how the different function units are arranged on the die). If you are operating within a related set of process nodes (e.g., the N5 family of compatible design rules), that mapping-down process generally works for any member of the set. So port costs are lower. You don't have to do extensive floorplan remapping, make major connectivity changes, or chase major timing issues. The design still needs to be revalidated, and some 'quirks' probably need to be cleaned up.
But the architecture design can be moved to another 'family' of process nodes. The die will probably end up a different size. It probably needs a substantively adjusted floorplan, and there are probably internal network issues to iron out. But the logical architecture of the function units doesn't have to radically change.
For example, AMD has a roadmap that has Zen 5 on both TSMC N4 and N3.
To Infinity Architecture, and beyond
www.tomshardware.com
TSMC N4 and N3 are not design-rule compatible. (N3E and N4 do match up on the design issues/constraints around SRAM/cache, though. That subsystem isn't seeing a major change. Same with I/O.) The N3 version of Zen 5 more likely shows up in a later-stage APU, which has a different lifecycle than the server chiplets (which are also used in desktop) will see.
They have also put some earlier "Vega" GPU implementations onto 'new family' nodes. It is a matter of how much longer they are going to sell something, and so whether the additional costs are worth it... not whether it is possible.
Whether Apple skews their design toward taking max performance/clock out of N3 or toward max power saving is a different matter than the underlying design rules. Whichever of those design skews Apple chooses, the underlying fab process design rules are exactly the same. It is more a matter of which higher-level design trade-offs Apple wants to make.
AMD is an example once again. The Zen 4 and Zen 4c cores have the same N5-family design rule constraints, but the AMD designs shoot for different densities. The higher-level choices in the 4c and 5c designs limit how much of the maximum clock they can extract from the fab process. They 'give up' the max single-thread drag-racing option to roll out in a smaller die area. The process design rules didn't 'change'. Same rules, just tweaking the high-level design to realize the different trade-offs the rules enable.
The A14, A15, and A16 resulted in gradually bigger and bigger die sizes on the N5-family nodes used. The cost of N3 wafers is substantively higher. So it is pretty likely the team got put on a 'diet' in terms of the die bloat allowed on this iteration. (N3E is likely going to have similar issues because its SRAM is backsliding to N5 size levels.)
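To put rough arithmetic behind the 'diet' point, here is a minimal sketch using the common first-order dies-per-wafer approximation. Everything numeric in it is a placeholder: the die areas and the relative wafer costs are hypothetical values picked only to show the direction of the effect, not actual Apple or TSMC figures, and the function names are made up for illustration. It also ignores yield, scribe lines, and edge exclusion.

```python
import math


def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """First-order estimate of gross dies per wafer.

    Usable wafer area divided by die area, minus a correction for the
    partial dies lost around the wafer edge. Ignores yield and scribe lines.
    """
    radius = wafer_diameter_mm / 2
    whole_area_term = math.pi * radius ** 2 / die_area_mm2
    edge_loss_term = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(whole_area_term - edge_loss_term)


def cost_per_die(wafer_cost, die_area_mm2):
    """Wafer cost spread over the gross die count (no yield modelling)."""
    return wafer_cost / dies_per_wafer(die_area_mm2)


if __name__ == "__main__":
    # Hypothetical inputs: relative wafer cost (older node = 1.0) and die area.
    scenarios = {
        "older node, smaller die": (1.00, 90.0),
        "newer node, same die size": (1.25, 90.0),
        "newer node, bloated die": (1.25, 110.0),
    }
    for label, (wafer_cost, area) in scenarios.items():
        print(f"{label}: {dies_per_wafer(area)} dies, "
              f"relative cost per die {cost_per_die(wafer_cost, area):.5f}")
```

The point is only that a pricier wafer and a bigger die compound: each one raises cost per die on its own, so letting the die keep growing on a more expensive node hurts twice.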