What I said only applies to the CPU, so I'll focus on the CPU. The A4 is a pretty standard Cortex A8 inside. The A5 is a pretty standard Cortex A9. It was starting with the A6 that Apple began blowing away the competition (Cortex A15) with a smart architecture that focuses on wide, fast buses to achieve its excellent performance. I saw this as a very good architecture improvement over the Cortex A9 in the A5.
I was blown away by the A7, which seems to have pretty much doubled performance through pure architecture efficiencies with minimal clock speed improvement. The A8 is lacklustre, as they started running out of ideas/things to do on the architecture side and just relied on a die shrink to 20nm.
The A9 is again quite impressive, but they didn't achieve something magical like they did with the A7. Approximately half of the gains came from the architecture improvement, and the other half from the much higher clock speed enabled by the 14/16nm manufacturing process.
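To make the "half from architecture, half from clock" split concrete, here is a rough sketch. It assumes single-core performance is simply IPC times clock speed, and it measures each factor's share in log space so the two shares add up to 1. The function name and the numbers are made up for illustration; they are not measured A8/A9 figures.

```python
import math

def split_speedup(total_speedup, clock_old_ghz, clock_new_ghz):
    """Split a total speedup into a clock factor and an IPC factor.

    Assumes performance = IPC * clock, so total = ipc_gain * clock_gain.
    """
    clock_gain = clock_new_ghz / clock_old_ghz
    ipc_gain = total_speedup / clock_gain
    # Shares are taken in log space because the two gains multiply.
    log_total = math.log(total_speedup)
    return {
        "clock_gain": clock_gain,
        "ipc_gain": ipc_gain,
        "clock_share": math.log(clock_gain) / log_total,
        "ipc_share": math.log(ipc_gain) / log_total,
    }

# Hypothetical example: 1.69x faster overall, clock raised from 1.0 to 1.3 GHz.
result = split_speedup(1.69, 1.0, 1.3)
print(result)
```

With these made-up inputs, the clock and the architecture each contribute about a 1.3x factor, which is what an even split looks like when the gains multiply rather than add.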
Notice that the A6, the A7, and the A9 each brought *massive* architecture improvements? There is only so much improvement you can make to an architecture, and I feel Apple has reached a point where they've optimized so much that there will only be one more major optimization gain. Think about it: the A9X is getting pretty close to the performance of Intel i3 chips at a fraction of the power consumption. They've already done way better on the architecture side than I expected. Apple has already exhausted the traditional techniques for improving IPC, such as increasing memory interface width, adding bus lanes, moving to FinFET, etc. Intel's rate of improvement has been down in the 10% range for over 5 years already because they've run out of big things to improve on. Therefore, I think Apple is reaching the limits on the architecture side.
The other way to increase performance is to increase the clock speed, and smaller manufacturing processes let you do that without excessive heat and power consumption. But Apple is already at 14nm. There won't be too many more die shrinks. There will be 10nm, then 7nm, and below that it will get very, very hard due to basic physical limits on size and the increasing quantum effects on transistors as they get close to the size of an atom. We won't even reach 10nm for at least 2 years, and when we get 10nm, that'll very likely be Apple's second-to-last major CPU performance improvement. At that point, power consumption should be low enough that Apple can move from dual core to quad core, and this move will be the last major CPU performance improvement.
However, I can see Apple moving to quad core very soon, as early as the A10. Apple has gotten a die shrink with every generation so far, but this time they will be stuck on the same process and won't get a boost from a die shrink. I don't think they'll get a large enough boost in performance from architecture or clock speed (which is already very high), so they'll have to rely on more cores.