Maynard, have you seen these recently published patents on prediction?
Would be curious to hear your analysis and how these relate to the state of the art.
Woah, those are really strange! I need some time to read through them fully and figure out what they mean.
As is frequently the case, there seem to be two separate ideas bundled into one patent. (I don't think this is malicious on anyone's part; it's just that engineers have limited time to talk to the lawyers, and the lawyers are not competent or confident enough to split a dump of "here's how we changed X" into two or more separate concepts.)
So the more obvious idea is to add a dedicated "fully-biased" predictor to the Prefetch and Address Generation stages of the front end, to catch the (surprisingly common) cases of branches that always go in the same direction [e.g. things like "if (divisor != 0)" or "if (error condition)"].
This probably allows these address generation stages to be a little larger, and saves some power (we don't need to run the TAGE machinery after address generation to confirm the prediction). It's "fairly obvious" (which doesn't mean I thought of it! It means that once I see it, it's clearly "oh yeah, that makes sense, neat idea").
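To make the first idea concrete, here is a minimal Python sketch of a fully-biased filter, assuming a simple per-PC table: a branch is only "claimed" after it has resolved the same way a threshold number of times, and it is permanently handed back to the full predictor the first time it goes the other way. The names (FullyBiasedPredictor, confidence_threshold, the table layout, the threshold of 8) are hypothetical illustrations, not the mechanism described in the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class BiasEntry:
    direction: bool        # the only direction this branch has been seen to go
    count: int = 0         # saturating count of resolutions in that direction
    demoted: bool = False  # set once the branch has been seen going both ways


class FullyBiasedPredictor:
    """Hypothetical front-end filter for branches that always go one way.

    predict() returns a direction only for branches it is confident about;
    None means "defer to the full (TAGE-style) predictor".
    """

    def __init__(self, confidence_threshold: int = 8) -> None:
        self.threshold = confidence_threshold
        self.table: Dict[int, BiasEntry] = {}

    def predict(self, pc: int) -> Optional[bool]:
        e = self.table.get(pc)
        if e is None or e.demoted or e.count < self.threshold:
            return None
        return e.direction

    def update(self, pc: int, taken: bool) -> None:
        e = self.table.get(pc)
        if e is None:
            self.table[pc] = BiasEntry(direction=taken, count=1)
        elif e.demoted:
            return  # once seen going both ways, never claimed again
        elif taken == e.direction:
            e.count = min(e.count + 1, self.threshold)
        else:
            # First time going the other way: demote the entry. If it was
            # already being predicted, this resolution was a misprediction.
            e.demoted = True


def simulate(trace: List[Tuple[int, bool]], predictor: FullyBiasedPredictor) -> int:
    """Count how many branch instances still needed the full predictor."""
    full_lookups = 0
    for pc, taken in trace:
        if predictor.predict(pc) is None:
            full_lookups += 1  # stand-in for running the TAGE machinery
        predictor.update(pc, taken)
    return full_lookups


if __name__ == "__main__":
    # 0x100: an "if (error condition)" that is never taken;
    # 0x200: a data-dependent branch that alternates.
    trace = [(0x100, False)] * 20 + [(0x200, i % 2 == 0) for i in range(20)]
    print(simulate(trace, FullyBiasedPredictor(confidence_threshold=8)))  # 28
```

In this toy trace the never-taken branch needs the full predictor only for its first 8 resolutions, while the alternating branch always falls through to it; the skipped lookups are where the power savings would come from.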
Along with that there's a very weird idea, unlike anything I have ever seen discussed anywhere else, for handling mispredicted branches. I really have to think hard about how it might work; right now I have no real idea.
One possibility is that it implements an idea I have suggested elsewhere: snipping out short segments of mispredicted code. Consider something like
if(condition){ three instructions }
The traditional prediction/fetch mechanism would, at address generation, effectively either predict the condition true and load the trace of the three instructions, so the instruction stream looks like {conditional branch, three instructions, next instruction},
or predict it false and not load the trace, so the instruction stream looks like {conditional branch, next instruction}.
But that's suboptimal for an Apple-type design! It's better not to care about these sorts of branches (a conditional branch followed by up to about 3 or 4 instructions) at fetch time. Just generate the instruction stream as {conditional branch, three instructions, next instruction} and let TAGE (which is a much better predictor than the Fetch Address predictor) decide the direction of the branch. Then, if the three instructions are not to be executed, mark them as NOPs and drop them in the transfer between the Instruction Queue and Decode.
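The "fetch it anyway, NOP it later" idea can be sketched as a toy Python model, assuming a fixed shadow limit of 4 instructions and treating instructions as plain strings. ShortIf, fetch, resolve_shadow, to_decode and MAX_SHADOW are hypothetical names, and recovery when TAGE itself gets the condition wrong is not modeled.

```python
from dataclasses import dataclass
from typing import List

MAX_SHADOW = 4  # assumed limit: "up to about 3 or 4 instructions"


@dataclass
class Instr:
    text: str
    in_shadow: bool = False  # part of the short conditional body
    nop: bool = False        # marked for removal before Decode


@dataclass
class ShortIf:
    """if (condition) { body } followed by next_instr."""
    branch: str
    body: List[str]
    next_instr: str


def fetch(block: ShortIf) -> List[Instr]:
    # Fetch ignores the branch direction and always brings in the body,
    # which is only allowed when the body is short.
    if len(block.body) > MAX_SHADOW:
        raise ValueError("body too long for this scheme")
    return ([Instr(block.branch)]
            + [Instr(t, in_shadow=True) for t in block.body]
            + [Instr(block.next_instr)])


def resolve_shadow(stream: List[Instr], condition_predicted: bool) -> None:
    # Later, the stronger predictor supplies the condition; if the body
    # should not execute, mark it as NOPs instead of redirecting fetch.
    if not condition_predicted:
        for ins in stream:
            if ins.in_shadow:
                ins.nop = True


def to_decode(stream: List[Instr]) -> List[str]:
    # Instruction Queue -> Decode transfer drops the NOP-marked entries.
    return [ins.text for ins in stream if not ins.nop]


if __name__ == "__main__":
    block = ShortIf("cond_branch", ["i1", "i2", "i3"], "next")
    for cond in (True, False):
        stream = fetch(block)
        resolve_shadow(stream, condition_predicted=cond)
        print(cond, to_decode(stream))
    # True ['cond_branch', 'i1', 'i2', 'i3', 'next']
    # False ['cond_branch', 'next']
```

The design point is that fetch never has to redirect for these short bodies; the only cost of a "skip" decision is a few queue slots that get squeezed out before Decode.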
I don't believe Apple (or anyone else) does this yet, though they should! It's POSSIBLE that the mechanism they describe essentially does something like this. It seems more heavyweight, with more overhead and energy cost, than what I have described, but maybe there are some advantages that balance that? Anyway, I need to look at this much more carefully.
How did you discover these? I have my ways of finding new technical patents (basically following up on the names on previous patents), but it's not very streamlined or efficient at catching new patents as they come in, so sometimes it's a year before I find something new.
And doing a search for all "new" Apple patents is hopeless, you get lost in a stream of design patents, or OS patents, or just general stuff of much less interest to me.
There is a new edition of my PDFs coming out soon. (The goal is to release it just before the next iPhone event; we'll see. The main new material is a new GPU volume, plus a quick update to the previous 5 volumes covering every new patent I've discovered in the past year; the two patents above will of course go into that!)