You're assuming that R is the same sort of thing as A, M and S. I don't know that that's true. A better analogy might be something like the U or H, in that it's doing a very different sort of job and, while packed full of logic, that logic does not correspond to what we'd think of as a CPU or a GPU.
I'm not assuming the R1 is a minor 'cousin' at all.
It isn't going to run 'end user' apps, so the OS and library stack don't have to be 100% the same as iOS or macOS. But does it 'boot' differently? Probably not (the M-series has high overlap with the A-series initial boot).
It also reportedly uses a different type of memory, which likely means the memory controller has substantive differences.
Apple's Vision Pro headset will use a new type of dynamic random access memory, or DRAM, that has been custom designed to support Apple's R1...
www.macrumors.com
However, most FPGAs have a CPU core or a few. The likelihood that it has zero CPU cores on it is pretty small. As for a GPU: a complex 3D app-rendering GPU? No. But it is extremely likely the R1 handles both camera input and all headset end-user output in one chip to minimize latency (and power). There is about zero good reason to ship all that camera data back to the M2.
Primarily because the M2 really doesn't have particularly good input streams back into the SoC. The plain M2 package has four x1 PCIe v4 lanes and possibly some Thunderbolt 3 x4 bandwidth to use. It is really geared to stream very low-latency, high-bandwidth data 'out' rather than 'in'.
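A quick back-of-the-envelope illustration of why pushing the raw sensor streams into the M2 looks unattractive. The camera mix, resolutions, and frame rates here are my own guesses, not Apple specs, but even modest numbers eat a big slice of those four inbound lanes before protocol overhead or anything else gets a turn:

```swift
import Foundation

// Back-of-the-envelope only -- camera mix, resolutions, and frame rates
// below are assumptions for illustration, not Apple's published specs.
struct CameraStream {
    let name: String
    let width: Int
    let height: Int
    let bitsPerPixel: Int
    let framesPerSecond: Int

    var gigabitsPerSecond: Double {
        Double(width * height * bitsPerPixel * framesPerSecond) / 1e9
    }
}

// Assumed sensor mix: two high-res passthrough cameras plus several
// lower-res tracking cameras and a depth/IR sensor.
let streams = [
    CameraStream(name: "passthrough L", width: 3_660, height: 3_200, bitsPerPixel: 12, framesPerSecond: 90),
    CameraStream(name: "passthrough R", width: 3_660, height: 3_200, bitsPerPixel: 12, framesPerSecond: 90),
    CameraStream(name: "4x tracking",   width: 1_280, height: 1_024, bitsPerPixel: 10, framesPerSecond: 4 * 90),
    CameraStream(name: "depth/IR",      width: 640,   height: 480,   bitsPerPixel: 16, framesPerSecond: 30),
]

let cameraGbps = streams.reduce(0.0) { $0 + $1.gigabitsPerSecond }

// PCIe 4.0 carries roughly 16 Gb/s per lane before protocol overhead.
let inboundGbps = 4.0 * 16.0   // the four x1 lanes mentioned above

print(String(format: "camera input:  ~%.0f Gb/s sustained", cameraGbps))
print(String(format: "4x PCIe4 x1:   ~%.0f Gb/s theoretical", inboundGbps))
```

And that is just bandwidth; it says nothing about the extra round-trip latency of hauling frames across a link and back.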
Furthermore, most generic iPad apps just work (hovering in a window pane). So the M2 just has to ship out the hovering window's video stream, and the R1 can do all the merging with reality for the final 'frame'.
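A crude sketch of that split, with entirely made-up types and function names, just to illustrate the idea of the M2 producing app layers while the R1 owns the passthrough-plus-composite step:

```swift
import simd

// Hypothetical division of labor -- names, types, and fields are illustrative only.

struct AppWindowLayer {            // produced on the M2: app UI rendered into a pane
    var pixels: [UInt32]
    var poseInWorld: simd_float4x4 // where that pane hovers in the room
}

struct DisplayFrame {              // what ultimately goes to the headset panels
    var pixels: [UInt32]
}

// M2 side: render only the app content, then ship the layers across.
func m2RenderAppLayers() -> [AppWindowLayer] {
    // real work would render each visible window into a texture
    return []
}

// R1 side: take the freshest passthrough frame plus the app layers,
// reproject to the latest head pose, and blend for the displays.
func r1Composite(passthrough: DisplayFrame,
                 layers: [AppWindowLayer],
                 headPose: simd_float4x4) -> DisplayFrame {
    var out = passthrough                     // start from "reality"
    for layer in layers {
        _ = headPose * layer.poseInWorld      // reproject the pane into view (stub)
        // ...alpha-blend the reprojected pane over `out.pixels` here...
    }
    return out
}
```

The app never touches the camera frames, and the camera frames never have to leave the R1.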
Does the R1 need PCIe v4 to connect to Wi-Fi/Bluetooth or Ethernet? No. To connect to external video out for 'mirroring' (USB-C)? Not necessarily (someone else, a non-operator 'watching over the shoulder' of the operator, does not have the same latency constraints).
Does not running random end-user apps separate it from the A, M, and S? I don't think that is a direct physical attribute of the die, so ... no. There are probably substantively different die area allocations for the subcomponents. But the R1 is likely not as myopically single-purpose as the H and U, and the 'firmware' running on the die is likely far larger, more complex, and more diverse.
We really have no idea what the R1 actually does.
Do we know all of it in exacting, precise detail? No. Some of it, though, is relatively obvious from Apple's low-on-details commentary. Does the R1 control a hefty portion of the camera output (to the internals; SoC input)? Yes. Apple made pretty direct comments that there is very substantial AI/ML inferencing going on to track the hands via the cameras. So that is highly likely an R1 job (just send mouse-like input data to the M2 as necessary).
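To make "just send mouse input data" concrete: if the R1 does the hand-tracking inference itself, what crosses over to the M2 can be a tiny pose/gesture event instead of the camera frames. The struct below is purely hypothetical, not anything Apple has documented:

```swift
// Hypothetical event the R1 could forward after doing hand-tracking
// inference locally -- a few tens of bytes per event instead of
// multi-megabyte camera frames.
struct HandInputEvent {
    enum Gesture: UInt8 { case none, pinchBegan, pinchEnded, drag }
    var timestampNanos: UInt64       // device clock
    var chirality: UInt8             // 0 = left hand, 1 = right hand
    var gesture: Gesture
    var pinchPosition: SIMD3<Float>  // where the pinch maps into the scene
}

print(MemoryLayout<HandInputEvent>.stride)   // ~32 bytes per event
```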
Does the R1 do the inferencing that turns real-world objects into some digital representation? Very probably yes. It is going to be way cheaper to ship relatively sparse geometric object data back to the M2 than to do a round trip just to get that info back for what the R1 needs to do anyway.
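"Way cheaper" in rough numbers, assuming a made-up mesh density and sensor resolution (none of these figures are Apple's):

```swift
// Back-of-the-envelope comparison with assumed numbers, not Apple's:
// shipping reconstructed geometry vs. shipping raw sensor frames.

// A scene-reconstruction mesh chunk: vertex positions plus triangle indices.
let vertices  = 20_000                        // assumed density for a chunk of a room
let triangles = 40_000
let meshBytes = vertices * 3 * 4 +            // float3 positions
                triangles * 3 * 4             // uint32 indices
// Sent only when the reconstruction actually changes.

// One raw depth frame plus one color passthrough frame (assumed resolutions).
let depthBytes = 640 * 480 * 2                // 16-bit depth
let colorBytes = 3_660 * 3_200 * 3            // 24-bit RGB
let perSecond  = (depthBytes + colorBytes) * 90   // at 90 fps, continuously

print("mesh chunk:  \(Double(meshBytes) / 1e6) MB, occasional")
print("raw frames:  \(Double(perSecond) / 1e6) MB/s, continuous")
```

Roughly three-quarters of a megabyte now and then versus gigabytes per second all the time.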
Is the R1 running the mirrored rendering of the eyeballs on the exterior-facing screen? Maybe a coin toss. It would make sense.
In short, it needs a different die area allocation ... more image processing, more NPU, and less CPU and 3D GPU (reality is already 3D ... it doesn't need to be generated to be 3D).
It seems like a combination of fancy ISP (modify the incoming images) and Display Controller (tweak the outgoing images). Reuse in an Apple car seems plausible, but who knows?
Microsoft HoloLens ... two computational packages ... Splitting off the real from the generated/artificial has been done before. The balance Apple struck is likely different, but splitting off the 'blending' between real and generated just makes sense.
Is it mostly "just" bigger versions of existing Apple IP? Or are there important and difficult algorithms embedded in there that we don't know about?
Bigger versions of existing Apple IP have been their game plan for over a decade. Why would they change now?
It isn't like their image subsystems are 'bad'. Or the NPU is 'bad'. The allocation of multiple P or E CPU clusters has more to do with random applications showing up that need more horsepower. If you toss out the random apps, then you can probably put a cap on it and add just enough to get the job done (e.g., like the Watch S-series SoCs).
With the substantially larger amount of AI/ML inference they are doing, that area budget probably needs to go up (more compute), but do they really need entirely 'new' ones? (Apple basically has three variations already: the NPU, AMX, and some tweaks to the GPU cluster subsystem. Is something here really going to miss all three of those, plus whatever the image subsystem has for predigesting for focus/composition inference?)
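Those existing variations are already all reachable from one programming surface today, which is part of why reusing them looks more likely than inventing something wholly new. The snippet below just shows the stock Core ML knob for steering a model at the Neural Engine, GPU, or CPU (the AMX-style matrix hardware is reached through Accelerate); it illustrates the existing options, not anything about what the R1 firmware actually uses:

```swift
import CoreML

// Stock API: the same compiled model can be steered at the existing
// inference silicon without any new programming model.
let config = MLModelConfiguration()
config.computeUnits = .all                    // let Core ML pick NPU / GPU / CPU
// config.computeUnits = .cpuAndNeuralEngine  // or pin it away from the GPU
```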
Crazy idea (for example) – would it make sense for Apple to sell an Apple Camera as a high-end DSLR, reusing, eg the R1 as part of the magic to make this a camera prosumers would lust after (and pay more for than a usual DSLR)???
Maybe DSLR-like in terms of stills – but also Red-like in terms of video...
Apple already sells lots of cameras. They have already said that they will be rolling out approximated 3D (spatial) capture on the iPhone Pro models later this year.
" ... Coming later this year, iPhone 15 Pro will add a new dimension to video capture with the ability to record spatial video for Apple Vision Pro. Users will be able to capture precious moments in three dimensions and relive those memories with incredible depth on Apple Vision Pro when it is available early next year in the U.S. ..."
www.apple.com
Zero need to shift into also making (and/or certifying) interchangeable lenses ... they are already shipping spatial cameras by the millions now.
The bigger 'gap' Apple isn't addressing is how to get some of this leading-edge camera stuff down onto the Mac family of products. They already have tens of millions of handheld cameras covered. (There is a pretty good chance the next iteration of the iPad Pro has those covered also.)
I don't think this is going to be a good match for the R1 die either. What it excels at is the number of camera inputs, rather than better pictures from just one camera (or switching between two/three). It would be surprising if, with the scope narrowed down to 1-3 camera inputs, the R1 did anything substantially better than the rest of the Apple silicon family of SoCs. Bulk inputs are what the other ones don't cover (very energy-efficiently, at max Perf/Watt).
P.S. If the R1 handles audio, it might make more sense in a "Studio Display" product (that had more than one camera) than tossing an Axx in there. The R1 doesn't run end-user apps, and the 'display docking station' products really do not need to either.