
Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Not sure what is the relevance of this?
I thought it showed that out-of-order RISC-V SoCs have similar performance to out-of-order ARM SoCs.

ARM64 is designed to pack a lot of information into a single instruction. Many ARM64 instructions can be efficiently decoded and scheduled as multiple operations on a modern OOO core. In other words, the goal is to maximize the useful information per instruction.
Can you give an example of such an instruction?

RISC-V authors explicitly mention operation fusion to solve this problem - the CPU will analyze instruction sequences and combine them into a single operation that can be executed more efficiently. But this is not free as it requires power and complex logic. Basically, the idea is that by pursuing extreme simplicity RISC-V might have sabotaged its ability to scale.
Do you mean this?
 

leman

macrumors Core
Oct 14, 2008
19,518
19,664
Can you give an example of such an instruction?

I can try! Please note that I am not a hardware person and what I am describing here might be amateurish and naive. I hope that the real experts who read this will correct me should I make blunders.

So, here is the example. There is a very common pattern when loading or storing data. It occurs when you are accessing data from an array. In C, it could be something like a[b], which means "get the b-th element of the array a". From the CPU perspective, this involves taking the value of b, multiplying it by the size of the array element and adding the resulting offset to the base address of a. Let's say we are working with 32-bit integers, so our multiplier is 4. Then the operation we want to perform is to load a 32-bit word at the address (a + 4*b). Let's see how this looks in ARM64 and RISC-V (note: I have replaced the architectural registers with the names a and b to make it clearer):

Code:
-- ARM64
ldr     result, [a, b, sxtw #2]   -- load the value at address a + b*4

-- RISC-V
slli    b, b, 2                   -- shift b left by 2 (same as multiplying by 4)
add     a, a, b                   -- add a and b and store the value in a
lw      result, 0(a)              -- load the word at address a

As you can see, this is one instruction in ARM64 and three instructions in RISC-V. The latter follows the RISC idea of "one instruction = one operation", so every individual operation has its own instruction. ARM, however, has special forms for frequent operations like these and allows you to encode address computations with common multiplication factors in one instruction.
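To make the difference in side effects concrete, here is a minimal Python sketch of the two sequences. It is a toy model under stated assumptions, not a description of real hardware: registers and memory are plain dictionaries, the function names are made up for illustration, and the sign extension done by sxtw is ignored (the index is assumed to be non-negative).

```python
def arm64_ldr_scaled(mem, regs, dst, base, index, shift):
    """Toy model of `ldr dst, [base, index, sxtw #shift]`.

    One instruction; only the destination register changes.
    Assumption: the index is non-negative, so sign extension is a no-op.
    """
    addr = regs[base] + (regs[index] << shift)
    regs[dst] = mem[addr]
    return 1  # number of instructions executed


def riscv_shift_add_load(mem, regs, dst, base, index, shift):
    """Toy model of the slli / add / lw sequence from the post.

    Three instructions; base and index are overwritten along the way.
    """
    regs[index] = regs[index] << shift       # slli index, index, shift
    regs[base] = regs[base] + regs[index]    # add  base, base, index
    regs[dst] = mem[regs[base]]              # lw   dst, 0(base)
    return 3  # number of instructions executed


# Hypothetical array of eight 32-bit integers starting at address 0x1000.
memory = {0x1000 + 4 * i: 10 * i for i in range(8)}

arm_regs = {"a": 0x1000, "b": 5, "result": 0}
rv_regs = dict(arm_regs)

arm_count = arm64_ldr_scaled(memory, arm_regs, "result", "a", "b", 2)
rv_count = riscv_shift_add_load(memory, rv_regs, "result", "a", "b", 2)

print("ARM64 :", arm_count, "instruction(s), registers:", arm_regs)
print("RISC-V:", rv_count, "instruction(s), registers:", rv_regs)
```

Both versions load the same element, but in the RISC-V model a and b no longer hold the base address and the index afterwards. A compiler would normally use temporaries for that reason, which costs registers instead.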

Since this kind of addressing modification is very common, a CPU might choose to implement it in hardware, as part of the load/store unit. You can have a dedicated adder + shifter that directly combine the values as they come in and produce the final address. The hardware overhead is very small and you have a big win in energy efficiency and latency. You also free up the general-purpose ALUs in the integer unit as well as the precious dependency-tracking resources.

Now, in an ARM CPU this can be done immediately, since all the information you need to feed to this address computation unit is already there. The decoder can trivially extract it and schedule it as one convenient package. And this is also how this stuff works on modern ARM CPUs — if you look at instruction timings, M1 does not need longer for more complex addressing modes.

With RISC-V, things get tricky. The basic implementation will use the regular integer ALUs for the address computation, scheduling and tracking three dependent instructions. If you want to use the optimisation of a specialised address computation unit, your decoder needs to be able to detect these kinds of instruction sequences, analyse the dependencies between them and collapse them into a single "address compute + load" instruction of the kind ARM already has. This is of course possible and routinely done, but introduces additional complexity and overhead in the decoder stage. In some ways you are also going back from fixed-size instructions to unpredictable variable-size instructions (even worse, since you have to analyse dependencies!). The other non-trivial aspect of the story is that the RISC-V instruction sequence has visible side effects, as it modifies the values of the registers a and b. So even if you do the fusion so that the execution itself is efficient, you need to track additional data dependencies. What's more, most of the time you don't care about these intermediate values, which means you will waste registers, reducing the number of data slots available for your algorithm (sure, RISC-V has a lot of registers, but there are limits).
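As a rough illustration of what such a fusion check involves, here is a small Python sketch of a pass that scans a decoded instruction stream for the slli / add / lw pattern. Everything in it is an assumption made for illustration (the Insn and FusedLoad types, the field names, the three-instruction window); real decoders do this in hardware with very different constraints. The point is only to show the dependency checks and the bookkeeping for the intermediate register writes.

```python
from typing import NamedTuple, Tuple


class Insn(NamedTuple):
    """Hypothetical decoded RISC-V instruction."""
    op: str
    rd: str
    rs1: str
    rs2: str = ""
    imm: int = 0


class FusedLoad(NamedTuple):
    """Hypothetical fused "address compute + load" macro-op."""
    rd: str
    base: str
    index: str
    shift: int
    extra_writes: Tuple[str, ...]  # intermediates that must still be written


def try_fuse(window):
    """Return a FusedLoad if the three instructions form the pattern, else None."""
    sh, ad, ld = window
    if (sh.op, ad.op, ld.op) != ("slli", "add", "lw"):
        return None
    # The add must consume the shifted index exactly once.
    if (ad.rs1, ad.rs2).count(sh.rd) != 1:
        return None
    # The load must use the sum as its address, with no extra offset.
    if ld.rs1 != ad.rd or ld.imm != 0:
        return None
    base = ad.rs2 if ad.rs1 == sh.rd else ad.rs1
    # Intermediate results stay architecturally visible unless the load
    # destination overwrites them.
    extra = tuple(sorted({sh.rd, ad.rd} - {ld.rd}))
    return FusedLoad(ld.rd, base, sh.rs1, sh.imm, extra)


def fuse_indexed_loads(insns):
    """Greedy left-to-right pass over the instruction list."""
    out = []
    i = 0
    while i < len(insns):
        window = insns[i:i + 3]
        macro = try_fuse(window) if len(window) == 3 else None
        if macro is None:
            out.append(insns[i])
            i += 1
        else:
            out.append(macro)
            i += 3
    return out


# The sequence from the post: a and b are clobbered, so the fused op
# still has to produce both intermediate values.
from_post = fuse_indexed_loads([
    Insn("slli", "b", "b", imm=2),
    Insn("add", "a", "a", "b"),
    Insn("lw", "result", "a", imm=0),
    Insn("add", "sum", "sum", "result"),
])

# A variant that reuses one temporary as the load destination:
# no intermediate value survives, so the fusion is clean.
with_temp = fuse_indexed_loads([
    Insn("slli", "t0", "b", imm=2),
    Insn("add", "t0", "a", "t0"),
    Insn("lw", "t0", "t0", imm=0),
])

print(from_post)
print(with_temp)
```

In the first case the fused operation still carries two extra register writes, which is the additional dependency tracking mentioned above. In the second case nothing extra is left over, but only because the code was written in a fusion-friendly way.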

And there are a lot of things like that. ARM has instructions that allow you to load two registers at once, automatically modify an indexing register, shift an instruction operand and other things. This adds complexity but also allows opportunities for more efficient hardware execution of common patterns. RISC-V on the other hand puts the burden of making common patterns fast entirely on the CPU. Which brings me to the original point: ARM64 requires a more complex CPU to begin with, but it comes with features that make the CPU's life easier when things have to go really really fast. RISC-V instead can be implemented on a much much simpler computer, but it requires progressively more complex logic to make things go really really fast.

Which, by the way, is confirmed by the latest experiments in code density. This excellent writeup shows that on mostly integer programs, compressed RISC-V code is approximately 10-15% smaller than ARM64 code (which is great for RISC-V), but the actual number of instructions is 10% higher! That is, compressed RISC-V manages to save 10-15% on instruction cache space but has to pay for this by having to decode and execute 10% more instructions! That's not a good trade-off IMO. And that's mostly integer code, where RISC-V has the most advantage. This means that if you are pursuing a state of the art ultra-high single-core performance design, you will need more decoders + complex sequence and dependency detection logic for instruction fusion just to maintain parity with ARM64. This can be done, absolutely, but it's an extra cost that has to be accounted for.
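A quick back-of-the-envelope check of what those two percentages imply together, as a small Python sketch. The only inputs are the figures quoted above plus the fact that ARM64 instructions are always 4 bytes; the function names are made up, and the "share of compressed instructions" line additionally assumes that only 2-byte and 4-byte RISC-V encodings are in use.

```python
ARM64_BYTES_PER_INSN = 4  # A64 uses a fixed 32-bit encoding


def riscv_avg_bytes_per_insn(size_ratio, insn_ratio):
    """Average RISC-V instruction size implied by the two ratios.

    size_ratio: RISC-V code size relative to ARM64 (0.85 means 15% smaller)
    insn_ratio: RISC-V instruction count relative to ARM64 (1.10 means 10% more)
    """
    return ARM64_BYTES_PER_INSN * size_ratio / insn_ratio


def compressed_share(avg_bytes):
    """Share of 2-byte instructions, assuming only 2- and 4-byte encodings."""
    return (4 - avg_bytes) / 2


low = riscv_avg_bytes_per_insn(0.85, 1.10)
high = riscv_avg_bytes_per_insn(0.90, 1.10)

print(f"average RISC-V instruction size: {low:.2f} to {high:.2f} bytes")
print(f"implied share of compressed instructions: "
      f"{compressed_share(high):.0%} to {compressed_share(low):.0%}")
```

So under these numbers the average RISC-V instruction is a bit over 3 bytes, which is where the cache saving comes from, while the instruction count the decoders have to handle goes up.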

In the end, what I am trying to say is that RISC-V is not some magical super-innovative panacea people make it out to be. It's an ISA with its set of strengths and weaknesses, and that's it. Personally I think it's great for microcontrollers and other specialised hardware, but I don't find it even remotely interesting for general purpose computing. There is just no added value.


I thought it showed that out-of-order RISC-V SoCs have similar performance to out-of-order ARM SoCs.

I don't think it shows much except that RISC-V is good for microcontrollers where area-efficiency matters a lot. I am however talking about state of the art general-purpose performance. And of course, it should be entirely possible to make a very fast RISC-V core. It just won't be very easy, due to the aforementioned problems with its design.



I don't have a specific reference. But operation fusion is the common answer given by RISC-V proponents when asked about these problems.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
There is a very common pattern when loading or storing data. It occurs when you are accessing data from an array. In C, it could be something like a[b], which means "get the b-th element of the array a". From the CPU perspective, this involves taking the value of b, multiplying it by the size of the array element and adding the resulting offset to the base address of a. Let's say we are working with 32-bit integers, so our multiplier is 4. Then the operation we want to perform is to load a 32-bit word at the address (a + 4*b).
You are right, RISC-V seems to have no scaled addressing mode.

The RISC-V reader: An open architecture atlas has the following example (unfortunately, the English version is not free).

[Images not preserved: code.png (page 26), arm64.png (page 100), rv64i.png (page 99)]


This means that if you are pursuing a state of the art ultra-high single-core performance design, you will need more decoders + complex sequence and dependency detection logic for instruction fusion just to maintain parity with ARM64.
How is it possible that RISC-V needs more decoders than ARM64 when RISC-V has fewer instructions? RISC-V programs are longer than ARM64 because RISC-V instructions are simpler.
 

leman

macrumors Core
Oct 14, 2008
19,518
19,664
You are right, RISC-V seems to have no scaled addressing mode.

More generally, it’s about ARM64 carrying more information per instruction.

How is it possible that RISC-V needs more decoders than ARM64 when RISC-V has fewer instructions? RISC-V programs are longer than ARM64 because RISC-V instructions are simpler.

If you want to reach the same performance at the same clock you need to compensate for the fact that RISC-V needs to execute more instructions on average. This can only be achieved by decoding and executing more instructions (either via a wider backend or via instruction fusion).
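To put rough numbers on that, here is a tiny Python sketch. It assumes the roughly 10% higher instruction count mentioned earlier in the thread and a purely hypothetical 8-wide ARM64 core; it also assumes each fusion merges exactly one instruction into a neighbouring one. None of this describes a real design.

```python
def required_throughput(arm_insns_per_cycle, insn_ratio):
    """Instructions per cycle RISC-V must sustain to match ARM64 at the same clock."""
    return arm_insns_per_cycle * insn_ratio


def required_fusion_fraction(insn_ratio):
    """Share of RISC-V instructions fusion must absorb to get back to the ARM64 count.

    Assumption: each fusion folds one instruction into a neighbouring operation.
    """
    return (insn_ratio - 1.0) / insn_ratio


INSN_RATIO = 1.10      # RISC-V executes about 10% more instructions
ARM_WIDTH = 8          # hypothetical sustained ARM64 instructions per cycle

wider = required_throughput(ARM_WIDTH, INSN_RATIO)
absorbed = required_fusion_fraction(INSN_RATIO)

print(f"without fusion: sustain about {wider:.1f} instructions per cycle")
print(f"with fusion only: absorb about {absorbed:.1%} of all instructions")
```

Either the core sustains somewhat more instructions per cycle, or roughly one in eleven instructions has to disappear through fusion, or some mix of both.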

This is an important advantage ARM64 has over RISC-V, at least in my understanding. A RISC-V CPU needs to do more work to get the information that an ARM64 CPU gets for free.
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
It seems that the first RISC-V SoCs coming to user-facing applications could be next year.


Errr, kind of not what the article says. Sure he says

".. The question on everyone’s mind is when RISC-V will come to user-facing applications. The answer is that this may be closer than people expect. There are currently 4 companies working on large RISC-V cores that compete with the biggest and fastest from Intel, AMD, Arm, Apple, etc
..."

but then goes on to talk about the focus of these four candidates.

1. Ventana Micro Systems,

Is mainly targeting DPUs and SoCs with highly custom accelerators attached, not general "Dick, Jane, Spot" user apps. All the I/O is "off the shelf" from third parties (Ventana isn't doing that, just a CPU-core-only chiplet. Yes, Amazon is tacking on memory controllers and PCIe with Graviton 3, but their SoC doesn't really have to compete for placement. Amazon built it, so they are extremely likely to place it.)




That is where the free content ends, but...

2. Tenstorrent,

Is a company targeting AI training. How is that in any general user-facing solution? Most users don't do AI training. Most users interact with AI inferencing, not training.

" ... “What excites me most about Tenstorrent is that its technology scales from a single chip to a thousand chips, and from low power all the way up to megawatt data centers. $1,000 cards to $1 million high-density racks are powered by a single software stack that supports inference and training and a wide range of models, which is a game changer,” he says. ...
...
As reported by Dylan Patel of SemiAnalysis earlier this year, Tenstorrent expects to have 1,000 PCIe cards with dual Grayskull chips in datacenters by the end of this year in addition to a machine with 1,000 Wormhole chips in another. ..."




3. Rivos,

There is a SemiAnalysis write-up for Rivos that name-drops lots of folks, but not really a wide enough set of folks to truly crack the general-purpose market (there is also a software and firmware stack you need to have for general user app contexts).



4. Akeana.

" ...

About us

Akeana is a venture funded RISC-V startup founded in early 2021 by leaders in our industry. Our semiconductor IP offerings include low end microcontroller cores, mid-range embedded cores, high end laptop/server cores along with coherent and non-coherent interconnects and accelerators.
..."





It takes more than just tossing some silicon over the wall to make a dent in the general user application market.

"...
By my count, there were eight unsuccessful attempts at general purpose Arm-based datacenter processors: Marvell V1, Marvell V2, Calxeda, Samsung, AMD, Qualcomm, Broadcom, and APM. Literally, companies invested billions in cash with zero payback. ...
...
You would think we would be done at this point, but you would be wrong. Ampere must create hardware and software platforms to make the SOC useful. The scalable hardware platforms need to conform to standards and include BMCs and all the peripherals like memory and storage. Ampere then needs to create UEFI compliant firmware and along with Arm and other IP vendors, integrate hardware-enabling software ..."

Yes, there has been lots of work over the last couple of years to get Linux working on SiFive "dev kits", but that is still a pretty long way from 'prime time' end user applications. RISC-V is moving up into more "heavyweight" embedded spaces where there is a full Linux system with more horsepower that drives the dedicated machine-system GUI (e.g., an industrial control machine).

If RISC-V can kill off Arm's embedded business faster than Arm can shift and grow revenues in the mid-to-upper-range SoC zone, then there will be more folks looking for "mid-upper" range RISC-V as well. But with a revived x86-64 market and Arm not shooting itself in the foot, that could be a tough business to break into, especially if there are 2-3 vendors climbing like crabs out of the barrel over the same sub-market. They'll end up competing more on price ($) or maybe $/performance. But world-'crushing' performance as the only criterion? Well. IMHO we are probably going to end up with a better competitor in the Pi boxes market than something that puts Cray EX systems out of business.
 

Longplays

Suspended
May 30, 2023
1,308
1,158





https://www.reddit.com/r/RISCV/comments/13jms98
Will Apple move from ARM to RISC-V by the mid 30s?
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Will Apple move from ARM to RISC-V by the mid 30s?
RISC-V ISA is very successful in academia, but not so much in industry. It is still incomplete and has not proven to be an alternative to Arm ISA in industry.

It seems more plausible to me that Apple will be Arm's last remaining customer and buy the ISA than that Apple will start RISC-V based SoCs.

I do however see them using RISC-V in specialized processors.
Could future versions of R1 use RISC-V ISA? I doubt that Apple used RISC-V for R1 because the extension for sensor fusion isn't yet finished.
 

leman

macrumors Core
Oct 14, 2008
19,518
19,664
Could future versions of R1 use RISC-V ISA? I doubt that Apple used RISC-V for R1 because the extension for sensor fusion isn't yet finished.

The nice thing about RISC-V is that it’s easily extensible. If Apple uses RISC-V for special purposes they are not going to wait for official extensions, they are just going to add whatever ISA functionality they need.
 

Longplays

Suspended
May 30, 2023
1,308
1,158
The nice thing about RISC-V is that it’s easily extensible. If Apple uses RISC-V for special purposes they are not going to wait for official extensions, they are just going to add whatever ISA functionality they need.
Historically Apple has done this. An example of a custom job would be the Retina 5K display.
 

Longplays

Suspended
May 30, 2023
1,308
1,158
RISC-V ISA is very successful in academia, but not so much in industry. It is still incomplete and has not proven to be an alternative to Arm ISA in industry.

It seems more plausible to me that Apple will be Arm's last remaining customer and buy the ISA than that Apple will start RISC-V based SoCs.


Could future versions of R1 use RISC-V ISA? I doubt that Apple used RISC-V for R1 because the extension for sensor fusion isn't yet finished.
I pointed to a mid 30s transition. ;) That's ~15 years away.

Two decades ago, who would have ever thought that Apple would move to a Mac SoC?

A quarter century ago, who would have ever imagined Apple would abandon PowerPC or OS 9?

Give any tech enough R&D runway and it may yield a more efficient solution.
 

leman

macrumors Core
Oct 14, 2008
19,518
19,664
Historically Apple has done this. An example of a custom job would be the Retina 5K display.

Pretty much everything about Apple is a custom job. I mean, they even had custom firmware on the HDDs they shipped to make sure they behave as expected with data flushes. Is this a good comparison to RISC-V though? Pretty much any company that ships a RISC-V product has some sort of custom ISA elements in there.

Maybe by 2030 RISC-V can be where ARM is today, who knows. I still don't see any technological advantage for anyone with an established ARM product portfolio to switch to RISC-V. Political/business reasons, maybe, especially if ARM significantly changes their licensing terms. But Apple has an architectural license, they can do pretty much whatever they want.
 

Longplays

Suspended
May 30, 2023
1,308
1,158
Pretty much everything about Apple is a custom job. I mean, they even had custom firmware on the HDDs they shipped to make sure they behave as expected with data flushes. Is this a good comparison to RISC-V though? Pretty much any company that ships a RISC-V product has some sort of custom ISA elements in there.

Maybe by 2030 RISC-V can be where ARM is today, who knows. I still don't see any technological advantage for anyone with an established ARM product portfolio to switch to RISC-V. Political/business reasons, maybe, especially if ARM significantly changes their licensing terms. But Apple has an architectural license, they can do pretty much whatever they want.
It may be a hedge against a time when ARM's licensing terms may become too limiting or expensive.

Similar to Apple buying Intel's 5G modem division to avoid/reduce the Qualcomm tax. IIRC Qualcomm's licensing fee specific to Apple is based on the MSRP of the device hence no Macs with a 5G modem.
 

leman

macrumors Core
Oct 14, 2008
19,518
19,664
It may be a hedge against a time when ARM's licensing terms may become too limiting or expensive.

Except Apple already has a license and I doubt they are impacted by whatever decisions ARM takes. I wouldn't be surprised if Apple forks ARM ISA at some point and just goes with Apple Silicon.

Everybody will move to RISC-V. Its way better approach to design hardware and software, where you customize both for your specific needs and functions.

It's a shame however that the base ISA is very limited and heavily relies on instruction fusion to even be viable for high-performance personal computing. Also, the information density of RISC-V really sucks. This means they have to use instruction compression, but this cuts down their encoding space.

Also, the customisation aspect of RISC-V is undoubtedly its greatest advantage (especially for smaller startups that can innovate at their own pace), but it is also its biggest downfall. Customisation is amazing if you are developing a special-purpose microcontroller or accelerator. It's a huge problem for personal computing, because it creates fragmentation.
 

Longplays

Suspended
May 30, 2023
1,308
1,158
Except Apple already has a license and I doubt they are impacted by whatever decisions ARM takes. I wouldn't be surprised if Apple forks ARM ISA at some point and just goes with Apple Silicon.
But what does the license allow and disallow?

I was watching this video on the Qualcomm/Nuvia vs ARM lawsuit and thought to myself what can Apple do or not do with their ARM license/contract?

This includes legal commentary from @ServeTheHomeVideo who happens to be a JD.

 

koyoot

macrumors 603
Jun 5, 2012
5,939
1,853
Also, the customisation aspect of RISC-V is undoubtedly its greatest advantage (especially for smaller startups that can innovate at their own pace), but it is also its biggest downfall. Customisation is amazing if you are developing a special-purpose microcontroller or accelerator. It's a huge problem for personal computing, because it creates fragmentation.
It won't be a problem in "programming by AI" era.
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
RISC-V ISA is very successful in academia, but not so much in industry. It is still incomplete and has not proven to be an alternative to Arm ISA in industry.

Not really true. For general PCs (Windows/macOS) with high-inertia software bases? No. High-inertia Android? No. Focused embedded controllers (e.g., HDD controllers, network controllers, etc.) where there is no huge, generic app ecosystem? Yes. RISC-V is making inroads into an area that Arm had basically taken over in the first 10 years of this decade. That is why Arm is trying to move up into servers and is really peeved at Qualcomm for muddying the waters on maybe opening up Windows market unit sales.



It seems more plausible to me that Apple will be Arm's last remaining customer and buy the ISA than that Apple will start RISC-V based SoCs.


Last customer? Probably not. If most folks leave Arm, then Apple probably would just stop buying new architectural licenses. There might then be some really small license fee they owe for some of the by-then older embedded controllers, but RISC-V will probably decimate that category quicker than the main app CPU cores.

Apple would more likely just fork off from Arm's updates if everyone else was leaving and Apple didn't like where they were all going. But Apple probably really doesn't want to pay all the expenses for new ISA R&D. However, if they feel there is really not much more to add to general CPU instruction sets, then the R&D costs are shrinking also. If the bulk of the A/M series SoCs' future is in non-CPU cores, then Arm isn't adding much value anyway.

Could future versions of R1 use RISC-V ISA?

R1 needs non-CPU ('uncore') silicon subsystems far more than CPU core clusters to be successful. I don't see it needing anything special CPU-wise that isn't already there in the ISA if Apple is up to date on new architecture licensing. If Apple has an A/M core cluster, they can just reuse those (maybe chopped down to 2 or 4 cores instead of 4 or 8).

I doubt that Apple used RISC-V for R1 because the extension for sensor fusion isn't yet finished.

Extension not finished? People just wore mostly working prototypes? Which extension is this?
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
It won't be a problem in "programming by AI" era.

Errr, no. If there is a block of instructions missing from the platform, whether the programmer is AI or human doesn't matter much. If the required performance service levels disappear when you try to kludge around the missing functionality with user-level code, then it may not meet specs.

For example, x86 virtualization implementations before the 6-8 generations of virtualization instruction additions. Or having the WSL subsystem work in Windows on Apple's hypervisor... nope (no nested virtualization, and pffft).
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
But what does the license allow and disallow?

I was watching this video on the Qualcomm/Nuvia vs ARM lawsuit and thought to myself what can Apple do or not do with their ARM license/contract?


Completely different cases. Qualcomm/Nuvia are trying to cherry-pick parts of the licensing, info, and rights they individually got into a new 'best for Qualcomm' deal while actually still trying to get Arm certification. Apple can just quit Arm certification altogether if they completely fork off (no new Arm stuff; stop trying to be 'Arm' friendly with code from other implementations; Windows/Linux VMs might stop working over the long term).

If you implement an ISA that is not compatible with Arm, then you generally do not need an Arm license.

Apple might be throwing most of their hypervisor/virtualization work down the toilet, but it is an option to shift to a walled garden with even higher, '20m' walls.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Extension not finished? People just wore mostly working prototypes? Which extension is this?
The RISC-V P Packed SIMD Extension.
The proposed P instruction set extension increases the DSP algorithm processing capabilities of the RISC-V CPU IP products.

Apple would more likely just fork off from Arm‘s updates if everyone else was leaving and Apple didn’t like where they were all going.
What do you mean by forking Arm ISA? Using only custom instructions from now on?
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
The RISC-V P Packed SIMD Extension.



If Apple is trying to achieve max perf/watt sensor fusion at real-time speeds, then doing it with a general-purpose CPU core may not make sense. You are presuming that Apple is attacking that problem with Arm CPU cores. They may not. Apple gets to pick the specific cameras on the headset. They get to pick the specific sensor fusion tasks they want to cover. They have image processing fixed-function logic. Fixed-function video en/decompression logic. The R1 doesn't have to be an off-the-shelf SoC. Nor does it have to work in iPhones/Macs. They aren't likely trying to use open-source, portable source code to do the task (porting application software from other Arm/RISC-V platforms to R1).

R1 needs some good AI/ML processing , but that doesn’t have to be on generic CPU cores.

What do you mean by forking Arm ISA? Using only custom instructions from now on?

Yes, but fork off (not a competing Arm ISA). Only Apple-invented additions to the ISA (and not tracking anyone else's updates). For example, Apple has AMX for matrix work. That is not Arm standard at all (Arm recently added some extensions oriented toward matrix work).

Apple already does their own microarchitecture, so they can speed up the current ISA with no new additions.

Once you have added bfloat16, matrix, a generic notion of SVE2, and decent virtualization... what is a huge missing piece in the ISA that general apps need? Future 'hard-core AI/ML' work... why not throw that at NPU cores? Faster AV1 processing... why not fixed-function logic? Disk I/O to/from the GPU... why not let the GPU do that? Etc., etc.

There is a pretty good chance that Arm additions will drift more into server space and/or areas where Apple has other core types that do that work. The Arm64 ISA has gotten to a mature state.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
You are presuming that Apple is attacking that problem with Arm CPU cores. They may not. Apple gets to pick the specific cameras on the headset. They get to pick the specific sensor fusion tasks they want to cover. They have image processing fixed-function logic. Fixed-function video en/decompression logic. The R1 doesn't have to be an off-the-shelf SoC.
My first thought was that Apple is using an ASIC-type DSP/sensor fusion core rather than an Arm-based real-time SoC. I just find it less interesting to talk about an ASIC-type sensor fusion core.
 

dmccloud

macrumors 68040
Sep 7, 2009
3,138
1,899
Anchorage, AK
The RISC-V P Packed SIMD Extension.



What do you mean by forking Arm ISA? Using only custom instructions from now on?

The "proposed" extension, meaning it has not been finalized, let alone made an official part of the RISC-V spec. As far as "forking" the ISA goes, Apple has technically done that already, with some (but nowhere near most) of Apple's custom instructions making their way back into the official ARM ISA. R1 undoubtedly shares some features with the M series, but there are likely several things unique to that chip and how Vision Pro takes advantage of its capabilities.
 