Of course, some would say Apple of the 1980s was better.
I liked Apple of the 2000s better.
Don’t get me wrong - I love RISC, whether PowerPC (I run a few of those for file serving) or ARM (a 13-inch screen is bad on my eyes). I do hate Intel, though, but I want more proof that the M1 is as fast as they say it is. I am a pro-RISC user, not CISC.
x86 .... essentially RISC machines under the hood.
Sorry, that is incorrect. x86 is in no way RISC-y; that’s just a marketing ploy Intel made up. @cmaier has explained this in other threads to quite some extent.
Didn't he just claim ARM and x86 are nothing alike? ARM itself isn't that similar to a classic RISC architecture. It seems pretty strained to call instruction sets with FMA and saturating arithmetic instructions "reduced".
He has addressed this on multiple occasions. See above a quote I’d consider relevant.

The micro-ops are not the same as RISC, at least because they are not independent of each other. There are also complications regarding the register file, and what happens when you run out of scratch registers (and the steps you take to avoid scratch registers). Pretty much every internal block in each core has to be aware of aspects of the original CISC instruction to handle context switches, incorrectly-predicted branches, load/store blocking, etc. Having designed the scheduling unit for one of these bad boys, it’s really quite a pain in the neck: it takes a lot of circuitry, adds to cycle time (slows the chip down), and results in wires running all over the place to send these extra signals from place to place.
Having also designed true RISC CPUs (SPARC, MIPS, PowerPC), I can say x86-64 cores are nothing at all like those.
You have to remember why the PowerPC got dropped; it was much the same reason Intel got dropped - the company involved could not put out CPUs at the promised speed and power, or when originally promised. It didn’t help that the whole consortium fell apart, leaving Apple out in the wilderness feeling like it just got conned into looking for queen snakes.

So, your statement above just proved the M1 DOES support running Windows 10. As of right now, anything Apple says is empty air until I see it for myself. I hate Intel also. I have a strong love for PowerPC, as that is what made the Mac a real Mac as far as I am concerned, BUT the M1 does seem to follow that same tradition. WE WILL SEE; I will wait. The M1 selections now are not in my best interest: the 13-inch screen gives me headaches and is hard on my eyes. I have a 2015 MacBook Pro with dual graphics - I may trade that up for a 15-inch M1 or M2. To me, RISC IS THE BEST.
Stanford has a piece on RISC vs CISC. In a thumbnail: "The CISC approach attempts to minimize the number of instructions per program, sacrificing the number of cycles per instruction. RISC does the opposite, reducing the cycles per instruction at the cost of the number of instructions per program."

He has addressed this on multiple occasions. See above a quote I’d consider relevant.
From what I understand there is no agreed-upon definition of what RISC really is, or what exactly a chip design must look like to qualify as RISC. That said, x86 does not seem to contain any feature considered required to qualify as RISC (see @cmaier’s various posts regarding micro-ops, instruction decoding, registers, and memory fetches; please bear with me if the latter isn’t 100% technically accurate, I pulled this from the depths of my memory).
Ok, but where does this leave RISC? M1 and PowerPC are RISC. I also like how this website was clean and not bloated with ads or YouTube video junk. I just loaded that page on my PB G4 and it flew scrolling to the bottom.

Stanford has a piece on RISC vs CISC. In a thumbnail: "The CISC approach attempts to minimize the number of instructions per program, sacrificing the number of cycles per instruction. RISC does the opposite, reducing the cycles per instruction at the cost of the number of instructions per program."
Near the end we get this: "Today, the Intel x86 is arguably the only chip which retains CISC architecture."
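To put that thumbnail in concrete terms, here is a rough C sketch. The function name is made up for illustration, and the instruction sequences in the comments are only the sort of code a compiler might plausibly emit at -O2; exact output varies by compiler and flags, so treat it as a sketch rather than a ground-truth listing.

```c
#include <stdint.h>

/* Illustrative only: one tiny update, and roughly how a register-memory ISA
   versus a load/store ISA might express it. */
void bump(int64_t *counter, int64_t n)
{
    *counter += n;
    /* x86-64 (CISC-style, register-memory): roughly one instruction,
           add QWORD PTR [rdi], rsi
       which reads memory, adds, and writes the result back.

       AArch64 (RISC-style, load/store): roughly three simple instructions,
           ldr x2, [x0]
           add x2, x2, x1
           str x2, [x0]

       Fewer, heavier instructions versus more, simpler ones -- the trade-off
       the Stanford summary describes. */
}
```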
Ok, but where does this leave RISC? M1 and PowerPC are RISC.

Yes, but there is a key difference. The 68x00 was used by few outside of Apple, and the transition to PowerPC was not exactly smooth. Moreover, trying to run x86 code on it was not the most pleasant thing. The M1, by contrast, runs x86 code reasonably fast and runs ARM code better than what MS was using.
Didn't he just claim ARM and x86 are nothing alike? ARM itself isn't that similar to a classic RISC architecture. It seems pretty strained to call instruction sets with FMA and saturating arithmetic instructions "reduced".
ARM is clearly RISC. Memory accesses limited to essentially LDR, LDM, STR, STM, SWP, and PLD instructions, large register count, fixed instruction lengths (within a given mode), no instructions require microcoding, etc.
no instructions require microcoding, etc.
The memory access is the main thing that seemingly remains unchanged over the years. ARM uses only standalone loads and provides a large number of register names, whereas x86 provides a number of instructions which accept a pointer as their final argument.
The disparity in register names (as opposed to available register space) isn't that large though. NEON uses 32 SIMD register names. Intel uses 16 for the most part. AVX-512 exposes 32.
I would be surprised if things like VSQRT were implemented without microcoding.
Documentation – Arm Developer (developer.arm.com)
ARM is clearly RISC. Memory accesses limited to essentially LDR, LDM, STR, STM, SWP, and PLD instructions, large register count, fixed instruction lengths (within a given mode), no instructions require microcoding, etc. All classic hallmarks of RISC. Like every RISC architecture it has its own quirks (e.g. conditional instructions), but it is fundamentally similar to MIPS, PowerPC, and SPARC, which are all CPUs I have designed. And it is *very* different than x86 and x86-64 (which are also CPUs I have designed).
Ask anyone who actually designs CPUs, and they will tell you that x86 is clearly CISC, pretty much everything else now is RISC, and the differences are easily visible in the complexity of the designs.
I guess I am a traditionalist from the old school of CPU architectures. I was in college when the PPC G3 and, mainly, the G4 era came out.
No really, I don’t care what the reports say because I have one - tested it hands-on against Intel-based Macs and also against desktop PCs.

Your interpretation and opinion.. but I think they are dead on.. then again, I don’t support Apple after Jobs died.. The M1 may have greatness, but according to those reports I think they are pulling the same s*** as during the MHz myth days. Only time will tell, and so far Apple is playing the alienation game by not allowing other OSes to run on their pathetic closed hardware.
For some reason, this discussion hits me like the debate over whether David Lee Roth or Sammy Hagar was the better singer for Van Halen. Roth put Van Halen on top of the world (Oh yeah!). Hagar had plenty of hits with Van Halen, but he wasn’t as amazing a frontman as Diamond Dave, who was the face of the Van Halen brand.

Of course, some would say Apple of the 1980s was better.
There is of course no doubt that ARM is RISC... my argument is that "RISC" and "CISC" in their original sense are notions from the early days of CPUs and represent extreme poles on the design spectrum that are simply not useful or even relevant for today's high-performance computing. Current designs are always hybrids.
You simply can't make a fast CISC CPU without backing it by some sort of reduced architecture (like x86 CPUs do with microcode). And you also can't make a fast RISC CPU without giving it complex operations, like ARMv8's ability to store/load multiple registers via a single instruction, or the auto-increment addressing modes. And while most modern RISC CPUs might not use microcode in the classical sense, they absolutely do split operations into micro-ops.
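As a hedged illustration of those "complex" RISC conveniences, here is a plain C copy loop; the AArch64 instructions in the comment are roughly what a compiler often emits for the body (load/store pair plus post-increment addressing), not a guaranteed listing, and the function name is mine.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: a simple block copy. On AArch64 a compiler will often
   emit load/store-pair instructions with post-increment addressing for the
   loop body, something along the lines of
       ldp x2, x3, [x0], #16
       stp x2, x3, [x1], #16
   i.e. two registers moved and the pointer bumped in a single instruction --
   the kind of operation a strict 1980s-style RISC would not have offered. */
void copy_pairs(const uint64_t *src, uint64_t *dst, size_t n_pairs)
{
    for (size_t i = 0; i < n_pairs; i++) {
        uint64_t a = src[2 * i];
        uint64_t b = src[2 * i + 1];
        dst[2 * i]     = a;
        dst[2 * i + 1] = b;
    }
}
```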
The point being: labels like RISC and CISC trivialize the discussion. We need to look at the actual relevant differences between the ISAs (variable width vs. fixed width, load/store vs. register–memory, addressing modes, code density) and between the hardware implementations (wide vs. narrow backend, cache architecture, out-of-order execution capabilities, branch prediction, etc.). Intel is not faster than ARM Cortex because one is CISC and the other is RISC, and Apple is not faster than Intel because one is RISC and the other is CISC. There are simply different design optimization points, and there are worse and better designs for each of those points. That is what we should talk about, instead of trying to pigeonhole CPUs into one of two loosely defined and frankly unhelpful buckets.
Sorry, but what you are saying makes zero sense. CPUs are computing devices, not fashion items. If you are a "traditionalist" (whatever that means), you shouldn't be using any computer made in the last 20 years.
But you seem to be basing your argument on the idea that something has changed. Unlike in the “early days,” now x86 CPUs have a “reduced architecture” because of microcode.
But CISC machines have always had microcode (though they sometimes didn’t do it with microcode ROMs). That’s what makes them CISC machines. If you see microcode, you’ve got CISC.
And the labels are incredibly useful. In the industry, it’s shorthand for all the things I keep mentioning. Universally, RISC means the same thing to us. Same with CISC.
And when you compare two designs in the modern era where there is always enough instruction memory, ceteris paribus, the RISC one wins. Yes, what you call hardware implementation (and the rest of us call microarchitecture, because hardware implementation is a different thing) is important. And a better microarchitecture can result in overcoming the RISC advantage, just as a better implementation or a better semiconductor process can. But that doesn’t mean you can ignore RISC/CISC. Because if two products are made on the same process, using the same cell library, the same macro circuits, the same physical design techniques, and the same microarchitecture, then the RISC design wins in performance and performance per watt. That’s what makes these “buckets” helpful.
And it’s disingenuous to say they are loosely defined. They are not. Everyone agrees that if you see microcode [more specifically, the need for a state machine in the instruction decode unit], addressing modes where random instructions access memory, or variable instruction lengths outside of modes [more specifically, the requirement to scan the instruction stream to determine instruction end points], it’s CISC.
When you mention things like autoincrement or multi-register load/store as “complex,” that’s never been what “complex” means in RISC vs. CISC. Complex has always referred to the decoding/issuing. It’s trivial to implement an increment - it does not complicate the pipelines or require interactions between multiple “micro-ops.” The instruction decoder simply sends a single flag signal to the ALU, and the ALU does it as the last step. Multiple loads/stores in parallel - same deal. Lots of possible implementations, but it doesn’t even need to take more than one cycle (other than the memory accesses - the RF can be multi-ported).
What makes an instruction complex is “oh, before I can perform this, first I have to wait 50 cycles to read this memory location, which could cause a cache miss. Since I have to add the result of register AX to that, I need someplace to hold the result temporarily. Hopefully that didn’t cause an overflow. If it didn’t, then I use the result of the sum as an address to store the result of this subtract involving two other registers.” I have to use multiple parts of the chip (load/store unit, ALU, etc.) in sequence, with dependencies between steps, so that I can’t just let things fly and work on something else while waiting for the results. Each of the things you mention as “complex,” by contrast, is just something that happens within a single unit, and is more parallel or takes longer than, say, a shift-left instruction. Every real RISC machine ever has likely had an integer multiplier that takes 4-10 times as long to reach a result as an integer addition. That doesn’t make the integer multiply instruction “complex.” Same with square root, divide, etc. Same with autoincrement - the result of add-plus-increment does not require that I add, store the result somewhere, then send a new increment instruction into the ALU. Most likely the ALU feedback register is simply an adder with a bypass, and you’re done.
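For what it’s worth, here is a rough C rendering of the kind of dependent sequence described above. The function and variable names are made up for illustration; the comments only call out where the load/store unit and ALU would have to hand results to each other in order.

```c
#include <stdint.h>

/* Illustrative only: one "complex" CISC-style operation expressed as C.
   Each step depends on the previous one, so the hardware must sequence the
   load/store unit and the ALU and hold intermediate results somewhere. */
void complex_op(const int32_t *mem, int32_t ax, int32_t b, int32_t c,
                int32_t *out_base)
{
    int32_t tmp = *mem;      /* memory read: long latency, possible cache miss */
    int32_t sum = tmp + ax;  /* ALU add that must wait for the load result     */
    out_base[sum] = b - c;   /* the sum then feeds the address of a store      */
}
/* On a load/store ISA the same work is just a handful of simple instructions
   (load, add, subtract, store), each easy to decode and schedule on its own. */
```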
Why? I implemented floating point square root for the follow-up to the PowerPC x704, and we certainly didn’t have any microcode. The load/store unit sends the instruction to the ALU, where it goes to the sqrt unit (which is a lookup table and some Newton-Raphson magic, if I remember correctly - that was in 1996 or 1997, so my memory is vague), and that ALU takes however many cycles are necessary before setting the “I’m done” signal, which tells the retirement circuitry that the contents of the bypass register are valid.
There’s a weird idea going around (I think the Stanford link above suggested it too) that “complicated” instructions require microcode. Multiplication was given as an example. Multiplication and division can take many clock cycles, but there’s no microcode used on any RISC processor I’ve ever seen. There’s an ALU, you tell it you want to do div or mul, it takes multiple cycles, and it signals when it is done. If the instruction requires passing data from the output of one part of the ALU into an input of another part of the ALU, that’s handled by sequencing logic within the ALU. (That’s rare. I seem to recall I did that in an integer divider once, where I had to feed something from one circuit into the input of the integer multiplier.)
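A minimal sketch of the “initial estimate plus Newton-Raphson” idea in C follows. It is just the arithmetic, not the x704 circuit: a real FPU would seed the estimate from a small table on the exponent/mantissa, whereas here frexp() stands in for that lookup so the iteration itself is visible.

```c
#include <math.h>
#include <stdio.h>

/* Minimal sketch of "initial estimate + Newton-Raphson" square root.
   Purely illustrative - not any shipping design. */
static double sqrt_nr(double a)
{
    if (a <= 0.0)
        return 0.0;

    int e;
    (void)frexp(a, &e);            /* a = m * 2^e with m in [0.5, 1)        */
    double x = ldexp(1.0, e / 2);  /* crude "table lookup": x is near 2^(e/2) */

    for (int i = 0; i < 6; i++)    /* quadratic convergence once close       */
        x = 0.5 * (x + a / x);     /* Newton-Raphson step for f(x) = x^2 - a */
    return x;
}

int main(void)
{
    printf("%.12f %.12f %.12f\n", sqrt_nr(2.0), sqrt_nr(9.0), sqrt_nr(1e9));
    return 0;
}
```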
The square root example came to mind in particular because of the number of micro-ops potentially involved. I wasn't thinking so much of instructions that are complicated from a conceptual viewpoint as of those that commonly take many micro-ops.
Floating point multiplication doesn't take many cycles on any ARM processor designed with performance in mind. The latency tables in many of these optimization guides suggest around 4 cycles is common for both the fused multiply-add and multiply-subtract variants.
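If anyone wants to sanity-check those latency numbers on their own machine, a rough harness looks something like the C below: it times a chain of dependent fused multiply-adds so throughput tricks can't hide the latency. The iteration count and flags are arbitrary choices of mine, it assumes a POSIX clock, and converting nanoseconds to cycles requires knowing your sustained clock speed.

```c
#include <math.h>
#include <stdio.h>
#include <time.h>

/* Rough sketch: each fma() depends on the previous result, so the loop is
   latency-bound. Build with optimization and FMA contraction enabled
   (e.g. -O2 -ffp-contract=fast, plus -mfma on x86), or a library call may
   dominate. Illustrative harness, not a rigorous benchmark. */
int main(void)
{
    const long iters = 100 * 1000 * 1000;
    volatile double seed = 1.000000001;   /* volatile: keep the compiler honest */
    double x = seed, a = seed, b = 1e-12;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        x = fma(x, a, b);                 /* each FMA waits on the last one */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%.2f ns per dependent FMA (result %g)\n", ns / iters, x);
    return 0;
}
```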