Why Intel and AMD don't make chips like the M2 Max and M2 Ultra

deconstruct60 · Oct 9, 2023

R2DHue said:
Incidentally, RISC engineers didn’t get it perfectly right the first time: the first thing they jettisoned was floating point.

Errrr. The original Berkley and Stanford RISC back in the early 80's???? See the opcode genelogy chart here and also note the strong correlation between empty boxes and 'older stuff'.

https://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-6.pdf

If you set the "way back' machine to the late 70's and 80's , almost nobody on small scale chips was doing Float. Especially not IEEE standard float ( which was not standardized until 1985 ) .

Berkeley and Standard CS/EE departments/programs were as big as they are now. Only so many grad students and some much funding for this stuff at the very early start. Early on, some stuff is not there because 'no budget or resources'.

Even IBM's stuff was relatively a 'side project'.

Much of RISC-V was taken from doing an intersection of selected 'good' instruction sets from RISC 'history'. Plus adjusting to the bigger transistor budgets afforded by modern fab techniques ( no good reason to restrict to the max transistor counts of 1983-89 era ) and better automated design tools.

[ The T1/Niagara ejecting float in the 2000's , two decades later, was a radically different context. Evolution in fab processes, toos, and actually having a standard to follow. ]

P.S. An addition on the "done by a grad student team" front ...

" ... The Berkeley team asked the question ‘Are we crazy not to use a standard ISA?’ before noting that existing standard ISAs (x86, ARM and GPUs) would probably be too complex for a university project anyway. ...
...
The team iterated on the new ISA. In order to make implementation as simple as possible, they gradually reduced the number of instructions down to the absolute minimum needed.

..."
"

RISC-V - Part 1: Origins and Architecture

The revolutionary instruction set with roots in the first Berkeley RISC design

thechipletter.substack.com

As RISC-V gets commercialized by bigger and bigger players (with relatively large , expensive team budgets ) . The budget for doing RISC-V will go up.

quarkysg · Oct 9, 2023

FlyingTexan said:
Absolutely. A driver has no reason to know these things. To get that involved and that in-depth means he's not focusing on the things that matters. I have almost 12,000hrs flying jets and the amount of things I don't know keeps growing. If I can't fix it from the cockpit I have no reason to know it nor should I want to. My first course of action is to call maintenance and let the people that specialize in that handle it.

I don't think you can call maintenance when your aircraft's engine suddenly decided to quit working while in the air. Wouldn't an in-depth knowledge of how your equipment works give you a fighting chance of survival during emergency? Seems like a very narrow perspective there.

IMHO, the difference between a great race car driver vs an average one is that the former knows how to exploit the best characteristics of his/her equipment, as different race circuits calls for different equipment setup that suits the driver's driving style. If you don't know your car, you will not be able to set it up properly to win races.

Back to software, I'm the the camp that in order to create great software, you need to know the hardware you're creating the software for. Processing element like a GPU calls for a different structure of codes than say a CPU, that will be different from an FPU or NPU e.g.

Solution like Electron is a terrible one.

0423MAC · Oct 9, 2023

Because it's easier to force architectural changes with a significantly smaller userbase than it would be for Windows and most of Linux today. Forget about home users, you know what kind of panic this would cause in the corporate world globally?

We'll see where things stand in 10 years. Despite all the efficiency gains a lot of older legacy datacenters are running custom software that would require a complete rework. More money upfront and the hassle of swapping out entire buildings full of hardware, many that run software designed in the 1990s...

FlyingTexan · Oct 9, 2023

quarkysg said:
I don't think you can call maintenance when your aircraft's engine suddenly decided to quit working while in the air. Wouldn't an in-depth knowledge of how your equipment works give you a fighting chance of survival during emergency? Seems like a very narrow perspective there.

If you can't tell by the name I'm a pilot. Knowing how it works is like knowing how your car engine works. You turn the key and it starts. Him talking about a driver knowing how the gearbox works is like me saying I can draw the jet engine and it's components, which I can, and no none of that helps in any sort of an emergency.

FlyingTexan · Oct 9, 2023

Screenshot 2023-10-09 at 10.51.16 PM.png

Just a fun screen grab for those harping on intel and amd. While M series are great it's not like the other guys are sitting in the back seat crying. It comes down to use case. Apple made hardware that can do things now that hardware in the past couldn't but you have to be honest with yourself if you think other systems can't do it.

sack_peak · Oct 9, 2023

FlyingTexan said:
View attachment 2292187
Just a fun screen grab for those harping on intel and amd. While M series are great it's not like the other guys are sitting in the back seat crying. It comes down to use case. Apple made hardware that can do things now that hardware in the past couldn't but you have to be honest with yourself if you think other systems can't do it.

Heat? Battery life? Throttling?

quarkysg · Oct 9, 2023

FlyingTexan said:
If you can't tell by the name I'm a pilot. Knowing how it works is like knowing how your car engine works. You turn the key and it starts. Him talking about a driver knowing how the gearbox works is like me saying I can draw the jet engine and it's components, which I can, and no none of that helps in any sort of an emergency.

I did say equipment, not just confined to the engine. And I think you very well know that the use of the engine is just as an example.

Btw, I would never have guess anybody's profession just by looking at the online user name.

Do you think I'm a physicists?

Analog Kid · Oct 9, 2023

leman said:
There is no such thing as "software", programs run on hardware.

That's a bit over the top, no? I get that people are annoyed when someone seems to not understand how the hardware works but continues to insist they know better, but that doesn't mean software doesn't exist.

There is a lot of useful software that can be written without a detailed understanding of the hardware. There are lot of people with deep domain knowledge about the logic of the application even if they don't have a deep understanding of how the application will be executed.

Yes, if your goal is to wring the most performance out of a piece of code then understanding the hardware and using a language that lets you feel it will help, but if we're honest with ourselves we'll acknowledge that a lot of the benefits of better hardware over the past few generations have been in making it possible to get away with writing "bad code". We can spend more time adding capability and less time optimizing. Or maybe writing code at a higher level of abstraction. It's about shortening development time, not execution time.

So there are plenty of people who write software without fully understanding the hardware it runs on. If you look at lists of "most commonly used languages", most of them are too high level to care about the hardware beneath them:

Stack Overflow Developer Survey 2021

In May 2021 over 80,000 developers told us how they learn and level up, which tools they’re using, and what they want.

insights.stackoverflow.com

Top Programming Languages 2022

Python’s still No. 1, but employers love to see SQL skills

spectrum.ieee.org

In a lot of modern languages, software runs on software and the hardware is an abstraction or two below it.

theorist9 · Oct 9, 2023

FlyingTexan said:
View attachment 2292187
Just a fun screen grab for those harping on intel and amd. While M series are great it's not like the other guys are sitting in the back seat crying. It comes down to use case. Apple made hardware that can do things now that hardware in the past couldn't but you have to be honest with yourself if you think other systems can't do it.

1) Unless you created this yourself, I would urge everyone to *always* cite a reference, both so others can check the context (it's not even clear from from this if this is SC or MC performance) and, more importantly, to give the author or organization credit for their work.

2) Cinebench should not be used for cross-platform comparisons bertween AS, AMD, and Intel. It's been demonstrated that it disfavors AS (I found the disadvantage is about 10%). GB6 is a better cross-platform benchmark:
SC:
M2 Pro: 2663
i9-13900H: 2620
MC:
M2 Pro: 14,568
i9-13900H: 14,307

Source:

Processor Comparison - Head 2 Head

Apple M2 Pro vs Intel Core i9-13900H - Benchmarks, Tests and Comparisons

www.notebookcheck.net

3) These are mobile processors, so power consumption and heat is a critical component. Your chart doesn't show that. Even if the performance of the i9-13900H is, on average, somewhat higher than that of the M2 Pro, it's miselading to look at performance alone without accouting for power, since higher power consumption means a bigger, heavier form factor, lower battery life, and lower performance when on battery. So until AMD and Intel can offer mobile CPUs with the same performance/power ratio as AS, then it is indeed the case that they "can't [yet] do it."

Unfortunately the notebookcheck article doesn't give power consumption figures for the M2 Pro. But it does give them for the i9-13900H, and the M2's CPU certainly uses much less than this:

FlyingTexan · Oct 9, 2023

theorist9 said:
1) Unless you did this yourself, you should *always* cite a reference, both so others can check the context (it's not even clear from from this if this is SC or MC performance) and, more importantly, to give the author or organization credit for their work.

2) Cinebench should not be used for cross-platform comparisons bertween AS, AMD, and Intel. It's been demonstrated that it disfavors AS (I found the disadvantage is about 10%). GB6 is a better cross-platform benchmark:
SC:
M2 Pro: 2663
i9-13900H: 2620
MC:
M2 Pro: 14,568
i9-13900H: 14,307

Source:

Processor Comparison - Head 2 Head

Apple M2 Pro vs Intel Core i9-13900H - Benchmarks, Tests and Comparisons

www.notebookcheck.net

View attachment 2292196

3) These are mobile processors, so power consumption and heat is a critical component. Your chart doesn't show that. Even if the performance of the i9-13900H is, on average, somewhat higher than that of the M2 Pro, it's miselading to look at performance alone without accouting for power, since higher power consumption means a bigger, heavier form factor, lower battery life, and lower performance when on battery. So until AMD and Intel can offer mobile devices with the same performance/power ratio as AS, then it is indeed the case that they "can't do it."

Unfortunately the notebookcheck article doesn't give power consumption figures for the M2 Pro. But it does give them for the i9-13900H, and the M2's CPU certainly uses much less than this:

View attachment 2292207

I'm not getting paid and I'm not under any obligation to be told what I should and shouldn't do. It's a screen grab. If you want to know more then go look for it yourself I guess. Before simply stating CB shouldn't be used and GB6 should please state your sources. Make sure to include Author, title, date of publication, and permalink. Thank you for your understanding. BTW the scoring was using multiple test not just Cinebench and each of the machines listed is a thin and light.

theorist9 · Oct 9, 2023

FlyingTexan said:
I'm not getting paid and I'm not under any obligation to be told what I should and shouldn't do. It's a screen grab. If you want to know more then go look for it yourself I guess. Before simply stating CB shouldn't be used and GB6 should please state your sources. Make sure to include Author, title, date of publication, and permalink. Thank you for your understanding. BTW the scoring was using multiple test not just Cinebench and each of the machines listed is a thin and light.

Hey, you're the guy that was telling **us** what we should do (indeed, how we should think): "you have to be honest with yourself if you think other systems can't do it." I.e., that somehow we're not being "honest" if we don't acknowledge Intel and AMD can do the same as AS. [When, ironically, they can't.] So it's hypocritical to complain when someone does the same to you.

Plus it's wrong to post someone else's work without giving them credit, regardless of whether you think you're "under obligation".

Analog Kid · Oct 9, 2023

bobcomer said:
True, it doesn't sound like much difference, but divide those 8 cores by what is actively running and not in some kind of wait situation, and the numbers are closer to 8 and an 8 core processor is going to be better at it than a 2 core processor. If it's closer to 2, then the 2 core faster processor is going to be a lot better. I'll take 8 though.

Again, we’re looking a hypothetical where you have the choice between a core running N instructions per second and a k processors running N/k instructions per second.

Let’s say we have 8 cores and a sea of threads of which 8 are typically active at a time. One thread gets assigned to each core concurrently. Seems like you’re getting 8*N/8=N instructions per second, just as you would from a single core.

What if one of those threads produces data that another thread is meant to consume? Now that data needs to make its way from one core, through the memory system, to another core rather than just pass from thread to thread on that single processor.

What if a 9th process becomes active and takes priority? It will displace the active thread on one of the 8 cores. That thread might then become active before the 9th yields and thus displace another thread on one of the other cores. Around and around it goes pushing data and code in and out of caches rather than all staying in one place.

What if one of the 8 threads sleeps. Now you have 7 threads on 8 cores and some other thread is running half as fast as it might otherwise if everything was on one core. It’s that much worse if there’s only one thread going full out for an extended period. I don’t know if you’ve ever run a processor intensive process on a multicore machine and watched the CPU loading— you’ll very often see one core peg for seconds at a time, then go quiet and another pegs for a few seconds, then it changes again. The system is running at 12% of its full capability because 7 cores are idle.

You’re attempting to create very narrow examples to make a point, but you don’t design a general purpose machine to the exception you design to the breadth of expected use cases.

As others have said, we have multi core systems because it was the only path forward for practical reasons at an implementation level, not because it’s a preferred architecture for most uses.

leman · Oct 10, 2023

Analog Kid said:
That's a bit over the top, no?

You are right of course, I overdid it. It's just I don't see how one can be discussing the matters we are discussing (of high performance computing, multithreading, and multi-core programming) without having a good knowledge of how the CPU and CPU/OS interaction work.

Just the other day I was reading this story about a junior C programmer who decided to "optimise" a codebase by making all structs packed and adding a bunch of bitfields (their idea was to improve the memory consumption). The performance on x86 allegedly dropped by a factor of 10x and on other architectures the thing simply crashed due to unaligned access. That's the stuff I am talking about.

Basic75 · Oct 10, 2023

deconstruct60 said:
The 68000 was 32-bit from the start. The narrow amount of 16 bit stuff was more so in the 'inside' than the outside. It always had a decent number of data registers (32) .

Yes, the 68000 was a 16bit implementation (bus, alu) of a full 32bit architecture. However it only had 8 data registers (and 8 address registers.)

Basic75 · Oct 10, 2023

theorist9 said:
Cinebench should not be used for cross-platform comparisons bertween AS, AMD, and Intel.

Unless you are a heavy Cinema 4D user?

MRMSFC · Oct 10, 2023

leman said:
or an electrician who says "I just lay cables, I have no idea about basic electrical science"?

You would be shocked (no pun intended) at how many electrical engineers are just winging it.

bobcomer · Oct 10, 2023

Analog Kid said:
Let’s say we have 8 cores and a sea of threads of which 8 are typically active at a time. One thread gets assigned to each core concurrently. Seems like you’re getting 8*N/8=N instructions per second, just as you would from a single core.

That doesn't make sense. That one thread isn't the only thing running on an 8 core processor.

Analog Kid said:
What if a 9th process becomes active and takes priority? It will displace the active thread on one of the 8 cores. That thread might then become active before the 9th yields and thus displace another thread on one of the other cores. Around and around it goes pushing data and code in and out of caches rather than all staying in one place.

Of course, that's what a modern multitasking OS does. And yes, takes time to do a context switch.

Analog Kid said:
I don’t know if you’ve ever run a processor intensive process on a multicore machine and watched the CPU loading—

Of course I have.

Analog Kid said:
you’ll very often see one core peg for seconds at a time, then go quiet and another pegs for a few seconds, then it changes again.

If course, if that one thread is all that's running, but it's not.

Analog Kid said:
You’re attempting to create very narrow examples to make a point, but you don’t design a general purpose machine to the exception you design to the breadth of expected use cases.

That's what it sounds like you are doing to me -- this hypothetical, only 1 thread running. When I'm actually working, I *never* have just one core showing activity.

Analog Kid said:
As others have said, we have multi core systems because it was the only path forward for practical reasons at an implementation level, not because it’s a preferred architecture for most uses.

Disagree. With your preferred one thread, just how is multicore helping you? It isn't, it only helps when more than one thing at a time.

Sorry, your argument just doesn't fit with my experience.

bobcomer · Oct 10, 2023

MRMSFC said:
You would be shocked (no pun intended) at how many electrical engineers are just winging it.

Too true. Most of my college career was done with EE's. Some were brilliant, but some weren't at all. That's just the Computer Science track that I went through -- there were a lot of common courses. (math and physics mainly, but also hardware design, and they had to take a bit of programming too.)

bcortens · Oct 10, 2023

Analog Kid said:
Again, we’re looking a hypothetical where you have the choice between a core running N instructions per second and a k processors running N/k instructions per second.

Let’s say we have 8 cores and a sea of threads of which 8 are typically active at a time. One thread gets assigned to each core concurrently. Seems like you’re getting 8*N/8=N instructions per second, just as you would from a single core.

What if one of those threads produces data that another thread is meant to consume? Now that data needs to make its way from one core, through the memory system, to another core rather than just pass from thread to thread on that single processor.

What if a 9th process becomes active and takes priority? It will displace the active thread on one of the 8 cores. That thread might then become active before the 9th yields and thus displace another thread on one of the other cores. Around and around it goes pushing data and code in and out of caches rather than all staying in one place.

What if one of the 8 threads sleeps. Now you have 7 threads on 8 cores and some other thread is running half as fast as it might otherwise if everything was on one core. It’s that much worse if there’s only one thread going full out for an extended period. I don’t know if you’ve ever run a processor intensive process on a multicore machine and watched the CPU loading— you’ll very often see one core peg for seconds at a time, then go quiet and another pegs for a few seconds, then it changes again. The system is running at 12% of its full capability because 7 cores are idle.

You’re attempting to create very narrow examples to make a point, but you don’t design a general purpose machine to the exception you design to the breadth of expected use cases.

As others have said, we have multi core systems because it was the only path forward for practical reasons at an implementation level, not because it’s a preferred architecture for most uses.

It's worse than even you mention - typically all those small background threads and processes will never fully saturate CPU core and as such having a lot of small cores is fine. However typically you'll have one or two large threads that will happily saturate a high performance core for short bursts.

So:
You have 8 threads that only need 2 units of work per second (16 units total)
You have 2 threads that need 20 units of work per second (40 units total)
(suppose in this case these are ongoing tasks that cannot be quit)

If you have 8 small cores that can do say 10 units of work per second, and no fast cores then you are always going to be bottlenecked and the system will never feel responsive because the system is always going to be behind those 2 threads that each need 20 units of work

If you have 2 fast cores capable of 40 units of work (capable of the same total 80 units of work) you will be able to handle all the small threads and the larger threads and still have a ton of capacity left over that can be used for either an additional large thread or more small threads.

Additionally modern OSs are pretty good (but not perfect) at making sure enough processor time slices are allocated to UI threads to ensure the OS remains responsive even under heavy processor loading conditions.

bcortens · Oct 10, 2023

bobcomer said:
That doesn't make sense. That one thread isn't the only thing running on an 8 core processor.

Of course, that's what a modern multitasking OS does. And yes, takes time to do a context switch.

Of course I have.

If course, if that one thread is all that's running, but it's not.

That's what it sounds like you are doing to me -- this hypothetical, only 1 thread running. When I'm actually working, I *never* have just one core showing activity.

Disagree. With your preferred one thread, just how is multicore helping you? It isn't, it only helps when more than one thing at a time.

Sorry, your argument just doesn't fit with my experience.

I think you're exaggerating your experience and doubt it could be replicated under laboratory conditions. Why do I think this? Because your experience doesn't make any sense based on the way processor scheduling works in operating systems... as long as the total thread compute load is less than the total compute capacity of your processor a smaller number of faster cores will be able to execute variable workloads more responsively than a large number of slower cores.

Analog Kid · Oct 10, 2023

leman said:
You are right of course, I overdid it. It's just I don't see how one can be discussing the matters we are discussing (of high performance computing, multithreading, and multi-core programming) without having a good knowledge of how the CPU and CPU/OS interaction work.

Just the other day I was reading this story about a junior C programmer who decided to "optimise" a codebase by making all structs packed and adding a bunch of bitfields (their idea was to improve the memory consumption). The performance on x86 allegedly dropped by a factor of 10x and on other architectures the thing simply crashed due to unaligned access. That's the stuff I am talking about.

I don't disagree. It's a mixed audience here, with more observers than participants. And some participants who would do better to observe.

It's one thing to be able to construct a degenerate scenario where a particular architecture may have a hypothetical advantage, but it's usually a sign of bad engineering if you use that corner case as the basis for a general purpose approach. In this case there's the added complication that the world is, in fact, moving more toward multicore systems for entirely different reasons. So it's a case here of the right conclusion for the wrong reasons which just reinforces the bad logic.

As you say, if you don't understand the actual reasons for the architectural decisions you don't understand how to mitigate the compromises inherent in it.

Analog Kid · Oct 10, 2023

bcortens said:
It's worse than even you mention - typically all those small background threads and processes will never fully saturate CPU core and as such having a lot of small cores is fine. However typically you'll have one or two large threads that will happily saturate a high performance core for short bursts.

So:
You have 8 threads that only need 2 units of work per second (16 units total)
You have 2 threads that need 20 units of work per second (40 units total)
(suppose in this case these are ongoing tasks that cannot be quit)

If you have 8 small cores that can do say 10 units of work per second, and no fast cores then you are always going to be bottlenecked and the system will never feel responsive because the system is always going to be behind those 2 threads that each need 20 units of work

If you have 2 fast cores capable of 40 units of work (capable of the same total 80 units of work) you will be able to handle all the small threads and the larger threads and still have a ton of capacity left over that can be used for either an additional large thread or more small threads.

Additionally modern OSs are pretty good (but not perfect) at making sure enough processor time slices are allocated to UI threads to ensure the OS remains responsive even under heavy processor loading conditions.

Yeah, that's what I was trying to get at here, though maybe not as clearly:

Analog Kid said:
What if one of the 8 threads sleeps. Now you have 7 threads on 8 cores and some other thread is running half as fast as it might otherwise if everything was on one core. It’s that much worse if there’s only one thread going full out for an extended period. I don’t know if you’ve ever run a processor intensive process on a multicore machine and watched the CPU loading— you’ll very often see one core peg for seconds at a time, then go quiet and another pegs for a few seconds, then it changes again. The system is running at 12% of its full capability because 7 cores are idle.

leman · Oct 10, 2023

bobcomer said:
Of course, that's what a modern multitasking OS does. And yes, takes time to do a context switch.

Context switch on a modern CPU takes around 1-1.5microseconds. With cache miss penalty this goes up to 3 microseconds.

Let's go back to our "one fast core vs two slow cores" example. A fast core running two threads using one millisecond slice will lose at most 3 milliseconds on context switch overhead. That's 0.3% of time lost. And of course, since the OS is smart and can figure out that both threads are demanding, it might use 5 or even 10 millisecond slices, dropping that overhead down to 0.3ms for each second. It's practically negligible. And the fast core will perform much better on asymmetric workloads.

And the same argument also works with 100 1Gz cores running 100 threads vs a single hypothetical 100Ghz core.

We don’t make multi-core CPUs because it’s more fun. We make them because there are limits how fast we can make single cores.

bobcomer · Oct 10, 2023

bcortens said:
I think you're exaggerating your experience and doubt it could be replicated under laboratory conditions. Why do I think this? Because your experience doesn't make any sense based on the way processor scheduling works in operating systems... as long as the total thread compute load is less than the total compute capacity of your processor a smaller number of faster cores will be able to execute variable workloads more responsively than a large number of slower cores.

Dang people around here are so into personal attacks and saying I'm lying about my experience. Well, I wont block any more people about it, but you can all forget about me responding to it.

You stick with your 2 cores and I'll stick with my 8 because it fits what I do, and no amount of insults changes that.

Basic75 · Oct 10, 2023

leman said:
And the fast core will perform much better on asymmetric workloads.

It's really that simple, one large core can "simulate" two smaller cores via time slices, but two smaller cores can not be combined into one large core to run a single task faster.

Why Intel and AMD don't make chips like the M2 Max and M2 Ultra

macrumors G5

macrumors 65816

macrumors 6502a

macrumors 6502a

macrumors 6502a

Suspended

macrumors 65816

macrumors G3

macrumors 601

macrumors 6502a

macrumors 601

macrumors G3

macrumors Core

macrumors 68020

macrumors 68020

macrumors 6502

macrumors 601

macrumors 601

macrumors 65816

macrumors 65816

macrumors G3

macrumors G3

macrumors Core

macrumors 601

macrumors 68020

Our Staff