
Basic75

macrumors 68020
May 17, 2011
2,101
2,447
Europe
Dang people around here are so into personal attacks and saying I'm lying about my experience.
I don't think so, at least that's not what I think. We believe you when you say that your workload runs well with many smaller cores. The issue is that your experience does not generalise to all, or even many, workloads. We disagree with the implication that it does. Further, we believe that your workload would run equally well on a smaller number of higher-performance cores of equal aggregate performance, and we disagree with the implication that it might not. Faster cores are more flexible; if we could have a single core that was 4x as fast as a current P-core, we would. But that isn't (currently) possible, which is why our Macs have (at least) four of them. It's really that simple.
 

leman

macrumors Core
Oct 14, 2008
19,520
19,670
It's really that simple: one large core can "simulate" two smaller cores via time slices, but two smaller cores cannot be combined into one large core to run a single task faster.

Yep. The context-switch overhead argument is moot, since it only matters if your time slices are super short. And it only makes sense for them to be super short if your workload has a lot of internal communication, in which case multiple slow cores will do even worse, since you then also pay for core-to-core communication.
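As a rough sanity check, here is a back-of-the-envelope sketch of how small that overhead is; the ~3 µs per switch (switch plus cache-miss penalty) is an assumed ballpark taken from the figures quoted further down the thread:

```swift
// Back-of-the-envelope: fraction of CPU time a core loses to context
// switches when alternating between two runnable threads.
// The 3 µs cost per switch (incl. cache-miss penalty) is an assumption.
let switchCost = 3e-6                          // seconds per context switch
for sliceMs in [1.0, 5.0, 10.0] {
    let slice = sliceMs / 1_000                // slice length in seconds
    let switchesPerSecond = 1.0 / slice
    let lostFraction = switchesPerSecond * switchCost
    print("slice \(sliceMs) ms -> \(lostFraction * 100)% of time lost to switching")
}
// 1 ms slices: 0.3%   5 ms: 0.06%   10 ms: 0.03%
```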

What's more, the last few years of computing have been about improving the efficiency of highly interdependent concurrent code. Buzzword: "async". We are moving away from threads as the unit of concurrent computation because threads are heavy, expensive, and not energy efficient.
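For illustration, a minimal sketch of that "async" style using Swift's structured concurrency (the checksum workload and function names are placeholders I made up): many lightweight tasks are multiplexed onto a small thread pool rather than each getting its own OS thread.

```swift
// Placeholder work item; real code would do I/O or heavier computation.
func checksum(_ chunk: [UInt8]) -> Int {
    chunk.reduce(0) { $0 &+ Int($1) }
}

// Many concurrent tasks, but the runtime schedules them cooperatively on a
// small fixed-width thread pool rather than spawning one OS thread per task.
func checksumAll(_ chunks: [[UInt8]]) async -> Int {
    await withTaskGroup(of: Int.self) { group in
        for chunk in chunks {
            group.addTask { checksum(chunk) }   // a task, not a thread
        }
        var total = 0
        for await partial in group {            // results arrive as tasks finish
            total &+= partial
        }
        return total
    }
}
```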
 

bobcomer

macrumors 601
May 18, 2015
4,949
3,699
A context switch on a modern CPU takes around 1-1.5 microseconds. With the cache-miss penalty this goes up to 3 microseconds.
Okay, not really anything I didn't know. It only becomes a problem when you try to shove through way too many context switches. That's why I like more cores: fewer switches per core.

Let's go back to our "one fast core vs two slow cores" example. A fast core running two threads with one-millisecond slices will lose at most 3 milliseconds per second to context-switch overhead. That's 0.3% of time lost. And of course, since the OS is smart and can figure out that both threads are demanding, it might use 5 or even 10 millisecond slices, dropping that overhead to 0.3 ms per second. It's practically negligible. And the fast core will perform much better on asymmetric workloads.
That makes sense, with one and two cores and two threads. Even back when context switching was a lot slower than now.

Sure, if you have very intensive symmetric workload that has hundreds or thousands of threads in flight, context switch overhead will become substantial. That's where the GPUs come into play.
If you have a GPU <g>. Most of the machines I use don't (other than an iGPU). Our midrange machine (a big FAST DB server that also happens to run our business software) doesn't have one either.

But I'm talking about more than 2 active threads. I have a lot of background monitoring apps and do a lot of work with VMs. Having more cores just makes it faster to switch between tasks for me, and it is noticeable. Maybe a modern 2-core machine could handle it as well as my 8-core Ryzen 7, but I can't buy a modern 2-core to compare. And of the Intel PCs I control, even the cheapest we have has more than 2 cores.

All my stuff is very general purpose, never just one type of task at a time. Not like video/sound processing; just DB, data, reporting, and monitoring of business processing. Maybe that's where the disconnect is between most people and myself. I'm an IT generalist, I do it all: I don't just program, I don't just analyze data, I don't just watch my servers and network to make sure they're running okay. I do it all, and I need a lot of automation for that, hence a lot of tasks.
 

bcortens

macrumors 65816
Aug 16, 2007
1,324
1,796
Canada
Dang people around here are so into personal attacks and saying I'm lying about my experience. Well, I won't block any more people over it, but you can all forget about me responding to it.

You stick with your 2 cores and I'll stick with my 8 because it fits what I do, and no amount of insults changes that.

I'm not attacking you; I think you're exaggerating the extent to which your personal experience is generalizable to real-world and laboratory conditions.
 

bobcomer

macrumors 601
May 18, 2015
4,949
3,699
I don't think so, at least that's not what I think. We believe you when you say that your workload runs well with many smaller cores.
That's a good thing, thanks

The issue is that your experience does not generalise to all, or even many, workloads.
That could well be true, and I don't doubt that it is. I know most people don't use a computer like I do. But by the same token, the way I use it is valid too, and I need the machine to do it, so is it so odd for me to want machines that can handle that workload? Sure, I'd take an 8 fast-core machine over an 8 slow-core machine, but if I couldn't have that, I'd settle for an 8 slower-core machine over a 2 faster-core machine.

We disagree with the implication that it does.
Ah, that's the problem. That wasn't my intent at all! I never intended to imply that what I do and need works for everyone else. I am personally biased towards that type of machine/workload, yes, but I don't buy the same machines for most of my users that I buy for myself either. I also get a lot more RAM and as fast an SSD as I can for myself...

What's bad is I was reading the room the other way, that all you one or two FAST core "designers" people were implying that my desire for more cores was stupid and couldn't possibly be better for my workload than your one or 2 core machine. How could I think otherwise when so many question my experiences?

So to set the record straight: I'm only talking about my own needs and desires in computer design, that's it, period. I'm not a computer hardware designer, I left that stuff back in college, but I do want machines that fit me. If I overstepped in posting at all, I'm sorry. (But I do think my use case is valid enough to mention.)
 

APCX

Suspended
Sep 19, 2023
262
337
What's bad is I was reading the room the other way, that all you one or two FAST core "designers" people were implying that my desire for more cores was stupid and couldn't possibly be better for my workload than your one or 2 core machine.
To be clear, I am absolutely saying that. Two fast cores would absolutely be better than 8 slow cores if the fast cores are significantly faster.
 
  • Like
Reactions: leman and bcortens

bobcomer

macrumors 601
May 18, 2015
4,949
3,699
To be clear, I am absolutely saying that. Two fast cores would absolutely be better than 8 slow cores if the fast cores are significantly faster.
Obviously. And I don't agree with that, and I know it's not true for me.
 

bcortens

macrumors 65816
Aug 16, 2007
1,324
1,796
Canada
Obviously. And I don't agree with that, and I know it's not true for me.

This is exactly why people keep arguing with you: you "know" it isn't true for you.

You mentioned you have 8 Zen cores (I don't remember the generation), but no matter, those are not "slow" cores by any stretch of the imagination. But for the sake of argument, suppose "significantly faster" meant two cores, each able to sustain 4x the performance of an individual Zen core. In that case the total compute power of the two processors would be the same, and I doubt you could measure a significant difference in responsiveness in a laboratory setting.

Now, since those cores don't exist, your use case makes perfect sense. However, you earlier claimed that you would take two 2.5 GHz cores over a single 5 GHz core because the single core would be less responsive than the two (assuming all else was equal). You've now moved the goalposts to be all about your use case, while I, and others, were responding to the more general point that more slow cores are not going to be measurably better than fewer fast cores. We also keep pointing out that the way an OS time-slices processor time means that in most cases a few fast cores are going to be better than a lot of slow cores for general-purpose use.

Apple's iPhone CPUs are a perfect example of this: the iPhone would not be improved by trading the two P-cores for an additional 6 E-cores, even though aggregate theoretical performance would be similar. It would be made much worse, because any task that overwhelms the per-second compute capability of an E-core will bog down and take much longer to complete.
 

bobcomer

macrumors 601
May 18, 2015
4,949
3,699
Computer science. As shown to you by a multitude of posters on here.
Sorry guy, I have a computer science degree too and nobody has shown any proof against what I said, no matter how much you think so. Don't expect me to respond again.
 

APCX

Suspended
Sep 19, 2023
262
337
Sorry guy, I have a computer science degree too and nobody has shown any proof against what I said, no matter how much you think so. Don't expect me to respond again.
They have shown you multiple times. You respond with zero factual arguments. Just “I feel” posts.

I would really appreciate it if you could walk me through your argument.
 
  • Like
Reactions: bcortens

Analog Kid

macrumors G3
Mar 4, 2003
9,360
12,603
one large core can "simulate" two smaller cores via time slices, but two smaller cores cannot be combined into one large core to run a single task faster

Again, we should be careful not to be too absolutist in pursuit of an argument. Multiple smaller cores can simulate one large core. This is commonly what's done for the types of workflows that are farmed out for GPU or neural processing: take a massive computation, dice it up into a lot of smaller computations, then combine the individual results.

As with any simulation, it's not the real thing, and certain workflows show the differences more clearly. For one core simulating many, it shows up in areas like latency, determinism, or optimization via heterogeneous logic units. For multiple cores simulating one, it shows up when the computations being done aren't independent, or when it's difficult to machine-optimize asynchronous workflows.

Yes, one single fast core is generally preferred, but multiple cores can simulate one large core.
 

APCX

Suspended
Sep 19, 2023
262
337
Again, we should be careful not to be too absolutist in pursuit of an argument. Multiple smaller cores can simulate one large core. This is commonly what's done for the types of workflows that are farmed out for GPU or neural processing: take a massive computation, dice it up into a lot of smaller computations, then combine the individual results.

As with any simulation, it's not the real thing, and certain workflows show the differences more clearly. For one core simulating many, it shows up in areas like latency, determinism, or optimization via heterogeneous logic units. For multiple cores simulating one, it shows up when the computations being done aren't independent, or when it's difficult to machine-optimize asynchronous workflows.

Yes, one single fast core is generally preferred, but multiple cores can simulate one large core.
I'm not sure what “simulate” means in this context?
 

Analog Kid

macrumors G3
Mar 4, 2003
9,360
12,603
I'm not sure what “simulate” means in this context?
I understood it to mean making one architecture behave as though it were another, giving the same (in this case numeric) results but with different secondary characteristics (execution time, power, etc.). Turing-complete machines can simulate each other.
 

APCX

Suspended
Sep 19, 2023
262
337
I understood it to mean making one architecture behave as though it were another, giving the same (in this case numeric) results but with different secondary characteristics (execution time, power, etc.). Turing-complete machines can simulate each other.
Ok. I’d be interested in other people’s knowledge on this.

To my understanding it's the OS that deals with that. In a sense, a single-core machine doesn't simulate a multi-core machine, and vice versa. Work is split into small amounts by the scheduler, which portions out the work to whatever cores are available. The available resources are shared by the OS, making sure that each process gets some time and that tasks with a higher priority get more time than lower-priority tasks. For example, on macOS audio work would generally run at a higher priority than OS maintenance tasks.
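As a small illustration of the priority point (the queue labels and the work are placeholders; the Dispatch QoS classes themselves are real API), higher-QoS work is favored by the scheduler over background housekeeping:

```swift
import Dispatch

// Higher-QoS work is scheduled ahead of background housekeeping.
let audioQueue = DispatchQueue(label: "example.audio", qos: .userInteractive)
let maintenanceQueue = DispatchQueue(label: "example.maintenance", qos: .background)

audioQueue.async {
    // time-critical work, e.g. preparing the next audio buffer (placeholder)
}

maintenanceQueue.async {
    // housekeeping that can wait, e.g. pruning old cache files (placeholder)
}
```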

The complexity of multi-core machines is that it can be tough to divide up certain tasks. Some things depend on the result of another operation; as a result, you may not see a speedup from having multiple cores/CPUs. There are other issues as well.

Overall, my point would be that neither is acting or simulating the other.

There are however many more knowledgeable people here, so I could be completely wrong!
 

Analog Kid

macrumors G3
Mar 4, 2003
9,360
12,603
Ok. I’d be interested in other people’s knowledge on this.

To my understanding it's the OS that deals with that. In a sense, a single-core machine doesn't simulate a multi-core machine, and vice versa. Work is split into small amounts by the scheduler, which portions out the work to whatever cores are available. The available resources are shared by the OS, making sure that each process gets some time and that tasks with a higher priority get more time than lower-priority tasks. For example, on macOS audio work would generally run at a higher priority than OS maintenance tasks.

The complexity of multi-core machines is that it can be tough to divide up certain tasks. Some things depend on the result of another operation; as a result, you may not see a speedup from having multiple cores/CPUs. There are other issues as well.

Overall, my point would be that neither is acting or simulating the other.

There are however many more knowledgeable people here, so I could be completely wrong!

It's probably not useful to rabbit-hole into the meaning of one word; maybe another word better conveys the meaning.

Formally, "simulate" is correct as used here:

You might read it slightly more loosely as "sufficiently but imperfectly modeling the behavior of". Meaning that beyond simply returning equivalent results, one core mimics the behavior of several by quickly sequencing among tasks serially rather than in parallel. Going the other way, several cores can mimic one serial process by running each iteration of certain for-loops in parallel, for example.
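A small sketch of that last point (the workload here is an arbitrary placeholder): the same loop run serially and then diced across cores gives the same numeric result, only the execution profile differs.

```swift
import Dispatch

let n = 1_000_000

// One core: a plain serial loop.
var serialSum = 0.0
for i in 0..<n { serialSum += Double(i).squareRoot() }

// Many cores: dice the index range into chunks, sum each chunk concurrently,
// then combine the partial results.
let chunks = 8
var partials = [Double](repeating: 0, count: chunks)
partials.withUnsafeMutableBufferPointer { buf in
    DispatchQueue.concurrentPerform(iterations: chunks) { c in
        var s = 0.0
        for i in (c * n / chunks)..<((c + 1) * n / chunks) { s += Double(i).squareRoot() }
        buf[c] = s      // each chunk writes its own slot, so no data race
    }
}
let parallelSum = partials.reduce(0, +)

print(serialSum, parallelSum)   // equal up to floating-point rounding
```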

If you're reading it as "intentionally pretending to be", it's still reasonably accurate. I'm pretty sure the original schedulers were designed specifically to simulate (pretend to be) multiple machines, because you had one very expensive mainframe and many users clamoring for access. Rather than give each user a dedicated CPU, the mainframe time-sliced among tasks, giving each user the impression they had a machine of their own. Processes act as automated users. Likewise, when you hide a GPU-accelerated kernel behind a host function call, it's pretending to be run on the host machine.

In the end, the two architectures can stand in for each other, yielding the same computational results. Each has strengths and weaknesses, but certainly for general-purpose computing the tension is that algorithms generally prefer fewer, faster processors while physics prefers more, slower ones. That's why most of the cores on my machine sit idle until I run a process that was painstakingly written to use more of them at once.
 

falainber

macrumors 68040
Mar 16, 2016
3,539
4,136
Wild West
To be clear, I am absolutely saying that. Two fast cores would absolutely be better than 8 slow cores if the fast cores are significantly faster.
That's a meaningless statement without specific numbers. If this were true in a vacuum, we would still be using single-core computers. Yet all vendors, including Apple, are rapidly increasing the core count. That's because MC loads are becoming more and more prevalent and critical for overall performance. SC performance is still important for some (rare) tasks, but for the most typical SC tasks (like UI), all modern cores are simply fast enough that it doesn't matter.
 
  • Like
Reactions: bobcomer

Basic75

macrumors 68020
May 17, 2011
2,101
2,447
Europe
What's bad is I was reading the room the other way, that all you one or two FAST core "designers" people were implying that my desire for more cores was stupid and couldn't possibly be better for my workload than your one or 2 core machine. How could I think otherwise when so many question my experiences?
We are implying that if aggregate performance is equal then fewer cores are better than more cores because a wider range of applications will be able to use a larger percentage of that total aggregate performance.

And since many applications are not, or are only lightly, multi-threaded, it is important to have at least a couple of really fast cores.

Intel's approach since Alder Lake makes a lot of sense to me, even in configurations with 2 or 4 performance cores and 8 area-efficient cores:

The performance cores can take care of the applications that aren't heavily multi-threaded, and when you do have many threads, from one or more applications, then all cores can contribute.
 
  • Like
Reactions: wegster

APCX

Suspended
Sep 19, 2023
262
337
That's a meaningless statement without specific numbers. If this were true in a vacuum, we would still be using single-core computers. Yet all vendors, including Apple, are rapidly increasing the core count. That's because MC loads are becoming more and more prevalent and critical for overall performance. SC performance is still important for some (rare) tasks, but for the most typical SC tasks (like UI), all modern cores are simply fast enough that it doesn't matter.


We are using multi-core computers largely because single-core improvements stalled.
 
  • Like
Reactions: wegster and Basic75

Basic75

macrumors 68020
May 17, 2011
2,101
2,447
Europe
Again, we should be careful not to be too absolutist in pursuit of an argument. Multiple smaller cores can simulate one large core.
In this case I am intentionally absolutist because (and I thought it was clear from the context) by "one task" I meant one thread. And in that case I hope it's obvious that it doesn't matter how many smaller cores you have: a single larger one will be better, because the small ones can't "gang up".
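A minimal sketch of why (the constants are arbitrary placeholders): when every iteration needs the previous result, there is nothing independent to hand to a second core, so only a faster single core makes it finish sooner.

```swift
// A loop-carried dependency: iteration k cannot start until k-1 is done,
// so extra cores can't "gang up" on it; only a faster core helps.
var x = 1.0
for _ in 0..<10_000_000 {
    x = (x * 1.000001).truncatingRemainder(dividingBy: 3.0) + 0.5
}
print(x)
```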
 

Basic75

macrumors 68020
May 17, 2011
2,101
2,447
Europe
If this was true in a vacuum, we would still be using single core computers. Yet all vendors, including Apple, are rapidly increasing the core count. That's because the MC loads are becoming more and more prevalent and critical for overall performance.
You're mistaking cause and effect. The number of transistors in a chip has been increasing exponentially for decades. In the beginning the additional transistors could easily be used to make a single processor core faster. Over time it has become more and more difficult to increase single-core performance, and even more so to do it energy-efficiently. So the only way these additional transistors could be used to increase (aggregate) performance was to put two, and then ever more, cores into one chip. Once this went mainstream, application and game developers started to make their programs multi-threaded, which is easy in some cases, hard in many more, and nearly impossible in others. Let me repeat: in many cases it is really hard work to make applications use multiple cores well! Case in point, the software often lags behind the hardware. You can get PCs with 16-core processors on a desktop platform, not server, not workstation, plain old desktop, but how many games use more than 8 cores?
 

leman

macrumors Core
Oct 14, 2008
19,520
19,670
That's a meaningless statement without specific numbers. If this were true in a vacuum, we would still be using single-core computers. Yet all vendors, including Apple, are rapidly increasing the core count. That's because MC loads are becoming more and more prevalent and critical for overall performance. SC performance is still important for some (rare) tasks, but for the most typical SC tasks (like UI), all modern cores are simply fast enough that it doesn't matter.

No, we are increasing the number of cores because we hit a limit with single-core performance. While we still get small single-core improvements, they come at an increasingly high power cost. Multiple cores can side-step this problem.
 