Aspects where Intel Macs still outperform the M1 models?

pshufd · Apr 2, 2021

bobcomer said:
I don't understand your point here. That's the way computers run these days and good for that to me. Computers aren't a game to get the best theoretical usage, it's to get the job done, and they're pretty good at that -- at least ever since we went from single threaded, interrupt driven, state machines. And yes, that means intel processors do it pretty well, they get the job done at a reasonable cost. Saturating cpu throughput is not something that's a good thing, it's a hard limit that one should strive not to do.

Power consumption rises exponentially with frequency so saturating a CPU is an inefficient way to get work done which is why we have multi-core and multi-threaded CPUs today. My i7-10700 gets the job done well with this though an AMD 5900X or 5950X would do a better job as there are more cores/threads to share the load. It's an expensive way, though, to keep power consumption low and run a cool system.

leman · Apr 2, 2021

bobcomer said:
Totally disagree, single core perf means nothing to me, we have multithreaded everything these days.

I am not arguing with that, simply pointing out the potential of Apple Silicon for multi-core applications.

bobcomer said:
Even if the M1 had a single core performance increase over my i9 (I don't think it does), it only has 4 high performance cores, I have 10+. They just really don't compare in any way.

Well, M1 (an entry-level SoC with peak CPU power consumption of 15W) multicore performance is around 60-70% of that of an i9-10900K (enthusiast-level Intel with 10 cores and 20 threads and sustained power consumption of 125W). As far as single-core performance goes, the only currently commercially available CPUs that can outperform M1 are the end-of-the-line desktop Zen3 or the brand new Intel Rocket Lake.

bobcomer said:
As for the scale linearly -- that doesn't make any sense at all -- we don't know squat about scaling on an M1 because there's only 1 version right now, and it only has 4 performance cores and 4 low power cores.

Actually we do, since there are variants with 2 performance cores called A14. The M1 is around 70% faster in multi-core workloads than the A14 iPad, which is exactly the performance you'd expect from two extra performance cores while taking into account that both chips have identical 4 high-efficiency cores (1 Icestorm efficiency core ~ 20% of a Firestorm performance core). Sustained power usage estimates of the iPad put it at around 7-10W, so M1 consumes somewhere between 50-100% more power under load. I'd say that's quite good scaling.
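
The scaling estimate above can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes throughput simply adds up across cores and uses the 20% Icestorm-to-Firestorm ratio quoted above. The function name `relative_throughput` is made up for illustration.

```python
# Rough multi-core throughput model: a performance core counts as 1.0,
# an efficiency core as a fraction of a performance core.
E_CORE_WEIGHT = 0.20  # Icestorm ~ 20% of Firestorm, per the estimate above


def relative_throughput(p_cores: int, e_cores: int,
                        e_weight: float = E_CORE_WEIGHT) -> float:
    """Throughput in 'Firestorm-core equivalents', assuming perfect scaling."""
    return p_cores + e_cores * e_weight


a14 = relative_throughput(p_cores=2, e_cores=4)  # 2 P-cores + 4 E-cores
m1 = relative_throughput(p_cores=4, e_cores=4)   # 4 P-cores + 4 E-cores
expected_gain = m1 / a14 - 1.0

print(f"A14: {a14:.1f}  M1: {m1:.1f}  expected M1 gain: {expected_gain:.0%}")
```

Under these assumptions the model lands at roughly 71%, in line with the ~70% figure mentioned above.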

bobcomer said:
The M1 cuts the clock on the 4 high performance cores in half when pushed to the limit heat wise, just like other processors. I've seen it every time I try to run more than one VM on it.

M1 base multicore clock is around 2.9 GHz, peak turbo clock is around 3.2 GHz. That's very different from x86 CPUs where base clock is usually half of the peak turbo. As to why you are seeing severe throttling on your machine, I don't know. Possibly some bug or other unrelated issue. I've been doing very demanding stuff on my M1 laptop (large software builds, statistical simulations that take hours etc.) and I cannot replicate your experience.

bobcomer said:
No.

How so? All benchmarks show that a Firestorm core running at 3.2 GHz/5 watts is very close in performance to a Tiger Lake or Zen3 running at 5 GHz/20 watts. And again, look at the Anandtech multicore benchmarks: M1 is outperforming Intel's latest and greatest Tiger Lake by 20% while consuming half the power. That's exactly why performance per watt is important: it simply allows you to build much faster chips.

bobcomer said:
The trick is we don't know how they'll connect them. Having all the RAM on a massively parallel Mx processor has its own problems.

Well, sure, but then again, these are not new problems. They have existed for decades in the HPC and GPU domains and they have been solved. When you look at M1 architecture closely, it is very clear that Apple is borrowing the experience of GPUs, with their massive memory bandwidth. Extremely high memory parallelism, quick context switches, multiple independent narrow memory channels to optimize parallel processing... M1 essentially uses a GPU-inspired memory controller connected to standard system RAM with optimized packaging. There is no reason to doubt that they will be able to further widen the memory bus to 256, 512 or even 1024-bit; GPUs have been doing it for decades. In fact, I used to own an Apple laptop with a 1024-bit memory interface a couple of years ago (Vega Pro 20). The current 5600M even has a 2048-bit memory bus!
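
To put rough numbers on what a wider bus would buy, here is a back-of-the-envelope sketch. It assumes the LPDDR4X-4266 transfer rate that the M1 is reported to use (that figure is my assumption, it is not from this thread) and that a wider bus would keep the same per-pin rate, which is by no means guaranteed.

```python
# Peak theoretical bandwidth = (bus width in bytes) * (transfers per second).
TRANSFER_RATE_MT_S = 4266  # assumed LPDDR4X-4266; not a figure from this thread


def peak_bandwidth_gb_s(bus_width_bits: int,
                        transfer_rate_mt_s: int = TRANSFER_RATE_MT_S) -> float:
    """Theoretical peak in GB/s; real sustained bandwidth is lower."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mt_s * 1e6 / 1e9


for width in (128, 256, 512, 1024):
    print(f"{width:>4}-bit bus: {peak_bandwidth_gb_s(width):7.1f} GB/s")
```

The 128-bit row corresponds to the commonly cited figure of about 68 GB/s for the M1; the other rows are pure extrapolation.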

bobcomer said:
Anyway, we'll just have to wait and see given the published information. Anyway, I'm just as interested to see what comes next as you are!

Agreed! And to make it clear, you are entitled to your skepticism and I absolutely respect that. It just happens to be my hobby to discuss these things and to compare available information.

pshufd · Apr 2, 2021

The Core i9-9980HK in the Late 2019 MacBook Pro 16 has Geekbench 5 scores of 1,092 single-core and 6,851 multi-core. The 2020 MacBook Pro 13/M1 has a single-core score of 1,730 and a multi-core score of 7,439. So the M1 smokes the i9 in single core but their multicore scores are similar, mainly because there are only four high-performance cores on the M1 and eight on the i9.

My i7-10700 system, for reference, is 1,286 for single-core and 8,281 for multi-core. I've used an M1 system and its responsiveness is noticeably better than that of my i7-10700.
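
For anyone who wants the ratios rather than the raw scores, here is a small sketch using only the Geekbench 5 numbers quoted above. The i9-9980HK is picked as the baseline arbitrarily, and the short dictionary keys are just labels for this example.

```python
# Geekbench 5 scores quoted above: (single-core, multi-core).
scores = {
    "i9-9980HK": (1092, 6851),  # Late 2019 MacBook Pro 16
    "M1": (1730, 7439),         # 2020 MacBook Pro 13
    "i7-10700": (1286, 8281),   # desktop
}

base_single, base_multi = scores["i9-9980HK"]

for name, (single, multi) in scores.items():
    print(f"{name:10s} single x{single / base_single:.2f}  "
          f"multi x{multi / base_multi:.2f}  "
          f"multi/single {multi / single:.1f}")
```

The multi/single column is a crude indicator of how many "full-speed cores' worth" of work each chip delivers, which is where the M1's four performance cores show up.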

There are aspects of the MacBook Pro 16 where it's better for some workloads and there are workloads where the M1 fares better. I expect Apple Silicon will eventually eclipse all of the Intel CPUs used in Macs in absolute performance and performance per watt. I expect big gains in programs that are commonly used by macOS users as Apple adds more and more custom silicon for commonly used programs and operations. Something that will be quite difficult to do on x64.

leman · Apr 2, 2021

pshufd said:
Power consumption rises exponentially with frequency so saturating a CPU is an inefficient way to get work done which is why we have multi-core and multi-threaded CPUs today. My i7-10700 gets the job done well with this though an AMD 5900X or 5950X would do a better job as there are more cores/threads to share the load. It's an expensive way, though, to keep power consumption low and run a cool system.

Yep, but that is the byproduct of modern x86 cores having to run way above their optimal efficiency point to deliver good performance. The optimum perf/watt for these chips is somewhere around 2.5-3ghz (which is not accidentally the most common base frequency range) where they consume around 4-5 watts. But you have to crank them up to 5ghz to get good peak performance, which pushes the power consumption to 20-25 watts.

Apple's advantage is that they can match that peak performance with their 3.2ghz CPU running at 5 watts... which gives them much more space to scale. Thermals is one of the biggest challenges for large multi-core chips and they have to be carefully balanced to avoid overheating. Power efficiency of Apple Silicon is king here.
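
A quick sketch of the perf/watt comparison, using the midpoints of the ranges above. It assumes performance scales linearly with clock within the same x86 core and treats the 5 GHz x86 core and the 3.2 GHz Firestorm core as equal in peak performance, as claimed above. Both are simplifications, and the `OperatingPoint` class is just an illustrative helper.

```python
from dataclasses import dataclass


@dataclass
class OperatingPoint:
    name: str
    relative_perf: float  # 1.0 = x86 core at 5 GHz (arbitrary unit)
    watts: float          # midpoint of the ranges quoted above

    @property
    def perf_per_watt(self) -> float:
        return self.relative_perf / self.watts


points = [
    # Assumption: performance scales linearly with clock on the same core.
    OperatingPoint("x86 core @ ~2.75 GHz", 2.75 / 5.0, 4.5),
    OperatingPoint("x86 core @ 5 GHz", 1.0, 22.5),
    # Assumption: Firestorm at 3.2 GHz matches the x86 peak, as stated above.
    OperatingPoint("Firestorm @ 3.2 GHz", 1.0, 5.0),
]

for p in points:
    print(f"{p.name:22s} perf {p.relative_perf:.2f}  "
          f"{p.watts:5.1f} W  perf/W {p.perf_per_watt:.3f}")
```

With these inputs the Firestorm point comes out about 4.5x more efficient than the x86 core at its peak clock, and still ahead of the x86 core at its efficiency sweet spot.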

pshufd · Apr 2, 2021

leman said:
Yep, but that is the byproduct of modern x86 cores having to run way above their optimal efficiency point to deliver good performance. The optimum perf/watt for these chips is somewhere around 2.5-3ghz (which is not accidentally the most common base frequency range) where they consume around 4-5 watts. But you have to crank them up to 5ghz to get good peak performance, which pushes the power consumption to 20-25 watts.

Apple's advantage is that they can match that peak performance with their 3.2ghz CPU running at 5 watts... which gives them much more space to scale. Thermals is one of the biggest challenges for large multi-core chips and they have to be carefully balanced to avoid overheating. Power efficiency of Apple Silicon is king here.

My workload would be better with a lot of low-power cores but nobody is going to make these for general purpose systems because you use a lot of transistors for this. Apple has architectural advantages which are just not possible with x86. I am waiting for more critical mass on Apple Silicon for software and an M1X for more ports and more RAM. I actually don't need more performance. Eight Firestorm cores would be nice but I don't have any workloads that would come close to saturating them. But, even if I did, I suspect it would still run cool.

bobcomer · Apr 2, 2021

pshufd said:
Power consumption rises exponentially with frequency so saturating a CPU is an inefficient way to get work done which is why we have multi-core and multi-threaded CPUs today. My i7-10700 gets the job done well with this though an AMD 5900X or 5950X would do a better job as there are more cores/threads to share the load. It's an expensive way, though, to keep power consumption low and run a cool system.

Getting the job done and having a reasonable cost is the metric that's the most important to me. The AMD 5900X and 5950X could get more done given the same timeframe, but they're too expensive for me (and too hard to get!)

pshufd · Apr 2, 2021

bobcomer said:
Getting the job done and having a reasonable cost is the metric that's the most important to me. The AMD 5900X and 5950X could get more done given the same timeframe, but they're too expensive for me (and too hard to get!)

The 5900x is a wishlist item for me. I have notifications at five places if they come into stock. I could get a 5950x as well but it's definitely overkill for me. These CPUs are as hard to get as GPUs right now. I'm not going to bother stalking GPUs, though, until I have a CPU.

bobcomer · Apr 2, 2021

pshufd said:
The Core i9-9980HK in the Late 2019 MacBook Pro 16 has Geekbench 5 scores of 1,092 single-core and 6,851 multi-core. The 2020 MacBook Pro 13/M1 has a single-core score of 1,730 and a multi-core score of 7,439. So the M1 smokes the i9 in single core but their multicore scores are similar, mainly because there are only four high-performance cores on the M1 and eight on the i9.

As I said before, single threaded performance isn't what I look for, but my geekbench score for the i9 I have is 9331. I also have 64G of RAM and 3TB of high speed SSD, and a cooling system that's 3/4 the size of a Mac mini. I think (know), it can get a lot more done in the same time frame.

I never bought a laptop with an i9 in it, cooling is too much of a problem.

pshufd · Apr 2, 2021

bobcomer said:
As I said before, single threaded performance isn't what I look for, but my geekbench score for the i9 I have is 9331. I also have 64G of RAM and 3TB of high speed SSD, and a cooling system that's 3/4 the size of a Mac mini. I think (know), it can get a lot more done in the same time frame.

I never bought a laptop with an i9 in it, cooling is too much of a problem.

Responsiveness is a factor for me. The best Intel, AMD and the M1 have single-core in the 1,600-1,700 area now. That would be useful for one application that I have. But multicore is the most salient for my workload. Gamers like top single-core performance but I'm not a gamer.

bobcomer · Apr 2, 2021

pshufd said:
The 5900x is a wishlist item for me. I have notifications at five places if they come into stock. I could get a 5950x as well but it's definitely overkill for me. These CPUs are as hard to get as GPUs right now. I'm not going to bother stalking GPUs, though, until I have a CPU.

I can understand that. One of my friends was on the waiting lists for the newest threadripper too.

Personally, GPUs aren't my thing, they don't help much in the work I do, but the 5950X, that would be nice. I needed a new machine now though, and the cost for this medium-end i9 desktop was insane compared to what it was a year ago. (i9-10900)

bobcomer · Apr 2, 2021

pshufd said:
Responsiveness is a factor for me. The best Intel, AMD and the M1 have single-core in the 1,600-1,700 area now. That would be useful for one application that I have.

That just doesn't work for me. Multithreading and multiple cores is the only way I've seen good responsiveness.

pshufd said:
Gamers like top single-core performance but I'm not a gamer.

Nor am I.

pshufd · Apr 2, 2021

bobcomer said:
That just doesn't work for me. Multithreading and multiple cores is the only way I've seen good responsiveness.

I actually want both. The 5950X or 5900X would fill the bill. Of course if M1X comes along, I might just go with that if Fidelity comes out with a new trading platform by then.

bobcomer · Apr 2, 2021

leman said:
Agreed! And to make it clear, you are entitled to your skepticism and I absolutely respect that. It just happens to be my hobby to discuss these things and to compare available information.

Agreed as well! I find these discussions interesting and it really does inspire me to think in different ways, which is always a good thing. Sometimes it has to percolate awhile to sink in though.

bobcomer · Apr 2, 2021

pshufd said:
I actually want both. The 5950X or 5900X would fill the bill. Of course if M1X comes along, I might just go with that if Fidelity comes out with a new trading platform by then.

I just want the Intel types now; the M1X won't interest me (purchase-wise) until a few more types of software are available. I am disappointed enough in my M1 MBA (and in WOA for Windows compatibility) that I won't be buying an M1X or M2 until it can do everything I want to do. I'll stick with my Windows machines and my Intel Mac Mini for most of my Mac stuff.

Give me an M1X or M2 that's at least as fast as my i9, with as much RAM, and that can run a full x86 emulator for things that I need to do, then I'll think about buying one, but I will be interestedly following what's happening with them!

thekev · Apr 2, 2021

bobcomer said:
I don't understand your point here. That's the way computers run these days and good for that to me. Computers aren't a game to get the best theoretical usage, it's to get the job done, and they're pretty good at that -- at least ever since we went from single threaded, interrupt driven, state machines. And yes, that means intel processors do it pretty well, they get the job done at a reasonable cost. Saturating cpu throughput is not something that's a good thing, it's a hard limit that one should strive not to do.

pshufd said:
It depends on what you're doing.

My workload spreads across my 8 cores/16 threads nicely. One program is displaying 88 real-time charts. The other is displaying 15 real-time charts with several studies per chart. I assume that it just runs each chart in a thread. I also have about ten tabs in Firefox and you can specify the number of processes/subprocesses that Firefox will use - up to 8. Most of those tabs are on auto-reload. None of these threads are CPU intensive, so they don't get anywhere near stressing a core or hardware thread, but they do spread out nicely.

A workload based on a large number of tasks that update periodically without any of them saturating a particular core's schedule represents the type of workload that typically scales poorly with high core counts. In spite of this, such a workload generates a high thread count. The two are quite distinct, and we were previously talking about core counts, thus my mention of compute bound workloads and single-threaded performance.

ADGrant · Apr 2, 2021

bobcomer said:
Totally disagree, single core perf means nothing to me, we have multithreaded everything these days.

Not really. JavaScript interpreters are single-threaded, and so is the standard Python interpreter. Those two languages alone are used for many workloads and most developers working in other languages don't know how to write multi-threaded code properly.

leman · Apr 2, 2021

thekev said:
A workload based on a large number of tasks that update periodically without any of them saturating a particular core's schedule represents the type of workload that typically scales poorly with high core counts. In spite of this, such a workload generates a high thread count. The two are quite distinct, and we were previously talking about core counts, thus my mention of compute bound workloads and single-threaded performance.

Yep, the most efficient way to run such workloads is to use a single core to process all the tasks, especially if the processing does not take much time. Scheduling short-running threads on separate cores is a waste of resources.

People forget that a single core is perfectly capable of running multiple threads simultaneously. Multi-core designs are relatively new, multithreading is not.
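
To illustrate how little of a single core such a workload needs, here is a minimal utilization estimate. The chart counts (88 + 15) come from the workload described earlier in the thread; the per-update CPU cost and the update period are purely hypothetical numbers picked for the example.

```python
# Each periodic task is (cpu_seconds_per_update, period_seconds).
# Utilization of one core = sum of cost / period over all tasks.
def core_utilization(tasks):
    return sum(cost / period for cost, period in tasks)


CHART_COUNT = 88 + 15      # charts mentioned earlier in the thread
ASSUMED_COST_S = 0.0005    # hypothetical: 0.5 ms of CPU per chart update
ASSUMED_PERIOD_S = 1.0     # hypothetical: one update per second

tasks = [(ASSUMED_COST_S, ASSUMED_PERIOD_S)] * CHART_COUNT
utilization = core_utilization(tasks)

print(f"{CHART_COUNT} periodic tasks keep one core busy "
      f"{utilization:.1%} of the time")
```

With these made-up inputs, over a hundred threads add up to about 5% of one core, so thread count by itself says little about how many cores a workload needs.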

pshufd · Apr 2, 2021

ADGrant said:
Not really. JavaScript interpreters are single-threaded, and so is the standard Python interpreter. Those two languages alone are used for many workloads and most developers working in other languages don't know how to write multi-threaded code properly.

I'm running a bunch of tabs though so you wind up with process/threaded parallelism.

JMacHack · Apr 2, 2021

pshufd said:
What will change my mind on gaming (I'm not a gamer):

- GPU shortages persist until 2023
- Apple is able to build compelling GPUs, either integrated or discrete
- Apple sells systems at a very good price/performance point
- Performance/watt remains excellent

I used to play a lot more games than I do now that I’m an old fart. But I loathe “gaming” and the “gaming market” as it currently is.

On topic: I think Apple will be able to build compelling gpus in their higher end machines. Maybe not in raw specs, but leveraging somewhere else that they excel.

bobcomer · Apr 2, 2021

ADGrant said:
Not really. JavaScript interpreters are single-threaded, and so is the standard Python interpreter. Those two languages alone are used for many workloads and most developers working in other languages don't know how to write multi-threaded code properly.

Javascript really isn't my cup of tea, I don't do web stuff unless I really have to.

I'm a business developer that only works on internal stuff, and mainly on bigger hardware. I'm not great at multithreaded stuff either, never needed it; anything I write assumes many tasks running at once and that any one task isn't going to hurt other tasks' performance.

pshufd · Apr 2, 2021

bobcomer said:
Javascript really isn't my cup of tea, I don't do web stuff unless I really have to.

I'm a business developer that only works on internal stuff, and mainly on bigger hardware. I'm not great at multithreaded stuff either, never needed it; anything I write assumes many tasks running at once and that any one task isn't going to hurt other tasks' performance.

Firefox worked on implementing a JavaScript JIT back in 2008. I was in a meeting where Brendan Eich and the guy from CalTech who would eventually become the CTO described how it works and the development plan. JavaScript performance improved dramatically. You could theoretically do some threading but I don't know if the amount of work to do it would be a net win in performance. These days, people run so much in tabs that spreading tabs over threads is a net win.

I'm kind of amazed at the number of people that leave thirty or forty tabs open. I have ten pinned tabs and eight of them are active - that is I either have auto-reload enabled on the pinned tabs or they are sites that are dynamic.

Krevnik · Apr 5, 2021

leman said:
Yep, but that is the byproduct of modern x86 cores having to run way above their optimal efficiency point to deliver good performance. The optimum perf/watt for these chips is somewhere around 2.5-3ghz (which is not accidentally the most common base frequency range) where they consume around 4-5 watts. But you have to crank them up to 5ghz to get good peak performance, which pushes the power consumption to 20-25 watts.

leman said:
Yep, the most efficient way to run such workloads is to use a single core to process all the tasks, especially if the processing does not take much time. Scheduling short-running threads on separate cores is a waste of resources.

People forget that a single core is perfectly capable of running multiple threads simultaneously. Multi-core designs are relatively new, multithreading is not.

Thank you, this is the crux of the point I was trying to make earlier in the thread.

But using some real numbers again, if I had a load similar to pshufd’s, I’d save my money. A 5900X is more expensive, produces more heat (and thus louder with the same cooling), and is more power hungry under the same load as a 5600X. On top of that, the load from pshufd is very light, and wouldn’t be able to saturate a 5600X as it is.

The 5900X vs the 5600X shows that the package itself can draw noticeable power. A 5900X doing nothing draws 23W. A 5600X doing nothing draws 11W. The package alone draws 20-30W for the 5900X while it’s 10-15W for the 5600X. On top of that, the per-core power on the 5600X is flat out better under load as well (partly due to slightly slower boost clocks). Unless you can saturate a 5600X and then some, you would just be burning money on a 5900X IMO, and it would be a noisier machine to boot.

Source for numbers: https://www.anandtech.com/show/1621...e-review-5950x-5900x-5800x-and-5700x-tested/8

ADGrant · Apr 5, 2021

leman said:
People forget that a single core is perfectly capable of running multiple threads simultaneously. Multi-core designs are relatively new, multithreading is not.

Actually really not true. Some Intel Cores can appear to run two threads simultaneously using Hyperthreading but only one thread is really running, the other is blocked by something like a cache miss.

In all other cases, a core only runs one thread at a time, switching between them when a thread blocks (usually on an I/O operation) or when a timer interrupt signals the end of that thread's time slice. Thread context switches are quite inefficient. The kernel has to switch the CPU core to another thread context, which means discarding the data in the CPU registers and cache lines.

BTW the first multi-core CPU was introduced 20 years ago, which doesn't really count as new anymore, but the use of multiple cpus for parallel computing is much older. Amdahl's law was presented to a conference audience in 1967.
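
Since Amdahl's law came up: here is a minimal sketch of the formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of cores. The 90% parallel fraction in the loop is just an example value, not a measurement of any workload in this thread.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of the work parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


EXAMPLE_PARALLEL_FRACTION = 0.9  # example value only

for cores in (4, 8, 16):
    speedup = amdahl_speedup(EXAMPLE_PARALLEL_FRACTION, cores)
    print(f"{cores:>2} cores: at most {speedup:.2f}x faster")
```

Even with 90% of the work parallel, 16 cores give at most a 6.4x speedup, which is why the serial part (and thus single-core performance) keeps mattering.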

mj_ · Apr 5, 2021

ADGrant said:
Actually really not true. Some Intel Cores can appear to run two threads simultaneously using Hyperthreading but only one thread is really running, the other is blocked by something like a cache miss.

I don't think that's entirely true. I believe SMT will allow a CPU to run two threads simultaneously as long as they are not interdependent and run on different execution units, say one integer operation running on the ALU and one floating point operation running on the FPU, or one decoder operation running on the decoder unit and one memory fetch operation running on the MU. I believe that more recent implementations of SMT will even allow two simultaneous ALU or FPU operations if at least two arithmetic or floating point execution units are unused (disregarding context switching, which would entail, as you've correctly stated, a huge penalty) but I may be wrong on this.

Either way, there are numerous aspects where Intel Macs are still outperforming M1 Macs. For example, Intel Macs give you several seconds of me time between opening the lid and actually waking up. These three to five seconds can be used for contemplation, meditation, and to think about life, the universe, and everything. Who knows, maybe God would still be alive if Nietzsche had used an M1 Mac instead of an Intel Mac and would've thus had less time to think about stuff 🤷‍♀️

leman · Apr 6, 2021

ADGrant said:
Actually really not true. Some Intel Cores can appear to run two threads simultaneously using Hyperthreading but only one thread is really running, the other is blocked by something like a cache miss.

I was referring to preemptive multithreading with context switches.

ADGrant said:
Thread context switches are quite inefficient. The kernel has to switch the CPU core to another thread context which means discarding the data in the CPU registers and cache lines.

A context switch costs just a few microseconds on modern CPUs with a modern kernel. Waking up from a power-saving state also costs a couple of microseconds (depending on how much power you want to conserve), and so does clock shifting. If we are in a scenario like the one discussed above (multiple threads that have to do a small amount of work from time to time), the CPU will spend most of the time in a low-power state, only waking for brief periods of time to do the work. It is more power efficient (and also more performant) to coalesce such threads on a single core that would do a larger work package in one go (even with context switches) rather than to wake up multiple cores to do a minuscule amount of work. There is a good reason why modern OSes use interrupt coalescing.
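
The coalescing idea can be sketched in a few lines. This is a toy illustration, not how any particular kernel implements it: deadlines that fall within a tolerance window of the earliest pending one are batched, and the whole batch runs at its latest deadline so nothing fires early. The function name, the deadlines and the tolerance are all made up for the example.

```python
def coalesce_wakeups(deadlines_ms, tolerance_ms):
    """Group deadlines into batches spanning at most tolerance_ms each.

    Each batch is served by a single wake-up at its last deadline, so no
    task runs early and no task is delayed by more than tolerance_ms.
    """
    batches = []
    for deadline in sorted(deadlines_ms):
        # batches[-1][0] is the earliest deadline of the current batch.
        if batches and deadline - batches[-1][0] <= tolerance_ms:
            batches[-1].append(deadline)
        else:
            batches.append([deadline])
    return batches


deadlines = [0, 1, 2, 10, 11, 25]  # hypothetical timer deadlines in ms
batches = coalesce_wakeups(deadlines, tolerance_ms=3)

print(f"{len(deadlines)} timers served by {len(batches)} wake-ups")
for batch in batches:
    print(f"  wake at {batch[-1]} ms for deadlines {batch}")
```

In this example six timers are served by three wake-ups, so the core gets to stay in its low-power state for longer stretches.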
