
leman

macrumors Core
Oct 14, 2008
19,516
19,662
the best AI PC as a Mac... ML is not up there with the current Copilot PCs
For me, ML = AI from the PC world, just under a different name. Again, I was referring to the title: Apple is marketing the Mac as the best AI device when it isn't. Another user misunderstood Apple's marketing about the M3 MBA, is what I mean

I am afraid I don’t quite follow. ML and AI are pretty much synonymous concepts. And while the M3 might have lower advertised ML TOPS, it scores higher in real-world benchmarks. So what is it that you mean?
 

MayaUser

macrumors 68040
Nov 22, 2021
3,177
7,194
I am afraid I don’t quite follow. ML and AI are pretty much synonymous concepts. And while the M3 might have lower advertised ML TOPS, it scores higher in real-world benchmarks. So what is it that you mean?
I just edited my answer; maybe it helps a bit more. But I will stop here, since AI and the Mac are a different story... this thread is about Qualcomm and Windows
 
Last edited:

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
From what I understand, the M3 outperforms most Copilot+ PCs in on-device inference benchmarks. The advertised TOPS number is one thing; the ability to use it effectively in ML workloads is something else entirely.
I find it interesting that Apple couldn't find a practical use of ML with which to compare its laptops to Qualcomm-based laptops. I imagine Apple will use Apple Intelligence performance metrics for such comparisons in the future.

I get the impression that the PC world is selling the idea that Copilot+ PCs are more advanced PCs with unique features. So I'm curious what Apple's marketing team can do to counter it.
 

vladi

macrumors 65816
Jan 30, 2010
1,008
617
In what sense, for either? If we're talking native ST/MT performance, then Qualcomm is already competitive with AMD and the currently available Intel processors, if not well ahead of the latter's Meteor Lake. Intel is claiming to retake the lead in ST performance and MT perf-per-watt (though not MT performance) with their soon-to-be-released Lunar Lake processors - though, as already noted, first-party benchmarks on unreleased processors should never be assumed to be accurate - and AMD already has roughly equivalent MT perf/W to the Elite (depending on workload). Neither of them is anywhere close to Qualcomm's ST perf/W, and that's unlikely to change anytime soon. So I'm not quite sure where either statement is coming from? Emulated performance?

I'm sorry, but hardcore ST performance really means nothing. The only apps that need hardcore ST performance are video games, and Qualcomm's first attempt at GPU performance is very weak compared to both AMD and Intel. All the other utilities that thrive on ST don't even utilize that much performance to begin with. Office Word is a single-threaded app, but Excel uses multiple threads to solve formulas.
 

Exclave

Suspended
Jun 5, 2024
77
102
I'm sorry, but hardcore ST performance really means nothing. The only apps that need hardcore ST performance are video games, and Qualcomm's first attempt at GPU performance is very weak compared to both AMD and Intel. All the other utilities that thrive on ST don't even utilize that much performance to begin with. Office Word is a single-threaded app, but Excel uses multiple threads to solve formulas.
This might be the most incorrect statement I’ve seen on here. ST performance is massively important. To say otherwise shows a complete lack of understanding.
 

vladi

macrumors 65816
Jan 30, 2010
1,008
617
This might be the most incorrect statement I’ve seen on here. ST performance is massively important. To say otherwise shows a complete lack of understanding.

You probably misread my post. I didn't say ST performance doesn't matter; I said hardcore/extreme ST performance doesn't matter in everyday workflows except video gaming. Most single-threaded apps are already efficient and don't require extreme performance; even some single-threaded CAD apps will never max out M2 single-thread performance in actual use.

What single-thread workflow do you have that needs M3 Max single-thread performance over M1 single-thread performance? I would really like to know. If there is such a thing, it must be very niche.
 

Exclave

Suspended
Jun 5, 2024
77
102
You probably misread my post. I didn't say ST performance doesn't matter; I said hardcore/extreme ST performance doesn't matter in everyday workflows except video gaming. Most single-threaded apps are already efficient and don't require extreme performance; even some single-threaded CAD apps will never max out M2 single-thread performance in actual use.

What single-thread workflow do you have that needs M3 Max single-thread performance over M1 single-thread performance? I would really like to know. If there is such a thing, it must be very niche.
I’m not sure what you mean here. My point is that single-thread performance improvements help everything; everything is improved by increases in ST. That is not the case for multi-threaded improvements. In a situation where I could choose more ST or more MT, I would almost always choose more ST. Ironically, video is one of the workloads that benefits from multi-threaded improvements.
 
Last edited:

vladi

macrumors 65816
Jan 30, 2010
1,008
617
I’m not sure what you mean here. My point is that single-thread performance improvements help everything; everything is improved by increases in ST. That is not the case for multi-threaded improvements. In a situation where I could choose more ST or more MT, I would almost always choose more ST. Ironically, video is one of the workloads that benefits from multi-threaded improvements.

I'm not saying you are wrong here, but the issue is a lot more complex than just ST vs MT. I'm talking about real-world scenarios where you could actually extract such performance, not about benchmark scores. Bear with me here, please. We've been in the visual & audio production business for over 35 years now, and the way hardware moves and evolves is fascinating.

In our software workflow we don't have a single major app that doesn't support multithreading. When I asked what piece of software (not a video game) would push M3 Max single-thread performance to its peak, I already had an answer in my head: some audio VST plugin that hasn't been optimized yet. VST plugins are the ones that will spike your CPU load, and they were all single-threaded up until a few years back. That's why UAD sold you a box to run their plugins on. Thanks to the Intel Mac Pro's extreme core count, many major VST developers have implemented multi-core support; it took them only forever, but they did it.

So in our workflow, extreme single-thread CPU performance is not important, but neither is total multi-thread CPU performance. What we look for is the balance between single-core speed and core count this time around. All the Windows PCs we've built have high-end gaming CPUs inside instead of workstation CPUs. That's because core count is no longer a priority in our workflow, thanks to Nvidia GPU evolution: all of our apps now offload to the GPU for invisible 2D VFX, 3D VFX, motion graphics, and rendering. Bottom line: GPUs today are faster than a 64-core CPU in final renders, and you actually save money by skipping the 64-core workstation and going with an 8-core gaming CPU with the fastest single-core clock there is, plus a high-end GPU.

Our 24-core Mac Pro (which we only use for audio mastering now) probably has a much better MT benchmark score than our Intel 14900 PC with 8 real cores (technically it's 24 cores, but that's a different topic), but in the real world that means nothing, because in a regular workflow like setting up a shot and comping it there is no real difference in speed; the app can't max out all 24 cores most of the time while you're just painting out wires or doing a matte painting. When it comes to finalizing the result, of course the 24-core Mac Pro would devour our 8-core Intel, but not our GPU, let alone the GPU farm we've got going.

If you are still with me, I applaud you, so let me wrap it up with what my experience tells me is the fair middle ground: total multi-thread CPU performance is irrelevant in a visual professional workflow, as the hardest tasks have already shifted to the GPU. But extreme single-thread CPU performance is even more irrelevant in reality, as most apps already support multithreading; even Chrome is multi-threaded. But does that mean Chrome would run better on a Xeon with 24 slow cores compared to fewer but faster i9 cores? No, absolutely not. Chrome would most likely run faster on fewer, faster cores, and that's where single-thread performance actually shines: it gives us a glimpse of everyday light-multithreading performance. In that way you are right about single-thread performance.

There are some exceptions among single-threaded tasks, like Python compilation, JavaScript, PHP, heck even zipping files, but I really doubt these would overload your CPU even if it's a few years old. These will do just fine on a vanilla M1, and an M3 Max would not make them significantly faster. Then again, the M3 Max comes with a faster SSD than the vanilla M1, so the performance is not truly comparable. That's just my assumption.

On a side note, we did run Alan Wake 2 on our 64-core Threadripper and on an Intel i9 14900, and my god, the difference is beyond obvious. Games just prefer a faster single-core clock over more cores.
 

Confused-User

macrumors 6502a
Oct 14, 2014
850
984
Chrome would most likely run faster on fewer, faster cores, and that's where single-thread performance actually shines: it gives us a glimpse of everyday light-multithreading performance. In that way you are right about single-thread performance.

There are some exceptions among single-threaded tasks, like Python compilation, JavaScript, PHP, heck even zipping files, but I really doubt these would overload your CPU even if it's a few years old. These will do just fine on a vanilla M1, and an M3 Max would not make them significantly faster. Then again, the M3 Max comes with a faster SSD than the vanilla M1, so the performance is not truly comparable. That's just my assumption.
Everything you wrote (not quoted here) makes sense but is limited to your industry. There are other use cases like yours, but they are also limited in scope. By far the majority of PC and Mac users have a different type of workload, much better represented by your quote above. For most users, ST rules, and MT helps a lot up to a limited number of cores, probably 4-8 for most users. That number will likely continue to inch up over time.

SSD performance can also be important in some cases, but for most people, most of the time, M1 SSD performance is all they'll need. Faster performance from the M3 will not be perceptible, because only large transfers will show a difference.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,450
1,219
You probably misread my post. I didn't say ST performance doesn't matter; I said hardcore/extreme ST performance doesn't matter in everyday workflows except video gaming. Most single-threaded apps are already efficient and don't require extreme performance; even some single-threaded CAD apps will never max out M2 single-thread performance in actual use.

What single-thread workflow do you have that needs M3 Max single-thread performance over M1 single-thread performance? I would really like to know. If there is such a thing, it must be very niche.
I think I understand what you're trying to say, and it's much more reasonable than what I initially thought you were saying about ST/MT performance (though I still disagree; I would say ST performance is still more important, for reasons I'll get to, but as a preview of my arguments: efficiency matters here, and race-to-sleep for ST matters). However, both this post and your subsequent longer piece still indirectly contradict, or at least negate, your original statement that "Intel/AMD will catch Qualcomm before the other way around etc...", which is what I was responding to originally and what prompted this conversation. Let me explain: for both MT and ST, that's already happened, except in ST performance per watt. Qualcomm's Elite multithreaded performance has already caught up to AMD/Intel, and AMD's H300 multithreaded efficiency has caught up to the Elite as well; potentially so has LNL's (some of that is a consequence of the compromises in the Elite SOC design, which I'll get to). And the ST performance of any of them is hardly extreme. All of these are mobile chips, not overclocked enthusiast builds running on liquid nitrogen, and all have similar enough ST performance, often trading blows (though of course Intel claims in its marketing materials to have the best, as yet unconfirmed).

Further, even if what you were saying wrt ST/MT performance were 100% accurate (and it isn't; ST performance is still the most important), ST efficiency would still be massively important to the user's experience, especially given that these are mobile chips. And the case you assert predominates users' workflows, hybrid low-thread-count MT performance/efficiency, is also heavily impacted by ST performance/efficiency, as you might imagine. As such, Qualcomm should have a substantial advantage in such workloads. And I don't disagree: light MT is a crucial aspect to test. In fact, it's why GB changed its approach to testing MT, and their overarching MT benchmark includes, and is even dominated by, such "non-scalable" MT workloads. They still test pure, scalable MT because those workloads can still matter, but the resulting MT "score" is an average of these two kinds of MT workloads, much as it is of "FP" and "Int". As an aside, this is one reason why I am in favor of the idea that, when one is really getting into the weeds comparing multi-test benchmarks, one ignores the top-line figure entirely and just compares the subtests. But that's getting into a different topic. Qualcomm's (and Apple/ARM's) ST perf/watt is pretty unassailable, for now, and that does have knock-on effects for designing SOCs and even for getting the most out of scalable and non-scalable MT as well.

That said, you can probably tell from my previous posts that I think the Qualcomm Elite chip has serious issues. Where Qualcomm erred in its SOC design, from my perspective as a tech enthusiast rather than an expert (and even if I'm right, they may not have had much of a choice, as I'm sure some of these decisions were made from necessity rather than preference), is as follows:

1A. No E-cores for low-power operation and light MT.
1B. Too much prioritization of MT throughput, not allowing a true fanless design even in cut-down models (to be fair, Strix Point seems similar; LNL may have gotten this balance right).
1C. Not even achieving its theoretical max MT throughput: 12 P-cores should really be doing better, and something about the SOC design, maybe heat (again, no E-cores) or a lack of cache/fabric bandwidth, seems to be hampering MT performance, by about 20% by my estimation from another site.
2. An underperforming iGPU and, so far, no dGPU design wins, making it unattractive for gamers even if the driver/software issues weren't there, and gaming occupies an outsized portion of the mindshare in the PC space.
3. LATE: M2-core equivalents on N4 in 2024 simply don't cut it, especially if you're having to fight software compatibility issues that your main rivals aren't.

This isn't to say that the Elite project is a failure; for any V1 chip it would be almost remarkable not to have any of these issues*. But I think most everyone disagreeing with you on the ST/MT stuff agrees that Qualcomm's V2 needs to be a huge step up, and not just in the SOC(s)/cores but in everything around them, if it is to successfully fight for meaningful market share in the (non-Mac) PC space.

*While the M1 had issues and delays itself - especially delays: it took a year to release the M1 Pro/Max and well over that for the M1 Ultra - it is overall remarkable how well the Apple transition went. More than performance, this is where total control of the product stack helps enormously.
 
Last edited:

mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,866
You probably misread my post. I didn't say ST performance doesn't matter; I said hardcore/extreme ST performance doesn't matter in everyday workflows except video gaming. Most single-threaded apps are already efficient and don't require extreme performance; even some single-threaded CAD apps will never max out M2 single-thread performance in actual use.

What single-thread workflow do you have that needs M3 Max single-thread performance over M1 single-thread performance? I would really like to know. If there is such a thing, it must be very niche.
You really need to go read up on Amdahl's Law:


Thanks to this timeless, universal truth of computing, if given a choice between a system with 1 CPU that provides 100 units of throughput and another that has 100 CPUs which provide 1 unit each (thus the same total of 100 units), you should always choose the single-CPU machine. It will be faster in nearly all scenarios, as there are too many programs with significant serial execution bottlenecks.
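
In equation form, a standard statement of Amdahl's Law (my sketch of the textbook formula, with p the parallelizable fraction of the work and N the core count):

```latex
% Amdahl's Law: best-case speedup on N equal cores when a fraction p of
% the work parallelizes and the rest (1 - p) must run serially.
\[
  S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}
\]
% Worked example: on the 100x1 machine with p = 0.95,
%   S(100) = 1 / (0.05 + 0.95/100) \approx 16.8 over a single slow core,
% while the 1x100 machine speeds everything up, serial parts included, by 100x.
```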

Also, your claim that ST apps which require lots of performance cannot, as a general rule, take advantage of more than M1's level of ST performance is just not how things work at all.

ST performance is always important! We have a never-ending appetite for it. The only reason people bother designing many-core CPU designs is that quite a long time ago, ST performance scaling hit a wall. That 1x100 vs 100x1 comparison is unrealistic; in the real world, we literally don't know how to build that 100-unit CPU.

Instead, we get CPU core designs where they explore the limits of diminishing returns. As you scale a core's ST performance up, the costs in power and area are non-linear: the last unit of performance requires far more area and power than the first. Guided by product requirements, designers pick an appropriate efficiency tradeoff, design a core, then fill the die out with multiple copies of it to scale MT performance. (Or sometimes do more complex things with multiple tiers of core, of course.)

But if they could, at no power or area cost, halve the core count, double the ST performance per core, and keep MT performance constant? Absolute no brainer. They'd do that every time. Huge win.
 

leman

macrumors Core
Oct 14, 2008
19,516
19,662
I'm not saying you are wrong here but issue is a lot more complex than just st vs mt. I'm talking about real world scenarios where you could extract such performance and not about benchmark scores. Bear with me here please. We've been in visual & audio production business for over 35 years now and the way hardware moves and evolves is fascinating.

It appears that you are focusing on certain professional workload types, and for those your argument makes sense. Having multiple slower cores works very well if your workload consists of a large number of small semi-independent tasks. Extreme cases of this are GPUs and servers.

However, we are talking about personal computing. The keywords here are throughput and latency. So far, you have been talking about throughput. But for personal computing, latency is extremely important, as it determines how responsive the software will be, especially as the code gets more complex. Latency also becomes increasingly important if your workloads are asymmetrical, as is often the case with personal computing. See also @mr_roboto’s post above. I would always take faster cores over more cores, because it gives me flexibility. In fact, I would take a 20-25% hit in total throughput if I could get 25% faster cores, because most of my work is asymmetrical.
 

falainber

macrumors 68040
Mar 16, 2016
3,539
4,136
Wild West
You really need to go read up on Amdahl's Law:


Thanks to this timeless, universal truth of computing, if given a choice between a system with 1 CPU that provides 100 units of throughput and another that has 100 CPUs which provide 1 unit each (thus the same total of 100 units), you should always choose the single-CPU machine. It will be faster in nearly all scenarios, as there are too many programs with significant serial execution bottlenecks.

Also, your claim that ST apps which require lots of performance cannot, as a general rule, take advantage of more than M1's level of ST performance is just not how things work at all.

ST performance is always important! We have a never-ending appetite for it. The only reason people bother designing many-core CPU designs is that quite a long time ago, ST performance scaling hit a wall. That 1x100 vs 100x1 comparison is unrealistic; in the real world, we literally don't know how to build that 100-unit CPU.

Instead, we get CPU core designs where they explore the limits of diminishing returns. As you scale a core's ST performance up, the costs in power and area are non-linear: the last unit of performance requires far more area and power than the first. Guided by product requirements, designers pick an appropriate efficiency tradeoff, design a core, then fill the die out with multiple copies of it to scale MT performance. (Or sometimes do more complex things with multiple tiers of core, of course.)

But if they could, at no power or area cost, halve the core count, double the ST performance per core, and keep MT performance constant? Absolute no brainer. They'd do that every time. Huge win.
That 1x100 vs 100x1 comparison is not just unrealistic, it's also not nuanced. There are loads where the 100x1 design will win. If you are running 100 processes/threads, the 100x1 design is probably better, because the 1x100 design will have to constantly switch between the processes, causing overhead.
 
  • Sad
Reactions: Chuckeee

Confused-User

macrumors 6502a
Oct 14, 2014
850
984
That 1x100 vs 100x1 comparison is not just unrealistic, it's also not nuanced. There are loads where the 100x1 design will win. If you are running 100 processes/threads, the 100x1 design is probably better, because the 1x100 design will have to constantly switch between the processes, causing overhead.
While your words are literally true, they present a false image. (Good trick!) The numbers (1x100 vs. 100x1) were not meant to be accurate (as @mr_roboto noted). But the *idea* is absolutely correct, for almost every real case.

Yes, there might be a few problems that are at the extreme end of "embarrassingly parallel" where, as you say, the penalty for context switches will make the 100x1 design better, but it would be a challenge to find them (see Amdahl's law, again). And in the real world, most of those sorts of problems run on GPUs anyway, where you may get 5000x1, not 100x1, and there's no real argument about what you want for that.

The takeaway from all this is simple. Reread what roboto wrote. I was going to say it in my own words but really, I can't do it better than he just did, so why bother?
 
  • Like
Reactions: Chuckeee

quarkysg

macrumors 65816
Oct 12, 2019
1,247
841
where, as you say, the penalty for context switches will make the 100x1 design better,
Threads are processed in time slices in modern OSes, so regardless, any executing code will be pre-empted and returned to the scheduler queue for another go at the CPU when its time slice is up. With so many threads/processes running, context switches will invariably happen: at any one time you may have many threads that are pre-empted, freeing up a bunch of CPUs and making the scheduler assign waiting processes in the queue to another CPU, causing cache flushes.
 

falainber

macrumors 68040
Mar 16, 2016
3,539
4,136
Wild West
While your words are literally true, they present a false image. (Good trick!) The numbers (1x100 vs. 100x1) were not meant to be accurate (as @mr_roboto noted). But the *idea* is absolutely correct, for almost every real case.

Yes, there might be a few problems that are at the extreme end of "embarrassingly parallel" where, as you say, the penalty for context switches will make the 100x1 design better, but it would be a challenge to find them (see Amdahl's law, again). And in the real world, most of those sorts of problems run on GPUs anyway, where you may get 5000x1, not 100x1, and there's no real argument about what you want for that.

The takeaway from all this is simple. Reread what roboto wrote. I was going to say it in my own words but really, I can't do it better than he just did, so why bother?
My point was that the categorical statement about one fast core being always better than multiple slower cores was patently incorrect. As is your statement that "most of those sorts of problems run on GPUs". If that were true, we would not need multithreading, would we? There are plenty of apps that use multithreading for things GPUs can't be used for. Don't games routinely use multiple threads? Besides, 1x100 = 100x1 is not the only scenario. What if the faster core is 20% faster, but the combined performance of the faster cores is, say, 40% lower than the combined performance of the slower cores (because the other CPU has so many of them)? If anything, Amdahl's law says that there will be loads for which the second CPU is faster.
 

mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,866
My point was that the categorical statement about one fast core being always better than multiple slower cores was patently incorrect.
No, your point was to be needlessly pedantic. I was trying to explain the basic principles that guide how you should think about this, and therefore I felt it wasn't necessary to discuss a rarely-encountered corner case.

Why is it rare? Simple: most applications users are likely to run on desktop computers don't blindly create hundreds or thousands of worker threads. Instead, they query the OS and fire up as few worker threads as necessary to keep the system's cores utilized. Going far over the hardware thread count costs memory and performance, even on many-core computers, and thus smart programmers avoid going overboard with thread creation.
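
As a rough sketch of that sizing pattern (illustrative Python; `process` and the data are made up, and for CPU-bound Python work a process pool would be the realistic choice):

```python
# Query the OS for the core count, then create only that many workers,
# rather than blindly spawning hundreds of threads.
import os
from concurrent.futures import ThreadPoolExecutor

def process(chunk):
    return sum(chunk)  # stand-in for real per-chunk work

data = [list(range(i, i + 1000)) for i in range(0, 100_000, 1000)]
workers = os.cpu_count() or 4   # fall back if the count is unknown
with ThreadPoolExecutor(max_workers=workers) as pool:
    results = list(pool.map(process, data))
print(f"{len(results)} chunks processed with {workers} workers")
```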
 

leman

macrumors Core
Oct 14, 2008
19,516
19,662
That 1x100 vs 100x1 comparison is not just unrealistic, it's also not nuanced. There are loads where the 100x1 design will win. If you are running 100 processes/threads, the 100x1 design is probably better, because the 1x100 design will have to constantly switch between the processes, causing overhead.

Hardly. A context switch is around 5 microseconds. If you give each of the 100 threads a 1ms time slice, one pass through them spends 100ms running the threads and 0.5ms on context switching (let’s round this up to 1ms because I feel generous). That’s a 1% loss. If you run each thread for 10ms instead, that’s a 0.1% loss. And so on. Not to mention that a super-fast 1x100 CPU is likely much faster at context switching in the first place, because its memory subsystem would be super-fast too. And it’s not like the 100x1 computer is not context-switching.

To make context-switching painful, you’d need to carefully engineer an extreme case of data dependencies between threads, forcing the scheduler to switch threads very frequently. But that would be very slow on both hypothetical implementations, as you’d spend most of the time in the kernel, managing the synchronization.
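
To illustrate, a small sketch of that engineered case (Python; the numbers are machine- and OS-dependent): two threads with a strict data dependency, so every tiny unit of work forces a handoff.

```python
# Two threads ping-pong via Events, forcing a thread switch per iteration.
# The measured cost per handoff (wakeup + switch) lands in the microsecond
# range on typical machines - the overhead being discussed above.
import threading
import time

N = 100_000
ping, pong = threading.Event(), threading.Event()

def worker(my_turn, their_turn):
    for _ in range(N):
        my_turn.wait()    # block until the other thread hands over control
        my_turn.clear()
        their_turn.set()  # immediately hand control back

t1 = threading.Thread(target=worker, args=(ping, pong))
t2 = threading.Thread(target=worker, args=(pong, ping))
start = time.perf_counter()
t1.start(); t2.start()
ping.set()                # kick off the ping-pong
t1.join(); t2.join()
elapsed = time.perf_counter() - start
print(f"{N} handoffs in {elapsed:.3f}s, ~{elapsed / N * 1e6:.1f} us per handoff")
```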
 
  • Like
Reactions: Chuckeee

falainber

macrumors 68040
Mar 16, 2016
3,539
4,136
Wild West
No, your point was to be needlessly pedantic. I was trying to explain the basic principles that guide how you should think about this, and therefore I felt it wasn't necessary to discuss a rarely-encountered corner case.

Why is it rare? Simple: most applications users are likely to run on desktop computers don't blindly create hundreds or thousands of worker threads. Instead, they query the OS and fire up as few worker threads as necessary to keep the system's cores utilized. Going far over the hardware thread count costs memory and performance, even on many-core computers, and thus smart programmers avoid going overboard with thread creation.
Nonsense. When the app queries the hardware, typically it then selects the number of threads based on the number of available cores (there are other scenarios: it may create more threads for I/O-heavy tasks, or fewer threads if the extra parallelism does not benefit it). The point is that a well-designed app can load up all the cores, in which case total (MC) performance is more important than SC performance.

When people claim the superiority of fast-core designs, they typically emphasize that a fast core improves UI responsiveness. While there is some logic behind this claim, the issue is also not that simple.
 

falainber

macrumors 68040
Mar 16, 2016
3,539
4,136
Wild West
Hardly. A context switch is around 5 microseconds. If you give each of the 100 threads a 1ms time slice, one pass through them spends 100ms running the threads and 0.5ms on context switching (let’s round this up to 1ms because I feel generous). That’s a 1% loss. If you run each thread for 10ms instead, that’s a 0.1% loss. And so on. Not to mention that a super-fast 1x100 CPU is likely much faster at context switching in the first place, because its memory subsystem would be super-fast too. And it’s not like the 100x1 computer is not context-switching.

To make context-switching painful, you’d need to carefully engineer an extreme case of data dependencies between threads, forcing the scheduler to switch threads very frequently. But that would be very slow on both hypothetical implementations, as you’d spend most of the time in the kernel, managing the synchronization.
I am not sure I get it. Doesn't a 0.5ms delay per 1ms of processing mean a ~30% performance hit? Also, making a core faster is difficult. Increasing the core count is relatively easy, which makes it the obvious choice for increasing total system performance. Everybody (including Apple) does it.

Now, I am not saying that a faster core does not have its benefits. Developing parallelized apps is hard. I am just cautioning against oversimplification.
 

leman

macrumors Core
Oct 14, 2008
19,516
19,662
I am not sure I get it. Doesn't a 0.5ms delay per 1ms of processing mean a ~30% performance hit?

It is 0.5ms of overhead for every 100ms of processing, i.e. 0.5%. The bottom line is that it all depends on how you slice up the time between the threads: very short time slices will result in higher overhead.

Also, making a core faster is difficult. Increasing the core count is relatively easy, which makes it the obvious choice for increasing total system performance.

Of course. It’s a hypothetical example. If you find such examples distasteful (and it would be understandable if you do), let’s use something more realistic, like 1 fast core vs. 4 cores that are 4x slower.
 

leman

macrumors Core
Oct 14, 2008
19,516
19,662
Nonsense. When the app queries the hardware, typically it then selects the number of threads based on the number of available cores (there are other scenarios: it may create more threads for I/O-heavy tasks, or fewer threads if the extra parallelism does not benefit it). The point is that a well-designed app can load up all the cores, in which case total (MC) performance is more important than SC performance.

Just a quick note: it seems to me that your post simply rephrases what the poster you have quoted has already stated? How is it nonsense if you repeat the same thing? Maybe I’m missing something…
 

mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,866
Nonsense. When the app queries the hardware, typically it then selects the number of threads based on the number of available cores
Says "nonsense", repeats the same thing I said... hmmm.

(there are other scenarios: it may create more threads for I/O-heavy tasks, or fewer threads if the extra parallelism does not benefit it). The point is that a well-designed app can load up all the cores, in which case total (MC) performance is more important than SC performance.
The point is that if you have a choice between two computers that offer equal total performance, but one of them does it with fewer cores that are higher performance per core, and all else is equal, the fewer-core option is, in general, superior. You still haven't actually countered this point.
 
  • Like
Reactions: Chuckeee

Confused-User

macrumors 6502a
Oct 14, 2014
850
984
Threads are processed in time slices in modern OSes, so regardless, any executing code will be pre-empted and returned to the scheduler queue for another go at the CPU when its time slice is up. With so many threads/processes running, context switches will invariably happen: at any one time you may have many threads that are pre-empted, freeing up a bunch of CPUs and making the scheduler assign waiting processes in the queue to another CPU, causing cache flushes.
All true, but in the very rare cases where the 100x1 design has a shot at actually being better, there are tools to mitigate this issue: core affinity and locking cores to user tasks, for example.
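
For instance, a minimal sketch of the core-affinity tool (Python's Linux-only scheduler API; macOS and Windows expose no equivalent in `os`):

```python
# Pin the current process to a fixed set of cores so its threads are not
# migrated around by the scheduler (one mitigation for cache-flush churn).
import os

if hasattr(os, "sched_setaffinity"):   # Linux; absent on macOS/Windows
    os.sched_setaffinity(0, {0, 1})    # pid 0 = this process; restrict to cores 0-1
    print("now pinned to cores:", os.sched_getaffinity(0))
else:
    print("CPU affinity API not available on this platform")
```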

None of this is to suggest that mr_roboto was wrong. These rare corner cases are just that, rare, and not all that interesting except to the people who have to deal with them.

My point was that the categorical statement about one fast core being always better than multiple slower cores was patently incorrect. As is your statement that "most of those sorts of problems run on GPUs". If that were true, we would not need multithreading, would we? There are plenty of apps that use multithreading for things GPUs can't be used for. Don't games routinely use multiple threads? Besides, 1x100 = 100x1 is not the only scenario. What if the faster core is 20% faster, but the combined performance of the faster cores is, say, 40% lower than the combined performance of the slower cores (because the other CPU has so many of them)? If anything, Amdahl's law says that there will be loads for which the second CPU is faster.
You are arguing about corner cases, or about unreasonable resource balances. Obviously, nobody here is suggesting that multicore CPUs (or having a bunch of E-cores) are bad. This was about general concepts and rules of thumb.

If you wanted to have a slightly more interesting discussion about this idea, you could talk about power and efficiency, since performance never exists in a vacuum. In a relatively realistic example, 8x2GHz cores might be better than 4x4GHz cores where heat and power matter. It won't be faster, though, despite the context-switch overhead, even if the task can run eight threads (because of Amdahl's law, and the balance @leman pointed out).
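
A back-of-the-envelope sketch of that balance (illustrative Python, ignoring context-switch and memory effects): under Amdahl's law, the 4x4GHz machine never loses to the 8x2GHz one on runtime; they tie only when the work is perfectly parallel.

```python
# Idealized runtime: the serial fraction runs on one core, the rest is
# split evenly across all cores. Same total GHz on both machines.
def runtime(serial_frac, cores, ghz, work=1.0):
    serial = work * serial_frac / ghz
    parallel = work * (1 - serial_frac) / (ghz * cores)
    return serial + parallel

for p in (0.0, 0.05, 0.2, 0.5):  # serial fraction of the workload
    fast = runtime(p, cores=4, ghz=4.0)
    slow = runtime(p, cores=8, ghz=2.0)
    print(f"serial={p:.0%}: 4x4GHz={fast:.4f}  8x2GHz={slow:.4f}")
```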
 