
leman

macrumors Core
Oct 14, 2008
19,522
19,679
So the performance cores can only do N things at once, but the efficiency cores can do N+X things at once because they are able to multithread? My Mac currently has 513 threads running. So you're saying that hypothetically 500 of them are low-priority and 13 are high priority (or 450/63, whatever). So the efficiency cores might be able to handle 100 low priority tasks each, while the performance cores can only handle 10? But they can run those 10 at 10x the speed that the efficiency cores can run them? And I'm talking about the M1 and a hypothetical M1x.

P-cores and E-cores are designed with different targets in mind: P-cores to run a unit of work as fast as possible, and E-cores to run a unit of work with the lowest possible energy expenditure. As a result, the cores themselves are very different. E-cores have fewer execution units, less cache, etc.; everything is geared to use less power.

As @cmaier explains, E-cores cannot run more threads at once or anything like that; the basic execution model is the same for P-cores and E-cores. The main difference is that while P-cores can run, say, 100 threads of work in X time using Y power, the E-cores can run that same workload in X*4 time but at Y/10 power (numbers only illustrative). If the work is not high priority and you lose nothing by taking longer to run it, you can save quite a lot of energy this way.
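
To put those illustrative numbers into energy terms: energy is power multiplied by time, so a core that takes 4x as long at 1/10 the power ends up using 0.4x the energy. A minimal Python sketch of that arithmetic follows; the `energy` helper and the unit values for X and Y are placeholders of mine, not measurements of any real chip.

```python
def energy(power, time):
    """Energy consumed is average power multiplied by elapsed time."""
    return power * time


# Illustrative units only (assumption): the P-core takes X time at Y power.
X, Y = 1.0, 1.0

p_core_energy = energy(power=Y, time=X)
# Same workload on an E-core: 4x the time, 1/10 the power (illustrative).
e_core_energy = energy(power=Y / 10, time=X * 4)

ratio = e_core_energy / p_core_energy
print(f"E-core energy relative to P-core: {ratio:.2f}")
```

With these made-up ratios the E-core finishes later but spends well under half the energy, which is the whole point of sending low-priority work there.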
 

altaic

Suspended
Jan 26, 2004
712
484
So, are threads just context switches? Surely it’s a grey zone, Shirley?
 

leman

macrumors Core
Oct 14, 2008
19,522
19,679
Yep, more or less: a core moves between threads via context switches. Each Apple CPU core runs only one instruction stream at a time. There is no SMT functionality.

P.S. So yeah, my "100 threads of work" is clearly BS from a technical standpoint; I was just using it to illustrate a point. Substitute it with "100 tasks" or "100 work packages".
 

altaic

Suspended
Jan 26, 2004
712
484
It seems to me that the burden of context switching can be mitigated by other proactive measures, potentially leading to a much larger gain than would be expected. To name one low-hanging fruit: instruction reordering space.

I’m curious: are you or @cmaier familiar with the belt architecture? There are few architectures that deviate from von Neumann.
 

quarkysg

macrumors 65816
Oct 12, 2019
1,247
841
It seems to me that the burden of context switching can be mitigated by other proactive measures, potentially leading to a much larger gain than would be expected.
Context switches at the OS level are a lot more expensive than at the processor level (for multi-threaded CPUs). But any context switch adds latency to the threads involved, so it should be avoided where possible. I would think this area has been studied in depth, and we are probably not going to get any more breakthroughs here.
 

altaic

Suspended
Jan 26, 2004
712
484
I’m constantly shocked by things that have not been studied. The American dream is still alive.
 

leman

macrumors Core
Oct 14, 2008
19,522
19,679
It seems to me that the burden of context switching can be mitigated by other proactive measures, potentially leading to a much larger gain than would be expected. To name one low-hanging fruit: instruction reordering space.

You are absolutely right, but unfortunately, a stream of instructions (thread) is still a core architectural concept in AArch64. So you can't just eliminate threads at the hardware level. Maybe future architectures will offer alternative approaches to computation.

In the meantime, Apple is working on reducing the need for context switching in software. Their new concurrency framework allows continuations to share the same thread or move between threads, which massively improves latency and performance for small async code blocks.
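
As a rough analogy only (this is Python's `asyncio`, not Apple's framework, and the names `small_task` and `seen_threads` are made up for the illustration), the sketch below runs 100 small async tasks and records which OS thread each one resumed on. Every `await` is a suspension point where the event loop resumes another coroutine in user space, so no kernel-level thread switch is needed between them. Unlike the framework described above, this event loop keeps everything on one thread and never moves a continuation to another one.

```python
import asyncio
import threading


async def small_task(i, seen_threads):
    # Suspension point: the event loop may resume a different coroutine here,
    # on the same OS thread, without a kernel context switch.
    await asyncio.sleep(0)
    seen_threads.add(threading.get_ident())
    return i * i


async def main():
    seen_threads = set()
    # Launch 100 small async tasks and wait for all of them.
    results = await asyncio.gather(
        *(small_task(i, seen_threads) for i in range(100))
    )
    return results, seen_threads


results, seen_threads = asyncio.run(main())
print(len(results), "tasks ran on", len(seen_threads), "OS thread(s)")
```

All 100 tasks complete on a single OS thread, which is the cheap-switching idea in miniature.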

I’m curious: are you or @cmaier familiar with the belt architecture? There are few architectures that deviate from von Neumann.

Yep, I've been following them for a while. It's really cool stuff, although it's not clear whether it has practical benefit. As I mentioned in the RISC-V threads, as far as register-based architectures go, AArch64 is pretty close to optimal in my opinion (and x86 is utterly ugly). Any objectively "better" ISA will probably have to drop the registers.
 

altaic

Suspended
Jan 26, 2004
712
484
Concurred on all points. Good times ahead; gotta love innovation.
 