Apple M1 (2020)
APL1102 (Tonga, T8103)
TSMC N5 5nm
16 billion transistors

CPU
Apple designed ARMv8.4-A
8-core big.LITTLE 18% faster
Core designs from A14 Bionic

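8-core: 4 high-performance Firestorm + 4 energy-efficient Icestorm
Speed: 3.2 GHz (high-performance) / 2.064 GHz (energy-efficient)
L1 cache: 192 KB instructions / 128 KB data (high-performance); 128 KB instructions / 64 KB data (energy-efficient)
L2 cache: 12 MB (high-performance) / 4 MB (energy-efficient)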

GPU
Apple G13G Apple 10 family integrated graphics 8 cores
128 Execution Units
1024 Arithmetic Logic Units
24,576 threads
1.30 GHz
2.6 TFLOPs (FP32)
16 Execution Units per core
8 Arithmetic Logic Units per Execution Unit

Memory
Unified 128-bit LPDDR4X SDRAM at 66.67 GB/s 2,133 MHz
Up to 16 GB

Other
Neural Engine 16-core 11 trillion operations per second 4th gen (H13_Styx_J5x)
PCIe 4.0 storage controller up to 2 TB SSD
2x USB-C 4 controller with Thunderbolt 3/USB 4 support (40 Gb/s)
2x USB 3.0 Type-A (5 Gb/s)
Secure Enclave and Image Signal processor
Video codec encoding support for HEVC and H.264
Video codec decoding support for HEVC and H.264
HDMI 2.0
One 6K display at 60Hz or one 4K display at 60Hz HDMI
Wi-Fi 6 up to 1.2 Gbit/s 2x2
Bluetooth 5.0
 
Apple M2 (2022)
APL1109 (Staten, T8112)
TSMC N5P second-generation 5nm
20 billion transistors
CPU
Apple designed ARMv8.5-A
8-core big.LITTLE
Core designs from A15 Bionic

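8-core: 4 high-performance Avalanche + 4 energy-efficient Blizzard
Speed: 3.49 GHz (high-performance) / 2.424 GHz (energy-efficient)
L1 cache: 192 KB instructions / 128 KB data (high-performance); 128 KB instructions / 64 KB data (energy-efficient)
L2 cache: 16 MB (high-performance) / 4 MB (energy-efficient)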

GPU
Apple G14G Apple 10 family integrated graphics 10 cores
160 Execution Units
1280 Arithmetic Logic Units
30,720 threads
1.40 GHz
3.55 TFLOPs (FP32)
16 Execution Units per core
8 Arithmetic Logic Units per Execution Unit

Memory
Unified 128-bit 6,400 MT/s LPDDR5 SDRAM at 100 GB/s 3,200 MHz
Up to 24 GB

Other
Neural Engine 16-core 15.8 trillion operations per second 5th gen (H14_Bia_J6x (j4xx))
PCIe 4.0 storage controller up to 2 TB SSD
2x USB-C 4 controller with Thunderbolt 3/USB 4 support (40 Gb/s)
2x USB 3.0 Type-A (5 Gb/s)
Secure Enclave and Image Signal processor
Video codec encoding support for 8K HEVC and 8K H.264
Video codec decoding support for 8K ProRes, 8K HEVC, 8K H.264, VP9 and JPEG
HDMI 2.0
One 6K display at 60Hz or one 4K display at 60Hz HDMI
Wi-Fi 6E up to 2.4 Gbit/s 2x2
Bluetooth 5.3
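
The GPU totals in the M1 and M2 listings follow from the per-core figures they quote. A minimal sketch of that arithmetic, assuming the 16 x 8 hierarchy exactly as listed (the helper name `gpu_totals` is only for illustration):

```python
# Rebuild the GPU totals from the per-core figures quoted in the listings.
# Assumption: 16 execution units per core and 8 ALUs per execution unit,
# exactly as the listings state; gpu_totals is an illustrative helper name.

def gpu_totals(cores, eus_per_core=16, alus_per_eu=8):
    """Return total execution units and ALUs for a GPU with `cores` cores."""
    execution_units = cores * eus_per_core
    return {
        "execution_units": execution_units,
        "alus": execution_units * alus_per_eu,
    }

m1 = gpu_totals(8)    # 8-core GPU (M1 listing)
m2 = gpu_totals(10)   # 10-core GPU (M2 listing)

# Threads per core implied by the listed thread totals.
m1_threads_per_core = 24_576 // 8
m2_threads_per_core = 30_720 // 10

print(m1, m1_threads_per_core)
print(m2, m2_threads_per_core)
```

Both listings imply the same number of threads per core, so the larger M2 totals come purely from the two extra GPU cores.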
 
Apple M3 (2023)
APL1201 (Ibiza, T8122)
TSMC N3B 3nm
25 billion transistors

CPU
Apple designed ARMv8.6-A
8-core big.LITTLE
Core designs from A16 Bionic

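8-core: 4 high-performance Everest (V2) + 4 energy-efficient Sawtooth (V2)
Speed: 4.05 GHz (high-performance) / 2.75 GHz (energy-efficient)
L1 cache: 192 KB instructions / 128 KB data (high-performance); 128 KB instructions / 64 KB data (energy-efficient)
L2 cache: 16 MB (high-performance) / 4 MB (energy-efficient)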

GPU
Apple G16G Apple 10 family integrated graphics 10 cores
160 Execution Units
1280 Arithmetic Logic Units
30,720 threads
1.40 GHz
3.55 TFLOPs (FP32)
16 Execution Units per core
8 Arithmetic Logic Units per Execution Unit
Dynamic Caching, Mesh Shading and Ray Tracing

Memory
Unified 128-bit 6,400 MT/s LPDDR5 SDRAM at 102.4 GB/s 3,200 MHz
Up to 24 GB

Other
Neural Engine 16-core 18 trillion operations per second 6th gen (H15_Themis_J51y)
PCIe 4.0 storage controller up to 2 TB SSD
2x USB-C 4 controller with Thunderbolt 3/USB 4 support (40 Gb/s)
2x USB 3.0 Type-A (5 Gb/s)
Secure Enclave and Image Signal processor
Video codec encoding support for 8K HEVC and 8K H.264
Video codec decoding support for 8K ProRes, 8K HEVC, 8K H.264, VP9 and JPEG
HDMI 2.0
One 6K display at 60Hz or one 4K display at 60Hz HDMI
Wi-Fi 6E up to 2.4 Gbit/s 2x2
Bluetooth 5.3
 
Apple M4 (2024)
APL1206 (Donan, T8132)
TSMC N3E 3nm
28 billion transistors

CPU
Apple designed ARMv9.2-A
10-core big.LITTLE
Core designs from A18/A18 Pro

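10-core: 4 high-performance Everest (V3) + 6 energy-efficient Sawtooth (V3)
Speed: 4.41 GHz (high-performance) / 2.89 GHz (energy-efficient)
L1 cache: 192 KB instructions / 128 KB data (high-performance); 128 KB instructions / 64 KB data (energy-efficient)
L2 cache: 16 MB (high-performance) / 4 MB (energy-efficient)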

GPU
Apple G16G Apple 10 family integrated graphics 10 cores
160 Execution Units
1280 Arithmetic Logic Units
30,720 threads
1.58 GHz
4.26 TFLOPs (FP32)
16 Execution Units per core
8 Arithmetic Logic Units per Execution Unit
Dynamic Caching, Mesh Shading and Second-generation ray tracing engine

Memory
Unified 128-bit 7,500 MT/s LPDDR5X SDRAM at 120 GB/s 3,750 MHz
Up to 32 GB

Other
Neural Engine 16-core 38 trillion operations per second 7th gen (H16x_Leto_J7x)
PCIe 4.0 storage controller up to 2 TB SSD
3x USB-C 4 controller with Thunderbolt 4/USB 4 support (40 Gb/s)
2x USB 3.0 Type-C (10 Gb/s)
Secure Enclave and Image Signal processor
Video codec encoding support for 8K HEVC and 8K H.264
Video codec decoding support for 8K ProRes, 8K HEVC, 8K H.264, VP9, AV1 and JPEG
HDMI 2.1
One 8K display at 60Hz or one 4K display at 240Hz Thunderbolt or HDMI
Wi-Fi 6E up to 2.4 Gbit/s 2x2
Bluetooth 5.3
 
Apple M5 (2025)
APL???? (Hidra, T8142)
TSMC N3P 3nm
28 billion transistors

CPU
Apple designed ARMv9.2-A
10-core big.LITTLE
Core designs from A19/A19 Pro

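10-core: 4 high-performance Everest (V4) + 6 energy-efficient Sawtooth (V4)
Speed: 4.46 GHz (high-performance) / 3.04 GHz (energy-efficient)
L1 cache: 192 KB instructions / 128 KB data (high-performance); 128 KB instructions / 64 KB data (energy-efficient)
L2 cache: 22 MB (high-performance) / 6 MB (energy-efficient)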

GPU
Apple G17G Apple 10 family integrated graphics 10 cores
160 Execution Units
1280 Arithmetic Logic Units
30,720 threads
1.62 GHz
5.13 TFLOPs (FP32)
16 Execution Units per core
8 Arithmetic Logic Units per Execution Unit
Enhanced shader cores, Rearchitected second-generation dynamic caching, Mesh Shading, Third-generation ray-tracing engine and Neural Accelerators

Memory
Unified 128-bit 9,600 MT/s LPDDR5X SDRAM at 153 GB/s 4,800 MHz
Up to 32 GB

Other
Neural Engine 16-core 38 trillion operations per second 9th gen (H17_Theia_J73y)
PCIe 5.0 storage controller up to 4 TB SSD
3x USB-C 4 controller with Thunderbolt 4/USB 4 support (40 Gb/s)
2x USB 3.0 Type-C (10 Gb/s)
Secure Enclave and Image Signal processor
Video codec encoding support for 8K HEVC and 8K H.264
Video codec decoding support for 8K ProRes, 8K HEVC, 8K H.264, VP9, AV1 and JPEG
HDMI 2.1
One 8K display at 60Hz or one 4K display at 240 Hz Thunderbolt or HDMI
Wi-Fi 6E up to 2.4 Gbit/s 2x2
Bluetooth 5.3
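
The memory bandwidth figures in these listings can be cross-checked from the bus width and transfer rate they quote. A rough sketch, using only numbers copied from the listings (the helper name `peak_bandwidth_gb_s` is an assumption for illustration):

```python
# Peak memory bandwidth from bus width and transfer rate.
# Figures are copied from the listings; M1 is skipped because its listing
# gives a clock (2,133 MHz) rather than a transfer rate in MT/s.

def peak_bandwidth_gb_s(bus_width_bits, transfer_rate_mt_s):
    """Bytes per transfer times transfers per second, in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mt_s / 1000

# chip -> (transfer rate in MT/s, bandwidth in GB/s as listed)
listed = {
    "M2": (6400, 100.0),
    "M3": (6400, 102.4),
    "M4": (7500, 120.0),
    "M5": (9600, 153.0),
}

computed = {chip: peak_bandwidth_gb_s(128, rate) for chip, (rate, _) in listed.items()}

for chip, (rate, listed_gb_s) in listed.items():
    print(f"{chip}: computed {computed[chip]:.1f} GB/s, listed {listed_gb_s} GB/s")
```

With these inputs the M3 and M4 figures match, while the M2's 100 GB/s and the M5's 153 GB/s look like rounded versions of 102.4 and 153.6.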
 
Apple M4 (2024)
APL1206 (Donan, T8132)
TSMC N3E 3nm
28 billion transistors

CPU
Apple designed ARMv9.2-A
10-core big.LITTLE
Core designs from A18/A18 Pro

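10-core: 4 high-performance Everest (V3) + 6 energy-efficient Sawtooth (V3)
Speed: 4.41 GHz (high-performance) / 2.89 GHz (energy-efficient)
L1 cache: 192 KB instructions / 128 KB data (high-performance); 128 KB instructions / 64 KB data (energy-efficient)
L2 cache: 16 MB (high-performance) / 4 MB (energy-efficient)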

GPU
Apple G16G Apple 10 family integrated graphics 10 cores
160 Execution Units
1280 Arithmetic Logic Units
30,720 threads
1.58 GHz
4.32 TFLOPs (FP32)
16 Execution Units per core
8 Arithmetic Logic Units per Execution Unit
Dynamic Caching, Mesh Shading and Ray Tracing

Memory
Unified 128-bit 7,500 MT/s LPDDR5X SDRAM at 120 GB/s 3,750 MHz
Up to 32 GB

Other
Neural Engine 16-core 38 trillion operations per second 7th gen (H16x_Leto_J7x)
PCIe 4.0 storage controller up to 2 TB SSD
3x USB-C 4 controller with Thunderbolt 4/USB 4 support (40 Gb/s)
2x USB 3.0 Type-C (10 Gb/s)
Secure Enclave and Image Signal processor
Video codec encoding support for 8K HEVC and 8K H.264
Video codec decoding support for 8K ProRes, 8K HEVC, 8K H.264, VP9, AV1 and JPEG
HDMI 2.1
One 8K display at 60Hz or one 4K display at 240Hz Thunderbolt or HDMI
Wi-Fi 6E up to 2.4 Gbit/s 2x2
Bluetooth 5.3

I don't think the TFLOPS calculation is right:

1280*1.58*2/1000 = 4.0448 TFLOPS

And the M5 is even further off. The only thing that changed is a minor clock speed bump, yet there is a nearly 20% increase in TFLOPS? Why?

Also, I'm not 100% sure what the execution unit is here. I was under the impression Apple had 4 x execution units of 32 FP32 units each, not 2 x 64. Unless there is another control structure below what they are calling the execution unit? @leman?
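
To put numbers on the mismatch raised here, the sketch below compares the M4 to M5 change in GPU clock with the change in listed TFLOPS. It uses the 4.32 TFLOPs figure from the quoted M4 listing and the 5.13 TFLOPs from the M5 listing; `percent_change` is just an illustrative helper:

```python
# Compare the generation-over-generation change in GPU clock with the
# change in listed FP32 TFLOPS (values taken from the M4 and M5 listings).

def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100

clock_change = percent_change(1.58, 1.62)    # listed GPU clocks, GHz
tflops_change = percent_change(4.32, 5.13)   # listed FP32 TFLOPS

print(f"GPU clock: +{clock_change:.2f}%")
print(f"Listed TFLOPS: +{tflops_change:.2f}%")
```

If the ALU count is unchanged at 1280, peak TFLOPS should scale with the clock alone, so the two percentages ought to be roughly equal.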
 
I'll check back on those numbers later and will update the post if necessary, thanks for sharing your observation. :)
 
My quick check is that the M1 and M2 TFLOPS are correct; it's the M3 number that starts to go wrong, and the error then snowballs into the M4 and M5.
 
OK, but is that the peak FLOPS value or a sustainable one (taking bus widths, stack length, and other memory considerations into account)?
 
Whenever someone reports TFLOPS it should be peak, the theoretical throughput of the number of FP32 units x clock speed x 2, unless otherwise stated. The issue here is the numbers being reported are actually even above the peak theoretical value, which is not possible. So even from a perspective of peak vs sustained/practical, it doesn't really matter, the reported TFLOPS have issues starting with the M3.
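
The peak formula described here can be sketched as follows. The unit counts, clocks and TFLOPS values are the ones that currently appear in the listings earlier in the thread, and the names `peak_fp32_tflops` and `listings` are assumptions made for this example:

```python
# Peak FP32 throughput = FP32 units x clock x 2, then flag any listed
# value that is higher than the theoretical peak.

def peak_fp32_tflops(fp32_units, clock_ghz):
    # The factor of 2 counts a fused multiply-add as two floating-point operations.
    return fp32_units * clock_ghz * 2 / 1000

# chip -> (FP32 units, GPU clock in GHz, listed TFLOPS), copied from the listings
listings = {
    "M1": (1024, 1.30, 2.6),
    "M2": (1280, 1.40, 3.55),
    "M3": (1280, 1.40, 3.55),
    "M4": (1280, 1.58, 4.26),
    "M5": (1280, 1.62, 5.13),
}

over_peak = [
    chip
    for chip, (units, clock, listed) in listings.items()
    if listed > peak_fp32_tflops(units, clock)
]

for chip, (units, clock, listed) in listings.items():
    print(f"{chip}: peak {peak_fp32_tflops(units, clock):.4f}, listed {listed}")
print("Listed above theoretical peak:", over_peak)
```

A listed value slightly below the computed peak can be explained by rounding or a slightly different clock; a listed value above it cannot come from the stated unit count and clock.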
 
I've edited the TFLOPS for M2, M3 and M4 according to these sources, M5 already matched the source:


M5: https://www.cpu-monkey.com/en/igpu-apple_m5_10_core
 
The M1 chip doesn’t have ProRes hardware acceleration. Only the subsequent chips have that.
The M1 is entirely CPU/GPU…
Wikipedia lists the base M1 as having ProRes decode but not encode. And what does "The M1 is entirely CPU/GPU" mean?
 
@Basic75 The main indicator of CPU activity is the high level (orange) of processing shown in Activity Monitor while editing ProRes media in Final Cut on an M1 mini with 16GB RAM.

There doesn't appear to be any hardware acceleration going on, in contrast to re-encoding H.264 using QuickTime Player, which is obviously accelerated.

So it’s an empirical observation rather than any knowledge of the M1 SoC design.
 
Jeez, I was thinking just two minutes ago how much better my Intel MacBooks are with SSD space than these feeble M1s, or maybe the M1 has just been sabotaged recently so that I blindly purchase a new laptop every 5 years.

Personally, I loathe/am tired of the M series since it eats and gathers fake memory.
I will deal with the occasional fan and some heat the Intels offer just to have 35GB free rather than 14GB.

okay....we can resume back to the tech stats of this thread!
 
Ah unfortunately cpu-monkey is a little unreliable. This is not the first time I've found them making mistakes. The M2 and M3 TFLOPS are right, but the M4 and M5 are wrong. I can't confirm their clock speed for the M5. But even if it is right, the TFLOPS they calculated from it is still wrong.

M4: 1280*1.58*2/1000 = 4.0448 TFLOPS
M5: 1280*1.9*2/1000 = 4.864 TFLOPS (again, that's if they have the new GPU clock speed right)

Another problem is that cpu-monkey lists double throughput of FP16 relative to FP32 for all of these chips, but Apple didn't gain the ability to do that until the M5. And finally, pretty sure the number of execution units is 4 per core, so it should be 40 units not 160 units*. Apple has a SIMD width of 32 for a total of 40*32 = 1,280 FP32 units (which cpu-monkey does get right, though we don't know yet about the M5 structure and how Apple doubled FP16 throughput, but CPU-monkey doesn't list that anyway). Hopefully I haven't led you astray, but if so @name99 or @leman can correct me.

*EDIT: I was so tired I managed to confuse myself. It should be right now. I think where cpu-monkey got confused, and momentarily confused me, is that Apple used to allow 16*32 = 512 threads per core (I believe Apple has since bumped that up to 32*32 = 1024 threads per core), but that's not the same as the execution units/core counts.
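
The two ways of slicing the GPU discussed in this post can be checked side by side. This is only a sketch of the arithmetic: the 4 x 32 split is the one described here, the 16 x 8 split is the one in the spec listings, and the function and parameter names are made up for the example:

```python
# Two decompositions of the same 10-core GPU into FP32 units, plus the
# per-core thread limits mentioned in the post.

def fp32_units(cores, units_per_core, lanes_per_unit):
    """Total FP32 lanes for a GPU split into units of a given width."""
    return cores * units_per_core * lanes_per_unit

post_view = fp32_units(10, 4, 32)      # 40 execution units, SIMD width 32
listing_view = fp32_units(10, 16, 8)   # 160 execution units, 8 ALUs each

def threads_per_core(groups_per_core, simd_width=32):
    """Resident threads per core for a given number of 32-wide groups."""
    return groups_per_core * simd_width

older_limit = threads_per_core(16)   # 16*32, as stated in the post
newer_limit = threads_per_core(32)   # 32*32, as stated in the post

print(post_view, listing_view)
print(older_limit, newer_limit)
```

Both splits give the same FP32 total, so the disagreement is about how the units are grouped, not about the figure that feeds the TFLOPS calculation.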

 
@Basic75
No Mac before the introduction of Apple Silicon had a ProRes decoder/encoder except the Afterburner card in the 2019 Mac Pro.
The T2 chip was restricted to H.264/265, to add that acceleration to the iMac Pro as the Xeon CPU didn't have Intel's QuickSync.

My impression was that Apple built the T2 functionality into the first generation of Apple Silicon, so adding ProRes hardware decode seems an oddity, since they had hitherto optimised macOS to handle ProRes using the CPU/GPU.

During the year that it took to develop the M1 Pro/Max SoCs, they presumably ported the Afterburner FPGA hardware design into Apple Silicon's Media Engine, and introduced ProRes acceleration.

So it seems very unlikely that there is any acceleration of ProRes decode in the M1, beyond the H.26* acceleration imported from the T2 chip.
 
I THINK (patents and some benchmarks I've seen both suggest this) that M5 FP16 throughput is 3x FP32 -- but ONLY for matrix multiply.
 