
theorist9

macrumors 68040
May 28, 2015
3,880
3,060
As things are right now, despite losing by about 10% in ST benchmarks, I feel that M1 has more useful single threaded performance than Alder Lake.

It's 2022, ST isn't ST anymore. We all have tons of background tasks going all the time now.

Intel chips won't run at max turbo frequency unless there is truly just one thread running, and the dropoff in frequency as more cores turn on can be quite severe. People who know anything about benchmarking methodology correctly quit all the software they can before running their benchmark, because you don't want the variance from random competitors for CPU time polluting the results. This helps Alder Lake actually hit its ST peak (max turbo) numbers, but isn't too representative of the real world where most people have at least a few browser tabs open all the time.

The other thing I'd cite is that if you start up a heavy compute job on 4 cores of your M1 Pro, then tab over to another program which needs 1 core, the 1-core program will run no worse than 94% as fast as it would with nothing else running. By contrast, a top of the line Alder Lake desktop chip has a base frequency about 62% of its max turbo ST frequency.

The ability to run any core at near-peak speed regardless of load on other cores is why M1's real-world ST performance is superior. You're never significantly penalized until #active threads >= #cores.
Excellent point. Plus another (which applies to mobile specifically) is that, if you're running Alder Lake on battery, its performance will probably take a big hit vs the figures measured when plugged in.

Though once we get so close in performance that such things matter, we also need to consider the other key determinant of speed: How well-optimized your workflow is for Intel vs AS. For instance, even though an AS-native version of Mathematica is available, it continues to be much better optimized for Intel than AS; so if you use that app heavily, Intel will continue to outperform AS.
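To put a rough number on the quoted claim about per-core independence, here's a minimal timing sketch in Swift (the workload, iteration count, and the choice of four background threads are arbitrary placeholders; it doesn't reproduce anyone's exact figures, just the shape of the experiment):

```swift
import Foundation

// Time the same single-threaded workload alone, then again with four cores kept
// busy in the background, and report the ratio.

func work() -> Double {                       // a few seconds of scalar FP work; tune to taste
    var x = 0.0
    for i in 1...100_000_000 { x += sin(Double(i)) }
    return x
}

func timeIt() -> Double {
    let start = DispatchTime.now()
    _ = work()
    return Double(DispatchTime.now().uptimeNanoseconds - start.uptimeNanoseconds) / 1e9
}

let alone = timeIt()

// Spin up the "heavy compute job on 4 cores" from the example above.
var keepSpinning = true                       // unsynchronized flag; fine for a throwaway sketch
for _ in 0..<4 {
    Thread.detachNewThread {
        var y = 0.0
        while keepSpinning { y += sin(y + 1) }
    }
}
Thread.sleep(forTimeInterval: 0.5)            // let the background load ramp up
let loaded = timeIt()
keepSpinning = false

print(String(format: "alone: %.2fs  loaded: %.2fs  relative speed: %.0f%%",
             alone, loaded, alone / loaded * 100))
```

If the quoted numbers hold, the relative-speed figure should stay in the mid-90s on an M1 Pro/Max and drop quite a bit further on a chip that sheds clock as more cores light up.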
 

theorist9

macrumors 68040
May 28, 2015
3,880
3,060
A stock i5-12600K has a much more reasonable TDP of 125w/150w and also beats the M1 in single-core performance. I looked but couldn't find a precise power draw figure for it running single-core Geekbench 5 but it should be well below these TDP numbers. Of course it'll still be materially higher than the M1's power draw.
Good point, but even comparing Intel's mobile i9 (which should draw less power than that i5 desktop chip), you've got twice the max power draw (considering the system as a whole) compared with a comparable AS chip:
 
Last edited:

Argon_

macrumors 6502
Nov 18, 2020
425
256
A dubious virtue if you look at the multicore performance. For anyone needing a fast CPU the 12900K will be running circles around the mini.

Agreed. My point is about the unusual situation of an Apple computer holding a performance per dollar advantage.
 
  • Haha
Reactions: mi7chy

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
The reason why Apple formally killed the performance-oriented iMac is that they want to use this "bigger Mac Mini" to replace the iMac Pro. Reasons unknown, but my guess is that the sales data shows that most iMac buyers do not purchase the high-end models.

Back in the 2018-2019 era the iMac Pro was roughly the 5th-9th best-selling 27" iMac configuration on B&H Photo's web site. You could select iMac and then sort by popularity. It wasn't a big volume leader but it sold. So did the 27" models overall.

Some will say the iMac Pro sales were mainly driven by the Mac Pro 2013 being so old and dated, but people did buy higher-end all-in-ones. And when Apple dropped the iMac Pro and went to 8-10 core options on the iMac, those also ranked among the top ten configuration sellers.

Pretty decent chance that Apple can't find a 27" panel to "upgrade to" that is also in the price range they want to pay. That is probably a contributing reason more so than lack of projected sales.

Looking at the iMac 24" and Studio, the other issue is that Apple likely painted themselves into a corner by thinning out the 27" chassis. Thinned out to the point that a Max or Ultra would be a problem thermally. [ They could just reuse the current chassis but won't. ] So in order to get away from the iMac "thinness police" it was better to do a "bigger Mini" than fight with folks internally over why the iMac should take principal design cues from the iPad. There were multiple reports of the 27" design effort being shelved to polish off the 24" model. The Studio Display probably also soaked up resources.

If the Studio Display doesn't do well on sales volume then the iMac 27" will probably be back in a year or two. ( Display supply chains stabilize and Apple can squeeze suppliers for lower bill-of-materials panels that also represent an upgrade. )
 

theorist9

macrumors 68040
May 28, 2015
3,880
3,060
Macs now represent a compelling value, purely from a perspective of performance per dollar. Not something we could say in the Intel days.

Remember that a 12900K alone costs 2/3 of an M1 Mini, making the latter's performance that much more impressive.

As a value/performance comparator to the M1 Air, you'd want to look at something like the Dell XPS 13 Plus.

As a comparator for the 12900K, you'd need to use some hybrid of a Max and Ultra: For multithread, with SPEC2017, a DDR5-equipped 12900K equals the Max in floating point op speed (lower bar in each pair, see screenshot from Anandtech.com), and looks like it would approximately equal the Ultra in integer ops (upper bar). Averaging the two, the 12900K offers 20% higher MT CPU performance than the Max (ST CPU performance, not shown here, averages 10% higher).

A Dell XPS Desktop, i9-12900K, RTX3080 (not Ti), 32 GB DDR5, 2 TB, is $2900.
A Max Studio, 32 GB, 2 TB, is $2800.

The XPS offers upgradeable SSD and RAM, higher max RAM (128 GB as compared with 64 GB), and ~10% and 20% higher ST and MT CPU performance, respectively (on average). It may also have better GPU performance.

The Studio is much quieter, more compact, and more efficient.

For generalized performance (not including software-specific hardware acceleration found with the Studio), the XPS is the better value (even more so if you need an 8 TB SSD, since you can pay aftermarket SSD prices instead of Apple's). For usability, it's the Studio.

Of course, there's also not having to work in Windows which, for me, trumps all of this. But that's a personal choice.


[Attached screenshot: Anandtech SPEC2017 chart comparing the 12900K with the M1 Max and M1 Ultra]
 
Last edited:
  • Like
Reactions: BootsWalking

Andropov

macrumors 6502a
May 3, 2012
746
990
Spain
Intel chips won't run at max turbo frequency unless there is truly just one thread running, and the dropoff in frequency as more cores turn on can be quite severe. People who know anything about benchmarking methodology correctly quit all the software they can before running their benchmark, because you don't want the variance from random competitors for CPU time polluting the results. This helps Alder Lake actually hit its ST peak (max turbo) numbers, but isn't too representative of the real world where most people have at least a few browser tabs open all the time.
This is a very good point.
 

fakestrawberryflavor

macrumors 6502
May 24, 2021
423
569
I think it will really depend on what we see from the M2 and M3 over the next few years. Apple has really stretched their chip design capacity by making all these different components for iPhone, Apple Watch, AirTags and so on, and then adding the M1, M1 Pro, M1 Max and M1 Ultra on top of that. It was a major design effort.

You can tell by the smaller jump forward for the A15, which had a modest battery-life gain but only about a 10% speed gain compared to the A14 (Geekbench 5 of about 1700 compared to 1570). The best current Intel CPUs score about 1950 single-core, so about a 10% speed difference.

The thing is, will Apple be able to hit their stride and continue delivering yearly or bi-yearly gains?
Gonna make a prediction that in GB5 single core score the A16 scores about 2000 and the M2 performance core is between 2100-2200. Let’s see how good or bad this is to me 😂
 

BootsWalking

macrumors 68020
Feb 1, 2014
2,274
14,232
Intel chips won't run at max turbo frequency unless there is truly just one thread running, and the dropoff in frequency as more cores turn on can be quite severe. People who know anything about benchmarking methodology correctly quit all the software they can before running their benchmark, because you don't want the variance from random competitors for CPU time polluting the results. This helps Alder Lake actually hit its ST peak (max turbo) numbers, but isn't too representative of the real world where most people have at least a few browser tabs open all the time.
The operation of Intel's Turbo Boost is not as binary as you described, both in how it triggers and the frequencies it runs at.

First, it doesn't require only a single-thread to be running to reach maximum boost frequency on a single core. Multiple threads can be running across multiple cores - the requirement is that only one of those threads can be running with a high load. So low-load threads running on other cores while a single-core benchmark is running won't prevent the benchmark core from running at max single-core frequency.

Second, multiple cores can be boosted together on a sliding scale based on TDP and thermals, and the drop-off is not severe at all. On most Alder Lake chips two cores can run at just below the max turbo frequency of a single core (around 100MHz less vs single core). On the top-of-the-line i9-12900KS, all 8 P-cores can run at just 300MHz below the max single-core boost (base frequency 3.4GHz, single P-core boost of 5.5GHz, all-P-core boost of 5.2GHz). On the i9-12900K the difference is just 200MHz (5.2GHz -> 5.0GHz).
 

leman

macrumors Core
Oct 14, 2008
19,521
19,677
First, it doesn't require only a single-thread to be running to reach maximum boost frequency on a single core. Multiple threads can be running across multiple cores - the requirement is that only one of those threads can be running with a high load. So low-load threads running on other cores while a single-core benchmark is running won't prevent the benchmark core from running at max single-core frequency.

“Running on low load” = not running most of the time. You are arguing semantics. “CPU load” is hardly a hardware concept. It’s just the proportion of time that a core is kept running rather than halted.
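One way to see that definition in the raw: sample the per-core tick counters twice and compute the busy fraction. A minimal sketch, assuming a Linux-style /proc/stat (macOS exposes the same counters through host_processor_info / Activity Monitor):

```swift
import Foundation

// "Load" here is exactly the fraction of the sampling interval a core spent
// running rather than idle/halted.

func sampleTicks() -> [String: (busy: Double, total: Double)] {
    guard let stat = try? String(contentsOfFile: "/proc/stat", encoding: .utf8) else { return [:] }
    var result: [String: (busy: Double, total: Double)] = [:]
    for line in stat.split(separator: "\n") where line.hasPrefix("cpu") {
        let fields = line.split(separator: " ")
        let ticks = fields.dropFirst().compactMap { Double(String($0)) }
        guard ticks.count >= 5 else { continue }
        let idle = ticks[3] + ticks[4]                      // idle + iowait columns
        let total = ticks.reduce(0, +)
        result[String(fields[0])] = (busy: total - idle, total: total)
    }
    return result
}

let before = sampleTicks()
Thread.sleep(forTimeInterval: 1.0)
let after = sampleTicks()

for (core, b) in before.sorted(by: { $0.key < $1.key }) {
    guard let a = after[core] else { continue }
    let load = (a.busy - b.busy) / max(a.total - b.total, 1)
    print("\(core): running \(Int(load * 100))% of the interval")
}
```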


Second, multiple cores can be boosted together on a sliding scale based on TDP and thermals, and the drop-off is not severe at all. On most Alder Lake chips two cores can run at just below the max turbo frequency of a single core (around 100MHz less vs single core). On the top-of-the-line i9-12900KS, all 8 P-cores can run at just 300MHz below the max single-core boost (base frequency 3.4GHz, single P-core boost of 5.5GHz, all-P-core boost of 5.2GHz). On the i9-12900K the difference is just 200MHz (5.2GHz -> 5.0GHz).

And that's why these max clocks are a marketing scam. This “velocity boost” you are referring to only activates under such absurd conditions that it practically never does.
 

theluggage

macrumors G3
Jul 29, 2011
8,013
8,446
Pretty decent chance that Apple can't find a 27" panel to "upgrade to" that is also in the price range they want to pay. That is probably a contributing reason more so than lack of projected sales.
The cost price of electronics is very sensitive to economies of scale, and Apple are effectively the only customer for 5k, 27" panels, so "projected sales" and "cost of panel" are two sides of the same coin.

The potential sales of a 5k iMac would have been eroded in several ways:

First, I'm sure there's a general ongoing trend away from desktops towards laptops, across the whole market... especially in the case of Apple Silicon where the M1 Max MacBook Pro delivers pretty much the same performance as the M1 Max Studio or any hypothetical M1 Max iMac.

Secondly, the 24" M1 iMac now has a bigger and better screen than the old 21.5" and is probably at least comparable in power to the old entry-level i5 5k. That's going to satisfy a lot of potential customers for the entry-level 5k.

Finally, a significant proportion of customers for the high end i9 iMac and iMac Pro will be delighted to have the option of the "headless" Mac Studio which they can team up with their display of choice (...and possibly only bought an iMac or iMac Pro in the past because there was no viable headless Mac at that level). Even if you go for a Mac Studio + Studio Display you're still paying the same ballpark price as an i9 iMac with 32GB RAM expansion. So the Mac Studio will have decimated the high end of the iMac market.

OTOH, making the separate Studio Display (possible flaws of said product QV ad nauseam elsewhere) means that Apple can potentially sell it not just to Mac Studio buyers, but to existing Mac Pro owners who want something less expensive than an XDR and to MacBook/MacBook Pro users who want the ultimate MacBook docking station (only a proportion of MacBook owners, to be sure, but there are a lot more MacBook customers than desktop customers).

So in order to get away from the iMac "thinness police" it was better to do a "bigger Mini" than fight with folks internally over why the iMac should take principal design cues from the iPad.
Form-over-function thinness does seem to be an issue at Apple and has certainly influenced the design (and some of the constraints) of both the 24" iMac and the Studio Display. However, a lot of customers welcome the idea of a "Mac Mini Pro/Mac Pro Mini" (i.e. the Mac Studio) and there's a lot to be said for being able to choose (and update) the computer and display separately.

Apple don't seem to like competing with themselves for the same market - so I suspect that the Mac Studio + Studio Display will be their "powerful desktop" offering for a while (with any new Mac Pro being in a whole different price bracket).

There's a "timing" thing, too - the 5k panel is still somewhat ahead of the game today, but it's fairly old tech that has only seen marginal improvements since 2014, and it seems likely that we'll see miniLED or even microLED/improved OLED appearing within the useful lifetime of a 2022 Mac Studio. I don't think now would be the right time to buy a new iMac with an old-school 5k panel. Although I personally don't think we'll see a new large-screen iMac/iMac Pro in the near future, that could change if Apple could offer a substantially upgraded display.

...and then there's the "M1 Pro" thing, too. If the (regular) M2 (as rumored) has 20-50% higher single-core performance and an extra GPU core - and maybe even support for more RAM - it's likely to thrash the cheaper "binned" M1 Pro on some measures and make the full-fat M1 Pro look a bit mediocre on anything that doesn't really exploit all the cores. (The M1 Max has a bit more headroom in terms of GPU and I/O to keep it safe.) What's currently missing from the range is an M1 Pro desktop - whether that's a 24" iMac, 27" iMac or a Mac Mini + Studio Display combo - but now would be a bad time to launch/buy a brand new M1 Pro machine if the M2 is coming real soon now to replace the 2020 models.

I'm assuming that the M2 Pro/M2 Max - if the M2 keeps that naming convention - won't show up until the 2021 MBPs get upgraded, and then those will have a few months of exclusivity before the chips show up in desktops. Until then, the rumoured M2 Mac Mini could represent the best bang-per-buck on the desktop.
 
  • Like
Reactions: AlphaCentauri

joema2

macrumors 68000
Sep 3, 2013
1,646
866
...M1 Max is so brutal, what could Apple do better than it?

Relative to their hardware potential, M1 Max and M1 Ultra have significant performance issues on some video encode/decode workloads. This is apparently because the multiple accelerator units are not being used. The M1 Max has two video encode engines, two ProRes encode/decode engines, and the Ultra has double that. IOW 75% of the Ultra's video hardware is just sitting there doing nothing. The reason it seems fairly fast is the accelerators that actually work are pretty quick, but in theory the M1 Max could be about 2x faster and the Ultra 4x faster on some encode/decode workloads.

The underlying issue may be that the VideoToolBox framework is not thread safe, hence cannot dispatch multiple units of concurrent work to the accelerators. There would also have to be a programmer-facing API to expose this functionality.

This is ironic since at Apple's "Peek Performance" event in March, hardware VP Johny Srouji said of the M1 Ultra: "...thanks to the magic of the UltraFusion architecture, it behaves like a single chip to software...it has twice the capabilities of the amazing media engine in M1 Max for unprecedented....video encode and decode throughput.."

John Ternus, Sr. VP of Apple Hardware Engineering: "One of the things that makes Apple Silicon so unique is how tightly it integrates with the operating system. This integration enables MacOS to scale with M1 Ultra, allowing it to automatically benefit from M1 Ultra's immense capabilities, delivering another big step forward in performance...And because M1 Ultra looks like a single piece of silicon to software, apps will benefit from its extraordinary capabilities without any additional work."

Unfortunately it appears that apps including Apple's own Final Cut Pro and the associated software frameworks need a lot of additional work to fully use the M1 Max and M1 Ultra hardware. The statement about not requiring "any additional work" seems incorrect.

It's interesting that all those statements were made by hardware VPs. There were no statements from the software VP (Craig Federighi) or from any Sr. engineer on the "Core Darwin" team. Maybe John Ternus needed to talk more with Craig Federighi.

So to answer your question about what could Apple do better: properly coordinate hardware and software development so their products fully leverage the capabilities of each -- upon the initial release.
 
  • Like
Reactions: JMacHack

JMacHack

Suspended
Mar 16, 2017
1,965
2,424
So to answer your question about what could Apple do better: properly coordinate hardware and software development so their products fully leverage the capabilities of each -- upon the initial release.
I hate to give credence to the “Apple’s software is on a downslide” guys, but they’ve certainly dropped the ball many times lately. (Though I have to say my experience with Monterey has been fantastic, and Big Sur has massively improved over its initial release.)

Also I’d like to pre-empt any Snow Leopard simps by saying I’ve used it and it’s still crap compared to later macOS releases, fight me.
 

BootsWalking

macrumors 68020
Feb 1, 2014
2,274
14,232
“Running on low load” = not running most of the time. You are arguing semantics. “CPU load” is hardly a hardware concept. It’s just the proportion of time that a core is kept running rather than halted.
The poster stated "Intel chips won't run at max turbo frequency unless there is truly just one thread running". That is demonstrably false. Pointing that out isn't semantics.
And that's why these max clocks are a marketing scam. This “velocity boost” you are referring to only activates under such absurd conditions that it practically never does.
Huh? The core clock rates start rising immediately upon any non-trivial load, which is easily observable with any number of third-party apps that let you see the instantaneous clock rate for each core. Even Intel has an app that lets you see this. The clock rates scale back down when the load no longer needs it, for power efficiency. Every modern CPU does this, including the M1. Even DDR does this. If you think that's a scam then how do you propose it should work instead?
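For anyone who wants to watch this directly rather than take a utility's word for it, here's a minimal polling sketch (it assumes a Linux box exposing cpufreq through sysfs; on macOS the closest equivalent is `sudo powermetrics`):

```swift
import Foundation

// Print each core's current clock once per second. Start a single-threaded
// benchmark, then pile on more load and watch the busy cores' clocks move.
// Ctrl-C to stop.

let cpuRoot = "/sys/devices/system/cpu"
let cores = ((try? FileManager.default.contentsOfDirectory(atPath: cpuRoot)) ?? [])
    .filter { $0.range(of: #"^cpu\d+$"#, options: .regularExpression) != nil }
    .sorted()

while true {
    var report: [String] = []
    for core in cores {
        let path = "\(cpuRoot)/\(core)/cpufreq/scaling_cur_freq"        // value is in kHz
        if let text = try? String(contentsOfFile: path, encoding: .utf8),
           let khz = Double(text.trimmingCharacters(in: .whitespacesAndNewlines)) {
            report.append("\(core) \(String(format: "%.2f", khz / 1_000_000)) GHz")
        }
    }
    print(report.joined(separator: "  "))
    Thread.sleep(forTimeInterval: 1.0)
}
```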
 

leman

macrumors Core
Oct 14, 2008
19,521
19,677
The poster stated "Intel chips won't run at max turbo frequency unless there is truly just one thread running". That is demonstrably false. Pointing that out isn't semantics.

Disclaimer: I don’t know the exact conditions for Turbo Boost. It has been a while since I’ve read Intel’s documentation and I don’t remember the details. What I am trying to say is that “one thread running” and “other threads running on low load” end up being the same thing. It’s one core doing work with the other cores being parked a significant amount of time.

Huh? The core clock rates start rising immediately upon any non-trivial load, which is easily observable with any number of third-party apps that let you see the instantaneous clock rate for each core. Even Intel has an app that lets you see this. The clock rates scale back down when the load no longer needs it, for power efficiency. Every modern CPU does this, including the M1. Even DDR does this. If you think that's a scam then how do you propose it should work instead?

I am talking specifically about Intel's Velocity Boost technology, the last few hundred MHz. For Coffee Lake, it triggered under the condition that Tcase is under 50°C, which basically never happens. From what I understand, this technology is more adaptive in Alder Lake but the principle remains the same. It allows Intel to market their CPUs with certain max clocks while virtually guaranteeing that you will never see these clocks in practice. On desktop, sure, you can get some fancy water cooler capable of dissipating 300W. On mobile, fat chance. All those 5GHz CPUs are in reality 4.8GHz. That’s what I mean by “scam”.
 

BootsWalking

macrumors 68020
Feb 1, 2014
2,274
14,232
Disclaimer: I don’t know the exact conditions for Turbo Boost. It has been a while since I’ve read Intel’s documentation and I don’t remember the details. What I am trying to say is that “one thread running” and “other threads running on low load” end up being the same thing. It’s one core doing work with the other cores being parked a significant amount of time.
Right, but the other cores are spinning in HLT loops most of the time because they're able to complete the thread workloads assigned to them without having to scale up. This is how you want the CPU to run so I'm still not seeing the distinction you're trying to make. It's a multi-threaded workload on the system with only one thread with heavy load. It is exactly how a real-world multitasking system would run when one application is CPU-bound on a single-threaded operation, which is typical for most applications.
I am talking specifically about Intel's Velocity Boost technology, the last few hundred MHz. For Coffee Lake, it triggered under the condition that Tcase is under 50°C, which basically never happens. From what I understand, this technology is more adaptive in Alder Lake but the principle remains the same. It allows Intel to market their CPUs with certain max clocks while virtually guaranteeing that you will never see these clocks in practice. On desktop, sure, you can get some fancy water cooler capable of dissipating 300W. On mobile, fat chance. All those 5GHz CPUs are in reality 4.8GHz. That’s what I mean by “scam”.
Ok, I see what you meant now. Intel advertises "up to" for their top-line GHz ratings but they do fully document the max GHz for each step of the boosting technology.
 

Andropov

macrumors 6502a
May 3, 2012
746
990
Spain
The underlying issue may be that the VideoToolBox framework is not thread safe, hence cannot dispatch multiple units of concurrent work to the accelerators. There would also have to be a programmer-facing API to expose this functionality.

This is ironic since at Apple's "Peek Performance" event in March, hardware VP Johny Srouji said of the M1 Ultra: "...thanks to the magic of the UltraFusion architecture, it behaves like a single chip to software...it has twice the capabilities of the amazing media engine in M1 Max for unprecedented....video encode and decode throughput.."

John Ternus, Sr. VP of Apple Hardware Engineering: "One of the things that makes Apple Silicon so unique is how tightly it integrates with the operating system. This integration enables MacOS to scale with M1 Ultra, allowing it to automatically benefit from M1 Ultra's immense capabilities, delivering another big step forward in performance...And because M1 Ultra looks like a single piece of silicon to software, apps will benefit from its extraordinary capabilities without any additional work."

Unfortunately it appears that apps including Apple's own Final Cut Pro and the associated software frameworks need a lot of additional work to fully use the M1 Max and M1 Ultra hardware. The statement about not requiring "any additional work" seems incorrect.
The M1 Ultra does behave like a single chip. There's nothing incorrect about that statement. I don't know what the problem is with the encode/decode engines, but I think John Ternus was talking about how other multiprocessor systems require extra work due to latency/bandwidth between chips to coordinate caches and stuff.

The poster stated "Intel chips won't run at max turbo frequency unless there is truly just one thread running". That is demonstrably false. Pointing that out isn't semantics.
I doubt he meant *literally* one thread running, but rather a single thread running a significant workload. Hence the later mention of how most programs should be closed before running benchmarks (no significant background processes, but definitely not just a single thread running on the OS).
 
Last edited:

joema2

macrumors 68000
Sep 3, 2013
1,646
866
The M1 Ultra does behave like a single chip. There's nothing incorrect about that statement. I don't know what the problem is with the encode/decode engines, but I think John Ternus was talking about how other multiprocessor systems require extra work due to latency/bandwidth between chips to coordinate caches and stuff.
Hardware VP Johny Srouji said of the M1 Ultra: "...thanks to the magic of the UltraFusion architecture, it behaves like a single chip to software...it has twice the capabilities of the amazing media engine in M1 Max for unprecedented....video encode and decode throughput.."

Despite UltraFusion, the marquee feature of multiple video engines is simply not working in the current version, even though Apple expended a significant % of their transistor budget on this. With multiple CPU cores, a tapering off of scalability is expected due to Amdahl's Law. By contrast the M1 Max/Ultra multiple accelerators don't show diminished scalability -- rather only single units are working. There is zero scalability.

In a recent interview discussing the M1 Ultra, Apple VP Tom Boger described the UltraFusion architecture as "incredibly important" for video workloads.

Supposedly a key advantage of Apple developing both hardware, system software and application software is coordination and integration.

Apple's head of Pro Apps marketing Xander Soren described this integration as "...many years of work of not only developing the Pro Apps, but tuning them really in a very specific way to our hardware."

Whatever "tuning" took place in Apple's Pro Apps or system software, it is currently leaving unused much of the video encode/decode hardware on the M1 Max and Ultra. This implies lack of coordination between Apple's hardware and software teams during the development process.
 

Andropov

macrumors 6502a
May 3, 2012
746
990
Spain
Hardware VP Johny Srouji said of the M1 Ultra: "...thanks to the magic of the UltraFusion architecture, it behaves like a single chip to software...it has twice the capabilities of the amazing media engine in M1 Max for unprecedented....video encode and decode throughput.."

Despite UltraFusion, the marquee feature of multiple video engines is simply not working in the current version, even though Apple expended a significant % of their transistor budget on this. With multiple CPU cores, a tapering off of scalability is expected due to Amdahl's Law. By contrast the M1 Max/Ultra multiple accelerators don't show diminished scalability -- rather only single units are working. There is zero scalability.

In a recent interview discussing the M1 Ultra, Apple VP Tom Boger described the UltraFusion architecture as "incredibly important" for video workloads.

Supposedly a key advantage of Apple developing both hardware, system software and application software is coordination and integration.

Apple's head of Pro Apps marketing Xander Soren described this integration as "...many years of work of not only developing the Pro Apps, but tuning them really in a very specific way to our hardware."

Whatever "tuning" took place in Apple's Pro Apps or system software, it is currently leaving unused much of the video encode/decode hardware on the M1 Max and Ultra. This implies lack of coordination between Apple's hardware and software teams during the development process.
You are piecing together sentences as if he was talking about the same thing. He wasn't. What he said is:

"The result is an SoC with blazing performance due to low latency, massive bandwidth, and incredible power efficiency. And thanks to the magic of UltraFusion architecture if behaves like a single chip to software. And preserves the benefits of the Unified Memory Architecture"

Then he goes on talking about the M1 Ultra architecture, transistor count and layout for several minutes, and only then says:

"It [the M1 Ultra] has twice the capabilities of the amazing media engine in M1 Max for unprecedented video encode and decode throughput."

Then he spends several more minutes talking about the M1 Ultra CPU and GPU power efficiencies before finishing his part of the presentation, and he doesn't ever mention video decoding throughput again.

He wasn't implying, at any point, that extra accelerators would be used without developer effort, just because the package behaves as if it were a single chip.

Additionally, a quick google search seems to indicate that Final Cut DOES use all the media engines in the M1 Ultra whenever possible. If the encoding/decoding of a single file/video doesn't go any faster I'd bet it's because you can't split the encoding of a file between different Media Engines trivially (just like you can't split a single threaded task into multiple cores trivially). But you should be able to encode/decode four streams of video in the same amount of time it takes for the M1 to encode a single stream. Which is, admittedly, not very useful if you're exporting a Final Cut timeline that only shows a single source clip onscreen at a time. Composite videos with multiple angles all shown at once (like the one they demoed on the keynote, IIRC) may have reduced export times though.
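That "several independent streams" case is easy to sketch with stock AVFoundation; nothing here targets a specific engine, and whether macOS actually fans the exports out across separate media engines is up to the OS (the file paths and preset are placeholders):

```swift
import Foundation
import AVFoundation

// Kick off one export per clip and let them all run concurrently.
let clips = [URL(fileURLWithPath: "/tmp/clipA.mov"),     // placeholder inputs
             URL(fileURLWithPath: "/tmp/clipB.mov")]
let group = DispatchGroup()

for (i, url) in clips.enumerated() {
    let asset = AVURLAsset(url: url)
    guard let export = AVAssetExportSession(asset: asset,
                                            presetName: AVAssetExportPresetHEVCHighestQuality) else { continue }
    export.outputURL = URL(fileURLWithPath: "/tmp/out\(i).mov")
    export.outputFileType = .mov
    group.enter()
    export.exportAsynchronously {                        // completion fires on a background queue
        print("clip \(i) finished with status \(export.status.rawValue)")
        group.leave()
    }
}
group.wait()    // total wall time ≈ the slowest single export if the streams really run in parallel
```

If the extra engines are being used, N such exports should finish in roughly the time of one; if everything funnels through a single engine, the wall time should scale with N.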
 

Krevnik

macrumors 601
Sep 8, 2003
4,101
1,312
Additionally, a quick google search seems to indicate that Final Cut DOES use all the media engines in the M1 Ultra whenever possible. If the encoding/decoding of a single file/video doesn't go any faster I'd bet it's because you can't split the encoding of a file between different Media Engines trivially (just like you can't split a single threaded task into multiple cores trivially). But you should be able to encode/decode four streams of video in the same amount of time it takes for the M1 to encode a single stream. Which is, admittedly, not very useful if you're exporting a Final Cut timeline that only shows a single source clip onscreen at a time. Composite videos with multiple angles all shown at once (like the one they demoed on the keynote, IIRC) may have reduced export times though.

Reading through the thread I was wondering the same thing. Video encoding in particular is an odd duck.

You can parallelize the work within a single frame easily enough by splitting up the frame into regions for some work. But are the accelerators able to work on a region, or are they frame-by-frame? If the latter, then it’s a more serial process because of the inherent dependencies between frames. And it would mean the scenario you described is completely accurate: you can encode/decode multiple streams simultaneously, but you can’t split the work of one stream across multiple units.

And I suspect this is the case with these hardware units, but I don’t have a ton of experience with these sort of accelerators to know for sure. But generally you are feeding a hardware encoder/decoder a stream of video data and it spits out the transformed data the other end, which means splitting up by region isn’t that straight-forward.
 
  • Like
Reactions: leman

joema2

macrumors 68000
Sep 3, 2013
1,646
866
...are the accelerators able to work on a region, or are they frame-by-frame? If the latter, then it’s a more serial process because of the inherent dependencies between frames. And it would mean the scenario you described is completely accurate: you can encode/decode multiple streams simultaneously, but you can’t split the work of one stream across multiple units.

And I suspect this is the case with these hardware units, but I don’t have a ton of experience with these sort of accelerators to know for sure. But generally you are feeding a hardware encoder/decoder a stream of video data and it spits out the transformed data the other end, which means splitting up by region isn’t that straight-forward.
The regions are called GOPs (Groups of Pictures) and for GOP compression algorithms like H264, HEVC, VP9, etc, each GOP is usually independent of others. This means each one can be processed in parallel, given the available hardware. Within each GOP the processing must be serialized, IOW one frame must be processed before another, but a typical GOP size might be 16 to 30 frames, so there are many frames in a file available for concurrent processing.

However the All-Intra codecs such as ProRes, XAVC-I, etc are composed of independent frames. Given multiple accelerators they could each be fed a batch of frames.

In fact the MacOS VideoToolBox API VTCompressionSession is designed to work on batches of frames. It automatically applies any available hardware acceleration. Unfortunately I don't think VideoToolBox is thread safe so I don't think multiple threads within a process can call it simultaneously. That might be the reason that the four encoders on M1 Ultra don't seem to benefit single-stream tasks.

At least that is my observation from testing my M1 Max MacBook Pro 16 and M1 Ultra Mac Studio.
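For reference, here's a stripped-down sketch of that VTCompressionSession flow (4K HEVC is just an example; frame delivery, error handling and writing the output are omitted):

```swift
import CoreMedia
import CoreVideo
import VideoToolbox

// One session, frames submitted in order, compressed output handed back per frame.
// VideoToolbox picks a hardware encoder automatically when one is available.

var sessionOut: VTCompressionSession?
let creationStatus = VTCompressionSessionCreate(allocator: kCFAllocatorDefault,
                                                width: 3840, height: 2160,
                                                codecType: kCMVideoCodecType_HEVC,
                                                encoderSpecification: nil,
                                                imageBufferAttributes: nil,
                                                compressedDataAllocator: nil,
                                                outputCallback: nil, refcon: nil,   // using the per-frame handler below
                                                compressionSessionOut: &sessionOut)
precondition(creationStatus == 0 && sessionOut != nil, "VTCompressionSessionCreate failed: \(creationStatus)")
let session = sessionOut!

_ = VTSessionSetProperty(session, key: kVTCompressionPropertyKey_RealTime, value: kCFBooleanFalse)
_ = VTCompressionSessionPrepareToEncodeFrames(session)

// Caller supplies CVPixelBuffers from wherever (decoder, camera, ProRes reader, ...).
func encode(_ pixelBuffer: CVPixelBuffer, frameIndex: Int64) {
    _ = VTCompressionSessionEncodeFrame(session,
                                        imageBuffer: pixelBuffer,
                                        presentationTimeStamp: CMTime(value: frameIndex, timescale: 30),
                                        duration: .invalid,
                                        frameProperties: nil,
                                        infoFlagsOut: nil) { status, _, sampleBuffer in
        // `sampleBuffer` is the compressed output; hand it to an AVAssetWriter or similar.
        if status != 0 { print("frame \(frameIndex) failed: \(status)") }
        _ = sampleBuffer
    }
}
```

A single session like this is serial at the submission end, which lines up with one engine per stream; the open question above is whether several sessions on different threads would actually fan out across the Ultra's extra engines.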

But given the lead time in silicon design and fabrication, Apple has known about the M1 Ultra for years. It would seem that's enough lead time for a single company to coordinate hardware and software they totally control.
 

Andropov

macrumors 6502a
May 3, 2012
746
990
Spain
The regions are called GOPs (Groups of Pictures) and for GOP compression algorithms like H264, HEVC, VP9, etc, each GOP is usually independent of others. This means each one can be processed in parallel, given the available hardware. Within each GOP the processing must be serialized, IOW one frame must be processed before another, but a typical GOP size might be 16 to 30 frames, so there are many frames in a file available for concurrent processing.
It's possible (likely?) that the partition into those groups of pictures is also done in the accelerated hardware. I see for example that H.265 uses something called Coding Tree Units (some sort of k-tree). If the traversal through the tree is done in fixed function hardware of the Media Engine, it may very well be that each Media Engine can only partition and schedule work for itself, even if the tiles generated could theoretically be processed in any SIMD capable hardware.
 

theorist9

macrumors 68040
May 28, 2015
3,880
3,060
The cost price of electronics is very sensitive to economies of scale, and Apple are effectively the only customer for 5k, 27" panels, so "projected sales" and "cost of panel" are two sides of the same coin.
I've been wondering how economies of scale work here in terms of panel production. Panels are probably produced in much larger sizes and then chopped up. The 24" iMac, 27" Studio, and 32" XDR are all 218 ppi, so it's possible Apple is using the same LCD panels (different backlighting, etc., of course) for all three. If so, the 24" iMac should help with economies of scale.

OTOH, the current and former Retina laptop displays are all over the map in their pixel densities (221, 226, 227, and, most recently, 254 ppi), and thus would be using different panels.
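As a quick back-of-the-envelope check on that shared density, using the published resolutions and nominal (marketing) diagonals:

```swift
import Foundation

// All three are marketed at 218 ppi. The 24" iMac's viewable diagonal is spec'd at 23.5".
let panels: [(name: String, w: Double, h: Double, inches: Double)] = [
    ("24\" iMac",       4480, 2520, 23.5),
    ("Studio Display",  5120, 2880, 27.0),
    ("Pro Display XDR", 6016, 3384, 32.0),
]

for p in panels {
    let diagonalPixels = (p.w * p.w + p.h * p.h).squareRoot()
    print("\(p.name): \(Int((diagonalPixels / p.inches).rounded())) ppi")
}
// Prints roughly 219, 218 and 216 — close enough that a common ~218 ppi density
// across all three is plausible, rounding of the nominal diagonals aside.
```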
 

Unregistered 4U

macrumors G4
Jul 22, 2002
10,610
8,628
I've been wondering how economies of scale work here in terms of panel production. Panels are probably produced in much larger sizes and then chopped up. The 24" iMac, 27" Studio, and 32" XDR are all 218 ppi, so it's possible Apple is using the same LCD panels (different backlighting, etc., of course) for all three. If so, the 24" iMac should help with economies of scale.

OTOH, the current and former Retina laptop displays are all over the map in their pixel densities (221, 226, 227, and, most recently, 254 ppi), and thus would be using different panels.
I don’t think that’s how it works. Panels are made as complete entities and, if there are some dead pixels found in testing, they can’t shave off the bad ones on the edges and repurpose them. The panel is rejected. In order for a specific panel of a specific size and resolution to be created, resources must be allocated for that exact panel. And companies allocate those resources based on how many they can sell (or how many they’ve been contracted to produce).

OH, and the larger the panels and the higher the resolution, the greater the likelihood that a panel will fail which would drive the price of the panels that pass tests higher.
 

theorist9

macrumors 68040
May 28, 2015
3,880
3,060
I don’t think that’s how it works. Panels are made as complete entities and, if there are some dead pixels found in testing, they can’t shave off the bad ones on the edges and repurpose them. The panel is rejected. In order for a specific panel of a specific size and resolution to be created, resources must be allocated for that exact panel. And companies allocate those resources based on how many they can sell (or how many they’ve been contracted to produce).

OH, and the larger the panels and the higher the resolution, the greater the likelihood that a panel will fail which would drive the price of the panels that pass tests higher.
Actually, it is, at least for LCD TVs, and I would think the same principles apply to panels for LCD computer displays. The "mother glass" produced by a modern (Gen 10.5) LCD plant is quite large. It is then cut up to produce various sizes of display panels:

From:
https://www.forbes.com/sites/willys...d-they-make-my-big-screen-tv/?sh=2ec17ce41003
"The Hefei Gen 10.5 fab is designed to produce the panels for either eight 65 inch or six 75 inch TVs on a single mother glass. If you wanted to make 110 inch TVs, you could make two of them at a time."

And more details on panel cutting from:
"From the cutting method, one 10.5 generation line panel display can effectively cut 18 43 inches, 8 65 inches, 6 75 inches panel display, and can be more efficient in hybrid mode cutting, with half of the panel display 65 inches, the other half of the panel display 75 inches, the yield is also guaranteed."
 

theluggage

macrumors G3
Jul 29, 2011
8,013
8,446
OTOH, the current and former Retina laptop displays are all over the map in their pixel densities (221, 226, 227, and, most recently, 254 ppi), and thus would be using different panels.
I suspect that it’s more down to sheer numbers of sales than the technicalities of production - the MacBooks self-evidently sell in sufficient quantities to justify having unique displays. Desktops are generally less popular these days (esp. since Apple doesn’t do bog-standard corporate Mini-towers).

It’s also a no-brainer that - for a given PPI/technology - the failure rate is going to go up with the number of pixels, which will go up with the square of screen size…
 
  • Like
Reactions: JMacHack