When you say “warm up”, does it also ramp down slowly, or does it need to start ramping up again after each work package completes? This might be interesting to see graphically, if you have the time. Is it linear? Would we expect the Ultra to take twice as long?
My methodology is probably much more primitive than you might think. I am not aware of any way to query the GPU frequency, and the GPU power consumption counter is neither informative nor high resolution, so I simply looked at the output of the Metal trace profiler. There is an indicator that shows the GPU performance state (low, medium, high). The moment a kernel starts executing, the GPU goes into the "medium" state, and it takes about 10ms to reach the "high" state. I will try to investigate how long it stays in the "high" state after the work is completed; I'm not sure whether the profiler will give me this information.
This is a strange design approach. I wonder if the reason it ramps up slowly is because the packaging approach can’t handle the inrush current required to just step the clocks up. Still, 10ms seems a long time to compensate for that.
I don't think it's strange if you consider that M1 aims to be efficient. It would be wasteful to trigger the high performance mode every time the GPU has to render a button, so their solution of keeping the clocks low as long as there is no urgent work seems like a reasonable strategy. 10ms is brief enough to appear instantaneous to human perception, and if you can do the work within that period of time, great! If not, well, time to gather those performance reserves.
In other words, you probably don't care whether your image filter application takes 10ms or 1ms - both are quicker than the button press animation. But you probably care whether it takes 10 seconds or 40 seconds.
In the end, the only class of applications that "suffers" from this style of power management is benchmarks. And that's why benchmarks should include a warmup phase to make sure that they are measuring the correct thing.
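To illustrate the warmup idea, here is a minimal sketch of a benchmark harness that discards the first few runs so the timed runs happen after the GPU (or any hardware with a ramp-up period) has reached its high performance state. This is a generic pattern, not Apple's or Metal's API; the `kernel` callable is a stand-in for whatever work you'd actually dispatch.

```python
import time

def benchmark(kernel, warmup_iters=5, timed_iters=20):
    """Return the mean per-iteration time of `kernel`, excluding warm-up.

    The warm-up runs exist purely to push the hardware into its high
    performance state before measurement begins, so the reported time
    reflects steady-state throughput rather than the ramp-up period.
    """
    # Untimed warm-up phase: results are discarded.
    for _ in range(warmup_iters):
        kernel()

    # Timed phase: only these iterations count.
    start = time.perf_counter()
    for _ in range(timed_iters):
        kernel()
    elapsed = time.perf_counter() - start
    return elapsed / timed_iters
```

Without the warm-up loop, a short benchmark could spend most of its timed window inside the ~10ms ramp and report a misleadingly low throughput.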