> From my obsessive friends, the difference in WoW at the same settings between a 3090 and an M1 Ultra is 7-9 fps, if that helps.
Your friends already have an M1 Ultra?
> Your friends already have an M1 Ultra?
Yes, he got it on the 18th.
This video confirms that the M1 Ultra doesn't ramp up in Geekbench.
> According to that review, Ultra doesn't ramp up anywhere. Bug? Or some other weirdness? BTW, fully agree with Maxim that it's a shame that the software is not ready.
Could Apple have a problem with the drivers?
> According to that review, Ultra doesn't ramp up anywhere. Bug?
You're talking about the GPU?
Again, this is not reflected in real-world usage and this is not how GPUs are utilized.
> It may not make it inaccurate in the narrow sense of the word, but it does make it uninteresting. GB5 compute is essentially measuring GPU performance on very short workloads. That's not how GPUs are used.
You keep saying that's not how GPUs are used, but that's clearly not true. GPUs are used that way all the time: acceleration of highly parallel computations, even if they're discrete operations rather than massive, continuous computation. We're just learning that the M1 series doesn't necessarily scale as well at that kind of workload.
> Running stuff under Rosetta 2 is fair in my book: a lot of apps are still Intel, and knowing how fast they run on AS is an important metric. Depends on what you are after, though. I agree that running some sort of CPU test using x86 code and then making conclusions about M1 CPU performance is dumb.
Yes, that's what I mean by "native". Running under Rosetta 2 to see how your x86 code will run on the new CPU versus your current Intel CPU is valid in the short term, but comparing an x86 application under Rosetta 2 with an i9 running native and saying the M1 or Arm is somehow limited beyond the need to migrate applications is not valid. Most critical applications have quickly been ported to AS native.
> You're talking about the GPU?
It's possible that 100% is referenced to a Max, not a dual-Max Ultra. The number says 100%, but the bar graph only goes halfway up. I suspect the code is only using 100% of 32 cores.
It does ramp up in 3DMark and in the GFXBench tests where it is much faster than the M1 Max. Some Tomb Raider results also showed good scaling at high resolution (not far from the RTX 3090).
The Blender test is disappointing: wattage was very low although GPU frequency was normal and usage was reported at 100%. I suppose the software is not mature enough and doesn't use GPU resources very well.
> According to that review, Ultra doesn't ramp up anywhere. Bug? Or some other weirdness? BTW, fully agree with Maxim that it's a shame that the software is not ready.
The software is young and we nerds need to calm down. The product is still shipping to customers, and new products get crucial updates for months after release; it is a good thing.
> It's possible that 100% is referenced to a Max, not a dual-Max Ultra. The number says 100%, but the bar graph only goes halfway up. I suspect the code is only using 100% of 32 cores.
The graph also goes halfway up during the 3DMark test, although both GPUs are used (we know it since the M1 Ultra is much faster than the Max).
> The software is young and we nerds need to calm down. The product is still shipping to customers, and new products get crucial updates for months after release; it is a good thing.
If apps need to be updated to support the M1 Ultra, this is not good news. Apple strongly implied that the dual nature of the M1 Ultra was invisible to software.
> If apps need to be updated to support the M1 Ultra, this is not good news. Apple strongly implied that the dual nature of the M1 Ultra was invisible to software.
Still, it must be a bug, since other apps do see the Ultra's extra hardware.
> From what I can see, the Ultra GPU shines at 1440p or higher, but at 1080p it is even worse than the Max.
In a sense, if the M1 Ultra does not use its full potential at low resolution when it gives more than 120 fps anyway, this is totally fine. Reaching >200 fps is purely a benchmarking exercise; it has no practical use on the Mac (no one uses a gaming monitor on a Mac, and I believe that improvements past 120 fps are hardly noticeable).
> How did Apple test GPU drivers if the software and benchmarks don't accurately reflect Ultra performance? How did Apple do performance testing for marketing?
They did it on a beta version, and I quote from Apple itself:
> You keep saying that's not how GPUs are used, but that's clearly not true. GPUs are used that way all the time: acceleration of highly parallel computations, even if they're discrete operations rather than massive, continuous computation. We're just learning that the M1 series doesn't necessarily scale as well at that kind of workload.
You think so? I am not aware of many contexts where you just fire off a single very small kernel and that's it. Image/video editing software is probably the most common user of one-time kernels, but that type of software also continuously uses the graphical capabilities of the GPUs, thus giving the system more information about how the GPU is used and what level of performance is expected.
Of course, it all boils down to how Geekbench uses the GPU. If they do each of the benchmarks in isolation, creating a new pipeline separately for each benchmark run and then tearing it down, then no, that's definitely not how these devices are supposed to be used.

> You think so? I am not aware of many contexts where you just fire off a single very small kernel and that's it. [...]
It's not sending a small task once and only once; it's a sharing of resources between the parallelizable bits on the GPU and the flow control and logic on the CPU.
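To ground the pattern being debated above, here is a hypothetical Swift/Metal sketch contrasting an isolated create-dispatch-tear-down cycle per benchmark with a long-lived queue that keeps feeding the GPU. The tiny kernel, dispatch sizes, and run counts are invented for illustration; none of this is Geekbench's actual code.

```swift
import Metal

// A deliberately small kernel: the point is the dispatch pattern, not the work itself.
let source = """
#include <metal_stdlib>
using namespace metal;
kernel void tiny(device float *out [[buffer(0)]],
                 uint gid [[thread_position_in_grid]]) {
    out[gid] = sqrt((float)gid);
}
"""

let device = MTLCreateSystemDefaultDevice()!

// Pattern A: each "benchmark" builds its own pipeline, runs one short dispatch,
// waits, and tears everything down. The GPU only ever sees isolated bursts of work.
func isolatedRun() {
    let queue = device.makeCommandQueue()!
    let lib = try! device.makeLibrary(source: source, options: nil)
    let pso = try! device.makeComputePipelineState(function: lib.makeFunction(name: "tiny")!)
    let buf = device.makeBuffer(length: 64 * 256 * 4, options: .storageModeShared)!

    let cmd = queue.makeCommandBuffer()!
    let enc = cmd.makeComputeCommandEncoder()!
    enc.setComputePipelineState(pso)
    enc.setBuffer(buf, offset: 0, index: 0)
    enc.dispatchThreadgroups(MTLSize(width: 64, height: 1, depth: 1),
                             threadsPerThreadgroup: MTLSize(width: 256, height: 1, depth: 1))
    enc.endEncoding()
    cmd.commit()
    cmd.waitUntilCompleted()
}   // queue, pipeline, and buffer are all released here

// Pattern B: a long-lived app keeps the queue and pipeline alive and keeps
// submitting work, so the GPU sees a continuous stream rather than one-off bursts.
func sustainedRuns(count: Int) {
    let queue = device.makeCommandQueue()!
    let lib = try! device.makeLibrary(source: source, options: nil)
    let pso = try! device.makeComputePipelineState(function: lib.makeFunction(name: "tiny")!)
    let buf = device.makeBuffer(length: 64 * 256 * 4, options: .storageModeShared)!

    var last: MTLCommandBuffer? = nil
    for _ in 0..<count {
        let cmd = queue.makeCommandBuffer()!
        let enc = cmd.makeComputeCommandEncoder()!
        enc.setComputePipelineState(pso)
        enc.setBuffer(buf, offset: 0, index: 0)
        enc.dispatchThreadgroups(MTLSize(width: 64, height: 1, depth: 1),
                                 threadsPerThreadgroup: MTLSize(width: 256, height: 1, depth: 1))
        enc.endEncoding()
        cmd.commit()
        last = cmd
    }
    last?.waitUntilCompleted()
}

for _ in 0..<20 { isolatedRun() }   // what a sequential benchmark suite might look like
sustainedRuns(count: 5_000)         // what a renderer or video filter pipeline looks like
```

The first pattern gives the GPU only short, isolated bursts separated by idle gaps; the second gives it a sustained stream of work, closer to how the posts above describe image/video apps using the device.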
How do you think Apple is handling the clock, then? GB5 is running a stereo disparity map, for example; how is the GPU supposed to know there's another one coming after the first one? It's starting to look more like an incomplete implementation in the drivers, or something like that, than a question of ramping the clock. But if you're right about a clock ramp (I assumed it ramped up as a long job continued), are you saying it can somehow predict how many small jobs it will see in the future?
And why ramp up, why not step up? This still seems to go against the race-to-sleep approach used elsewhere…
> If apps need to be updated to support the M1 Ultra, this is not good news. Apple strongly implied that the dual nature of the M1 Ultra was invisible to software.
It is not that easy. The M1 runs everything that was running before, but every computer release is followed by firmware/BIOS/driver updates in the first weeks to improve performance and stability, and this computer will not be an exception. I still make daily BIOS and driver updates to Windows clients that are several years old. I would be worried if no updates came with a new machine.
> It seems strange that Apple would release such a flagship device and have not embedded the required hardware to actually use all of the resources it can bring to bear.
It doesn't seem strange to me, since I have used computers since I was a baby and this is how it goes for everyone: check for firmware updates on a newly released Mac or PC, even 10 years ago. Check the Apple Support downloads and you'll see that every big launch was followed by firmware updates. It is complex; it is not an Apple issue or any other company's issue, it is normal.
> I don't know, I'm speculating. My guess is that Apple would use GPU utilization counters as well as additional heuristics to manage the GPU power. That's something one should experiment with.
I'd say if the behavior we're seeing is due to the first paragraph, then the benchmark is accurate. If it's due to the second paragraph, the benchmark is not accurate.
Or maybe Geekbench is simply not launching enough threadgroups to properly utilize the Ultra. Could also be.
> I've been doing some experiments on my M1 Max... so far, it seems to take about 10 ms for the GPU to ramp up to the high-power mode. That is, if the compute shaders are done in less than 10 ms, you won't get peak performance. And 10 ms is a very long time in GPU land. My experiment involved doing long chains of trivial dependent computation (1024 operations per GPU thread) without memory access and launching various numbers of computation groups (threadgroups in Metal lingo) with 1024 threads each. Launching 1024 threadgroups (that's 1 million threads, or 1 billion computations) takes less than a millisecond, so it's not enough to trigger the high-performance mode: I only get around 30% of peak performance (or 1.5 TFLOPS). I need to launch more than 64K threadgroups (that's over 64 billion computations) for the speed to start ramping up, and it takes over 256K threadgroups to get close to the maximal aggregate 5 TFLOPS for the M1 Max.
>
> Bottom line: if you want to benchmark the maximal performance of M1 GPUs, you need to "warm up" the device by doing in excess of 10 ms of work, which is A LOT OF WORK on these devices. Benchmark runs should push the GPU for at least 1 second of sustained work, ideally longer. It is very unlikely that Geekbench uses any workloads that adhere to these criteria, and their style of running benchmarks sequentially basically means that the GPU will never leave the minimal/medium power mode.
>
> P.S. Note that I say 5 TFLOPS is the maximum for the M1 Max where the usual figures state 10 TFLOPS; that's because fused multiply-add (a*b + c) is executed as a single operation. So peak FLOPS using FMA operations is obviously double, if you prefer to count it that way. That's how GPU makers "cheat" with their peak figures (and yes, everyone does it).
When you say "warm up", does it also ramp down slowly, or does it need to start ramping up again after each work package completes? This might be interesting to see graphically, if you have the time. Is it linear? Would we expect the Ultra to take twice as long?
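For reference, here is a rough Swift/Metal reconstruction of the kind of warm-up experiment described in the quoted post. The dependent-FMA kernel, the 1024-thread threadgroups, and the FMA-counted-as-one-operation accounting follow that description, but the code itself (names, constants, dispatch sizes) is an illustrative sketch, not the original.

```swift
import Metal
import Foundation

// Each GPU thread runs a long chain of *dependent* FMAs with no memory traffic in the
// loop; we then vary the number of 1024-thread threadgroups and time each dispatch to
// see when the GPU leaves its low-power state. The recurrence c = c*b + a keeps the
// compiler from folding the loop away.
let source = """
#include <metal_stdlib>
using namespace metal;
kernel void fma_chain(device float *out [[buffer(0)]],
                      uint gid [[thread_position_in_grid]]) {
    float c = (float)gid;
    for (int i = 0; i < 1024; ++i) {
        c = fma(c, 0.999999f, 1.0001f);   // each iteration depends on the previous one
    }
    out[gid % 1024] = c;   // a tiny write so the result is observable
}
"""

let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!
let lib = try! device.makeLibrary(source: source, options: nil)
let pso = try! device.makeComputePipelineState(function: lib.makeFunction(name: "fma_chain")!)
let out = device.makeBuffer(length: 1024 * MemoryLayout<Float>.stride, options: .storageModeShared)!

// 1K, 64K, and 256K threadgroups of 1024 threads: roughly 1, 69, and 275 billion FMAs.
// (Assumes this trivial kernel allows 1024 threads per threadgroup, as it does on Apple GPUs.)
for groups in [1 << 10, 1 << 16, 1 << 18] {
    let cmd = queue.makeCommandBuffer()!
    let enc = cmd.makeComputeCommandEncoder()!
    enc.setComputePipelineState(pso)
    enc.setBuffer(out, offset: 0, index: 0)
    enc.dispatchThreadgroups(MTLSize(width: groups, height: 1, depth: 1),
                             threadsPerThreadgroup: MTLSize(width: 1024, height: 1, depth: 1))
    enc.endEncoding()
    let t0 = CFAbsoluteTimeGetCurrent()
    cmd.commit()
    cmd.waitUntilCompleted()
    let dt = CFAbsoluteTimeGetCurrent() - t0
    // One FMA counted as one operation, matching the 5 TFLOPS figure above
    // (marketing figures count it as two, hence 10 TFLOPS).
    let tflops = Double(groups) * 1024 * 1024 / dt / 1e12
    print("\(groups) threadgroups: \(dt * 1000) ms, \(tflops) TFLOPS (FMA counted as 1 op)")
}
```

At the quoted 5 TFLOPS, the 256K-threadgroup dispatch amounts to roughly 275 billion FMAs, or about 55 ms of GPU work, comfortably past the ~10 ms ramp-up window described above; the 1K-threadgroup dispatch finishes in well under a millisecond and should stay in the low-power regime.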