I'm sorry, but people who thought that integrated graphics could beat a top-of-the-line graphics card are very naive
Let us finally put this "integrated graphics must be bad" thing to rest. A GPU is a GPU; it all depends on how big and powerful you make it. The M1 Max has 4096 GPU "shader cores" running at approx. 1.2 GHz, so that's the baseline GPU throughput you get. The closest equivalents in this department are the Nvidia RTX 3060 or the Radeon RX 6800. The M1 Max is the first "big" GPU Apple has made, and it makes perfect sense given the products it is intended for (laptops with a maximum power consumption of 80-90 watts). This is not the last "large" GPU Apple will build, and certainly not the largest. The only things preventing them from putting 10K+ shader cores in an integrated solution are a) power consumption and b) die size. But there will be other products for which Apple will undoubtedly use bigger GPUs.
So no, the M1 Max is not slower than a desktop RTX 3080 because it's "integrated". It is slower because it has literally half the number of shader cores and runs at a lower frequency. Different products for different use cases.
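To put rough numbers on that, here's a back-of-the-envelope sketch. The core counts and clocks are the publicly quoted figures; whether an FMA counts as one or two FLOPs is just a bookkeeping convention, which is why you see both ~5 and ~10 TFLOPS quoted for the M1 Max:

```swift
// Back-of-the-envelope FP32 throughput: ALUs * clock * FLOPs per ALU per cycle.
func peakTFLOPS(alus: Double, clockGHz: Double, flopsPerCycle: Double) -> Double {
    return alus * clockGHz * flopsPerCycle / 1000.0   // GFLOPS -> TFLOPS
}

// M1 Max: 4096 ALUs at ~1.2 GHz
print(peakTFLOPS(alus: 4096, clockGHz: 1.2, flopsPerCycle: 1))   // ~4.9  (FMA counted once)
print(peakTFLOPS(alus: 4096, clockGHz: 1.2, flopsPerCycle: 2))   // ~9.8  (FMA counted twice)

// RTX 3080: 8704 CUDA cores at ~1.71 GHz boost clock
print(peakTFLOPS(alus: 8704, clockGHz: 1.71, flopsPerCycle: 1))  // ~14.9 (the "14 TFLOPS" mentioned below)
```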
The main issue with the M1 Pro/Max is that, for whatever reason, its performance currently depends a lot on the task. For some workloads the M1 Pro/Max runs as well as top Nvidia GPUs, but for others only as well as low-tier Nvidia GPUs.
I don't think it's such a big mystery if you look at the specs. First, the obvious ones: ALU throughput and bandwidth. For example, in a straightforward throughput workload you can't really expect the M1 Max with its 5 TFLOPS to outperform a 3080 with its 14 TFLOPS (I am not counting FMA as two FLOPs here). But there are also less obvious factors, if we look at the unique strengths and weaknesses of these GPUs (a quick way to check some of these at runtime is sketched after the list):
- Apple G13 has access to much more cache
- Apple G13 has access to much more RAM and can share data with the CPU without incurring latency penalties
- Apple G13 has TBDR (tile-based deferred rendering), which allows it to use shading and memory resources more efficiently when rasterizing
- Nvidia/AMD have hardware RT
- Nvidia/AMD generally have more memory bandwidth
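If you want to sanity-check some of these points on your own machine, Metal exposes most of them directly. A minimal sketch; the properties are real Metal API, but the interpretation in the comments is mine:

```swift
import Metal

if let device = MTLCreateSystemDefaultDevice() {
    print("GPU:", device.name)
    // Shared CPU/GPU memory pool (true on Apple silicon, false on discrete GPUs)
    print("Unified memory:", device.hasUnifiedMemory)
    // Metal ray tracing API support (not necessarily dedicated RT hardware)
    print("Ray tracing API:", device.supportsRaytracing)
    // How much memory the GPU can reasonably use
    let gib = Double(device.recommendedMaxWorkingSetSize) / 1_073_741_824
    print("Recommended max working set:", gib, "GiB")
    // Apple-family GPUs (the G13 in M1 Pro/Max is Apple7) are TBDR designs
    print("Apple7 family (TBDR):", device.supportsFamily(.apple7))
}
```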
When you consider these, things should become clearer. For example, the M1 is able to punch well above its weight in rasterization tasks (e.g. gaming) because TBDR allows it to use its resources more intelligently, so it can perform close to a GPU that nominally has higher throughput. Similarly, the M1 will excel in tasks where you have to synchronize data between GPU memory and system memory a lot, or where you need a lot of GPU memory to begin with. Content creation is a prime example, where data is streamed in and out of GPU RAM constantly. No matter how fast your Nvidia GPU is, all those data syncs introduce latency and overhead that reduce effective performance; no such problem with the M1, as it simply doesn't care. And finally, the M1 might do very well on complex workloads with a lot of data dependencies, where it can play out its cache advantage, but that's not really something GPUs are used for, so this one stays purely academic.
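To make the "no data syncs" point concrete, here's a minimal sketch of how unified memory looks from the Metal side, assuming a simple float buffer; on a discrete GPU the same data would first have to be copied into VRAM:

```swift
import Metal

// CPU and GPU share the same physical RAM, so a .storageModeShared buffer can be
// filled by the CPU and read by the GPU with no upload/download step in between.
guard let device = MTLCreateSystemDefaultDevice(),
      let buffer = device.makeBuffer(length: 1024 * MemoryLayout<Float>.stride,
                                     options: .storageModeShared) else {
    fatalError("No Metal device available")
}

// Write directly into the buffer from the CPU; on a discrete GPU this data would
// still have to be blitted into VRAM before the GPU could use it at full speed.
let values = buffer.contents().bindMemory(to: Float.self, capacity: 1024)
for i in 0..<1024 {
    values[i] = Float(i)
}
// From here the buffer can be bound to a compute or render encoder as-is.
```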