Yep, not optimized yet. In other GPU-based tasks and benchmarks, the M1 Max reaches 3070m and, in some rare cases, even 3080m levels of performance.
GPU rendering has been developed almost exclusively for Nvidia GPUs for the past 5+ years, so it’s not surprising that Apple Silicon (and AMD, for that matter) is falling behind and needs extra development to show its true performance.
If Apple was already getting twice the level of rendering performance before the official Blender 3.1 release, then we should be seeing something closer to 3070m performance once the optimizations are built in, especially if the neural network is used for de-noising.
I don't know how much render time is spent de-noising the image, but unless it's a big percentage of the total frame rendering time, I think there are other things that are going to speed it up more.
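To put rough numbers on that point, here is a minimal sketch using Amdahl's law. The de-noise shares and the 10x de-noise speedup are made-up values for illustration, not measurements from Blender or the M1 Max, and `overall_speedup` is a hypothetical helper name.

```python
def overall_speedup(fraction, stage_speedup):
    """Amdahl's law: whole-frame speedup when only one stage gets faster.

    fraction      -- share of total frame time spent in that stage (0..1)
    stage_speedup -- how many times faster that stage becomes
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if stage_speedup <= 0:
        raise ValueError("stage_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / stage_speedup)


# Hypothetical de-noise shares of the frame time (assumed, not measured).
for fraction in (0.05, 0.20, 0.50):
    gain = overall_speedup(fraction, 10.0)  # assume de-noising gets 10x faster
    print(f"de-noise share {fraction:.0%}: {gain:.2f}x faster overall")
```

Under these assumptions, a 10x faster de-noiser only pays off noticeably when de-noising is already a large slice of the frame time, which is why the rest of the pipeline looks like the better target.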
From the Apple engineer's post on the Blender forum, I take it that while it's now running using Metal, there are significant architecture-dependent optimizations still to be made. He mentions that it's already somewhat optimized because it's not copying data back and forth, thanks to the Unified Memory Architecture, but that comes for 'free' with the Metal implementation. There are other (structural) changes that should improve the GPU performance further, without needing to offload work to additional hardware.
For example, since the post explicitly mentions that the rendering path is still closely tied to the Nvidia/CUDA model and not optimized for Apple's architecture, maybe they haven't yet started reordering and merging the rendering passes that only need access to tile memory. The TBDR (tile-based deferred rendering) design of Apple's GPUs means that some rendering passes that had to be separate on IMR (immediate-mode rendering) GPUs can now be merged into a single pass, greatly reducing pipeline change overhead and providing much faster memory access, especially if the intermediate data fits in 'memoryless' render targets, which would need to be copied to VRAM on IMR GPUs but can reside only in tile memory on TBDR GPUs. That IMHO could be a huge performance boost, but it takes a lot of time and deep knowledge of the whole rendering process.
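As a back-of-the-envelope illustration of why merging passes matters, here is a toy model of the off-chip traffic caused by intermediate render targets. It is not how Blender or Metal account for memory; the function name, the frame size, the 8 bytes per pixel and the three intermediates are all assumptions picked for the example.

```python
def offchip_bytes(width, height, bytes_per_pixel, intermediates, merged):
    """Toy estimate of device-memory traffic for intermediate render targets.

    Separate passes: each intermediate is written out once by the pass that
    produces it and read back once by the pass that consumes it.
    Merged pass: intermediates are assumed to stay in on-chip tile memory
    (the 'memoryless' case), so they cause no off-chip traffic at all.
    """
    if merged:
        return 0
    return width * height * bytes_per_pixel * intermediates * 2


# Hypothetical frame: 1920x1080, 8 bytes per pixel, 3 intermediate targets.
separate = offchip_bytes(1920, 1080, 8, 3, merged=False)
merged = offchip_bytes(1920, 1080, 8, 3, merged=True)
print(f"separate passes: {separate / 2**20:.1f} MiB of intermediate traffic per frame")
print(f"merged pass:     {merged / 2**20:.1f} MiB of intermediate traffic per frame")
```

The model ignores caches, compression and final-output writes, so it only shows the direction of the effect, not its real size.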