Another point to note is that the memory bandwidth of the M3 Max is already 400 GB/s, versus 273 GB/s for the M4 Pro. Most software doesn't use anywhere near even 200 GB/s, but one might assume it eventually will, and workloads that do will favor the M3 Max over the M4 Pro.
LLM inference speed is closely tied to, and limited by, memory bandwidth. Running local models is the overwhelming use case that makes this spec important.
After the first token, an M3 Max will generate tokens ~47% faster than an M4 Pro, and an M4 Max ~37% faster than an M3 Max.
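Those percentages fall straight out of the bandwidth ratios, because decode speed is roughly memory bandwidth divided by the bytes streamed per token (the model weights). A minimal sketch of that back-of-envelope math, assuming Apple's published bandwidth figures and a hypothetical ~4.5 GB quantized model (both illustrative, not benchmarks):

```python
# Rough upper bound on decode speed: each generated token streams
# (approximately) all model weights through memory once, so
# tokens/sec ~ bandwidth / model size. Real throughput is lower.

def est_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Bandwidth-bound estimate of token generation speed."""
    return bandwidth_gb_s / model_size_gb

# Published memory bandwidth figures (GB/s)
chips = {"M4 Pro": 273, "M3 Max": 400, "M4 Max": 546}
model_gb = 4.5  # assumed: ~8B-parameter model, 4-bit quantized

for name, bw in chips.items():
    print(f"{name}: ~{est_tokens_per_sec(bw, model_gb):.0f} tok/s (upper bound)")

# The relative speedups depend only on the bandwidth ratios,
# which is where the ~47% and ~37% figures come from.
print(f"M3 Max over M4 Pro: {400 / 273 - 1:.1%}")
print(f"M4 Max over M3 Max: {546 / 400 - 1:.1%}")
```

The absolute tok/s numbers shift with model size and quantization, but the chip-to-chip ratios do not, which is why bandwidth alone predicts the relative speedups.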