energy efficiency

Seeking M5 Max/Ultra telemetry: does higher bandwidth improve Tokens/Joule or just raw throughput?

I built an open-source LLM inference telemetry suite that measures Tokens Per Joule, an energy-efficiency metric, rather than raw speed alone. The current baseline is an M1 Pro (32GB UMA): 2.42 Tokens/Joule on Qwen-3B Q4_K_M; 22 t/s on Llama-3.1-8B Q8_0 at 8192 context (13.7GB workload) at 35W...
- dilber
- Thread
- Apr 16, 2026
- Tags: apple silicon, benchmarking, energy efficiency, inference, llama.cpp, llm, m1 pro, m5 max, tokens per joule, unified memory
- Replies: 0
- Forum: Apple Silicon (Arm) Macs
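
For readers unfamiliar with the metric in the post above: Tokens/Joule is throughput divided by power draw, since a watt is one joule per second. The sketch below is only an illustration of that arithmetic, not code from the poster's suite. The function names are hypothetical, and pairing the 22 t/s figure with 35W is an assumption, because the preview is truncated and does not say which run the power figure belongs to.

```python
def tokens_per_joule(tokens_per_second: float, watts: float) -> float:
    """Tokens/Joule = (tokens/s) / (joules/s). Hypothetical helper."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return tokens_per_second / watts


def tokens_per_joule_from_samples(total_tokens: int,
                                  power_samples_w: list[float],
                                  interval_s: float) -> float:
    """Same metric from evenly spaced power samples (hypothetical helper).

    Energy is approximated as the sum of samples times the sampling interval.
    """
    if not power_samples_w or interval_s <= 0:
        raise ValueError("need at least one sample and a positive interval")
    energy_joules = sum(power_samples_w) * interval_s
    if energy_joules <= 0:
        raise ValueError("measured energy must be positive")
    return total_tokens / energy_joules


if __name__ == "__main__":
    # Assumption: 22 t/s and 35 W describe the same Llama-3.1-8B run.
    print(f"{tokens_per_joule(22.0, 35.0):.2f} tokens/J")
```

Under that assumption the 8B run works out to roughly 0.63 Tokens/Joule. The sample-based variant matters in practice because power draw fluctuates during generation, so integrating measured energy over the run is more faithful than dividing by a single wattage reading.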
