I built an open-source LLM inference telemetry suite that measures Tokens Per Joule — the energy efficiency metric, not just raw speed.
Current baseline is on an M1 Pro (32GB UMA):
2.42 Tokens/Joule on Qwen-3B Q4_K_M
22 t/s on Llama-3.1-8B Q8_0 at 8192 context (13.7GB workload) at 35W...