M1 Max Ethereum Mining Test

metapunk2077fail · Nov 22, 2021

JMacHack said:
I point you to the EV company market.

Yeah its stupid what's going on there too.

ChainfireXDA · Jan 17, 2022

UselethMiner hit 18 MH/s on the M1 Max.

Pressure · Jan 17, 2022

ChainfireXDA said:
UselethMiner hit 18 MH/s on the M1 Max.

That's using Metal and Aarch64 (both GPU and CPU). That equates to 3.4W/MH. GitHub link for those interested.

If only using the GPU you are down to 2.3W/MH (10.3MH on M1 Max).

Turribeach · Jan 17, 2022

This is promising, specially if the rumours about the M1 Max Duo and the M1 Max Quad are true. We will have to see if those new SoCs get released but assuming linear scale (which is another stretch!) then 36 MH/s for the Duo and 72 MH/s for the Quad starts to get more interesting. At 72 MH/s and current prices it will generate $4/day or $121/month or $1,460/year, without electricity costs. So probably still far from a decent ROI (it will take 5 years if a MacPro Quad costed $7,000) but certainly interesting as a side income specially if the MacPro has a lot of dead time...

leman · Jan 17, 2022

Turribeach said:
This is promising, specially if the rumours about the M1 Max Duo and the M1 Max Quad are true. We will have to see if those new SoCs get released but assuming linear scale (which is another stretch!) then 36 MH/s for the Duo and 72 MH/s for the Quad starts to get more interesting. At 72 MH/s and current prices it will generate $4/day or $121/month or $1,460/year, without electricity costs. So probably still far from a decent ROI (it will take 5 years if a MacPro Quad costed $7,000) but certainly interesting as a side income specially if the MacPro has a lot of dead time...

You have free electricity?

diamond.g · Jan 17, 2022

leman said:
You have free electricity?

You don't? ?

Turribeach · Jan 17, 2022

leman said:
You have free electricity?

No but my Mac already runs 24x7 as I run lots of other things like Home Automation, various docker images, etc. Also increase in energy will be minor, it's only pulling ~60w.

diamond.g · Jan 17, 2022

Turribeach said:
No but my Mac already runs 24x7 as I run lots of other things like Home Automation, various docker images, etc. Also increase in energy will be minor, it's only pulling ~60w.

Isn't the 60W draw only for 18MH/s?

Turribeach · Jan 17, 2022

diamond.g said:
Isn't the 60W draw only for 18MH/s?

I don't understand your question. Have a look at the benchmarks table.

diamond.g · Jan 17, 2022

Turribeach said:
I don't understand your question. Have a look at the benchmarks table.

I was just thinking that to get 72 MH/s using Apples hardware would require ~240W of power, if you need ~60W to get 18MH/s.

I am still not clear why a AS GPU with as much bandwidth as it has is so bad at Eth mining.

Pressure · Jan 17, 2022

diamond.g said:
I was just thinking that to get 72 MH/s using Apples hardware would require ~240W of power, if you need ~60W to get 18MH/s.

I am still not clear why a AS GPU with as much bandwidth as it has is so bad at Eth mining.

I believe this is the first Metal implementation for any miner software. It is probably still rough around the edges seeing the project only has a single developer attached.

UBS28 · Jan 17, 2022

diamond.g said:
I was just thinking that to get 72 MH/s using Apples hardware would require ~240W of power, if you need ~60W to get 18MH/s.

I am still not clear why a AS GPU with as much bandwidth as it has is so bad at Eth mining.

Maybe Apple implemented something to prevent miners buying the M1 Max MBP in masses, creating huge shortages?

Turribeach · Jan 17, 2022

diamond.g said:
I was just thinking that to get 72 MH/s using Apples hardware would require ~240W of power, if you need ~60W to get 18MH/s.

I am still not clear why a AS GPU with as much bandwidth as it has is so bad at Eth mining.

Ah OK I get you now. Yes for sure it will be more than ~60w but probably less than ~240w. There are economies of scale and inefficiencies that mean that power should be less than 4x60w. But even if it was 240w at 0.21c $/kWh (which is more than most pay) that still leaves $2.75/day or $83/month or $1,000/year so still very respectable for a passive income.

diamond.g · Jan 17, 2022

Pressure said:
I believe this is the first Metal implementation for any miner software. It is probably still rough around the edges seeing the project only has a single developer attached.

This metal backend is my first code using the Metal Shading Language. Its performance is within 1% of the fastest OpenCL-based miner.

While I am convinced the hardware can do it faster still, getting MSL to actually do it is another story entirely.

Sheesh. Is that 1% on the same hardware, or in general?

diamond.g · Jan 17, 2022

UBS28 said:
Maybe Apple implemented something to prevent miners buying the M1 Max MBP in masses, creating huge shortages?

I wonder if they did that for all coins or just Eth. If I am not mistaken Nvidia has just nerfed Eth, so you can still mine other coins at full rate.

ChainfireXDA · Jan 18, 2022

UselethMiner crossed the 20 MH/s mark, due to an improvement in the CPU code.

diamond.g said:
I am still not clear why a AS GPU with as much bandwidth as it has is so bad at Eth mining.

Because Ethereum mining is largely latency limited. It reads 128 bytes of memory from a random location in the DAG (a currently ~4.5GB memory block of data) in a tight loop. Those bytes need to be pulled from actual memory into CPU/GPU cache for the (very basic) hash math to be performed. CPUs and GPUs are optimized for sequential memory access prefetching, which Ethereum's algorithm intentionally breaks.

Assuming sufficiently fast integer math (pretty much the case on any modern CPU/GPU), hashing performance is limited primarily by the latency of pulling those bytes in. "Latency hiding" is done by trying to keep the CPU/GPU occupied with the integer math part of other nonces while waiting for the bytes to come in from memory for the current nonce - parallel processing.

How many nonces can be computed in parallel depends on threads and cache sizes, and how quick those can be processed depends on how many memory loads we can get queued in the controller.

The M1 is relatively fast in CPU hashing because the unified memory has lower latency than is common between CPU and RAM on high-end devices.

The M1 is relatively slow in GPU hashing because the unified memory has higher latency than is common between GPU and VRAM on high-end devices.

Pressure said:
I believe this is the first Metal implementation for any miner software. It is probably still rough around the edges seeing the project only has a single developer attached.

Correct. Though right now I think we're at most 20% off of what the hardware can do in theory. Then again, I have to do some assumptions in that math that I cannot prove or disprove at this point.

diamond.g said:
Sheesh. Is that 1% on the same hardware, or in general?

ethminer-m1's OpenCL and UselethMiner's Metal implementation (so just the GPU parts) are within 1% of each-other performance-wise in my tests on the M1 Max.

diamond.g · Jan 18, 2022

ChainfireXDA said:
UselethMiner crossed the 20 MH/s mark, due to an improvement in the CPU code.

Because Ethereum mining is largely latency limited. It reads 128 bytes of memory from a random location in the DAG (a currently ~4.5GB memory block of data) in a tight loop. Those bytes need to be pulled from actual memory into CPU/GPU cache for the (very basic) hash math to be performed. CPUs and GPUs are optimized for sequential memory access prefetching, which Ethereum's algorithm intentionally breaks.

Assuming sufficiently fast integer math (pretty much the case on any modern CPU/GPU), hashing performance is limited primarily by the latency of pulling those bytes in. "Latency hiding" is done by trying to keep the CPU/GPU occupied with the integer math part of other nonces while waiting for the bytes to come in from memory for the current nonce - parallel processing.

How many nonces can be computed in parallel depends on threads and cache sizes, and how quick those can be processed depends on how many memory loads we can get queued in the controller.

The M1 is relatively fast in CPU hashing because the unified memory has lower latency than is common between CPU and RAM on high-end devices.

The M1 is relatively slow in GPU hashing because the unified memory has higher latency than is common between GPU and VRAM on high-end devices.

Correct. Though right now I think we're at most 20% off of what the hardware can do in theory. Then again, I have to do some assumptions in that math that I cannot prove or disprove at this point.

ethminer-m1's OpenCL and UselethMiner's Metal implementation (so just the GPU parts) are within 1% of each-other performance-wise in my tests on the M1 Max.

So Apples hardware is cryptomining resistant because of its architecture.

quarkysg · Jan 18, 2022

ChainfireXDA said:
The M1 is relatively slow in GPU hashing because the unified memory has higher latency than is common between GPU and VRAM on high-end devices.

Most dGPUs uses GDDR RAM? Isn’t GDDR’s access latency higher than LPDDR memory but has higher bandwidth?

leman · Jan 18, 2022

ChainfireXDA said:
The M1 is relatively fast in CPU hashing because the unified memory has lower latency than is common between CPU and RAM on high-end devices.

The M1 is relatively slow in GPU hashing because the unified memory has higher latency than is common between GPU and VRAM on high-end devices.

I was under the impression that this is the other way around? LPDDR5 should have higher latency (~100ns as measured by Anandtech) than the DDR4/5 (~70ns) commonly used. Similarly, GDDR5/6 should have higher latency, at least according to the "common knowledge".

The M1 GPU should in theory be quite good at hiding latency of random memory fetches, but there might be just not enough cache locality to keep the GPU going. The register file of M1 is not the largest either.

diamond.g · Jan 18, 2022

leman said:
I was under the impression that this is the other way around? LPDDR5 should have higher latency (~100ns as measured by Anandtech) than the DDR4/5 (~70ns) commonly used. Similarly, GDDR5/6 should have higher latency, at least according to the "common knowledge".

The M1 GPU should in theory be quite good at hiding latency of random memory fetches, but there might be just not enough cache locality to keep the GPU going. The register file of M1 is not the largest either.

Wait, I thought M1(x) had huge caches (especially compared to nvidia/AMD gpu side)?

leman · Jan 18, 2022

diamond.g said:
Wait, I thought M1(x) had huge caches (especially compared to nvidia/AMD gpu side)?

Well, yeah, but on a working set of over 4.5GB with really random data access huge cache means nothing.

diamond.g · Jan 18, 2022

leman said:
Well, yeah, but on a working set of over 4.5GB with really random data access huge cache means nothing.

So now we are back to why is M1 GPU so terrible at ETH mining if it has lower latency then GDDR6(x) and roughly the same bandwidth as comparable GDDR6(x) GPU.

leman · Jan 18, 2022

diamond.g said:
So now we are back to why is M1 GPU so terrible at ETH mining if it has lower latency then GDDR6(x) and roughly the same bandwidth as comparable GDDR6(x) GPU.

How much faster are other GPUs with comparable bandwidth and FLOPS?

ChainfireXDA · Jan 18, 2022

diamond.g said:
So Apples hardware is cryptomining resistant because of its architecture.

It's not an ideal fit for this algorithm, no.

quarkysg said:
Most dGPUs uses GDDR RAM? Isn’t GDDR’s access latency higher than LPDDR memory but has higher bandwidth?

I thought it GDDR had lower access latency. But regardless of that, we know the *effective average latency* for getting bytes from unified memory into the registers is higher (for this access pattern). Whether that is due to the base latency of the RAM, a different amount of channels, different minimum word size moved, less parallelism in the controllers... ?‍♂️

diamond.g · Jan 18, 2022

leman said:
How much faster are other GPUs with comparable bandwidth and FLOPS?

IIRC the 6600XT has similar TFLOPs as the M1 Max with less bandwidth (256 vs 400) but gets something like 30 MH/s @ ~55W.

M1 Max Ethereum Mining Test

macrumors 6502a

macrumors newbie

macrumors 603

macrumors member

macrumors Core

macrumors G5

macrumors member

macrumors G5

macrumors member

macrumors G5

macrumors 603

macrumors 68030

macrumors member

macrumors G5

macrumors G5

macrumors newbie

macrumors G5

macrumors 65816

macrumors Core

macrumors G5

macrumors Core

macrumors G5

macrumors Core

macrumors newbie

macrumors G5

Our Staff