This is promising, especially if the rumours about the M1 Max Duo and the M1 Max Quad are true. We will have to see if those new SoCs get released, but assuming linear scaling (which is another stretch!), 36 MH/s for the Duo and 72 MH/s for the Quad start to get more interesting. At 72 MH/s and current prices it would generate $4/day, or $121/month, or $1,460/year, before electricity costs. So probably still far from a decent ROI (it would take about 5 years if a Mac Pro Quad cost $7,000), but certainly interesting as a side income, especially if the Mac Pro has a lot of dead time...
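As a sanity check on those numbers, here is a minimal sketch of the linear-scaling math (all figures are just the estimates quoted above, and ETH prices are obviously a moving target):

```python
# Toy sketch of the linear-scaling assumption above. The $4/day at 72 MH/s
# figure is just the estimate quoted in this post, not live market data.
USD_PER_DAY_AT_72_MHS = 4.0

for name, mhs in [("M1 Max", 18), ("M1 Max Duo", 36), ("M1 Max Quad", 72)]:
    per_day = USD_PER_DAY_AT_72_MHS * mhs / 72
    print(f"{name}: {mhs} MH/s -> ${per_day:.2f}/day, ${per_day * 365:,.0f}/year")

# ROI on a hypothetical $7,000 Mac Pro Quad, ignoring electricity:
print(f"Years to break even: {7000 / (USD_PER_DAY_AT_72_MHS * 365):.1f}")  # ~4.8
```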
 
This is promising, especially if the rumours about the M1 Max Duo and the M1 Max Quad are true. We will have to see if those new SoCs get released, but assuming linear scaling (which is another stretch!), 36 MH/s for the Duo and 72 MH/s for the Quad start to get more interesting. At 72 MH/s and current prices it would generate $4/day, or $121/month, or $1,460/year, before electricity costs. So probably still far from a decent ROI (it would take about 5 years if a Mac Pro Quad cost $7,000), but certainly interesting as a side income, especially if the Mac Pro has a lot of dead time...

You have free electricity? :rolleyes:
 
I was just thinking that getting 72 MH/s using Apple's hardware would require ~240W of power, if you need ~60W to get 18 MH/s.

I am still not clear why an AS GPU with as much bandwidth as it has is so bad at Eth mining.
I believe this is the first Metal implementation for any miner software. It is probably still rough around the edges, seeing as the project only has a single developer attached.
 
I was just thinking that getting 72 MH/s using Apple's hardware would require ~240W of power, if you need ~60W to get 18 MH/s.

I am still not clear why an AS GPU with as much bandwidth as it has is so bad at Eth mining.

Maybe Apple implemented something to prevent miners from buying the M1 Max MBP en masse and creating huge shortages?
 
I was just thinking that getting 72 MH/s using Apple's hardware would require ~240W of power, if you need ~60W to get 18 MH/s.

I am still not clear why an AS GPU with as much bandwidth as it has is so bad at Eth mining.
Ah OK, I get you now. Yes, for sure it will be more than ~60W but probably less than ~240W. There are economies of scale and inefficiencies that mean the power draw should be less than 4×60W. But even if it were 240W at $0.21/kWh (which is more than most pay), that still leaves $2.75/day, or $83/month, or $1,000/year, so still very respectable for a passive income.
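A minimal worked version of that electricity math, assuming the worst case of a flat 240W draw running 24/7 at $0.21/kWh (the post's rounding gives $2.75/day; exact arithmetic lands a few cents higher):

```python
# Worst-case electricity cost for a hypothetical 72 MH/s setup, using the
# assumptions above: flat 240W draw, 24/7 operation, $0.21 per kWh.
WATTS = 240
USD_PER_KWH = 0.21
GROSS_PER_DAY = 4.0                                    # estimate from earlier posts

electricity_per_day = WATTS / 1000 * 24 * USD_PER_KWH  # ~$1.21/day
net_per_day = GROSS_PER_DAY - electricity_per_day      # ~$2.79/day
print(f"Electricity: ${electricity_per_day:.2f}/day")
print(f"Net: ${net_per_day:.2f}/day, ~${net_per_day * 365:,.0f}/year")
```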
 
I believe this is the first Metal implementation for any miner software. It is probably still rough around the edges, seeing as the project only has a single developer attached.
This Metal backend is my first code written in the Metal Shading Language. Its performance is within 1% of the fastest OpenCL-based miner.

While I am convinced the hardware can do it faster still, getting MSL to actually do it is another story entirely.
Sheesh. Is that 1% on the same hardware, or in general?
 
Maybe Apple implemented something to prevent miners from buying the M1 Max MBP en masse and creating huge shortages?
I wonder if they did that for all coins or just Eth. If I am not mistaken, Nvidia only nerfed Eth, so you can still mine other coins at full rate.
 
UselethMiner crossed the 20 MH/s mark, due to an improvement in the CPU code.

I am still not clear why an AS GPU with as much bandwidth as it has is so bad at Eth mining.

Because Ethereum mining is largely latency-limited. It reads 128 bytes of memory from a random location in the DAG (a block of data that is currently ~4.5GB) in a tight loop. Those bytes need to be pulled from actual memory into CPU/GPU cache for the (very basic) hash math to be performed. CPUs and GPUs are optimized for prefetching sequential memory accesses, which Ethereum's algorithm intentionally breaks.

Assuming sufficiently fast integer math (pretty much the case on any modern CPU/GPU), hashing performance is limited primarily by the latency of pulling those bytes in. "Latency hiding" is done by trying to keep the CPU/GPU occupied with the integer math part of other nonces while waiting for the bytes to come in from memory for the current nonce - parallel processing.

How many nonces can be computed in parallel depends on thread counts and cache sizes, and how quickly those can be processed depends on how many memory loads we can get queued in the controller.

The M1 is relatively fast in CPU hashing because the unified memory has lower latency than is common between CPU and RAM on high-end devices.

The M1 is relatively slow in GPU hashing because the unified memory has higher latency than is common between GPU and VRAM on high-end devices.
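To make that access pattern concrete, here is a toy sketch (not UselethMiner's or Ethash's real code; the mixing math and buffer size are invented purely to show why each iteration pays a full memory round trip):

```python
import os

WORD = 128                       # bytes fetched per access, as described above
DAG_WORDS = 1 << 20              # demo-sized buffer; the real DAG is ~4.5GB
dag = os.urandom(DAG_WORDS * WORD)

def toy_hash(nonce: int, rounds: int = 64) -> int:
    mix = nonce
    for _ in range(rounds):
        # The next read location depends on the previous result, so it is
        # effectively random: a hardware prefetcher cannot predict it, and
        # every iteration stalls on a full trip to memory.
        offset = (mix % DAG_WORDS) * WORD
        chunk = dag[offset:offset + WORD]            # pull in a 128-byte page
        mix = (mix * 0x100000001B3 ^ int.from_bytes(chunk[:8], "little")) & (2**64 - 1)
    return mix

print(hex(toy_hash(42)))
```

The math here is far simpler than the real hash, but the point is the dependent, unpredictable load in every iteration, which is exactly what latency hiding across many nonces tries to work around.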

I believe this is the first Metal implementation for any miner software. It is probably still rough around the edges, seeing as the project only has a single developer attached.

Correct. Though right now I think we're at most 20% off of what the hardware can do in theory. Then again, I have to make some assumptions in that math that I cannot prove or disprove at this point.
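For comparison, a common bandwidth-only back-of-envelope ceiling (an illustration, not necessarily the theoretical figure mentioned above) simply divides memory bandwidth by the roughly 64 accesses × 128 bytes ≈ 8KB of DAG traffic per hash; the ~400GB/s M1 Max bandwidth is assumed here:

```python
# Bandwidth-only ceiling for Ethash on an assumed ~400 GB/s part, ignoring
# latency entirely. Real results sit well below this, which is consistent
# with the algorithm being latency-bound rather than bandwidth-bound.
BANDWIDTH_B_PER_S = 400e9
BYTES_PER_HASH = 64 * 128            # ~8 KB of DAG reads per hash

print(f"~{BANDWIDTH_B_PER_S / BYTES_PER_HASH / 1e6:.0f} MH/s ceiling")  # ~49 MH/s
```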

Sheesh. Is that 1% on the same hardware, or in general?

ethminer-m1's OpenCL and UselethMiner's Metal implementation (so just the GPU parts) are within 1% of each other performance-wise in my tests on the M1 Max.
 
So Apple's hardware is crypto-mining resistant because of its architecture.
 
The M1 is relatively slow in GPU hashing because the unified memory has higher latency than is common between GPU and VRAM on high-end devices.
Most dGPUs use GDDR RAM, right? Isn't GDDR's access latency higher than LPDDR's, even though its bandwidth is higher?
 
The M1 is relatively fast in CPU hashing because the unified memory has lower latency than is common between CPU and RAM on high-end devices.

The M1 is relatively slow in GPU hashing because the unified memory has higher latency than is common between GPU and VRAM on high-end devices.

I was under the impression that it is the other way around? LPDDR5 should have higher latency (~100ns, as measured by AnandTech) than the commonly used DDR4/5 (~70ns). Similarly, GDDR5/6 should have higher latency, at least according to "common knowledge".

The M1 GPU should in theory be quite good at hiding the latency of random memory fetches, but there might just not be enough cache locality to keep the GPU going. The register file of the M1 is not the largest either.
 
I was under the impression that it is the other way around? LPDDR5 should have higher latency (~100ns, as measured by AnandTech) than the commonly used DDR4/5 (~70ns). Similarly, GDDR5/6 should have higher latency, at least according to "common knowledge".

The M1 GPU should in theory be quite good at hiding the latency of random memory fetches, but there might just not be enough cache locality to keep the GPU going. The register file of the M1 is not the largest either.
Wait, I thought the M1(X) had huge caches (especially compared to the Nvidia/AMD GPU side)?
 
Well, yeah, but on a working set of over 4.5GB with truly random data access, a huge cache means nothing.
So now we are back to why the M1 GPU is so terrible at ETH mining if it has lower latency than GDDR6(X) and roughly the same bandwidth as a comparable GDDR6(X) GPU.
 
So now we are back to why the M1 GPU is so terrible at ETH mining if it has lower latency than GDDR6(X) and roughly the same bandwidth as a comparable GDDR6(X) GPU.

How much faster are other GPUs with comparable bandwidth and FLOPS?
 
So Apple's hardware is crypto-mining resistant because of its architecture.

It's not an ideal fit for this algorithm, no.

Most dGPUs use GDDR RAM, right? Isn't GDDR's access latency higher than LPDDR's, even though its bandwidth is higher?

I thought GDDR had lower access latency. But regardless of that, we know the *effective average latency* for getting bytes from unified memory into the registers is higher (for this access pattern). Whether that is due to the base latency of the RAM, a different number of channels, a different minimum word size moved, less parallelism in the controllers... 🤷‍♂️
 