I thought GDDR had lower access latency. But regardless of that, we know the *effective average latency* for getting bytes from unified memory into the registers is higher (for this access pattern). Whether that is due to the base latency of the RAM, a different number of channels, a different minimum word size moved, less parallelism in the controllers... 🤷‍♂️
Bias aside, there is a good explanation in this.

Most cryptocurrencies to date are built on "proof of work": Solve a really hard math problem whose solution is trivial to verify (this is mining; there is a toy code sketch of the idea after this post), get rewarded with some meaningless tokens which crypto promoters claim are going to go to the moon because obviously the whole world wants to abandon conventional money and switch to crypto. The claims are all pump and dump nonsense, but we live in interesting times, so we haven't gotten to the end of the pumpers creating crypto bubbles to profit off the gullible.
Anyways. PoW uses enormous amounts of electrical power. This is by design: the act of solving those hard math problems is also tied to how a cryptocurrency network signs off on transactions (i.e. Bob has 5 coins and wants to pay Alice three of them). The idea is that by forcing anyone who wants to propose including some transactions in the globally visible public ledger to prove that they burned a substantial amount of compute power working on a meaningless math problem, you can make it too expensive for individual bad actors to try to push cheating transactions (e.g. pay the same "coin" to two different people) into the ledger.
But if the world actually ran on PoW crypto, we'd end up using more energy on mining than industrial and residential uses put together. As public awareness about this problem rises, crypto promoters have been searching for a solution. It's hard to sell people on the idea that crypto is Money 2.0 if adopting Money 2.0 means accelerating global warming and harming the economy.
The only alternative that's gotten much traction so far is Proof of Stake. However, it amounts to "let those with the most cryptocurrency make the rules". This appeals to the people who hold a large amount of crypto and want to become the new oligarchs of the world without actually doing anything of merit. I probably do not need to tell you why this is not likely to actually work in the real world.
Also, while crypto promoters have been claiming for a long time that PoS is about to be real, it never quite materializes.
Also, even if you solved PoW's energy problem, crypto would still face enormous problems taking over global finance. It turns out that despite what the pumpers claim, cryptocurrencies are terrible at actually being currency at any kind of large scale.
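To make the "hard to find, trivial to verify" point above concrete, here is a toy sketch in Swift using CryptoKit. It is not Ethash or any real scheme (real PoW differs in hash function, target encoding, and difficulty adjustment); the only point is that the miner loops over nonces while a verifier needs a single hash.

```swift
import Foundation
import CryptoKit

// Toy proof-of-work: find a nonce so that SHA-256(header || nonce) starts
// with `zeroBytes` zero bytes. Finding it takes many hash attempts...
func mine(header: Data, zeroBytes: Int) -> UInt64 {
    var nonce: UInt64 = 0
    while true {
        var candidate = header
        withUnsafeBytes(of: nonce.littleEndian) { candidate.append(contentsOf: $0) }
        if SHA256.hash(data: candidate).prefix(zeroBytes).allSatisfy({ $0 == 0 }) {
            return nonce
        }
        nonce &+= 1
    }
}

// ...but checking a claimed nonce is a single hash, which is what lets the
// rest of the network cheaply audit the work.
func verify(header: Data, nonce: UInt64, zeroBytes: Int) -> Bool {
    var candidate = header
    withUnsafeBytes(of: nonce.littleEndian) { candidate.append(contentsOf: $0) }
    return SHA256.hash(data: candidate).prefix(zeroBytes).allSatisfy { $0 == 0 }
}

let header = Data("example block header bytes".utf8)
let nonce = mine(header: header, zeroBytes: 2)   // tiny difficulty so the demo finishes instantly
print("found nonce \(nonce); verifies: \(verify(header: header, nonce: nonce, zeroBytes: 2))")
```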
From what I’ve read, GDDR is suited for sequential access and DDR is suited for random access.
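If anyone wants to see the sequential-vs-random effect for themselves without touching Metal, a rough CPU-side sketch like the one below is enough: stream through a buffer in order, then touch the same elements in a shuffled order, and compare the per-element cost. The buffer size and the use of plain Swift arrays are arbitrary choices of mine, and a GPU kernel hitting unified memory will behave differently in detail, but the access-pattern penalty shows up the same way.

```swift
import Foundation

// Sketch: same buffer, same number of reads, different access order.
// Build with `swiftc -O` so bounds checks don't dominate the timing.
let count = 64 * 1024 * 1024 / MemoryLayout<Int>.size   // ~64 MB of Ints
let data = [Int](repeating: 1, count: count)
let shuffled = Array(0..<count).shuffled()               // randomised visit order

var sum = 0
var start = DispatchTime.now()
for i in 0..<count { sum &+= data[i] }                   // sequential streaming
let seqNanos = DispatchTime.now().uptimeNanoseconds - start.uptimeNanoseconds

start = DispatchTime.now()
for i in shuffled { sum &+= data[i] }                    // random-order reads
let randNanos = DispatchTime.now().uptimeNanoseconds - start.uptimeNanoseconds

print("sequential: \(Double(seqNanos) / Double(count)) ns per element")
print("random:     \(Double(randNanos) / Double(count)) ns per element (checksum \(sum))")
```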
How? Once the data is in memory, and you are purely doing memory access and crunching numbers, how is it optimized for dGPUs?

I guess the mining software is better optimized for dGPU.
Software needs to be optimised for the architecture it is meant to run on. Let's take the Stockfish thread for example. There's a difference between standard Stockfish code and one that has been optimized for the M1.
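As a trivial illustration of what "optimised for the architecture" can look like at the source level (this is my own toy example, nothing taken from Stockfish or any miner), the same routine can be given an arm64-specific path built on SIMD types that the compiler lowers to NEON on Apple Silicon:

```swift
// Scalar fallback plus an arm64 path using SIMD4, which maps onto NEON
// vector registers on Apple Silicon. Illustrative only.
func sumOfSquares(_ values: [Float]) -> Float {
    #if arch(arm64)
    var acc = SIMD4<Float>(repeating: 0)
    var i = 0
    while i + 4 <= values.count {
        let v = SIMD4<Float>(values[i], values[i + 1], values[i + 2], values[i + 3])
        acc += v * v                        // four lanes per iteration
        i += 4
    }
    var total = acc.x + acc.y + acc.z + acc.w
    while i < values.count {                // scalar tail for leftover elements
        total += values[i] * values[i]
        i += 1
    }
    return total
    #else
    return values.reduce(0) { $0 + $1 * $1 }
    #endif
}

print(sumOfSquares([1, 2, 3, 4, 5]))        // 55.0 on either path
```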
Not trying to stir things up. Does Metal provide the means to change the things @ChainfireXDA mentioned were issues they saw?
I think you already know this, but just trying to stir things up?
The issue really seems to be memory subsystem centric. I know GPU speed doesn't really affect hashrate (within reason). UMA is supposed to be an advantage (which it should be for initial DAG creation for sure), and because Apple is using LPDDR, latency should be lower than with GDDR.
Maybe the DAG needs to exist on all 4 memory chips to get the full 400 GB/s of bandwidth (assuming the issue is actually bandwidth, and not the latency of getting data into the cache to operate on)?
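One way to sanity-check the bandwidth side of that question: if I'm remembering the Ethash parameters right (roughly 64 DAG fetches of 128 bytes per hash, i.e. about 8 KB of DAG traffic per hash; treat that as my assumption, it isn't from this thread), then 400 GB/s puts a hard ceiling on the hashrate no matter how the DAG is spread across the memory chips:

```swift
// Back-of-the-envelope ceiling: the hashrate you'd get if every byte of the
// quoted bandwidth went to DAG reads. The per-hash traffic is my recollection
// of the Ethash parameters (64 accesses x 128 bytes); adjust if that's off.
let bytesPerHash = 64.0 * 128.0        // ~8 KB of DAG reads per hash
let bandwidthBytesPerSec = 400e9       // the 400 GB/s figure quoted for the M1 Max
let ceilingMHs = bandwidthBytesPerSec / bytesPerHash / 1e6

print("bandwidth-bound ceiling ≈ \(ceilingMHs) MH/s")   // ≈ 48.8 MH/s
```

Real miners land below a ceiling like that because the reads are small and scattered, which is where latency and how many requests the memory controllers keep in flight start to matter.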
Well, I have to admit I know very little on Metal's capability.
Is there any official source to learn this?
Yup, here:
Now that is an awesome result, if it translates to a “real” pool.

I've coded the Ethereum algorithm on Metal and ARM assembly (using the new crypto instructions) - testing the results using the go-ethereum repo (ie making sure input X produces output Y) - on my M1 MacBook Pro (Apple M1 Max), I'm getting:
GPU: over 100 MH/s @ 30 watts of additional power usage
CPU: over 10 MH/s @ 20 watts of additional power usage
I'm in the process of testing this more (against test pools with real life values rather than the test data from go-ethereum) and then turning this into a Mac app atm (won't go on the App Store because of their rules)
Edit: looking at this thread, it looks like someone else has already done the same thing? I wonder why my results (for GPU) are much higher?
That looks great!
Aww, I wonder why there is a difference in performance.

An update: @ChainfireXDA has been helping me out and when converting the app to NOT use test data, the performance is reduced a lot... no longer looking like 100 MH/s!
Seems due to memory and memory cache performance
The first tests were using such a small amount of data that it's likely it all was populated in cache, whereas now there is much more data being worked with.
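That is the classic cache-cliff pattern, and it's easy to reproduce on the CPU side without any Metal: run the same number of dependent random reads over growing working sets and watch the per-read cost jump once the buffer no longer fits in cache. Sizes and counts below are arbitrary; this is a sketch, not a calibrated benchmark.

```swift
import Foundation

// Same number of dependent random reads over working sets of different sizes.
// Small sets stay cache-resident; much larger sets pay a DRAM round trip per
// read. Build with `swiftc -O`.
func randomCycle(_ count: Int) -> [Int] {
    // Sattolo's algorithm: the permutation is one big cycle, so the chase
    // below can't collapse into a short loop that hides in cache.
    var next = Array(0..<count)
    for i in stride(from: count - 1, through: 1, by: -1) {
        next.swapAt(i, Int.random(in: 0..<i))
    }
    return next
}

let reads = 4_000_000
for megabytes in [1, 8, 64, 512] {
    let count = megabytes * 1024 * 1024 / MemoryLayout<Int>.size
    let next = randomCycle(count)
    var index = 0
    let start = DispatchTime.now()
    for _ in 0..<reads { index = next[index] }   // each read depends on the last
    let nanos = DispatchTime.now().uptimeNanoseconds - start.uptimeNanoseconds
    print("\(megabytes) MB working set: \(Double(nanos) / Double(reads)) ns per read (ended at \(index))")
}
```

Per the posts above, the go-ethereum test data sits on the friendly side of that cliff, while a DAG several gigabytes in size sits firmly on the other side.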
30 MH/s at that wattage would be freaking incredible. IIRC, the 3060 Ti was the efficiency king at 60 MH/s and 120 W. That's 0.5 MH/s per watt (2 W per MH/s). If Apple Silicon did 1 MH/s per watt it'd be a huge step forward.

What hashrate are you seeing now? Even half would be considered very good.
The CPU performance has me considering trying Monero on the M1.
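For anyone redoing the per-watt arithmetic, it's just a couple of divisions; the figures below are the ones quoted in this thread (the 3060 Ti numbers and the hoped-for 30 MH/s at the ~30 W GPU draw mentioned earlier), not measurements of mine.

```swift
// Efficiency comparison using only the figures quoted in this thread.
let rigs = [
    ("RTX 3060 Ti",        60.0, 120.0),   // 60 MH/s at 120 W
    ("M1 Max (hoped-for)", 30.0,  30.0),   // 30 MH/s at ~30 W of GPU power
]

for (name, mhs, watts) in rigs {
    print("\(name): \(mhs / watts) MH/s per W (\(watts / mhs) W per MH/s)")
}
// RTX 3060 Ti:         0.5 MH/s per W (2.0 W per MH/s)
// M1 Max (hoped-for):  1.0 MH/s per W (1.0 W per MH/s)
```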
What a genius comparison: stand alone GPU vs complete notebook…

3060ti FE is 3x faster ETH hashrate, 22% faster per W and almost 1/8 the cost ($400 3060ti from Best Buy vs $3100+ M1 Max).
Well that is a shame, it would have been cool to get 100 mega hash without burning all the power GA102 does to get there. It is also interesting that there is such a high penalty for not playing in the cache, as you would think 400 gigs of bandwidth would make up for it.