I thought the 4K random read might have improved instead of getting slightly worse?
Your queue depth 64 random read score got a lot better. QD1 random read/write scores are not too meaningful, imo.
QD1 means AmorphousDiskMark queues one 4K read or write, then waits for it to complete before queueing another. QD64 means it queues up to 64 random ops in parallel, then queues a replacement op whenever an earlier one completes - it tries to keep the queue of outstanding I/O operations at 64 deep.
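For anyone curious what "keep the queue 64 deep" looks like in code, here's a rough sketch of the submission loop. This is my own illustrative version using asyncio (real benchmarks use native async I/O APIs, and `submit_one` is a placeholder for whatever issues a single 4K read), but the pattern is the same: whenever an op completes, immediately submit a replacement.

```python
import asyncio

async def run_at_queue_depth(submit_one, total_ops, queue_depth):
    """Issue total_ops I/Os, never letting more than queue_depth be
    outstanding at once. submit_one is a placeholder coroutine that
    performs one I/O (e.g., one 4K random read)."""
    pending = set()
    submitted = 0
    while submitted < total_ops or pending:
        # Top the queue back up to queue_depth outstanding ops.
        while submitted < total_ops and len(pending) < queue_depth:
            pending.add(asyncio.ensure_future(submit_one()))
            submitted += 1
        # Wait for at least one op to complete, then loop to refill.
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED)
```

With `queue_depth=1` this degenerates to submit, wait, submit, wait - exactly the serial behavior the QD1 test measures.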
A key fact about SSD architecture that's useful for understanding this: each NAND flash die can only process one read or write at a time (*). Adding more capacity to an SSD means more flash die (**), and that means more parallelism is available, since each die can work independently on its own I/O request. (You have to have access patterns that hit multiple die, of course.)
However, if your application software (AmorphousDiskMark in this case) limits itself to a queue depth of 1, it doesn't matter how many die there are - the benchmark only exercises one die at a time. At QD64, it actually takes advantage of the SSD's parallelism and should show random IOPS scaling up with SSD size.
The slightly lower score at QD1 was probably just random variance. I bet if you had been able to collect a lot of data points at QD1 on both configurations, the median scores would look pretty much the same.
* - This was a bit of a simplification on my part. Most NAND flash die are internally divided into at least two independent 'planes', each of which can handle one I/O at a time. It comes to the same thing in the end, though: the division into planes just multiplies IOPS per die by a constant factor.
** - Unless the bigger SSD uses denser NAND flash (more capacity per die), which complicates things.