Resampling with replacement creates non-independent observations within a sample (since the same observation can be picked several times), which violates the assumptions of most statistical tests. If you want to apply statistical testing to it, the trick would be finding a suitable model. I suppose one could look at the Cauchy distribution, since it describes the ratio of two independent zero-mean normal variables, but I am not knowledgeable in that particular domain and don't have an intuition for it.
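To make the duplication point concrete, here is a minimal sketch using only the Python standard library. The sample size, the seed, and the use of integers as stand-ins for measurements are all assumptions made for illustration, not anything from the benchmark data. For large n, the share of distinct original observations that end up in one resample should be close to 1 - 1/e, about 0.632.

```python
import random
from collections import Counter

rng = random.Random(0)                   # fixed seed, illustration only
n = 1000                                 # hypothetical sample size
original = list(range(n))                # stand-ins for n distinct measurements
resample = rng.choices(original, k=n)    # draw n items with replacement

counts = Counter(resample)
unique_fraction = len(counts) / n        # share of originals that appear at all
repeated = sum(1 for c in counts.values() if c > 1)  # originals drawn 2+ times

print(f"distinct originals in resample: {unique_fraction:.3f}")
print(f"originals drawn more than once: {repeated}")
```

The repeated draws are exact copies of each other, which is why the resampled values cannot be treated as independent observations.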
I would not compare the performance ratios to 1; I would just compare the M3 and M4 data directly for each test. If the normalized GB results are normally distributed (which I don't know), you can fit a linear model with CPU_type × type_of_test as explanatory variables. An ANOVA can then be applied to this model to estimate the share of variance explained by the CPU model, the type of test, and their interaction.
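As a rough sketch of that ANOVA, assuming a balanced design (the same number of runs for every CPU and test combination) and using only the standard library: the function name `two_way_anova`, the test names, and all the scores below are made up for illustration and are not real benchmark results.

```python
from statistics import mean

def two_way_anova(cells):
    """Sums of squares for a balanced two-way design with interaction.

    cells maps (level_of_A, level_of_B) -> list of replicate values;
    every combination must be present with the same number of replicates.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if (len(cells) != len(a_levels) * len(b_levels)
            or any(len(v) != n for v in cells.values()) or n < 2):
        raise ValueError("need a complete, balanced design with >= 2 replicates")

    all_values = [x for v in cells.values() for x in v]
    grand = mean(all_values)
    a_mean = {a: mean([x for b in b_levels for x in cells[(a, b)]]) for a in a_levels}
    b_mean = {b: mean([x for a in a_levels for x in cells[(a, b)]]) for b in b_levels}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    ss = {
        "A": n * len(b_levels) * sum((m - grand) ** 2 for m in a_mean.values()),
        "B": n * len(a_levels) * sum((m - grand) ** 2 for m in b_mean.values()),
        "A:B": n * sum((cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                       for a in a_levels for b in b_levels),
        "residual": sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v),
    }
    df = {
        "A": len(a_levels) - 1,
        "B": len(b_levels) - 1,
        "A:B": (len(a_levels) - 1) * (len(b_levels) - 1),
        "residual": len(a_levels) * len(b_levels) * (n - 1),
    }
    ss_total = sum((x - grand) ** 2 for x in all_values)
    ms_resid = ss["residual"] / df["residual"]

    table = {}
    for term in ss:
        row = {"ss": ss[term], "df": df[term], "share": ss[term] / ss_total}
        if term != "residual":
            row["F"] = (ss[term] / df[term]) / ms_resid  # F statistic vs residual
        table[term] = row
    table["total"] = {"ss": ss_total, "df": len(all_values) - 1, "share": 1.0}
    return table

# Made-up scores: factor A is the CPU, factor B is the (hypothetical) test type.
cells = {
    ("M3", "single"): [100.0, 102.0, 98.0],
    ("M4", "single"): [120.0, 121.0, 119.0],
    ("M3", "multi"): [200.0, 198.0, 202.0],
    ("M4", "multi"): [260.0, 262.0, 258.0],
}
result = two_way_anova(cells)
for term, row in result.items():
    print(term, {k: round(v, 3) for k, v in row.items()})
```

The `share` column is the fraction of the total sum of squares attributed to each term, which is the "part of variance explained" mentioned above. Turning the F statistics into p-values needs the F distribution, for example `scipy.stats.f.sf(F, df_term, df_residual)` if SciPy is available. Unbalanced data would need a proper linear-model fit rather than this shortcut.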