Intel Alder Lake vs. Apple M1

mi7chy · Feb 12, 2022

robco74 said:
I guess core count and clock speeds are the new muscle cars.

ARM trails x64 by about half in TFlop/s per core and TFlop/s per kW on TOP500 supercomputers.

AMD Epyc 7763 (Perlmutter)
0.093 TFlop/s per core
27.37 TFlop/s per kW

Fujitsu A64FX ARM (Fugaku)
0.058 TFlop/s per core
14.78 TFlop/s per kW

Don't tell lemon some of them also run Red Hat/CentOS 7.

exoticSpice said:
1.5TB of RAM

Meh...

8TB
https://www.supermicro.com/en/products/motherboard/H12DSG-Q-CPU6

crazy dave · Feb 12, 2022

mi7chy said:
ARM trails x64 by about half in TFlop/s per core and TFlop/s per kW on TOP500 supercomputers.

AMD Epyc 7763 (Perlmutter)
0.093 TFlop/s per core
27.37 TFlop/s per kW

Fujitsu A64FX ARM (Fugaku)
0.058 TFlop/s per core
14.78 TFlop/s per kW

While the description on the TOP500 website talks about the clock speed of the CPU for calculating Rpeak, it also includes GPU capability for heterogeneous systems like Perlmutter.

The TOP500 lists Perlmutter at 760,000+ cores. Of those, however, Perlmutter currently has only about 96,000 AMD cores, as the heterogeneous system was built first. The other ~650,000 cores that achieve its TFlops rating are GPU cores from roughly 6,000 Nvidia A100s. Phase 2 cabinets with just the Milan cores are only being built now. Fugaku is CPU only.

November 2021 | TOP500

www.top500.org

Perlmutter

NERSC’s flagship supercomputer is an HPE Cray named “Perlmutter” in honor of Saul Perlmutter. An astrophysicist at Berkeley Lab and a professor of physics at UC Berkeley, Dr. Perlmutter shared the 2011 Nobel Prize in Physics for his contributions to research showing that the expansion of the...

www.nersc.gov

That's 1,500+ nodes, each with one 64-core AMD CPU and four 108-SM Nvidia GPUs: 744,000+ cores (then add the user access nodes, non-compute nodes, and service nodes to get 760,000+).
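For anyone who wants to check the arithmetic, here is a rough sketch using only the rounded figures from this post. The variable names are mine, and it assumes each GPU SM is counted as one "core", which is how the totals above appear to add up.

```python
# Rounded figures quoted in this post; the real node count is "1,500+".
NODES = 1_500
CPU_CORES_PER_NODE = 64   # one 64-core AMD Milan CPU per node
GPUS_PER_NODE = 4         # four Nvidia A100 GPUs per node
SMS_PER_GPU = 108         # assumption: each SM is counted as one "core"

cpu_cores = NODES * CPU_CORES_PER_NODE
gpus = NODES * GPUS_PER_NODE
gpu_cores = gpus * SMS_PER_GPU
total = cpu_cores + gpu_cores

# Fraction of the counted "cores" that are actually Epyc cores.
cpu_share = cpu_cores / total

print(f"CPU cores: {cpu_cores:,}")
print(f"GPUs:      {gpus:,}")
print(f"GPU cores: {gpu_cores:,}")
print(f"Total:     {total:,}")
print(f"CPU share: {cpu_share:.1%}")
```

With these inputs only about one counted core in eight is a CPU core, so dividing the system's TFlops by the full core count says very little about a single Epyc core.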

Again, Fugaku is CPU only. That it's as good as it is against a heterogeneous system is impressive. I would've expected better from Perlmutter, though elements of Linpack really do favor CPUs and vector processing. Zen 3 Milan is a very fine, well-rounded server CPU that compares favorably with just about everything on the market, and the GPUs will probably be excellent for the scenarios they are designed for.

But just out of curiosity did you actually think that Perlmutter's 700,000 cores were all CPU despite the description of it in the list explicitly mentioning the GPU type?

Or did you hope no one else would know what Perlmutter was or bother to check when you left out the GPU in your description, divided the TFlops by the 700,000 cores, and declared that to be the score per Epyc core and kW?

crazy dave · Feb 12, 2022

mr_roboto said:
No, it doesn't.

The post you're replying to made no mention of frequency at all. I also have experienced you responding to one of my posts with a torrent of paragraphs about stuff I simply didn't say.

You need to actually read what Intel says the left hand graph is about instead of going off on what you think it's about. It's right there in the graph's caption: "P-Core delivers higher Performance on single and lightly threaded scalable apps." And the captions for the curves in that graph: "1 E-core", "1 P-Core".

The reason there's a spot where the two curves almost touch is just that this is often what happens when you plot perf/power curves for two different CPU architectures. It doesn't mean Intel's trying to imply handoff.

It's not hard to simply accept Intel at their word and use that as the basis for further reasoning: "E-core provide higher computational density under given physical constraints". That's the caption for the right hand graph, where they contrast the MT performance of a 4P configuration against a 2P+8E configuration. If you go look up annotated die photos of Alder Lake family processors, you'll find that a rectangle drawn around eight E cores is not much larger than a rectangle drawn around two P cores. With that context, the message of the slide can be understood as: "By substituting eight E cores for two P cores, we gain 50% MT performance over a 4P config".

So Intel's definition of "E"fficiency for Alder Lake E cores is mostly about area, not power. Golden Cove's high ST performance comes at an extreme area cost, and they weren't going to look competitive against high core count AMD processors without doing something about that.

I agree, with two minor caveats: as the Chips and Cheese article says, Gracemont is more efficient for integer-heavy workloads, and at very low power (around 5 watts) it is slightly better there too. But otherwise, yes, the E cores are really about die area, especially, as the article notes, in the power regime they are run at and in the loads they are expected to take on.
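To put a rough number on the die-area reading of that slide, here is a back-of-the-envelope sketch. It assumes MT throughput adds up linearly across cores, that a P core contributes the same in both configurations, and that eight E cores take roughly the area of two P cores. None of that is exact, and the function name is just for illustration.

```python
def e_core_throughput(p_base: int, p_new: int, e_new: int, mt_gain: float) -> float:
    """Solve p_new + e_new * e = (1 + mt_gain) * p_base for e.

    Throughput is measured in units of one P core, assuming linear scaling.
    """
    return ((1 + mt_gain) * p_base - p_new) / e_new


# Slide as paraphrased above: 2P+8E gains 50% MT performance over 4P.
e = e_core_throughput(p_base=4, p_new=2, e_new=8, mt_gain=0.5)

# Assumption: eight E cores occupy about the area of two P cores.
E_AREA = 2 / 8
perf_per_area = e / E_AREA

print(f"One E core ~ {e:.2f} of a P core's MT throughput")
print(f"E core throughput per unit area ~ {perf_per_area:.1f}x a P core")
```

Under those assumptions an E core delivers about half a P core's MT throughput in about a quarter of the area, which is the "computational density" argument in numbers.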

Sydde · Feb 12, 2022

cmaier said:
we know that Apple has a long and very consistent history of increasing single core performance around 15-20% per year

How much longer can they keep that up, though? A lot of their gains have been from the process (which, ignoring the bs that process nodes amount to, has been shrinking on average something like 9% a year). Are there real, practical ways that they can improve performance with every new core set? And honestly, is there that much to gain at this point in boosting SpecInt? I should be hoping they would focus more on the heavy work GPU and ANE parts, if they are looking for bragging rights.

cmaier · Feb 12, 2022

Sydde said:
How much longer can they keep that up, though? A lot of their gains have been from the process (which, ignoring the bs that process nodes amount to, has been shrinking on average something like 9% a year). Are there real, practical ways that they can improve performance with every new core set? And honestly, is there that much to gain at this point in boosting SpecInt? I should be hoping they would focus more on the heavy work GPU and ANE parts, if they are looking for bragging rights.

They can keep it up for quite a while. There are still all sorts of microarchitectural and physical design tricks left in the bag.

JMacHack · Feb 13, 2022

crazy dave said:
Or did you hope no one else would know what Perlmutter was or bother to check when you left out the GPU in your description, divided the TFlops by the 700,000 cores, and declared that to be the score per Epyc core and kW?

Ding ding ding, there’s your answer.

Mitchy either doesn’t know what the **** he’s talking about or posts deliberately misleading claims. Apparently not aware of how easy they are to disprove.

leman · Feb 13, 2022

JMacHack said:
Mitchy either doesn’t know what the **** he’s talking about or posts deliberately misleading claims. Apparently not aware of how easy they are to disprove.

Mitchy is just quoting stuff without really understanding the topic or the context. It's difficult to deal with people like that, because they always have an "argument" while completely lacking any sense of intellectual responsibility.

robco74 · Feb 13, 2022

Another proud graduate of Dunning-Kruger U.

Sydde · Feb 13, 2022

I saw some einstein post on ars that the world runs on x86, and while there is some grain of truth to that, it is still far from accurate. There are billions of phones out there, of which six or seven run an x86 CPU, if even that. The architecture was designed for 1977 and is held together with bubblegum and baling wire. It is decades past when we should have moved on.

I mean, if a thing like Vulkan can run on different GPU architectures given the appropriate drivers, there is no practical reason that computing platforms should not be ISA-agnostic as well. We just need the right standard for designing pre-object code that any given system can convert and optimize to run well on the processor it sports.

Xiao_Xi · Feb 13, 2022

Sydde said:
computing platforms should be ISA-agnostic as well. We just need the right standard for designing pre-object code that any given system can convert and optimize to run well on the processor it sports.

LLVM does that.

Sydde said:
Vulkan can run on different GPU architectures given the appropriate drivers

GPU architectures are simpler than CPU architectures, so it is easier to create a uniform platform for all GPUs.

leman · Feb 13, 2022

Sydde said:
I saw some einstein post on ars that the world runs on x86, and while there is some grain of truth to that, it is still far from accurate. There are billions of phones out there, of which six or seven of them run an x86 CPU, if even that.

I am fairly sure that there are many many more ARM devices than x86 devices in the world.

Sydde said:
I mean, if a thing like Vulkan can run on different GPU architectures given the appropriate drivers, there is no practical reason that computing platforms should not be ISA-agnostic as well. We just need the right standard for designing pre-object code that any given system can convert and optimize to run well on the processor it sports.

Apple already does this on iOS… but you are always paying with some efficiency for high-performance applications. One could of course agree on a standard bytecode and have all CPU vendors optimize for that, but won’t this just be a new de facto ISA? Anyway, modern “system” languages already give you an ISA-agnostic platform. If you write your code in correct C/C++/Rust without relying on any platform assumptions, your code should run pretty much anywhere.
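As a small illustration of what a "platform assumption" looks like, here is a sketch in Python rather than C, using the standard `struct` module, which exposes both the host's native C layout and an explicit one. The function names are made up for the example.

```python
import struct


def encode_portable(value: int) -> bytes:
    """Encode an unsigned 32-bit integer with explicit size and byte order."""
    # "<I" means little-endian, exactly 4 bytes, on every platform.
    return struct.pack("<I", value)


def encode_native(value: int) -> bytes:
    """Encode using the host's native C 'unsigned long' layout."""
    # "@L" follows the host's size, alignment, and byte order,
    # so the result can differ between platforms.
    return struct.pack("@L", value)


print(encode_portable(1))
print(len(encode_native(1)), "bytes in the native layout on this machine")
```

The first function gives the same bytes everywhere; the second bakes in whatever the host happens to use, which is the kind of assumption that stops code from running "pretty much anywhere".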

Xiao_Xi said:
GPU architectures are simpler than CPU architectures, so it is easier to create a uniform platform for all GPUs.

I’m not sure that is the case. We have a uniform GPU platform because that’s how the industry has always worked. But there can be some fairly significant differences under the hood. For example, Intel GPUs are a completely different beast from AMD/Nvidia/Apple and support very different programming modes, most of which are not exposed by common APIs.

cmaier · Feb 13, 2022

leman said:
I am fairly sure that there are many many more ARM devices than x86 devices in the world.

Apple already does this on iOS… but you are always paying with some efficiency for high-performance applications. One could of course agree on a standard bytecode and have all CPU vendors optimize for that, but won’t this just be a new de facto ISA? Anyway, modern “system” languages already give you an ISA-agnostic platform. If you write your code in correct C/C++/Rust without relying on any platform assumptions, your code should run pretty much anywhere.

I’m not sure that is the case. We have a uniform GPU platform because that’s how the industry has always worked. But there can be some fairly significant differences under the hood. For example, Intel GPUs are a completely different beast from AMD/Nvidia/Apple and support very different programming modes, most of which are not exposed by common APIs.

The problem with a standard “pseudo-ISA” is that you crowd out innovation. When there’s a new type of software workload that requires innovation with new programming models and new architectural innovations, everyone just fractures again. The world can’t even agree on a standard HTML/CSS implementation, after all.

leman · Feb 13, 2022

cmaier said:
The problem with a standard “pseudo-ISA” is that you crowd out innovation. When there’s a new type of software workload that requires innovation with new programming models and new architectural innovations, everyone just fractures again. The world can’t even agree on a standard HTML/CSS implementation, after all.

Fully agree. That’s also why I stopped being a fan of common GPU APIs. I think every vendor should roll their own proprietary stuff, and frameworks like Vulkan should be just user-side libraries.

crazy dave · Feb 13, 2022

Xiao_Xi said:
GPU architectures are simpler than CPU architectures, so it is easier to create a uniform platform for all GPUs.

An individual GPU core is pretty simple and small but the GPU as a whole is a different beast. As one person, I think maybe Raja at AMD/Intel, put it, "it's amazing the d*** things work at all". The truth is that APIs are the best way to get at them - even within an individual manufacturer they're too different under the hood. The drivers are doing a lot of the heavy lifting. Heck, one of the reasons why Nvidia and AMD ship new drivers with AAA games? Almost all of those massive games with cutting edge graphics ship API busting graphics engine code that the driver is secretly correcting.

Also GPUs are geared towards bandwidth rather than latency anyway while a CPU is the opposite. A user will notice if their interface is slow while multitasking so a CPU process running as close to metal as possible is desirable. Running close to metal is still helpful for GPUs, hence the new APIs, but trading a little latency for portability is worth it. Even the older APIs are slightly more portable than the new ones and the main emphasis of the new ones was increasing bandwidth, not latency - a big uplift was more easily allowing multiple CPU threads to control the GPU.

Xiao_Xi · Feb 13, 2022

leman said:
If you write your code in correct C/C++/Rust without relying on any platform assumptions, your code should run pretty much anywhere.

LLVM helps to do that and has become a "standard".

Source:

Explanation of LLVM in the first five minutes.

But, LLVM is getting bloated https://www.npopov.com/2020/05/10/Make-LLVM-fast-again.html

So, programming languages are looking for new alternatives. https://jason-williams.co.uk/a-possible-new-backend-for-rust

JouniS · Feb 13, 2022

leman said:
Anyway, modern “system” languages already give you an ISA-agnostic platform. If you write your code in correct C/C++/Rust without relying on any platform assumptions, your code should run pretty much anywhere.

The portability of correct C/C++/Rust code is greatly exaggerated. For example, your code is not really platform-independent if your build system and dependency management are not platform-independent. Rust kind of achieves this if all dependencies are also written in Rust, but building C/C++ code with complex dependencies is a constant struggle even on a single platform.

Sometimes correctness is not enough. For example, you may require that you get the same results on every platform with the same random seed. The C++ standard specifies the algorithms used in random number generators but not those used in random number distributions. If you need deterministic behavior, you have to implement your own distributions. Standard libraries often have subtle traps like that, where the behavior you actually depend on is underspecified and platform-dependent.
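To show the shape of "implement your own distributions", here is a minimal sketch, written in Python for brevity rather than C++. It pairs a fully specified generator (SplitMix64, chosen here only as an example) with a hand-written uniform integer distribution using rejection sampling. The class and function names are illustrative, not from any library.

```python
MASK64 = (1 << 64) - 1


class SplitMix64:
    """Small, fully specified 64-bit generator (same output on every platform)."""

    def __init__(self, seed: int) -> None:
        self.state = seed & MASK64

    def next_u64(self) -> int:
        self.state = (self.state + 0x9E3779B97F4A7C15) & MASK64
        z = self.state
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        return z ^ (z >> 31)


def uniform_int(rng: SplitMix64, low: int, high: int) -> int:
    """Return an unbiased integer in [low, high] using rejection sampling."""
    span = high - low + 1
    # Largest multiple of span that fits in 2**64; draws at or above it
    # are rejected so every residue is equally likely.
    limit = (1 << 64) - ((1 << 64) % span)
    while True:
        x = rng.next_u64()
        if x < limit:
            return low + x % span


rng = SplitMix64(seed=42)
print([uniform_int(rng, 1, 6) for _ in range(10)])
```

Because both the generator and the mapping from raw bits to the target range are spelled out, the same seed should give the same sequence on any platform, which is the property the standard library distributions do not promise.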

Sometimes you need functionality that cannot be built using only the standard library. Things like network connections and memory-mapped files, or anything involving graphics and/or sound. Then you need platform-specific code either in your codebase or in your dependencies.

pshufd · Feb 14, 2022

Sydde said:
I saw some einstein post on ars that the world runs on x86, and while there is some grain of truth to that, it is still far from accurate. There are billions of phones out there, of which six or seven run an x86 CPU, if even that. The architecture was designed for 1977 and is held together with bubblegum and baling wire. It is decades past when we should have moved on.

I mean, if a thing like Vulkan can run on different GPU architectures given the appropriate drivers, there is no practical reason that computing platforms should not be ISA-agnostic as well. We just need the right standard for designing pre-object code that any given system can convert and optimize to run well on the processor it sports.

Mobile phones depend heavily on servers though. What architecture do those servers predominantly use?

cmaier · Feb 14, 2022

pshufd said:
Mobile phones depend heavily on servers though. What architecture do those servers predominantly use?

They also rely on wifi routers and cell towers and routers to talk to those servers. What architecture does that networking architecture predominantly use?

Xiao_Xi · Feb 14, 2022

cmaier said:
What architecture does that networking architecture predominantly use?

MIPS?

cmaier · Feb 14, 2022

Xiao_Xi said:
MIPS?

A lot of it, for sure, though less and less in the last couple of years.

pshufd · Feb 14, 2022

cmaier said:
They also rely on wifi routers and cell towers and routers to talk to those servers. What architecture does that networking architecture predominantly use?

They rely on electricity too. We can devolve of course but WiFi and Ethernet have been around for a long time.

AWS and the like were really what spurred on the cloud.

cmaier · Feb 14, 2022

pshufd said:
They rely on electricity too. We can devolve of course but WiFi and Ethernet have been around for a long time.

AWS and the like were really what spurred on the cloud.

No, server farms were around long before AWS. The availability of desktop-style browsing in a handheld form factor with wireless connectivity is what spurred the cloud.

pshufd · Feb 14, 2022

cmaier said:
No, server farms were around long before AWS. The availability of desktop-style browsing in a handheld form factor with wireless connectivity is what spurred the cloud.

And that's what I wrote.

januarydrive7 · Feb 14, 2022

pshufd said:
And that's what I wrote.

Pretty sure you wrote AWS and cmaier wrote iPhone.

pshufd · Feb 14, 2022

januarydrive7 said:
Pretty sure you wrote AWS and cmaier wrote iPhone.

"No, server farms were around long before AWS. The availability of desktop-style browsing in a handheld form factor with wireless connectivity is what spurred the cloud."
