
mi7chy

macrumors G4
Oct 24, 2014
10,625
11,296
I guess core count and clock speeds are the new muscle cars.

ARM trails x64 by about half in TFlop/s per core and TFlop/s per kW on TOP500 supercomputers.

AMD Epyc 7763 (Perlmutter)
0.093 TFlop/s per core
27.37 TFlop/s per kW

Fujitsu A64FX ARM (Fugaku)
0.058 TFlop/s per core
14.78 TFlop/s per kW

Don't tell lemon some of them also run Redhat/CentOS 7.

1.5TB of RAM

Meh...

8TB
https://www.supermicro.com/en/products/motherboard/H12DSG-Q-CPU6
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
ARM trails x64 by about half in TFlop/s per core and TFlop/s per kW on TOP500 supercomputers.

AMD Epyc 7763 (Perlmutter)
0.093 TFlop/s per core
27.37 TFlop/s per kW

Fujitsu A64FX ARM (Fugaku)
0.058 TFlop/s per core
14.78 TFlop/s per kW

While the description on the TOP 500 website talks about the clock speed of the CPU for calculating Rpeak, it also includes GPU capability for heterogeneous systems like Perlmutter.

The TOP500 list shows Perlmutter at 760,000+ cores. Of those, however, Perlmutter currently has only about 96,000 AMD CPU cores, since the heterogeneous (GPU-accelerated) partition was built first. The other ~650,000 "cores" that produce its TFlop/s rating are GPU cores from the Nvidia A100s (roughly 6,000 of them). The CPU-only Phase 2 cabinets with just Milan cores are only being built now. Fugaku is CPU only.


That's 1,500+ nodes, each with one 64-core AMD CPU and four 108-SM Nvidia GPUs: 744,000+ cores (then add the user access nodes, non-compute nodes, and service nodes to get 760,000+).

Again, Fugaku is CPU only. That it holds up as well as it does against a heterogeneous system is impressive. I would've expected better from Perlmutter, though elements of Linpack really do favor CPUs and vector processing. Zen 3 Milan is a very fine, well-rounded server CPU that compares favorably with just about everything on the market, and the GPUs will probably be excellent for the scenarios they are designed for.

But just out of curiosity, did you actually think that Perlmutter's 700,000+ cores were all CPU, despite the description of it in the list explicitly mentioning the GPU type?

Or did you hope no one else would know what Perlmutter was or bother to check when you left out the GPU in your description, divided the TFlop/s by the 700,000+ cores, and declared that to be the score per Epyc core and per kW?
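
To put rough numbers on it, here is a back-of-the-envelope sketch using only the figures quoted in this thread; exact core counts and Rmax vary by list edition, so treat these as approximations:

Code:
// Back-of-the-envelope, using only the approximate figures quoted above.
#include <cstdio>

int main() {
    const double total_cores     = 760000.0;  // TOP500 "cores" entry: CPU cores + GPU SMs
    const double cpu_cores       = 96000.0;   // ~1,500 nodes x one 64-core Epyc 7763
    const double tflops_per_core = 0.093;     // the quoted per-"core" figure

    const double total_tflops = total_cores * tflops_per_core;  // ~70,000 TFlop/s
    const double per_cpu_core = total_tflops / cpu_cores;       // ~0.74 TFlop/s

    std::printf("System total:               ~%.0f TFlop/s\n", total_tflops);
    std::printf("Per CPU core, GPUs ignored: ~%.2f TFlop/s\n", per_cpu_core);
    // ~0.74 TFlop/s per core is far beyond what any general-purpose CPU core
    // delivers, so most of the machine's FLOPS have to come from the A100s.
    // Dividing by all 760,000 "cores" and calling the result a per-Epyc-core
    // number quietly attributes GPU FLOPS to the CPU.
    return 0;
}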
 
  • Like
Reactions: Homy and JMacHack

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
No, it doesn't.


The post you're replying to made no mention of frequency at all. I've also had you respond to one of my posts with a torrent of paragraphs about things I simply didn't say.


You need to actually read what Intel says the left hand graph is about instead of going off on what you think it's about. It's right there in the graph's caption: "P-Core delivers higher Performance on single and lightly threaded scalable apps." And the captions for the curves in that graph: "1 E-core", "1 P-Core".

The reason there's a spot where the two curves almost touch is just that this is often what happens when you plot perf/power curves for two different CPU architectures. It doesn't mean Intel's trying to imply handoff.

It's not hard to simply accept Intel at their word and use that as the basis for further reasoning: "E-core provide higher computational density under given physical constraints". That's the caption for the right hand graph, where they contrast the MT performance of a 4P configuration against a 2P+8E configuration. If you go look up annotated die photos of Alder Lake family processors, you'll find that a rectangle drawn around eight E cores is not much larger than a rectangle drawn around two P cores. With that context, the message of the slide can be understood as: "By substituting eight E cores for two P cores, we gain 50% MT performance over a 4P config".

So Intel's definition of "E"fficiency for Alder Lake E cores is mostly about area, not power. Golden Cove's high ST performance comes at an extreme area cost, and they weren't going to look competitive against high core count AMD processors without doing something about that.

I agree, with two minor caveats: as the Chips and Cheese article says, Gracemont is more efficient for integer-heavy workloads, and at very low power (around 5 watts) it comes out slightly ahead there too. But otherwise, yes, the E cores are really about die area, especially, as the article notes, in the power regime they are run at and for the loads they are expected to take on.
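
To make the area argument concrete, here is the arithmetic implied by the slide, with round numbers assumed (one P-core's MT throughput normalized to 1, and eight E-cores occupying roughly the area of two P-cores, per the die-photo comparison above):

Code:
// Rough arithmetic implied by Intel's slide; all numbers are assumed/rounded.
#include <cstdio>

int main() {
    const double p_mt   = 1.0;              // one P-core's MT throughput, normalized
    const double target = 1.5 * (4 * p_mt); // 2P+8E is claimed to reach 1.5x a 4P config

    // Solve 2*p + 8*e = target for the implied per-E-core throughput e:
    const double e_mt = (target - 2.0 * p_mt) / 8.0;  // = 0.5

    std::printf("Implied E-core MT throughput: %.2f of a P-core\n", e_mt);
    // If 8 E-cores fit in roughly the area of 2 P-cores (1 E ~ 1/4 P in area),
    // that's ~0.5 the throughput in ~0.25 the area: about 2x the throughput
    // per unit area. An area win much more than a power win.
    return 0;
}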
 

Sydde

macrumors 68030
Aug 17, 2009
2,563
7,061
IOKWARDI
we know that Apple has a long and very consistent history of increasing single core performance around 15-20% per year
How much longer can they keep that up, though? A lot of their gains have been from the process (which, ignoring the bs that process nodes amount to, has been shrinking on average something like 9% a year). Are there real, practical ways they can improve performance with every new core set? And honestly, is there that much to gain at this point in boosting SPECint? I would hope they focus more on the heavy-lifting GPU and ANE parts if they are looking for bragging rights.
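
Just to put the compounding in perspective, this is plain arithmetic on those rough yearly figures, nothing more:

Code:
// Compounds the rough yearly figures mentioned above over five years.
#include <cstdio>
#include <cmath>

int main() {
    const int years = 5;
    std::printf("+15%%/yr perf  over %d yrs: %.2fx\n", years, std::pow(1.15, years)); // ~2.0x
    std::printf("+20%%/yr perf  over %d yrs: %.2fx\n", years, std::pow(1.20, years)); // ~2.5x
    std::printf("-9%%/yr shrink over %d yrs: %.2fx\n", years, std::pow(0.91, years)); // ~0.62x linear
    return 0;
}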
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
How much longer can they keep that up, though? A lot of their gains have been from the process (which, ignoring the bs that process nodes amount to, has been shrinking on average something like 9% a year). Are there real, practical ways they can improve performance with every new core set? And honestly, is there that much to gain at this point in boosting SPECint? I would hope they focus more on the heavy-lifting GPU and ANE parts if they are looking for bragging rights.
They can keep it up for quite a while. There are all sorts of microarchitectural and physical design tricks left in the bag, still.
 
  • Like
Reactions: Sydde and michalm

JMacHack

Suspended
Mar 16, 2017
1,965
2,424
Or did you hope no one else would know what Perlmutter was or bother to check when you left out the GPU in your description, divided the TFlop/s by the 700,000+ cores, and declared that to be the score per Epyc core and per kW?
Ding ding ding, there’s your answer.

Mitchy either doesn’t know what the **** he’s talking about or posts deliberately misleading claims. Apparently he’s not aware of how easy it is to disprove.
 
  • Like
Reactions: Corpora and Homy

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,678
Mitchy either doesn’t know what the **** he’s talking about or posts deliberately misleading claims. Apparently he’s not aware of how easy it is to disprove.

Mitchy is just quoting stuff without really understanding the topic or the context. It's difficult to deal with people like that, because they always have an "argument" while completely lacking any sense of intellectual responsibility.
 

Sydde

macrumors 68030
Aug 17, 2009
2,563
7,061
IOKWARDI
I saw some einstein post on Ars that the world runs on x86, and while there is a grain of truth to that, it is still far from accurate. There are billions of phones out there, of which maybe six or seven run an x86 CPU, if even that. The architecture was designed for 1977 and is held together with bubblegum and baling wire. We are decades past when we should have moved on.

I mean, if a thing like Vulkan can run on different GPU architectures given the appropriate drivers, there is no practical reason that computing platforms should not be ISA-agnostic as well. We just need the right standard for designing pre-object code that any given system can convert and optimize to run well on the processor it sports.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,628
1,101
computing platforms should be ISA-agnostic as well. We just need the right standard for designing pre-object code that any given system can convert and optimize to run well on the processor it sports.
LLVM does that.
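
Roughly, the workflow looks like this (the target triples below are only examples, and in practice the bitcode still bakes in ABI details such as type sizes, so it isn't perfectly target-neutral):

Code:
// portable.cpp - any trivial translation unit will do
#include <cstdio>
int main() { std::puts("hello"); return 0; }

// The "pre-object code" role is played by LLVM IR / bitcode:
//   clang++ -c -emit-llvm portable.cpp -o portable.bc      # ISA-agnostic-ish bitcode
//   llc -mtriple=x86_64-linux-gnu  portable.bc -o x86.s    # lower to x86-64 assembly
//   llc -mtriple=aarch64-linux-gnu portable.bc -o arm64.s  # lower to AArch64 assembly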

Vulkan can run on different GPU architectures given the appropriate drivers
GPU architectures are simpler than CPU architectures, so it is easier to create a uniform platform for all GPUs.
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,678
I saw some einstein post on Ars that the world runs on x86, and while there is a grain of truth to that, it is still far from accurate. There are billions of phones out there, of which maybe six or seven run an x86 CPU, if even that.

I am fairly sure that there are many many more ARM devices than x86 devices in the world.

I mean, if a thing like Vulkan can run on different GPU architectures given the appropriate drivers, there is no practical reason that computing platforms should not be ISA-agnostic as well. We just need the right standard for designing pre-object code that any given system can convert and optimize to run well on the processor it sports.

Apple already does this on iOS… but you are always paying some efficiency for high-performance applications. One could of course agree on a standard bytecode and have all CPU vendors optimize for that, but won’t this just be a new de-facto ISA? Anyway, modern “system” languages already give you an ISA-agnostic platform. If you write your code in correct C/C++/Rust without relying on any platform assumptions, your code should run pretty much anywhere.

GPU architectures are simpler than CPU architectures, so it is easier to create a uniform platform for all GPUs.

I’m not sure that is the case. We have a uniform GPU platform because that’s how the industry has always worked. But there can be some fairly significant differences under the hood. For example, Intel GPUs are a completely different beast from AMD/Nvidia/Apple and support very different programming models, most of which are not exposed by common APIs.
 
  • Like
Reactions: crazy dave

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
I am fairly sure that there are many many more ARM devices than x86 devices in the world.



Apple already does this on iOS… but you are always paying some efficiency for high-performance applications. One could of course agree on a standard bytecode and have all CPU vendors optimize for that, but won’t this just be a new de-facto ISA? Anyway, modern “system” languages already give you an ISA-agnostic platform. If you write your code in correct C/C++/Rust without relying on any platform assumptions, your code should run pretty much anywhere.



I’m not sure that is the case. We have a uniform GPU platform because that’s how the industry has always worked. But there can be some fairly significant differences under the hood. For example, Intel GPUs are a completely different beast from AMD/Nvidia/Apple and support very different programming models, most of which are not exposed by common APIs.

The problem with a standard “pseudo-ISA” is that you crowd out innovation. When a new type of software workload calls for new programming models and new architectural approaches, everyone just fractures again. The world can’t even agree on a standard HTML/CSS implementation, after all.
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,678
The problem with a standard “pseudo-ISA” is that you crowd out innovation. When a new type of software workload calls for new programming models and new architectural approaches, everyone just fractures again. The world can’t even agree on a standard HTML/CSS implementation, after all.

Fully agree. That’s also why I stopped being a fan of common GPU APIs. I think every vendor should roll their own proprietary stuff, and frameworks like Vulkan should just be user-side libraries.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
GPU architectures are simpler than CPU architectures, so it is easier to create a uniform platform for all GPUs.

An individual GPU core is pretty simple and small, but the GPU as a whole is a different beast. As one person (I think maybe Raja of AMD/Intel) put it, "it's amazing the d*** things work at all". The truth is that APIs are the best way to get at them - even within an individual manufacturer they’re too different under the hood. The drivers are doing a lot of the heavy lifting. Heck, one of the reasons Nvidia and AMD ship new drivers alongside AAA games is that almost all of those massive games with cutting-edge graphics ship API-busting graphics engine code that the driver is quietly correcting.

Also, GPUs are geared towards bandwidth rather than latency anyway, while a CPU is the opposite. A user will notice if their interface is slow while multitasking, so a CPU process running as close to the metal as possible is desirable. Running close to the metal is still helpful for GPUs, hence the new APIs, but trading a little latency for portability is worth it. The older APIs are even slightly more portable than the new ones, and the main emphasis of the new ones was increasing bandwidth, not reducing latency - a big uplift was making it easier for multiple CPU threads to control the GPU.
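
Schematically it looks something like this - not any real graphics API (CommandList, record_chunk and submit are hypothetical stand-ins), just the shape the newer APIs allow: many threads record, one thread submits.

Code:
#include <thread>
#include <vector>

struct CommandList { std::vector<int> draws; };          // hypothetical stand-in
void record_chunk(CommandList& cl, int chunk) {          // hypothetical: build commands
    for (int i = 0; i < 100; ++i) cl.draws.push_back(chunk * 100 + i);
}
void submit(const std::vector<CommandList>&) {}          // hypothetical: one queue submission

int main() {
    std::vector<CommandList> lists(4);
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)                  // record command lists in parallel
        workers.emplace_back(record_chunk, std::ref(lists[i]), i);
    for (auto& t : workers) t.join();
    submit(lists);                               // submit everything from a single thread
    return 0;
}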
 
  • Like
Reactions: Xiao_Xi

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,628
1,101
If you write your code in correct C/C+/Rust without relying on any platform assumptions your code should run pretty much anywhere.

LLVM helps to do that and has become a "standard".
[Attached image: LLVM.png]

Source: (video) Explanation of LLVM in the first five minutes.

But LLVM is getting bloated: https://www.npopov.com/2020/05/10/Make-LLVM-fast-again.html

So programming languages are looking for new alternatives: https://jason-williams.co.uk/a-possible-new-backend-for-rust
 

JouniS

macrumors 6502a
Nov 22, 2020
638
399
Anyway, modern “system” languages already give you an ISA-agnostic platform. If you write your code in correct C/C+/Rust without relying on any platform assumptions your code should run pretty much anywhere.
The portability of correct C/C++/Rust code is greatly exaggerated. For example, your code is not really platform-independent if your build system and dependency management are not platform-independent. Rust kind of achieves this if all dependencies are also written in Rust, but building C/C++ code with complex dependencies is a constant struggle even on a single platform.

Sometimes correctness is not enough. For example, you may require that you get the same results on every platform with the same random seed. The C++ standard specifies the algorithms used in random number generators but not in random number distributions. If you need deterministic behavior, you have to implement your own distributions. Standard libraries often have subtle traps like that, where the behavior you actually depend on is underspecified and platform-dependent.

Sometimes you need functionality that cannot be built using only the standard library. Things like network connections and memory-mapped files, or anything involving graphics and/or sound. Then you need platform-specific code either in your codebase or in your dependencies.
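
A minimal sketch of that trap, assuming nothing beyond the standard library:

Code:
#include <cstdio>
#include <random>

int main() {
    std::mt19937 eng(12345);

    // Portable: the mt19937 engine's algorithm and seeding are fully specified
    // by the standard, so these raw values match on every conforming platform.
    for (int i = 0; i < 3; ++i)
        std::printf("engine:       %lu\n", static_cast<unsigned long>(eng()));

    // Not portable: only the statistical properties of the distribution are
    // specified, so libstdc++, libc++ and MSVC may map the same engine output
    // to different values - same seed, potentially different results.
    std::normal_distribution<double> dist(0.0, 1.0);
    for (int i = 0; i < 3; ++i)
        std::printf("distribution: %f\n", dist(eng));
    return 0;
}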
 
  • Like
Reactions: Xiao_Xi

pshufd

macrumors G4
Oct 24, 2013
10,149
14,574
New Hampshire
I saw some einstein post on Ars that the world runs on x86, and while there is a grain of truth to that, it is still far from accurate. There are billions of phones out there, of which maybe six or seven run an x86 CPU, if even that. The architecture was designed for 1977 and is held together with bubblegum and baling wire. We are decades past when we should have moved on.

I mean, if a thing like Vulkan can run on different GPU architectures given the appropriate drivers, there is no practical reason that computing platforms should not be ISA-agnostic as well. We just need the right standard for designing pre-object code that any given system can convert and optimize to run well on the processor it sports.

Mobile phones depend heavily on servers though. What architecture do those servers predominantly use?
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
Mobile phones depend heavily on servers though. What architecture do those servers predominantly use?
They also rely on wifi routers and cell towers and routers to talk to those servers. What architecture does that networking architecture predominantly use?
 

pshufd

macrumors G4
Oct 24, 2013
10,149
14,574
New Hampshire
They also rely on wifi routers and cell towers and routers to talk to those servers. What architecture does that networking architecture predominantly use?

They rely on electricity too. We can devolve of course but WiFi and Ethernet have been around for a long time.

AWS and the like were really what spurred on the cloud.
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
They rely on electricity too. We can devolve of course but WiFi and Ethernet have been around for a long time.

AWS and the like were really what spurred on the cloud.
No, server farms were around long before AWS. The availability of desktop-style browsing in a handheld form factor with wireless connectivity is what spurred the cloud.
 
  • Like
Reactions: bobcomer