
Menneisyys2

macrumors 603
Original poster
Jun 7, 2011
6,003
1,106
[Attached screenshot: Screenshot 2022-01-08 at 21.31.24.png - the benchmark results chart]


Using the multiprocessing benchmark code at the very end of https://realpython.com/python-concurrency/ (look for the code block in the “CPU-Bound multiprocessing Version” section - direct link: https://realpython.com/python-concurrency/#cpu-bound-multiprocessing-version ), I made the tests ten and a hundred times longer by changing the original “20” in “numbers = [5_000_000 + x for x in range(20)]” to 200 and 2000, respectively, and ran the same tests.
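
For reference, here's roughly what the benchmark looks like - a sketch reconstructed from the Real Python article, with the range() change applied (see the linked article for the exact original code):

import multiprocessing
import time


def cpu_bound(number):
    # Pure-Python busy work: sum of squares up to `number`.
    return sum(i * i for i in range(number))


def find_sums(numbers):
    # One worker process per CPU core by default; the list is split across them.
    with multiprocessing.Pool() as pool:
        pool.map(cpu_bound, numbers)


if __name__ == "__main__":
    # The article uses range(20); I changed this to 200 and 2000.
    numbers = [5_000_000 + x for x in range(2000)]
    start = time.time()
    find_sums(numbers)
    print(f"Duration {time.time() - start:.1f} seconds")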

All four Macs were up to date (macOS 12.1), and the same Python 3.10.1 was used everywhere. Apart from PyCharm (running the Python code), only Activity Monitor (macOS) / Task Manager (Windows) was running, to keep an eye on the process count.

The chart (see the attached screenshot) first shows the four models (early 2015 13" MBP / 2018 13" MBP / 2018 MacMini / 2021 MBP 16”) and their CPU configurations. The last four rows show Parallels Pro (latest version!) running on the 2018 MacMini. I’ll compare these figures to some real WinTel figures at some point to see how much worse they are.

As you can see, the 2021 MBP 16” is almost three times faster than the second-fastest machine, the 2018 MacMini; six times faster than the 2018 13" MBP; and more than ten times faster than the early 2015 13" MBP.

Again, this is a CPU-bound multiprocessing benchmark, which heavily (!!!) profits from multiple cores, as it distributes the workload evenly over every single core. Tasks NOT using multiple / all cores (for example web browsing, or non-optimized / non-optimizable code) will NOT show so drastic a speed difference!


All values are in seconds.

Also note that the current Python 3.10.1 runtime is universal; that is, it also includes an ARM binary. (Strange: macOS still forced me to install Rosetta when installing it - dunno why?!)

UPDATE (08/Jan/2022 20:40CET): on my 16” MBP, I installed the current Parallels version (in Trial mode) and let it install the default Win11 ARM. I ran the single pending update in it; with that, it became Windows 11 Home 21H2, build 22000.318. I installed PyCharm CE and Python 3.10.1 (all the latest releases, x86 only).

The benchmark itself ran far better than I anticipated: the 16” was definitely (1.5…2 times) faster than the MacMini. However, PyCharm was really sluggish - borderline useless - while on the MacMini (which runs the x86 build natively) it’s perfectly usable, not much worse than the macOS-native PyCharm running on the macOS host. All in all, while the execution environment (3.10.1) itself definitely surprised me (positively), the downsides of x86 emulation were more than evident WRT PyCharm.

I’ve added the results in a new row starting with “2021 MBP 16” + current Parallels (in Trial mode), Win11 ARM + x86 Python”.


UPDATE (09/Jan/2022 02:04CET): Note: this isn’t related to Python benchmarking but to macOS ARM vs x86 + Rosetta 2 emulation. I tested some files from the 8K MKV file archive at https://drive.google.com/drive/folders/1TSdV36G_npDtjRJze54GEYpdxpBt7nCK (also linked from https://www.avsforum.com/threads/8k-demo-videos.3107418/ ). The x86 VLC version running under Rosetta consistently exhibited about 1.2…1.33 times the CPU usage of the ARM version. I tested with both 30 and 60 fps MKV files. Not even the x86 version dropped any frames while playing the 60p video. Some results (avg CPU usage):

8k30 (“First 8K Video From Space~Orig.mkv”): ARM: 27%, x86: 36%
8k60 (“Bulgaria 8K Hdr 60P Fuhd”): ARM: 37%, x86: 45%

UPDATE (09/Jan/2022 04:22CET): I’ve just posted a separate thread on hacking PyCharm to use the ARM JDK instead of the built-in x86 one for a MAJOR speedup - now it seems to be totally usable!!!

 
Last edited:

ADGrant

macrumors 68000
Mar 26, 2018
1,689
1,059
People who care about performance (particularly multi-core performance) don't use Python. Also, you are not comparing the 2021 MBP with a comparable Intel Mac; your Intel Macs are all pretty low-end performance-wise.

I am also a little confused about why someone would care about Python performance in a Windows 10 VM on a Mac. Python is a cross-platform scripting language.
 

Menneisyys2

macrumors 603
Original poster
Jun 7, 2011
6,003
1,106
I am also a little confused about why someone would care about Python performance in a Windows 10 VM on a Mac. Python is a cross-platform scripting language.
I just wanted to know how much of a perf. hit Parallels causes and whether changing the number of emulated CPU cores has a positive effect.
 

Menneisyys2

macrumors 603
Original poster
Jun 7, 2011
6,003
1,106
People who care about performance (particularly multi-core performance) don't use Python.

I used this Python example because it's platform-independent ( ! ), available in source code form, and can easily be modded.

Also, you are not comparing the 2021 MBP with a comparable Intel Mac; your Intel Macs are all pretty low-end performance-wise.
Well, apart from the extra-expensive Mac Pros, the 2018 MacMini isn't THAT bad - the 2019 16" isn't much better (if at all), perf.-wise...

Of course it would have been best to compare to the latest-and-speediest Mac Pro, I just don't have access to it, unlike those other three Macs.
 

ADGrant

macrumors 68000
Mar 26, 2018
1,689
1,059
I just wanted to know how much of a perf. hit Parallels causes and whether changing the number of emulated CPU cores has a positive effect.
Fair point; performance in VMs is always worse than on the bare metal the VM runs on, but it is useful to know by how much.
 

ADGrant

macrumors 68000
Mar 26, 2018
1,689
1,059
I used this Python example because it's platform-independent ( ! ), available in source code form, and can easily be modded.

Well, apart from the extra-expensive Mac Pros, the 2018 MacMini isn't THAT bad - the 2019 16" isn't much better (if at all), perf.-wise...

Of course it would have been best to compare to the latest-and-speediest Mac Pro, I just don't have access to it, unlike those other three Macs.

The problem is that there is a lot going on between the Python code and the CPU. Python also doesn't support multithreading within the same process and forking off processes is very expensive resource wise.

True, Mac Pro does offer the fastest Intel Mac performance at a huge cost but the 27" iMacs are more reasonable and the best performing Intel Macs that don't cost the same as a car.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,628
1,101
@Menneisyys2 Can you create a public repo with your modified script and a markdown file with your results?

The more results this benchmark has, the more valuable it becomes.
 
  • Like
Reactions: Menneisyys2

Menneisyys2

macrumors 603
Original poster
Jun 7, 2011
6,003
1,106
@Menneisyys2 Can you create a public repo with your modified script and a markdown file with your results?

The more results this benchmark has, the more valuable it becomes.
I added them as attachments to this post. They're the same as the code in the article, except for the 20 -> 200 and 2000 changes. Note: I had to change the file extension from .py to .txt to be able to attach them.
 

Attachments

  • 2000.txt
    423 bytes · Views: 87
  • 200.txt
    422 bytes · Views: 81
  • Like
Reactions: Xiao_Xi

leman

macrumors Core
Oct 14, 2008
19,521
19,678
People who care about performance (particularly multi-core performance) don't use Python.

I have to protest! For example, most of my work is done in R. Sure, I could rewrite all of our pipeline in C++ and get a speedup of at least 100x, but it would likely take me years and make everything unmaintainable and undeployable, not to mention breaking workflows for my entire group (who are scientists, not programmers). There are very few organizations that care about performance only; it's usually just one among many other constraints and concerns.

Given that it is not realistic for us to use a different ecosystem, we are very happy that these new Mac laptops run our scripts 3-4x quicker than their Intel predecessors.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,628
1,101
I could rewrite all of our pipeline in C++ and get a speed up of at least 100x, but it will likely take me years and make everything unmaintainable and undeployable
Every data scientist faces the eternal two-language programming problem.

People who care about performance (particularly multi-core performance) don't use Python.
This benchmark may not show the true potential of a computer, but it helps to build a better picture of it.
 
  • Like
Reactions: ahurst

mi7chy

macrumors G4
Oct 24, 2014
10,625
11,296
M1 and Alder Lake seem to do well on these types of workloads. Would like to see Alder Lake results for comparison.

AMD 5950x isn't much faster.

200: 4.4s
2000: 39.5s
 
Last edited:

ADGrant

macrumors 68000
Mar 26, 2018
1,689
1,059
I have to protest! For example, most of my work is done in R. Sure, I could rewrite all of our pipeline in C++ and get a speedup of at least 100x, but it would likely take me years and make everything unmaintainable and undeployable, not to mention breaking workflows for my entire group (who are scientists, not programmers). There are very few organizations that care about performance only; it's usually just one among many other constraints and concerns.

Given that it is not realistic for us to use a different ecosystem, we are very happy that these new Mac laptops run our scripts 3-4x quicker than their Intel predecessors.
All my work is done in C++ though a decent amount of my team's code base is Python (and there is even some R). We are gradually migrating some of the Python to C++.

I would not call C++ unmaintainable or undeployable but I am happy to concede that it is more challenging to work with than Python.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,628
1,101
Some have argued that the solution to this problem is Swift, though Swift for TensorFlow…
Python has some limitations regarding automatic differentiation, so devs use new languages such as Swift or Julia to experiment with automatic differentiation.

Julia devs also claim that Julia solves the two-language problem.
 

ikramerica

macrumors 68000
Apr 10, 2009
1,658
1,961
Every data scientist faces the eternal two-language programming problem.


This benchmark may not show the true potential of a computer, but it helps to build a better picture of it.
Not only that, but I found the exact same result comparing CPU-bound rendering in ArchiCAD using a 16” M1 Pro vs an older-generation 4-core MBP: 3x faster rendering the same view with the same settings on the same model.

And ArchiCAD is not native yet. So that’s using Rosetta 2.
 

ADGrant

macrumors 68000
Mar 26, 2018
1,689
1,059
Python has some limitations regarding automatic differentiation, so devs use new languages such as Swift or Julia to experiment with automatic differentiation.

Julia devs also claim that Julia solves the two-language problem.
Well, both Swift and Julia use the same LLVM toolchain that the Clang C++ compiler also uses to build platform-native binaries. OTOH, Swift, like C++, is statically typed, whereas Julia, like Python, appears to be dynamically typed. Static typing typically provides better runtime performance.
 
  • Like
Reactions: Xiao_Xi

ahurst

macrumors 6502
Oct 12, 2021
410
815
People who care about performance (particularly multi-core performance) don't use Python.
As a scientific researcher I absolutely care about how well Python performs. I mean, unless some code is horrendously slow and easy to optimize in Cython, we’re going to write our analysis pipelines in Python using Numpy and Scipy where possible.

All sorts of data science and scientific workflows are Python-based (for good reason), so it’s a huge practical benefit when those workflows run fast!
 

resoverlord

macrumors newbie
Nov 2, 2021
18
34
As a scientific researcher I absolutely care about how well Python performs. I mean, unless some code is horrendously slow and easy to optimize in Cython, we’re going to write our analysis pipelines in Python using Numpy and Scipy where possible.

All sorts of data science and scientific workflows are Python-based (for good reason), so it’s a huge practical benefit when those workflows run fast!
People who say Python is slow are repeating something that was true more than 10 years ago. Between pandas, NumPy, and SciPy (all of which have compiled C libraries), Python is becoming (or arguably already is) the language of choice for data scientists, stock traders, and others, because of both how robust it is and how fast it is.
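
To illustrate (a minimal sketch, assuming NumPy is installed; this snippet isn't part of the benchmark in this thread): the same sum-of-squares arithmetic, pushed into NumPy's compiled loops, runs far faster than the pure-Python version:

import time

import numpy as np

n = 1_000_000

# Pure-Python loop, like the cpu_bound() function in the benchmark above.
start = time.time()
py_result = sum(i * i for i in range(n))
print(f"pure Python: {time.time() - start:.3f} s")

# Same arithmetic in NumPy's compiled C loops.
start = time.time()
a = np.arange(n, dtype=np.int64)
np_result = int((a * a).sum())
print(f"NumPy:       {time.time() - start:.3f} s")

assert py_result == np_result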
 

resoverlord

macrumors newbie
Nov 2, 2021
18
34
The problem is that there is a lot going on between the Python code and the CPU. Python also doesn't support multithreading within the same process and forking off processes is very expensive resource wise.

True, Mac Pro does offer the fastest Intel Mac performance at a huge cost but the 27" iMacs are more reasonable and the best performing Intel Macs that don't cost the same as a car.
Python supports multithreading. You do have to contend with the GIL, but it is supported.
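
A minimal sketch of the distinction (reusing the benchmark's cpu_bound function in a hypothetical timing harness, not from this thread): threads all run concurrently, but the GIL lets only one of them execute Python bytecode at a time, so CPU-bound code gets little speedup; separate processes each have their own interpreter and GIL, so they scale with the core count:

import multiprocessing
import threading
import time


def cpu_bound(number):
    return sum(i * i for i in range(number))


def run_with_threads(numbers):
    # Threads share one interpreter (and one GIL): CPU-bound work is serialized.
    threads = [threading.Thread(target=cpu_bound, args=(n,)) for n in numbers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def run_with_processes(numbers):
    # Separate interpreters, one GIL each: CPU-bound work scales with cores.
    with multiprocessing.Pool() as pool:
        pool.map(cpu_bound, numbers)


if __name__ == "__main__":
    numbers = [5_000_000] * 8
    for label, fn in (("threads", run_with_threads), ("processes", run_with_processes)):
        start = time.time()
        fn(numbers)
        print(f"{label}: {time.time() - start:.1f} s")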
 
  • Like
Reactions: ahurst

mi7chy

macrumors G4
Oct 24, 2014
10,625
11,296
The libraries I mentioned are written in C and accessed via python code. Artificial benchmarks aren’t telling the whole story.

It's not really Python then if it's calling C code. Almost forty years ago we'd call assembly language or machine language code from slow AppleSoft BASIC. That's not AppleSoft BASIC either.
 

Menneisyys2

macrumors 603
Original poster
Jun 7, 2011
6,003
1,106
Added a section (see "UPDATE (09/Jan/2022 02:04CET)") on VLC's ARM vs x86 + Rosetta 2 emulation results on the same 16" base model.
 