
dburkhanaev

macrumors 6502
Aug 8, 2018
295
170
Intel Macs (at least 2017 and newer) will get more than two OS releases. When the PPC-Intel transition occurred, OS releases were roughly every two years. Now Apple has moved to an annual release cycle, and given that ~2012 is the cutoff for Big Sur, it stands to reason that a current Mac should easily get four OS updates before support is killed off.

I’m sure it will. I know the timeframe for OS update cycles now as opposed to back then. But four OS updates is still ridiculous considering the MacBook Pro I just upgraded from supported seven and will support Big Sur, for a total of eight macOS updates since the computer was released. And my PowerBook should have had more than two. It was powerful enough and sophisticated enough that it would have, if Apple weren’t already in the process of abandoning the hardware. I don’t fault Apple for making the move. I get that it is good for them and potentially for the customers. Only time will tell if it will be good for all existing Apple customers moving forward, but I digress.

The point that I’m poorly getting around to is that Apple gave a two-year transition window from PPC to Intel, but made it through in just over a year and dumped very expensive, capable hardware much more quickly than they would have if they had stuck with the platform. And that green-lit the way for developers to start doing the same.

I get the reason for abandoning Intel. But as much as it might hurt from Apple’s side, I think they need to commit to supporting the Intel computers they release for a minimum of eight OS releases from the last hardware that ships. It’s not going to be the end of the world, and for many of us it won’t be the end of our owning Apple products. But I have a lot invested in the company, its ecosystem, and the platform of services attached. Instead of deciding that they’ve made a token effort after four years, maybe they can stick with it a bit longer on the software side as well as the hardware side. Maybe they will keep producing apps as universal binaries with long-term Apple support, and perhaps that will encourage developers to do the same.
 

thekev

macrumors 604
Aug 5, 2010
7,005
3,343
That means all those who spend $50k or more on an Intel 2019 Mac Pro are stuck with a paperweight.

^Given the original question, this is unhelpful advice.

If the OP required resources at that level, they would be better off using an existing cluster, either one managed by whatever institution they are affiliated with or an existing cloud service. The edge cases where it makes sense to have a maxed-out configuration for this kind of work are better supported on
 

konqerror

macrumors 68020
Dec 31, 2013
2,298
3,701
To make it clear: nothing has been officially confirmed. But if Apple means business in the pro space, they have to work on their SIMD.

No they don't.

First, you have to cut through the marketing crap and understand that SIMD is a compromise: fetching, decoding, and scheduling an instruction has a cost, so you can gain by doing it once for multiple pieces of data. However, you get inefficiencies when you can't fill the whole vector, either because of your problem, because of conditionals, or because you have to wait until the data shows up.

If there were no per-instruction costs, then having 8 independent ALUs/EUs would always match or outperform an 8-way vector.

Intel is the only design in the industry that's so wide, because the fetch and decode of x86 is so expensive. On most CPUs, AVX-512 doesn't even buy you anything in compute: they just fuse the two AVX units together. Again, the gain is putting half as many instructions through the front end.

All of the RISC players seem to be happy with 128-bit vectors, because their costs are different. Look at POWER9. Instead of Intel's approach of executing 1 512-bit vector, it handles 4 128-bit vectors at a time.
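To make the fill problem concrete, here is a minimal sketch using AVX intrinsics (the function and array names are just illustration, not anything from this thread): the main loop pays one fetch/decode for every eight floats, but anything left over that can't fill a full vector drops back to scalar code, which is exactly the inefficiency described above.

#include <immintrin.h>
#include <stddef.h>

/* c[i] = a[i] + b[i]: eight floats per AVX instruction, scalar tail for the rest. */
void add_arrays(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {                 /* full 8-wide vectors */
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(c + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i)                           /* leftovers: back to scalar */
        c[i] = a[i] + b[i];
}

The same shape shows up with conditionals: lanes that fail the condition still occupy slots in the vector, so the wider the vector, the more work you can end up throwing away.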
 
  • Like
Reactions: BigSplash

dburkhanaev

macrumors 6502
Aug 8, 2018
295
170
People keep saying this, but the majority of Mac Pro purchases are by enterprises who, if they're spending $20k, must feel they need top performance, so in 4-5 years they will need to buy a new machine anyway to keep having the best possible performance. If you are working in machine learning, etc., you will always need to be on the cutting edge.

I don’t know what organizations you have worked for that chuck a $50k computer/workstation once it’s not capable of pulling bleeding edge performance for the workload they were purchased for. But my experience is a lot different with corporations, especially coming from the finance side of business.

A high-dollar asset has to be depreciated somehow, and typically computers aren't depreciated at a rate of $10k per year such that they'd be tossed after five years. The role of the computer is just shifted to other tasks so that it can still be a useful asset.

An example I can give is a company I worked for not too long ago. They are a logistics engineering firm. They design, model, and build the automation systems used by logistics companies: all of those automated eyes connected to sorters that push product down to conveyors, plus the pick robots and, soon, the driverless forklifts, etc.

But they had low-tier laptops with docks and dual monitors for people like me in administration, procurement, finance, etc., and then they had the beastly towers attached to two or three very large hi-res displays. They would use less intensive products for CAD, but more intensive stuff to 3-D render a whole warehouse and then animate it. That six-year-old tower, a massive workhorse that cost the company something like $30k in 2013, was put in a caddy under the desk of the accounts payable clerk.

My point is that a company will donate, sell, or dispose of/recycle computers eventually, but an expensive rig will be downgraded to other tasks until it loses that value and is no longer a productive asset. If a company can’t load an office suite and run an updated browser and RDC on a $50k computer 3-5 years after they buy it, then it just won’t typically be seen as a good purchase from a corporate point of view.
 

konqerror

macrumors 68020
Dec 31, 2013
2,298
3,701
A high-dollar asset has to be depreciated somehow, and typically computers aren't depreciated at a rate of $10k per year such that they'd be tossed after five years.

We do. The engineering software we run on workstations starts at $30k per year per seat, and goes up to $90k. The PhD running them has a loaded cost of $200k per year. The price of the computer is noise.

That six-year-old tower, a massive workhorse that cost the company something like $30k in 2013, was put in a caddy under the desk of the accounts payable clerk.

False economy. We worked this out. The workstations burn 200 watts just sitting idle. At 12 cents a kWh, that costs $210 a year, or $630 over a 3-year lifecycle. We can get base-model admin computers new for $600, so nothing has been saved. Not to mention the noise, the IT guys spending time fiddling with RAID arrays and buying SAS drives for a simple admin machine, the old systems missing a number of security features (TPM 2, HVCI), and the complaint that the thing hogs cubicle space whereas everybody else has a USFF or an all-in-one.
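For what it's worth, the arithmetic behind those figures checks out; a quick sketch with the numbers above:

#include <stdio.h>

int main(void)
{
    double idle_watts = 200.0;
    double kwh_per_year = idle_watts * 24.0 * 365.0 / 1000.0;   /* 1752 kWh */
    double cost_per_year = kwh_per_year * 0.12;                 /* ~$210 */
    printf("per year: $%.2f, over 3 years: $%.2f\n", cost_per_year, 3.0 * cost_per_year);
    return 0;
}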
 
  • Like
Reactions: leman

dburkhanaev

macrumors 6502
Aug 8, 2018
295
170
We do. The engineering software we run on workstations starts at $30k per year per seat, and goes up to $90k. The PhD running them has a loaded cost of $200k per year. The price of the computer is noise.



False economy. We worked this out. The workstations burn 200 watts just sitting idle. At 12 cents a kWh, that costs $210 a year, or $630 over a 3-year lifecycle. We can get base-model admin computers new for $600, so nothing has been saved. Not to mention the noise, the IT guys spending time fiddling with RAID arrays and buying SAS drives for a simple admin machine, the old systems missing a number of security features (TPM 2, HVCI), and the complaint that the thing hogs cubicle space whereas everybody else has a USFF or an all-in-one.

I’m hearing you, but the costs aren’t typically worked out that way in an enterprise. I haven’t worked in a field with your use case, so I don’t know how those decisions get made. And in practice companies all do things similarly, unless they don’t. Another company I worked for designed end-user software for property management, and they took all two-year-old laptops out of rotation and sold them to employees for $50. That’s a waste of money in my opinion. But at the first company I mentioned (the one I worked at last), depreciation was based on the sum-of-digits method and the straight-line method. It’s a calculation spread over time, used to expense the item until it falls below a positive salvage value, at which point the business would simply write it off and the machine would be wiped and hauled off as garbage.
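As a rough sketch of the two methods just mentioned (the cost, salvage value, and lifespan below are made-up numbers, not anything from that company):

#include <stdio.h>

/* Straight-line vs. sum-of-years'-digits depreciation for a hypothetical
   $30,000 workstation with a $2,000 salvage value over five years. */
int main(void)
{
    double cost = 30000.0, salvage = 2000.0, base = cost - salvage;
    int years = 5, sum_of_digits = years * (years + 1) / 2;     /* 1+2+3+4+5 = 15 */

    for (int y = 1; y <= years; ++y) {
        double straight_line = base / years;                    /* same $5,600 every year */
        double syd = base * (years - y + 1) / sum_of_digits;    /* front-loaded: $9,333, $7,467, ... */
        printf("year %d: straight-line $%.0f, sum-of-digits $%.0f\n", y, straight_line, syd);
    }
    return 0;
}

Either way the book value only hits salvage at the end of the schedule, which is why the machine tends to stay on the books (and under somebody's desk) that long.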

I would have been given the joy of depreciating and writing it off. Businesses will hold onto things far longer than their initial use cases unless you’re in a company that just doesn’t.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,675
No they don't.

First, you have to cut through the marketing crap and understand that SIMD is a compromise: fetching, decoding, and scheduling an instruction has a cost, so you can gain by doing it once for multiple pieces of data. However, you get inefficiencies when you can't fill the whole vector, either because of your problem, because of conditionals, or because you have to wait until the data shows up.

If there were no per-instruction costs, then having 8 independent ALUs/EUs would always match or outperform an 8-way vector.

Oh, we are in complete agreement on this. What I mean is that Apple currently offers three 128-bit EUs, and that is just not enough for pro-level data-parallel workflows.

Personally, I am not a fan of Intel's approach — making separate ISA sub-extensions for various vector widths seems like a waste to me, especially given that you start running into all kinds of weird behavior when using these extensions (like how the transition between AVX and SSE stalls everything, or how you get reduced clocks when using wider vectors). You end up doing a lot of crap like runtime feature detection and function dispatch; in the end this is unnecessary engineering cost and the performance is hampered.
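For anyone who hasn't had to write it, the dispatch boilerplate being complained about looks roughly like this (a sketch using GCC/Clang's __builtin_cpu_supports; the saxpy kernels are hypothetical stand-ins, not real library code):

#include <stddef.h>

/* Stand-ins for per-ISA kernels; each real one would carry its own intrinsics. */
static void saxpy_avx512(float a, const float *x, float *y, size_t n) { for (size_t i = 0; i < n; ++i) y[i] += a * x[i]; }
static void saxpy_avx2(float a, const float *x, float *y, size_t n)   { for (size_t i = 0; i < n; ++i) y[i] += a * x[i]; }
static void saxpy_sse2(float a, const float *x, float *y, size_t n)   { for (size_t i = 0; i < n; ++i) y[i] += a * x[i]; }

/* Runtime feature detection plus function dispatch: pick a kernel based on
   what the CPU reports, because the widest ISA is not guaranteed to be there. */
void saxpy(float a, const float *x, float *y, size_t n)
{
    if (__builtin_cpu_supports("avx512f"))
        saxpy_avx512(a, x, y, n);
    else if (__builtin_cpu_supports("avx2"))
        saxpy_avx2(a, x, y, n);
    else
        saxpy_sse2(a, x, y, n);
}

Multiply that by every hot function and you get the engineering cost being described.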

ARM64 Advanced SIMD is a very nice instruction set, but for many data-crunching applications, chaining 128-bit SIMD operations is just not the most efficient thing (code-wise). SVE is the way to go and I really hope that Apple adopts it — this would allow the software to be built once and automatically take advantage of all the vector units on the machine, no matter where it is running. As to how it's implemented in hardware — I agree with you that having, say, 4 128-bit units that can be scheduled independently is much more flexible.
 

konqerror

macrumors 68020
Dec 31, 2013
2,298
3,701
I would have been given the joy of depreciating and writing it off. Businesses will hold onto things far longer than their initial use cases unless you’re in a company that just doesn’t.

The companies that I have worked for have no problem cycling old hardware, thanks to one simple policy: all business-critical hardware and software must be manufacturer-supported, both for reliability (same- or next-day repair) and security (BIOS and driver updates for PCs). The latter point is non-negotiable when you're dealing with regulated data and third-party IP.

Since PC manufacturers have 3-year standard/5-year max service contracts, that sets the upper limit. Macs have a shorter lifespan since AppleCare only goes to 3 years.
Personally, I am not a fan of Intel's approach — making separate ISA sub-extensions for various vector widths seems like a waste to me, especially given that you start running into all kinds of weird behavior when using these extensions (like how the transition between AVX and SSE stalls everything, or how you get reduced clocks when using wider vectors). You end up doing a lot of crap like runtime feature detection and function dispatch; in the end this is unnecessary engineering cost and the performance is hampered.

From observing Intel's own MKL, it doesn't use AVX-512 very often, even on systems with the second FMA. I think it's more marketing than useful, which is really 80% of new Intel features.

Ultimately, if your problem really can take advantage of vector, then go GPU. I suspect that AVX-512 is the worst of both worlds.
 
Last edited:

thekev

macrumors 604
Aug 5, 2010
7,005
3,343
All of the RISC players seem to be happy with 128-bit vectors, because their costs are different. Look at POWER9. Instead of Intel's approach of executing 1 512-bit vector, it handles 4 128-bit vectors at a time.

The ARM approach is easier to work with, assuming it's implemented sanely. Both Intel architectures (Haswell and later) and a lot of CPUs that support some version of Neon have many similarities in expected latency/throughput numbers on common operations, such as fused multiply-add. Intel has been doing the weird thing of adding an increasing number of vector widths while keeping lanes at 128 bits. Cross-lane operations such as shuffles then have higher latency costs than their within-lane counterparts, which is pure garbage.

Other things look completely different when comparing solutions across the various Intel SIMD ISAs. Anything tailored to AVX tends to favor high unrolling factors with unaligned memory ops folded in wherever possible, if you ignore Sandy Bridge, Ivy Bridge, and AMD. The older SSE forms favor packed aligned instructions wherever possible to avoid unaligned memory movement.

With AVX-512, they split things even further, to the point where it's very difficult to write anything portable by hand. You're either writing very specific per-ISA solutions for code that requires it or leaving it up to the compiler's vectorizer, which can still be very hit-and-miss, even on the latest versions of GCC and Clang.

ARM (or rather Neon) is actually fairly elegant there by comparison. If it becomes a more mainstream option beyond just Apple, I'll probably go that way myself.

From observing Intel's own MKL, it doesn't use AVX-512 very often, even on systems with the second FMA. I think it's more marketing than useful, which is really 80% of new Intel features.

Ultimately, if your problem really can take advantage of vector, then go GPU. I suspect that AVX-512 is the worst of both worlds.

It should depend on how you compile and link it. For the longest time they recommended dynamic linkage as a weak form of runtime dispatch. It seems to throttle clock rates somewhat whenever AVX-512 is used, so they may be more cautious about that.

How did you track its use there though?
 

leman

macrumors Core
Oct 14, 2008
19,521
19,675
ARM (or rather Neon) is actually fairly elegant there by comparison. If it becomes a more mainstream option beyond just Apple, I'll probably go that way myself.

Did you have a look at SVE? Vector-width-agnostic code is really neat: no need to unroll anything, no need to treat the last few elements in a special way — the ISA just takes care of it for you.
That would be the gamers with their "muh single core speedz"...

What does it have to do with games? Higher IPC > number of cores. Put differently, a dual-core CPU that can do 2 instructions per cycle per core will have better real-world performance than a quad-core CPU with 1 instruction per cycle per core. One of the reasons why I am excited about ARM Macs is precisely that they will offer better single-core performance.
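A quick Amdahl's-law sketch of why that tends to hold (the 80% parallel fraction is an arbitrary assumption, and it charitably assumes the IPC advantage applies to all of the work):

#include <stdio.h>

/* Speedup over a 1-core, 1-IPC baseline when a fraction p of the work
   parallelizes perfectly and per-core throughput scales with IPC. */
static double speedup(double p, double cores, double ipc)
{
    return ipc / ((1.0 - p) + p / cores);
}

int main(void)
{
    double p = 0.8;                                              /* assume 80% parallel */
    printf("2 cores x 2 IPC: %.2fx\n", speedup(p, 2.0, 2.0));    /* ~3.33x */
    printf("4 cores x 1 IPC: %.2fx\n", speedup(p, 4.0, 1.0));    /* ~2.50x */
    return 0;
}

The serial 20% is what the extra cores can't touch, and that is where the higher-IPC part wins.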
 
Last edited:
  • Like
Reactions: thekev

thekev

macrumors 604
Aug 5, 2010
7,005
3,343
Did you have a look at SVE? Vector-width-agnostic code is really neat: no need to unroll anything, no need to treat the last few elements in a special way — the ISA just takes care of it for you.

That looks like a very pretty solution, especially from the compiler end. It mentions multiples, though, so it might not completely save you from peeling the last few elements. I just googled for it, which yielded a paper entitled "The ARM Scalable Vector Extension". It sounds very nice, if that's what you're referring to.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,675
That looks like a very pretty solution, especially from the compiler end. It mentions multiples, though, so it might not completely save you from peeling the last few elements. I just googled for it, which yielded a paper entitled "The ARM Scalable Vector Extension". It sounds very nice, if that's what you're referring to.

SVE uses masking, which allows you to use only fractions of the available SIMD registers when needed. Basically, you loop over multiples of the reported vector width (which is determined at runtime) and the last iteration uses a partially masked operation. See an example starting on page 17 of these slides.

The beauty of the system: you don't need to write different code for a machine with a different vector width. You could write an algorithm and run it on an iPhone (with its 128-bit vectors) or on some sort of ARM super-CPU with 2048-bit vectors. This radically simplifies debugging. I really hope that SVE becomes widespread; it is amazing for scientific computing.
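Roughly what that masked loop looks like with the SVE C intrinsics (an untested sketch of the pattern, not code taken from the slides):

#include <arm_sve.h>
#include <stdint.h>

/* y[i] += a * x[i], without knowing the hardware vector width at compile time.
   svwhilelt_b32 builds a predicate that is all-true for full vectors and masks
   off the leftover lanes on the final pass. */
void saxpy_sve(float a, const float *x, float *y, int64_t n)
{
    for (int64_t i = 0; i < n; i += svcntw()) {    /* svcntw() = floats per vector */
        svbool_t pg = svwhilelt_b32(i, n);
        svfloat32_t vx = svld1_f32(pg, x + i);
        svfloat32_t vy = svld1_f32(pg, y + i);
        vy = svmla_n_f32_m(pg, vy, vx, a);         /* vy += a * vx on active lanes */
        svst1_f32(pg, y + i, vy);
    }
}

The same binary runs with 128-bit vectors or 2048-bit vectors; only svcntw() and the predicate change.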
 
  • Like
Reactions: thekev

thekev

macrumors 604
Aug 5, 2010
7,005
3,343
SVE uses masking, which allows you to use only fractions of the available SIMD registers when needed. Basically, you loop over multiples of the reported vector width (which is determined at runtime) and the last iteration uses a partially masked operation. See an example starting on page 17 of these slides.

The beauty of the system: you don't need to write different code for a machine with a different vector width. You could write an algorithm and run it on an iPhone (with its 128-bit vectors) or on some sort of ARM super-CPU with 2048-bit vectors. This radically simplifies debugging. I really hope that SVE becomes widespread; it is amazing for scientific computing.

Oh man, those assembly snippets are so pretty. It's not just debugging. Can you imagine how much simpler a compiler's vectorization planner would be for that kind of back-end target? If you had that + a few standardized directives to indicate expected aliasing and which loop to unroll, you could end up with a really nice optimizer.
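The closest thing available today is roughly this kind of sketch: restrict carries the aliasing promise and an OpenMP simd pragma carries the "please vectorize this loop" hint, though how well the planner does with it is still very much per-compiler.

#include <stddef.h>

/* restrict tells the compiler dst and src never alias; the pragma asks it to
   vectorize the loop instead of guessing whether it is worthwhile. */
void scale_add(float * restrict dst, const float * restrict src, float k, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        dst[i] += k * src[i];
}

(Built with -fopenmp-simd or -fopenmp; without either, the pragma is simply ignored.)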
 

vigilant

macrumors 6502a
Aug 7, 2007
715
288
Nashville, TN
Hi all,
apologies for the n00b question: invest in the Mac Pro now, or wait for an ARM version to come out?
I'm an empirical researcher working with large datasets, statistics and machine learning (Python, Matlab, C++).
I was about to order a maxed out Mac Pro for my work (research funding), and hoped to be comfortable for the next 5-7 years.
But ARM might topple this. What are the implications for software (like the statistical programs), and could I upgrade the motherboard later on and keep using the current components (RAM, SSD)?
thanks for your help.

TL;DR: Buy the Mac Pro today. Apple made that thing to last.

Long: I have no doubt Apple will easily leapfrog the Mac Pro in two years. The problem is there's no guarantee how soon third-party companies are going to prioritize it. I expect an ARM Mac Pro in a year at the earliest. I don't know if the Apple Silicon effort is far enough along for us to start thinking about an 80-core ARM Mac Pro. I'm sure it's absolutely on the table to be designed, but I seriously doubt such a chip is close to completing the design phase, let alone the prototype phase. It may be in the prototype phase, but these things take so damn long to develop.
 

Hexley

Suspended
Jun 10, 2009
1,641
505
Hi all,
apologies for the n00b question: invest in the Mac Pro now, or wait for an ARM version to come out?
I'm an empirical researcher working with large datasets, statistics and machine learning (Python, Matlab, C++).
I was about to order a maxed out Mac Pro for my work (research funding), and hoped to be comfortable for the next 5-7 years.
But ARM might topple this. What are the implications for software (like the statistical programs), and could I upgrade the motherboard later on and keep using the current components (RAM, SSD)?
thanks for your help.
Are you willing to wait until WWDC 2021?

Are you willing to wait beyond that date for your apps to have Universal 2 binaries?
 

Glenn_Magerman

macrumors newbie
Original poster
Jun 30, 2020
9
1
I went for the maxed-out Mac Pro Rack in July, and I'm loving every minute of that decision!

I'm running large analyses (statistics and network analysis) that were just impossible a few months ago, or that would take days per code block to run at best. I have full control over my machine and do not depend on intermediaries like IT admins to tweak settings for particular projects (software, credentials, CRAN...). All this even while our university is scaling up from HPC to HDPA infrastructure. I'm also often dependent on proprietary software, and it's unclear when these providers will move ahead with new binaries.

Much of this is about staying in a state of flow: when you're in a project, you want to move ahead, not stop and go because of admin/sudo tweaks.

Also, it's much easier to deal with data confidentiality on a separate machine than by using external resources like AWS (but perhaps my understanding of this topic is too shallow).

Now, most of the bottlenecks I face are (i) deployability, RAM allocation, and the speed of the software (e.g. compiled vs. non-compiled languages and multicore processing, but also large graph visualisation tools), and (ii) my own coding efficiency :) . Areas I should invest in...
 