I'd guess that a lot of video editing and transcoding programs use x86-specific vector acceleration, and that will need to be ported for best performance on the ARM stuff unless Apple has a layer that makes that transparent. Most compilers added intrinsics about ten years ago so you didn't have to program in assembler or worry about registers, but the intrinsics themselves are geared to x86 architectures.

ARM has a full set of intrinsics for various versions of Neon. It's not a complete 1:1 mapping compared to Intel's interface.
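To make the "not 1:1" point concrete, here is a minimal sketch of the same loop written against both intrinsic sets, with a scalar fallback. The function name `add_f32` is made up for illustration, and the preprocessor guards are just one assumed way of picking a path at compile time.

```c
#include <stddef.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#elif defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for n floats. */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__aarch64__)
    /* Neon: typed vectors (float32x4_t), names carry the element type. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#elif defined(__SSE2__)
    /* SSE: __m128, with separate aligned/unaligned load and store names. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail, also the whole loop when neither path is compiled in. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Simple arithmetic like this maps almost mechanically. The mismatches tend to show up in shuffles, saturating ops, and anything wider than 128 bits.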


Clang also implements the OpenMP `#pragma omp simd` directive, but I don't know if Apple's fork supports it. They don't seem to bundle an OpenMP runtime with Xcode, but `#pragma omp simd` doesn't need to link against one anyway.
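For anyone who hasn't used the directive, a minimal sketch follows. The function name `saxpy` is illustrative. Upstream GCC and Clang enable just the SIMD directives with `-fopenmp-simd`, which needs no runtime; whether Apple's build accepts that flag is an assumption I haven't checked. Without the flag the pragma is simply ignored and the loop still runs.

```c
#include <stddef.h>

/* Hypothetical helper: y[i] += alpha * x[i]. */
void saxpy(float alpha, const float *restrict x, float *restrict y, size_t n)
{
    /* Asks the compiler to vectorize this loop; no intrinsics, so the
       same source targets SSE/AVX or Neon depending on the build. */
#pragma omp simd
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}
```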

As far as writing in intrinsics, I prefer ARM here.

Intel has been kind of a mess. Right now they support 128, 256, and 512 bits. Some architectures only go up to 256. Very old architectures only support 128, but some of them still exist in inexpensive units that are sold today. The 128-bit types have 2 kinds of instructions, the newer VEX-prefixed 3-operand ones and the older 2-operand ones. Intrinsics do an imperfect job of handling this, as the correspondence between VEX-prefixed and non-prefixed instructions isn't perfect, even for 128-bit widths.

Complicating things further, Intel makes a lot of incremental changes to which instructions will optionally micro-fuse with a load operation. While you don't deal with that directly when writing in intrinsics, it's common to unroll by hand in intrinsics-based code, since compilers are very inconsistent in what/how they will unroll these and you typically need multiple independent ops to fully hide latency. Micro-fusion impacts how far you can go, as compilers may spill additional values into stack-based copies.
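A minimal sketch of the hand-unrolling pattern, assuming a dot product as the workload. `dot_f32` is a made-up name, and two accumulators is an arbitrary choice for illustration; the right count depends on the latency and throughput of the target core.

```c
#include <stddef.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#elif defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: dot product with two independent accumulators. */
float dot_f32(const float *a, const float *b, size_t n)
{
    size_t i = 0;
    float sum = 0.0f;
#if defined(__aarch64__)
    float32x4_t acc0 = vdupq_n_f32(0.0f);
    float32x4_t acc1 = vdupq_n_f32(0.0f);
    for (; i + 8 <= n; i += 8) {
        /* acc0 and acc1 do not depend on each other, so the two
           multiply-adds can overlap instead of waiting in a chain. */
        acc0 = vfmaq_f32(acc0, vld1q_f32(a + i),     vld1q_f32(b + i));
        acc1 = vfmaq_f32(acc1, vld1q_f32(a + i + 4), vld1q_f32(b + i + 4));
    }
    sum = vaddvq_f32(vaddq_f32(acc0, acc1));
#elif defined(__SSE2__)
    __m128 acc0 = _mm_setzero_ps();
    __m128 acc1 = _mm_setzero_ps();
    for (; i + 8 <= n; i += 8) {
        acc0 = _mm_add_ps(acc0, _mm_mul_ps(_mm_loadu_ps(a + i),
                                           _mm_loadu_ps(b + i)));
        acc1 = _mm_add_ps(acc1, _mm_mul_ps(_mm_loadu_ps(a + i + 4),
                                           _mm_loadu_ps(b + i + 4)));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, _mm_add_ps(acc0, acc1));
    sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif
    /* Scalar tail for the leftover elements. */
    for (; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

With 16 register names, pushing the unroll factor much further is where spills start to appear; with 32 names there is more headroom.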

ARM does not expose micro-fusion at the assembly layer. They expose 32 register names instead (AVX-512 also does this, but it's the least common), which tends to accommodate things that would require micro-fusion with 16 names. They don't have opcodes or intrinsics for cross-lane shuffles, because their register width matches their lane width. Restricting Intel to 128 bits results in sub-optimal performance.

Neon doesn't have masked load / write, but it provides clean options for scalar loads and stores to/from SIMD register names.
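As a rough sketch of what those scalar lane accesses look like, the Neon path below uses `vld1q_lane_f32` and `vst1q_lane_f32`. The helper name `patch_lane2` and the scalar fallback are assumptions added so the snippet builds on non-ARM machines too.

```c
#include <stddef.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: copy 4 floats, replace lane 2 with *scalar,
   and write lane 0 of the result to *first. */
void patch_lane2(const float *vec_in, const float *scalar,
                 float *vec_out, float *first)
{
#if defined(__aarch64__)
    float32x4_t v = vld1q_f32(vec_in);
    v = vld1q_lane_f32(scalar, v, 2);  /* scalar load into one lane */
    vst1q_f32(vec_out, v);
    vst1q_lane_f32(first, v, 0);       /* scalar store from one lane */
#else
    /* Scalar fallback with the same observable result. */
    for (size_t i = 0; i < 4; ++i)
        vec_out[i] = vec_in[i];
    vec_out[2] = *scalar;
    *first = vec_out[0];
#endif
}
```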


I could go on. I don't mind using Intel intrinsics for a small amount of code where necessary, but I don't see this as a drawback for ARM. Rather, I think ARM allows for simpler vectorization back ends and cleaner use of intrinsics.
 
Development on intrinsics was a mess as, from my perspective, it was led by the compilers, not Intel. There is a lot of inconsistency in the SIMD instruction sets and sometimes it's frustrating doing something that should be simple but isn't because you're missing something or an instruction doesn't do precisely what you want it to do.

There are times when I didn't like what the compiler was doing so I would drop down to assembler for precision.

I recall that AltiVec had similar issues in that you had instructions with varying numbers of operands and, as I recall, quite a few of them.

I would love to have a matrix type as well. Say, something like an 8x8 matrix of integers that you could perform an operation on in parallel.
 
I'm worried that if I keep going into too much detail, this will be deleted as off topic. I guess what I'm thinking here is that with compiler advancements over the last few years and ARM's instruction set being a bit cleaner, it should be more manageable to port SSE/AVX to Neon than the other way around. Clang 9 and 10 improved quite a lot, presumably in response to GCC pulling significantly ahead of them for a bit there. It's worth noting that the vectorization planner probably helped too.


I don't know if it made its way into Apple's build.

Compilers rarely implement that sort of thing. Apple might. It's hard to say. They have Accelerate. If they added inlinable primitives for matrices and things, it might be enough. Projects like libflame have shown that you can get good performance with smaller matrix op kernels than something like OpenBLAS uses.
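To show roughly what an inlinable small-matrix primitive could look like, here is a sketch in plain C. The type `mat8x8_i32` and both functions are hypothetical and are not part of Accelerate or any compiler; the idea is only that fixed 8x8 trip counts give the optimizer an easy target for unrolling and vectorizing.

```c
#include <stdint.h>

/* Hypothetical fixed-size matrix type: 8x8 32-bit integers. */
typedef struct { int32_t m[8][8]; } mat8x8_i32;

/* Element-wise add; small enough to inline at the call site. */
static inline mat8x8_i32 mat8x8_add(const mat8x8_i32 *a, const mat8x8_i32 *b)
{
    mat8x8_i32 r;
    for (int i = 0; i < 8; ++i)
        for (int j = 0; j < 8; ++j)
            r.m[i][j] = a->m[i][j] + b->m[i][j];
    return r;
}

/* Matrix product; the i-k-j order keeps the inner loop walking
   contiguous rows of b and r. */
static inline mat8x8_i32 mat8x8_mul(const mat8x8_i32 *a, const mat8x8_i32 *b)
{
    mat8x8_i32 r = {{{0}}};
    for (int i = 0; i < 8; ++i)
        for (int k = 0; k < 8; ++k)
            for (int j = 0; j < 8; ++j)
                r.m[i][j] += a->m[i][k] * b->m[k][j];
    return r;
}
```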

 
I think that I will leave it at this as I can get lost in a thread and go way off topic.
 
After today's big news:

This is a big vendor for MS to buy in gaming. Upcoming new games, Elder Scrolls VI, ESO.

I wonder if the Mac client for ESO will just run forever on Rosetta. They've been slow to keep it up to date as is (although it runs well these days on Mac.)

It will depend on how the Mac client for ESO was written and how large a demand there is for what is now a six-year-old MMO on the Mac. Sometimes ports are written in such a way that you basically have to recode the whole thing. Take The Sims 3, which had an announcement back in Oct 2019 of a 64-bit clean version that would run on Metal; there is a lot of work involved in doing that. Yes, it is 5 years older than ESO, but it shows what may be involved in bringing an old game up to modern standards.

One hopes that with money-rich Microsoft running things, Fallout 76 will keep moving away from being a freaking garbage fire.
 
So many months since Apple's initial announcement about ARM/Apple silicon,
and still we wait to see the first machine with such chips...

I guess their whole timeline has been delayed a little bit...
 
Not exactly as we have "seen" the Developer Transition Kit which has an ARM/Apple silicon CPU. :) :p

Moreover, Apple hasn't told us anything more than that the first commercial ARM/Apple silicon Mac is to be announced before the end of 2020. Everything else is rumor.
 
Source?

Next week's event may or may not have Macs. Definitely hoping for yes.
I’m seeing great discounts on 2020 13” so it makes me wonder if something’s coming.

But even if AS was released and available next week I will stick with the 13” i5 10th Gen I bought yesterday. I will be running Windows under Parallels and don’t want any hassles.
 

Misread a rumor as an Apple announcement (oops). The reasoning presented in one article was that Apple would want to aim for the Christmas crowd. That said, with the way COVID has messed things up, I think this will be the worst Christmas, retail-wise, seen in decades.

Everybody and his brother seems to be chomping at the bit for an October announcement of an AS Mac. However, looking over Apple's announcement history, I see the last time they announced new Macs in October was 2016. Well, we will know soon. I wonder if Apple will do both a November and a December announcement or if they will just combine the two.
 
Given COVID problems peppering supply chains - I'd rather they take their time and get everything right.

They're probably trying to deal with the T2 problems too.
 
It seems most of the publications are reporting that the event next week will be iPhones and HomePod, maybe with a "now available" for the iPad Air announced last month, and that a November event is likely for new Macs. Apple is moving to fewer product intros per event and more events, since they are virtual.

One note about the November Macs event:
 
More events? How do you figure that? Counting October Apple will have had three events this year. With the exception of 2017 Apple has had three to four events a year. Unless Apple has an event for November and December they will have had four this year - same number they had in 2019 and 2018.

Darn, I really wanted to see what ARM Macs we would get. Got to wait another month. Joy.
 
Next week will not be Macs IMO. That would take away from the iPhone. The latest rumors say November, but keep in mind this is rumor, not anything official coming from Apple. Apple's official timeline is before the end of the year, so by December 31st, 2020.
 
This was a post-reopening/pandemic decision. They will likely have events 3 fall months in a row this year. And if the reports are correct we will see more virtual events after that. I can't ever remember them doing that.
 
Ah, I see. Yes, having more virtual events makes perfect sense. They have to be cheaper than physical events for everyone concerned.
 
What is the current status on that Apple vs Pepper class action lawsuit against Apple's App Store antitrust behavior? The one where the Supreme Court ruled 5-4 in favor of letting it proceed last year (May 2019)?
 