
[Charts: M5 vs. all Pro chips up to the M4 Pro; M5 vs. all Max chips up to the M4 Max; M5 vs. all Ultras up to the M3 Ultra]
 
Idk man. To get 512 GB of VRAM with a Mac Studio, you are paying at least $10,000. So 1.5 TB of unified memory means three Mac Studios, roughly $30,000.

With NVIDIA, you need about 8 of these B200 chips, which would be around $240,000 to $400,000.

However, this doesn't tell the entire story, because those NVIDIA chips do around 1,000 tokens/second while three Mac Studios would only do 54 tokens/second on the same LLM.

So an entire B200 system (yes, I know, the upfront costs are high as f***) is actually way cheaper than Mac Studios per token for 1.5 TB of VRAM.

And I'm not even talking about the limitations of trying to run LLMs on a Mac relative to the industry standard, which is NVIDIA.
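To put rough numbers on it, here's the back-of-the-envelope math in Python, using nothing but the figures quoted above (all of them rough assumptions, none of them measured benchmarks):

```python
# Back-of-the-envelope cost per unit of decode throughput, using the rough
# figures quoted above. None of these numbers are measured benchmarks.

def dollars_per_token_per_second(system_cost_usd, tokens_per_second):
    """Upfront hardware cost divided by sustained tokens/second."""
    return system_cost_usd / tokens_per_second

# ~3 Mac Studios with 512 GB each, for ~1.5 TB of unified memory
mac_cluster = dollars_per_token_per_second(30_000, 54)

# ~8x B200 system, taking the low end of the quoted price range
b200_system = dollars_per_token_per_second(240_000, 1_000)

print(f"Mac Studio cluster: ~${mac_cluster:,.0f} per token/s")  # ~$556 per token/s
print(f"8x B200 system:     ~${b200_system:,.0f} per token/s")  # ~$240 per token/s
```

Upfront cost per unit of decode throughput obviously isn't the whole economics (power, utilization, depreciation), but it's the comparison being made here.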
 
Nobody is claiming that the Mac Studio, even a future M5 Ultra, replaces big-iron chips; hence the main comparators are GB10 and Strix Halo. This is about people running models locally on their own devices, not in the cloud, or at most at small inference providers (the kind that have been building stacks of minis for inference). People in the market for multiple Mac Studios are very unlikely to be in the market for Nvidia big iron, even if B200s were widely available to buy, which they aren't.
 
If Apple can get the M5 Ultra with an SFP port and the same price point, they'll have a major hit for the "home" market. Especially if they can keep the $10,000 Mac Studio in stock - which given Cook's expertise I'm assuming they can.

Home, of course, in quotes because there probably aren't that many people willing to drop $40,000 on a local LLM, no matter how fast it is, when tech is advancing so rapidly.
 
A lot of people are, IMHO, missing the most interesting element of this sort of project.

What this stuff (a Mac cluster, MLX, and a collection of working models) does is put non-trivial TRAINING at the disposal of small departments and grad students. This in turn allows for the investigation, via LLM, of various other scholarly disciplines.

One version of this is training models on corpuses that are restricted in some way, e.g. to texts written within a particular period.
This allows you to get some feeling for what a "society" was really thinking at a given time, stripped of the opinions and theories projected backwards by our current times.

Another version of this might be to investigate linguistics, to see what's common (and not common) in models trained on different languages (for example, the extent to which token embeddings do and don't cluster in similar ways; a toy sketch of this appears at the end of this post).

A third version might be to begin with a "blank-trained" model (something that has some sort of best-guess token clusters, as described above) and see if that can extract useful information (even translation) from difficult and limited corpuses like hieroglyphics or cuneiform or old Chinese.

A fourth version might be to investigate alternative ways to cluster languages, build language trees, and try to construct/validate proto-models (especially for language families who've had much less work done on them than Indo-European).

The common thread in all this is that there's interesting work to be done, it's now feasible on a departmental budget, but it's not going to be done by the big labs because it's not especially relevant to their ongoing interests (and why should it be? They can't do everything, and it's the job of the history/classics/humanities/archaeology/linguistics departments to investigate these matters, using whatever new tools become available).
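For what it's worth, here's a toy sketch (plain Python/NumPy) of what the embedding-space comparison in the linguistics idea above might look like. The embeddings and translation-equivalent word pairs are placeholders you'd pull from your own trained models, and structure_agreement is just a name I made up:

```python
# Hypothetical sketch: do "equivalent" tokens cluster the same way in two
# models trained on different languages? Everything here is a placeholder.
import numpy as np

def similarity_matrix(vectors):
    """Pairwise cosine similarities for a stack of embedding vectors."""
    X = np.stack(vectors)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    return X @ X.T

def structure_agreement(vecs_lang_a, vecs_lang_b):
    """vecs_lang_a[i] and vecs_lang_b[i] are embeddings of translation-equivalent
    tokens taken from two separately trained models. Returns the correlation of
    the two similarity structures (1.0 = identical clustering)."""
    S_a = similarity_matrix(vecs_lang_a)
    S_b = similarity_matrix(vecs_lang_b)
    iu = np.triu_indices_from(S_a, k=1)  # upper triangle, excluding the diagonal
    return float(np.corrcoef(S_a[iu], S_b[iu])[0, 1])
```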
 

We are already doing things like that in the humanities, but we use university-provided NVIDIA clusters for training. A linguistics department is not going to buy a $40K stack of Studios if it can use existing infrastructure for much less.
 
Aye, it always depends on the personal situation: if you have good access to powerful clusters, then that's almost certainly what is going to be used most of the time for the biggest projects. And there is a correlation: well-heeled departments/labs are typically also going to have access to better infrastructure. As for the humanities ... well ... from what I can gather these days, given the tight budgets at most of their departments almost regardless of institution, they'd probably struggle with this sort of expenditure and rely on whatever cluster access they have.

At the same time, I do know that some computationally intensive labs will absolutely outlay this kind of money (again depending on the lab finances) for personal lab computation if they can justify its use. It eases the bottlenecks of access to the cluster infrastructure, gives more students direct access to powerful development machines, etc ... So I'd say there is a market - basically it's the same one as the academia workstation market. Smaller than it used to be of course, but it still exists.
 
Some people here might find this interesting:
https://stack.int.mov/a-reverse-engineers-anatomy-of-the-macos-boot-chain-security-architecture/

It's an EXTREMELY detailed description of every step of the Apple boot process. It's astonishing (to me) the amount of effort they have put into this, and the number of layers and details involved. As far as I know it "works", in the sense that there have been no exploits against the elements protected in this way. Of course there are exploits, from the inevitable phishing/social engineering to bugs in Safari, or Messages, or even the OS proper. But nothing catastrophic (eg page table manipulation). And of course they're not done; I assume at some point things like Safari will be refactored to further limit bug propagation as (I believe) has already happened to some extent with Messages. (The first version was not great, but the second version based on "lessons learned" so far seems to be holding up?)

It also raises, as an aside, questions about what we might see in the M5 Pro/Max and going forward.
Right now we are at 5+5+4 and 6+6+4. Will we move to a 6 E-core cluster? Seems plausible, at least in part (though of course Apple might not have known this at design finalization), given that QC's contender sitting between the Pro and the Max is essentially 6+6+6.
One relevance of this to the above article is that clearly part of hardening the system is creating these auxiliary "execution units" (call them hypervisor, exclave, or whatever) that run in addition to the OS. This is all part of a larger project to split the OS into smaller units outside the kernel (user-space networking and drivers, and a move to a user-space file system, though I don't think that's fully complete yet). The more of this stuff exists, the more independent work there is to do on E-cores. The other question is what's happening in PC-land. Intel seems to be all-in on their "small" cores, likewise AMD has their c cores, and I'm guessing Intel will keep shipping designs that have a few large and many small cores. This in turn (presumably) means a lot of developers looking at ways to split code over many smaller cores, which then feeds into Apple providing more E-cores?

The extreme end of this is to add a new cluster, the O-cluster, basically say two (or four?) E-cores (probably no SME needed, so minor area) that executes NOTHING but Apple "trusted" code. This means that, no matter what weird Spectre or other exploits are devised, that sort of hostile code cannot ever get to run on sensitive cores to do its thing.
I don't know enough security to know if this makes sense, but it feels like the end-game?
 
I am not qualified to judge, but it might be worth knowing that some consider that site to be AI slop. It seems to have some considerable errors, as discussed here.

This is considered a more authoritative source.
 
At the same time, I do know that some computationally intensive labs will absolutely outlay this kind of money (again depending on the lab finances) for personal lab computation if they can justify its use. It eases the bottlenecks of access to the cluster infrastructure, gives more students direct access to powerful development machines, etc ... So I'd say there is a market - basically it's the same one as the academia workstation market. Smaller than it used to be of course, but it still exists.
It's less about having money and more about having the right kind of money.

In the common case, buying equipment with one grant and using it in other projects is fraud. Most research funders prefer funding specific projects, and they only allow using the money for expenses specific to the project. If you have a big project or an unusually flexible source of funding, you may be able to buy equipment directly. Otherwise the department or the university will use grant overheads for equipment. Sometimes they might even let you choose what to buy. Or you can use the grant money for buying services from an internal or external service provider, instead of buying the equipment directly.
 

Yep, and universities usually want you to use the shared resources instead of building new infrastructure (unless you are large enough to justify this). We are probably the largest linguistics research consortium in the world, and even we can't have our own infrastructure.

My last funding request for a group-internal storage solution was denied by procurement, on the grounds that we should use university-provided storage services instead. Of course, we also have to pay for those, and they actually end up being more expensive overall, and of course we don't get additional funding to cover it. Very convenient :)
 
Colors of money matter tremendously. The hack is to get multiple awards from the same issuing body and befriend someone up high who wants them to all work together. I've seen this and it makes things so much easier.

Universities have it really rough; in the private sector you can negotiate, but they want a much higher-quality product in return.

Or develop things that require high security which can't be provided offsite or on shared devices :).

I've worked with a bunch of research labs on integrating their work into products, and I have no idea how you don't go insane; so much of it is built on baling wire and hope, and you often don't get to finish the work to the very end. Plus, much lower pay to boot. It's a screwed-up system.
 
I am not qualified to judge, but it might be worth knowing that some consider that site to be AI slop. It seems to have some considerable errors, as discussed here.

This is considered a more authoritative source.
Hmm.
The most important of the links in this supposed takedown is dead.
The primary claim (about EL3) seems to be a nerdish complaint about the EXACT wording used in the first paragraph of the article, not anything especially deep.
And the overall complaint, "that it is AI slop", is this decade's ad hominem attack.

I don't give a fsck about whether an AI wrote something, just like I don't care whether it was written by a female, a gay, or an <insert nationality>. What I care about is
1. Is it correct/true? and
2. Is it well-written? (ie appropriately ordered, explains things well, appropriately connects later ideas with earlier ideas, etc)
The article meets both these criteria. In the form *I* read it, it appears to be correct (no important glaring errors, meshes with stuff I have read elsewhere). Were earlier versions incorrect? No idea, since the link supposedly exposing this smoking gun does not work...

It feels to me like the article you referenced is actually some uninteresting and ultimately pathetic security nerd infighting; someone is pissed off at someone else and resorts to these sorts of drive-by attacks.

As someone who has written vastly more than my fair share of attempts at explaining technical material based on difficult sources and the frequent need to best guess, I don't see anything especially wrong or problematic in an article having gone through multiple frequent rewrites. The only way you CAN figure out some of this stuff is to write up your best guess, and then see what sort of informed criticisms and corrections are generated.
Most experts in the community understand this and act constructively to help; but there are always the few youngsters or emotionally broken who feel it is some sort of personal affront that someone either got one technical fact wrong, or, even worse, is daring to explain to the masses the mysteries that are supposed to be *your* personal secret.
 
The problem isn't people making mistakes. That's perfectly normal. The issue here is that the original article (diffs shown here, which are loading fine for me: https://gist.github.com/nicolas17/81d082c93599c8bc70492caabb97289d/revisions) had so many errors, and relied on so many corrections which they didn't attribute, that we no longer know if it's correct in its current state or if it just hasn't been corrected enough.
 
Hmm.
The most important of the links in this supposed takedown is dead.
The primary claim (about EL3) seems to be a nerdish complaint about the EXACT wording used in the first paragraph of the article, not anything especially deep.
And the overall complaint, "that it is AI slop", is this decade's ad hominem attack.
Why is it ad hominem to describe obvious AI slop as what it is?

I don't give a fsck about whether an AI wrote something, just like I don't care whether it was written by a female, a gay, or an <insert nationality>. What I care about is
1. Is it correct/true? and
2. Is it well-written? (ie appropriately ordered, explains things well, appropriately connects later ideas with earlier ideas, etc)
The article meets both these criteria.
No, it does not. It conflates a lot of things, it meanders, it is fuzzy, it uses overly emotive language to describe technical concepts, it mixes flights of fancy with things that are more plausibly factual without any sign of which is which. And it gets things just plain wrong.

It does have some dramatic lines that don't quite read like an LLM to me, more like a human blog owner trying to jazz up the AI slop. Consider the ridiculous opening line, "The security of the macOS platform on Apple Silicon is not defined by the kernel; it is defined by the physics of the die". Completely insecure SoCs rely on exactly the same semiconductor physics as Apple's chips! The security of the system derives mainly from design choices: the information content in the secure boot ROM, the remainder of the secure boot chain the ROM hands control off to, the design of the isolations between application processors and coprocessors, and so forth.

Even if we put that aside as tasteless hyperbole by a bad and clueless writer, we're still left with something that doesn't sit right. Hyping Apple's trust root and implying that it's something unique or noteworthy is just not a thing any real security researcher would do. I'm not one of those, but even I'm aware that there are countless non-Apple SoCs which also use a mask ROM as their root of trust. I worked on such a chip almost 15 years ago; it's been standard practice for a long time.

But hey, let's go back to "is it correct/true". Despite you claiming that EL3 is only a minor nerdish complaint about wording, getting that wrong was actually an important sign that the original blog post really was just slop.

Although EL3 is an optional Arm feature, it's present in all Arm Holdings-designed CPU cores, which is what the vast majority of Arm platforms are built on. Most of these use EL3 to implement Arm's TrustZone, a secure monitor that runs the 'real' OS as a VM guest at EL2 or lower (meaning: with fewer privileges than TrustZone). So if you did nothing but read Arm Holdings documentation (or, as an LLM, were strongly influenced by how much of that is available on the public, scrapable Web), you'd come away thinking that EL3 and TrustZone are a defining feature of all things Arm.

But in Apple's modern systems, EL3 and TrustZone simply do not exist. Although Apple does not document this in public, this was one of the first things noticed by M1 reverse engineering efforts (and even earlier; iirc they dropped EL3 several generations before A14/M1). If this blog had originally been written by anyone with a clue, they'd never have needed to correct that, because they would've gotten it right the first time. This is a notable area where Apple has diverged from the norm, which is supposed to be what the blogpost is about!

Even after the corrections, there's still plenty of signs that it's just LLM slop. One that stuck out to me is that it inappropriately refers to things as "New in Tahoe" which just... aren't. The first two are GXF and SPRR. Both of these are hardware features, therefore not introduced in Tahoe. Both are also much older than Tahoe - they've been around since at least M1/A14. Later, the post also claims that there's something new in Tahoe related to the "Guarded Execution Environment", in the process identifying that it thinks of the "Tahoe era" as "A15/M2+". This is such obvious slop - the LLM is conflating things that it shouldn't be, and got the timeline wrong to boot.
 
It's less about having money and more about having the right kind of money.

In the common case, buying equipment with one grant and using it in other projects is fraud. Most research funders prefer funding specific projects, and they only allow using the money for expenses specific to the project. If you have a big project or an unusually flexible source of funding, you may be able to buy equipment directly. Otherwise the department or the university will use grant overheads for equipment. Sometimes they might even let you choose what to buy. Or you can use the grant money for buying services from an internal or external service provider, instead of buying the equipment directly.

Yep, and universities usually want you to use the shared resources instead of building new infrastructure (unless you are large enough to justify this). We are probably the largest linguistics research consortium in the world, and even we can't have our own infrastructure.

My last funding request for a group-internal storage solution was denied by procurement, on the grounds that we should use university-provided storage services instead. Of course, we also have to pay for those, and they actually end up being more expensive overall, and of course we don't get additional funding to cover it. Very convenient :)

Colors of money matter tremendously. The hack is to get multiple awards from the same issuing body and befriend someone up high who wants them to all work together. I've seen this and it makes things so much easier.

Universities have it really rough; in the private sector you can negotiate, but they want a much higher-quality product in return.

Or develop things that require high security which can't be provided offsite or on shared devices :).

I've worked with a bunch of research labs on integrating their work into products, and I have no idea how you don't go insane; so much of it is built on baling wire and hope, and you often don't get to finish the work to the very end. Plus, much lower pay to boot. It's a screwed-up system.
Just for the record, my statement "again depending on the lab finances" was not really meant to be a full accounting of how labs acquire equipment. :) So yes, color of money matters.

Having said that, my personal experience in this was that labs could, and did, in fact get funding for capital expenditures, including lab computational resources, through grants (and yes overhead) though I will concede it was much harder than other types of expenses and there were generally more rules and, yes, more pushback. Of course that was my experience in my little corner of academia which was not in the last ... well ... let's just say more than 5 years. So given other fields and being admittedly out-of-date, I have no trouble believing that things have gotten worse resulting in different experiences on just how hard it is to acquire computational lab resources.
 
labs could, and did, in fact get funding for capital expenditures
Certainly. I'm sure there is wide variation depending on the funding source, but with the NIH, for example (my main source, before the orange menace...), we routinely got capital expense funding, as long as it was justified and linked to the project, like freezers or centrifuges, etc.
 
Having said that, my personal experience in this was that labs could, and did, in fact get funding for capital expenditures, including lab computational resources, through grants (and yes overhead) though I will concede it was much harder than other types of expenses and there were generally more rules and, yes, more pushback. Of course that was my experience in my little corner of academia which was not in the last ... well ... let's just say more than 5 years. So given other fields and being admittedly out-of-date, I have no trouble believing that things have gotten worse resulting in different experiences on just how hard it is to acquire computational lab resources.

Oh, absolutely! It's just that in my experience generic computational infrastructure is much harder to get through funding these days, unless you have a very good argument.
 
Comments?
I don't know enough about GPU details to know if this is something novel, and especially if the last paragraph of speculation is correct. But it's my best guess as to the meaning of https://patents.google.com/patent/US20250342645A1

More Efficient Texturing

This is a small tweak, but cute.
Texturing consists of two parts: we sample some number of points, the address of each generated in some way by using the appropriate mapping from triangle space to texture space; and then we filter those sampled points (that is, combine them in some way, e.g. with a bilinear filter) to smooth over the fact that the exact point we wanted to sample from the texture is generally at some non-integer location, so we need to approximate the value we would have sampled.
The normal use of the texture unit is that we use both of these functions: multiple samples, followed by combining them via filtering.
The previously existing texture unit tied these together, in the sense that even if you did not want filtering (say you're just reading the data out of a texture to blit it unmodified into a rectangle), the pipeline was locked into filtering and so could only output one sample per cycle (not filtered, but implemented as executing one, not four, samplings in the previous stage).
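To make the "sample then filter" split concrete, here's a toy bilinear filter in plain Python/NumPy (a sketch of the general idea, nothing to do with Apple's actual hardware; no edge clamping or mipmaps):

```python
# Toy bilinear filter: four texel fetches in, one filtered value out.
import numpy as np

def sample_bilinear(texture, u, v):
    """texture: 2D array of texel values; (u, v): non-integer texel coordinates."""
    x0, y0 = int(np.floor(u)), int(np.floor(v))
    fx, fy = u - x0, v - y0
    # Step 1: the four samples the texture unit fetches around (u, v)...
    t00 = texture[y0,     x0    ]
    t10 = texture[y0,     x0 + 1]
    t01 = texture[y0 + 1, x0    ]
    t11 = texture[y0 + 1, x0 + 1]
    # Step 2: ...and the filtering that blends them into one value.
    top    = t00 * (1 - fx) + t10 * fx
    bottom = t01 * (1 - fx) + t11 * fx
    return top * (1 - fy) + bottom * fy
```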

The insight of (2024) https://patents.google.com/patent/US20250342645A1, "Mapping Texture Point Samples to Lanes of a Filter Pipeline", is that we can do better than this with minor additions to the hardware.
If we don't want to filter, then we can sample four values per cycle, as before, bypass filtering, and return those four values in one cycle, giving 4x the blit bandwidth. "All" that's needed is to provide some wires to return the extra data per cycle, along with multiple small technical details, but that's the big idea.
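A toy model of that claim, as I read the patent (not a description of the real pipeline): the sampler fetches up to four texels per cycle either way; what changes is how many values can leave the pipeline per cycle:

```python
# Toy throughput model of the bypass idea described above. The 4-per-cycle
# sampler and 1-per-cycle filtered output are assumptions from my reading of
# the patent, not measured hardware behaviour.
def blit_cycles(num_texels, bypass_filtering):
    outputs_per_cycle = 4 if bypass_filtering else 1  # filtering collapses 4 -> 1
    return -(-num_texels // outputs_per_cycle)        # ceiling division

print(blit_cycles(1024, bypass_filtering=False))  # 1024 cycles through the filter path
print(blit_cycles(1024, bypass_filtering=True))   # 256 cycles, i.e. ~4x blit bandwidth
```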

Of course one can never know what Apple has in mind, but an interesting possibility here has to do with tensor memory. As mentioned earlier, one of the features Nvidia added to the tensor core a few generations back was dedicated tensor memory, attached to the tensor core and able to feed it operands directly, overcoming the bandwidth limitations of sourcing operands from either registers or threadgroup memory.
It's possible that Apple has the same idea in mind for a future GPU, but using texture memory as the source rather than adding a new block of dedicated SRAM? That would obviously be a great way to boost large-matrix performance while not having to pay the cost of a lot of SRAM that's not useful for any other task.
 