
JouniS

macrumors 6502a
Nov 22, 2020
613
377
I can't find a single use case for 12,000MB/s sequential read and writes.
Can you find a use case for 750 MB/s sequential reads and writes? If you can, then just multiply that by 16 tasks running in parallel.

If you use a computer for processing data, you can always take advantage of higher performance. When you have many CPU cores and they are trying to access the disk at the same time, SSD speed can easily become a bottleneck.
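The parallel-tasks argument is simple arithmetic. Here is a minimal sketch using the post's 750 MB/s per-task figure. The 7,000 MB/s drive is a hypothetical PCIe4-class number, included only for comparison.

```python
def aggregate_demand_mb_s(per_task_mb_s: float, tasks: int) -> float:
    """Total sequential bandwidth requested when `tasks` jobs run in parallel."""
    return per_task_mb_s * tasks


def tasks_before_bottleneck(drive_mb_s: float, per_task_mb_s: float) -> int:
    """How many such jobs can run at full speed before the SSD becomes the limit."""
    return int(drive_mb_s // per_task_mb_s)


if __name__ == "__main__":
    print(aggregate_demand_mb_s(750, 16))        # 16 tasks at 750 MB/s each
    print(tasks_before_bottleneck(12_000, 750))  # PCIe5-class figure from the thread
    print(tasks_before_bottleneck(7_000, 750))   # hypothetical PCIe4-class drive
```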
 

dmccloud

macrumors 68030
Sep 7, 2009
2,990
1,727
Anchorage, AK
Can you find a use case for 750 MB/s sequential reads and writes? If you can, then just multiply that by 16 tasks running in parallel.

If you use a computer for processing data, you can always take advantage of higher performance. When you have many CPU cores and they are trying to access the disk at the same time, SSD speed can easily become a bottleneck.

How many use cases could even saturate 16 simultaneous 750 MB/s reads and writes?
 

unrigestered

Suspended
Jun 17, 2022
879
840
Faster will always be better.
One question, though: how will PCIe5 speeds affect operating temperatures, and could more thermal throttling mean these gains don't translate into much of a real-world performance difference?

That said, in most real-world scenarios faster access times would benefit most people more, unless it's your job or hobby to move around files gigabytes or terabytes in size (which is, of course, still an important metric).
 

JouniS

macrumors 6502a
Nov 22, 2020
613
377
How many use cases could even saturate 16 simultaneous 750 MB/s reads and writes?
It's easy to find such use cases if you drop the end-user mindset where a computer is a tool people use. Consider it a machine that processes data instead. When you have enough data, you just launch new tasks until you run out of CPU cores / RAM / I/O bandwidth / another important resource.
 

Longplays

Suspended
May 30, 2023
1,308
1,156
Just wondering if/when Apple Silicon will support PCIe 5? By extension, when will Thunderbolt support PCIe 5 as well?
Likely with the 3nm M3 in Q1 2024, alongside 80 Gbps Thunderbolt 5.

A 3nm M3 Ultra Mac Pro with PCIe 5.0 will likely arrive in Q1 2025.

This assumes Apple maintains a 19.5-month cadence from M1 to M2 to M3.
 

theorist9

macrumors 68040
May 28, 2015
3,710
2,812
PCIe5 sequential read and write speeds don't really matter. You will still be bottlenecked by the much more important random read and write speeds, which aren't faster than on PCIe3 or PCIe4 SSDs to begin with.

I can't find a single use case for 12,000MB/s sequential read and writes.
I agree that it's small random reads and writes that are most important in a boot drive. But, properly tuned and optimized, PCIe5 consumer SSDs should also offer significantly faster small random R/Ws than are currently available from PCIe4. According to this review (https://www.storagereview.com/review/samsung-pm1743-ssd-review), we're already seeing that with enterprise SSDs (see screenshot).

Now you might argue that there's no reason you'd need to go beyond PCIe4 if your focus was on increasing small random R/W's, since even if you double those with PCIe5, you're still well under the PCIe4 limit.

That seems reasonable. Yet if that's the case, why aren't we seeing faster small random R/W's in PCIe4 SSDs? The answer is probably that you need faster NAND to achieve this (ideally 2400 MT/s; see https://www.anandtech.com/show/18753/first-pcie-gen5-ssds-finally-hit-shelves-more-to-come#:~:text=As a result, due to,PCIe Gen5 SSDs for now.), and if a manufacturer is going to go to the expense of using that, they're going to go full PCIe5.

The performance downside of PCIe5 is that it uses more power and generates more heat, both of which are concerns for Apple (at least in their laptops). So Apple (which apparently doesn't use PCIe to interface with its internal storage) perhaps could use 2400 MT/s NAND to significantly increase small random R/W's, but limit the duration of large sustained sequential R/W's in their laptops specifically. That would allow them to advertise 12 GB/sec, like everyone else will be doing, thus avoiding a potential marketing headache (which would be exacerbated by how much Apple charges for storage upgrades)—while also avoiding excessive power consumption and heat generation.

[Screenshot: results from the StorageReview Samsung PM1743 review]

One question, though: how will PCIe5 speeds affect operating temperatures, and could more thermal throttling mean these gains don't translate into much of a real-world performance difference?
Yes, it's possible. According to this review (https://www.pcgamer.com/crucial-t700-pcie-5-ssd-preview/), that can be a problem for PCIe 5 SSDs: they can generate so much heat that writes slow significantly. It should be noted, though, that this was an engineering sample.

 

Tagbert

macrumors 603
Jun 22, 2011
5,664
6,632
Seattle
It's easy to find such use cases if you drop the end-user mindset where a computer is a tool people use. Consider it a machine that processes data instead. When you have enough data, you just launch new tasks until you run out of CPU cores / RAM / I/O bandwidth / another important resource.
Apple doesn’t sell machines to process data. They sell tools that people use. Very few people have unlimited data that they want to pump through the system without end. Apple designs its systems so that they are able to handle the use cases of their users, not as some abstract data pump with no purpose.
 

theorist9

macrumors 68040
May 28, 2015
3,710
2,812
But: Even if consumers don't need ≈12 GB/s I/O for external storage, there is one area where Apple probably will need such I/O speeds: External display support. If Apple wants its next XDR to be 6k@120 or 7k@120, it will probably need TB5.

So I'm wondering--could Apple support TB5's 80 Gb/s bandwidth by using 5 x PCIe4 lanes for each TB port (5 x 16 Gb/s = 80 Gb/s)? [I.e., could it practically support TB5 at full bandwidth without needing PCIe5?]

According to AnandTech (https://www.anandtech.com/show/20050/intel-unveils-barlow-ridge-thunderbolt-5-controllers), TB5 with PCIe4 x4 gives 4 x 16 = 64 Gb/s:
  • "PCIe Gen4 x4 support (64 Gbps full duplex)"
The Raptor Lake HX (mobile) refresh uses TB5 with PCIe4 x4, so it doesn't get the full 80 Gb/s bandwidth:


"The biggest addition coming to Intel 14th Gen Core HX series laptops is that Intel is pushing vendors to include Thunderbolt 5/USB4v2 support. Because the platform itself doesn't natively integrate Thunderbolt 5 silicon, Intel is relying on their discrete Barlow Ridge Thunderbolt 5 controllers here, hanging it off of the PCIe 4.0 lanes coming from the processor itself. Barlow Ridge uses a PCIe 4.0 x4 link for backhaul purposes – and for those of you doing the math at home, no, that's not enough bandwidth to saturate a TB5 connection. Ultimately, TB5's high bandwidth is meant to afford a combination of data ana graphics (DisplayPort 2.1), so the data portion alone does not need to be able to fully saturate the entire link."
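The lane arithmetic in the quote can be checked with a short sketch. It assumes the standard PCIe per-lane signaling rates (8, 16 and 32 GT/s for gen 3, 4 and 5) and 128b/130b line encoding. The "16 Gb/s per lane" above is the raw gen 4 rate, while ≈15.75 Gb/s is what remains after encoding. Protocol overhead above the physical layer is ignored.

```python
# Raw per-lane signaling rates in GT/s for PCIe generations 3-5.
RAW_GT_S = {3: 8.0, 4: 16.0, 5: 32.0}
TB5_GBPS = 80.0


def lane_gbps(gen: int) -> float:
    """Per-lane bandwidth after 128b/130b encoding (used from gen 3 onward)."""
    return RAW_GT_S[gen] * 128 / 130


def link_gbps(gen: int, lanes: int) -> float:
    """Aggregate bandwidth of a link with the given lane count."""
    return lane_gbps(gen) * lanes


for gen in (4, 5):
    bw = link_gbps(gen, 4)
    verdict = "enough" if bw >= TB5_GBPS else "not enough"
    print(f"PCIe gen {gen} x4: {bw:.1f} Gb/s -> {verdict} for {TB5_GBPS:.0f} Gb/s TB5")
```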
 

joevt

Contributor
Jun 21, 2012
6,689
4,086
But: Even if consumers don't need ≈12 GB/s I/O for external storage, there is one area where Apple probably will need such I/O speeds: External display support. If Apple wants its next XDR to be 6k@120 or 7k@120, it will probably need TB5.

So I'm wondering--could Apple support TB5's 80 Gb/s bandwidth by using 5 x PCIe4 lanes for each TB port (5 x 16 Gb/s = 80 Gb/s)? [I.e., could it practically support TB5 at full bandwidth without needing PCIe5?]

According to AnandTech (https://www.anandtech.com/show/20050/intel-unveils-barlow-ridge-thunderbolt-5-controllers), TB5 with PCIe4 x4 gives 4 x 16 = 64 Gb/s:
  • "PCIe Gen4 x4 support (64 Gbps full duplex)"
The Raptor Lake HX (mobile) refresh uses TB5 with PCIe4 x4, so it doesn't get the full 80 Gb/s bandwidth:


"The biggest addition coming to Intel 14th Gen Core HX series laptops is that Intel is pushing vendors to include Thunderbolt 5/USB4v2 support. Because the platform itself doesn't natively integrate Thunderbolt 5 silicon, Intel is relying on their discrete Barlow Ridge Thunderbolt 5 controllers here, hanging it off of the PCIe 4.0 lanes coming from the processor itself. Barlow Ridge uses a PCIe 4.0 x4 link for backhaul purposes – and for those of you doing the math at home, no, that's not enough bandwidth to saturate a TB5 connection. Ultimately, TB5's high bandwidth is meant to afford a combination of data ana graphics (DisplayPort 2.1), so the data portion alone does not need to be able to fully saturate the entire link."
PCIe is separate from Thunderbolt.
A discrete Thunderbolt host controller such as Titan Ridge or Alpine Ridge is limited to PCIe gen 3 x4 upstream to the CPU and 40 Gbps downstream to other Thunderbolt devices.
An integrated Thunderbolt host controller, such as the one in Ice Lake, Tiger Lake, or Apple Silicon, is not limited to PCIe gen 3 x4 upstream to the CPU because it is integrated into the CPU.

A Thunderbolt 3/4 peripheral controller is limited to 40 Gbps upstream to the host or upstream Thunderbolt devices, 40 Gbps downstream to other Thunderbolt devices, and PCIe gen 3 x4 downstream to PCIe devices.

There exist now USB4 peripheral controllers that are limited to PCIe gen 4 x4 downstream to PCIe devices.
https://www.asmedia.com.tw/product/802zX91Yw3tsFgm4/C64ZX59yu4sY1GW5
ASMedia also has a USB4 host controller that can do PCIe gen 4 x4 upstream.
https://www.asmedia.com.tw/product/e20zx49yU0SZBUH5/363Zx80yu6sY3XH2

So if Apple wants to support Thunderbolt 5, it will be an integrated Thunderbolt 5 host controller. It will not use PCIe to connect to the CPU because it is inside the CPU. Apple can choose any link speed between the CPU and the integrated Thunderbolt controller.

Regarding PCIe links, these can be 1, 2, 4, 8, or 16 lanes wide. So your proposed discrete Thunderbolt controller would be PCIe gen 4 x8 instead of x5. There's no reason a Thunderbolt host or peripheral controller couldn't use 8 or more PCIe lanes. The Thunderbolt controller is a bridge chip, which means the upstream link doesn't need to be the same as the downstream link. Consider: the upstream can be PCIe (for a discrete Thunderbolt host controller) with 1, 2, or 4 lanes, a special CPU link (for an integrated Thunderbolt host controller), or Thunderbolt 10/20/40/80 Gbps (for a peripheral Thunderbolt controller). The downstream can be PCIe gen 3 or gen 4 with 1, 2, or 4 lanes, or Thunderbolt 1, 2, 3, 4, or 5.
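The "x8 instead of x5" point amounts to rounding a bandwidth target up to the next legal link width. Here is a minimal sketch, assuming the widths listed in the post and per-lane rates after 128b/130b encoding, with overhead above the physical layer ignored. `min_link_width` is a hypothetical helper, not part of any real tool.

```python
PCIE_WIDTHS = (1, 2, 4, 8, 16)          # legal link widths, per the post
RAW_GT_S = {3: 8.0, 4: 16.0, 5: 32.0}   # raw per-lane rates in GT/s


def lane_gbps(gen: int) -> float:
    """Per-lane bandwidth after 128b/130b encoding."""
    return RAW_GT_S[gen] * 128 / 130


def min_link_width(target_gbps: float, gen: int) -> int:
    """Smallest legal width whose bandwidth meets the target (no x5, x6, ...)."""
    for width in PCIE_WIDTHS:
        if width * lane_gbps(gen) >= target_gbps:
            return width
    raise ValueError(f"{target_gbps} Gb/s exceeds a gen {gen} x16 link")


print(min_link_width(80, 4))  # 80 Gb/s over gen 4: five lanes' worth rounds up to x8
print(min_link_width(80, 5))  # over gen 5, x4 is enough
```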
 

theorist9

macrumors 68040
May 28, 2015
3,710
2,812
PCIe is separate from Thunderbolt.
A discrete Thunderbolt host controller such as Titan Ridge or Alpine Ridge is limited to PCIe gen 3 x4 upstream to the CPU and 40 Gbps downstream to other Thunderbolt devices.
An integrated Thunderbolt host controller, such as the one in Ice Lake, Tiger Lake, or Apple Silicon, is not limited to PCIe gen 3 x4 upstream to the CPU because it is integrated into the CPU.

A Thunderbolt 3/4 peripheral controller is limited to 40 Gbps upstream to the host or upstream Thunderbolt devices, 40 Gbps downstream to other Thunderbolt devices, and PCIe gen 3 x4 downstream to PCIe devices.

There exist now USB4 peripheral controllers that are limited to PCIe gen 4 x4 downstream to PCIe devices.
https://www.asmedia.com.tw/product/802zX91Yw3tsFgm4/C64ZX59yu4sY1GW5
ASMedia also has a USB4 host controller that can do PCIe gen 4 x4 upstream.
https://www.asmedia.com.tw/product/e20zx49yU0SZBUH5/363Zx80yu6sY3XH2

So if Apple wants to support Thunderbolt 5, it will be an integrated Thunderbolt 5 host controller. It will not use PCIe to connect to the CPU because it is inside the CPU. Apple can choose any link speed between the CPU and the integrated Thunderbolt controller.

Regarding PCIe links, these can be 1, 2, 4, 8, or 16 lanes wide. So your proposed discrete Thunderbolt controller would be PCIe gen 4 x8 instead of x5. There's no reason a Thunderbolt host or peripheral controller couldn't use 8 or more PCIe lanes. The Thunderbolt controller is a bridge chip, which means the upstream link doesn't need to be the same as the downstream link. Consider: the upstream can be PCIe (for a discrete Thunderbolt host controller) with 1, 2, or 4 lanes, a special CPU link (for an integrated Thunderbolt host controller), or Thunderbolt 10/20/40/80 Gbps (for a peripheral Thunderbolt controller). The downstream can be PCIe gen 3 or gen 4 with 1, 2, or 4 lanes, or Thunderbolt 1, 2, 3, 4, or 5.
I think I understand the part between the die and the TB host controller; here's my summary:

You need some way for the CPU to interface with the TB host controller. If, as with AS, the controller is on-die, there is no need for PCIe.

OTOH, if the controller is discrete, as is the case with the Barlow Ridge TB5 controller in Intel Gen 14 HX systems, then you need a way to interface the CPU and controller, and that is typically (always?) done with PCIe. In the latter case, if you had PCIe5, 4 lanes would be sufficient to get 80 Gbps. But if you only had PCIe4 (as is the case for those HX chips), then 4 lanes would get you only 64 Gbps. To get 80 Gbps you'd need 8 lanes, because while you can aggregate 2 x (PCIe4 x4) to get PCIe4 x8, you can't aggregate (PCIe4 x4) + (PCIe4 x1) to get PCIe4 x5, because the standard only allows for x1, x2, x4, x8, and x16.

Questions:

1) Does Apple's on-die integration of the TB host controller delay its ability to implement newer standards? For instance, suppose Intel's TB5 specs weren't finalized when Apple completed its design of M4. That seems to mean we won't see TB5 until at least M5. By comparison, because Intel uses discrete controllers, it is able to offer TB5 with its current Gen 14 chips, as an add-on, even though the final TB5 specs likely weren't available when the Gen 14 design was finalized.

2) Given that M3 currently has TB4 host controllers baked into the die, and given their current Display Engines, is there any way Apple could support 6k@120 to 8k@120 over a single TB cable with M3 (without using DSC with dual-tile HBR3, which you and I determined would be theoretically sufficient to support 6.7k@120)? For instance, suppose Apple wanted to release a 7k@120 version of the XDR along with the M3 Studio. Could they aggregate the output of two TB4 controllers to output 80 Gbps through a single USB-C port?
 

joevt

Contributor
Jun 21, 2012
6,689
4,086
You need some way for the CPU to interface with the TB host controller. If, as with AS, the controller is on-die, there is no need for PCIe.

OTOH, if the controller is discrete, as is the case with the Barlow Ridge TB5 controller in Intel Gen 14 HX systems, then you need a way to interface the CPU and controller, and that is typically (always?) done with PCIe.
Correct, though it doesn't strictly have to be PCIe: a controller chip could connect to the CPU using something else, such as HyperTransport.

In the latter case, if you had PCIe5, 4 lanes would be sufficient to get 80 Gbps. But if you only had PCIe4 (as is the case for those HX chips), then 4 lanes would get you only 64 Gbps. To get 80 Gbps you'd need 8 lanes, because while you can aggregate 2 x (PCIe4 x4) to get PCIe4 x8, you can't aggregate (PCIe4 x4) + (PCIe4 x1) to get PCIe4 x5, because the standard only allows for x1, x2, x4, x8, and x16.
I don't think you should think of it as aggregating two x4 links to get x8. Rather, you have n lanes in a PCIe host controller, and you can divide them up into ports that are 1, 2, 4, 8, 16, or 32 lanes wide.

Here's a PCI-PCI bridge (PCIe Gen3 switch) with 96 lanes (used by Mac Pro 2019):
https://docs.broadcom.com/doc/12351860
The lanes can be divided into up to 24 ports. Each port can have 4, 8, or 16 lanes. Up to 4 ports can be upstream (the Mac Pro uses two x16 upstream ports). The rest are downstream ports.
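The port-splitting rules for this switch can be written as a small validity check. The limits (96 lanes, at most 24 ports, widths of x4/x8/x16, at most 4 upstream ports) come from the post, not from the datasheet. The example partition is hypothetical, loosely modeled on the two x16 upstream ports mentioned.

```python
from dataclasses import dataclass

TOTAL_LANES = 96
MAX_PORTS = 24
PORT_WIDTHS = {4, 8, 16}
MAX_UPSTREAM = 4


@dataclass(frozen=True)
class Port:
    width: int
    upstream: bool = False


def partition_errors(ports: list[Port]) -> list[str]:
    """Return rule violations for a proposed split; an empty list means allowed."""
    errors = []
    used = sum(p.width for p in ports)
    if used > TOTAL_LANES:
        errors.append(f"uses {used} lanes, switch has {TOTAL_LANES}")
    if len(ports) > MAX_PORTS:
        errors.append(f"{len(ports)} ports, max is {MAX_PORTS}")
    bad_widths = sorted({p.width for p in ports} - PORT_WIDTHS)
    if bad_widths:
        errors.append(f"unsupported port widths: {bad_widths}")
    upstream = sum(p.upstream for p in ports)
    if upstream > MAX_UPSTREAM:
        errors.append(f"{upstream} upstream ports, max is {MAX_UPSTREAM}")
    return errors


# Hypothetical split: two x16 upstream ports feeding four x16 downstream ports.
example = [Port(16, upstream=True)] * 2 + [Port(16)] * 4
print(partition_errors(example) or "valid")
```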

1) Does Apple's on-die integration of the TB host controller delay its ability to implement newer standards? For instance, suppose Intel's TB5 specs weren't finalized when Apple completed its design of M4. That seems to mean we won't see TB5 until at least M5. By comparison, because Intel uses discrete controllers, it is able to offer TB5 with its current Gen 14 chips, as an add-on, even though the final TB5 specs likely weren't available when the Gen 14 design was finalized.
I don't know the timing. TB5 is mostly just USB4v2, and it builds on TB4, which is mostly USB4. I suppose the USB4v2 spec was worked on by multiple companies/stakeholders. Any of them could be developing hardware (integrated into a CPU or discrete) to test various ideas even before the spec is finalized.

I don't think Apple is going to let you use a discrete Thunderbolt 4 or 5 controller chip in an older Mac. They would rather you buy a new Mac. They didn't make drivers for Thunderbolt 4 Maple Ridge host controllers, and you have to jump through some hoops to get a discrete Thunderbolt 3 host controller working on Macs that didn't come with Thunderbolt.

2) Given that M3 currently has TB4 host controllers baked into the die, and given their current Display Engines, is there any way Apple could support 6k@120 to 8k@120 over a single TB cable with M3 (without using DSC with dual-tile HBR3, which you and I determined would be theoretically sufficient to support 6.7k@120)? For instance, suppose Apple wanted to release a 7k@120 version of the XDR along with the M3 Studio. Could they aggregate the output of two TB4 controllers to output 80 Gbps through a single USB-C port?
Are you asking for a single tile method or a non-DSC method to get 6K+120?

I think the Thunderbolt controllers are mostly separate. Or at least, you can't get a second Thunderbolt controller to help provide data for the Thunderbolt port of the first Thunderbolt controller.

For a single tile method (with DSC):

1) Allow the DSC target bpp to be less than 12. This seems like the easiest/simplest method. DSC@9bpp is sufficient for 6K120. The target can be set in sixteenths of a bit per pixel, so 9.875 is an option. 8K120 is too much for DSC@8bpp with only HBR3 x4. You can't do DSC lower than 8bpp (unless you add chroma subsampling?).

2) Allow DisplayPort link rates greater than HBR3. The DisplayPort link rate is defined as a multiplier of 0.27 Gbps, so 6 is used for RBR and 30 for HBR3. There are some odd ones, such as 12 for the 3.24 Gbps used by some Apple VGA adapters. A multiplier of 40 would be sufficient for 6K120, but I don't know if the hardware can support anything above 30. USB4v1 defines the DisplayPort link rate as a number between 0 and 3 (RBR, HBR, HBR2, HBR3). Does that mean it can't do the 3.24 Gbps of an Apple VGA adapter? The field has room for values up to 15. Could Apple use one of those reserved values?

USB4v2 adds DisplayPort 2.0 link rates. UHBR10 would be sufficient for 6K120. Can Apple just change the firmware of an existing Apple Silicon chip to do DisplayPort 2.0? I dunno.

In any case, Apple would rather you buy a new Mac than add capabilities to their older Macs. But we are talking about an M3 Studio Mac that doesn't exist yet. Maybe they could add DisplayPort 2.0 from USB4v2 without adding the new 80 or 120 Gbps USB4v2 link rates. Since Thunderbolt 4 is 20 Gbps per lane, they could do 80 Gbps DisplayPort 2.0 (UHBR20) with USB 2.0 or have a separate connection for Thunderbolt or USB 3.1 gen 2.
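The single-tile options above can be sanity-checked with rough numbers. This sketch makes several assumptions. "6K" is taken to mean the Pro Display XDR's 6016×3384. Blanking is approximated with CVT-RB2-style reduced blanking (80-pixel horizontal blank, at least 460 µs of vertical blank). Link payload is taken as link rate × lanes × encoding efficiency (8b/10b for HBR3, 128b/132b for UHBR). DSC and transport overheads are ignored, so the outputs are estimates only.

```python
import math


def pixel_clock_hz(h_active: int, v_active: int, refresh_hz: float) -> float:
    """Rough CVT-RB2-style timing: 80-px horizontal blank, >= 460 us vertical blank."""
    h_total = h_active + 80
    v_total = math.ceil(v_active / (1 - 460e-6 * refresh_hz))
    return h_total * v_total * refresh_hz


def dp_payload_gbps(multiplier: int, lanes: int = 4) -> float:
    """DP 1.x: per-lane rate = multiplier x 0.27 Gbps; 8b/10b leaves 80% for data."""
    return multiplier * 0.27 * lanes * 0.8


def uhbr_payload_gbps(rate_gbps: float, lanes: int = 4) -> float:
    """DP 2.x UHBR link rates use 128b/132b encoding."""
    return rate_gbps * lanes * 128 / 132


def max_dsc_bpp(pclk_hz: float, payload_gbps: float) -> float:
    """Highest DSC target bpp, in 1/16 steps, that fits within the link payload."""
    return math.floor(payload_gbps * 1e9 / pclk_hz * 16) / 16


pclk_6k = pixel_clock_hz(6016, 3384, 120)  # assumed 6K panel (Pro Display XDR)
pclk_8k = pixel_clock_hz(7680, 4320, 120)
hbr3_x4 = dp_payload_gbps(30)  # multiplier 30 = HBR3 (8.1 Gbps per lane)

print(f"6K120 @ 12 bpp needs {pclk_6k * 12 / 1e9:.1f} Gb/s; HBR3 x4 carries {hbr3_x4:.2f}")
print(f"max DSC target for 6K120 on HBR3 x4: {max_dsc_bpp(pclk_6k, hbr3_x4)} bpp")
print(f"8K120 @ 8 bpp needs {pclk_8k * 8 / 1e9:.1f} Gb/s")
print(f"x40 multiplier, 4 lanes: {dp_payload_gbps(40):.2f} Gb/s")
print(f"UHBR10 x4: {uhbr_payload_gbps(10):.2f} Gb/s")
```

With these approximations, the largest 1/16-step target that fits HBR3 x4 works out to 9.875 bpp. At 12 bpp, 6K120 needs ≈31.4 Gb/s. That is more than HBR3 x4's 25.92 Gb/s but fits within a ×40 link (≈34.6 Gb/s) or UHBR10 x4 (≈38.8 Gb/s). 8K120 at 8 bpp needs ≈34 Gb/s. Real timings and DSC overheads will shift these numbers somewhat.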

For a tiled method (without DSC):
Can't be done using dual HBR3 x4 (using two separate Thunderbolt ports). Dual UHBR10 or single UHBR20 can do 6K120 without HDR. You need dual UHBR13.5 to do 6K120 with HDR.
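The uncompressed (tiled) case can be estimated the same way. This sketch treats non-HDR as 8 bits per component (24 bpp) and HDR as 10 bits (30 bpp), which is an assumption about what "with HDR" means here. It also assumes 6K is 6016×3384, that each tile is a separate stream with its own CVT-RB2-style blanking, and that each tile gets its own four-lane link. Overheads beyond line encoding are ignored.

```python
import math


def pixel_clock_hz(h_active: int, v_active: int, refresh_hz: float) -> float:
    """Rough CVT-RB2-style timing: 80-px horizontal blank, >= 460 us vertical blank."""
    h_total = h_active + 80
    v_total = math.ceil(v_active / (1 - 460e-6 * refresh_hz))
    return h_total * v_total * refresh_hz


# Four-lane payload in Gb/s: HBR3 uses 8b/10b, UHBR rates use 128b/132b.
LINKS = {
    "HBR3": 8.1 * 4 * 8 / 10,
    "UHBR10": 10 * 4 * 128 / 132,
    "UHBR13.5": 13.5 * 4 * 128 / 132,
    "UHBR20": 20 * 4 * 128 / 132,
}


def fits(link: str, tiles: int, bpp: int, h: int = 6016, v: int = 3384, hz: int = 120) -> bool:
    """True if each of `tiles` side-by-side tiles fits on its own `link`-type link."""
    need_gbps = pixel_clock_hz(h // tiles, v, hz) * bpp / 1e9
    return need_gbps <= LINKS[link]


for link, tiles in (("HBR3", 2), ("UHBR10", 2), ("UHBR20", 1), ("UHBR13.5", 2)):
    for bpp, label in ((24, "8-bit"), (30, "10-bit")):
        result = "ok" if fits(link, tiles, bpp) else "too much"
        print(f"{tiles} x {link}, {label}: {result}")
```

Under these assumptions, the results line up with the post. Dual HBR3 x4 falls short even at 8-bit. Dual UHBR10 or a single UHBR20 link fits 8-bit but not 10-bit, and dual UHBR13.5 fits 10-bit.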
 

theorist9

macrumors 68040
May 28, 2015
3,710
2,812
Here's a PCI-PCI bridge (PCIe Gen3 switch) with 96 lanes (used by Mac Pro 2019):
https://docs.broadcom.com/doc/12351860
The lanes can be divided into up to 24 ports. Each port can have 4, 8, or 16 lanes. Up to 4 ports can be upstream (the Mac Pro uses two x16 upstream ports). The rest are downstream ports.
Would you happen to know the total I/O bandwidth of the M2 Ultra Studio and Mac Pro? I've found it hard to get clear info on that. I'm specifically curious if the direct PCIe access afforded by the MP translates into more total I/O bandwidth than is available from the Ultra Studio.
Are you asking for a single tile method or a non-DSC method to get 6K+120?
I was asking for any method that didn't require compression beyond 12 bpp, since you've said that's what Apple uses now. But I think you addressed that.
 

joevt

Contributor
Jun 21, 2012
6,689
4,086
Would you happen to know the total I/O bandwidth of the M2 Ultra Studio and Mac Pro? I've found it hard to get clear info on that. I'm specifically curious if the direct PCIe access afforded by the MP translates into more total I/O bandwidth than is available from the Ultra Studio.
I don't know.

The Mac Pro 2019 has 64 lanes of PCIe gen 3 from the CPU plus 4 more (DMI) to the PCH? 535 Gbps?
RAM is ≈140.8 GB/s

The M2 Ultra Studio has 6 Thunderbolt 4 ports. Let's say 6 * 4 = 24 PCIe gen 3 lanes (although Thunderbolt is usually limited to ≈24 Gbps). + 8 for SSD. 252 Gbps?
RAM is ≈800 GB/s.

One way to find out is to connect a drive to every port and run ATTO Disk Benchmark.app with all disks selected. For the Mac Pro, you'd need to connect ≈16 gen 3 NVMe drives (I would say 5 or 6 per 16-lane pool to be sure each pool is maxed out, so ≈24 gen 3 NVMe drives).
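The lane totals above can be reproduced from the gen 3 per-lane rate after 128b/130b encoding (≈7.877 Gb/s). Dividing by 8 gives GB/s, for comparison with the quoted RAM figures. The lane counts are the post's own guesses, not confirmed specifications.

```python
GEN3_LANE_GBPS = 8.0 * 128 / 130  # 8 GT/s with 128b/130b encoding


def lanes_to_gbps(lanes: int) -> float:
    """Aggregate gen 3 bandwidth for a number of lanes."""
    return lanes * GEN3_LANE_GBPS


systems = {
    # name: (gen 3 lanes guessed in the post, RAM bandwidth in GB/s quoted in the post)
    "Mac Pro 2019": (64 + 4, 140.8),
    "M2 Ultra Studio": (6 * 4 + 8, 800.0),
}

for name, (lanes, ram_gb_s) in systems.items():
    io = lanes_to_gbps(lanes)
    print(f"{name}: {lanes} lanes = {io:.1f} Gb/s = {io / 8:.1f} GB/s I/O vs {ram_gb_s} GB/s RAM")
```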
 

theorist9

macrumors 68040
May 28, 2015
3,710
2,812
The M2 Ultra Studio has 6 Thunderbolt 4 ports. Let's say 6 * 4 = 24 PCIe gen 3 lanes (although Thunderbolt is usually limited to ≈24 Gbps). + 8 for SSD. 252 Gbps?
RAM is ≈800 GB/s.
Ah, I just found this for the M2 Ultra in the MP! Your guess was right—32 lanes, with 8 dedicated to the SSD:

PCIe bandwidth

The M2 Ultra chip provides 32 lanes of PCIe gen 4 to the system, with 8 lanes dedicated to the internal SSD. The M2 Ultra chip connects to the PCIe slots through a PCIe switch and provides 24 lanes of gen 4 bandwidth. Pool A provides a maximum of 16 lanes of gen 4 bandwidth and Pool B provides a maximum of 8 lanes of gen 4 bandwidth.



It's PCIe4, so 24 lanes (not incl. SSD) x 15.754 Gb/s/lane = 378 Gb/s external I/O (I'm counting the internal USB-A and SATA ports as part of external I/O since, like the PCIe slots, they connect to devices that are logically, if not physically, external to the built-in system).

I thought TB4 maxed out at 32 Gb/s for data:

[Screenshot: Thunderbolt 4 data bandwidth figures from the OWC blog post cited below]



Source: https://eshop.macsales.com/blog/63715-intel-introduces-thunderbolt-4-what-is-it-and-does-it-matter/

If so, by comparison, here's my attempt to estimate what the Ultra Studio offers:

286 Gb/s external I/O max for video + data. I'm assuming two of the TB ports are sharing a single 64 Gb/s PCIe4 x4, and the remaining PCIe4 x4 is shared by the remaining non-TB ports:

6 TB ports, with 2 sharing a single PCIe4 x4 (4 x 40 Gb/s + 2 x 32 Gb/s) + 2 x USB-A (2 x 5 Gb/s) + Ethernet (10 Gb/s) + HDMI (42 Gb/s) = 286 Gb/s

212 Gb/s external I/O max for data only:

6 TB4 ports (6 x 32 Gb/s) + 2 x USB-A (2 x 5 Gb/s) + Ethernet (10 Gb/s) = 212 Gb/s

[I ignored the audio jack and the UHS-II SDXC card slot, since they are <0.5 Gb/s combined]

But at the same time, if you want to configure a high-speed external connection, when you have the I/O scattered across multiple ports it's probably hard to max them all out. So in practice the disparity in max external I/O between the Studio and MP is likely significantly more than the above figures indicate.
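The totals above are straightforward sums. This sketch re-adds them using the post's own per-port figures: 40 Gb/s per TB port for video+data, 32 Gb/s for TB4 data, 5 Gb/s per USB-A, 10 Gb/s Ethernet, and 42 Gb/s HDMI. It uses the same 15.754 Gb/s gen 4 lane rate. These are the post's assumptions, not Apple specifications.

```python
GEN4_LANE_GBPS = 15.754

# M2 Ultra Mac Pro: 24 gen 4 lanes to the slots (SSD lanes excluded)
mac_pro_slots = 24 * GEN4_LANE_GBPS

studio_video_plus_data = {
    "TB ports (4 x 40 + 2 x 32)": 4 * 40 + 2 * 32,
    "USB-A (2 x 5)": 2 * 5,
    "10 GbE": 10,
    "HDMI": 42,
}
studio_data_only = {
    "TB4 data (6 x 32)": 6 * 32,
    "USB-A (2 x 5)": 2 * 5,
    "10 GbE": 10,
}

print(f"Mac Pro PCIe slots: {mac_pro_slots:.0f} Gb/s")
print(f"Studio video + data: {sum(studio_video_plus_data.values())} Gb/s")
print(f"Studio data only: {sum(studio_data_only.values())} Gb/s")
```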
 

joevt

Contributor
Jun 21, 2012
6,689
4,086
Ah, I just found this for the M2 Ultra in the MP! Your guess was right—32 lanes, with 8 dedicated to the SSD:

PCIe bandwidth

The M2 Ultra chip provides 32 lanes of PCIe gen 4 to the system, with 8 lanes dedicated to the internal SSD. The M2 Ultra chip connects to the PCIe slots through a PCIe switch and provides 24 lanes of gen 4 bandwidth. Pool A provides a maximum of 16 lanes of gen 4 bandwidth and Pool B provides a maximum of 8 lanes of gen 4 bandwidth.
The lanes I was thinking about were mostly for the Thunderbolt ports of the Mac Studio which do tunnelled PCIe to downstream Thunderbolt devices.

I'm not sure how the M2 Ultra in a Mac Studio relates to a M2 Ultra in a Mac Pro. The M2 Ultra in a Mac Pro has the 6 Thunderbolt 4 ports and 8 lanes for SSDs, but also adds 24 lanes of gen 4 PCIe? Is it a different chip than the one used in a Mac Studio, or is this unused PCIe I/O capability on the Mac Studio?

It's PCIe4, so 24 lanes (not incl. SSD) x 15.754 Gb/s/lane = 378 Gb/s external I/O (I'm counting the internal USB-A and SATA ports as part of external I/O, since I mean external to the SoC).

I thought TB4 maxed out at 32 Gb/s for data:
32 Gb/s is 4000 MB/s, but a gen 3 x4 NVMe drive usually doesn't do more than 3500 MB/s, and Thunderbolt usually doesn't do more than 2800 MB/s, though some benchmarks have shown 3200 MB/s.

If so, by comparison, here's my attempt to estimate what the Ultra Studio offers:

286 Gb/s external I/O max for video + data. I'm assuming two of the TB ports are sharing a single 64 Gb/s PCIe4 x4, and the remaining PCIe4 x4 is shared by the remaining ports:

6 TB ports, with 2 sharing a single PCIe4 x4 (4 x 40 Gb/s + 2 x 32 Gb/s) + 2 x USB-A (2 x 5 Gb/s) + Ethernet (10 Gb/s) + HDMI (42 Gb/s) = 286 Gb/s

212 Gb/s external I/O max for data only:

6 TB4 ports (6 x 32 Gb/s) + 2 x USB-A (2 x 5 Gb/s) + Ethernet (10 Gb/s) = 212 Gb/s

[I ignored the audio jack and the UHS-II SDXC card slot, since they are <0.5 Gb/s combined]

But at the same time, if you want to configure a high-speed external connection, when you have the I/O scattered across multiple ports it's probably hard to max them all out. So in practice the disparity in max data I/O between the Studio and MP is likely significantly more than the above figures indicate.
Remember that the Thunderbolt controllers aren't connected with real PCIe, so 32 Gb/s is a ballpark figure. Maybe a Thunderbolt 3 port can do 4000 MB/s (40 Gbps is 5000 MB/s), but no one has seen those numbers, not even with the ASMedia ASM2464PD USB4v1 controller, which supports PCIe gen 4 x4 downstream. (Some benchmarks have shown strange results of over 5000 MB/s, which is impossible for 40 Gbps Thunderbolt or USB4v1.)
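The unit conversions in this exchange (32 Gb/s = 4000 MB/s, 40 Gbps = 5000 MB/s) also give a quick way to flag benchmark results that exceed what a link can physically carry. Here is a minimal sketch using decimal units (1 MB/s = 8 Mb/s). Protocol overhead is ignored, so this can only rule a result out, not confirm it. The 5200 MB/s value is a hypothetical stand-in for the "over 5000 MB/s" results mentioned.

```python
def gbps_to_mb_s(gbps: float) -> float:
    """Decimal units: 1 Gb/s = 1000 Mb/s = 125 MB/s."""
    return gbps * 1000 / 8


def mb_s_to_gbps(mb_s: float) -> float:
    return mb_s * 8 / 1000


def exceeds_link(measured_mb_s: float, link_gbps: float) -> bool:
    """True if a measured throughput is above the raw link rate, i.e. impossible."""
    return measured_mb_s > gbps_to_mb_s(link_gbps)


print(gbps_to_mb_s(32), gbps_to_mb_s(40))    # TB4 data estimate, raw 40 Gbps link
print(mb_s_to_gbps(2800), mb_s_to_gbps(3200))  # typical and best Thunderbolt results
print(exceeds_link(5200, 40))                  # hypothetical "over 5000 MB/s" result
```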

I don't think there's a difference between any of the Thunderbolt ports, so I don't see why you would have "(4 x 40 Gb/s + 2 x 32 Gb/s)" for data+video instead of (6 x 40), although it might be difficult to connect enough displays to fill all the ports. 6 HBR3 displays could do it, but the specs say you can only connect 8 HBR2 displays.

The ATTO Disk Benchmark.app lets you test multiple drives at the same time, no matter what they are connected to (SSD, USB, SD, Ethernet). So if you can connect a drive that fills the bandwidth of a single port, then you can compare the total of that with testing them together to see how it scales or to find other bottlenecks.
 

theorist9

macrumors 68040
May 28, 2015
3,710
2,812
The lanes I was thinking about were mostly for the Thunderbolt ports of the Mac Studio which do tunnelled PCIe to downstream Thunderbolt devices.

I'm not sure how the M2 Ultra in a Mac Studio relates to a M2 Ultra in a Mac Pro. The M2 Ultra in a Mac Pro has the 6 Thunderbolt 4 ports and 8 lanes for SSDs, but also adds 24 lanes of gen 4 PCIe? Is it a different chip than the one used in a Mac Studio, or is this unused PCIe I/O capability on the Mac Studio?
It's the same chip. According to Apple, the M2 Ultra has 32 PCIe4 lanes total, 8 of which are reserved for the SSD. I was thinking there's wasted PCIe I/O capability in the Studio because each TB4 port requires 4 lanes of PCIe4, which together have ≈63 Gb/s of bandwidth, only a portion of which can be used by the TB4 port. By contrast, with PCIe cards you can utilize PCIe's full bandwidth.

To give an extreme example, if you dedicate 16 lanes to this PCIe NVMe RAID controller, you get 224 Gb/s transfer speeds (at least that's their claim), far more than you'd get from 4 TB ports.

32 Gb/s is 4000 MB/s, but a gen 3 x4 NVMe drive usually doesn't do more than 3500 MB/s, and Thunderbolt usually doesn't do more than 2800 MB/s, though some benchmarks have shown 3200 MB/s.
Why are you referencing Gen 3 SSDs? Most higher-end consumer devices currently use Gen 4. For instance, here are the peak transfer rates measured by https://www.storagereview.com/review/wd-black-sn850x-ssd-review for the Gen 4 WD SN850X NVMe SSD. It uses 4 PCIe4 lanes to achieve ≈6000 MB/s (≈48 Gb/s) R/W. WD's marketing materials claim peak R/W of 7.3/6.6 GB/s (= 58/53 Gb/s) under ideal conditions, which is approaching PCIe4 x4's 63 Gb/s bandwidth.

[Screenshot: peak transfer rates from the StorageReview WD SN850X review]
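To put the SN850X figures next to the link limit, this sketch converts them to Gb/s and expresses each as a fraction of a PCIe4 x4 link (≈63 Gb/s after 128b/130b encoding). The MB/s values are the ones quoted in the post. Protocol overhead above the physical layer is ignored, so the real headroom is somewhat smaller.

```python
PCIE4_X4_GBPS = 16.0 * 128 / 130 * 4  # about 63.0 Gb/s

quoted_mb_s = {
    "StorageReview measured (approx.)": 6000,
    "WD claimed read": 7300,
    "WD claimed write": 6600,
}

for label, mb_s in quoted_mb_s.items():
    gbps = mb_s * 8 / 1000
    print(f"{label}: {gbps:.1f} Gb/s = {gbps / PCIE4_X4_GBPS:.0%} of PCIe4 x4")
```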

I don't think there's a difference with any of the Thunderbolt ports so I don't see why you would have "(4 x 40 Gb/s + 2 x 32 Gb/s)" for data+video instead of (6 x 40) although maybe it might be difficult to connect enough displays to fill all the ports. 6 HBR3 displays could do it but the specs say you can only connect 8 HBR2 displays.
I was trying to figure out how the 24 downstream PCIe lanes might be distributed among 6 x TB + 2 x USB-A + 10 Gb Ethernet + HDMI + SD, so that's what I came up with. If each of the six TB ports got PCIe4 x4, there would be no dedicated PCIe lanes left for the rest of the ports. Thus the HDMI port, for instance, would have to share a PCIe4 x4 link with one of the TB ports, and I didn't think there would be enough bandwidth to support both (plus I wasn't even sure such sharing was possible).
 

quarkysg

macrumors 65816
Oct 12, 2019
1,233
823
I was thinking there's wasted PCIe I/O capability in the Studio because each TB4 port requires 4 lanes of PCIe 4. And the latter has 64 Gb/s bandwidth, only a portion of which can be used by the TB4 port.
Likely because PCIe dictates x1, x2, x4, x8, x16, and x32 link widths. You can't choose arbitrary lane counts.
 

joevt

Contributor
Jun 21, 2012
6,689
4,086
Why are you referencing Gen 3 SSDs? Most higher-end consumer devices currently use Gen 4. For instance, here are the peak transfer rates measured by https://www.storagereview.com/review/wd-black-sn850x-ssd-review for the Gen 4 WD SN850X NVMe SSD. It uses 4 PCIe4 lanes to achieve ≈6000 MB/s (≈48 Gb/s) R/W. WD's marketing materials claim peak R/W of 7.3/6.6 GB/s (= 58/53 Gb/s) under ideal conditions, which is approaching PCIe4 x4's 63 Gb/s bandwidth.
I was discussing PCIe bandwidth for the Thunderbolt ports and gen 3 x4 is usually more than the limit of Thunderbolt 3/4.

I was trying to figure out how the 24 downstream PCIe lanes might be distributed among 6 x TB + 2 x USB-A + 10 Gb Ethernet + HDMI + SD, so that's what I came up with. If each of the six TB ports got PCIe4 x4, there would be no dedicated PCIe lanes left for the rest of the ports. Thus the HDMI port, for instance, would have to share a PCIe4 x4 link with one of the TB ports, and I didn't think there would be enough bandwidth to support both (plus I wasn't even sure such sharing was possible).
Thunderbolt doesn't use the 24 PCIe lanes. The Mac Pro 2023 specs say that Thunderbolt is separate from the PCIe lanes.
"Each built-in Thunderbolt port in Mac Pro is managed by its own controller integrated in the M2 Ultra chip and doesn't share bandwidth with the PCIe slots"
https://support.apple.com/en-euro/HT213663
So the question is, are those 24 PCIe lanes doing nothing in a Mac Studio?

HDMI comes from the GPU, not PCIe.

You don't need lots of bandwidth to use a PCIe controller. You can connect a 128 Gbps device to a 2.5 Gbps slot and it will still work; it will just be slower. You can connect a dozen and they'll all still work, but together they can't transfer more than 2.5 Gbps, because they have to share the upstream connection to the CPU. That Mac Pro 2023 document says you can shuffle devices between Pool A and Pool B to try to balance the two available upstream connections to the CPU.
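The sharing point above can be modeled in a few lines: devices behind one upstream link all work, but together they are capped by that link. This minimal sketch adds a naive greedy way to spread devices across two pools. The pool sizes follow Apple's Pool A (16 lanes) and Pool B (8 lanes) description quoted earlier. The card bandwidths and the greedy heuristic are hypothetical illustrations, not how macOS or Apple's tools actually assign devices.

```python
GEN4_LANE_GBPS = 16.0 * 128 / 130  # per-lane gen 4 bandwidth after encoding


def shared_throughput(demands_gbps: list[float], upstream_gbps: float) -> float:
    """Devices behind one upstream link all work, but together can't exceed it."""
    return min(sum(demands_gbps), upstream_gbps)


def assign_to_pools(demands_gbps: list[float], pool_gbps: dict[str, float]):
    """Greedy balancing: largest device first, into the pool with the most headroom."""
    load = {name: 0.0 for name in pool_gbps}
    placement = {}
    order = sorted(range(len(demands_gbps)), key=lambda i: demands_gbps[i], reverse=True)
    for i in order:
        best = max(pool_gbps, key=lambda name: pool_gbps[name] - load[name])
        load[best] += demands_gbps[i]
        placement[i] = best
    return placement, load


pools = {"A": 16 * GEN4_LANE_GBPS, "B": 8 * GEN4_LANE_GBPS}
cards = [126.0, 63.0, 63.0]  # hypothetical x8 and x4 card demands in Gb/s
placement, load = assign_to_pools(cards, pools)
total = sum(
    shared_throughput([cards[i] for i, p in placement.items() if p == pool], pools[pool])
    for pool in pools
)
print(placement, {k: round(v, 1) for k, v in load.items()}, round(total, 1))
print(shared_throughput([128.0] * 12, 2.5))  # a dozen fast devices on a 2.5 Gbps slot
```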
 

theorist9

macrumors 68040
May 28, 2015
3,710
2,812
Thunderbolt doesn't use the 24 PCIe lanes. The Mac Pro 2023 specs say that Thunderbolt is separate from the PCIe lanes.
"Each built-in Thunderbolt port in Mac Pro is managed by its own controller integrated in the M2 Ultra chip and doesn't share bandwidth with the PCIe slots"
https://support.apple.com/en-euro/HT213663
So the question is, are those 24 PCIe lanes doing nothing in a Mac Studio?
Ah, I see what you're saying. I'd originally interpreted that differently.

The Ultra Studio has 6 built-in TB ports, while the MP has 8. The MP uses PCIe for all the non-TB ports (other than HDMI, which you indicated is connected directly to the GPU). Thus, comparing the two machines for I/O, it seems we have:

M2 Ultra Studio: 6 x TB4 + Ethernet (10 Gb/s) + 2 x USB-A (2 x 5 Gb/s)
M2 Mac Pro: 8 x TB4 + 24 x PCIe4

Still, part of me is puzzled by this: If there really were such a substantial difference, you'd think Apple would emphasize that in their marketing materials for the MP (i.e., not just the far greater variety of possible interfaces b/c of PCIe cards, but the far greater I/O bandwidth as well). Particularly since they've struggled to differentiate its capabilities from those of the Ultra Studio.
 

joevt

Contributor
Jun 21, 2012
6,689
4,086
Ah, I see what you're saying. I'd originally interpreted that differently.

The Ultra Studio has 6 built-in TB ports, while the MP has 8. The MP uses PCIe for all the non-TB ports (other than HDMI, which you indicated is connected directly to the GPU). Thus, comparing the two machines for I/O, it seems we have:

M2 Ultra Studio: 6 x TB4 + Ethernet (10 Gb/s) + 2 x USB-A (2 x 5 Gb/s)
M2 Mac Pro: 8 x TB4 + 24 x PCIe4

Still, part of me is puzzled by this: If there really were such a substantial difference, you'd think Apple would emphasize that in their marketing materials for the MP (i.e., not just the far greater variety of possible interfaces b/c of PCIe cards, but the far greater I/O bandwidth as well). Particularly since they've struggled to differentiate its capabilities from those of the Ultra Studio.
The Ultra in either case is two Maxes fused together. A Max has four Thunderbolt 4 ports, so an Ultra can have eight, but Apple exposes the last two of those eight only on the Mac Pro. Is that for space reasons, or for additional product differentiation? ioreg output from an Ultra Mac Studio and a Mac Pro 2023 would help find the differences.
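One rough way to do that comparison: save a full registry dump on each machine (for example with `ioreg -l > studio.txt`), then compare how many objects of each class appear. A Python sketch, assuming the usual ioreg text layout in which each object line contains `<class ClassName, ...>`; the file names and the class-count approach are illustrative, not an established tool.

```python
import re
import sys
from collections import Counter

# Matches the class name in ioreg object lines such as:
#   +-o PCI0@0  <class IOPCIDevice, id 0x100000123, registered, ...>
CLASS_RE = re.compile(r"<class ([A-Za-z0-9_]+)")

def class_counts(dump_text: str) -> Counter:
    """Count how many registry objects of each class appear in a dump."""
    return Counter(CLASS_RE.findall(dump_text))

def class_diff(a_text: str, b_text: str) -> dict[str, int]:
    """Return {class: count_in_b - count_in_a} for classes whose counts differ."""
    a, b = class_counts(a_text), class_counts(b_text)
    return {name: b[name] - a[name]
            for name in sorted(set(a) | set(b))
            if b[name] != a[name]}

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python ioreg_diff.py studio.txt macpro.txt
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        for name, delta in class_diff(fa.read(), fb.read()).items():
            print(f"{delta:+d}  {name}")
```

Class counts only flag where to look; the matching subtrees still have to be read by hand.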
 

deconstruct60

macrumors G5
Mar 10, 2009
12,311
3,902
Thunderbolt doesn't use the 24 PCIe lanes. The Mac Pro 2023 specs say that Thunderbolt is separate from the PCIe lanes.
"Each built-in Thunderbolt port in Mac Pro is managed by its own controller integrated in the M2 Ultra chip and doesn't share bandwidth with the PCIe slots"
https://support.apple.com/en-euro/HT213663
So the question is, are those 24 PCIe lanes doing nothing in a Mac Studio?

Running those lanes has a thermal 'hit', so they are probably doing nothing in a Studio, which has less thermal headroom.
There are x1 PCIe v4 lanes for things like USB-A, Wi-Fi, and Ethernet that the Mini, iMac, and laptops use. The large block that only the Mac Pro uses can simply be shut down, or a die whose block is broken can even be binned into a Studio.

Same thing with the UltraFusion connector on the Max in the MBP 14/16 and the single-Max Studio: that block can be present but unused. It is a little wasteful, but not hugely so.
 

deconstruct60

macrumors G5
Mar 10, 2009
12,311
3,902
Ah, I see what you're saying. I'd originally interpreted that differently.

The Ultra Studio has 6 built-in TB ports, while the MP has 8. The MP uses PCIe for all the non-TB ports (other than HDMI, which you indicated is connected directly to the GPU). Thus, comparing the two machines for I/O, it seems we have:

All the M-series chips use one of the four x1 PCIe v4 links for the non-TB, non-video ports (e.g., USB 3-only Type-C/A).

iMac M1 teardown (step 10), red square in the photo:

"...

  • ASMedia ASM3142 PCIe-to-USB 3.1 Gen 2 controller

..."

[Image: iMac M1 teardown, step 10, with the ASM3142 controller highlighted]




If it's the same story for the Mx Pro and Mx Max, then the Ultra has about double the number of x1 'one-sies' dangling around. None of them are useful as backhaul to the dual-input slot switch in the Mac Pro.


M2 Ultra Studio: 6 x TB4 + Ethernet (10 Gb/s) + 2 x USB-A (2 x 5 Gb/s)
M2 Mac Pro: 8 x TB4 + 24 x PCIe4

Still, part of me is puzzled by this: If there really were such a substantial difference, you'd think Apple would emphasize that in their marketing materials for the MP (i.e., not just the far greater variety of possible interfaces b/c of PCIe cards, but the far greater I/O bandwidth as well). Particularly since they've struggled to differentiate its capabilities from those of the Ultra Studio.

The Mac Pro also has two Ethernet ports and two HDMI ports. So even before you get to the x16 + x8 PCIe v4, it has already doubled up on the Ultra Mac Studio. (The external USB-A count is the same, but the custom I/O add-in card is somewhat running out of space: headphone jack, two HDMI, two USB-A.) The six Thunderbolt ports basically fill up the edge on that card as well.

If you count the internal USB-A port (e.g., for a software dongle key), it has 1.5 times the number of USB-A ports. Again, more ports. And it has SATA ports, of which the Studio has ZERO.

All of these only require dipping into the 'excess' x1 PCIe v4 lanes lying around.
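As a rough check on that claim, compare each port's nominal interface rate with the usable bandwidth of one PCIe 4.0 x1 link (16 GT/s × 128/130 ≈ 15.75 Gb/s). A minimal Python sketch; the rates are nominal maxima, and giving each port its own x1 link is an assumption for illustration, not documented wiring.

```python
# Usable PCIe 4.0 bandwidth per lane: 16 GT/s with 128b/130b encoding.
PCIE4_X1_GBPS = 16.0 * 128 / 130

# Nominal interface rates (Gb/s) for the ports mentioned above.
PORTS_GBPS = {
    "10 Gb Ethernet": 10.0,
    "USB-A (5 Gb/s)": 5.0,
    "SATA 6 Gb/s": 6.0,
}

def fits_in_x1(rate_gbps: float) -> bool:
    """True if a single port's nominal rate fits in one PCIe 4.0 x1 link."""
    return rate_gbps <= PCIE4_X1_GBPS

if __name__ == "__main__":
    print(f"PCIe4 x1 ~ {PCIE4_X1_GBPS:.2f} Gb/s")
    for name, rate in PORTS_GBPS.items():
        print(f"{name}: {'fits' if fits_in_x1(rate) else 'does not fit'}")
```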


The Mac Pro is just physically bigger, so it has more edge space. The Studio has just 6 TB sockets partly because the Max version can only dribble out 4 (and its front ports fall back to USB). Apple isn't trying to draw a lot of "Thunderbolt" attention to those front sockets because in 'half' the configurations they are not even Thunderbolt ports at all (just a discrete USB 3 controller, not all that different from the one in the four-port iMac). [The Mini, iMac, and regular MP 2019 (or 2013) don't have front sockets. They're useful, but Apple has been rowing in the opposite direction for over a decade. I suspect it is not a coincidence that they came back after Ive left.]


Apple hasn't struggled to differentiate. They have just been lazy. They don't want to 'talk up' more Type-A ports (Type-C is the 'future'). They don't want to talk about SATA drives. It is 'lazy' in that they mostly leave it as "intuitively obvious": SATA drive lovers will see the SATA connectors and pick the Mac Pro out themselves. The internal dongle-key port... ditto. Same for someone with a pile of x8/x16 SSD cards, or for HDR 8K video capture.

I suspect they are also trying to duck the backlash from the "GPU cards are the only useful high bandwidth PCI-e cards on the planet" crowd. They have purposely targeted a subset of PCI-e cards and the folks that own those cards know what they need.


P.S. I don't know whether there has been a detailed examination of the M2 Ultra package, but I don't think it's clear whether UltraFusion plays a role in provisioning the x24 lanes. The Mac Pro's Ultra doesn't have to be the same package as the Studio's.
 

theorist9

macrumors 68040
May 28, 2015
3,710
2,812
So, bottom line, what's your calculation of the extra I/O bandwidth provided by the MP over the Ultra Studio (US)?

FWIW, Apple's documentation says that all the built-in non-TB ports in the MP together total 88% of Pool A's PCIe4 x8.

The I/O card is 2 x HDMI + 2 x USB-A; the internal ports are 2 x SATA6 + USB-A; and the other external ports are 2 x 10 Gb ethernet.

Combined, that would be 2 x HDMI 2.1 (2 x 48 Gb/s) + 2 x 10 Gb/s Ethernet + 2 x SATA6 (2 x 6 Gb/s) + 3 x 5 Gb/s USB-A + Wi-Fi 6E (9.6 Gb/s) + BT 5.3 (0.05 Gb/s) ≈ 153 Gb/s

But ≈153 Gb/s seems too high, since 88% of PCIe4 x8 should be 0.88 x 15.754 Gb/s x 8 ≈ 111 Gb/s.
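Redoing that arithmetic in code makes the gap explicit. A minimal Python sketch using the nominal rates listed above (these are interface/PHY maxima, not measured traffic). Since joevt noted earlier in the thread that HDMI comes from the GPU rather than PCIe, the sketch also prints the total with HDMI excluded.

```python
# (count, nominal Gb/s per port), as listed in the post above.
PORTS_GBPS: dict[str, tuple[int, float]] = {
    "HDMI 2.1": (2, 48.0),
    "10 Gb Ethernet": (2, 10.0),
    "SATA 6 Gb/s": (2, 6.0),
    "USB-A 5 Gb/s": (3, 5.0),
    "Wi-Fi 6E": (1, 9.6),
    "Bluetooth 5.3": (1, 0.05),
}

# PCIe 4.0 usable rate per lane (16 GT/s, 128b/130b) and Pool A's width.
PCIE4_LANE_GBPS = 16.0 * 128 / 130
POOL_A_LANES = 8

def total_gbps(ports: dict[str, tuple[int, float]], exclude: tuple = ()) -> float:
    """Sum count * rate over all port types not listed in `exclude`."""
    return sum(n * rate for name, (n, rate) in ports.items() if name not in exclude)

def pool_budget_gbps(fraction: float, lanes: int = POOL_A_LANES) -> float:
    """Bandwidth equal to `fraction` of a PCIe 4.0 link with `lanes` lanes."""
    return fraction * PCIE4_LANE_GBPS * lanes

if __name__ == "__main__":
    print(f"All listed ports: {total_gbps(PORTS_GBPS):.2f} Gb/s")
    print(f"Without HDMI: {total_gbps(PORTS_GBPS, exclude=('HDMI 2.1',)):.2f} Gb/s")
    print(f"88% of PCIe4 x8: {pool_budget_gbps(0.88):.2f} Gb/s")
```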


