> I can't find a single use case for 12,000 MB/s sequential reads and writes.
Can you find a use case for 750 MB/s sequential reads and writes? If you can, then just multiply that by 16 tasks running in parallel.
If you use a computer for processing data, you can always take advantage of higher performance. When you have many CPU cores and they are trying to access the disk at the same time, SSD speed can easily become a bottleneck.
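For anyone who wants to check the arithmetic, here is a minimal sketch of the "multiply by 16 tasks" argument. The function name is made up for illustration, and it assumes every task streams at the same rate with no contention.

```python
def aggregate_throughput_mb_s(tasks: int, per_task_mb_s: float) -> float:
    """Total sequential throughput if every task streams at the same rate.

    Assumption: tasks do not contend with each other, which is optimistic.
    """
    return tasks * per_task_mb_s


# 16 parallel tasks at 750 MB/s each, the figures used in the post above.
total = aggregate_throughput_mb_s(16, 750)
print(f"{total:,.0f} MB/s")
```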
> How many use cases could even saturate 16 x 750 MB/s reads & writes simultaneously?
It's easy to find such use cases if you drop the end-user mindset where a computer is a tool people use. Consider it a machine that processes data instead. When you have enough data, you just launch new tasks until you run out of CPU cores / RAM / I/O bandwidth / another important resource.
> Just wondering if/when Apple Silicon will support PCIe 5? By extension, when will Thunderbolt support PCIe 5 as well?
Likely with 3nm M3 in Q1 2024 with Thunderbolt 5 80Gbps.
> PCIe5 sequential read and write speeds don't really matter. You will still be bottlenecked by the much more important random read and write speeds, which aren't faster than on PCIe3 or PCIe4 SSDs to begin with.
> I can't find a single use case for 12,000 MB/s sequential reads and writes.
I agree that it's small random reads and writes that are most important in a boot drive. But, properly tuned and optimized, PCIe5 consumer SSDs should also offer significantly faster small random R/W's than is currently available from PCIe4. According to this review (https://www.storagereview.com/review/samsung-pm1743-ssd-review), we're seeing that now with enterprise SSDs (see screenshot).
> One question though: how will those PCIe5 speeds impact operating temperatures, and could these gains actually lead to not much different performance because of more thermal throttling?
Yes, it's possible. According to this preview (https://www.pcgamer.com/crucial-t700-pcie-5-ssd-preview/), that can be a problem for PCIe 5 SSDs: they can generate so much heat that it causes writes to slow significantly. It should be noted, though, that this was an engineering sample.
> It's easy to find such use cases if you drop the end-user mindset where a computer is a tool people use. Consider it a machine that processes data instead. When you have enough data, you just launch new tasks until you run out of CPU cores / RAM / I/O bandwidth / another important resource.
Apple doesn't sell machines to process data. They sell tools that people use. Very few people have unlimited data that they want to pump through the system without end. Apple designs its systems so that they are able to handle the use cases of their users, not as some abstract data pump with no purpose.
> But: Even if consumers don't need ≈12 GB/s I/O for external storage, there is one area where Apple probably will need such I/O speeds: external display support. If Apple wants its next XDR to be 6k@120 or 7k@120, it will probably need TB5.
> So I'm wondering: could Apple support TB5's 80 Gb/s bandwidth by using 5 x PCIe4 lanes for each TB port (5 x 16 Gb/s = 80 Gb/s)? [I.e., could it practically support TB5 at full bandwidth without needing PCIe5?]
> According to this, TB5 with PCIe4 x4 gives 4 x 16 = 64 Gb/s: https://www.anandtech.com/show/20050/intel-unveils-barlow-ridge-thunderbolt-5-controllers
> - "PCIe Gen4 x4 support (64 Gbps full duplex)"
> The Raptor Lake HX (mobile) refresh uses TB5 with PCIe4 x4, so it doesn't get the full 80 Gb/s bandwidth:
> "The biggest addition coming to Intel 14th Gen Core HX series laptops is that Intel is pushing vendors to include Thunderbolt 5/USB4v2 support. Because the platform itself doesn't natively integrate Thunderbolt 5 silicon, Intel is relying on their discrete Barlow Ridge Thunderbolt 5 controllers here, hanging it off of the PCIe 4.0 lanes coming from the processor itself. Barlow Ridge uses a PCIe 4.0 x4 link for backhaul purposes – and for those of you doing the math at home, no, that's not enough bandwidth to saturate a TB5 connection. Ultimately, TB5's high bandwidth is meant to afford a combination of data and graphics (DisplayPort 2.1), so the data portion alone does not need to be able to fully saturate the entire link."
PCIe is separate from Thunderbolt.
> PCIe is separate from Thunderbolt.
> A discrete Thunderbolt host controller such as Titan Ridge or Alpine Ridge is limited to PCIe gen 3 x4 upstream to the CPU and 40 Gbps downstream to other Thunderbolt devices.
> An integrated Thunderbolt host controller, such as in Ice Lake, Tiger Lake, or Apple Silicon, is not limited to PCIe gen 3 x4 upstream to the CPU because it is integrated into the CPU.
> A Thunderbolt 3/4 peripheral controller is limited to 40 Gbps upstream to the host or upstream Thunderbolt devices, 40 Gbps downstream to other Thunderbolt devices, and PCIe gen 3 x4 downstream to PCIe devices.
> There now exist USB4 peripheral controllers that are limited to PCIe gen 4 x4 downstream to PCIe devices.
> https://www.asmedia.com.tw/product/802zX91Yw3tsFgm4/C64ZX59yu4sY1GW5
> ASMedia also has a USB4 host controller that is able to do PCIe gen 4 x4 upstream.
> https://www.asmedia.com.tw/product/e20zx49yU0SZBUH5/363Zx80yu6sY3XH2
> So if Apple wants to support Thunderbolt 5, it will be an integrated Thunderbolt 5 host controller. It will not use PCIe to connect to the CPU because it is inside the CPU. Apple can choose any speed connection from the CPU to the integrated Thunderbolt controller.
> Regarding PCIe links, these can be 1, 2, 4, 8, or 16 lanes. So your proposed discrete Thunderbolt controller would be PCIe gen 4 x8 instead of x5. There's no reason a Thunderbolt host or peripheral controller couldn't use 8 or more PCIe lanes. The Thunderbolt controller is a bridge chip, which means the upstream link doesn't need to be the same as the downstream link. Consider: the upstream can be PCIe with 1, 2, or 4 lanes (for a discrete Thunderbolt host controller), a special CPU link (for an integrated Thunderbolt host controller), or Thunderbolt 10/20/40/80 Gbps (for a peripheral Thunderbolt controller). The downstream can be PCIe gen 3 or gen 4 with 1, 2, or 4 lanes, or it can be Thunderbolt 1, 2, 3, 4, or 5.
I think I understand the part between the die and the TB host controller; here's my summary:
> You need some way for the CPU to interface with the TB host controller. If, as with AS, the controller is on-die, there is no need for PCIe.
> OTOH, if the controller is discrete, as is the case with the Barlow Ridge TB5 controller in Intel Gen 14 HX systems, then you need a way to interface the CPU and controller, and that is typically (always?) done with PCIe.
Correct. You could have a controller chip that connects to a CPU using something other than PCIe, such as HyperTransport.
> In the latter case, if you had PCIe5, 4 lanes would be sufficient to get 80 Gbps. But if you only had PCIe4 (as is the case for those HX chips), then 4 lanes would get you only 64 Gbps. To get 80 Gbps you'd need 8 lanes, because while you can aggregate 2 x (PCIe4 x4) to get PCIe4 x8, you can't aggregate (PCIe4 x4) + (PCIe4 x1) to get PCIe4 x5, because the standard only allows for x1, x2, x4, x8, and x16.
I don't think you should think of it as aggregating two x4 to get x8. Rather, you have n lanes in a PCIe host controller, and you can divide them up into ports that are 1, 2, 4, 8, 16, or 32 lanes wide.
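The lane-count reasoning above can be sketched in a few lines. This is only an illustration: the function and table names are hypothetical, the per-lane figures assume 128b/130b encoding (which is where the ≈15.75 Gb/s per gen 4 lane comes from), and the candidate widths are just the ones named in this thread.

```python
# Approximate usable Gb/s per lane, assuming 128b/130b encoding (gen 3 and later).
PER_LANE_GBPS = {
    3: 8 * 128 / 130,   # 8 GT/s raw
    4: 16 * 128 / 130,  # 16 GT/s raw
    5: 32 * 128 / 130,  # 32 GT/s raw
}

# Link widths mentioned in the thread; x5 is not one of them.
LINK_WIDTHS = (1, 2, 4, 8, 16)


def min_link_width(target_gbps, gen):
    """Smallest listed link width whose bandwidth meets the target, else None."""
    for width in LINK_WIDTHS:
        if width * PER_LANE_GBPS[gen] >= target_gbps:
            return width
    return None


# A discrete controller feeding an 80 Gb/s TB5 port:
print("gen 4:", min_link_width(80, 4), "lanes")
print("gen 5:", min_link_width(80, 5), "lanes")
```

The point of the table lookup is that you pick the next legal width at or above what you need, rather than adding lanes one at a time.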
> 1) Does Apple's on-die integration of the TB host controller delay its ability to implement newer standards? For instance, suppose Intel's TB5 specs weren't finalized when Apple completed its design of M4. That seems to mean we won't see TB5 until at least M5. By comparison, because Intel uses discrete controllers, it is able to offer TB5 with its current Gen 14 chips, as an add-on, even though the final TB5 specs likely weren't available when the Gen 14 design was finalized.
I don't know the timing. TB5 is mostly just USB4v2. It builds on TB4, which is mostly USB4. I suppose the USB4v2 spec was worked on by multiple companies/stakeholders. Any of them could be developing hardware (integrated into a CPU or discrete) to test various ideas even before the specs are finalized.
> 2) Given that M3 currently has TB4 host controllers baked into the die, and given their current Display Engines, is there any way Apple could support 6k@120 to 8k@120 over a single TB cable with M3 (without using DSC with dual-tile HBR3, which you and I determined would be theoretically sufficient to support 6.7k@120)? For instance, suppose Apple wanted to release a 7k@120 version of the XDR along with the M3 Studio. Could they aggregate the output of two TB4 controllers to output 80 Gbps through a single USB-C port?
Are you asking for a single tile method or a non-DSC method to get 6K+120?
> Here's a PCI-PCI bridge (PCIe Gen3 switch) with 96 lanes (used by Mac Pro 2019):
> https://docs.broadcom.com/doc/12351860
> The lanes can be divided into 24 ports. Each port can have 4, 8, or 16 lanes. Up to 4 ports can be upstream (the Mac Pro uses two x16 upstream ports). The rest are downstream ports.
Would you happen to know the total I/O bandwidth of the M2 Ultra Studio and Mac Pro? I've found it hard to get clear info on that. I'm specifically curious if the direct PCIe access afforded by the MP translates into more total I/O bandwidth than is available from the Ultra Studio.
> Are you asking for a single tile method or a non-DSC method to get 6K+120?
I was asking for any method that didn't require compression beyond 12 bpp, since you've said that's what Apple uses now. But I think you addressed that.
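To put rough numbers on the 6K@120 question, here is a back-of-the-envelope sketch. It assumes the current Pro Display XDR resolution of 6016 x 3384, treats "uncompressed" as 30 bits per pixel (10 bits per channel, my assumption), uses the 12 bpp DSC target mentioned above, and ignores blanking and link-layer overhead, so real requirements would be somewhat higher. The function name is made up.

```python
def video_gbps(width, height, refresh_hz, bits_per_pixel):
    """Pixel data rate in Gb/s, ignoring blanking and link-layer overhead."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


XDR_6K = (6016, 3384)  # Pro Display XDR resolution

# Assumption: 10 bits per channel, i.e. 30 bits per pixel when uncompressed.
uncompressed = video_gbps(*XDR_6K, 120, 30)
# DSC compressed to 12 bits per pixel, the figure discussed in the thread.
dsc_12bpp = video_gbps(*XDR_6K, 120, 12)

print(f"6K@120 uncompressed: {uncompressed:.1f} Gb/s")
print(f"6K@120 at 12 bpp:    {dsc_12bpp:.1f} Gb/s")
```

Under those assumptions the uncompressed stream is well beyond a 40 Gb/s link, while the 12 bpp stream is not, which is why the DSC question matters so much here.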
> Would you happen to know the total I/O bandwidth of the M2 Ultra Studio and Mac Pro? I've found it hard to get clear info on that. I'm specifically curious if the direct PCIe access afforded by the MP translates into more total I/O bandwidth than is available from the Ultra Studio.
I don't know.
> The M2 Ultra Studio has 6 Thunderbolt 4 ports. Let's say 6 * 4 = 24 PCIe gen 3 lanes (although Thunderbolt is usually limited to ≈24 Gbps). + 8 for SSD. 252 Gbps?
> RAM is ≈800 GB/s.
Ah, I just found this for the M2 Ultra in the MP! Your guess was right (32 lanes, with 8 dedicated to the SSD):
> Ah, I just found this for the M2 Ultra in the MP! Your guess was right (32 lanes, with 8 dedicated to the SSD):
> PCIe bandwidth
> The M2 Ultra chip provides 32 lanes of PCIe gen 4 to the system, with 8 lanes dedicated to the internal SSD. The M2 Ultra chip connects to the PCIe slots through a PCIe switch and provides 24 lanes of gen 4 bandwidth. Pool A provides a maximum of 16 lanes of gen 4 bandwidth and Pool B provides a maximum of 8 lanes of gen 4 bandwidth.
The lanes I was thinking about were mostly for the Thunderbolt ports of the Mac Studio, which do tunnelled PCIe to downstream Thunderbolt devices.
> It's PCIe4, so 24 lanes (not incl. SSD) x 15.754 Gb/s/lane = 378 Gb/s external I/O (I'm counting the internal USB-A and SATA ports as part of external I/O, since I mean external to the SoC).
> I thought TB4 maxed out at 32 Gb/s for data:
32 Gb/s is 4000 MB/s, but a gen 3 x4 NVMe usually doesn't do more than 3500 MB/s, and Thunderbolt usually doesn't do more than 2800 MB/s, though some benchmarks have shown 3200 MB/s.
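The lane totals and unit conversions being traded back and forth can be reproduced with a short sketch. The helper names are mine, and the per-lane rate assumes 128b/130b encoding on top of 8 GT/s (gen 3) or 16 GT/s (gen 4); these are theoretical link rates, not what a real SSD or Thunderbolt device will deliver.

```python
def pcie_lane_gbps(gen):
    """Approximate usable Gb/s per PCIe lane, assuming 128b/130b encoding."""
    raw_gt_s = {3: 8.0, 4: 16.0}[gen]
    return raw_gt_s * 128 / 130


def gbps_to_mb_s(gbps):
    """Convert gigabits per second to megabytes per second (decimal units)."""
    return gbps * 1000 / 8


# 24 gen 4 lanes to the Mac Pro slots (the 378 Gb/s figure in the thread).
print(f"24 x gen 4 lanes: {24 * pcie_lane_gbps(4):.0f} Gb/s")
# 32 gen 3 lanes, the earlier guess for the Studio (the 252 Gbps figure).
print(f"32 x gen 3 lanes: {32 * pcie_lane_gbps(3):.0f} Gb/s")
# Link rates expressed in MB/s.
print(f"32 Gb/s = {gbps_to_mb_s(32):.0f} MB/s, 40 Gb/s = {gbps_to_mb_s(40):.0f} MB/s")
```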
> If so, by comparison, here's my attempt to estimate what the Ultra Studio offers:
> 286 Gb/s external I/O max for video + data. I'm assuming two of the TB ports are sharing a single 64 Gb/s PCIe4 x4, and the remaining PCIe4 x4 is shared by the remaining ports:
> 6 TB ports, with 2 sharing a single PCIe4 x4 (4 x 40 Gb/s + 2 x 32 Gb/s) + 2 x USB-A (2 x 5 Gb/s) + Ethernet (10 Gb/s) + HDMI (42 Gb/s) = 286 Gb/s
> 212 Gb/s external I/O max for data only:
> 6 TB4 ports (6 x 32 Gb/s) + 2 x USB-A (2 x 5 Gb/s) + Ethernet (10 Gb/s) = 212 Gb/s
> [I ignored the audio jack and the UHS-II SDXC slot, since they are <0.5 Gb/s combined]
> But at the same time, if you want to configure a high-speed external connection, when you have the I/O scattered across multiple ports it's probably hard to max them all out. So in practice the disparity in max data I/O between the Studio and MP is likely significantly more than the above figures indicate.
Remember that the Thunderbolt controllers aren't connected with real PCIe, so the 32 Gb/s is a ballpark figure. Maybe a Thunderbolt 3 port can do 4000 MB/s (40 Gbps is 5000 MB/s), but nothing has seen those numbers, not even with the ASMedia ASM2464PD USB4v1 controller, which supports PCIe gen 4 x4 downstream (though there have been some strange benchmark results of over 5000 MB/s, which is impossible for 40 Gbps Thunderbolt or USB4v1).
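The two Studio estimates quoted above are just sums of port counts times per-port rates, so they are easy to re-run with different assumptions. In this sketch the per-port numbers are copied from the quoted post (including its assumed 42 Gb/s for HDMI and its guess that two TB ports top out at 32 Gb/s); the dictionary layout and function name are mine.

```python
# (port count, Gb/s per port), figures as assumed in the quoted estimate.
VIDEO_AND_DATA = {
    "TB at 40 Gb/s": (4, 40),
    "TB sharing a PCIe4 x4": (2, 32),
    "USB-A": (2, 5),
    "Ethernet": (1, 10),
    "HDMI": (1, 42),
}

DATA_ONLY = {
    "TB4 data": (6, 32),
    "USB-A": (2, 5),
    "Ethernet": (1, 10),
}


def total_gbps(ports):
    """Sum count x rate over every port type."""
    return sum(count * rate for count, rate in ports.values())


print("video + data:", total_gbps(VIDEO_AND_DATA), "Gb/s")
print("data only:   ", total_gbps(DATA_ONLY), "Gb/s")
```

Swapping the first two entries for a single (6, 40) entry gives the "6 x 40" variant raised in the reply.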
> The lanes I was thinking about were mostly for the Thunderbolt ports of the Mac Studio, which do tunnelled PCIe to downstream Thunderbolt devices.
> I'm not sure how the M2 Ultra in a Mac Studio relates to an M2 Ultra in a Mac Pro. The M2 Ultra in a Mac Pro has the 6 Thunderbolt 4 ports and 8 lanes for SSDs, but also adds 24 lanes of gen 4 PCIe? Is it a different chip than the one used in a Mac Studio, or is this unused PCIe I/O capability on the Mac Studio?
It's the same chip. According to Apple, the M2 Ultra has 32 PCIe4 lanes total, 8 of which are reserved for the SSD. I was thinking there's wasted PCIe I/O capability in the Studio because each TB4 port requires 4 lanes of PCIe 4. And the latter has 63 Gb/s bandwidth, only a portion of which can be used by the TB4 port. By contrast, with PCIe cards, you can utilize PCIe's full bandwidth.
> 32 Gb/s is 4000 MB/s, but a gen 3 x4 NVMe usually doesn't do more than 3500 MB/s, and Thunderbolt usually doesn't do more than 2800 MB/s, though some benchmarks have shown 3200 MB/s.
Why are you referencing Gen 3 SSDs? Most higher-end consumer devices currently use Gen 4. For instance, here are the peak transfer rates measured by https://www.storagereview.com/review/wd-black-sn850x-ssd-review for the Gen 4 WD SN850X NVMe SSD. These use 4 PCIe4 lanes to achieve ≈6000 MB/s (≈48 Gb/s) R/W. WD's marketing materials claim peak R/W of 7.3/6.6 GB/s (= 58/53 Gb/s) under ideal conditions, which is approaching PCIe4 x4's 63 Gb/s bandwidth.
> I don't think there's a difference with any of the Thunderbolt ports, so I don't see why you would have "(4 x 40 Gb/s + 2 x 32 Gb/s)" for data+video instead of (6 x 40), although it might be difficult to connect enough displays to fill all the ports. 6 HBR3 displays could do it, but the specs say you can only connect 8 HBR2 displays.
I was trying to figure out how the 24 downstream PCIe lanes might be distributed among 6 x TB + 2 x USB-A + 10 Gb Ethernet + HDMI + SD, so that's what I came up with. If each of the six TB ports got PCIe4 x4, there would be no dedicated PCIe lanes left for the rest of the ports. Thus the HDMI port, for instance, would have to share one set of PCIe4 x4 lanes with one of the TB ports, and I didn't think there would be enough bandwidth to support both (plus I wasn't even sure if such sharing were possible).
> I was thinking there's wasted PCIe I/O capability in the Studio because each TB4 port requires 4 lanes of PCIe 4. And the latter has 64 Gb/s bandwidth, only a portion of which can be used by the TB4 port.
Likely because PCIe dictates 1x, 2x, 4x, 8x, 16x and 32x. You cannot choose arbitrary lane counts.
> Likely because PCIe dictates 1x, 2x, 4x, 8x, 16x and 32x. You cannot choose arbitrary lane counts.
Yes, that was discussed above in posts #34–36.
> Why are you referencing Gen 3 SSDs? Most higher-end consumer devices currently use Gen 4. For instance, here are the peak transfer rates measured by https://www.storagereview.com/review/wd-black-sn850x-ssd-review for the Gen 4 WD SN850X NVMe SSD. These use 4 PCIe4 lanes to achieve ≈6000 MB/s (≈48 Gb/s) R/W. WD's marketing materials claim peak R/W of 7.3/6.6 GB/s (= 58/53 Gb/s) under ideal conditions, which is approaching PCIe4 x4's 63 Gb/s bandwidth.
I was discussing PCIe bandwidth for the Thunderbolt ports, and gen 3 x4 is usually more than the limit of Thunderbolt 3/4.
> I was trying to figure out how the 24 downstream PCIe lanes might be distributed among 6 x TB + 2 x USB-A + 10 Gb Ethernet + HDMI + SD, so that's what I came up with. If each of the six TB ports got PCIe4 x4, there would be no dedicated PCIe lanes left for the rest of the ports. Thus the HDMI port, for instance, would have to share one set of PCIe4 x4 lanes with one of the TB ports, and I didn't think there would be enough bandwidth to support both (plus I wasn't even sure if such sharing were possible).
Thunderbolt doesn't use the 24 PCIe lanes. The Mac Pro 2023 specs say that Thunderbolt is separate from the PCIe lanes.
> Thunderbolt doesn't use the 24 PCIe lanes. The Mac Pro 2023 specs say that Thunderbolt is separate from the PCIe lanes.
> "Each built-in Thunderbolt port in Mac Pro is managed by its own controller integrated in the M2 Ultra chip and doesn't share bandwidth with the PCIe slots"
> https://support.apple.com/en-euro/HT213663
> So the question is, are those 24 PCIe lanes doing nothing in a Mac Studio?
Ah, I see what you're saying. I'd originally interpreted that differently.
> Ah, I see what you're saying. I'd originally interpreted that differently.
> The Ultra Studio has 6 built-in TB ports, while the MP has 8. The MP uses PCIe for all the non-TB ports (other than HDMI, which you indicated is connected directly to the GPU). Thus, comparing the two machines for I/O, it seems we have:
> M2 Ultra Studio: 6 x TB4 + Ethernet (10 Gb/s) + 2 x USB-A (2 x 5 Gb/s)
> M2 Mac Pro: 8 x TB4 + 24 x PCIe4
> Still, part of me is puzzled by this: If there really were such a substantial difference, you'd think Apple would emphasize that in their marketing materials for the MP (i.e., not just the far greater variety of possible interfaces b/c of PCIe cards, but the far greater I/O bandwidth as well). Particularly since they've struggled to differentiate its capabilities from those of the Ultra Studio.
The Ultra in either case is two Maxes fused together. A Max has four Thunderbolt 4 ports, so an Ultra can have eight, but Apple exposes the last two of the eight ports only on the Mac Pro? Perhaps for space reasons, or for additional product differentiation.
ioreg output from an Ultra Mac Studio and a Mac Pro 2023 would help find the differences.
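If someone does capture ioreg output from both machines, comparing the dumps could look something like the sketch below. It is only an illustration: the function name and file labels are hypothetical, the dumps are assumed to be plain text (for example saved with `ioreg -l -w 0 > dump.txt` on each Mac), and the two sample strings are placeholders, not real ioreg output.

```python
import difflib


def diff_ioreg_dumps(studio_text, pro_text):
    """Return unified-diff lines between two ioreg text dumps."""
    return list(
        difflib.unified_diff(
            studio_text.splitlines(),
            pro_text.splitlines(),
            fromfile="mac_studio_ioreg.txt",  # hypothetical file label
            tofile="mac_pro_ioreg.txt",       # hypothetical file label
            lineterm="",
        )
    )


# Placeholder text standing in for real dumps read from disk.
studio_dump = "shared entry\nstudio-only entry"
pro_dump = "shared entry\npro-only entry"

for line in diff_ioreg_dumps(studio_dump, pro_dump):
    print(line)
```

Lines starting with "-" appear only in the Studio dump and lines starting with "+" only in the Mac Pro dump, which is where any extra PCIe or Thunderbolt controllers would show up.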
> All the M-series chips use one of the four x1 PCI-e v4 links for the non-TB/non-video ports (e.g. USB 3 only Type C/A).
> iMac M1 teardown (step 10), red square:
> " ...
> - ASMedia ASM3142 PCIe-to-USB 3.1 Gen 2 controller
> ..."
> iMac M1 24" Teardown (www.ifixit.com)
> If it's the same stuff, different day for the Mx Pro and Mx Max, then the Ultra has about double the number of 'one-sies' dangling around. None of that is useful for backhaul to the dual-input slot switch in the Mac Pro.
> The Mac Pro also has two Ethernet ports and two HDMI ports. Even before you get to the "x16+x8 PCI-e v4", it has already doubled up on the Mac Studio Ultra. (The external USB-A count is the same, but they are somewhat running out of space on the custom I/O add-in card: headphone jack, two HDMI, two USB-A.) The six Thunderbolt ports basically fill up the edge on that card also.
> If you count the internal USB-A slot (e.g., for a software dongle key), it has 1.5x the number of USB-A ports. Again, more ports. And it has SATA ports, which the Studio has ZERO of.
> All of these only require dipping into the 'excess' x1 PCI-e v4 lanes lying around.
> The Mac Pro is just physically bigger, so it has more edge space. The Studio has just 6 TB sockets in part because the Max version can only dribble out 4 (and the front ports backslide to USB). They are not trying to draw tons of "Thunderbolt" attention to those front slots because in 'half' the configurations they are not even Thunderbolt slots at all (just a discrete USB 3 controller, not all that much different from the one on the iMac 4-port model). [The Mini, iMac, and regular MP 2019 (or 2013) don't have front sockets. They are useful, but Apple has been rowing in the opposite direction for over a decade. I suspect it is not a coincidence that they came back after Ive left.]
> Apple hasn't struggled to differentiate. They have just been lazy. They don't want to 'talk up' more Type-A ports (Type-C is the 'future'). They don't want to talk about SATA drives. It is 'lazy' in that they are leaving it mostly to the "intuitively obvious": SATA drive lovers will see the SATA connector and pick it out themselves. The internal key dongle thing, ditto. Someone with a pile of x8-x16 SSD cards, or doing HDR 8K video capture, ditto.
> I suspect they are also trying to duck the backlash from the "GPU cards are the only useful high bandwidth PCI-e cards on the planet" crowd. They have purposely targeted a subset of PCI-e cards, and the folks that own those cards know what they need.
> P.S. I don't know if there has been a detailed examination of the M2 Ultra package, but I don't think it is clear whether UltraFusion plays a role in provisioning the x24 lanes or not. It doesn't have to be the same Ultra package.
So, bottom line, what's your calculation of the extra I/O bandwidth provided by the MP over the Ultra Studio (US)?