Status
Not open for further replies.

Stacc

macrumors 6502a
Jun 22, 2005
888
353
That Ellesmere DS, and Ellesmere XT will be on GTX 980 Ti level of performance.

Pascal still seems to bring higher performance, ultimately. 2560 CUDA cores clocked at 1.48 GHz would bring 7.6 TFLOPs of compute power.

The question is: in what thermal envelope, and what price...

The performance being in the same ballpark as the 390X seems reasonable. Offering HBM and GDDR5(X) on the same GPU doesn't make any sense, though. They would essentially be different chips, as the memory controllers would be very different. Remember that you only need more memory bandwidth as you scale the performance of the GPU up. If this is targeting 390X performance, then a 256-bit memory controller with GDDR5X would give it the same bandwidth as the 390X. With the compression that has been implemented since Hawaii, it should have plenty of bandwidth to meet or beat the 390X in performance. Also, given that Polaris is positioned as a "mainstream" part, HBM is likely too expensive.
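To put rough numbers on the bandwidth point, here is a minimal sketch. The 390X figures (512-bit bus, 6 Gbps per pin) are its shipping specs; the 12 Gbps GDDR5X rate is only an assumption for illustration, since no Polaris memory speed has been confirmed.

```python
def bandwidth_gb_per_s(bus_width_bits, data_rate_gbps_per_pin):
    """Peak memory bandwidth in GB/s: bus width times per-pin rate, over 8 bits per byte."""
    return bus_width_bits * data_rate_gbps_per_pin / 8

# R9 390X: 512-bit GDDR5 at 6 Gbps per pin.
r9_390x = bandwidth_gb_per_s(512, 6.0)

# Hypothetical Polaris config: 256-bit GDDR5X at an ASSUMED 12 Gbps per pin.
polaris_guess = bandwidth_gb_per_s(256, 12.0)

print(f"390X: {r9_390x:.0f} GB/s, 256-bit GDDR5X guess: {polaris_guess:.0f} GB/s")
```

Half the bus width at double the per-pin rate lands on the same peak figure, before any compression gains.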

I am interested to see if AMD keeps Polaris in the rumored 125 W to 150 W envelope. I am not sure why they wouldn't offer a Polaris chip with higher clocks at > 200 W. Maybe something like 480 (100 W), 480X (125 W), 480X "Super Ultra 1337 Gamerz Ghz Edition" (~200 W). If the 480/X is stuck around 390X performance, then this way they could replace the Fury lineup as well and have some hope of competing with whatever Nvidia is about to release.
 

koyoot

macrumors 603
Jun 5, 2012
5,939
1,853
Why does nobody consider the R9 390X to be on the same level of performance as the GTX 980 Ti, when the R9 390X is currently faster in DX12 than the GTX 980 Ti and has similar compute power?

Secondly, they do not need to release a gaming edition with higher clocks. Polaris will have a new scheduler that will be compatible with DX11 and DX12. DX11, because it is a serial API, will always create pipeline stalls and underutilize a wide architecture like GCN. The new scheduler is supposed to improve this a bit, but there might be a new hardware feature that will change a lot. Remember the link to the AMD patent on technology that power gates unused blocks of the GPU? Yeah, you know what I am talking about here. The GPU will power gate its unused parts and boost the parts that are executing a task. How high? 1.5-1.6 GHz. Also, the scheduler looks to be designed to schedule tasks to parts in a linear way to create full utilization of the GPU.

So expect a 2816 GCN core GPU with a 1.5 GHz boost clock. In a DX12 scenario the GPU will be 100% utilized regardless, and the clock will most likely be in the range between 1100 and 1200 MHz.

Yes, even HBM1 is possible on a mainstream part. But it may be even more different than Ellesmere. Think about this: a 3072 GCN core GPU clocked at 1250 MHz, with 4 GB of HBM1 and a 125 W TDP. A direct competitor in terms of compute power to Pascal GP104 (both would have about 7.6 TFLOPs of compute power). But AMD would have much lower power consumption and better hardware features. And a much lower price.
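For anyone who wants to check the compute figures, a quick sketch of the usual peak FP32 estimate: cores times 2 FLOPs per clock (one fused multiply-add) times clock speed. The core counts and clocks are the rumored ones from this thread, not confirmed specs, and the function name is just illustrative.

```python
def fp32_tflops(shader_cores, clock_ghz):
    """Peak FP32 throughput in TFLOPs, assuming one FMA (2 FLOPs) per core per clock."""
    return shader_cores * 2 * clock_ghz / 1000

# Rumored Pascal GP104 configuration discussed in this thread.
gp104 = fp32_tflops(2560, 1.48)

# Hypothetical GCN part from this post.
gcn_3072 = fp32_tflops(3072, 1.25)

print(f"GP104 rumor: {gp104:.2f} TFLOPs, 3072-core GCN: {gcn_3072:.2f} TFLOPs")
```

The two land within about 0.1 TFLOPs of each other, which is where the shared "7.6" figure comes from.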

Now you may see why AMD would want that type of GPU ;).

P.S. The message is simple: AMD is going all-in on gaming, VR, and the computing world with the best hardware and open source initiatives.

P.S. 2 Our discussion about Polaris may be pointless because Apple can still use Fiji in the Mac Pro ;).
Edit: Guys, you will be blown away by this.

http://browser.primatelabs.com/geekbench3/6279282

A computer with an engineering sample of an AMD CPU, running OS X.
 

Stacc

macrumors 6502a
Jun 22, 2005
888
353
While I still don't believe Polaris 10 will get HBM, if they are limited to HBM1 they could get up to 8 GB by using 8 stacks instead of the 4 used on Fiji. It's possible, since Polaris 10 will be much smaller and they could fit more stacks on the interposer. This is unlikely though, since it would have way too much bandwidth for the expected performance, and that many HBM stacks would be costly.
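A rough sketch of why 8 stacks would be overkill, using the first-generation HBM figures as shipped on Fiji: 1 GB per stack and a 1024-bit interface at 1 Gbps per pin, so 128 GB/s per stack. The 8-stack configuration is purely hypothetical.

```python
HBM1_GB_PER_STACK = 1            # capacity of one HBM1 stack, as used on Fiji
HBM1_BUS_BITS_PER_STACK = 1024   # interface width of one stack
HBM1_GBPS_PER_PIN = 1.0          # per-pin data rate

def hbm1_config(stacks):
    """Return (capacity in GB, peak bandwidth in GB/s) for a number of HBM1 stacks."""
    capacity_gb = stacks * HBM1_GB_PER_STACK
    bandwidth = stacks * HBM1_BUS_BITS_PER_STACK * HBM1_GBPS_PER_PIN / 8
    return capacity_gb, bandwidth

fiji = hbm1_config(4)         # Fiji's actual layout
eight_stack = hbm1_config(8)  # hypothetical 8-stack layout

print(f"4 stacks: {fiji[0]} GB at {fiji[1]:.0f} GB/s")
print(f"8 stacks: {eight_stack[0]} GB at {eight_stack[1]:.0f} GB/s")
```

Eight stacks would double Fiji's bandwidth for a chip expected to perform well below Fiji.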

Is there any reason to think that Polaris 10 will be a 2816 core part? I thought all the rumors pointed to a 2560 core part and that faster clocks and architectural improvements would make up for the difference in the number of cores from Hawaii.

Geekbench results are a pretty bad source of rumors, since you can make up all of that information. Many false entries in there come from Hackintoshes. Even if it's true, that AMD CPU has terrible performance.
 

ManuelGomes

macrumors 68000
Original poster
Dec 4, 2014
1,617
354
Aveiro, Portugal
Polaris is rumored to have both HBM and GDDR5(X) memory controllers, but different boards are required for each model. Whether this is true or not, your guess is as good as mine. Does it make sense? Not really, since this is aimed at mainstream only, and silicon real estate is expensive, so GDDR5(X) would be the logical choice.
Still, as a gaming-only card... but that would go against being a D710, wouldn't it?
Does it also imply that XT has a 384-bit memory bus?
I wouldn't trust that info a lot!!
But Polaris could indeed have "hidden" features that weren't leaked, maybe for an XTX model?
We'll have to wait for a leaked shot of the die to see the layout and do some math.

Mac3,1? Hack? Apple board?
 

ManuelGomes

macrumors 68000
Original poster
Dec 4, 2014
1,617
354
Aveiro, Portugal
koyoot, those have been in the wild for some time, but it might end up not being the real deal.
The coming days are going to be amazing, and stressful, until we get to see the final specs from both camps, and the benchmarks that will ignite some flames around here :)
Since gaming is not my ballpark, I couldn't care less which one will get you more fps. I would like to see great cards in the nMP, though.
 

Mago

macrumors 68030
Aug 16, 2011
2,789
912
Beyond the Thunderdome
If you look at the GPGPU result it's 1:3. Although I'm not sure why the shader compute would be 1:16 and the GPGPU result would be 1:3.
I see these ranks and then I'm hungry for salt.

1:3 fp64 isn't for a consumer GPU; it's for a server compute GPGPU. Even the most optimistic AMD fan doubts this could come from (consumer) Polaris.

1:16 fp64 is the other extreme, making it worthless for compute.

Unless it provides a setting that enables fp64 by reorganizing the cores.
 

Stacc

macrumors 6502a
Jun 22, 2005
888
353
Remember that the 7970, a consumer gpu, was 1:4. So it's not out of the question AMD would release a 1:3 chip.

Whether or not you need double precision largely depends on the task. Fiji, which is 1:16, is the fastest GPU to date at video encoding.
 

Mago

macrumors 68030
Aug 16, 2011
2,789
912
Beyond the Thunderdome
I'm still sceptical about these benchmarks, though I'm optimistic about AMD Polaris in the updated nMP.

If these cards are 6 TFLOPs fp32 and 1:3 fp64, then with a couple of them the little black trash can would be capable of serious work (a total of 4 TFLOPs fp64 is something serious), much better than my most optimistic dream about the u-nMP.
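The arithmetic behind that total, as a small sketch. The 6 TFLOPs fp32 figure and the fp64 ratios are the rumored or debated values from this thread, not confirmed specs, and the helper name is made up for illustration.

```python
def fp64_tflops(fp32_tflops, ratio_denominator, cards=1):
    """Peak fp64 throughput given fp32 throughput and a 1:N fp64 ratio, summed over cards."""
    return fp32_tflops / ratio_denominator * cards

# Two cards in an nMP, each at the rumored 6 TFLOPs fp32.
dual_at_1_3 = fp64_tflops(6.0, 3, cards=2)    # the 1:3 ratio from the benchmark entry
dual_at_1_16 = fp64_tflops(6.0, 16, cards=2)  # the 1:16 ratio, for comparison

print(f"1:3 -> {dual_at_1_3} TFLOPs fp64, 1:16 -> {dual_at_1_16} TFLOPs fp64")
```

The gap between the two ratios is more than a factor of five, which is why the ratio matters so much for compute work.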
 

edanuff

macrumors 6502a
Oct 30, 2008
578
259
TB3 can use active cables for longer cable runs, but even passive cables can do 5K on TB 3, via MST DP 1.2:

[Attached image: upload_2016-5-2_7-38-46.png]
 

ManuelGomes

macrumors 68000
Original poster
Dec 4, 2014
1,617
354
Aveiro, Portugal
This slide is incorrect, I was gonna question it before but forgot.
It states that 5K@60Hz@30bpp consumes 22Gbps, which is actually lower than it should be. If this was the case, even DP1.2 would be enough for this. Thing is, this value must be for 24bpp instead of 30bpp, and then it sounds about right.
 

Mago

macrumors 68030
Aug 16, 2011
2,789
912
Beyond the Thunderdome
The PDF it comes from mentions protocol overhead, so maybe that was accounted for, or Intel PR had a lapsus brutis (I'd excuse them; this happens after some time attending meetings with MS people).
 

ManuelGomes

macrumors 68000
Original poster
Dec 4, 2014
1,617
354
Aveiro, Portugal
Mago, the raw bandwidth needed is over 26G if my calculations are correct. Protocol overhead comes after that.
The only explanation would be compression, but I don't think that was considered there.
Someone screwed up in my opinion.
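For reference, the raw pixel math as a minimal sketch. It assumes "5K" means 5120x2880 and counts active pixels only, so blanking intervals and protocol overhead are deliberately left out.

```python
def raw_video_gbps(width, height, refresh_hz, bits_per_pixel):
    """Uncompressed active-pixel data rate in Gbps (no blanking, no protocol overhead)."""
    return width * height * refresh_hz * bits_per_pixel / 1e9

# ASSUMED 5K resolution: 5120x2880 at 60 Hz.
at_30bpp = raw_video_gbps(5120, 2880, 60, 30)
at_24bpp = raw_video_gbps(5120, 2880, 60, 24)

print(f"30bpp: {at_30bpp:.2f} Gbps, 24bpp: {at_24bpp:.2f} Gbps")
```

The 30bpp case comes out above 26 Gbps, while the 24bpp case sits just under the slide's 22 Gbps figure.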
 

Mago

macrumors 68030
Aug 16, 2011
2,789
912
Beyond the Thunderdome
Most likely it's an error; the data actually matches 5K at 24bpp and 60Hz plus overhead. At 30bpp it could reach 26 Gbps, which would still leave 14 Gbps, good for a single-channel TB1 plus full-bandwidth USB3.
 