If it is a graphics workload, and in this workload Vega is faster than the Titan Xp, why does it not translate into gaming performance?

That's a great question! I'm assuming you are talking about SPECviewperf, which traditionally only NVIDIA's Quadro line of cards has been optimized for (whether that's in hardware or in software, i.e. driver optimizations). So, the simplest answer is that the subtests that Vega FE does well in are using a feature that is intentionally slower on a consumer-grade TITAN card.

https://www.pcper.com/reviews/Graph...B-Air-Cooled-Review/Professional-Testing-SPEC

Let's look at some of the specific cases.

[Image: quadro-spec2_0.png]


The "catia-04" test says:

The catia-04 viewset was created from traces of the graphics workload generated by the CATIA V6 R2012 application from Dassault Systemes. Model sizes range from 5.1 to 21 million vertices.

The viewset includes numerous rendering modes supported by the application, including wireframe, anti-aliasing, shaded, shaded with edges, depth of field, and ambient occlusion.

https://www.spec.org/gwpg/gpc.static/catia04.html

As we've discussed before, the Quadro cards have hardware acceleration for antialiased lines, while a GeForce consumer card (including the TITAN Xp) does not.
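
To make that concrete, this is roughly how an application asks OpenGL for antialiased lines (a simplified sketch, not the actual SPECviewperf code). The application only requests the feature; how fast it runs is entirely a driver/hardware decision, which is exactly where the Quadro/GeForce split happens:

[CODE]
// Legacy OpenGL: request smooth (antialiased) lines. These calls are core
// OpenGL 1.x; whether the resulting lines are fast is up to the driver.
#include <GL/gl.h>

void enableAntialiasedLines() {
    glEnable(GL_LINE_SMOOTH);                        // ask for antialiased lines
    glHint(GL_LINE_SMOOTH_HINT, GL_NICEST);          // prefer quality over speed
    glEnable(GL_BLEND);                              // smooth lines need blending
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    glLineWidth(1.0f);
}
[/CODE]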

[Image: quadro-spec3_0.png]


Exactly the same situation with the "creo-01" test, which also uses antialiased lines:

The creo-01 viewset was created from traces of the graphics workload generated by the Creo 2™ application from PTC. Model sizes range from 20 to 48 million vertices.

The viewset includes numerous rendering modes supported by the application, including wireframe, anti-aliasing, shaded, shaded with edges, and shaded reflection modes.

https://www.spec.org/gwpg/gpc.static/creo-01.html

[Image: quadro-spec4_0.png]


energy-01 is a volume rendering test:

The energy-01 viewset is representative of a typical volume rendering application in the seismic and oil and gas fields. Similar to medical imaging such as MRI or CT, geophysical surveys generate image slices through the subsurface that are built into a 3D grid. Volume rendering provides a 2D projection of this 3D volumetric grid for further analysis and interpretation.

At every frame, depending on the viewer position, a series of coplanar slices aligned with the viewing angle are computed on the CPU and then sent to the graphics hardware for texturing and further calculations such as transfer function lookup, lighting and clipping to reveal internal structures. Finally, the slices are blended together before the image is displayed.

https://www.spec.org/gwpg/gpc.static/energy-01.html

Vega clearly has an advantage here, though I'm not sure why. Once again there is a difference between Quadro P5000 and the TITAN Xp, but Vega might just have enough raw horsepower over the Quadro that it wins.
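
For anyone unfamiliar with slice-based volume rendering, the technique SPEC describes boils down to something like the following (a heavily simplified legacy-OpenGL sketch of the general approach, not the actual energy-01 code):

[CODE]
#include <GL/gl.h>

// Draw a 3D volume (e.g. a seismic or CT dataset stored in a 3D texture) as a
// stack of textured slices that are blended together. A real renderer computes
// view-aligned slice planes on the CPU each frame and sorts them back to front.
void drawVolumeSlices(GLuint volumeTexture, int sliceCount) {
    glEnable(GL_TEXTURE_3D);
    glBindTexture(GL_TEXTURE_3D, volumeTexture);
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);

    for (int i = 0; i < sliceCount; ++i) {
        float z = (i + 0.5f) / sliceCount;   // depth of this slice in texture space
        float w = 2.0f * z - 1.0f;           // position of the slice quad
        glBegin(GL_QUADS);
        glTexCoord3f(0.0f, 0.0f, z); glVertex3f(-1.0f, -1.0f, w);
        glTexCoord3f(1.0f, 0.0f, z); glVertex3f( 1.0f, -1.0f, w);
        glTexCoord3f(1.0f, 1.0f, z); glVertex3f( 1.0f,  1.0f, w);
        glTexCoord3f(0.0f, 1.0f, z); glVertex3f(-1.0f,  1.0f, w);
        glEnd();
    }

    glDisable(GL_TEXTURE_3D);
}
[/CODE]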

[Image: quadro-spec6_0.png]


The "medical-01" test seems to be similar to energy:

The medical-01 viewset is representative of a typical volume rendering application that renders a 2D projection of a 3D volumetric grid. A typical 3D grid in this viewset is a group of 3D slices acquired by a scanner (such as CT or MRI).

At every frame, depending on the viewer position, a series of coplanar slices aligned with the viewing angle are computed on the CPU and then sent to the graphics hardware for texturing and further calculations, such as transfer function lookup, lighting and clipping to reveal internal structures. Finally, the slices are blended together before the image is displayed.

https://www.spec.org/gwpg/gpc.static/med-01.html

[Image: quadro-spec8_0.png]


The "snx-02" test is another win for Quadro, but not TITAN.

The snx-02 viewset was created from traces of the graphics workload generated by the NX 8.0 application from Siemens PLM. Model sizes range from 7.15 to 8.45 million vertices.

https://www.spec.org/gwpg/gpc.static/snx02.html

Again, this uses antialiased lines and other Quadro-only features for NVIDIA, so the P5000 wins handily.

[Image: quadro-spec9_0.png]


The "sw-03" test is yet another antialiased line test:

The sw-03 viewset was created from traces of Dassault Systemes’ SolidWorks 2013 SP1 application. Models used in the viewset range in size from 2.1 to 21 million vertices.

The viewset includes numerous rendering modes supported by the application, including shaded mode, shaded with edges, ambient occlusion, shaders, and environment maps.

In all the other subtests, the TITAN Xp beats the Vega FE. So, there are exactly 2 subtests out of 9 that the Vega FE wins, and 7 where either the Quadro P5000 or the TITAN Xp win. Those 2 subtests involve 3D volumetric rendering, which is slower on the TITAN than the Quadro (indicating that it's a Quadro-optimized feature for NVIDIA).

Going back to your original question, SPECviewperf is a professional graphics test. It uses features of OpenGL that are common in professional apps used for CAD, 3D modelling and so on. Games do not render with antialiased lines or 2-sided lighting, which is why NVIDIA can accelerate those in their Quadro parts and have some differentiation between their professional and consumer product lines. These professional applications usually have legacy codepaths, using features such as OpenGL's immediate mode, that can be optimized only in the Quadro drivers since no games use them.
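
To make the legacy-codepath point concrete, here is a hedged sketch of the kind of OpenGL 1.x immediate-mode wireframe pass a CAD viewport might still use (purely illustrative, and nothing like how a modern game engine submits geometry):

[CODE]
#include <GL/gl.h>

// Immediate-mode wireframe pass with antialiased lines and two-sided lighting:
// exactly the sort of OpenGL 1.x path that only the professional drivers optimize.
void drawWireframeModel(const float* vertices, int vertexCount) {
    glEnable(GL_LINE_SMOOTH);                           // antialiased lines
    glLightModeli(GL_LIGHT_MODEL_TWO_SIDE, GL_TRUE);    // two-sided lighting
    glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);          // render edges only

    glBegin(GL_TRIANGLES);                              // immediate mode: one API
    for (int i = 0; i < vertexCount; ++i) {             // call per vertex
        glVertex3fv(&vertices[3 * i]);
    }
    glEnd();
}
[/CODE]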

So, long story short, while SPECviewperf is a graphics test, it is exercising different functionality than modern game engines do. Thus, it cannot be used as a stand-in for actual game benchmarks. We can wait and see how RX Vega performs in game benchmarks, though the initial results from the Vega FE do not look encouraging.
 
That makes Vega's lackluster graphics performance that much harder to swallow, since it's going to be a while before AMD has anything better.
Apple develops the Radeon drivers for OSX, so it could perform better. But if it lags behind at the API level, you can blame them.
 
So, long story short, while SPECviewperf is a graphics test, it is exercising different functionality than modern game engines do. Thus, it cannot be used as a stand-in for actual game benchmarks. We can wait and see how RX Vega performs in game benchmarks, though the initial results from the Vega FE do not look encouraging.
So, long story short, it is a compute kernel that is simulating gaming performance.

And the "all" other subtests is 3. So 3 test out of 9 are showing better compute performance on GTX Titan Xp, than on Vega.
 
Then let me express what I am seeing.

"Vega is on par or faster in gaming but uses quite a lot more power - rubbish"
"Vega is faster or on par with Titan Xp in compute - I don't care, it is not a gaming card, its a failure".

Has anybody considered two things: that the drivers and BIOS of the Vega FE were not ready? Has anyone considered that Vega's features are not implemented, or visible, in the applications? Has anyone considered that those features are heavily important for this architecture to start flying? And yes, I am talking about gaming features. Vega brought ONLY gaming features to the table. The only one that appears to be working is the Draw Stream Binning Rasterizer (in RX Vega; Vega FE had it disabled).

Why do you believe that I am saying that gaming is not important?

What I am constantly saying is: hold off on your judgments about this particular hardware until the software matures.

Vega in DX11 will unfortunately behave just like a GTX 1080. There is nothing AMD can do here. I have to wait and see the effect of the Primitive Shaders implementation in game engines to draw a conclusion about which GPU it will compete with in future DX12 and Vulkan games, but I think it will easily tie with the Titan Xp.

That's not what I understand. What I see is

"Vega is ONLY on par with 1080 in gaming, but not the expected 1080Ti, and use much more power, so it's a failure"
"Vega can be faster on Titan Xp in some compute task, which is good, but big lost on most area. So, overall, it's still a failure"

As an end user, I care about its performance, not about what is holding it back. It does not matter whether the BIOS is not ready or most software developers are unable to rewrite their software. What I care about is the end result, and the fact is that Vega is unable to outperform the Pascal cards in most scenarios.

We draw conclusions now because we are using the computers now. We base our conclusions on the current facts. If AMD suddenly fixes Vega's performance with a software update, I believe most of us will change our minds. However, we generally make decisions based on what we can actually get, not on what the card can potentially achieve in the unforeseeable future.
 
Apple develops the Radeon drivers for OSX, so it could perform better. But if it lags behind at the API level, you can blame them.

This is not true. Apple produces the frameworks (i.e. OpenGL, Metal) but the hardware vendors (AMD, Intel, NVIDIA) write the driver back-ends (i.e. the part that actually talks with their hardware).
 
This is not true. Apple produces the frameworks (i.e. OpenGL, Metal) but the hardware vendors (AMD, Intel, NVIDIA) write the driver back-ends (i.e. the part that actually talks with their hardware).
Intel drivers are better on OSX than on Windows. It's not a statement about who is writing them, just observable facts.
 
Intel drivers are better on OSX than on Windows. It's not a statement about who is writing them, just observable facts.

Would you mind explaining a bit more about this (only if you have time)? What observations led you to this conclusion? I am not trying to argue anything; I don't even know how to define a good or bad Intel driver. I just want to know more. Thanks in advance.
 
In all the other subtests, the TITAN Xp beats the Vega FE. So, there are exactly 2 subtests out of 9 that the Vega FE wins, and 7 where either the Quadro P5000 or the TITAN Xp win. Those 2 subtests involve 3D volumetric rendering, which is slower on the TITAN than the Quadro (indicating that it's a Quadro-optimized feature for NVIDIA).
I am not sure it's 100% clear cut, but I think Vega FE wins in OpenGL and Titan Xp wins in Direct3D.

It would seem that Vega FE is tuned for engineering use and Titan Xp for consumer.
 
So, long story short, it is a compute kernel that is simulating gaming performance.

And the "all" other subtests is 3. So 3 test out of 9 are showing better compute performance on GTX Titan Xp, than on Vega.

Did you even read my post? First of all, none of these tests are running compute kernels. They are all graphics tests that use OpenGL. However, most of them use graphics features that NVIDIA only optimizes in their Quadro hardware/drivers, which is why the Quadro P5000 is so much faster than the TITAN Xp. As I said, Vega FE beats the Quadro P5000 in 2 subtests that feature volume rendering. Perhaps Vega is better at volume rendering, or perhaps it just has enough rendering horsepower that it was able to beat the P5000.

You can keep comparing to the TITAN Xp, but I'll keep telling you that NVIDIA has intentionally hamstrung the performance of its consumer level cards (including the TITAN) for professional applications that are measured in SPECviewperf. Or, in other words, NVIDIA's message is that if you want to run Solidworks or any other test measured in SPECviewperf, they want you to buy a Quadro card. It's absolutely expected that a GeForce/TITAN card will not do well in most of these subtests.
I am not sure it's 100% clear cut, but I think Vega FE wins in OpenGL and Titan Xp wins in Direct3D.

It would seem that Vega FE is tuned for engineering use and Titan Xp for consumer.

Quadro is optimized for professional graphics (at a minimum) which is why it does so well in most of the SPECviewperf subtests. There are some cases where TITAN Xp wins, including the D3D11 case(s). Again, it's kind of silly to be running SPECviewperf on a consumer-level card since everyone knows that there is a difference between Quadro and GeForce/TITAN.

I would strongly encourage you not to look at the SPECviewperf results and jump to conclusions about the TITAN Xp's graphics performance. Look at game benchmarks if you want to see it running at full power.
 
Did you even read my post? First of all, none of these tests are running compute kernels. They are all graphics tests that use OpenGL. However, most of them use graphics features that NVIDIA only optimizes in their Quadro hardware/drivers, which is why the Quadro P5000 is so much faster than the TITAN Xp. As I said, Vega FE beats the Quadro P5000 in 2 subtests that feature volume rendering. Perhaps Vega is better at volume rendering, or perhaps it just has enough rendering horsepower that it was able to beat the P5000.

You can keep comparing to the TITAN Xp, but I'll keep telling you that NVIDIA has intentionally hamstrung the performance of its consumer level cards (including the TITAN) for professional applications that are measured in SPECviewperf. Or, in other words, NVIDIA's message is that if you want to run Solidworks or any other test measured in SPECviewperf, they want you to buy a Quadro card. It's absolutely expected that a GeForce/TITAN card will not do well in most of these subtests.
So if they are graphics tests, why does it not translate to gaming performance?

You have pointed out, as proof of concept, that Quadro GPUs are using specific features to increase performance.

The problem you fail to understand is that VEGA DOES NOT HAVE THOSE FEATURES ENABLED IN THE DRIVERS!

It did not even have Tile Based Rasterization enabled, and you think it had more specific features enabled, ones that are reserved for the Radeon Pro WX 9100? The driver for the Vega FE, in its current form, is very basic. And funnily enough, SPEC actually uses Tile Based Rasterization.

What will happen in benchmarks when Vega has those features enabled?
 
I would strongly encourage you not to look at the SPECviewperf results and jump to conclusions about the TITAN Xp's graphics performance. Look at game benchmarks if you want to see it running at full power.
What did I say? I would not buy a Vega FE just to run games, but I would consider buying it if my engineering applications worked and a comparable professional card was too expensive.
 
The driver, this way, leaves a lot of control in the hands of developers. They have been asking for control over the hardware, and this is what they have got.

Koyoot, may I ask you another question on this matter?

AMD lets the developers optimise the software by themselves to fully utilise the hardware. Does that mean they have to optimise the same software for each different AMD GPU model? e.g. software optimised for Vega 64 doesn't necessarily perform well on Vega 56. Also, since a computer is not just the GPU, without knowing the other hardware (e.g. which CPU, or how much RAM is available), is it really possible to optimise a piece of software to use the GPU's real power?

I personally believe AMD is not making rubbish hardware, because the PS4 and XB1 both use their chips, and they are amazing. In fact, the XB1X really impresses me; that's the gaming machine I have been looking for for quite a few years. Which means their hardware can do well at graphics (if the software is optimised). However, a console is relatively simple, because all the hardware is known and there are almost no variables at all (apart from minor revisions). For PCs, each family has a few variations, and there are a few families. If a software developer must optimise the software for every single combination of different hardware, that sounds impossible to me.
 
So if they are graphics tests, why does it not translate to gaming performance?

You have pointed out, as proof of concept, that Quadro GPUs are using specific features to increase performance.

The problem you fail to understand is that VEGA DOES NOT HAVE THOSE FEATURES ENABLED IN THE DRIVERS!

It did not even have Tile Based Rasterization enabled, and you think it had more specific features enabled, ones that are reserved for the Radeon Pro WX 9100? The driver for the Vega FE, in its current form, is very basic. And funnily enough, SPEC actually uses Tile Based Rasterization.

What will happen in benchmarks when Vega has those features enabled?

Graphics is not the same as gaming performance. The graphics features (e.g. antialiased lines) that are used in professional modelling/CAD applications like SolidWorks or Catia are not used by games. For example, people designing things often work with wireframe models, and OpenGL's antialiased lines look much better for that type of workload.

SPECviewperf cannot decide whether to use tile-based rasterization or not. It's not a feature that is exposed at the OpenGL API level. If the AMD driver enables this and some of the SPECviewperf tests (or other unrelated gaming benchmarks) get faster, great for them! I think I (and many other people) would've preferred them to release completed drivers with their products on day one, not force people to wait weeks or months to actually get the software that fully unlocks all the low-level hardware features in their new architecture.

You asked why a graphics workload where the Vega FE beats the TITAN Xp doesn't translate into gaming performance. I've explained to you about 3 or 4 times now that NVIDIA optimizes certain parts of the OpenGL API in their Quadro products only, and specifically hampers the performance of those paths on their consumer cards. A great example of this is antialiased lines, which most of the SPECviewperf tests use in some form or other. If a TITAN card renders antialiased lines 100x slower than a Quadro card, then the Quadro card will do very well in professional graphics workloads, while the TITAN card will still do very well in gaming workloads. Or, in other words, you cannot look at SPECviewperf results of a Vega FE vs a TITAN Xp and expect them to correlate to gaming benchmarks.

But sure, feel free to think that the RX Vega is going to magically beat a TITAN Xp at gaming benchmarks when AMD releases a driver update in 6 months. Of course, by then there will be a Volta-based TITAN card which will destroy everything.
What did I say? I would not buy a Vega FE just to run games, but I would consider buying it if my engineering applications worked and a comparable professional card was too expensive.

That comment was directed at koyoot, who appears to be having difficulty understanding the difference between professional graphics and gaming graphics (neither of which uses compute kernels).
 
Koyoot, may I ask you another question on this matter?

AMD lets the developers optimise the software by themselves to fully utilise the hardware. Does that mean they have to optimise the same software for each different AMD GPU model? e.g. software optimised for Vega 64 doesn't necessarily perform well on Vega 56. Also, since a computer is not just the GPU, without knowing the other hardware (e.g. which CPU, or how much RAM is available), is it really possible to optimise a piece of software to use the GPU's real power?

I personally believe AMD is not making rubbish hardware, because the PS4 and XB1 both use their chips, and they are amazing. In fact, the XB1X really impresses me; that's the gaming machine I have been looking for for quite a few years. Which means their hardware can do well at graphics (if the software is optimised). However, a console is relatively simple, because all the hardware is known and there are almost no variables at all (apart from minor revisions). For PCs, each family has a few variations, and there are a few families. If a software developer must optimise the software for every single combination of different hardware, that sounds impossible to me.
It is more complicated, but I will try to answer this without creating another wall of text.

GCN in general is the same at the ground level. Each compute unit has four 16-wide SIMDs that execute work, regardless of whether it is compute or graphics. The data is pushed the same way. If you refer to the first wall of text on page #2 of this thread, you will see how it looked from GCN1 to GCN4 in terms of data availability to each of the cores, and how that related to the L1 cache, L2 cache, and memory controller.

This has not changed from GCN1 to GCN4. What differentiated all of those GPU architectures was graphics pipeline capability. GCN4, for example, added more robust memory compression and tessellation culling, which improved graphics IPC in some games that had been released up to 12 months before the Polaris architecture.

With Vega, things are a little different. Vega added Tile Based Rasterization, Primitive Shaders, Load Balancing, and a memory paging system, and it connected the Pixel Engine and Geometry Engine to the L2 cache instead of the memory controller, to improve scheduling.

The foundation is the same as always: four 16-wide SIMDs per compute unit, across 64 CUs, with 16 KB of L1 cache. What has changed is the L2 cache: 4 MB instead of 2. This is a pretty huge change for GCN, actually the biggest in years.

This is the short version of the situation.
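
As a toy illustration of that 4 x 16-wide SIMD layout (my simplification, not AMD documentation): a 64-lane wavefront is stepped through one 16-wide SIMD over four cycles, and a compute unit has four such SIMDs working on different wavefronts in parallel:

[CODE]
#include <cstdio>

// Toy model only: shows how a 64-lane wavefront maps onto a 16-wide SIMD,
// i.e. four cycles per instruction. Not a simulator of real GCN hardware.
int main() {
    const int waveSize = 64;
    const int simdWidth = 16;
    for (int cycle = 0; cycle < waveSize / simdWidth; ++cycle) {
        int first = cycle * simdWidth;
        int last = first + simdWidth - 1;
        std::printf("cycle %d: lanes %2d..%2d execute the instruction\n",
                    cycle, first, last);
    }
    return 0;
}
[/CODE]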

Let me give you an example. You did not need to optimize specifically for GCN1, 2, 3, or 4, because the features of the GCN architecture and its structure had not changed. Features that improved performance could be added in drivers, with no developer interaction, because everything was executed in hardware. That is how small the changes in GCN were on the graphics side.

Vega, on the other hand, has a different pipeline, because the Pixel Engine and others are connected to the L2 cache. It is longer, but still has the same width as before. The features that increase graphics throughput are hardware features, but they have to be implemented in the engine, just like the SPECviewperf features Asgorath mentioned, and enabled in drivers. This alone creates a nightmare for AMD engineers, but not so much for game developers.

If one of these parts is taken out of the equation, Vega will perform slower per clock than Fiji, because of the different pipeline. The driver currently executes it this way, essentially as a faster Fiji.

As for optimization: you have to optimize only for the Vega architecture, as Vega, not GCN4 or GCN3. The most important parts of the architecture (the High Bandwidth Cache Controller, Primitive Shaders, etc.) are inherent parts of the architecture and will be shared by all of the GPUs AMD releases.
 
Kinda off topic, but I would like to point out something I've noticed about Apple's marketing lately; they seem to be pushing the gaming angle somewhat.

On their website they mention the performance of games on the new iPad Pro and MacBook Pro pages. I don't think this has happened before, and while not the focus, it seems Apple does want to push their platforms as "gaming platforms" somewhat. This is also supported by their indie games event in the App Store a few months back.

Considering this, I would guess that gaming performance is of some importance to Apple at this point, and maybe is relevant to the discussion at hand.

Also I never got my question answered, is the Vega FE's TDP too much for the cMP?
 
Kinda off topic, but I would like to point out something I've noticed about Apple's marketing lately; they seem to be pushing the gaming angle somewhat.

On their website they mention the performance of games on the new iPad Pro and MacBook Pro pages. I don't think this has happened before, and while not the focus, it seems Apple does want to push their platforms as "gaming platforms" somewhat. This is also supported by their indie games event in the App Store a few months back.

Considering this, I would guess that gaming performance is of some importance to Apple at this point, and maybe is relevant to the discussion at hand.

Also I never got my question answered, is the Vega FE's TDP too much for the cMP?
It has always been like this. Unfortunately it is only Apple's marketing, but at least Metal 2 has the grounds to be a modern and very good (actually the best modern) API.
 
It is more complicated, but I will try to answer this without creating another wall of text.

GCN in general is the same at the ground level. Each compute unit has four 16-wide SIMDs that execute work, regardless of whether it is compute or graphics. The data is pushed the same way. If you refer to the first wall of text on page #2 of this thread, you will see how it looked from GCN1 to GCN4 in terms of data availability to each of the cores, and how that related to the L1 cache, L2 cache, and memory controller.

This has not changed from GCN1 to GCN4. What differentiated all of those GPU architectures was graphics pipeline capability. GCN4, for example, added more robust memory compression and tessellation culling, which improved graphics IPC in some games that had been released up to 12 months before the Polaris architecture.

With Vega, things are a little different. Vega added Tile Based Rasterization, Primitive Shaders, Load Balancing, and a memory paging system, and it connected the Pixel Engine and Geometry Engine to the L2 cache instead of the memory controller, to improve scheduling.

The foundation is the same as always: four 16-wide SIMDs per compute unit, across 64 CUs, with 16 KB of L1 cache. What has changed is the L2 cache: 4 MB instead of 2. This is a pretty huge change for GCN, actually the biggest in years.

This is the short version of the situation.

Let me give you an example. You did not need to optimize specifically for GCN1, 2, 3, or 4, because the features of the GCN architecture and its structure had not changed. Features that improved performance could be added in drivers, with no developer interaction, because everything was executed in hardware. That is how small the changes in GCN were on the graphics side.

Vega, on the other hand, has a different pipeline, because the Pixel Engine and others are connected to the L2 cache. It is longer, but still has the same width as before. The features that increase graphics throughput are hardware features, but they have to be implemented in the engine, just like the SPECviewperf features Asgorath mentioned, and enabled in drivers. This alone creates a nightmare for AMD engineers, but not so much for game developers.

If one of these parts is taken out of the equation, Vega will perform slower per clock than Fiji, because of the different pipeline. The driver currently executes it this way, essentially as a faster Fiji.

As for optimization: you have to optimize only for the Vega architecture, as Vega, not GCN4 or GCN3. The most important parts of the architecture (the High Bandwidth Cache Controller, Primitive Shaders, etc.) are inherent parts of the architecture and will be shared by all of the GPUs AMD releases.

Not sure if I understand correctly.

So, it's the software developers' job to optimise the software for that 4x16 architecture. However, the details of how to deal with the hardware are the job of AMD's software engineers.

But even though they optimised a piece of software for GCN 1-4, they still have to rewrite it to optimise it again for Vega. As long as it's optimised for Vega, it doesn't really matter which Vega chip it is. But no matter how good the optimisation is, in the end it still requires AMD's software to make use of the GPU's power.

I read your wall of text; even though I can't understand all of it (or even most of it), I still learned a lot. From memory, one of your posts mentioned that some of the hardware features (load balancing?) are still disabled by AMD (in software), so does that mean at this stage, no matter how hard the software developers work on optimisation, they just can't make use of Vega's full power?
 
Not sure if I understand correctly.

So, it's the software developers' job to optimise the software for that 4x16 architecture. However, the details of how to deal with the hardware are the job of AMD's software engineers.

But even though they optimised a piece of software for GCN 1-4, they still have to rewrite it to optimise it again for Vega. As long as it's optimised for Vega, it doesn't really matter which Vega chip it is. But no matter how good the optimisation is, in the end it still requires AMD's software to make use of the GPU's power.

I read your wall of text; even though I can't understand all of it (or even most of it), I still learned a lot. From memory, one of your posts mentioned that some of the hardware features (load balancing?) are still disabled by AMD (in software), so does that mean at this stage, no matter how hard the software developers work on optimisation, they just can't make use of Vega's full power?
Yes, if you optimize for Vega, it does not matter which Vega chip you have there.
Let me put this more simply. Vega in DX11 and OpenGL will perform like a faster Fiji. The only feature there that DX11 can use is the Draw Stream Binning Rasterizer.

In DX12 and Vulkan it will perform like Vega, but the catch is that you have to rewrite your application, or write it correctly with Vega and its features in mind, to utilize it fully. And this is the only scenario, the only case, where you will see the most important features of the Vega hardware implemented, the ones that have the highest impact on performance.

Load Balancing is possibly the most complex feature, because from what I am reading it involves an extremely coherent interplay between the hardware, the hardware's software (BIOS and drivers), and the application software. AMD tied scheduling, the Programmable Geometry Pipeline, and clock gating of different parts of the GPU's Shader Engines to the memory paging system. The data is split into pages and executed dynamically based on the need of the moment. The pipeline HAS to be considered "one thing" but split into smaller parts; otherwise we will get stalls in the pipeline, which will bottleneck the GPU. It requires tremendous synchronisation, and it will take the largest amount of time to do properly.
 
Let me give you an example. You did not need to optimize specifically for GCN1, 2, 3, or 4, because the features of the GCN architecture and its structure had not changed. Features that improved performance could be added in drivers, with no developer interaction, because everything was executed in hardware. That is how small the changes in GCN were on the graphics side.

Vega, on the other hand, has a different pipeline, because the Pixel Engine and others are connected to the L2 cache. It is longer, but still has the same width as before. The features that increase graphics throughput are hardware features, but they have to be implemented in the engine, just like the SPECviewperf features Asgorath mentioned, and enabled in drivers. This alone creates a nightmare for AMD engineers, but not so much for game developers.

What on earth are you talking about? Are you really suggesting that it's up to every app developer to optimize for Vega, because things like a better cache architecture or higher geometry throughput don't Just Work(TM)? Connecting things to the L2 cache should just automatically make things run faster, or they did it wrong. Increasing geometry throughput should just automatically make things run faster, or they did it wrong.

The apps in SPECviewperf already use antialiased lines, which is a hardware feature. If you want to make antialiased lines render fast, you either build hardware that does it or you don't. There is no driver magic that will make antialiased lines run faster.

I can definitely see that the draw stream binning rasterizer (i.e. tiled rendering) is a nightmare, because it doesn't always improve performance and thus they have to determine when they should use it and when they should turn it off. However, this is not going to magically solve their performance issues.

Edit: Let me put it another way. NVIDIA released their Pascal GPUs, and existing games got significantly faster. That's because their architecture didn't require every app developer to go and rewrite their engine to make it work well with the new GPU. If AMD is banking on every app developer rewriting their app to make it run on Vega, they've already lost.
 
DX12, Vulkan, and Metal are not "modern" APIs. They are lower-level APIs.
 
What on earth are you talking about? Are you really suggesting that it's up to every app developer to optimize for Vega, because things like a better cache architecture or higher geometry throughput don't Just Work(TM)? Connecting things to the L2 cache should just automatically make things run faster, or they did it wrong. Increasing geometry throughput should just automatically make things run faster, or they did it wrong.

The apps in SPECviewperf already use antialiased lines, which is a hardware feature. If you want to make antialiased lines render fast, you either build hardware that does it or you don't. There is no driver magic that will make antialiased lines run faster.

I can definitely see that the draw stream binning rasterizer (i.e. tiled rendering) is a nightmare, because it doesn't always improve performance and thus they have to determine when they should use it and when they should turn it off. However, this is not going to magically solve their performance issues.

Edit: Let me put it another way. NVIDIA released their Pascal GPUs, and existing games got significantly faster. That's because their architecture didn't require every app developer to go and rewrite their engine to make it work well with the new GPU. If AMD is banking on every app developer rewriting their app to make it run on Vega, they've already lost.
The features are not apparent to the applications because the drivers are not reporting them. The features are hardware features, but they have to be reported to the application. I thought you had described this in your wall of text, and now you are trying to argue against it?

What is creating the nightmare for AMD engineers? A different pipeline, different features, and different synchronization of the data flow through the pipeline. You know this perfectly well; do I have to write everything out to the deepest level of detail?

Geometry throughput is increased in Vega through Primitive Shaders and the culling of unused triangles. It's part of the Programmable Geometry Pipeline. It has to be implemented by the developer in the game engine. Otherwise the hardware can be ready and the driver can report it, but it still will not be used, and it is wasted.
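
As a generic illustration of what culling unused triangles means (a back-facing or zero-area triangle never contributes pixels, so discarding it early saves rasterization and shading work; this is not AMD's actual primitive shader code):

[CODE]
#include <cstdio>

struct Vec2 { float x, y; };

// Signed area of the triangle in screen space: <= 0 means back-facing or
// degenerate (assuming counter-clockwise front faces), so it can be discarded
// before rasterization ever sees it.
bool shouldCullTriangle(const Vec2& a, const Vec2& b, const Vec2& c) {
    float signedArea = (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
    return signedArea <= 0.0f;
}

int main() {
    Vec2 a{0.0f, 0.0f}, b{1.0f, 0.0f}, c{0.0f, 1.0f};
    std::printf("front-facing triangle culled? %s\n",
                shouldCullTriangle(a, b, c) ? "yes" : "no");   // prints "no"
    std::printf("back-facing triangle culled? %s\n",
                shouldCullTriangle(a, c, b) ? "yes" : "no");   // prints "yes"
    return 0;
}
[/CODE]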

Nvidia released the Pascal GPUs without any problem because they are essentially Maxwell GPUs on a 14 nm process, which I described in an earlier post. The GP100 chip would require a rewrite of Nvidia's gaming drivers. The same will go for the GV chips, if that architecture reuses the SM structure of the GP100 chip (64 cores / 256 KB register file).

Nvidia has the benefit here, because they can use the time between the release of GP100 and the consumer GV chips to optimize the software.
 
The features are not apparent to the applications because the drivers are not reporting them. The features are hardware features, but they have to be reported to the application. I thought you had described this in your wall of text, and now you are trying to argue against it?

Please explain how a driver reports to an application that it has fast antialiased lines in OpenGL. If this were an OpenGL extension, sure, the driver could just not report it. However, antialiased lines are a baseline feature from OpenGL 1.0, which came out 25 years ago.
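
To spell that out (a simplified sketch, not SPECviewperf code): an application can query the driver for optional extensions, but there is no query for "my antialiased lines are fast". You just enable the core feature and take whatever performance the driver gives you:

[CODE]
#include <GL/gl.h>
#include <cstring>

// Optional features are advertised through the extension string, which an
// application can check (requires a current OpenGL context).
bool hasExtension(const char* name) {
    const char* extensions =
        reinterpret_cast<const char*>(glGetString(GL_EXTENSIONS));
    return extensions != nullptr && std::strstr(extensions, name) != nullptr;
}

// A core OpenGL 1.0 feature like line smoothing has no such capability bit:
// the application simply enables it, fast or slow.
void requestSmoothLines() {
    glEnable(GL_LINE_SMOOTH);
}
[/CODE]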
 
Intel drivers are better on OSX than on Windows. It's not a statement about who is writing them, just observable facts.

No way, man. I have a simple dual-core Skylake Pentium that can decode 10-bit 4K HEVC better on Windows than an i7 Skylake can on High Sierra. Not by just a little; I mean a massive difference, because macOS fails entirely.
 