2016 nMP

koyoot · Apr 5, 2016

You know what is interesting about GP100? It has 0 ROPs. So it will not be used in the consumer market. It is an HPC-only part.

GP102 will most likely have castrated FP64 cores, and most importantly, ROPs.

ManuelGomes · Apr 5, 2016

Yeah, I'm guessing the GPU parts will come with surprises. This was just showing off for now, to keep the interest...

tuxon86 · Apr 5, 2016

koyoot said:
You know what is interesting about GP100? It has 0 ROPs. So it will not be used in the consumer market. It is an HPC-only part.

GP102 will most likely have castrated FP64 cores, and most importantly, ROPs.

The sky is falling, the sky is falling....
We can always count on you...

koyoot · Apr 5, 2016

tuxon86 said:
The sky is falling, the sky is falling....
We can always count on you...

And where did I complain here?

fuchsdh · Apr 5, 2016

AidenShaw said:
Nvidia keynote starting now (9AM PDT) at:

http://www.ustream.tv/channel/fWbQyaEMfbh
or
http://www.gputechconf.com/
Mostly true - based on the definition of "crater". If Apple sold zero MP6,1s it would have next to no effect on their sales figures.

Mmm. I mean, if the bar is that it has to significantly damage Apple, as opposed to simply damaging the product line, I guess nothing could qualify. They could produce an iPhone that sells one unit and they'd still survive a decade.

ManuelGomes · Apr 5, 2016

http://9to5mac.com/2016/04/05/magic-mouse-force-touch/

fuchsdh · Apr 5, 2016

ManuelGomes said:
Too bad AMD lost the chance to be first to "announce" an HBM2 GPU. They co-developed the tech, so I believe they should have wanted to show it first. Maybe they will still be the first out the door...

What's weird is it looks like HBM was pointless given the memory limit. You got a card that was overkill for most low resolutions and yet couldn't quite handle huge resolutions due to being a 4GB card. We're going to get GDDR and HBM2 cards soon enough that are better options.

ManuelGomes · Apr 5, 2016

Yep, 4GB was too limiting. Now GPUs can have as much memory as the whole computer, almost.

Mago · Apr 5, 2016

ManuelGomes said:
Mago, I know. Itanium was my hope for the future until it wasn't. I don't expect it to come back though, no one seems to care about it anymore, unfortunately.
I'd love to see Itanium based Xeons, on the latest node, they would rock for sure.
Too bad it didn't survive other interests, it was indeed a much better, future proof, arch.
Maybe it will come back, but MS lost interest, and Apple won't port OS X for sure, not just for a possible nMP based on it. I guess that would make us both happy.

And now comes Aiden saying something smart about it...

Actually, Itanium is still produced and sold, though only for certain HP servers. The tech is not dead yet; its comeback may occur, but for servers (like IBM's PowerPC, now POWER8).

I think it will come back as soon as Intel realizes the x86 architecture can't be patched anymore.

AidenShaw · Apr 5, 2016

koyoot said:
GP100 comes Q1 2017.

I understood that Jen-Hsun said that GP100 is in production now, has been sampled to close partners (like Baidu), will be available soon to early adopters through the DGX-1 and to hyperscale customers, and available in off-the-shelf systems in Q1 2017.

Rather reasonable way to handle the ramp up of supply to demand.

Also, nothing was said about other variants (other than the one in the Drive PX2). I'd fully expect to see other GPxxx cards coming out over the next few months with consumer prices and power needs (and GDDR5 or a variant).

And to connect to the original topic, a GP2xx with GDDR5 in an MXM card could be a possibility for the nnMP.

Mago · Apr 5, 2016

AidenShaw said:
I understood that Jen-Hsun said that GP100 is in production now, has been sampled to close partners (like Baidu), will be available soon to early adopters through the DGX-1 and to hyperscale customers, and available in off-the-shelf systems in Q1 2017.

Rather reasonable way to handle the ramp up of supply to demand.

Also, nothing was said about other variants (other than the one in the Drive PX2). I'd fully expect to see other GPxxx cards coming out over the next few months with consumer prices and power needs (and GDDR5 or a variant).

Same story as with Intel's new Xeon Phi Knights Landing: the HPC market is too hungry for these parts, and given the money involved it gets priority 1AA+ over general consumers.

gorbag42 · Apr 5, 2016

So does anyone have any complaints about the MP architecture? For myself, while I understand that with the increasing demand for HPC-like services a bigger GPU would help, most of my actual workload benefits from shared-memory, low-cache approaches (pointer chasing): AI, large graph algorithms and the like (subgraph isomorphism, neural simulation with large numbers of synapses per cell, recurrent neural nets, etc.). The MP is definitely an improvement over running on an MBP, and I can't afford a Cray XMP... (do they even make them anymore?)
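
For anyone unsure what "pointer chasing" means here, a minimal Python sketch follows. It is only an illustration, not anything from an actual workload: `build_chain` and `chase` are made-up names, and the chain is a random single cycle over `n` slots. The point is that each lookup depends on the result of the previous one, so the accesses cannot be overlapped and caches help little once the chain is large and scattered.

```python
import random


def build_chain(n, seed=0):
    """Return a list where next_index[i] is the slot to visit after slot i.

    The slots form one random cycle starting at 0 (hypothetical layout,
    chosen only so the traversal order is unpredictable).
    """
    order = list(range(1, n))
    random.Random(seed).shuffle(order)
    order = [0] + order
    next_index = [0] * n
    # Link each slot to its successor in the shuffled order, wrapping around.
    for here, there in zip(order, order[1:] + order[:1]):
        next_index[here] = there
    return next_index


def chase(next_index, start=0):
    """Follow the chain until it returns to start and count the hops."""
    steps, node = 0, start
    while True:
        # Dependent load: the next address is unknown until this read completes.
        node = next_index[node]
        steps += 1
        if node == start:
            return steps


if __name__ == "__main__":
    chain = build_chain(1000)
    print(chase(chain))
```

Graph traversals and sparse neural simulations tend to look like this loop, which is why memory latency matters more to them than raw GPU throughput.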

AidenShaw · Apr 5, 2016

gorbag42 said:
So does anyone have any complaints about the MP architecture? For myself, while I understand that with the increasing demand for HPC-like services a bigger GPU would help, most of my actual workload benefits from shared-memory, low-cache approaches (pointer chasing): AI, large graph algorithms and the like (subgraph isomorphism, neural simulation with large numbers of synapses per cell, recurrent neural nets, etc.). The MP is definitely an improvement over running on an MBP, and I can't afford a Cray XMP... (do they even make them anymore?)

Cray (different company, but they bought the name) sells Intel Xeon systems today (their front page touts E5-26xx v4 availability).

From your description, the MP6,1 would probably be OK if the current 64 GiB RAM support is enough. (Actually, 128 GiB is known to work with the 12-core, but Apple doesn't sell or support the 32 GiB RDIMMs that are needed.) If Apple moves to E5-26xx v4 CPUs, the four DIMM slots would support 256 GiB.
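
The arithmetic behind those figures is just slots times module size. A quick Python sketch, assuming four DIMM slots as stated above; the 16 GiB and 64 GiB module sizes are inferred by dividing 64 GiB and 256 GiB by four, and `max_ram_gib` is a made-up helper name.

```python
def max_ram_gib(dimm_slots, dimm_size_gib):
    """Total RAM when every slot holds a module of the same size."""
    return dimm_slots * dimm_size_gib


SLOTS = 4  # four DIMM slots, per the post

# Module sizes in GiB: 32 is from the post, 16 and 64 are inferred.
for size in (16, 32, 64):
    print(f"{SLOTS} x {size} GiB = {max_ram_gib(SLOTS, size)} GiB")
```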

Mago · Apr 5, 2016

gorbag42 said:
So does anyone have any complaints about the MP architecture? For myself, while I understand that with the increasing demand for HPC-like services a bigger GPU would help, most of my actual workload benefits from shared-memory, low-cache approaches (pointer chasing): AI, large graph algorithms and the like (subgraph isomorphism, neural simulation with large numbers of synapses per cell, recurrent neural nets, etc.). The MP is definitely an improvement over running on an MBP, and I can't afford a Cray XMP... (do they even make them anymore?)

The nMP is actually an excellent development platform for HPC, which is not the same as being a good platform for running HPC applications.

Mago · Apr 5, 2016

AidenShaw said:
Cray (different company, but they bought the name) sells Intel Xeon systems today (their front page touts E5-26xx v4 availability).

From your description, the MP6,1 would probably be OK if the current 64 GiB RAM support is enough. (Actually, 128 GiB is known to work with the 12-core, but Apple doesn't sell or support the 32 GiB RDIMMs that are needed.) If Apple moves to E5-26xx v4 CPUs, the four DIMM slots would support 256 GiB.

Cray shouldn't exist. They are just Intel's front end for the HPC business, and they don't develop supercomputers (as the former Cray did). It's like me building a startup and naming it "Turing Systems" just to resell Supermicro and Mellanox under the umbrella of an iconic name.

tomvos · Apr 5, 2016

Unified Memory Support on Mac OS X

In addition to Pascal support in CUDA 8, CUDA 8 platform support for Unified Memory expands to Mac OS X. Now developers using Macs with NVIDIA GPUs can take advantage of the benefits and convenience of Unified Memory in their applications.

Source: https://devblogs.nvidia.com/parallelforall/cuda-8-features-revealed/

I'm not sure if that means some kind of commitment to deliver Pascal-based hardware for Macs.

ManuelGomes · Apr 6, 2016

Mago, Itanium does still exist, but it is way outdated, still on the 32nm process node, and there have been no recent developments. Only HP still uses it, in some servers. I don't see Intel putting more money into its further development, but I hope I'm wrong.

ManuelGomes · Apr 6, 2016

DP is 2:1 in Pascal, that's great. Finally.
I wonder what the ratio will be in Polaris and Vega. Let's hope it's 2:1 too.
I believe FP16 will also come in at double the rate.
The architectures of both camps are coming closer, it will be interesting to see how HS will work on Pascal.
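
To put those ratios in numbers, here is a small Python sketch. It reads "2:1" as FP64 running at half the FP32 rate and FP16 at double it, which is my interpretation of the ratios above. The 10 TFLOPS FP32 figure is a round hypothetical value, not a spec for any card, and `peak_tflops` is a made-up helper.

```python
from fractions import Fraction


def peak_tflops(fp32_tflops, rate_vs_fp32):
    """Scale an FP32 peak by a throughput rate expressed relative to FP32."""
    return float(fp32_tflops * rate_vs_fp32)


FP32_PEAK = 10.0  # hypothetical round number, not a real spec

RATES = {
    "FP64 at 1/2 rate": Fraction(1, 2),
    "FP64 at 1/16 rate": Fraction(1, 16),
    "FP16 at 2x rate": Fraction(2, 1),
}

for label, rate in RATES.items():
    print(f"{label}: {peak_tflops(FP32_PEAK, rate):.3f} TFLOPS")
```

On the same FP32 peak, a 1/2-rate part delivers eight times the double precision throughput of a 1/16-rate part, which is why the ratio matters so much for compute.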

Mago · Apr 6, 2016

Reading about CUDA 8 and Pascal, its unified memory model puts these GPGPUs on par with Xeon Phi Knights Landing in terms of how freely your algorithm can access memory. That is very important for graph analysis and finance, and also for deep learning, which they are targeting.

Mago · Apr 6, 2016

ManuelGomes said:
Mago, Itanium does still exist, but it is way outdated, still on the 32nm process node, and there have been no recent developments. Only HP still uses it, in some servers. I don't see Intel putting more money into its further development, but I hope I'm wrong.

I need to remind you that AMD's FX line is still on 28nm until they launch Zen FX.

Itanium obviously is not a priority for Intel, but they still keep it staffed.

PS: I'm tired of all this nomenclature: Polaris, Pascal, Zen, Phi... c'mon.

koyoot · Apr 6, 2016

ManuelGomes said:
DP is 2:1 in Pascal, that's great. Finally.
I wonder what the ratio will be in Polaris and Vega. Let's hope it's 2:1 too.
I believe FP16 will also come in at double the rate.
The architectures of both camps are coming closer, it will be interesting to see how HS will work on Pascal.

Unfortunately I think it's highly unlikely. So far, from what has been seen, P10 has a 1/16 DP ratio.

What is interesting about Pascal architecture is that it is much more GCN-like than any Nvidia architecture before. It will be very optimized for any console port in gaming. What is more interesting is the fact that GP102 looks like in architecture it will have 128 ROPs. It will also have very high core clock. This thing will be fast in gaming. Also it has pretty unusual approach to Asynchronous Compute because each SMM can execute it differently depending on the tasks. It will adapt itself very well.

Now we have to wait for red team answer.

tuxon86 · Apr 6, 2016

koyoot said:
And where did I complain here?

Are you kidding?

Mago · Apr 6, 2016

koyoot said:
Unfortunately I think it's highly unlikely. So far, from what has been seen, P10 has a 1/16 DP ratio.

What is interesting about Pascal architecture is that it is much more GCN-like than any Nvidia architecture before. It will be very optimized for any console port in gaming. What is more interesting is the fact that GP102 looks like in architecture it will have 128 ROPs. It will also have very high core clock. This thing will be fast in gaming. Also it has pretty unusual approach to Asynchronous Compute because each SMM can execute it differently depending on the tasks. It will adapt itself very well.

Now we have to wait for red team answer.

Polaris, as far as I know, isn't targeted at compute, but I think it may include something to emulate FP64 through FP32 and get a decent 4:1 execution ratio (as Nvidia did via the driver on Maxwell). The 1:16 figure corresponds to Fiji.

BTW, Nvidia aimed directly at Xeon Phi when it redesigned its compute core. While we need to wait for real testing on both platforms, on paper (or on PowerPoint, to be precise) Nvidia seems ahead of Xeon Phi, without any of its previous restrictions that prevented some HPC deployments due to algorithm incompatibility, which favored Xeon Phi and MIC (Nvidia Pascal actually looks more like a MIC solution than SMM).

Although a MIC based on ARM will hardly be competitive against Pascal (at least on paper).

koyoot · Apr 6, 2016

tuxon86 said:
Are you kidding?

No. Where did I complain there?

Stacc · Apr 6, 2016

koyoot said:
Unfortunately I think it's highly unlikely. So far, from what has been seen, P10 has a 1/16 DP ratio.

What is interesting about Pascal architecture is that it is much more GCN-like than any Nvidia architecture before. It will be very optimized for any console port in gaming. What is more interesting is the fact that GP102 looks like in architecture it will have 128 ROPs. It will also have very high core clock. This thing will be fast in gaming. Also it has pretty unusual approach to Asynchronous Compute because each SMM can execute it differently depending on the tasks. It will adapt itself very well.

Now we have to wait for red team answer.

While I can't find where I read it, I believe I have seen a 1/3 double precision ratio either rumored or speculated. There is no way it would be 1/16 like Fiji. AMD needs a compute focused chip since they haven't had one since Hawaii. Just like Nvidia went compute heavy with Pascal after going graphics heavy with Maxwell, AMD will mirror this by going compute heavy with Polaris after going graphics heavy with Fiji. Remember that Tahiti, the first AMD GPU on 28 nm, was also compute focused with a 1/4 ratio.

What is so surprising to me is that Nvidia is shipping a 600 mm2 GPU on 16 nm in June. Before this announcement I think people were wondering if they would make it by the end of the year. I wonder if AMD is regretting going with GlobalFoundries if they are going to be stuck with a ~230 mm2 GPU until Vega comes out in 2017.

I wonder if there is any chance we would see Pascal in a Mac Pro. Some of those technologies like NVLink and GPU/CPU memory sharing would be pretty slick. Those Pascal boards look very much like the GPUs already in the Mac Pro.
