I would say we can assume it will be close to the MBP's storage upgrade costs:

Code:
512GB PCIe-based Flash Storage [Add $300.00]

1TB PCIe-based Flash Storage [Add $800.00]
 
Which would be nice, but isn't the nMP flash supposed to be some high-end flash, much faster than the MBP's? If 512GB is only $300, I kind of doubt they'd give the nMP only 256GB, which would only be $150, on a $3k or $4k computer. Besides, if I recall, Schiller said something like "if you knew how much that flash and those GPUs cost you'd be impressed with the pricing." Anyhow, I doubt we can use the MBP flash as guidance.
 
Yes, it is supposed to be faster, so I would estimate another $200 on top of those options.
 
OK, I misunderstood - by 512GB you mean 512GB total, not an additional 512GB, of course. So that would be maybe $500 for an extra 256GB, for a total of 512GB of flash. This sounds right.
 



a little something from a chat with the head dev of my rendering app.


ME said:
what type of performance can we expect using the current version of indigo with the new mac pro's dual gpus? for indigo alone, would it be worth buying the (amd) d700s over the d300s?

then down the road a bit, is it unreasonable to expect the program to be able to utilize more of the gpu's potential power beyond the current gpgpu implementation?

thanks for any insight.

HIM said:
Hi Jeff,
Currently I think the D300s should be fine.
But in the future (somewhat near future if you want to use Indigo betas), the D700s could provide a major boost in rendering power.


the current OpenCL implementation is more of a 'gpu accelerated' type of deal, where the gpu more/less assists the cpu.. when i enable it on my 5770, i'm seeing 2-3x speedups in certain modes/lighting..
and it appears as if it's going to keep getting better and better :)

anyway, just thought i'd share..
 
Which would be nice, but isn't the nMP flash supposed to be some high-end flash, much faster than the MBP's? If 512GB is only $300, I kind of doubt they'd give the nMP only 256GB, which would only be $150, on a $3k or $4k computer. Besides, if I recall, Schiller said something like "if you knew how much that flash and those GPUs cost you'd be impressed with the pricing." Anyhow, I doubt we can use the MBP flash as guidance.

Yes, when the first preview of the new Mac Pro was given, they were hyping the fact that it uses PCI flash, whereas other Macs at that time with flash options were using SATA flash.

However, since then, new iMac and MacBook Pro models have been introduced, and they now use PCI flash also. So, it's possible that the flash in these machines is about the same speed as in the Mac Pro.

Even if the Mac Pro flash is faster, I'm guessing it is not dramatically faster, and the price might be a little higher. I think the price will be in the same ballpark as it is for the iMac and MacBook Pro. I actually think it is likely that an upgrade to 1TB will be $800, as it is for the other machines; if not, I doubt it will be higher than, say, $1,000.

The fact that the entry-level machine has a 256GB flash drive and 12GB RAM indicates, I think, that they were having trouble hitting $2,999 for the base price. Also, it gives people a choice. If someone has a workflow where they are keeping all of their data on an external RAID and only system files and applications on the internal drive, maybe they don't want to pay extra for flash that they aren't going to use.
 
Yes, it is supposed to be faster, so I would estimate another $200 on top of those options.

I don't think it's supposed to be faster. They're both advertised to use the same 1 gigabyte/sec Flash drive.

Edit: NM! The new MacBook Pro is 800-ish megabytes per second.
 
the 1TB SSD in the rMBP has seen 1100 MB/s .. there are benchmark results in the MBP forum
 
Base config - Quad-Core and Dual GPU
We know the base prices and we know all of the configurations available. We do not know the prices of the upgraded configurations, as I said.

All of the information is here:
http://www.apple.com/mac-pro/specs/ (upgrade and base spec info)

and

http://store.apple.com/us/buy-mac/mac-pro (base spec configs and price)
Well, it certainly looks more impressive stretched out like that than it does in the Apple store "Buy Now" (Not) window. I was always pretty sure they'd include a power cord, unless they found some way to charge $40 for a Thunderbolt to AC adapter.

I still find it odd we have such minimal pricing data on a computer rumored to ship in two weeks, although at this point I've given up on it for this fiscal year anyway, so it's a moot point for me.
 
64-bit is faster mostly because they added more registers than for 32-bit, and likewise 16-bit. So if you're running 64-bit compiled and at least somewhat optimized code, you should see it run faster.

No, on a G5, 32-bit code ran faster than 64-bit code because pointers are twice as big on 64-bit, so it's more data to push around. Also, very few programs need 64-bit integers. However, with the jump from 16-bit to 32-bit, 32-bit was faster because a lot of programs need numbers bigger than 65535, so 32-bit integers were being emulated on 16-bit by mostly everything.
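To put a number on the pointer-size point (my own illustration, not from the posts above), here's a tiny C program; build it 32-bit and 64-bit and only the pointers grow, not the ints:

Code:
// pointer_sizes.c -- only the pointers get bigger in a 64-bit build
// try: gcc -m32 pointer_sizes.c && ./a.out   versus   gcc -m64 pointer_sizes.c && ./a.out
#include <stdio.h>

struct node {
    int value;          // 4 bytes in either build
    struct node *next;  // 4 bytes in a 32-bit build, 8 bytes in a 64-bit build
};

int main(void)
{
    printf("sizeof(int)         = %zu\n", sizeof(int));
    printf("sizeof(void *)      = %zu\n", sizeof(void *));
    printf("sizeof(struct node) = %zu\n", sizeof(struct node)); // grows with the pointer (plus padding)
    return 0;
}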
 
No, on a G5, 32-bit code ran faster than 64-bit code because pointers are twice as big on 64-bit, so it's more data to push around.

Which isn't a particularly big-impact issue if you are pushing around lots of data anyway. (e.g., holding more at a higher, faster memory hierarchy level can offset having to fetch marginally more data.)


64-bit is a bad way of optimizing 32-bit apps with relatively small data sets.

There is also not a necessarily exact size coupling between L1, L2, and/or L3 cache line sizes and the width of the registers. Bigger doesn't necessarily mean slower. 64-bit is wider but can just load more into short-range memory. There is also a limit to just how much instruction-level parallelism you can squeeze out of code... so loading more 32-bit code may not necessarily buy as much as having a 64-bit instruction that does 'more'.


Also, very few programs need 64-bit integers.

Although 'C' and related languages like to play fast and loose with the distinction, pointers and integers are really separate classes. If you have 3-4 billion integers, then 64-bit pointers to them would be necessary. It isn't like 64-bit programs make 8-bit chars explode to 64 bits, either.

It is the number of items that has generally grown over time, at fairly high rates, across a wide variety of workloads.


However, with the jump from 16-bit to 32-bit, 32-bit was faster because a lot of programs need numbers bigger than 65535, so 32-bit integers were being emulated on 16-bit by mostly everything.

Again, it depends upon what you're counting. The world's population can't be represented in a 32-bit int. Seconds past 1970 ... going to be a 64-bit problem eventually. Most modern file systems have blown right past 32-bit ints.

Going 16->32 or 32->64 is often a window to dump baggage (instructions and/or concepts) that is holding the architecture back. ARM may not have been register impoverished, but some of its other assumptions and constraints were problematic. (More enabling of superscalar execution, etc., thanks to commonly available higher transistor budgets without power problems, can basically trade off any marginal load reduction by simply doing more with what is loaded.)
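For scale (my numbers, and the population figure is only a rough 2013 estimate), here are a couple of those counts next to the 32-bit limits:

Code:
// ranges.c -- a few everyday counts next to the 32-bit limits
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    long long world_population = 7100000000LL;              // rough 2013 estimate
    long long big_file_bytes   = 50LL * 1000 * 1000 * 1000; // a 50GB video file

    printf("INT32_MAX        = %d\n", (int)INT32_MAX);        // 2,147,483,647
    printf("UINT32_MAX       = %u\n", (unsigned)UINT32_MAX);  // 4,294,967,295
    printf("world population = %lld  (doesn't fit in 32 bits)\n", world_population);
    printf("50GB file, bytes = %lld  (doesn't fit in 32 bits)\n", big_file_bytes);
    return 0;
}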
 
Although 'C' and related languages like to play fast and loose with the distinction, pointers and integers are really separate classes. If you have 3-4 billion integers, then 64-bit pointers to them would be necessary. It isn't like 64-bit programs make 8-bit chars explode to 64 bits, either.

In what way do they play loosely with the distinction? I'm not sure what you mean by this one, although I understood the prior comment about using pointers to two 16-bit integers.
 
Seconds past 1970 ... going to be a 64-bit problem eventually. Most modern file systems have blown right past 32-bit ints.

After the embarrassment of "Y2K", most systems redefined "time_t" as a 64-bit signed integer a decade or two ago. Hopefully, by 2038 few older applications will still be around.

On your second statement, I'd say that "no" modern file systems have a 32-bit issue. Some legacy file systems do, but I'd say that's proof that they're not modern.
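If anyone wants to see both points on their own machine, here's a small illustration (assuming a reasonably current system where time_t is already 64-bit):

Code:
// y2038.c -- where a 32-bit signed time_t tops out
#include <stdio.h>
#include <stdint.h>
#include <time.h>

int main(void)
{
    printf("sizeof(time_t) = %zu bytes\n", sizeof(time_t));  // 8 on most current systems

    // the largest second count a 32-bit signed time_t can hold
    time_t limit = (time_t)INT32_MAX;     // 2,147,483,647 seconds past 1970
    char buf[64];
    strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S UTC", gmtime(&limit));
    printf("32-bit time_t rolls over after %s\n", buf);      // 2038-01-19 03:14:07 UTC
    return 0;
}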
 
No, on a G5, 32-bit code ran faster than 64-bit code because pointers are twice as big on 64-bit, so it's more data to push around. Also, very few programs need 64-bit integers. However, with the jump from 16-bit to 32-bit, 32-bit was faster because a lot of programs need numbers bigger than 65535, so 32-bit integers were being emulated on 16-bit by mostly everything.

I'm obviously talking CISC; RISC processors have plenty of registers, so this would be the case.
 

Are you claiming that the G5 was CISC, not RISC?

Although the whole "CISC vs RISC" debate was settled in the last century. Neither won; μops became the game going forward.
 
Are you claiming that the G5 was CISC, not RISC?

Obviously not. My original point was that with Intel CISC procs, going 64-bit gave more registers, so it can theoretically be faster. As you say, with RISC you'd probably see the opposite.

It took way too many posts to make that point :)
 
Since they put in two FirePro cards, does that mean Apple finally implemented Crossfire support in OS X?
 
Probably not; CrossFire is mainly used for gaming, and that's not what the Mac Pro is for. You don't need CrossFire support to use the two GPUs to run OpenCL workloads, or to run two separate GPU-accelerated tasks.

It's possible they may work in CrossFire under Windows, as some of the new AMD cards support CrossFire over the PCIe bus, but that's unknown, as the graphics cards are custom, so they may not support it.
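As an aside, here's roughly what that looks like from the host side with the plain OpenCL C API: each GPU shows up as its own device and gets its own context and queue, no CrossFire involved. This is just a sketch; the device count and names are whatever the machine reports, nothing specific to the D300/D700.

Code:
// list_gpus.c -- how the two GPUs look to an OpenCL host program
// build on OS X with:  clang list_gpus.c -framework OpenCL -o list_gpus
#include <stdio.h>
#include <OpenCL/opencl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id devices[8];
    cl_uint num_devices = 0;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 8, devices, &num_devices);

    for (cl_uint i = 0; i < num_devices; i++) {
        char name[128];
        clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(name), name, NULL);

        // each GPU gets its own context and command queue -- no CrossFire needed
        cl_int err;
        cl_context ctx = clCreateContext(NULL, 1, &devices[i], NULL, NULL, &err);
        cl_command_queue queue = clCreateCommandQueue(ctx, devices[i], 0, &err);

        printf("GPU %u: %s\n", i, name);

        clReleaseCommandQueue(queue);
        clReleaseContext(ctx);
    }
    return 0;
}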
 
In what way do they play loosely with the distinction? I'm not sure what you mean by this one,

char *c_ptr ; // a character pointer
int fastloose ; // an integer

.....

c_ptr++ ;

....

// versus

fastloose = c_ptr ;
....
c_ptr = fastloose++ ;

.....

That you can willy-nilly apply math operators (appropriate for integers) to some pointer without any context. The language casually allows users to assign ints to pointers and pointers to ints.
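For anyone who wants to see the compiler's side of it, here's a small compilable variant of the sketch above (my wording, not the original poster's). I've used intptr_t so the round trip stays safe on a 64-bit build; a plain int like the sketch uses would still compile, just with a warning, and would truncate the address on 64-bit:

Code:
// fastloose.c -- C happily lets pointers and integers trade places
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    char text[] = "fast and loose";
    char *c_ptr = text;          // a character pointer
    intptr_t fastloose;          // an integer wide enough to hold a pointer

    c_ptr++;                     // integer-style math on a pointer: now points at 'a'

    fastloose = (intptr_t)c_ptr; // pointer stuffed into an integer
    fastloose++;                 // ...treated as a plain number
    c_ptr = (char *)fastloose;   // ...and turned back into a pointer again

    printf("%s\n", c_ptr);       // prints "st and loose"
    return 0;
}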
 
Since they put in two FirePro cards, does that mean Apple finally implemented Crossfire support in OS X?

There are two GPUs in part because there are 7 video output ports (6 DisplayPort/TB and one HDMI). It is not particularly Apple's style to have video ports that are present but dead in some contexts. If a port is present, the user can plug in and use it with no mystery configuration panel. Plug it in and it works.

Similarly, it can drive three 4K displays. Perhaps overkill short term, since the monitors are relatively expensive, but 3-4 years out probably not so much.

Crossfire isn't particularly going to help with either one of those. Crossfire is oriented toward driving one single monitor at much higher than normal frame rates.


The other issue is that Crossfire/SLI are proprietary solutions. Apple doesn't particularly buy into locking their own and users' resources into single-sourced constraints. If the two vendors came up with a standard shared-resource solution over PCIe 3.0, Apple would probably get on board. They probably won't (at least short-to-intermediate term); so Apple won't.
 
An integrated chip (like Iris Pro or the AMD integrated stuff) does its work via DMA, because it uses integrated memory. There is no second bank of memory like VRAM to store data in, and no transfers to do. Everything is kept in RAM and there is no shuffling of data around. That's typically faster (and more ideal for audio), but you lose speed in an integrated GPU, and RAM is typically slower than VRAM.

For a real awakening, take a look at the Iris Pro OpenCL benchmarks. One reason that GPU is such a speed demon is it's not moving data over a PCI bus.

[Image: Iris Pro OpenCL benchmark chart]

Even the 4000 series iGPU is faster than the dedicated stuff due to the lack of the PCI bus hit.

I just have to comment on this. The iGPU is faster here mainly because it's much better at executing code with lots of branches/random memory accesses. LuxMark is a ray tracer - its memory access patterns already disfavour the design of a typical GPU. The PCI transfer penalty has nothing to do with it.

To your discussion - it depends solely on how 'large' the job is. Transferring the real-time data to the GPU and getting it back is the least of the issues. But the data packages will be so small for real-time processing that there is no reason to bother - e.g. when processing 5-channel 32-bit audio with 44.1kHz sampling rate 100 times per second, each sampling window is just around 8K - this comfortably fits in the CPU's L1 cache. The full second of such audio is 'only' 220k floating point values. For a modern CPU which operates in a GFLOPs range, such amount of data is absolutely miniscule. There is a good reason why there are no more audio accelerators for 3D sound effects in games - a CPU can do the same thing in software more efficiently.
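Spelling that arithmetic out (same assumptions as above: 5 channels, 32-bit float samples, 44.1kHz, processed 100 times a second):

Code:
// audio_sizes.c -- how small a real-time audio workload actually is
#include <stdio.h>

int main(void)
{
    const int channels     = 5;
    const int sample_rate  = 44100;   // samples per second, per channel
    const int windows_hz   = 100;     // process the stream 100 times per second
    const int bytes_sample = 4;       // 32-bit float samples

    int samples_per_window = (sample_rate / windows_hz) * channels;   // 441 * 5 = 2205
    int bytes_per_window   = samples_per_window * bytes_sample;       // ~8.6 KB -> fits in L1
    int samples_per_second = sample_rate * channels;                  // 220,500 floats

    printf("samples per 10 ms window : %d\n", samples_per_window);
    printf("bytes   per 10 ms window : %d (about %d KB)\n", bytes_per_window, bytes_per_window / 1024);
    printf("float samples per second : %d\n", samples_per_second);
    return 0;
}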

----------

Since they put in two FirePro cards, does that mean Apple finally implemented Crossfire support in OS X?

No, but as a developer, you can do a 'do-it-yourself' crossfire. Just leverage the two cards to perform different tasks/render different parts of the screen and then combine the result. Using multiple GPUs is fairly easy under OS X.
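A rough sketch of that 'do-it-yourself crossfire' idea with the plain OpenCL C API, in case it helps: each GPU gets its own context and queue and works on half of a buffer. The kernel and sizes are made up for illustration, and real code would run the two halves concurrently instead of back to back:

Code:
// split_work.c -- give each GPU half of a buffer (illustrative sketch only)
// build on OS X with:  clang split_work.c -framework OpenCL -o split_work
#include <stdio.h>
#include <stdlib.h>
#include <OpenCL/opencl.h>

static const char *src =
    "__kernel void scale(__global float *x) { \n"
    "    size_t i = get_global_id(0);         \n"
    "    x[i] = x[i] * 2.0f;                  \n"
    "}                                        \n";

int main(void)
{
    enum { N = 1 << 20, HALF = N / 2 };
    float *data = malloc(N * sizeof(float));
    for (int i = 0; i < N; i++) data[i] = 1.0f;

    cl_platform_id platform;
    cl_device_id dev[2];
    cl_uint ndev = 0;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 2, dev, &ndev);
    if (ndev < 2) { fprintf(stderr, "need two GPUs\n"); return 1; }

    cl_int err;
    for (int g = 0; g < 2; g++) {
        size_t offset = (size_t)g * HALF;     // which half this GPU owns
        size_t bytes  = HALF * sizeof(float);
        size_t global = HALF;

        cl_context ctx = clCreateContext(NULL, 1, &dev[g], NULL, NULL, &err);
        cl_command_queue q = clCreateCommandQueue(ctx, dev[g], 0, &err);

        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
        clBuildProgram(prog, 1, &dev[g], NULL, NULL, NULL);
        cl_kernel k = clCreateKernel(prog, "scale", &err);

        // copy this GPU's half in, run the kernel, copy the result back out
        cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, bytes, NULL, &err);
        clEnqueueWriteBuffer(q, buf, CL_TRUE, 0, bytes, data + offset, 0, NULL, NULL);
        clSetKernelArg(k, 0, sizeof(cl_mem), &buf);
        clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
        clEnqueueReadBuffer(q, buf, CL_TRUE, 0, bytes, data + offset, 0, NULL, NULL);
        clFinish(q);

        clReleaseMemObject(buf);
        clReleaseKernel(k);
        clReleaseProgram(prog);
        clReleaseCommandQueue(q);
        clReleaseContext(ctx);
    }

    printf("data[0] = %f, data[N-1] = %f\n", data[0], data[N - 1]); // both 2.0
    free(data);
    return 0;
}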

----------

Obviously not. My original point was that with Intel CISC procs, going 64-bit gave more registers, so it can theoretically be faster. As you say, with RISC you'd probably see the opposite.

I never understand why people still talk about CISC in the age of superscalar CPUs. And - the iPhone 5s has shown that increasing the number of registers on an ARM instruction set does improve performance for many applications.
 
char *c_ptr ; // a character pointer
int fastloose ; // an integer

.....

c_ptr++ ;

....

// versus

fastloose = c_ptr ;
....
c_ptr = fastloose++ ;

.....

That you can willy-nilly apply math operators (appropriate for integers) to some pointer without any context. The language casually allows users to assign ints to pointers and pointers to ints.

bleh, perhaps I'm misinterpreting the goal of that math. On the second one it appears you're just storing the numerical value of a character address (not the characters themselves) to an integer, then incrementing fastloose and pointing c_ptr at fastloose, unless I'm missing something. Was the concern that this is possible?



Similarly, it can drive three 4K displays. Perhaps overkill short term, since the monitors are relatively expensive, but 3-4 years out probably not so much.

These machines typically encounter long refresh cycles, so I suspect they would want to build in some room for growth. It's not even necessarily 3 x 4K displays so much as some combination that might otherwise hit a wall. It could be an issue of storage hookups + 2 displays. Dell's 24" is coming in around $1400. We'll see how that one looks. I suspect NEC will have something to market within the next year too.
 
There are two GPUs in part because there are 7 video output ports (6 DisplayPort/TB and one HDMI). It is not particularly Apple's style to have video ports that are present but dead in some contexts. If a port is present, the user can plug in and use it with no mystery configuration panel. Plug it in and it works.

Similarly, it can drive three 4K displays. Perhaps overkill short term, since the monitors are relatively expensive, but 3-4 years out probably not so much.

Crossfire isn't particularly going to help with either one of those. Crossfire is oriented toward driving one single monitor at much higher than normal frame rates.


The other issue is that Crossfire/SLI are proprietary solutions. Apple doesn't particularly buy into locking their own and users' resources into single-sourced constraints. If the two vendors came up with a standard shared-resource solution over PCIe 3.0, Apple would probably get on board. They probably won't (at least short-to-intermediate term); so Apple won't.

All true. The interesting question would be whether the card was plumbed for CrossFire at all, because then in a Windows boot you should see it. With the newer cards it's handled over PCIe, but with the older ones it required plumbing. At any rate, I'm sure that Apple didn't plumb for it, as they have no interest; it's a pure gaming solution, as you say.
 


I thought I read somewhere that Crossfire on old cards could work without the added plumbing (bridge) but that the PCIe bus didn't offer enough bandwidth (on top of everything else it's doing) for it to work all that well. Now with PCIe 3.0 effectively doubling bus bandwidth the thinking is that Crossfire will work just fine without the bridge. If that's true, it's very possible Crossfire will work on the nMP in Windows.
 
All true. The interesting question would be whether the card was plumbed for CrossFire at all, because then in a Windows boot you should see it.

And Apple is going to sink hardware R&D into enabling a corner-case mode for Windows? Probably not. Leaving it off is cheaper, both shorter term (in design costs) and intermediate-to-longer term (support). There is also limited board space. Most "normal" GPU boards don't need to place an SSD on them. Far more folks would like a boot drive than Crossfire. More folks would probably like a 2nd SSD more than Crossfire (although that may have to wait till the next Mac Pro design update).


With the newer cards it's handled over PCIe, but with the older ones it required plumbing. At any rate, I'm sure that Apple didn't plumb for it, as they have no interest; it's a pure gaming solution, as you say.

There is some synergy. The core DMA infrastructure that connects memory to PCIe transfers can be leveraged (even if only as a shared "gate" to PCIe) in OpenCL.

"So, AMD built a DMA engine into its compositing block, facilitating direct communication between GPUs over PCI Express and enough throughput for those triple-screen and 4K configurations that performed so poorly before. .... Moving display data is a real-time operation, necessitating bandwidth provisioning, buffering, and prioritization.
http://www.tomshardware.com/reviews/radeon-r9-290x-hawaii-review,3650-2.html

I doubt Apple is using these newer configs in the Mac Pro. (The 290X is out of the thermal envelope.) Again, it's a short-term issue versus a long-term fit with the core design, which they can evolve as "better" aligned components become available.

The "no interest" is bit overblown. Useful improvements for broader range of solutions is generally gets more of their interest than some narrow spec porn chasing niche group. Over time the Mac systems get better.
 