
webbp

macrumors member
Original poster
May 26, 2021
31
4
I am hoping for some help in diagnosing my MP 5,1 which won't start up anymore.
  • MP 4,1 flashed to 5,1
  • 2 CPUs, 12-core 3.06 GHz
  • GPU: EVGA NVIDIA GTX 980 Ti Hybrid
  • I ran FurMark for too long and overheated the system last week. It powered itself down.
  • The inside of the case was very hot after that, but no burning smell nor visible damage anywhere.
  • Almost certainly either the GPU itself overheated and powered the system down, or else the heat coming off the GPU overheated something else in the system which caused it to power down.
  • I waited for it to cool down, then powered up seemingly with no problem.
  • Then I started seeing kernel panics, first every few days, then with increasing frequency.
  • Now, power button usually produces no effect. No power, no fans, and no power button LED.
  • Plugging in the power does flash the top two CPU LEDs on the backplane.
  • The system will power on if I let it rest for e.g. 24 hours, but usually won't boot. If it does boot, it powers itself down after a few minutes. I have heard there are some fuses that auto-reset, and this behavior seems consistent with those being involved.
  • Same if I switch to original NVIDIA GT 120 GPU, but in this case, the GT 120 fan goes to 100% and stays there until the Mac powers off or I power it off.
  • I would like to press the DIAG button on the backplane, but I am not sure which of the tiny specks on that row is the button, and I don't want to press the wrong thing and break something.
What are some likely causes / fixes? PSU? Backplane? Fuses? Northbridge? CPU? A temp sensor? Maybe needs new thermal paste somewhere?

So far, I've tried:
  • Removing and inspecting the PSU. No visible damage nor burning smell.
  • Removing the optical drive, all disks except the boot disk, all RAM but two chips in slots 1 and 5, and changing GPU back to stock GT 120.
  • Replacing the backplane battery.
  • Using a different power cord.
  • Removing and reseating the internal power cable connecting the PSU to the backplane.
  • Removing and reseating the processor tray.
 
I'd buy a cheap single-CPU 4,1 and test the components part by part. Keep the 4,1 as a spare, because the next part on a 2009 computer will fail some day.
 
Have you tried the GT120 on a different PCIe slot?
Have you tried GTX980 on another machine? Windows PC, perhaps?
I doubt that you have damaged the power supply circuit of the PCIe slot that held your GTX980 before.
 
First thing is to replace the BR2032 battery, even if the voltage is good (over 3.0V). It sits right below the GPU, and RTC batteries don't like being overheated.

Then you need to run the mid-2010 ASD, since it's an MP4,1>5,1. With the test results you will know what to look for.
 
First thing is to replace the BR2032 battery

I did replace the BR2032. No change. However, I didn't verify that the new BR2032 itself is good, so I'll check that.

Have you tried the GT120 on a different PCIe slot?
Just tried the second (upper) x16 slot. The GT 120 fan was slow and quiet instead of fast and loud, but the machine powered down after a few seconds and won't power up, probably for the next few hours.
 
the machine powered down after a few seconds and won't power up, probably for the next few hours.
This is usually a symptom of a dying power supply. You probably overstressed the 11-to-12-year-old PSU.

How did you power the GPU? You could also have damaged the backplane power circuit if you powered it just from the mini PCIe power connectors, AUX-A and AUX-B.
 
Thanks for the info. The GPU is supposed to consume a maximum of ~250W. I installed it in the first (lowest) PCIe slot, x16, and connected two mini 6-pin PCIe cables from the backplane PCIe power sockets to the 6-pin and 8-pin sockets of the GPU. One of the cables was a mini 6-pin to 6-pin, the other was this mini 6-pin to 8-pin:


 
Don't forget that no modern GPU will draw the full 75W from the PCIe slot; the last NVIDIA card that drew anything over 45 to 50W from the PCIe slot was the GTX 480. You definitely overstressed the backplane power circuit by powering it this way. Apple designed it back in 2008 with 75W + ~25W surge backup for each AUX connection. Anything over 95W being drawn will damage the power circuit over time, and it's very common to find backplanes with the PCB power lines crisped, and some with the resettable power fuses completely burned.

Even the Apple-recommended Sapphire Pulse RX 580, a GPU rated at 183W + PowerPlay, can trigger the backplane power shutdown when running some stress tests.
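
The per-connector limits described in this post can be turned into a quick sanity check. This is only a minimal sketch: the function name and the example draws are hypothetical, and the 75W and 95W thresholds are simply the figures quoted above, not an Apple specification.

```python
# Per-connector figures quoted in the post above (not an Apple spec sheet).
AUX_NOMINAL_W = 75.0   # design load per backplane AUX connector
AUX_DAMAGE_W = 95.0    # sustained draw above this is said to damage the circuit


def classify_aux_draw(draw_w: float) -> str:
    """Classify a sustained draw on one backplane AUX connector."""
    if draw_w <= AUX_NOMINAL_W:
        return "within design load"
    if draw_w <= AUX_DAMAGE_W:
        return "in surge headroom"
    return "over limit"


# Hypothetical per-connector draws in watts, for illustration only.
for name, watts in {"AUX-A": 66.0, "AUX-B": 108.0}.items():
    print(f"{name}: {watts:.0f} W -> {classify_aux_draw(watts)}")
```

Anything the check reports as "over limit" would, by the reasoning above, need a different power path rather than the backplane AUX connectors.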
 
I'd buy a cheap single 4.1 and test the components part by part...
New 4,1 with the old CPU tray works, which is consistent with the old 4,1's PSU or backplane being the problem.
You definitely overstressed the backplane power circuit by powering it this way. Apple designed it back in 2008 with 75W + ~25W surge backup for each AUX connection. Anything over 95W being drawn will damage the power circuit over time, and it's very common to find backplanes with the PCB power lines crisped, and some with the resettable power fuses completely burned.

Even the Apple-recommended Sapphire Pulse RX 580, a GPU rated at 183W + PowerPlay, can trigger the backplane power shutdown when running some stress tests.
Seems like the most likely cause. As it happens, the new 4,1 has a Sapphire RX 580. Any recommendations on how I can avoid this in the future? E.g., an external power supply, the Pixlas power supply mod, or a hardware or GPU BIOS method of throttling, underclocking, or disabling PowerPlay? I train neural networks, which keeps the GPU at close to 100% for days at a time, but the absolute top speed is not important, only long-term survivability and stability.
 
Pixlas mod, no doubt.
 
If you train neural networks, I would suggest sourcing some old Quadro/Tesla cards, which draw less power.
The Pixlas mod is a must if you want to install 2-3 Tesla cards in your Mac Pro.
 
Any recommendations on how I can avoid this in the future? E.g., external power supply, pixlas power supply mod, or a hardware or GPU BIOS method of throttling or underclocking or disabling PowerPlay?
You can use a GPU expansion board and move the heat into a dedicated case where you can provide adequate cooling.
 
Thanks all for the advice. I got a replacement lower-end 4,1 for the backplane and PSU, and now would like to diagnose whether the fault was with the old machine's backplane or PSU. What is a safe way to do this? Should I try the old PSU with the new backplane in the new machine? Or the new PSU with the old backplane in the old machine? Is either of these tests likely to cause damage to the working backplane and PSU of the new machine?
 
Always start with a known working PSU with suspected defective backplane first.

A dead backplane usually doesn't damage a PSU unless it's a serious short, but a defective PSU can damage a working backplane.
 
A few test results to report: I edited the 980 Ti BIOS in Windows 10 to lower power consumption using
The default 8-pin EVGA 980 Ti Hybrid firmware allows the 8-pin PCIe socket to draw up to 175W, the 6-pin socket to draw up to 108W, and total draw up to 275W, which probably explains how I killed my logic board (or PSU). Max slot power was already 75W. I changed the 8-pin and 6-pin power settings to match the slot power settings: max 75W each, and changed the total max to 225W.
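
As a rough check of the numbers above, the sketch below compares the stock and modified firmware limits against a per-source cap. The wattages are the ones reported in this post; the 75W cap, the dictionary layout, and the function names are assumptions made for illustration, not anything read from the BIOS tool.

```python
# Firmware power limits in watts, as reported in this post.
STOCK = {"slot": 75, "6-pin": 108, "8-pin": 175, "total": 275}
MODIFIED = {"slot": 75, "6-pin": 75, "8-pin": 75, "total": 225}

CAP_W = 75  # assumed per-source cap, matching the slot limit


def sources_over_cap(limits: dict, cap_w: int = CAP_W) -> list:
    """Return the power sources whose firmware limit exceeds the cap."""
    return [name for name, watts in limits.items()
            if name != "total" and watts > cap_w]


def rail_sum(limits: dict) -> int:
    """Sum the per-source limits (everything except the 'total' entry)."""
    return sum(w for name, w in limits.items() if name != "total")


for label, limits in (("stock", STOCK), ("modified", MODIFIED)):
    print(label, "over cap:", sources_over_cap(limits),
          "| sum of sources:", rail_sum(limits), "W",
          "| firmware total:", limits["total"], "W")
```

With the modified values, the three per-source limits add up to exactly the 225W total. The stock per-source limits sum to more than the 275W total, so under the stock firmware the total was the tighter of the two constraints.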

I am now running FurMark at 1080p in Windows while observing power consumption with GPU-Z; the test started 30 minutes ago. TDP is holding steady between 98-99%, PCIe slot power is holding steady at ~61W, and 6-pin power and 8-pin power are now balanced and holding steady at ~66W each. Temperature increased from ~23 C to ~41 C in ~5 minutes and has held steady after that. (The EVGA Hybrid has a liquid cooling system with external fan and heatsink.) Additionally, the FurMark framerate is very steady at ~132 fps, so changing the power settings this way has not caused any jerky throttling issues. CPU fan speed and temperature also increased, with the temperature reaching 53 C, and are holding steady.

I repeated the test at 4k. Framerate dropped to 27 fps, i.e., ~1/4 the framerate for 4x the pixels, while power and temperature held steady. (CPU temperature rose from 53 C to 63 C and then stabilized.)
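
The scaling between the two runs can be checked with a little arithmetic. This is a minimal sketch using only the framerates reported above; it assumes "1080p" means 1920x1080 and "4k" means 3840x2160, and the variable names are illustrative.

```python
# Reported FurMark framerates from this post.
fps_1080p = 132
fps_4k = 27

# Assumed resolutions for "1080p" and "4k".
pixels_1080p = 1920 * 1080
pixels_4k = 3840 * 2160

pixel_ratio = pixels_4k / pixels_1080p   # how many times more pixels at 4k
fps_ratio = fps_4k / fps_1080p           # fraction of the 1080p framerate

# Pixels rendered per second at each resolution.
throughput_1080p = fps_1080p * pixels_1080p
throughput_4k = fps_4k * pixels_4k

print(f"pixel ratio: {pixel_ratio:.1f}x")
print(f"fps ratio: {fps_ratio:.2f}")
print(f"4k throughput relative to 1080p: {throughput_4k / throughput_1080p:.2f}")
```

Under those assumptions the 4k run delivers roughly four fifths of the 1080p pixel throughput, so the drop is a little steeper than a pure 1/4 scaling would predict.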

Note: it was not possible to use the EVGA PowerLink because there was not enough clearance to the Mac Pro's PCIe fan.

Note 2: I plan to perform the dual-GPU variant of the Pixlas mod and use a 980 Ti and a 1080 Ti, but I am glad to see that the BIOS modification works, because it allows me to use the GPU for ML immediately, and a similar BIOS modification will be required after doing the dual-GPU Pixlas mod to make sure power draw and temperature stay within acceptable limits for continuous operation.

Note 3: I can provide more details about flashing the GPU BIOS on request.

Note 4: cross-posted at https://forums.macrumors.com/thread...up-for-machine-learning.2298235/post-29953533 . Please reply about Mac Pro GPU machine learning there vs Mac Pro hardware failures caused by GPUs here.
 
Always start with a known working PSU with suspected defective backplane first.

A dead backplane usually doesn't damage a PSU unless it's a serious short, but a defective PSU can damage a working backplane.
Thanks @tsialex. Machine 1 with the known working PSU from machine 2 won't start, just like machine 1 with its own PSU, so the fault is likely the backplane. Any suggestions on how to test machine 1's PSU without endangering machine 2's working backplane?
 