
thewoj

macrumors newbie
Original poster
Jan 8, 2018
12
0
Firstly, the system:
  • MacPro 4,1 updated to 5,1 firmware
  • Dual x5690 processors (delidded), originally dual 2.26
  • 56 GB of RAM (7 x 8 GB; one stick was faulty and I never bothered to replace it)
  • GTX980 flashed with Mac firmware
  • Generic USB 3.0 card
  • 4 spindle drives, SSD in second optical slot
I did the processor upgrade about a month ago. The computer has been running 24/7 since then, mostly at idle, and has been completely stable. The video card, RAM, USB card, hard drives, and firmware have been installed for over a year without issue.

A couple of days ago I started doing heavy processing on the new processors, with sustained 100% usage across all cores for many hours. Processor temperatures seemed to peak just under 180°F and stay there. This afternoon I came home to find that the screen was blank, but the computer restarted fine. The diagnostic report showed a kernel panic in the Intel CPU Power Management kext.

I then started to run the system at full blast again, and after 30 minutes or so I came back to a blank screen. This time the computer would only restart to a blank screen and would not chime. I happen to have a similarly specced MacPro with stock 2.66 processors in it, so I swapped the GTX980s between the systems. It restarted fine again, but cut out a while later while just doing a basic file transfer with no processor load.

I then swapped the processor tray into my other MacPro, and it also would not start using the X5690 processor tray. I swapped RAM across and it didn't change anything. I checked to make sure the Northbridge had both its clips (it does), and I re-did the thermal grease on the CPU that the Northbridge is under. The Northbridge temperature hangs around 160°F when the processors are idling around 120-140°F.
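
Since most monitoring tools and Intel's specs quote Celsius, here is a quick conversion of the readings mentioned so far. This is only a small sketch; the labels and the function name `fahrenheit_to_celsius` are made up for illustration, and the numbers are the approximate figures from this post.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a Fahrenheit reading to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


# Approximate readings quoted in the post (labels are shorthand).
readings_f = {
    "CPU peak under load": 180,
    "Northbridge at CPU idle": 160,
    "CPU idle (low)": 120,
    "CPU idle (high)": 140,
}

for label, temp_f in readings_f.items():
    temp_c = fahrenheit_to_celsius(temp_f)
    print(f"{label}: {temp_f} F = {temp_c:.1f} C")
```

That puts the load peak at roughly 82 C, the Northbridge at roughly 71 C, and the idle CPUs at roughly 49-60 C.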

The couple of times I have been using the system when this problem occurs, the USB 3.0 card unmounted first, and Wifi would not connect to my network the second time.

Basically, the system will start, run for a short period of time, then cut out again when this processor tray is installed. It seems to work better when it has cooled off for a bit.

I am sure that someone has had this issue before, but several hours of searching and reading didn't find anything that I hadn't already tested. I think I have narrowed it down to the processor tray or the X5690's, I just have no idea which and why 2 days of sustained load has caused a critical failure. I am also not sure if the kernel panic has anything to do with it.

I think my next step is to run each of the X5690's in CPU A and see if it can sustain load that way.
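
For a repeatable load during that per-CPU test, something like the following could work. It is a minimal sketch, assuming Python 3 is available on the machine; `burn` and `stress_all_cores` are hypothetical names, and a busy loop like this is not equivalent to a real render or a dedicated stress tool.

```python
import multiprocessing as mp
import time
from typing import Optional


def burn(seconds: float) -> int:
    """Spin in a tight arithmetic loop for roughly `seconds` seconds."""
    deadline = time.monotonic() + seconds
    counter = 0
    while time.monotonic() < deadline:
        counter = (counter * 31 + 7) % 1_000_003
    return counter


def stress_all_cores(seconds: float, workers: Optional[int] = None) -> int:
    """Run one busy worker per logical core; return how many workers finished."""
    workers = workers or mp.cpu_count()
    with mp.Pool(processes=workers) as pool:
        results = pool.map(burn, [seconds] * workers)
    return len(results)


if __name__ == "__main__":
    # Duration is an arbitrary choice for illustration; lengthen it as needed.
    finished = stress_all_cores(seconds=60.0)
    print(f"{finished} workers completed")
```

Watching temperatures in a separate monitoring tool while this runs would show whether a single processor holds up under load on its own.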
 

thewoj

macrumors newbie
Original poster
Jan 8, 2018
12
0
As an update:

I removed CPU B and ran the computer fine for about 10 minutes with no load, fans running at full the whole time.

After accidentally dropping processor B when reseating, I noticed a bent pin (that may have just been caused by dropping processor--unclear), so I bent it back and reseated processor B. Tightened down both heat sinks as much as I could without really cranking down (I am using a Craftsman T15 screwdriver for this). The computer refused to boot at all.

I then did a 3/4 loosening of all the heatsink screws and the computer booted and ran fine at idle for about 8 hours. At that point I loaded up a render test and let it go; within a few minutes the computer restarted to a black screen again.

I believe my next step is to stress test each processor in CPU A and see if one is failing under pressure. If that is inconclusive, I will try swapping the processors across slots.

Offhand: I would like to replace my Northbridge plastic retainers and re-do the thermal grease. Does anyone know the correct size bolt/screw to replace the plastic nut?
 

ActionableMango

macrumors G3
Sep 21, 2010
9,613
6,909
When it doesn't start up, what do the diagnostic LEDs show?

If you have the processors out at any point during your process of troubleshooting, I'd check them with a high resolution macro picture or a magnifying lens for minute damage that might have occurred during the delidding process.
 

Dr. Stealth

macrumors 6502a
Sep 14, 2004
814
740
SoCal-Surf City USA
I would recommend a Round Head plastic nut & bolt. Usually Nylon.

If Metric the fastener would be 2.5 mm Dia. x 12 mm long

If U.S. hardware use a #3/32 x 1/2" long

Use the appropriate plastic nuts for the threads of whatever fasteners you use.

I would also put the head of the fastener UNDER the board as there is more room above for a little thread extending through the nut.

Also... I'd be willing to bet if you pull the processors, re-apply thermal grease and re-seat very carefully it will solve your shutdown issues. There is some thermal movement of the processors especially as you indicated you have been going from a nice (cool) idle to MAX CPU (hot) usage. This causes a small amount of thermal movement due to differential expansion between components.

Let us know how it goes!
 

thewoj

macrumors newbie
Original poster
Jan 8, 2018
12
0
I will check the lights on the daughterboard when I get home (I don't remember any LEDs being on, but I will check again), but the main light on the computer is steady. The whole rig does power down and then start back up a few seconds later when it occurs.

I will also disassemble and take some high-res macro photos of the processors and sockets later.

Ah, damn, I was hoping to not have to remove the daughterboard from the tray, though I suspected it was necessary. Thanks for the info, there was a lot of info about the Northbridge but I was not able to find any good info on these dimensions online.
 

ActionableMango

macrumors G3
Sep 21, 2010
9,613
6,909
You might follow the "won't start" troubleshooting procedure in the Mac Pro service guide. The guide will also show you where all of the various diagnostic LEDs are and what they mean.
 

Dr. Stealth

macrumors 6502a
Sep 14, 2004
814
740
SoCal-Surf City USA
Great point. I would also have a VERY CLOSE look at the sockets at the same time. A fraction of a mm on one pin can make a difference.

I just recently replaced my NorthBridge fasteners so I had my originals in my desk which I just measured with a pair of Digital Calipers.

I would have a VERY CLOSE look at the socket pins. They are Critical.
 

thewoj

macrumors newbie
Original poster
Jan 8, 2018
12
0
Do you think that a very slightly misaligned socket pin would cause this issue as the processors are being fed more voltage/heating up/expanding slightly?
 

thewoj

macrumors newbie
Original poster
Jan 8, 2018
12
0
Well, to finish up this tale of woe:

Tonight I took out CPU B and ran CPU A full throttle on all cores for 8 minutes with no problems. I then socketed what was CPU B into the CPU A slot and ran that at full throttle for 8 minutes with no problem. I then put what was originally CPU A into the CPU B slot (so the processors have now switched slots), and could barely boot into OS X before the computer restarted into blackness. At this point I looked for any diagnostic LEDs on the CPU tray, but nothing was lit up.

So, I got motivated and took apart my second (working) MacPro. I put my X5690's onto the CPU tray of that computer and put the dual 2.66's onto my primary CPU tray. As I was expecting/hoping, the 2.66's fired up just fine and the X5690's did their normal crash routine. At this point, I swapped the X5690's for the original 2.26's I had sitting around, and both computers are working fine.

TLDR: One or both of the X5690's are causing my MacPro's to crash.
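
The elimination above can be written down as a small table of runs. This is just a sketch of the reasoning, not a diagnostic tool: the tray and CPU labels are shorthand invented here, and `suspects` is a hypothetical helper that lists values seen in crashing runs but never in a stable one.

```python
from typing import List, NamedTuple, Set


class Run(NamedTuple):
    tray: str     # which CPU tray was used
    cpus: str     # which processors were installed
    stable: bool  # True if the machine stayed up


# Outcomes as reported in this post; labels are shorthand.
runs: List[Run] = [
    Run("primary tray", "one X5690 (slot A only)", True),
    Run("primary tray", "other X5690 (slot A only)", True),
    Run("primary tray", "both X5690", False),
    Run("second tray", "both X5690", False),
    Run("primary tray", "dual 2.66", True),
    Run("second tray", "dual 2.26", True),
]


def suspects(all_runs: List[Run], field: str) -> Set[str]:
    """Values of `field` that show up in crashing runs but never in a stable run."""
    failed = {getattr(r, field) for r in all_runs if not r.stable}
    passed = {getattr(r, field) for r in all_runs if r.stable}
    return failed - passed


print("tray suspects:", suspects(runs, "tray"))
print("cpu suspects:", suspects(runs, "cpus"))
```

Both trays appear in stable runs, so neither tray is flagged; the only configuration that crashes everywhere is the pair of X5690s installed together.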

Curiosity:
1. Why does the computer boot and run just fine with either of the X5690's installed, but not both?
2. Does the kernel panic I repeatedly received for "com.apple.driver.AppleIntelCPUPowerManagement" mean anything?
3. I know there are many opinions on this matter, but what is the best method for delidding? I used the 4 razor blades and a flame method that I saw here, but that may or may not have borked my processors (or they may have been failing when I purchased them and I finally just pushed them over the ragged edge).
 