EVGA GTX 780 3GB SC (Card B)

Seller Note “Bought it myself to see if I could get it up and running again but all I get is fans at full blast when booted and no post unless BIOS is set for integrated graphics. Booted on integrated graphics but the GTX 780 doesn’t show up in Windows device manager either.”

Summary

  • Why did I buy this given that description? Surely it has to be dead. Well, firstly it was very cheap, and secondly I quite like this type of card and hoped some of its parts could be useful. It is in quite nice condition and very clean. Time to work out how many shades of dead it is!
  • Resistances: immediately I see a short on the memory rail, which very likely means a dead core
    • Vcore – 8.1Ω
    • Vmem – 0.2Ω
    • PEX – 2.5KΩ
    • 5v – 2.5KΩ ?
    • 3.3v – 900Ω
    • 12v – 6KΩ+
  • Probably unfixable now, but I am still hoping to learn something and improve my analysis skills.
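As a rough illustration of the first-pass check above, here is a minimal sketch that flags rails whose resistance to ground looks like a short. The readings are the ones listed in the summary; the 1Ω cut-off and the names `SHORT_THRESHOLD_OHMS` and `suspect_rails` are my own assumptions for illustration, not a rule from any datasheet. A healthy value depends heavily on the rail and the card, so a working card of the same model is the better reference.

```python
# Rail resistances to ground as listed in the summary (ohms).
readings = {
    "Vcore": 8.1,
    "Vmem": 0.2,
    "PEX": 2500.0,
    "5v": 2500.0,
    "3.3v": 900.0,
    "12v": 6000.0,
}

# Hypothetical cut-off: anything below 1 ohm is treated as a likely short.
# This is an illustrative assumption, not a datasheet figure.
SHORT_THRESHOLD_OHMS = 1.0


def suspect_rails(readings, threshold=SHORT_THRESHOLD_OHMS):
    """Return the names of rails that read below the threshold, sorted."""
    return sorted(name for name, ohms in readings.items() if ohms < threshold)


print(suspect_rails(readings))
```

With these readings, only the memory rail falls under the assumed cut-off.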

Looking for the Vram short…

I want to inject some voltage into the memory rail just in case there is another reason for the short aside from the GPU (given that the seller mentioned fans running at full speed, I would think not).

OK, this is possibly interesting!

My new UNI-T UTi260b thermal imager is hopefully paying off: with 1v injected, 3A+ is being drawn and nothing is showing as warm except the memory PWM U14. So I guess this could have a short on its gate.
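For a sense of scale, the injection reading can be run through Ohm's law. This is only a back-of-envelope sketch: the function name `injection_estimate` is made up for illustration, and since the supply showed "3A+", the resistance it gives is an upper bound and the power a lower bound.

```python
def injection_estimate(volts, amps):
    """Return (resistance in ohms, power in watts) for an injection reading."""
    resistance = volts / amps  # R = V / I
    power = volts * amps       # P = V * I
    return resistance, power


# 1v injected, at least 3A drawn (figures from the reading above).
resistance, power = injection_estimate(1.0, 3.0)
print(f"R <= {resistance:.2f} ohm, P >= {power:.1f} W")
```

A few watts concentrated in one small part is usually enough to stand out on a thermal imager, which fits with U14 being the only warm spot.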

EVGA GTX 780 SC – Shorted Memory PWM U14

Digging deeper…

Checking the Vmem dual MOSFETs with an ESR meter shows coil L12 at 1.08/1.10Ω, whereas L13 is higher at 1.16Ω. This could mean the problem is MOSFET Q16 (coil L12), perhaps being shorted. Next, I will measure the gates:

Q17: G1=45KΩ G2=1.6MΩ

Q16: G1=45KΩ G2=7.8Ω! (should be 1.6MΩ; checked against another working 780)
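The comparison above can be sketched as a small check against reference values. The G2 reference of 1.6MΩ comes from the working 780. The G1 reference of 45KΩ is my assumption, taken from both MOSFETs agreeing on this card. The factor-of-ten tolerance and the name `suspicious_gates` are also hypothetical, chosen only for illustration.

```python
# Reference gate resistances in ohms. G2 was checked on a working 780;
# G1 is an assumption based on Q16 and Q17 agreeing with each other.
REFERENCE = {"G1": 45e3, "G2": 1.6e6}

# Gate readings from this card (ohms).
measured = {
    "Q17": {"G1": 45e3, "G2": 1.6e6},
    "Q16": {"G1": 45e3, "G2": 7.8},
}


def suspicious_gates(measured, reference, max_ratio=10.0):
    """Flag gates reading more than max_ratio times below the reference."""
    flagged = []
    for fet, gates in measured.items():
        for gate, ohms in gates.items():
            if reference[gate] / ohms > max_ratio:
                flagged.append((fet, gate))
    return flagged


print(suspicious_gates(measured, REFERENCE))
```

Only a reading far below the reference is flagged here, since a shorted gate is the failure being hunted.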

These dual MOSFETs are 4901NF https://www.onsemi.com/pdf/datasheet/ntmfd4901nf-d.pdf

I think the PWM U14, marked 0T=FD Q1M, is actually a Richtek RT8811A https://manualmachine.com/richtek/rt8811agqw/3781835-schematic/

OK, so hopefully removing Q16 should resolve the short, but the PWM U14 will also need replacing. However, I wouldn’t be surprised if this turned out not to be the only problem.

After removal, Vmem is now a much healthier 45Ω (Samsung memory)

OK! After attempting to power the card on, 12v now has a short (or a very low resistance of ~45Ω) on the 8-pin PCI-E connector! More voltage injection on the way…

Hmm... sadly, I think this could be the end of the line. I can see the warmth and the injected voltage both on the memory rail and near the GPU, so I would guess 12v has also been there. Either I didn’t see the 12v PCI-E short, or it formed when I powered on (possible, as the power supply took a few seconds to trip). I guess I could remove the other Vmem MOSFET to see whether it has shorted, or maybe I opened its gate during my readings?

Seeing as the resistance of that 12v connector is now the same as the memory rail (about 45Ω), this could be one of those times for hard lessons!

Lessons learned (or reminded) …

a) ALWAYS re-measure for shorts after removing/replacing components, especially prior to power-on testing – definitely my mistake whatever the prior condition of the card (am feeling somewhat careless and foolish as a result).

b) MOSFET gates can be activated by measuring resistances. Again, I knew this, but was a bit careless.

Still, on the positive side, there is hopefully more to learn here that could help me when dealing with other cards. I will try to resolve the short, then re-measure and try to see whether the GPU issues can be proven.

After removing MOSFET Q17, the 12v 8-pin PCI-E resistance has now risen to a sensible value in the kilohm range. The card also starts and I can see all expected voltages (except Vmem of course). However, the GPU stays cool and doesn’t really heat up, which probably confirms that it is dead! I would still like to analyse this card further to check the indicators that confirm this:

  • BIOS signal (I expect it’s missing)
  • PEX reset (I expect it’s present, but the GPU cannot wake up)
  • The various PEX resistances (would like to see if indicators are present)
  • I would also like to know if all the memory chips are fried by the Vmem short. The rail has a normal resistance, so maybe only the GPU got fried (probably wishful thinking!).

Round up, spare parts!

  • Well, 3.3v is there on PEX reset.
  • Cannot see a BIOS chip-select on the oscilloscope (often a bad sign when missing).
  • However, the most striking reading was that refclk+/- and every single transmit/receive PCIe lane measures open line! (I double-checked my readings again and again.)
  • TODO Check crystal
  • TODO (Maybe) restore voltage on Vmem rail

I think this is as dead as dead can be for a GPU core! I wouldn’t be surprised if someone had hot-aired this core to death and all this analysis was in vain. Although, I don’t mind too much, as I expected a dead card and it was fun putting the thermal imager to good use! 🙂