EVGA GTX 780 Dual FTW w/ ACX Cooler

EVGA GTX 780

Seller Note “There is no video output from this graphics card, I once was able to get it to boot and show video, but that was for only one time, there were artifacts and it got worse and worse progressively before a black screen.”

Summary

  • Resistances: vcore 4.6 Ohms, vmem 93 Ohms, pex 333 Ohms
  • Has capacitors missing / damaged on the data lanes, possibly from an attempted repair (see picture below)
  • Interestingly, this card loads drivers, passed Kombustor and ran Heaven with good FPS, but crashed in Subnautica
  • Old mats (367) will not run, exits immediately – TODO Investigate this
  • New mats finds one failing chip with just 1 failing bit
  • The card does indeed fail to load drivers sometimes; the seller’s note appears accurate apart from the artefacts, which I have not seen
  • Quite serious corrosion across a number of components/areas, which seemingly explains why the card booted successfully so rarely
  • PCIe speeds are impacted by the damaged/missing capacitors on the data lanes

Mats Report (slight memory corruption detected)

mats version 400.184. Testing GK110B with 20 MB of memory starting with 0 MB.

Read Error Count: 0
Write Error Count: 436784
Unknown Error Count: 0

=== MEMORY ERRORS BY SUBPARTITION ===
SUBPART  READ ERRORS  WRITE ERRORS  UNKNOWN ERRS
-------  -----------  ------------  ------------
FBIOA0             0             0             0
FBIOA1             0             0             0
FBIOB0             0             0             0
FBIOB1             0             0             0
FBIOC0             0             0             0
FBIOC1             0             0             0
FBIOD0             0             0             0
FBIOD1             0             0             0
FBIOE0             0             0             0
FBIOE1             0        436784             0
FBIOF0             0             0             0
FBIOF1             0             0             0
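
With twelve subpartitions in the table, it is easy to miss the one non-zero row. A minimal sketch of pulling out the failing subpartitions from pasted report text follows; the function name and the assumption that every data row is "FBIOxx" followed by three integers are mine, based only on the rows shown above.

```python
def failing_subpartitions(report_text):
    """Return {subpartition: (read, write, unknown)} for rows with any errors.

    Assumes data rows look like 'FBIOE1 0 436784 0' (name plus three counts),
    which matches the rows in this report but is not a documented format.
    """
    failing = {}
    for line in report_text.splitlines():
        fields = line.split()
        is_data_row = (
            len(fields) == 4
            and fields[0].startswith("FBIO")
            and all(f.isdigit() for f in fields[1:])
        )
        if not is_data_row:
            continue  # skip headers, separators and blank lines
        read, write, unknown = (int(f) for f in fields[1:])
        if read or write or unknown:
            failing[fields[0]] = (read, write, unknown)
    return failing


sample = """\
SUBPART READ ERRORS WRITE ERRORS UNKNOWN ERRS
FBIOE0 0 0 0
FBIOE1 0 436784 0
FBIOF0 0 0 0
"""
print(failing_subpartitions(sample))
```

For this report, only FBIOE1 is returned, which is the chip referred to as E1 further down.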

Failing Bits:
E037

=== MEMORY ERRORS BY BIT ===
P : Partition (FBIO)
                                                 READ 0  READ 1  READ ?
P  BIT  READ ERRORS  WRITE ERRORS  UNKNOWN ERRS  EXP. 1  EXP. 0  EXP. ?
-  ---  -----------  ------------  ------------  ------  ------  ------
E  037            0        436784             0       0  436784       0

=== MEMORY ERRORS BY ADDRESS ===
ADDRESS : Failing memory address, or buffer offset if starting with ‘X+’
T : Type of memory error: W = write, R = read
P : Partition (FBIO)
S : Subpartition
B : Bank
E : Beat
ADDRESS     EXPECTED  ACTUAL    REREAD1   REREAD2   FAILBITS  TPSBE  ROW   COL  BIT(s)
----------  --------  --------  --------  --------  --------  -----  ----  ---  ------
0001354cbc  55555555  55555575  55555575  55555575  00000020  WE137  0033  02d  E037
0001354cb8  55555555  55555575  55555575  55555575  00000020  WE136  0033  02d  E037
0001354cb4  55555555  55555575  55555575  55555575  00000020  WE135  0033  02d  E037
0001354cb0  55555555  55555575  55555575  55555575  00000020  WE134  0033  02d  E037
0001354cac  55555555  55555575  55555575  55555575  00000020  WE133  0033  02d  E037
0001354ca8  55555555  55555575  55555575  55555575  00000020  WE132  0033  02d  E037
0001354ca4  55555555  55555575  55555575  55555575  00000020  WE131  0033  02d  E037
0001354ca0  55555555  55555575  55555575  55555575  00000020  WE130  0033  02d  E037
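
To make the rows above easier to read, here is a rough sketch of how one row decodes. The FAILBITS column is simply EXPECTED XOR ACTUAL, and the TPSBE column splits into the five single-character fields listed in the legend. The function name is mine, and the last step (bit 037 being subpartition × 32 plus the bit position within the 32-bit word) is my assumption from the numbers lining up, not something the mats output states.

```python
def decode_row(expected_hex, actual_hex, tpsbe):
    """Decode one 'MEMORY ERRORS BY ADDRESS' row into its parts."""
    expected = int(expected_hex, 16)
    actual = int(actual_hex, 16)
    failbits = expected ^ actual  # bits that differ between written and read
    positions = [i for i in range(32) if (failbits >> i) & 1]
    return {
        "failbits": f"{failbits:08x}",
        "bit_positions": positions,
        # For each failing bit: (expected value, value actually read back)
        "expected_vs_read": [((expected >> i) & 1, (actual >> i) & 1) for i in positions],
        "type": tpsbe[0],          # W = write, R = read
        "partition": tpsbe[1],     # FBIO partition letter
        "subpartition": tpsbe[2],
        "bank": tpsbe[3],
        "beat": tpsbe[4],
    }


row = decode_row("55555555", "55555575", "WE137")
print(row)

# ASSUMPTION: the report's bit number counts across both 32-bit subpartitions.
assumed_bit = int(row["subpartition"]) * 32 + row["bit_positions"][0]
print(f"{row['partition']}{assumed_bit:03d}")
```

Every row decodes to the same single bit, expected 0 but read back as 1, which agrees with the "READ 1 / EXP. 0" count in the by-bit table. Only the beat changes from row to row.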

Serious corrosion towards the IO end of the card (impact: rare successful boot)

Further signs of damage. Need to check if this could be a factor in the mats failure, as some corrosion is behind two of the VRAM chips. Also, due to the extent of the corrosion across this area, this could also account for the sporadic startup issues e.g. rust could be bridging some areas…

So, if we inspect the area behind memory chip E1, there is definitely some corrosion that could possibly give rise to issues with that memory chip.

The F1/F0 area is also heavily impacted, although nothing showed up in mats for these chips. Then again, if the corrosion were affecting them, the card might not be able to complete a mats run at all.

Repairing the corrosion (seemed to fix sporadic bootup)

This corrosion needs cleaning up to restore any kind of sanity to the board, as I strongly suspect it is at least a contributing factor in the card booting so rarely.

PCB Corrosion Cleaning Tools

I decided against isopropyl alcohol in favour of something more petrochemical / hydrocarbon-based, although I expect you could use IPA. The toothbrush helped clean the general area and won’t scratch. The cotton buds are very helpful both for mopping up the dislodged rust and, when snapped, for scraping off stubborn corrosion deposits with the stick. In the end, it seems to have cleaned up nicely enough.

EVGA GTX 780 Corrosion cleaned up

After testing, there is some good news and, as expected, some remaining problems:

  • The card seems to boot every time now! Or at least 7/7 times, which is great.
  • The exact same memory error is present in mats, so perhaps the memory issue isn’t related to the corrosion, or perhaps there is other damage that the cleaning didn’t fix.
  • Now that Windows access seems more reliable, I was able to check the PCIe speeds in GPU-Z, which, as expected, are indeed impacted by the damaged/missing capacitors on the data lanes.

Missing/damaged capacitors on the PCIe data lanes (impacts: reduced PCIe bandwidth, unstable FPS during benchmarks)

Obviously, damage like this has to have some impact. Some pads and traces appear damaged as well. Repairing this will take me some time, but at least the card remains mostly functional.
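
To get a feel for how much bandwidth is at stake, here is a small sketch of theoretical one-direction link bandwidth for different PCIe generations and lane counts. The per-lane rates and encoding overheads are the standard PCIe figures. The link configurations in the loop are purely illustrative, as I have not recorded the exact link the card negotiated. A healthy GTX 780 would normally run at PCIe 3.0 x16.

```python
# Per-lane transfer rate (GT/s) and line-code efficiency per PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def link_bandwidth_gb_s(generation, lanes):
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt_s * efficiency * lanes / 8  # divide by 8: bits to bytes


# Illustrative configurations only, not measurements from this card.
for generation, lanes in [(3, 16), (3, 8), (2, 16), (1, 16), (1, 1)]:
    bandwidth = link_bandwidth_gb_s(generation, lanes)
    print(f"PCIe {generation}.0 x{lanes}: {bandwidth:.2f} GB/s")
```

If bad lanes force the link to train at a narrower width or a lower generation, the available bandwidth drops accordingly, which would fit the unstable FPS seen in benchmarks.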

EVGA GTX 780 Damaged Capacitors on the data lanes

Update 15/09/2022 – Need to take another look at this card

Next Steps

  • Recheck mats 367 and try to find out why it wouldn’t run (I could be missing something important)
  • I should reflow or reball E1, at least to resolve this ahead of further testing.
  • Deal with those terrible capacitors on the data lanes (now I have a better microscope)