Gainward GTX 770 4GB

Seller note: “It has serious artefacts, likely due to VRAM issues, so I’m selling it as not working. It has received a dusting off, brief clean with isopropyl alcohol and the die has been re-pasted.”

Summary

  • An unusual variant of the classic GTX 770 with double the number of memory chips, mounted on the back of the card as well as the front.
  • Resistances
    • Vcore – 9.7Ω
    • Vmem – 105.1Ω
    • PEX – TODO
    • 5V – 508Ω (measured at the 5V voltage regulator)
    • 12V – 1.3kΩ
    • 3.3V – 733Ω
  • The voltages are normal and the card starts. As indicated by the seller, there is a memory issue. MATS reveals a single failing bit, C044, in subpartition FBIOC1 (the C1 chip pair), with every error on that one bit:
mats version 400.184.  Testing GK104 with 20 MB of memory starting with 0 MB.

Read    Error Count: 0
Write   Error Count: 15744
Unknown Error Count: 0

=== MEMORY ERRORS BY SUBPARTITION ===
SUBPART READ ERRORS WRITE ERRORS UNKNOWN ERRS
------- ----------- ------------ ------------
FBIOA0            0            0            0
FBIOA1            0            0            0
FBIOB0            0            0            0
FBIOB1            0            0            0
FBIOC0            0            0            0
FBIOC1            0        15744            0
FBIOD0            0            0            0
FBIOD1            0            0            0

Failing Bits: 
   C044 


=== MEMORY ERRORS BY BIT ===
P : Partition (FBIO)
                                            READ 0 READ 1 READ ?
P BIT READ ERRORS WRITE ERRORS UNKNOWN ERRS EXP. 1 EXP. 0 EXP. ?
- --- ----------- ------------ ------------ ------ ------ ------
C 044           0        15744            0      0  15744      0


=== MEMORY ERRORS BY ADDRESS ===
ADDRESS : Failing memory address, or buffer offset if starting with 'X+'
T : Type of memory error: W = write, R = read
P : Partition (FBIO)
S : Subpartition
B : Bank
E : Beat
   ADDRESS EXPECTED   ACTUAL  REREAD1  REREAD2 FAILBITS TPSBE  ROW COL BIT(s)
   ------- --------   ------  -------  ------- -------- -----  --- --- ------
0001354c80 aaaaaaaa aaaabaaa aaaabaaa aaaabaaa 00001000 WC140 004d 034   C044
0001354c84 aaaaaaaa aaaabaaa aaaabaaa aaaabaaa 00001000 WC141 004d 034   C044
0001354c88 aaaaaaaa aaaabaaa aaaabaaa aaaabaaa 00001000 WC142 004d 034   C044
0001354c8c aaaaaaaa aaaabaaa aaaabaaa aaaabaaa 00001000 WC143 004d 034   C044
......
0000559a84 aaaaaaaa aaaabaaa aaaabaaa aaaabaaa 00001000 WC181 0015 034   C044
0000559a88 aaaaaaaa aaaabaaa aaaabaaa aaaabaaa 00001000 WC182 0015 034   C044
0000559a8c aaaaaaaa aaaabaaa aaaabaaa aaaabaaa 00001000 WC183 0015 034   C044
If you are getting failure for first MB of FB then try option -no_scan_out
Error Code = 00000001 

                                        
 #######     ####    ########  ###      
 #######    ######   ########  ###      
 ##        ##    ##     ##     ###      
 ##        ##    ##     ##     ###      
 #######   ########     ##     ###      
 #######   ########     ##     ###      
 ##        ##    ##     ##     ###      
 ##        ##    ##  ########  ######## 
 ##        ##    ##  ########  ######## 
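
As a sanity check on the report above, the failing bit can be recovered by hand from the EXPECTED and ACTUAL columns. The following is a minimal Python sketch, assuming that subpartition 1 carries partition bits 63:32 (as the FBIOC[63:32] naming in the older MATS report layout suggests). The function names are my own and are not part of MATS.

```python
def failing_bits(expected: int, actual: int) -> list[int]:
    """Return the bit positions (0 = LSB) where two 32-bit words differ."""
    mask = (expected ^ actual) & 0xFFFFFFFF
    return [bit for bit in range(32) if mask & (1 << bit)]


def partition_bit(subpartition: int, word_bit: int) -> int:
    """Map a bit in a 32-bit subpartition word to a 0..63 partition bit.

    Assumption: subpartition 0 holds bits 31:0 and subpartition 1 holds 63:32.
    """
    return subpartition * 32 + word_bit


# Values taken from the first row of MEMORY ERRORS BY ADDRESS.
expected = 0xAAAAAAAA
actual = 0xAAAABAAA

bits = failing_bits(expected, actual)
print(hex(expected ^ actual))                          # 0x1000, the FAILBITS column
print(bits)                                            # [12]
print([f"C{partition_bit(1, b):03d}" for b in bits])   # ['C044']

# The bit was written as 0 and read back as 1, matching the
# "READ 1 / EXP. 0" column of the MEMORY ERRORS BY BIT table.
print((expected >> 12) & 1, (actual >> 12) & 1)        # 0 1
```

Every row in the report shows the same pattern, so the fault looks like one data line stuck high rather than random corruption.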

This is possibly just a bad solder joint, so I take a guess and try to reflow the front chip of the C1 pair. I accidentally nudge it too far and am forced to proceed directly to a re-ball, as I don’t have a spare chip of this type (Hynix H5GQ2H24BFR-R2C).

See Graphics Card VRAM Reball Procedure for re-ball steps

The re-balling appears successful, as a clear picture is now displayed. I am still concerned that the other C1 chip in the pair (on the back of the card) might also have an issue, and I don’t think MATS has a way to narrow down which chip of a pair is at fault. Only thorough testing can give some level of confidence that the fix is reliable.

Unfortunately, after fully reassembling the card and powering up, it crashes when loading the drivers. Worse, the MATS report now shows failures on multiple channels, including C! A deeper investigation is now required…

mats version 367.38.  Testing GK104 with 10 MB of memory starting with 0 MB.
Errors found. Use -matsinfo for details.
This message will only appear once.
  SUBPART     RANK0 RD ERR  RANK0 WR ERR   UNKNOWN ERR
------------- ------------- -------------  ------------
FBIOA[ 31:  0]            0             0             0
FBIOA[ 63: 32]            0             0             0
FBIOB[ 31:  0]            0        986733             0
FBIOB[ 63: 32]            0       1018112             0
FBIOC[ 31:  0]            0       1023256             0
FBIOC[ 63: 32]            0        648968             0
FBIOD[ 31:  0]            0             0             0
FBIOD[ 63: 32]            0             0             0

Rank 0 Failing bits:
   B000 B001 B002 B003 B004 B005 B006 B007 B032 B033 B034 B035 B036 B037 B038 B039 
   B040 B041 B042 B043 B044 B045 B046 B047 B048 B049 B050 B051 B052 B053 B054 B055 
   B056 B057 B058 B059 B060 B061 B062 B063 C000 C001 C002 C003 C004 C005 C006 C007 
   C008 C009 C010 C011 C012 C013 C014 C015 C016 C017 C018 C019 C020 C021 C022 C023 
   C024 C025 C026 C027 C028 C029 C030 C031 C036 C038 C048 C049 C050 C053 C055 
...

A faulty memory controller is one possibility, and it is worrying that the failing chips are next to each other, but there could be another explanation. The first test used MATS 400.184 because I couldn’t get a picture, and my MATS 400.184 install runs and powers off by itself. However, I think MATS 367.38 is the correct choice for older cards like the GTX 770. Before the re-ball, I am fairly sure it would not have run at all. Now that I can run it, it may be detecting additional issues that the 400.184 run did not report.

The report implicates 4 chips (possibly as many as 8, since this card has its chips in pairs), although not all bits are failing.

One challenge with this card is that applying heat to one VRAM chip probably also affects the one underneath. I could try replacing all 8 chips, but that would add considerably to the time and cost.
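
To see more clearly which parts of B and C are affected, the failing-bit list from the MATS 367.38 report can be grouped by partition and by blocks of 8 bits. This is a rough Python sketch. The grouping into 8-bit blocks is only a reading aid that I am assuming is useful, not something MATS reports, and the function name is hypothetical. The bit list is copied as shown above, and the report is cut off after it, so the picture may be incomplete.

```python
from collections import Counter

# Failing bits as listed in the MATS 367.38 report (output truncated after this).
FAILING_BITS = """
B000 B001 B002 B003 B004 B005 B006 B007 B032 B033 B034 B035 B036 B037 B038 B039
B040 B041 B042 B043 B044 B045 B046 B047 B048 B049 B050 B051 B052 B053 B054 B055
B056 B057 B058 B059 B060 B061 B062 B063 C000 C001 C002 C003 C004 C005 C006 C007
C008 C009 C010 C011 C012 C013 C014 C015 C016 C017 C018 C019 C020 C021 C022 C023
C024 C025 C026 C027 C028 C029 C030 C031 C036 C038 C048 C049 C050 C053 C055
"""


def count_by_byte_block(text: str) -> dict[tuple[str, int], int]:
    """Count failing bits per (partition letter, 8-bit block index)."""
    counts: Counter = Counter()
    for token in text.split():
        partition, bit = token[0], int(token[1:])
        counts[(partition, bit // 8)] += 1
    return dict(counts)


counts = count_by_byte_block(FAILING_BITS)
for partition in "BC":
    for block in range(8):
        n = counts.get((partition, block), 0)
        half = "31: 0" if block < 4 else "63:32"
        low, high = block * 8, block * 8 + 7
        print(f"FBIO{partition}[{half}] bits {low:02d}-{high:02d}: {n}/8 failing")
```

On the bits listed, partition B fails in whole blocks (bits 0-7 and 32-63, with 8-31 clean) and the lower half of C fails completely, while the upper half of C shows only scattered bits (2 of 8 in bits 32-39 and 5 of 8 in bits 48-55).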