Seller note: “Card has been working fine but now white stripes on the screen but still works fine”
Summary
- Resistances
- Vcore – 1Ω
- Vmem – 60Ω (Samsung)
- VDDCI – 27.5Ω
- Display Rail (PEX) – 21Ω
- 5V – 556Ω (same as on my fixed XFX RX 590 8Gb Fat Boy)
- 1.8V – 2.7kΩ
- The card does indeed have vertical bar artefacts
- tserver suggests that A0 is probably the failing chip (it could be A1, or both). This chip sits on its own near the PCIe slot.
The failing chip is likely A0 and/or A1. The chip type is Samsung K4G80325FB-HC25 (like the ones on my 1060 6Gb).
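The resistance readings in the summary are most useful when compared against a later measurement or a known-good card, as with the 5V rail here. A minimal Python sketch of that comparison follows. The function name `compare_rails`, the 20% tolerance and the re-measured 5V value are all assumptions made for illustration; only the baseline figures come from the readings above.

```python
# Baseline readings from the summary, in ohms.
BASELINE_OHMS = {
    "Vcore": 1.0,
    "Vmem": 60.0,
    "VDDCI": 27.5,
    "Display rail (PEX)": 21.0,
    "5V": 556.0,
    "1.8V": 2700.0,
}


def compare_rails(baseline, measured, tolerance=0.2):
    """Return {rail: (baseline, measured)} for rails that moved by more
    than `tolerance` (a fraction of the baseline value).

    The 20% default is an arbitrary illustrative threshold, not a spec.
    """
    flagged = {}
    for rail, ref in baseline.items():
        if rail not in measured:
            continue  # rail was not re-measured
        value = measured[rail]
        if abs(value - ref) > tolerance * ref:
            flagged[rail] = (ref, value)
    return flagged


# Hypothetical re-measurement: the 5V figure is invented for illustration.
later = {**BASELINE_OHMS, "5V": 12.0}
print(compare_rails(BASELINE_OHMS, later))
```

With these made-up numbers only the 5V rail is flagged; identical readings return an empty dictionary.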
Replacing A0
- The chip seemed to heat up and come off easily enough (400 °C), and there were no annoying SMD components surrounding it to accidentally dislodge.
- Sadly, I can see several pads that look damaged. They are on all 4 inner corners of the BGA, and I cannot see anything corresponding on the removed chip, so I am not sure whether I caused the damage or whether it was part of the original fault.
- Will examine it under the microscope to try and work out what damage there is…
I can’t find a datasheet to confirm, but it appears these 4 empty pads might be redundant, as I cannot see any traces leading to them:
Well, unfortunately, there is no improvement: the vertical bar artefacts are still there.
This could mean:
- A1 was the issue or is also an issue (A channel appears to be clearly at fault from the screen position of the bars)
- The replacement was perhaps not successful, or the replacement chip also has an issue.
- There is another issue affecting the whole A channel, e.g. a trace or component fault or, worse, the memory controller.
Part of me doesn’t want to replace A1 without more evidence. One good thing about doing so, though, would be to see whether those unusual empty pads are also present under that chip. If they are, that would suggest they are a feature of the memory type rather than damage (damage being one possible explanation for the A0 replacement not succeeding).
OK, let’s go ahead and take a chance in the hope we learn something new. After removal, a similar pattern of those brown/empty pads can be seen, worryingly with one exception:
However, comparing the layout with the photos of the A0 pads above, I cannot see any connection to that pad. I guess it’s possible that I didn’t clean that pad thoroughly. I am not optimistic about any improvement at this stage.
OK! Great! So, with the heatsink off, I now have a picture without artefacts! Oh, wait, it seems that after putting the card all back together, repasting and adding a new thermal pad, I now have an undetected card! Not the happy ending I wanted…
On investigation, it appears the card now has an issue with the 5V rail, seemingly just like the XFX RX 590 8Gb Fat Boy did. Checking the resistances against those I carefully logged in the RX 590 post, they appear fine. With a bit of luck, replacing the 5V voltage regulator will improve things. Luckily I ordered spares, so it is worth a try.
So, I replaced the 5V regulator:
Well, as usual with this card, it appears there is some good news and some bad news. The good news is that the card is all back together and running, Kombustor HD passed and the tserver memory test passed! The not so good news is that I think there could still be voltage issues. During Kombustor, I could see regular frame drops, and the GPU-Z graph whilst playing Subnautica showed unstable VDDC from what I could see. VRM efficiency is also a bit lower than I would expect.
The main things that can be seen above are the choppy GPU load and the 80% VRM efficiency (VDDC fluctuates a lot). I think I need to investigate the VRM performance with an oscilloscope next.
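To put a number on ‘fluctuates a lot’, the VDDC samples from a sensor log could be summarised as below. This is only a rough sketch: `summarize_vddc` is a hypothetical helper, and the sample values are invented rather than taken from this card’s log.

```python
from statistics import mean, pstdev


def summarize_vddc(samples):
    """Summarise a list of VDDC readings (volts).

    Returns mean, min, max, peak-to-peak swing and population
    standard deviation.
    """
    if not samples:
        raise ValueError("no samples to summarise")
    lowest, highest = min(samples), max(samples)
    return {
        "mean": mean(samples),
        "min": lowest,
        "max": highest,
        "peak_to_peak": highest - lowest,
        "stdev": pstdev(samples),
    }


# Invented readings for illustration only (not from this card).
example = [1.00, 1.10, 0.90, 1.00]
for name, value in summarize_vddc(example).items():
    print(f"{name}: {value:.3f} V")
```

A large peak-to-peak swing under a steady load would support taking the oscilloscope to the individual VRM phases.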
Next Steps
- Check whether the BIOS settings look reasonable (I forgot to do this); Polaris cards are often modded.
- Measure each VRM phase with an oscilloscope
- Does the card have the ‘failed to get VDDC avg. current’ error in tserver?
Update 04/10/2022 – Taking another look
- The card was missing some screws. I think I changed the thermal pads last time.
- Benchmarks are all looking normal so far
- Kombustor HD, FurMark HD both fine
- 3DMark Time Spy – scored ‘Good’, just over average. To be fair, the core clock is only 1366MHz for this model and the memory is 2000MHz; those are quite average.
- VRM efficiency as high as 92%, otherwise high 80s, which is probably normal
- Thermals look good
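For reference, a VRM efficiency figure is normally just output power divided by input power. The sketch below assumes that is what the reported percentage means here (the post does not say how the tool derives it); the wattages are made up for illustration and `vrm_efficiency` is a hypothetical helper.

```python
def vrm_efficiency(power_in_w, power_out_w):
    """Return VRM efficiency in percent: 100 * P_out / P_in."""
    if power_in_w <= 0:
        raise ValueError("input power must be positive")
    return 100.0 * power_out_w / power_in_w


# Illustrative wattages only (not measured on this card).
p_in, p_out = 100.0, 92.0
print(f"efficiency: {vrm_efficiency(p_in, p_out):.1f}%")
print(f"lost as heat in the VRM: {p_in - p_out:.1f} W")
```

On that definition, the difference between 80% and 92% at the same input power is a noticeable amount of extra heat in the VRM stage.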
Next Steps
- Check if the average current warning is present in tserver. As the card seems to be performing well enough, this could be due to a faulty resistor on a feedback circuit.
- Run a long stress test with the card mounted horizontally, hopefully to draw out any memory / BGA type issues.
Update 03/11/2022 – Failed play testing
After several weeks of intense play testing in my son’s gaming PC, this card has developed another memory issue. It appears to be on the same channel. The DisplayPort output closest to the HDMI port shows the same artefacts: