I've got a really weird problem.
Two years ago I bought a set of 4x DDR4 3200 16GB sticks, single-sided, and put them in a Ryzen 5600 desktop, which I almost never turned off. It worked without issue.
This weekend I wanted to dust off the PC, so I took all the components out, replaced the thermal paste and so on.
Turned the PC on again, and it apparently worked without issues until, after a while, Linux got pissed about running out of memory. Out of memory? With 64GB of RAM? I checked with dmidecode -t memory
and saw that one channel was reporting completely empty.
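For anyone who wants to repeat that check, here's a rough Python sketch that lists which slots dmidecode reports as empty. It assumes the usual layout of dmidecode -t memory output, where each "Memory Device" section has "Size:", "Locator:" and "Bank Locator:" lines and an empty slot shows "Size: No Module Installed". The exact wording can vary between dmidecode versions and BIOSes, and the SAMPLE text below is made up for illustration.

```python
import re

# Hypothetical, trimmed sample resembling `dmidecode -t memory` output.
# Real output has many more fields per device.
SAMPLE = """\
Handle 0x0010, DMI type 17, 92 bytes
Memory Device
    Size: 16 GB
    Locator: DIMM 0
    Bank Locator: P0 CHANNEL A

Handle 0x0011, DMI type 17, 92 bytes
Memory Device
    Size: 16 GB
    Locator: DIMM 1
    Bank Locator: P0 CHANNEL A

Handle 0x0012, DMI type 17, 92 bytes
Memory Device
    Size: No Module Installed
    Locator: DIMM 0
    Bank Locator: P0 CHANNEL B

Handle 0x0013, DMI type 17, 92 bytes
Memory Device
    Size: No Module Installed
    Locator: DIMM 1
    Bank Locator: P0 CHANNEL B
"""


def field(section, name):
    """Return the value of a 'Name: value' line in a section, or None."""
    m = re.search(rf"^\s*{re.escape(name)}:\s*(.+)$", section, re.MULTILINE)
    return m.group(1).strip() if m else None


def parse_slots(text):
    """List (bank locator, locator, size) for every Memory Device section."""
    slots = []
    # Everything before the first "Memory Device" header is not a slot.
    for section in text.split("Memory Device")[1:]:
        slots.append((field(section, "Bank Locator"),
                      field(section, "Locator"),
                      field(section, "Size")))
    return slots


def empty_slots(slots):
    """Return (bank, locator) for slots reported as having no module."""
    return [(bank, loc) for bank, loc, size in slots
            if size == "No Module Installed"]


if __name__ == "__main__":
    slots = parse_slots(SAMPLE)
    for bank, loc, size in slots:
        print(f"{bank} / {loc}: {size}")
    print("Empty:", empty_slots(slots))
```

On a real machine you'd feed it the output of sudo dmidecode -t memory (dmidecode needs root) instead of SAMPLE. A whole channel showing up as empty with four sticks installed is the same symptom described above.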
Shut down the PC, reseated the second channel, rebooted, saw 64GB. One hour later, kernel panic. Rebooted into memtest86+: memory errors. What? Removed one module: errors. Removed two modules: no errors. Swapped the modules: no errors. What??
Put the two modules that were passing the test in another computer: errors. Put them back in the original computer: pass. AAAAAAAAAAAAAAA
Now I've downclocked from 3200 to 2400 and everything seems to be working fine.
What could it be? Have I been cursed?
After a few reinsertions, could the slots degrade to the point that they can't sustain 3200 anymore?
Inspect the channels for debris. Hit the RAM contacts and slots with contact cleaner (don't get any on your skin).
You put new thermal paste on things? Did you remove the CPU as well? You could have damaged some pins there too.
The delay before the failure sounds like it could be the components expanding with heat.
Take it apart and look at all the pins on the RAM, the RAM slots, and the CPU (if you removed it) for any damage.
… dust off the PC …
It’s not at all out of the question that some filth got into your connector(s). Hit them with a mess of canned air and try again?
I don’t think that you will see a difference in performance. :)
SO-DIMM and DIMM sockets have somewhat limited durability: just 25 mating cycles (link).
I never reached that limit, and I'm not sure it's related to your case.
Wow, I had no idea. Thanks for the link
Wow, I didn't imagine the connector was so fragile.
I wonder what that 25 number actually means. It’s 25 across multiple slot types so I’m guessing it’s less a measured value and more a quality control number based on their most fragile product.
Probably something like: a sample is cycled 25 times, and if less than X% still test as in spec, they know something is wrong with the current batch. But again, that's mostly a guess, and the actual durability end users experience would vary significantly depending on what the acceptable failure rate is.
I think so too. Most sockets will likely survive more than 25 cycles. Maybe it's a specified minimum durability that's guaranteed for nearly all sockets.
Placed the two modules that are passing the test in another computer, error
So you put the RAM you thought was good in another motherboard and it failed memtest? I'd interpret that to mean one of three things:
A) the problem is in one of those modules you switched
B) separate problems occurred on both motherboards either due to unrelated issues or the memory being seated incorrectly (this is really unlucky)
C) there’s a problem with the modules you switched and an unrelated problem either in the other modules or in your primary motherboard (you poor bastard)
Did you take note of where in memory memtest was finding errors? If it wasn't in the same general area between runs, then it's more likely to be a motherboard issue.
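A quick way to compare runs is to bucket the failing addresses memtest86+ shows and see whether two runs hit the same regions. Rough Python sketch below; the addresses are made-up placeholders and the 1 GiB bucket size is an arbitrary assumption. With dual-channel interleaving, address ranges don't map neatly onto individual sticks, so this only tells you whether errors repeat in the same place, not which module is at fault.

```python
GIB = 1 << 30  # arbitrary bucket size (assumption), 1 GiB


def buckets(addresses, bucket_size=GIB):
    """Set of bucket indices (address // bucket_size) that saw errors."""
    return {addr // bucket_size for addr in addresses}


def same_area(run_a, run_b, bucket_size=GIB):
    """Buckets where both runs reported errors."""
    return buckets(run_a, bucket_size) & buckets(run_b, bucket_size)


# Hypothetical failing addresses from three memtest runs.
run_a = [0x1_2345_6780, 0x1_3000_0000]
run_b = [0x1_2FFF_0000, 0x7_8000_0000]
run_c = [0x0_4000_0000, 0xC_1000_0000]

print(same_area(run_a, run_b))  # overlapping region: errors repeat there
print(same_area(run_a, run_c))  # no overlap: errors wander around
```

Errors that keep landing in the same region point more toward a specific stick; errors scattered differently on every run fit the motherboard theory better.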
RAM is easily damaged by static discharge. Were you wearing a ground strap, and did you take care not to let the memory modules touch any ungrounded surfaces while you were handling them?
Static damage can often appear as marginal or intermittent failures, probably more often than complete failure.
No, I manhandled them and put them on a random shelf. I was under the impression modern electronics are designed to withstand that kind of light abuse; I saw an ElectroBOOM video where he tries and fails to fry RAM with electrostatic discharge.
Newer components are, if anything, more vulnerable to ESD because of their more delicate construction.
I've encountered oxidisation of the contacts before. You can try rubbing them with an ordinary eraser.
Maybe the contacts were damaged on reinsertion? Not just degraded or worn down, but physically damaged.