Sunday, June 24, 2012

Over the last couple of days my primary workstation (Win 7 Pro x64) has been experiencing repeated crashes (BSoDs). After well over two dozen BSoDs I was finally able to pinpoint and resolve the cause. (Spoiler: my Crucial M4 SSD needed a firmware update.)

It all started Friday forenoon. I was typing away trying to architect an elegant solution to a sticky problem on one of our client sites when my computer up and crashed (BSoD, KERNEL_DATA_INPAGE_ERROR, STOP: 0x0000007A). Annoying, but it's not as if I don't expect Windows to crash periodically. I rebooted and got back to work. An hour later it crashed again, in the same way.

Two crashes in as many hours is indicative of something, but what? I reseated all the hardware and took a look at Task Manager and the event logs to see if there were any unusual goings-on. Nothing. An hour later the computer crashed again.

"Strange", I thought, that the crashes would be so regular, so I started keeping a crash log. I jotted down the time of each reboot and the time of each crash. A couple of hours (and a couple of crashes) later I noticed a very distinct pattern: after 60 minutes (give or take a couple of minutes) the interface would freeze, and a blue screen followed shortly after. After each reboot I would try flipping random "switches", note them in my crash log, and wait to see if things got better.
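For anyone who wants to keep the same kind of crash log, here is a minimal Python sketch of the idea. The timestamps are made up for illustration (they are not my actual log), and the function names and the 3-minute tolerance are my own assumptions. It computes how long the machine stayed up after each reboot and checks whether those uptimes cluster around one value.

```python
from datetime import datetime
from statistics import mean

# Hypothetical crash-log entries: (time of reboot, time of crash).
# These values are invented for illustration only.
CRASH_LOG = [
    ("2012-06-22 10:05", "2012-06-22 11:06"),
    ("2012-06-22 11:10", "2012-06-22 12:09"),
    ("2012-06-22 12:15", "2012-06-22 13:16"),
]

FMT = "%Y-%m-%d %H:%M"


def uptimes_minutes(log):
    """Return the minutes between each reboot and the crash that followed it."""
    result = []
    for boot, crash in log:
        delta = datetime.strptime(crash, FMT) - datetime.strptime(boot, FMT)
        result.append(delta.total_seconds() / 60)
    return result


def looks_scheduled(minutes, tolerance=3.0):
    """True if every uptime is within `tolerance` minutes of the average.

    The tolerance is an arbitrary choice, not a measured value.
    """
    avg = mean(minutes)
    return all(abs(m - avg) <= tolerance for m in minutes)


minutes = uptimes_minutes(CRASH_LOG)
print("uptimes (min):", minutes)
print("average (min):", round(mean(minutes), 1))
print("looks scheduled:", looks_scheduled(minutes))
```

If the uptimes all land within a few minutes of each other, the crashes are probably driven by a timer or counter rather than by whatever you happened to be doing at the time.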

I spent all of Friday and most of Saturday trying to work in discrete one-hour blocks of time. The computer would crash, I would flip more switches, wait an hour, and repeat. I tried myriad different tactics (crash log analysis, driver updates, disabling Windows features, hardware removal) but nothing worked. Finally, I just Googled "bsod 60 minutes" and, right there at the top, the very first result contained the solution to my problem. Some guy with the alias "JoePeeDee" wrote:

"I was following some links on another Win7 forum and happened upon mention of SSD drives causing a BSOD.  The one in particular mentioned was a Crucial M4 SSD. It sounded familiar . . . I have one as my C: drive.  Turns out I needed a firmware update. The update corrects a condition where an incorrect response to a SMART power-on counter will cause the m4 drive to become unresponsive after 5184 hours of Power-on time. The drive will recover after a power cycle, however, this failure will repeat once per hour after reaching this point. 5184 hours is about 7.2 months. Just a bit under how long I've had this system.

I applied the update . . . . . computer has been running seven hours now."

This described my problem perfectly: same hardware, same crash schedule! I hunted down the Crucial firmware update, installed it, and waited. At 58 minutes of uptime I was tentatively optimistic, at 65 minutes I was excited, and at 170 minutes I was confident I had solved the problem and could, finally, get back to work.
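The arithmetic in the quoted post is easy to check. The sketch below uses only the 5184-hour figure from the quote. The 30-day month and the helper names are my own assumptions, and the power-on value passed in at the end is a made-up example, not a reading from my drive.

```python
# Threshold quoted in the forum post: the drive starts hanging after
# this many hours of power-on time.
THRESHOLD_HOURS = 5184


def hours_to_days(hours):
    """Convert power-on hours to days of continuous uptime."""
    return hours / 24


def hours_to_months(hours, days_per_month=30):
    """Convert power-on hours to months, assuming 30-day months."""
    return hours_to_days(hours) / days_per_month


def hours_until_threshold(power_on_hours, threshold=THRESHOLD_HOURS):
    """Hours left before the threshold is reached (0 if already past it)."""
    return max(0, threshold - power_on_hours)


print(hours_to_days(THRESHOLD_HOURS))               # days of power-on time
print(round(hours_to_months(THRESHOLD_HOURS), 1))   # months, 30-day basis
print(hours_until_threshold(5000))                  # hypothetical reading
```

That works out to 216 days of power-on time, which matches the "about 7.2 months" in the quote when a month is counted as 30 days.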

