Linux: Adding ECC Memory Error Reporting (EDAC)

Came across a hint today about reporting on ECC memory errors. For those who do not know, ECC memory detects memory errors and corrects correctable errors. Normal memory (as found in almost all laptops and desktops) simply ignores the errors and lets them accumulate and cause problems either with data corruption or by causing software errors.

As I happen to have ECC memory in my desktop machine I thought I would have a look into the hint. Turns out that Linux does not report on ECC events automatically; you need to install the relevant EDAC (Error Detection and Correction) tools. Which for Debian, turns out to be pretty simple :-

# apt-get install edac-utils

As part of the installation process, a daemon process is started. But for whatever reason, it didn’t automatically detect what driver to load. So I edited /etc/default/edac and added :-

EDAC_DRIVER=amd64_edac_mod

Once that is done, a simple /etc/init.d/edac restart loads the driver and starts monitoring. Messages should appear in your log files (/var/log/messages) and reports can be displayed with edac-util :-

# edac-util --report=full 
mc0:csrow0:mc#0csrow#0channel#0:CE:0
mc0:csrow0:mc#0csrow#0channel#1:CE:0
mc0:csrow1:mc#0csrow#1channel#0:CE:0
mc0:csrow1:mc#0csrow#1channel#1:CE:0
mc0:csrow2:mc#0csrow#2channel#0:CE:0
mc0:csrow2:mc#0csrow#2channel#1:CE:0
mc0:csrow3:mc#0csrow#3channel#0:CE:0
mc0:csrow3:mc#0csrow#3channel#1:CE:0
mc0:noinfo:all:UE:0
mc0:noinfo:all:CE:0

Of course memory errors are relatively rare (or at least should be) so it may take months before any error is reported.

Comments

One response to “Linux: Adding ECC Memory Error Reporting (EDAC)”

  1. […] is a continuation of an earlier post regarding ECC memory under Linux, and is how I added a little widget to display the current ECC […]

Leave a Reply

Your email address will not be published. Required fields are marked *