{"id":3391,"date":"2014-06-26T18:48:16","date_gmt":"2014-06-26T18:48:16","guid":{"rendered":"http:\/\/really.zonky.org\/?p=3391"},"modified":"2018-03-31T21:27:27","modified_gmt":"2018-03-31T21:27:27","slug":"linux-adding-ecc-memory-error-reporting-edac","status":"publish","type":"post","link":"https:\/\/really.zonky.org\/?p=3391","title":{"rendered":"Linux: Adding ECC Memory Error Reporting (EDAC)"},"content":{"rendered":"<p>I came across a hint today about reporting on ECC memory errors. For those who do not know, ECC memory <em>detects<\/em> memory errors and corrects those that are correctable. Normal memory (as found in almost all laptops and desktops) has no way of detecting these errors, so they pass unnoticed and cause problems, either by corrupting data or by causing software errors.<\/p>\n<p>As I happen to have ECC memory in my desktop machine, I thought I would have a look into the hint. It turns out that Linux does not report on ECC events automatically; you need to install the relevant <a href=\"http:\/\/buttersideup.com\/edacwiki\/Main_Page\">EDAC<\/a> (Error Detection and Correction) tools. On Debian, that turns out to be pretty simple :-<\/p>\n<pre># apt-get install edac-utils\n<\/pre>\n<p>As part of the installation process, a daemon process is started, but for whatever reason it didn&#8217;t automatically detect which driver to load. So I edited <em>\/etc\/default\/edac<\/em> and added :-<\/p>\n<pre>EDAC_DRIVER=amd64_edac_mod\n<\/pre>\n<p>Once that is done, a simple <em>\/etc\/init.d\/edac restart<\/em> loads the driver and starts monitoring. 
Messages should appear in your log files (<em>\/var\/log\/messages<\/em>) and reports can be displayed with <em>edac-util<\/em> :-<\/p>\n<pre># edac-util --report=full\nmc0:csrow0:mc#0csrow#0channel#0:CE:0\nmc0:csrow0:mc#0csrow#0channel#1:CE:0\nmc0:csrow1:mc#0csrow#1channel#0:CE:0\nmc0:csrow1:mc#0csrow#1channel#1:CE:0\nmc0:csrow2:mc#0csrow#2channel#0:CE:0\nmc0:csrow2:mc#0csrow#2channel#1:CE:0\nmc0:csrow3:mc#0csrow#3channel#0:CE:0\nmc0:csrow3:mc#0csrow#3channel#1:CE:0\nmc0:noinfo:all:UE:0\nmc0:noinfo:all:CE:0\n<\/pre>\n<p>Of course, memory errors are relatively rare (or at least should be), so it may take months before any error is reported.<\/p>\n<p><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3904\" src=\"https:\/\/i0.wp.com\/really.zonky.org\/wp-content\/uploads\/damascus-unix-prompt1.png?resize=695%2C463&#038;ssl=1\" alt=\"\" width=\"695\" height=\"463\" srcset=\"https:\/\/i0.wp.com\/really.zonky.org\/wp-content\/uploads\/damascus-unix-prompt1.png?w=792&amp;ssl=1 792w, https:\/\/i0.wp.com\/really.zonky.org\/wp-content\/uploads\/damascus-unix-prompt1.png?resize=300%2C200&amp;ssl=1 300w\" sizes=\"auto, (max-width: 695px) 100vw, 695px\" \/><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I came across a hint today about reporting on ECC memory errors. For those who do not know, ECC memory detects memory errors and corrects those that are correctable. Normal memory (as found in almost all laptops and desktops) has no way of detecting these errors, so they pass unnoticed and cause problems, either by corrupting data or by causing software errors. 
<a href='https:\/\/really.zonky.org\/?p=3391' class='excerpt-more'>[&#8230;]<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"_share_on_mastodon":"0"},"categories":[4,209],"tags":[1167,1215,43,1216],"class_list":["post-3391","post","type-post","status-publish","format-standard","hentry","category-it","category-linux-it","tag-ecc","tag-edac","tag-linux","tag-memory","category-4-id","category-209-id","post-seq-1","post-parity-odd","meta-position-corners","fix"],"share_on_mastodon":{"url":"","error":""},"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p1f2KI-SH","_links":{"self":[{"href":"https:\/\/really.zonky.org\/index.php?rest_route=\/wp\/v2\/posts\/3391","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/really.zonky.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/really.zonky.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/really.zonky.org\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/really.zonky.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3391"}],"version-history":[{"count":4,"href":"https:\/\/really.zonky.org\/index.php?rest_route=\/wp\/v2\/posts\/3391\/revisions"}],"predecessor-version":[{"id":5226,"href":"ht
tps:\/\/really.zonky.org\/index.php?rest_route=\/wp\/v2\/posts\/3391\/revisions\/5226"}],"wp:attachment":[{"href":"https:\/\/really.zonky.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3391"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/really.zonky.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3391"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/really.zonky.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3391"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}