www.anandtech.com/guides/viewfaq.html?i=3
ECC stands for either Error Checking and Correcting or Error Correcting Code, depending on who you talk to. As its name implies, ECC is a form of code that can check and correct errors that occur in memory. It is a more sophisticated form of checking than another common form of error checking known as parity memory. Like parity memory, ECC works by adding extra error checking bits to a stream of bits. There are many different types of ECC codes, but the one most commonly used in computer memory is the Hamming code, named after its discoverer, Richard Hamming. In ECC SDRAM, the data word is 64 bits (8 bytes) long and 8 extra error checking bits are added (72 bits in total), which allows the detection of two incorrect bits and the detection and correction of one incorrect bit. In other words, if there is one mistake in 8 bytes, then ECC can detect the error and fix it, and if there are two errors in 8 bytes, then ECC can detect the error and report it.
There are two typical types of memory errors: hard errors and soft errors. Hard errors are typically caused by physical defects within the memory chip. These usually come from manufacturing problems but can occasionally crop up as a result of reliability issues. Soft errors are caused by radiation. In modern systems the most common source of soft error radiation is cosmic rays, although occasionally soft errors can occur due to low levels of radiation in the components that make up the computer (for example, the solder). Hard errors, once they manifest themselves, will continue to repeat, and they are a sign of a defective DIMM. Soft errors are rare events and should cause no permanent damage to memory.
ECC SDRAM generally carries a price premium of 10-20% over non-ECC memory. In these days of cheap memory, this difference is fairly negligible, but it is worth noting.
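
To make the "correct one bit, detect two bits" behavior described above more concrete, here is a minimal sketch in Python. It assumes an extended Hamming code, one common way to get single-error correction with double-error detection, and it works on a toy 8-bit data word rather than the 64-bit word of a real DIMM. The function names `encode` and `decode` are made up for this illustration, and this is not a description of how any particular memory controller implements ECC.

```python
# Toy single-error-correct, double-error-detect code on 8 data bits.
# Layout (an assumption for this sketch): index 0 holds an overall parity
# bit, indexes 1..12 hold a Hamming code whose check bits sit at the
# power-of-two positions 1, 2, 4 and 8.

CODE_LEN = 13
CHECK_POSITIONS = (1, 2, 4, 8)
DATA_POSITIONS = [p for p in range(1, CODE_LEN) if (p & (p - 1)) != 0]


def encode(data_bits):
    """Turn a list of 8 data bits into a 13-bit codeword."""
    code = [0] * CODE_LEN
    for pos, bit in zip(DATA_POSITIONS, data_bits):
        code[pos] = bit
    for p in CHECK_POSITIONS:
        # Check bit p gives even parity over every position with bit p set.
        parity = 0
        for pos in range(1, CODE_LEN):
            if (pos & p) and pos != p:
                parity ^= code[pos]
        code[p] = parity
    # The overall parity bit makes the whole codeword have even parity.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, data_bits); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    # XOR of the positions of all set bits is 0 for a valid codeword and
    # equals the position of the flipped bit after a single error.
    syndrome = 0
    for pos in range(1, CODE_LEN):
        if code[pos]:
            syndrome ^= pos
    odd_parity = sum(code) % 2 == 1
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity and syndrome < CODE_LEN:
        # Odd number of flips: assume one, and repair it
        # (syndrome 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity with a non-zero syndrome: two bits flipped.
        status = "uncorrectable"
    return status, [code[p] for p in DATA_POSITIONS]


word = [1, 0, 1, 1, 0, 0, 1, 0]
stored = encode(word)

one_error = list(stored)
one_error[6] ^= 1
print(decode(one_error))   # the single flipped bit is repaired

two_errors = list(stored)
two_errors[3] ^= 1
two_errors[10] ^= 1
print(decode(two_errors)[0])   # the double error is reported, not repaired
```

The toy needs 5 check bits for 8 data bits. The overhead shrinks as the word grows, which is why the same scheme needs only 8 check bits for a 64-bit word.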
The ECC calculation process results in a small but measurable decrease in performance. The redundant ECC bits are calculated when data is written back to memory, so a read-modify-write operation will incur a minor performance penalty. This reduction in performance is usually approximately 3-4% on PC133 CAS2 ECC SDRAM.
Modern semiconductor memories have a low defect rate. These defects are almost always caught by the manufacturer, and memory tends to be very reliable over the life of the chip (typically 7+ years). For the frequency of soft errors, however, different manufacturers quote different numbers.
Statistically, soft error rates scale linearly with memory size, so 256MB is twice as likely to see a problem as 128MB. Soft errors are also affected by altitude: it is widely agreed that a soft error event in SDRAM is 10x more likely at one mile above sea level and 100x more likely in an airplane at 60,000 ft.
The performance impact is insignificant, since 3% is barely within the margin of error on benchmarks. The price impact in these days of cheap memory is insignificant as well. The frequency of soft errors in SDRAM is high enough at high altitudes (Denver, CO, for example) to be of concern. Paying 10% more for memory to avoid potential data corruption seems like a low price to pay. In a way, ECC is like insurance against memory errors: if you care about data integrity and system reliability, then it's worth the premium.
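
As a rough illustration of how the scaling rules above combine, the sketch below multiplies the linear memory-size factor by the altitude factors quoted in the text. The baseline (128MB at sea level counts as 1.0) and the names `ALTITUDE_MULTIPLIER` and `relative_soft_error_rate` are assumptions made for this example. The result is a relative likelihood only, not an absolute error rate.

```python
# Altitude multipliers as quoted in the text above.
ALTITUDE_MULTIPLIER = {
    "sea level": 1,
    "one mile": 10,
    "60,000 ft": 100,
}


def relative_soft_error_rate(memory_mb, location, baseline_mb=128):
    """Soft error likelihood relative to baseline_mb of memory at sea level."""
    size_factor = memory_mb / baseline_mb   # rates scale linearly with size
    return size_factor * ALTITUDE_MULTIPLIER[location]


# 256MB in a machine one mile above sea level, compared with 128MB at sea level
print(relative_soft_error_rate(256, "one mile"))   # 20.0
```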