ECC Memory Technical Analysis

ECC memory, short for Error-Correcting Code memory, can detect and correct errors in stored data. It is commonly used in servers, workstations, and high-end desktop computers to improve system stability and data integrity.

Like any electronic component, memory can produce errors during operation, and for users with high stability requirements those errors can cause critical problems. Memory errors fall into two types: hard errors and soft errors. Hard errors are caused by hardware damage or defects. The affected location returns incorrect data consistently, and the underlying fault cannot be repaired by rewriting the data. Soft errors, by contrast, occur randomly due to factors such as electrical interference near the memory, and the affected data can be corrected.

To detect soft memory errors, the concept of memory “parity check” was introduced. The smallest unit in memory is a bit, which holds either 1 or 0, and eight consecutive bits make up a byte. Memory without parity check stores only those 8 bits per byte, so if any bit holds an incorrect value, the result can be erroneous data and application failures. Parity check adds a ninth bit to each byte as an error-checking bit. Its value is chosen from the eight data bits. For example, if the bits store 1, 1, 1, 0, 0, 1, 0, 1, the number of 1s is odd (1+1+1+0+0+1+0+1=5). With even parity, the parity bit is set so that the total number of 1s across all nine bits is even, so here the parity bit is 1. If the data bits already contained an even number of 1s, the parity bit would be 0. When the data is read back, the eight data bits are summed again and the result is checked against the stored parity bit. A mismatch reveals a memory error, but parity check cannot tell which bit is wrong and therefore cannot correct it. Parity check also cannot detect double-bit errors, because two flipped bits leave the parity unchanged, although the probability of a double-bit error is low.
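
The parity example above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how memory hardware is implemented, and the function names even_parity_bit and passes_even_parity are made up for this sketch.

```python
def even_parity_bit(bits):
    """Return the parity bit that makes the total number of 1s even."""
    return sum(bits) % 2


def passes_even_parity(bits, parity_bit):
    """Return True when the data bits plus the parity bit hold an even number of 1s."""
    return (sum(bits) + parity_bit) % 2 == 0


data = [1, 1, 1, 0, 0, 1, 0, 1]      # the byte from the example: five 1s
parity = even_parity_bit(data)       # 1, so that all nine bits hold six 1s

# A single flipped bit changes the parity and is detected.
single_error = list(data)
single_error[3] ^= 1

# Two flipped bits cancel each other out and go unnoticed.
double_error = list(data)
double_error[0] ^= 1
double_error[1] ^= 1

print(parity)                                    # 1
print(passes_even_parity(data, parity))          # True
print(passes_even_parity(single_error, parity))  # False
print(passes_even_parity(double_error, parity))  # True
```

The last line shows the weakness described above: the double-bit error passes the check even though the data is wrong.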

ECC (Error Checking and Correcting) memory, on the other hand, stores a check code alongside the data bits. The code is computed from the data; it is an encoding, not encryption. When data is written into memory, the corresponding ECC code is saved with it. When the data is read back, a new ECC code is generated from the data and compared with the saved one. If they do not match, the difference between the two codes is decoded to identify the incorrect bit. That bit is then inverted to restore its original value, and the memory controller releases the correct data. In many designs the corrected data is not written back into memory, because re-writing adds overhead that can reduce performance, so if the same erroneous data is read again, the correction is repeated. ECC memory is more expensive than regular memory because it needs extra storage for the check bits and the logic to process them, but its error correction capability is crucial for servers and similar applications.
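
To make the write, read, compare, and correct steps concrete, here is a minimal Python sketch. The article does not say which code ECC modules use, so the sketch assumes a textbook Hamming code that protects 8 data bits with 4 check bits. Real modules commonly protect wider words and add a further check bit so that double-bit errors can be detected as well. All function names are hypothetical.

```python
CODE_LEN = 12  # 8 data bits + 4 check bits (assumed layout, for illustration only)


def is_check_position(pos):
    """Positions 1, 2, 4, 8 (powers of two) hold check bits."""
    return pos & (pos - 1) == 0


def hamming_encode(data_bits):
    """Build the 12-bit codeword that would be stored on a write."""
    code = [0] * (CODE_LEN + 1)          # index 0 unused, positions are 1-based
    remaining = iter(data_bits)
    for pos in range(1, CODE_LEN + 1):
        if not is_check_position(pos):
            code[pos] = next(remaining)
    for p in (1, 2, 4, 8):
        # Check bit p covers every other position whose index has bit p set.
        for pos in range(1, CODE_LEN + 1):
            if pos & p and pos != p:
                code[p] ^= code[pos]
    return code[1:]


def syndrome(codeword):
    """XOR the positions of all 1 bits; 0 means the stored and new codes match."""
    s = 0
    for pos, bit in enumerate(codeword, start=1):
        if bit:
            s ^= pos
    return s


def hamming_decode(codeword):
    """Return (data_bits, syndrome), inverting the bit the syndrome points at."""
    word = list(codeword)
    s = syndrome(word)
    if 1 <= s <= CODE_LEN:
        word[s - 1] ^= 1                 # single-bit error: flip it back
    data = [word[pos - 1] for pos in range(1, CODE_LEN + 1)
            if not is_check_position(pos)]
    return data, s


data = [1, 1, 1, 0, 0, 1, 0, 1]
stored = hamming_encode(data)            # what a write would save

faulty = list(stored)
faulty[5] ^= 1                           # soft error at position 6

recovered, s = hamming_decode(faulty)
print(recovered == data, s)              # True 6
```

Note that hamming_decode returns the corrected data without touching faulty, which mirrors the behavior described above: the stored word stays wrong, and the same correction is repeated on the next read.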

Using ECC memory carries some performance cost, because check codes must be generated on every write and verified on every read. The cost is usually small, and for critical applications and servers it is outweighed by the benefit of error correction. As a result, ECC memory is a common choice in environments where data integrity and system stability are paramount.


Post time: Jul-19-2023