Parallel Error Detection Using Heterogeneous Cores
27 June 2018
Soft, or transient, errors are faults that occur seemingly at random, causing bits to flip within an integrated circuit. They are a particular concern in memory cells, and I clearly remember reading a blog post by James Hamilton several years ago in which he talked about the need for ECC on DRAM in servers and discussed some (then) recent academic work on the subject. ECC is a great way to protect memory: it can detect multiple errors and correct some of them, with little performance cost and low power and area overheads. However, beyond the memory hierarchy, techniques for error detection and recovery are little used, because protecting logic cheaply is difficult.
One area where error detection...