Citadel: Efficiently Protecting Stacked Memory from Large Granularity Failures
Citations Over Time: Top 10% of 2014 papers
Abstract
Stacked memory modules are likely to be tightly integrated with the processor. It is vital that these memory modules operate reliably, as a memory failure can require replacing the entire socket. To make matters worse, stacked memory designs are susceptible to newer failure modes (for example, faulty through-silicon vias, or TSVs) that can cause large portions of memory, such as a bank, to become faulty. To avoid data loss from large-granularity failures, the memory system may use symbol-based codes that stripe the data for a cache line across several banks (or channels). Unfortunately, such data striping reduces memory-level parallelism, causing significant slowdown and higher power consumption. This paper proposes Citadel, a robust memory architecture that allows the memory system to retain each cache line within one bank, thus delivering high performance and lower power while efficiently protecting the stacked memory from large-granularity failures. Citadel consists of three components: TSV-Swap, which can tolerate both faulty data-TSVs and faulty address-TSVs; Tri Dimensional Parity (3DP), which can tolerate column failures, row failures, and bank failures; and Dynamic Dual Granularity Sparing (DDS), which can mitigate permanent faults by dynamically sparing faulty memory regions at either a row granularity or a bank granularity. Our evaluations with real-world data for DRAM failures show that Citadel provides performance and power similar to maintaining the entire cache line in the same bank, and yet provides 700x higher reliability than ChipKill-like ECC codes.
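To illustrate the general idea behind parity-based recovery from a bank failure, the following is a minimal sketch, not the paper's 3DP design. It assumes a single parity dimension computed by XOR across banks, with each bank modeled as a small byte string; the names `bank_parity` and `rebuild_bank` are hypothetical. Each cache line stays within its own bank, and only the parity spans banks.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def bank_parity(banks: list[bytes]) -> bytes:
    """XOR all banks together to form one parity bank (hypothetical helper)."""
    return reduce(xor_bytes, banks)


def rebuild_bank(banks: list[bytes], parity: bytes, failed: int) -> bytes:
    """Recover the contents of one failed bank from the survivors and the parity."""
    survivors = [b for i, b in enumerate(banks) if i != failed]
    return reduce(xor_bytes, survivors, parity)


# Four toy banks of 8 bytes each; real banks are far larger.
banks = [bytes(i * 16 + j for j in range(8)) for i in range(4)]
parity = bank_parity(banks)

# Assume bank 2 fails entirely; its data is rebuilt without touching it.
recovered = rebuild_bank(banks, parity, failed=2)
```

A single XOR dimension like this can recover only one failed unit at a time; the abstract's claim of tolerating column, row, and bank failures relies on parity in three dimensions, which this sketch does not model.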