A Load-Balanced Collaborative Repair Algorithm for Single-Disk Failures in Erasure Coded Storage Systems
Abstract
In large-scale cloud data centers and distributed storage systems, erasure coding is widely used to improve data availability and storage efficiency. However, as data volumes grow and storage systems scale out, traditional erasure coding techniques struggle to handle single-disk failures: data recovery is slow and the repair load is unevenly distributed across disks, which leads to excessive I/O load and network bandwidth consumption and severely limits overall system performance. To address these issues, this article proposes MNCR (Multi-Node Cooperative Repair), a load-balanced data repair algorithm for single-disk failures in erasure-coded storage systems. MNCR uses a cooperative repair strategy among disks to minimize the amount of data read and the amount of data transferred between disks, which improves recovery efficiency in single-disk failure scenarios. In addition, the algorithm incorporates a dynamic load-balancing mechanism that evens out the data load across disks during repair and thus avoids performance bottlenecks caused by overloaded disks. Experimental results show that MNCR significantly outperforms traditional methods in both repair efficiency and load balancing, providing an effective solution for single-disk failure recovery in large-scale storage systems based on erasure coding.
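The abstract does not describe MNCR's internals, so the following is only a generic illustration of the load-imbalance problem it targets, not the MNCR algorithm itself. It assumes an (n, k) MDS code in which any k surviving chunks of a stripe can rebuild the lost chunk, and it assumes every disk holds one chunk of every stripe. The function names `select_helpers` and `naive_helpers` are hypothetical.

```python
def naive_helpers(n, k, failed, num_stripes):
    """Baseline: always read from the first k surviving disks."""
    survivors = [d for d in range(n) if d != failed]
    load = {d: 0 for d in survivors}
    plan = []
    for _ in range(num_stripes):
        helpers = survivors[:k]
        for d in helpers:
            load[d] += 1
        plan.append(helpers)
    return plan, load


def select_helpers(n, k, failed, num_stripes):
    """Greedy balancing: per stripe, read from the k least-loaded survivors.

    Assumption (not from the paper): any k surviving chunks suffice,
    as in an MDS code, and each disk stores one chunk per stripe.
    Returns (plan, load): per-stripe helper lists and per-disk read counts.
    """
    survivors = [d for d in range(n) if d != failed]
    load = {d: 0 for d in survivors}
    plan = []
    for _ in range(num_stripes):
        # Sort by current read count, breaking ties by disk id.
        helpers = sorted(survivors, key=lambda d: (load[d], d))[:k]
        for d in helpers:
            load[d] += 1
        plan.append(helpers)
    return plan, load


if __name__ == "__main__":
    _, naive_load = naive_helpers(n=6, k=4, failed=0, num_stripes=10)
    _, balanced_load = select_helpers(n=6, k=4, failed=0, num_stripes=10)
    print("naive   :", naive_load)
    print("balanced:", balanced_load)
```

In this toy setting the baseline leaves one surviving disk idle while the others serve every stripe, whereas the greedy choice keeps per-disk read counts within one of each other. A real system would also have to account for chunk placement, network transfer, and partial-result aggregation among helpers, none of which is modeled here.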