Accelerating erasure coding by exploiting multiple repair paths in distributed storage systems

Cluster Computing

Abstract

Distributed storage systems (DSSs) must be highly reliable to keep warehouse-scale computing and high-performance computing (HPC) systems stable. For system-level reliability, erasure coding (EC) can be combined with a repair operation that uses redundant storage nodes, and this repair can also affect system performance. Existing EC designs have mainly focused on minimizing the bandwidth required for repair and the storage overhead. However, the computing performance of EC must also be considered if the repair is to reach a bandwidth high enough to exploit the capacity of back-end network links built on heterogeneous, high-speed interconnects faster than 10 Gbps Ethernet. This study proposes a new computing acceleration method for the EC repair operation that uses multiple repair paths and a modified computation kernel on the graphics processing unit (GPU). For Cauchy Reed–Solomon (CRS) codes, the proposed scheme is observed to achieve a repair bandwidth that is sufficient relative to the theoretical bound or that exceeds the current maximum Ethernet link bandwidth.
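To make the idea of multiple repair paths concrete, the following is a minimal CPU-only sketch and not the scheme proposed in the article, whose details are not given in the abstract. It assumes a systematic Cauchy Reed–Solomon code over GF(2^8) with the primitive polynomial 0x11d and byte-wise field arithmetic rather than the XOR bit-matrix form or a GPU kernel. All function names (`generator_rows`, `encode`, `repair`, and so on) are hypothetical. The sketch shows only that any k surviving nodes can rebuild a lost block, so several helper sets are available for the same repair.

```python
from itertools import combinations

# Log/antilog tables for GF(2^8); the polynomial 0x11d is an assumed choice.
EXP = [0] * 510
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 510):
    EXP[i] = EXP[i - 255]


def gf_mul(a, b):
    """Multiply two elements of GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    """Multiplicative inverse of a nonzero element of GF(2^8)."""
    return EXP[255 - LOG[a]]


def generator_rows(k, m):
    """k identity rows (data nodes) followed by m Cauchy rows (parity nodes)."""
    rows = [[1 if r == c else 0 for c in range(k)] for r in range(k)]
    # Cauchy entry 1 / (x_i + y_j) with x_i = k + i and y_j = j (addition is XOR).
    rows += [[gf_inv((k + i) ^ j) for j in range(k)] for i in range(m)]
    return rows


def combine(coeffs, blocks):
    """Linear combination of equal-length byte blocks over GF(2^8)."""
    out = bytearray(len(blocks[0]))
    for c, blk in zip(coeffs, blocks):
        for p, byte in enumerate(blk):
            out[p] ^= gf_mul(c, byte)
    return bytes(out)


def encode(data, m):
    """Return the k + m stored blocks for k data blocks."""
    return [combine(row, data) for row in generator_rows(len(data), m)]


def solve(A, b):
    """Solve A x = b over GF(2^8) by Gauss-Jordan elimination (A invertible)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col])
        M[col], M[piv] = M[piv], M[col]
        inv = gf_inv(M[col][col])
        M[col] = [gf_mul(inv, v) for v in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [v ^ gf_mul(f, w) for v, w in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def repair(stored, helpers, target, k, m):
    """Rebuild the block of node `target` from the k nodes listed in `helpers`."""
    G = generator_rows(k, m)
    # Find weights w such that sum_h w_h * G[h] equals G[target].
    A = [[G[h][c] for h in helpers] for c in range(k)]
    w = solve(A, G[target])
    return combine(w, [stored[h] for h in helpers])


k, m = 4, 2
data = [bytes([17 * j + p for p in range(8)]) for j in range(k)]
stored = encode(data, m)

lost = 1
survivors = [n for n in range(k + m) if n != lost]
paths = list(combinations(survivors, k))  # every k-subset is a usable helper set
rebuilt = [repair(stored, path, lost, k, m) for path in paths]
```

In this toy setting, each tuple in `paths` is one candidate helper set, and every entry of `rebuilt` equals the lost block. How the article selects, schedules, or parallelizes such paths on the GPU is not stated in the abstract and is not modeled here.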

Data availability

No datasets were generated or analysed during the current study.

Acknowledgements

This work was supported by National Research Foundation of Korea (NRF) grants funded by the Korea government (MSIT) (Nos. 2021R1G1A1091369 and RS-2023-00281635). The research was also supported by the “Research Base Construction Fund Support Program” funded by Jeonbuk National University in 2023.

Funding

This study was supported by the National Research Foundation of Korea (Grant Nos. 2021R1G1A1091369 and RS-2023-00281635) and by Jeonbuk National University (Research Base Construction Fund Support Program).

Author information

Contributions

All authors wrote and reviewed the manuscript.

Corresponding author

Correspondence to Kang-Wook Chon.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kim, C., Chon, KW. Accelerating erasure coding by exploiting multiple repair paths in distributed storage systems. Cluster Comput (2024). https://doi.org/10.1007/s10586-024-04438-y

  • DOI: https://doi.org/10.1007/s10586-024-04438-y