Accelerating erasure coding by exploiting multiple repair paths in distributed storage systems

Kim, Chanki; Chon, Kang-Wook

doi:10.1007/s10586-024-04438-y

Published: 12 April 2024

Cluster Computing (2024)

Abstract

High reliability must be ensured in distributed storage systems (DSSs) to maintain the stability of warehouse-scale computing and high-performance computing (HPC) systems. For system-level reliability, erasure coding (EC) allows a failed node to be repaired from redundant storage nodes, but the repair operation can also affect system performance. Existing EC designs have mainly focused on minimizing the bandwidth required for repair and the storage overhead. However, the computing performance of EC should also be considered, so that repair throughput is high enough to exploit the back-end network link capacity of heterogeneous, high-speed interconnects over 10 Gbps Ethernet. In this study, a new computing acceleration method for the repair operation in EC is proposed, which uses multiple repair paths and modifies the computation kernel on the graphics processing unit (GPU). For Cauchy Reed–Solomon (CRS) codes, the proposed scheme is observed to achieve a repair bandwidth that is sufficient relative to the theoretical bound or that exceeds the current maximum Ethernet link bandwidth.
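To make the idea of more than one repair path concrete, the following is a minimal CPU-side sketch in Python of Cauchy Reed-Solomon encoding over GF(2^8) and of single-block repair. It is not the authors' GPU kernel or their scheduling method. The field polynomial (0x11d), the Cauchy matrix layout, and the function names `encode` and `repair_data_block` are assumptions chosen for illustration. The sketch only shows that, with m parity blocks, a lost data block can be rebuilt through any one of m different parity blocks, so several independent repair computations exist for the same block.

```python
# Illustrative sketch only: table-based GF(2^8) arithmetic, assumed polynomial 0x11d.
EXP = [0] * 510
LOG = [0] * 256
_v = 1
for _i in range(255):
    EXP[_i] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x100:
        _v ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]  # duplicate the table so gf_mul needs no modulo


def gf_mul(a, b):
    """Multiply two field elements using log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    """Multiplicative inverse of a nonzero field element."""
    return EXP[255 - LOG[a]]


def cauchy_matrix(k, m):
    """m x k Cauchy matrix with x_i = i and y_j = m + j (requires k + m <= 256)."""
    return [[gf_inv(i ^ (m + j)) for j in range(k)] for i in range(m)]


def encode(data, m):
    """Compute m parity blocks from k equally sized data blocks."""
    k, size = len(data), len(data[0])
    C = cauchy_matrix(k, m)
    parity = []
    for i in range(m):
        row = bytearray(size)
        for j, block in enumerate(data):
            for t, byte in enumerate(block):
                row[t] ^= gf_mul(C[i][j], byte)
        parity.append(bytes(row))
    return parity


def repair_data_block(lost, survivors, parity_index, parity_block, m):
    """Rebuild data block `lost` from the other k-1 data blocks and one parity block."""
    k = len(survivors) + 1
    C = cauchy_matrix(k, m)
    acc = bytearray(parity_block)
    for j, block in survivors.items():
        for t, byte in enumerate(block):
            acc[t] ^= gf_mul(C[parity_index][j], byte)
    inv = gf_inv(C[parity_index][lost])
    return bytes(gf_mul(inv, b) for b in acc)


# Usage: k = 4 data blocks, m = 2 parity blocks, data block 1 is lost.
data = [b"node0 blk", b"node1 blk", b"node2 blk", b"node3 blk"]
parity = encode(data, m=2)
survivors = {j: blk for j, blk in enumerate(data) if j != 1}
via_p0 = repair_data_block(1, survivors, 0, parity[0], m=2)  # repair path 1
via_p1 = repair_data_block(1, survivors, 1, parity[1], m=2)  # repair path 2
assert via_p0 == via_p1 == data[1]
```

Both calls recover the same block from different helper sets. Production CRS implementations usually expand the matrix into a bit matrix so that encoding and repair reduce to XOR operations; the table-based multiplication here is used only to keep the sketch short.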

Data availability

No datasets were generated or analysed during the current study.

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (Nos. 2021R1G1A1091369 and RS-2023-00281635). In addition, the research was supported by the “Research Base Construction Fund Support Program” funded by Jeonbuk National University in 2023.

Funding

This study was supported by the National Research Foundation of Korea (Grant Nos. 2021R1G1A1091369 and RS-2023-00281635) and by Jeonbuk National University (Research Base Construction Fund Support Program).

Author information

Authors and Affiliations

Department of Computer Science and Artificial Intelligence, Jeonbuk National University, 567, Baekje-daero, Deokjin-gu, Jeonju-si, Jeollabuk-do, 54896, Republic of Korea
Chanki Kim
School of Computer Science and Engineering, Korea University of Technology and Education, 1600 Chungjeolno, Byeongchunmyun, Cheonan-si, Chungcheongnam-do, 31253, Republic of Korea
Kang-Wook Chon

Contributions

All authors wrote and reviewed the manuscript.

Corresponding author

Correspondence to Kang-Wook Chon.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kim, C., Chon, KW. Accelerating erasure coding by exploiting multiple repair paths in distributed storage systems. Cluster Comput (2024). https://doi.org/10.1007/s10586-024-04438-y

Received: 28 November 2023
Revised: 14 February 2024
Accepted: 18 March 2024
Published: 12 April 2024
DOI: https://doi.org/10.1007/s10586-024-04438-y
