Abstract
Distributed storage systems provide large-scale data storage and high data reliability through redundancy schemes such as replication and erasure codes. Redundant data may be lost because node failures are frequent in such systems, and the lost data must be regenerated as soon as possible to maintain data availability and reliability. The direct way to reduce regeneration time is to reduce the network traffic that regeneration generates. By comparison, tree-structured regeneration achieves a shorter regeneration time by constructing a better tree-structured topology that increases transmission bandwidth. However, tree-structured regeneration leaves the bandwidth of the many edges outside the tree unused. In this paper, we use multiple edge-disjoint trees to regenerate the lost data in parallel and analyze the total regeneration time. We derive a formula for the optimal regeneration time and propose a polynomial-time approximation algorithm for constructing the optimal set of regeneration trees. Our experiments show that the regeneration time is reduced by 62% compared with the common tree-structured scheme, and that file availability reaches almost 99%.
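The idea of regenerating in parallel over several edge-disjoint trees can be illustrated with a minimal sketch. It is not the formula or the construction algorithm of the paper. It assumes a simplified model in which each tree transfers data at the rate of its slowest edge, and in which the lost data is split across the trees in proportion to those rates so that all trees finish together. The node names, bandwidths, and function names are hypothetical.

```python
# Simplified model (an assumption, not the paper's analysis):
# a tree's transfer rate is its bottleneck (minimum) edge bandwidth,
# and edge-disjoint trees transfer independently of one another.

def bottleneck(tree):
    """Return the smallest edge bandwidth in one tree."""
    return min(tree.values())

def are_edge_disjoint(trees):
    """Check that no undirected edge appears in more than one tree."""
    seen = set()
    for tree in trees:
        for u, v in tree:
            edge = frozenset((u, v))
            if edge in seen:
                return False
            seen.add(edge)
    return True

def parallel_regeneration_time(data_size, trees):
    """Time to move data_size when it is split across all trees."""
    if not are_edge_disjoint(trees):
        raise ValueError("trees must be edge-disjoint")
    return data_size / sum(bottleneck(t) for t in trees)

def split_data(data_size, trees):
    """Share of the data sent over each tree, proportional to its rate."""
    rates = [bottleneck(t) for t in trees]
    total = sum(rates)
    return [data_size * r / total for r in rates]

# Hypothetical example: newcomer "n" and providers "a", "b", "c".
# Keys are (child, parent) edges, values are bandwidths.
tree1 = {("a", "n"): 40, ("b", "a"): 20, ("c", "b"): 30}  # bottleneck 20
tree2 = {("b", "n"): 10, ("c", "n"): 25, ("a", "c"): 30}  # bottleneck 10

single = parallel_regeneration_time(60, [tree1])
double = parallel_regeneration_time(60, [tree1, tree2])
shares = split_data(60, [tree1, tree2])
print(single, double, shares)
```

Under these assumptions, adding the second tree raises the aggregate rate from 20 to 30, so the same amount of data is regenerated in two thirds of the time, even though the second tree is slower than the first.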
Acknowledgments
This research is supported by the National Basic Research Program of China under Grant No. 2014CB340303, the Program of the National Natural Science Foundation of China under Grants No. 61402514 and No. 61402490, and the Scientific Research Program of the Hunan Provincial Education Department (No. 12b012).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
You, P., Huang, Z., Wang, C., Hu, M., Peng, Y. (2015). Parallel Data Regeneration Based on Multiple Trees with Network Coding in Distributed Storage System. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9529. Springer, Cham. https://doi.org/10.1007/978-3-319-27122-4_42
DOI: https://doi.org/10.1007/978-3-319-27122-4_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27121-7
Online ISBN: 978-3-319-27122-4
eBook Packages: Computer Science, Computer Science (R0)