Parallel Data Regeneration Based on Multiple Trees with Network Coding in Distributed Storage System

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9529)

Abstract

Distributed storage systems provide large-scale data storage and high data reliability through redundancy schemes such as replication and erasure codes. Redundant data may be lost because of frequent node failures in the system, and the lost data must be regenerated as soon as possible to maintain data availability and reliability. The direct way to reduce regeneration time is to reduce the network traffic that regeneration generates. In comparison, tree-structured regeneration achieves a shorter regeneration time by constructing a better tree-structured topology that increases transmission bandwidth. However, tree-structured regeneration leaves the bandwidth of the many edges outside the tree unused. In this paper, we use multiple edge-disjoint trees to regenerate the lost data in parallel and analyze the total regeneration time. We derive a formula for the optimal regeneration time and propose an approximate construction algorithm with polynomial time complexity for the optimal multiple regeneration trees. Our experiments show that the regeneration time is reduced by 62% compared with the common tree-structured scheme, and that file availability reaches almost 99%.
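The idea of regenerating over several edge-disjoint trees can be illustrated with a minimal sketch. It is not the construction algorithm proposed in the paper: it uses a simple greedy heuristic (repeatedly extract a maximum-bottleneck spanning tree with Kruskal's algorithm, then remove its edges), and it assumes a simplified timing model in which each tree's rate is limited by its slowest edge and the data is split across trees in proportion to those rates. The function names, the example bandwidths, and the data size are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]  # undirected edge between two node ids


def max_bottleneck_tree(nodes: List[int],
                        bandwidth: Dict[Edge, float]) -> Optional[List[Edge]]:
    """Kruskal over edges sorted by descending bandwidth.

    A maximum spanning tree also maximizes the minimum edge bandwidth.
    Returns None when the remaining edges do not connect all nodes.
    """
    parent = {v: v for v in nodes}

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree: List[Edge] = []
    for (u, v), _bw in sorted(bandwidth.items(), key=lambda kv: -kv[1]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
    return tree if len(tree) == len(nodes) - 1 else None


def greedy_edge_disjoint_trees(nodes: List[int],
                               bandwidth: Dict[Edge, float]
                               ) -> List[Tuple[List[Edge], float]]:
    """Greedy heuristic (an assumption, not the paper's algorithm):
    take the best tree, delete its edges, and repeat while a tree exists."""
    if len(nodes) < 2:
        return []
    remaining = dict(bandwidth)
    trees: List[Tuple[List[Edge], float]] = []
    while True:
        tree = max_bottleneck_tree(nodes, remaining)
        if tree is None:
            return trees
        bottleneck = min(remaining[e] for e in tree)
        trees.append((tree, bottleneck))
        for e in tree:
            del remaining[e]  # keeps later trees edge-disjoint


def regeneration_time(data_size: float,
                      trees: List[Tuple[List[Edge], float]]) -> float:
    """Simplified model: trees transmit in parallel, so their bottleneck
    rates add up when data is split in proportion to those rates."""
    return data_size / sum(rate for _tree, rate in trees)


# Hypothetical network: node 0 is the newcomer, nodes 1-3 are providers.
demo_nodes = [0, 1, 2, 3]
demo_bandwidth = {(0, 1): 10.0, (0, 2): 8.0, (0, 3): 3.0,
                  (1, 2): 6.0, (1, 3): 9.0, (2, 3): 5.0}
demo_trees = greedy_edge_disjoint_trees(demo_nodes, demo_bandwidth)
single = regeneration_time(88.0, demo_trees[:1])
parallel = regeneration_time(88.0, demo_trees)
print(f"single tree: {single}, all trees in parallel: {parallel}")
```

In this toy network the second tree uses only edges the first tree left idle, which is the unused bandwidth the abstract refers to. A greedy choice like this can be far from optimal, which is why an approximation algorithm with a provable bound is of interest.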

Acknowledgments

This research is supported by the National Basic Research Program of China under Grant No. 2014CB340303, the Program of the National Natural Science Foundation of China under Grants No. 61402514 and No. 61402490, and the Scientific Research Program of Hunan Provincial Education Department (No. 12b012).

Author information

Corresponding author

Correspondence to Pengfei You.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

You, P., Huang, Z., Wang, C., Hu, M., Peng, Y. (2015). Parallel Data Regeneration Based on Multiple Trees with Network Coding in Distributed Storage System. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9529. Springer, Cham. https://doi.org/10.1007/978-3-319-27122-4_42

  • DOI: https://doi.org/10.1007/978-3-319-27122-4_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27121-7

  • Online ISBN: 978-3-319-27122-4

  • eBook Packages: Computer Science, Computer Science (R0)
