Parallel failure recovery techniques in cluster-based media servers

The Journal of Supercomputing

Abstract

For large-scale video-on-demand (VOD) service, cluster servers have drawn attention for their high performance and low cost. A cluster server consists of a front-end node and multiple backend nodes. Adding backend nodes lets the server deliver more quality of service (QoS) streams, but the probability that a backend node fails increases proportionally. A failure not only stops the streaming services but also loses the current playing positions. This paper studies recovery mechanisms that keep the streaming service running when a backend node fails. Because they do not take the characteristics of cluster-based servers and MPEG media into account, basic redundant array of independent disks (RAID) techniques create a bottleneck on the internal network path and use the backend nodes' CPUs inefficiently. To address these problems, a new failure recovery mechanism based on the pipeline computing concept is proposed. The proposed method distributes the internal network traffic generated by recovery operations and also uses the CPU time available in the backend nodes. In the experiments, even when a backend node fails, the proposed method provides continuous streaming media service with a short MTTR and supports more QoS streams than the existing method.
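The contrast drawn in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's algorithm, so everything here is an assumption made for illustration: single-parity XOR striping across backend nodes, the helper names (`xor_blocks`, `recover_centralized`, `recover_pipelined`), and the toy block contents. In the centralized variant one node receives every surviving block before computing the lost one, which concentrates internal traffic on that node. In the pipelined variant each node XORs its own block into a partial result and forwards it, so no node receives more than one block and the XOR work is spread over the backend CPUs.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(data_blocks):
    """Parity block for one stripe: XOR of all data blocks."""
    return reduce(xor_blocks, data_blocks)


def recover_centralized(surviving):
    """Assumed baseline: one node gathers every surviving block, then XORs.

    Returns (recovered_block, blocks_received_by_the_recovery_node).
    """
    return reduce(xor_blocks, surviving), len(surviving)


def recover_pipelined(surviving):
    """Assumed pipeline: each node XORs its block into a forwarded partial.

    Returns (recovered_block, max_blocks_received_by_any_single_node).
    """
    partial = surviving[0]  # first node starts the chain with its own block
    for block in surviving[1:]:
        # Conceptually runs on the node that stores `block`: it receives one
        # partial result, XORs its local block in, and forwards the result.
        partial = xor_blocks(partial, block)
    return partial, 1


# Toy stripe: three equal-length data blocks plus one parity block.
stripe = [b"GOP-0 data", b"GOP-1 data", b"GOP-2 data"]
parity = make_parity(stripe)

lost_index = 1  # pretend the node holding block 1 has failed
surviving = [blk for i, blk in enumerate(stripe) if i != lost_index] + [parity]

central, central_in = recover_centralized(surviving)
piped, piped_in = recover_pipelined(surviving)
```

Both variants reconstruct the same block. The difference is only in where the traffic and the XOR work land, which is the property the abstract attributes to the proposed method. A sequential loop stands in for the node-to-node forwarding, so the sketch shows the data flow rather than real overlap between stripes.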

Author information

Corresponding author

Correspondence to Inbum Jung.

About this article

Cite this article

Lee, J., Jung, I. Parallel failure recovery techniques in cluster-based media servers. J Supercomput 51, 20–39 (2010). https://doi.org/10.1007/s11227-009-0305-6

  • DOI: https://doi.org/10.1007/s11227-009-0305-6
