Abstract
Cluster servers have attracted attention for large-scale video-on-demand (VOD) services because of their high performance and low cost. A cluster server consists of a front-end node and multiple backend nodes. Although adding backend nodes allows more quality-of-service (QoS) streams to be served, it also proportionally increases the probability that a backend node fails. Such a failure not only interrupts the streaming services but also loses the current playback positions. This paper studies recovery mechanisms that keep the streaming service running continuously when a backend node fails. Because basic redundant array of independent disks (RAID) techniques do not take into account the characteristics of cluster-based servers and MPEG media, they cause a bottleneck in the internal network path and use the CPUs of the backend nodes inefficiently. To address these problems, a new failure recovery mechanism based on the pipeline computing concept is proposed. The proposed method not only distributes the internal network traffic generated by the recovery operations but also utilizes the CPU time available in the backend nodes. In the experiments, even when a backend node fails, the proposed method provides continuous streaming media services with a short MTTR and supports more QoS streams than the existing method.
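The abstract does not spell out the recovery mechanism, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes RAID-style XOR parity across backend nodes and contrasts a centralized rebuild, where one node receives every surviving block, with a pipelined rebuild, where each surviving node XORs its own block into a partial result and forwards it to the next node. All function names, node names, and the block layout are hypothetical.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(data_blocks):
    """Parity block of a stripe: XOR of all its data blocks."""
    return reduce(xor_blocks, data_blocks)


def recover_centralized(surviving):
    """Assumed baseline: a single recovery node pulls every surviving
    block over the internal network and does all of the XOR work.
    Returns (recovered_block, blocks_received_per_node)."""
    inbound = {"recovery_node": len(surviving)}
    return reduce(xor_blocks, surviving.values()), inbound


def recover_pipelined(surviving):
    """Assumed pipeline: the first node starts with its own block, and
    every later node receives one partial result, XORs in its own block,
    and forwards it. Traffic and CPU work are spread across the nodes.
    Returns (recovered_block, blocks_received_per_node)."""
    names = list(surviving)
    partial = surviving[names[0]]
    inbound = {names[0]: 0}
    for name in names[1:]:
        inbound[name] = 1  # one partial block arrives from the previous node
        partial = xor_blocks(partial, surviving[name])
    return partial, inbound


# Hypothetical stripe: three data nodes plus one parity node.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(stripe)

# Suppose the node holding b"BBBB" fails.
surviving = {"node0": stripe[0], "node2": stripe[2], "parity": parity}

central_block, central_load = recover_centralized(surviving)
pipe_block, pipe_load = recover_pipelined(surviving)
```

In this toy setting both strategies rebuild the same block, but the centralized variant concentrates all inbound blocks on one node, while the pipelined variant delivers at most one block to any node. That is the kind of traffic and CPU distribution the abstract describes, under the stated assumptions.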
Cite this article
Lee, J., Jung, I. Parallel failure recovery techniques in cluster-based media servers. J Supercomput 51, 20–39 (2010). https://doi.org/10.1007/s11227-009-0305-6
DOI: https://doi.org/10.1007/s11227-009-0305-6