A novel data replication mechanism in P2P VoD system

https://doi.org/10.1016/j.future.2011.10.006

Abstract

With the development of the Internet, high-quality streaming services, including Video-on-Demand, are more popular than ever with the help of P2P technologies. However, peer-to-peer (P2P) on-demand streaming systems inevitably suffer from peer churn, an inherent dynamic characteristic of overlay networks. With frequent peer departures and VCR operations, a large amount of media data cached on peer disks goes offline and becomes unavailable, which is the major cause of heavy server load. This phenomenon is confirmed by the system logs of GridCast, our self-developed P2P Video-on-Demand platform. To address these issues, a new proactive data replication mechanism is proposed and implemented in GridCast. With this mechanism, a peer that is highly likely to leave the overlay proactively replicates data chunks to stable cache servers for future sharing. Two key heuristic algorithms are designed, one for departure prediction and one for selecting the chunks to replicate. The management of the cache servers is also described. Trace-driven simulations show that the mechanism greatly decreases the bandwidth load of the media source server and improves the availability of chunks that are in high demand but poorly provisioned by overlay peers.

Highlights

► A new proactive data replication mechanism decreases server load.
► Two key heuristic algorithms are designed for departure prediction and for selecting chunks to replicate.
► Distributed cache servers are managed in a load-balanced manner.

Introduction

Video-on-Demand (VoD) has become one of the most popular Internet applications. Traditional VoD systems used the client/server architecture and delivered all streaming data from centralized media servers, so server bandwidth often became the bottleneck for the whole system. Nevertheless, these systems deliver high-quality streaming services stably almost all the time. Large-scale VoD services such as YouTube and Youku invest millions of dollars in server bandwidth, and this cost continues to grow rapidly with the trend toward high-definition VoD streaming, which makes server bandwidth the hardest burden for every Internet video service provider to afford.

Recently, many researchers have proposed that Peer-to-Peer (P2P) technology [1] has great potential for client-side data sharing in large-scale VoD systems [2], [3], [4], leveraging peers’ upload capacity and disk cache space to relieve the bandwidth burden on media servers. In the telecom field, there has been much research on applying P2P to next-generation IPTV set-top devices [5].

However, several new challenges have come along with the widespread application of P2P technology in VoD systems [6], [7]. One important issue is that the overlay networks of most P2P systems are inherently dynamic, which may counteract the advantages of P2P in two respects. First, in P2P VoD systems, the prerequisite for reducing bandwidth cost on media servers is that requested data chunks can be fetched from other peers’ disk caches. After random and unexpected departures or VCR operations of dynamic peers (VCR stands for Video Cassette Recorder, the first popular technology to give users time control over their video; VCRs became popular in the 1980s and have since been supplanted by DVD players, PVRs, and even PCs), nodes that previously benefited from those peers may have to fall back to the media server as their data source, which increases its bandwidth burden. Second, large-scale P2P VoD systems usually contain a large number of offline peers whose cached data chunks cannot be shared with other nodes. Even if the requested chunks are stored on offline peers, those peers cannot contribute them, which again adds bandwidth burden to the media servers.

Several years ago, we developed a P2P Video-on-Demand system called GridCast [8], which has been deployed and has provided streaming services for almost 4 years. Analysis of log data collected from the deployed GridCast system indicates that using P2P technology decreases the media server load by 36% compared with the traditional client/server model, leaving 64% for further improvement. That is, 64% of data requests cannot be satisfied (they miss) in the peers’ aggregate cache space and have to be downloaded from the media server. Observation of GridCast shows that these misses are mainly caused by the following factors: (1) insufficient peer upload capacity: some peers have cached the requested chunks but fail to share them because of poor upload bandwidth; (2) firewalls or NAT (Network Address Translation): the requested chunks would have to be fetched from peers behind a firewall or NAT; (3) chunk eviction: each peer has limited disk cache space, so requested chunks may have been evicted from a peer’s cache; (4) new media content: the requested chunks are new and have never appeared in the system, so they must be downloaded from media servers; (5) peer departure: no online peer can provide the requested chunks, although they are cached on peers that are currently offline. Table 1 breaks down the data request misses by reason.

As Table 1 shows, peer departure accounts for nearly half of all missed requests. Especially in large-scale P2P VoD systems, overlay dynamics shorten the effective lifespan of some chunks, because their availability is not guaranteed when they are cached only on offline peers. In classical distributed storage systems, an important scheme for improving data availability is replication, which increases service stability through effective data redundancy. Returning to P2P VoD, the aggregate disk cache space of all peers can be regarded as a loosely coupled distributed cache system.
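To make the replication argument concrete, here is the textbook availability formula for independent failures: a chunk is reachable if at least one holder is online, so its availability is one minus the product of the holders' offline probabilities. This is a standard simplification, not a model from the paper; the independence assumption and the example probabilities are ours.

```python
from math import prod
from typing import Sequence


def chunk_availability(holder_online_probs: Sequence[float]) -> float:
    """Probability that at least one holder of a chunk is online.

    Assumes holders go online and offline independently, a simplifying
    assumption made for illustration only.
    """
    # The chunk is unavailable only if every holder is offline at the same time.
    return 1.0 - prod(1.0 - p for p in holder_online_probs)
```

For example, a chunk held by three peers that are each online 30% of the time is available with probability 1 - 0.7^3 = 0.657; adding one hypothetical cache server that is online 99% of the time raises this to about 0.997. This is the intuition behind replicating poorly provisioned chunks onto stable servers.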

In this paper, we propose iDARE, a client-side proactive data replication mechanism whose main idea follows the replication philosophy of distributed storage systems. In iDARE, peers proactively upload valuable chunks to dedicated cache servers to increase the chunks’ availability and potential for future sharing.

The rest of this paper is organized as follows. Section 2 describes the system architecture, module design, and working procedures of iDARE. Sections 3, 4, and 5 explain the strategies for predicting peer departure, selecting chunks to be replicated, and managing cache servers, respectively. Section 6 presents the performance evaluation and demonstrates the effectiveness of the new mechanism. Section 7 introduces related work, and Section 8 concludes the paper.


Architecture of GridCast

Before exploring the detailed design of iDARE, this section briefly introduces GridCast, the P2P VoD system on which iDARE is built. The GridCast VoD service has been online in CERNET since May 2006. It currently hosts more than 5000 videos with an average bitrate of 600 kbps. In peak months, GridCast has served about 23,000 users, most of whom are campus students in China. All traces used in this paper are selected from its four-year system logs.

As shown in Fig. 1, the

Predicting peer departure

Determining the proper moment for a peer to start data replication is an important but challenging issue in building iDARE. Only replicating data from peers that are about to go offline (that is, peers that will soon leave the overlay network) can effectively compensate for the departure misses caused by peer churn and reduce the source server load. The key to this issue is therefore to accurately analyze and characterize peer behavior in P2P VoD, especially user departure.
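The departure predictor itself is not shown in this snippet. The following is a hypothetical heuristic, not iDARE's algorithm: it flags a peer as likely to leave when playback nears the end of the video or when the user seeks repeatedly. Both signals, the `PeerState` fields, and the threshold values are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical thresholds chosen only for illustration; they are not
# values reported for GridCast or iDARE.
END_FRACTION = 0.9  # playback progress beyond which departure is assumed likely
SEEK_BURST = 3      # seeks per minute taken as a sign of browsing behavior


@dataclass
class PeerState:
    position_s: float    # current playback position, in seconds
    duration_s: float    # total video length, in seconds
    seeks_last_min: int  # VCR seek operations in the last minute


def likely_to_leave(state: PeerState) -> bool:
    """Return True if this hypothetical heuristic predicts the peer will leave soon."""
    progress = state.position_s / state.duration_s if state.duration_s > 0 else 0.0
    # Viewers close to the end of a video are assumed to leave soon afterwards.
    if progress >= END_FRACTION:
        return True
    # Frequent seeking is assumed to indicate a user sampling the video
    # who may abandon it.
    return state.seeks_last_min >= SEEK_BURST
```

A peer for which `likely_to_leave` returns True would then start replication; the cost of a false positive is some wasted upload bandwidth, while a false negative loses the chunks when the peer departs.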

Selecting chunks to be replicated

Deciding which chunks to upload is another critical issue for a cost-effective data replication procedure. Only uploading data that is highly demanded by peers but rarely provisioned in peers’ caches can compensate for the misses caused by peer departure and reduce the source server load. The key to this issue is therefore to accurately analyze and characterize data demand and supply in P2P VoD systems.
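The selection algorithm is not spelled out in this snippet. As a hypothetical sketch of the demand-versus-supply idea, the function below ranks a departing peer's cached chunks by the ratio of recent requests to the number of other online peers holding each chunk, then keeps the top k. The statistics, the ratio, and the top-k rule are assumptions, not iDARE's published heuristic.

```python
import heapq
from typing import Dict, List


def select_chunks(cached: List[int], demand: Dict[int, int],
                  supply: Dict[int, int], k: int) -> List[int]:
    """Pick up to k cached chunk IDs with the highest demand-to-supply ratio.

    demand[c]: recent requests for chunk c (hypothetical statistic)
    supply[c]: other online peers caching chunk c (hypothetical statistic)
    """
    def score(chunk: int) -> float:
        # +1 avoids division by zero when no other online peer holds the chunk.
        return demand.get(chunk, 0) / (supply.get(chunk, 0) + 1)

    # Chunks nobody has requested are never worth uploading.
    candidates = [c for c in cached if demand.get(c, 0) > 0]
    return heapq.nlargest(k, candidates, key=score)
```

For instance, with `demand = {1: 10, 2: 10, 3: 2}` and `supply = {1: 9, 2: 1}`, chunk 2 (score 5.0) ranks ahead of chunk 3 (2.0) and chunk 1 (1.0): equally popular chunks are preferred when fewer peers can serve them. The top-k cut reflects the assumption that a departing peer has time to upload only a few chunks.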

Management of cache servers

Once it has been decided when to replicate and what to replicate, the management policy of the cache servers becomes important. A real implementation must consider many issues; here we explain two important ones: first, chunk management within a cache server; second, the assignment scheme for distributed cache servers.
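The highlights state that distributed cache servers are managed in a load-balanced manner, but this snippet does not give the concrete policies. A minimal sketch, assuming LRU eviction inside each server and least-loaded assignment across servers (both our assumptions, with hypothetical class and function names), could look like this. Storage occupancy stands in for load here; a deployment might use upload bandwidth or active connections instead.

```python
from collections import OrderedDict
from typing import List, Optional


class CacheServer:
    """Hypothetical cache server that stores up to `capacity` chunks
    and evicts the least recently used chunk when full."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity
        self.chunks: OrderedDict = OrderedDict()  # chunk_id -> data, oldest first

    def load(self) -> float:
        # Occupancy fraction, used as a stand-in for server load.
        return len(self.chunks) / self.capacity

    def put(self, chunk_id: int, data: bytes) -> None:
        if chunk_id in self.chunks:
            self.chunks.move_to_end(chunk_id)  # refresh recency of an existing chunk
        elif len(self.chunks) >= self.capacity:
            self.chunks.popitem(last=False)    # evict the least recently used chunk
        self.chunks[chunk_id] = data

    def get(self, chunk_id: int) -> Optional[bytes]:
        if chunk_id not in self.chunks:
            return None
        self.chunks.move_to_end(chunk_id)      # a hit counts as a use
        return self.chunks[chunk_id]


def assign_server(servers: List[CacheServer]) -> CacheServer:
    """Direct a replication upload to the least-loaded cache server."""
    return min(servers, key=lambda s: s.load())
```

A departing peer would call `assign_server` once per upload batch and `put` its selected chunks there; peers later fetching those chunks call `get`, which keeps frequently shared chunks from being evicted.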

Performance evaluations

The performance of iDARE is evaluated through trace-driven simulation based on log data collected between Sep 13, 2007 and Oct 4, 2007. During the logging period, the tracker server recorded detailed data access information for 1721 video channels from about 8199 peer nodes.
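A trace-driven evaluation of this kind replays logged requests and records which source serves each one. As a minimal sketch, assuming each replayed request is tagged with a hypothetical source label ("peer", "cache_server", or "media_server"), the fraction of requests served by anything other than the media server can be computed as follows. The labels and record format are illustrative and are not GridCast's log schema.

```python
from collections import Counter
from typing import Iterable


def non_source_hit_ratio(served_by: Iterable[str]) -> float:
    """Fraction of requests served by any source other than the media server.

    `served_by` holds one hypothetical source label per replayed request,
    e.g. "peer", "cache_server", or "media_server".
    """
    counts = Counter(served_by)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - counts["media_server"] / total
```

Computing this ratio once for a replay without replication and once with it yields the kind of side-by-side comparison the evaluation describes.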

First, we compare the data request hit ratio in non-source cache resources between a pure P2P architecture and GridCast with the iDARE mechanism. Here, non-source cache resources indicate all available

Related work

Related research on proactive data replication in P2P VoD systems addresses two requirements: first, the mechanism should be able to analyze the characteristics of peer churn and the data access patterns (trends) of peers; second, during data replication, the mechanism should support effective chunk selection, upload, management, and sharing.

Conclusions

This paper presents iDARE, a client-side proactive data replication mechanism that decreases departure misses and media server load in P2P VoD. Our contributions are as follows. First, based on an analysis of user behavior in log data from the deployed GridCast system, we propose two core algorithms, targeted at the characteristics of P2P VoD, for determining when to replicate and for selecting the chunks to be replicated. Second, we design and implement a prototype of iDARE, which is

Acknowledgments

This work is supported by Program for New Century Excellent Talents in University under grant NCET-08-0218, China National Natural Science Foundation (NSFC) under grant 60973133, FOK YING TUNG Education Foundation under grant No. 122007 and the National Science and Technology Major Project of the Ministry of Science and Technology of China under grant No. 2010ZX-03004-001-03.


References (34)

  • G. Fortino et al., Cooperative control of multicast-based streaming on-demand systems, Future Generation Computer Systems (2005).
  • L. Liu et al., Fault-tolerant peer-to-peer search on small-world networks, Future Generation Computer Systems (2007).
  • Y. He et al., Solving streaming capacity problems in P2P VoD systems, IEEE Transactions on Circuits and Systems for Video Technology (2010).
  • B. Li, M. Ma, Z. Jin, D. Zhao, Topology investigation of a large-scale P2P VoD overlay network based on active...
  • Y. Zhang, H. Wang, P. Li, Z. Jiang, C. Gao, How can peers assist each other in large-scale P2P-VoD systems, in:...
  • Z. Wang, C. Wu, L. Sun, S. Yang, Strategies of collaboration in multi-channel P2P VoD streaming, in: Proceedings of...
  • V. Janardhan, H. Schulzrinne, Peer assisted VoD for set-top box based IP network, in: Proceedings of the 2007 Workshop...
  • B. Cheng et al., GridCast: improving peer sharing for P2P VoD, ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP) (2008).
  • L. Guo, S. Chen, S. Ren, X. Chen, S. Jiang, PROP: a scalable and reliable P2P assisted proxy streaming system, in:...
  • G. Dan, Cooperative caching and relaying strategies for peer-to-peer content delivery, in: Proceedings of IPTPS’08,...
  • T. Do, K.A. Hua, M. Tantaoui, P2VoD: providing fault tolerant video-on-demand streaming in peer-to-peer environment,...
  • L.H. Ying, A. Basu, pcVOD: internet peer-to-peer video-on-demand with storage caching on peers, in: Proceedings of the...
  • M. Hefeeda, A. Habib, B. Botev, D. Xu, D.B. Bhargava, PROMISE: A peer-to-peer media streaming using collectcast, in:...
  • M. Zhou et al., Tree-assisted gossiping for overlay video distribution, Multimedia Tools and Applications (2009).
  • D. Wang et al., A dynamic skip list-based overlay for on-demand media streaming with VCR interactions, IEEE Transactions on Parallel and Distributed Systems (2008).
  • D. Stutzbach, R. Rejaie, Understanding churns in peer-to-peer networks, in: Proceedings of the 6th ACM SIGCOMM...
  • M. Ripeanu et al., Mapping the Gnutella network, IEEE Internet Computing (2002).

Xiaofei Liao received his Ph.D. degree in computer science and engineering from Huazhong University of Science and Technology (HUST), China, in 2005. He is now an associate professor in the School of Computer Science and Engineering at HUST. He has served as a reviewer for many conferences and journals. His research interests are in the areas of virtualization technology for computing systems, P2P systems, cluster computing, and streaming services. He is a member of the IEEE and the IEEE Computer Society.

Hai Jin received his B.S., M.A., and Ph.D. degrees in computer engineering from Huazhong University of Science and Technology (HUST) in 1988, 1991, and 1994, respectively. He is now a Professor of Computer Science and Engineering at HUST in China and the Dean of the School of Computer Science and Technology at HUST. In 1996, he was awarded a German Academic Exchange Service (DAAD) fellowship to visit the Technical University of Chemnitz in Germany. He worked for the University of Hong Kong between 1998 and 2000 and participated in the HKU Cluster project. He worked as a visiting scholar at the University of Southern California between 1999 and 2000. He is the chief scientist of the 973 project “ChinaV” and of “ChinaGrid”, the largest grid computing project in China. His research interests include virtualization technology for computing systems, cluster computing and grid computing, peer-to-peer computing, network storage, network security, and high assurance computing. He is a member of the Grid Forum Steering Group (GFSG). He is a senior member of the IEEE and a member of the ACM.

Linchen Yu is now a Ph.D. candidate in computer science and engineering at Huazhong University of Science and Technology (HUST), China. Her research interests are in the areas of peer-to-peer systems, cluster computing, and streaming services.
