ABSTRACT
To strike a good trade-off between access performance and memory efficiency, in-memory stores can keep popular datasets under replication and unpopular datasets under erasure coding. Because caching workloads exhibit long-tail distributions, most in-memory data are unpopular, so the redundancy transition from replication to erasure coding (a.k.a. erasure-coded archival) must be addressed for these unpopular datasets.
In this paper, we propose ERP, an encoding-oriented replica placement policy that incorporates an interleaved declustering mechanism, and design TEA, a traffic-efficient erasure-coded archival scheme for ERP-powered in-memory stores. With ERP in place, TEA offers three salient features: (i) it reduces the cross-rack traffic incurred when retrieving data-block replicas, (ii) it improves rack-level load balancing by distributing replicas with a load-aware primary-rack-selection approach, and (iii) it reduces the block-relocation operations launched to sustain rack-level fault tolerance. The empirical results show that TEA not only incurs lower cross-rack traffic than four candidate encoding schemes, but also delivers strong archival throughput and rack-level load balancing. In particular, TEA improves archival throughput by at least 70.8% and rack-level load balancing by a factor of more than 1.58 relative to the four competitors.
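The memory-efficiency side of the trade-off described above can be made concrete with a small calculation. The sketch below is illustrative only: the function names are hypothetical, and the parameters in the usage note (3-way replication, a code with k = 6 data blocks and m = 3 parity blocks) are assumptions chosen for the example, not configurations stated in the abstract.

```python
def replication_overhead(replicas: int) -> float:
    """Bytes stored per byte of user data under n-way replication."""
    return float(replicas)


def erasure_coding_overhead(k: int, m: int) -> float:
    """Bytes stored per byte of user data when k data blocks
    are protected by m parity blocks."""
    return (k + m) / k


def memory_saved_by_archival(replicas: int, k: int, m: int) -> float:
    """Fraction of memory reclaimed when a dataset moves from
    replication to erasure coding (hypothetical helper)."""
    before = replication_overhead(replicas)
    after = erasure_coding_overhead(k, m)
    return (before - after) / before
```

Under these assumed parameters, `memory_saved_by_archival(3, 6, 3)` compares a 3x footprint against a 1.5x footprint, which is why archiving the long tail of unpopular data can matter for an in-memory store.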
Index Terms
- TEA: A Traffic-efficient Erasure-coded Archival Scheme for In-memory Stores