research-article

Trident: task scheduling over tiered storage systems in big data platforms

Authors:
Herodotos Herodotou, Cyprus University of Technology, Limassol, Cyprus
Elena Kakoulli, Cyprus University of Technology, Limassol, Cyprus

Proceedings of the VLDB Endowment, Volume 14, Issue 9, pp. 1570–1582. https://doi.org/10.14778/3461535.3461545

Published: 01 May 2021

Abstract

The recent advancements in storage technologies have popularized the use of tiered storage systems in data-intensive compute clusters. The Hadoop Distributed File System (HDFS), for example, now supports storing data in memory, SSDs, and HDDs, while OctopusFS and hatS offer fine-grained storage tiering solutions. However, the task schedulers of big data platforms (such as Hadoop and Spark) assign tasks to available resources based only on data locality information and completely ignore the fact that local data is now stored on a variety of storage media with different performance characteristics. This paper presents Trident, a principled task scheduling approach designed to make optimal task assignment decisions based on both locality and storage tier information. Trident formulates task scheduling as a minimum-cost maximum matching problem in a bipartite graph and uses a standard solver to find the optimal solution. In addition, Trident utilizes two novel pruning algorithms that bound the size of the graph while still guaranteeing optimality. Trident is implemented in both Spark and Hadoop and evaluated extensively using a realistic workload derived from Facebook traces as well as an industry-validated benchmark, demonstrating significant benefits in terms of application performance and cluster efficiency.
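The abstract casts scheduling as a minimum-cost maximum matching between pending tasks and free slots. The Python sketch below illustrates that formulation under stated assumptions; it is not Trident's implementation. The cost model is hypothetical (data-local reads from memory < SSD < HDD, then rack-local, then off-rack, with made-up numbers), the names `placement_cost`, `min_cost_assignment`, and `schedule` are illustrative, the solver is the classic Hungarian algorithm rather than whichever standard solver Trident uses, and Trident's two pruning algorithms are not reproduced.

```python
import math

# Hypothetical cost model (illustrative numbers, not taken from the paper):
# a lower cost means faster access to the task's input data.
TIER_COST = {"MEMORY": 1, "SSD": 2, "HDD": 3}  # data-local read from this tier
RACK_LOCAL_COST = 4  # nearest replica is on another node in the same rack
OFF_RACK_COST = 5    # no replica in the slot's rack


def placement_cost(replicas, slot_node, node_rack):
    """Cheapest cost of running a task whose input has `replicas` on `slot_node`.

    replicas: list of (node, tier) pairs holding a copy of the task's input.
    node_rack: dict mapping each node name to its rack name.
    """
    best = OFF_RACK_COST
    for node, tier in replicas:
        if node == slot_node:
            best = min(best, TIER_COST[tier])
        elif node_rack[node] == node_rack[slot_node]:
            best = min(best, RACK_LOCAL_COST)
    return best


def min_cost_assignment(cost):
    """Hungarian algorithm: give each row a distinct column at minimum total cost.

    Requires len(cost) <= len(cost[0]). Returns (total, cols) where cols[i]
    is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    u = [0] * (n + 1)    # row potentials (1-based)
    v = [0] * (m + 1)    # column potentials (1-based)
    p = [0] * (m + 1)    # p[j]: row matched to column j, 0 if free
    way = [0] * (m + 1)  # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [math.inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    cols = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            cols[p[j] - 1] = j - 1
    return sum(cost[i][cols[i]] for i in range(n)), cols


def schedule(tasks, slots, node_rack):
    """Min-cost maximum matching of tasks to free slots (complete bipartite graph).

    Returns (total_cost, plan) where plan maps task index -> slot index.
    """
    if not tasks or not slots:
        return 0, {}
    cost = [[placement_cost(t, s, node_rack) for s in slots] for t in tasks]
    if len(tasks) <= len(slots):
        total, cols = min_cost_assignment(cost)
        return total, {i: c for i, c in enumerate(cols)}
    # More tasks than slots: give every slot a distinct task instead.
    total, rows = min_cost_assignment([list(col) for col in zip(*cost)])
    return total, {t: s for s, t in enumerate(rows)}


if __name__ == "__main__":
    node_rack = {"n1": "r1", "n2": "r1", "n3": "r2"}
    tasks = [
        [("n1", "MEMORY"), ("n3", "HDD")],  # task 0
        [("n1", "SSD"), ("n2", "HDD")],     # task 1
        [("n3", "SSD")],                    # task 2
    ]
    print(schedule(tasks, ["n1", "n2", "n3"], node_rack))  # (6, {0: 0, 1: 1, 2: 2})
```

In this toy example, task 1 is data-local on both n1 and n2, so a locality-only scheduler could treat those slots as equivalent; under the assumed costs the solver places task 1 on n2 and leaves n1, where task 0's input sits in memory, to task 0. The sketch assumes every task may run on any slot (at worst off-rack), so the bipartite graph is complete and a rectangular assignment solver yields a maximum matching; when tasks outnumber slots it transposes the cost matrix so that every slot is filled.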



Published in
Proceedings of the VLDB Endowment, Volume 14, Issue 9
May 2021
249 pages
ISSN: 2150-8097
Editors: Xin Luna Dong (Amazon), Felix Naumann (HPI, University of Potsdam)
Publisher: VLDB Endowment
Article Metrics
- 2 Total Citations
- 50 Total Downloads
- Downloads (Last 12 months): 15
- Downloads (Last 6 weeks): 4