ABSTRACT
Inter-coflow scheduling improves application-level communication performance in data-parallel clusters. However, existing efficient schedulers require a priori coflow information and ignore cluster dynamics such as pipelining, task failures, and speculative execution, which limits their applicability. Schedulers that operate without prior knowledge sacrifice performance to avoid head-of-line blocking. In this paper, we present Aalo, a scheduler that strikes a balance between the two and efficiently schedules coflows without prior knowledge.
Aalo employs Discretized Coflow-Aware Least-Attained Service (D-CLAS) to separate coflows into a small number of priority queues based on how much data they have already sent across the cluster. By prioritizing across queues and scheduling coflows in FIFO order within each queue, Aalo's non-clairvoyant scheduler reduces coflow completion times while guaranteeing starvation freedom. EC2 deployments and trace-driven simulations show that, compared to per-flow mechanisms, communication stages complete 1.93× faster on average and 3.59× faster at the 95th percentile with Aalo. Aalo's performance is comparable to that of solutions that use prior knowledge, and Aalo outperforms them in the presence of cluster dynamics.
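The queueing idea described above can be illustrated with a minimal sketch. It is not Aalo's implementation: the names `Coflow`, `queue_thresholds`, `queue_index`, and `schedule_order` are hypothetical, and the exponentially spaced thresholds (10 MB first threshold, multiplier of 10, four queues) are assumptions chosen for illustration, since the abstract does not specify them. The sketch only shows how coflows could be placed into queues by bytes already sent and then ordered by queue priority and FIFO arrival; it uses strict priority across queues and does not model the mechanism that guarantees starvation freedom.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Coflow:
    coflow_id: str
    arrival_order: int  # smaller value means the coflow arrived earlier
    bytes_sent: int = 0  # total bytes this coflow has sent across the cluster


def queue_thresholds(num_queues, first_threshold, multiplier):
    """Upper bounds (in bytes) of every queue except the last, unbounded one.

    Exponential spacing is an assumption made for this sketch.
    """
    return [first_threshold * multiplier ** i for i in range(num_queues - 1)]


def queue_index(bytes_sent, thresholds):
    """Return the queue (0 = highest priority) for a coflow.

    A coflow moves to a lower-priority queue once the bytes it has sent
    reach the upper bound of its current queue.
    """
    return bisect_right(thresholds, bytes_sent)


def schedule_order(coflows, thresholds):
    """Order coflows by queue priority first, then FIFO within each queue."""
    return sorted(
        coflows,
        key=lambda c: (queue_index(c.bytes_sent, thresholds), c.arrival_order),
    )


if __name__ == "__main__":
    thresholds = queue_thresholds(num_queues=4, first_threshold=10_000_000, multiplier=10)
    coflows = [
        Coflow("A", arrival_order=0, bytes_sent=250_000_000),
        Coflow("B", arrival_order=1, bytes_sent=5_000_000),
        Coflow("C", arrival_order=2, bytes_sent=50_000_000),
        Coflow("D", arrival_order=3, bytes_sent=1_000_000),
    ]
    for c in schedule_order(coflows, thresholds):
        print(c.coflow_id, "queue", queue_index(c.bytes_sent, thresholds))
```

In this example, B and D have sent the least data, so they share the highest-priority queue and are served in arrival order, followed by C and then A. No coflow size is needed in advance; only the bytes observed so far determine the queue.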
Efficient Coflow Scheduling Without Prior Knowledge. SIGCOMM '15.