DOI: 10.1145/2785956.2787480
research-article

Efficient Coflow Scheduling Without Prior Knowledge

Published: 17 August 2015

ABSTRACT

Inter-coflow scheduling improves application-level communication performance in data-parallel clusters. However, existing efficient schedulers require a priori coflow information and ignore cluster dynamics such as pipelining, task failures, and speculative executions, which limits their applicability. Schedulers without prior knowledge compromise on performance to avoid head-of-line blocking. In this paper, we present Aalo, which strikes a balance and efficiently schedules coflows without prior knowledge.

Aalo employs Discretized Coflow-Aware Least-Attained Service (D-CLAS) to separate coflows into a small number of priority queues based on how much they have already sent across the cluster. By prioritizing across queues and scheduling coflows in FIFO order within each queue, Aalo's non-clairvoyant scheduler reduces coflow completion times while guaranteeing starvation freedom. EC2 deployments and trace-driven simulations show that communication stages complete 1.93× faster on average and 3.59× faster at the 95th percentile using Aalo in comparison to per-flow mechanisms. Aalo's performance is comparable to that of solutions using prior knowledge, and Aalo outperforms them in the presence of cluster dynamics.
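The queueing idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Coflow` class, the function names, and the threshold values are all hypothetical, chosen only to show how coflows might be placed into queues by bytes sent and then ordered by queue, with FIFO order inside each queue.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Coflow:
    coflow_id: str
    arrival_seq: int      # order of arrival, used for FIFO within a queue
    bytes_sent: int = 0   # total bytes this coflow has sent across the cluster


def queue_index(bytes_sent, thresholds):
    """Return the queue a coflow belongs to; 0 is the highest priority.

    thresholds[i] is the (assumed) exclusive upper bound of queue i,
    and the last queue has no upper bound.
    """
    return bisect_right(thresholds, bytes_sent)


def schedule_order(coflows, thresholds):
    """Order coflows by queue first, then by arrival (FIFO) within a queue."""
    return sorted(
        coflows,
        key=lambda c: (queue_index(c.bytes_sent, thresholds), c.arrival_seq),
    )


# Hypothetical thresholds in bytes (10 MB, 100 MB, 1 GB); illustrative only.
THRESHOLDS = [10 * 10**6, 100 * 10**6, 10**9]

coflows = [
    Coflow("A", arrival_seq=0, bytes_sent=500 * 10**6),
    Coflow("B", arrival_seq=1, bytes_sent=2 * 10**6),
    Coflow("C", arrival_seq=2, bytes_sent=50 * 10**6),
    Coflow("D", arrival_seq=3, bytes_sent=5 * 10**6),
]

order = [c.coflow_id for c in schedule_order(coflows, THRESHOLDS)]
print(order)  # ['B', 'D', 'C', 'A']
```

B and D have sent the least and share the highest-priority queue, so they run first in arrival order; A arrived first but has sent the most, so it sits in a lower-priority queue. The sketch covers only the ordering step and does not model rate allocation or how starvation freedom is enforced.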

    • Published in

      SIGCOMM '15: Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication
      August 2015
      684 pages
      ISBN: 9781450335423
      DOI: 10.1145/2785956

      Copyright © 2015 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      SIGCOMM '15 Paper Acceptance Rate: 40 of 242 submissions, 17%. Overall Acceptance Rate: 554 of 3,547 submissions, 16%.
