DOI: 10.1145/2931088.2931090
research-article

A Scalable Runtime for the ECOSCALE Heterogeneous Exascale Hardware Platform

Published: 01 June 2016

ABSTRACT

Exascale computation is the next target of high performance computing. In the push to create exascale computing platforms, simply increasing the number of hardware devices is not an acceptable option given the limitations of power consumption, heat dissipation, and programming models which are designed for current hardware platforms. Instead, new hardware technologies, coupled with improved programming abstractions and more autonomous runtime systems, are required to achieve this goal.

This position paper presents the design of a new runtime for a new heterogeneous hardware platform being developed to explore energy-efficient, high-performance computing. By extending and enhancing the OpenCL framework, this work will both simplify the programming of current and future HPC applications and automate the scheduling of data and computation across this new hardware platform. It also explores the use of FPGAs to achieve both the power and performance goals of exascale, and the use of the runtime to automatically effect dynamic configuration and reconfiguration of hardware platforms.


  • Published in

    ROSS '16: Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers
    June 2016
    54 pages
    ISBN:9781450343879
    DOI:10.1145/2931088

    Copyright © 2016 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    ROSS '16 paper acceptance rate: 6 of 10 submissions (60%). Overall acceptance rate: 58 of 169 submissions (34%).
