ABSTRACT
Exascale computation is the next target of high performance computing. In the push to create exascale computing platforms, simply increasing the number of hardware devices is not an acceptable option, given the limits imposed by power consumption, heat dissipation, and programming models that were designed for current hardware platforms. Instead, new hardware technologies, coupled with improved programming abstractions and more autonomous runtime systems, are required to achieve this goal.
This position paper presents the design of a new runtime for a new heterogeneous hardware platform being developed to explore energy-efficient, high-performance computing. By extending and enhancing the OpenCL framework, this work will both simplify the programming of current and future HPC applications and automate the scheduling of data and computation across this new hardware platform. This work also explores the use of FPGAs to meet both the power and performance goals of exascale, and utilises the runtime to automatically effect dynamic configuration and reconfiguration of hardware platforms.
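The abstract does not specify how the runtime decides where each kernel runs. As a purely illustrative sketch, and not the authors' design, the Python below shows one simple way a runtime could trade performance against energy when mapping OpenCL-style kernels onto CPU, GPU and FPGA devices: among the devices whose estimated finish time is within a `slack` factor of the earliest possible finish, it picks the one with the lowest estimated energy. The `Device` class, the `schedule` function, the kernel names, and all power and runtime figures are hypothetical; data transfer costs, kernel dependencies and FPGA reconfiguration time are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical device model: a name and an average power draw in watts."""
    name: str
    power_w: float
    free_at: float = 0.0  # time (s) at which the device next becomes idle


def schedule(kernels, devices, runtime, slack=1.2):
    """Greedily map kernels (in submission order) onto devices.

    runtime: dict mapping (kernel, device_name) -> estimated runtime in seconds;
             a missing entry means the kernel cannot run on that device.
    slack:   any device finishing within slack * (earliest finish) is acceptable;
             among those, the device with the lowest estimated energy is chosen.
    Returns a list of (kernel, device_name, start_s, finish_s, energy_j).
    Note: mutates each chosen Device's free_at.
    """
    plan = []
    for k in kernels:
        options = []
        for d in devices:
            t = runtime.get((k, d.name))
            if t is None:
                continue  # kernel has no implementation for this device
            start = d.free_at
            options.append((start + t, t * d.power_w, start, d))
        if not options:
            raise ValueError(f"no device can run kernel {k!r}")
        best_finish = min(o[0] for o in options)
        # Performance bound first, then minimise energy (ties broken by finish time).
        eligible = [o for o in options if o[0] <= slack * best_finish]
        finish, energy, start, dev = min(eligible, key=lambda o: (o[1], o[0]))
        dev.free_at = finish
        plan.append((k, dev.name, start, finish, energy))
    return plan


# Usage with made-up estimates (illustrative numbers only).
devices = [Device("cpu", 95.0), Device("gpu", 250.0), Device("fpga", 25.0)]
runtime = {
    ("fft", "cpu"): 4.0, ("fft", "gpu"): 1.0, ("fft", "fpga"): 1.1,
    ("stencil", "cpu"): 3.0, ("stencil", "gpu"): 0.8, ("stencil", "fpga"): 2.0,
}
plan = schedule(["fft", "stencil"], devices, runtime)
for kernel, dev, start, finish, energy in plan:
    print(f"{kernel:8s} -> {dev:5s} start={start:.2f}s finish={finish:.2f}s energy={energy:.1f}J")
```

Setting `slack=1.0` reduces this policy to plain earliest-finish-time list scheduling; larger values let the scheduler accept a slightly slower but lower-power device such as the FPGA, which is the kind of power/performance trade-off the abstract describes.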