KART – A Runtime Compilation Library for Improving HPC Application Performance

  • Conference paper
  • Published in: High Performance Computing (ISC High Performance 2017)
  • Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10524)

Abstract

The effectiveness of ahead-of-time compiler optimization heavily depends on the amount of information available at compile time. Input-specific information that is only available at runtime cannot be used, although it often determines loop counts, branching predicates and paths, as well as memory-access patterns. It can also be crucial for generating efficient SIMD-vectorized code. This is especially relevant for the many-core architectures paving the way to exascale computing, which are more sensitive to code optimization. We explore the design space for using input-specific information at compile time and present KART, a library solution that allows developers to compile, link, and execute code (e.g., C, C++, Fortran) at application runtime. Besides mere runtime compilation of performance-critical code, KART can be used to instantiate the same code multiple times using different inputs, compilers, and options. Other techniques like auto-tuning and code generation can be integrated into a KART-enabled application instead of being scripted around it. We evaluate runtimes and compilation costs for different synthetic kernels, and show the effectiveness for two real-world applications, HEOM and a WSM6 proxy.
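To make the idea concrete, the following is a minimal sketch of the general technique that such runtime compilation automates, not KART's actual API: a kernel source with an input-specific loop count is generated at runtime, compiled into a shared object with the system compiler, and loaded via dlopen. The file names, the scale kernel, and the compiler invocation are illustrative assumptions.

// Illustrative sketch only (not KART's API): compile a kernel at runtime
// with an input-specific loop count baked in as a compile-time constant.
// Build with, e.g.: c++ -std=c++11 example.cpp -ldl
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <vector>
#include <dlfcn.h>

int main(int argc, char** argv) {
    // Input-specific value that an ahead-of-time compiler cannot know.
    const int n = (argc > 1) ? std::atoi(argv[1]) : 1024;

    // Generate the kernel source with the runtime value as a constant,
    // enabling full unrolling and SIMD vectorization of the loop.
    std::ofstream src("kernel.c");
    src << "void scale(double* x, double a) {\n"
        << "  for (int i = 0; i < " << n << "; ++i) x[i] *= a;\n"
        << "}\n";
    src.close();

    // Compile to a shared object at application runtime; the compiler and
    // its options could be varied per instantiation.
    if (std::system("cc -O3 -fPIC -shared kernel.c -o kernel.so") != 0)
        return 1;

    // Load the freshly built library and resolve the kernel symbol.
    void* lib = dlopen("./kernel.so", RTLD_NOW);
    if (!lib) return 1;
    auto scale = reinterpret_cast<void (*)(double*, double)>(dlsym(lib, "scale"));

    std::vector<double> data(n, 1.0);
    scale(data.data(), 2.0);
    std::printf("data[0] = %f\n", data[0]);

    dlclose(lib);
    return 0;
}

KART wraps this compile-load-call cycle in a library interface, so that the same kernel can be instantiated repeatedly with different inputs, compilers, and options without hand-written glue code like the above.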


Acknowledgments

This work is partially supported by Intel Corporation within the “Research Center for Many-core High-Performance Computing” (Intel PCC) at ZIB. We thank the North-German Supercomputing Alliance (HLRN) for providing us with access to the HLRN-III production system ‘Konrad’ and to the Cray TDS system with Intel KNL nodes.

Intel, Xeon, and Xeon Phi are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

* Other names and brands are the property of their respective owners. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance.

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Author information

Correspondence to Matthias Noack.



Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Noack, M., Wende, F., Zitzlsberger, G., Klemm, M., Steinke, T. (2017). KART – A Runtime Compilation Library for Improving HPC Application Performance. In: Kunkel, J., Yokota, R., Taufer, M., Shalf, J. (eds.) High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, vol 10524. Springer, Cham. https://doi.org/10.1007/978-3-319-67630-2_29

  • DOI: https://doi.org/10.1007/978-3-319-67630-2_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67629-6

  • Online ISBN: 978-3-319-67630-2
