skip to main content
10.1145/3627535.3638484acmconferencesArticle/Chapter ViewAbstractPublication PagesppoppConference Proceedingsconference-collections
research-article
Artifacts Available / v1.1

A Holistic Approach to Automatic Mixed-Precision Code Generation and Tuning for Affine Programs

Published:20 February 2024Publication History

ABSTRACT

Reducing floating-point (FP) precision is used to trade the quality degradation of a numerical program's output for performance, but this optimization coincides with type casting, whose overhead is undisclosed until a mixed-precision code version is generated. This uncertainty enforces the decoupled implementation of mixed-precision code generation and autotuning in prior work. In this paper, we present a holistic approach called PrecTuner that consolidates the mixed-precision code generator and the autotuner by defining one parameter. This parameter is first initialized by some automatically sampled values and used to generate several code variants, with various loop transformations also taken into account. The generated code variants are next profiled to solve a performance model formulated using the aforementioned parameter, possibly under a pre-defined quality degradation budget. The best-performing value of the defined parameter is finally predicted without evaluating all code variants. Experimental results of the PolyBench benchmarks on CPU demonstrate that PrecTuner outperforms LuIs by 3.28× while achieving smaller errors, and we also validate its effectiveness in optimizing a real-life large-scale application. In addition, PrecTuner also obtains a mean speedup of 1.81× and 1.52×-1.73× over Pluto on single- and multi-core CPU, respectively, and 1.71× over PPCG on GPU.

References

  1. Aravind Acharya, Uday Bondhugula, and Albert Cohen. 2018. Polyhedral Auto-transformation with No Integer Linear Programming. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (Philadelphia, PA, USA) (PLDI 2018). ACM, New York, NY, USA, 529--542. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Jason Ansel, Cy Chan, Yee Lok Wong, Marek Olszewski, Qin Zhao, Alan Edelman, and Saman Amarasinghe. 2009. PetaBricks: A Language and Compiler for Algorithmic Choice. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation (Dublin, Ireland) (PLDI'09). Association for Computing Machinery, New York, NY, USA, 38--49. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. IEEE Standards Association. 2019. IEEE Standard for Floating-Point Arithmetic. IEEE Std 754-2019 (Revision of IEEE 754-2008) (2019), 1--84. Google ScholarGoogle ScholarCross RefCross Ref
  4. Cédric Bastoul. 2004. Code Generation in the Polyhedral Model Is Easier Than You Think. In Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT '04). IEEE Computer Society, Washington, DC, USA, 7--16. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Atılım Günes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. 2017. Automatic Differentiation in Machine Learning: A Survey. J. Mach. Learn. Res. 18, 1 (jan 2017), 5595--5637.Google ScholarGoogle Scholar
  6. Uday Bondhugula, Vinayaka Bandishti, Albert Cohen, Guillain Potron, and Nicolas Vasilache. 2014. Tiling and Optimizing Time-Iterated Computations on Periodic Domains. In Proceedings of the 23rd International Conference on Parallel Architectures and Compilation (Edmonton, AB, Canada) (PACT '14). Association for Computing Machinery, New York, NY, USA, 39--50. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Uday Bondhugula, Albert Hartono, J. Ramanujam, and P. Sadayappan. 2008. A Practical Automatic Polyhedral Parallelizer and Locality Optimizer. In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation (Tucson, AZ, USA) (PLDI'08). ACM, New York, NY, USA, 101--113. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Daniele Cattaneo, Michele Chiari, Nicola Fossati, Stefano Cherubin, and Giovanni Agosta. 2021. Architecture-Aware Precision Tuning with Multiple Number Representation Systems. In 2021 58th ACM/IEEE Design Automation Conference (DAC) (San Francisco, CA, USA). IEEE Press, 673--678. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Stefano Cherubin and Giovanni Agosta. 2020. Tools for Reduced Precision Computation: A Survey. ACM Comput. Surv. 53, 2, Article 33 (apr 2020), 35 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Stefano Cherubin, Daniele Cattaneo, Michele Chiari, Antonio Di Bello, and Giovanni Agosta. 2020. TAFFO: Tuning Assistant for Floating to Fixed Point Optimization. IEEE Embedded Systems Letters 12, 1 (2020), 5--8. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Eva Darulova and Viktor Kuncak. 2017. Towards a Compiler for Reals. ACM Trans. Program. Lang. Syst. 39, 2, Article 8 (mar 2017), 28 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Marc Daumas and Guillaume Melquiond. 2010. Certification of Bounds on Expressions Involving Rounded Operators. ACM Trans. Math. Softw. 37, 1, Article 2 (jan 2010), 20 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Paul Feautrier. 1991. Dataflow analysis of array and scalar references. International Journal of Parallel Programming 20, 1 (01 Feb 1991), 23--53. Google ScholarGoogle ScholarCross RefCross Ref
  14. Paul Feautrier. 1992. Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time. International journal of parallel programming 21, 6 (1992), 389--420.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Paul Feautrier and Christian Lengauer. 2011. Polyhedron Model. Springer US, Boston, MA, 1581--1592. Google ScholarGoogle ScholarCross RefCross Ref
  16. Eric Goubault and Sylvie Putot. 2006. Static Analysis of Numerical Algorithms. In Static Analysis, Kwangkeun Yi (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 18--34.Google ScholarGoogle Scholar
  17. Martin Griebl, Paul Feautrier, and Christian Lengauer. 2000. Index Set Splitting. Int. J. Parallel Program. 28, 6 (Dec. 2000), 607--631. Google ScholarGoogle ScholarCross RefCross Ref
  18. Tobias Grosser, Armin Groesslinger, and Christian Lengauer. 2012. Polly-Performing Polyhedral Optimizations on a Low-level Intermediate Representation. Parallel Processing Letters 22, 04 (2012), 1250010.Google ScholarGoogle ScholarCross RefCross Ref
  19. Tobias Grosser, Sven Verdoolaege, and Albert Cohen. 2015. Polyhedral AST Generation Is More Than Scanning Polyhedra. ACM Trans. Program. Lang. Syst. 37, 4, Article 12 (July 2015), 50 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Hui Guo and Cindy Rubio-González. 2018. Exploiting Community Structure for Floating-Point Precision Tuning. In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis (Amsterdam, Netherlands) (ISSTA 2018). Association for Computing Machinery, New York, NY, USA, 333--343. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Ken Kennedy and John R. Allen. 2002. Optimizing Compilers for Modern Architectures: A Dependence-based Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Pradeep V Kotipalli, Ranvijay Singh, Paul Wood, Ignacio Laguna, and Saurabh Bagchi. 2019. AMPT-GA: Automatic Mixed Precision Floating Point Tuning for GPU Applications (ICS '19). Association for Computing Machinery, New York, NY, USA, 160--170. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Ki-Il Kum, Jiyang Kang, and Wonyong Sung. 2000. AUTOSCALER for C: an optimizing floating-point to integer C program converter for fixed-point digital signal processors. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing 47, 9 (2000), 840--848. Google ScholarGoogle ScholarCross RefCross Ref
  24. Ignacio Laguna, Paul C. Wood, Ranvijay Singh, and Saurabh Bagchi. 2019. GPUMixer: Performance-Driven Floating-Point Tuning for GPU Scientific Applications. In High Performance Computing, Michèle Weiland, Guido Juckeland, Carsten Trinitis, and Ponnuswamy Sadayappan (Eds.). Springer International Publishing, Cham, 227--246.Google ScholarGoogle Scholar
  25. Michael O. Lam, Jeffrey K. Hollingsworth, Bronis R. de Supinski, and Matthew P. Legendre. 2013. Automatically Adapting Programs for Mixed-Precision Floating-Point Computation. In Proceedings of the 27th International ACM Conference on International Conference on Supercomputing (Eugene, Oregon, USA) (ICS '13). Association for Computing Machinery, New York, NY, USA, 369--378. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Victor Magron, George Constantinides, and Alastair Donaldson. 2017. Certified Roundoff Error Bounds Using Semidefinite Programming. ACM Trans. Math. Softw. 43, 4, Article 34 (jan 2017), 31 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Daniel Menard, Daniel Chillet, François Charot, and Olivier Sentieys. 2002. Automatic Floating-Point to Fixed-Point Conversion for DSP Code Generation. In Proceedings of the 2002 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (Grenoble, France) (CASES '02). Association for Computing Machinery, New York, NY, USA, 270--276. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Harshitha Menon, Michael O. Lam, Daniel Osei-Kuffuor, Markus Schordan, Scott Lloyd, Kathryn Mohror, and Jeffrey Hittinger. 2018. ADAPT: Algorithmic Differentiation Applied to Floating-Point Precision Tuning. In SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. 614--626. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Ricardo Nobre, Luís Reis, João Bispo, Tiago Carvalho, João M.P. Cardoso, Stefano Cherubin, and Giovanni Agosta. 2018. Aspect-Driven Mixed-Precision Tuning Targeting GPUs. In Proceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms (Manchester, United Kingdom) (PARMA-DITAM '18). Association for Computing Machinery, New York, NY, USA, 26--31. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Arjun Pitchanathan, Christian Ulmann, Michel Weber, Torsten Hoefler, and Tobias Grosser. 2021. FPL: Fast Presburger Arithmetic through Transprecision. Proc. ACM Program. Lang. 5, OOPSLA, Article 162 (oct 2021), 26 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Louis-Noël Pouchet, Uday Bondhugula, Cédric Bastoul, Albert Cohen, J. Ramanujam, P. Sadayappan, and Nicolas Vasilache. 2011. Loop Transformations: Convexity, Pruning and Optimization. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Austin, Texas, USA) (POPL'11). Association for Computing Machinery, New York, NY, USA, 549--562. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Louis-Noël Pouchet and Tomofumi Yuki. 2016. PolyBench/C 4.2. https://sourceforge.net/projects/polybenchGoogle ScholarGoogle Scholar
  33. Cindy Rubio-González, Cuong Nguyen, Benjamin Mehne, Koushik Sen, James Demmel, William Kahan, Costin Iancu, Wim Lavrijsen, David H. Bailey, and David Hough. 2016. Floating-Point Precision Tuning Using Blame Analysis. In Proceedings of the 38th International Conference on Software Engineering (Austin, Texas) (ICSE '16). Association for Computing Machinery, New York, NY, USA, 1074--1085. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Cindy Rubio-González, Cuong Nguyen, Hong Diep Nguyen, James Demmel, William Kahan, Koushik Sen, David H. Bailey, Costin Iancu, and David Hough. 2013. Precimonious: Tuning Assistant for FloatingPoint Precision. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (Denver, Colorado) (SC '13). Association for Computing Machinery, New York, NY, USA, Article 27, 12 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Alex Sanchez-Stern, Pavel Panchekha, Sorin Lerner, and Zachary Tatlock. 2018. Finding Root Causes of Floating Point Error. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (Philadelphia, PA, USA) (PLDI 2018). Association for Computing Machinery, New York, NY, USA, 256--269. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Alexey Solovyev, Marek S. Baranowski, Ian Briggs, Charles Jacobsen, Zvonimir Rakamarić, and Ganesh Gopalakrishnan. 2018. Rigorous Estimation of Floating-Point Round-Off Errors with Symbolic Taylor Expansions. ACM Trans. Program. Lang. Syst. 41, 1, Article 2 (dec 2018), 39 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Giuseppe Tagliavini, Stefan Mach, Davide Rossi, Andrea Marongiu, and Luca Benini. 2018. A transprecision floating-point platform for ultra-low power computing. In 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE). 1051--1056. Google ScholarGoogle ScholarCross RefCross Ref
  38. Xiaohan Tao, Yu Zhu, Boyang Wang, Jinlong Xu, Jianmin Pang, and Jie Zhao. 2023. Automatically Generating High-Performance Matrix Multiplication Kernels on the Latest Sunway Processor. In Proceedings of the 51st International Conference on Parallel Processing (Bordeaux, France) (ICPP '22). Association for Computing Machinery, New York, NY, USA, Article 52, 12 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Seoul National University. 2019. SNU NPB 2019 Suite. http://snuvm.snu.ac.kr/software/snu-npb-2019/Google ScholarGoogle Scholar
  40. Anand Venkat, Manu Shantharam, Mary Hall, and Michelle Mills Strout. 2018. Non-Affine Extensions to Polyhedral Code Generation. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization (Orlando, FL, USA) (CGO '14). Association for Computing Machinery, New York, NY, USA, 185--194. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Sven Verdoolaege. 2010. Isl: An Integer Set Library for the Polyhedral Model. In Proceedings of the Third International Congress Conference on Mathematical Software (Kobe, Japan) (ICMS'10). Springer-Verlag, Berlin, Heidelberg, 299--302. Google ScholarGoogle ScholarCross RefCross Ref
  42. Sven Verdoolaege, Juan Carlos Juega, Albert Cohen, José Ignacio Gömez, Christian Tenllado, and Francky Catthoor. 2013. Polyhedral Parallel Code Generation for CUDA. ACM Trans. Archit. Code Optim. 9, 4, Article 54 (Jan. 2013), 23 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Sven Verdoolaege and Tobias Grosser. 2012. Polyhedral extraction tool. In Second International Workshop on Polyhedral Compilation Techniques (IMPACT'12), Paris, France, Vol. 141.Google ScholarGoogle Scholar
  44. Jie Zhao and Peng Di. 2020. Optimizing the Memory Hierarchy by Compositing Automatic Transformations on Computations and Data. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-53) (Vitural Event). IEEE, Piscataway, NJ, USA, 427--441. Google ScholarGoogle ScholarCross RefCross Ref
  45. Jie Zhao, Michael Kruse, and Albert Cohen. 2018. A Polyhedral Compilation Framework for Loops with Dynamic Data-Dependent Bounds. In Proceedings of the 27th International Conference on Compiler Construction (Vienna, Austria) (CC 2018). Association for Computing Machinery, New York, NY, USA, 14--24. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Jie Zhao, Bojie Li, Wang Nie, Zhen Geng, Renwei Zhang, Xiong Gao, Bin Cheng, Chen Wu, Yun Cheng, Zheng Li, Peng Di, Kun Zhang, and Xuefeng Jin. 2021. AKG: Automatic Kernel Generation for Neural Processing Units Using Polyhedral Transformations. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI'21). Association for Computing Machinery, New York, NY, USA, 1233--1248. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. A Holistic Approach to Automatic Mixed-Precision Code Generation and Tuning for Affine Programs

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        PPoPP '24: Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming
        March 2024
        498 pages
        ISBN:9798400704352
        DOI:10.1145/3627535

        Copyright © 2024 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 20 February 2024

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate230of1,014submissions,23%
      • Article Metrics

        • Downloads (Last 12 months)244
        • Downloads (Last 6 weeks)89

        Other Metrics

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader