Abstract
Autotuning has recently emerged as a systematic process for evaluating alternative implementations of a computation in order to select the best-performing one for a particular architecture. Specialization customizes code to a particular class of input data. This paper presents a compiler optimization approach that combines novel autotuning compiler technology with specialization for the expected data-set sizes of key computations, focusing on matrix multiplication of small matrices. We describe the compiler techniques developed for this approach, including the interface to a polyhedral transformation system for generating specialized code and the heuristics used to prune the enormous search space of alternative implementations. We demonstrate significantly better performance than direct use of libraries such as GOTO, ATLAS, and ACML BLAS, which are not specifically optimized for the problem sizes at hand. In a case study of Nek5000, a spectral-element-based code that makes extensive use of the specialized matrix multiply, we demonstrate a performance improvement of 36% for the full application.
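To illustrate the specialization idea the abstract describes, the following is a minimal, hypothetical sketch (in Python, not the chapter's actual CHiLL-generated C/Fortran): when the matrix size is known at code-generation time, a fully unrolled, straight-line kernel can be emitted for that exact size, whereas a generic library routine must handle arbitrary sizes. The function names `make_specialized_matmul` and `matmul_generic` are illustrative inventions, not identifiers from the paper.

```python
def make_specialized_matmul(n):
    """Generate a matrix-multiply kernel specialized to a fixed size n.

    All loop bounds are compile-time constants, so the generated code is
    straight-line: every multiply-add is written out explicitly, which is
    the kind of opportunity size specialization exposes to an optimizer.
    """
    lines = ["def matmul(A, B, C):"]
    for i in range(n):
        for j in range(n):
            terms = " + ".join(f"A[{i}][{k}]*B[{k}][{j}]" for k in range(n))
            lines.append(f"    C[{i}][{j}] = {terms}")
    env = {}
    exec("\n".join(lines), env)  # compile the generated source
    return env["matmul"]

def matmul_generic(A, B, C, n):
    """Generic triple-loop multiply: handles any n, optimized for none."""
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += A[i][k] * B[k][j]
            C[i][j] = s
```

An autotuner in this spirit would generate many such variants (different unroll factors, tilings, etc.) for the expected small sizes, time each on the target machine, and keep the fastest.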
Acknowledgments
This work was supported in part by the Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357.
Copyright information
© 2011 Springer New York
Cite this chapter
Shin, J., Hall, M.W., Chame, J., Chen, C., Hovland, P.D. (2011). Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology. In: Naono, K., Teranishi, K., Cavazos, J., Suda, R. (eds) Software Automatic Tuning. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6935-4_20
Print ISBN: 978-1-4419-6934-7
Online ISBN: 978-1-4419-6935-4