ABSTRACT
Irregular data structures, as exemplified by sparse matrices, have proved essential in modern computing. Numerous sparse formats have been investigated to improve the overall performance of sparse matrix-vector multiplication (SpMV). In this work we instead take a fundamentally different approach: we automatically build sets of regular sub-computations by mining the irregular data structure for regular sub-regions. Our approach leads to code that is specialized to the sparsity structure of the input matrix but no longer needs any indirection arrays, thereby improving SIMD vectorizability. We particularly focus on small sparse structures (below 10M nonzeros), and demonstrate substantial performance improvements and compaction capabilities compared to a classical CSR implementation and Intel MKL IE's SpMV implementation, evaluating on 200+ different matrices from the SuiteSparse repository.
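To make the contrast concrete, the sketch below shows a classical CSR SpMV next to a hand-written piecewise-regular variant for one hypothetical sparsity structure (a matrix whose nonzeros lie on the main diagonal and the first superdiagonal). This is an illustration of the general idea only, not the paper's code generator: the function names and the chosen structure are assumptions for the example.

```c
#include <assert.h>

/* Classical CSR SpMV: the inner loop reads x through the indirection
 * array col[], which makes the access pattern opaque to the compiler
 * and hinders SIMD vectorization. */
static void spmv_csr(int n, const int *rowptr, const int *col,
                     const double *val, const double *x, double *y) {
    for (int i = 0; i < n; i++) {
        double s = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            s += val[k] * x[col[k]];
        y[i] = s;
    }
}

/* Piecewise-regular variant for a bidiagonal sparsity structure
 * (hypothetical example): each mined regular region becomes a plain
 * affine loop with no indirection, so the compiler can vectorize it
 * directly. */
static void spmv_bidiag(int n, const double *diag, const double *super,
                        const double *x, double *y) {
    for (int i = 0; i < n; i++)        /* regular region 1: diagonal */
        y[i] = diag[i] * x[i];
    for (int i = 0; i < n - 1; i++)    /* regular region 2: superdiagonal */
        y[i] += super[i] * x[i + 1];
}
```

Both routines compute the same product; the second trades generality for regular, indirection-free loops specialized to one matrix's structure, which is the trade-off the paper automates.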