DOI: 10.1145/3314221.3314615
research-article
Public Access
Artifacts Evaluated & Functional

Generating piecewise-regular code from irregular structures

Published: 08 June 2019

ABSTRACT

Irregular data structures, exemplified by sparse matrices, have proved essential in modern computing. Numerous sparse formats have been investigated to improve the overall performance of Sparse Matrix-Vector multiply (SpMV). In this work we instead take a fundamentally different approach: we automatically build sets of regular sub-computations by mining for regular sub-regions in the irregular data structure. Our approach leads to code that is specialized to the sparsity structure of the input matrix but no longer needs any indirection array, thereby improving SIMD vectorizability. We focus in particular on small sparse structures (below 10M nonzeros), and demonstrate substantial performance improvements and compaction capabilities compared to a classical CSR implementation and Intel MKL IE's SpMV implementation, evaluating on 200+ different matrices from the SuiteSparse repository.
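
To make the contrast in the abstract concrete, the following is a minimal illustrative sketch, not the code generator described in the paper. It compares a classical CSR SpMV, where every read of `x` goes through an indirection array, with a hand-specialized version for one fixed sparsity pattern whose nonzeros fall into two regular pieces. The 4x4 matrix, the function names, and the reordering of values into `diag` and `sup` arrays are all assumptions made for illustration.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Baseline CSR SpMV: every read of x is indirect, via col_idx."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def spmv_specialized(diag, sup, x):
    """SpMV hard-wired to one hypothetical 4x4 sparsity pattern.

    The nonzeros sit at (i, i) and (i, i + 1), so each column index is
    an affine function of the loop counter and no index array is read.
    """
    y = [0.0] * 4
    for i in range(4):          # regular piece 1: main diagonal
        y[i] += diag[i] * x[i]
    for i in range(3):          # regular piece 2: superdiagonal
        y[i] += sup[i] * x[i + 1]
    return y


# Hypothetical upper-bidiagonal matrix in CSR form.
row_ptr = [0, 2, 4, 6, 7]
col_idx = [0, 1, 1, 2, 2, 3, 3]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x = [1.0, 1.0, 2.0, 2.0]

# The same values, regrouped so each regular piece is contiguous.
diag = [1.0, 3.0, 5.0, 7.0]
sup = [2.0, 4.0, 6.0]

y_csr = spmv_csr(row_ptr, col_idx, vals, x)
y_spec = spmv_specialized(diag, sup, x)
```

Both functions compute the same product. The specialized loops have fixed trip counts and unit-stride accesses, which is the kind of shape a vectorizing compiler handles well, at the cost of being valid for this one sparsity structure only.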


Supplemental Material

p625-augustine.webm (webm, 93.8 MB)


          Published in

          PLDI 2019: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation
          June 2019
          1162 pages
          ISBN: 9781450367127
          DOI: 10.1145/3314221

          Copyright © 2019 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          Overall Acceptance Rate: 406 of 2,067 submissions, 20%

