
A Dynamic Parameter Tuning Method for High Performance SpMM

  • Conference paper
Parallel and Distributed Computing, Applications and Technologies (PDCAT 2020)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 12606))

Abstract

Sparse matrix-matrix multiplication (SpMM) is a basic kernel used by many algorithms. Many studies have focused on various optimizations for parallel SpMM execution. However, how a task is divided for parallelization has not yet been well considered. Generally, a matrix is divided equally into blocks across processes even though the sparsities of the input matrices differ, because the parameter that divides a task among processes is fixed. As a result, load imbalance occurs among the processes. To balance the loads among the processes, this paper proposes a dynamic parameter tuning method that analyzes the sparsities of the input matrices. The experimental results show that the proposed method improves the performance of SpMM for the examined matrices by up to 39.5%, and by 12.3% on average.
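To illustrate the load-imbalance problem the abstract describes, the sketch below partitions the rows of a sparse matrix among processes by nonzero count rather than by an equal row split. This is a minimal illustration of the general idea only, not the authors' actual tuning method; the function name and the greedy prefix-sum strategy are assumptions for the example.

```python
def balanced_row_blocks(row_nnz, num_procs):
    """Split row indices into at most num_procs contiguous blocks whose
    per-block nonzero totals are roughly equal (greedy prefix-sum split).

    row_nnz  : list of nonzeros per row (e.g. diffs of a CSR indptr array)
    num_procs: number of parallel processes
    Returns a list of (start, end) half-open row ranges.
    """
    total = sum(row_nnz)
    target = total / num_procs  # ideal share of work per process
    blocks, start, acc, cut = [], 0, 0, 1
    for i, nnz in enumerate(row_nnz):
        acc += nnz
        # Close the current block once the running nnz total reaches the
        # next cut point, keeping at least one block for the remainder.
        if acc >= cut * target and len(blocks) < num_procs - 1:
            blocks.append((start, i + 1))
            start = i + 1
            cut += 1
    blocks.append((start, len(row_nnz)))
    return blocks

# One dense row dominates: an equal-row split would give the first
# process almost all of the work, while an nnz-aware split isolates it.
print(balanced_row_blocks([100, 1, 1, 1, 1, 1, 1, 1], 2))
```

With the skewed input above, an equal split into rows 0-3 and 4-7 would assign 103 nonzeros to one process and 4 to the other, whereas the nnz-aware split yields `[(0, 1), (1, 8)]`, i.e. 100 versus 7.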



Acknowledgment

This research was partially supported by MEXT Next Generation High Performance Computing Infrastructures and Applications R&D Program, entitled “R&D of A Quantum-Annealing-Assisted Next Generation HPC Infrastructure and its Applications.”

Author information


Corresponding author

Correspondence to Bin Qi.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Qi, B., Komatsu, K., Sato, M., Kobayashi, H. (2021). A Dynamic Parameter Tuning Method for High Performance SpMM. In: Zhang, Y., Xu, Y., Tian, H. (eds) Parallel and Distributed Computing, Applications and Technologies. PDCAT 2020. Lecture Notes in Computer Science(), vol 12606. Springer, Cham. https://doi.org/10.1007/978-3-030-69244-5_28


  • DOI: https://doi.org/10.1007/978-3-030-69244-5_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-69243-8

  • Online ISBN: 978-3-030-69244-5

  • eBook Packages: Computer Science, Computer Science (R0)
