Minimal Data Copy for Dense Linear Algebra Factorization

  • Conference paper
Applied Parallel Computing. State of the Art in Scientific Computing (PARA 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4699)

Abstract

The full-format data structures of dense linear algebra hurt the performance of its factorization algorithms. Full-format rectangular matrices are the input and output of the Level 3 BLAS. It follows that the LAPACK and Level 3 BLAS approach has a basic performance flaw. We describe a new result showing that representing a matrix A as a collection of square blocks reduces the amount of data reformatting required by dense linear algebra factorization algorithms from O(n^3) to O(n^2). On an IBM Power3 processor, our implementation of Cholesky factorization achieves 92% of peak performance, whereas conventional full-format LAPACK DPOTRF achieves 77% of peak performance. All programming for our new data structures may be accomplished in standard Fortran, through the use of higher-dimensional full-format arrays. Thus, new compiler support may not be necessary. We also discuss the role of concatenating submatrices to facilitate hardware streaming. Finally, we discuss a new concept, which we call the L1/L0 cache interface.
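
To make the square-block idea concrete, the following is a minimal sketch in pure Python, not the authors' implementation (the abstract states that theirs is written in standard Fortran with higher-dimensional arrays). It copies a full-format matrix into a four-index structure `blocks[I][J][i][j]` once, which is an O(n^2) operation, and then runs a right-looking blocked Cholesky factorization whose kernels operate directly on the blocks without further reformatting. All function names (`to_square_blocks`, `potrf_block`, `trsm_block`, `gemm_nt_block`, `cholesky_blocks`, `lower_from_blocks`) are hypothetical, the block size `nb` is assumed to divide `n`, and the input is assumed to be symmetric positive definite.

```python
import math

def to_square_blocks(a, nb):
    """Copy an n x n full-format matrix (list of rows) into square blocks.

    The result is indexed as blocks[I][J][i][j]; this single O(n^2) copy is
    the only data reformatting performed in this sketch.
    """
    n = len(a)
    assert n % nb == 0, "sketch assumes nb divides n"
    m = n // nb
    return [[[[a[I * nb + i][J * nb + j] for j in range(nb)]
              for i in range(nb)] for J in range(m)] for I in range(m)]

def potrf_block(d):
    """In-place unblocked Cholesky of one diagonal block (lower triangle)."""
    nb = len(d)
    for j in range(nb):
        d[j][j] = math.sqrt(d[j][j] - sum(d[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, nb):
            d[i][j] = (d[i][j] - sum(d[i][k] * d[j][k] for k in range(j))) / d[j][j]

def trsm_block(l, b):
    """In place, overwrite b with X, where X * L^T = b and L is lower(l)."""
    nb = len(l)
    for row in b:
        for j in range(nb):
            row[j] = (row[j] - sum(row[k] * l[j][k] for k in range(j))) / l[j][j]

def gemm_nt_block(c, a, b):
    """In place, c -= a * b^T for square blocks of equal size."""
    nb = len(c)
    for i in range(nb):
        for j in range(nb):
            c[i][j] -= sum(a[i][k] * b[j][k] for k in range(nb))

def cholesky_blocks(blocks):
    """Right-looking blocked Cholesky on the lower block triangle, in place."""
    m = len(blocks)
    for K in range(m):
        potrf_block(blocks[K][K])
        for I in range(K + 1, m):
            trsm_block(blocks[K][K], blocks[I][K])
        for J in range(K + 1, m):
            for I in range(J, m):
                gemm_nt_block(blocks[I][J], blocks[I][K], blocks[J][K])

def lower_from_blocks(blocks):
    """Copy the lower triangle back to full format (zeros above the diagonal)."""
    m, nb = len(blocks), len(blocks[0][0])
    n = m * nb
    return [[blocks[r // nb][c // nb][r % nb][c % nb] if c <= r else 0.0
             for c in range(n)] for r in range(n)]

# Example: A = L * L^T for a known lower-triangular L.
A = [[4.0, 2.0, 0.0, 2.0],
     [2.0, 10.0, 3.0, 1.0],
     [0.0, 3.0, 5.0, 2.0],
     [2.0, 1.0, 2.0, 3.0]]
blocks = to_square_blocks(A, 2)
cholesky_blocks(blocks)
L = lower_from_blocks(blocks)
```

Each block here is a separate small object, which loosely mirrors the property the abstract relies on: a kernel can be handed a whole block as its operand. In a compiled language the same layout would typically be a single contiguous higher-dimensional array; that detail is an assumption of this sketch rather than something the abstract spells out.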

Editor information

Bo Kågström Erik Elmroth Jack Dongarra Jerzy Waśniewski

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gustavson, F.G., Gunnels, J.A., Sexton, J.C. (2007). Minimal Data Copy for Dense Linear Algebra Factorization. In: Kågström, B., Elmroth, E., Dongarra, J., Waśniewski, J. (eds) Applied Parallel Computing. State of the Art in Scientific Computing. PARA 2006. Lecture Notes in Computer Science, vol 4699. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75755-9_66

  • DOI: https://doi.org/10.1007/978-3-540-75755-9_66

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75754-2

  • Online ISBN: 978-3-540-75755-9

  • eBook Packages: Computer Science, Computer Science (R0)