
Performance modeling of hyper-scale custom machine for the principal steps in block Wiedemann algorithm

The Journal of Supercomputing

Abstract

Solving large-scale sparse linear systems over GF(2) plays a key role in fluid mechanics, simulation and design of materials, petroleum seismic data processing, numerical weather prediction, computational electromagnetics, and numerical simulation of nuclear explosions. Developing algorithms for this problem is therefore a significant research topic. In this paper, we propose a hyper-scale custom supercomputer architecture, tailored to the specific data features of the key procedure of the block Wiedemann algorithm, together with a parallel algorithm for that procedure on the custom machine. To improve computation, communication, and storage performance, four optimization strategies are proposed. We build a performance model to evaluate the execution performance and power consumption of the custom machine. The model shows that the optimization strategies yield a considerable speedup, up to three times faster than the fastest supercomputer, Tianhe-2 (TH-2), while consuming less power.
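The key procedure of the block Wiedemann algorithm is repeated sparse matrix-vector multiplication over GF(2). The following is a minimal sketch, not the paper's implementation: it assumes a sparse row is stored as the list of column indices holding a 1, and packs a block of bit-vectors into one integer per coordinate so a single XOR updates every vector in the block at once (addition over GF(2) is XOR).

```python
def spmv_gf2(rows, v):
    """Multiply a sparse GF(2) matrix by a packed block of bit-vectors.

    rows : list of lists; rows[i] holds the column indices of the 1-entries
           in row i of the sparse matrix.
    v    : list of ints; v[j] packs the j-th coordinate of every vector in
           the block into the bits of one integer.
    Returns y with y[i] = XOR of v[j] over all j in rows[i].
    """
    y = []
    for cols in rows:
        acc = 0
        for j in cols:
            acc ^= v[j]  # GF(2) addition of column j's packed bits
        y.append(acc)
    return y

# Example: the 3x3 GF(2) matrix [[1,1,0],[0,1,1],[1,0,1]] applied to a
# single vector v = (1, 0, 1), packed one bit per coordinate.
rows = [[0, 1], [1, 2], [0, 2]]
v = [1, 0, 1]
print(spmv_gf2(rows, v))  # [1, 1, 0]
```

Iterating this kernel on matrices with billions of nonzeros is what dominates the runtime, which is why the paper's custom architecture and optimization strategies target this step.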



Acknowledgments

This work was funded by the National Natural Science Foundation of China (grant 61303070). We acknowledge the TH-1A supercomputing system service for supporting our simulations, and we thank the reviewers for their helpful comments.

Author information


Corresponding author

Correspondence to Jingfei Jiang.


Cite this article

Zhou, T., Jiang, J. Performance modeling of hyper-scale custom machine for the principal steps in block Wiedemann algorithm. J Supercomput 72, 4181–4203 (2016). https://doi.org/10.1007/s11227-016-1767-y

