
Performance modeling of hyper-scale custom machine for the principal steps in block Wiedemann algorithm

The Journal of Supercomputing

Abstract

Solving large-scale sparse linear systems over GF(2) plays a key role in fluid mechanics, simulation and design of materials, petroleum seismic data processing, numerical weather prediction, computational electromagnetics, and numerical simulation of nuclear explosions. Developing algorithms for this problem is therefore a significant research topic. In this paper, we propose a hyper-scale custom supercomputer architecture, tailored to the specific data features of the key procedure of the block Wiedemann algorithm, together with a parallel algorithm for that procedure on the custom machine. To improve computation, communication, and storage performance, four optimization strategies are proposed. We build a performance model to evaluate the execution performance and power consumption of the custom machine. The model shows that the optimization strategies yield a considerable speedup, up to three times faster than the fastest supercomputer, Tianhe-2 (TH-2), while consuming less power.
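The key procedure of the block Wiedemann algorithm is repeated sparse matrix-vector multiplication over GF(2). The following is a minimal sketch, not the paper's implementation: it assumes a sparse row is stored as the list of column indices holding a 1, and packs a block of bit-vectors into one integer per coordinate so a single XOR updates every vector in the block at once (addition over GF(2) is XOR).

```python
def spmv_gf2(rows, v):
    """Multiply a sparse GF(2) matrix by a packed block of bit-vectors.

    rows : list of lists; rows[i] holds the column indices of the 1-entries
           in row i of the sparse matrix.
    v    : list of ints; v[j] packs the j-th coordinate of every vector in
           the block into the bits of one integer.
    Returns y with y[i] = XOR of v[j] over all j in rows[i].
    """
    y = []
    for cols in rows:
        acc = 0
        for j in cols:
            acc ^= v[j]  # GF(2) addition of column j's packed bits
        y.append(acc)
    return y

# Example: the 3x3 GF(2) matrix [[1,1,0],[0,1,1],[1,0,1]] applied to a
# single vector v = (1, 0, 1), packed one bit per coordinate.
rows = [[0, 1], [1, 2], [0, 2]]
v = [1, 0, 1]
print(spmv_gf2(rows, v))  # [1, 1, 0]
```

Iterating this kernel on matrices with billions of nonzeros is what dominates the runtime, which is why the paper's custom architecture and optimization strategies target this step.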



Acknowledgments

This work was funded by the National Natural Science Foundation of China (grant 61303070). We acknowledge the TH-1A supercomputing system service for supporting our simulations, and we thank the reviewers for their helpful comments.

Author information


Corresponding author

Correspondence to Jingfei Jiang.


Cite this article

Zhou, T., Jiang, J. Performance modeling of hyper-scale custom machine for the principal steps in block Wiedemann algorithm. J Supercomput 72, 4181–4203 (2016). https://doi.org/10.1007/s11227-016-1767-y

