Benchmarks of Cuda-Based GMRES Solver for Toeplitz and Hankel Matrices and Applications to Topology Optimization of Photonic Components

Computational Mathematics and Modeling

The Generalized Minimal Residual Method (GMRES) has been benchmarked on many types of GPUs for solving linear systems with dense and sparse matrices. However, there are still no benchmarks of GMRES implementations that compare the Tesla V100 with the GTX 1080 Ti, or that target Toeplitz-like matrices. The software introduced here consists of a Python module and a C++ library, which make it possible to manage streams for the concurrent solution of separate linear systems on one or several GPUs. The GMRES solver is parallelized to run on an NVIDIA GPGPU accelerator. The parallelization efficiency is explored when GMRES is applied to linear systems that arise from the Helmholtz equation via the Green’s Function Integral Equation Method (GFIEM), which is used to compute the electric field distribution in the design domain. The proposed implementation showed maximal speedups of 55 (\( \overline{t}=0.017\ \mathrm{s} \)) and 125 (\( \overline{t}=0.77\ \mathrm{s} \)) for 1024 × 1024 (on GTX 1080 Ti) and 8192 × 8192 (on Tesla V100) dense Toeplitz matrices generated from GFIEM, respectively. The 1024 × 1024 resolution gives an accuracy of 6.1%, which tests and demonstrations on gradient computations and topology optimization suggest can be acceptable. We open up possibilities for robust topology optimization of passive photonic integrated components. One advantage is, for example, faster and more accurate design of photonic components on a PC, without a supercomputer.
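As an illustration of why Toeplitz structure matters for an iterative solver such as GMRES, the sketch below shows one standard way to apply a dense Toeplitz matrix to a vector in O(n log n) operations: embed the matrix in a larger circulant matrix and multiply through FFTs. The abstract does not state that the authors' solver works exactly this way, so this is an assumption-laden sketch rather than their implementation. The function names (`fft`, `toeplitz_matvec_fft`, `toeplitz_matvec_dense`) are hypothetical, and the code is plain CPU Python with no GPU code.

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT. len(a) must be a power of two.

    The inverse transform is returned unnormalized (caller divides by n).
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def toeplitz_matvec_fft(col, row, x):
    """Return T @ x, where T[i][j] = col[i-j] if i >= j else row[j-i].

    col is the first column of T, row is the first row (col[0] == row[0]).
    """
    n = len(x)
    m = 1
    while m < 2 * n:  # power of two, large enough for the embedding
        m *= 2
    # First column of the m x m circulant whose top-left n x n block is T.
    c = [0j] * m
    for k in range(n):
        c[k] = col[k]
    for k in range(1, n):
        c[m - k] = row[k]
    x_padded = list(x) + [0j] * (m - n)
    # A circulant matvec is a circular convolution, i.e. a pointwise
    # product in Fourier space.
    fc = fft(c)
    fx = fft(x_padded)
    y = fft([p * q for p, q in zip(fc, fx)], inverse=True)
    return [v / m for v in y[:n]]


def toeplitz_matvec_dense(col, row, x):
    """Reference O(n^2) product, used only to check the FFT version."""
    n = len(x)
    return [
        sum((col[i - j] if i >= j else row[j - i]) * x[j] for j in range(n))
        for i in range(n)
    ]
```

Each GMRES iteration needs one matrix-vector product, so a routine like this would replace the dense product inside the iteration. Only the first column and first row of the matrix are stored, which brings memory down from n² to about 2n entries.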




Cite this article

Minin, I.B., Matveev, S.A., Fedorov, M.V. et al. Benchmarks of Cuda-Based GMRES Solver for Toeplitz and Hankel Matrices and Applications to Topology Optimization of Photonic Components. Comput Math Model 32, 438–452 (2021). https://doi.org/10.1007/s10598-022-09545-2


  • DOI: https://doi.org/10.1007/s10598-022-09545-2
