Benchmarks of Cuda-Based GMRES Solver for Toeplitz and Hankel Matrices and Applications to Topology Optimization of Photonic Components

Minin, Iu. B.; Matveev, S. A.; Fedorov, M. V.; Zacharov, I. E.; Rykovanov, S. G.

doi:10.1007/s10598-022-09545-2

Published: 05 May 2022

Volume 32, pages 438–452 (2021)

Computational Mathematics and Modeling

Iu. B. Minin^1,2, S. A. Matveev^1,3,4, M. V. Fedorov^1,5, I. E. Zacharov^1 & S. G. Rykovanov^1


The Generalized Minimal Residual Method (GMRES) has been benchmarked on many types of GPUs for solving linear systems with dense and sparse matrices. However, there are still no benchmarks of GMRES implementations that compare the Tesla V100 with the GTX 1080 Ti, or that treat Toeplitz-like matrices at all. The software introduced here consists of a Python module and a C++ library that make it possible to manage streams for the concurrent solution of separate linear systems on one or several GPUs. The GMRES solver is parallelized to run on an NVIDIA GPGPU accelerator. The parallelization efficiency is explored by applying GMRES to the linear systems that arise from the Helmholtz equation when the Green’s Function Integral Equation Method (GFIEM) is used to compute the electric field distribution in the design domain. The proposed implementation showed a maximal speedup of 55 (\( \overline{t}=0.017\ \mathrm{s} \)) for 1024 × 1024 dense Toeplitz matrices on the GTX 1080 Ti and of 125 (\( \overline{t}=0.77\ \mathrm{s} \)) for 8192 × 8192 dense Toeplitz matrices on the Tesla V100, with the matrices generated from GFIEM. The 1024 × 1024 resolution provides an accuracy of 6.1%, which can be acceptable according to tests and demonstrations on gradient computations and topology optimization. This opens up possibilities for robust topology optimization of passive photonic integrated components, with the advantage, e.g., of faster and more accurate design of photonic components on a PC without a supercomputer.
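For readers unfamiliar with why Toeplitz structure matters for an iterative solver such as GMRES, the following is a minimal sketch of one common technique: embedding the Toeplitz matrix in a circulant one so that each matrix-vector product costs O(n log n) via the FFT instead of O(n²). The abstract does not state that the authors' library works this way, so this is an illustration under that assumption only. The function names `fft`, `toeplitz_matvec`, and `dense_matvec` are hypothetical and are not taken from pycuGMRES or cuGMRES.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT. len(x) must be a power of two.

    With inverse=True the transform is NOT normalized; the caller
    divides by len(x).
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def toeplitz_matvec(col, row, v):
    """Return T @ v for the n x n Toeplitz matrix T with
    T[i][j] = col[i - j] if i >= j else row[j - i]  (col[0] == row[0]).
    """
    n = len(v)
    # Circulant size: smallest power of two that is at least 2n.
    m = 1
    while m < 2 * n:
        m *= 2
    # First column of the circulant embedding:
    # col, zero padding, then row[n-1], ..., row[1].
    c = list(col) + [0] * (m - 2 * n + 1) + list(reversed(row[1:]))
    padded = list(v) + [0] * (m - n)
    # A circulant matvec is a circular convolution, i.e. a pointwise
    # product in Fourier space.
    fc = fft(c)
    fv = fft(padded)
    prod = fft([a * b for a, b in zip(fc, fv)], inverse=True)
    # The first n entries of the circulant product equal T @ v.
    return [p / m for p in prod[:n]]


def dense_matvec(col, row, v):
    """Reference O(n^2) product, used only to check the fast version."""
    n = len(v)
    return [
        sum((col[i - j] if i >= j else row[j - i]) * v[j] for j in range(n))
        for i in range(n)
    ]
```

Calling `toeplitz_matvec(col, row, v)` and `dense_matvec(col, row, v)` on the same inputs should agree up to floating-point rounding; the fast version returns complex numbers whose imaginary parts are rounding noise when the inputs are real. In a Krylov solver, only this product touches the matrix, so the matrix never has to be stored densely.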

Author information

Authors and Affiliations

Skoltech Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Moscow, Russia
Iu. B. Minin, S. A. Matveev, M. V. Fedorov, I. E. Zacharov & S. G. Rykovanov
Fryazino Branch of Kotel’nikov Institute of Radio-Engineering and Electronics of Russian Academy of Sciences, Fryazino, Moscow, Russia
Iu. B. Minin
Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Moscow, Russia
S. A. Matveev
Marchuk Institute of Numerical Mathematics, Russian Academy of Sciences, Moscow, Russia
S. A. Matveev
Sirius University of Science and Technology, Sochi, Krasnodar Krai, Russia
M. V. Fedorov

About this article

Cite this article

Minin, I.B., Matveev, S.A., Fedorov, M.V. et al. Benchmarks of Cuda-Based GMRES Solver for Toeplitz and Hankel Matrices and Applications to Topology Optimization of Photonic Components. Comput Math Model 32, 438–452 (2021). https://doi.org/10.1007/s10598-022-09545-2

Issue Date: October 2021
