
1 Introduction

Boundary integral equations are a valuable tool for the description of natural phenomena, including wave scattering problems. Their numerical counterpart, the boundary element method (BEM), has become an alternative to volume-based approaches such as the finite element or finite volume methods. Besides the natural reduction of the problem to the skeleton of the computational domain, BEM has found its place in HPC implementations of PDE solvers. One of the reasons is the high computational intensity of system matrix assembly, which fits well with the design of today's HPC clusters, where memory accesses are much more costly than arithmetic operations. To overcome the quadratic complexity of BEM, several methods have been proposed, including the fast multipole method [4, 15] and the adaptive cross approximation (ACA) [2, 16, 17], the latter of which is utilized in this paper.

To describe wave scattering in a composite scatterer and its complement we utilize the local version of the multi-trace formulation (MTF) as introduced in [5,6,7] and presented briefly in Sect. 2. The formulation leads to a block matrix with individual boundary integral operators for every homogeneous subdomain and coupling matrices on the off-diagonals. Although not described in detail in this paper, MTF allows for natural operator preconditioning. In Sect. 3 we propose a strategy to parallelize the assembly of the MTF matrix blocks and their application in an iterative solver, based on the approach presented in [11,12,13] for single-domain problems. In addition to the distributed parallelism, the method takes full advantage of the BEM4I library [14, 20, 21] and its assemblers parallelized in shared memory and vectorized using OpenMP. We provide the results of numerical experiments in Sect. 4.

2 Multi-trace Formulation

In the following we consider a partitioning of the space into \(n+1\) Lipschitz domains

$$\begin{aligned} {\mathbb {R}}^3 = \bigcup _{i=0}^n \overline{\varOmega _i},\quad \varGamma _i := \partial \varOmega _i, \quad \varSigma := \bigcup _{i=0}^n \varGamma _i \end{aligned}$$

with \(\varOmega _0\) denoting the background unbounded domain, \(\bigcup _{i=1}^n \varOmega _i\) representing a composite scatterer, and the skeleton \(\varSigma \). For a given incident field \({u_\mathrm {inc}}\) satisfying \(-\varDelta {u_\mathrm {inc}}- \kappa _0^2 {u_\mathrm {inc}}= 0\) in \({\mathbb {R}}^3\), we aim to solve the equation for \(u={u_\mathrm {sc}}+{u_\mathrm {inc}}\),

$$\begin{aligned} -\varDelta u - \kappa _i^2 u = 0 \text { in } \varOmega _i,\quad u-{u_\mathrm {inc}}\text { satisfies the Sommerfeld condition} \end{aligned}$$
(2.1)

with the transmission conditions

$$\begin{aligned} u_i - u_j = 0, \quad \mu _i^{-1} t_i + \mu _j^{-1} t_j = 0 \quad \text {on } \varGamma _{i,j} := \varGamma _i \cap \varGamma _j \end{aligned}$$
(2.2)

with \(u_i\), \(t_i\) denoting the Dirichlet and Neumann traces of \(u|_{\varOmega _i}\). Note that the normal vector \({{\varvec{n}}}_i\) is always directed outside of \(\varOmega _i\), see Fig. 1.

Fig. 1. Notation and setting for the scattering problem.

Due to (2.1) and \({u_\mathrm {inc}}\) satisfying the Helmholtz equation with \(\kappa _0\) in \({\mathbb {R}}^3\backslash \overline{\varOmega _0}\) we have \(u = \widetilde{V}_{\kappa _0} t_0 - W_{\kappa _0} u_0 + {u_\mathrm {inc}}\) in \(\varOmega _0\) and \(u = \widetilde{V}_{\kappa _i} t_i - W_{\kappa _i} u_i\) in \(\varOmega _i, \,i>0\) [16, 18] with the single- and double-layer potentials

$$\begin{aligned} \widetilde{V}_{\kappa _i}&:H^{-1/2}(\varGamma _i) \rightarrow H^{1}_{\mathrm {loc}}(\varOmega _i),&\widetilde{V}_{\kappa _i} t_i({{\varvec{x}}})&:= \int _{\varGamma _i} v_{\kappa _i}({{\varvec{x}}},{{\varvec{y}}}) t_i({{\varvec{y}}}) \,{\mathrm {d}}{{\varvec{s}}}_{{\varvec{y}}}, \\ W_{\kappa _i}&:H^{1/2}(\varGamma _i) \rightarrow H^{1}_{\mathrm {loc}}(\varOmega _i),&W_{\kappa _i}u_i({{\varvec{x}}})&:= \int _{\varGamma _i} \frac{\partial v_{\kappa _i}}{\partial {{\varvec{n}}}_{i,{{\varvec{y}}}}}({{\varvec{x}}},{{\varvec{y}}}) u_i({{\varvec{y}}}) \,{\mathrm {d}}{{\varvec{s}}}_{{\varvec{y}}}, \end{aligned}$$

respectively, and the fundamental solution \(v_{\kappa }({{\varvec{x}}},{{\varvec{y}}}) = \exp ({\mathrm {i}}\kappa \vert {{\varvec{x}}}-{{\varvec{y}}}\vert )/(4\pi \vert {{\varvec{x}}}-{{\varvec{y}}}\vert )\). Observing that
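As a minimal illustration (the function name and interface are ours, not part of BEM4I), the kernel above can be evaluated directly from its definition; this evaluation lies at the core of the quadrature routines used in the assembly of the boundary integral operators.

```cpp
#include <cmath>
#include <complex>

// Helmholtz fundamental solution v_kappa(x, y) = exp(i*kappa*|x - y|) / (4*pi*|x - y|).
// x and y are 3D points, kappa is the wave number of the respective subdomain.
std::complex<double> fundamentalSolution(const double x[3], const double y[3],
                                         double kappa) {
  const double pi = 3.14159265358979323846;
  const double dx = x[0] - y[0];
  const double dy = x[1] - y[1];
  const double dz = x[2] - y[2];
  const double r = std::sqrt(dx * dx + dy * dy + dz * dz);
  const std::complex<double> i(0.0, 1.0);
  return std::exp(i * kappa * r) / (4.0 * pi * r);
}
```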

$$\begin{aligned} \begin{bmatrix} -K_{\kappa _0} &{} V_{\kappa _0} \\ D_{\kappa _0} &{} K^*_{\kappa _0} \end{bmatrix} \begin{bmatrix} {u_\mathrm {inc}}\\ {t_\mathrm {inc}}\end{bmatrix} = -\frac{1}{2} \begin{bmatrix} {u_\mathrm {inc}}\\ {t_\mathrm {inc}}\end{bmatrix} \quad \text {on }\varGamma _0 \end{aligned}$$

results in the relation between the traces of \(u|_{\varOmega _i}\)

$$\begin{aligned} \begin{bmatrix} -K_{\kappa _i} &{} V_{\kappa _i} \\ D_{\kappa _i} &{} K^*_{\kappa _i} \end{bmatrix} \begin{bmatrix} u_i \\ t_i \end{bmatrix} - \frac{1}{2} \begin{bmatrix} u_i \\ t_i \end{bmatrix} = \begin{bmatrix} f_i \\ g_i \end{bmatrix} \text { on }\varGamma _i, \ \begin{bmatrix} f_0 \\ g_0 \end{bmatrix} := - \begin{bmatrix} {u_\mathrm {inc}}\\ {t_\mathrm {inc}}\end{bmatrix}, \begin{bmatrix} f_i \\ g_i \end{bmatrix} := 0 \text { for } i > 0, \end{aligned}$$
(2.3)

with the boundary integral operators defined as the composition of the trace operators \({\gamma _i^\mathrm {D}}:H^1_\mathrm {loc}(\varOmega _i) \rightarrow H^{1/2}(\varGamma _i)\), \({\gamma _i^\mathrm {N}}:H^1_\mathrm {loc}(\varOmega _i) \rightarrow H^{-1/2}(\varGamma _i)\) and the potentials, i.e.

$$\begin{aligned} V_{\kappa _i} := {\gamma _i^\mathrm {D}}\widetilde{V}_{\kappa _i}, \ -\frac{1}{2} I + K_{\kappa _i} := {\gamma _i^\mathrm {D}}W_{\kappa _i}, \ \frac{1}{2} I + K_{\kappa _i}^* := {\gamma _i^\mathrm {N}}\widetilde{V}_{\kappa _i}, \ D_{\kappa _i} := -{\gamma _i^\mathrm {N}}W_{\kappa _i}. \end{aligned}$$

The idea of MTF is to replace the identity part of (2.3) by the transmission conditions (2.2). To this end we define the operators

$$\begin{aligned} X_{i,j} u_i&:= {\left\{ \begin{array}{ll} \frac{1}{2} u_i|_{\varGamma _{i,j}} &{} \text {on } \varGamma _{i,j}, \\ 0 &{} \text {otherwise}, \end{array}\right. }&Y_{i,j} t_i&:= {\left\{ \begin{array}{ll} \frac{1}{2} \frac{\mu _j}{\mu _i} t_i|_{\varGamma _{i,j}} &{} \text {on } \varGamma _{i,j}, \\ 0 &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

The variational formulation for the situation in Fig. 1 thus reads: find \((u_i,t_i)\) such that for all \((v_i,s_i)\) and all \(i\)

$$\begin{aligned} \langle -K_{\kappa _i} u_i + V_{\kappa _i} t_i, s_i \rangle _{\varGamma _i} - \sum _{j\ne i} \langle X_{j,i} u_j, s_i \rangle _{\varGamma _i}&= \langle f_i, s_i \rangle _{\varGamma _i}, \\ \langle D_{\kappa _i} u_i + K^*_{\kappa _i} t_i, v_i \rangle _{\varGamma _i} + \sum _{j\ne i} \langle Y_{j,i} t_j, v_i \rangle _{\varGamma _i}&= \langle g_i, v_i \rangle _{\varGamma _i} \end{aligned}$$

with

$$\begin{aligned} (u_i,t_i)&\in H^{1/2}(\varGamma _i) \times H^{-1/2}(\varGamma _i),&(v_i,s_i)&\in \widetilde{H}^{1/2}_{\mathrm {pw}}(\varGamma _i) \times \widetilde{H}^{-1/2}_{\mathrm {pw}}(\varGamma _i). \end{aligned}$$

For a definition of the above Sobolev spaces on manifolds we refer the interested reader to [1, 7, 18, 19] and references therein.

To discretize the system we decompose the skeleton \(\varSigma \) into plane triangles \(\tau _k\) and define the discrete function space \({{\,\mathrm{span}\,}}(\varphi _\ell )\) of globally continuous piecewise linear functions to approximate all the above spaces. The discrete system thus reads

$$\begin{aligned} \begin{bmatrix} -{{\mathsf {K}}}_{\kappa _0,h} &{} {{\mathsf {V}}}_{\kappa _0,h} &{} -{{\mathsf {M}}}_{1,0,h} &{} {{\mathsf {O}}}&{} -{{\mathsf {M}}}_{2,0,h} &{} {{\mathsf {O}}}\\ {{\mathsf {D}}}_{\kappa _0,h} &{} {{\mathsf {K}}}^\top _{\kappa _0,h} &{} {{\mathsf {O}}}&{} \frac{\mu _0}{\mu _1}{{\mathsf {M}}}_{1,0,h} &{} {{\mathsf {O}}}&{} \frac{\mu _0}{\mu _2}{{\mathsf {M}}}_{2,0,h} \\ -{{\mathsf {M}}}_{0,1,h} &{} {{\mathsf {O}}}&{} -{{\mathsf {K}}}_{\kappa _1,h} &{} {{\mathsf {V}}}_{\kappa _1,h} &{} -{{\mathsf {M}}}_{2,1,h} &{} {{\mathsf {O}}}\\ {{\mathsf {O}}}&{} \frac{\mu _1}{\mu _0}{{\mathsf {M}}}_{0,1,h} &{} {{\mathsf {D}}}_{\kappa _1,h} &{} {{\mathsf {K}}}^\top _{\kappa _1,h} &{} {{\mathsf {O}}}&{} \frac{\mu _1}{\mu _2}{{\mathsf {M}}}_{2,1,h} \\ -{{\mathsf {M}}}_{0,2,h} &{} {{\mathsf {O}}}&{} -{{\mathsf {M}}}_{1,2,h} &{} {{\mathsf {O}}}&{} -{{\mathsf {K}}}_{\kappa _2,h} &{} {{\mathsf {V}}}_{\kappa _2,h} \\ {{\mathsf {O}}}&{} \frac{\mu _2}{\mu _0}{{\mathsf {M}}}_{0,2,h} &{} {{\mathsf {O}}}&{} \frac{\mu _2}{\mu _1}{{\mathsf {M}}}_{1,2,h} &{} {{\mathsf {D}}}_{\kappa _2,h} &{} {{\mathsf {K}}}^\top _{\kappa _2,h} \end{bmatrix} \begin{bmatrix} {{\varvec{u}}}_0 \\ {{\varvec{t}}}_0 \\ {{\varvec{u}}}_1 \\ {{\varvec{t}}}_1 \\ {{\varvec{u}}}_2 \\ {{\varvec{t}}}_2 \end{bmatrix} = - \begin{bmatrix} {{\mathsf {M}}}_{0,h} {{\varvec{u}}}_\mathrm {inc} \\ {{\mathsf {M}}}_{0,h} {{\varvec{t}}}_\mathrm {inc} \\ {{\varvec{0}}}\\ {{\varvec{0}}}\\ {{\varvec{0}}}\\ {{\varvec{0}}}\end{bmatrix} \end{aligned}$$

with the matrices

$$\begin{aligned} \begin{aligned} {{\mathsf {V}}}_{\kappa _i,h}[k,\ell ]&:= \langle V_{\kappa _i} \varphi _\ell , \varphi _k \rangle _{\varGamma _i},&{{\mathsf {K}}}_{\kappa _i,h}[k,\ell ]&:= \langle K_{\kappa _i} \varphi _\ell , \varphi _k \rangle _{\varGamma _i}, \\ {{\mathsf {D}}}_{\kappa _i,h}[k,\ell ]&:= \langle D_{\kappa _i} \varphi _\ell , \varphi _k \rangle _{\varGamma _i}, \\ {{\mathsf {M}}}_{i,j,h}[k,\ell ]&:= \frac{1}{2} \langle \varphi _\ell , \varphi _k \rangle _{\varGamma _{i,j}},&{{\mathsf {M}}}_{0,h}[k,\ell ]&:= \langle \varphi _\ell , \varphi _k \rangle _{\varGamma _0} \end{aligned} \end{aligned}$$
(2.4)

and the duality pairings

$$\begin{aligned} \langle u_i, t_i \rangle _{\varGamma _i}&= \int _{\varGamma _i} u_i({{\varvec{x}}}) t_i({{\varvec{x}}}) \,{\mathrm {d}}{{\varvec{s}}}_{{\varvec{x}}},&\langle u_i, t_j \rangle _{\varGamma _{i,j}}&= \int _{\varGamma _{i,j}} u_i({{\varvec{x}}}) t_j({{\varvec{x}}}) \,{\mathrm {d}}{{\varvec{s}}}_{{\varvec{x}}}. \end{aligned}$$

The local indices \(k,\ell \) in (2.4) run over all globally defined basis functions supported on the respective boundaries \(\varGamma _i\) or interfaces \(\varGamma _{i,j}\).
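Within the iterative solver, the discrete MTF system is applied block by block. The following schematic C++ sketch shows the structure of such a block matrix-vector product; the BlockApply interface and the function names are hypothetical stand-ins for the dense, ACA-compressed, or sparse mass-matrix blocks, not the actual BEM4I API.

```cpp
#include <complex>
#include <functional>
#include <vector>

using Vec = std::vector<std::complex<double>>;
// A block is anything that can add its contribution to a vector (dense, ACA,
// or sparse mass matrix); empty std::function objects represent zero blocks O.
using BlockApply = std::function<void(const Vec& x, Vec& y)>;  // y += A_ij * x

// Applies a block operator given as a grid of (possibly empty) blocks,
// matching the 6 x 6 structure of the discrete MTF system above. The vectors
// x[j] and y[i] are assumed to be allocated with matching sizes.
void applyBlockOperator(const std::vector<std::vector<BlockApply>>& blocks,
                        const std::vector<Vec>& x, std::vector<Vec>& y) {
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    for (auto& yi : y[i]) yi = 0.0;
    for (std::size_t j = 0; j < blocks[i].size(); ++j) {
      if (blocks[i][j]) blocks[i][j](x[j], y[i]);
    }
  }
}
```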

3 Parallel ACA

The matrices produced by the classical BEM are usually dense. Although the number of degrees of freedom is smaller than in volume-based methods (e.g., the finite element method), one of the so-called fast BEM techniques has to be applied in order to solve large-scale problems. These methods are usually based on a hierarchical clustering of the underlying mesh and on approximations of matrix blocks corresponding to pairs of sufficiently separated clusters. Here we use the adaptive cross approximation (ACA) since it is a purely algebraic approach and, once properly implemented, it can be used for various kinds of problems. Other approaches include the fast multipole method [9, 15] or wavelet compression [8].

After the hierarchical clustering, ACA proceeds by approximating sub-matrices corresponding to pairs of well-separated (admissible) clusters by products of two low-rank factors. Blocks corresponding to non-admissible cluster pairs are assembled as dense matrices. Due to the limited scope of this paper, we refer the reader to [3, 16] for more details. The parallelization of the method based on the cyclic graph decomposition was presented in [13], where only certain special numbers of processes were supported. In [12] we further extended the approach to a general number of processes. In the following we recollect its basic principle and extend it to the solution of MTF systems.
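For completeness, the following sketch outlines ACA with partial pivoting for a single admissible block. The entry callback evaluating individual matrix entries is a hypothetical interface (in practice a Galerkin integral of V, K, or D associated with a pair of basis functions), and the stopping criterion uses the standard incremental estimate of the Frobenius norm of the approximation, cf. [3, 16].

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <functional>
#include <vector>

using Complex = std::complex<double>;
using Vector = std::vector<Complex>;

// ACA with partial pivoting for one admissible m x n block A. The block is
// available only implicitly via entry(i, j), which evaluates a single matrix
// entry. On return, A is approximated by sum_k U[k] * V[k]^T up to eps.
void acaPartialPivoting(int m, int n,
                        const std::function<Complex(int, int)>& entry,
                        double eps, std::vector<Vector>& U,
                        std::vector<Vector>& V) {
  std::vector<bool> usedRow(m, false);
  double normS2 = 0.0;  // estimate of ||S_k||_F^2 of the current approximation
  int i = 0;            // current pivot row

  for (int k = 0; k < std::min(m, n); ++k) {
    // residual of row i: a(i, :) minus the contribution of the previous crosses
    Vector v(n);
    for (int j = 0; j < n; ++j) {
      v[j] = entry(i, j);
      for (std::size_t l = 0; l < U.size(); ++l) v[j] -= U[l][i] * V[l][j];
    }
    usedRow[i] = true;

    // pivot column: largest residual entry in row i
    int jPiv = 0;
    for (int j = 1; j < n; ++j)
      if (std::abs(v[j]) > std::abs(v[jPiv])) jPiv = j;
    if (std::abs(v[jPiv]) < 1e-14) break;  // (near) zero residual row, stop here
    for (int j = 0; j < n; ++j) v[j] /= v[jPiv];

    // residual of column jPiv
    Vector u(m);
    for (int r = 0; r < m; ++r) {
      u[r] = entry(r, jPiv);
      for (std::size_t l = 0; l < U.size(); ++l) u[r] -= V[l][jPiv] * U[l][r];
    }

    // update the estimate of ||S_k||_F^2 after adding the new cross u * v^T
    double nu = 0.0, nv = 0.0;
    for (const auto& x : u) nu += std::norm(x);
    for (const auto& x : v) nv += std::norm(x);
    for (std::size_t l = 0; l < U.size(); ++l) {
      Complex uu = 0.0, vv = 0.0;
      for (int r = 0; r < m; ++r) uu += U[l][r] * std::conj(u[r]);
      for (int j = 0; j < n; ++j) vv += V[l][j] * std::conj(v[j]);
      normS2 += 2.0 * std::real(uu * vv);
    }
    normS2 += nu * nv;

    U.push_back(std::move(u));
    V.push_back(std::move(v));

    // stopping criterion: the new cross is small relative to the approximation
    if (std::sqrt(nu * nv) <= eps * std::sqrt(normS2)) break;

    // next pivot row: largest entry of the new column among the unused rows
    int next = -1;
    for (int r = 0; r < m; ++r)
      if (!usedRow[r] &&
          (next < 0 || std::abs(U.back()[r]) > std::abs(U.back()[next])))
        next = r;
    if (next < 0) break;
    i = next;
  }
}
```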

Let us first briefly describe the parallelization of a single-domain problem. To distribute the BEM system matrices among P processes we decompose the input surface mesh \(\varGamma \) into P disjoint sub-meshes \(\varGamma ^1, \varGamma ^2, \ldots , \varGamma ^P\) with approximately the same number of elements using, e.g., the METIS library [10]. In this manner we obtain a \(P\times P\) block structure of the matrices such that the block in row i and column j is induced by the integration over \(\varGamma ^{i}\) and \(\varGamma ^{j}\). Afterwards, the \(P^2\) blocks are assembled in parallel by P MPI processes. The workload distribution follows the so-called cyclic graph decomposition introduced in [13] for special values of P and later generalized in [12]. This way, each MPI process requires \(\mathcal {O}(n/\sqrt{P})\) mesh data for the assembly of its blocks and actively works with \(\mathcal {O}(n/\sqrt{P})\) degrees of freedom during the matrix-vector multiplication, which has a positive effect on the efficiency of the required MPI synchronization phase. See Fig. 2 for an example of an \(8\times 8\) block distribution together with the respective generator graph; a schematic of the rotation-based block assignment is sketched after the figure.

Fig. 2. Workload block distribution for 8 MPI processes (left) and the respective generator graph (right). The vertex indices connected by an edge correspond to a block of the matrix assigned to the first process. Other processes are assigned blocks determined by a clockwise rotation of the generator graph.
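The rotation mechanism described in the caption of Fig. 2 can be sketched as follows. The generator graph is taken as an input (its construction for a given P is the subject of [12, 13]), the helper names are ours, and the direction of the rotation is immaterial for the illustration.

```cpp
#include <set>
#include <utility>
#include <vector>

// Blocks assigned to process r: every edge (i, j) of the generator graph of
// the first process is rotated by r positions modulo P. The block (i, j)
// corresponds to integration over the sub-meshes Gamma^i and Gamma^j.
std::vector<std::pair<int, int>> blocksOfProcess(
    const std::vector<std::pair<int, int>>& generator, int P, int r) {
  std::vector<std::pair<int, int>> blocks;
  for (const auto& e : generator)
    blocks.emplace_back((e.first + r) % P, (e.second + r) % P);
  return blocks;
}

// Sub-meshes a process has to hold locally: the vertices incident to its
// blocks. For the generator graphs of [12, 13] this set has O(sqrt(P)) entries.
std::set<int> requiredSubmeshes(const std::vector<std::pair<int, int>>& blocks) {
  std::set<int> meshes;
  for (const auto& b : blocks) {
    meshes.insert(b.first);
    meshes.insert(b.second);
  }
  return meshes;
}
```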

The extension of the proposed parallelization scheme to local multi-trace operators can be implemented in a straightforward manner. Let \(\varOmega _{0}, \varOmega _{1}, \ldots , \varOmega _{m}\) denote the subdomains with their respective boundaries \(\varGamma _{0}, \varGamma _{1}, \ldots , \varGamma _{m}\). We apply our parallel ACA scheme to each subdomain individually, i.e. each \(\varGamma _{j}\) is split into P sub-meshes. The BEM operators \({{\mathsf {K}}}_{\kappa _j,h}, {{\mathsf {V}}}_{\kappa _j,h}, {{\mathsf {D}}}_{\kappa _j,h}\) are treated as \(P \times P\) block operators, each assembled in parallel according to the corresponding cyclic graph decomposition. This approach works reasonably well; however, distributing each boundary among the same number of processes may lead to poor parallel efficiency for a \(\varGamma _j\) with a small number of elements. The goal of our future research is therefore to design an advanced parallel scheme for the assembly of the local operators such that \({{\mathsf {K}}}_{\kappa _j,h}, {{\mathsf {V}}}_{\kappa _j,h}, {{\mathsf {D}}}_{\kappa _j,h}\) are assembled by different numbers of MPI processes.

4 Numerical Experiments

Our main interest lies in the study of the parallel matrix assembly and efficiency of the matrix-vector multiplication. The results of strong and weak scalability experiments are presented in Tables 1 and 2, respectively.

The numerical experiments were carried out on the Salomon supercomputer at IT4Innovations National Supercomputing Center, Czech Republic. The cluster is composed of 1 008 compute nodes, each equipped with two 12-core Intel Xeon E5-2680v3 processors and 128 GB of RAM. The nodes are interconnected by a 7D enhanced hypercube InfiniBand network and the theoretical peak performance totals 2 PFlop/s. We used METIS 5.1.0 for the decomposition of the mesh [10] and Intel Parallel Studio 2017.1.132 with the Intel Math Kernel Library (MKL) as the BLAS implementation. We ran two MPI processes per compute node with 12 OpenMP threads per MPI process and set KMP_AFFINITY="granularity=fine,scatter".

Fig. 3. A cut through the test domain geometry (left) and a plot of the resulting total field (right). The test domain is a cube split into three parts.

All tests were performed on a cubical geometry split into three parts; see Fig. 3 for a central cut through the domain and a depiction of the resulting total field. The wave numbers are \(\kappa _{0}=4\), \(\kappa _{1}=2\), \(\kappa _{2}=5\), \(\kappa _{3}=3\) and \(\mu _{0}=\mu _{1}=\mu _{2}=\mu _{3}=1\). We used globally continuous piecewise linear trial and test functions for all operators included in the formulation. The parameters controlling the complexity of ACA were set to \(\varepsilon _{\mathrm {ACA}}=10^{-6}\), \(\mu _{\mathrm {ACA}} = 1.2\). The relative precision of the GMRES solver was set to \(\varepsilon _{\mathrm {GMRES}}=10^{-6}\).

The measured quantities are the time in seconds to assemble the matrices, the time to perform one matrix-vector multiplication without MPI synchronization, and also the overhead required to synchronize the results via MPI. We also present the efficiency of the parallel solver. We performed a series of five experiments and the presented values are the averages of the results.

In the case of the strong scaling experiments (see Table 1), \(\varGamma _{0}\) contains \(258\,048\) elements and \(\varGamma _{1}, \varGamma _{2}, \varGamma _{3}\) contain \(110\,592\) elements each. Let \(t_{P}\) denote the measured time of the parallel matrix assembly or of the matrix-vector multiplication on P processes; the efficiency of the strong scaling on P processes is then calculated as \((2 t_{2})/(Pt_{P})\). Taking into account the relatively small size of the problem, we obtain a reasonable parallel efficiency of the system matrix assembly up to 64 compute nodes (128 processes). The matrix-vector multiplication scales relatively well up to 16 nodes (32 processes). The reason is twofold: the first major factor is the increasing time required for the vector synchronization after each multiplication; the second factor is the increasing density of the ACA matrices for higher numbers of processes. In the current implementation, our synchronization scheme is the following (see the sketch after the list):

  • we split the outer vector into 4 disjoint intervals (one for each subdomain),

  • then we perform all necessary computations for each interval followed by a non-blocking allreduce across all processes,

  • to facilitate the progress of the non-blocking reductions, we perform periodic calls to MPI_Test on the master OMP thread.
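The scheme can be sketched as follows. This is a simplified single-threaded C++ sketch: applyLocalBlocks is a hypothetical callback that accumulates the locally owned contributions to the k-th interval of the result vector, and in the actual solver the MPI_Test calls are issued by the master OpenMP thread while the remaining threads continue with the local computations.

```cpp
#include <mpi.h>

#include <complex>
#include <functional>
#include <vector>

// Applies the locally owned blocks interval by interval and overlaps the
// computation with non-blocking reductions of the already finished intervals.
void multiplyAndReduce(
    std::vector<std::vector<std::complex<double>>>& y,  // 4 result intervals
    const std::function<void(int)>& applyLocalBlocks,   // fills y[k] locally
    MPI_Comm comm) {
  std::vector<MPI_Request> requests(y.size(), MPI_REQUEST_NULL);

  for (std::size_t k = 0; k < y.size(); ++k) {
    applyLocalBlocks(static_cast<int>(k));

    // start a non-blocking sum of the k-th interval over all processes
    MPI_Iallreduce(MPI_IN_PLACE, y[k].data(), static_cast<int>(y[k].size()),
                   MPI_C_DOUBLE_COMPLEX, MPI_SUM, comm, &requests[k]);

    // poke the MPI progress engine so that earlier reductions can advance
    for (std::size_t l = 0; l <= k; ++l) {
      int flag = 0;
      MPI_Test(&requests[l], &flag, MPI_STATUS_IGNORE);
    }
  }
  MPI_Waitall(static_cast<int>(requests.size()), requests.data(),
              MPI_STATUSES_IGNORE);
}
```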

Table 1. Strong scalability of operator assembly, matrix-vector multiplication and total runtime of the solver on \(1, 2, \dots , 64\) compute nodes (MPI synchronization included).

The weak scaling experiments were performed on 1, 4, and 16 compute nodes, each running two MPI processes (see Table 2). On a single node, \(\varGamma _{0}\) contains \(k_0 := 81\,648\) elements and \(\varGamma _{1}, \varGamma _{2}, \varGamma _{3}\) contain \(k_1:=34\,992\) elements each. On P/2 nodes, the numbers of mesh elements are increased proportionally, i.e. to \(Pk_{0}/2\) for \(\varGamma _{0}\) and \(Pk_{1}/2\) for \(\varGamma _{1}, \varGamma _{2}\), and \(\varGamma _{3}\). Let \(e_{P}:= \bigl (k_0\log (\tfrac{P}{2}k_0) + 3k_1\log (\tfrac{P}{2}k_1)\bigr )/2\) denote the expected complexity per process of the parallel matrix assembly and matrix-vector multiplication on P processes and let \(t_{P}\) denote the corresponding measured time. The efficiency of the weak scaling on P processes is then calculated as \((t_{2}e_{P})/(t_{P}e_{2})\).
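For reference, the two efficiency measures reported in Tables 1 and 2 are direct transcriptions of the formulas above; the following helper functions (our naming) take the measured times \(t_2, t_P\) and the single-node element counts \(k_0, k_1\).

```cpp
#include <cmath>

// Strong scaling efficiency with the two-process run as the baseline:
// eff_strong(P) = (2 * t2) / (P * tP).
double strongScalingEfficiency(int P, double t2, double tP) {
  return (2.0 * t2) / (P * tP);
}

// Expected per-process complexity e_P for the weak scaling runs.
double expectedComplexity(int P, double k0, double k1) {
  return (k0 * std::log(0.5 * P * k0) + 3.0 * k1 * std::log(0.5 * P * k1)) / 2.0;
}

// Weak scaling efficiency eff_weak(P) = (t2 * e_P) / (tP * e_2).
double weakScalingEfficiency(int P, double t2, double tP, double k0, double k1) {
  return (t2 * expectedComplexity(P, k0, k1)) /
         (tP * expectedComplexity(2, k0, k1));
}
```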

Table 2. Efficiency of the weak scaling of operator assembly and matrix-vector multiplication on 1, 4, 16 compute nodes (MPI synchronization included).

5 Conclusion

We briefly presented a local multi-trace formulation for scattering problems with heterogeneous scatterers and applied the so-called cyclic graph decomposition to define a block-based workload distribution for distributed-memory parallel matrix assembly and matrix-vector multiplication. We performed experiments on up to 64 compute nodes and presented the strong and weak scaling properties of our approach. In future work we plan to refine and generalize the proposed method to directly decompose the skeleton of the scatterer instead of the individual subdomains. We believe this could lead to better scalability and the ability to deal with problems with differently sized subdomains.