A fast direct solver for nonlocal operators in wavelet coordinates

In this article, we consider fast direct solvers for nonlocal operators. The pivotal idea is to combine a wavelet representation of the system matrix, yielding a quasi-sparse matrix, with the nested dissection ordering scheme. The latter drastically reduces the fill-in during the factorization of the system matrix by means of a Cholesky decomposition or an LU decomposition, respectively. This way, we end up with the exact inverse of the compressed system matrix with only a moderate increase of the number of nonzero entries in the matrix. To illustrate the efficacy of the approach, we conduct numerical experiments for different highly relevant applications of nonlocal operators: We consider (i) the direct solution of boundary integral equations in three spatial dimensions, issuing from the polarizable continuum model, (ii) a parabolic problem for the fractional Laplacian in integral form and (iii) the fast simulation of Gaussian random fields.


Introduction
Various problems in science and engineering lead to nonlocal operators and corresponding operator equations. Examples arise from physical problems like field calculations and Riesz energy problems, from machine learning, and also from stochastic simulations and uncertainty quantification.
Traditional discretizations of nonlocal operators result in densely populated system matrices. This renders the computation very costly with respect to both computation time and memory requirements. Therefore, over recent decades, different ideas for the data-sparse approximation of nonlocal operators have been developed. The most prominent examples of such methods are the fast multipole method [15], the panel clustering [18], the wavelet matrix compression [1,7], and the hierarchical matrix format [17]. These techniques are able to represent nonlocal operators with linear or almost linear cost with respect to the number of degrees of freedom used for their discretization.
The present article relies on a compression of the system matrix by wavelets. Especially, the matrix representation of the nonlocal operator in wavelet coordinates is quasi-sparse, i.e. most matrix entries are negligible and can be treated as zero without compromising the overall accuracy. Discarding the non-relevant matrix entries is called matrix compression. Roughly speaking, nonlocal operators become local operators in wavelet coordinates. A fully discrete version of the wavelet matrix compression has been developed in [26]. It computes the compressed operator within discretization accuracy with linear cost.
Based on the sparsity pattern of the system matrix, which is solely determined by the order of the underlying operator, we employ a fill-in reducing reordering of the matrix entries by means of nested dissection, see [13,30]. This reordering in turn allows for the rapid inversion of the system matrix by the Cholesky decomposition or, more generally, by the LU decomposition. In particular, besides the rigorously controllable error of the matrix compression in the wavelet format and the roundoff errors in the computation of the matrix factorization, no additional approximation errors are introduced. This is a major difference from other approaches to the discretization and the arithmetics of nonlocal operators, e.g. by means of hierarchical matrices. As the hierarchical matrix format is not closed under arithmetic operations, a recompression step has to be performed after each arithmetic (block) operation, which results in accumulating and hardly controllable consistency errors for matrix factorizations, see [16,17].
In order to demonstrate the efficiency of the suggested approach, we consider applications from different fields. Namely, we consider (i) a boundary integral equation arising from the polarizable continuum model in quantum chemistry as a classical example for a nonlocal operator equation, (ii) a parabolic problem for the fractional Laplacian, and finally (iii) the fast numerical simulation of Gaussian random fields as an important example from computational uncertainty quantification.
One of the most widespread methods to include solvent effects in quantum chemistry is by making use of a continuum dielectric: the solvent is represented by a continuum which surrounds the molecule. Solute-solvent interactions are then described through appropriate functions supported on the molecule's surface. For an overview of continuum solvation models, we refer the reader to [40], and in particular for the polarizable continuum model to [4,5,33]. Wavelet matrix compression for the polarizable continuum model has been considered in [2,41]. Especially, in [20], the use of an incomplete Cholesky decomposition based preconditioner has been suggested, which is however inferior to the approach presented here.
The fractional Laplacian is an operator which generalizes the notion of spatial derivatives to fractional orders. It appears in image analysis, kinetic equations, phase transitions and nonlocal heat conduction, to mention just some applications. We refer to the review article [11] and the references therein for further details. In particular, we will focus here on the definition of the fractional Laplacian in its integral form, as it can be found in [10] and also in [11]. To the best of our knowledge, the numerical treatment of the parabolic problem for the fractional Laplacian by means of wavelets has not been addressed in the literature yet.
The rapid simulation of (Gaussian) random fields with a prescribed covariance structure is of paramount importance in computational uncertainty quantification. The fast methods suggested so far are based on the computation of matrix square roots employing low-rank factorizations or block-wise low-rank factorizations, such as obtained by hierarchical matrices, or on the discretization of the action of the matrix square root on a given vector by Krylov subspace methods. Other approaches compute the Karhunen-Loève expansion by circulant embeddings and fast Fourier techniques or employ the contour integral method. For more details on these methods, we refer to [12,14,22,28,36]. In contrast to the previously mentioned approaches, we consider here the direct simulation of the random field by the Cholesky decomposition of the covariance matrix, which is very sparse in wavelet coordinates.

This article is organized as follows. Wavelet bases and their properties are specified in Section 2. Section 3 briefly recalls the main features of the fully discrete wavelet matrix compression scheme from [26]. Then, in Section 4, for the sake of completeness, the idea of nested dissection is briefly outlined. Section 5 presents the three different applications considered in this article, while Section 6 is devoted to related numerical experiments. Finally, Section 7 contains some concluding remarks.
In the following, in order to avoid the repeated use of generic but unspecified constants, we write C ≲ D to indicate that C can be bounded by a multiple of D, independently of parameters which C and D may depend on. Then, C ≳ D is defined as D ≲ C, while we write C ∼ D iff C ≲ D and C ≳ D.

Wavelets and Multiresolution Analysis
Let D denote a domain in R^n or a manifold in R^{n+1}. A multiresolution analysis consists of a nested family of finite dimensional approximation spaces

    V_0 ⊂ V_1 ⊂ · · · ⊂ V_j ⊂ · · · ⊂ L^2(D).    (2.1)

We will refer to j as the level of V_j in the multiresolution analysis. Each space V_j is endowed with a single-scale basis Φ_j = {φ_{j,k} : k ∈ Δ_j}, i.e. V_j = span Φ_j, where Δ_j denotes a suitable index set with cardinality |Δ_j| ∼ 2^{jn}. For convenience, we shall in the sequel write bases in the form of row vectors, such that, for v = [v_k]_{k∈Δ_j} ∈ ℓ^2(Δ_j), the corresponding function can simply be written as a dot product according to v_j = Φ_j v = Σ_{k∈Δ_j} v_k φ_{j,k}. In addition, we shall assume that the single-scale bases Φ_j are uniformly stable, this means that ‖v‖_{ℓ^2(Δ_j)} ∼ ‖Φ_j v‖_{L^2(D)} for all v ∈ ℓ^2(Δ_j) uniformly in j, and that they satisfy the locality condition diam(supp φ_{j,k}) ∼ 2^{-j}.

Additional properties of the spaces V_j are required for using them as trial spaces in a Galerkin scheme. The approximation spaces shall have the regularity γ := sup{s ∈ R : V_j ⊂ H^s(D)} and the approximation order d ∈ N, that is, for 0 ≤ s < γ and s ≤ t ≤ d,

    inf_{v_j ∈ V_j} ‖v − v_j‖_{H^s(D)} ≲ 2^{-j(t−s)} ‖v‖_{H^t(D)}  for all v ∈ H^t(D).

Rather than using the multiresolution analysis corresponding to the hierarchy in (2.1), the pivotal idea of wavelets is to keep track of the increment of information between two consecutive levels j − 1 and j. Since we have V_{j−1} ⊂ V_j, we may decompose V_j = V_{j−1} ⊕ W_j with an appropriate detail space W_j. Of practical interest is the particular choice of the basis of the detail space W_j in V_j. This basis will be denoted by Ψ_j = {ψ_{j,k} : k ∈ ∇_j := Δ_j \ Δ_{j−1}}. In particular, we shall assume that the collections Φ_{j−1} ∪ Ψ_j form uniformly stable bases of V_j as well. If Ψ = ⋃_{j≥0} Ψ_j, where Ψ_0 := Φ_0, is even a Riesz basis of L^2(D), then it is called a wavelet basis. We require the functions ψ_{j,k} to be localized with respect to the corresponding level j, i.e. diam(supp ψ_{j,k}) ∼ 2^{-j}, and we normalize them such that ‖ψ_{j,k}‖_{L^2(D)} ∼ 1. At first glance, it would be very convenient to deal with a single orthonormal system of wavelets.
However, it has been shown in [7,8,38] that orthogonal wavelets are not optimal for the efficient approximation of nonlocal operator equations. For this reason, we rather use biorthogonal wavelet bases. In this case, we also have a dual multiresolution analysis, i.e. dual single-scale bases Φ̃_j = {φ̃_{j,k} : k ∈ Δ_j} and wavelets Ψ̃_j = {ψ̃_{j,k} : k ∈ ∇_j}, which are coupled to the primal ones by the biorthogonality condition

    (φ_{j,k}, φ̃_{j,k'})_{L^2(D)} = δ_{k,k'},    (ψ_{j,k}, ψ̃_{j',k'})_{L^2(D)} = δ_{j,j'} δ_{k,k'}.    (2.2)

The corresponding spaces Ṽ_j := span Φ̃_j and W̃_j := span Ψ̃_j satisfy V_j ⊥ W̃_j and Ṽ_j ⊥ W_j. Moreover, the dual spaces are supposed to exhibit some approximation order d̃ ∈ N and regularity γ̃ > 0, defined in complete analogy to the primal ones. In particular, relation (2.2) implies that the wavelets exhibit vanishing moments of order d̃, i.e.

    |(v, ψ_{j,k})_{L^2(D)}| ≲ 2^{-j(d̃+n/2)} |v|_{W^{d̃,∞}(supp ψ_{j,k})}.    (2.3)
Herein, the quantity |v|_{W^{d̃,∞}(supp ψ_{j,k})} := sup_{|α|=d̃} ‖∂^α v‖_{L^∞(supp ψ_{j,k})} denotes the usual Sobolev semi-norm. We refer to [6] for further details. Piecewise constant and bilinear wavelets which provide the above properties have been constructed in [25,27]. In what follows, we will refer to the wavelet basis of V_J by Ψ^J = {ψ_λ : λ ∈ ∇^J}, where the multi-index λ = (j, k) incorporates the scale j = |λ| and the spatial location k = k(λ).
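As an illustration of the multiscale decomposition described above, the following Python sketch implements the orthonormal Haar wavelet transform, the simplest instance of such a basis (piecewise constant, with a single vanishing moment); the function name and the test vectors are our own illustrative choices, not taken from the article.

```python
import numpy as np

def haar_transform(v):
    """Orthonormal Haar wavelet transform of a vector of length 2**J.
    Output ordering: [coarsest scaling coefficient, details from coarse to fine]."""
    v = np.asarray(v, dtype=float)
    details = []
    while len(v) > 1:
        details.append((v[0::2] - v[1::2]) / np.sqrt(2.0))  # wavelet (detail) part
        v = (v[0::2] + v[1::2]) / np.sqrt(2.0)              # scaling (average) part
    details.append(v)
    return np.concatenate(details[::-1])

# The Haar wavelet has one vanishing moment: it annihilates constants, so a
# constant vector keeps only its coarsest scaling coefficient.
c = haar_transform(np.ones(16))
assert np.allclose(c[1:], 0.0)
assert np.isclose(c[0], 4.0)   # the transform is orthonormal: ||1||_2 = sqrt(16) = 4
```

Since the transform is orthonormal, it preserves the ℓ^2 norm of any input vector, which mirrors the uniform stability required of the bases above.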

Wavelet Matrix Compression
For a given domain or manifold D and q ∈ R, let

    A : H^q(D) → H^{−q}(D)    (3.4)

denote a given (continuous and bijective) nonlocal operator of order 2q. According to the Schwartz kernel theorem, it can be represented in accordance with

    (Au)(x) = ∫_D k(x, y) u(y) dy

for a suitable kernel function k : D × D → R. The kernel functions under consideration are supposed to be smooth as functions in the variables x and y, apart from the diagonal {(x, y) ∈ D × D : x = y}, and may exhibit a singularity on the diagonal. Such kernel functions arise, for instance, from applying a boundary integral formulation to a second order elliptic problem [37,39]. Typically, they decay like a negative power of the distance of the arguments which depends on the order 2q of the operator. More precisely, there holds

    |∂_x^α ∂_y^β k(x, y)| ≲ ‖x − y‖^{−(n+2q+|α|+|β|)}.    (3.5)

We emphasize that this estimate remains valid for the kernels of arbitrary pseudodifferential operators, see [9] for the details.
Corresponding to the nonlocal operator from (3.4), we may consider the operator equation Au = f which gives rise to the Galerkin approach: Traditionally, this equation is discretized employing the single-scale basis of V_J, which results in densely populated system matrices. If N_J ∼ 2^{Jn} denotes the number of basis functions in the space V_J, then the system matrix contains O(N_J^2) nonzero matrix entries. In contrast, by utilizing a wavelet basis in the Galerkin discretization, we end up with a matrix that is quasi-sparse, i.e. it is compressible to O(N_J) nonzero matrix entries without compromising the overall accuracy. More precisely, by combining (2.3) and (3.5), we arrive at the decay estimate

    |(Aψ_λ', ψ_λ)_{L^2(D)}| ≲ 2^{−(|λ|+|λ'|)(d̃+n/2)} dist(D_λ, D_λ')^{−(n+2q+2d̃)},    (3.6)

which is the foundation of the compression estimates in [7]. Herein, D_λ := conv(supp ψ_λ) and D_λ' := conv(supp ψ_λ') denote the convex hulls of the supports of the wavelets ψ_λ and ψ_λ'. Based on (3.6), we shall neglect all matrix entries for which the distance of the supports of the associated ansatz and test wavelets is larger than a level dependent cut-off parameter B_{j,j'}. An additional compression, reflected by a cut-off parameter B^s_{j,j'}, is achieved by neglecting several of those matrix entries for which the corresponding trial and test functions have overlapping supports.
To formulate this result, we introduce the abbreviation D^s_λ := sing supp ψ_λ, which denotes the singular support of the wavelet ψ_λ, i.e. that subset of D where the wavelet is non-smooth. Theorem 3.1 (A-priori compression [7]). Let D_λ and D^s_λ be given as above and define the compressed system matrix A_J, corresponding to the boundary integral operator A, by setting to zero all entries for which the distance of the supports of the associated wavelets exceeds the cut-off parameter B_{j,j'}, respectively for which, on different levels, the distance between the singular support of the coarser wavelet and the support of the finer one exceeds B^s_{j,j'}, with the cut-off parameters B_{j,j'} and B^s_{j,j'} chosen as in [7]. Then, the system matrix A_J only has O(N_J) nonzero entries. In addition, the error estimate

    ‖u − u_J‖_{H^{2q−d}(D)} ≲ 2^{−2J(d−q)} ‖u‖_{H^d(D)}

holds for the solution u_J of the compressed Galerkin system, provided that u and D are sufficiently regular.
The compressed system matrix can be assembled with linear cost if the exponentially convergent hp-quadrature method proposed in [26] is employed for the computation of the matrix entries. Moreover, for performing faster matrix-vector multiplications, an additional a-posteriori compression might be applied, which further reduces the number of nonzero entries by a factor of 2 to 5, see [7]. The pattern of the compressed system matrix shows the typical finger structure, see the left-hand side of Figure 3.1.
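To illustrate the compression principle on a toy problem (not the fully discrete scheme of [26]), the following sketch represents a one-dimensional logarithmic kernel in orthonormal Haar coordinates and discards entries below a crude absolute threshold, in place of the level-dependent cut-off parameters of Theorem 3.1; the kernel, grid size and threshold are illustrative assumptions.

```python
import numpy as np

def haar_transform(v):
    """Orthonormal Haar transform (coarsest coefficient first)."""
    v = np.asarray(v, dtype=float)
    details = []
    while len(v) > 1:
        details.append((v[0::2] - v[1::2]) / np.sqrt(2.0))
        v = (v[0::2] + v[1::2]) / np.sqrt(2.0)
    details.append(v)
    return np.concatenate(details[::-1])

N = 256
x = (np.arange(N) + 0.5) / N
# Toy nonlocal kernel with a logarithmic diagonal singularity (order 2q = 0);
# the diagonal is regularized at the grid scale. Purely illustrative.
A = -np.log(np.abs(x[:, None] - x[None, :]) + 1.0 / N)

W = np.column_stack([haar_transform(e) for e in np.eye(N)])  # W[:, i] = transform of e_i
B = W @ A @ W.T                      # the operator in (orthonormal) Haar coordinates

tol = 1e-4 * np.abs(B).max()         # crude absolute threshold, not Theorem 3.1's cut-offs
B_c = np.where(np.abs(B) > tol, B, 0.0)
frac = np.count_nonzero(B_c) / B.size
rel_err = np.linalg.norm(B - B_c) / np.linalg.norm(B)

print(frac, rel_err)                 # most entries are discarded at negligible error
assert frac < 0.35 and rel_err < 0.05
```

Plotting the nonzero pattern of B_c reveals the finger structure mentioned above: significant entries concentrate near the diagonal and along interactions between nearby levels.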

Nested dissection
The representation of the system matrix corresponding to a nonlocal operator with respect to an appropriate wavelet basis leads to a quasi-sparse matrix, i.e. a matrix with many small entries which can be neglected without compromising accuracy. Performing a thresholding procedure as discussed in the previous section then yields a sparse system matrix whose symmetric sparsity pattern is solely determined by the order of the underlying operator, see the left hand side of Figure 3.1.
The factorization of the system matrix represented in the canonical levelwise ordering leads to massive fill-in. This means that a huge number of nonzero entries is generated by a Cholesky decomposition or an LU decomposition, typically resulting in dense matrix factors. In order to obtain much sparser factorizations, we employ a nested dissection ordering, cf. [13,30], see the right-hand side of Figure 3.1.
Nested dissection is a divide and conquer algorithm whose foundation is a graph theoretical observation. To each matrix A ∈ R^{N×N} with a symmetric sparsity pattern, we may assign an undirected graph G = (V, E) with vertices V = {1, 2, . . . , N} and edges E = {{i, j} : a_{i,j} ≠ 0}. Then, a symmetric permutation PAPᵀ of the rows and columns of A amounts to a permutation π(V) of the nodes in V. In particular, we have the following important result from [35], see also [34], which we formulate here only for the Cholesky decomposition PAPᵀ = LLᵀ.
The lemma states that the Cholesky decomposition connects all nodes i and j, resulting in a nonzero entry ℓ_{i,j}, for which there exists a path between i and j consisting of nodes that have been eliminated before i and j.
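This fill-path characterization can be checked on a small example by symbolic elimination: removing a node connects all of its not-yet-eliminated neighbours, and the edges created this way predict the nonzero pattern of the Cholesky factor. The matrix below is a made-up example, not one from the article.

```python
import numpy as np

# A small made-up SPD matrix; its graph has edges {0,1}, {0,2}, {1,3}.
A = np.array([[4., 1., 1., 0.],
              [1., 4., 0., 1.],
              [1., 0., 4., 0.],
              [0., 1., 0., 4.]])
N = len(A)
adj = {i: {j for j in range(N) if i != j and A[i, j] != 0} for i in range(N)}

fill = set()
for v in range(N):                    # eliminate nodes in the natural order
    later = {w for w in adj[v] if w > v}
    for i in later:                   # eliminating v connects its remaining neighbours
        for j in later:
            if i < j and j not in adj[i]:
                adj[i].add(j); adj[j].add(i)
                fill.add((j, i))      # predicted fill entry below the diagonal

L = np.linalg.cholesky(A)
actual = {(i, j) for i in range(N) for j in range(i) if abs(L[i, j]) > 1e-14}
pred = {(i, j) for i in range(N) for j in range(i) if A[i, j] != 0} | fill
assert actual <= pred                 # numeric nonzeros match the symbolic prediction
print(sorted(fill))                   # → [(2, 1), (3, 2)]
```

Here, eliminating node 0 connects its neighbours 1 and 2, and eliminating node 1 then connects 2 and 3, exactly the two fill entries that appear in L.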
Finding an optimal ordering is a hard problem in general. Therefore, we resort to the following strategy, which is known as nested dissection ordering: We partition the vertices as V = V_1 ∪ V_2 ∪ S with disjoint sets V_1, V_2 and a (small) separator S such that no edge connects V_1 and V_2, i.e. the removal of the vertices of the separator S and its adjacent edges results in two disjoint subgraphs. Hence, employing an ordering which puts first the nodes in V_1 and V_2 and afterwards the nodes in S leads to a matrix structure of the form

    [ A_{1,1}    0      A_{1,S} ]
    [   0      A_{2,2}  A_{2,S} ]
    [ A_{S,1}  A_{S,2}  A_{S,S} ]

whose two zero off-diagonal blocks remain zero during the factorization. Recursively applying this procedure then yields a structure similar to the one on the right-hand side of Figure 3.1. For obvious reasons, it is desirable to have a minimal separator S which evenly splits V into two subsets; we refer to [30] and the references therein for a comprehensive discussion of this topic. In order to obtain suitable separators for the computations in this article, we adopt the strategy from [29], which performs very well in terms of reducing the fill-in.

Applications

5.1. Polarizable continuum model. Continuum solvation models are widely used to model quantum effects of molecules in liquid solutions, compare [40] for an overview. In the polarizable continuum model (PCM), introduced in [33], the molecule under study (the solute) is located inside a cavity D, surrounded by a homogeneous dielectric (the solvent) with dielectric constant ε ≥ 1. The solute-solvent interactions between the charge distributions which compose the solute and the dielectric are reduced to those of electrostatic origin.
For a given charge ρ ∈ H^{−1}(D), located inside the cavity, the solute-solvent interaction is expressed by the apparent surface charge σ ∈ H^{−1/2}(∂D). It is determined by a boundary integral equation (5.11) which involves the single layer operator

    (Vσ)(x) = ∫_{∂D} σ(y) / (4π‖x − y‖) dσ_y,

the double layer operator with kernel ⟨n(y), x − y⟩ / (4π‖x − y‖³), and the Newton potential

    (N ρ)(x) = ∫_D ρ(y) / (4π‖x − y‖) dy

of the given charge. The discretization of the boundary integral equation (5.11) by means of a Galerkin scheme is as follows, compare [23,24]: We make the ansatz σ_J = Σ_λ σ_λ ψ_λ and introduce the mass matrix G_J = [(ψ_λ, ψ_λ')_{L^2(∂D)}]_{λ,λ'} as well as the system matrices V_J and A_J associated with the single and double layer operators. Then, for a given data vector f_J = [(N ρ, ψ_λ)_{L^2(∂D)}]_λ, we need to solve the resulting linear system of equations (5.12) in order to determine the sought apparent surface charge. In quantum chemical simulations, for example when solving the Hartree-Fock equations in a self-consistent field approximation, one has to compute the interaction energies between the different particles. This amounts to the determination of different apparent surface charges. Therefore, the fast solution of (5.12) for multiple right-hand sides is indispensable for fast simulations in quantum chemistry.
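The computational pattern behind solving (5.12) for many right-hand sides can be sketched generically: the matrix is factorized once, and every further solve reduces to two cheap triangular solves. The dense random matrix below is merely a stand-in for the compressed system matrix; all sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_rhs = 200, 50
M = rng.standard_normal((N, N))
A = M @ M.T + N * np.eye(N)          # toy SPD stand-in for the compressed system matrix
F = rng.standard_normal((N, n_rhs))  # many right-hand sides, e.g. one per charge

L = np.linalg.cholesky(A)            # factorize ONCE: A = L L^T

def solve_with_factor(L, b):
    """Reuse the factor: one forward and one backward substitution per solve."""
    y = np.zeros_like(b)
    for i in range(len(b)):                  # L y = b
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    x = np.zeros_like(b)
    for i in reversed(range(len(b))):        # L^T x = y
        x[i] = (y[i] - L[i + 1:, i] @ x[i + 1:]) / L[i, i]
    return x

X = np.column_stack([solve_with_factor(L, F[:, k]) for k in range(n_rhs)])
assert np.allclose(A @ X, F)         # all 50 systems solved with a single factorization
```

For the sparse factors produced by nested dissection, the triangular solves cost only O(nnz(L)) operations each, which is what makes the direct solver attractive for repeated solves.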

5.2.
Parabolic diffusion problem for the fractional Laplacian. For a given domain D ⊂ R^n and 0 < s < 1/2, the fractional Laplacian L^s : H^s(D)/R → H^{−s}(D)/R is given in integral form by

    (L^s u)(x) = C(n, s) p.v. ∫_D (u(x) − u(y)) / ‖x − y‖^{n+2s} dy,

see [10,11], where C(n, s) > 0 denotes a normalization constant. We intend to solve a parabolic diffusion problem for the fractional Laplacian. To this end, we employ the θ-scheme in time and a Galerkin discretization of the problem in space. This leads to the linear system of equations

    (G_J + τθ A_J) u^{(i+1)} = (G_J − τ(1 − θ) A_J) u^{(i)} + τ (θ f^{(i+1)} + (1 − θ) f^{(i)}),    (5.13)

where τ > 0 denotes the time step size. Here, A_J = [(L^s ψ_λ, ψ_λ')_{L^2(D)}]_{λ,λ'} and G_J = [(ψ_λ, ψ_λ')_{L^2(D)}]_{λ,λ'} are the system matrix of the fractional Laplacian and the mass matrix, respectively. In each time step, we hence have to invert the matrix G_J + τθ A_J.

5.3. Gaussian random field. Let a denote a Gaussian random field on D with expectation E[a] and covariance Cov[a]. If the expectation and the covariance are known, we may represent a by its Karhunen-Loève expansion

    a(x, ω) = E[a](x) + Σ_{k≥1} √(μ_k) a_k(x) Y_k(ω).

Herein, (μ_k, a_k), k = 1, 2, . . ., denote the eigenpairs of the Hilbert-Schmidt operator

    (C u)(x) = ∫_D Cov[a](x, y) u(y) dy,

while Y_1, Y_2, . . . are independent and standard normally distributed random variables. In order to discretize the Karhunen-Loève expansion, we proceed in complete analogy to [22] and compute the orthogonal projection of C onto V_J. Let C_J ∈ R^{N_J×N_J} denote the corresponding coefficient matrix. Obviously, a suitable basis for the orthogonal projection is the wavelet basis Ψ^J.
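The sampling strategy pursued in this article, namely drawing realizations via a Cholesky factor of the covariance matrix, can be sketched as follows; the grid, the exponential kernel's correlation length 0.5 and all names are illustrative assumptions, not the article's actual model problem.

```python
import numpy as np

rng = np.random.default_rng(1)
# Grid points in [0,1]^2 and an exponential covariance; the correlation
# length 0.5 is an arbitrary illustrative choice.
g = np.linspace(0.0, 1.0, 12)
pts = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
C = np.exp(-dist / 0.5) + 1e-10 * np.eye(len(pts))  # tiny shift for numerical SPD

L = np.linalg.cholesky(C)            # factorize the covariance once
Z = rng.standard_normal((len(pts), 20000))
samples = L @ Z                      # each column is one realization of the field

# Sanity check: the empirical covariance of the samples converges to C.
C_emp = samples @ samples.T / samples.shape[1]
err = np.linalg.norm(C_emp - C) / np.linalg.norm(C)
print(err)
assert err < 0.05
```

Since E[(Lz)(Lz)ᵀ] = LLᵀ = C for standard normal z, every matrix square root of the covariance yields samples with the correct law; the point of the article is that, in wavelet coordinates, C is sparse and its Cholesky factor stays sparse after nested dissection reordering.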

Numerical Results
In order to obtain consistent computation times, all computations reported in this section have been carried out on a single core of a compute server with Intel Xeon E5-2650 v3 @2.30GHz CPUs and 512GB DDR4 @2133MHz main memory. The implementation of the wavelet matrix compression has been performed in ANSI C, while nested dissection orderings, Cholesky decompositions and LU decompositions have been computed using Matlab 2020a, compare [32].
6.1. Polarizable continuum model. For PCM, we consider the assembly and the factorization of the matrices which are required to set up the linear system of equations (5.12). The dielectric constant is chosen as ε = 78.39, corresponding to the solvent water. The molecule under study is benzene, whose solvent excluded surface is depicted in Figure 6.2. For the wavelet matrix compression, we use piecewise constant wavelets with three vanishing moments, as developed in [25].

Table 6.1 shows the numerical results. The first column, labelled N_J, corresponds to the number of surface elements. The second column, labelled t_WEM, contains the combined computation times in seconds for the assembly of V_J and A_J by the wavelet matrix compression. The third column, labelled t_ND, provides the computation times in seconds for the nested dissection ordering; as V_J and A_J have identical sparsity patterns, the same reordering can be applied to both of them. The fourth column, labelled t_Chol(V_J), contains the computation times in seconds for the Cholesky factorization of V_J, while the fifth column shows the computation times in seconds of the LU decomposition of A_J. For the purpose of measuring the fill-in relative to the matrix size, we introduce for a sparse matrix A ∈ R^{N×N} the average number of nonzeros per row, anz(A) := nnz(A)/N, where nnz(A) denotes the total number of nonzero entries of A. The values for anz(V_J) and anz(A_J) are identical and can be found in the sixth column of Table 6.1, while the values anz(L_J) for the Cholesky factor of V_J are given in the last column of the table. The sparsity pattern of the L-factor of the LU decomposition of A_J coincides with that of L_J, while the U-factor has somewhat fewer entries than the L-factor. Hence, the average number of nonzeros per row for the LU factorization of A_J essentially matches anz(L_J). The sparsity patterns of V_J, L_J and of the U-factor are shown in Figure 6.3 for N_J = 93 184.
In order to obtain a neat representation in this figure, we have merged matrix blocks of size 64 × 64. Darker blocks indicate a higher density of entries, while lighter blocks indicate a lower density of entries.
As can clearly be inferred from Table 6.1, the times for computing the nested dissection ordering and the subsequent Cholesky decomposition are negligible compared to the wavelet matrix compression. The average number of nonzero entries per row stays rather low, increasing for growing numbers of unknowns only up to a factor of approximately 5 for N_J = 372 736.
It can be seen from the fifth column of Table 6.1 that the LU decomposition is significantly slower than the Cholesky decomposition. This is due to the fact that all matrices are stored in a sparse column-major format, resulting in a large overhead for accessing matrix rows. We remark that, in principle, the LU decomposition could be accelerated by an appropriate data structure which enables direct access to the rows of the matrix as well.
6.2. Parabolic diffusion problem for the fractional Laplacian. We consider the parabolic problem for the fractional Laplacian with s = 3/8. The right-hand side is a Gaussian heat spot moving on a circular trajectory, while the initial condition is set to 0. For the solution of the ordinary differential equation in time, we employ the θ-scheme with θ = 1/2, which yields the Crank-Nicolson method [3]. In the θ-scheme (5.13), we need to invert the matrix G_J + τθ A_J, which is assembled with the help of wavelet matrix compression using Haar wavelets. The computational geometry is the unit disc depicted in Figure 6. We have tabulated the same characteristics as in the previous example for the matrix A_J in Table 6.2, while the sparsity patterns of A_J and L_J for N_J = 81 920 after reordering are shown in Figure 6.6. We observe a similar behaviour as for PCM: the computation times for nested dissection and the Cholesky decomposition are negligible compared to the wavelet matrix compression. In view of the fill-in, there is an increase of approximately a factor of 5 for the average number of nonzero entries per row for N_J = 1 310 720. The computation times for the θ-scheme in seconds for the 150 time steps are shown in the column labelled t_θ. As can be seen, we obtain a solution time of roughly 11 minutes for N_J = 1 310 720 unknowns in the spatial discretization.

6.3. Gaussian random field. For the simulation of a Gaussian random field, we consider an L-shaped domain with three holes, compare Figure 6.7. The domain has a side length of 4, while the holes have a diameter of 0.8. For the wavelet matrix compression, we use piecewise bilinear biorthogonal wavelets with four vanishing moments, see [25]. The expectation is set to E[a](x) ≡ 0, while the covariance is given by an exponential covariance kernel. The pattern of the covariance matrix C^G_J is shown in Figure 6.9, while the patterns of the reordered matrix and its Cholesky factor are provided in Figure 6.10.
Table 6.3 shows the computation times and the average numbers of nonzero entries as in the previous examples. In addition, there is the column t_Sample, which contains the time in seconds for computing a single realization of the random field, averaged over 1000 samples. In order to compute the inverse of the mass matrix, we could in principle reuse the nested dissection ordering which has been computed for the system matrix, as the pattern of the mass matrix is a subset of the pattern of the system matrix. However, this would result in a fill-in similar to that of the system matrix. Hence, it is favourable to use a different ordering for the mass matrix, resulting in much less fill-in.
Table 6.3. Computation times and numbers of nonzero entries in case of the Gaussian random field.
As can be seen from Table 6.3, the times for sampling the random field increase only moderately. We remark that, due to the larger supports of the bilinear wavelets and the higher precision of the discretization, the system matrix contains more entries per row on average. This also leads to higher computation times for the matrix assembly. However, as before, the increase of nonzero entries in the Cholesky factor remains very moderate. In addition, we have provided the average number of nonzeros per row for the mass matrix in the column labelled anz(G_J). The corresponding value for the Cholesky factor of the mass matrix is given in the column labelled anz(L̃_J).

Conclusion
In this article, we have proposed a very efficient direct solver for nonlocal operators. The pivotal idea is to combine the wavelet matrix compression with the nested dissection ordering. Thereby, the fill-in resulting from a Cholesky decomposition or an LU decomposition is drastically reduced. This has been investigated numerically in depth for three relevant applications, namely the polarizable continuum model, a parabolic problem for the fractional Laplacian in integral form, and the fast simulation of Gaussian random fields. In all three cases, wavelet matrix compression yields sparse system matrices, while the fill-in in the matrix factorization remains very low thanks to the nested dissection ordering. This behaviour could be observed for more than 10^6 unknowns in the discretization.
A formidable application of the presented approach is the fast simulation of rough (Gaussian) random fields. In such cases, the numerical solution of the eigenvalue problem for the covariance is computationally prohibitive and, hence, the computation of a Karhunen-Loève expansion is not feasible. In turn, the use of a wavelet basis yields a sparse representation of the covariance operator and a matrix root is rapidly computable by employing the nested dissection ordering and the Cholesky decomposition.