Orthonormal Expansions for Translation-Invariant Kernels

We present a general Fourier analytic technique for constructing orthonormal basis expansions of translation-invariant kernels from orthonormal bases of $\mathscr{L}_2(\mathbb{R})$. This allows us to derive explicit expansions on the real line for (i) Mat\'ern kernels of all half-integer orders in terms of associated Laguerre functions, (ii) the Cauchy kernel in terms of rational functions, and (iii) the Gaussian kernel in terms of Hermite functions.


Introduction
Let r : Ω×Ω → R be a symmetric positive-semidefinite kernel on some domain Ω and H r (Ω) its reproducing kernel Hilbert space (RKHS), which is equipped with an inner product ·, r and the associated norm · r . Practically every commonly used kernel induces an infinite-dimensional RKHS that is separable (see [17] for a short review on separability of RKHSs), which means that H r (Ω) has an orthonormal basis {ψ m } m∈I for some countably infinite index set I (e.g., I = N) in terms of which the kernel has the pointwise convergent orthonormal expansion r(t, u) = m∈I ψ * m (t)ψ m (u) for all t, u ∈ Ω. (1.1) If Ω is a compact subset of R d and r is continuous, the expansion converges uniformly [18,Section 11.3]. One can approach kernels and their orthonormal expansions in two different ways. The space-centric approach is to start with a Hilbert space or its orthonormal basis, show that this space is an RKHS, and construct its reproducing kernel via (1.1); under fortuitous circumstances the kernel is available in closed form. A prime example of this approach is how Korobov spaces and their kernels, which can be expressed in terms of Bernoulli polynomials, are used in the quasi-Monte Carlo literature [e.g., 5, Section 5.8]. Other examples include Hardy spaces [18,Section 1.4.2], power series kernels [29], and Hermite spaces [12]. In the kernel-centric approach one instead starts with a kernel that has, in some sense, desirable or intuitive properties and, if necessary for whatever purpose, finds its RKHS or orthonormal expansion. This approach is dominant in scattered data approximation [25] and Gaussian process modelling as used in, for example, spatial statistics [21] and machine learning [19]. Given only the algebraic form of the kernel, it is usually a non-trivial task to find an orthonormal expansion. However, access to an expansion of the form in (1.1) is often needed to develop reduced rank methods of sub-cubic computational complexity [20], to improve numerical stability [9], or to illustrate and validate theoretical results [13,Section 5]. Translation-invariant kernels, which take the form r(t, u) = Φ(t − u) for some function Φ : Ω → R, are a mainstay of scattered data approximation and Gaussian process modelling. Nevertheless, few orthonormal expansions appear to have been constructed for these kernels. To the best of our knowledge, the Matérn- 1 2 kernel r(t, u) = exp(−λ |t − u|) and the Gaussian kernel r(t, u) = exp(− 1 2 λ 2 (t − u) 2 ) on subsets of R are the only commonly used translationinvariant kernels for which orthonormal expansions have been found. For various expansions of the Matérn- 1 2 kernel, see Section 4 in [10], Section 3.4.1 in [24], Example 4.1 in [27], and Example 2.5 and Appendix A.2 in [8]. For the Gaussian kernel both a simple non-Mercer expansion based on a Taylor expansion of the exponential function [e.g., 16] and a class of Mercer expansions [8,Section 12.2.1], which appear to have originated in [28,Section 4], are available. A large collection of expansions for kernels which are not translation-invariant can be found in [8,Appendix A]. In this article we describe a general and conceptually simple technique for constructing orthonormal bases for translation-invariant kernels on R out of orthonormal bases of L 2 (R).

Construction of orthonormal bases
Let z * denote the complex conjugate and |z| the modulus of z ∈ C. The spaces L 2 (R) and L 2 (R, 1/2π) consist of all square-integrable functions f : R → C and are equipped with the inner products The Fourier transform and the corresponding inverse transform for any integrable or squareintegrable function f are defined aŝ The Fourier transform defines an isometry from L 2 (R) to L 2 (R, 1/2π) via the Plancherel theorem The functions f andf are referred to as time domain and Fourier domain representations, respectively. Our H r (R)-orthonormal expansions are derived from the following rather straightforward theorem. Let I be a countably infinite index set, typically either N or Z.
Theorem 1.1 (Construction of orthonormal bases). Let r(t, u) = Φ(t − u) be a translationinvariant symmetric positive-definite kernel defined by Φ ∈ C(R) ∩ L 1 (R) and {ϕ m } m∈I an orthonormal basis of L 2 (R). Let h be a function such that |ĥ(ω)| =Φ(ω) 1/2 . Then the functions for m ∈ I form an orthonormal basis of H r (R) and the kernel r has the pointwise convergent expansion Proof. That r is symmetric positive-definite implies thatΦ is real-valued and positive [25,Theorem 6.11]. For a non-vanishing function h such that |ĥ(ω)| =Φ(ω) 1/2 we define a convolution operator H : Note that the convolution theorem yields Hf (ω) =ĥ(ω)f (ω). By the standard characterisation (see [14] or [25,Theorem 10.12]) of the RKHS of a translation-invariant kernel, For any f, g ∈ L 2 (R) the convolution theorem and Plancherel theorem thus give which shows that H is an isometry from L 2 (R) to H r (R). It follows from (1.2) that the inverse Fourier transform defines the inverse of H. Therefore H is an isometric isomorphism and thus maps every orthonormal basis of L 2 (R) to an orthonormal basis of H r (R) [11,Section 2.6].
To obtain the basis functions ψ m in time domain using Theorem 1.1 one has to either compute the convolution ∞ −∞ h(t − τ )φ m (τ ) dτ or the inverse Fourier transform ofĥ(ω)φ m (ω). It is therefore necessary to select a basis of L 2 (R) for which either of these operations can be done in closed form. We use Theorem 1.1 to derive orthonormal expansions for (i) Matérn kernels for all half-integer orders, (ii) the Cauchy kernel (i.e., rational quadratic kernel [19,Equation (4.19)] with α = 1), and (iii) the Gaussian kernel. The expansions are summarised in Section 2.

On Mercer expansions
Let Ω be a subset of R d and w : Ω → [0, ∞) a weight function. The Hilbert space L 2 (Ω, w) is equipped with the inner product and consists of all functions f : R → C for which the corresponding norm is finite. Suppose that the kernel r is continuous and define the integral operator Under certain assumptions, Mercer's theorem [22] states that (i) T r,w has continuous eigenfunctions {φ m } ∞ m=0 and corresponding positive non-increasing eigenvalues {µ m } ∞ m=0 which tend to zero, (ii) {φ m } ∞ m=0 are an orthonormal basis of L 2 (Ω, w), and (iii) { √ µ m φ m } ∞ m=0 is an orthonormal basis of H r (Ω). Consequently, the kernel has the pointwise convergent Mercer expansion While Mercer's theorem and the eigenvalues of T r,w constitute a powerful tool for understanding, for example, optimal approximation in L 2 (Ω, w)-norm (e.g., [7,Section 2.4]) and improved approximation orders in subsets of H r (Ω) [25,Section 11.5], there is often (both in theoretical research and practical applications) no reason to prefer a Mercer expansion (1.4) over a generic RKHS-orthonormal expansion (1.1). For example, the Karhunen-Loève theorem is merely a special case of a more general result that a Gaussian process with covariance kernel r can be expanded in terms of any orthonormal basis of H r (Ω) [1, Chapter III]. It therefore appears unjustified that Mercer expansions should be generally preferred over generic orthonormal expansions.
Constructing a Mercer expansion by first identifying a convenient weight and then finding the eigendecomposition of the integral operator (1.3) can be rather involved, which is illustrated by the construction in [8, Example 2.5] for the Matérn-1 2 kernel. What makes Theorem 1.1 convenient is therefore that it does not require that the expansion be Mercer for some weight. However, identifying a weight w for which the basis function ψ m constructed via Theorem 1.1 are L 2 (R, w)-orthogonal shows that the expansion is Mercer because the L 2 (R, w)-normalised versions of ψ m are the eigenfunctions of T r,w . It turns out that our expansion for the Gaussian kernel is Mercer and the ones for Matérn kernels are "almost" Mercer, in that all but finitely many basis functions are orthogonal in L 2 (R, w) for a certain weight.

Summary of expansions
This section summarises the expansions that we derive using Theorem 1.1. Each expansion converges pointwise for all t, u ∈ R. The parameter λ is a positive scale parameter.

Matérn kernels
Expansions for Matérn kernels are derived in Section 3. A Matérn kernel of order α > 0 is where Γ is the Gamma function and K α the modified Bessel function of the second kind of order α. Let L form an orthonormal basis of the RKHS when the order is α = ν + 1/2 for ν ∈ N 0 . Therefore for all t, u ∈ R. The basis functions ψ −,(ν) m,λ and ψ +,(ν) m,λ are orthogonal in L 2 (R, w ν,λ ) for the weight function w ν,λ (t) = 2λ/|2λt| ν+1 .

Cauchy kernel
Expansions for the Cauchy kernel are derived in Section 4. The Cauchy kernel is The complex-valued Cauchy-Laguerre functions for m ∈ N 0 and the real-valued trigonometric Cauchy-Laguerre functions for m ∈ N 0 form orthonormal bases of the RKHS. Therefore, the Cauchy kernel has the expansions for all t, u ∈ R. Expressions of α m,1/λ and β m,1/λ in terms of real parameters are given in (4.5).

Gaussian kernel
Expansions for the Gaussian kernel are derived in Section 5. The Gaussian kernel is The functions form an orthonormal basis of the RKHS and the kernel has the expansion for all t, u ∈ R. This expansion is a special case of the well-known Mercer expansion of the Gaussian kernel [8, Section 12.

Laguerre functions
The following material is mostly based on Section 2.6.4 in [11] and Section 1.03 in [26]. To derive an orthonormal expansion for the Matérn kernel we use the so-called Laguerre functions ϕ m,λ whose Fourier transforms are given bŷ The functionsφ m,λ form an orthonormal basis of L 2 (R, 1/2π). Because the Fourier transform is an isometry, the Laguerre functions themselves, defined by the inverse Fourier transform are an orthonormal basis of L 2 (R). Let L m for m ∈ N 0 be the mth Laguerre polynomial For non-negative indices m ∈ N 0 the inverse Fourier transform (3.4) is given by The conjugate symmetryφ * −m−1,λ (ω) = −φ m,λ (ω) gives the following expression for negative indices: The Laguerre functions and their Fourier transforms satisfy the scaling laws Furthermore, they satisfy the following useful identities:

Matérn-Laguerre functions
In view of Theorem 1.1, an orthonormal basis for the RKHS of the Matérn kernel r ν,+1/2,λ is obtained from (3.1) and (3.3) in Fourier domain aŝ for m ∈ Z. We call the resulting functions the Matérn-Laguerre functions. They satisfy the scaling laws Like the Laguerre functions, the Matérn-Laguerre functions also satisfy a certain conjugate symmetry property in the sense that Furthermore, by the binomial identity and the shift property of Laguerre functions, the Matérn-Laguerre functions and their Fourier transforms are for m ∈ Z. The Matérn kernel of order ν + 1/2 can therefore be expanded as The following proposition provides a uniform upper bound on the Matérn-Laguerre functions.
Proof. By (3.11) and the binomial identity for Laguerre functions, Apply triangle inequality, the Cauchy-Schwartz inequality, and the orthonormality in L 2 (R, 1/2π) ofφ m,λ to arrive at The asymptotic equivalence as ν → ∞ follows from Stirling's formula.
It appears difficult to improve upon the bound in Proposition 3.1. Consequently, uniform convergence of Matérn-Laguerre expansions on R is likely unattainable.

Classification of Matérn-Laguerre functions
For m ∈ N 0 , a more compact and convenient expression of the Matérn-Laguerre functions (3.10) may be obtained by using the convolution formula in Theorem 1.1. For η ∈ N 0 , the associated Laguerre polynomial L (η) m is defined as The associated Laguerre polynomial L (0) m equals the Laguerre polynomial L m in (3.5). For t > 0, we get from (3.2) and (3.6) that where the last equality follows from a convolution identity for Laguerre polynomials [3, Chapter 6, Problem (3)]. For t < 0, the Laguerre functions ϕ m,1/2 (t) vanish and the convolution evaluates to zero. Consequently, the scaling law (3.8) gives for m ∈ N 0 . This motivates the following notation for the three classes of Matérn-Laguerre functions that comprise an orthonormal basis: For convenience, define the corresponding sets theunion M ν,λ = M − ν,λ ∪ M + ν,λ , and the kernels (3.18) We call the set M 0 ν,λ the null-space and study it in more detail in Section 3.5. For now, note that the null-space functions are supported on R because from (3.10) one can see that for m = 0, . . . , ν the sum that defines ψ (ν) −ν−1+m,λ contains Laguerre functions with both negative and non-negative index. Some of the basis functions are shown in Figures 1 and 4.
The Matérn expansion (3.12) can now be written in terms of these functions and kernels as It is clear that the functions in M − ν,λ are supported on the negative real line and the functions in M + ν,λ on the positive real line. This observation yields the following simplifications: We next show that M ν,λ , M − ν,λ , and M + ν,λ form orthogonal bases with respect to the weight function w ν,λ (t) = 2λ/|2λt| ν+1 .
This justifies saying that the expansions we have derived for Matérn kernels are "almost" Mercer.

Truncation error
in terms of the kernels in (3.18). A few translates of this kernel are displayed in Figure 2. The full Matérn kernel is therefore From Proposition 3.2 we see that the kernel ρ ν+1/2,λ is an element of L 2 (R × R, w ν,λ ⊗ w ν,λ ) and that its squared norm is given by Observe that r ν+1/2,λ,n is a finite expansion of ν + 1 + 2n terms. Some truncations of Matérn kernels are displayed in Figure 3. (r ν+1/2,λ (t, u) − r ν+1/2,λ,n (t, u)) 2 w ν,λ (t)w ν,λ (u) dt du Proof. Firstly, the truncation error is m,λ (t, u) . Using Proposition 3.2, the squared norm of the truncation error is straight-forwardly computed as The sum may be estimated with an integral as where n ≥ 1 was used in the last inequality. This yields the desired upper bound. The asymptotic equivalence for c ν as ν → ∞ follows from Stirling's formula.  For this reason we refer to these functions as the null-space functions. The null space functions have a symmetry property similar to that of the functions M ν,λ given by (3.15) and (3.16).
Proof. Starting from (3.11), using the conjugate symmetry of Laguerre functions, and then changing the order of summation giveŝ which is the Fourier domain symmetry. The time domain symmetry is then obtained from Fourier inversion. The set M 0 1,λ (i.e., ν = 1) consists of the functions The set M 0 2,λ (i.e., ν = 2) consists of the functions Some null space functions are depicted in Figure 4. Unlike the basis functions M + ν,λ depicted in Figure 1, the null space functions are supported on the entire real line. For d = |t − u|, a Matérn kernel can be written as where we have used (3.19) and the fact that the kernel ρ + ν+1/2,λ (t, u) vanishes if t = 0 or u = 0. Upon substitution of the expressions in Example 3.5 we obtain the well-known explicit forms of Matérn kernels in terms of d, such as r 3/2,λ (t, u) = (1 + λd)e −λd and r 5/2,λ (t, u) = 1 + λd + λ 2 d 2 3 e −λd .

Cauchy kernel
The Cauchy kernel and its Fourier transform are The Cauchy kernel is thus a Fourier dual to the Matérn kernel of smoothness index α = 1/2 (i.e., ν = 0). In what follows this will inform the construction of an RKHS basis. To emphasize this relationship, we use the reparametrisation λ = 1/γ. A square-root ofΦ 1/γ (ω) is then given bŷ

Expansion in complex-valued Cauchy-Laguerre functions
In view of the Fourier dualism with the Matérn-1 2 kernel and the fact that the Fourier transform is an isometry from L 2 (R) to L 2 (R, 1/2π), a straight-forward way to construct a suitable basis in L 2 (R, 1/2π) is to pick the Laguerre functions ϕ m,γ (ω) from Section 3.1 and scale them by √ 2π. These L 2 (R) basis functions and (4.2) yield the RKHS basis functionŝ in the Fourier domain. Because their inverse Fourier transforms are complex-valued, we call these functions the complex-valued Cauchy-Laguerre functions. For m ∈ N 0 , Fourier inversion gives Similarly, for negative indices we get To summarise, the complex valued Cauchy-Laguerre functions are (4.3b) They have the conjugate symmetry property An expansion of the Cauchy kernel (4.1) in terms of complex-valued Cauchy-Laguerre functions is thus given by This expansion is remarkably easy to verify since geometric summation and conjugate symmetry yield which indeed is the Cauchy kernel. An appropriate L 2 (R, w) space in which the complex-valued Cauchy-Laguerre functions form a complete orthogonal set remains elusive to us. However, just as with the Matérn-Laguerre expansions in Section 3, the present expansion is very good at origin since all but two terms vanish:

Expansion in trigonometric Cauchy-Laguerre functions
It would be desirable to obtain a real-valued basis for the Cauchy RKHS. This can be done by scaling the real and imaginary parts ofφ m,λ [4]. The resulting trigonometric Laguerre functions, which form an orthonormal basis of L 2 (R), are given bŷ in the Fourier domain. These functions are so named because they resemble the trigonometric functions by being even and odd, respectively. Explicit expressions are [4, Section 3] In time domain, the functions are which have the more compact expressions The results in Section 4.1 give the RKHS basis functions for m ∈ N 0 , where ψ m,γ are the complex-valued Cauchy-Laguerre functions in (4.3). We call the functions α m,γ and β m,γ the trigonometric Cauchy-Laguerre functions. These functions can be computed by taking the real and imaginary parts of √ 2 ψ m,γ (t). The binomial theorem yields the explicit expressions which can be transformed into expressions of only real parameters by considering even and odd m separately. This yields

Gaussian kernel
The Gaussian kernel and its Fourier transform are so that taking the inverse Fourier transform gives the function h in Theorem 1.1 as h λ (t) = 2 1/4 π −1/4 √ λe −λ 2 t 2 .

Expansion for the Gaussian kernel
As an orthonormal basis of L 2 (R) we use the Hermite functions (for them being an orthonormal basis, see [23,Theorem 5.7 Here H m is the mth physicist's Hermite polynomial given by By Theorem 1.1, the functions form an orthonormal basis of the RKHS of the Gaussian kernel (5.1). Equation (17) in Section 16.5 of [6] states that for any reals s and a. Completing the square, doing a change of variables, and using this equation yields We thus obtain the basis functions  and δ 2 = α 2 2 (β 2 − 1).
The Mercer expansion of the Gaussian kernel with respect to the weight function on the real line is where µ m,λ,α = α 2 α 2 + δ 2 + λ 2 /2 λ 2 /2 α 2 + δ 2 + λ 2 /2 m are the eigenvalues and the L 2 (R, w α )-orthonormal eigenfunctions of the integral operator in (1.3). By requiring that αβ = 2λ/ √ 3, so that the Hermite polynomials appearing in (5.5) and (5.8) have the same scaling, it is straight-forward to solve that which shows that the basis (5.5) is a special case of the Mercer basis. Results of some of the above computations are collected in the following proposition. form an orthonormal basis of L 2 (R, w α ).
Although the Mercer expansion (5.7) has been known for some time, apparently originating in [28,Section 4], all its derivations in the literature that we are aware of are based on Mehler's formula and integral identities for Hermite polynomials (the only detailed derivation that we know of is given in [8, Section 12.2.1]). The expansion (5.6) is therefore the first Mercer expansion for the Gaussian kernel that has been derived from some general principle, which in this case is Theorem 1.1, instead of utilising ad hoc calculations. The relative simplicity of the basis functions (5.5) and the fact that the Hermite functions (5.3) have the same exponential decay as the kernel suggest that the choice α = √ 2/3 λ for the standard deviation of the Gaussian weight w α may be in some sense the most natural one. More discussion on the selection of α may be found in [9, Section 5.3].

Truncation error
Define the truncated kernel for any n ∈ N. A few truncations are shown in Figure 8. The truncated kernel converges to the full Gaussian kernel r λ pointwise on R × R. The following proposition shows that the convergence of (5.9) to r λ is exponential in L 2 (R × R, w α ⊗ w α ). This completes the proof.

Conclusion
In this article, we have demonstrated that Theorem 1.1 is a simple and powerful tool for constructing orthonormal expansions of translation-invariant kernels. In particular, using the Cholesky factor of the Fourier transform of the kernel together with the Laguerre functions led to an interesting decomposition of the RKHS of the Matérn kernel for half-integer smoothness parameters in terms of a finite dimensional space and a Hilbert space of functions vanishing at origin. This might be deemed unsatisfying, and a possible avenue to obtaining basis functions for Matérns in a common space would be to investigate constructions based on the symmetric square-root. The expansion for the Cauchy kernel was derived from the Fourier duality with the Matérn kernel of smoothness α = 1/2. It remains an open problem to find a weighted L 2 space in which the Cauchy basis functions are orthogonal. For the Gaussian kernel, our construction is a means to reproduce certain Mercer expansions that are typically derived from Mehler's formula.
Observe that in each case we have used a basis of L 2 (R) that is "compatible" with the kernel, in being parametrised by λ in a suitable way that mirrors how this parameter appears in the kernel or its Fourier transform. For example, compare (3.1) and (3.3) or (5.1) and (5.3). It may be possible to replace λ with an arbitrary positive κ in the basis of L 2 (R) and base constructions on the functions ϕ m,κ . We do not know how to proceed with Matérns or the Cauchy kernel, but for the Gaussian kernel one can proceed exactly as in Section 5.1. Doing so yields the basis functions ψ m,λ,κ (t) = where a = √ λ 2 + κ 2 /2, for any κ ∈ (0, √ 2λ).