Functional Penalised Basis Pursuit on Spheres

In this paper, we propose a unified theoretical and practical spherical approximation framework for functional inverse problems on the hypersphere. More specifically, we consider recovering spherical fields directly in the continuous domain using functional penalised basis pursuit problems with gTV regularisation terms. Our framework is compatible with various measurement types as well as non-differentiable convex cost functionals. Via a novel representer theorem, we characterise their solution sets in terms of spherical splines with sparse innovations. We use this result to derive an approximate canonical spline-based discretisation scheme, with vanishing approximation error. To solve the resulting finite-dimensional optimisation problem, we propose an efficient and provably convergent primal-dual splitting algorithm. We illustrate the versatility of our framework on real-life examples from the field of environmental sciences.


Spherical Approximation: An Overview
Many scientific inquiries in natural sciences, such as environmental and planetary sciences [17,43,79], acoustics [54] or astronomy [47,51,63], involve approximating a spherical field (a scalar quantity such as a function or measure defined over a continuum of directions) from a finite number of measurements acquired by probing sensors. During the reconstruction task, the physical evidence is compared to some prior model of the unknown spherical field, reflecting the analyst's a priori beliefs about the latter. In practice, a trade-off between fidelity to the data and compliance with this prior is assessed via a composite convex optimisation problem, formed as a linear combination of a cost functional and a regularisation term. Popular regularisation strategies include generalised Tikhonov (gTikhonov) or generalised total variation (gTV) [34], which favour physically admissible spherical fields with smooth and sharp variations respectively. Since spherical fields encountered in nature are continuous and hence have infinitely many degrees of freedom, scientists often constrain the approximation problem using domain discretisation schemes, which help reduce the number of degrees of freedom to something more manageable, ideally comparable to the size of the available data. For Euclidean domains, it is for example common practice to approximate the continuum by means of discrete uniformly distributed point sets, typically forming regular rectangular grids.1 The popularity of such domain discretisation schemes can be primarily explained by their simplicity and computational convenience. Indeed, signals defined over rectangular grids admit a natural representation as multi-dimensional arrays, a data structure commonly used in computer science for computation, storage and visualisation purposes.
Unfortunately, the manifold structure of the sphere makes it much more difficult to discretise by means of uniformly distributed point sets. For example, points gridded regularly on the azimuth-elevation domain [0, 2π] × [−π/2, π/2] are highly non-uniformly distributed at the surface of the sphere, with a much higher concentration of points near the poles (see Fig. 1a). As a matter of fact, uniform spherical point sets are only known [54, Chapter 3] for fixed numbers of points: 4, 6, 8, 12 and 20. They are respectively obtained from the vertices of the five Platonic solids: the tetrahedron, octahedron, cube, icosahedron and dodecahedron. For arbitrary numbers of points, spherical point sets with quasi-uniform distribution have been proposed [33,36,54]. The spherical Voronoi diagrams of the latter typically tile the sphere with near-regular polygonal tiles,2 see for example Fig. 1. Figs. 1a to 1d show examples of non-uniform (Fig. 1a) and quasi-uniform (Figs. 1b to 1d) spherical point sets (marked by black dots). Fig. 1e, on the other hand, shows an example of parametric discretisation by means of bell-shaped zonal basis functions. The equal-angle point set in Fig. 1a is obtained by gridding the azimuth-elevation domain. The point set in Fig. 1b is called the Fibonacci lattice, and can be generated as explained in [32]. The point sets in Figs. 1c and 1d are obtained from the cell centroids of the cubic and HEALPix spherical tessellations respectively. The cubic tessellation is obtained by projecting the pixelated faces of a cube onto the sphere. The HEALPix tessellation, very popular in cosmology and astronomy, is constructed by hierarchically subdividing the faces of a dodecahedron [33]. Unfortunately, quasi-uniform spherical point sets are significantly more complicated to work with, as they are not easily represented by array-like data structures.
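The Fibonacci lattice mentioned above can be generated in a few lines. The sketch below uses the standard golden-angle spiral construction; the exact offsets and variants differ across references such as [32], so this is an illustrative recipe rather than the canonical one.

```python
import numpy as np

def fibonacci_lattice(n: int) -> np.ndarray:
    """Quasi-uniform point set of n points on S^2 (golden-angle spiral)."""
    golden_angle = np.pi * (3.0 - np.sqrt(5.0))   # ~2.39996 rad
    i = np.arange(n)
    z = 1.0 - 2.0 * (i + 0.5) / n                 # heights, uniform in [-1, 1]
    rad = np.sqrt(1.0 - z**2)                     # radius of each z-slice
    az = golden_angle * i                         # azimuths via the golden angle
    return np.stack([rad * np.cos(az), rad * np.sin(az), z], axis=1)

pts = fibonacci_lattice(500)
```

Because consecutive azimuths differ by an irrational multiple of 2π, the points never align in rings, which is what produces the quasi-uniform coverage.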
Moreover, derivatives and, more generally, pseudo-differential operators are difficult to approximate on quasi-uniform spherical point sets [12,21], making it cumbersome to work with gTikhonov and gTV priors. The difficulty of designing domain discretisation schemes for the sphere has led scholars to consider alternative parametric discretisation schemes, where the unknown spherical field is constrained to a finite-dimensional functional space, typically spanned by zonal basis functions3 [27,31,37,40,49], i.e. functions with rotational invariance around a particular central direction on the sphere. The majority of zonal basis functions used in practice take the form of positive, smooth, bell-shaped functions, sharply decaying to zero as the angular distance from their central direction increases (see Fig. 1e). They moreover possess many useful properties, particularly convenient for practical purposes:
• They are identical, spatially localised and highly symmetric, and thus easy to evaluate and amenable to sparse, parallel computations.
• Their overlapping supports and strong regularity make them well suited for approximating smooth natural phenomena.
• Their centres can be positioned arbitrarily at the surface of the sphere, permitting, for example, the concentration of more zonal basis functions in regions more likely to contain signal.
• Strictly positive definite zonal basis functions [31] are all linearly independent, irrespective of the chosen centres [7]. This guarantees a non-redundant representation and limits the risk of numerical instability.
• They are particularly well suited for scattered data interpolation problems [27,37,49], where the spatial samples to interpolate may be non-uniformly distributed.
This last fact is probably the main reason for their wide adoption in the literature. As a matter of fact, some zonal basis functions are not merely well suited but canonical to spherical scattered data interpolation.
This is notably the case for a specific type of zonal functions, called spherical splines [48], which arise naturally as solutions of interpolation problems on the sphere. As an illustration, consider the simplest interpolation problem where one wishes to find all maximally smooth functions with prescribed values {y_1, . . . , y_L} ⊂ R at directions {r_1, . . . , r_L} ⊂ S^2. The relevant notion of smoothness is of course application dependent, but is generally enforced by seeking an interpolant with minimal generalised Tikhonov (gTikhonov) norm, induced by some linear, self-adjoint and strictly positive definite pseudo-differential operator D. In mathematical terms, the prototypical interpolation problem can be formulated as:
V = arg min_{f ∈ H_D} { ‖Df‖^2 such that f(r_i) = y_i, i = 1, . . . , L } , (1.1)
where the search space H_D is an appropriately chosen reproducing kernel Hilbert space (RKHS) so that all the quantities involved in (1.1) are well-defined. It is then possible to show that there exists a unique maximally smooth interpolant, i.e. V = {f}, and that the latter has exactly L degrees of freedom. Moreover, the maximally smooth interpolant f can be expressed [48, Section 6.3] as a spherical spline4 with knots coinciding with the sampling directions {r_1, . . . , r_L} ⊂ S^2. The L spline weights can moreover be recovered by solving a square linear system [48, Section 6.3]. This result is quite remarkable, since it provides us with a canonical discretisation scheme operating in a lossless fashion: the infinite-dimensional optimisation problem (1.1) is transformed into an equivalent finite-dimensional optimisation problem, amenable to numerical optimisation. Theorem 6.40 in [48, Section 6.4.2] generalises5 this result to the smoothing spline approximation problem (1.2), used as an alternative to (1.1) in the context of noisy spatial samples, since it is less prone to overfitting.
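The "square linear system" step above can be made concrete with a small numerical sketch. We interpolate scattered samples on S^2 with an expansion f = Σ_i α_i g(⟨·, r_i⟩); here g is an illustrative strictly positive definite zonal kernel (a Gaussian of the chord distance), standing in for the Green function of the operator D, and the sampled field y is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def zonal_kernel(t, eps=5.0):
    # Illustrative strictly positive definite zonal kernel on S^2: a Gaussian
    # of the squared chord distance 2 - 2t, with t = <r, s>.
    return np.exp(-eps * (2.0 - 2.0 * t))

# L scattered sampling directions r_i and prescribed values y_i
L = 30
dirs = rng.normal(size=(L, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
y = np.cos(3.0 * dirs[:, 2])                  # synthetic smooth samples

G = zonal_kernel(dirs @ dirs.T)               # Gram matrix G_ij = g(<r_i, r_j>)
alpha = np.linalg.solve(G, y)                 # spline weights: square L x L system

def interpolant(r):
    """Evaluate f = sum_i alpha_i g(<., r_i>) at directions r (rows)."""
    return zonal_kernel(r @ dirs.T) @ alpha
```

Strict positive definiteness of the kernel guarantees the Gram matrix is invertible for any choice of distinct centres, mirroring the linear-independence property discussed above.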
Unfortunately, both problems (1.1) and (1.2) are too restrictive for most spherical approximation tasks encountered in practice. This is for example the case when the measurements are corrupted by non-Gaussian noise (hence requiring a more general cost functional than a quadratic one) or do not consist of directional samples of the spherical field, but rather of local averages or, more generally, filtrations of the latter. In addition, gTikhonov-regularised optimisation problems à la (1.1) and (1.2) suffer from two main drawbacks: they tend to produce overly smooth interpolants and are too sensitive to the sampling locations {r_1, . . . , r_L} ⊂ S^2. This is a general behaviour of smooth spline approximation, even in the Euclidean setting [34], which can be explained in part by the fact that the gTikhonov regularising norm is a weighted L^2 norm. Indeed, the latter favours functions with relatively smooth variations, rendering the spline interpolant incapable of adapting to rapid changes in the data. To overcome this limitation, scholars have, motivated by empirical studies [73], advocated the use of generalised total variation (gTV) regularisation norms, which promote functions with relatively sparse but potentially sharp variations, as often encountered in natural phenomena. However, representer theorems [10,11,24,34,70] characterising the form of the solutions yielded by such regularisation strategies are, to date, ill-adapted to spherical geometries (see Section 1.3 for a discussion of prior art). For this reason, and in contrast with most fields of signal processing, total variation based penalties remain largely unexplored in spherical setups. As explained in greater detail in Section 1.2, one of the goals of this work is to close this theoretical gap and promote the wider use of such recovery methods across the community of geomathematicians.

Contributions and Organisation of the Manuscript
A primary goal of this paper is to offer a unified theoretical and practical approximation framework for gTV-regularised infinite-dimensional inference on the hypersphere S^{d−1} of arbitrary dimension d ≥ 2. To make the content of this work accessible to a wider audience, care has been taken to thoroughly interpret and analyse the stated results and their assumptions, with a particular focus on their practical implications. Multiple real-life examples from environmental sciences and radio astronomy are moreover considered. The main contributions of this manuscript are listed hereafter and classified into three categories: theory, algorithms and applications.
Theory Our main theoretical contribution is a representer theorem for functional penalised basis pursuit (FPBP) problems on the hypersphere, established and proved in Section 4. FPBP problems essentially seek spherical functions, measures or distributions minimising an optimal trade-off between a convex cost functional and a gTV penalty term. They are well suited for solving functional linear inverse problems on the hypersphere. To accommodate various measurement types, this class of optimisation problems is built upon a convenient generalised sampling framework, modelling the measurement process in terms of sampling linear functionals. Our representer theorem (Theorem 2) shows that the solution sets of FPBP problems can be characterised geometrically as the (weak*) closed convex hull of their extreme points. Moreover, these extreme points take the form of spherical splines with sparse innovations (i.e. fewer degrees of freedom than the total number of measurements) and unknown knots. The splines are canonically associated with the pseudo-differential operator D intervening in the gTV regularisation term. A key assumption of Theorem 2 is that the pseudo-differential operator D is invertible. With this assumption, Theorem 2 can be obtained as a corollary of Lemma 1, an abstract representer theorem pertaining to penalised convex optimisation in abstract Banach spaces. This result, which generalises [68, Theorem 6] to the case of non-strictly convex cost functionals, is based on [68, Theorem 5], [34, Proposition 8] and [10, Theorem 3.1]. Note that assuming the regularising pseudo-differential operator D to be invertible does not overly restrict the applicability of Theorem 2 in practice. Indeed, we show that non-invertible pseudo-differential operators such as the Laplace-Beltrami operator ∆_{S^{d−1}} can be brought into the scope of Theorem 2 if properly regularised on their nullspaces.
Finally, we provide a sufficient condition for sampling linear functionals to be compatible with a given regularising operator D. More precisely, we show that choosing the sampling linear functionals in the predual of the search space yields well-defined FPBP problems. This allows us, in particular, to show that most practically interesting sampling functionals are compatible with the class of pseudo-differential operators considered in this paper, including atomic Dirac measures -convenient for spatial sampling-or square-integrable functions.

Representer Theorems in the Literature
In this section, we review the most notable representer theorems proposed in the literature for gTV regularisation, and discuss their limitations in the context of spherical approximation. A summary of this section is provided in Table 1.
Inspired by the pioneering work of Fisher and Jerome [23], Unser et al. have investigated, in a series of papers [18-20,34,70], gTV-regularised functional inverse problems over R^d. Like Theorem 2, they consider a generalised sampling framework, compatible with a great variety of linear measurements. Unlike Theorem 2, however, they do not assume that the regularising pseudo-differential operator is invertible, and allow it to have a finite-dimensional nullspace. The conclusions of their representer theorem in [70], derived for Euclidean domains only, are analogous to those of Theorem 2, proposed in this paper for spherical domains. In subsequent publications [19,20,34], the authors proposed canonical discretisation schemes as well as numerical algorithms for approximating extreme-point solutions of gTV-penalised convex optimisation problems. In [34], they consider a discretisation based on cardinal splines of the gTV pseudo-differential operator, with uniform knots chosen over a dense grid. In [20], they propose a more numerically stable discretisation scheme, based this time on multi-resolution B-splines with refinable grid sizes. In both cases they solve the resulting discrete optimisation problem with a two-stage procedure leveraging proximal gradient descent and the simplex algorithm. In the specific case where the domain is R and the regularising operator is the second derivative, Debarre et al. describe in [19] an even simpler reconstruction algorithm. In contrast with the proximal algorithms proposed in this paper, however, the optimisation pipelines of these works are limited to differentiable cost functionals with Lipschitz-continuous derivatives. While remarkably generic, their spline approximation framework is only valid for functions defined over R^d, and cannot be used in the spherical setting.
In a subsequent work [24], Flinth et al. proposed an alternative proof of the representer theorem of [70]. Their proof is based on a limit argument, considering nested finite-dimensional discretisations of the domain Ω ⊂ R^d based on finer and finer uniform rectangular grids. They claim that such an approach, presented in the Euclidean case for the sake of simplicity, could easily be adapted to domains more general than R^d, such as the torus or any separable, locally compact topological space. They specify however that such an extension would require adequately modifying the discretisation scheme to the specific geometry of the domain, without giving additional details on how this could be achieved canonically. Unfortunately, such a task may be very complex, if possible at all, for geometries such as the sphere. Indeed, discretising the sphere by means of nested quasi-uniform point sets with finer and finer resolution is, as previously discussed, a nontrivial problem. More recently, Boyer et al. [10] and Bredies et al. [11] have independently shown that the solutions to infinite-dimensional optimisation problems with convex regularisers are convex combinations of extreme points of the regulariser's level sets. This result applies notably to gTV regularisers with not only scalar but also vector pseudo-differential operators such as the gradient. This is in contrast with the previously cited works, which were all limited to scalar pseudo-differential operators such as the Laplace-Beltrami operator. While theoretically applicable to spherical geometries, their result neither addresses existence conditions nor characterises the minimal search space (and its corresponding predual) associated with a given gTV norm. This is problematic for practical purposes, where it is crucial to know whether a given optimisation problem admits a solution, or to understand which sampling linear functionals are compatible with a specific choice of gTV penalty.
Finally, Unser [68] established a Banach representer theorem with very broad applicability. Unlike the previously cited results, this representer theorem relies on the notion of duality map, which generalises the Hilbert notion of Riesz map to Banach spaces. More precisely, it shows that the solutions of convex regularised inverse problems are contained in the image by a certain duality map of a linear combination of the sensing linear functionals. As acknowledged by the author, this result is however of limited use in the context of gTV regularisation, since the duality map is unknown, nonlinear and set-valued.

Notations and Terminology
Throughout the manuscript, we adopt the following conventions:
• We use the term spherical field to refer, depending on the context, to functions, measures or generalised functions [71] defined over the sphere S^{d−1} for any dimension7 d ≥ 2. In full generality, one should think of a spherical field as an element f ∈ B of some infinite-dimensional Banach space.
• It is traditional to call the 1-sphere S^1 ⊂ R^2 a circle, the 2-sphere S^2 ⊂ R^3 a sphere and the (d−1)-sphere S^{d−1} ⊂ R^d a hypersphere. For the sake of simplicity, we break with tradition and use the appellation "sphere" agnostic to the underlying dimension. Moreover, we denote by a_d the area of the unit sphere S^{d−1}, d ≥ 2, given in general by a_d = 2π^{d/2}/Γ(d/2). We have notably a_2 = 2π and a_3 = 4π.
• Vectors and matrices are written in bold face, in an attempt to make finite-dimensional quantities more apparent.
• The adjoint,8 Moore-Penrose pseudo-inverse, range and nullspace of a linear operator Φ are denoted by Φ*, Φ†, R(Φ) and N(Φ) respectively. Finally, for scalars z ∈ C, we denote by z̄, |z|, R(z), I(z) the conjugate, modulus, real part and imaginary part of z respectively.
• A function F : C^N → R ∪ {+∞} is said to be convex if F(λx + (1 − λ)y) ≤ λF(x) + (1 − λ)F(y) for all x, y ∈ C^N and λ ∈ [0, 1], and strictly convex if the inequality is strict for x ≠ y and λ ∈ (0, 1). If moreover F(x) > −∞ for all x ∈ C^N and D = {x ∈ C^N : F(x) < +∞} ≠ ∅, then F is called a proper convex function.9
• Let (X, T) be a topological space. A function F : X → R ∪ {−∞, +∞} is said to be lower semi-continuous (lwsc)10 at x_0 ∈ X if for every y < F(x_0) there exists a neighbourhood U of x_0 such that F(x) > y for all x ∈ U. A function is lwsc if and only if all of its lower level sets {x ∈ X : F(x) ≤ y}, y ∈ R, are closed in T. When X is a metric space, we assume the metric topology as the underlying topology and do not specify it explicitly.
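Lower semi-continuity can be illustrated with a minimal numerical sketch; the two step functions below are arbitrary illustrative choices, not objects used later in the paper.

```python
import numpy as np

# F(x) = 0 for x <= 0, 1 otherwise: lwsc at 0, since every lower level set
# {F <= y}, y in [0, 1), equals the closed half-line (-inf, 0].
F = lambda x: np.where(x <= 0, 0.0, 1.0)

# G(x) = 1 for x <= 0, 0 otherwise: NOT lwsc at 0, since {G <= 0} = (0, +inf)
# is open; approaching 0 from the right, G drops below G(0) = 1.
G = lambda x: np.where(x <= 0, 1.0, 0.0)

x_right = 1.0 / np.arange(1, 100)     # a sequence tending to 0 from the right
liminf_F = F(x_right).min()           # = 1 >= F(0) = 0: consistent with lwsc
liminf_G = G(x_right).min()           # = 0 <  G(0) = 1: violates lwsc
```

The level-set criterion and the liminf behaviour agree, as the equivalence stated above requires.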

Preliminaries
This section briefly reviews the key ingredients of our approximation framework, borrowed from the fields of functional analysis, measure theory and Fourier analysis on the hypersphere respectively. We begin our discussion in Section 2.1 with the concept of duality in topological vector spaces, central to the process of generalised sampling introduced in Section 4.1. In Section 2.1.4, we define and provide a dual characterisation of the total variation norm for regular Borel measures. The latter generalises the discrete ℓ1 norm to continuous setups, and will be used in our approximation framework as a sparsity-promoting regularisation norm. Finally, we conclude in Section 2.2 with important notions from Fourier analysis on the hypersphere, namely spherical harmonics and spherical zonal functions, which will help us define the class of spline-admissible pseudo-differential operators in Section 3.

Duality in Topological Vector Spaces
In order to approximate a spherical field, modelled here as a generic element f of a vector space B, one must first collect evidence of the latter, often by sensing it via a linear acquisition device. As we shall see in Section 4.1, these linear measurements can in general be modelled as the outcomes of a collection of device-specific linear functionals acting on the object f of interest. In this section, we investigate the structure of the space of all linear functionals associated with a given vector space B, with a special focus on those that yield well-defined measurements when acting on any element f ∈ B.

Schwartz Duality Product
For a vector space B over a scalar field C, the space of all linear functionals f : B → C is a vector space called the algebraic dual and is denoted by B*. It is customary to write the action of a linear functional f ∈ B* onto an element h ∈ B by means of a bilinear map ⟨·|·⟩ : B* × B → C called the Schwartz duality product, defined as:
⟨f|h⟩ := f(h), ∀f ∈ B*, h ∈ B. (2.1)
The bra-ket notation used in (2.1) to denote the Schwartz duality product is common in quantum mechanics and was introduced by Paul Dirac in 1939. Its resemblance to the inner product is not fortuitous, and is motivated by Hilbert space theory. Indeed, the Riesz-Fréchet representation theorem [25] states that, for a Hilbert space H, every continuous linear functional on H can be written as an inner product with some unique element g of H, i.e. ∀f ∈ H′, ∃!g ∈ H such that ⟨f|h⟩ = ⟨h, g⟩_H for all h ∈ H.

Topological Dual
Any sensible acquisition system should react continuously to variations in its input. It seems hence reasonable to require that the linear functionals modelling it be continuous as well. The subset of continuous linear functionals in the algebraic dual is a linear subspace, called the topological dual11 and denoted by B′. In infinite dimensions, not all linear functionals are guaranteed to be continuous, so we have in general B′ ⊂ B*. Moreover, since continuity is a topological notion, there can exist multiple topological duals of a given space B depending on the topology chosen on the latter. In the special case where (B, ‖·‖) is a Banach space equipped with its canonical normed topology, continuous linear functionals can be characterised as:
B′ = { f ∈ B* : |||f||| := sup_{h ∈ B, ‖h‖ ≤ 1} |⟨f|h⟩| < +∞ }. (2.2)
The norm |||·||| : B′ → R_+ used in (2.2) makes B′ a Banach space and is called the dual norm induced by the norm ‖·‖ on B. In plain words, (2.2) states that continuous linear functionals are bounded12 linear functionals in B*. The latter will hence produce bounded measurements, well-defined for any input f ∈ B.
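The supremum in (2.2) can be computed concretely in finite dimensions. The sketch below evaluates the dual norm of the ℓ1 norm on R^3 by maximising over the extreme points of the ℓ1 unit ball (a linear functional attains its supremum over a compact convex set at an extreme point, the same geometric fact underlying the representer theorems discussed in this paper); the vector f is an arbitrary example.

```python
import numpy as np

def dual_norm(f, extreme_points):
    # A continuous linear functional attains its supremum over a compact
    # convex unit ball at an extreme point of that ball.
    return max(abs(float(e @ f)) for e in extreme_points)

# Unit ball of the l1 norm on R^3: its extreme points are +/- e_i
n = 3
extremes = [s * np.eye(n)[i] for i in range(n) for s in (1.0, -1.0)]

f = np.array([1.5, -3.0, 0.5])
# |||f||| = sup_{||h||_1 <= 1} |<f|h>| coincides with the sup-norm ||f||_inf
```

This recovers the classical duality between the ℓ1 and ℓ∞ norms, the finite-dimensional analogue of the (C, M) duality used in Section 2.1.4.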

Vocabulary 1 (Predual and Duality Pair).
A topological vector space B and its topological dual B are said to form a duality pair. Moreover, B is called the predual of B .

Weak * and Strong Topologies on the Topological Dual
Throughout, we will sometimes have cause to define a topology on the topological dual B′. Two popular choices are the strong topology and the weak* topology. The strong topology, or topology of uniform convergence, is the Banach topology induced by the dual norm defined in (2.2). With this topology, the closed balls are necessarily not compact when B is infinite dimensional. This renders the strong topology cumbersome to work with. For this reason, we will prefer the weak* topology, or topology of pointwise convergence, which does not suffer from similar issues (see the Banach-Alaoglu theorem [60, p. 68]). It is the coarsest topology on B′ such that elements h ∈ B are continuous functionals on B′. It is induced by the family of seminorms {p_h, h ∈ B}, where p_h(f) := |⟨f|h⟩| for every f ∈ B′. Convergence with respect to the weak* topology is pointwise: indeed, {f_n, n ∈ N} ⊂ B′ converges towards a limit functional f* ∈ B′ with respect to the weak* topology if and only if ⟨f_n|h⟩ → ⟨f*|h⟩ for every h ∈ B.
Throughout, we will employ expressions such as "weak * compact", "weak * closed" or "weak * convergent" when it is important to make obvious the underlying topology with respect to which the topological notions should be understood.
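The gap between the two topologies can be seen on a classical example, sketched numerically below: Dirac measures δ_{t_n} with t_n → t converge weak* to δ_t, yet stay at constant TV distance 2 from it. The test function is an arbitrary continuous choice.

```python
import numpy as np

# Dirac measures delta_{t_n}, t_n -> 0, converge weak* to delta_0, since
# <delta_{t_n} | h> = h(t_n) -> h(0) for every continuous h. They do NOT
# converge strongly: ||delta_{t_n} - delta_0||_TV = 2 whenever t_n != 0.
t_n = 1.0 / np.arange(1, 1000)               # a sequence of knots tending to 0

h = np.cos                                   # an arbitrary continuous test function
pointwise_gap = np.abs(h(t_n) - h(0.0))      # |<f_n - f* | h>| -> 0
tv_gap = np.full_like(t_n, 2.0)              # the TV distance never decreases
```

This is precisely why weak* compactness (Banach-Alaoglu) is available where strong compactness fails.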

Duality Pairs for Common Functional Spaces
In this section we provide well-known duality pairs for functional spaces of interest.

Schwartz Functions and Generalised Functions
The space of generalised functions or distributions is essentially the largest functional space that can be defined on S^{d−1}. It contains as subspaces the Lebesgue spaces as well as the spaces of continuous functions and regular Borel measures. It is denoted S′(S^{d−1}) and is defined as the topological dual of the space of Schwartz functions13 S(S^{d−1}) = C^∞(S^{d−1}), equipped with the metric topology generated by the family of norms {‖(Id − ∆_{S^{d−1}})^k · ‖_∞, k ∈ N}, where ∆_{S^{d−1}} is the Laplace-Beltrami operator on S^{d−1}. The Schwartz space is a locally convex Fréchet space14.
It is not normable, and in particular not complete with respect to the supremum norm (indeed, it is dense in the space of continuous functions). Note that since the metric topology on S(S^{d−1}) is not induced by a norm15, it is not possible to define a strong topology on S′(S^{d−1}). We will hence always assume the weak* topology as the canonical topology on S′(S^{d−1}).

Continuous Functions, Measures and Total Variation Norm
The Riesz-Markov representation theorem (see [64, Theorem 2.5] for a statement of the latter in the spherical setup) establishes the duality pair between the space of continuous functions equipped with the supremum norm and the space of regular Borel measures equipped with the total variation (TV) norm [64, Definition 2.2]. The TV norm can be thought of as an L^1 norm for measures,16 which motivates its use as a sparsity-inducing regularisation norm in the penalised convex optimisation problems involving measures considered in this paper. From (2.2) and the Riesz-Markov representation theorem, we moreover get the following dual characterisation of M(S^{d−1}):
M(S^{d−1}) = (C(S^{d−1}))′, with ‖μ‖_TV = sup { |⟨μ|f⟩| : f ∈ C(S^{d−1}), ‖f‖_∞ ≤ 1 }.
13 The hypersphere being compact and bounded, Schwartz functions simply reduce to infinitely smooth functions.
14 Fréchet spaces generalise Banach spaces with metrics that do not originate from norms.
15 It is possible to show that S(S^{d−1}) is not normable [71].
16 As a matter of fact, the TV norm of a measure absolutely continuous w.r.t. the Lebesgue measure is given by the L^1 norm of its density (or Radon-Nikodym derivative).
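For a discrete (sparse) measure, the TV norm reduces to the ℓ1 norm of its innovation weights, which the dual characterisation above recovers exactly; the weights below are an arbitrary example.

```python
import numpy as np

# A sparse measure mu = sum_i alpha_i delta_{r_i} on the sphere.
alpha = np.array([2.0, -0.5, 1.5])

# TV norm computed directly as the l1 norm of the innovation weights:
tv_direct = np.abs(alpha).sum()

# Dual characterisation: sup over continuous |f| <= 1 of <mu | f>, attained
# by any continuous f interpolating f(r_i) = sign(alpha_i) at the knots:
f_at_knots = np.sign(alpha)
tv_dual = float(alpha @ f_at_knots)
```

This equivalence is what makes the TV norm the continuous-domain surrogate of the discrete ℓ1 penalty used in basis pursuit.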

Fourier Analysis on the Hypersphere
The class of spherical pseudo-differential operators introduced in Section 3 is defined implicitly in the Fourier domain. In this section, we hence introduce the basic mathematical machinery needed to perform Fourier analysis on the hypersphere. The material presented in this section is based on the formalism adopted in [37,48,54]. Useful additional results are also available in [64, Chapter 3].

Spherical Harmonics
One possible route towards defining the Fourier basis on the hypersphere is to proceed analogously to Fourier and study fundamental solutions of the heat differential equation on S d−1 .

Definition 1 (Spherical Harmonics).
Let λ_n := −n(n + d − 2), n ∈ N, denote the eigenvalues of the Laplace-Beltrami operator ∆_{S^{d−1}}. We call spherical harmonic of order n any eigenfunction Y in the eigenspace Harm_n(S^{d−1}) associated to the eigenvalue λ_n:
∆_{S^{d−1}} Y = λ_n Y.
Moreover, we denote by B_n := {Y_n^m, m = 1, · · · , N_d(n)} any orthonormal basis of Harm_n(S^{d−1}), where N_d(n) is the geometric multiplicity of the eigenvalue λ_n.
Theorem 1. Every function f ∈ L^2(S^{d−1}) admits a spherical Fourier expansion given by
f = Σ_{n=0}^∞ Σ_{m=1}^{N_d(n)} f̂_n^m Y_n^m,
where the spherical Fourier coefficients {f̂_n^m} ⊂ C of f are given by the spherical harmonic transform (SHT):
f̂_n^m = ∫_{S^{d−1}} f(r) Ȳ_n^m(r) dr.

Remark 2 (Fully Normalised Spherical Harmonics).
Note that the spherical harmonics Y_n^m in Definition 1 and Theorem 1 are not uniquely specified, since there exist infinitely many orthonormal bases B_n of Harm_n(S^{d−1}). In practice, the convention is to work with the system of so-called fully normalised spherical harmonics (FNSH),18 obtained inductively by the method of separation of variables applied to the eigenvalue problem for the Laplace-Beltrami operator ∆_{S^{d−1}}. This construction is detailed in [48, Section 5.2] for the case d = 2 and in [65] for the general case. In all that follows, we will always assume the fully normalised spherical harmonics as the canonical Fourier basis. Closed-form analytical expressions of the fully normalised spherical harmonics for d = 2, 3 are provided in [64, Example 3.1]. Formulae for the more general case d > 3 are available in [65].
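For d = 3 the closed forms referenced above can be implemented via associated Legendre functions; the sketch below uses one common normalisation convention (conventions differ across references, so treat the phase factors as an assumption). The addition theorem, Σ_m |Y_n^m(r)|^2 = (2n + 1)/(4π) independently of r, provides a convenient numerical sanity check.

```python
import numpy as np
from scipy.special import lpmv, factorial

def Y(n, m, theta, phi):
    """Fully normalised spherical harmonic on S^2 (theta polar, phi azimuth)."""
    k = abs(m)
    norm = np.sqrt((2 * n + 1) / (4 * np.pi) * factorial(n - k) / factorial(n + k))
    val = norm * lpmv(k, n, np.cos(theta)) * np.exp(1j * k * phi)
    # Negative orders via the conjugation symmetry of the FNSH system
    return (-1) ** m * np.conj(val) if m < 0 else val

# Y_0^0 is the constant 1/sqrt(4 pi); the addition theorem for n = 2 gives
# sum_m |Y_2^m(r)|^2 = 5/(4 pi) at any direction r.
theta, phi = 0.7, 2.1
s2 = sum(abs(Y(2, m, theta, phi)) ** 2 for m in range(-2, 3))
```

Note that `scipy.special.lpmv` includes the Condon-Shortley phase, which the normalisation above assumes.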
Using the bilinearity of the Schwartz duality product, it is possible to extend the SHT to generalised functions [64, Remark 3.3]. The generalised spherical harmonic transform (gSHT) of a generalised function f ∈ S′(S^{d−1}) is given by the coefficients f̂_n^m := ⟨f | Ȳ_n^m⟩, yielding the expansion
f = Σ_{n=0}^∞ Σ_{m=1}^{N_d(n)} f̂_n^m Y_n^m. (2.5)
Notice that the convergence of the infinite series in (2.5) is w.r.t. the weak* topology (see Section 2.1.3).
Since the spherical harmonics are infinitely differentiable and hence in the predual S(S^{d−1}) of S′(S^{d−1}), the Fourier coefficients {f̂_n^m} are moreover well-defined.

Spherical Zonal Kernels
Another route towards defining the Fourier basis consists of looking at eigenfunctions of linear shift-invariant systems, or convolution operators [72, Chapters 3 and 4]. As the hypersphere is a manifold, there is no intrinsically defined notion of convolution.19 It is however possible to define a class of linear integral operators which "behave" as traditional convolution operators. In the Euclidean setting, convolution operators have shift-invariant kernels, whose value at a pair (r, s) ∈ R^d × R^d depends only on the distance ‖r − s‖_{R^d} between the two points. We can extend this notion to the hypersphere by noticing that the chord distance between two input directions r, s ∈ S^{d−1} is given by:
‖r − s‖_{R^d} = √(‖r‖^2 + ‖s‖^2 − 2⟨r, s⟩) = √(2 − 2⟨r, s⟩).
Notice that this quantity depends only on the inner product between the two directions r, s. This observation naturally leads to the notion of zonal kernel: a kernel Ψ : S^{d−1} × S^{d−1} → C is called zonal if it can be written as Ψ(r, s) = ψ(⟨r, s⟩) for some function ψ : [−1, 1] → C. For brevity, the function ψ is often abusively referred to as the zonal kernel, and no reference is made to Ψ. Moreover, for every s ∈ S^{d−1}, we call the trace ψ(⟨·, s⟩) : S^{d−1} → C of a zonal kernel a zonal function.
Zonal kernels can then be used to construct spherical convolution operators [48], acting on f ∈ L^2(S^{d−1}) as:
(Ψ ∗ f)(r) := ∫_{S^{d−1}} ψ(⟨r, s⟩) f(s) ds, r ∈ S^{d−1}. (2.6)

Remark 3
It is shown in [48,Theorem 7.2] that, under the assumptions of Definition 3, the image of the convolution operator (2.6) is indeed L 2 (S d−1 ).
Using two important results from spherical Fourier analysis, namely the addition theorem [48,Theorem 5.11] and the Funk-Hecke formula [48,Theorem 7.3], it can be shown [64,Proposition 3.8] that spherical convolution operators defined from zonal kernels are indeed diagonalised by spherical harmonics.
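The diagonalisation can be verified numerically via the Funk-Hecke formula for d = 3, which scales a spherical harmonic of degree n by ψ̂_n = 2π ∫_{−1}^{1} ψ(t) P_n(t) dt. The sketch below checks this for the illustrative zonal profile ψ = exp and Y_1^0 at the north pole, where the azimuthal integral reduces to a factor 2π by zonal symmetry.

```python
import numpy as np

# Funk-Hecke check on S^2: integrate psi(<r, s>) against Y_1^0 over the sphere
# and compare with psi_hat_1 * Y_1^0(r) at r = north pole, where
# Y_1^0(theta) = sqrt(3/(4 pi)) cos(theta) and <r, s> = cos(theta_s).
n_t = 4000
dtheta = np.pi / n_t
theta = (np.arange(n_t) + 0.5) * dtheta              # midpoint quadrature grid

integrand = np.exp(np.cos(theta)) * np.sqrt(3 / (4 * np.pi)) \
            * np.cos(theta) * np.sin(theta)          # includes the area element
lhs = 2 * np.pi * integrand.sum() * dtheta           # azimuth contributes 2 pi

# Closed form: psi_hat_1 = 2 pi int_{-1}^{1} t e^t dt = 4 pi / e
rhs = (4 * np.pi / np.e) * np.sqrt(3 / (4 * np.pi))  # psi_hat_1 * Y_1^0(north)
```

The two quantities agree to quadrature accuracy, confirming that the zonal convolution acts as a pure Fourier multiplier on each harmonic degree.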

Hyperspherical Splines
In this section we introduce hyperspherical splines -or spherical splines for short, which play a central role in spherical approximation theory [48,Chapter 6]. To this end, we extend the approach of [71, Chapter 6] to the spherical setting and construct spherical splines as "primitives" of finite Dirac streams w.r.t. a certain class of pseudo-differential operators, called spline-admissible. In short, spline-admissible operators are such that their fundamental solutions, called Green functions, are ordinary functions. 20 We derive a sufficient condition for spline-admissibility and provide examples of spline-admissible operators among the pseudo-differential operators most commonly used in practice.

Spherical Pseudo-Differential Operators
By analogy with the Euclidean case, we define spherical pseudo-differential operators as Fourier multipliers with slowly growing spectra.

Definition 4 (Spherical Pseudo-Differential Operator).
We call spherical pseudo-differential operator any linear operator of the form
$$\mathrm{D}h=\sum_{n\in\mathbb{N}}\sum_{m=1}^{N_d(n)}\hat{D}_n\,\langle h|Y_n^m\rangle\,Y_n^m,\qquad |\hat{D}_n|=\Theta(n^p),\qquad(3.1)$$
for some real number p ≥ 0, called the spectral growth order of D. 20 Ordinary functions are functions which are everywhere defined pointwise.

Remark 4 (Theta Notation).
The condition |D̂_n| = Θ(n^p) for some p ≥ 0 means that |D̂_n| = O(n^p) and |D̂_n| = Ω(n^p), i.e. there exist n₀ ∈ N and positive constants C₁, C₂ ∈ R₊ such that C₁ n^p ≤ |D̂_n| ≤ C₂ n^p for all n ≥ n₀. In other words, the sequence {|D̂_n|}_{n∈N} is asymptotically comparable to the polynomial sequence n^p.
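To make the Θ-condition concrete, the following snippet (an illustration of ours, not part of the paper's formal development) checks numerically that the Sobolev symbols D̂_n = (1 + n(n + d − 2))^β introduced in Example 1 below satisfy |D̂_n| = Θ(n^{2β}): the ratio |D̂_n| / n^{2β} stays within constant bounds and tends to 1.

```python
def sobolev_symbol(n, beta, d):
    # Fourier symbol of the Sobolev operator (Id - Laplace-Beltrami)^beta,
    # as given in Example 1: (1 + n(n + d - 2))^beta.
    return (1 + n * (n + d - 2)) ** beta

beta, d = 1.5, 3  # hypothetical parameter choices for this illustration
# |D_n| / n^(2*beta) should be sandwiched between constants C1 <= C2 (here it -> 1):
ratios = [sobolev_symbol(n, beta, d) / n ** (2 * beta) for n in (10, 100, 1000)]
print(ratios)
```

The decreasing ratios illustrate that the symbols grow exactly like n^{2β}, as required by Definition 4.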
Notice that D multiplies the Fourier coefficients of its argument h by a sequence {D̂_n}_{n∈N} with polynomial growth. This filtering operation effectively boosts the high-frequency content of h, hence making it "rougher" (less regular). This behaviour is reminiscent of that of the Laplace-Beltrami operator, which can indeed be shown to verify Definition 4. Common pseudo-differential operators are listed in the following example.

Example 1 (Common Pseudo-Differential Operators). Consider the Laplace-Beltrami operator ∆_{S^{d−1}}.
• Iterated Laplace-Beltrami operators: these operators are obtained as integer powers (i.e. successive compositions) of the Laplace-Beltrami operator, D := (∆_{S^{d−1}})^k, k ∈ N. They are indeed pseudo-differential operators, since they can be written as in (3.1). 21 The operator (∆_{S^{d−1}})^k is positive semi-definite for k even and negative semi-definite for k odd.
• Fractional Laplace-Beltrami operators: these operators are obtained as p-th roots of the negative Laplace-Beltrami operator, D := (−∆_{S^{d−1}})^{1/p}. They are indeed pseudo-differential operators, since they can be written as in (3.1). The case p = 2 yields the square root of the Laplace-Beltrami operator, which is intimately linked to the spherical gradient operator. The latter is however vector-valued, and hence does not belong to the class of pseudo-differential operators considered in Definition 4.
• Sobolev operators: these operators are defined as D := (Id − ∆_{S^{d−1}})^β, with β > 0. They are indeed pseudo-differential operators, since their Fourier symbols are given by D̂_n = (1 + n(n + d − 2))^β, n ∈ N, and hence D̂_n ∈ R and |D̂_n| = Θ(n^{2β}).
In order to gain further insight into Definition 4 and the motivations behind it, it is helpful to look at some key properties of spherical pseudo-differential operators:

Proposition 1 (Properties of Pseudo-Differential Operators).
Let D be a spherical pseudo-differential operator as in Definition 4. Then the following holds:
1. D is self-adjoint.
2. D is isotropic: every spherical harmonic Y_n^m is an eigenfunction of D, with common eigenvalue λ_n = D̂_n within each frequency level n ∈ N.
3. D has finite-dimensional nullspace, given by N(D) = span{Y_n^m : n ∈ K_D, m = 1, ..., N_d(n)}, where K_D := {n ∈ N : D̂_n = 0} is finite.
4. D maps infinitely differentiable functions onto infinitely differentiable functions.
The proof of Proposition 1 is not detailed here due to space constraints, but is relatively easy to obtain from the assumptions of Definition 4 [64, Proposition 4.1]:
• Properties 1 and 3 are direct consequences of the fact that the sequence {D̂_n}_{n∈N} is respectively real, and null for at most finitely many integers.
• The isotropy property 2 is implicitly assumed in Definition 4, since the spectrum of D was chosen in (3.1) to be constant across m = 1, ..., N_d(n) within a given frequency level n ∈ N. As shall be seen in Section 3.2, this construction guarantees that -when they exist- the spherical splines associated to a given pseudo-differential operator are sums of zonal functions, and hence fast to evaluate. 21
• Property 4, finally, results from the polynomial growth of the sequence {D̂_n}_{n∈N}. More specifically, it results from {D̂_n}_{n∈N} being asymptotically bounded from above by a polynomial sequence, |D̂_n| = O(n^p), which is implied by |D̂_n| = Θ(n^p). Note that Definition 4 additionally requires {D̂_n}_{n∈N} to be asymptotically bounded from below by a polynomial sequence, i.e. |D̂_n| = Ω(n^p). This assumption comes into play when considering primitives w.r.t. a particular pseudo-differential operator D, obtained via the Moore-Penrose pseudo-inverse D† of D.
21 Recall that the spherical harmonics were defined as eigenfunctions of the Laplace-Beltrami operator.

Proposition 2 (Moore-Penrose Pseudo-Inverse of D).
Let D be a pseudo-differential operator as in Definition 4. The Moore-Penrose pseudo-inverse D† of D is given by
$$\mathrm{D}^{\dagger}h=\sum_{n\notin K_\mathrm{D}}\sum_{m=1}^{N_d(n)}\frac{1}{\hat{D}_n}\,\langle h|Y_n^m\rangle\,Y_n^m.$$

Proof. The proof of Proposition 2 is available in Appendix A.
Note that the primitive operator D† acts as an integral operator, smoothing out the high-frequency content of its argument through the polynomially decaying sequence {1/D̂_n}_{n∉K_D}. In what follows, we will sometimes need to extend by duality the action of D (respectively D†) to generalised functions f ∈ S′(S^{d−1}). Since D (respectively D†) is self-adjoint, this can easily be achieved by understanding Df as the element of S′(S^{d−1}) with pointwise definition:
$$\langle \mathrm{D}f|\varphi\rangle:=\langle f|\mathrm{D}\varphi\rangle,\qquad\forall\varphi\in\mathcal{S}(\mathbb{S}^{d-1}).$$

Green Functions and Spline-Admissibility
The next important ingredient for the definition of spherical splines is the notion of Green function of a pseudo-differential operator D. A Green function is a fundamental solution of D, obtained by taking the primitive of some Dirac measure.

Definition 5 (Green Function).
Let D be a pseudo-differential operator as in Definition 4, with Moore-Penrose pseudo-inverse D†. The Green function of D with source s ∈ S^{d−1} is the generalised function g_s := D†δ_s ∈ S′(S^{d−1}).

The Green functions of an operator D can be expressed as traces of a certain zonal kernel, called the zonal Green kernel:

Proposition 3 (Zonal Green Kernel). Let {g_s, s ∈ S^{d−1}} be Green functions for a pseudo-differential operator D. We have then, for each s ∈ S^{d−1}, g_s = ψ_D(⟨·, s⟩), where the zonal Green kernel ψ_D is defined as
$$\psi_\mathrm{D}(t)=\sum_{n\notin K_\mathrm{D}}\frac{N_d(n)}{a_d\,\hat{D}_n}\,P_{n,d}(t),\qquad(3.8)$$
where a_d is the area of the unit sphere S^{d−1} and P_{n,d} : [−1, 1] → R denotes the Legendre polynomial of degree n in dimension d.

Proof. The proof of Proposition 3 is available in Appendix A.
Observe that the traces of the zonal Green kernel (3.8) are generalised functions, which make sense when integrated against a Schwartz function but which may not admit a pointwise interpretation. When they do admit a pointwise interpretation, we say that the operator D is spline-admissible: The following result provides us with a sufficient condition for a pseudo-differential operator to be splineadmissible:

Proposition 4 (Sufficient Condition for Spline-Admissibility).
Let D be a pseudo-differential operator, with spectral growth order p > d − 1 and zonal Green kernel ψ_D. Then the series (3.8) converges absolutely and uniformly on [−1, 1], so that ψ_D is continuous, and hence D is spline-admissible.

Proof. The proof of Proposition 4 is available in Appendix A.
We conclude this section by providing, for the specific case of S 2 , some examples (and non-examples) of spline-admissible pseudo-differential operators.

Example 2 (Common Spline-Admissible Operators on S 2 ).
Consider the specific case d = 3.
• Laplace-Beltrami operator: ∆_{S²} is not spline-admissible. Indeed, its zonal Green kernel, given in [26, Lemma 4.3], is not defined for r = s. Note that we have p = 2 = d − 1, which shows that the bound on the spectral growth order in Proposition 4 is tight.
• Iterated Laplace-Beltrami operator: ∆²_{S²}, on the other hand, is spline-admissible by Proposition 4. Indeed, its spectral growth order p satisfies p = 4 > 2 = d − 1. Moreover, its zonal Green kernel admits a closed-form expression [26, Corollary 4.24].
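The dichotomy in Example 2 can be checked numerically. Specialising the series (3.8) to d = 3 (so that N_3(n) = 2n + 1, a_3 = 4π and P_{n,3}(1) = 1), the sketch below (ours; the symbol conventions are assumptions based on Example 1) evaluates partial sums of the zonal Green kernel at the coincidence point t = ⟨r, s⟩ = 1: they diverge logarithmically for ∆_{S²} (p = 2) but converge for ∆²_{S²} (p = 4).

```python
import math

def green_partial_sum(symbol, N):
    # Partial sum of psi_D(1) = sum_{n >= 1} (2n + 1) / (4*pi*|D_n|) * P_n(1),
    # i.e. the series (3.8) specialised to d = 3, where P_n(1) = 1.
    return sum((2 * n + 1) / (4 * math.pi * abs(symbol(n))) for n in range(1, N + 1))

laplacian = lambda n: -n * (n + 1)          # symbol of the Laplace-Beltrami operator
bilaplacian = lambda n: (n * (n + 1)) ** 2  # symbol of its square

# The bilaplacian series telescopes to 1/(4*pi); the Laplacian one keeps growing.
print(green_partial_sum(bilaplacian, 10_000))
print(green_partial_sum(laplacian, 1_000), green_partial_sum(laplacian, 10_000))
```

For the bilaplacian, the summand (2n+1)/(n(n+1))² equals 1/n² − 1/(n+1)², so the partial sums telescope to (1 − 1/(N+1)²)/(4π), confirming a finite value at coincidence.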

Spherical Splines
We are now in a position to introduce spherical splines. Roughly speaking, spherical splines are primitives (w.r.t. a particular spline-admissible pseudo-differential operator) of Dirac streams with finite innovations. Let Ξ_M = {r_1, ..., r_M} ⊂ S^{d−1} be a set of points on the hypersphere and D a spline-admissible pseudo-differential operator. Then, a D-spline is a generalised function s ∈ S′(S^{d−1}) such that
$$\mathrm{D}s=\sum_{i=1}^{M}\alpha_i\,\delta_{r_i},\qquad(3.10)$$
for some amplitudes {α_1, ..., α_M} ⊂ C. We denote by S_D(S^{d−1}, Ξ_M) the linear subspace of D-splines associated with the knot set Ξ_M.
Remark 5 (Non-trivial Nullspace and Constrained Amplitudes). Notice that (3.10) implicitly constrains the spline amplitudes α_i when D has a nontrivial nullspace. Indeed, for a Schwartz function ϕ ∈ N(D), we have from the definition of D for generalised functions ⟨Ds|ϕ⟩ = ⟨s|Dϕ⟩ = 0. However, we also have from the right-hand side of (3.10) ⟨Ds|ϕ⟩ = Σ_{i=1}^M α_i ϕ(r_i), which vanishes for every ϕ ∈ N(D) if and only if Σ_{i=1}^M α_i Y_n^m(r_i) = 0 for all n ∈ K_D and m = 1, ..., N_d(n). The following result characterises the splines associated to a spline-admissible operator in terms of its zonal Green kernel:

Proposition 5. Let s ∈ S_D(S^{d−1}, Ξ_M) be a D-spline with amplitudes {α_1, ..., α_M}. Then
$$s=\sum_{n\in K_\mathrm{D}}\sum_{m=1}^{N_d(n)}\hat{\beta}_n^m\,Y_n^m+\sum_{i=1}^{M}\alpha_i\,\psi_\mathrm{D}(\langle\cdot,r_i\rangle),$$
where β̂_n^m := ⟨s|Y_n^m⟩, ∀n ∈ K_D, m = 1, ..., N_d(n). In particular, when K_D = ∅ and p > d − 1, we have
$$s=\sum_{i=1}^{M}\alpha_i\,\psi_\mathrm{D}(\langle\cdot,r_i\rangle),\qquad(3.13)$$
and s is then an ordinary (continuous) function.

Proof. The proof of Proposition 5 is available in Appendix A.

Remark 6
Observe from (3.13) that when K D = ∅ the D-splines are linear combinations of zonal functions, and hence very easy to evaluate. This nice feature is due to the fact that we restricted ourselves to isotropic pseudo-differential operators in Definition 4.

Functional Penalised Basis Pursuit on the Hypersphere
In this section, we leverage the tools introduced in Sections 2 and 3 to formulate functional inverse problems on the hypersphere in a common generalised sampling framework. Our formulation allows us to see most spherical approximation problems as specific instances of a generic penalised optimisation problem. We investigate the latter in the specific case of generalised total variation penalties, yielding a functional penalised basis pursuit (FPBP) problem. We define the search space of a given FPBP problem, and characterise its predual in which the sampling linear functionals must be chosen. We moreover provide a representer theorem characterising the form of the solutions to FPBP problems in terms of sparse spherical splines.

Generalised Sampling & Functional Inverse Problems
Most real-life spherical approximation problems take the form of functional inverse problems. In a typical inverse problem formulation, an unknown spherical field 22 f ∈ B′ is probed by some sensing device, resulting in a data vector y = [y_1, ..., y_L] ∈ C^L of L measurements. To account for potential inaccuracies in the measurement process, the data vector y is often modelled as the outcome of a random vector Y = [Y_1, ..., Y_L] : Ω → C^L, fluctuating according to some application-dependent noise distribution. 23 When the measurement process is unbiased, the entries of the expectation of Y can be thought of as the ideal measurements which would be obtained in a noise-free environment. In most cases, the ideal measurements are linked to the unknown spherical field by some linear relationship, called generalised sampling [70]:
$$\mathbb{E}[Y_i]=\langle f|\varphi_i\rangle,\qquad i=1,\ldots,L,\qquad(4.1)$$
where ⟨·|·⟩ : B′ × B → C denotes the Schwartz duality product for some duality pair (B, B′) and {ϕ_1, ..., ϕ_L} ⊂ B are linear sampling functionals modelling the action of the sensing device on the spherical field f ∈ B′.
Since most real-life acquisition systems react continuously to variations in their inputs, the dual space B′ is generally equipped with the weak* topology, so that the linear functionals {ϕ_1, ..., ϕ_L} modelling the instrument are all continuous (see Section 2.1.3). In such a formalism, the ideal measurements in (4.1) are often referred to as generalised samples [70] of f. This is because the Schwartz duality product is a generalised evaluation map ⟨f|ϕ_i⟩ = f(ϕ_i), allowing us to interpret the ideal measurements as samples of f evaluated at "points" ϕ_1, ..., ϕ_L ∈ B. For convenience, it is moreover customary to write the generalised sampling equations (4.1) in terms of a sampling operator 24 Φ : B′ → C^L (see [72, Chapter 5]) defined as
$$\Phi(f):=\big[\langle f|\varphi_1\rangle,\ldots,\langle f|\varphi_L\rangle\big],\qquad f\in\mathcal{B}'.\qquad(4.2)$$
Reformulating (4.1) in terms of Φ yields
$$\mathbb{E}[Y]=\Phi(f).\qquad(4.3)$$
The goal of a functional inverse problem is then to recover a spherical field f ∈ B′ which best explains the observed generalised samples y, given a particular noise and functional data model (4.3). Since the search space B′ is infinite-dimensional and the data finite-dimensional, this task is fundamentally ill-posed and will in general elicit infinitely many candidate solutions. To discriminate among such solutions, it is customary to resort to regularisation, which can be seen as implementing Occam's razor principle 25 by favouring solutions with simple behaviours. This is typically achieved by means of penalised convex optimisation problems of the form
$$\mathcal{V}=\arg\min_{f\in\mathcal{B}'}\,F\big(y,\Phi(f)\big)+\Lambda\big(|||f|||\big),\qquad(4.4)$$
where:
22 The generic appellation "spherical field" is used here to designate any element of S′(S^{d−1}), such as a function or a measure defined over the sphere. 23 In the absence of noise, Y can simply be chosen as a deterministic random vector. 24 In finite dimensions, the sampling operator is generally called the forward, design or sensing matrix. 25 Occam's razor is a philosophical principle also known as the "law of briefness" or, in Latin, lex parsimoniae.
It was supposedly formulated by William of Ockham in the 14th century, who wrote in Latin "Entia non sunt multiplicanda praeter necessitatem". In English, this translates to "More things should not be used than are necessary". In essence, this principle states that when two equally good explanations for a given phenomenon are available, one should always favour the simplest, i.e. the one that introduces the least explanatory variables.
• F : C^L × C^L → R₊ ∪ {+∞} is a cost functional, measuring the discrepancy between the observed and predicted generalised samples y and Φ(f) respectively. Common choices of discrepancy measures are discussed in Remark 7. In what follows, we will assume that F is such that, for all y ∈ C^L, the functional F(y, ·) is proper, convex and lower semi-continuous.
• |||·||| : B′ → R₊ is the dual norm on B′, called the regularisation norm, which implements Occam's razor principle. Intuitively, elements f ∈ B′ with small regularisation norm are simple and well-behaved, typically with a finite number of degrees of freedom (df).
• Λ : R → R₊ is some convex regularisation function, strictly increasing on R₊. In practice, Λ often takes the form of a monomial t ↦ λt^p, where p ≥ 1 and λ > 0. The parameter λ is called the regularisation parameter and controls the amount of regularisation by putting the regularisation norm and the cost functional on a similar scale.

Remark 7 (Choosing the Cost Functional).
In practice, the cost functional F is often chosen in one of the following two ways:
• Noiseless case: in a noiseless setup, one has full trust in the generalised samples. It is therefore natural to require that any solution of (4.4) be consistent [72, Chapter 5] with the samples at hand, i.e. y = Φ(f), ∀f ∈ V. This can be achieved by choosing the cost functional as F(y, Φ(f)) = ι(y − Φ(f)), where ι : C^L → {0, +∞} is the indicator function with value ι(z) = 0 if z = 0 and +∞ otherwise. Such cost functionals are for example used in interpolation problems.
• Noisy case: in a noisy setup, consistency is not desired anymore, as it almost always leads to overfitting the noisy data. In this case, one can use generic ℓ^p cost functionals F(y, Φ(f)) = ‖y − Φ(f)‖_p^p, where p ≥ 1 is typically chosen according to the tail behaviour 26 of the noise distribution [58]. Another approach consists in using the negative log-likelihood of the data y as a measure of discrepancy, i.e. F(y, Φ(f)) = −ℓ(y|Φ(f)). This choice makes (4.4) resemble a maximum a posteriori problem with improper prior. In the case of centred Gaussian white noise, both discrepancy measures coincide, yielding the classical quadratic cost functional F(y, Φ(f)) = ‖y − Φ(f)‖₂².
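As a small illustration of ours (not the paper's implementation), the two costs above have simple proximity operators: the consistency indicator projects onto {z = y}, while the quadratic cost has an affine prox obtained by zeroing a gradient. The snippet verifies the quadratic prox against its first-order optimality condition.

```python
def prox_exact_match(z, y):
    # prox of the indicator iota(y - .): projection onto the consistency set {z = y}.
    return list(y)

def prox_quadratic(z, y, tau):
    # prox of F(y, .) = ||y - .||_2^2: minimiser of 0.5*||w - z||^2 + tau*||y - w||^2,
    # obtained by zeroing the gradient: w = (z + 2*tau*y) / (1 + 2*tau).
    return [(zi + 2 * tau * yi) / (1 + 2 * tau) for zi, yi in zip(z, y)]

y, z, tau = [1.0, -2.0], [0.0, 4.0], 0.5
w = prox_quadratic(z, y, tau)
# Optimality check: (w - z) + 2*tau*(w - y) = 0 componentwise.
residual = [(wi - zi) + 2 * tau * (wi - yi) for wi, zi, yi in zip(w, z, y)]
```

The vanishing residual confirms the closed form; both operators reappear in the algorithms of Section 5.2.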

Generalised Total Variation Regularisation
Notice that the regularisation norm |||·||| in (4.4) entirely determines the search space B′ (see (2.2) page 7). Candidate regularisation norms for spherical approximation problems can hence be constructed as follows: 1. identify interesting functional spaces B′ ⊂ S′(S^{d−1}), whose elements are regular enough; 2. find a norm |||·||| on B′ such that B′ admits a predual B, and characterise this predual. For example, one could consider choosing B′ as a generalised Sobolev space of the form
$$\mathcal{H}_\mathrm{D}(\mathbb{S}^{d-1}):=\big\{f\in\mathcal{S}'(\mathbb{S}^{d-1}):\ \mathrm{D}f\in L^2(\mathbb{S}^{d-1})\big\},\qquad(4.6)$$
where D is some pseudo-differential operator as in Definition 4, which we will assume here to have trivial nullspace for simplicity. This is the space of generalised functions regular enough so that their generalised derivatives w.r.t. D are square-integrable. The associated regularisation norm can be shown to be the generalised Tikhonov (gTikhonov) norm ‖Df‖₂ [64, Section 2.1]. While extensively used in the literature, this notion of regularity may however be considered too restrictive, since the Sobolev space (4.6) is notably not large enough 27 to contain D-splines in cases where D is spline-admissible. This is particularly cumbersome, since the latter are, by definition of spline-admissible operators, ordinary functions and hence relatively well-behaved. To include D-splines, one must consider the larger space
$$\mathcal{M}_\mathrm{D}(\mathbb{S}^{d-1}):=\big\{f\in\mathcal{S}'(\mathbb{S}^{d-1}):\ \mathrm{D}f\in\mathcal{M}(\mathbb{S}^{d-1})\big\},\qquad(4.7)$$
where M(S^{d−1}) denotes the space of spherical regular Borel measures introduced in (2.3). This is the space of generalised functions regular enough so that their generalised derivatives w.r.t. D are Borel measures. When D has trivial nullspace, M_D(S^{d−1}) can be equipped with the generalised total variation (gTV) norm
$$\|f\|_{\mathrm{D},TV}:=\|\mathrm{D}f\|_{TV}.$$
The gTV norm can be interpreted as measuring the variations of the generalised derivative Df.
26 The following rule of thumb is proposed in [58]: p should be close to 1 for heavy-tailed distributions, close to 2 for Gaussian-like distributions, and close to +∞ for compactly supported distributions.
Since D is bijective, we can consider its inverse D^{−1}, which can be shown to define an isometric isomorphism between the spaces (M(S^{d−1}), ‖·‖_{TV}) and (M_D(S^{d−1}), ‖·‖_{D,TV}). Indeed, we can uniquely write any element f of M_D(S^{d−1}) as f = D^{−1}(Df), with Df ∈ M(S^{d−1}). This isometry implies that the metric space (M_D(S^{d−1}), ‖·‖_{D,TV}) is actually a Banach space, and allows us to characterise the predual of M_D(S^{d−1}), denoted by C_D(S^{d−1}) (Proposition 6).

Proof. The proof of Proposition 6 is available in Appendix B.
We have hence established the duality pair (C_D(S^{d−1}), M_D(S^{d−1})), showing that the gTV norm ‖·‖_{D,TV} is indeed a dual norm, which can hence be used as regularisation norm in (4.4). For such a choice of regularisation norm, it is customary to set the regularisation function to Λ(t) = λt, with λ > 0. This yields the following optimisation problem:
$$\mathcal{V}=\arg\min_{f\in\mathcal{M}_\mathrm{D}(\mathbb{S}^{d-1})}\,F\big(y,\Phi(f)\big)+\lambda\,\|\mathrm{D}f\|_{TV},\qquad(4.11)$$
where the sampling operator Φ : M_D(S^{d−1}) → C^L is as in (4.2). We call (4.11) a functional penalised basis pursuit (FPBP) problem. Because of the gTV regularisation norm, solutions to FPBP problems will tend to have few variations in their generalised derivatives. When D is spline-admissible and invertible, such functions are templated by the D-splines, which, from Proposition 5, take the form
$$f=\sum_{i=1}^{M}\alpha_i\,\psi_\mathrm{D}(\langle\cdot,r_i\rangle),\qquad(4.12)$$
where {r_1, ..., r_M} ⊂ S^{d−1} and ψ_D is the zonal Green kernel of D. For such functions, we have indeed
$$\|\mathrm{D}f\|_{TV}=\Big\|\sum_{i=1}^{M}\alpha_i\,\delta_{r_i}\Big\|_{TV}=\sum_{i=1}^{M}|\alpha_i|.$$
Hence D-splines with small ℓ¹ norm in their coefficients will also have small gTV norm. It is then expected for solutions f ∈ V to take the form of D-splines (4.12) with few innovations M. In Theorem 2 we will show that extreme points of the solution set V indeed take such a form, with M < L.

Representer Theorem
We now establish a representer theorem characterising the solution set of the FPBP problem (4.11). For simplicity, we state this theorem in the case where the pseudo-differential operator D used to define the gTV regularisation norm has a trivial nullspace (K_D = ∅). While this may seem like a limiting assumption for practical purposes, we show that pseudo-differential operators with nontrivial nullspaces (such as the Laplace-Beltrami operator ∆_{S^{d−1}}) can be brought into the scope of the representer theorem if properly regularised on their nullspace. Our theorem shows that the solution set of an FPBP problem is nonempty and is the weak* closed convex hull of extreme points taking the form of D-primitives of Dirac streams -i.e. D-splines when D is spline-admissible- with fewer innovations than available measurements. This result can be seen as an extension to the spherical setting of [34, Theorem 4].

Theorem 2 (Representer Theorem).
Consider the following assumptions:
A1 D is some pseudo-differential operator with trivial nullspace and Green functions {ψ_D(⟨·, s⟩), s ∈ S^{d−1}};
A3 the sampling functionals {ϕ_1, ..., ϕ_L} belong to the predual C_D(S^{d−1});
A5 F is a cost functional as in (4.5);
A6 λ is a positive regularisation constant.
Then, for any y ∈ C^L, the solution set of the FPBP problem
$$\mathcal{V}=\arg\min_{f\in\mathcal{M}_\mathrm{D}(\mathbb{S}^{d-1})}\,F\big(y,\Phi(f)\big)+\lambda\,\|\mathrm{D}f\|_{TV}\qquad(4.13)$$
is nonempty, and the weak* closed convex hull of its extreme points. 28 The latter are moreover necessarily of the form
$$f=\sum_{i=1}^{M}\alpha_i\,\psi_\mathrm{D}(\langle\cdot,r_i\rangle),\qquad M<L,\qquad(4.15)$$
where ψ_D is the zonal Green kernel of D.
Proof. The proof of Theorem 2 is available in Appendix B.

Remark 8
Theorem 2 allows us to write the solution set V of (4.13) as the weak* closed convex hull of a (potentially infinite) set of extreme points δV ⊂ V, where the extreme points f ∈ δV are, when D is spline-admissible, D-splines of the form (4.15).

Remark 9. Let D be a pseudo-differential operator with nontrivial nullspace N(D), and let Π_{N(D)} denote the orthogonal projection onto N(D). Then the regularised operator L_γ := D + γΠ_{N(D)}, with γ > 0, is an injective pseudo-differential operator. The latter can hence be used to define a gTV regularisation norm. The FPBP problem associated to this choice of gTV regularisation norm is then
$$\mathcal{V}=\arg\min_{f\in\mathcal{M}_{\mathrm{L}_\gamma}(\mathbb{S}^{d-1})}\,F\big(y,\Phi(f)\big)+\lambda\,\|\mathrm{L}_\gamma f\|_{TV},\qquad(4.17)$$
where the sampling operator Φ : M_{L_γ}(S^{d−1}) → C^L is as in (4.2). It is then possible to invoke Theorem 2 to show that the solution set of (4.17) is nonempty and the weak* closed convex hull of extreme points of the form f = Σ_{i=1}^M α_i ψ_{L_γ}(⟨·, r_i⟩), with constrained amplitudes, as requested for spherical splines associated to a pseudo-differential operator with nontrivial nullspace (see discussion in Remark 5).

Compatible Sampling Functionals
Note that assumption A3 of Theorem 2 requires the sampling functionals {ϕ 1 , . . . , ϕ L } to be compatible with the regularising operator D, in the sense that they must be included in the predual C D (S d−1 ) of the search space M D (S d−1 ). In practice, this assumption may not always be easy to verify. In this section, we therefore provide sufficient conditions on the spectral growth order of D for Dirac measures and square-integrable functions -which are the two most common types of sampling functionals encountered in practice-to be compatible with a given pseudo-differential operator D.

Proposition 7 (Compatibility of Dirac Measures).
Let D be a pseudo-differential operator as in Definition 4 with spectral growth order p > d − 1, and let (M_D(S^{d−1}), ‖D·‖_{TV}) be the space defined in (4.7), equipped with the gTV norm. Then all Dirac measures {δ_s, s ∈ S^{d−1}} are included in the predual C_D(S^{d−1}).

Proof. The proof of Proposition 7 is available in Appendix B.

Proposition 8 (Compatibility of Square-Integrable Functions).
Let D be a pseudo-differential operator as in Definition 4 with trivial nullspace and spectral growth order p > (d − 1)/2, and let (M_D(S^{d−1}), ‖D·‖_{TV}) be the space defined in (4.7), equipped with the gTV norm. Then all square-integrable functions are included in the predual C_D(S^{d−1}).

Proof. The proof of Proposition 8 is available in Appendix B.

Practical Implications
Theorem 2 has profound practical implications, which we now outline. First, we use (4.15) to derive a discretisation scheme allowing us, under mild assumptions on D, to approximate with arbitrary precision functions in the solution set V of a given FPBP problem. The obtained discretisation scheme is moreover canonical to the FPBP problem at hand, as it transforms it into a traditional discrete penalised basis pursuit (PBP) problem [69]. Depending on the nature of the convex cost functional, we propose to solve this discrete PBP problem with two state-of-the-art first-order proximal algorithms [52], namely the accelerated proximal gradient descent method [8] or the primal-dual splitting method [16]. We conclude by discussing suitable choices of regularising pseudo-differential operators for practical purposes. We introduce Wendland and Matérn operators, whose Green functions have simple closed-form expressions and good localisation in space. The latter are moreover closely related to the popular Sobolev operators.

Canonical Discretisation of FPBP Problems
In this section, we use Theorem 2 to derive a canonical search space discretisation scheme for the FPBP problem (4.13). The idea of search space discretisation schemes is to restrict the search space B′ of an optimisation problem to a well-chosen finite-dimensional subspace of the form span{ψ_1, ..., ψ_N} ⊂ B′. In the particular case of (4.13), Theorem 2 tells us that the solution set V is the closed convex hull of D-splines 30 with sparse 31 innovation sets (see Remark 8). This essentially means that any non-limit point 32 of V is itself a D-spline, being a finite convex combination of D-splines. Our goal is therefore to find a finite family of functions capable of approximating well enough any arbitrary D-spline. To this end, it will help to characterise the functional space in which D-splines naturally live, called the native space [48, Chapter 6]. When D is a positive-definite operator with spectral growth order p > d − 1, the native space is the generalised Sobolev space H_{D^{1/2}}(S^{d−1}) associated to the Hermitian square-root D^{1/2} of D:

Proposition 9 (Native Space for D-splines).
Let D be a positive-definite pseudo-differential operator with spectral growth order p > d − 1. Then the Hermitian square-root D^{1/2} of D is a pseudo-differential operator with Fourier symbols {D̂_n^{1/2}, n ∈ N} ⊂ R₊.

Moreover, the generalised Sobolev space H D 1/2 (S d−1 ) is a reproducing kernel Hilbert space (RKHS) which contains all D-splines.
Proof. The proof of Proposition 9 is available in Appendix C.
30 Of course, under the assumption that the operator D used to define the gTV regularisation norm is spline-admissible. 31 I.e. with cardinality bounded by the number of available measurements. 32 As explained in Remark 8, the limit points are not necessarily splines. We are however not interested in approximating such limit points, since they may have infinitely many df.
The next proposition, adapted from [48, Theorem 6.36], bounds the error incurred by approximating elements of H_{D^{1/2}}(S^{d−1}) -and hence in particular the arbitrary D-splines of interest here- with D-splines with fixed knot set Ξ_N ⊂ S^{d−1} of size N.

Proposition 10. Consider a knot set Ξ_N = {r_1, ..., r_N} ⊂ S^{d−1} with nodal width Θ_{Ξ_N} as defined in (5.1). Let further D denote a positive-definite, spline-admissible pseudo-differential operator with spectral growth order p > (d+1)/2, and let S_D(S^{d−1}, Ξ_N) be the space of spherical D-splines associated to the knot set Ξ_N. Then, for every function h ∈ H_{D^{1/2}}(S^{d−1}),
$$\inf_{s\in S_\mathrm{D}(\mathbb{S}^{d-1},\Xi_N)}\|h-s\|_{\infty}\ \le\ L_\mathrm{D}\,\Theta_{\Xi_N}\,\|h\|_{\mathrm{D}^{1/2}},\qquad(5.2)$$
where ‖h‖_{D^{1/2}} := √⟨D^{1/2}h, D^{1/2}h⟩_{S^{d−1}}, and L_D > 0 is a known 33 positive constant depending only on D.

Proof. The proof of Proposition 10 is available in Appendix C.
Notice that the approximation error in Proposition 10 is bounded by the nodal width (5.1) of the knot set, which can be interpreted geometrically as the largest distance from an arbitrary point on the sphere to the knot set Ξ_N (see Fig. 2a). D-splines with knot sets minimising this quantity for a fixed number of knots N will hence yield the smallest approximation error. From the geometric interpretation of the nodal width, it is easy to see that knot sets with minimal nodal width distribute their knots uniformly over the hypersphere. Unfortunately, distributing points uniformly over S^{d−1} is a notoriously hard problem for d > 2 [36], making exactly uniform knot sets impractical. For d = 3, however, it is possible to obtain quasi-uniform knot sets with quasi-optimal nodal widths [36]. An example of quasi-uniform knot set is the Fibonacci lattice [32,36], described in the subsequent example. In [36], the authors provide a comprehensive list of quasi-uniform knot sets easy to generate in practice. For each knot set, the asymptotic behaviour of the nodal width is assessed, either numerically or theoretically.

Example 3 (Fibonacci Lattice).
In nature, many plant leaves are arranged according to phyllotactic spiral patterns, which are well modelled by the Fibonacci lattice. Points in the Fibonacci lattice are arranged uniformly along a spiral linking the two poles of the sphere (see Fig. 2b). The lattice can very easily be generated from the closed-form formula (5.3), indexing the points by n = 1, ..., N. It can be shown [36] that if the knot set Ξ_N is constructed according to the Fibonacci lattice (5.3), then the nodal width is quasi-optimal and approximately given by Θ_{Ξ_N} ≃ 2.728/√N.
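A minimal sketch (ours) of one common Fibonacci-lattice construction, using equal-area latitude bands and golden-angle longitude increments; exact conventions vary slightly across [32,36]. A crude Monte-Carlo estimate of the nodal width can then be compared to the predicted Θ_{Ξ_N} ≃ 2.728/√N.

```python
import math, random

def fibonacci_lattice(N):
    # One common construction (conventions differ across references):
    # equal-area latitude bands, longitudes advancing by the golden angle.
    golden = (1 + math.sqrt(5)) / 2
    pts = []
    for n in range(N):
        z = 1 - (2 * n + 1) / N
        phi = 2 * math.pi * n / golden
        r = math.sqrt(max(0.0, 1 - z * z))
        pts.append((r * math.cos(phi), r * math.sin(phi), z))
    return pts

def nodal_width_estimate(pts, n_samples=2000, seed=0):
    # Monte-Carlo lower estimate of max_{r in S^2} of the geodesic
    # distance from r to the knot set (the nodal width).
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_samples):
        g = [rng.gauss(0, 1) for _ in range(3)]
        norm = math.sqrt(sum(x * x for x in g))
        r = [x / norm for x in g]
        best = max(sum(a * b for a, b in zip(r, p)) for p in pts)  # largest <r, p>
        worst = max(worst, math.acos(min(1.0, best)))
    return worst

knots = fibonacci_lattice(1000)
print(nodal_width_estimate(knots))  # compare to 2.728 / sqrt(1000)
```

The sampled estimate is a lower bound of the true nodal width, but already exhibits the expected 1/√N scale.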
Observe that the nodal width of the Fibonacci lattice tends to zero as the number of knots N grows to infinity. This is a general behaviour of quasi-uniform knot sets [36]. Consequently, the uniform approximation error (5.2) in Proposition 10 tends to zero as the number of knots tends to infinity. In other words, any element of H_{D^{1/2}}(S^{d−1}) can be approximated arbitrarily well by D-splines with quasi-uniform knot sets -called quasi-uniform spherical splines- provided a sufficient number of knots. In light of this discussion, we therefore propose to discretise FPBP problems by restricting their search spaces to subspaces spanned by quasi-uniform D-splines. The following theorem shows that the solutions to FPBP problems restricted in this way can be obtained by solving a discrete penalised basis pursuit (PBP) problem.

Theorem 3. Consider the notations and assumptions A1 to A6 of Theorem 2. Consider additionally the following:
C7 D is spline-admissible and positive-definite;

Remark 10 (Canonical Discretisation Scheme).
Notice that the discretisation scheme chosen in Theorem 3 is canonical w.r.t. the gTV norm induced by the pseudo-differential operator D. Indeed, it conveniently transforms the gTV norm ‖D·‖_{TV} into a discrete ℓ¹ norm in (5.6). As detailed in the proof, this is because the basis functions {ψ_D(⟨·, r_n⟩), n = 1, ..., N} used in the discretisation are Green functions of the operator D. Had the basis functions been chosen differently, such simplifications would not have been possible, hence making the discrete optimisation problem (5.6) considerably more difficult to solve in practice.

Remark 11 (Choosing N). The bound in (5.2) can be used in practice to set N. Indeed, one can choose N such that the relative approximation error falls below an acceptable accuracy threshold for any h ∈ H_{D^{1/2}}(S^{d−1}), hence allowing us to approximate the solutions of the FPBP problem with controlled error.

Remark 12 (Practical Implementation).
Theorem 3 provides us with a simple two-step procedure for computing a practical solution to the restricted FPBP problem (5.5):
1. Minimise (5.6) using one of the algorithms described in Section 5.2 and obtain a solution u ∈ U.
2. Form the corresponding spline f = Σ_{n=1}^N u_n ψ_D(⟨·, r_n⟩) and evaluate it wherever needed. Since the interpolating functions ψ_D(⟨·, r_n⟩) are zonal, such an interpolation can be carried out very efficiently in practice (and even more so when ψ_D has compact support, as investigated in Section 5.3).
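The interpolation mentioned above amounts to evaluating a zonal expansion. A sketch of ours, with a generic stand-in profile `psi` in place of a true zonal Green kernel: thanks to zonality, evaluating the spline at P points against N knots reduces to a single P × N matrix of inner products mapped through a scalar function.

```python
import numpy as np

def evaluate_spline(u, knots, psi, points):
    # f(r) = sum_n u_n * psi(<r, r_n>): one Gram-type matrix of dot products,
    # mapped through the scalar profile psi, then a matrix-vector product.
    K = psi(points @ knots.T)   # (P, N) matrix with entries psi(<r_p, r_n>)
    return K @ u

# Toy usage with a stand-in profile (NOT the Green kernel of any operator here):
knots = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
u = np.array([2.0, -1.0])
psi = lambda t: (1 + t) ** 2
vals = evaluate_spline(u, knots, psi, np.array([[0.0, 0.0, 1.0]]))
```

For a compactly supported ψ_D (Section 5.3), K becomes sparse, making this step even cheaper.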

Remark 13 (Form of the Solutions).
Applying [69, Theorem 6] to the discrete PBP problem (5.6) shows that U is convex and compact, with L-sparse extreme points. From the bijection (5.7), this implies in turn that V = Ψ(U) is the closed convex hull of extreme points taking the form of sparse spherical D-splines with at most L non-zero amplitudes:
$$f=\sum_{n=1}^{N}u_n\,\psi_\mathrm{D}(\langle\cdot,r_n\rangle),\qquad \|u\|_0\le L,$$
where ‖·‖₀ denotes the "ℓ⁰ norm", counting the number of non-zero elements in a vector. Solutions of the restricted FPBP problem (5.5) hence behave very similarly to those of the unrestricted FPBP problem (4.13) considered in Theorem 2.

Optimisation Algorithms
In this section, we propose to solve the discrete PBP problem (5.6) by means of provably convergent fully-split proximal iterative methods [52], which only involve simple matrix-vector multiplications and proximal steps.
We treat the most general case where the cost function F is proximable but not necessarily differentiable with the primal-dual splitting method (PDS) introduced by Condat in his seminal work [16]. In the simpler (yet prevailing in practice) case where F is also differentiable and with β-Lipschitz continuous derivative, we leverage an optimal first-order method called accelerated proximal gradient descent (APGD) [8,52], with faster convergence rate than the PDS method. For the sake of simplicity and without loss of generality, we consider the real case only, where x ∈ R N and y ∈ R L -i.e. the coefficients and data vector are assumed real. The complex case, less common in practice, can be handled similarly by identifying C N and C L with R 2N and R 2L respectively and reformulating the optimisation problem (5.6) in terms of real operations only, using the techniques described in [59, Section 7.8].

Algorithm 2:
The accelerated proximal gradient descent algorithm for solving smooth PBP problems (5.8).
1: procedure APGD_PBP(y, G, λ, ε, τ, d, x₀)
Both algorithms require that the proximity operator of the cost functional,
$$\mathrm{prox}_{\tau F(y,\cdot)}(z):=\arg\min_{w\in\mathbb{R}^L}\ \tfrac12\|w-z\|_2^2+\tau F(y,w),$$
admits a closed-form formula or can be efficiently computed with high accuracy. Closed-form expressions for the proximity operators of commonly used data-fidelity terms are available in [64, Chapter 7, Section 5] and are summarised in Table 2. To solve (5.8), we apply the generic primal-dual splitting algorithm of [16, Algorithm 3.1] to our particular case and obtain Algorithm 1. As explained in [16], Algorithm 1 jointly solves the primal problem (5.8) and its associated dual, using an equivalent saddle-point formulation (5.10) of (5.8). The proximal operator of the convex conjugate (5.11) is given, for every σ > 0, by Moreau's identity [52]:
$$\mathrm{prox}_{\sigma F_y^*}(z)=z-\sigma\,\mathrm{prox}_{\sigma^{-1}F_y}(z/\sigma).$$
The saddle-point problem (5.10) allows the complicated composite term F(y, Gx) in (5.8) to be split as a sum of two simpler terms F_y^*(z) + ⟨Gx, z⟩, which are easier to optimise. Estimates (x̂, ẑ) of the primal and dual variables are then obtained by iterating rows 5 and 6 of Algorithm 1 until convergence, starting from an initial guess (x₀, z₀) ∈ R^N × R^L. Notice that the update equations in rows 5 and 6 are not too computationally intensive, since they involve only proximal operators -namely the soft-thresholding operator [52, Chapter 6] and (5.12)- and matrix-vector multiplications between G, its transpose G^T and the primal/dual variables. Regarding the stopping criterion, we chose to stop Algorithm 1 when the relative improvement in the primal variable x_n falls under a pre-determined threshold ε > 0. This stopping criterion, which monitors improvement of the primal variable, is motivated by the fact that we are in this context only interested in solving the primal problem (5.8). As reported in [16], the convergence of the iterates (x_n, z_n) towards a solution (x̂, ẑ) of (5.10) as n tends to infinity depends on the choice of the hyper-parameters τ > 0 and σ > 0, which control respectively the amount of improvement in the primal and dual variables at each iteration.
To ensure convergence, one must have [16,Theorem 3.3]: στ G 2 2 ≤ 1. In practice, the speed of convergence is improved by choosing σ and τ as large as possible and relatively well-balanced. Consequently, we chose in our implementation of Algorithm 1 στ = 1/ G 2 2 and τ = σ, yielding: σ = τ = 1/ G 2 .
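The primal-dual iteration just described can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the least-squares data-fidelity used below (whose conjugate prox follows from Moreau's identity) is an assumed example, and all names are hypothetical.

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding: the proximity operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def pds_pbp(y, G, lam, eps=1e-9, max_iter=50000):
    """Primal-dual splitting sketch for min_x 0.5||Gx - y||^2 + lam ||x||_1.
    Step sizes follow the text: sigma = tau = 1/||G||_2, so sigma*tau*||G||_2^2 = 1."""
    L, N = G.shape
    x, z = np.zeros(N), np.zeros(L)
    tau = sigma = 1.0 / np.linalg.norm(G, 2)
    for n in range(max_iter):
        # primal update (row 5): soft-thresholding of a gradient-like step
        x_new = soft(x - tau * (G.T @ z), tau * lam)
        # dual update (row 6) via Moreau's identity, here for F = 0.5||. - y||^2
        z = (z + sigma * (G @ (2.0 * x_new - x)) - sigma * y) / (1.0 + sigma)
        # stop on relative improvement of the primal variable only
        if n > 0 and np.linalg.norm(x_new - x) <= eps * max(np.linalg.norm(x), 1e-12):
            x = x_new
            break
        x = x_new
    return x
```

With G the identity the problem reduces to a pure ℓ1 proximal problem, whose solution is soft(y, λ), which gives a quick sanity check of the iteration.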

Accelerated Proximal Gradient Descent Method for Smooth Cost Functionals
In this section, we consider the special case of smooth PBP problems (5.8), where the cost functional $E_y(x) := F(y, Gx)$, $\forall x \in \mathbb{R}^N$, is assumed differentiable with β-Lipschitz gradient $\nabla E_y : \mathbb{R}^N \to \mathbb{R}^N$, for some constant β ≥ 0. While trivially in the scope of Algorithm 1, such problems are more efficiently optimised by means of Algorithm 2, a specific instance of the generic accelerated proximal gradient descent (APGD) method [5,13]. Indeed, it has been shown in [5] that, with 0 < τ ≤ 1/β and d > 2, APGD achieves the optimal convergence rate
$$E_y(x_n) + \lambda \|x_n\|_1 \;-\; \min_{x \in \mathbb{R}^N}\, \{E_y(x) + \lambda \|x\|_1\} \;=\; o(1/n^2).$$
In other words, the iterates $\{x_n\}_{n\in\mathbb{N}}$ of APGD form a norm-convergent sequence, minimising the objective functional at a rate $o(1/n^2)$. In our practical implementation of Algorithm 2, we chose the step size τ as large as possible, τ = 1/β, and set d to the value d = 75. The latter choice was motivated by the results reported in [45,46], which show significant practical acceleration for values of d in the range [50, 100].

TABLE 2 Common data-fidelity functionals F(y, z) and their associated proximity operators $\mathrm{prox}_{\tau F(y,\cdot)}(z)$, for y, z ∈ R^L and τ > 0 (closed forms from [64, Chapter 7, Section 5]):
• Exact match: F(y, z) = ι(z − y); $\mathrm{prox}_{\tau F(y,\cdot)}(z) = y$. Useful for noiseless data and interpolation.
• ℓ1-norm: F(y, z) = ‖z − y‖_1; $\mathrm{prox}_{\tau F(y,\cdot)}(z) = y + \mathrm{soft}_\tau(z - y)$. Useful for strong outliers and heavy-tailed noise distributions.
• KL-divergence: $\mathrm{prox}_{\tau F(y,\cdot)}(z)_i = \tfrac{1}{2}\big(z_i - \tau + \sqrt{(z_i - \tau)^2 + 4\tau y_i}\big)$. Useful for count data with Poisson noise.
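An APGD iteration of the kind described above can be sketched as follows. This is an illustrative stand-in, assuming a least-squares cost (so β = ‖G‖²₂) and the standard d-parameterised inertial rule (n − 1)/(n + d); the paper's Algorithm 2 may differ in details, and all names are hypothetical.

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding: the proximity operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def apgd_pbp(y, G, lam, d=75, eps=1e-10, max_iter=10000):
    """APGD sketch for min_x 0.5||Gx - y||^2 + lam ||x||_1, with step tau = 1/beta
    and an assumed inertial extrapolation weight (n - 1)/(n + d)."""
    beta = np.linalg.norm(G, 2) ** 2            # Lipschitz constant of the gradient
    tau = 1.0 / beta                            # largest admissible step size
    x = x_prev = np.zeros(G.shape[1])
    for n in range(1, max_iter + 1):
        v = x + (n - 1.0) / (n + d) * (x - x_prev)          # inertial extrapolation
        x_prev = x
        x = soft(v - tau * (G.T @ (G @ v - y)), tau * lam)  # forward-backward step
        if n > 1 and np.linalg.norm(x - x_prev) <= eps * max(np.linalg.norm(x_prev), 1e-12):
            break
    return x
```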

Practical Pseudo-differential Operators
One key insight of Theorem 2 is that the solutions of the FPBP problem (4.13) are D-splines. As such, they inherit their analytical properties from the zonal Green kernel associated to the pseudo-differential operator D used in the gTV regularisation term. For practical purposes, it is hence important to choose the pseudo-differential operator in agreement with the desired analytical properties of the solutions. In Example 1, we introduced for example the Sobolev operators $\mathcal{D}_\beta := [\mathrm{Id} - \Delta_{\mathbb{S}^{d-1}}]^{\beta}$, whose associated zonal Green kernels reproduce, for β > (d − 1)/2, the Sobolev spaces $\mathcal{H}^\beta(\mathbb{S}^{d-1})$ [64, Lemma 5.5]. The latter are nested RKHSs, $\mathcal{H}^\gamma(\mathbb{S}^{d-1}) \subset \mathcal{H}^\beta(\mathbb{S}^{d-1})$, ∀γ ≥ β > (d − 1)/2, containing functions whose degree of smoothness increases with β. For example, a function f ∈ H^β(S^{d−1}) for β ∈ N is differentiable up to order β, with all its derivatives up to that order square-integrable. Sobolev operators hence seem particularly well-suited to enforce a certain degree of smoothness in the solutions of FPBP problems. Unfortunately, the Sobolev zonal Green kernel, given in (3.9), does not admit a convenient closed-form expression, making Sobolev operators -and consequently Sobolev splines- very cumbersome to work with in practice.
In this section, we hence design two pseudo-differential operators, namely the Matérn and Wendland operators, with convenient properties for practical purposes. Their zonal Green kernels admit a simple closed-form expression and are well-localised in space. They have moreover a trivial nullspace (hence falling into the scope of Theorem 2), and their Fourier symbols are closely related to those of Sobolev operators, making them suitable for enforcing a certain degree of regularity in the solutions of FPBP problems. Unlike the standard differential operators from Example 1, which are defined via their Fourier symbols, the Matérn and Wendland operators are implicitly defined from their zonal Green kernels, obtained by restricting scaled radial kernels to the hypersphere [30,44]:
$$\psi_\beta^\epsilon(\langle r, s\rangle) := \Psi_\beta\!\left(\frac{\|r - s\|_2}{\epsilon}\right) = \Psi_\beta\!\left(\frac{\sqrt{2 - 2\langle r, s\rangle}}{\epsilon}\right), \qquad r, s \in \mathbb{S}^{d-1}, \tag{5.13}$$
where 0 < ε ≤ 1 is a scale parameter and $\Psi_\beta : \mathbb{R}_+ \to \mathbb{R}$ is a radial profile whose Euclidean Fourier transform satisfies
$$c_1 \left(1 + \|\xi\|_2^2\right)^{-\beta} \;\leq\; \widehat{\Psi}_\beta(\xi) \;\leq\; c_2 \left(1 + \|\xi\|_2^2\right)^{-\beta}, \tag{5.15}$$
for some positive constants $c_1, c_2$. Using (5.15), it is easy to show that $\hat{\psi}_\beta^\epsilon[n] > 0$, ∀n ∈ N, and $\hat{\psi}_\beta^\epsilon[n] = \Theta(n^{-p})$ with p = 2β > d − 1. From (5.14) and Proposition 3, we can hence identify the kernel (5.13) with the zonal Green kernel of the spline-admissible pseudo-differential operator $\mathcal{D}_\beta^\epsilon$ with Fourier symbol $\{(\hat{\psi}_\beta^\epsilon[n])^{-1}, n \in \mathbb{N}\}$. Still thanks to (5.15), it is moreover possible to show [30,44] that, for a given β > (d − 1)/2, the native space of $\psi_\beta^\epsilon$ contains exactly the same elements as the Sobolev space $\mathcal{H}^\beta(\mathbb{S}^{d-1})$. In conclusion, the zonal Green kernels $\psi_\beta^\epsilon$ in (5.13) reproduce the Sobolev space $\mathcal{H}^\beta(\mathbb{S}^{d-1})$ when the latter is equipped with the inner product
$$\langle f, g\rangle_{\psi_\beta^\epsilon} := \sum_{n \in \mathbb{N}} \sum_{m=1}^{N_d(n)} \frac{\hat{f}_n^m\, \overline{\hat{g}_n^m}}{\hat{\psi}_\beta^\epsilon[n]},$$
and can hence be used as a replacement for the Sobolev zonal Green kernel to build practical spherical splines. In the subsequent sections, we give examples of functions $\Psi_\beta$ in (5.13), yielding respectively the Matérn and Wendland kernels, depicted in Fig. 3.
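The restriction construction in (5.13) can be illustrated concretely: a zonal kernel is obtained by evaluating a scaled radial profile at the chordal distance √(2 − 2⟨r, s⟩). The sketch below assumes a Matérn-type profile of smoothness 3/2 purely for illustration; the paper's exact Ψ_β may differ, and the names are hypothetical.

```python
import numpy as np

def zonal_from_radial(profile, eps):
    """Zonal kernel t -> Psi(||r - s||_2 / eps), obtained by restricting a
    scaled radial kernel to the sphere; t = <r, s> and the chordal distance
    on the unit sphere is ||r - s||_2 = sqrt(2 - 2t)."""
    def psi(t):
        t = np.clip(t, -1.0, 1.0)   # guard against rounding outside [-1, 1]
        return profile(np.sqrt(2.0 - 2.0 * t) / eps)
    return psi

# Hypothetical Matern-type radial profile (smoothness 3/2), a stand-in for Psi_beta:
def matern32(r):
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)
```

The resulting kernel peaks at t = 1 (zero chordal distance) and decays as the two directions move apart, with the scale ε controlling the decay rate.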

Remark 14 (About the Scale Parameter ε).
For a fixed β > (d − 1)/2, we have seen that the kernels $\psi_\beta^\epsilon$ for 0 < ε ≤ 1 all reproduce the Sobolev space $\mathcal{H}^\beta(\mathbb{S}^{d-1})$. As such, one could question the relevance of this parameter in the construction (5.13) proposed in [30,44]. Such doubts are however dispelled when considering the approximation errors made by projecting functions from $\mathcal{H}^\beta(\mathbb{S}^{d-1})$ onto specific spline spaces $\mathcal{S}_{\mathcal{D}_\beta^\epsilon}(\mathbb{S}^{d-1}, \Xi_M)$ with fixed knot sets $\Xi_M \subset \mathbb{S}^{d-1}$. Indeed, it was shown in [30,44] that the approximation error is proportional to the quantity $(\Theta_{\Xi_M}/\epsilon)^\beta$, where $\Theta_{\Xi_M} > 0$ is the nodal width of $\Xi_M$ defined in (5.1) page 22. As such, choosing ε at least as large as the nodal width $\Theta_{\Xi_M}$ helps reduce the approximation error.
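The nodal width driving the error bound above can be estimated numerically. The sketch below is a crude Monte-Carlo estimator for S², not the definition (5.1) itself: it measures the largest geodesic distance from a random test direction to its nearest knot, and systematically underestimates the true supremum.

```python
import numpy as np

def nodal_width(knots, n_test=20000, seed=0):
    """Monte-Carlo estimate of the nodal width of a knot set on S^2: the largest
    geodesic distance from a direction on the sphere to its nearest knot.
    (A dense deterministic test grid would give a sharper estimate.)"""
    rng = np.random.default_rng(seed)
    q = rng.normal(size=(n_test, 3))
    q /= np.linalg.norm(q, axis=1, keepdims=True)            # uniform test directions
    cos_nearest = np.clip((q @ knots.T).max(axis=1), -1.0, 1.0)
    return float(np.arccos(cos_nearest).max())
```

For the six octahedron vertices ±e_1, ±e_2, ±e_3, the exact nodal width is arccos(1/√3) ≈ 0.955, attained at the face centres, which the estimator approaches from below.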

Wendland Operators
Wendland operators are obtained by choosing $\Psi_\beta$ in (5.13) as $\Psi_\beta = \varphi_{d,k(\beta)}$, where k(β) = β − d/2 ∈ N and $\varphi_{d,k} : \mathbb{R}_+ \to \mathbb{R}$, k ∈ N, are the Wendland functions. The latter are constructed by repeatedly applying an integral operator $\mathcal{I}$ to Askey's truncated power functions $\varphi_l(r) := (1 - r)_+^l$, with $l = \lfloor d/2\rfloor + k + 1$, where $\mathcal{I}$ is given by
$$(\mathcal{I}\varphi)(r) = \int_r^{+\infty} t\,\varphi(t)\,\mathrm{d}t, \qquad r \geq 0.$$
It can be shown [81] that the Wendland functions can be represented as $\varphi_{d,k}(r) = (1 - r)_+^{l+k}\, p_{k,l}(r)$, r ≥ 0, where $p_{k,l}$ is a polynomial of degree k whose coefficients depend on l. These functions are compactly supported piecewise polynomials with support [0, 1], which yield positive definite radial kernels in $\mathbb{R}^d$ with minimal degree and prescribed smoothness [15,81]. They were introduced by Wendland [77] in the context of high-dimensional approximation. For d ≥ 3, Wendland's radial kernels $\varphi_{d,k(\beta)}(\|x - y\|_{\mathbb{R}^d})$, $x, y \in \mathbb{R}^d$, were moreover proven [15,81] to reproduce Sobolev spaces of the form $\mathcal{H}^{\beta+1/2}(\mathbb{R}^d)$. In the case d = 2, similar results can be obtained via a generalisation of the Wendland functions called the missing Wendland functions [81]. Examples of Wendland zonal Green kernels and their associated Sobolev spaces are provided in [64, Figure 8.2] for d = 3.
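As a concrete instance, the sketch below evaluates the standard Wendland function φ_{3,1}(r) = (1 − r)⁴₊(4r + 1) (the d = 3, k = 1 case, matching the operator D_{3,1} used later in the wildfire experiment) and its restriction to the sphere following (5.13). The function names are illustrative, not the paper's code.

```python
import numpy as np

def wendland_3_1(r):
    """Wendland function phi_{3,1}(r) = (1 - r)_+^4 (4r + 1): a compactly
    supported piecewise polynomial on [0, 1], positive definite on R^3."""
    r = np.asarray(r, dtype=float)
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

def wendland_zonal(t, eps):
    """Zonal Wendland kernel: the scaled radial kernel restricted to the
    sphere via the chordal distance, as in (5.13)."""
    t = np.clip(t, -1.0, 1.0)
    return wendland_3_1(np.sqrt(2.0 - 2.0 * t) / eps)
```

The compact support of φ_{3,1} means the zonal kernel vanishes as soon as the chordal distance exceeds ε, which is what makes the Gram matrices of the next subsection sparse.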

Sparse Gram Matrices
In many experimental setups, the rapid decay of the Matérn and Wendland Green kernels causes the Gram matrices G in Theorem 3 to be sparse, allowing them to be conveniently implemented as such in Algorithms 1 and 2. For example, consider Theorem 3 in the context of the pseudo-differential operator associated to the Wendland kernel $\psi_\beta^\epsilon$ and sampling functionals {ϕ_l = δ_{ρ_l}, l = 1, . . . , L} with sampling directions {ρ_l, l = 1, . . . , L} ⊂ S^{d−1}. Then, the entries of the Gram matrix G ∈ R^{L×N} are given by $G_{ln} = \psi_\beta^\epsilon(\langle \rho_l, r_n\rangle)$, l = 1, . . . , L, n = 1, . . . , N. It is then easy to see that, for ε small enough and the point sets {ρ_l, l = 1, . . . , L} and {r_n, n = 1, . . . , N} reasonably well distributed over S^{d−1}, most of the entries of G are null. This behaviour extends to many spatially-localised measurement processes, such as local averages or filtrations.
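The sparse assembly can be sketched as follows, under the assumption of a compactly supported Wendland kernel: a KD-tree range query finds the sample/knot pairs within chordal distance ε, so the dense L × N matrix is never formed. This is an illustrative sketch, not the paper's implementation.

```python
import numpy as np
from scipy import sparse
from scipy.spatial import cKDTree

def wendland_3_1(r):
    """Wendland function phi_{3,1}(r) = (1 - r)_+^4 (4r + 1)."""
    r = np.asarray(r, dtype=float)
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

def sparse_gram(samples, knots, eps):
    """Sparse Gram matrix G_ln = psi(<rho_l, r_n>) for a compactly supported
    Wendland zonal kernel: only sample/knot pairs at chordal distance < eps
    contribute, found with a KD-tree range query."""
    tree = cKDTree(knots)
    rows, cols, vals = [], [], []
    for l, neighbours in enumerate(tree.query_ball_point(samples, r=eps)):
        for n in neighbours:
            d = np.linalg.norm(samples[l] - knots[n])    # chordal distance
            rows.append(l)
            cols.append(n)
            vals.append(float(wendland_3_1(d / eps)))
    return sparse.csr_matrix((vals, (rows, cols)), shape=(len(samples), len(knots)))
```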

Test Cases
In this section, we test our FPBP spherical approximation framework on two real-life spherical approximation problems, originating from the fields of meteorology and forestry respectively. In both applications, various sampling functionals and data-fidelity functionals are investigated, demonstrating the versatility and genericity of our approximation framework. Interactive versions of the spherical maps provided in this section are moreover available online at the respective links.

Sea Surface Temperature Anomalies
In this example, we propose to establish a global map of sea surface temperature anomalies for the month of January 2011. Such maps are used in environmental sciences to monitor global climate change, as well as to manage the populations of marine species and ecosystems particularly sensitive to fluctuations in water temperature. The data consists of 6745 simulated anomalies, sampled at various points across the globe by drifting floats of the Argo fleet [4,42] and corrupted by Gaussian white noise. The map is obtained by solving the FPBP problem (4.11) with the indicator function of an ℓ2-ball as cost functional (legitimised by the Gaussian noise assumption). The latter being non-smooth but proximable, we solve the discrete PBP problem resulting from the canonical discretisation of (4.11) by means of the PDS Algorithm 1. For comparison purposes, we moreover consider discretising the FPBP problem by means of the non-canonical domain discretisation scheme described in [64, Section 2 of Chapter 6]. Finally, we show the benefits of gTV regularisation over the generalised Tikhonov (gTikhonov) regularisation discussed in [64, Chapter 5].

Background
Sea surface temperature is usually defined as the temperature of the one-millimetre upper layer of the oceans, reflecting the thermal energy stored in the latter. Sea surface temperatures departing from long-term averages (typically 12 years) are called temperature anomalies. While some anomalies are transient and simply due to ocean circulation patterns (such as El Niño and La Niña), others persist over many years, and can hence be potential indicators of global climate change [57]. Sea surface temperature anomalies are also very important in the monitoring and management of marine ecosystems particularly sensitive to water temperature fluctuations. For example, above-average sea water temperatures can result in coral bleaching, a phenomenon suspected to be responsible for the disappearance of between 29 and 50% of the Great Barrier Reef in 2016 [38]. Similarly, high water temperatures are contributing factors to harmful algal blooms, which lead to oxygen depletion in natural waters, with disastrous consequences for marine life [35].

Data Description
For this experiment, we obtained simulated sea surface temperature anomalies by sampling at 6745 locations the global map of sea surface temperature anomalies produced by NASA's Aqua satellite [53] in January 2011 [3]. These anomalies, which serve here as a ground truth, were derived by comparing the sea surface temperatures recorded in January 2011 by NASA's Aqua satellite to the 12-year-averaged historical data for the same month collected by the Pathfinder satellite [75] between 1985 and 1997. The resulting map is depicted in Fig. 4a. The 6745 sampling locations were chosen as the positions of all floats from the Argo fleet [4] during the month of January 2011, obtained at this link [41] and curated by the authors of [42]. Argo is an international program, initiated in the early 2000s, that uses 4000 drifting floats to monitor temperature, salinity and currents in the Earth's oceans. The samples were further polluted by Gaussian white noise with PSNR 10 dB. The resulting samples are plotted in Fig. 4b.

Data Model
Let f : S² → R denote the sea surface temperature anomaly function defined at every location on the globe (modelled as the unit sphere S²). Since temperatures typically have smooth variations at the surface of the Earth, we assume f to be an element of some Sobolev space H^β(S²), with β > 1. The L = 6745 measurements {y_1, . . . , y_L} ⊂ R correspond here to noisy anomaly records collected by the Argo floats across the globe, at locations {p_1, . . . , p_L} ⊂ S². Assuming a Gaussian white noise model, the float records can moreover be modelled as realisations of independent Gaussian random variables {Y_1, . . . , Y_L}, centred around the true temperature anomalies obtained by ideal spatial sampling of f:
$$Y_i \sim \mathcal{N}\!\left(f(p_i), \sigma^2\right), \qquad i = 1, \ldots, L, \tag{6.1}$$
where N denotes the Gaussian distribution and σ² > 0 is the (unknown) noise variance, assumed uniform. This fits well in our generic data model (4.1) page 16, if we choose the sampling functionals as Dirac measures δ_{p_i}. Note moreover that spatial sampling is indeed well-defined for f since, for β > 1, the Sobolev space H^β(S²) is an RKHS.
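The sampling model above can be simulated in a few lines. This sketch is an assumption-laden illustration: the peak-SNR convention used to set the noise level is one common choice, and the paper may normalise its 10 dB PSNR differently.

```python
import numpy as np

def noisy_samples(f, points, psnr_db, rng):
    """Simulate the measurement model: ideal spatial samples f(p_i) corrupted by
    Gaussian white noise, with the noise level set from a target peak SNR
    (assumed convention: sigma = max|f| / 10^(PSNR/20))."""
    clean = np.array([f(p) for p in points])
    sigma = np.abs(clean).max() / (10.0 ** (psnr_db / 20.0))
    return clean + rng.normal(scale=sigma, size=clean.shape)
```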
• $\mathcal{D}_{2.5} : \mathcal{S}(\mathbb{S}^2) \to \mathcal{S}(\mathbb{S}^2)$ is the pseudo-differential operator associated to the Matérn zonal Green kernel $\psi_{2.5}^\epsilon$ with fixed scale ε = 0.017, corresponding to an angular resolution of approximately 4°. Note that Φ is well-defined over $\mathcal{M}_{\mathcal{D}_{2.5}}(\mathbb{S}^2)$ since the Matérn kernel ψ_{2.5} has continuous traces, and hence, from Proposition 6, the predual $\mathcal{C}_{\mathcal{D}_{2.5}}(\mathbb{S}^2)$ contains all Dirac measures. We consider discretising (6.2) by means of two discretisation schemes: the canonical quasi-uniform spline-based scheme described in Section 5.1, and the non-canonical domain discretisation scheme described in [64, Section 2 of Chapter 6], which amounts to pixelating the sphere by means of the Fibonacci equidistributed spherical point set.
• Canonical Spline-Based Discretisation: From the discussion in Section 5.1, solutions of the optimisation problem (6.2) can be approximated as quasi-uniform Matérn splines $f = \sum_{n=1}^N x_n\, \psi_{2.5}^\epsilon(\langle \cdot, r_n\rangle)$, where N = 7386, Ξ_N = {r_n, n = 1, . . . , N} ⊂ S² is a Fibonacci lattice (see Example 3), and x = [x_1, . . . , x_N] ∈ R^N is some solution to the discrete optimisation problem (6.4). The matrix G ∈ R^{L×N} is moreover given by $G_{ln} = \psi_{2.5}^\epsilon(\langle p_l, r_n\rangle)$. We solve (6.4) using the PDS Algorithm 1. Since the Matérn kernel is spatially localised, the matrix G is in practice sparse, and is implemented as such in Algorithm 1 for computational and storage efficiency.
• Non-Canonical Domain Discretisation: For comparison purposes, we also consider discretising (6.2) by means of the domain discretisation scheme described in [64, Section 2 of Chapter 6]. To this end, we consider the restriction f ∈ R^N of f to the discrete set of directions Ξ_N = {r_n, n = 1, . . . , N} ⊂ S², where Ξ_N is the same Fibonacci lattice as before. This can be interpreted as pixelating the spherical domain. Due to its appealing conceptual simplicity, this is the prevailing approach in most applications. Unfortunately, such a discretisation necessarily incurs some (difficult to assess) approximation error, sometimes even when the number of points composing the tessellation graph tends to infinity [12,76]. We recover f via the discrete domain counterpart of (6.2), given in this case by (6.5). Forming the discrete pseudo-differential operator D = (I_N + L)^{2.5} explicitly is often impossible in practice, due to the size of the latter. Moreover, such a choice of discrete pseudo-differential operator would make [64, Algorithm 7.9], used to solve (6.5), much more computationally and memory intensive, since the latter could no longer perform matrix-vector products involving D matrix-free with [64, Algorithm 7.11]. Indeed, this algorithm was designed for discrete pseudo-differential operators taking the form of polynomials of L, and D = (I_N + L)^{2.5} is not a polynomial in L.
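The Fibonacci lattice used for the knot set above can be generated as follows. This is one common construction (equal-area latitude spacing with golden-angle longitude increments); conventions for the offset differ across references, and the paper's Example 3 may use a variant.

```python
import numpy as np

def fibonacci_sphere(N):
    """Quasi-uniform Fibonacci point set on S^2: equal-area latitude spacing
    combined with golden-angle longitude increments (assumed convention)."""
    i = np.arange(N)
    golden = (1.0 + np.sqrt(5.0)) / 2.0
    theta = 2.0 * np.pi * i / golden              # longitudes
    z = 1.0 - (2.0 * i + 1.0) / N                 # latitudes (equal-area)
    r = np.sqrt(np.maximum(1.0 - z ** 2, 0.0))
    return np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)
```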
gTikhonov Regularisation: For comparison purposes, we finally consider recovering f by means of the functional penalised Tikhonov (FPT) problem (6.6) (see [64, Chapter 5] for details), where $\Phi : \mathcal{H}_{\mathcal{D}_{2.5}}(\mathbb{S}^2) \to \mathbb{R}^L$ is this time the spatial sampling operator, and $\mathcal{H}_{\mathcal{D}_{2.5}}(\mathbb{S}^2)$ is the Hilbert space defined in (4.6) (see [64, Chapter 5, Section 2.1] for a detailed analysis of this space). From [64, Theorem 5.3], the solution to the optimisation problem (6.6) is unique, and given by a linear combination of traces of the kernel $\psi_{2.5}^\epsilon \ast \psi_{2.5}^\epsilon$, where ∗ denotes the spherical convolution operator (see Definition 3), with coefficients $\hat{x} = [\hat{x}_1, . . . , \hat{x}_N] \in \mathbb{R}^N$ the unique solution to the discrete optimisation problem (6.7) (see [34, Section V]). Entries of the matrix H ∈ R^{L×L} are moreover given by $H_{lk} = (\psi_{2.5}^\epsilon \ast \psi_{2.5}^\epsilon)(\langle p_l, p_k\rangle)$. We solve (6.7) using [64, Algorithm 7.3]. Again, since the Matérn kernel is spatially localised, the matrix H is in practice sparse, and is implemented as such in the iterations of the numerical solver for computational and storage efficiency.

Results
The various estimates of the sea surface temperature anomaly function obtained by solving (6.2), (6.6) and (6.5) are provided in Figs. 5a, 5b and 6 respectively. The smoothing induced by the gTikhonov regularisation is clearly visible in Fig. 6. In contrast, the continuous and discrete gTV estimates in Figs. 5a and 5b capture far more of the fine fluctuations in the true anomaly map: see for example the eastern coast and southern tip of Africa, as well as the regions surrounding Greenland, Japan and the Indian Ocean. The two gTV estimates exhibit much more similar features. The discrete gTV estimate in Fig. 5b however appears rougher than the continuous gTV estimate in Fig. 5a, due to clearly visible pixelisation artefacts.

Wildfires
In this example, we propose to establish global density maps of wildfires across the globe for the year 2016. Wildfire density maps allow scientists to better understand atmospheric chemistry and its impact on climate.
The data used consists of fire counts recorded by NASA's Aqua and Terra satellites. The resolution of the raw data is moreover deliberately reduced by a factor of 3, by binning the counts in patches of angular size 1.5° × 1.5°. The goal of this resolution reduction is to show that the lost resolution can be successfully recovered by spline-based approximation. Two density maps are obtained by solving two FPBP problems: one with a KL-divergence cost functional -ideally suited for count data with Poisson-like distributions [64, Chapter 7]- and one with a least-squares cost functional, solved with Algorithms 1 and 2 respectively.

Background
Home to 80% of terrestrial species, forests host most of Earth's biodiversity and contribute largely to its preservation [28]. Indeed, tree canopies play a crucial role in the regulation of the water cycle, the creation of litter, and the exchange of energy between the ground and the atmosphere, all of which contribute to the overall good health of an ecosystem [17]. Changes to forest habitats can lead to the extinction of endangered species and disrupt the equilibrium of entire food chains. But forests are more than animal shelters: they also protect humans from natural hazards such as floods or droughts [28]. In addition, forests represent natural and cost-efficient solutions for mitigating climate change, and can provide 30% of the solution for keeping global warming below 2°C [28]. In order to inform policy-makers and hopefully stem the current environmental crisis, it is hence crucial to monitor deforestation and understand its causes, such as agricultural conversion, commodity production, urbanisation, illegal logging and fires. Fires perhaps deserve special attention, as they contribute largely to overall greenhouse gas emissions, with an estimated contribution of 30% to the net annual increase in the concentration of atmospheric carbon dioxide [29,66].

Data Description
For this experiment, we worked with the Fire [39] data product provided by NASA for the year 2016. The dataset, available at [66], provides monthly counts of active fires at a resolution of 0.1 degrees square. The counts are estimated from multispectral images captured by the MODIS instrument aboard NASA's Terra and Aqua satellites [80]. For a better visual appreciation of the results, we aggregated the data from every month of 2016 (see Fig. 7a) and binned it to a lower resolution of approximately 1.5 degrees square (corresponding to a reduction of resolution by a factor of 3). We further corrupted the binned data with Poisson noise, a common noise model for count data. The processed data is displayed in Fig. 7b.

Data Model
In this experiment, one wishes to estimate the spatial density of fires f at the surface of the Earth, using counts {y_1, . . . , y_L} ⊂ N from non-overlapping equal-angle patches {B_1, . . . , B_L} ⊂ S² tiling the sphere. Since the data at hand consists of counts, a Poisson noise model is a suitable choice. This can be achieved by modelling fire locations as random occurrences of some spatial Poisson point process [14,67], often used in spatial statistics to model the random spatial scattering of objects [56,62]. The distribution of a Poisson point process is entirely determined by its intensity measure Λ ∈ M(S²), which counts the expected number of objects (in this case fires) observed in a given region of the sphere. The sought spatial density of the Poisson process is then given -assuming it exists- by the Radon-Nikodym derivative [61] f : S² → R of Λ, also called the density or intensity function of the point process. Similarly as in Section 6.1, we assume f to be an element of the RKHS H^β(S²) for some β > 1. With such a formalism, the reported counts can then be seen as realisations of L = 24000 independent Poisson random variables {Y_1, . . . , Y_L}, $Y_i \sim \mathcal{P}(\lambda_i)$, with rates λ_i > 0 given by
$$\lambda_i = \Lambda(B_i) = \int_{B_i} f(r)\,\mathrm{d}r, \qquad i = 1, \ldots, L, \tag{6.9}$$
where χ_{B_i} ∈ L²(S²) are the characteristic functions of the surveyed patches {B_i, i = 1, . . . , L} ⊂ S². We can reinterpret the rates in (6.9) as generalised samples of f, $\lambda_i = \langle \chi_{B_i}, f\rangle$, hence yielding a data model falling into the scope of the generalised sampling framework (4.1).
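The count model above can be simulated as follows. The one-point quadrature used to approximate the patch integral is a deliberately crude assumption, adequate only for small, slowly varying patches; the names are hypothetical.

```python
import numpy as np

def patch_counts(f, patch_centres, patch_areas, rng):
    """Simulate the count model Y_i ~ Poisson(lambda_i), with the patch
    integral approximated by a crude one-point quadrature:
    lambda_i ~ area(B_i) * f(centre of B_i)."""
    rates = np.array([a * f(p) for p, a in zip(patch_centres, patch_areas)])
    return rng.poisson(rates)
```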

KL-Divergence Cost Function
Since the data consists of counts, we consider recovering f by means of the FPBP problem (6.10) with a KL-divergence data-fidelity term, where:
• λ is a strictly positive constant, tuned manually.
• $\mathcal{D}_{3,1} : \mathcal{S}(\mathbb{S}^2) \to \mathcal{S}(\mathbb{S}^2)$ is the pseudo-differential operator associated to the Wendland zonal Green kernel $\psi_{3,1}^\epsilon$ with scale ε = 0.026, corresponding again to an angular resolution of approximately 1°, measured here as the full width at half maximum (FWHM) of the Wendland kernel.
Note that since the sampling functions are square-integrable and D_{3,1} is invertible with spectral growth order p = 2.5 > (d − 1)/2 = 1, we can use Proposition 8 to show that {χ_{B_i}, i = 1, . . . , L} ⊂ C_{D_{3,1}}(S²), and hence the sampling operator Φ is indeed well-defined. Again, we consider approximating the solutions of (6.10) by means of quasi-uniform Wendland splines $f = \sum_{n=1}^N x_n\, \psi_{3,1}^\epsilon(\langle \cdot, r_n\rangle)$, where N = 210216, Ξ_N = {r_n, n = 1, . . . , N} ⊂ S² is a Fibonacci lattice (see Example 3), and x = [x_1, . . . , x_N] ∈ R^N is some solution to the discrete optimisation problem (6.11). From Theorem 3, the matrix G ∈ R^{L×N} is moreover given by $G_{ln} = \langle \chi_{B_l}, \psi_{3,1}^\epsilon(\langle \cdot, r_n\rangle)\rangle = \int_{B_l} \psi_{3,1}^\epsilon(\langle r, r_n\rangle)\,\mathrm{d}r$. We solve (6.11) using Algorithm 1, once again leveraging the sparse structure of G.
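Running Algorithm 1 with a KL-divergence data-fidelity requires its proximity operator, which admits the standard componentwise closed form below. The specific KL parameterisation in the comment is an assumption consistent with common prox tables, not necessarily the exact functional of (6.10).

```python
import numpy as np

def prox_kl(z, y, tau):
    """Componentwise proximity operator of tau * F(y, .), for the
    Kullback-Leibler-type data fidelity F(y, z) = sum_i z_i - y_i log z_i
    (up to a constant in y): the standard closed form used when running a
    primal-dual solver on Poisson count data."""
    z, y = np.asarray(z, dtype=float), np.asarray(y, dtype=float)
    return 0.5 * (z - tau + np.sqrt((z - tau) ** 2 + 4.0 * tau * y))
```

Note that for y = 0 the formula reduces to max(z − τ, 0), and the output is always strictly positive whenever y > 0, as required by the logarithm.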

Quadratic Cost Function
For sufficiently large rates, the Poisson distribution can be well approximated by a Gaussian distribution. This motivates the use of a least-squares data-fidelity functional in (6.10), yielding the recovery problem (6.12). The discrete optimisation problem (6.11) then becomes a smooth PBP problem, which can be solved efficiently using Algorithm 2.
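Algorithm 2 needs the Lipschitz constant β = ‖G‖²₂ of the least-squares gradient; for the large sparse Gram matrices of this example, it can be estimated with a power iteration using only matrix-vector products. This is a standard numerical device, sketched here with hypothetical names.

```python
import numpy as np

def lipschitz_ls(G, n_iter=100, seed=0):
    """Estimate beta = ||G||_2^2, the Lipschitz constant of the gradient of the
    least-squares cost x -> 0.5 ||Gx - y||^2, by power iteration on G^T G.
    Only matrix-vector products are used, so a sparse G works unchanged."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=G.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        w = G.T @ (G @ v)          # one application of G^T G
        v = w / np.linalg.norm(w)  # renormalise the iterate
    return float(v @ (G.T @ (G @ v)))   # Rayleigh quotient at convergence
```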

Results
The fire density maps estimated with the two recovery strategies (6.10) and (6.12) are provided in Figs. 8a and 8b respectively. We observe that the recovered estimates (with KL-divergence and least-squares cost functions respectively) have a much higher resolution than the original corrupted binned counts in Fig. 7b, recovering almost the native resolution of the unprocessed data in Fig. 7a. We observe however that the KL-divergence cost function seems to better recover regions with low fire counts, such as the Arabian Peninsula or Australia. In contrast, the least-squares data-fidelity functional tends to yield sparser density estimates, where all low-count regions are set to zero. This behaviour was already observed in image restoration under Poisson noise [74].

Conclusion
We have proposed a functional penalised basis pursuit approximation framework for functional inverse problems on the hypersphere. Unlike the ad-hoc discrete methods traditionally favoured by practitioners, such problems present the advantage of being directly formulated in the continuous spherical domain, the natural domain for the analogue spherical signals encountered in nature. In Theorem 2 we showed that, when regularised by means of gTV regularisation, functional inverse problems admit finite-dimensional solutions, which can hence be estimated in practice despite being defined over a continuous domain. More precisely, we showed that the solutions are convex combinations of spherical D-splines with sparse innovations, i.e. with fewer innovations than available measurements. This result inspired in Section 5.1 a canonical search space discretisation scheme with controlled approximation error (see Theorem 3).

Fig. 8: Sparse spline approximations of the fire density function with (a) KL-divergence and (b) least-squares data-fidelity.

In Section 5.2, we moreover proposed algorithmic solutions, adapted from the primal-dual splitting method and APGD, to solve the discrete optimisation problems resulting from the canonical discretisation scheme. The proposed algorithms were shown to be computationally efficient, provably convergent, and compatible with most common cost functionals -including non-differentiable ones, such as the KL-divergence often used in the context of Poisson noise. In Section 5.3 finally, we introduced the last ingredient of our spherical approximation framework, namely the Wendland and Matérn splines, which are particularly convenient for practical purposes. We concluded in Section 6 by testing our continuous-domain spherical approximation framework and novel algorithms on two datasets from the fields of meteorology and forestry. The sampling functionals, cost functions and regularisation strategies considered in each case were diverse, showing the versatility of both the theoretical framework and the algorithmic solutions. In the meteorology example, we moreover illustrated the superiority of continuous-domain over discrete-domain recovery, both in terms of accuracy and resolution. This superiority was partially explained by the fact that continuous-domain methods can, unlike their discrete-domain counterparts, directly process irregular spatial samples without resorting to ad-hoc gridding steps.
In conclusion, we hope that the contributions of this paper will spark interest among the community of practitioners, and give rise to the development of new reconstruction algorithms for spherical approximation problems. As future work, we plan on extending the framework to random, vector-valued or time-varying spherical fields. We would also like to investigate the sliding Frank-Wolfe algorithm proposed in [22] for solving FPBP problems without resorting to the spline-based discretisation scheme proposed in this paper.

A Proofs of Section 3
Proof of Proposition 2. First, notice that since |D_n| = Θ(n^p), we have in particular |D_n| = Ω(n^p), and hence D†h ∈ S(S^{d−1}) for all h ∈ S(S^{d−1}) (using similar arguments as for the proof of point 4 of [64, Proposition 4.1]). The compositions DD† and D†D are hence well-defined. Moreover, we have from (3.1) and (3.4) that, for all h ∈ S(S^{d−1}), $\mathcal{D}\mathcal{D}^{\dagger}\mathcal{D}h = \mathcal{D}h$ and $\mathcal{D}^{\dagger}\mathcal{D}\mathcal{D}^{\dagger}h = \mathcal{D}^{\dagger}h$, which shows that D† is a generalised inverse of D. Since both D and D† are self-adjoint, we have (DD†)* = D†D = DD† and (D†D)* = DD† = D†D, which shows that D† is actually the Moore-Penrose pseudo-inverse of D and concludes the proof.

Proof of Proposition 3
Let s ∈ S^{d−1}. From the generalised spherical harmonic transform applied to $\Psi_s^{\mathcal{D}}$, we have that its Fourier coefficients are $\big(\widehat{\Psi_s^{\mathcal{D}}}\big)_n^m = \overline{Y_n^m(s)}/D_n$ for n ∉ K_D, and zero otherwise. The result then follows from the bilinearity of the Schwartz duality product and the addition theorem [48, Theorem 5.11].

Proof of Proposition 4
Let s ∈ S^{d−1} be fixed but arbitrary. We show that, under the assumptions of Proposition 4, the series (A.1) converges uniformly (w.r.t. the variable r). Since every summand is continuous, we can then conclude that the limit ψ_D(⟨r, s⟩) is continuous (see [48, Theorem 2.14]), and hence in particular pointwise defined -i.e. D is spline-admissible. To show that (A.1) is uniformly convergent, we consider its remainder for some N > max(K_D). Then, from the addition theorem [48, Theorem 5.11] and the Cauchy-Schwarz inequality, we get, for each r ∈ S^{d−1}, a bound on the remainder by the tail sum $\sum_{n=N}^{+\infty} N_d(n)\, \frac{a_d}{|D_n|}$. Moreover, since |D_n| = Θ(n^p), we have from (2.4) that $N_d(n)\, \frac{a_d}{|D_n|} = O(n^{d-2-p})$. Since p > d − 1 ⇒ d − 2 − p < −1, the series $\sum_{n \in \mathbb{N}} N_d(n)\, \frac{a_d}{|D_n|}$ is convergent and hence its remainder tends to zero. Moreover, since the bound is independent of r (and s as a matter of fact), the convergence is uniform, which achieves the proof.

Proof of Proposition 5
Consider the function s′ : S^{d−1} → C, defined as the linear combination of the traces ψ_D(⟨·, r_i⟩) appearing in the statement of Proposition 5. Notice that, since D is spline-admissible, the functions ψ_D(⟨·, r_i⟩) are ordinary functions, which are hence bounded on S^{d−1}; thus ψ_D(⟨·, r_i⟩) ∈ L²(S^{d−1}), which implies in turn that s′ ∈ L²(S^{d−1}). We can hence interpret s′ as an element of S′(S^{d−1}) with pointwise definition ⟨s′|ϕ⟩ := ⟨ϕ, s′⟩_{S^{d−1}}, ∀ϕ ∈ S(S^{d−1}). We now show that s = s′, i.e. ⟨s|ϕ⟩ = ⟨s′|ϕ⟩, ∀ϕ ∈ S(S^{d−1}). First, we write S(S^{d−1}) = R(D) ⊕ N(D), such that every element ϕ of S(S^{d−1}) can be written as ϕ = Dh + η, with (h, η) ∈ R(D) × N(D). Evaluating ⟨s′|Dh + η⟩ and using Proposition 3 to evaluate the action of s on the spherical harmonics Y_n^m spanning the nullspace component η, we obtain ⟨s|ϕ⟩ = ⟨s′|ϕ⟩ for all ϕ ∈ S(S^{d−1}), as claimed. Equation (3.13) then follows trivially from the fact that the summations involving K_D vanish when K_D = ∅. Finally, (3.14) follows from the definition of S_D(S^{d−1}, Ξ_M) (see Definition 7), and the fact that when K_D = ∅ the spline coefficients are unconstrained (see Remark 5).

B Proofs of Section 4
Proof of Proposition 6. Notice first that D maps isometrically (C(S^{d−1}), ‖·‖_∞) onto (C_D(S^{d−1}), ‖·‖_{D,∞}). Indeed, every element h of C_D(S^{d−1}) can be uniquely written as $h = \mathcal{D}^{-1}g$ for some g ∈ C(S^{d−1}), which yields the claimed isometries. Moreover, we have from the Riesz-Markov representation theorem the duality pair (C(S^{d−1}), ‖·‖_∞) ≅ (M(S^{d−1}), ‖·‖_{TV}), which yields the result.

Proof of Theorem 2. We show Theorem 2 as a corollary of Lemma 1 below, pertaining to penalised convex optimisation problems in non strictly convex abstract Banach spaces. The latter leverages the Krein-Milman theorem [60, p. 75], as well as results from [10,34,68], to characterise the solution set V of (4.4) as the weak* closed convex hull of its extreme points with bounded degrees of freedom. Our result generalises [68, Theorem 6] to the case of non strictly convex cost functionals F, often encountered in practice.

Lemma 1 (Extreme Point Representer Theorem).
Consider the following assumptions: Since V e ⊂ V and f e ∈ V e , f e is also an extreme point of V e and hence is indeed of the form (B.2). This shows that every extreme point of V is of the form (B.2), which achieves the proof.
We now apply Lemma 1 to the FPBP problem (4.13) in Theorem 2. To this end, we set (B, ‖·‖_B) = (C_D(S^{d−1}), ‖D^{−1}·‖_∞), (B′, |||·|||) = (M_D(S^{d−1}), ‖D·‖_{TV}) and Λ(t) = λt. The assumptions of Lemma 1 are then indeed verified, since (C_D(S^{d−1}), ‖D^{−1}·‖_∞) and (M_D(S^{d−1}), ‖D·‖_{TV}) form a duality pair of Banach spaces, and Λ is convex and strictly increasing. We hence get from Theorem 1 that the solution set of the FPBP problem (4.13) is nonempty and the weak* closed convex hull of its extreme points. The latter are moreover necessarily of the form
$$f = \sum_{k=1}^{K} \alpha_k\, \psi_{\mathcal{D}}(\langle \cdot, r_k\rangle), \qquad K \leq L,$$
where ψ_D is the zonal Green kernel of D and where the equality is in the sense of (3.7). Moreover, when D is spline-admissible, all traces {ψ_D(⟨·, r⟩), r ∈ S^{d−1}} of the zonal Green kernel are ordinary functions, and hence f is also an ordinary function. This shows (4.15) and achieves the proof of Theorem 2.
Proof of Proposition 7. According to Proposition 6, the Dirac measures {δ_r, r ∈ S^{d−1}} belong to C_D(S^{d−1}) if and only if there exists, for every r ∈ S^{d−1}, a function Ψ_r ∈ C(S^{d−1}) such that DΨ_r = δ_r. The functions Ψ_r satisfying this equation are actually the Green functions of D, which, from Proposition 3, can be identified with the traces ψ_D(⟨·, r⟩) of the zonal Green kernel of D. Finally, we have shown in Proposition 4 that, for a pseudo-differential operator with spectral growth order p > d − 1, we have {ψ_D(⟨·, r⟩), r ∈ S^{d−1}} ⊂ C(S^{d−1}); hence Ψ_r can indeed be identified with a continuous function, and consequently all Dirac measures belong to the predual C_D(S^{d−1}).

Proof of
|f m n | 2 .
Since f ∈ L 2 (S d−1 ) we have trivially lim N →+∞ +∞ n=N N d (n) m=1 |f m n | 2 = 0. Moreover, since |D n | = Θ(n p ) we have from (2.4) N d (n)|D n | −2 = O(n d−2−2p ). Since p > (d − 1)/2 ⇒ d − 2 − 2p < −1 we have hence that the series n∈N N d (n) a d |Dn| 2 is convergent and hence is remainder tends to zero. We have hence Moreover, since the upper bound is independent on r the convergence is uniform, which achieves the proof.
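As a quick sanity check of the exponent bookkeeping (an illustrative computation, not taken from the source), consider the $2$-sphere, $d = 3$, with spectral growth order $p = 2$:

```latex
\[
  N_d(n) = \Theta(n^{d-2}) = \Theta(n), \qquad
  |\hat{D}_n|^{2} = \Theta(n^{2p}) = \Theta(n^{4}),
\]
\[
  N_d(n)\,|\hat{D}_n|^{-2} = O\!\left(n^{d-2-2p}\right) = O\!\left(n^{-3}\right),
  \qquad d-2-2p = -3 < -1 \iff p > \tfrac{d-1}{2} = 1,
\]
```

so the series over $n$ is dominated by a convergent $p$-series, as claimed.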

C Proofs of Section 5
Proof of Proposition 9 It is easy to see that the Hermitian square root of $\mathcal{D}$ has Fourier symbol $\{\sqrt{\hat{D}_n},\, n \in \mathbb{N}\}$. The latter is moreover a pseudo-differential operator, since the Fourier coefficients $\hat{D}_n$ are all positive (from the assumption of positive-definiteness of $\mathcal{D}$) and hence $\{\sqrt{\hat{D}_n},\, n \in \mathbb{N}\} \subset \mathbb{R}_+$. The rest of the assumptions of Definition 4 trivially follow from $\mathcal{D}$ being a pseudo-differential operator. Moreover, from the assumption $p > d-1$ we get that the spectral growth order of $\mathcal{D}^{1/2}$, equal to $p/2$, is strictly larger than $(d-1)/2$. We can hence apply [64, Lemma 5.5] to conclude that the generalised Sobolev space $\mathcal{H}_{\mathcal{D}^{1/2}}(\mathbb{S}^{d-1})$ is a reproducing kernel Hilbert space (RKHS).

Proof of Proposition 10 Proposition 9 tells us that $\mathcal{H}_{\mathcal{D}^{1/2}}(\mathbb{S}^{d-1})$ is an RKHS. Therefore, any element $h \in \mathcal{H}_{\mathcal{D}^{1/2}}(\mathbb{S}^{d-1})$ is an ordinary function, and we have from Proposition 3:
$$\langle h, \psi_{\mathcal{D}}(\langle \cdot, r\rangle)\rangle_{\mathcal{D}^{1/2}} = \langle \mathcal{D}^{1/2} h, \mathcal{D}^{1/2} \psi_{\mathcal{D}}(\langle \cdot, r\rangle)\rangle_{\mathbb{S}^{d-1}} = \langle \mathcal{D}^{1/2}\Psi_r^{\mathcal{D}} \,|\, \mathcal{D}^{1/2} h\rangle = \langle \mathcal{D}\Psi_r^{\mathcal{D}} \,|\, h\rangle = \langle \delta_r \,|\, h\rangle = h(r),$$
for all $r \in \mathbb{S}^{d-1}$, which shows that the zonal Green kernel $\psi_{\mathcal{D}}$ is the reproducing kernel [9] of $\mathcal{H}_{\mathcal{D}^{1/2}}(\mathbb{S}^{d-1})$. Additionally, since $\mathcal{D}$ is positive-definite, it is in particular invertible, and we get from (3.14) that $\mathcal{S}_{\mathcal{D}}(\mathbb{S}^{d-1}, \Xi_N) = \mathrm{span}\{\psi_n^{\mathcal{D}} := \psi_{\mathcal{D}}(\langle \cdot, r_n\rangle),\, r_n \in \Xi_N\}$.
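For concreteness, an illustrative example (not spelled out at this point of the source): a Sobolev-type operator with symbol $\hat{D}_n = \bigl(1 + n(n+d-2)\bigr)^{p/2}$ satisfies all the assumptions above, since

```latex
\[
  \hat{D}_n = \bigl(1 + n(n+d-2)\bigr)^{p/2} > 0
  \;\Longrightarrow\;
  \sqrt{\hat{D}_n} = \bigl(1 + n(n+d-2)\bigr)^{p/4} = \Theta\!\bigl(n^{p/2}\bigr),
\]
```

so the Hermitian square root $\mathcal{D}^{1/2}$ has a positive symbol and spectral growth order $p/2$; for instance, with $d = 3$ and $p = 3 > d-1$, one gets $p/2 = 3/2 > (d-1)/2 = 1$, as required.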
The positive-definiteness of $\mathcal{D}$ implies moreover (C.1) (see [64, Remark 5.9] and [48, Theorem 6.27]), where the second equality in (C.1) follows from the fact that $\psi_{\mathcal{D}}$ reproduces functions in $\mathcal{H}_{\mathcal{D}^{1/2}}(\mathbb{S}^{d-1})$. Moreover, [48, Lemma 6.34] tells us that, for spline-admissible pseudo-differential operators with growth order $p > \frac{d+1}{2}$, the zonal Green kernel $\psi_{\mathcal{D}}$ is uniformly Lipschitz continuous, i.e. there exists $L_{\mathcal{D}} > 0$, which only depends on the sequence $\{\hat{D}_n\}_{n\in\mathbb{N}}$, such that, for any $\rho \in \mathbb{S}^{d-1}$,
$$|\psi_{\mathcal{D}}(\langle r, \rho\rangle) - \psi_{\mathcal{D}}(\langle s, \rho\rangle)| \le L_{\mathcal{D}}^2\, \|r - s\|_2, \qquad \forall r, s \in \mathbb{S}^{d-1}. \tag{C.2}$$
With these two observations, we are now ready to prove the result. Let $r \in \mathbb{S}^{d-1}$ and let $r_n \in \Xi_N$ denote a node closest to $r$. First, we get from (C.1) as well as the Cauchy-Schwarz and triangle inequalities:
$$|h(r) - s_N^\perp(r)| \le \|h - s_N^\perp\|_{\mathcal{D}^{1/2}}\, \|\psi_{\mathcal{D}}(\langle \cdot, r\rangle) - \psi_{\mathcal{D}}(\langle \cdot, r_n\rangle)\|_{\mathcal{D}^{1/2}} \le 2\,\|h\|_{\mathcal{D}^{1/2}}\, \|\psi_{\mathcal{D}}(\langle \cdot, r\rangle) - \psi_{\mathcal{D}}(\langle \cdot, r_n\rangle)\|_{\mathcal{D}^{1/2}}.$$
Second, we obtain from the reproducing property, equation (C.2) and the definition (5.1) of the nodal width $\Theta_{\Xi_N}$:
$$\|\psi_{\mathcal{D}}(\langle \cdot, r\rangle) - \psi_{\mathcal{D}}(\langle \cdot, r_n\rangle)\|_{\mathcal{D}^{1/2}}^2 = \psi_{\mathcal{D}}(\langle r, r\rangle) - 2\psi_{\mathcal{D}}(\langle r, r_n\rangle) + \psi_{\mathcal{D}}(\langle r_n, r_n\rangle) \le 2 L_{\mathcal{D}}^2\, \|r - r_n\|_2 \le 2 L_{\mathcal{D}}^2\, \Theta_{\Xi_N}.$$
In conclusion, this yields
$$\sup_{r \in \mathbb{S}^{d-1}} |h(r) - s_N^\perp(r)| \le 2^{3/2} L_{\mathcal{D}}\, \sqrt{\Theta_{\Xi_N}}\, \|h\|_{\mathcal{D}^{1/2}},$$
which achieves the proof.

Proof of Theorem 3
The spline-admissible pseudo-differential operator $\mathcal{D}$ being positive-definite, its Green kernel $\psi_{\mathcal{D}}$ is strictly positive-definite (see [48, Definition 6.25 and Theorem 6.27]). Hence, according to [48, Lemma 6.26], the functions $\{\psi_{\mathcal{D}}(\langle \cdot, r_n\rangle),\, r_n \in \Xi_N\}$ are linearly independent, and the spline $\Psi x = \sum_{n=1}^N x_n\, \psi_{\mathcal{D}}(\langle \cdot, r_n\rangle)$ has innovation $\mathcal{D}\Psi x = \sum_{n=1}^N x_n\, \delta_{r_n}$. Notice that the linear operator $\Phi\Psi: \mathbb{C}^N \to \mathbb{C}^L$ is finite-dimensional, and can hence be represented as a matrix. From the bilinearity of the Schwartz duality product, we have indeed
$$\left\langle \sum_{n=1}^N x_n\, \psi_{\mathcal{D}}(\langle \cdot, r_n\rangle)\, \middle|\, \varphi_l \right\rangle = \sum_{n=1}^N x_n\, \langle \psi_{\mathcal{D}}(\langle \cdot, r_n\rangle)\,|\,\varphi_l\rangle = \sum_{n=1}^N x_n\, G_{ln}, \qquad \forall l = 1, \dots, L.$$
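The matrix representation of $\Phi\Psi$ can be sketched numerically. The snippet below is an illustrative construction, not the authors' code: it assumes the sampling functionals are point evaluations $\varphi_l = \delta_{s_l}$, so that $G_{ln} = \langle \psi_{\mathcal{D}}(\langle \cdot, r_n\rangle)\,|\,\varphi_l\rangle = \psi_{\mathcal{D}}(\langle s_l, r_n\rangle)$, and takes a generic zonal kernel as a callable (the kernel used here is a placeholder, not a specific Green kernel from the paper).

```python
import numpy as np

def gram_matrix(psi, S, R):
    """Assemble G[l, n] = psi(<s_l, r_n>) for point-evaluation
    functionals phi_l = delta_{s_l} and spline knots r_n.

    psi : callable mapping t in [-1, 1] to the zonal kernel value psi_D(t)
    S   : (L, d) array of unit-norm sample directions s_l
    R   : (N, d) array of unit-norm spline knots r_n
    """
    T = np.clip(S @ R.T, -1.0, 1.0)  # pairwise inner products <s_l, r_n>
    return psi(T)                    # (L, N) matrix G

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random unit vectors on the 2-sphere (d = 3)
    S = rng.normal(size=(5, 3)); S /= np.linalg.norm(S, axis=1, keepdims=True)
    R = rng.normal(size=(4, 3)); R /= np.linalg.norm(R, axis=1, keepdims=True)
    psi = lambda t: np.exp(t)        # placeholder zonal kernel (illustrative)
    G = gram_matrix(psi, S, R)
    x = rng.normal(size=4)
    y = G @ x                        # applying PhiPsi is a matrix-vector product
    print(G.shape, y.shape)          # prints "(5, 4) (5,)"
```

Applying $\Phi\Psi$ to a coefficient vector $x$ then reduces to the matrix-vector product $Gx$, which is what makes the discretised problem amenable to standard convex solvers.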