Continuous Regularized Least Squares Polynomial Approximation on the Sphere

In this paper, we consider the problem of polynomial reconstruction of smooth functions on the sphere from their noisy values at discrete nodes on the two-sphere. The method considered in this paper is a weighted least squares form with a continuous regularization. Preliminary error bounds in terms of regularization parameter, noise scale, and smoothness are proposed under two assumptions: the mesh norm of the data point set and the perturbation bound of the weight. Condition numbers of the linear systems derived by the problem are discussed. We also show that spherical t ϵ -designs, which can be seen as a generalization of spherical t -designs, are well applied to this model. Numerical results show that the method has good performance in view of both the computation time and the approximation quality.


Introduction
In recent decades, the problem of finding a polynomial approximation g: S 2 ⟶ R with degree of g up to an integer L of a continuous function f ∈ C(S 2 ) on the sphere S 2 : � x ∈ R 3 : ‖x‖ 2 � 1 has received much attention [1][2][3][4][5][6]. Given noisy data f δ (x i ) at points x i ∈ S 2 , i � 1, . . . , N, regularized least squares schemes are usually considered, whereas the regularization varies for different tasks. In this paper, we consider approximating the function f with a continuous regularized least squares scheme (CRLS) as where w i , i � 1, . . . , N are positive scalars as weights of the data fitting term. For normalization, we can always assume that N i�1 w i � N. Moreover, λ is the regularized parameter to balance the data fitting term and the regularized term and P L denotes the polynomial space: P L � spherical polynomials with degree ≤ L � span Y ℓ,k , ℓ � 0, . . . , L, k � 1, . . . , 2ℓ + 1 , (2) where Y ℓ,k is a fixed L 2 -orthonormal real spherical harmonic of degree ℓ and order k, which means S 2 Y ℓ,k (x)Y ℓ′,k′ (x)dω(x) � δ ℓ,ℓ′ δ k,k′ , ℓ, ℓ ′ � 0, . . . , t; k, k ′ � 1, . . . , 2ℓ + 1, where dω(x) denotes the surface measure on S 2 and δ ℓ,ℓ′ is the Kronecker delta. It is well known that the dimension of P L is e Sobolev space H s : � H s (S 2 ) is defined for s ≥ 0 as the set of all functions f ∈ L 2 (S 2 ) whose Laplace-Fourier coefficients where λ ℓ � ℓ(ℓ + 1), and the norm of H s (S 2 ) can be defined as where the positive parameters ϕ (s) ℓ are decreasing in ℓ and satisfy Here, we say that a n ≍ b n if there exist positive constants c 1 and c 2 independent of n such that c 1 a n ≤ b n ≤ c 2 a n for all n. Obviously, by letting s � 0, we can obtain H 0 (S 2 ) � L 2 (S 2 ). CRLS is to obtain the approximation of f with fitting the data values f δ (x i ), w i � 1 . . . , N, and at the same time, we keep the approximation smoother. e motivation is often from the geophysical quantity problems such as satellite gravity gradiometry problem ( [7], pp. 120, 262), in which a smoother polynomial representation of the gravitational potential than the Earth's surface is to be found. On the other hand, CRLS can be seen as a generalization or approximation of some smooth-oriented regularized least squares models (see the appendix for discussions on some polynomial approximation models relative to this paper).
Polynomial approximation problems arise in a variety of engineering areas, for which different models are used. To recover smooth functions on S 2 , a discrete regularized least squares method (DRLS) is proposed in [1,5] as where R L : P L ⟶ P L is a linear "penalization" operator, which can be chosen in different ways. e points used are positive-weight cubature formula that is exact for all spherical polynomials of degree up to 2L, which will guarantee the uniqueness of the solution and furthermore derive its explicit form. A reconstruction error bound is obtained for the case that the nodes and weights establish a cubature formula that is exact for all spherical polynomials of degree up to 2L. To our best knowledge, the number of points needed to construct a 2L-degree polynomial-exact cubature on S 2 is around (2L + 1) 2 /3 [8], which is bigger than the dimension of the polynomial space P L in most practical cases. However, in real application, it is usually costly to acquire as much data points as expected, which motivates us to study the case that the number of data points is not large. In this paper, we do not assume that N ≥ (L + 1) 2 , that is, the number of data values is not necessary to reach or exceed the dimension of the polynomial space P L . e regularization term could still guarantee the uniqueness of the minimizer of CRLS when N < (L + 1) 2 and avoid the overfitting. e rest of this paper is organized as follows. In Section 2, we discuss the solution and approximation error of CRLS and the condition number of the linear system it derives. Based on the results, we further give a discussion on spherical t ϵ -designs applied to this model. Numerical experiments are proposed in Section 3 in which the model is tested for its efficiency and accuracy.

Solution, Approximation Error, and Condition Number
where Problem (1) is convex, and thus by deriving its first-order optimality condition, we can obtain that solving problem (1) leads to find a solution of the linear system with Remark 1. For λ > 0, equation (12) will always have a unique solution by the fact that Y T WY � (W 1/2 Y) T W 1/2 Y is positive semidefinite and B is positive definite. By the uniqueness of the solution of problem (1), we denote the minimizer of it by Furthermore, when λ � 0, problem (1) may have multiple solutions. In this case, f δ λ represents the uniquely determined minimizer in P L of (1) with minimal norm.

Approximation
Error. For the case that N ≥ (t + 1) 2 and X is a spherical 2L-design or a cubature rule with 2L-degree polynomial exactness, the error bounds are discussed in [1,5]. Model (1) can be seen as a variant of models considered in [1,5], in which the difference is that we choose different regularization terms. Furthermore, to propose the error bounds, we do not need the assumption about the polynomial exactness the data point needs to have. e only assumption we need is on the mesh norm of the data point set X ⊂ S d , which can be defined as follows.
e mesh norm could be regarded as the largest geodesic distance of all points on S d to their nearest x i ∈ X or as the radius of the largest spherical cap that does not contain any e results in this section are based on a slight generalization of eorem 5.1 in [9] which presents an L 2 sampling inequality on the sphere S d .

Theorem 1. Let d ≥ 2 and let s > d/2. ere exist positive constants c s and c s (dependent on d and s) such that for any finite point set
For the setting of weighted least squares, we slightly revise the above theorem as the following.
ere exist positive constants c s and c s (dependent on d and s) such that for any finite point set X � x 1 , · · · , x N of N distinct points on S d with h X ≤ c s and for any function g ∈ H s , Proof. e gap between eorem 5.1 in [9] and the above one is the involvement of the weight vector w � (w 1 , . . . , w N ). By the fact that w i ≥ 1 − ϵ for ∀i � 1, . . . , N and inequality (16), we can obtain the conclusion.
. . , f(x N )) T on X, and given a vector of deterministic noise δ � (δ 1 , . . . , δ N ) T and given λ ≥ 0, there exist positive constants c s and c s ′ (dependent on s) such that if the point set X satisfies h X ≤ c s , the following holds: Proof. For simplicity, we define the norm ‖·‖ w as y T Wy for a vector y ∈ R N . Note that the norm is well defined since W is positive definite. Define the quadratic functional μ(g) as Let f δ λ , f ∈ R N be the vector of function value of f δ λ , f at the point set X, respectively. By sampling inequality (17) in eorem 2, we can obtain that there exists a positive constant c s ″ such that en, by the fact that f δ λ is the minimizer of μ(g), the following holds: Moreover, we have us, the term ‖f δ λ − f‖ H s can be bounded as Substitute inequalities (21) and (23) where the last inequality is derived by the fact that From the theorem, it is easy to see that the data point set X � x 1 , x 2 , . . . , x N and the weight w � (w 1 , w 2 , . . . , w N ) T ∈ R N need to be chosen properly to improve the approximation quality of the model. e data set X should have its mesh norm as small as possible and w should be near to its mean value. Furthermore, one can assume that X and w establish a cubature rule with polynomial exactness so that it can give a fast way to find the solution of the problem. By this view, spherical t ϵ -designs which are proposed and studied in [6,[10][11][12][13][14] are a suitable choice in this paper.
is exact for all spherical polynomials p of degree at most t, with the weight vector w � (w 1 , . . . , w N ) T satisfying Naturally, we can choose a spherical t ϵ -design as the data set and its corresponding weight w as the weight in problem (1). Moreover, solving problem (1) can become trivial when t is large enough. Theorem 4. Let X be a spherical t ϵ -design with t ≥ 2L and w be the vector of weights satisfying (25) and (26) with respect to X. en, and (1) has the unique solution Proof. Note that when X is a spherical t ϵ -design, where the third equality is established by the orthonormality of spherical harmonics and the fact |S 2 | � 4π. By Remark 1, problem (1) has a unique optimal solution satisfying N 4π Note that (N/4π)I (L+1) 2 + λB is diagonal, and thus we can directly obtain the solution of the above linear system as equation (28). e proof is completed. Theorem 5. Let f δ denote a perturbation of f. en, the following holds: Proof. e proof of the theorem is similar to the proof of eorem 4.1 in [1], and we provide quick illustration to keep the paper self-contained. e condition number of T could be obtained by the eigenvalue Weyl's inequalities ( [15], p. 181), which leads to Note that problem (1) can be written as a ℓ 2 least square problem: 4 Mathematical Problems in Engineering e perturbation bound could then be achieved by applying the standard least squares perturbation bound ( [16], eorem 1.4.6) as When X is a spherical t ϵ -design with corresponding weight w � (w 1 , . . . , w N ) T with t ≥ 2L, by (27), we have that Y T WY is a scalar multiple of the identity matrix, and from (13), we find erefore, estimate (31) of the condition number κ(T) is sharp. It can be directly obtained by the above theorem that κ(T)≍ℓ 2s , which induces that s should be chosen properly as a balance of smoothing and solvability of the problem. However, the choice of s, which may depend on the nature of target function f ∈ C(S 2 ) and the scale of noise, will be an interesting question to be solved in the future.

Numerical Experiments
In this section, we report the numerical results for the CRLS model proposed in this paper. To illustrate the effectiveness of the model, we first compare the approximation errors of our model and the DRLS model (9) which is proposed in [1,5]. e computation is implemented in Matlab 2016a and performed on a HP notebook computer equipped with Intel Core i7-6500U 2.5G Hz CPU, 16 GB RAM running Windows 10. e solver for linear systems generated by the two models is selected as the left matrix divide command embedded in Matlab, which indeed performs fast in the experiment.
For the choice of the regularized parameter λ and the penalized parameters β ℓ , ℓ � 0, . . . , L for the DRLS model, one can refer to [5], in which β ℓ are chosen based on the theoretical error bounds and real applications and λ is chosen by the balancing principle. e parameter choice of CRLS is also of great importance for the approximation quality of our model, and this will be a crucial problem we will tackle in the future work. e task for the numerical experiment here is to compare the numerical performance between CRLS and DRLS. What should also be noticed in the numerical experiments is that for an approximation Hf of f, both the uniform and L 2 approximation errors as ‖Hf − f‖ C(S 2 ) and ‖Hf − f‖ L 2 are difficult to obtain exactly. In such case, the errors will be approximately computed by where X t � z 1 , . . . , z Nt ⊂ S 2 is a large-scaled and welldistributed point set and we use it to estimate the errors. In this paper, X t is selected as a maximal determinant (MD) (see [6,17]) point set with 10 4 points.

Example 1.
In the first example, we will test the two models, CRLS and DRLS, and record their approximation errors for comparison. e data point set X is also selected as the MD node of size 900. e penalized parameters are selected as β ℓ � (1 + ℓ(+1)) s and L � 29 so that both models will achieve unique minimizers. e target function supposed to approximate is the Franke function [18]: which is not a spherical polynomial but continuously differentiable on the whole sphere. e noises in the data f δ obey a uniform distribution in [− 0.5, 0.5] i.i.d. Images of f 1 and its noisy version are shown in Figures 1(a) and 1(b). Figure 2 shows uniform and L 2 approximation errors of the both models for s � 0.15j, j � 1, . . . , 20 and λ � 10 − 3.5+0.15k , k � 1, . . . , 20. It should be noted that both models will only be effective to recover the target function by choosing the parameters s and λ properly. erefore, thresholds as a success of the approximation are introduced into the figure, that is, L 2 error ≤ 1.75 and uniform error ≤ 1. As can be seen from the figure, CRLS shows its effectiveness and robustness in this example, which seems to be achieving success with more choices of the parameters than Mathematical Problems in Engineering 5 DRLS. Also, the deep blue area of the first row is larger than that in the second row, which indicates that CRLS has more chance than DRLS to achieve small errors.
Example 2. In this example, we will focus on the solving speed of the linear systems induced by the models. e condition number and CPU time about the linear systems derived by CRLS and DRLS will be reported. Two types of data point set with different scales will be chosen in this example. One is a well-distributed type which is used in Example 1, and the other type is ill-distributed, called Gauss-Legendre point set (GL) [19] with Gauss points in cos(polar angle) and equally spaced points in azimuthal angle.
In Example 1, the two models cannot achieve each one's smallest errors in the same setting λ and s, even when other settings are the same. To deal with this point, for a fixed data set, we search for the optimal parameter setting of λ and s for each model to achieve least uniform errors separately with the scheme shown in Example 1 and report their errors, condition numbers, and CPU time in Table 1.
From Table 1, it is obvious to see that for both MD and GL cases, CRLS and DRLS can achieve similar uniform errors at their best situations. However, in most cases, CRLS is more well conditioned than DRLS and costs less CPU time.
It is an interesting question to explain that CRLS performs better than DRLS in the simulation. First, we can compare the theoretical results on the condition numbers of the linear systems induced by CRLS and DRLS and then get the reason that CRLS has smaller condition numbers and solving times. However, the theoretical underlying reason that CRLS achieves smaller errors is unknown to us until now. We can try to explain it in a heuristic way: we know that the difference between CRLS and DRLS is the choice of the regularization term. In CRLS, we use the ‖ · ‖ H s (S 2 ) norm of the polynomial approximation g while in DRLS, we use the ℓ 2 norm of the value of g with some operator transform. us, it can be easy to obtain that the regularization term in DRLS is determined on the choice of the data point set X N but in CRLS, it is not. is may possibly be the reason for CRLS obtaining smaller errors.

Conclusion and Future Work
In this paper, a continuous regularized least squares polynomial approximation model on the two-sphere is discussed. Condition number and error bound of the model are proposed in the situation that the data are noisy and their cardinality is not necessarily larger than the dimension of the polynomial space. To guarantee the error bound, we need two assumptions: the mesh norm of the data point set should be small enough and the weights should be close to their mean. Numerical results show that the novel model is effective to approximate smooth functions on the sphere.
We conclude this paper by pointing out a few problems for future directions. e first is that it is of great importance to study the strategy of choosing the parameters s and λ, which is crucial to the approximation quality of the model. Secondly, it is an interesting topic to explain that CRLS could achieve smaller errors than DRLS in the numerical experiments, since the difference between them is only the choices of the regularization term.

Appendix Two Relative Polynomial Approximation Models
We will introduce two related regularized polynomial approximation models in this part. e first one can be seen as a special case of the CRLS model discussed in this paper. in which the regularized term is the Euclidean norm of c. e second example is that we replace the regularized term by the ‖ · ‖ L 2 norm of the Laplace-Beltrami transform of g, which is slightly different with another special case of the CRLS model. in which Δ * denotes the Laplace-Beltrami operator ( [20], pp. 38-39), which is the angular part of the Laplace operator in three dimensions: