Asymptotic confidence bands in the Spektor-Lord-Willis problem via kernel estimation of intensity derivative

The stereological problem of unfolding the distribution of sphere radii from linear sections, known as the Spektor-Lord-Willis problem, is formulated as a Poisson inverse problem, and an L2-rate-minimax solution is constructed over certain restricted Sobolev classes. The solution is a specialized kernel-type estimator with boundary correction. For the first time for this problem, non-parametric, asymptotic confidence bands for the unfolded function are constructed. Automatic bandwidth selection procedures based on empirical risk minimization are proposed. It is shown that a version of the Goldenshluger-Lepski procedure of bandwidth selection ensures adaptivity of the estimators to the unknown smoothness. The performance of the procedures is demonstrated in a Monte Carlo experiment. MSC 2010 subject classifications: Primary 62G05, 62G20, 45Q05.


Introduction
Consider a population of balls with random radii, randomly placed in an opaque medium and, hence, not directly observable. A linear section through the medium, like drilling through a rock or a muscle biopsy, allows for measuring the radii (half-lengths) of the line segments that are intersections of the line probe with the balls. The ultimate goal is to use those measurements to unfold both the distribution of the ball radii and the 'density' of the balls in the medium. Early formulations of the problem date back to Spektor [30] and Lord and Willis [23] and were related to measurements in materials science. A review of some heuristic algorithms traditionally used for solving the Spektor-Lord-Willis problem (henceforth, SLW problem) and related stereological problems is given in [10, Ch. 10.4.3]. Although planar rather than linear sections through the medium are generally preferred in experimental setups, leading to the better-known Wicksell's problem, there are also important practical applications of line probes (e.g., [2,21]). Moreover, linear intercept measurements on polished metallographic sections may lead to the SLW problem, even if plane sections through the medium are taken [22, p. 117].
Following [33] and [16], we formulate the SLW problem as a Poisson inverse problem. Assume that the ball radii distribution is supported in [0, 1] and that the ball centers form a homogeneous Poisson process in R^3 with intensity λ (all density and intensity functions are taken w.r.t. the respective Lebesgue measures). If the ball radii distribution has a density, say ρ, then the observed radii of the line segments form a Poisson process on [0, 1] with intensity function ns(u), where n is the 'size of the experiment', related to the total length of the observed linear probe, and s = Sf (equation (1)) with f(x) = λρ(x), see [33]. The goal is to unfold f from the observed linear section radii and to study asymptotics as n → ∞, which is a special form of a Poisson inverse problem. The operator S : L^2([0, 1], dx) → L^2([0, 1], du) is a compact Hilbert-Schmidt operator, its inverse is unbounded, and the SLW problem is thus ill-posed in the Hadamard sense. Poisson inverse problems were studied in some generality in, e.g., [31,32,1]. Spectral-type solutions for the SLW problem, minimax on Sobolev ellipsoids, were constructed in [16] and [34], and B-spline sieved quasi-maximum likelihood estimation was studied in [33]. Adaptive wavelet solutions, rate-minimax over some Besov balls, were found in [11,12].
The main new contribution of the present paper is the construction of nonparametric, asymptotic confidence bands for the unfolded intensity function. Construction of confidence bands in direct problems of function estimation started in 1973 with the seminal paper by Bickel and Rosenblatt [3], who constructed confidence bands for a density estimated from an i.i.d. sample, and continued in several further developments, as summarized, e.g., in [19, Ch. 5.1.3] and [6]. The latter paper was also the first step towards the construction of confidence bands in inverse problems and was followed in recent years by several similar works [7,4,24,8,13,25]. Despite their practical importance, however, no confidence bands for densities in stereological inverse problems had been constructed until recently, when such bands were produced for Wicksell's problem in [36].
The methodology developed in [3,6,36] works well for estimators of kernel type. Therefore, in Section 2 we propose a new kernel-type estimator in the SLW problem and study its theoretical properties. Rate minimaxity over some Sobolev classes and adaptivity issues are studied in Section 3, and asymptotic, non-parametric confidence bands are constructed in Section 4. We also report in Section 5 on a practical implementation, including automatic bandwidth selection based on the empirical risk minimization principle, and on results of an extensive Monte Carlo study of the proposed procedures. Some final conclusions are formulated in Section 6. Proofs related to adaptivity are presented in Appendix A.
It should be stressed that, in contrast to all the previous constructions of confidence bands in inverse problems, we construct the bands in the Poisson inverse problem setup with a random number of data points, which is more realistic in stereology than the inverse density estimation setup with a non-random sample size.

The estimator
Our first goal is to construct an L2-rate-minimax estimator of the intensity function f, given observations of a Poisson process with the intensity function ns, related to f via (1). The main idea of our approach is based on the observation that this problem may be reformulated as a problem of derivative estimation with a weighted L2-loss function. Let X_1, ..., X_{N(n)} be the random points associated with the Poisson process with intensity ns. Then X_1^2, ..., X_{N(n)}^2 form a Poisson process with intensity ng, where

g(u) = s(√u)/(2√u) = ∫_{√u}^1 f(x) dx,

so that g′(u) = −f(√u)/(2√u) for u ∈ (0, 1). Hence, any estimator ĝ′ of g′ gives a natural estimator f̂(x) := −2x ĝ′(x^2) of f, and

‖f̂ − f‖_2^2 = 2 ∫_0^1 √u (ĝ′(u) − g′(u))^2 du.

To express the smoothness assumptions, the functions f and g will be further considered as functions defined on R. The L2 and L∞ norms will be denoted by ‖·‖_2 and ‖·‖_∞. For some m ∈ N, define a Sobolev-type class of functions

W(m, L) = {s : R → R : s^{(m−1)} is absolutely continuous and ‖s^{(m)}‖_2 ≤ L}.
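The squared-points transformation described above can be sketched numerically. In the snippet below, the intensity s is a purely hypothetical example chosen for illustration (it does not come from the paper's operator S); the only point being demonstrated is that squaring the points of a Poisson process with intensity ns on [0, 1] yields a Poisson process with intensity ng, g(u) = s(√u)/(2√u).

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_poisson_process(n, s, s_max):
    """Sample a Poisson process on [0, 1] with intensity n*s(u),
    by thinning a homogeneous process of rate n*s_max."""
    N = rng.poisson(n * s_max)                 # points of the dominating process
    u = rng.uniform(0.0, 1.0, N)
    keep = rng.uniform(0.0, s_max, N) < s(u)   # accept with probability s(u)/s_max
    return u[keep]

# Hypothetical observed intensity s (an illustration only): s(u) = 2(1 - u).
s = lambda u: 2.0 * (1.0 - u)
X = sample_poisson_process(n=10_000, s=s, s_max=2.0)

# The squared points form a Poisson process on [0, 1] with intensity n*g,
# where g(u) = s(sqrt(u)) / (2*sqrt(u)).
U = X ** 2
```

For this toy s, the normalized intensity of the squared points has mean ∫ u^2 s(u) du = 1/6, which a histogram of U reproduces.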
To evaluate the estimation risk for f in W(m, L) with standard methods, as in, e.g., [35], it would be convenient to have g in W(m + 1, L). However, since g(0) = ∫_0^1 f(x) dx > 0, any such g, extended by zero to the negative half-line, is necessarily discontinuous at zero. To circumvent the problem, we will use the reflection device (e.g., [29, Ch. 2.10]) and estimate the function g under the assumption that the symmetrized g, i.e. the function u ↦ g_sym(u) := g(|u|), belongs to W(m + 1, L). Define F_{m,C} := {f : g_sym ∈ W(m + 1, C)}. It is elementary to see that, if f has a left limit f(1−) at 1, then the left derivative g′(1−) of g at 1 exists and g′(1−) = −f(1−)/2. Because m ≥ 1, g′ must be continuous on R for f ∈ F_{m,C}, which implies that f(1−) = 0. Further, it follows by induction that, for k = 1, ..., m + 1, there exist absolute constants C^k_0, ..., C^k_{k−1} such that, for almost every u > 0 and every f ∈ W(m, L),

g^{(k)}(u) = Σ_{j=0}^{k−1} C^k_j f^{(j)}(√u) u^{(j+1)/2 − k},

so that the class F_{m,C} contains the functions f ∈ W(m, L) with derivatives tending to zero sufficiently fast when the argument goes to zero.
Proposition 1. Assume that there exist ε > 0, a > 0 and ξ > 0 such that the derivatives of f vanish sufficiently fast on (0, ε). Then f ∈ F_{m,C} for some C depending on m, L, ε, a, ξ.
Following the idea sketched above, we start with a kernel estimator of the derivative g′, based on the symmetrized sample ±X_1^2, ..., ±X_{N(n)}^2, and obtain the estimator for f in the form

f̂_n(x) = −2x ĝ′(x^2),    (3)

with some kernel K and some bandwidth h entering ĝ′. For the minimaxity of the estimator, it is convenient to make h a function of the non-random experiment size n. For the construction of confidence bands in Section 4, however, it will be more natural to make h dependent on the random number N(n) of data points. As N(n)/n tends a.s. to a limit (depending on f), those two types of assumptions are closely related. Throughout this article, we define the order of a kernel as the order of its first non-zero moment. At several places, we impose

Assumption 1. K is an absolutely continuous, symmetric kernel with square-integrable derivative K′ and such that ∫ K(u) du = 1.
Let

p̂′(u) = (1/(nh^2)) Σ_{i=1}^{N(n)} [K′((u − X_i^2)/h) + K′((u + X_i^2)/h)]

be the standard kernel estimator of p′, the derivative of the intensity p(u) = g(|u|) of the symmetrized process, based on the symmetrized sample. Define ĝ′(u) = p̂′(u) 1_{[0,1]}(u). Then E p̂′(u) = (K_h * p′)(u), with K_h(·) = K(·/h)/h, and, consequently, one has the standard variance-bias decomposition of the risk, with a variance term of order (nh^3)^{−1}. Now, we evaluate the bias, using the absolute continuity of K in the integration-by-parts step. A standard calculation with p ∈ W(m + 1, C), analogous to that on page 14 in [35], gives, after a Taylor expansion of p′(x − uh) − p′(x) and a double application of the generalized Minkowski inequality, a squared bias of order h^{2m}. With h ≍ n^{−1/(2m+3)}, this finally gives E‖f̂_n − f‖_2^2 = O(n^{−2m/(2m+3)}).
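The estimator can be sketched in a few lines. The snippet below is a minimal sketch, not the paper's exact formula: it uses a Gaussian kernel for simplicity (the paper's confidence bands require a compactly supported one), and the 1/(nh^2) normalization is our reading of the Poisson setup, so the constants should be treated as an assumption.

```python
import numpy as np

def f_hat(x, X2, n, h, K_prime):
    """Sketch of the kernel-type estimator f_hat(x) = -2x * p_hat'(x^2),
    where p_hat' estimates the derivative of the intensity of the
    symmetrized points +/- X_i^2 (the reflection device)."""
    x = np.atleast_1d(x).astype(float)
    u = x ** 2
    sym = np.concatenate([X2, -X2])            # reflected sample
    t = (u[:, None] - sym[None, :]) / h
    # kernel derivative estimate: (1/(n h^2)) * sum_i K'((u - y_i)/h)
    p_prime = K_prime(t).sum(axis=1) / (n * h ** 2)
    return -2.0 * x * p_prime

# Gaussian kernel and its derivative K'(t) = -t * K(t) (illustration only).
K = lambda t: np.exp(-t ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
K_prime = lambda t: -t * K(t)
```

Note that f_hat(0) = 0 by construction, reflecting the factor −2x in (3).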

Lower bounds, minimaxity and adaptivity
For a good lower bound, a possibly large number of well-separated functions in F_{m,C} should be constructed, for which the corresponding data distributions are close to each other. As in [11], the construction will be based on smooth wavelets. Let ψ ∈ C^m be a compactly supported mother wavelet, e.g. a Daubechies wavelet, and let ψ_{jk}(x) = 2^{j/2} ψ(2^j x − k). Note that, for every j and x, the cardinality of the set {k ∈ Z : ψ_{jk}(x) ≠ 0} does not exceed a constant S(m) depending only on the length of the support of ψ.

and it is obvious that each f_ω is non-negative because of condition 5 above and because, again, for every fixed j and x, the cardinality of the set {k ∈ Z : ψ_{jk}(x) ≠ 0} does not exceed S(m). Since supp f_ω ⊂ [1/8, 1], using Proposition 1, we conclude that f_ω ∈ F_{m,C}. Reasoning further as in the proof of Proposition 1 in [11], with 2^j ≍ n^{1/(2m+3)} and using conditions 2 and 5, we obtain, via an application of Assouad's cube technique, the existence of a constant C_5 > 0 such that

sup_{f ∈ F_{m,C}} E_f ‖f̂_n − f‖_2^2 ≥ C_5 n^{−2m/(2m+3)}

for any estimator f̂_n. The constant C may influence the value of C_5, but not the rate. This means that the estimator f̂_n in Theorem 1 is rate-minimax over F_{m,C}. Moreover, this also implies that, for any estimator f̂_n, sup_{f ∈ F_{m,C}} E_f ‖f̂_n − f‖_∞ cannot approach zero faster than n^{−m/(2m+3)}, which is more relevant for the discussion of uniform confidence bands. It will be seen in the next section that the width of our confidence bands converges to zero at the rate n^{−(m/(2m+3)−ξ)}, with ξ that may be arbitrarily small, but positive because of the undersmoothing used for bias control. The bandwidth depends on the generally unknown smoothness of the estimated function, but the estimator can be made adaptive through a suitable data-driven choice of the bandwidth. A natural choice for adaptive bandwidth selection for kernel estimators is the Goldenshluger-Lepski method (cf. [20]). For brevity, call such adaptive estimators 'GL-estimators'. A version of GL-estimators has been studied for probability densities and their derivatives in the i.i.d. setup in [15]. For Poisson process intensity functions (but not for their derivatives), GL-estimators have been studied in [28].
In order to produce an oracle inequality for our boundary-corrected estimator of the derivative of the intensity of the observable Poisson process (and, consequently, for the unfolded intensity), we adapt the proof of Proposition 3 in [15] to the Poisson process setup with boundary correction, much in the same spirit in which Proposition 2 in [15] was ported to the Poisson process setup as Theorem 2 in [28].
As in the proof of Theorem 1, we work with the symmetrized functions p, p′ and with the estimator p̂′_h, where the index marks the dependence on the bandwidth. Similarly, in this section we write f̂_h rather than f̂_n. For the reflection device, it is natural to assume K symmetric. This makes p̂′_h antisymmetric and simplifies some technical arguments in the proofs.
From the proof of Theorem 1, the risk of f̂_h admits a bound of the form Φ(h)^{1/2}, with suitably estimated unknown quantities. By the triangle and Cauchy-Schwarz inequalities, and because ‖·‖_* ≤ ‖·‖_2 for antisymmetric functions, Φ(h) splits into a bias-type and a variance-type term. The oracle bandwidth is defined as the minimizer of Φ(h) over a set, say H, of candidate bandwidths; it depends on the unknown p′. The idea of the GL method is to minimize a specific estimator of the function Φ(h). Let p̂_h be the kernel estimator of p. Then p̂′_h = (p̂_h)′ and, since K_{h′} * p′ = (K_{h′} * p)′, a natural estimator of this term may be defined as p̂′_{h,h′} := (K_{h′} * p̂_h)′, with some fixed h′. Further, since E N(n) = n‖g‖_1, a natural estimator of ‖g‖_1 is N(n)/n. In the GL method, the secondary bandwidth h′ is eliminated by minimizing w.r.t. h an 'upper envelope' of a family of functions indexed by h′ and suitably constructed to properly estimate the bias term.
With some η > 0, one defines a majorant function χ and sets ĥ to be the minimizer of the resulting penalized criterion over H. For the GL estimator of p′, defined as p̂′_GL := p̂′_ĥ, we have the following oracle inequality, proved in Appendix A.
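The logic of the method can be illustrated with the following deliberately simplified sketch. The grid-based Gaussian smoothing and the penalty V are stand-ins for the paper's kernel convolutions and majorant χ; they are assumptions of this illustration, not the procedure analyzed in the paper.

```python
import numpy as np

def smooth(y, h, grid):
    """Kernel smoothing of values y given on an equispaced grid
    (Gaussian kernel, plain Riemann-sum convolution)."""
    dx = grid[1] - grid[0]
    d = (grid[:, None] - grid[None, :]) / h
    W = np.exp(-d ** 2 / 2.0) / (np.sqrt(2.0 * np.pi) * h)
    return W @ y * dx

def gl_select(estimates, H, grid, V):
    """Schematic Goldenshluger-Lepski bandwidth selection: the bias proxy is
        A(h) = max_{h'} [ ||est_{h,h'} - est_{h'}||_2 - V(h') ]_+ ,
    where est_{h,h'} is the estimate at h smoothed once more at h',
    and the selected bandwidth minimizes A(h) + V(h)."""
    dx = grid[1] - grid[0]
    crit = []
    for i, h in enumerate(H):
        devs = []
        for j, hp in enumerate(H):
            double = smooth(estimates[:, i], hp, grid)   # doubly smoothed
            dist = np.sqrt(np.sum((double - estimates[:, j]) ** 2) * dx)
            devs.append(max(dist - V(hp), 0.0))
        crit.append(max(devs) + V(h))
    return H[int(np.argmin(crit))]
```

The double loop over (h, h′) is what makes the method expensive in practice, as discussed in Section 5.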

This easily gives the adaptivity of the GL estimator of f, naturally defined as f̂_GL(x) := −2x p̂′_GL(x^2), which attains the rate n^{−2s/(2s+3)} for the squared L2-risk if f ∈ F_{s,C}.
Note that, for f ∈ F_{s,C}, the bias term ‖K_h * p′ − p′‖_2^2 can be bounded, as in the proof of Theorem 1, by C_2(s, C) h^{2s}, and the leading constants in (4) depend on f only through ‖g‖_1. Since, in the SLW problem, ‖g‖_1 ≤ ‖g‖_∞ = ‖f‖_1, the rates are uniform over intersections of F_{s,C} with any fixed ball in L1. Note also that the conditions of Proposition 1 ensure that ‖f‖_1 is upper bounded by a constant that depends on L only.

Confidence bands
The goal of this section is to construct asymptotic confidence bands for the intensity function f on an interval [a, b] ⊂ [0, 1], with a > 0 and b < 1, based on the data X_1^2, ..., X_{N(n)}^2.
If f_0(u) := f(√u)/(2√u) corresponds to the intensity function of the Poisson process of the unobserved squared sphere radii, then, for continuous f_0, f_0(u) = −g′(u). Therefore, to adapt the idea of Bickel and Rosenblatt [3], who proposed a method of construction of confidence bands around kernel estimators of a specific type, it is convenient to first construct confidence bands for f_0 on [a^2, b^2] around a suitable kernel estimator of the derivative −g′, and then to transform the results back to the original problem.
More specifically, let f̂_{0,n}(u) := −p̂′(u) be an estimator of f_0, similar to the estimator f̂_n of f defined in (3), but with a random bandwidth h = h[N(n)] that depends on the random sample size N(n). The first step is to investigate the asymptotic distribution of the maximal deviation of the estimator f̂_{0,n} from its mean over [a^2, b^2]. Define the process {Z_{N(n)}(t) : t ∈ [a^2, b^2]} as the deviation f̂_{0,n}(t) − E f̂_{0,n}(t), normalized by its estimated standard deviation, where h = h[N(n)] and q̃_{N(n)} is an appropriate estimator of the density q(x) = g(x)/c of the observations X_i^2 (see Assumption 2(c) below).
The main results of this section depend on two sets of assumptions. The first and less restrictive one allows for the construction of confidence bands for E f̂_{0,n}. The second set includes stronger assumptions needed for bias correction and for the construction of confidence bands for f itself. Denote by ‖·‖_{∞,I} the sup-norm on the interval I = [a^2, b^2].
Assumption 2. (a) K is supported and twice continuously differentiable on a compact interval; (b)-(d) conditions on the bandwidth h, the estimator q̃_{N(n)} and the function g (cf. the discussion below).

Assumption 3. (a) For some integer m ≥ 1, K is a kernel of order at least m, supported and twice continuously differentiable on a compact interval; (b) an undersmoothing condition on h; (c)-(d) same as Assumptions 2(c) and 2(d), respectively; (e) for m as in assumption (a) and for some Δ > 0, a smoothness condition on g on an enlarged interval (cf. below).
With h(k) ≍ k^{−γ}, Assumption 2(c) reduces to ‖q̃_k − q‖_{∞,I} = o_p(1/log k), which is typically satisfied by, e.g., kernel estimators with a polynomial rate of convergence (cf. [14]).
For any a > 0 and b < 1, the conditions imposed on the intensity function f hold for the most commonly assumed distributions, including SML-A (with m = 1) and SML-B, NM, and Beta(4,2) (with any m ≥ 1), used in our simulation studies in Section 5.
Note that, due to Assumption 2(d), g is bounded away from zero on [a^2, b^2], because it is non-increasing and g(b^2) = ∫_b^1 f(x) dx > 0, which also implies that g^{1/2} has a bounded derivative on [a^2 − Δ, b^2 + Δ], for some Δ > 0. Also, due to Assumption 3(e), g is m times continuously differentiable on [a^2 − Δ, b^2 + Δ] and g^{(m+1)} exists and is bounded on (a^2 − Δ, b^2 + Δ). All those features of g are used in the proofs of Theorem 3 and Corollary 1 below.
The limiting distribution of the supremum of the process {Z_{N(n)}(t) : a^2 ≤ t ≤ b^2} is given in the following theorem.

Proof. The proof will only be sketched briefly, because large parts of it are similar to the proofs in [36]. The first part of the proof proceeds, conditionally on N(n) = k, along the same lines as the proof of Theorem 1 in [36], with the estimator based on the i.i.d. sample X_1, ..., X_k and with the process Y_k, where α_k is the empirical process corresponding to the distribution function of X_1^2, ..., X_k^2. After a suitable reduction, valid asymptotically for h ≤ a^2, one can construct appropriate approximations Y_{k,0}, Y_{k,1}, Y_{k,2}, and Y_{k,3} of the process Y_k and obtain the limiting distribution, for each x ∈ R. The unknown quantity in Y_k is q(t), with t ∈ [a^2, b^2]. However, reasoning as in the proof of Corollary 1 in [36], one deduces, using Assumption 2(c), that the above result remains true after replacing Y_k with its version based on the estimator q̃_k. Now, it follows from Lemma 4.1.1 in [26] that the same limiting distribution is valid for the process with k = N(n). An obvious modification of the results obtained for the estimator f̂_{0,k}, with k = N(n), gives the required result for the estimator f̂_{0,n} and completes the proof.
To finish the construction of the confidence bands for f, one has to deal with the bias of f̂_{0,n} and then transform the obtained bands for f_0 into bands for f. To control the impact of the bias, we follow one of the major strategies, undersmoothing: accepting a less smooth estimator in order to reduce the bias (cf., e.g., [6], [5], [4], [25], [19, Ch. 6.4.2], [36]). (An alternative approach is explicit bias correction, see, e.g., [17].) In our case, this idea is realized in Assumption 3(b), which becomes clear after a comparison with the rate of convergence of h to zero imposed in Theorem 1.
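To make the comparison with Theorem 1 concrete, the following schematic display (an illustration under our reading of the rates, not the exact form of Assumption 3(b)) contrasts the risk-optimal bandwidth with an undersmoothed one:

```latex
% Risk-optimal bandwidth from Theorem 1, balancing the squared bias h^{2m}
% against the variance (n h^3)^{-1}:
h_{\mathrm{opt}} \asymp n^{-1/(2m+3)}.
% Undersmoothing (schematic): take h slightly smaller,
h \asymp n^{-(1+\delta)/(2m+3)} \quad \text{for some small } \delta > 0,
% so that the bias becomes negligible relative to the stochastic term,
% at the price of a band width of order n^{-(m/(2m+3) - \xi)}.
```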
Summarizing, the confidence bands constructed using an undersmoothing bandwidth have the form implied by the following corollary.

Bandwidth selection and simulations
The Goldenshluger-Lepski (GL) procedure, although attractive theoretically, proved to be computationally very expensive. Even with an effective, parallel implementation on a large grid of processors, it was hardly feasible to conduct Monte Carlo experiments of conclusive size, and this only for Gaussian kernels, for which convolutions can be computed analytically in closed form. The necessity of numerical computation of convolutions (and their derivatives) of compactly supported kernels made the computing time explode, even with the FFT. This is unfortunate, because smooth, compactly supported kernels are used for the construction of confidence bands (cf. Assumption 2(a)), even if, strictly speaking, the theory only allows the bandwidths to depend on N(n) and, hence, does not allow for data-driven bandwidth selection. Moreover, as will be demonstrated below, the GL-estimators consistently tend to oversmooth, at least at the sample sizes used in our simulations. Therefore, as a practical alternative, we propose to select the bandwidth for the estimator f̂_n, given by formula (3), in another data-dependent way, with the following version of the empirical risk minimization (ERM) principle, which is computationally much cheaper than the GL method. Define the empirical risk R̂(h) as follows. Substituting f(x) = −2x g′(x^2), we obtain, after a change of variables, an expression for the risk that can be estimated by a Riemann sum over the points x_0, ..., x_M, where M is a positive integer, 0 = x_0, ..., x_M = 1 are equally spaced points in [0, 1], and g̃ is some estimator of the intensity function g, independent of h.
The bandwidth h is chosen as the minimizer of R̂(h).
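A simplified version of this selection rule reads as follows. It is a sketch under assumptions: the paper estimates the cross term after a change of variables involving the pilot g̃, whereas here the inner product with the unknown f is simply replaced by an inner product with an h-independent pilot estimate of f, which stands in for the paper's exact formula.

```python
import numpy as np

def erm_bandwidth(f_hat, pilot, H, M=2 ** 10):
    """Schematic empirical-risk-minimization bandwidth choice.
    f_hat(x, h) : estimator evaluated at points x with bandwidth h
    pilot(x)    : h-independent pilot estimate of f (an assumption here)
    H           : candidate bandwidths; M : Riemann-sum resolution."""
    x = np.linspace(0.0, 1.0, M + 1)
    dx = x[1] - x[0]

    def risk(h):
        fh = f_hat(x, h)
        quad = np.sum(fh ** 2) * dx            # ||f_hat_h||_2^2
        cross = np.sum(fh * pilot(x)) * dx     # plug-in for <f_hat_h, f>
        return quad - 2.0 * cross              # ||f_hat_h - f||_2^2 - const

    return min(H, key=risk)
```

Minimizing quad − 2·cross is equivalent to minimizing the squared L2 distance up to the h-free term ‖f‖_2^2, which is the usual ERM device.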
In the numerical experiment, we used M = 2^10 and the following kernel estimator of the function g:

g̃(u) = (1/(nH)) Σ_{i=1}^{N(n)} K_ep((u − X_i^2)/H),    (5)

where K_ep is the Epanechnikov kernel and H = 5 (N(n)/n)^2 n^{−1/5}, which proved to work well in a wide range of examples studied in simulations. Notice that N(n)/n converges almost surely to ‖g‖_1. Data samples were generated from several intensity functions, but we only present selected results for four functions (taken from [16]) that represent the typical behaviour of our procedures in various setups: Beta(4,2), SML-A, SML-B and NM. Note that only Beta(4,2) belongs to F_{1,C}, which guarantees the MISE convergence rate n^{−2/5}, according to Theorem 1. The other functions, being discontinuous at 1, fail to belong to F_{m,C}. Nevertheless, all four functions satisfy the less restrictive conditions of Corollary 1, so that asymptotically valid confidence bands may be constructed for all of them. The mean numbers of the observed data points were 47.6%, 54.2%, 41.3%, and 38.6% of n, respectively, for the functions from Beta(4,2) to NM.
The SML-B and NM functions are much more difficult to estimate in stereological problems than Beta(4,2) and SML-A. SML-B is hard mainly because of rapid local changes of the derivative, which is close to zero over the rest of the support (step-like behaviour). For the NM function, in addition to rapid local changes of the derivative, problems with the coverage can be expected near one, where the function is close to zero and steep, because the confidence bands will be narrow there, due to the presence of q̃_{N(n)} in the numerator of l_n(t, x) (cf. Corollary 1). Additionally, for NM, data are indeed scarce in that region because, due to the very nature of the SLW problem, the observed data are shifted to the left w.r.t. the original radii, and there is not much to be shifted from the neighbourhood of one, although Assumption 2(d) is formally satisfied.
For the GL method, one has to select some value of the parameter η. Numerical experiments have shown that the bandwidths selected with η = 0.5 were always larger than those selected with η = 10^{−2}, and the increasing tendency continued through η = 1.5 and η = 2, for all generated data samples. On the other hand, the difference between using η = 10^{−2} and η = 10^{−4}, say, was negligible. To compare the GL method (with η = 10^{−2}) with the ERM method, we have run both (with the Gaussian kernel) on the same data and compared the estimators and their squared L2 errors. Typically, GL-estimators tend to oversmooth and to have larger errors than those of the ERM-based estimators. Figure 1 illustrates this for the NM function, with n = 10^5 and estimators computed from 10 data samples. In view of the above discussion, only the ERM method, with the order-two biweight kernel K(x) = (15/16)(1 − x^2)^2 1_{[−1,1]}(x) in the estimator f̂_n, will be discussed further below. Figures 2 and 3 illustrate the behaviour of the estimator f̂_n for the Beta(4,2) and SML-B functions. For each function and each experiment size, 10 artificial data samples were generated and the estimator f̂_n with the bandwidth selected by the ERM principle was computed. Additionally, the best possible bandwidths were found, i.e. those that minimize the numerically computed L2 distance between f̂_n and the true f. The best and worst cases (out of 10) are presented in the left panels of Figures 2 and 3. The right panels of those figures illustrate the performance of the ERM principle. For similar results on spectral-type and wavelet estimators, see [16,12]. As seen in Figures 2 and 3, minimization of the empirical risk produces good estimates of the unfolded intensity function, close to those with the optimal bandwidth.
The last part of this section presents the results of our simulation study of the asymptotic confidence bands given in Corollary 1. The asymptotic theory applies to confidence bands constructed on intervals [a, b] with any 0 < a < b < 1. In finite samples, however, a and b should not be chosen too close to the boundaries. Problems with the actual coverage probability can be expected for at least two reasons: a wrong width of the band and/or bad behaviour of the central estimator. The half-width l_n(t, x) of the band is proportional to the estimate q̃_{N(n)} of the density q(·) = g(·)/c (cf. Corollary 1). With f ∈ F_{m,C}, symmetrization allows for reliable estimation of q(·) near zero, so that the band width may be expected to be reasonable there (provided h is chosen correctly). In the vicinity of one, however, problems with the estimation of q(·) should be expected, due to well-known boundary effects. Moreover, since q(1) = 0, the bands can become too narrow in the vicinity of one. On the other hand, the central estimator f̂_n may be expected to cause problems at both ends of the interval [0, 1] due to boundary effects, because it is based on the estimator of the derivative of g, and symmetrization does not necessarily help in this case. Problems may be more serious near zero, due to the additional square transformation of the argument (cf. formula (3)). In effect, to fully eliminate the boundary effects, one would have to consider only a and b such that √h < a < b < √(1 − h), which may be quite restrictive in finite samples. To what extent these restrictions can be relaxed may depend on the estimated function.
In the first stage of our numerical experiment, all confidence bands were constructed on the interval [0.1, 0.9] and with the bandwidth h = 0.85 h*, where h* was chosen according to the ERM procedure. For the rapidly changing functions SML-B and NM, however, the coverage probabilities were too low, and we were forced to use h = 0.75 h* in those cases. Alternatively, one can use h = 0.85 h* on a shorter interval. As q̃_{N(n)}, we used the kernel estimator (5), with N(n) instead of n in the normalizing factor. Table 1 shows the simulated coverage probabilities and the mean widths of the confidence bands for all considered intensity functions f, for three values of the experiment size n: 10 000, 30 000, 100 000, and for three levels of nominal coverage probability. The results are based on 1 000 simulation runs. Note that, out of all scenarios for SML-B, only the last row of the table (approximately) meets the restriction √h < a and, at the same time, produces coverage probabilities very close to the nominal ones. The remaining functions are much less problematic in this respect. Figures 4 and 5 show some typical examples of 80% and 95% confidence bands for n = 10^4 and n = 10^5. It is seen in those figures and in Table 1 that the bandwidth selected with the ERM procedure may also be used, after multiplication by a shrinkage factor, in the construction of confidence bands. The difficulty of the SML-B case is reflected in the larger sample sizes needed to obtain reasonable estimates and in too low actual coverage probabilities in smaller samples. A similar effect, although less pronounced, can be seen in the case of the NM function.
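Coverage figures of the kind reported in Table 1 can be obtained with a generic Monte Carlo loop of the following form. This is only a scaffold: the routine simulate_band, which should generate a data sample and build the band of Corollary 1, is a user-supplied placeholder, since the band formula itself is not reproduced here.

```python
import numpy as np

def mc_coverage(f_true, simulate_band, n_rep=1000, seed=0):
    """Schematic Monte Carlo estimate of simultaneous coverage.
    f_true        : true intensity values on a fixed grid
    simulate_band : callable(rng) -> (lower, upper) band values on that grid
                    (placeholder for data generation + band construction)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_rep):
        lo, up = simulate_band(rng)
        # simultaneous (uniform) coverage over the whole grid
        hits += bool(np.all((lo <= f_true) & (f_true <= up)))
    return hits / n_rep
```

Note the uniform criterion: a replication counts as covered only if the band contains f over the entire grid, which is what the tabulated coverage probabilities for the bands of Corollary 1 refer to.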

Discussion and conclusions
The proposed kernel-type estimator with an automatic bandwidth selection and the related confidence bands provide the first solution to the problem of uniform interval estimation in the Spektor-Lord-Willis problem of stereology, formulated as a Poisson inverse problem. As seen in Proposition 1, the smoothness class F_{m,C} is essentially the Sobolev class W(m, L), with some local restrictions in an arbitrarily small vicinity of zero. Balls with radii below the detectability limit in a given experiment are, however, not observable anyway, so that such local restrictions seem acceptable from the applied point of view. As shown in Section 2, estimating f in the standard L2-norm is equivalent to estimating g′ in the weighted L2-norm, with the weight √x. This weight does not influence the minimax rate, which is n^{−2m/(2m+3)}, as one would expect with the standard L2-norm. Recall, however, that g has m − 1 zero derivatives at zero, which makes estimation of g in the vicinity of zero an easy task. Hence, the estimation error in (ε, 1] dominates the global error and the weight does not influence the minimax rates. The relatively large samples needed for the applicability of the asymptotic confidence bands motivate studying alternative bootstrap constructions. This is also attractive from a theoretical point of view because, strictly speaking, asymptotic confidence bands based on the standard Bickel-Rosenblatt theorem do not allow for a data-driven choice of the bandwidth. This restriction can be relaxed only at the cost of serious additional theoretical effort, as in [18], or in [9], where an approach based on the so-called anti-concentration property of the supremum of a relevant Gaussian process was proposed, along with a Gaussian multiplier bootstrap version of Lepski's method and a corresponding construction of confidence bands for densities.
Adaptation of those techniques to our setup and empirical verification of the performance of such (again, computationally very costly) confidence bands is, however, beyond the scope of this article and may be the subject of a separate project.
For A_3, we use the bound obtained in the proof of Theorem 1 in Section 2. Finally, using (a + b)^2 ≤ 2a^2 + 2b^2, we combine the above bounds; this will give the thesis because of the following lemma.