The Role of Nonpolynomiality in Uniform Approximation by RBF Networks of Hankel Translates

Given $\mu > -1/2$ and $c \in I = ]0,\infty[$, let the space $\mathcal{C}_{\mu,c}$ (respectively, $\mathcal{C}_\mu$) consist of all those continuous functions $u$ on $]0, c]$ (respectively, $I$) such that the limit $\lim_{z\to 0^+} z^{-\mu-1/2}u(z)$ exists and is finite; $\mathcal{C}_{\mu,c}$ is endowed with the uniform norm $\|u\|_{\mu,\infty,c} = \sup_{z\in[0,c]}|z^{-\mu-1/2}u(z)|$ ($u \in \mathcal{C}_{\mu,c}$). Assume $\varphi \in \mathcal{C}_\mu$ defines an absolutely regular Hankel-transformable distribution. Then, the linear span of dilates and Hankel translates of $\varphi$ is dense in $\mathcal{C}_{\mu,c}$ for all $c \in I$ if, and only if, $\varphi \notin \pi_\mu$, where $\pi_\mu = \operatorname{span}\{t^{2n+\mu+1/2} : n \in \mathbb{Z}_+\}$.


Introduction and Motivation
1.1. RBFNNs. The radial basis function (RBF) method is nowadays one of the primary tools for interpolating multidimensional scattered data. Its simple form and ability to accurately approximate an underlying function have made the method increasingly popular in several different types of applications, some of which include cartography, medical imaging, the numerical solution of partial differential equations, and neural networks (see, e.g., [1] and references therein).
Radial basis function neural networks (RBFNNs) as such were introduced in the 1980s by Broomhead and Lowe [2] and soon applied to problems of supervised learning such as regression, classification, and time series prediction [3,4]. This type of network falls within the general class of nonlinear, single hidden layer feedforward neural networks. Given $d \in \mathbb{N}$, the family of RBFNNs consists of all those functions $V : \mathbb{R}^d \to \mathbb{R}$ of the form
$$V(x) = \sum_{i=1}^{N} w_i K\left(\frac{x - z_i}{\sigma_i}\right), \tag{1}$$
where
(i) $N \in \mathbb{N}$ is the number of kernel nodes in the hidden layer
(ii) $(w_1, \ldots, w_N) \in \mathbb{R}^N$ is the vector of weights, $w_i$ being the weight from the $i$th kernel node to the output node
(iii) $x \in \mathbb{R}^d$ is an input vector
(iv) $K$ is a radially symmetric kernel function of a unit in the hidden layer
(v) $z_i \in \mathbb{R}^d$ and $\sigma_i \in \mathbb{R}$ are the centroid and smoothing factor (or width) of the $i$th kernel node ($i \in \mathbb{N}$, $1 \le i \le N$), respectively
(vi) $\varphi : [0, \infty[ \to \mathbb{R}$ is the so-called activation function, which characterizes the kernel shape, often a Gaussian
The smoothing factors may be the same in all kernel nodes of a RBFNN or may vary across them. Park and Sandberg [5,6] proved that under mild conditions on the kernel $K$ (or the activation function $\varphi$) both classes of RBFNNs (with either the same or varying smoothing factors across nodes) have the universal approximation property, meaning that they are dense in suitable spaces of continuous or integrable functions. Chen and Chen [7] considered RBFNNs with a continuous activation function $\varphi$ in the hidden layer defining a tempered distribution in $\mathbb{R}$ and proved that the necessary and sufficient condition for such networks to uniformly approximate every continuous function on compacta is that $\varphi$ is not an even polynomial. Nonpolynomiality is straightforwardly seen to be a necessary condition for these approximations and has been found necessary and sufficient for other types of networks to possess the universal approximation property as well, cf. [8-12]. In this paper we aim to extend the result in [7] to RBFNNs of Hankel translates. The precise meaning of this extension will be clarified in due course.
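As a minimal sketch of the classical networks just described, the following Python snippet evaluates a network of the standard form $V(x) = \sum_i w_i \varphi(\|x - z_i\|/\sigma_i)$ with a Gaussian activation. The function names and the sample weights, centroids, and widths are illustrative assumptions and are not taken from the cited references.

```python
import math

def gaussian(r):
    """Gaussian activation phi(r) = exp(-r^2) on [0, infinity)."""
    return math.exp(-r * r)

def rbfnn(x, weights, centroids, widths, phi=gaussian):
    """Evaluate V(x) = sum_i w_i * phi(||x - z_i|| / sigma_i) for x in R^d."""
    total = 0.0
    for w, z, sigma in zip(weights, centroids, widths):
        total += w * phi(math.dist(x, z) / sigma)
    return total

# Hypothetical network with two kernel nodes in R^2.
weights = [1.0, -0.5]
centroids = [(0.0, 0.0), (1.0, 1.0)]
widths = [1.0, 2.0]
value_at_origin = rbfnn((0.0, 0.0), weights, centroids, widths)
```

Here the smoothing factors differ across nodes; passing equal widths gives the other class of networks mentioned above.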

1.2. The Hankel Transformation and the Hankel Translation.
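Let $I = ]0, \infty[$. The Hankel integral transformation is usually defined by
$$(h_\mu\varphi)(x) = \int_0^\infty \varphi(t)\,\mathcal{J}_\mu(xt)\,dt \quad (x \in I), \tag{2}$$
where $\mathcal{J}_\mu(z) = z^{1/2}J_\mu(z)$ ($z \in I$) and $J_\mu$ denotes the Bessel function of the first kind and order $\mu \in \mathbb{R}$.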
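Aiming to obtain a distributional extension of $h_\mu$, Zemanian introduced new spaces of test and generalized functions. The space $\mathcal{H}_\mu$ [13,14] consists of all those smooth, complex-valued functions $\varphi = \varphi(x)$ ($x \in I$) such that
$$\nu_{\mu,r}(\varphi) = \max_{0 \le k \le r}\,\sup_{x \in I}\left|(1+x^2)^r (x^{-1}D)^k x^{-\mu-1/2}\varphi(x)\right| < \infty \quad (r \in \mathbb{Z}_+). \tag{3}$$
When topologized by the family of norms $\{\nu_{\mu,r}\}_{r \in \mathbb{Z}_+}$, $\mathcal{H}_\mu$ becomes a Fréchet space where $h_\mu$ is an automorphism provided that $\mu \ge -1/2$. Then the generalized Hankel transformation $h'_\mu$, defined by transposition on the dual $\mathcal{H}'_\mu$ of $\mathcal{H}_\mu$, is an automorphism of $\mathcal{H}'_\mu$ when this latter space is endowed with either its weak* or its strong topology. For $\mu \in \mathbb{R}$ and $a \in I$, Zemanian [15] also introduced the space $\mathcal{B}_{\mu,a}$ of all those smooth functions $\varphi = \varphi(x)$ ($x \in I$) such that $\varphi(x) = 0$ ($x > a$) and
$$\delta_{\mu,r}(\varphi) = \sup_{x \in I}\left|(x^{-1}D)^r x^{-\mu-1/2}\varphi(x)\right| < \infty \quad (r \in \mathbb{Z}_+). \tag{4}$$
Endowed with the topology generated by the family of seminorms $\{\delta_{\mu,r}\}_{r \in \mathbb{Z}_+}$, $\mathcal{B}_{\mu,a}$ becomes a Fréchet space.
The study of the Hankel #-convolution in spaces of generalized functions was initiated by Sousa Pinto [16], only on compactly supported distributions and for $\mu = 0$. In a series of papers [17-19], Betancor and the author investigated systematically the generalized #-convolution in wider spaces of distributions, allowing $\mu > -1/2$. In this context, the Hankel convolution $\varphi \# \psi \in \mathcal{H}_\mu$ of $\varphi, \psi \in \mathcal{H}_\mu$ is defined as the function
$$(\varphi \# \psi)(x) = \int_0^\infty \varphi(y)\,(\tau_x\psi)(y)\,dy \quad (x \in I), \tag{5}$$
where the Hankel translate $\tau_x\varphi \in \mathcal{H}_\mu$ of $\varphi \in \mathcal{H}_\mu$ is given by
$$(\tau_x\varphi)(y) = \int_0^\infty \varphi(z)\,D_\mu(x,y,z)\,dz \quad (x, y \in I). \tag{6}$$
Here, for $x, y, z \in I$,
$$D_\mu(x,y,z) = \int_0^\infty t^{-\mu-1/2}\,\mathcal{J}_\mu(xt)\,\mathcal{J}_\mu(yt)\,\mathcal{J}_\mu(zt)\,dt \tag{7}$$
is the so-called Delsarte kernel. Note that $D_\mu(x,y,z) \ge 0$, $\operatorname{supp} D_\mu(x,y,\cdot) = [|x-y|, x+y]$, $D_\mu(x,y,z)$ is symmetric in $x, y, z$, and
$$\int_0^\infty D_\mu(x,y,z)\,z^{\mu+1/2}\,dz = c_\mu^{-1}(xy)^{\mu+1/2} \quad (x, y \in I), \tag{8}$$
where $c_\mu = 2^\mu\Gamma(\mu+1)$. Therefore, for any $\varphi \in \mathcal{H}_\mu$ we have
$$|(\tau_x\varphi)(y)| \le c_\mu^{-1}(xy)^{\mu+1/2}\sup_{z \in I}\left|z^{-\mu-1/2}\varphi(z)\right| \quad (x, y \in I). \tag{9}$$
The formula
$$h_\mu(\tau_x\varphi)(t) = t^{-\mu-1/2}\,\mathcal{J}_\mu(xt)\,(h_\mu\varphi)(t) \quad (x, t \in I) \tag{10}$$
and the exchange formula
$$h_\mu(\varphi \# \psi)(t) = t^{-\mu-1/2}\,(h_\mu\varphi)(t)\,(h_\mu\psi)(t) \quad (t \in I) \tag{11}$$
hold for $\varphi, \psi \in \mathcal{H}_\mu$. For the operational rules of the Hankel transformation and further properties of the Hankel translation and Hankel convolution that will be required, in particular those involving the Bessel differential operator $S_\mu = x^{-\mu-1/2}Dx^{2\mu+1}Dx^{-\mu-1/2}$, the reader is mainly referred to [14,17,19]; see in particular [14, Equation 5.5 (8)].
If $d \in \mathbb{N}$ and $f(x) = f_0(\|x\|)$ (a.e. $x \in \mathbb{R}^d$) is an integrable radial function, then its $d$-dimensional Fourier transform is also radial and becomes a 1-dimensional Hankel transform of order $d/2 - 1$ [22, Theorem IV.3.3]. Actually, since $\mathcal{J}_{-1/2}(z) = (2/\pi)^{1/2}\cos z$ ($z \in I$), it turns out that, on radial univariate (that is, even) functions, the Fourier transformation, which reduces to a Fourier-cosine transformation, coincides with the Hankel transform of order $\mu = -1/2$; similarly, the Hankel translation and Hankel convolution of order $\mu = -1/2$ can be seen to coincide (modulo a multiplicative constant) with the usual translation and convolution on $\mathbb{R}$ (cf. [23, Example 3.2]). Thus for $2\mu + 2 \notin \mathbb{N}$ the Hankel translation and the Hankel convolution provide strict generalizations of the usual translation and convolution operators, inasmuch as arbitrary orders $\mu \ge -1/2$ are allowed.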
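The coincidence of the order $-1/2$ Hankel transform with the Fourier-cosine transform can be checked numerically. The sketch below assumes the normalization $(h_\mu\varphi)(x) = \int_0^\infty \varphi(t)(xt)^{1/2}J_\mu(xt)\,dt$, evaluates $J_\mu$ by its power series, and uses a midpoint rule on a truncated interval; the function names, truncation point, and step count are illustrative choices. The test function $e^{-t^2/2}$ is used because it is reproduced by the cosine transform $\sqrt{2/\pi}\int_0^\infty \varphi(t)\cos(xt)\,dt$.

```python
import math

def bessel_j(nu, z, terms=80):
    """Power series of the Bessel function J_nu(z); adequate for moderate z > 0."""
    term = (z / 2.0) ** nu / math.gamma(nu + 1.0)
    total = term
    for k in range(terms):
        # Ratio of consecutive series terms: -(z/2)^2 / ((k+1)(k+nu+1)).
        term *= -((z / 2.0) ** 2) / ((k + 1) * (k + nu + 1.0))
        total += term
    return total

def script_j(nu, z):
    """Kernel z^(1/2) * J_nu(z) of the Hankel transformation."""
    return math.sqrt(z) * bessel_j(nu, z)

def hankel_transform(phi, nu, x, upper=10.0, steps=2000):
    """Midpoint-rule approximation of the integral of phi(t) * script_j(nu, x*t) over [0, upper]."""
    h = upper / steps
    return h * sum(phi((k + 0.5) * h) * script_j(nu, x * (k + 0.5) * h)
                   for k in range(steps))

def gauss(t):
    return math.exp(-t * t / 2.0)

# Pointwise check of z^(1/2) J_{-1/2}(z) = sqrt(2/pi) cos z.
kernel_gap = max(abs(script_j(-0.5, z) - math.sqrt(2.0 / math.pi) * math.cos(z))
                 for z in (0.3, 1.0, 4.0))
# The Gaussian should be reproduced by the transform of order -1/2.
transform_gap = abs(hankel_transform(gauss, -0.5, 1.0) - gauss(1.0))
```

Both gaps should be at the level of rounding error, which is consistent with the remark that order $\mu = -1/2$ recovers the classical even setting.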

1.3. RBFNNs of Hankel Translates. Motivated by the fact that the Hankel transformation is best adapted to deal with radial functions, Arteaga and the author [24-27] have proved that the Hankel transformation and the Hankel convolution are suitable tools for the description and analysis of a RBF interpolation scheme by functions of the form
$$\sum_{i=1}^{n}\alpha_i\,(\tau_{a_i}\varphi)(x) + \sum_{j=0}^{m-1}\beta_j\,x^{2j+\mu+1/2} \quad (x \in I),$$
where $\mu \ge -1/2$, $\varphi$ is a complex function defined on $I$ (the so-called basis function), $\tau_{a_i}$ ($a_i \in I$) is the Hankel translation operator of order $\mu$, and $\alpha_i$, $\beta_j$ ($n, m \in \mathbb{Z}_+$, $1 \le i \le n$, $0 \le j \le m-1$) are complex coefficients.
In analogy to the standard case (1), we set the family $S_1(\varphi) = S_{\mu,1}(\varphi)$ of RBFNNs of Hankel translates of order $\mu > -1/2$ to consist of all those functions $V : I \to \mathbb{R}$ which can be represented as
$$V(x) = \sum_{i=1}^{N} w_i\,\tau_{z_i}(\lambda_{\sigma_i}\varphi)(x) \quad (x \in I), \tag{23}$$
where $N \in \mathbb{N}$ is the number of kernel nodes in the hidden layer, for $i \in \mathbb{N}$, $1 \le i \le N$, $w_i \in \mathbb{R}$ is the weight from the $i$th kernel node to the output node, and $z_i, \sigma_i \in I$ are, respectively, the centroid and the smoothing factor of the $i$th kernel node. Further, $\varphi$ is a kernel function of a unit in the hidden layer which, in this case, coincides with the activation function and, as above, $\tau_z$ ($z \in I$) denotes the Hankel translation operator, while $(\lambda_r\varphi)(t) = \varphi(rt)$ ($r, t \in I$) is a dilation operator. Note that, for $d = 1$ and $\mu = -1/2$, (23) becomes (1). An investigation on the universal approximation capabilities of a closely related class of RBFNNs defined on the nonnegative real axis has been carried out in several papers by Arteaga and the author [28-30]. It should be remarked that the results in the present paper can be derived neither from [24-27], where only the interpolation problem is addressed, nor from [28-30], where RBFNNs are constructed using the Bessel-Kingman hypergroup translation (or Delsarte translation) instead of the Hankel one, and where the universal approximation property, which is studied mainly in spaces of integrable functions, requires in turn integrability of the basis function.
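To make the construction concrete, the following sketch evaluates a network of Hankel translates in the special case $\mu = 1/2$. It rests on two assumptions that are not spelled out in the text: that the network has the form $V(x) = \sum_i w_i\,\tau_{z_i}(\lambda_{\sigma_i}\varphi)(x)$, and that for $\mu = 1/2$ the Delsarte kernel reduces to the constant $(2\pi)^{-1/2}$ on $[|x-y|, x+y]$ (a consequence of the classical formula for the integral of a product of three Bessel functions), so that $(\tau_x\varphi)(y) = (2\pi)^{-1/2}\int_{|x-y|}^{x+y}\varphi(z)\,dz$. All function names and sample parameters are hypothetical.

```python
import math

INV_SQRT_2PI = 1.0 / math.sqrt(2.0 * math.pi)

def hankel_translate(phi, x, y, steps=2000):
    """(tau_x phi)(y) for mu = 1/2: midpoint rule for the integral of phi
    over [|x - y|, x + y], divided by sqrt(2*pi) (assumed closed-form kernel)."""
    lo, hi = abs(x - y), x + y
    h = (hi - lo) / steps
    return INV_SQRT_2PI * h * sum(phi(lo + (k + 0.5) * h) for k in range(steps))

def dilate(phi, r):
    """Dilation (lambda_r phi)(t) = phi(r * t)."""
    return lambda t: phi(r * t)

def hankel_rbfnn(x, weights, centroids, widths, phi):
    """V(x) = sum_i w_i * tau_{z_i}(lambda_{sigma_i} phi)(x)."""
    return sum(w * hankel_translate(dilate(phi, s), z, x)
               for w, z, s in zip(weights, centroids, widths))

def phi(t):
    """Sample activation t * exp(-t^2 / 2), i.e. t^(mu + 1/2) times a Gaussian."""
    return t * math.exp(-t * t / 2.0)

def translate_closed_form(x, y):
    """Exact Hankel translate of the sample activation for mu = 1/2."""
    return INV_SQRT_2PI * (math.exp(-(x - y) ** 2 / 2.0) - math.exp(-(x + y) ** 2 / 2.0))

translate_gap = abs(hankel_translate(phi, 1.0, 2.0) - translate_closed_form(1.0, 2.0))
network_value = hankel_rbfnn(0.5, [1.0, -0.5], [1.0, 2.0], [1.0, 1.0], phi)
```

For other orders $\mu$ the kernel is no longer constant, and the quadrature would have to include the corresponding weight $D_\mu(x, y, z)$.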

1.4. Objectives.
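In the sequel we assume $\mu > -1/2$ and consider the following spaces:
(i) Given $c \in I$, $\mathcal{C}_{\mu,c}$ will denote the linear space of all those continuous functions $u$ on $[0, c]$ such that the limit
$$\lim_{z \to 0^+} z^{-\mu-1/2}u(z) \tag{24}$$
exists and is finite. When endowed with the norm
$$\|u\|_{\mu,\infty,c} = \sup_{z \in [0,c]}\left|z^{-\mu-1/2}u(z)\right|,$$
$\mathcal{C}_{\mu,c}$ becomes a Banach space. In fact, the map $u(z) \mapsto z^{-\mu-1/2}u(z)$ is an isometry from $\mathcal{C}_{\mu,c}$ onto $C[0, c]$, the space of all continuous functions on $[0, c]$ with the uniform norm.
(ii) The linear space $\mathcal{C}_\mu$ consists of all those continuous functions $u$ on $I$ such that the limit (24) exists and is finite. Endowed with the topology generated by the family of seminorms $\{p_{\mu,n}\}_{n \in \mathbb{N}}$, where
$$p_{\mu,n}(u) = \sup_{z \in [0,n]}\left|z^{-\mu-1/2}u(z)\right|,$$
$\mathcal{C}_\mu$ becomes a Fréchet space. Note that sequential convergence in $\mathcal{C}_\mu$ is equivalent to convergence in $\mathcal{C}_{\mu,c}$ for all $c \in I$.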
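(iii) The space $\mathcal{E}_\mu$ consists of all those smooth functions $\varphi$ on $I$ such that the limits
$$\lim_{z \to 0^+}(z^{-1}D)^k z^{-\mu-1/2}\varphi(z) \quad (k \in \mathbb{Z}_+)$$
exist and are finite. Endowed with the topology generated by the family of seminorms $\{p_{\mu,n,k}\}_{(n,k) \in \mathbb{N} \times \mathbb{Z}_+}$, where
$$p_{\mu,n,k}(\varphi) = \sup_{z \in ]0,n]}\left|(z^{-1}D)^k z^{-\mu-1/2}\varphi(z)\right|,$$
$\mathcal{E}_\mu$ becomes a Fréchet space.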
Our aim here is to find necessary and sufficient conditions on the basis function $\varphi$ for the family of RBFNNs $S_1(\varphi)$ to have the universal approximation property. More precisely, the above-mentioned result in [7] is extended to the Hankel setting in the following way. Given $\varphi \in \mathcal{C}_\mu \cap \mathcal{H}'_\mu$, a necessary and sufficient condition for $S_1(\varphi)$ to be dense in $\mathcal{C}_{\mu,c}$ ($c \in I$) is that $\varphi$ does not belong to the class $\pi_\mu$ of Müntz polynomials generated by the monomials $t^{2n+\mu+1/2}$ ($n \in \mathbb{Z}_+$). This is the content of Theorem 9 in Section 3. In Section 2 we introduce the concept and give a characterization of zero-supported $\mathcal{B}'_\mu$-distributions (Theorem 5), which is used in the proof of Theorem 9 and might be interesting in its own right.
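The necessity half of this statement can be illustrated in the case $\mu = 1/2$, where $\pi_\mu$ is spanned by the odd monomials $t^{2n+1}$. The sketch assumes, as a fact not stated in the text, that for $\mu = 1/2$ the Hankel translate reduces to $(\tau_x\varphi)(y) = (2\pi)^{-1/2}\int_{|x-y|}^{x+y}\varphi(z)\,dz$. Under that assumption the translate of $z^{2n+1}$ is again a combination of $y, y^3, \ldots, y^{2n+1}$, and dilates of a monomial are scalar multiples of it, so the span of dilates and translates stays in a finite-dimensional space and cannot be dense. The code checks the expansion exactly with rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def scaled_translate(n, x, y):
    """sqrt(2*pi) * (tau_x p)(y) for mu = 1/2 and p(z) = z**(2n+1):
    the exact integral of p over [|x - y|, x + y]."""
    k = 2 * n + 2
    return ((x + y) ** k - abs(x - y) ** k) / k

def odd_expansion(n, x, y):
    """The same quantity written as a combination of y, y**3, ..., y**(2n+1),
    with coefficients depending only on x."""
    k = 2 * n + 2
    return sum(Fraction(2 * comb(k, j), k) * x ** (k - j) * y ** j
               for j in range(1, k, 2))

x, y = Fraction(3, 2), Fraction(2, 5)
checks = [scaled_translate(n, x, y) == odd_expansion(n, x, y) for n in range(4)]
```

For $n = 1$ the expansion reads $2x^3y + 2xy^3$, so every Hankel translate of $z^3$ lies in the span of $y$ and $y^3$.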
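Recall that $\Lambda \in \mathcal{B}'_\mu$ if, and only if, the restrictions of $\Lambda$ to every $\mathcal{B}_{\mu,a}$ ($a \in I$) are continuous. By (4), this means that to each $a \in I$ there correspond $C > 0$ and $r \in \mathbb{Z}_+$ such that
$$|\langle\Lambda, \varphi\rangle| \le C\max_{0 \le k \le r}\,\sup_{x \in I}\left|(x^{-1}D)^k x^{-\mu-1/2}\varphi(x)\right| \quad (\varphi \in \mathcal{B}_{\mu,a}). \tag{33}$$
Definition 3. If, in (33), one $r$ will do for all $a \in I$ (not necessarily with the same $C$), then the smallest such $r$ is called the order of $\Lambda$. Otherwise, $\Lambda$ is said to have infinite order.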
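Remark 4. Note that every $\Lambda \in \mathcal{B}'_\mu$ with $\operatorname{supp}\Lambda = \{0\}$ has finite order. Indeed, fix $a, b \in I$ with $a < b$, and choose $\psi \in \mathcal{B}_{\mu,b}$ such that [...]. On the other hand, the Leibniz formula gives $C_2 > 0$ such that [...]. Thus [...], as asserted.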
Remark 6. Note that the functionals (38) can be written (modulo constant factors) as derivatives of the identity for the Hankel convolution (16). In fact, we have [...] (49)

Nonpolynomiality of the Activation Function
We begin by establishing two auxiliary results.
The proof is now complete.

Final Remarks
(i) RBFNNs of Hankel translates, as defined in this paper, admit only one-dimensional inputs. In order to allow for multidimensional inputs one should consider the multidimensional Hankel translation, defined by iteration of the one-dimensional translation operator with respect to each of the variables while the others are kept fixed (see, e.g., [35] and references therein). The proof of the above results for the multidimensional case could well be the subject of a forthcoming paper.
(ii) According to Theorem 9 and [32, Theorem 3.7 and Corollary 3.10], any continuous function $\varphi \notin \pi_\mu$ for which there exists $r \in \mathbb{Z}_+$ so that $(1 + x^2)^{-r}\varphi(x)$ is bounded on $I$, or $(1 + x^2)^{-r}x^{\mu+1/2}\varphi(x)$ is integrable on $I$, can be used as an activation function yielding universal approximation. A paradigmatic example is the Gaussian. By considering RBFNNs of Hankel translates, a new parameter $\mu$ is introduced which in practice leaves a greater variety of manageable kernels at our disposal. This could be useful in handling mathematical models built upon a class of radial basis functions depending on the order $\mu$ whose performance might be improved by finely tuning $\mu$, without increasing the number of centroids [36,37].