Article

Machine Learning Application of Generalized Gaussian Radial Basis Function and Its Reproducing Kernel Theory

Department of Mathematics, The University of Texas at Tyler, Tyler, TX 75799, USA
Mathematics 2024, 12(6), 829; https://doi.org/10.3390/math12060829
Submission received: 17 February 2024 / Revised: 28 February 2024 / Accepted: 5 March 2024 / Published: 12 March 2024
(This article belongs to the Section Mathematics and Computer Science)

Abstract

Gaussian Radial Basis Function Kernels are the most-often-employed kernel functions in artificial intelligence, as they tend to deliver optimal results in contrast to their respective counterparts. However, our understanding of the utilization of the Generalized Gaussian Radial Basis Function across different machine learning algorithms, such as kernel regression, support vector machines, and pattern recognition via neural networks, is incomplete. The results delivered by the Generalized Gaussian Radial Basis Function Kernel in the aforementioned applications remarkably outperform those of the Gaussian Radial Basis Function Kernel, the Sigmoid function, and the ReLU function in terms of accuracy and misclassification. This article provides a concrete illustration of these applications of the Generalized Gaussian Radial Basis Function Kernel. We also provide an explicit description of the reproducing kernel Hilbert space obtained by embedding the Generalized Gaussian Radial Basis Function as an $L^2$ measure, which is utilized in the support vector machine analysis. Finally, we present the conclusions that we draw from the empirical experiments considered in the manuscript, along with possible future directions in terms of the spectral decomposition of the Generalized Gaussian Radial Basis Function.

1. Introduction

Contemporary data science, which revolves around diverse architectural frameworks of machine learning, holds crucial potential across a wide range of applications. These applications span fields from physics and engineering [1] to areas like genomic-assisted prediction [2] and predictive modeling for cancer tumors [3]. The central task of (automated) machine learning is to devise an optimal activation function or optimal kernel function, depending upon the learning architecture, that is best-suited for the specific scientific model. Therefore, it is not hard to see that machine learning applications, and many others across various scientific realms, duly take advantage of various mathematical functions that arise in basic function theory and incorporate them along with their Hilbert space theories. This symbiotic relationship between advanced machine learning architectures in artificial intelligence and the raw availability of various classes of mathematical functions is aptly captured by pictorial representations produced by the National Academy of Sciences in [4]. This not only depicts the importance of such functions but also raises the question of the existence and applicability of new, unexplored, and better mathematical functions that can advance the horizons of artificial intelligence, for instance in [5].
Across the rich and diverse history of machine learning, numerous activation functions, along with their incorporation in a kernel sense, have been proposed. Important examples of activation functions are the Sigmoid functions, or S-shaped curves, in Figure 1, such as the logistic function $\frac{1}{1+e^{-x}}$ and the hyperbolic tangent function (cf. [6]). The class of Sigmoid functions is employed to classify data objects where the output is supposed to be constrained to a limited range (usually $[0,1]$).
The other class of important and popular activation functions comprises the ReLU activation functions in Figure 2, defined by $f(x) = x$ for $x > 0$ and $f(x) = 0$ otherwise, which are employed in machine learning problems of reinforcement learning, quantification, and classification. If one adds the piece $f(x) = \alpha x$ for $x \le 0$, where $\alpha \ge 0$, to the definition of ReLU, we obtain the $\alpha$ReLU activation function (see (29) in this article), which is often also referred to as LeakyReLU. It is essential to acknowledge that both ReLU and the $\alpha$ReLU activation function fail to be differentiable at the origin; this can lead to the failure of gradient descent optimizers by producing undefined gradients [7].
Furthermore, one of the most important and interesting classes of mathematical functions that leverage crucial platforms of machine learning or deep learning, to name a few, are radial basis functions, defined as follows (cf. [8]).
Definition 1.
A function $\Phi : \mathbb{R}^d \to \mathbb{R}$ is referred to as radial if there exists a univariate function $\phi : \{0\} \cup \mathbb{R}_+ \to \mathbb{R}$, such that
$$\Phi(x) = \phi(r), \qquad \text{where } r = \|x\|.$$
Here, $\|\cdot\|$ is some norm defined on $\mathbb{R}^d$, usually $\|\cdot\|_2$, the Euclidean norm.
One such special type of mathematical radial basis function is the Gaussian Radial Basis Function (GRBF), which is given as:
$$g_{\sigma^2}(r) \overset{\mathrm{def}}{:=} \exp\!\left(-\sigma^2 r^2\right),$$
where $\sigma > 0$. The Gaussian Radial Basis Function is an activation function that possesses a bell-shaped curve (top in Figure 3); it has impactful applications in modeling random variables that are distributed according to the Gaussian probability distribution. For instance, a neural network architecture predicting the speed of a car might employ Gaussian kernel regression, since the speed of the car follows the Gaussian probability distribution.
Moreover, in recent years, learning algorithms of support vector machines (SVMs) have played a key role in data-mining methods; these are workhorse machine learning tools in industry and science. The crucial explanation for this is the unique ability of SVMs to embed data in higher-dimensional spaces and thereby obtain nonlinear classifiers. The theoretical underpinnings of SVMs revolve around the kernel function, which essentially enables one to operate in a high-dimensional space without computing the coordinates of the data, but by simply computing inner products between all pairs of data in the feature space (cf. page 197 in [9]). To that end, we define the most commonly used kernel function, that is, the Gaussian Radial Basis Function Kernel.
Let $\|\cdot\|_2$ be the usual Euclidean norm on $\mathbb{R}^d$; then, the function $g_{\sigma^2}(\|\cdot\|_2)$ is naturally identified with its kernel counterpart, which is famously referred to as the Gaussian Radial Basis Function Kernel, given as $K_\sigma(x,z) \overset{\mathrm{def}}{:=} g_{\sigma^2}(\|x-z\|_2)$ for $x, z \in \mathbb{R}^d$. Explicitly, that is
$$K_\sigma(x,z) = \exp\!\left(-\sigma^2 \|x-z\|_2^2\right).$$
The Gaussian Radial Basis Function Kernel (Figure 4) is a building block for various learning architectures, such as spatial statistics [10], dynamical system identification [11], Gaussian processes for machine learning [12], etc. Additionally, it is also employed for the classification of the existence of objects (cf. [13]). This particular study extends the idea of the Gaussian Radial Basis Function Kernel to what is referred to as the Generalized Gaussian Radial Basis Function Kernel (GGRBF Kernel in Figure 5), as introduced in [14], on the avenues of machine learning and deep learning.
Definition 2.
Let $\sigma > 0$ and $\sigma_0 \ge 0$. Then, the Generalized Gaussian Radial Basis Function Kernel for $x, z \in \mathbb{R}^d$ is defined as:
$$K_{\sigma,\sigma_0}(x,z) \overset{\mathrm{def}}{:=} g_{\sigma^2}(r)\,\exp\!\big(g_{\sigma_0^2}(r) - 1\big)\Big|_{r=\|x-z\|_2} = g_{\sigma^2}(\|x-z\|_2)\, e^{\,g_{\sigma_0^2}(\|x-z\|_2)-1} = e^{-\sigma^2\|x-z\|_2^2}\, e^{\,e^{-\sigma_0^2\|x-z\|_2^2}-1}. \tag{1}$$
Now that we have introduced the Generalized Gaussian Radial Basis Function Kernel, let us briefly discuss the motivation of the present article with respect to it.

1.1. Motivation

The applicability of the Generalized Gaussian Radial Basis Function was introduced in [15] to provide better results in contrast to the Gaussian Radial Basis Function results; here, we specifically refer to the results of convergence and stability for interpolation problems on Franke’s test function and Runge’s function or solving the system with Tikhonov regularization and Riley’s algorithm. Drawing significant inspiration from [15], the application of the unexplored Generalized Gaussian Radial Basis Function Kernel was documented in [14], specifically in the contexts of SVMs, kernel regression, and pattern recognition, through the activation function in a neural network. These state-of-the-art methods in learning architectures are leveraged by a peculiar topic from the Hilbert function space called the reproducing kernel Hilbert space (RKHS) [16]. The analysis from the reproducing kernel theory perspective for the Gaussian Radial Basis Function Kernel has already been established by [17]; in the paper, an investigation related to the norms, orthonormal basis, and feature space was presented. However, with the present empirical evidence supporting better results obtained by employing the Generalized Gaussian Radial Basis Function Kernel, it becomes important to perform the same investigation for the Generalized Gaussian Radial Basis Function Kernel.
Note that, if $\sigma_0 = 0$, then we obtain the traditional Gaussian Radial Basis Function Kernel. This is demonstrated in Figure 6.
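For readers who wish to experiment with the kernel directly, the short Python sketch below evaluates the Gram matrix of (1) with NumPy and checks the reduction to the Gaussian Radial Basis Function Kernel at $\sigma_0 = 0$; the function name and parameter values are illustrative choices made for this sketch, not settings taken from the experiments reported later.

```python
import numpy as np

def ggrbf_kernel(X, Z, sigma=1.0, sigma0=0.5):
    """Gram matrix of the GGRBF kernel of Definition 2 (Equation (1)).

    X: (n, d) array, Z: (m, d) array; returns an (n, m) matrix.
    """
    sq = np.sum((X[:, None, :] - Z[None, :, :]) ** 2, axis=-1)   # ||x - z||_2^2
    return np.exp(-sigma**2 * sq) * np.exp(np.exp(-sigma0**2 * sq) - 1.0)

X = np.random.default_rng(0).normal(size=(5, 3))
K_ggrbf = ggrbf_kernel(X, X, sigma=1.0, sigma0=0.5)
K_grbf = ggrbf_kernel(X, X, sigma=1.0, sigma0=0.0)               # sigma0 = 0 recovers the GRBF
assert np.allclose(K_grbf, np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)))
```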

1.2. Contributions

This article is motivated by the aim of documenting the application of the Generalized Gaussian Radial Basis Function Kernel to various machine learning and deep learning architectures. In the process of executing the Generalized Gaussian Radial Basis Function Kernel over these learning architectures, we discover that the results it produces excel in terms of comparative accuracy or misclassification when contrasted with the results obtained from the existing counterparts. These intriguing and promising findings naturally demand an exploration of the Generalized Gaussian Radial Basis Function from the perspective of mathematical function theory, which starts with the identification of the Hilbert space generated by it as an $L^2$ measure. Furthermore, with the established Hilbert function space theory, we are also able to address the corresponding reproducing kernel theory by virtue of Aronszajn's theorem (see [16]).
Having addressed the Hilbert function theory generated by the Generalized Gaussian Radial Basis Function, this study immediately provides its application in learning architectures. The reproducing kernel for the Hilbert space of the Generalized Gaussian Radial Basis Function is proved to be universal (see Section 4) under restriction to $\mathbb{R}^d \times \mathbb{R}^d$; this makes it ready to use for learning architecture problems, such as standard binary classification through SVMs, function approximation through kernel regression, and pattern recognition through deep convolutional neural networks. In particular, we determine that the results related to the Hilbert function theory of the Generalized Gaussian Radial Basis Function Kernel can be leveraged for SVM applications by addressing the hinge loss function $L$ and the minimal $L$-risk as well.
Finally, this article also provides promising future directions in light of the spectral decomposition of the Generalized Gaussian Radial Basis Function.

1.3. Plan of the Article

The present article is organized as follows: we provide the essential preliminaries of reproducing kernel theory in Section 2. Then, we give the results from the Hilbert space function theory, such as the orthonormal basis and the reproducing kernel of the Generalized Gaussian Radial Basis Function Kernel in Section 3. Then, we provide the functionality of the reproducing kernel in a real sense in Section 4; this is followed by compelling empirical comparison results in Section 5. Lastly, we provide our conclusions and results in Section 6 and describe our future research directions in Section 7.

2. Notation and Preliminaries

2.1. Hypergeometric Function Notation

We begin by recalling basic calculus results related to the Generalized Hypergeometric Function ${}_pF_q\!\left({a_1,\, a_2,\, \ldots,\, a_p \atop b_1,\, b_2,\, \ldots,\, b_q};\, z\right)$ [18].
Definition 3.
We define the Pochhammer symbol [19] (rising factorial notation) as
$$(a)_k = \frac{\Gamma(a+k)}{\Gamma(a)} = a(a+1)\cdots(a+k-1).$$
Then, we define the Generalized Hypergeometric Function ${}_pF_q\!\left({a_1,\, \ldots,\, a_p \atop b_1,\, \ldots,\, b_q};\, z\right)$, which is given as
$${}_pF_q\!\left({a_1,\, \ldots,\, a_p \atop b_1,\, \ldots,\, b_q};\, z\right) \overset{\mathrm{def}}{:=} \sum_{l=0}^{\infty} \frac{\prod_{i=1}^{p} (a_i)_l}{\prod_{i=1}^{q} (b_i)_l}\, \frac{z^l}{l!}. \tag{2}$$
Example 1.
The summation $\sum_{l=0}^{\infty} \frac{1}{(l+x)^{n+1}}\frac{1}{l!}$ can be represented as:
$$\begin{aligned}
\sum_{l=0}^{\infty} \frac{1}{(l+x)^{n+1}}\frac{1}{l!}
&= \sum_{l=0}^{\infty}\left(\frac{\Gamma(l+x)}{\Gamma(l+x+1)}\right)^{n+1}\frac{1}{l!}
= \sum_{l=0}^{\infty}\left(\frac{(x)_l\,\Gamma(x)}{(x+1)_l\,\Gamma(x+1)}\right)^{n+1}\frac{1}{l!}
= \sum_{l=0}^{\infty}\left(\frac{(x)_l\,\Gamma(x)}{(x+1)_l\, x\,\Gamma(x)}\right)^{n+1}\frac{1}{l!}\\
&= \frac{1}{x^{n+1}}\sum_{l=0}^{\infty}\frac{\big((x)_l\big)^{n+1}}{\big((x+1)_l\big)^{n+1}}\,\frac{1}{l!}
= \frac{1}{x^{n+1}}\; {}_{n+1}F_{n+1}\!\left({x,\, \ldots,\, x \atop x+1,\, \ldots,\, x+1};\, 1\right) \qquad (\text{use } (2)).
\end{aligned}$$
The example presented above will be useful in addressing further details, such as the orthonormal basis, in the present manuscript. To avoid heavy notational clutter, we write
$${}_{n+1}F_{n+1}\!\left({x,\, \ldots,\, x \atop x+1,\, \ldots,\, x+1};\, 1\right) \overset{\mathrm{notation}}{=:} F_{n,x,1} \tag{3}$$
from now onward where necessary. Note that $F_{n,\infty,1} = e$.
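Since $F_{n,x,1}$ reappears throughout the later sections, a quick numerical sanity check can be helpful. The Python sketch below evaluates the series in (2) and (3) by truncation, using the simplification $(x)_l/(x+1)_l = x/(x+l)$, and confirms both the identity of Example 1 and the limiting value $F_{n,\infty,1} = e$; the truncation length and the test values of $n$ and $x$ are arbitrary choices.

```python
from math import factorial, exp

def F(n, x, terms=60):
    # F_{n,x,1} = (n+1)F(n+1)(x,...,x; x+1,...,x+1; 1) via (2)-(3);
    # the tail decays like 1/l!, so 60 terms are ample.
    return sum((x / (x + l)) ** (n + 1) / factorial(l) for l in range(terms))

# Example 1: sum_l 1/((l + x)^{n+1} l!) equals F_{n,x,1} / x^{n+1}
n, x = 2, 1.7
lhs = sum(1.0 / ((l + x) ** (n + 1) * factorial(l)) for l in range(60))
print(lhs, F(n, x) / x ** (n + 1))   # the two values agree

# F_{n,x,1} tends to e as x grows, matching the note above
print(F(3, 1e6), exp(1))
```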

2.2. Field and Space Notations

The set of natural numbers in union with 0 is denoted by $\mathbb{W}$, that is, $\mathbb{W} := \{0, 1, 2, \ldots\}$. We use the Kronecker delta $\delta_{nm}$ on non-negative integers $n$ and $m$: $\delta_{nm} = 1$ whenever $n = m$ and $\delta_{nm} = 0$ if $n \neq m$. We denote a complex number by $z = x + iy$, where $x, y \in \mathbb{R}$; its conjugate is given as $\bar z = x - iy$, and its absolute value satisfies $|z|^2 = z \cdot \bar z = x^2 + y^2$. We reserve the symbol $\mathbb{K}$ for the choice of field on which we operate; in particular, $\mathbb{K}$ can be either $\mathbb{R}$ or $\mathbb{C}$.
If $X$ denotes the space of input values (sometimes a closed subset of $\mathbb{R}^d$), then we write $C(X)$ for the space of continuous functions $f : X \to \mathbb{R}$; on the other hand, $C^m(X)$ represents the space of $m$-times differentiable functions, including $m = \infty$.

2.3. Tensor Product Notation

We recall the tensor product between two functions $f_1, f_2 : X \to \mathbb{K}$, given as $f_1 \otimes f_2 : X \times X \to \mathbb{K}$. For all $x, x' \in X$, the tensor product $f_1 \otimes f_2$ is defined as $(f_1 \otimes f_2)(x, x') := f_1(x)\, f_2(x')$.

2.4. Preliminaries

Definition 4.
Let $X \neq \emptyset$; then, a function $k : X \times X \to \mathbb{K}$ is referred to as a kernel on $X$ if there exists a $\mathbb{K}$-Hilbert space $\big(H, \langle\cdot,\cdot\rangle_H\big)$ accompanied by a map $\Phi : X \to H$ such that, for all $x, x' \in X$, we have
$$k(x, x') = \langle \Phi(x), \Phi(x') \rangle_H.$$
We regard Φ as the feature map and H as the feature space of k.
Now that we have introduced the basic notion from the kernel theory in the definition provided above, we can now comfortably define the building block of this article: reproducing kernel Hilbert space—RKHS.
Definition 5.
Let $X \neq \emptyset$ and let $\big(H, \langle\cdot,\cdot\rangle_H\big)$ be a Hilbert function space over $X$. The space $H$ is referred to as a reproducing kernel Hilbert space (RKHS) if, for every $x \in X$, the evaluation functional $\varepsilon_x : H \to \mathbb{K}$ defined as $\varepsilon_x(f) := f(x)$, $f \in H$, is continuous.
Definition 6.
A function $k : X \times X \to \mathbb{K}$ is called a reproducing kernel of $H$ if we have:
1. $k(\cdot, x) \in H$ for all $x \in X$, that is, $\|k(\cdot, x)\|_H < \infty$, and
2. $k(\cdot,\cdot)$ has the reproducing property; that is,
$$f(x) = \langle f, k(\cdot, x)\rangle_H \qquad \forall f \in H \text{ and } x \in X.$$
It is worthwhile mentioning that norm convergence yields pointwise convergence inside an RKHS. This fact follows readily from the continuity of the evaluation functional: for an arbitrary $f \in H$ and $(f_n)_n \subset H$ with $\|f - f_n\|_H \to 0$ as $n \to \infty$, we have
$$\lim_{n\to\infty} f_n(x) = \lim_{n\to\infty}\varepsilon_x(f_n) \overset{(\text{continuity of } \varepsilon_x)}{=} \varepsilon_x(f) = f(x).$$
Now, we will state an important theorem from [16], which dictates the relationship between the reproducing kernel of the RKHS H and the orthonormal basis of it.
Theorem 1
(Aronszajn's Theorem [16]). Let $H$ be an RKHS over a nonempty set $X$; then, $k : X \times X \to \mathbb{K}$, defined as $k(x, x') := \langle \varepsilon_x, \varepsilon_{x'} \rangle_H$ for $x, x' \in X$, is the only reproducing kernel of $H$. Additionally, if for some index set $I$ we have $(e_i)_{i\in I}$ as an orthonormal basis, then for all $x, x' \in X$, we have
$$k(x, x') = \sum_{i \in I} e_i(x)\, \overline{e_i(x')},$$
with absolute convergence.

3. Function Space of Generalized Gaussian Radial Basis Measure

Let $d \in \mathbb{N}$, $\sigma > 0$, and $\sigma_0 \ge 0$, and let $f : \mathbb{C}^d \to \mathbb{C}$ denote a holomorphic function; we first write the measure of our interest:
$$d\mu_{\sigma,\sigma_0,d}(z) := e^{-\sigma^2 |z|^2}\, e^{\,e^{-\sigma_0^2 |z|^2}-1}\, dV_{\mathbb{C}^d}(z).$$
Here, $dV_{\mathbb{C}^d}(z)$ is the usual Lebesgue measure on the entire $\mathbb{C}^d$. For $d = 1$, we simply write $d\mu_{\sigma,\sigma_0}(z)$, with $dV_{\mathbb{C}}$ the typical Lebesgue area measure on $\mathbb{C}$. We now provide the inner product associated with this measure as:
$$\langle f, g\rangle_{\sigma,\sigma_0,\mathbb{C}^d} := N_{\sigma,\sigma_0,d}\int_{\mathbb{C}^d} f(z)\,\overline{g(z)}\, d\mu_{\sigma,\sigma_0,d}(z). \tag{7}$$
Here, $N_{\sigma,\sigma_0,d}$ is the normalization constant, whose value is explicitly given as $\left(\frac{e\,\sigma^2}{2\pi}\right)^d$. Once we have defined the inner product for the space, the norm for a holomorphic function $f : \mathbb{C}^d \to \mathbb{C}$ is:
$$\|f\|^2_{\sigma,\sigma_0,\mathbb{C}^d} := \left(\frac{e\,\sigma^2}{2\pi}\right)^{\!d} \int_{\mathbb{C}^d} |f(z)|^2\, d\mu_{\sigma,\sigma_0,d}(z). \tag{8}$$
We write the following for the collection of holomorphic functions $f : \mathbb{C}^d \to \mathbb{C}$ whose $\|\cdot\|_{\sigma,\sigma_0,\mathbb{C}^d}$-norm is finite, that is:
$$H_{\sigma,\sigma_0,\mathbb{C}^d} := \Big\{\text{holomorphic functions } f : \mathbb{C}^d \to \mathbb{C} \,:\, \|f\|_{\sigma,\sigma_0,\mathbb{C}^d} < \infty\Big\}. \tag{9}$$
Once we have defined the norm in (8) and the associated Hilbert space in (9), we can provide the following formulation, which makes the Hilbert space $H_{\sigma,\sigma_0,\mathbb{C}^d}$ an RKHS.
Theorem 2.
For all $\sigma > 0$, $\sigma_0 \ge 0$, and all compact sets $K \subset \mathbb{C}^d$, there exists a positive constant $c_{\sigma,\sigma_0,d}$ such that for all $z \in K$ and $f \in H_{\sigma,\sigma_0,\mathbb{C}^d}$, we have
$$|f(z)| \le c_{\sigma,\sigma_0,d}\, \|f\|_{\sigma,\sigma_0,\mathbb{C}^d}.$$
Proof. 
Denote by $B(0,1)$ the unit ball in $\mathbb{C}$. Define
$$\tilde c_{\sigma,\sigma_0,d} := \sup_{z \in K + B(0,1)^d} e^{\sigma^2 |z|^2}\, e^{\,1 - e^{-\sigma_0^2 |z|^2}}.$$
In the spirit of ([17], Lemma 3, Page 4639), we have
$$\left(\prod_{j=1}^d r_j\right) |f(z)|^2 \le \frac{1}{(2\pi)^d}\left(\prod_{j=1}^d r_j\right) \int_{[0,2\pi]^d} \Big|f\big(z_1 + r_1 e^{i\theta_1}, \ldots, z_d + r_d e^{i\theta_d}\big)\Big|^2\, d\theta.$$
Integration of the above with respect to $(r_1, \ldots, r_d) \in [0,1]^d$ yields:
$$|f(z)|^2 \le \frac{1}{(2\pi)^d}\int_{z+B(0,1)^d} |f(z')|^2\, dV(z') \le \frac{\tilde c_{\sigma,\sigma_0,d}}{(2\pi)^d}\int_{z+B(0,1)^d} |f(z')|^2\, e^{-\sigma^2|z'|^2}\, e^{\,e^{-\sigma_0^2|z'|^2}-1}\, dV(z') \le \frac{\tilde c_{\sigma,\sigma_0,d}}{(e\,\sigma^2)^d}\, \|f\|^2_{\sigma,\sigma_0,\mathbb{C}^d}.$$
In particular, we may take $c_{\sigma,\sigma_0,d} = \sqrt{\tilde c_{\sigma,\sigma_0,d}\, (e\,\sigma^2)^{-d}}$. Hence, the result is established.    □
Establishing Theorem 2 immediately yields that $H_{\sigma,\sigma_0,\mathbb{C}^d}$ is indeed an RKHS; we state this here as an important corollary.
Corollary 1.
The Hilbert space $H_{\sigma,\sigma_0,\mathbb{C}^d}$ identified by the inner product $\langle\cdot,\cdot\rangle_{\sigma,\sigma_0,\mathbb{C}^d}$ is an RKHS for all $\sigma > 0$ and $\sigma_0 \ge 0$.

Orthonormal Basis

We will need the following technical result to establish the orthonormal basis for the RKHS $H_{\sigma,\sigma_0,\mathbb{C}^d}$.
Theorem 3.
For every $\sigma > 0$, $\sigma_0 \ge 0$, and $n, m \in \mathbb{W}$, we have
$$\int_{\mathbb{C}} z^n\, \bar z^m \, d\mu_{\sigma,\sigma_0}(z) = \sqrt{\frac{2\pi\, n!}{e\,\sigma^{2n+2}}\, F_{n,\hat\sigma,1}}\;\sqrt{\frac{2\pi\, m!}{e\,\sigma^{2m+2}}\, F_{m,\hat\sigma,1}}\;\delta_{nm},$$
where $\hat\sigma = \frac{\sigma_0^2}{\sigma^2}$ and $F_{n,\hat\sigma,1}$ is defined in (3).
Proof. 
Employ the polar coordinates $z = r e^{i\theta}$ to obtain:
$$\int_{\mathbb{C}} z^n\, \bar z^m \, d\mu_{\sigma,\sigma_0}(z) = \int_0^{\infty} r^{n+m}\, e^{-\sigma^2 r^2}\, e^{\,e^{-\sigma_0^2 r^2}-1}\, r\, dr \int_0^{2\pi} e^{i(n-m)\theta}\, d\theta. \tag{12}$$
The right-hand side of (12) is 0 when $n \neq m$, since the angular integral vanishes. Now, assume that $n = m$ in (12); then:
$$\begin{aligned}
\int_{\mathbb{C}} z^n\, \bar z^n \, d\mu_{\sigma,\sigma_0}(z)
&= 2\pi \int_0^\infty r^{2n}\, e^{-\sigma^2 r^2}\, e^{\,e^{-\sigma_0^2 r^2}-1}\, r\, dr\\
&= \frac{2\pi}{e} \int_0^\infty t^n\, e^{-\sigma^2 t}\, e^{\,e^{-\sigma_0^2 t}}\, dt \qquad (\text{put } t = r^2)\\
&= \frac{2\pi}{e}\int_0^\infty \left(\frac{s}{\sigma^2}\right)^{\!n} e^{-s}\, e^{\,e^{-(\sigma_0^2/\sigma^2)\,s}}\, \frac{ds}{\sigma^2} \qquad (\text{put } s = \sigma^2 t)\\
&= \frac{2\pi}{e\,\sigma^{2(n+1)}}\int_0^\infty s^n\, e^{-s}\, e^{\,e^{-(\sigma_0^2/\sigma^2)\,s}}\, ds\\
&= \frac{2\pi}{e\,\sigma^{2(n+1)}}\int_1^0 \left(\log\frac{1}{y}\right)^{\!n} e^{\,y^{\sigma_0^2/\sigma^2}}\,(-dy) \qquad (\text{put } e^{-s} = y)\\
&= \frac{2\pi}{e\,\sigma^{2(n+1)}}\int_0^1 \left(\log\frac{1}{y}\right)^{\!n} \sum_{l=0}^\infty \frac{y^{(\sigma_0^2/\sigma^2)\,l}}{l!}\, dy \tag{13}\\
&= \frac{2\pi}{e\,\sigma^{2(n+1)}}\sum_{l=0}^\infty \frac{1}{l!}\int_0^1 \left(\log\frac{1}{y}\right)^{\!n} y^{\,l\hat\sigma}\, dy \qquad \Big(\text{use } \hat\sigma = \tfrac{\sigma_0^2}{\sigma^2}\Big) \tag{14}\\
&= \frac{2\pi}{e\,\sigma^{2(n+1)}}\sum_{l=0}^\infty \frac{1}{l!}\,\frac{\Gamma(n+1)}{(l\hat\sigma+1)^{n+1}} = \frac{2\pi\, n!}{e\,\sigma^{2n+2}}\, F_{n,\hat\sigma,1} \qquad (\text{use } (3)). \tag{15}
\end{aligned}$$
We used entry 4.272(6), Page 551 in [20] for the result in (15). The summation over $l$ in (13) is interchanged with the integral with respect to $dy$ for $0 \le y \le 1$ by virtue of the Fubini–Tonelli theorem, since $\int_0^1 \big(\log\frac{1}{y}\big)^n e^{\,y^{\sigma_0^2/\sigma^2}}\, dy$ is finite, which is demonstrated as:
$$\left|\int_0^1 \left(\log\frac{1}{y}\right)^{\!n} e^{\,y^{\sigma_0^2/\sigma^2}}\, dy\right| \le \int_0^1 \left(\log\frac{1}{y}\right)^{\!n} e^{\,y^{\sigma_0^2/\sigma^2}}\, dy \le e\int_0^1 \left(\log\frac{1}{y}\right)^{\!n} dy = e\,\Gamma(n+1) < \infty.$$
Thus, the result prevails.    □
We will employ the notation $\hat\sigma = \frac{\sigma_0^2}{\sigma^2}$ for $\sigma > 0$ and $\sigma_0 \ge 0$ wherever it is required. In light of Theorem 1, we now determine the orthonormal basis of $H_{\sigma,\sigma_0,\mathbb{C}^d}$.
Theorem 4.
Let $\sigma > 0$, $\sigma_0 \ge 0$, and
$$e_n(z) := \sqrt{\frac{\sigma^{2n}}{n!\, F_{n,\hat\sigma,1}}}\; z^n \tag{16}$$
for $z \in \mathbb{C}$. Then, the tensor-product system $\big(e_{n_1}\otimes\cdots\otimes e_{n_d}\big)_{n_1,\ldots,n_d \ge 0}$ forms an orthonormal basis of $H_{\sigma,\sigma_0,\mathbb{C}^d}$.
Proof. 
We first establish the result for $d = 1$ to provide an initial basic understanding. For this, let us show that $(e_n)_{n\in\mathbb{W}}$ forms an orthonormal system. So, consider $z \in \mathbb{C}$ and let $m, n \in \mathbb{W}$. Then,
$$\begin{aligned}
\langle e_n, e_m\rangle_{\sigma,\sigma_0} &= \frac{e\,\sigma^2}{2\pi}\int_{\mathbb{C}} e_n(z)\,\overline{e_m(z)}\, d\mu_{\sigma,\sigma_0}(z)
= \frac{e\,\sigma^2}{2\pi}\,\sqrt{\frac{\sigma^{2n}}{n!\,F_{n,\hat\sigma,1}}}\,\sqrt{\frac{\sigma^{2m}}{m!\,F_{m,\hat\sigma,1}}}\int_{\mathbb{C}} z^n\, \bar z^m\, d\mu_{\sigma,\sigma_0}(z)\\
&= \begin{cases}1 & \text{if } n = m,\\ 0 & \text{otherwise}\end{cases} \qquad (\text{use Theorem 3}).
\end{aligned}$$
We use the ‘polarplot’ function available in MATLAB R2023b to construct the polar plot of e n ( z ) . These results are given in Figure 7, where the numbers 1 , 2 , and 3 are the radius values in the polar plot.
The above result shows that $(e_n)_{n\in\mathbb{W}}$ is indeed an orthonormal system. To this end, we have to establish that it is also complete. So, for this, pick a holomorphic function $f \in H_{\sigma,\sigma_0,\mathbb{C}}$ with $f(z) = \sum_{l=0}^\infty a_l z^l$ and observe that
$$\begin{aligned}
\langle f, e_n\rangle_{\sigma,\sigma_0} &= \frac{e\,\sigma^2}{2\pi}\int_{\mathbb{C}} f(z)\,\overline{e_n(z)}\, d\mu_{\sigma,\sigma_0}(z)
= \frac{e\,\sigma^2}{2\pi}\sum_{l=0}^\infty a_l \int_{\mathbb{C}} z^l\, \overline{e_n(z)}\, d\mu_{\sigma,\sigma_0}(z)\\
&= \frac{e\,\sigma^2}{2\pi}\,\sqrt{\frac{\sigma^{2n}}{n!\,F_{n,\hat\sigma,1}}}\sum_{l=0}^\infty a_l \int_{\mathbb{C}} z^l\, \bar z^n\, d\mu_{\sigma,\sigma_0}(z)\\
&= \frac{e\,\sigma^2}{2\pi}\,\sqrt{\frac{\sigma^{2n}}{n!\,F_{n,\hat\sigma,1}}}\sum_{l=0}^\infty a_l \sqrt{\frac{2\pi\, l!}{e\,\sigma^{2l+2}}F_{l,\hat\sigma,1}}\,\sqrt{\frac{2\pi\, n!}{e\,\sigma^{2n+2}}F_{n,\hat\sigma,1}}\;\delta_{ln}\\
&= \sqrt{\frac{\sigma^{2n}}{n!\,F_{n,\hat\sigma,1}}}\; a_n\, \frac{n!\,F_{n,\hat\sigma,1}}{\sigma^{2n}} = \left(\sqrt{\frac{\sigma^{2n}}{n!\,F_{n,\hat\sigma,1}}}\right)^{\!-1} a_n.
\end{aligned}$$
Since the constant $\sqrt{\frac{\sigma^{2n}}{n!\,F_{n,\hat\sigma,1}}} \neq 0$ for any choice of $n$, the condition that $\langle f, e_n\rangle_{\sigma,\sigma_0} = 0$ for all $n \in \mathbb{W}$ yields $a_n = 0$ for all $n \in \mathbb{W}$, which results in the conclusion that $f \equiv 0$. Therefore, $(e_n)_{n\in\mathbb{W}}$ is complete. Now, we establish these results in the $d$-dimensional situation by employing the tensor product notation of Section 2.3. To this end, we see that
$$\big\langle e_{n_1}\otimes\cdots\otimes e_{n_d},\; e_{m_1}\otimes\cdots\otimes e_{m_d}\big\rangle_{\sigma,\sigma_0,d} = \prod_{j=1}^d \langle e_{n_j}, e_{m_j}\rangle_{\sigma,\sigma_0}.$$
Hence, the orthonormality of $\big(e_{n_1}\otimes\cdots\otimes e_{n_d}\big)_{n_1,\ldots,n_d \in \mathbb{W}}$ is established due to the orthonormality of each factor $\langle e_{n_j}, e_{m_j}\rangle_{\sigma,\sigma_0}$. We still need to ensure that this $d$-dimensional orthonormal system is complete. Now, observe
$$\big\langle f,\; e_{n_1}\otimes\cdots\otimes e_{n_d}\big\rangle_{\sigma,\sigma_0,d} = \left(\frac{e\,\sigma^2}{2\pi}\right)^{\!d} \int_{\mathbb{C}^d} f(z)\,\overline{\big(e_{n_1}\otimes\cdots\otimes e_{n_d}\big)(z)}\; d\mu_{\sigma,\sigma_0,\mathbb{C}^d}(z) = \left(\frac{e\,\sigma^2}{2\pi}\right)^{\!d} \sum_{l_1,\ldots,l_d} a_{l_1,\ldots,l_d}\, I_{l,d},$$
where $I_{l,d} = \int_{\mathbb{C}^d} z^{l}\,\overline{\big(e_{n_1}\otimes\cdots\otimes e_{n_d}\big)(z)}\; d\mu_{\sigma,\sigma_0,\mathbb{C}^d}(z)$. We can further simplify $I_{l,d}$ as follows:
$$\begin{aligned}
I_{l,d} &= \int_{\mathbb{C}^d} z^{l}\; \overline{e_{n_1}(z_1)}\cdots \overline{e_{n_d}(z_d)}\; d\mu_{\sigma,\sigma_0}(z_1)\cdots d\mu_{\sigma,\sigma_0}(z_d) = \prod_{j=1}^d \int_{\mathbb{C}} z_j^{\,l_j}\; \overline{e_{n_j}(z_j)}\; d\mu_{\sigma,\sigma_0}(z_j)\\
&= \prod_{j=1}^d \sqrt{\frac{\sigma^{2n_j}}{n_j!\,F_{n_j,\hat\sigma,1}}}\int_{\mathbb{C}} z_j^{\,l_j}\, \bar z_j^{\,n_j}\; d\mu_{\sigma,\sigma_0}(z_j) = \prod_{j=1}^d \sqrt{\frac{\sigma^{2n_j}}{n_j!\,F_{n_j,\hat\sigma,1}}}\,\sqrt{\frac{2\pi\, l_j!}{e\,\sigma^{2l_j+2}}F_{l_j,\hat\sigma,1}}\,\sqrt{\frac{2\pi\, n_j!}{e\,\sigma^{2n_j+2}}F_{n_j,\hat\sigma,1}}\;\delta_{l_j n_j}.
\end{aligned}$$
Finally,
$$\left(\frac{e\,\sigma^2}{2\pi}\right)^{\!d} \sum_{l_1,\ldots,l_d} a_{l_1,\ldots,l_d}\, I_{l,d} = \prod_{j=1}^d \left(\sqrt{\frac{\sigma^{2n_j}}{n_j!\, F_{n_j,\hat\sigma,1}}}\right)^{\!-1} a_{n_1,\ldots,n_d}.$$
The completeness argument in the $d$-dimensional case then follows the same routine as in the single-dimensional case discussed above.    □
The following theorem provides the reproducing kernel for the RKHS $H_{\sigma,\sigma_0,\mathbb{C}^d}$ defined in (9).
Theorem 5.
For $\sigma > 0$, $\sigma_0 \ge 0$, and $\hat\sigma = \frac{\sigma_0^2}{\sigma^2}$, the reproducing kernel for the RKHS $H_{\sigma,\sigma_0,\mathbb{C}^d}$ is (by virtue of Theorem 1) given as
$$K(z,w) := \sum_{n_1,\ldots,n_d=0}^{\infty} \lambda_n\, (z\,\bar w)^n, \tag{17}$$
where multi-index notation is employed: $n = (n_1,\ldots,n_d)$ and $\lambda_n = \prod_{i=1}^{d} \frac{\sigma^{2n_i}}{n_i!\, F_{n_i,\hat\sigma,1}}$.
Proof. 
We will demonstrate the desired proof as follows:
  • For $w \in \mathbb{C}^d$, we will show that $\|K(\cdot,w)\|_{\sigma,\sigma_0,d} < \infty$. For this, consider
$$\begin{aligned}
\|K(\cdot,w)\|^2_{\sigma,\sigma_0,d} &= \left(\frac{e\,\sigma^2}{2\pi}\right)^{\!d} \int_{\mathbb{C}^d} |K(z,w)|^2\, d\mu_{\sigma,\sigma_0,d}(z)\\
&= \left(\frac{e\,\sigma^2}{2\pi}\right)^{\!d} \sum_{n_1,\ldots,n_d}\sum_{m_1,\ldots,m_d} \lambda_n\,\lambda_m\, \bar w^{\,n} w^m \int_{\mathbb{C}^d} z^n\, \bar z^m\, d\mu_{\sigma,\sigma_0,d}(z) \tag{18}\\
&= \left(\frac{e\,\sigma^2}{2\pi}\right)^{\!d} \sum_{n_1,\ldots,n_d} \lambda_n^2\, |w|^{2n} \prod_{i=1}^d \int_{\mathbb{C}} z_i^{\,n_i}\,\bar z_i^{\,n_i}\, d\mu_{\sigma,\sigma_0}(z_i) \tag{19}\\
&= \left(\frac{e\,\sigma^2}{2\pi}\right)^{\!d} \sum_{n_1,\ldots,n_d} \lambda_n^2\, |w|^{2n} \prod_{i=1}^d \frac{2\pi\, n_i!}{e\,\sigma^{2n_i+2}}\, F_{n_i,\hat\sigma,1} = \sum_{n_1,\ldots,n_d} |w|^{2n} \prod_{i=1}^d \frac{\sigma^{2n_i}}{n_i!\, F_{n_i,\hat\sigma,1}}.
\end{aligned}$$
    We used the result of Theorem 4 in passing from (18) to (19). For all $w \in \mathbb{C}^d$, the quantity $\sum_{n_1,\ldots,n_d} |w|^{2n} \prod_{i=1}^d \frac{\sigma^{2n_i}}{n_i!\, F_{n_i,\hat\sigma,1}}$ converges. This implies that $\|K(\cdot,w)\|_{\sigma,\sigma_0,d} < \infty$ for all $w \in \mathbb{C}^d$. Therefore, $K(\cdot,w) \in H_{\sigma,\sigma_0,\mathbb{C}^d}$.
  • In order to establish the reproducing property of $K(\cdot,w)$, pick an arbitrary $f(z) = \sum_{n_1,\ldots,n_d} a_{n_1,\ldots,n_d}\, z^n \in H_{\sigma,\sigma_0,\mathbb{C}^d}$. Then, consider the inner product of $f$ with $K(\cdot,w)$, as follows:
$$\begin{aligned}
\langle f, K(\cdot,w)\rangle_{\sigma,\sigma_0,\mathbb{C}^d} &= \left(\frac{e\,\sigma^2}{2\pi}\right)^{\!d} \int_{\mathbb{C}^d} f(z)\, \overline{K(z,w)}\, d\mu_{\sigma,\sigma_0,\mathbb{C}^d}(z)\\
&= \left(\frac{e\,\sigma^2}{2\pi}\right)^{\!d} \sum_{n_1,\ldots,n_d}\sum_{l_1,\ldots,l_d} a_{n_1,\ldots,n_d}\, \lambda_{l_1,\ldots,l_d}\, w^{l} \prod_{i=1}^d \frac{2\pi\, n_i!\, F_{n_i,\hat\sigma,1}}{e\,\sigma^{2n_i+2}}\,\delta_{l_i n_i}\\
&= \sum_{n_1,\ldots,n_d=0}^\infty a_{n_1,\ldots,n_d}\, w^n = f(w).
\end{aligned}$$
Hence, the desired result is achieved.    □
The proof of the preceding theorem, which demonstrates the reproducing kernel nature of $K(z,w)$ for the RKHS $H_{\sigma,\sigma_0,\mathbb{C}^d}$, utilizes the basic machinery borrowed from the two-part definition of the reproducing kernel given in Definition 6. Now that we have determined the reproducing kernel theory for the Generalized Gaussian Radial Basis Function Kernel, we will investigate its functionality in terms of universality, which makes it applicable in addressing learning architecture problems, such as standard binary classification through SVMs, function approximation through kernel regression, and pattern recognition through deep convolutional neural networks.
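As a purely illustrative aid before moving on, the Python snippet below evaluates a truncated version of the series (17) for $d = 1$; the truncation length, parameter values, and helper names are assumptions made for this sketch, and nothing beyond the stated formula is being evaluated.

```python
import numpy as np
from math import factorial

def F(n, x, terms=60):
    # F_{n,x,1} from (3), using (x)_l / (x+1)_l = x / (x + l)
    return sum((x / (x + l)) ** (n + 1) / factorial(l) for l in range(terms))

def K_truncated(z, w, sigma=1.0, sigma0=0.5, N=40):
    # Partial sum of the reproducing kernel (17) with d = 1 and lambda_n from Theorem 5
    sigma_hat = sigma0**2 / sigma**2
    lam = [sigma ** (2 * n) / (factorial(n) * F(n, sigma_hat)) for n in range(N)]
    return sum(lam[n] * (z * np.conj(w)) ** n for n in range(N))

z, w = 0.3 + 0.2j, -0.1 + 0.4j
print(K_truncated(z, w))   # for real arguments the value is real, in line with (20)
```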

4. Restriction and Universality of the Reproducing Kernel of the RKHS $H_{\sigma,\sigma_0,\mathbb{C}^d}$

It is crucial to address the fact that the restriction of the reproducing kernel $K(z,w)$ to $\mathbb{R}^d \times \mathbb{R}^d$ is also a kernel function, which will eventually assist us in establishing the universal nature of $K(z,w)$.
Lemma 1
(Restriction of kernels [21]). Let $k$ be a kernel on $X$, let $\tilde X$ be a set, and let $A : \tilde X \to X$ be a map. Then, $\tilde k$ defined by $\tilde k(x, x') := k\big(A(x), A(x')\big)$, $x, x' \in \tilde X$, is a kernel on $\tilde X$. In particular, if $\tilde X \subset X$, then $k|_{\tilde X \times \tilde X}$ is a kernel.
In other words, the conclusion of the above lemma is that if $k(x, x') \in \mathbb{R}$ for all $x, x' \in \mathbb{R}^d$, then $k$ is also a kernel in the real sense (cf. [21]). This notion of restricting the kernel from $\mathbb{C}$ to $\mathbb{R}$ allows the restriction of $K(z,w)$ for the RKHS $H_{\sigma,\sigma_0,\mathbb{C}^d}$ to $\mathbb{R}^d \times \mathbb{R}^d$ to be a kernel in the real sense. Note that the $\lambda_n$'s present in the reproducing kernel $K(z,w)$ in (17) are strictly positive real numbers for all $n$; also, observe that for all $x, x' \in \mathbb{R}^d$, we have
$$K|_{\mathbb{R}^d \times \mathbb{R}^d}(x, x') = \sum_{n_1,\ldots,n_d=0}^{\infty} \lambda_n\, (x\,\bar x')^n = \sum_{n_1,\ldots,n_d=0}^{\infty} \lambda_n\, (x\, x')^n \in \mathbb{R}, \tag{20}$$
which implies that the restriction of $K(\cdot,\cdot)$ to $\mathbb{R}^d \times \mathbb{R}^d$ is a kernel. Now, we shall recall the notion of a universal kernel.
Definition 7
(Universal Kernels [21]). A continuous kernel $k$ on a compact metric space $X$ is called universal if the RKHS $H$ of $k$ is dense in $C(X)$. This means that, for every function $g \in C(X)$ and all $\epsilon > 0$, there exists $f \in H$ such that $\|f - g\|_\infty \le \epsilon$.
Now that we have defined the notion of universal kernel function in the above definition, we provide an example of the same.
Example 2
(Universal Taylor Kernels). Fix $R \in (0, \infty]$ and a $C^\infty$ function $f : (-R, R) \to \mathbb{R}$ that can be expanded into its Taylor series at 0, i.e.,
$$f(t) = \sum_{n=0}^{\infty} a_n t^n, \qquad t \in (-R, R). \tag{21}$$
Let $X \subset \big\{x \in \mathbb{R}^d : \|x\|_2 < \sqrt{R}\big\}$. If we have $a_n > 0$ for all $n \ge 0$ in (21), then $k$ given by
$$k(x, x') := f\big(\langle x, x'\rangle\big), \qquad x, x' \in X, \tag{22}$$
is a universal kernel on every compact subset of $X$.
With the help of universal Taylor kernels, as given above, we can see that the restriction of the reproducing kernel of the RKHS $H_{\sigma,\sigma_0,\mathbb{C}^d}$ to $\mathbb{R}^d \times \mathbb{R}^d$ is a universal Taylor kernel. This motivating result is captured in the following important theorem.
Theorem 6.
Let $X$ be a compact subset of $\mathbb{R}^d$, $\sigma > 0$, and $\sigma_0 \ge 0$. Then, the kernel $K|_{\mathbb{R}^d \times \mathbb{R}^d}$ defined in (20) is universal.
Proof. 
Let $\sigma > 0$ and $\sigma_0 \ge 0$. Recall $a_n$ from (21) and $\lambda_n$ from (17) and set them equal to each other, that is, $a_n = \lambda_n$ for all $n$. With these choices of $a_n$, we now define a $C^\infty$ function $f : (-R, R) \to \mathbb{R}$ as
$$f(t) = \sum_{n=0}^{\infty} \lambda_n t^n, \qquad t \in (-R, R). \tag{23}$$
We learn that, for $x, x' \in \mathbb{R}^d$ whose Euclidean norm is bounded by $\sqrt{R}$, we can express $K|_{\mathbb{R}^d \times \mathbb{R}^d}(x, x')$ in terms of $f$ as defined in (23):
$$K|_{\mathbb{R}^d \times \mathbb{R}^d}(x, x') = \sum_{n=0}^{\infty} \lambda_n\, (x\,x')^n = \sum_{n=0}^{\infty} \lambda_n\, \langle x, x'\rangle^n = f\big(\langle x, x'\rangle\big). \tag{24}$$
Expressing $K|_{\mathbb{R}^d \times \mathbb{R}^d}(x, x')$ for $x, x' \in \mathbb{R}^d$ in terms of the $C^\infty$ function $f$, in combination with $\lambda_n > 0$ for all $n$, makes $K|_{\mathbb{R}^d \times \mathbb{R}^d}$ a universal kernel by virtue of Example 2.    □
The establishment of the universal nature of the reproducing kernel $K(\cdot,\cdot)$ of the RKHS $H_{\sigma,\sigma_0,\mathbb{C}^d}$ under restriction to $\mathbb{R}^d \times \mathbb{R}^d$ in Theorem 6 allows one to approximate arbitrary continuous functions on compact subsets of $\mathbb{R}^d$ by functions that can be represented against the reproducing kernel $K(\cdot,\cdot)$ with respect to the inner product defined in (7), or, consequently, whose norm given in (8) is finite.

5. Empirical Evidence and Results Comparison

We present the empirical evidence for the application of the Generalized Gaussian Radial Basis Function Kernel on various basic yet important learning architecture routines. The technical details of every experiment presented here are provided in Section 6.

5.1. Kernel Regression

We know that kernel regression aims to approximate or interpolate arbitrary continuous functions through the available choices of universal kernel functions on compact subsets of R d . Usually, these functions are radial basis functions that are employed in the kernel sense to achieve the approximation. In the current example given as follows, we leverage the universal nature of both Gaussian Radial Basis Function Kernel (cf. Corollary 4.58 in [21]) and Generalized Gaussian Radial Basis Function Kernel from Theorem 6.
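Before the specific examples, the Python sketch below shows the generic kernel-regression recipe in its simplest form: build the Gram matrix of either the GRBF or the GGRBF kernel on the sample points and solve a regularized (Tikhonov-style) linear system for the coefficients. The target function, noise level, shape parameters, and ridge value here are illustrative stand-ins, not the settings of the experiments reported below.

```python
import numpy as np

def ggrbf_kernel(x, z, sigma, sigma0):
    # 1-D GGRBF kernel (1); sigma0 = 0 gives back the plain GRBF kernel
    sq = (x[:, None] - z[None, :]) ** 2
    return np.exp(-sigma**2 * sq) * np.exp(np.exp(-sigma0**2 * sq) - 1.0)

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 101)                        # 101 sample points, as in the experiments
y = np.sin(3 * x) + 0.05 * rng.uniform(size=101)   # an illustrative noisy target, not the paper's f

def fit_and_evaluate(sigma0):
    K = ggrbf_kernel(x, x, sigma=3.0, sigma0=sigma0)
    coef = np.linalg.solve(K + 1e-6 * np.eye(101), y)    # regularized kernel interpolation
    return K @ coef

for s0 in (0.0, 1.0):                                    # GRBF (sigma0 = 0) versus GGRBF
    print(s0, np.max(np.abs(fit_and_evaluate(s0) - y)))  # maximum residual on the sample grid
```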

5.1.1. Example 1

Kernel regression of $f(x) = e^{-(1-9x^2)^4} + \tan x + x^{\frac{1}{6}} + \sin x\; \vartheta(n)$ is performed via both the GRBF and (1), and the respective results are given in Figure 8. Here, $\vartheta(n)$ is a uniformly distributed random number in $(0,1)$ and $x = \frac{n}{100}$, $n \in \mathbb{Z}$.

5.1.2. Example 2

Kernel regression of $f(x) = e^{\sin x}\sin(x^2) + \frac{2}{\pi}\,|x + \cos\vartheta(n)|$ is performed via both the GRBF and (1), and the respective results are given in Figure 9. Here, $\vartheta(n)$ is a uniformly distributed random number in $(0,1)$ and $x = \frac{n}{100}$, $n \in \mathbb{Z}$.

5.2. Support Vector Machine

Let us briefly address how the results derived in Section 3 can be used to understand the analysis of SVMs. To that end, we define an important notion in SVMs, referred to as the hinge loss function and denoted by $L(\cdot,\cdot)$:
$$L(y, t) := \max\{0,\, 1 - yt\}, \qquad y \in Y := \{-1, 1\},\; t \in \mathbb{R}. \tag{25}$$
Let $X \subset \mathbb{R}^d$ be a nonempty open subset and $P$ be a probability measure on $X \times Y$; then, the risk induced by the hinge loss function is the $L$-risk, which is defined, for a measurable function $f : X \to \mathbb{R}$, as:
$$\mathcal{R}_{L,P}(f) := \int_{X \times Y} L\big(y, f(x)\big)\, dP(x, y). \tag{26}$$
Additionally, the minimal $L$-risk is given as
$$\mathcal{R}^*_{L,P} := \inf_{f : X \to \mathbb{R}} \mathcal{R}_{L,P}(f). \tag{27}$$
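As a concrete finite-sample illustration of (25) and (26), the empirical hinge risk of a candidate decision function can be computed in a few lines; the synthetic data and the decision function below are arbitrary choices made for this sketch.

```python
import numpy as np

def hinge_loss(y, t):
    # L(y, t) = max(0, 1 - y t) from (25), with labels y in {-1, +1}
    return np.maximum(0.0, 1.0 - y * t)

def empirical_risk(f, X, y):
    # finite-sample analogue of the L-risk (26): average hinge loss of f over the sample
    return hinge_loss(y, f(X)).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0] * X[:, 1])                      # +1 in the first/third quadrants, -1 otherwise
print(empirical_risk(lambda X: X[:, 0] * X[:, 1], X, y))
```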
It is important to recall here that the RKHSs of universal kernels approximate the minimal $L$-risk. This is made precise in the following corollary (cf. Corollary 5.29 in [21]).
Corollary 2.
Let $X$ be a compact metric space, $H$ be the RKHS of a universal kernel on $X$, $P$ be a distribution on $X \times Y$, and $L : X \times Y \times \mathbb{R} \to [0, \infty)$ be a continuous, $P$-integrable loss. Then, we have $\inf_{f \in H} \mathcal{R}_{L,P}(f) = \mathcal{R}^*_{L,P}$.
Now, consider $f \in H_{\sigma,\sigma_0,\mathbb{C}^d}$ and let the restriction of $f$ to $\mathbb{R}^d$ be denoted by $f|_{\mathbb{R}^d}$; then, the aforementioned result yields the relation
$$\inf_{f|_{\mathbb{R}^d}:\; f \in H_{\sigma,\sigma_0,\mathbb{C}^d}} \mathcal{R}_{L,P}\big(f|_{\mathbb{R}^d}\big) = \mathcal{R}^*_{L,P}. \tag{28}$$
The setup presented in (28) enables consistent results concerning SVM classifiers, which simply means that classifiers based on the minimization of a regularized risk can asymptotically learn in every classification task [22]. With this theory of SVMs around the RKHS $H_{\sigma,\sigma_0,\mathbb{C}^d}$, we are now ready to ensure the application of the Generalized Gaussian Radial Basis Function Kernel in SVMs.
SVMs are implemented for standard binary data classification (cf. [23]) on the data generated in the top left of Figure 10; this is performed via different choices of kernels: the traditional Sigmoid kernel, the Gaussian Radial Basis Function Kernel, and the Generalized Gaussian Radial Basis Function Kernel in (1).
Of the three available kernels, namely the Generalized Gaussian Radial Basis Function Kernel in (1), the Sigmoid, and the Gaussian Radial Basis Function Kernel, the Generalized Gaussian Radial Basis Function Kernel yields the lowest misclassification on the data of 101 sampled points.
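A minimal reproduction of this type of experiment can be set up in Python with scikit-learn by passing the GGRBF Gram matrix as a callable kernel. The data-generating mechanism below follows the description given in Section 6 (random points in the unit circle, positive class in the first and third quadrants), while the hyper-parameter values are arbitrary and are not tuned to reproduce the reported misclassification rates.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def ggrbf_kernel(X, Z, sigma=2.0, sigma0=1.0):
    # Gram matrix of the GGRBF kernel (1); the shape parameters are illustrative only
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-sigma**2 * sq) * np.exp(np.exp(-sigma0**2 * sq) - 1.0)

rng = np.random.default_rng(7)
r, t = np.sqrt(rng.uniform(size=101)), rng.uniform(0, 2 * np.pi, size=101)
X = np.c_[r * np.cos(t), r * np.sin(t)]            # 101 points inside the unit circle
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)         # positive class in the first/third quadrants

for name, clf in [("GGRBF", SVC(kernel=ggrbf_kernel)),
                  ("GRBF", SVC(kernel="rbf", gamma=4.0)),
                  ("Sigmoid", SVC(kernel="sigmoid", gamma=1.0, coef0=0.0))]:
    print(name, cross_val_score(clf, X, y, cv=10).mean())   # 10-fold cross-validated accuracy
```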

5.3. Neural Network

The work in this section revolves around the application of the universal approximation theorem (adapted from [24]) due to Cybenko (1989) [25], Hornik (1991) [26], and Pinkus (1999) [27], stated as follows:
Theorem 7
(Universal Approximation Theorem in Neural Networks). Let $\rho : \mathbb{R} \to \mathbb{R}$ be any continuous function. Let $\mathcal{N}_d^\rho$ represent the class of feed-forward neural networks with activation function $\rho$, with $d$ neurons in the input layer, one neuron in the output layer, and one hidden layer with an arbitrary number of neurons. Let $X \subset \mathbb{R}^d$ be compact. Then, $\mathcal{N}_d^\rho$ is dense in $C(X)$ if and only if $\rho$ is non-polynomial.
Now, applying the universal approximation theorem with $\rho(r) = g_{\sigma^2}(r)\exp\big(g_{\sigma_0^2}(r) - 1\big)$ in Theorem 7, we realize that the Generalized Gaussian Radial Basis Function is a valid activation function, as it is non-polynomial. In pursuit of an application of the Generalized Gaussian Radial Basis Function, the following examples provide evidence of the optimal results yielded by (1) when employed both as an activation function and as a neural net layer in deep convolutional neural networks (DCNNs).
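A toy illustration of Theorem 7 with $\rho$ given by (1) is sketched below: a single hidden layer with random weights and biases, followed by a least-squares fit of the output layer, already tracks an arbitrary continuous target on a compact interval. The target function, layer width, and parameter values are assumptions made only for this sketch.

```python
import numpy as np

def rho(r, sigma=1.0, sigma0=0.5):
    # GGRBF activation rho(r) = g_{sigma^2}(r) exp(g_{sigma_0^2}(r) - 1); illustrative parameters
    return np.exp(-sigma**2 * r**2) * np.exp(np.exp(-sigma0**2 * r**2) - 1.0)

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)[:, None]
y = np.sin(2 * x).ravel() + 0.1 * x.ravel() ** 2     # an arbitrary continuous target on [-3, 3]

W = rng.normal(size=(1, 50))                         # random hidden weights; only the output layer is fitted
b = rng.normal(size=50)
H = rho(x @ W + b)                                   # hidden-layer activations, shape (200, 50)
coef, *_ = np.linalg.lstsq(H, y, rcond=None)
print(np.max(np.abs(H @ coef - y)))                  # uniform error of the one-hidden-layer fit on the grid
```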

5.3.1. Activation Function

Consider the $\alpha$ReLU activation function, a simple modification of the powerful traditional ReLU used as an activation function in a neural network (AF-NN), defined as
$$f(x) = \begin{cases} x & \text{if } x > 0,\\ \alpha x & \text{if } x \le 0.\end{cases} \tag{29}$$
In the present experiment, two seven-layer NNs are constructed: one with the activation function defined by (29) and one with the Generalized Gaussian Radial Basis Function in (1). The performance summaries in Table 1 and Table 2 report the respective results and comparison for the NNs.
Furthermore, the training progress data for the individual neural network architectures are presented next. The legends for accuracy and loss are the same for both neural network architectures; the respective information is presented in Figure 11 and Figure 12.

5.3.2. DCNN

The above experiments demonstrate the concrete functionality of the Generalized Gaussian Radial Basis Function in (1) as an activation function; this finding further motivates examining its performance within a deep convolutional neural net. Therefore, a typical 7-layer deep convolutional neural net is constructed, once with an activation layer defined by (29) and once with (1). In the experiment reported in Figure 13 and Figure 14, the training data contain 1500 gray-scale letter images of size $28 \times 28$, depicting the letters A, B, and C, stored in a 4-D array.
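For readers working outside MATLAB, the PyTorch sketch below shows one way to wrap (1) as an elementwise activation layer and drop it into a small convolutional net with the same kind of layer ordering described in Section 6; the channel counts and parameter values are illustrative, and this is not the author's original implementation.

```python
import torch
import torch.nn as nn

class GGRBF(nn.Module):
    """Elementwise GGRBF activation from (1); sigma and sigma0 are fixed, illustrative values."""
    def __init__(self, sigma=1.0, sigma0=0.5):
        super().__init__()
        self.sigma, self.sigma0 = sigma, sigma0
    def forward(self, x):
        return torch.exp(-self.sigma**2 * x**2) * torch.exp(torch.exp(-self.sigma0**2 * x**2) - 1.0)

# A small convolutional net: image input, 2-D convolution, GGRBF activation, max pooling,
# fully connected layer; the softmax is applied implicitly by the cross-entropy loss during training.
net = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    GGRBF(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 3),                 # three classes: the letters A, B, and C
)
logits = net(torch.randn(4, 1, 28, 28))        # a batch of four 28 x 28 gray-scale images
print(logits.shape)                            # torch.Size([4, 3])
```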

6. Results and Technical Details

Now that we have successfully constructed the Hilbert function space theory around the Generalized Gaussian Radial Basis Function, viewed as an $L^2$ measure, we immediately employ it in various basic artificial intelligence learning architectures. The learning architectures considered here, namely kernel regression, SVMs, and DCNNs, are quite basic yet well established in the field of machine learning and artificial intelligence, and can be regarded as the building blocks of several other artificial intelligence routines. Table 3 is essentially a compilation of the results documented for the various experiments we have discussed so far, comparing the Generalized Gaussian Radial Basis Function Kernel against its available counterpart(s); none of the hyper-parameters were optimized for any of the kernel functions.
The empirical evidence recorded here directs us to conclude that the results produced by the Generalized Gaussian Radial Basis Function outperform those of its counterparts in every considered learning architecture routine. A detailed discussion of the empirical evidence collected is presented here:
  • We considered two experiments for the kernel regression of the functions mentioned in Figure 8 and Figure 9. These functions, as can easily be seen, are not well-behaved, since they involve transcendental behavior along with discontinuities and points of non-smoothness (due to $\tan x$ or $x^{1/6}$, etc.), followed by uniform randomness due to $\vartheta(n)$ over 101 random points. In both experiments of this nature, we record the minimum error obtained by employing either the Gaussian Radial Basis Function Kernel or the Generalized Gaussian Radial Basis Function Kernel. From this experiment, we learn that the minimum error recorded is remarkably small, on the order of $\sim 10^{-4}$, if we use the Generalized Gaussian Radial Basis Function Kernel, in contrast to the Gaussian Radial Basis Function Kernel, which provides a minimum error on the order of $\sim 10^{-3}$.
  • Next, we consider the experiment of support vector machines for binary classification, in which we generated a random set of points (one can think of them as data) within the unit circle. We assigned a point to the positive class if it lies in the first or the third quadrant; otherwise, we consider it to be within the negative class. After we generate and label these datasets, we identify the support vectors and the decision boundary by employing three (custom) kernel functions in MATLAB R2023b: the Gaussian Radial Basis Function Kernel, the Sigmoid Kernel, and the Generalized Gaussian Radial Basis Function Kernel. We then register the out-of-sample misclassification rate by using 10-fold cross-validation for the aforementioned kernel functions. Here, we immediately learn that the misclassification yielded by the Gaussian Radial Basis Function Kernel is comparatively the highest among the three employed kernel functions, while the Generalized Gaussian Radial Basis Function Kernel yields the lowest misclassification rate, corresponding to an accuracy upward of $96\%$.
  • For the activation function application in the domain of neural networks, we consider a practical example of pattern recognition for the synthetic hand-written English letter dataset in the form of a 4-D array (lettersTrainSet in MATLAB). As already mentioned in the respective subsection, two 7-layer neural nets were constructed, whose layer names are given as follows:
    (a)
    Image input;
    (b)
    2D convolution;
    (c)
    Batch normalization;
    (d)
    (29) vs. Generalized Gaussian Radial Basis Function;
    (e)
    Ten fully connected layers;
    (f)
    Softmax;
    (g)
    Classification layer.
    It should be noted that, for the 7-layer neural networks constructed above, both of the neural network training experiments came to a complete stop when the maximum epoch of 30, amounting to 330 iterations, was finished. Additionally, the sequence of inserting the neural net layers was the same as given above, from (a) to (g). Once the desired neural nets are set up with both of the activation functions, we immediately learn that the neural net containing the Generalized Gaussian Radial Basis Function Kernel as its activation layer simply outperforms its opponent, i.e., the neural net containing the activation function of (29).
  • With the basics of the neural network architecture established in the previous point, we train and execute a deep convolutional neural net for the representative images of the alphabet characters A, B, and C. A list of the seven layers involved in the construction of the deep convolutional neural network is presented as follows:
    (a)
    Image input;
    (b)
    2D convolution;
    (c)
    (29) vs. Generalized Gaussian Radial Basis Function;
    (d)
    A 2 × 2 max pooling with stride [2 2] and padding [0 0 0 0];
    (e)
    Three fully connected layers;
    (f)
    Softmax;
    (g)
    Classification layer.
    Again, the sequence of inserting the neural net layers was the same as given above, from (a) to (g), to construct the present DCNN. The DCNN that contains the activation layer of the Generalized Gaussian Radial Basis Function Kernel achieves a successful classification accuracy upward of $96\%$; however, we fail to see matchable competitiveness from the DCNN that contains (29) as the activation layer.

7. Future Directions

Eigen Function Expansion of GGRBF

We recall Mercer's theorem from ([12], Theorem 4.2, Page 96).
Theorem 8
(Mercer's Theorem). Let $(X, \mu)$ be a finite measure space and $k \in L_\infty(X^2, \mu^2)$ be a kernel such that the integral operator $T_k : L_2(X, \mu) \to L_2(X, \mu)$ is positive definite. Let $(\phi_i)_i \subset L_2(X, \mu)$ be the normalized eigenfunctions of $T_k$ associated with the eigenvalues $(\Lambda_i)_i$. Then,
1. the eigenvalues $(\Lambda_i)_i$ are absolutely summable;
2. $k(x, x') = \sum_{i=0}^{\infty} \Lambda_i\, \phi_i(x)\, \phi_i(x')^*$ holds $\mu^2$-almost everywhere, where the series converges absolutely and uniformly $\mu^2$-almost everywhere.
Example 3.
With the application of Theorem 8, we can provide the eigenfunction decomposition of $K_\sigma(x, z) = e^{-\sigma^2(x - z)^2}$ for $x, z \in \mathbb{R}$, that is:
$$e^{-\sigma^2(x - z)^2} = \sum_{i=0}^{\infty} \Lambda_i\, \phi_i(x)\, \phi_i(z), \tag{30}$$
$$\Lambda_i = \frac{\alpha\, \sigma^{2i}}{\left(\frac{\alpha^2}{2}\left(1 + \sqrt{1 + \left(\frac{2\sigma}{\alpha}\right)^2}\right) + \sigma^2\right)^{i + 1/2}}, \tag{31}$$
$$\phi_i(x) = \frac{\left(1 + \left(\frac{2\sigma}{\alpha}\right)^2\right)^{1/8}}{\sqrt{2^i\, i!}}\; \exp\!\left(-\left(\sqrt{1 + \left(\frac{2\sigma}{\alpha}\right)^2} - 1\right)\frac{\alpha^2 x^2}{2}\right)\, H_i\!\left(\left(1 + \left(\frac{2\sigma}{\alpha}\right)^2\right)^{1/4} \alpha\, x\right). \tag{32}$$
The expressions $H_i$ in (32) are the Hermite polynomials, and the eigenfunctions $\phi_i$ are $L_2$-orthonormal against the weight $\frac{\alpha}{\sqrt\pi}\, e^{-\alpha^2 x^2}$, that is:
$$\int_{\mathbb{R}} \phi_n(x)\, \phi_m(x)\, \frac{\alpha}{\sqrt\pi}\, e^{-\alpha^2 x^2}\, dx = \delta_{nm}. \tag{33}$$
We have the graphical representation of first seven Hermite polynomials in Figure 15a,b.
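The decomposition (30)-(32) can be checked numerically: truncating the Mercer sum and comparing it with the closed-form Gaussian Radial Basis Function value should give close agreement. The sketch below assumes the illustrative values $\sigma = \alpha = 1$ and uses SciPy's Hermite polynomial evaluator; the eigenvalue is written in an equivalent, slightly rearranged form of (31).

```python
import numpy as np
from math import factorial
from scipy.special import eval_hermite

sigma, alpha = 1.0, 1.0                        # kernel shape and weight parameters (assumed values)
beta = (1.0 + (2.0 * sigma / alpha) ** 2) ** 0.25
delta2 = 0.5 * alpha**2 * (beta**2 - 1.0)
D = alpha**2 + delta2 + sigma**2               # the denominator appearing in (31)

def Lam(i):
    # eigenvalue (31) in the equivalent form sqrt(alpha^2 / D) * (sigma^2 / D)^i
    return np.sqrt(alpha**2 / D) * (sigma**2 / D) ** i

def phi(i, x):
    # eigenfunction (32): a scaled Hermite polynomial times a Gaussian factor
    return (np.sqrt(beta / (2.0**i * factorial(i)))
            * np.exp(-delta2 * x**2) * eval_hermite(i, alpha * beta * x))

x, z = 0.7, -0.4
mercer = sum(Lam(i) * phi(i, x) * phi(i, z) for i in range(40))
print(mercer, np.exp(-sigma**2 * (x - z) ** 2))   # the truncated sum closely matches the GRBF value
```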
Based on the Hermite polynomials [28] and their application to the eigendecomposition analysis of the Gaussian Radial Basis Function Kernel, we have the following promising future research direction. In the spirit of the application of Mercer's theorem, we know the eigenfunction decomposition of the Gaussian Radial Basis Function Kernel; however, at present, we do not have such a decomposition for the Generalized Gaussian Radial Basis Function Kernel. A preliminary investigation towards the desired eigenfunction decomposition of the Generalized Gaussian Radial Basis Function Kernel (from [29,30,31]) directs us toward the incorporation of a new family of functions, defined in (34),
$$\mathcal{H}_n(x) := (-1)^n\, e^{a x^2}\, e^{-e^{-b x^2} + 1}\, \frac{d^n}{dx^n}\left(e^{-a x^2}\, e^{\,e^{-b x^2} - 1}\right), \tag{34}$$
into our desired eigenfunction decomposition analysis. Here, $a > 0$ and $b \ge 0$; therefore, if $a = 1$ and $b = 0$, then $\mathcal{H}_i = H_i$. The first two expressions for $\mathcal{H}_n(x)$ are explicitly given as:
$$\mathcal{H}_1(x) = 2 a x + 2 b x\, e^{-b x^2}, \tag{35}$$
$$\mathcal{H}_2(x) = -2a + 4 b^2 x^2\, e^{-b x^2} - 2 b\, e^{-b x^2} + \left(2 a x + 2 b x\, e^{-b x^2}\right)^2. \tag{36}$$
The real constants $a$ and $b$ present in (34) correspond to the respective constants present in (1) as $a = \sigma^2$ and $b = \sigma_0^2$. Next, see the graphical presentation of the first seven functions from the family defined in (34) in Figure 16 and Figure 17.
Future Direction 1.
We remain in a position of lacking knowledge of the eigenvalues for the Generalized Gaussian Radial Basis Function Kernel. Additionally, we still need to investigate whether the functions introduced in (34) are orthogonal in the sense that the traditional Hermite polynomials are. Therefore, it will be interesting to understand the analysis of the family of functions given in (34).

8. Discussion

In the present manuscript, we explore and introduce the Hilbert function space theory, in particular the RKHS generated by the Generalized Gaussian Radial Basis Function $g_{\sigma^2}(r)\exp\big(g_{\sigma_0^2}(r) - 1\big)$ as an $L^2$ measure, followed by an interesting application of it in the learning architectures of machine learning and deep learning. Throughout this article, we have learned from practical results that it is indeed beneficial to employ the Generalized Gaussian Radial Basis Function Kernel; one might achieve phenomenal results compared to those derived using its corresponding counterparts. Even though it is quite advantageous to use the Generalized Gaussian Radial Basis Function Kernel in learning architectures, we also encounter a few shortcomings in this endeavor. Oftentimes, the result yielded by the Generalized Gaussian Radial Basis Function is similar to the results that one might obtain using preexisting and well-studied activation functions, such as the Gaussian error linear unit (GELU) (cf. [32]), whose operation on $x$ is given as
$$\mathrm{GELU}(x) := \frac{x}{2}\left(1 + \mathrm{erf}\!\left(\frac{x}{\sqrt 2}\right)\right). \tag{37}$$
For instance, the comparison of the results based on the choice of either the Generalized Gaussian Radial Basis Function Kernel or the Gaussian error linear unit as the activation function is given in Table 4 and Table 5, respectively. We see that the results from both cases are comparable in terms of the accuracy rate acquired. The key point to note here is that the Generalized Gaussian Radial Basis Function Kernel is a new mathematical function to operate on; hence, it demands a detailed, in-depth investigation, which is initiated by its RKHS in this article, before it can be exploited to its full potential in artificial intelligence. Another major concern in connection with the Generalized Gaussian Radial Basis Function Kernel is its mathematical structure, i.e., the presence of an extra exponential of an exponential in its argument. The mere presence of this can often slow the training process of the learning architectures in which the Generalized Gaussian Radial Basis Function Kernel is employed and might therefore pose a challenge in attaining the desired results.

Funding

This research was funded by The Office of Dean, the Office of Research, Scholarship, and Sponsored Programs, Robert R. Muntz Library at The University of Texas at Tyler, TX-75799, USA.

Data Availability Statement

The coding framework presented in this article was executed in MATLAB R2023b (https://www.mathworks.com/products/matlab.html, accessed on 15 February 2024).

Acknowledgments

The author would like to thank the reviewer for their precious time in reviewing the present manuscript. The author acknowledges the support of Drishty Singh, 4th Year MSc, Department of Mathematics, Babasaheb Bhimrao Ambedkar University, Lucknow, Uttar Pradesh, India. She provided the explicit expression of the first two functions in (35) and (36) from (34). Further analysis of function in (34) was extended due to these important results. The author wishes to thank the referees for their valuable comments.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Baddoo, P.J.; Herrmann, B.; McKeon, B.J.; Brunton, S.L. Kernel learning for robust Dynamic Mode Decomposition: Linear and Nonlinear disambiguation optimization. Proc. R. Soc. A 2022, 478, 20210830. [Google Scholar] [CrossRef] [PubMed]
  2. Gianola, D.; Van Kaam, J.B. Reproducing Kernel Hilbert spaces regression methods for genomic assisted prediction of quantitative traits. Genetics 2008, 178, 2289–2303. [Google Scholar] [CrossRef] [PubMed]
  3. Attia, N.; Akgül, A.; Seba, D.; Nour, A. Reproducing kernel Hilbert space method for the numerical solutions of fractional cancer tumor models. Math. Methods Appl. Sci. 2023, 46, 7632–7653. [Google Scholar] [CrossRef]
  4. Mathematical Functions Power Artificial Intelligence. Available online: https://nap.nationalacademies.org/resource/other/deps/illustrating-math/interactive/mathematical-functions-power-ai.html (accessed on 15 February 2024).
  5. Mathematics and Statistics of Weather Forecasting. Available online: https://nap.nationalacademies.org/resource/other/deps/illustrating-math/interactive/mathematics-and-statistics-of-weather-forecasting.html (accessed on 15 February 2024).
  6. Kalman, B.L.; Kwasny, S.C. Why tanh: Choosing a sigmoidal function. In Proceedings of the 1992 IJCNN International Joint Conference on Neural Networks, Baltimore, MD, USA, 7–11 June 1992; Volume 4, pp. 578–581. [Google Scholar]
  7. Lu, L.; Shin, Y.; Su, Y.; Karniadakis, G.E. Dying ReLU and initialization: Theory and numerical examples. arXiv 2019, arXiv:1903.06733. [Google Scholar] [CrossRef]
  8. Fasshauer, G.E. Meshfree Approximation Methods with MATLAB; World Scientific: Singapore, 2007; Volume 6. [Google Scholar]
  9. Brunton, S.L.; Kutz, J.N. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control; Cambridge University Press: Cambridge, UK, 2022. [Google Scholar]
  10. Stein, M.L. Interpolation of Spatial Data: Some Theory for Kriging; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1999. [Google Scholar]
  11. Rosenfeld, J.A.; Russo, B.; Kamalapurkar, R.; Johnson, T.T. The occupation kernel method for nonlinear system identification. arXiv 2019, arXiv:1909.11792. [Google Scholar]
  12. Williams, C.K.; Rasmussen, C.E. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006; Volume 2. [Google Scholar]
  13. Park, J.; Sandberg, I.W. Universal approximation using radial-basis-function networks. Neural Comput. 1991, 3, 246–257. [Google Scholar] [CrossRef] [PubMed]
  14. Singh, H. A new kernel function for better AI methods. In Proceedings of the 2023 Spring Eastern Sectional Meeting, Virtual, 1–2 April 2023. [Google Scholar]
  15. Karimi, N.; Kazem, S.; Ahmadian, D.; Adibi, H.; Ballestra, L. On a generalized Gaussian radial basis function: Analysis and applications. Eng. Anal. Bound. Elem. 2020, 112, 46–57. [Google Scholar] [CrossRef]
  16. Aronszajn, N. Theory of reproducing kernels. Trans. Am. Math. Soc. 1950, 68, 337–404. [Google Scholar] [CrossRef]
  17. Steinwart, I.; Hush, D.; Scovel, C. An explicit description of the reproducing kernel Hilbert spaces of Gaussian RBF kernels. IEEE Trans. Inf. Theory 2006, 52, 4635–4643. [Google Scholar] [CrossRef]
  18. Barnes, E.W.V. The asymptotic expansion of integral functions defined by Taylor’s series. Philos. Trans. R. Soc. London. Ser. A Contain. Pap. A Math. Phys. Character 1906, 206, 249–297. [Google Scholar]
  19. Abramowitz, M.; Stegun, I.A. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables; US Government Printing Office: Washington, DC, USA, 1968; Volume 55. [Google Scholar]
  20. Gradshteyn, I.S.; Ryzhik, I.M. Table of Integrals, Series, and Products, 7th ed.; Academic Press: Cambridge, MA, USA; Elsevier: Amsterdam, The Netherlands, 2007. [Google Scholar]
  21. Christmann, A.; Steinwart, I. Support Vector Machines; Springer: Berlin/Heidelberg, Germany, 2008. [Google Scholar]
  22. Steinwart, I. Consistency of support vector machines and other regularized kernel classifiers. IEEE Trans. Inf. Theory 2005, 51, 128–142. [Google Scholar] [CrossRef]
  23. Devroye, L.; Györfi, L.; Lugosi, G. A Probabilistic Theory of Pattern Recognition; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013; Volume 31. [Google Scholar]
  24. Kidger, P.; Lyons, T. Universal approximation with deep narrow networks. In Proceedings of the Conference on Learning Theory, Graz, Austria, 9–12 July 2020; Proceedings of Machine Learning Research. pp. 2306–2327. [Google Scholar]
  25. Cybenko, G. Approximation by Superpositions of a Sigmoidal Function. Math. Control. Signals Syst. 1989, 2, 303–314. [Google Scholar] [CrossRef]
  26. Hornik, K. Approximation capabilities of multilayer feedforward networks. Neural Netw. 1991, 4, 251–257. [Google Scholar] [CrossRef]
  27. Pinkus, A. Approximation theory of the MLP model in neural networks. Acta Numer. 1999, 8, 143–195. [Google Scholar]
  28. Hermite, M. Sur un Nouveau Développement en Série des Fonctions; Imprimerie de Gauthier-Villars: 1864; Cambridge University Press: Cambridge, UK, 2011. [Google Scholar]
  29. Fasshauer, G.E.; McCourt, M.J. Stable evaluation of Gaussian radial basis function interpolants. Siam J. Sci. Comput. 2012, 34, A737–A762. [Google Scholar] [CrossRef]
  30. Rasmussen, C.E.; Williams, C.K. Gaussian Processes for Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006; Volume 1. [Google Scholar]
  31. Zhu, H.; Williams, C.K.; Rohwer, R.; Morciniec, M. Gaussian Regression and Optimal Finite Dimensional Linear Models; Neural Networks and Machine Learning; Springer: Berlin/Heidelberg, Germany, 1997. [Google Scholar]
  32. Hendrycks, D.; Gimpel, K. Gaussian Error Linear Units (Gelus). arXiv 2016, arXiv:1606.08415. [Google Scholar]
Figure 1. Graph of Sigmoid activation function.
Figure 2. Graph of ReLU activation function.
Figure 3. Graphs of radial basis functions of exponential types.
Figure 4. GRBF kernel.
Figure 5. GGRBF kernel with $\sigma = \frac{1}{2}$ and $\sigma_0 = \frac{1}{0.98}$.
Figure 6. GGRBF kernel with $\sigma = \frac{1}{2+\pi}$ and $\sigma_0 = 0$.
Figure 7. Polar plot of the orthonormal basis $e_n(z) := \sqrt{\frac{\sigma^{2n}}{n!\, F_{n,\hat\sigma,1}}}\, z^n$ given in (16).
Figure 8. Kernel regression of $f(x) = e^{-(1-9x^2)^4} + \tan x + x^{\frac{1}{6}} + \sin x\; \vartheta(n)$ with GRBF (L) and GGRBF (R).
Figure 9. Kernel regression of $f(x) = e^{\sin x}\sin(x^2) + \frac{2}{\pi}\,|x + \cos\vartheta(n)|$ with GRBF (L) and GGRBF (R).
Figure 10. SVM classifier (standard binary classification) via different kernels, namely (respectively) GGRBF (in top right), Sigmoid (in bottom left) and GRBF (in bottom right).
Figure 11. Training progress map for NN architecture with GGRBF as AF.
Figure 12. Training progress map for NN architecture with (29) as AF.
Figure 13. Accuracy of $96.33\%$ registered with GGRBF in (1) as a DCNN neural layer.
Figure 14. Accuracy of $91.87\%$ registered with (29) as a DCNN neural layer.
Figure 15. First seven Hermite polynomials. (a) Hermite polynomials $H_i(x)$ for $i = 0, 1, 2$; (b) Hermite polynomials $H_i(x)$ for $i = 3, 4, 5, 6$.
Figure 16. Graph of $\mathcal{H}_n(x)$ in (34) for $n = 0, 1, 2$ with $a = 0.091$ and $b = 0.81$.
Figure 17. Graph of $\mathcal{H}_n(x)$ in (34) for $n = 3, 4, 5, 6$ with $a = 0.091$ and $b = 0.81$.
Table 1. Performance summary for GGRBF with an accuracy of 97.88%.

Training with GGRBF (on Single CPU)

| Epoch | Iteration | Runtime (hh:mm:ss) | Accuracy (%) | Batch Loss | Learning Rate |
|---|---|---|---|---|---|
| 1 | 1 | 00:00:04 | 8.59 | 2.5329 | 0.0010 |
| 2 | 50 | 00:00:10 | 75.78 | 1.0087 | 0.0010 |
| 3 | 100 | 00:00:15 | 93.75 | 0.4377 | 0.0010 |
| 4 | 150 | 00:00:21 | 99.22 | 0.2452 | 0.0010 |
| 6 | 200 | 00:00:26 | 99.22 | 0.1790 | 0.0010 |
| 7 | 250 | 00:00:31 | 100.00 | 0.1028 | 0.0010 |
| 8 | 300 | 00:00:36 | 100.00 | 0.0681 | 0.0010 |
| 9 | 350 | 00:00:41 | 100.00 | 0.0458 | 0.0010 |
| 10 | 390 | 00:00:45 | 100.00 | 0.0382 | 0.0010 |
Table 2. Performance summary for (29) with an accuracy of 93.48%.

Training with (29) (on Single CPU)

| Epoch | Iteration | Runtime (hh:mm:ss) | Accuracy (%) | Batch Loss | Learning Rate |
|---|---|---|---|---|---|
| 1 | 1 | 00:00:08 | 8.59 | 2.9976 | 0.0010 |
| 2 | 50 | 00:00:15 | 78.91 | 0.6053 | 0.0010 |
| 3 | 100 | 00:00:19 | 86.72 | 0.4108 | 0.0010 |
| 4 | 150 | 00:00:25 | 90.62 | 0.3122 | 0.0010 |
| 6 | 200 | 00:00:30 | 96.88 | 0.1632 | 0.0010 |
| 7 | 250 | 00:00:36 | 99.22 | 0.0695 | 0.0010 |
| 8 | 300 | 00:00:41 | 99.22 | 0.0778 | 0.0010 |
| 9 | 350 | 00:00:46 | 100.00 | 0.0390 | 0.0010 |
| 10 | 390 | 00:00:49 | 100.00 | 0.0355 | 0.0010 |
Table 3. Compilation of technical results.

| AI Learning Architecture | Activation Function | Reference | Min. Error | Misclass. % | Accuracy % | Runtime (s) |
|---|---|---|---|---|---|---|
| Kernel Regression | GRBF | Figure 8 | 0.00230 | - | - | - |
| Kernel Regression | GGRBF | Figure 8 | 0.00097 | - | - | - |
| Kernel Regression | GRBF | Figure 9 | 0.00100 | - | - | - |
| Kernel Regression | GGRBF | Figure 9 | 0.00043 | - | - | - |
| SVM | GRBF | Figure 10 | - | 5.75 | 94.25 | - |
| SVM | Sigmoid | Figure 10 | - | 4.50 | 95.50 | - |
| SVM | GGRBF | Figure 10 | - | 3.75 | 96.25 | - |
| AF in NN | GGRBF | Table 1 | - | 2.12 | 97.88 | 0.43077 |
| AF in NN | αReLU | Table 2 | - | 6.52 | 93.48 | 1.8935 |
| DCNN | GGRBF | Figure 13 | - | 3.67 | 96.33 | - |
| DCNN | αReLU | Figure 14 | - | 8.13 | 91.87 | - |
Table 4. Performance summary for GGRBF in comparison with (37), with an accuracy of 98.18%.

Training with GGRBF (on a Single CPU)

| Epoch | Iteration | Runtime (hh:mm:ss) | Accuracy (%) | Batch Loss | Learning Rate |
|---|---|---|---|---|---|
| 1 | 1 | 00:00:00 | 11.72 | 2.5329 | 0.0010 |
| 2 | 50 | 00:00:04 | 72.66 | 1.0087 | 0.0010 |
| 3 | 100 | 00:00:09 | 84.38 | 0.4377 | 0.0010 |
| 4 | 150 | 00:00:13 | 89.06 | 0.2452 | 0.0010 |
| 6 | 200 | 00:00:18 | 98.44 | 0.1790 | 0.0010 |
| 7 | 250 | 00:00:23 | 100.00 | 0.1028 | 0.0010 |
| 8 | 300 | 00:00:27 | 100.00 | 0.0681 | 0.0010 |
| 9 | 350 | 00:00:32 | 100.00 | 0.0458 | 0.0010 |
| 11 | 400 | 00:00:36 | 100.00 | 0.0382 | 0.0010 |
| 12 | 450 | 00:00:41 | 100.00 | 0.0382 | 0.0010 |
| 13 | 500 | 00:00:45 | 100.00 | 0.0382 | 0.0010 |
| 14 | 546 | 00:00:50 | 100.00 | 0.0382 | 0.0010 |
Table 5. Performance summary for (37) in comparison with GGRBF, with an accuracy of 98.06%.

Training with (37) (on a Single CPU)

| Epoch | Iteration | Runtime (hh:mm:ss) | Accuracy (%) | Batch Loss | Learning Rate |
|---|---|---|---|---|---|
| 1 | 1 | 00:00:00 | 10.94 | 2.7556 | 0.0010 |
| 2 | 50 | 00:00:04 | 82.81 | 0.5065 | 0.0010 |
| 3 | 100 | 00:00:07 | 97.66 | 0.1923 | 0.0010 |
| 4 | 150 | 00:00:11 | 100.00 | 0.0738 | 0.0010 |
| 6 | 200 | 00:00:14 | 100.00 | 0.0446 | 0.0010 |
| 7 | 250 | 00:00:28 | 100.00 | 0.0262 | 0.0010 |
| 8 | 300 | 00:00:21 | 100.00 | 0.0211 | 0.0010 |
| 9 | 350 | 00:00:25 | 100.00 | 0.0173 | 0.0010 |
| 11 | 400 | 00:00:29 | 100.00 | 0.0099 | 0.0010 |
| 12 | 450 | 00:00:32 | 100.00 | 0.0085 | 0.0010 |
| 13 | 500 | 00:00:36 | 100.00 | 0.0085 | 0.0010 |
| 14 | 546 | 00:00:39 | 100.00 | 0.0068 | 0.0010 |