Applying the Optimal Injected Noise to Signal Estimation via Adaptive Stochastic Resonance

In signal estimation, the optimal estimator is frequently unachievable: its closed form may not be analytically tractable, or it may be too complex to implement. One can instead turn to suboptimal yet easily implementable estimators for practical signal estimation tasks. In this paper, a general noise-boosted estimator is designed, and the adaptive stochastic resonance method is implemented to simultaneously exploit the beneficial role of the injected noise and the learning ability of the estimator parameter. Aiming to effectively improve the estimation performance, we use the kernel function method to find an approximate solution for the probability density function (PDF) of the optimal injected noise. During this process, the noise PDF and the estimator parameter establish a finite-dimensional non-convex optimization space for maximizing the estimation performance, which is adaptively searched by the sequential quadratic programming algorithm at each iteration. Two representative signal estimation problems are explored: estimating a random signal from low-resolution observations, and estimating a deterministic parameter in a heavy-tailed noisy environment. The obtained results demonstrate that this adaptive stochastic resonance method can improve the performance of suboptimal estimators and bring it very close to that of the optimal estimator.


Introduction
It is well established that a suitable amount of noise can improve the performance of certain nonlinear systems, and such favorable use of injected noise is the so-called stochastic resonance phenomenon [1][2][3]. In recent years, stochastic resonance has attracted considerable interest in various research fields, such as physics [4][5][6][7], biomedicine [8], signal processing [9,10] and neural networks [11][12][13]. A difficult question, however, is how to characterize the optimal level, or better the optimal probability density function (PDF), of the noise to be purposefully injected into a given nonlinear process in order to maximize its performance. Most of the time the noise level alone is optimized, analytically or numerically, for a fixed given noise PDF [14][15][16][17][18]. Optimizing the noise PDF itself is more ambitious, but can a priori lead to better processor performance [10,[19][20][21]. However, analytic solutions for the optimal noise PDF are quite difficult to obtain in general; it must instead be approached numerically or by adaptive learning.
In order to adaptively search for a non-zero optimal noise level, Mitaim and Kosko proposed the adaptive stochastic resonance method by performing the stochastic gradient ascent algorithm on the signal-to-noise ratio of dynamical systems [2] and on the mutual information of neuron systems [22]. The random noise gradient can tune both the system parameters and the noise level in any adaptive process, and can thus improve the learning capability [2,22]. Inspired by the parameter-tuning stochastic resonance effect [23], adaptive stochastic resonance can also be achieved by searching for optimal parameters of dynamical systems and processing steps based on different optimization algorithms [24][25][26][27][28][29][30][31][32], such as particle swarm optimization [24], the genetic algorithm [25], and the artificial fish swarm algorithm [31,32]. Most recently, we proposed a noise-boosted backpropagation learning method in feedforward threshold neural networks [12,33], wherein the injected noise level as well as the network weights are adaptively learned by the stochastic gradient descent learning rule.
The stochastic gradient descent algorithm can learn to find stochastic resonance effects in some simple dynamical systems, but may fail to exploit the beneficial role of noise in complex nonlinear systems [2,18]. In particular, the adaptive learning of the optimal noise PDF for non-convex optimization problems has not been sufficiently discussed [18]. In this paper, we design a distributed parameter estimation strategy based on the adaptive stochastic resonance method, wherein the optimal injected noise PDF, along with the estimator parameters, is "intelligently" updated and searched by the sequential quadratic programming (SQP) optimization algorithm [34,35]. We consider two illustrative signal estimation problems: estimating a random signal from low-resolution observations, and estimating a deterministic parameter in a heavy-tailed noisy environment. It will be shown that, via the adaptive stochastic resonance method, the mean square error (MSE) of the designed estimator can be significantly reduced. The learning scheme converges to an approximately optimal injected noise PDF and to optimized estimator parameters. It is interesting to note that, using the optimal noise PDF, the designed estimator becomes almost identical to the optimal estimator; we can therefore endow a practically implementable estimator with essentially the same estimation accuracy as the optimal one. These results enlarge the applicability of adaptive stochastic resonance to signal estimation, and can be further investigated for other complex signal processing tasks.

Model and main results

Consider a general noise-boosted estimator, as shown in Fig. 1, in which $M$ identical nonlinear elements are boosted by the injection of mutually independent noise components $\eta_{mn}$ at time $n$, for $m = 1, 2, \cdots, M$. The injected noise components $\eta_{mn}$ are independent and identically distributed random variables with a common PDF $f_\eta$. The parallel elements of the designed estimator in Fig. 1 contain a tunable parameter $\gamma$. Let $\boldsymbol{\eta}_n = [\eta_{1n}, \eta_{2n}, \cdots, \eta_{Mn}]^\top$ denote the injected noise vector, $\mathbf{y}_n = [y_{1n}, y_{2n}, \cdots, y_{Mn}]^\top$ the output vector of the parallel elements, and $\mathbf{w}_n = [w_{1n}, w_{2n}, \cdots, w_{Mn}]^\top$ the weight vector. The noise-boosted estimate is computed as $\hat\theta = w_0 + \mathbf{w}_n^\top \mathbf{y}_n$. For estimating the signal parameter $\theta$, the adaptation process is oriented toward the minimization of the mean square error (MSE)

$$E = \mathrm{E}_{x,\eta}\big[(\hat\theta - \theta)^2\big] \qquad (1)$$

of the noise-boosted estimator, where $\mathrm{E}_{x,\eta}(\cdot)$ denotes the expectation with respect to the joint PDF $f_{x,\eta}$ of the variables $x$ and $\eta$. Note that, for a given noisy environment, the MSE of Eq. (1) depends on the PDF $f_\eta$ of the injected noise, the parameter $\gamma$ of the nonlinear elements, and the weights $w_0$ and $\mathbf{w}_n$. In practice, by differentiating Eq. (1) with respect to the weights, $[\partial E/\partial w_0, \partial E/\partial \mathbf{w}_n^\top]^\top$, and setting the derivative equal to zero, the optimum solution for $[w_0, \mathbf{w}_n^\top]^\top$ is easily obtained as the Wiener weight vector [36]. In order to minimize the MSE $E$ of Eq. (1), we focus here on the role of the injected noise and the tunable estimator parameter. Since the injected noise is not necessarily Gaussian, our main objective is to find the injected noise PDF that minimizes the MSE $E$ of Eq. (1) jointly over an arbitrary PDF $f_\eta$ and the adjustable estimator parameter $\gamma$,

$$\big(f_\eta^{\rm opt}, \gamma^{\rm opt}\big) = \arg\min_{f_\eta,\, \gamma} E. \qquad (2)$$
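As a concrete illustration of the architecture of Fig. 1, the following minimal Python sketch computes one estimate $\hat\theta = w_0 + \mathbf{w}_n^\top \mathbf{y}_n$. It assumes one-bit threshold elements and Gaussian injected noise purely for illustration; the element type, weights and noise level here are placeholders, not the learned quantities of the paper.

```python
import numpy as np

def noise_boosted_estimate(x, M, gamma, noise_sampler, w0, w, rng):
    """One estimate theta_hat = w0 + w^T y_n of the Fig. 1 architecture (sketch)."""
    eta = noise_sampler(M, rng)               # i.i.d. injected noise eta_mn ~ f_eta
    y = (x + eta >= gamma).astype(float)      # example: one-bit threshold elements g(.)
    return w0 + w @ y                         # linear readout theta_hat = w0 + w^T y_n

# Usage: M = 100 elements, Gaussian injected noise of level 0.5 (placeholder values)
rng = np.random.default_rng(0)
gauss = lambda M, rng: 0.5 * rng.standard_normal(M)
M = 100
theta_hat = noise_boosted_estimate(x=0.8, M=M, gamma=1.0, noise_sampler=gauss,
                                   w0=0.0, w=np.full(M, 1.0 / M), rng=rng)
print(theta_hat)
```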
The optimization problem for the optimal noise PDF $f_\eta^{\rm opt}(\eta)$ in Eq. (2) is usually analytically intractable, and we here search for an approximate optimal noise PDF described by

$$\hat f_\eta^{\rm opt}(\eta) = \sum_{k=1}^{K} \nu_k\, \varrho_k(\eta), \qquad (3)$$

with $K$ kernel functions $\varrho_k(\eta)$. Note that each $\varrho_k(\eta) \ge 0$ is itself a probability density function and satisfies the normalization condition $\int \varrho_k(\eta)\, d\eta = 1$ for $k = 1, 2, \cdots, K$, while the mixture coefficients satisfy $\nu_k \ge 0$ and $\sum_{k=1}^{K} \nu_k = 1$. As the number $K$ increases, the approximate form $\hat f_\eta^{\rm opt}$ converges to the analytical optimal noise PDF $f_\eta^{\rm opt}$ if it exists [37]. For instance, a popular and commonly used kernel is the Gaussian kernel [21,38] $\varrho_k(\eta) = \varphi((\eta - \mu_k)/\sigma_k)/\sigma_k$, with location parameter $\mu_k$, scale parameter $\sigma_k \ge 0$, and the standard Gaussian function $\varphi(x) = \exp(-x^2/2)/(2\pi)^{1/2}$. In this way, for a given $K$, the infinite-dimensional optimization problem of Eq. (2) reduces to a constrained finite-dimensional optimization over the parameter vector $\Theta = [\gamma, \boldsymbol{\nu}^\top, \boldsymbol{\mu}^\top, \boldsymbol{\sigma}^\top]^\top$, with coefficient vector $\boldsymbol{\nu} = [\nu_1, \nu_2, \cdots, \nu_K]^\top$, location vector $\boldsymbol{\mu} = [\mu_1, \mu_2, \cdots, \mu_K]^\top$ and scale vector $\boldsymbol{\sigma} = [\sigma_1, \sigma_2, \cdots, \sigma_K]^\top$. Substituting Eq. (3) into Eq. (1) yields the MSE as a function of the parameter vector,

$$E(\Theta) = \mathrm{E}_{x,\eta}\big[(\hat\theta - \theta)^2\big]\Big|_{f_\eta = \hat f_\eta^{\rm opt}}. \qquad (4)$$

Since the MSE $E(\Theta)$ of Eq. (4) is frequently non-convex in $\Theta$, the parameter vector is adaptively searched by the sequential quadratic programming method [34,35]. At each major iteration of the update of $\Theta$, an approximate positive-definite Hessian matrix of the Lagrangian function is computed via a quasi-Newton updating formula [35,37], and the quadratic programming subproblem then yields a search direction for a line search procedure [35,37]. A detailed description of the noise-boosted learning method is presented in Algorithm 1, which adaptively searches for the noise indicated by the (locally) optimal solution of $\hat f_\eta^{\rm opt}$.
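For intuition about the kernel parameterization of Eq. (3), the short sketch below draws samples from the Gaussian-mixture noise PDF for a given $(\boldsymbol{\nu}, \boldsymbol{\mu}, \boldsymbol{\sigma})$; the numerical values are illustrative placeholders, not learned parameters.

```python
import numpy as np

def sample_mixture_noise(n, nu, mu, sigma, rng):
    """Draw n i.i.d. samples from the kernel mixture of Eq. (3),
    f_eta(eta) = sum_k nu_k * N(eta; mu_k, sigma_k^2), with Gaussian kernels.
    nu, mu, sigma are the K-dimensional sub-vectors of Theta."""
    k = rng.choice(len(nu), size=n, p=nu)           # pick a kernel index per sample
    return mu[k] + sigma[k] * rng.standard_normal(n)

# K = 4 kernels; placeholder parameter values satisfying nu_k >= 0, sum nu_k = 1
rng = np.random.default_rng(1)
nu = np.array([0.4, 0.3, 0.2, 0.1])
mu = np.array([-1.0, -0.3, 0.3, 1.0])
sigma = np.array([0.5, 0.4, 0.4, 0.5])
eta = sample_mixture_noise(10_000, nu, mu, sigma, rng)
```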

Algorithm 1 Adaptive Noise-boosted Learning Algorithm
Consider the minimization of the MSE $E(\Theta)$ of Eq. (4). A quadratic programming (QP) subproblem for updating the parameter vector $\Theta_t$ at the $t$-th iteration is developed as

$$\min_{\mathbf{q}} \ \nabla E_t^\top \mathbf{q} + \frac{1}{2}\, \mathbf{q}^\top H_t\, \mathbf{q}, \quad \text{subject to} \quad \nu_k \ge 0, \ \ \sum_{k=1}^{K} \nu_k = 1, \ \ \sigma_k \ge 0, \qquad (5)$$

where $\nabla$ is the gradient operator and $H_t$ denotes an approximation of the Hessian matrix of the Lagrangian function.

Initialize $\Theta_0$ and $H_0$, and set $t = 0$;
repeat
    Evaluate $E_t$ and the gradient $\nabla E_t$;
    Solve Eq. (5) to obtain the SQP step $\mathbf{q}_t$;
    Calculate the step length $\beta_t$ by the line search method;
    Update $\Theta_{t+1} = \Theta_t + \beta_t \mathbf{q}_t$;
    Update $H_t$ using a quasi-Newton formula to obtain $H_{t+1}$;
    Set $t \leftarrow t + 1$;
until the convergence test is satisfied.
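A compact way to reproduce the spirit of Algorithm 1 with off-the-shelf tools is to hand the constrained objective $E(\Theta)$ to an SQP-type solver. The sketch below uses SciPy's SLSQP routine (a readily available sequential least squares QP method, standing in for the quasi-Newton SQP described above) on a toy Monte Carlo MSE; the objective, readout and all numerical values are simplified assumptions rather than the closed-form $E(\Theta)$ of Eq. (4).

```python
import numpy as np
from scipy.optimize import minimize

K = 4  # number of Gaussian kernels

def unpack(theta):
    # Theta = [gamma, nu_1..nu_K, mu_1..mu_K, sigma_1..sigma_K]
    return theta[0], theta[1:1+K], theta[1+K:1+2*K], theta[1+2*K:]

def mse(theta, mc=20_000, seed=0):
    """Monte Carlo stand-in for E(Theta): a toy one-bit estimation problem.
    A fixed seed gives common random numbers, so finite-difference gradients
    used by the solver remain meaningful."""
    gamma, nu, mu, sigma = unpack(theta)
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 2.0, mc)                 # toy signal to estimate
    k = rng.choice(K, size=mc, p=nu / nu.sum())   # kernel index per sample
    eta = mu[k] + sigma[k] * rng.standard_normal(mc)
    y = (x + eta >= gamma).astype(float)          # one-bit element output
    theta_hat = 2.0 * y                           # crude linear readout (placeholder)
    return np.mean((theta_hat - x) ** 2)

theta0 = np.concatenate(([1.0], np.full(K, 1 / K),
                         np.linspace(-1, 1, K), np.full(K, 0.5)))
cons = [{"type": "eq", "fun": lambda t: t[1:1+K].sum() - 1.0}]     # sum nu_k = 1
bnds = [(-5, 5)] + [(0, 1)] * K + [(-5, 5)] * K + [(1e-3, 5)] * K  # nu_k, sigma_k >= 0
res = minimize(mse, theta0, method="SLSQP", bounds=bnds, constraints=cons)
print(res.fun, res.x)
```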
In the following sections, we utilize the proposed adaptive noise-boosted Algorithm 1 to illustrate the Bayesian estimation of a random parameter and the robust estimation of a deterministic signal. The adaptive learning capacity of the injected noise will be clearly demonstrated.

Random parameter estimation via adaptive stochastic resonance
First, we consider the problem of estimating a random parameter $\theta$ from the signal model $x = \theta + \xi$. As a motivating example, the random parameter $\theta$, independent of the background noise $\xi$, has a uniform prior PDF $f_\theta(x) = 1/a$ for $x \in [0, a]$ and zero otherwise, while the background noise $\xi$ has the generalized Gaussian PDF

$$f_\xi(x) = \frac{\alpha}{2 \sigma_\xi \Gamma(1/\alpha)} \exp\!\left(-\Big|\frac{x}{\sigma_\xi}\Big|^{\alpha}\right), \qquad (6)$$

where $\Gamma(\cdot)$ is the gamma function, and the shape parameter $\alpha$ and the scale parameter $\sigma_\xi$ are both positive [39,40]. As illustrated in Fig. 1, the $M$ identical quantizers have the transfer function $g(x) = 1$ for $x \ge \gamma$ and zero otherwise. Such low-resolution sensors are widely employed in distributed estimation problems, where the analog observation is compressed into a single bit of information occupying little bandwidth.
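For completeness, here is a small sketch of the one-bit sensing model. It assumes the standard generalized Gaussian parameterization of Eq. (6), which is also the form implemented by scipy.stats.gennorm, and Gaussian injected noise; the numerical settings are illustrative.

```python
import numpy as np
from scipy import stats

def one_bit_observations(theta, M, gamma, sigma_xi, alpha, sigma_eta, rng):
    """Single-bit sensor outputs y_m = g(x + eta_m) for the model of Fig. 1.
    Background noise: generalized Gaussian of Eq. (6) (scipy's gennorm uses
    the same parameterization); injected noise: Gaussian of level sigma_eta."""
    xi = stats.gennorm(alpha, scale=sigma_xi).rvs(random_state=rng)  # background noise
    x = theta + xi                                # shared analog observation
    eta = sigma_eta * rng.standard_normal(M)      # i.i.d. injected noise components
    return (x + eta >= gamma).astype(float)       # M one-bit quantizer outputs

rng = np.random.default_rng(2)
theta = rng.uniform(0.0, 2.0)                     # uniform prior on [0, a] with a = 2
y = one_bit_observations(theta, M=100, gamma=1.0, sigma_xi=1.0,
                         alpha=1.0, sigma_eta=0.5584, rng=rng)  # alpha = 1: Laplacian
```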
Since the estimator $\hat\theta = w_0 + \mathbf{w}^\top \mathbf{y}_n$ is designed to be unbiased, the bias weight $w_0$ is fixed by the unbiasedness condition. It can be proved that, for a sufficiently large number $M$ of identical transfer functions, the optimum weight coefficients minimizing the MSE are given by the Wiener solution [21,38]. The tedious manipulation of the Wiener weight $\mathbf{w}$ is omitted here for simplicity, and the corresponding MSE $E$ is derived in Eq. (7) [21,38], where $\mathrm{E}_\theta(\theta)$ and $\mathrm{var}(\theta)$ denote the mean and the variance of the random parameter $\theta$, respectively, and $\bar g(x) = \mathrm{E}_\eta[g(x + \eta)]$ is the noise-smoothed transfer function. In order to learn the optimal PDF of the injected noise that minimizes the MSE $E$ of Eq. (7), we employ the Gaussian kernel functions $\varrho_k(\eta) = \varphi((\eta - \mu_k)/\sigma_k)/\sigma_k$ in the implementation of the proposed learning Algorithm 1. For comparison, the minimum MSE (MMSE) $E_{\rm ms}$ achieved by the MMSE estimator $\hat\theta_{\rm ms} = \mathrm{E}(\theta|x) = \int \theta f(\theta|x)\, d\theta$ is also calculated, where the conditional posterior PDF $f(\theta|x)$ follows from Bayes' rule [41].

For instance, consider the interval bound $a = 2$ of the uniform prior of $\theta$ and Laplacian background noise with the PDF of Eq. (6) for $\alpha = 1$ and $\sigma_\xi = 1$. Choosing $K = 4$ Gaussian kernel functions and running the proposed adaptive noise-boosted Algorithm 1, the four kernels finally converge to a single Gaussian kernel. Figures 2(a)-(c) show the learning curves of the MSE $E$ of Eq. (7), the threshold parameter $\gamma$ of the quantizers, and the level $\sigma_\eta$ of the injected noise. After 30 iterations, the quantizer threshold $\gamma$ and the noise level $\sigma_\eta$ converge to 1 and 0.5584, respectively, as shown in Figs. 2(b) and (c). It is also illustrated in Fig. 2(a) that the MSE $E$ of Eq. (7) is optimized to 0.2287 at the optimal noise level 0.5584, which almost attains the minimum MSE 0.2285 (blue dotted line) achieved by the MMSE estimator $\hat\theta_{\rm ms}$. The comparison of the transfer functions of the MMSE estimator $\hat\theta_{\rm ms}$ and of the designed estimator $\hat\theta$ at the optimum $\sigma_\eta^{\rm opt} = 0.5584$, shown in Fig. 2(d), demonstrates that adding noise is advantageous for this example: with the learned noise level $\sigma_\eta = 0.5584$ and threshold $\gamma = 1$, the transfer function of the noise-boosted estimator $\hat\theta$, far better than that of the quantizer without injected noise, closely approximates that of the MMSE estimator $\hat\theta_{\rm ms}$ over the domain of interest $[-2, 2]$.
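Two quantities in this comparison have simple numerical forms under mild assumptions: for zero-mean Gaussian injected noise the noise-smoothed transfer function of the threshold quantizer is exactly $\bar g(x) = \Phi((x - \gamma)/\sigma_\eta)$, and for the Laplacian case ($\alpha = 1$) the MMSE estimator $\mathrm{E}(\theta|x)$ can be evaluated by direct quadrature. The sketch below computes both; it reproduces the setting of Fig. 2(d) in spirit only, with the quadrature grid as an assumption.

```python
import numpy as np
from scipy.stats import norm

a, gamma, sigma_eta = 2.0, 1.0, 0.5584     # values learned in Fig. 2 (a = 2, alpha = 1)

def g_bar(x):
    """Noise-smoothed transfer function g_bar(x) = E_eta[g(x + eta)].
    For the threshold quantizer under zero-mean Gaussian injected noise this is
    exactly the Gaussian CDF Phi((x - gamma)/sigma_eta)."""
    return norm.cdf((x - gamma) / sigma_eta)

def theta_mmse(x, sigma_xi=1.0, n=4001):
    """MMSE estimator E(theta | x) for the uniform prior on [0, a] and Laplacian
    background noise, by numerical quadrature on a uniform grid; normalization
    constants of the prior and likelihood cancel in the ratio."""
    t = np.linspace(0.0, a, n)
    lik = np.exp(-np.abs(x - t) / sigma_xi)   # Laplacian likelihood f_xi(x - theta)
    return np.sum(t * lik) / np.sum(lik)      # posterior mean on the grid

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, round(float(theta_mmse(x)), 3), round(float(g_bar(x)), 3))
```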
The MSE surface of the noise-boosted estimator $\hat\theta$ versus the noise level $\sigma_\eta$ and the threshold parameter $\gamma$ is plotted in Fig. 3(a). The non-convex MSE of the designed estimator $\hat\theta$ is clearly higher as $\sigma_\eta \to 0$, which indicates that the optimal noise level is nonzero; the injection of noise is therefore necessary for reducing the MSE of the designed estimator $\hat\theta$. For a randomly selected initial vector $\Theta$, the learning trajectory of the MSE of Fig. 2(a) is also shown in Fig. 3(a), with its projection given in the contour map of Fig. 3(b). The convergence of the MSE confirms the preceding analysis and the feasibility of the proposed adaptive noise-boosted Algorithm 1.
Furthermore, for a wide range of non-Gaussian environments, Fig. 4(a) shows the minimum MSE achieved by the MMSE estimator $\hat\theta_{\rm ms}$ (blue dotted line) as the benchmark. The MSEs of the designed estimator $\hat\theta$ without injected noise (•) and with the optimal noise (▽) are also plotted versus the background noise type, described by the shape parameter $\alpha$ of Eq. (6). It is clearly seen in Fig. 4(a) that, with the assistance of the optimal injected noise, the MSE of the designed estimator attains a performance very similar to that of the MMSE estimator $\hat\theta_{\rm ms}$ over a wide range of background noise types. In addition, as shown in Fig. 4(b), the MSEs of the three estimators considered are compared versus the bound parameter $a$ of the uniformly distributed $\theta$. Upon increasing $a$ from 0 to 8, the MSE curve of the designed estimator with the optimal injected noise (▽) almost always coincides with the minimum MSE achieved by the MMSE estimator.

Robust estimation via adaptive stochastic resonance
The proposed adaptive noise-boosted Algorithm 1 can also be applied to the robust estimation of a deterministic signal $\theta$. Many areas of engineering, for instance mobile communication channels, produce outliers or atypical observations that do not obey the nominal model assumptions, so robust estimation techniques that are insensitive to outliers are greatly needed. Consider the signal model $x_n = \theta + \xi_n$, denoting a deterministic signal $\theta$ corrupted by heavy-tailed background noise observations $\xi_n$ for $n = 1, 2, \cdots, N$. Here, the Cauchy noise $\xi_n$, as an example of heavy-tailed background noise, has the common PDF $f_\xi(x) = \sigma_\xi/[\pi(\sigma_\xi^2 + x^2)]$ with zero location parameter and scale $\sigma_\xi$ (the mean of the Cauchy distribution is undefined). The parallel elements in Fig. 1 are chosen as the bisquare maximum-likelihood-type estimator (M-estimator) [42,43], which has the score function $\psi(x) = x\,[1 - (x/\gamma)^2]^2$ for $|x| \le \gamma$ and zero otherwise, where the parameter $\gamma$ is adjustable. Of course, other heavy-tailed noise types and M-estimators can be employed to verify the practicability of the proposed adaptive noise-boosted Algorithm 1 for improving the performance of the M-estimator.
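For reference, the bisquare score function and a classical (noiseless) M-estimation loop look as follows. The iteratively reweighted fixed-point iteration is a standard way to solve $\sum_n \psi(x_n - \hat\theta) = 0$ and is an assumption here, not the paper's noise-boosted implementation.

```python
import numpy as np

def psi_bisquare(x, gamma):
    """Bisquare score: psi(x) = x (1 - (x/gamma)^2)^2 for |x| <= gamma, else 0.
    Observations farther than gamma from the current estimate get zero weight,
    which makes the M-estimator insensitive to heavy-tailed outliers."""
    u = x / gamma
    return np.where(np.abs(u) <= 1.0, x * (1.0 - u**2) ** 2, 0.0)

def m_estimate(x, gamma, iters=50):
    """Classical bisquare M-estimate via iteratively reweighted averaging,
    using the weight w(x) = psi(x)/x and a robust (median) starting point."""
    th = np.median(x)
    for _ in range(iters):
        r = x - th
        w = np.where(np.abs(r) <= gamma, (1.0 - (r / gamma) ** 2) ** 2, 0.0)
        th = np.sum(w * x) / np.sum(w)   # fixed-point update: weighted mean
    return th

rng = np.random.default_rng(3)
x = 1.0 + rng.standard_cauchy(1000)      # theta = 1 buried in Cauchy noise, sigma_xi = 1
print(m_estimate(x, gamma=2.5))
```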
For this robust estimation of the deterministic signal $\theta$ buried in heavy-tailed background noise, the Fisher-consistency condition $\mathrm{E}_{\xi,\eta}[\psi(\xi + \eta)] = 0$ needs to be satisfied. Therefore, we here select the bias $w_0 = 0$ and the weights $w_{mn} = 1$. Then, for a sufficiently large number $M$ of elements, we proved in [44] that the designed estimator $\hat\theta$ in Fig. 1 converges to the true signal $\theta$. With a sufficiently large number $M$ of injected noise components $\eta_{mn}$, the score function $\psi(\xi + \eta)$ is boosted into $\phi(\xi) = \mathrm{E}_\eta[\psi(\xi + \eta)]$, with derivative $\phi'(\xi) = d\mathrm{E}_\eta[\psi(\xi + \eta)]/d\xi$. Then, by taking the first-order Taylor expansion of $\phi$ around the true signal $\theta$, the MSE (variance) of the designed estimator $\hat\theta$ can be calculated as [44]

$$E = \frac{\mathrm{E}_\xi[\phi^2(\xi)]}{N\, [\mathrm{E}_\xi(\phi'(\xi))]^2} \qquad (8)$$

$$\ \ge\ \frac{1}{N\, J(f_\xi)}, \qquad (9)$$

where the inequality of Eq. (9) holds according to the Cauchy-Schwarz inequality. The Cramér-Rao lower bound $(N J(f_\xi))^{-1}$ of Eq. (9) is achieved by the score function $\psi_{\rm ML}(x) = -f_\xi'(x)/f_\xi(x)$ of the maximum-likelihood estimator (MLE) [42,43]. Here, $J(f_\xi) = \int [f_\xi'(x)]^2/f_\xi(x)\, dx$ is the Fisher information of the noise PDF $f_\xi$.
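The Fisher information of the Cauchy PDF has the known closed form $J(f_\xi) = 1/(2\sigma_\xi^2)$, so the bound of Eq. (9) is $2\sigma_\xi^2/N$; for $\sigma_\xi = 1$ this is consistent with the bound of 2 quoted in the next paragraph if the variance there is normalized by the sample size $N$ (an assumption about the figure's normalization). A quick numerical check by quadrature, together with the ML score $\psi_{\rm ML}$, is sketched below.

```python
import numpy as np
from scipy.integrate import quad

def fisher_information_cauchy(sigma_xi=1.0):
    """Numerically verify J(f_xi) = integral of (f')^2 / f for the Cauchy PDF
    f_xi(x) = sigma_xi / (pi (sigma_xi^2 + x^2)); the known value is 1/(2 sigma_xi^2)."""
    f = lambda x: sigma_xi / (np.pi * (sigma_xi**2 + x**2))
    fp = lambda x: -2.0 * sigma_xi * x / (np.pi * (sigma_xi**2 + x**2) ** 2)
    val, _ = quad(lambda x: fp(x) ** 2 / f(x), -np.inf, np.inf)
    return val

print(fisher_information_cauchy(1.0))   # ~0.5, i.e. 1/(2 sigma_xi^2)

# The ML score for Cauchy noise is psi_ML(x) = -f'(x)/f(x) = 2x / (sigma_xi^2 + x^2),
# the redescending shape that the noise-smoothed phi(x) approximates in Fig. 5(d).
```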
For a given form of the score function $\psi$ with tunable parameter $\gamma$, the parameter $\gamma$ can be optimized to minimize the MSE of the M-estimator. We proved in [44] that the addition of extra noise, along with the adjustment of the estimator parameter $\gamma$, can bring a further reduction of the MSE of the M-estimator. Therefore, we here search for the optimal noise PDF $f_\eta^{\rm opt}$ that minimizes the MSE of the designed estimator $\hat\theta$ in Eq. (8) under the Fisher-consistency constraint. In order to satisfy the condition $\mathrm{E}_{\xi,\eta}[\psi(\xi + \eta)] = 0$, we adopt in Eq. (3) the symmetric double-Gaussian kernel $\varrho_k(\eta) = [\varphi((\eta + \mu_k)/\sigma_k) + \varphi((\eta - \mu_k)/\sigma_k)]/(2\sigma_k)$. Moreover, to simplify the calculation of the MSE of the noise-boosted estimator $\hat\theta$ in Fig. 1, we let the scale parameter $\sigma_k \to 0$, whereupon the kernel reduces to $\varrho_k(\eta) = [\delta(\eta + \mu_k) + \delta(\eta - \mu_k)]/2$, with $\delta(\cdot)$ the Dirac delta function. The noise-smoothed score then becomes the finite sum $\phi(\xi) = \sum_{k=1}^{K} \nu_k [\psi(\xi + \mu_k) + \psi(\xi - \mu_k)]/2$.
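Under this delta-kernel limit, the expectation $\phi(\xi) = \mathrm{E}_\eta[\psi(\xi + \eta)]$ collapses to the finite sum given above, which keeps the objective of Eq. (8) cheap to evaluate inside Algorithm 1. A minimal sketch, assuming the bisquare $\psi$ and hypothetical parameter values:

```python
import numpy as np

def phi_smoothed(xi, nu, mu, gamma):
    """Noise-smoothed score phi(xi) = E_eta[psi(xi + eta)] under the delta-kernel
    noise PDF f_eta(eta) = sum_k nu_k [delta(eta + mu_k) + delta(eta - mu_k)]/2:
    the expectation collapses to a finite, symmetric sum over the kernels."""
    def psi(x):                                    # bisquare score, support |x| <= gamma
        u = x / gamma
        return np.where(np.abs(u) <= 1.0, x * (1.0 - u**2) ** 2, 0.0)
    xi = np.asarray(xi, dtype=float)
    out = np.zeros_like(xi)
    for nu_k, mu_k in zip(nu, mu):
        out += nu_k * 0.5 * (psi(xi + mu_k) + psi(xi - mu_k))
    return out

# phi is odd by construction, so the Fisher-consistency condition
# E_{xi,eta}[psi(xi + eta)] = E_xi[phi(xi)] = 0 holds for any symmetric f_xi.
nu = [0.5, 0.3, 0.15, 0.05]; mu = [0.2, 0.8, 1.5, 2.2]   # hypothetical values
print(phi_smoothed([-1.0, 0.0, 1.0], nu, mu, gamma=2.5))
```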
For a given number $K = 4$ of kernel functions, the adaptation of the parameter vector $\Theta = [\gamma, \boldsymbol{\nu}^\top, \boldsymbol{\mu}^\top]^\top$ is realized by the proposed adaptive noise-boosted Algorithm 1, starting from an initial parameter vector $\Theta_0$ with $\gamma = 2.5$. It is shown in Fig. 5 that, as the M-estimator parameter $\gamma$ and the noise parameters are adaptively learned, the variance $E$ of the designed estimator $\hat\theta$ is reduced to the optimum value 2.008, which is very close to the Cramér-Rao lower bound of 2 (blue dotted line in Fig. 5(a)) attained by the MLE. In Fig. 5(d), the normalized score functions are illustrated: $\bar\psi_{\rm ML}(x) = \psi_{\rm ML}(x)/\|\psi_{\rm ML}(x)\|$ for the MLE, $\bar\phi(x) = \phi(x)/\|\phi(x)\|$ for the designed estimator $\hat\theta$ with the optimal noise, and $\bar\psi(x) = \psi(x)/\|\psi(x)\|$ for the bisquare M-estimator. Here, $\|g(x)\| = [\int g^2(x)\, dx]^{1/2}$ denotes the $L_2$-norm of a function $g(x)$. As shown in Fig. 5(d), the score function of the designed estimator $\hat\theta$ with the optimal noise is very similar to that of the MLE.
The boosting role of the injected noise is thus clearly manifested again in the M-estimator for robust estimation. In these results, the scale parameter of the background Cauchy noise is $\sigma_\xi = 1$ and the number of kernel functions is $K = 4$.

Conclusion
In this paper, a noise-boosted learning algorithm is implemented to explore noise benefits in estimating deterministic and random signal parameters. The injected noise, as well as the estimator parameter, is adaptively learned to improve the performance of the designed estimator. It is shown that, through the learning process, the performance of the designed estimator can be brought very close to that of the optimal estimator (see Fig. 2(d) and Fig. 5(d)). For random signal estimation with prior knowledge, the MSE of the designed estimator based on low-resolution observations adaptively converges to the minimum MSE achieved by the MMSE estimator using the full analog observations. For the estimation of a deterministic signal buried in heavy-tailed background noise, the MSE of the designed estimator almost reaches the Cramér-Rao lower bound upon injection of the optimal noise. The beneficial effect of the injected noise is thus adaptively realized by the proposed noise-boosted algorithm in signal estimation, and the effectiveness of this adaptive learning process deserves intensive study in various other complex signal processing tasks.

Several open questions remain for future studies. The proposed adaptive learning algorithm relies on the Jacobian and Hessian matrices of the MSE of the designed estimator, so a theoretical expression of the MSE must be available; when this expression is too complicated to obtain, the learning algorithm is inapplicable. In that situation, an open and interesting question is how to adaptively learn the optimal noise to be injected into the observed data. In addition, in many practical signal processing tasks the observed data are acquired sequentially as time progresses, and a sequential implementation that adaptively chooses and injects the optimal noise over time will be of great interest.
Acknowledgments. This work is sponsored by the National Natural Science Foundation of China under Grant No. 62001271.
Data availability statement. The datasets analysed during the current study are available from the corresponding author on reasonable request.

Declarations
• Funding: This study was funded by the National Natural Science Foundation of China (Grant No. 62001271).
• Conflict of interest: The authors declare that they have no conflict of interest.
• Consent to participate: Not applicable.
• Consent for publication: Not applicable.
• Availability of data and materials: The datasets analysed during the current study are available from the corresponding author on reasonable request.
• Code availability: The code generated or used during the study is available from the corresponding author on reasonable request.
• Authors' contributions: Material preparation, formal analysis and methodology were performed by Yan Pan. The first draft of the manuscript was written by Yan Pan, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.