1 Introduction

Adaptive filters are frequently employed to handle filtering problems in which the statistics of the underlying signals are either unknown a priori or, in some cases, slowly varying. Many adaptive filtering algorithms have been proposed, and they are usually variants of the well-known least mean square (LMS) [1] and recursive least squares (RLS) [2] algorithms. An important variant of the LMS algorithm is the normalized least mean square (NLMS) algorithm [3, 4], where the step size is normalized with respect to the energy of the input vector. Due to their numerical stability and computational simplicity, the LMS and NLMS algorithms have been widely used in various applications [5].

Other studies on the variants of the LMS and NLMS algorithms have also been very active, among which their robust counterparts in impulsive noise environments are of great practical interest. This is because their performance deteriorates significantly when the additive noise is of an impulsive nature [6, 7], as both the LMS and NLMS algorithms assume that the additive noise is Gaussian distributed. Considerable effort has been devoted to combating this adverse effect of impulsive noise on various adaptive filtering algorithms [7–12]. Among them, the least mean M-estimate (LMM) [7] algorithm and its step size normalized version, the normalized least mean M-estimate (NLMM) algorithm, are two efficient generalizations of the LMS family. Like the recursive least M-estimate (RLM) algorithm [12], which is a robust variant of the RLS algorithm, they are derived from the concept of robust statistics [13], where an M-estimate function [13] is minimized instead of the MSE to improve the robustness to impulsive noise. A common and important feature of these robust algorithms is the use of an error nonlinearity to suppress the adverse effect of impulsive errors with large amplitude.

Because of the importance of these adaptive filtering algorithms, their convergence performance analyses have been long-standing research problems. The convergence behavior of the LMS algorithm for Gaussian inputs was thoroughly studied in the classical work of Widrow et al. [1], in which the widely used concept of the independence assumption was first introduced. Other related studies of the LMS algorithm with independent Gaussian inputs include [14–16]. On the other hand, the NLMS algorithm generally possesses an improved convergence speed over the LMS algorithm, but its analysis is more complicated due to the step size normalization. In [17] and [18], the mean and mean square behaviors of the NLMS algorithm for Gaussian inputs were studied. The analysis for independent Gaussian inputs in [19] also revealed some of the advantages of the NLMS algorithm over the LMS algorithm. Due to the difficulties in evaluating the expectations involved in the difference equations for the mean weight-error vector and its covariance matrix, and hence in deriving general expressions for these equations, the works in [17] and [18] concentrated only on certain special cases of the eigenvalue distribution of the input autocorrelation matrix. In [20] and [21], simplified input data models were introduced to facilitate the analysis so that useful analytical expressions could still be derived. In [22–24], the averaging principle was invoked to simplify the expectations involved in the difference equations. Basically, the normalization term is assumed to vary slowly with respect to the input correlation term, and the power of the input vector is either assumed to be chi-square distributed with L degrees of freedom [22] or evaluated by integration [24]. Recently, Sayed et al. [25] proposed a unified framework based on the concept of energy conservation for analyzing the convergence of adaptive filtering algorithms. It has been applied to different adaptive filtering algorithms with satisfactory results [26, 27]. Other related works for the normalized sign-sign algorithm [28] and the NLMS algorithm in a Gaussian noise environment with white input signals can be found in [29].

The convergence behavior of the RLM algorithm in contaminated Gaussian (CG) noise [30] using the modified Huber (MH) error nonlinearity was studied in [12]. However, a detailed analysis of the LMM and NLMM algorithms in both Gaussian and CG additive noise is still unavailable, probably due to the complicated error nonlinearity and robust statistics involved, and the difficulty in analyzing the NLMS algorithm as mentioned above.

In this paper, we study the performance of a class of LMS and NLMS algorithms with error nonlinearities for Gaussian inputs and additive Gaussian as well as CG noise. In particular, the LMM and NLMM algorithms with the MH error nonlinearity and adaptive threshold selection (ATS) will be studied in detail. The class also includes a variety of other interesting algorithms, and the analysis provides valuable insights into the advantages of using ATS and the effectiveness of the LMM/NLMM algorithms in impulsive noise environments. The analysis is divided into two main parts:

  (1)

    We extend the framework of [17] and [18] to analyze the NLMS algorithm with a class of error nonlinearities, which we call the M-nonlinearity, in Gaussian noise. This class includes most M-estimate functions and several other commonly used error nonlinearities. Using the conventional Price’s theorem [31–33], new decoupled difference equations describing the mean and mean square convergence behaviors are derived. The final results closely resemble the classical results for the LMS algorithm in [1] and can easily be reduced to related works such as [34, 35]. Moreover, it is found that the nonlinearity will in general slow down the adaptation rate, and that the normalization always improves the maximum convergence rate of the NLMS algorithms over their LMS counterparts if the eigenvalues of the input autocorrelation matrix are unequal. The convergence performances of the LMM and NLMM algorithms are studied in detail. The theoretical and practical importance of ATS is also explained and analyzed.

  (2)

    Instead of using Price’s theorem for Gaussian variates as an approximation, as was done in analyzing the RLM algorithm with the MH nonlinearity in CG noise [12], we show that it is also applicable to Gaussian mixtures with very slight modification. In fact, Price’s theorem is applicable to each independent Gaussian component, as pointed out without proof in a short note by Price [36]. Using this result and those in (1) above, the mean and mean square convergence behaviors of the LMS and NLMS algorithms with the M-nonlinearity in CG noise are derived. The LMM and NLMM algorithms are then analyzed in detail. The analytical results suggest that the M-nonlinearity with ATS helps to suppress the impulsive noise in exchange for a slightly slower adaptation rate than that in the Gaussian noise case.

Another key to the above analysis is the introduction of certain special functions called the generalized Abelian integral functions [37], which are generalizations of the Abelian integral functions [38]. They allow us to obtain decoupled analytical formulas by evaluating the expectations involved in the difference equations, and they help us interpret the convergence behaviors of the NLMS algorithms with the M-nonlinearity. In particular, these analytical results reduce to those for the conventional NLMS algorithm in additive Gaussian noise, which agree with [17, 18], except that new expressions for the excess mean square error (EMSE), the stability bound and the difference equations in terms of the generalized Abelian integrals are obtained. For clarity of presentation and because of its practical importance, this particular case is treated separately in the companion paper [37]. All the above results can also be readily specialized to the simpler LMS case. Moreover, the new results agree with those for the conventional LMS algorithm [1] and the LMS algorithms with the dual-sign (DS) nonlinearity [34] and the error-function (EF) nonlinearity [35]. Monte Carlo simulation results confirm that the NLMM algorithm offers improved robustness to impulsive noise over the NLMS algorithm, and they are in good agreement with the theoretical analysis.

The rest of this paper is organized as follows: In Section 2, the NLMS algorithm is briefly reviewed and the NLMM algorithm is introduced. In Section 3, the proposed convergence performance analysis is presented. Simulation results are given in Section 4 and conclusions are drawn in Section 5.

2 Normalized Least Mean M-Estimate Algorithm

2.1 The NLMS Algorithm

Consider the adaptive system identification problem in Fig. 1, where an adaptive filter with coefficient or weight vector of order L, \( W(n) = {\left[ {{w_1}(n),{w_2}(n), \cdots, {w_L}(n)} \right]^T} \), is used to model an unknown system with impulse response \( W* = {\left[ {{w_1},{w_2}, \cdots, {w_L}} \right]^T} \). Here, \( {\left( \cdot \right)^T} \) denotes the transpose of a vector or a matrix. The unknown system and the adaptive filter are simultaneously excited by the same input x(n). The output of the unknown system \( {d_0}(n) \) is assumed to be corrupted by a measurement noise \( {\eta_0}(n) \) to form the desired signal d(n) for the adaptive filter. The estimation error is given by \( e(n) = d(n) - y(n) \), where y(n) is the output of the adaptive filter. In the LMS algorithm, the MSE is minimized by updating the weight vector in the negative direction of the instantaneous gradient of the MSE with respect to the weight vector, −2e(n)X(n). This gives

$$ W\left( {n + 1} \right) = W(n) + \mu e(n)X(n), $$
(1)

where \( e(n) = d(n) - {W^T}(n)X(n) \), \( X(n) = {\left[ {x(n),x\left( {n - 1} \right), \ldots, \,x\left( {n - L + 1} \right)} \right]^T} \) is the input vector at time instant n, and μ is a constant step size parameter chosen to reduce the gradient noise and to control the convergence rate and steady state error of the algorithm. In the NLMS algorithm, the step size is normalized by the energy of the input vector, \( {X^T}(n)X(n) \), which gives:

$$ W\left( {n + 1} \right) = W(n) + \frac{{\mu e(n)X(n)}}{{\varepsilon + {X^T}(n)X(n)}}. $$
(2)
Figure 1 Adaptive system identification.

The step size μ in the NLMS algorithm is a positive constant which should be chosen in the range 0 < μ < 2 to ensure convergence of the algorithm. ε is a small positive value used to avoid division by zero.
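
For illustration, a minimal sketch of one NLMS iteration in (2) is given below (Python; the function name and the default values of μ and ε are assumptions for illustration only):

```python
import numpy as np

def nlms_update(W, X, d, mu=0.5, eps=1e-6):
    """One NLMS iteration as in Eq. (2)."""
    e = d - W @ X                        # a priori estimation error e(n)
    W = W + mu * e * X / (eps + X @ X)   # step size normalized by the input energy
    return W, e
```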

When the additive noise \( {\eta_0}(n) \) is of an impulsive nature, the performance of the NLMS algorithm, which is based on the MSE criterion, will deteriorate significantly. It was pointed out in [29] and [39] that the normalization mechanism of the NLMS algorithm can provide some degree of protection against impulsive noise. However, as will be illustrated in the subsequent computer experiments, the NLMS algorithm is still sensitive to consecutive impulses corrupting the desired signal. This motivates us to consider the NLMM algorithm.

2.2 The Normalized Least Mean M-Estimate Algorithm

Many techniques have been employed to mitigate the adverse effect of impulsive interference on the system. Examples include algorithms based on median filtering [8, 9], nonlinear clipping approaches [10, 11], and M-estimation-based algorithms such as the LMM [7] and RLM [12] algorithms. In particular, the last two algorithms were developed by minimizing robust M-estimate cost functions instead of the conventional MSE criterion. Their improved robustness to impulsive noise and performance comparison with other relevant algorithms were thoroughly discussed in [7] and [12]. In the following, the concept of the M-estimate cost function is briefly reviewed and the NLMM algorithm is derived.

In the LMM algorithm [7], the least mean M-estimate distortion measure \( {J_\rho } = E\left[ {\rho \left( {e(n)} \right)} \right] \) is minimized, where ρ(e) is an M-estimate function, which can be chosen as Hampel’s three-part redescending function [40] or the modified Huber (MH) M-estimate function [13] shown in Fig. 2a:

$$ {\rho_{\text{MH}}}(e) = \left\{ {\begin{array}{*{20}{c}} {{e^2}/2,} & {0 \leqslant \left| e \right| < \xi } \\ {{\xi^2}/2,} & {\text{otherwise}} \\ \end{array} } \right., $$
(3)

where ξ is the threshold parameter used to suppress the effect of outliers when the estimation error e is very large. Notice that when \( \rho (e) = {e^2}/2 \) it reduces to the conventional MSE criterion. Like the LMS algorithm, \( {J_\rho } \) is minimized by updating W(n) in the negative direction of the instantaneous gradient vector \( {\widehat\nabla_{W\rho }} \). Hence, the gradient vector \( {\nabla_W}\left( {{J_\rho }} \right) \) is approximated by

$$ {\nabla_W}\left( {{J_\rho }} \right) = E\left[ {\frac{{\partial \rho \left( {e(n)} \right)}}{{\partial W}}} \right] \approx {\widehat\nabla_{W\rho }} = \frac{\partial }{{\partial W}}\rho \left( {e(n)} \right) = - q\left( {e(n)} \right)e(n)X(n), $$
(4)

where \( q(e) = \psi (e)/e = \left( {\partial \rho (e)/\partial e} \right)/e \) and ψ(e) is the score function. For the MH function:

$$ {q_{\text{MH}}}(e) = \frac{{{\psi_{\text{MH}}}(e)}}{e} = \left\{ {\begin{array}{*{20}{c}} {1,} & {0 \leqslant \left| e \right| < \xi } \\ {0,} & {\text{otherwise}} \\ \end{array} } \right. $$
(5)

which is depicted in Fig. 2b.

Figure 2 a The modified Huber M-estimate function \( {\rho_{\text{MH}}}(e) \); b \( {\psi_{\text{MH}}}(e) \), the score function of \( {\rho_{\text{MH}}}(e) \).

Finally, we obtain the LMM algorithm as follows

$$ W\left( {n + 1} \right) = W(n) - \mu {\widehat\nabla_{{\mathbf{W}}\rho }} = W(n) + \mu \psi \left( {e(n)} \right)X(n). $$
(6)

In general, when \( \left| {e(n)} \right| \) is smaller than ξ, the weight function q(e(n)) is equal to one and (6) becomes identical to the LMS algorithm. When \( \left| {e(n)} \right| \) exceeds the threshold, say ξ in the MH function, q(e(n)) becomes zero and prevents the weight vector from updating. Thus the LMM algorithm can effectively reduce the adverse effect of large estimation errors on the update of the filter coefficients. Another interpretation of (6) is that the error term is passed through a nonlinear device ψ(e). This type of adaptive filtering algorithm has been studied previously for the error-function nonlinearity [35] and the sign nonlinearity [34]. The former concludes that the nonlinearity slows down the convergence rate, while the latter was mainly introduced to reduce the implementation complexity. Neither of them recognized the robustness of this class of algorithms to impulsive outliers. This was later studied by Koike [10, 29] using the clipping nonlinearity, and in [6, 7, 41] using M-estimation. In [7], the threshold parameter ξ in the MH function is continuously updated using a technique called adaptive threshold selection (ATS), which greatly improves the convergence speed and steady state error.
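
For illustration, a minimal sketch of the MH weight function (5) and the resulting LMM update (6) is given below (Python; the function names and the default step size are illustrative assumptions):

```python
import numpy as np

def q_mh(e, xi):
    """Weight function of the modified Huber M-estimate, Eq. (5):
    equal to one for |e| < xi and zero otherwise."""
    return 1.0 if abs(e) < xi else 0.0

def lmm_update(W, X, d, xi, mu=0.01):
    """One LMM iteration, Eq. (6), with psi(e) = q(e) * e."""
    e = d - W @ X                 # a priori estimation error e(n)
    psi = q_mh(e, xi) * e         # error passed through the MH nonlinearity
    return W + mu * psi * X, e
```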

Adaptive Threshold Selection (ATS)

In ATS, e(n) is assumed to be Gaussian distributed except for occasional corruption by additive impulsive noise. Using the following robust variance estimate,

$$ \widehat\sigma_e^2(n) = {\lambda_\sigma }\widehat\sigma_e^2\left( {n - 1} \right) + {c_1}\left( {{\text{1}} - {\lambda_\sigma }} \right){\text{med}}\left( {{A_e}(n)} \right){\text{,}} $$
(7)

\( \widehat\sigma_e^2(n) \), the variance of the “impulse-free” error \( \hat e(n) \), can be estimated. Hence, it can be used to detect and reject the adverse outliers in e(n). The forgetting factor \( {\lambda_\sigma } \) is a positive real number close to, but smaller than, one. med(∙) is the median operator, \( {A_e}(n) = \left\{ {{e^2}(n), \cdots, {e^2}\left( {n - {N_w} + 1} \right)} \right\} \), \( {c_1} = 2.13 \) is the correction factor for the median estimation, and \( {N_w} \) is the length of the data set. The probability that \( \left| {\hat e(n)} \right| \) exceeds a given threshold \( {T_h} \) is given by

$$ {\theta_T}(n) = {P_r}\left\{ {\left| {e(n)} \right| > {T_h}} \right\} = {\text{erfc}}\left( {\tfrac{{{T_h}}}{{\sqrt 2 {{\widehat\sigma }_e}(n)}}} \right), $$
(8)

where \( {\text{erfc}}(x) = \left( {2/\sqrt \pi } \right)\int_x^\infty {{e^{ - {t^2}}}} dt \) is the complementary error function. Let \( {\theta_\xi } = {P_r}\left\{ {\left| {e(n)} \right| > \xi } \right\} \) be the probability that \( \left| {e(n)} \right| \) exceeds ξ. The value of ξ can then be determined by an appropriate choice of \( {\theta_\xi } \) according to (8). For example, if \( {\theta_\xi } \) is chosen as 0.01, a sample with \( \left| {\hat e(n)} \right| > \xi \) can be rejected with 99% confidence. Accordingly, ξ can be obtained from (8) as

$$ \xi = {k_\xi }{\widehat\sigma_e}(n) = 2.576{\widehat\sigma_e}(n). $$
(9)

Similar expressions for Hampel’s three-part redescending function can be found in [7, 12]. This technique, which is referred to as adaptive threshold selection (ATS), is very important both theoretically and practically to the LMM and NLMM algorithms, as we shall further elaborate in Sections 3 and 4. Some commonly used M-estimate functions, which we call M-nonlinearities, and useful properties pertaining to their analysis in Section 3 are also summarized in Appendix C.

The computational complexity of the LMM algorithm per iteration is O(L), with an additional \( O\left( {{N_w}\log {N_w}} \right) \) operations for performing the median filtering [7]. The window length \( {N_w} \) is usually chosen between 5 and 9.
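
The ATS recursion (7) and the threshold rule (9) can be summarized in the following minimal sketch (Python; the function name and the value of the forgetting factor \( {\lambda_\sigma } \) are assumptions):

```python
import numpy as np

def ats_threshold(err_window, sigma2_prev, lam_sigma=0.95, c1=2.13, k_xi=2.576):
    """Robust variance estimate and threshold, Eqs. (7) and (9).

    err_window : the N_w most recent errors e(n), ..., e(n - N_w + 1)
    sigma2_prev: previous estimate of the impulse-free error variance
    """
    med = np.median(np.asarray(err_window) ** 2)                     # med(A_e(n))
    sigma2 = lam_sigma * sigma2_prev + c1 * (1.0 - lam_sigma) * med  # Eq. (7)
    xi = k_xi * np.sqrt(sigma2)                                      # Eq. (9)
    return sigma2, xi
```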

If the step size μ in (6) for the LMM algorithm is again normalized by the squared norm of the input vector as in the NLMS algorithm, the following NLMM algorithm is obtained

$$ W\left( {n + 1} \right) = W(n) + \frac{{\mu \psi \left( {e(n)} \right)X(n)}}{{\varepsilon + {X^T}(n)X(n)}}, $$
(10)

where \( \psi \left( {e(n)} \right) = q\left( {e(n)} \right)e(n) \). Alternatively, (10) can be derived using the Least Perturbation Property (LPP) [5] originally proposed for the NLMS algorithm. Given the a priori output estimation error \( e(n) = d(n) - {X^T}(n)W(n) \), the NLMS algorithm can be derived by seeking a W(n + 1) whose deviation from W(n) is minimized in the 2-norm, while satisfying a constraint between the a posteriori output estimation error \( {e_p}(n) = d(n) - {X^T}(n)W\left( {n + 1} \right) \) and the a priori output estimation error as follows:

$$ \mathop {\min }\limits_{{\mathbf{W}}\left( {n + 1} \right)} {\left\| {W\left( {n + 1} \right) - W(n)} \right\|^2}, $$

subject to

$$ {e_p}(n) = \left( {1 - \frac{{\mu {{\left\| {X(n)} \right\|}^2}}}{{\varepsilon + {{\left\| {X(n)} \right\|}^2}}}} \right)e(n). $$
(11)

The solution of this problem yields the conventional NLMS update in (2). In M-estimation, the a priori output estimation error e(n) is replaced by the score function ψ(e(n)) to suppress the adverse effect of impulsive noise. Substituting this into the constraint above and solving for W(n + 1) yields the update in (10), since the two optimization problems have the same structure except that e(n) is replaced by ψ(e(n)).

As will be shown in the performance analysis in Section 3 and the simulation results in Section 4, this normalization yields a faster convergence speed when the input is highly colored. Compared to the LMM algorithm for real inputs, the computational complexity is increased by one multiplication and two additions for updating \( {X^T}(n)X(n) \), and one addition and one division for evaluating \( \mu /\left( {\varepsilon + {X^T}(n)X(n)} \right) \). The performance advantage of the NLMS algorithm for white inputs has also been analyzed in detail by Douglas et al. [42].
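
To make the preceding description concrete, the following self-contained sketch runs the NLMM update (10) with the MH nonlinearity and ATS in the system identification setting of Fig. 1. All numerical values (filter length, step size, noise levels, impulse probability, window length) are illustrative assumptions and not the settings used in Section 4:

```python
import numpy as np

rng = np.random.default_rng(0)
L, N, Nw = 8, 5000, 9                     # filter length, iterations, ATS window
mu, eps = 0.5, 1e-6                       # step size and regularization of (10)
lam_sigma, c1, k_xi = 0.95, 2.13, 2.576   # ATS constants of (7) and (9)

W_star = rng.standard_normal(L)           # unknown system W*
W = np.zeros(L)                           # adaptive filter W(n)
x_buf = np.zeros(L)                       # input vector X(n)
e_buf = np.zeros(Nw)                      # window A_e(n) for the median estimate
sigma2 = 1.0                              # robust variance estimate, Eq. (7)

for n in range(N):
    x_buf = np.roll(x_buf, 1); x_buf[0] = rng.standard_normal()
    noise = 0.01 * rng.standard_normal()
    if rng.random() < 0.01:               # occasional large impulse (CG-like)
        noise += 10.0 * rng.standard_normal()
    d = W_star @ x_buf + noise            # desired signal d(n)
    e = d - W @ x_buf                     # a priori error e(n)

    # ATS: robust variance estimate and threshold, Eqs. (7) and (9)
    e_buf = np.roll(e_buf, 1); e_buf[0] = e
    sigma2 = lam_sigma * sigma2 + c1 * (1.0 - lam_sigma) * np.median(e_buf ** 2)
    xi = k_xi * np.sqrt(sigma2)

    # NLMM update, Eq. (10), with the MH score psi(e) = q(e) * e
    psi = e if abs(e) < xi else 0.0
    W = W + mu * psi * x_buf / (eps + x_buf @ x_buf)

print("final misalignment:", np.linalg.norm(W - W_star) / np.linalg.norm(W_star))
```

Setting psi = e unconditionally in the loop recovers the conventional NLMS update (2).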

3 Mean and Mean Square Convergence Analysis

In this section, we first present a detailed convergence performance analysis of the NLMS algorithms with the M-nonlinearity ψ(e) for Gaussian inputs and independent white Gaussian additive noise. More precisely, by extending the approach in [7, 12] and using the conventional Price’s theorem, it is possible to derive analytical expressions for modeling their mean and mean square convergence behaviors. Next, we extend Price’s theorem to Gaussian mixture processes and, by using the above results, present a detailed convergence performance analysis of these NLMS algorithms for Gaussian inputs and independent CG additive noise. In particular, the LMM and NLMM algorithms with the MH function and ATS will be studied in detail. The final results are new expressions for the EMSE, the stability bound, and the difference equations describing the convergence behaviors of the various algorithms in terms of the generalized Abelian integral functions. They are similar to the classical results of the LMS algorithm by Widrow et al. [1], which allow us to clearly interpret and compare the convergence behaviors of this class of algorithms. To simplify the analysis, the following assumptions are made:

Assumption 1:

The input signal x(n) is an ergodic process which is Gaussian distributed with zero mean and autocorrelation matrix \( {R_{XX}} = E\left[ {X(n){X^T}(n)} \right] \).

Assumption 2:

The additive noise \( {\eta_0}(n) \) is assumed to be Gaussian for the analysis in Section 3.1 below (\( {\eta_0}(n) = {\eta_g}(n) \)). For the analysis in Section 3.2 below, \( {\eta_0}(n) \) is modeled as a CG noise, which is a frequently used model for analyzing impulsive noise. More precisely, it is given by:

$$ {\eta_0}(n) = {\eta_g}(n) + {\eta_{im}}(n) = {\eta_g}(n) + b(n){\eta_w}(n), $$
(12)

where \( {\eta_g}(n) \) and \( {\eta_w}(n) \) are both independent and identically distributed (i.i.d.) zero mean Gaussian sequences with respective variances \( \sigma_g^2 \) and \( \sigma_w^2 \). b(n) is an i.i.d. Bernoulli random sequence whose value at any time instant is either zero or one, with occurrence probabilities \( {P_r}\left( {b(n) = 1} \right) = {p_r} \) and \( {P_r}\left( {b(n) = 0} \right) = 1 - {p_r} \). The variances of the random processes \( {\eta_{im}}(n) \) and \( {\eta_0}(n) \) are then given by \( \sigma_{im}^2 = {p_r}\sigma_w^2 \) and \( \sigma_{{\eta_0}}^2 = \sigma_g^2 + \sigma_{im}^2 = \sigma_g^2 + {p_r}\sigma_w^2 \). The ratio \( {r_{im}} = \sigma_{im}^2/\sigma_g^2 = {p_r}\sigma_w^2/\sigma_g^2 \) is a measure of the impulsive characteristic of the CG noise, and we let \( \sigma_\Sigma^2 = \sigma_g^2 + \sigma_w^2 \). Accordingly, the probability density function (PDF) of this CG distribution is given by

$$ {f_{{\eta_0}}}\left( \eta \right) = \tfrac{{1 - {p_r}}}{{\sqrt {2\pi \sigma_g^2} }}\exp \left( { - \tfrac{{{\eta^2}}}{{2\sigma_g^2}}} \right) + \tfrac{{{p_r}}}{{\sqrt {2\pi \sigma_\Sigma^2} }}\exp \left( { - \tfrac{{{\eta^2}}}{{2\sigma_\Sigma^2}}} \right). $$
(13)
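
A minimal sketch for generating samples of the CG noise in (12) is given below (Python; the function name and the numerical values of \( \sigma_g \), \( \sigma_w \) and \( {p_r} \) are placeholders):

```python
import numpy as np

def cg_noise(N, sigma_g=0.1, sigma_w=1.0, p_r=0.01, rng=None):
    """Contaminated-Gaussian noise, Eq. (12): eta_0(n) = eta_g(n) + b(n)*eta_w(n),
    where b(n) is an i.i.d. Bernoulli sequence with P(b(n) = 1) = p_r."""
    rng = rng or np.random.default_rng()
    eta_g = sigma_g * rng.standard_normal(N)   # Gaussian component, variance sigma_g^2
    eta_w = sigma_w * rng.standard_normal(N)   # impulsive component, variance sigma_w^2
    b = rng.random(N) < p_r                    # Bernoulli occurrence of impulses
    return eta_g + b * eta_w
```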

Assumption 3:

W(n), x(n) and \( {\eta_0}(n) \) are statistically independent (the independence assumption [1]). Although this assumption is not completely valid in general applications, it is a good approximation and is commonly used to simplify the convergence analysis of numerous adaptive filtering algorithms. Moreover, we denote by \( W* = R_{XX}^{ - 1}{P_{dX}} \) the optimal Wiener solution, where \( {P_{dX}} = E\left[ {d(n)X(n)} \right] \) is the ensemble-averaged cross-correlation vector between X(n) and d(n).

3.1 Convergence Behaviors of the NLMS Algorithms with M-Nonlinearity and Gaussian Inputs and Noises

3.1.1 Mean Behavior

From (10), the recursion for the weight-error vector \( v(n) = W* - W(n) \) can be written as:

$$ v\left( {n + 1} \right) = v(n) - \frac{{\mu \psi \left( {e(n)} \right)X(n)}}{{\varepsilon + {X^T}(n)X(n)}}. $$
(14)

If ψ(e(n)) = e(n), (14) will reduce to the conventional NLMS algorithm (2). Taking expectation on both sides of (14), we get

$$ E\left[ {v\left( {n + 1} \right)} \right] = E\left[ {v(n)} \right] - \mu {H_1}, $$
(15)

where \( {H_1} = E\left[ {\psi \left( {e(n)} \right)X(n)/\left( {\varepsilon + {X^T}(n)X(n)} \right)} \right] \) and E[∙] denotes the expectation over \( \left\{ {v(n),X(n),{\eta_g}(n)} \right\} \), written more explicitly as \( {E_{\left\{ {v,X,{\eta_g}} \right\}}}\left[ \cdot \right] \). Since X(n) and \( {\eta_g}(n) \) are stationary, we can drop the time index n in the expectation to get \( {H_1} = E\left[ {\psi (e)X/\left( {\varepsilon + {X^T}X} \right)} \right] \).

For the conventional NLMS algorithm, a similar difference equation for the mean weight-error vector was obtained in [17] and [18] (cf. [18, Eq. 11]):

$$ E\left[ {v\left( {n + 1} \right)} \right] = \left( {I - \mu {F_\varepsilon }} \right)E\left[ {v(n)} \right], $$
(16)

where \( {F_\varepsilon } = E\left[ {X(n){X^T}(n)/\left( {\varepsilon + {X^T}(n)X(n)} \right)} \right] \) and I is the identity matrix. For convenience, the variables have been renamed according to the notation in this paper. Moreover, \( {F_\varepsilon } \) was further diagonalized to \( {H_\varepsilon } \), whose i-th element is \( {\left[ {{H_\varepsilon }} \right]_{i,i}} = \int_0^\infty {\frac{{\exp \left( { - \beta \varepsilon } \right)}}{{{{\left| {I + 2\beta {R_{XX}}} \right|}^{1/2}}}}} \frac{{{\lambda_i}}}{{1 + 2\beta {\lambda_i}}}d\beta \) (cf. [18, Eq. 14]), where \( {\lambda_i} \) is the i-th eigenvalue of \( {R_{XX}} \). It was evaluated analytically in [17] for three important cases with different eigenvalue distributions (in [18], only the first case was elaborated): (1) white input signal with \( {\lambda_1} = \ldots = {\lambda_L} \); (2) two signal subspaces with equal powers \( {\lambda_1} = \ldots = {\lambda_K} = a \) and \( {\lambda_{K + 1}} = \ldots = {\lambda_L} = b \); (3) distinct pairs, \( {\lambda_1} = {\lambda_2},\,{\lambda_3} = {\lambda_4},\, \ldots, \,{\lambda_{L - 1}} = {\lambda_L} \) (assuming L is even). Beyond these three special cases, no general solution to \( {H_\varepsilon } \) was provided. Therefore, general closed-form formulas for modeling the mean and mean square behavior of the NLMS algorithm were unavailable in [17] and [18]. To the best of our knowledge, no such analytical solution has appeared in the literature.

Here, we pursue another direction by treating some of the integrals involved as special functions and carrying them throughout the analysis. Furthermore, using Price’s theorem, the expectation involving the M-nonlinearity and ATS can be decoupled from the rest of the equations. The final formulas containing these special integral functions still allow us to clearly interpret the convergence behavior of the NLMS algorithms with error nonlinearity. More precisely, it is shown in Appendix A (A-14) that

$$ {H_1} \approx {A_\psi }\left( {\sigma_e^2} \right)U\Lambda {D_\Lambda }{U^T}E\left[ {v(n)} \right]. $$
(17)

where \( {A_\psi }\left( {\sigma_e^2} \right) = \overline {\psi \prime } \left( {\sigma_e^2} \right) = \tfrac{1}{{\sqrt {2\pi } {\sigma_e}}}\int_{ - \infty }^\infty {\psi \prime (e)\exp \left( { - {e^2}/\left( {2\sigma_e^2} \right)} \right)de} \), \( {R_{XX}} = U\Lambda {U^T} \) is the eigenvalue decomposition of \( {R_{XX}} \), and \( \Lambda = {\text{diag}}\left( {{\lambda_1},{\lambda_2}, \cdots, {\lambda_L}} \right) \) contains its eigenvalues. \( {D_\Lambda } \) is a diagonal matrix with the i-th diagonal entry given by (A-16): \( {\left[ {{D_\Lambda }} \right]_{i,i}} = {I_i}\left( \Lambda \right) = \int_0^\infty {\exp \left( { - \beta \varepsilon } \right)\left[ {\mathop \Pi \limits_{k = 1}^L {{\left( {2\beta {\lambda_k} + 1} \right)}^{ - 1/2}}} \right]{{\left( {2\beta {\lambda_i} + 1} \right)}^{ - 1}}d\beta } \), which is a generalized Abelian integral function, whereas the conventional Abelian integral has the form \( {I_{\text{a}}}(x) = \int_0^x {{{\left[ {q\left( \beta \right)} \right]}^{ - 1/2}}d\beta } \) with q(β) being a polynomial in β. It is also similar to \( {\left[ {{H_\varepsilon }} \right]_{i,i}} \) in [18].
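
Although \( {I_i}\left( \Lambda \right) \) generally has no simple closed form, it is a one-dimensional integral and is easy to evaluate numerically. A minimal sketch is given below (Python with SciPy; the routine name is an assumption):

```python
import numpy as np
from scipy.integrate import quad

def I_i(lams, i, eps=1e-6):
    """Generalized Abelian integral I_i(Lambda) of (A-16), evaluated by quadrature."""
    lams = np.asarray(lams, dtype=float)
    def integrand(beta):
        prod = np.prod((2.0 * beta * lams + 1.0) ** -0.5)
        return np.exp(-beta * eps) * prod / (2.0 * beta * lams[i] + 1.0)
    val, _ = quad(integrand, 0.0, np.inf)
    return val
```

As a quick check, for a white input (\( {\lambda_1} = \ldots = {\lambda_L} = \lambda \)) and ε → 0, the integral evaluates to \( {I_i}\left( \Lambda \right) = 1/\left( {\lambda L} \right) \), so that \( \lambda {I_i}\left( \Lambda \right) = 1/L \), consistent with the familiar 1/L scaling of the NLMS algorithm for white inputs.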

Substituting (17) into (15), we obtain

$$ E\left[ {v\left( {n + 1} \right)} \right] = \left( {I - \mu {A_\psi }\left( {\sigma_e^2(n)} \right)U\Lambda {D_\Lambda }{U^T}} \right)E\left[ {v(n)} \right], $$
(18)

where \( \sigma_e^2(n) = E\left[ {{v^T}(n){R_{XX}}v(n)} \right] + \sigma_g^2 \) and the approximate sign has been replaced by the equality sign. (18) can also be written in the natural coordinate V(n) = U T v(n) as

$$ E\left[ {V\left( {n + 1} \right)} \right] = \left( {I - \mu {A_\psi }\left( {\sigma_e^2(n)} \right)\Lambda {D_\Lambda }} \right)E\left[ {V(n)} \right], $$
(19)

which is equivalent to L scalar first order finite difference equations as follows:

$$ E{\left[ {V\left( {n + 1} \right)} \right]_i} = \left( {1 - \mu {A_\psi }\left( {\sigma_e^2(n)} \right){\lambda_i}{I_i}\left( \Lambda \right)} \right)E{\left[ {V(n)} \right]_i}, $$
(20)

where \( E{\left[ {V(n)} \right]_i} \) is the i-th element of the vector E[V(n)].

Though the maximum possible step size is in general difficult to obtain, a sufficient condition for the algorithm to converge is \( \left| {1 - \mu {A_\psi }\left( {\sigma_e^2} \right){\lambda_i}{I_i}\left( \Lambda \right)} \right|<1 \), for all i. If \( \overline {\psi \prime } \left( {\sigma_e^2} \right) \) is bounded above by a constant \( {\overline {\psi \prime }_{\max }} = {A_{\psi \_\max }} \), then a conservative maximum step size bound is

$$ \mu<\frac{2}{{{A_{\psi \_\max }}{\lambda_{\max }}{I_{i\_{\lambda_{\max }}}}\left( \Lambda \right)}} $$
(21)

which yields good estimates in practical algorithms such as the LMM, NLMM, and those in [34, 35].
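
The decoupled recursion (20) and the sufficient condition leading to (21) can be exercised numerically as follows (Python; the routine name is an assumption, \( {A_\psi } \) is held constant as is approximately the case under ATS, and the values of \( {I_i}\left( \Lambda \right) \) are supplied, e.g., by the quadrature sketch above):

```python
import numpy as np

def mean_recursion(lams, I_vals, A_psi, mu, V0, n_iter=200):
    """Iterate the decoupled mean recursion (20) mode by mode."""
    lams, I_vals = np.asarray(lams, float), np.asarray(I_vals, float)
    rho = 1.0 - mu * A_psi * lams * I_vals          # per-mode factors in Eq. (20)
    assert np.all(np.abs(rho) < 1.0), "step size violates the sufficient condition"
    V = np.array(V0, dtype=float)
    out = [V.copy()]
    for _ in range(n_iter):
        V = rho * V                                  # E[V(n+1)]_i = rho_i * E[V(n)]_i
        out.append(V.copy())
    return np.array(out)
```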

Remarks

  (R-A1): LMS algorithms with error nonlinearity:

As pointed out at the end of Appendix A, our analysis reduces to the LMS case when \( {D_\Lambda } = I \), and Eq. (18) then holds strictly for a general error nonlinearity ψ(e). Except for the conventional LS error criterion, (19) or (20) are generally a set of nonlinear difference equations due to the error nonlinearity. A general solution is rather difficult to obtain because the term \( {A_\psi }\left( {\sigma_e^2} \right) = \overline {\psi \prime } \left( {\sigma_e^2} \right) \) is dependent on the MSE. For the DS nonlinearity, (19) agrees with the results in [34]; moreover, Mathews et al. showed that the resulting equation is convergent using Doob’s theorem [43, 44]. (19) also agrees with the result in [35] for the LMS algorithm with the EF nonlinearity, in which the difference equation was approximated by a differential equation and it was shown that there are two possible adaptation phases, a nonlinear one and a linear one. If the nonlinearity ψ(e) is known, then it may be possible to derive its convergence behavior by converting it to a nonlinear ordinary differential equation (ODE). The values of \( {A_\psi }\left( {\sigma_e^2} \right) \) for some commonly used error nonlinearities are summarized in Table 1.

Table 1 List of \( {A_\psi }\left( {\sigma_\varepsilon^2} \right) \), \( {B_\psi }\left( {\sigma_\varepsilon^2} \right) \) and \( {C_\psi }\left( {\sigma_\varepsilon^2} \right) \) for three related algorithms.

For most M-estimate functions, ψ(e) = q(e)e. q(e) is equal to one when \( \left| e \right| \) is less than a certain threshold ξ and will gradually decrease to reduce its sensitivity to impulses with large amplitude. Hence, \( 0 \leqslant \psi \prime (e) \leqslant 1 \) and it is approximately equal to one when \( \left| e \right| \) is less than ξ. Specifically, for the LMM and NLMM algorithms using MH nonlinearity, it can be shown from (C-5) and (5) that:

$$ {A_{\text{MH}}}\left( {\sigma_e^2} \right) = {\text{erf}}\left( {\tfrac{\xi }{{\sqrt 2 {\sigma_e}}}} \right) - \tfrac{{2\xi }}{{\sqrt {2\pi } {\sigma_e}}}\exp \left( { - \tfrac{{{\xi^2}}}{{2\sigma_e^2}}} \right), $$

with \( \mathop {\lim }\limits_{\sigma_e^2 \to 0} {A_{\text{MH}}}\left( {\sigma_e^2} \right) \to 1 \) and \( \mathop {\lim }\limits_{\sigma_e^2 \to \infty } {A_{\text{MH}}}\left( {\sigma_e^2} \right) \to 0 \). For a sufficiently small step size μ, the algorithm will converge and \( \sigma_e^2 \) will decrease. Since \( {A_{\text{MH}}}\left( {\sigma_e^2} \right) \to 0 \) when \( \sigma_e^2 \) is much larger than \( {\xi^2} \), ξ should be chosen carefully; otherwise the effective step size can be unnecessarily small when adaptation starts, because \( \sigma_e^2 \) can be much larger than \( {\xi^2} \) initially, depending on the initial condition. If \( {A_{\text{MH}}}\left( {\sigma_e^2} \right) \) is not made adaptive, an inappropriately chosen ξ may suppress the signal component instead of just the outliers. This will cause \( {A_{\text{MH}}}\left( {\sigma_e^2} \right) \) to increase gradually and lead to the so-called “nonlinear adaptation” problem encountered by the EF nonlinearity in [35]. When the algorithm is nearly converged, \( \sigma_e^2 \) will be approximately constant and the asymptotic rate of convergence is \( 1 - \mu {A_{\text{MH}}}\left( {\sigma_e^2\left( \infty \right)} \right){\lambda_i}{I_i}\left( \Lambda \right) \), which is the so-called “linear adaptation”.

Since nonlinear adaptation is typically slow, it should be avoided. In the LMM and NLMM algorithms, ξ is chosen to be a multiple of the estimated σ e as shown in (9). If \( \hat \sigma_e^2 \approx \sigma_e^2 \), this helps to avoid significant signal suppression by maintaining a fairly stationary \( {A_{\text{MH}}}\left( {\sigma_e^2} \right) \) as follows:

$$ {A_{\text{MH}}} \approx {\text{erf}}\left( {\tfrac{{{k_\xi }}}{{\sqrt 2 }}} \right) - \tfrac{{2{k_\xi }}}{{\sqrt {2\pi } }}\exp \left( { - \tfrac{{k_\xi^2}}{2}} \right), $$

which is approximately constant and slightly less than one. From (20), we can see that the degradation in mean convergence relative to their LMS and NLMS counterparts is therefore minimal. When an impulse with large amplitude is encountered, we will momentarily have \( \sigma_e^2 > > k_\xi^2\hat \sigma_e^2 \) and the measurement will be discarded to improve robustness.
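
The expressions for \( {A_{\text{MH}}} \) above are straightforward to evaluate; a minimal sketch is given below (Python; the function name is an assumption):

```python
from math import erf, exp, pi, sqrt

def A_MH(sigma_e, xi):
    """A_MH(sigma_e^2) for the MH nonlinearity (see the expression above)."""
    return erf(xi / (sqrt(2.0) * sigma_e)) \
        - (2.0 * xi / (sqrt(2.0 * pi) * sigma_e)) * exp(-xi ** 2 / (2.0 * sigma_e ** 2))

# Under ATS, xi = k_xi * sigma_e, so A_MH depends only on k_xi:
print(A_MH(1.0, 2.576))   # about 0.92 for any sigma_e, i.e. slightly below one
```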

In Appendix C, it is further shown that for most M-estimate functions and nonlinearities (cf. (C-1) for the definition of M-nonlinearities), \( {A_\psi }\left( {\sigma_e^2} \right) \) is approximately independent of \( \sigma_e^2 \) if the ATS mentioned above is used. Consequently, the convergence rate is approximately constant and similar to that of the LMS and NLMS algorithms. Because of its good performance, ATS is highly recommended in practical applications.

  (R-A2): NLMS algorithms with error nonlinearity:

In the normalized case, (19) provides a good approximation at the steady state of the algorithm and for the NLMS algorithms using the M-nonlinearities together with ATS, as elaborated above (cf. the approximation used in Appendix A and the justification in Appendix C). It should be noted that no such approximation is used in the variants of the LMS algorithms [34, 35] mentioned above. If ATS is not employed, (19) is only an approximation and nonlinear adaptation is likely to be encountered, depending on the clipping level of the nonlinearity and the variance of the error signal. This adaptation is typically slow, and an accurate analytical solution is very difficult to obtain. When the algorithm is nearly converged, \( \sigma_e^2 \) will be approximately constant and (19) will be a good approximation. The asymptotic rate of convergence is \( 1 - \mu {A_{\text{MH}}}\left( {\sigma_e^2\left( \infty \right)} \right){\lambda_i}{I_i}\left( \Lambda \right) \), which is the familiar “linear adaptation” phase. To avoid slow nonlinear adaptation, ATS is again recommended in practical applications. A conservative maximum step size can also be estimated from (21). Because of its importance, the main focus of this paper is on those algorithms employing ATS.

  (R-A3): Comparison of LMS and NLMS algorithms with M-nonlinearity and ATS:

Next, we briefly compare the LMS- and NLMS-based algorithms. It can be seen that with the M-nonlinearity and ATS, the convergence rate of the NLMS-based algorithms is \( 1 - \mu {A_\psi }\left( {\sigma_e^2(n)} \right){\lambda_i}{I_i}\left( \Lambda \right) \), where \( {A_\psi }\left( {\sigma_e^2(n)} \right) \) is approximately constant and usually has a value slightly smaller than one when no impulses are encountered. As a result, compared with the conventional LMS-based algorithms, the step size of the normalized algorithms is effectively scaled by a factor of \( 1/\left( {{A_\psi }\left( {\sigma_e^2(n)} \right){I_{i\_{\lambda_{\max }}}}\left( \Lambda \right)} \right) \approx 1/{I_{i\_{\lambda_{\max }}}}\left( \Lambda \right) \). Since the product

$$ \begin{array}{*{20}{c}} {{\lambda_i}{I_i}\left( \Lambda \right) = {\lambda_i}\int_0^\infty {exp\left( { - \beta \varepsilon } \right)\left[ {\mathop \Pi \limits_{k = 1}^L {{\left( {2\beta {\lambda_k} + 1} \right)}^{ - 1/2}}} \right]{{\left( {2\beta {\lambda_i} + 1} \right)}^{ - 1}}d\beta } } \\ { = \int_0^\infty {exp\left( { - \beta \varepsilon } \right)\left[ {\Pi_{k = 1}^L{{\left( {2\beta {\lambda_k} + 1} \right)}^{ - 1/2}}} \right]{{\left[ {2\beta + {{\left( {{\lambda_i}} \right)}^{ - 1}}} \right]}^{ - 1}}d\beta }, } \\ \end{array} $$

achieves its maximum at the largest eigenvalue \( {\lambda_{\max }} \), with the corresponding value of \( {I_i}\left( \Lambda \right) \) denoted by \( {I_{i\_{\lambda_{\max }}}}\left( \Lambda \right) \), the fastest convergence of the normalized algorithm occurs when \( \mu = 1/\left( {{A_\psi }\left( {\sigma_e^2} \right){\lambda_{\max }}{I_{i\_{\lambda_{\max }}}}\left( \Lambda \right)} \right) \), and it is limited by the mode corresponding to the smallest eigenvalue \( {\lambda_{\min }} \), whose corresponding value of \( {I_i}\left( \Lambda \right) \) is \( {I_{i\_{\lambda_{\min }}}}\left( \Lambda \right) \). That is, the limiting convergence rate is

$$ 1 - \frac{{2{\lambda_{\min }}{I_{i\_{\lambda_{\min }}}}\left( \Lambda \right)}}{{{\lambda_{\max }}{I_{i\_{\lambda_{\max }}}}\left( \Lambda \right)}}. $$
(22)

From the definition of \( {I_i}\left( \Lambda \right) \), it can be shown that \( {I_{i\_{\lambda_{\min }}}}\left( \Lambda \right)/{I_{i\_{\lambda_{\max }}}}\left( \Lambda \right) \geqslant 1 \). In other words, the eigenvalue spread \( {\lambda_{\max }}/{\lambda_{\min }} \) is reduced by the factor \( {I_{i\_{\lambda_{\min }}}}\left( \Lambda \right)/{I_{i\_{\lambda_{\max }}}}\left( \Lambda \right) \) after the normalization, since the effective spread becomes \( \left( {{\lambda_{\max }}{I_{i\_{\lambda_{\max }}}}\left( \Lambda \right)} \right)/\left( {{\lambda_{\min }}{I_{i\_{\lambda_{\min }}}}\left( \Lambda \right)} \right) \). Therefore, under the stated assumptions, the maximum convergence rate of the normalized algorithms using ATS is faster than that of the LMS-based algorithms if the eigenvalues are unequal. A similar conclusion is obtained for the conventional NLMS algorithm [37, 46].

3.1.2 Mean Square Behavior

Post-multiplying (14) by its transpose and taking expectation gives

$$ {\mathbf{\Xi }}\left( {n + 1} \right) = {\mathbf{\Xi }}(n) - {M_1} - {M_2} + {M_3}, $$
(23)

where \( {\mathbf{\Xi }}(n) = E\left[ {v(n){v^T}(n)} \right] \),

$$ {M_1} = \mu {E_{\left\{ {\mathbf{v}} \right\}}}\left[ {H{v^T}} \right] \approx \mu {A_\psi }\left( {\sigma_e^2} \right)U\Lambda {D_\Lambda }{U^T}{\mathbf{\Xi }}(n), $$
(24)
$$ {M_2} = M_1^T \approx \mu {A_\psi }\left( {\sigma_e^2} \right){\mathbf{\Xi }}(n)U{D_\Lambda }\Lambda {U^T}, $$
(25)

and

$$ {M_3} = {\mu^2}E\left[ {{\psi^2}(e)X{X^T}/{{\left( {\varepsilon + {X^T}X} \right)}^2}} \right]. $$
(26)

Note that the expressions in (24) and (25) are obtained by using the previous result in (18), and (26) follows from the stationarity and independence assumptions. \( {M_3} \) is evaluated in Appendix B to be

$$ \begin{array}{*{20}{c}} {{M_3} \approx 2{\mu^2}{C_\psi }\left( {\sigma_e^2} \right)U\left\{ {\Lambda \left[ {\left( {{U^T}{\mathbf{\Xi }}(n)U} \right) \circ I\left( \Lambda \right)} \right]\Lambda } \right\}{U^T}} \\ { + {\mu^2}{S_\psi }\left( {\sigma_e^2} \right)U{{\overline D }_2}{U^T} + {\mu^2}{S_\psi }\left( {\sigma_e^2} \right)\sigma_g^2U\Lambda I\prime \left( \Lambda \right){U^T}.} \\ \end{array} $$
(27)

where \( {C_\psi }\left( {\sigma_e^2} \right) = \tfrac{d}{{d\sigma_e^2}}E\left[ {{\psi^2}(e)} \right] \), \( {S_\psi }\left( {\sigma_e^2} \right) = {B_\psi }\left( {\sigma_e^{\text{2}}} \right)/\sigma_e^{\text{2}} \), and \( {B_\psi }\left( {\sigma_e^{\text{2}}} \right) = E\left[ {{\psi^2}(e)} \right] = \tfrac{1}{{\sqrt {2\pi } {\sigma_e}}}\int_{ - \infty }^\infty {{\psi^2}(e)\exp \left( { - \tfrac{{{e^2}}}{{2\sigma_e^2}}} \right)de} \). ○ denotes the element-wise (Hadamard) product of two matrices, and \( I\left( \Lambda \right) \) and \( I\prime \left( \Lambda \right) \) are defined in (B-14) and (B-16). The diagonal matrix \( {\overline D_2} \) results from (B-14b) and its i-th diagonal element is \( {\left[ {{{\overline D }_2}} \right]_{i,i}} = \mathop \Sigma \limits_k {\lambda_k}{\lambda_i}{I_{ki}}\left( \Lambda \right){\left[ {{U^T}{\mathbf{\Xi }}(n)U} \right]_{k,k}} \). For a given distortion measure ρ(e), and hence ψ(e), these integrals can be computed either analytically or numerically.

Substituting (24)–(27) into (23), and using the natural coordinate \( {\mathbf{\Phi }}(n) = {U^T}{\mathbf{\Xi }}(n)U \), one gets

$$ \begin{array}{*{20}{c}} {{\mathbf{\Phi }}\left( {n + 1} \right) \approx {\mathbf{\Phi }}(n) - \mu {A_\psi }\left( {\sigma_e^2} \right)\Lambda {D_\Lambda }{\mathbf{\Phi }}(n) - \mu {A_\psi }\left( {\sigma_e^2} \right){\mathbf{\Phi }}(n){D_\Lambda }\Lambda } \\ { + 2{\mu^2}{C_\psi }\left( {\sigma_e^2} \right)\left[ {\Lambda \left( {{\mathbf{\Phi }}(n) \circ I\left( \Lambda \right)} \right)\Lambda } \right] + {\mu^2}{S_\psi }\left( {\sigma_e^2} \right){{\mathop D\limits^\sim }_2}} \\ { + {\mu^2}{S_\psi }\left( {\sigma_e^2} \right)\sigma_g^2\Lambda I\prime \left( \Lambda \right),} \\ \end{array} $$
(28)

where \( {\left[ {{{\mathop D\limits^\sim }_2}} \right]_{i,i}} = \mathop \Sigma \limits_k {\lambda_k}{\lambda_i}{I_{ki}}\left( \Lambda \right){\left[ {{\mathbf{\Phi }}(n)} \right]_{k,k}} \). (28) can be written in the following scalar form:

$$ \begin{array}{*{20}{c}} {{\Phi_{i,i}}\left( {n + 1} \right) \approx \left( {1 - 2\mu {A_\psi }\left( {\sigma_e^2} \right){\lambda_i}{I_i}\left( \Lambda \right) + 2{\mu^2}{C_\psi }\left( {\sigma_e^2} \right)\lambda_i^2{I_{ii}}\left( \Lambda \right)} \right){\Phi_{i,i}}(n)} \hfill \\ { + {\mu^2}{S_\psi }\left( {\sigma_e^2} \right)\mathop \Sigma \limits_k {\lambda_k}{\lambda_i}{I_{ki}}\left( \Lambda \right){\Phi_{k,k}}(n) + {\mu^2}{S_\psi }\left( {\sigma_e^2} \right)\sigma_g^2{\lambda_i}I_i^\prime \left( \Lambda \right).} \hfill \\ \end{array} $$
(29)

To study the step size bound for mean square convergence, we first note that the EMSE at time instant n is given by \( {\text{EMSE}}(n) = {\text{Tr}}\left( {{\mathbf{\Phi }}(n)\Lambda } \right) \). Assuming that the algorithm converges, it is shown in Appendix B that the last two terms on the right-hand side of (29) are tightly upper bounded by \( {\mu^2}{S_\psi }\left( {\sigma_e^2} \right)\sigma_e^2\left( \infty \right){\lambda_i}I_i^\prime \left( \Lambda \right) \) at the steady state. Therefore, from (28), one gets an upper bound for the EMSE as

$$ {\text{EMS}}{{\text{E}}_{{\text{NLMS}}\_\psi }}\left( \infty \right) = {\text{Tr}}\left( {{\mathbf{\Phi }}\left( \infty \right)\Lambda } \right) \approx \tfrac{1}{2}\mu \sigma_e^2\left( \infty \right){\phi_{{\text{NLMS}}\_\psi }}, $$
(30)

where \( {\phi_{{\text{NLMS}}\_\psi }} = {S_\psi }\left( {\sigma_e^2\left( \infty \right)} \right)\mathop \Sigma \nolimits_{i = 1}^L \tfrac{{{\lambda_i}I_i^\prime \left( \Lambda \right)}}{{{A_\psi }\left( {\sigma_e^2\left( \infty \right)} \right){I_i}\left( \Lambda \right) - \mu {C_\psi }\left( {\sigma_e^2\left( \infty \right)} \right){\lambda_i}{I_{ii}}\left( \Lambda \right)}} \). Using the fact that \( \sigma_e^2\left( \infty \right) = {\text{EMS}}{{\text{E}}_{{\text{NLMS}}\_\psi }}\left( \infty \right) + \sigma_g^2 \), one gets

$$ {\text{EMS}}{{\text{E}}_{{\text{NLMS}}\_\psi }}\left( \infty \right) = \frac{{\tfrac{1}{2}\mu \sigma_g^2{\phi_{{\text{NLMS}}\_\psi }}}}{{1 - \tfrac{1}{2}\mu {\phi_{{\text{NLMS}}\_\psi }}}}. $$
(31)

which is a nonlinear equation in \( \sigma_e^2\left( \infty \right) \) and hence in \( {\text{EMS}}{{\text{E}}_{{\text{NLMS}}\_\psi }}\left( \infty \right) \). It can be seen that \( {\text{EMS}}{{\text{E}}_{{\text{NLMS}}\_\psi }}\left( \infty \right) \) is unbounded either when its denominator becomes zero or when \( {\phi_{{\text{NLMS}}\_\psi }} \) becomes unbounded because one of the denominators in its sum becomes zero. These two conditions allow us to determine the following conditions for the maximum step size:

$$ \mu<2/{\phi_{{\text{NLMS}}\_\psi }}, $$
(32a)
$$ 0<\mu<{A_\psi }\left( {\sigma_e^2\left( \infty \right)} \right){I_i}\left( \Lambda \right)/\left[ {{C_\psi }\left( {\sigma_e^2\left( \infty \right)} \right){\lambda_i}{I_{ii}}\left( \Lambda \right)} \right]. $$
(32b)

For the conventional LMS algorithm, \( {A_\psi } \), \( {C_\psi } \), \( {I_i}\left( \Lambda \right) \), \( I_i^\prime \left( \Lambda \right) \), and \( {I_{ii}}\left( \Lambda \right) \) are all equal to one, and (32a) and (32b) are identical to the necessary and sufficient conditions for the mean square convergence of the LMS algorithm previously obtained in [14]. Similar results are obtained in [15] by solving the difference equation in Φ(n) and in [16] by a matrix analysis technique. Furthermore, a lower bound of the maximum step size for mean square convergence is obtained in [15]. Here, we extend this approach to study the stability bound of our algorithms. To this end, we write (32a) in full as follows

$$ {S_\psi }\left( {\sigma_e^2\left( \infty \right)} \right)\mathop \Sigma \nolimits_{i = 1}^L \tfrac{{\mu {\lambda_i}I_i^\prime \left( \Lambda \right)}}{{{A_\psi }\left( {\sigma_e^2\left( \infty \right)} \right){I_i}\left( \Lambda \right) - \mu {C_\psi }\left( {\sigma_e^2\left( \infty \right)} \right){\lambda_i}{I_{ii}}\left( \Lambda \right)}}<2. $$
(33)

For convenience, rewrite (33) as

$$ \mathop \Sigma \nolimits_{i = 1}^L \frac{{\mu {\lambda_i}{c_i}}}{{1 - \mu {\lambda_i}{d_i}}}<2, $$
(34)

where \( {c_i} = I_i^\prime \left( \Lambda \right)/\left( {{A_\psi }\left( {\sigma_e^2\left( \infty \right)} \right){I_i}\left( \Lambda \right)} \right) \) and \( {d_i} = {C_\psi }\left( {\sigma_e^2\left( \infty \right)} \right){I_{ii}}\left( \Lambda \right)/\left( {{A_\psi }\left( {\sigma_e^2\left( \infty \right)} \right){I_i}\left( \Lambda \right)} \right) \).

The left-hand side of (34) has singularities at \( \mu = 1/\left( {{\lambda_i}{d_i}} \right) \). Between 0 and \( \mathop {\min }\limits_i \left( {1/\left( {{\lambda_i}{d_i}} \right)} \right) \), it is a monotonically increasing function of μ. Therefore, it reaches the critical value of two on the right-hand side of the inequality at the smallest root \( {\mu_{\max }} \) of the equation

$$ \mathop \Sigma \nolimits_{i = 1}^L \frac{{\mu {\lambda_i}{c_i}}}{{1 - \mu {\lambda_i}{d_i}}} = 2, $$
(35)

which occurs before \( \mathop {\min }\limits_i \left( {1/\left( {{\lambda_i}{d_i}} \right)} \right) \). Any positive step size μ below \( {\mu_{\max }} \) ensures the mean square convergence of the algorithm.

Let \( u = 2{\mu^{ - 1}} \) and rewrite (35) as

$$ \ell (u) - \mathop \Sigma \nolimits_{i = 1}^L {\lambda_i}{c_i}{l_i}(u) = \mathop \Pi \limits_{i = 1}^L \left( {u - {{\bar u}_i}} \right) = \mathop \Sigma \nolimits_{i = 0}^L {\left( { - 1} \right)^{L - i}}\;{b_{L - i}}{u^i} = 0, $$
(36)

where \( \ell (u) = \mathop \Pi \limits_{i = 1}^L \left( {u - 2{\lambda_i}{d_i}} \right) \), \( {l_i}(u) = \ell (u)/\left( {u - 2{\lambda_i}{d_i}} \right) \), and \( {\overline u_i} \) are the roots of (36). The largest root of (36), which corresponds to the smallest root \( {\mu_{\max }} \) of (35), is upper bounded by [47]

$$ {u_{N\_\max }} \leqslant \tfrac{1}{L}\left[ {{s_1} + \sqrt {\left( {L - 1} \right)\left( {L{s_2} - s_1^2} \right)} } \right], $$
(37)

where \( {s_1} = \mathop \Sigma \nolimits_{i = 1}^L {\overline u_i} = {b_1} \) and \( {s_2} = \mathop \Sigma \nolimits_{i = 1}^L \bar u_i^2 = b_1^2 - {2b_2} \).

By comparing the coefficients on different sides of (36), one also gets

$$ {b_1} = \mathop \Sigma \nolimits_{i = 1}^L {\lambda_i}\left( {2{d_i} + {c_i}} \right) $$

and

$$ {b_2} = 4\mathop \Sigma \limits_{1 \leqslant i \ne j \leqslant L} {\lambda_i}{\lambda_j}{d_i}{d_j} + \mathop \Sigma \nolimits_{i = 1}^L {\lambda_i}{c_i}\left( {2\mathop \Sigma \limits_{1 \leqslant j \ne i \leqslant L} {\lambda_j}{d_j}} \right). $$
(38)

From (37), a more convenient lower bound of \( {\mu_{\max }} \) can be obtained as follows

$$ \begin{array}{*{20}{c}} {{\mu_{\max }} \geqslant \tfrac{{2L}}{{{b_1} + \sqrt {{{\left( {L - 1} \right)}^2}b_1^2 - 2L\left( {L - 1} \right){b_2}} }}} \\ { \geqslant \tfrac{{2L}}{{{b_1} + \sqrt {{{\left( {L - 1} \right)}^2}b_1^2} }} = 2/{b_1} \equiv {\mu_B}} \\ { = \tfrac{{2{A_\psi }\left( {\sigma_e^2\left( \infty \right)} \right)}}{{{S_\psi }\left( {\sigma_e^2\left( \infty \right)} \right)\mathop \Sigma \nolimits_{i = 1}^L {\lambda_i}\left[ {I_i^\prime \left( \Lambda \right)/{I_i}\left( \Lambda \right) + 2{C_\psi }\left( {\sigma_e^2\left( \infty \right)} \right){I_{ii}}\left( \Lambda \right)/{I_i}\left( \Lambda \right)} \right]}}.} \\ \end{array} $$
(39)

It can be seen from (39) that \( {A_\psi }\left( {\sigma_e^2\left( \infty \right)} \right) \), \( {C_\psi }\left( {\sigma_e^2\left( \infty \right)} \right) \) and \( {S_\psi }\left( {\sigma_e^2\left( \infty \right)} \right) \) also depend on \( \sigma_e^2\left( \infty \right) \), and the exact step size bound is still very difficult to obtain analytically. For M-nonlinearities with ATS, they are approximately constant and the step size bound can be determined accordingly from \( {\mu_B} \). Alternatively, one can replace \( {A_\psi }\left( {\sigma_e^2\left( \infty \right)} \right) \), \( {C_\psi }\left( {\sigma_e^2\left( \infty \right)} \right) \) and \( {S_\psi }\left( {\sigma_e^2\left( \infty \right)} \right) \) by their worst-case values to estimate the step size bound \( {\mu_B} \).
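
Given the \( {\lambda_i} \), \( {c_i} \) and \( {d_i} \) of (34), the bounds in (37)–(39) are straightforward to evaluate; a minimal sketch is shown below (Python; the routine name is an assumption, and the \( {c_i} \), \( {d_i} \) are assumed to be supplied, e.g., from the generalized Abelian integrals):

```python
import numpy as np

def mu_max_lower_bounds(lams, c, d):
    """Lower bounds on mu_max from (37)-(39). Returns (bound via (37), mu_B = 2/b1)."""
    lams, c, d = (np.asarray(a, dtype=float) for a in (lams, c, d))
    L = len(lams)
    ld = lams * d
    b1 = np.sum(lams * (2.0 * d + c))                               # Eq. (38)
    b2 = 2.0 * (np.sum(ld) ** 2 - np.sum(ld ** 2)) \
        + 2.0 * np.sum(lams * c * (np.sum(ld) - ld))                # Eq. (38)
    s1, s2 = b1, b1 ** 2 - 2.0 * b2
    u_max = (s1 + np.sqrt((L - 1) * (L * s2 - s1 ** 2))) / L        # Eq. (37)
    return 2.0 / u_max, 2.0 / b1                                    # mu_max >= 2/u_max >= 2/b1
```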

In passing, we find from simulation results that the term \( {\mu^2}{S_\psi }\left( {\sigma_e^2} \right)\mathop \Sigma \limits_k {\lambda_k}{\lambda_i}{I_{ki}}\left( \Lambda \right){\Phi_{k,k}}(n) \) in (29) is very small for small EMSE. Consequently, (29) can be approximated as:

$$ \begin{array}{*{20}{c}} {{\Phi_{i,i}}\left( {n + 1} \right) \approx \left( {1 - 2\mu {A_\psi }\left( {\sigma_e^2} \right){\lambda_i}{I_i}\left( \Lambda \right) + 2{\mu^2}{C_\psi }\left( {\sigma_e^2} \right)\lambda_i^2{I_{ii}}\left( \Lambda \right)} \right){\Phi_{i,i}}(n)} \\ { + {\mu^2}{S_\psi }\left( {\sigma_e^2} \right)\sigma_g^2{\lambda_i}I_i^\prime \left( \Lambda \right).} \\ \end{array} $$

For notational convenience, we have dropped the time index n in \( \sigma_e^2(n) = E\left[ {{v^T}(n){R_{XX}}v(n)} \right] + \sigma_g^2 \). The algorithm will converge when \( \left| {1 - 2\mu {A_\psi }\left( {\sigma_e^2} \right){\lambda_i}{I_i}\left( \Lambda \right) + 2{\mu^2}{C_\psi }\left( {\sigma_e^2} \right)\lambda_i^2{I_{ii}}\left( \Lambda \right)} \right|<1 \), which gives (32b): \( \mu<{A_\psi }\left( {\sigma_e^2} \right){I_i}\left( \Lambda \right)/\left( {{C_\psi }\left( {\sigma_e^2} \right){\lambda_i}{I_{ii}}\left( \Lambda \right)} \right) \) for all i. That is, the results of (32a) and (32b) are very close to each other. From the definitions of \( {I_i}\left( \Lambda \right) \) and \( {I_{ii}}\left( \Lambda \right) \), we then have \( {I_i}\left( \Lambda \right)/\left( {{\lambda_i}{I_{ii}}\left( \Lambda \right)} \right) = 2/\left( {1 - I_i^{\prime \prime }\left( \Lambda \right)/{I_i}\left( \Lambda \right)} \right) > 2 \), where \( I_i^{\prime \prime }\left( \Lambda \right) = \int_0^\infty {\exp \left( { - \beta \varepsilon } \right)\left[ {\mathop \Pi \limits_{k = 1}^L {{\left( {2\beta {\lambda_k} + 1} \right)}^{ - 1/2}}} \right]} {\left( {2\beta {\lambda_i} + 1} \right)^{ - 2}}d\beta \). Therefore, a conservative stability bound for small EMSE is \( {\mu_B}<2{A_\psi }\left( {\sigma_e^2} \right)/{C_\psi }\left( {\sigma_e^2} \right) \). For M-nonlinearities with ATS, \( {\mu_B}<2{A_\psi }/{C_\psi } \), which is a useful “rule of thumb” step size bound because it does not require any prior knowledge of the input signal.

Remarks

  (R-A4): LMS algorithms with error nonlinearity:

When \( {I_i}\left( \Lambda \right) = I_i^\prime \left( \Lambda \right) = {I_{ii}}\left( \Lambda \right) = 1 \), our analysis will reduce to that for the LMS algorithm with general nonlinearity. (31) will reduce to

$$ {\text{EMS}}{{\text{E}}_{{\text{LMS}}\_\psi }}\left( \infty \right) \approx \frac{{\tfrac{1}{2}\mu \sigma_g^2{\phi_{{\text{LMS}}\_\psi }}}}{{1 - \tfrac{1}{2}\mu {\phi_{{\text{LMS}}\_\psi }}}}, $$
(40)

where \( {\phi_{{\text{LMS}}\_\psi }} = {S_\psi }\left( {\sigma_e^2\left( \infty \right)} \right)\mathop \Sigma \nolimits_{i = 1}^L \tfrac{{{\lambda_i}}}{{{A_\psi }\left( {\sigma_e^2\left( \infty \right)} \right) - \mu {C_\psi }\left( {\sigma_e^2\left( \infty \right)} \right){\lambda_i}}} \); \( {B_\psi }\left( {\sigma_e^2} \right) \), \( {S_\psi }\left( {\sigma_e^2} \right) = {B_\psi }\left( {\sigma_e^2} \right)/\sigma_e^2 \) and \( {C_\psi }\left( {\sigma_e^2} \right) \) for some related algorithms are summarized in Table 1. For the DS algorithm [34], (40) reduces to [34, Eq. (24)]. If μ is sufficiently small, then the contribution of the weight-error vector to \( \sigma_e^2\left( \infty \right) \) can be ignored. Accordingly, \( \sigma_e^2\left( \infty \right) \approx \sigma_g^2 \) and (40) can be simplified to \( {\text{EMS}}{{\text{E}}_{{\text{LMS}}\_\psi }}\left( \infty \right) \approx \tfrac{{\mu {S_\psi }\left( {\sigma_e^2\left( \infty \right)} \right)\sigma_g^2}}{{2{A_\psi }\left( {\sigma_e^2\left( \infty \right)} \right)}}\mathop \Sigma \nolimits_{i = 1}^L {\lambda_i} \). This agrees with [34, Eq. (25)] for the DS algorithm. Further, if the input is white, it reduces to [35, Eq. 45] for the EF nonlinearity case.

For M-nonlinearities with ATS, \( {A_\psi }\left( {\sigma_e^2} \right) \), \( {S_\psi }\left( {\sigma_e^2} \right) \) and \( {C_\psi }\left( {\sigma_e^2} \right) \) are approximately constant (cf. Appendix C). Denote them by \( {A_\psi } \), \( {S_\psi } \) and \( {C_\psi } \), respectively. Then, (40) becomes

$$ {\text{EMS}}{{\text{E}}_{{\text{LMS}}\_\psi }}\left( \infty \right) \approx \frac{{\tfrac{1}{2}\mu \sigma_g^2{\phi_{{\text{LMS}}\_\psi }}}}{{1 - \tfrac{1}{2}\mu {\phi_{{\text{LMS}}\_\psi }}}}, $$
(41)

where \( {\phi_{{\text{LMS}}\_\psi }} = {S_\psi }\mathop \Sigma \nolimits_{i = 1}^L \tfrac{{{\lambda_i}}}{{{A_\psi } - \mu {\lambda_i}{C_\psi }}} \). The stability bound from (32) is \( \mu<\min \left( {2/{\phi_{{\text{LMS}}\_\psi }},\,{A_\psi }/\left( {{C_\psi }{\lambda_i}} \right)} \right) \), from which we can obtain its lower bound from (38) as follows:

$$ {\mu_{B\_{\text{LMS}}\_\psi }} = \frac{{2{A_\psi }}}{{{S_\psi }\mathop \Sigma \nolimits_{i = 1}^L {\lambda_i}\left( {1 + 2{C_\psi }} \right)}}. $$
(42)

If ψ(e) = e, then \( {A_\psi } = {S_\psi } = {C_\psi } = 1 \) and we obtain the traditional LMS algorithm; (40) and (42) then respectively reduce to the EMSE(∞) and the stability bound for the conventional LMS algorithm derived in [15]. For the LMM algorithm using the MH nonlinearity with a practical value of \( {k_\xi } = 2.576 \), \( {A_\psi } \), \( {S_\psi } \), and \( {C_\psi } \) are quite close to one, and its performance is therefore similar to that of the conventional LMS algorithm.
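
Under the constant-coefficient approximation above, (41) and (42) can be evaluated directly; a minimal sketch is given below (Python; the function names are assumptions, and the defaults \( {A_\psi } = {S_\psi } = {C_\psi } = 1 \) correspond to the conventional LMS algorithm):

```python
import numpy as np

def emse_lms_psi(lams, mu, sigma_g2, A_psi=1.0, S_psi=1.0, C_psi=1.0):
    """Steady-state EMSE of the LMS algorithm with M-nonlinearity and ATS, Eq. (41)."""
    lams = np.asarray(lams, dtype=float)
    phi = S_psi * np.sum(lams / (A_psi - mu * C_psi * lams))
    return 0.5 * mu * sigma_g2 * phi / (1.0 - 0.5 * mu * phi)

def mu_bound_lms_psi(lams, A_psi=1.0, S_psi=1.0, C_psi=1.0):
    """Lower bound of the maximum step size, Eq. (42)."""
    return 2.0 * A_psi / (S_psi * np.sum(np.asarray(lams)) * (1.0 + 2.0 * C_psi))
```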

  (R-A5): NLMS algorithms with error nonlinearity:

For the normalized version, (31) will be a good approximation at the steady state of the algorithm and for the NLMS algorithms using M-nonlinearities and ATS (c.f. the approximation used in Appendix B). For the latter, (31) can be simplified to

$$ {\text{EMS}}{{\text{E}}_{{\text{NLMS}}\_\psi }}\left( \infty \right) \approx \tfrac{{\mu \sigma_e^2\left( \infty \right)}}{2}{\phi_{{\text{NLMS}}\_\psi }}, $$
(43)

where \( {\phi_{{\text{NLMS}}\_\psi }} = \tfrac{{{S_\psi }}}{{{A_\psi }}}\mathop \Sigma \nolimits_{i = 1}^L \tfrac{{{\lambda_i}I_i^\prime \left( \Lambda \right)}}{{{I_i}\left( \Lambda \right) - \mu {\lambda_i}{I_{ii}}\left( \Lambda \right)\left( {{C_\psi }/{A_\psi }} \right)}} \). Solving (43) yields

$$ {\text{EMS}}{{\text{E}}_{{\text{NLMS}}\_\psi }}\left( \infty \right) \approx \frac{{\tfrac{1}{2}\mu \sigma_g^2{\phi_{{\text{NLMS}}\_\psi }}}}{{1 - \tfrac{1}{2}\mu {\phi_{{\text{NLMS}}\_\psi }}}}. $$
(44)

The stability bound from (32) satisfies \( \mu<2/{\phi_{{\text{NLMS}}\_\psi }} \) and \( 0<\mu<{A_\psi }{I_i}\left( \Lambda \right)/\left( {{C_\psi }{\lambda_i}{I_{ii}}\left( \Lambda \right)} \right) \), from which we can obtain its lower bound from (39) as follows:

$$ {\mu_{B\_{\text{NLMS}}\_\psi }} = \tfrac{{2{A_\psi }}}{{{S_\psi }\mathop \Sigma \nolimits_{i = 1}^L {\lambda_i}\left[ {I_i^\prime \left( \Lambda \right)/\left( {{I_i}\left( \Lambda \right)} \right) + 2{C_\psi }{I_{ii}}\left( \Lambda \right)/{I_i}\left( \Lambda \right)} \right]}}. $$
(45)

For ψ(e) = e and \( {A_\psi } = {S_\psi } = {C_\psi } = 1 \), we obtain the traditional NLMS algorithm; (43) and (45) then reduce to the EMSE(∞) and the stability bound for the conventional NLMS algorithm derived in [37]. For the NLMM algorithm using the MH nonlinearity with a practical value of \( {k_\xi } = 2.576 \), \( {A_\psi } \), \( {S_\psi } \), and \( {C_\psi } \) are quite close to one, and its performance is therefore similar to that of the conventional NLMS algorithm.

3.2 Convergence Behaviors of the NLMS Algorithm with Error Nonlinearity and Gaussian Input and CG Noise

We now analyze the mean and mean square behaviors of the various algorithms studied in Section 3.1 in a CG noise environment. From the previous analysis, we note that the assumption of Gaussian input and additive noise allows us to use Price’s theorem to approximately decouple the effect of the nonlinearity. For most M-estimate functions, which suppress outliers with large amplitude, the convergence rate is only slightly impaired and the EMSE is similar when ATS is used. We shall show in the following that this is paid off by their improved robustness to impulsive noise. At first sight, Price’s theorem does not apply to Gaussian mixtures. However, as pointed out in [36] and explained below, it is actually applicable to the individual components of the mixture. Moreover, for the LMM and NLMM algorithms, the error signal is nearly Gaussian distributed after the impulsive component is suppressed. This also explains why the approximations in [12, 41] give excellent agreement with Monte Carlo simulations. A detailed analysis is given below.

3.2.1 Mean Behavior

Assume that the additive noise \( {\eta_0} \) is now a CG noise as defined in (12). It is a Gaussian mixture consisting of two components \( {\eta_{0\_1}} \) and \( {\eta_{0\_2}} \), each with zero mean and with variances \( \sigma_1^2 = \sigma_g^2 \) and \( \sigma_2^2 = \sigma_\Sigma^2 \), respectively. The occurrence probability of the impulsive noise is \( {p_r} \). Accordingly,

$$ \begin{array}{*{20}{c}} {{E_{\left\{ {v,X,{\eta_0}} \right\}}}\left[ {f\left( {X(n),e(n)} \right)} \right] = \left( {1 - {p_r}} \right){E_{\left\{ {v,X,{\eta_{0\_1}}} \right\}}}\left[ {f\left( {X(n),e(n)} \right)} \right]} \\ { + {p_r}{E_{\left\{ {v,X,{\eta_{0\_2}}} \right\}}}\left[ {f\left( {X(n),e(n)} \right)} \right],} \\ \end{array} $$
(46)

where f(X(n), e(n)) is an arbitrary quantity whose statistical average is to be evaluated. Since X(n), \( {\eta_{0\_1}} \), and \( {\eta_{0\_2}} \) are Gaussian distributed, each of the expectations on the right-hand side can be evaluated using Price's theorem. Consequently, the results in Section 3.1 can be carried forward to the CG case by first changing the noise power to \( \sigma_g^2 \) and \( \sigma_\Sigma^2 \), respectively, and then combining the two results using (46).
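The decomposition (46) is easy to check numerically. The sketch below draws CG noise directly and compares the sample mean of an arbitrary test function of the noise with the \( {p_r} \)-weighted combination of its per-component means; the test function and all parameter values are illustrative only (in the analysis, f also depends on X(n) and v(n)).

```python
import numpy as np

# Monte Carlo check of the mixture decomposition in Eq. (46); illustrative values.
rng = np.random.default_rng(0)
p_r, sigma_g, sigma_S = 0.01, 0.01, 0.5      # assumed component std's (sigma_S = sigma_Sigma)
N = 1_000_000

f = lambda eta: np.tanh(eta) ** 2            # arbitrary bounded test quantity

# Left-hand side: draw directly from the CG mixture
b   = rng.random(N) < p_r
eta = np.where(b, rng.normal(0, sigma_S, N), rng.normal(0, sigma_g, N))
lhs = f(eta).mean()

# Right-hand side: p_r-weighted sum of the per-component expectations
rhs = (1 - p_r) * f(rng.normal(0, sigma_g, N)).mean() \
      + p_r     * f(rng.normal(0, sigma_S, N)).mean()

print(f"E_CG[f] = {lhs:.6f},  (1-p_r)E_1[f] + p_r E_2[f] = {rhs:.6f}")
```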

Recall the relation of the mean weight-error vector in (15):

$$ E\left[ {v\left( {n + 1} \right)} \right] = E\left[ {v(n)} \right] - \mu H\prime $$
(47)

where \( H\prime = {E_{\left\{ {v,X,{\eta_0}} \right\}}}\left[ {\frac{{\psi \left( {e(n)} \right)X(n)}}{{\varepsilon + {X^T}(n)X(n)}}} \right] = \left( {1 - {p_r}} \right)H_1^\prime + {p_r}H_2^\prime \), and \( H_1^\prime \) and \( H_2^\prime \) are the expectations of the term inside the brackets with respect to {v, X, \( {\eta_{0\_1}} \)} and {v, X, \( {\eta_{0\_2}} \)}, respectively. From (17), we have \( H_i^\prime \approx {A_\psi }\left( {\sigma_{{e_i}}^2} \right)U\Lambda {D_\Lambda }{U^T}E\left[ {v(n)} \right] \), i = 1, 2, where \( \sigma_{{e_1}}^2 = \sigma_{{e_g}}^2 \) and \( \sigma_{{e_2}}^2 = \sigma_{{e_\Sigma }}^2 \). Hence

$$ H\prime \approx {\tilde A_\psi }\left( {\sigma_{{e_g}}^2,\sigma_{{e_\Sigma }}^2} \right)U\Lambda {D_\Lambda }{U^T}E\left[ {v(n)} \right], $$
(48)

where \( \sigma_{{e_g}}^2 = E\left[ {{v^T}(n){R_{XX}}v(n)} \right] + \sigma_g^2 \), \( \sigma_{{e_\Sigma }}^2 = E\left[ {{v^T}(n){R_{XX}}v(n)} \right] + \sigma_\Sigma^2 \), \( {\tilde A_\psi }\left( {\sigma_{{e_g}}^2,\sigma_{{e_\Sigma }}^2} \right) = \left( {1 - {p_r}} \right){A_\psi }\left( {\sigma_{{e_g}}^2} \right) + {p_r}{A_\psi }\left( {\sigma_{{e_\Sigma }}^2} \right) \), and U, Λ, and \( {D_\Lambda } \) have been defined in Section 3.1. Substituting (48) into (47) and using the natural coordinate \( V(n) = {U^T}v(n) \), one gets

$$ E{\left[ {V\left( {n + 1} \right)} \right]_i} = \left( {1 - \mu {{\mathop A\limits^\sim }_\psi }\left( {\sigma_{{e_g}}^2,\sigma_{{e_\Sigma }}^2} \right){\lambda_i}{I_i}\left( \Lambda \right)} \right)E{\left[ {V(n)} \right]_i}. $$
(49)

For notational convenience, we have replaced the approximation symbol by the equality symbol. This yields the same form as (20), except for \( {\mathop A\limits^\sim_\psi }\left( {\sigma_{{e_g}}^2,\sigma_{{e_\Sigma }}^2} \right) \). A similar argument regarding the mean convergence in Section 3.1.1 also applies to (49). A sufficient condition for the algorithm to converge is \( \left| {1 - \mu {{\mathop A\limits^\sim }_\psi }\left( {\sigma_{{e_g}}^2,\sigma_{{e_\Sigma }}^2} \right){\lambda_i}{I_i}\left( \Lambda \right)} \right|<1 \), for all i. If \( {A_\psi }\left( {\sigma_e^2} \right) \) is bounded above by \( {A_{\psi \_\max }} \), then, following the argument in Section 3.1.1, the following conservative maximum step size is obtained:

$$ {\mu_{\max }}<2/\left( {{{\mathop A\limits^\sim }_{\psi \_\max }}{\lambda_{\max }}{I_{i\_{\lambda_{\max }}}}\left( \Lambda \right)} \right). $$
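The per-mode recursion (49) can be iterated directly once \( {\mathop A\limits^\sim_\psi } \) has settled to a nearly constant value. A minimal sketch follows; the eigenvalues, integrals, and the value of \( {\mathop A\limits^\sim_\psi } \) are hypothetical placeholders.

```python
import numpy as np

# Sketch of the per-mode mean recursion (49):
# E[V(n+1)]_i = (1 - mu * A_tilde * lambda_i * I_i(Lambda)) * E[V(n)]_i.
lam     = np.array([1.0, 0.5, 0.1])          # eigenvalues (assumed)
I_i     = np.array([0.30, 0.32, 0.35])       # I_i(Lambda) (assumed)
A_tilde = 0.98                               # tilde{A}_psi near one (e.g. MH with ATS)
mu      = 0.2

rates = 1.0 - mu * A_tilde * lam * I_i       # per-mode geometric decay factors
assert np.all(np.abs(rates) < 1), "step size violates the mean stability condition"

V = np.ones_like(lam)                        # initial mean weight error in natural coordinates
for n in range(200):
    V = rates * V
print("decay factors:", rates, " |E[V(200)]|:", np.abs(V))
```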

Remarks

  1. (R-B1):

    LMS and NLMS algorithms

Compared with the Gaussian case, the convergence rate of the conventional LMS and NLMS algorithms without error nonlinearity remains unchanged since \( {\mathop A\limits^\sim_\psi }\left( {\sigma_{{e_g}}^2,\sigma_{{e_\Sigma }}^2} \right) = 1 \), though the EMSE will be increased significantly as shown below in Section 3.2.2. All the conclusions in (R-A1) and (R-A2) apply.

For a general nonlinearity without ATS, both \( \sigma_{{e_g}}^2 \) and \( \sigma_{{e_\Sigma }}^2 \) can be very large, owing to the large impulse variance \( \sigma_\Sigma^2 \) and the slow decay of the EMSE \( E\left[ {{v^T}(n){R_{XX}}v(n)} \right] \), because the gain \( {\mathop A\limits^\sim_\psi }\left( {\sigma_{{e_g}}^2,\sigma_{{e_\Sigma }}^2} \right) = \left( {1 - {p_r}} \right){A_\psi }\left( {\sigma_{{e_g}}^2} \right) + {p_r}{A_\psi }\left( {\sigma_{{e_\Sigma }}^2} \right) \) can be very small initially. This leads to a nonlinear adaptation and slow convergence. Near convergence, \( \sigma_{{e_g}}^2 \), and hence \( {\mathop A\limits^\sim_\psi }\left( {\sigma_{{e_g}}^2,\sigma_{{e_\Sigma }}^2} \right) \), settles to a nearly constant value. The convergence is then exponential and the rate for the i-th mode is approximately \( 1 - \mu {\mathop A\limits^\sim_\psi }\left( \infty \right){\lambda_i}{I_i}\left( \Lambda \right) \). The actual steady-state value of \( {\mathop A\limits^\sim_\psi }\left( {\sigma_{{e_g}}^2\left( \infty \right),\sigma_{{e_\Sigma }}^2\left( \infty \right)} \right) \) depends on the EMSE and the nonlinearity. Normally, the second term \( {p_r}{A_\psi }\left( {\sigma_{{e_\Sigma }}^2} \right) \) is much smaller than the first due to the clipping property of the nonlinearity. The “asymptotic convergence rate” of the NLMS algorithms with a general nonlinearity will still be faster than that of their LMS counterparts if the eigenvalues are unequal.

  1. (R-B2):

    LMS and NLMS with M-nonlinearity and ATS

For the LMM/NLMM algorithms with M-nonlinearity and ATS, this degradation is again not very serious. To see this, consider the MH function where \( {\mathop A\limits^\sim_{\text{MH}}}\left( {\sigma_{{e_g}}^2,\sigma_{{e_\Sigma }}^2} \right) \) is given by

$$ {\mathop A\limits^\sim_{\text{MH}}}\left( {\sigma_{{e_g}}^2,\sigma_{{e_\Sigma }}^2} \right) = \left( {1 - {p_r}} \right){A_{\text{MH}}}\left( {\sigma_{{e_g}}^2} \right) + {p_r}{A_{\text{MH}}}\left( {\sigma_{{e_\Sigma }}^2} \right). $$
(50)

Using the ξ update in (9), the first term will approach \( \left( {1 - {p_r}} \right){A_{\text{MH}}}\left( {\sigma_{{e_g}}^2} \right) \) while the second term will be close to zero if \( \sigma_{{e_g}}^2 \ll \sigma_{{e_\Sigma }}^2 \), as explained at the end of Appendix C. Hence, \( {\mathop A\limits^\sim_{\text{MH}}} \approx \left( {1 - {p_r}} \right){A_{\text{MH}}}\left( {\sigma_{{e_g}}^2} \right) \), which is a constant close to one if \( {p_r} \) is not too large.
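As a rough numerical illustration, assume that \( {A_{\text{MH}}}\left( {\sigma_e^2} \right) = {\text{erf}}\left( {\xi /\left( {\sqrt 2 {\sigma_e}} \right)} \right) \) for the clipping MH nonlinearity (this form is consistent with the erf expression quoted in Section 3.2.2; the exact entries are in Table 1) and that the ATS threshold settles to \( \xi = {k_\xi }{\sigma_{{e_g}}} \). The variance ratio used below is an arbitrary choice.

```python
import math

# Numeric sketch of Eq. (50) under an assumed A_MH(sigma_e^2) = erf(xi/(sqrt(2)*sigma_e)).
k_xi  = 2.576
p_r   = 0.01
ratio = 300.0                                # assumed sigma_{e_Sigma}^2 / sigma_{e_g}^2

# s_ratio = sigma_e / sigma_{e_g}, with xi = k_xi * sigma_{e_g}
A_MH = lambda s_ratio: math.erf(k_xi / (math.sqrt(2.0) * s_ratio))

A_tilde = (1 - p_r) * A_MH(1.0) + p_r * A_MH(math.sqrt(ratio))
print(f"A_MH(sigma_eg^2) = {A_MH(1.0):.3f},  tilde A_MH = {A_tilde:.3f}")   # both close to one
```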

The fastest convergence rate of these NLMS algorithms with M-nonlinearity occurs when \( \mu = 1/\left( {{{\mathop A\limits^\sim }_\psi }{\lambda_{\max }}{I_{i\_{\lambda_{\max }}}}\left( \Lambda \right)} \right) \) and it is limited by the mode corresponding to the smallest eigenvalue. That is

$$ 1 - \frac{{2{{\mathop A\limits^\sim }_\psi }{\lambda_{\min }}{I_{i\_{\lambda_{\min }}}}\left( \Lambda \right)}}{{{{\mathop A\limits^\sim }_\psi }{\lambda_{\max }}{I_{i\_{\lambda_{\max }}}}\left( \Lambda \right)}} = 1 - \frac{{2{\lambda_{\min }}{I_{i\_{\lambda_{\min }}}}\left( \Lambda \right)}}{{{\lambda_{\max }}{I_{i\_{\lambda_{\max }}}}\left( \Lambda \right)}}. $$
(51)

Therefore, in additive CG noise, the maximum convergence rate of the NLMS algorithm with M-nonlinearity will also be faster than its un-normalized counterpart if the eigenvalues are unequal.

3.2.2 Mean Square Behavior

For convenience, we drop the argument \( \left( {\sigma_{{e_g}}^2,\sigma_{{e_\Sigma }}^2} \right) \) in \( {\mathop A\limits^\sim_\psi }\left( {\sigma_{{e_g}}^2,\sigma_{{e_\Sigma }}^2} \right) \) and similar quantities. From (23), we have

$$ {\mathbf{\Xi }}\left( {n + 1} \right) = {\mathbf{\Xi }}(n) - M_1^\prime - M_2^\prime + M_3^\prime, $$
(52)

where \( M_1^\prime = \mu {\mathop A\limits^\sim_\psi }U\Lambda {D_\Lambda }{U^T}{\mathbf{\Xi }}(n) \), \( M_2^\prime = \mu {\mathop A\limits^\sim_\psi }{\mathbf{\Xi }}(n)U{D_\Lambda }\Lambda {U^T} \), and \( M_3^\prime = {\mu^2}{E_{\left\{ {v,X,{\eta_0}} \right\}}}\left[ {{{\left[ {\psi (e)/\left( {\varepsilon + {X^T}X} \right)} \right]}^2}X{X^T}} \right] \). Note that the expressions for \( M_1^\prime \) and \( M_2^\prime \) are obtained using the previous result in (48). Using (46), we have

$$ M_3^\prime = \left( {1 - {p_r}} \right)M_{3\_1}^\prime + {p_r}M_{3\_2}^\prime, $$
(53)

where \( M_{3\_i}^\prime = {\mu^2}{E_{\left\{ {v,X,{\eta_{0\_i}}} \right\}}}\left[ {{{\left[ {\psi (e)/\left( {\varepsilon + {X^T}X} \right)} \right]}^2}X{X^T}} \right] \), for i = 1, 2, and from (27):

$$ \begin{array}{*{20}{c}} {M_3^\prime \approx 2{\mu^2}{{\tilde C}_\psi }U\left[ {\Lambda \left( {{\mathbf{\Xi }}(n) \circ I\left( \Lambda \right)} \right)\Lambda } \right]{U^T} + {\mu^2}{{\tilde B}_\psi }U\Lambda I\prime \left( \Lambda \right){U^T}} \\ { + {\mu^2}{{\tilde S}_\psi }U{{\overline D }_2}{U^T},} \\ \end{array} $$
(54)

where \( {\mathop C\limits^\sim_\psi } = \left( {1 - {p_r}} \right){C_\psi }\left( {\sigma_{{e_g}}^2} \right) + {p_r}{C_\psi }\left( {\sigma_{{e_\Sigma }}^2} \right) \) and

$$ {\mathop S\limits^\sim_\psi } = \left( {1 - {p_r}} \right){S_\psi }\left( {\sigma_{{e_g}}^2} \right) + {p_r}{S_\psi }\left( {\sigma_{{e_\Sigma }}^2} \right) $$
$$ {\mathop B\limits^\sim_\psi } = \left( {1 - {p_r}} \right){S_\psi }\left( {\sigma_{{e_g}}^2} \right)\sigma_g^2 + {p_r}{S_\psi }\left( {\sigma_{{e_\Sigma }}^2} \right)\sigma_\Sigma^2. $$

Substituting (54) into (52) and using the natural coordinate \( {\mathbf{\Phi }}(n) = {U^T}{\mathbf{\Xi }}(n)U \), one gets

$$ \begin{array}{*{20}{c}} {{\mathbf{\Phi }}\left( {n + 1} \right) \approx {\mathbf{\Phi }}(n) - \mu {{\mathop A\limits^\sim }_\psi }\Lambda {D_\Lambda }{\mathbf{\Phi }}(n) - \mu {{\mathop A\limits^\sim }_\psi }{\mathbf{\Phi }}(n){D_\Lambda }\Lambda } \hfill \\ { + 2{{\mathop C\limits^\sim }_\psi }{\mu^2}\left[ {\Lambda \left( {{\mathbf{\Phi }}(n) \circ I\left( \Lambda \right)} \right)\Lambda } \right] + {\mu^2}{{\mathop B\limits^\sim }_\psi }\Lambda I\prime \left( \Lambda \right) + {\mu^2}{{\mathop S\limits^\sim }_\psi }{{\mathop D\limits^\sim }_2}\left( \Lambda \right).} \hfill \\ \end{array} $$
(55)

The i-th diagonal value of Φ(n) is

$$ \begin{array}{*{20}{c}} {{\Phi_{i,i}}\left( {n + 1} \right) \approx \left( {1 - 2\mu {{\mathop A\limits^\sim }_\psi }{\lambda_i}{I_i}\left( \Lambda \right) + 2{\mu^2}{{\mathop C\limits^\sim }_\psi }\lambda_i^2{I_{ii}}\left( \Lambda \right)} \right){\Phi_{i,i}}(n)} \hfill \\ { + {\mu^2}{{\mathop S\limits^\sim }_\psi }\mathop \Sigma \limits_k {\lambda_k}{\lambda_i}{I_{ki}}\left( \Lambda \right){\Phi_{k,k}}(n) + {\mu^2}{{\mathop B\limits^\sim }_\psi }{\lambda_i}I_i^\prime \left( \Lambda \right).} \hfill \\ \end{array} $$
(56)

This has the same form as (28) except for the constants \( {\mathop A\limits^\sim_\psi } \), \( {\mathop S\limits^\sim_\psi } \), \( {\mathop B\limits^\sim_\psi } \), and \( {\mathop C\limits^\sim_\psi } \). As mentioned previously, the last two terms of (55) are upper bounded at the steady state by \( {\mu^2}{\mathop B\limits^\sim_\psi }\left( {\sigma_e^2\left( \infty \right)} \right){\lambda_i}I_i^\prime \left( \Lambda \right) \), where \( {\mathop B\limits^\sim_\psi }\left( {\sigma_e^2\left( \infty \right)} \right) = E\left[ {\sigma_e^2\left( \infty \right){S_\psi }\left( {\sigma_e^2\left( \infty \right)} \right)} \right] \) and the expectation is taken over the CG noise. Consequently, we have

$$ {\text{EMS}}{{\text{E}}_{{\text{NLMS}}\_\psi }}\left( \infty \right) = {\text{Tr}}\left( {{\mathbf{\Phi }}\left( \infty \right)\Lambda } \right) \approx \frac{{\mu {{\mathop B\limits^\sim }_\psi }\left( {\sigma_e^2\left( \infty \right)} \right)}}{{2{{\mathop S\limits^\sim }_\psi }\left( {\sigma_e^2\left( \infty \right)} \right)}}{\mathop \phi \limits^\sim_{{\text{NLMS}}\_\psi }}, $$
(57)

where \( {\mathop \phi \limits^\sim_{{\text{NLMS}}\_\psi }} = {\mathop S\limits^\sim_\psi }\left( {\sigma_e^2\left( \infty \right)} \right)\mathop \Sigma \nolimits_{i = 1}^L \frac{{{\lambda_i}I_i^\prime \left( \Lambda \right)}}{{{{\mathop A\limits^\sim }_\psi }\left( {\sigma_e^2\left( \infty \right)} \right){I_i}\left( \Lambda \right) - \mu {{\mathop C\limits^\sim }_\psi }\left( {\sigma_e^2\left( \infty \right)} \right){\lambda_i}{I_{ii}}\left( \Lambda \right)}} \). In general, (57) is a nonlinear equation in the EMSE.
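Since (57) is nonlinear in the EMSE (through \( \sigma_e^2\left( \infty \right) \)), it can be solved numerically, for example by fixed-point iteration. The sketch below does this with placeholder eigenvalues and integrals; the ψ-dependent moments are set to the plain NLMS case so that the result can be checked against the closed form (58).

```python
import numpy as np

# Fixed-point solution of the steady-state relation (57):
# EMSE = (mu * B(sig_e2) / (2 * S(sig_e2))) * phi(sig_e2), with sig_e2 = EMSE + noise power.
lam  = np.array([1.0, 0.6, 0.3, 0.1])        # eigenvalues (assumed)
I_i  = np.array([0.26, 0.27, 0.28, 0.29])    # special integrals (assumed placeholders)
I_ip = np.array([0.07, 0.08, 0.09, 0.10])
I_ii = np.array([0.03, 0.04, 0.05, 0.06])
mu, noise_pow = 0.1, 1e-3                    # step size and sigma_{eta_0}^2 (assumed)

A = S = C = lambda s: 1.0                    # placeholder psi-dependent moments (NLMS case)
B = lambda s: s                              # B(sig_e2) = sig_e2 for psi(e) = e

def phi(s):
    return S(s) * np.sum(lam * I_ip / (A(s) * I_i - mu * C(s) * lam * I_ii))

emse = 0.0
for _ in range(100):                         # fixed-point iteration on Eq. (57)
    s = emse + noise_pow
    emse = 0.5 * mu * B(s) / S(s) * phi(s)

# Closed-form check for the NLMS case, Eq. (58)
p = phi(1.0)
emse_58 = 0.5 * mu * noise_pow * p / (1 - 0.5 * mu * p)
print(f"fixed point: {emse:.4e},  Eq. (58): {emse_58:.4e}")
```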

Remarks

  1. (R-B3):

    LMS and NLMS algorithms.

For the NLMS algorithm, \( {\mathop A\limits^\sim_\psi } = {\mathop C\limits^\sim_\psi } = {\mathop S\limits^\sim_\psi } = 1 \), \( {\mathop B\limits^\sim_\psi } = \left( {1 - {p_r}} \right)\sigma_g^2 + {p_r}\sigma_\Sigma^2 = \sigma_g^2 + {p_r}\sigma_w^2 = \sigma_{{\eta_0}}^2 \), and \( {\mathop B\limits^\sim_\psi }\left( {\sigma_e^2\left( \infty \right)} \right) = \sigma_e^2\left( \infty \right) \). Using the fact that \( \sigma_e^2\left( \infty \right) = {\text{EMS}}{{\text{E}}_{\text{NLMS}}}\left( \infty \right) + \sigma_{{\eta_0}}^2 \), one gets

$$ {\text{EMS}}{{\text{E}}_{\text{NLMS}}}\left( \infty \right) \approx \frac{{\frac{1}{2}\mu \sigma_{{\eta_0}}^2{\phi_{\text{NLMS}}}}}{{1 - \frac{1}{2}\mu {\phi_{\text{NLMS}}}}}. $$
(58)

Similar results are obtained for the LMS algorithm with \( {I_i}\left( \Lambda \right) = {I'_i}\left( \Lambda \right) = {I_{ii}}\left( \Lambda \right) = 1 \). The step size bounds are the same as the Gaussian case.

  1. (R-B4):

    LMS and NLMS algorithms with M-nonlinearities and ATS

In this case, \( {\mathop A\limits^\sim_\psi } \), \( {\mathop C\limits^\sim_\psi } \), and \( {\mathop S\limits^\sim_\psi } \) are nearly constant at the steady state. The step size bounds are similar to Eqs. (42) and (45) for the Gaussian noise case, except that \( {A_\psi } \) and \( {C_\psi } \) are now replaced by \( {\mathop A\limits^\sim_\psi } \) and \( {\mathop C\limits^\sim_\psi } \) respectively.

If \( \sigma_{{e_g}}^2 \ll \sigma_{{e_\Sigma }}^2 \), the threshold in the ATS, which is chosen as a reasonable multiple \( {k_\xi } \) of the “impulse-free” standard deviation of the estimation error, can be selected so as to reduce the adverse effect of the impulses. Consider, for example, \( {\mathop A\limits^\sim_\psi } = \left( {1 - {p_r}} \right){A_\psi }\left( {\sigma_{{e_g}}^2} \right) + {p_r}{A_\psi }\left( {\sigma_{{e_\Sigma }}^2} \right) \). As mentioned at the end of Appendix C and in remark (R-B2) above, the term \( {A_\psi }\left( {\sigma_{{e_\Sigma }}^2} \right) \) is usually small since the nonlinearity is designed to clip at the high noise level arising from the impulsive noise. Moreover, the first term is nearly a constant. As a result, \( {\mathop A\limits^\sim_\psi } \approx \left( {1 - {p_r}} \right){A_\psi } \), and similarly \( {\mathop S\limits^\sim_\psi } \approx \left( {1 - {p_r}} \right){S_\psi } \), \( {\mathop C\limits^\sim_\psi } \approx \left( {1 - {p_r}} \right){C_\psi } \), and \( {\mathop B\limits^\sim_\psi }\left( {\sigma_e^2\left( \infty \right)} \right) \approx \left( {1 - {p_r}} \right){S_\psi }\sigma_{{e_g}}^2\left( \infty \right) \). If the ATS is not employed, both \( {\mathop A\limits^\sim_\psi } \) and \( {\mathop C\limits^\sim_\psi } \), and hence the MSE, will be significantly affected by the impulses. Moreover, complicated nonlinear effects will be encountered, which generally lead to slower convergence and a higher EMSE, as discussed in the sequel.

By noting that \( \sigma_{{e_g}}^2\left( \infty \right) = {\text{EMS}}{{\text{E}}_{{\text{NLMM}}\_\psi }} + \sigma_g^2 \), (57) can be simplified to

$$ {\text{EMS}}{{\text{E}}_{{\text{NLMS}}\_\psi }}\left( \infty \right) \approx \frac{{\frac{1}{2}\mu \sigma_g^2{{\mathop \phi \limits^\sim }_{{\text{NLMS}}\_\psi }}}}{{1 - \frac{1}{2}\mu {{\mathop \phi \limits^\sim }_{{\text{NLMS}}\_\psi }}}}, $$
(59)

where \( {\mathop \phi \limits^\sim_{{\text{NLMS}}\_\psi }} = {\mathop S\limits^\sim_\psi }\mathop \Sigma \nolimits_{i = 1}^L \frac{{{\lambda_i}I_i^\prime \left( \Lambda \right)}}{{{{\mathop A\limits^\sim }_\psi }{I_i}\left( \Lambda \right) - \mu {{\mathop C\limits^\sim }_\psi }{\lambda_i}{I_{ii}}\left( \Lambda \right)}} \approx {S_\psi }\mathop \Sigma \nolimits_{i = 1}^L \frac{{{\lambda_i}I_i^\prime \left( \Lambda \right)}}{{{A_\psi }{I_i}\left( \Lambda \right) - \mu {C_\psi }{\lambda_i}{I_{ii}}\left( \Lambda \right)}} \), which is identical to that in Gaussian noise alone (cf. Eq. (41)). This also holds true for the LMS-based algorithms, where \( {I_i}\left( \Lambda \right) = I_i^\prime \left( \Lambda \right) = {I_{ii}}\left( \Lambda \right) = 1 \). The step size range for convergence is also identical to that in the Gaussian case.

For the NLMM algorithm using the MH function and ATS, the values of \( {\mathop A\limits^\sim_{\text{MH}}} \), \( {\mathop S\limits^\sim_{\text{MH}}} \) and \( {\mathop C\limits^\sim_{\text{MH}}} \) can be computed with the help of Table 1. Assuming \( {\sigma_g}<\xi \ll {\sigma_\Sigma } \), then \( {\mathop A\limits^\sim_{\text{MH}}} \approx {\mathop S\limits^\sim_{\text{MH}}} \approx \left( {1 - {p_r}} \right){\text{erf}}\left( {\frac{{{k_\xi }}}{{\sqrt 2 }}} \right) \), \( {\mathop C\limits^\sim_{\text{MH}}} \approx {\mathop A\limits^\sim_{\text{MH}}} - \left( {1 - {p_r}} \right)\left( {\frac{{k_\xi^3}}{{\sqrt {2\pi } }}} \right)\exp \left( { - \frac{{k_\xi^2}}{2}} \right) \). For the practical value \( {k_\xi } = 2.576 \), the second term of \( {\mathop C\limits^\sim_{\text{MH}}} \) above is small and \( {\mathop \phi \limits^\sim_{{\text{NLMS}}\_\psi }} \) in (59) is approximately given by

$$ {\mathop \phi \limits^\sim_{{\text{NLMS}}\_\psi }} \approx \mathop \Sigma \nolimits_{i = 1}^L \tfrac{{{\lambda_i}I_i^\prime \left( \Lambda \right)}}{{{I_i}\left( \Lambda \right) - \mu {\lambda_i}{I_{ii}}\left( \Lambda \right)}}, $$
(60)

which is identical to that of the NLMS algorithm in Gaussian noise alone. This approximation is more accurate when the step size is small. This applies to the LMM algorithm with \( {I_i}\left( \Lambda \right) = I_i^\prime \left( \Lambda \right) = {I_{ii}}\left( \Lambda \right) = 1 \).

In [37], the performance of the NLMS algorithm in Gaussian noise was analyzed and it was shown that \( \frac{1}{2}\mu \sigma_g^2 \) serves as a useful lower bound for estimating the EMSE (misadjustment) of the NLMS algorithm. This bound is attractive because it does not require knowledge of the eigenvalues or the eigenvalue spread of \( {R_{XX}} \). A similar upper bound can also be estimated empirically. From the above analysis of the NLMM algorithm, we see that \( \frac{1}{2}\mu \sigma_g^2 \) also serves as a useful lower bound for the EMSE of the NLMM algorithm in both Gaussian and CG noise, since its performance is similar to that of the NLMS algorithm in Gaussian noise alone in view of Eq. (60). This will be illustrated by computer simulations in the next section.

Finally, we note that the increase in the EMSE of the NLMS algorithm over that of the NLMM algorithm in a CG noise environment is

$$ \Delta {\text{EMS}}{{\text{E}}_{\text{NLMS}}}\left( \infty \right) \approx \frac{{\tfrac{1}{2}\mu {\phi_{\text{NLMS}}}}}{{1 - \tfrac{1}{2}\mu {\phi_{\text{NLMS}}}}}{p_r}\left( {\sigma_\Sigma^2 - \sigma_g^2} \right). $$
(61)

It is clear that the NLMM algorithm, which is based on robust statistics, offers a substantially lower steady state error than the NLMS algorithm in impulsive noise environment. Similar observation applies to the LMM algorithm.
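A quick back-of-the-envelope evaluation of (61) shows how this penalty scales with \( {p_r}\left( {\sigma_\Sigma^2 - \sigma_g^2} \right) \); the value of \( {\phi_{\text{NLMS}}} \) and the noise powers below are placeholders, not values taken from the experiments.

```python
# Numeric illustration of Eq. (61); all inputs are assumed values.
mu, phi_nlms = 0.1, 0.6            # assumed step size and phi_NLMS value
p_r, sigma_g2 = 0.01, 1e-4
sigma_S2 = 300 * sigma_g2          # assumed total variance sigma_Sigma^2 of the impulsive component

gain = 0.5 * mu * phi_nlms / (1 - 0.5 * mu * phi_nlms)
delta_emse = gain * p_r * (sigma_S2 - sigma_g2)
print(f"Delta EMSE_NLMS(inf) ~= {delta_emse:.3e}")
```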

4 Simulation Results

In this section, we first examine the performance of the NLMM and other related algorithms and then verify the analytical results obtained in Section 3. Since a comparison of the LMM algorithm with other robust algorithms is available in [7], we only compare the NLMM algorithm with the LMM and NLMS algorithms in our experiments. All simulations are carried out using the system identification model shown in Fig. 1, and all learning curves are obtained by averaging the results of K = 200 independent runs. The unknown system to be estimated is an FIR filter of order L. Its weight vector \( {W^*} \) is randomly generated and normalized to unit energy. The input signal x(n) is obtained from a first-order AR process \( x(n) = ax\left( {n - 1} \right) + v(n) \), where \( 0<a<1 \) is the correlation coefficient and v(n) is an additive white Gaussian noise (AWGN) sequence with zero mean and variance \( \sigma_v^2 = 1/\left( {1 - {a^2}} \right) \). The additive noise is either the Gaussian noise \( {\eta_g}(n) \) with zero mean and variance \( \sigma_g^2 \), or the CG noise sequence \( {\eta_0}(n) \) generated from (12) with probability \( P\left( {b(n) = 1} \right) = {p_r} \).
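For reference, a minimal sketch of this data model is given below. It follows the description above, while the sample length, the random seed, and the assumption that the impulsive component has variance \( {r_{im}}\sigma_g^2 \) are our own choices for illustration.

```python
import numpy as np

# Sketch of the system identification data model described above.
rng = np.random.default_rng(1)
L, a = 8, 0.9                                   # system order, AR(1) coefficient
N = 5000                                        # number of samples (arbitrary)
sigma_g2, p_r, r_im = 1e-4, 0.01, 300           # Gaussian noise power, impulse prob., ratio

W_star = rng.standard_normal(L)
W_star /= np.linalg.norm(W_star)                # unknown FIR system, unit energy

v = rng.standard_normal(N) * np.sqrt(1.0 / (1 - a**2))   # driving AWGN, variance as in the text
x = np.zeros(N)
for n in range(1, N):                           # first-order AR input x(n) = a x(n-1) + v(n)
    x[n] = a * x[n - 1] + v[n]

# CG noise: Gaussian background plus a Bernoulli-gated impulsive component
# (the impulsive variance is taken as r_im * sigma_g2, an assumption).
eta_g = rng.normal(0, np.sqrt(sigma_g2), N)
eta_w = rng.normal(0, np.sqrt(r_im * sigma_g2), N) * (rng.random(N) < p_r)
eta_0 = eta_g + eta_w

# Regressor matrix and desired signal d(n) = W*^T X(n) + eta_0(n)
X = np.array([x[n - L + 1:n + 1][::-1] if n >= L - 1
              else np.r_[x[n::-1], np.zeros(L - 1 - n)] for n in range(N)])
d = X @ W_star + eta_0
```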

  1. Experiment A:

    Convergence performance in Gaussian and CG noise

In this experiment, the convergence performance and the robustness of the NLMM algorithm to impulsive noise are evaluated. The system order is L = 8 and the correlation coefficient of the input AR process is a = 0.9. The impulse occurrence probability and the impulsive characteristic ratio are \( {p_r} = 0.01 \) and \( {r_{im}} = 300 \), respectively. For illustration purposes, the impulsive noise is applied to the system after time instant n = 2,000. From n = 1 to n = 1,999, the additive noise is \( {\eta_g}(n) \) with \( \sigma_g^2 = {10^{ - 4}} \). From n = 2,000 onwards, a CG noise \( {\eta_0}(n) \) generated by (12) is applied. The step sizes of the NLMM, NLMS, and LMM algorithms are set to \( {\mu_{\text{NLMM}}} = {\mu_{\text{NLMS}}} = 0.1 \) and \( {\mu_{\text{LMM}}} = 0.025 \), which enables all algorithms to reach a similar steady-state EMSE. The small positive constant ε used to prevent division by zero in (2) and (10) for the NLMS and NLMM algorithms is set to 0.0001. The threshold parameter ξ of the M-estimate function in the NLMM and LMM algorithms is calculated from (9), with \( \widehat\sigma_e^2(n) \) estimated from (7). The forgetting factor \( {\lambda_\sigma } \) in (7) is 0.95, and the window length \( {N_w} \) for the NLMM and LMM algorithms is chosen to be 9. For better visualization, the locations of the impulses are fixed whereas their amplitudes vary according to \( {\eta_w}(n) \). This is realized by generating one fixed Bernoulli sequence \( \overline b (n) \) with \( {p_r} = 0.01 \) and using it in all of the independent runs. Consequently, the impulses are located at n = 2,482, n = 3,475, and n = 4,486, which allows the impact of the impulsive noise on the desired signal to be visualized more clearly. Figure 3 depicts the performance of all the algorithms and shows that they reach a similar steady-state MSE value. The NLMM and NLMS algorithms have almost identical initial convergence performance. It can also be seen that the NLMM and LMM algorithms are very robust to the impulses in the desired signal. In contrast, the performance of the NLMS algorithm is severely degraded by these impulses.
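A compact sketch of the NLMM recursion as configured above is given below. The median-based variance estimator shown here is an illustrative stand-in in the spirit of (7) and (9), not a transcription of those equations, and the hard-clipping form of the MH nonlinearity (and the constant c1) is likewise an assumption.

```python
import numpy as np

# Sketch of the NLMM update with the MH nonlinearity and an ATS-style threshold,
# using the parameters quoted above (mu = 0.1, eps = 1e-4, lambda_sigma = 0.95,
# N_w = 9, k_xi = 2.576).
def nlmm(X, d, L, mu=0.1, eps=1e-4, lam_sigma=0.95, N_w=9, k_xi=2.576):
    W = np.zeros(L)
    sig2_hat = 1.0                               # initial "impulse-free" error variance (arbitrary)
    e_win = np.zeros(N_w)                        # sliding window of squared errors
    c1 = 1.483 * (1 + 5.0 / (N_w - 1))           # finite-sample median correction (assumed)
    mse = np.zeros(len(d))
    for n in range(len(d)):
        e = d[n] - W @ X[n]
        e_win = np.r_[e_win[1:], e**2]
        sig2_hat = lam_sigma * sig2_hat + c1 * (1 - lam_sigma) * np.median(e_win)
        xi = k_xi * np.sqrt(sig2_hat)            # ATS-style threshold, cf. (9)
        psi_e = e if abs(e) <= xi else 0.0       # MH nonlinearity: pass small errors, reject impulses
        W += mu * psi_e * X[n] / (eps + X[n] @ X[n])
        mse[n] = e**2
    return W, mse

# Usage with the data generated in the previous sketch:
# W_hat, mse = nlmm(X, d, L)
```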

Figure 3

The convergence performance and the robustness of the (1) LMM, (2) NLMM, and (3) NLMS algorithms to impulses in the desired signal (MSE vs. n).

Simulations for \( {p_r} \) larger than 0.01 were also conducted, and similar results were obtained. In general, the performance of the algorithms degrades gradually as \( {p_r} \) increases. More simulation results concerning the effects of using different values of the SNR, \( {\mu_{\text{NLMM}}} \), \( {N_w} \), and \( {k_\xi } \) are available in [41]. The results show that the NLMM algorithm has improved robustness to impulses and is not too sensitive to these parameters if they are chosen reasonably as suggested.

  1. Experiment B:

    Verification of convergence performance analysis

In this experiment, the theoretical analysis presented in Section 3 is verified through extensive simulations. For the mean convergence, the norm of the mean weight-error vector is used as the performance measure:

$$ {\left\| {{v_A}(n)} \right\|_2} = \sqrt {\Sigma_{i = 1}^L{{\left[ {\tfrac{1}{K}\Sigma_{j = 1}^Kv_i^{(j)}(n)} \right]}^2}}, \,i = 1, \cdots, L,\,j = 1, \cdots, K, $$

where \( v_i^{(j)}(n) \) is the i-th component of the weight-error vector v(n) at time n in the j-th independent run. For the mean square convergence results, \( {\text{EMSE}}(n) = {\text{Tr}}\left( {{\mathbf{\Phi }}(n)\Lambda } \right) \) is used as the performance measure. The values of the special integral functions appearing in the analytical results, i.e., \( {I_i}\left( \Lambda \right) \), \( I_i^\prime \left( \Lambda \right) \), and \( {I_{ii}}\left( \Lambda \right) \), are evaluated numerically using the method introduced in [45]. Unlike Experiment A, the impulsive noise is applied to the tested algorithms throughout the whole adaptation process, and the locations of the impulses in the desired signal are not fixed across independent runs. Different system orders L, input correlation factors a, algorithm step sizes μ, and other key parameters \( \sigma_g^2 \), \( {r_{im}} \), and \( {p_r} \) are employed in the different experiments; for succinctness, their values are summarized in Table 2. The following four experiments are conducted:
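For completeness, the two performance measures can be computed from the stored per-run weight-error histories as in the following sketch; the array shapes and the placeholder data are our own conventions.

```python
import numpy as np

# Performance measures of Experiment B, given per-run weight-error histories
# v of shape (K runs, N time steps, L taps) and the input autocorrelation R_XX.
def mean_norm(v):
    # ||v_A(n)||_2 = sqrt( sum_i ( (1/K) sum_j v_i^{(j)}(n) )^2 )
    return np.linalg.norm(v.mean(axis=0), axis=-1)

def emse_curve(v, R_XX):
    # EMSE(n) = Tr(Phi(n) Lambda) = E[ v(n)^T R_XX v(n) ], averaged over the K runs
    return np.einsum('jni,ik,jnk->n', v, R_XX, v) / v.shape[0]

# Example with random placeholders (K = 4 runs, N = 10 steps, L = 8 taps):
rng = np.random.default_rng(0)
v_demo = rng.standard_normal((4, 10, 8)) * 0.01
R_demo = np.eye(8)
print(mean_norm(v_demo).shape, emse_curve(v_demo, R_demo).shape)
```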

  1. Ex.1

    NLMS algorithm with CG noise. The theoretical results are calculated from (49), (56), and (58). Figure 4a shows that, for the mean convergence, there is good agreement between the theoretical and experimental results. It can be seen from Fig. 4b that the EMSE given by (58) is also close to the true EMSE, which shows that the performance of the NLMS algorithm is substantially affected by CG noise. The results concerning the NLMS algorithm with Gaussian noise have been detailed in [37] and are therefore omitted here.

  2. Ex.2

    NLMM algorithm with CG and Gaussian noises. The theoretical results for the former are calculated from (49), (56), and (60), while those for Gaussian noise are calculated from (20), (29), and (44). All theoretical and experimental results are depicted in Figs. 5, 6, and 7, and they are found to match each other very well. To evaluate the effectiveness and robustness of the ATS proposed in (7) and (9), the “impulse-free” variance \( \widehat\sigma_e^2(n) \) estimated from (7), \( k_\xi^2\widehat\sigma_e^2(n) \), and \( \sigma_e^2(n) \) for one independent run are plotted in sub-figures (c) of Figs. 5, 6, 7, and 8. It can be seen that the estimated “impulse-free” variance is able to follow the true value and that impulses with large amplitudes are effectively detected. As pointed out at the end of Section 3, an estimate of the lower bound of the EMSE for the NLMM algorithm is \( \frac{1}{2}\mu \sigma_g^2 \). To account for the approximation error in this formula, we also empirically estimated an upper bound \( CF \cdot \frac{1}{2}\mu \sigma_g^2 \) from extensive simulation results by means of a constant correction factor CF, which was found to be 1.7. Due to space limitations, we only plot these bounds in Fig. 5b, curve (4), and Fig. 7b, curve (3). It can be seen that these estimates give satisfactory bounds for the EMSE.

  3. Ex.3

    LMM algorithm with CG noise, which is a special case of Ex. 2 with all the special integrals equal to one. Figure 8 illustrates the good performance of the algorithm and the agreement between the theoretical and simulation results.

Table 2 List of parameters in experiment B.
Figure 4

The mean (a) and mean square (b) convergence performance of the NLMS algorithm with CG noise. Learning curves: (1) a = 0, μ = 0.2, \( \sigma_g^2 = 10^{-6} \), \( r_{im} = 50 \), \( p_r = 0.005 \); (2) a = 0.3, μ = 0.15, \( \sigma_g^2 = 10^{-5} \), \( r_{im} = 100 \), \( p_r = 0.01 \); (3) a = 0.6, μ = 0.1, \( \sigma_g^2 = 10^{-4} \), \( r_{im} = 200 \), \( p_r = 0.015 \); (4) a = 0.9, μ = 0.05, \( \sigma_g^2 = 10^{-3} \), \( r_{im} = 400 \), \( p_r = 0.02 \).

Figure 5

The mean (a) and mean square (b) convergence performance of the NLMM algorithm with CG noise; (c) illustration of (i) \( \sigma_e^2(n) \), (ii) \( \widehat\sigma_e^2(n) \), and (iii) \( k_\xi^2\widehat\sigma_e^2(n) \). Learning curves: (1) a = 0, μ = 0.2, \( \sigma_g^2 = 10^{-6} \), \( r_{im} = 50 \), \( p_r = 0.02 \); (2) a = 0, μ = 0.01, \( \sigma_g^2 = 10^{-6} \), \( r_{im} = 50 \), \( p_r = 0.02 \); (3) a = 0, μ = 0.05, \( \sigma_g^2 = 10^{-6} \), \( r_{im} = 50 \), \( p_r = 0.02 \); (4) a = 0, μ = 0.02, \( \sigma_g^2 = 10^{-3} \), \( r_{im} = 400 \), \( p_r = 0.005 \); (5) a = 0, μ = 0.01, \( \sigma_g^2 = 10^{-3} \), \( r_{im} = 400 \), \( p_r = 0.005 \); (6) a = 0, μ = 0.07, \( \sigma_g^2 = 10^{-3} \), \( r_{im} = 400 \), \( p_r = 0.005 \).

Figure 6

The mean (a) and mean square (b) convergence performance of the NLMM algorithm with CG noise; (c) illustration of (i) \( \sigma_e^2(n) \), (ii) \( \widehat\sigma_e^2(n) \), and (iii) \( k_\xi^2\widehat\sigma_e^2(n) \). Learning curves: (1) a = 0.9, μ = 0.3, \( \sigma_g^2 = 10^{-6} \), \( r_{im} = 50 \), \( p_r = 0.02 \); (2) a = 0.9, μ = 0.15, \( \sigma_g^2 = 10^{-6} \), \( r_{im} = 50 \), \( p_r = 0.02 \); (3) a = 0.9, μ = 0.1, \( \sigma_g^2 = 10^{-6} \), \( r_{im} = 50 \), \( p_r = 0.02 \); (4) a = 0.9, μ = 0.08, \( \sigma_g^2 = 10^{-3} \), \( r_{im} = 400 \), \( p_r = 0.005 \); (5) a = 0.9, μ = 0.05, \( \sigma_g^2 = 10^{-3} \), \( r_{im} = 400 \), \( p_r = 0.005 \); (6) a = 0.9, μ = 0.03, \( \sigma_g^2 = 10^{-3} \), \( r_{im} = 400 \), \( p_r = 0.005 \).

Figure 7

The mean (a) and mean square (b) convergence performance of the NLMM algorithm with Gaussian noise; (c) illustration of (i) \( \sigma_e^2(n) \), (ii) \( \widehat\sigma_e^2(n) \), and (iii) \( k_\xi^2\widehat\sigma_e^2(n) \). Learning curves: (1) a = 0, μ = 0.02, \( \sigma_g^2 = 10^{-6} \); (2) a = 0.3, μ = 0.15, \( \sigma_g^2 = 10^{-5} \); (3) a = 0.6, μ = 0.01, \( \sigma_g^2 = 10^{-4} \); (4) a = 0.9, μ = 0.05, \( \sigma_g^2 = 10^{-3} \).

Figure 8

The mean (a) and mean square (b) convergence performance of the LMM algorithm with CG noise; (c) illustration of (i) \( \sigma_e^2(n) \), (ii) \( \widehat\sigma_e^2(n) \), and (iii) \( k_\xi^2\widehat\sigma_e^2(n) \). Learning curves: (1) a = 0, μ = 0.01, \( \sigma_g^2 = 10^{-6} \), \( r_{im} = 50 \), \( p_r = 0.02 \); (2) a = 0.3, μ = 0.008, \( \sigma_g^2 = 10^{-5} \), \( r_{im} = 100 \), \( p_r = 0.015 \); (3) a = 0.6, μ = 0.006, \( \sigma_g^2 = 10^{-4} \), \( r_{im} = 200 \), \( p_r = 0.01 \); (4) a = 0.9, μ = 0.005, \( \sigma_g^2 = 10^{-3} \), \( r_{im} = 400 \), \( p_r = 0.005 \).

5 Conclusions

A new convergence performance analysis of the LMS and NLMS algorithms with error nonlinearity in Gaussian and CG noise has been presented. The approach relies on Price's theorem and its extension to signals with Gaussian mixture distributions, and on generalized Abelian integral functions for evaluating the mathematical expectations involved. New formulas for the EMSE and the stability bound, together with difference equations describing both the transient and the steady-state convergence behaviors of the algorithms, are obtained. Simulation results show good agreement with the theoretical results.