Combined Regularization Factor for Affine Projection Algorithm Using Variable Mixing Factor

The affine projection algorithm with a fixed regularization parameter is subject to a compromise between convergence speed and steady-state misalignment. To address this problem, we propose employing a variable mixing factor to adaptively combine two different regularization factors, in an attempt to retain the best properties of both. The mixing factor is derived by minimizing the energy of the noise-free a posteriori error, and, to suppress large fluctuations, a moving-average method is designed for its update. Based on a random walk model, we also prove that the proposed mixing factor remains valid for non-stationary systems. Mathematical analyses of the stability, steady-state mean square error, and computational complexity are presented. Finally, we compare the proposed algorithm with existing related algorithms in system identification and echo cancellation scenarios; the results show that it outperforms them by notable margins.


I. INTRODUCTION
Adaptive filtering algorithms have attracted considerable interest in numerous signal processing fields [1], [2], [3], [4], [5]. Owing to their simple structure and implementation, the least mean square (LMS) and normalized least mean square (NLMS) algorithms have gained broad application [6], [7]. Since these two LMS-type algorithms deteriorate in convergence speed when the input signals are correlated, the affine projection algorithm (APA) was proposed by Ozeki and Umeda [8] to ensure a superior convergence rate. However, matrix inversion is inevitable in the classical APA, which may give rise to singularity. For practical reasons, a diagonal matrix, obtained by multiplying an identity matrix by a positive constant δ called the regularization factor, is generally added to the matrix to be inverted to prevent numerical problems [6], [7]. To emphasize the presence of the regularization parameter, such an algorithm is named the regularized affine projection algorithm (R-APA). Nevertheless, there is still a conflict between convergence speed and steady-state misalignment for the R-APA with a fixed regularization factor.
To tackle the aforementioned issue, extensive research in recent years has concentrated on variable regularization factor (VRF) strategies. Early on, a pseudo-optimal regularization factor for the APA was derived in [9], in which the input signals additionally tend to be pre-whitened. An alternative adaptation of the regularization matrix was developed via the normalized stochastic gradient in [10]. Furthermore, by setting the a posteriori error equal to the noise variance, [11] put forward a variable regularization factor APA (VR-APA). Under the premise that the statistical characteristics of the noise are fully considered, [12] proposed a novel VRF formulation to reduce the steady-state misalignment, but its exponential scaling factor estimation method introduced several parameters and relied excessively on a priori knowledge. Later on, an improved version of the pseudo APA (PAPA) [13] with a varying regularization factor was derived to overcome the problems of the PAPA. By minimizing the l2-norm of the a priori error vector, [14] proposed a variable regularization affine projection sign algorithm (VR-APSA) to improve convergence performance. The existing VRF methods can alleviate the tradeoff between convergence speed and steady-state error to some extent, but high steady-state errors remain a challenge in some environments.
Beyond VRF schemes, adaptive combination approaches have also generated much research attention for improving filter performance, since a combination can perform at least as well as the best of its components. Early attempts at combining APAs were reported in [15] and [16]. The convex combination construct was first articulated in [15], which can effectively resolve the tradeoff. Nevertheless, a distinctive two-stage convergence phenomenon still appears in practical simulations. [16] achieved a convex combination of two R-AP filters with different regularization factors and briefly introduced two improvement methods. Although the improved algorithms in [16] relieve the two-stage convergence, there is still much room to accelerate the convergence speed. Such combination schemes have also been extensively applied to specific adaptive filters [17], [18], [19], [20], [21]. These works demonstrate that the convex combination of adaptive filters provides a promising way to improve performance.
In this paper, to exploit both the fast convergence of a small regularization factor and the low steady-state misalignment of a large one, we propose a combined regularization factor affine projection algorithm (CRF-APA), which adaptively combines two different regularization factors through a variable mixing factor obtained by minimizing the energy of the noise-free a posteriori error. To avoid large fluctuations, the mixing factor is smoothed with a moving-average method. If the variance of the noise is unknown, an efficient learning strategy is applied to estimate it. For completeness, we show that the mixing factor is likewise applicable in a non-stationary environment, a result derived from a random walk model. We provide theoretical analyses covering the stability, steady-state mean square error (MSE), and computational complexity. Finally, experiments in system identification and echo cancellation scenarios serve as further validation of the proposed algorithm.
The rest of the paper is organized as follows. Section II concisely describes the R-APA recursion. Section III presents the proposed algorithm for both stationary and non-stationary systems. Performance analyses are carried out in Section IV. Section V presents numerical simulation results assessing the performance of the proposed algorithm. Finally, Section VI concludes the paper.

II. BRIEF VIEW OF THE R-APA
We consider that the input signal matrix U(n) = [u(n), u(n − 1), . . . , u(n − K + 1)] is filtered through the unknown system ω_o to obtain the desired output vector d(n):

d(n) = U^T(n) ω_o + v(n),   (1)

where n is the time index and v(n) denotes the noise vector. Throughout this paper, L is the filter length and K is the projection order. The a priori error vector can be written as

e(n) = d(n) − U^T(n) ω(n),   (2)

where ω(n) is the estimate of ω_o at iteration n. From [6] and [7], through the Lagrange multiplier and gradient descent methods, the update equation of the conventional R-APA is given by

ω(n + 1) = ω(n) + µ U(n) [U^T(n) U(n) + δI]^{−1} e(n),   (3)

where µ is the step size and δ denotes the regularization factor, generally used to provide numerical stability. As far as the R-APA is concerned, a small regularization factor is equivalent to a large step size: the algorithm converges fast but generates a large steady-state misalignment. Conversely, a large regularization factor corresponds to a small step size, which results in a relatively small misalignment but poor convergence. For that reason, we explore a novel scheme that combines the respective merits of different regularization factors to improve the behavior of the algorithm.
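To make the recursion concrete, here is a minimal Python/NumPy sketch of one R-APA iteration under the definitions above; the function name and the default values of mu and delta are illustrative, not from the paper.

```python
import numpy as np

def rapa_update(w, U, d, mu=1.0, delta=0.01):
    """One iteration of the regularized APA.

    w     : (L,) current estimate of the unknown system
    U     : (L, K) input matrix [u(n), u(n-1), ..., u(n-K+1)]
    d     : (K,) desired output vector
    mu    : step size
    delta : regularization factor (prevents a singular K x K inverse)
    """
    K = U.shape[1]
    e = d - U.T @ w                                   # a priori error, Eq. (2)
    # regularized update, Eq. (3): solve instead of forming the inverse
    w = w + mu * U @ np.linalg.solve(U.T @ U + delta * np.eye(K), e)
    return w, e
```

With noise-free data and a moderate step size, repeated calls drive ω(n) toward ω_o.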

III. PROPOSED CRF-APA A. STATIONARY SYSTEM
It is worth noting that µ should satisfy 0 < µ < 1 or 1 < µ < 2 to ensure convergence stability [22]; the former range gives a lower steady-state error, and the filter converges fastest when µ equals 1. Therefore, as in [23] and [24], we set the step size µ to one.

Assumption 1: The noise is independent and identically distributed, statistically independent of the regression matrix U(n), and U(n) is statistically independent of e(n) at steady state [24].
Assumption 2: The expectation of the ratio of two random variables can be approximated by the ratio of their expectations [23], [25].

Taking the square and expectation of (6), since some terms vanish under Assumption 1, we arrive at (7). To simplify (7), we perform the eigenvalue decomposition of the matrix U^T(n) U(n) = Q(n) Λ(n) Q^T(n), where Λ(n) denotes the diagonal matrix formed by the eigenvalues of U^T(n) U(n), Λ(n) = diag{λ_1(n), λ_2(n), . . . , λ_K(n)}, and Q(n) is the matrix whose columns are the corresponding eigenvectors of U^T(n) U(n). Based on (8), we obtain (9); substituting (9) into (7) yields (10). Taking the first-order derivative of the right-hand side of (10) with respect to t(n) in δ(n) and setting the result to zero leads to (11), where 1 ≤ k ≤ K. As in [12] and [24], the eigenvalues of U^T(n) U(n) can be approximated as in (12), where Lσ_u²(n) can be regarded as the average of all eigenvalues and σ_u²(n) is the estimate of the variance of the input signal u(n), computed from the trace in (13), where Tr{·} denotes the trace of a matrix. Substituting (12) and (13) into (11) and invoking Assumption 2, we obtain (14). Therefore, the updating formula for the mixing factor t(n) is (15), where E[||v(n)||²] = Kσ_v²(n) and E[||e_a(n)||²] = σ²_{e_a}(n). Next, we approximate the power of the noise-free a priori error σ²_{e_a}(n) in (15) by the time average of the squares of e_a(n).
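The approximation in (12)–(13) replaces every eigenvalue of U^T(n) U(n) by the common average Lσ_u²(n), with σ_u²(n) estimated from the matrix trace. A short sketch (the exact normalization Tr{·}/(KL) is our reading of (13)):

```python
import numpy as np

def average_eigenvalue(U):
    """Approximate every eigenvalue of U^T U by their common average.

    Since Tr{U^T U} equals the sum of the K eigenvalues, the input-power
    estimate sigma_u^2 = Tr{U^T U} / (K * L) makes L * sigma_u^2 exactly
    the mean eigenvalue.
    """
    L, K = U.shape
    sigma_u2 = np.trace(U.T @ U) / (K * L)   # input-variance estimate, (13)
    return L * sigma_u2                      # common approximate eigenvalue, (12)
```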
The vector e_a(n) can be recovered from the noisy a priori error vector e(n) using the shrinkage denoising method described in [26]; the solution is obtained through steps (17)–(20).
where D is an orthogonal matrix, ⊙ denotes the element-wise product, and g(n) = [g_1(n), g_2(n), . . . , g_K(n)] is the vector whose elements satisfy (19), where a_{o,r}(n) is the r-th element of a_o(n). To update the mixing factor t(n) more smoothly, we apply the time-averaging method to (15).
The time-averaging method in (22) also guarantees the monotonic decrease of t(n); nevertheless, it may yield negative values when the algorithm reaches steady state. To overcome this shortcoming, we provide the adjustment in (23), where t_min is a small positive constant close to zero [27]. Notably, if t(n) < 0, t(n) is set to t_min, so as to avoid any negative values.
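A hedged sketch of the mixing-factor recursion: the exact closed form of (15) is not reproduced above, so the instantaneous value below (error power relative to noise power) and the convex combination of the two regularization factors are our assumptions; the moving-average smoothing of (22) and the lower clamp of (23) follow the text.

```python
def update_mixing_factor(t_prev, sigma_ea2, sigma_v2, K, beta=0.95, t_min=1e-4):
    """Assumed surrogate for (15): large while the noise-free error power
    sigma_ea2 dominates the noise power K * sigma_v2, tending to zero at
    steady state; smoothed as in (22) and clamped below as in (23)."""
    t_inst = sigma_ea2 / (sigma_ea2 + K * sigma_v2)
    t = beta * t_prev + (1.0 - beta) * t_inst      # moving average, (22)
    return max(t, t_min)                           # lower clamp, (23)

def combined_delta(t, delta_small, delta_large):
    """Assumed combination: t = 1 selects the small (fast) factor,
    t near t_min selects the large (low-misalignment) factor."""
    return t * delta_small + (1.0 - t) * delta_large
```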

B. NONSTATIONARY SYSTEM
It is supposed that the unknown weight vector ω_o(n) at iteration n follows the random walk model [28]

ω_o(n + 1) = ω_o(n) + ϕ(n),   (24)

where ϕ(n) is a random perturbation that is independent and identically distributed and independent of {ω_o(0), ω(0)}, {x_j | j ≥ 0}, and {v_j | j ≥ 0}. Each element of ϕ(n) has zero mean and variance σ²_ϕ(n). Then (5) can be reformulated as (25). Similarly, pre-multiplying (25) by U^T(n) yields (26). Since the additional term does not depend on the other variables in (25), squaring both sides of (26) and taking the expectation gives (27). The term E[ϕ^T(n) U^T(n) U(n) ϕ(n)] does not involve the mixing factor t(n), so it vanishes after differentiating with respect to t(n). Consequently, solving (27) gives the same result as further processing (7); thus we conclude that the proposed algorithm is also applicable to non-stationary systems.
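The random walk model (24) is straightforward to simulate; the function below generates one step (sigma_phi is a chosen per-element standard deviation, and the function name is ours):

```python
import numpy as np

def random_walk_step(w_o, sigma_phi, rng):
    """One step of w_o(n+1) = w_o(n) + phi(n), with phi(n) i.i.d.,
    zero mean, and per-element variance sigma_phi**2."""
    return w_o + sigma_phi * rng.standard_normal(w_o.shape)
```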

C. PRACTICAL CONSIDERATIONS 1) ESTIMATION OF THE NOISE VARIANCE
The estimation of the noise variance is of great importance to the proposed CRF-APA. If the variance of the background noise is unknown in advance, its estimate can be found using the strategy of [29].
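The exact recursion of [29] is not reproduced here; as a placeholder, the sketch below uses a simple exponentially weighted average of the squared error, which settles to the noise power once the filter has adapted. This is an assumption, not the estimator of [29].

```python
import numpy as np

def estimate_noise_variance(sigma_v2_prev, e, lam=0.99):
    """Exponentially weighted noise-power estimate (placeholder for [29]).

    e   : current K-dimensional error vector
    lam : forgetting factor close to one
    """
    return lam * sigma_v2_prev + (1.0 - lam) * float(np.dot(e, e)) / len(e)
```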

2) RESET METHOD FOR A SYSTEM SUDDEN CHANGE
Due to the monotonically decreasing behavior of the mixing factor t(n), it must be re-initialized each time the unknown system changes; otherwise the CRF-APA would lose its tracking capability. To address this issue, the reset method of [23] is used as a reference and modified, as summarized in Table 1.
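Since Table 1 is not reproduced here, the following is only a plausible sketch of such a reset rule: when the instantaneous error energy jumps well above its running average, t(n) is re-initialized to 1 so the filter regains its tracking ability. The detection test and the roles of rho and beta below are assumptions.

```python
def maybe_reset(t, e_now2, e_avg2, rho=1.0, beta=0.95):
    """Re-initialize the mixing factor on a detected system change.

    e_now2 : instantaneous squared-error energy
    e_avg2 : its running average (updated here with forgetting factor beta)
    rho    : detection threshold margin
    """
    e_avg2 = beta * e_avg2 + (1.0 - beta) * e_now2
    if e_now2 > (1.0 + rho) * e_avg2:      # sudden jump => assume system change
        t = 1.0                            # restart from the fast regime
    return t, e_avg2
```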

IV. PERFORMANCE ANALYSIS A. STABILITY PERFORMANCE
To guarantee the stability of the proposed algorithm and avoid any negative values, we need to set t_min. In this section, we utilize the a posteriori error vector ε(n), which is defined in (31). Substituting (4) into (31) yields (32) and (33). From (33), the result is selected to satisfy E[||ε(n)||²] ≤ E[||e(n)||²] (with equality only when e(n) = 0). Using the result in (8), (34) holds, and we further deduce (35). For (35) to be satisfied, the term 2I − Λ(n)(Λ(n) + δ(n)I)^{−1} should be positive definite. Taking an expectation, we obtain (36). As a result, the lower bound of the mixing factor for the proposed algorithm can be expressed as (37).

B. STEADY-STATE MSE
Based on Assumption 1, Shin et al. [30] indicated that the steady-state MSE of the APA in a stationary system is given by (38), where α_u, η_u, and A(n) are calculated by (39)–(41), respectively, with S ≈ diag{1, 0, . . . , 0}. In particular, in [31], the steady-state MSE of the APA with a small regularization factor δ and large filter length L was simplified as (42). However, the key to the proposed CRF-APA is to characterize the effect of δ on the APA regardless of its size. Therefore, we must allow for a large δ and evaluate the mixing factor t(n) [32].
Based on (8), performing the eigenvalue decomposition of (41) yields (43). Replacing the exact mathematical expectation in (38) with an instantaneous value, the instantaneous values of α_u and η_u are described by (44) and (45). In theory, an ideal mixing factor tends towards zero at steady state; for this reason, (46) can be rewritten as (47). Notably, compared with (42), the result in (47) is related not only to the variance σ_v²(n) but also to the input signal and the larger regularization factor.

C. COMPUTATIONAL COMPLEXITY
Table 2 exhibits the per-iteration computational complexity of the traditional R-APA, CR-APA [16], ICR-APA [16], VR-APAs [11], [12], En-APA [33], and CRF-APA in terms of the total numbers of multiplications, additions, and comparisons. The CR-APA and ICR-APA have high computational costs because they must simultaneously update two filters with different regularization factors. Since only one filter is involved, the En-APA is relatively easy to implement. Owing to the mixing factor, the CRF-APA likewise involves only one filter, and the shrinkage denoising method introduces only a trivial computational burden. Nevertheless, since the matrix trace is required when processing the input signals in (13), which incurs a greater computational cost, the CRF-APA does not have a significant computational advantage; the trace of the matrix is also adopted by the VR-APAs. More detailed studies on computational complexity are given in [7] and [32].

V. SIMULATION RESULTS
In this section, we compare the proposed algorithm with other state-of-the-art competing methods in two applications, namely system identification and echo cancellation. We use the normalized mean-square deviation (NMSD), defined as 10 log10(||ω(n) − ω_o||² / ||ω_o||²), to measure algorithm performance. Besides, we assume that the noise variance is known for [8], [22], and [33]. In the simulations, the signal-to-noise ratio (SNR) is defined as SNR = 10 log10(E[y²(n)]/E[v²(n)]), where y(n) denotes the noise-free system output.

A. SYSTEM IDENTIFICATION
Here, simulations are designed for system identification.
We randomly generate the unknown system ω_o (with ||ω_o||² = 1). The length of the unknown filter is set to 32, and the adaptive filter is assumed to have the same number of taps. The colored input signals are generated by filtering a white Gaussian random sequence through the first-order system G(z) = 1/(1 − 0.9z^{−1}). Moreover, the projection order is set to K = 2, and all the learning curves are averaged over 100 independent trials to preserve generality.
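The setup above can be reproduced in a few lines; the AR(1) recursion below implements G(z) = 1/(1 − 0.9 z^{−1}), and the NMSD is computed as 10 log10(||ω(n) − ω_o||² / ||ω_o||²) (our reading of the elided definition).

```python
import numpy as np

def colored_input(N, rng):
    """White Gaussian noise filtered through G(z) = 1 / (1 - 0.9 z^{-1})."""
    w = rng.standard_normal(N)
    x = np.empty(N)
    prev = 0.0
    for n in range(N):
        prev = w[n] + 0.9 * prev   # AR(1) recursion equivalent to G(z)
        x[n] = prev
    return x

def nmsd_db(w, w_o):
    """Normalized mean-square deviation in dB."""
    return 10.0 * np.log10(np.sum((w - w_o) ** 2) / np.sum(w_o ** 2))
```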
First, we evaluate the performance of the proposed CRF-APA against the conventional R-APAs, CR-APA, ICR-APA, VR-APAs, and En-APA in a stationary environment with SNR = 30 dB. Fig. 1 plots the NMSD learning curves of these algorithms, and Fig. 2 displays the curve of the variable mixing factor of the CRF-APA. All parameters are tuned so that the compared algorithms achieve both fast convergence and small steady-state error, as stated in Table 3. As can be seen in Fig. 1, the proposed algorithm achieves both the fast convergence of the small regularization factor and the low steady-state misalignment of the large regularization factor, while the VR-APAs perform poorly in terms of steady-state misalignment and the ICR-APA, based on the CR-APA, converges more slowly than the other algorithms. Notably, the En-APA has a lower steady-state error than the VR-APAs but sometimes converges more slowly. From Fig. 2, the mixing factor of the CRF-APA decreases from 1 to its minimum value as expected, which ensures that the CRF-APA converges quickly while also achieving low steady-state misalignment.
Additionally, tracking capability is a crucial issue for adaptive algorithms. Figs. 3 and 4 plot the NMSD learning curves in a non-stationary environment and the corresponding variable mixing factor curve of the CRF-APA, respectively. We assume that the unknown system impulse response is regenerated abruptly at iteration 3.5 × 10^4, and the reset parameters are set to ρ = 1 and β = 0.95. As shown in Fig. 3, the CRF-APA tracks the changed system quickly, neither losing convergence speed nor increasing the misalignment. Fig. 4 also exhibits the desired behavior: the mixing factor equals 1 in the initial period of convergence and when the system changes, and finally decreases to its minimum value at steady state. These simulations indicate that the CRF-APA is highly robust against environmental changes and does not require a time-consuming parameter adjustment process.
Lastly, to examine the impact of different noise environments on the algorithms, we repeat similar experiments for a non-stationary system under different noise settings. Fig. 5 depicts the NMSD learning curves of the aforementioned algorithms in the same environment as Fig. 3, except with SNR = 20 dB. Fig. 6 examines the algorithms when the SNR is 10 dB; for better visualization, we change the step size to µ = 0.2 (for the R-APAs, VR-APAs, and CRF-APA) and µ = 0.06 (for the En-APA), and set the regularization factors to 0.1 and 1100. It is clearly observed that the CRF-APA still provides better performance at SNR = 20 dB and SNR = 10 dB. These simulations validate our conjectures regarding the behavior of the proposed algorithm irrespective of the noise level.

B. ECHO CANCELLATION APPLICATION
In this subsection, the performance of the proposed CRF-APA is tested for echo cancellation and again compared with the R-APAs, CR-APA, ICR-APA, VR-APAs, and En-APA. As shown in Fig. 7, we set the length of the echo path to 128, and the same length is used for the adaptive filters. The input speech signal is sampled at 8 kHz, as illustrated in Fig. 8, and an independent white Gaussian noise signal is added to the output of the echo path with SNR = 30 dB. All simulation results are obtained by averaging 30 independent trials. Moreover, as in [19] and [34], we add another performance metric, the echo return loss enhancement (ERLE), described in (50); the higher the value, the better the algorithm's ability to eliminate echoes. Fig. 9 shows the convergence curves of the CRF-APA and the other algorithms with the speech sequence as input. In this case, α_1 = 0.998, α_2 = 0.9999, α_3 = 0.999995, δ_1 = 0.01, δ_2 = 50, K = 2, and the other parameters are selected to obtain the optimal performance of each algorithm. As can be seen, the proposed CRF-APA has the fastest convergence and lowest steady-state error among the compared algorithms, which confirms the effectiveness of the mixing factor in terms of convergence and error performance in the echo cancellation scenario. Similar conclusions can be drawn from Fig. 10, which depicts the ERLE comparison of the algorithms, and Table 4 lists the corresponding mean ERLE values. The improvement in ERLE of the CRF-APA over the existing algorithms reflects the superiority of the proposed algorithm.
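Since (50) itself is elided above, the sketch below uses the standard ERLE definition, the ratio of echo power to residual-error power in dB:

```python
import numpy as np

def erle_db(d, e):
    """Echo return loss enhancement: higher values mean better echo removal.

    d : desired (echo) signal samples
    e : residual error samples after cancellation
    """
    d, e = np.asarray(d, float), np.asarray(e, float)
    return 10.0 * np.log10(np.sum(d ** 2) / np.sum(e ** 2))
```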

VI. CONCLUSION
In this paper, we presented a novel method for deriving a variable mixing factor that incorporates the strengths of two separate regularization factors, with the aim of improving both the convergence rate and the steady-state error. Considering the impact of environmental noise, the mixing factor was deduced by minimizing the energy of the noise-free a posteriori error. Using instantaneous estimation and the shrinkage denoising method, a practical approximation of the optimal mixing factor was also presented. When the environment changes suddenly, the algorithm demonstrates strong tracking behavior and continued effectiveness. Extensive analyses and simulations show that the proposed algorithm achieves competitive convergence speed and small steady-state misalignment in both system identification and echo cancellation scenarios, compared with other available algorithms.
MENGHUA JIANG received the bachelor's degree from Shandong Normal University, Jinan, China, in 2020. She is currently pursuing the master's degree with Yantai University, Yantai, China. Her research interests include adaptive filtering algorithms, sparse signal processing, and machine learning.

SHIFENG OU received the Ph.D. degree from Jilin University, Jilin, China, in 2008. He is currently a Professor and a Master Tutor with the School of Physics and Electronic Information, Yantai University, Yantai, China. His research interests include speech signal processing, adaptive filtering algorithms, and blind signal processing.