Complex Correntropy with Variable Center: Definition, Properties, and Application to Adaptive Filtering

Complex correntropy has been successfully applied to complex domain adaptive filtering, and the corresponding maximum complex correntropy criterion (MCCC) algorithm has been proved to be robust to non-Gaussian noise. However, the kernel function of the complex correntropy is usually limited to a Gaussian function centered at zero. To improve the performance of MCCC in non-zero-mean noise environments, we first define the complex correntropy with variable center and provide its probabilistic interpretation. We then propose the maximum complex correntropy criterion with variable center (MCCC-VC) and apply it to complex domain adaptive filtering, using the gradient descent approach to search for the minimum of the cost function. We also propose a feasible method to optimize the center and the kernel width of MCCC-VC. Importantly, we further provide a bound on the learning rate and derive the theoretical value of the steady-state excess mean square error (EMSE). Finally, simulations validate the theoretical steady-state EMSE and demonstrate the superior performance of MCCC-VC.


Introduction
Choosing an appropriate cost function (usually a statistical measure of the error signal) is the key problem in adaptive filtering theory and applications [1][2][3]. In the presence of Gaussian noise, it is best to use the minimum mean square error (MMSE) criterion, and a series of MMSE-based algorithms [4][5][6][7] have emerged during the past decades. These algorithms use the mean square value of the error between the desired signal and the output signal as the cost function, which has many attractive features, such as convexity and smoothness. In addition, MMSE has low computational complexity since it only requires the second-order statistics of the signals. In many non-Gaussian cases, however, MMSE-based algorithms are not robust. To overcome this shortcoming, many algorithms based on non-MMSE criteria have been developed in [8][9][10][11][12][13][14][15][16]. Since signals are often expressed in complex form in many practical scenarios [17,18], adaptive filtering in the complex domain is of great significance. During the past few years, several information-criterion-based algorithms have been proposed for complex domain adaptive filtering [19][20][21][22]. In particular, Guimarães et al. recently defined a new similarity measure between two complex variables, the complex correntropy [19,20], and proposed the maximum complex correntropy criterion (MCCC) algorithm. MCCC uses a complex Gaussian function as the kernel function and derives the weight update based on Wirtinger calculus. The complex Gaussian kernel is desirable due to its smoothness and strict positive definiteness. The MCCC algorithm outperforms classic MMSE-based algorithms and is robust to non-Gaussian noise. Moreover, MCCC has been widely applied in machine learning and signal processing [23,24].
According to [19,20], given two complex variables C1 = A1 + jB1 and C2 = A2 + jB2, where A1, B1, A2, B2 are real variables, the complex correntropy is defined by

V^C(C1, C2) = E[κ(C1 − C2)],

where E[·] denotes the expectation and κ(·) is the complex Gaussian kernel

κ(C1 − C2) = (1/(2πσ²)) exp(−|C1 − C2|²/(2σ²)),

with kernel width σ > 0. The purpose of adaptive filtering is to estimate a target variable T in some sense by designing a model M that constructs an output Y from the input X. Under MCCC, this model is found by maximizing the complex correntropy between T and Y:

M* = arg max_{M∈M} V^C(T, Y),

where M is the model assumption space containing the possible models that construct the output Y from the input X, and M* is the optimal model. However, the center of the complex correntropy is always at zero, which is not the best option in the case of non-zero-mean noise. Although the maximum correntropy criterion with variable center in [25] and [26] allows a variable center, it cannot be used for complex domain adaptive filtering. To overcome this defect, this paper proposes the maximum complex correntropy criterion with variable center (MCCC-VC).
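As a concrete illustration, the complex correntropy can be estimated from paired samples by averaging the kernel values. The sketch below is a minimal sample estimator in Python, assuming a 1/(2πσ²)-normalized complex Gaussian kernel; the normalization constant does not change the maximizer.

```python
import numpy as np

def complex_gaussian_kernel(x, sigma):
    # kappa(x) = exp(-|x|^2 / (2 sigma^2)) / (2 pi sigma^2): a normalized
    # 2-D Gaussian in the real and imaginary parts of x (assumed normalization)
    return np.exp(-np.abs(x) ** 2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)

def complex_correntropy(c1, c2, sigma=1.0):
    # Sample estimator of V(C1, C2) = E[kappa(C1 - C2)] from paired samples
    return np.mean(complex_gaussian_kernel(np.asarray(c1) - np.asarray(c2), sigma))
```

Closer agreement between the two sample sets yields a larger correntropy value, which is what the maximization in MCCC exploits.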
The main contributions of this research lie in the following aspects: (1) we define the complex correntropy with variable center and give its probabilistic interpretation; (2) based on it, we propose a novel adaptive filtering algorithm in the complex domain using the gradient descent approach; (3) we give effective and feasible methods to estimate the kernel center and adaptively update the kernel width; (4) we derive the bound on the learning rate and the theoretical steady-state excess mean square error (EMSE) of the MCCC-VC algorithm, and verify the theoretical analysis by simulations.
The organization of this paper is as follows: Section 2 defines the complex correntropy with variable center and studies its properties. Section 3 proposes the MCCC-VC algorithm, provides the method for optimizing its parameters, studies its convergence, and derives the theoretical steady-state EMSE. Section 4 verifies the correctness of the theoretical conclusions and the superior performance of the MCCC-VC algorithm. Finally, Section 5 summarizes the conclusions of this paper.

Complex Correntropy with Variable Center
For two complex variables, the target variable T and the output Y, the complex correntropy with variable center is defined as

V^C_{σ,c}(T, Y) = E[κ_σ(T − Y − c)],   (4)

where κ_σ(·) is the complex Gaussian kernel defined above and c ∈ C represents the center of the kernel function. When c = 0, (4) reduces to the original complex correntropy.
The complex correntropy with variable center c contains all the even moments of the error e = T − Y about the center c: expanding the exponential in (4) gives

V^C_{σ,c}(T, Y) = (1/(2πσ²)) Σ_{n≥0} ((−1)^n / (2^n σ^{2n} n!)) E[|e − c|^{2n}],

where e = T − Y is the complex-valued error variable. As σ increases, the higher-order moments about the center c attenuate quickly, so the second-order moment becomes the dominant term. In particular, when c = E[e] and σ → ∞, maximizing the complex correntropy with center c is equivalent to minimizing the variance of the error. Moreover, when σ → 0, the kernel κ_σ approaches a two-dimensional Dirac function δ(x, y), which satisfies ∬ δ(x, y) dx dy = 1 and δ(x, y) = 0 for x² + y² ≠ 0. Let t_R, y_R, and c_R be the real parts of t, y, and c, let t_I, y_I, and c_I be their imaginary parts, and let p_TY(t_R, t_I, y_R, y_I) denote the joint probability density function (PDF) of (T, Y). We then obtain

lim_{σ→0} V^C_{σ,c}(T, Y) = ∬∬ p_TY(t_R, t_I, y_R, y_I) δ(t_R − y_R − c_R, t_I − y_I − c_I) dt_R dt_I dy_R dy_I = p_e(c_R, c_I),

where p_e(ε_R, ε_I) is the joint PDF of the error. Thus, when σ → 0, the complex correntropy with variable center c approaches the error PDF evaluated at (c_R, c_I).
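The σ → 0 limit can be checked numerically: with a 1/(2πσ²)-normalized kernel, the sample correntropy at a small width approximates the error PDF at the center. A minimal Monte Carlo sketch under an illustrative noise model (for a complex error with independent standard-normal real and imaginary parts, p_e(0, 0) = 1/(2π) ≈ 0.159):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
# complex error with independent standard-normal real/imag parts
e = rng.standard_normal(n) + 1j * rng.standard_normal(n)

def correntropy_vc(err, c, sigma):
    # sample estimate of E[kappa_sigma(e - c)] with a normalized Gaussian kernel
    k = np.exp(-np.abs(err - c) ** 2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    return k.mean()

# small sigma: the estimate approaches p_e evaluated at the center c
est = correntropy_vc(e, 0.0, sigma=0.1)
```

With σ = 0.1 the estimate lands close to 1/(2π), while evaluating at a center far from the error mass gives a value near zero, matching the PDF interpretation.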

MCCC-VC Algorithm
In this part, we derive a novel adaptive filtering algorithm based on the maximum complex correntropy criterion (i.e., minimum complex correntropy loss) with variable center (MCCC-VC).

Cost Function
We apply MCCC-VC to adaptive filtering and obtain the cost function

J(w) = −E[exp(−|e(k) − c(k)|²/(2σ²))],   (8)

where e(k) = d(k) − w^H x(k) is the error at time instant k, w = [w1 w2 ··· wm]^T is the filter weight, d(k) is the desired signal at time instant k, x(k) = [x(k) x(k − 1) ··· x(k − m + 1)]^T is the input vector at time instant k, and c(k) is the kernel center at time instant k. The essential idea behind the cost function (8) is that, even when the error distribution is non-zero-mean, the proposed MCCC-VC can perform well because it matches the error distribution. Figure 1 compares the cost surfaces of the proposed MCCC-VC and MCCC, where the noise is non-zero-mean complex Gaussian noise with unit variance. For visualization, we chose m = 1 and set the system parameter and the mean of the noise to w0 = 5 + 5j and c = 6 + 6j, respectively. One can see that the cost function of MCCC-VC is minimized at w0, whereas the cost function of MCCC is minimized elsewhere.

Gradient Descent Algorithm Based On MCCC-VC
Since the stochastic gradient descent approach has low computational complexity, we adopt it to search for the minimum of the cost function. Utilizing Wirtinger calculus [27,28], we obtain the weight update as follows:

w(k + 1) = w(k) + η_w exp(−|e(k) − c(k)|²/(2σ²)) (e(k) − c(k))* x(k),   (10)

where η_w = µ/(2σ²) is the learning rate for the weight.
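A minimal system identification sketch of this update in Python, under illustrative settings (random system w0, circular Gaussian input, noise whose mean equals the assumed kernel center c; the step size, kernel width, and horizon are arbitrary choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 4
w0 = rng.standard_normal(m) + 1j * rng.standard_normal(m)   # unknown system
sigma, c, eta_w = 2.0, 0.5 + 0.5j, 0.01                     # kernel width, center, learning rate

w = np.zeros(m, dtype=complex)
for k in range(5000):
    x = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    v = c + 0.1 * (rng.standard_normal() + 1j * rng.standard_normal())  # non-zero-mean noise
    d = np.vdot(w0, x) + v                      # np.vdot conjugates its first argument: w0^H x
    e = d - np.vdot(w, x)                       # error e(k) = d(k) - w^H x(k)
    g = np.exp(-abs(e - c) ** 2 / (2 * sigma ** 2))
    w = w + eta_w * g * np.conj(e - c) * x      # w(k+1) = w(k) + eta_w f(e(k)) x(k)

weight_error_power = np.sum(np.abs(w0 - w) ** 2)
```

Because the kernel center absorbs the noise mean, the update direction has zero mean at w = w0, so the weights settle near the true system despite the biased noise.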

Optimization Problem in MCCC-VC
The center location c and the kernel width σ play a pivotal role in the performance of MCCC-VC. It is therefore important to optimize them to further improve the robustness and convergence performance in non-zero-mean noise.
The optimal model according to MCCC-VC is obtained by maximizing the complex correntropy with variable center over the model space. The complex correntropy with variable center can be decomposed into three parts; since the first term is independent of the model, the optimization reduces to maximizing the remaining terms, denoted U^C_{σ,c}(T, Y). The parameters can then be optimized jointly with the model by

(M*, σ*, c*) = arg max_{M∈M, σ∈Ω, c∈C} U^C_{σ,c}(T, Y),

where Ω and C represent the allowed sets of the parameters σ and c.

Remark 1.
It can be seen that as long as the function U^C_{σ,c}(T, Y) is maximized, M, σ, and c are optimized simultaneously. However, it is computationally demanding to evaluate and compare U^C_{σ,c}(T, Y) for all parameter values in the allowed sets. Moreover, the allowed sets themselves may be difficult to determine.

Stochastic Gradient Descent Approach
To further simplify the optimization problem, we propose a stochastic gradient descent based online approach.
(1) When the model M is fixed, the error PDF p_e(ε) is independent of the kernel width σ and the center position c. In this case, σ and c can be optimized accordingly. To simplify the optimization further, we can set c(k) to the median or mean of the error samples, so that only σ needs to be optimized. Treating 1/σ² as a new variable, we update it with the stochastic gradient descent approach as in (19) and (20), where the center is estimated online as

c(k) = (1/T) Σ_{l=k−T+1}^{k} e(l),

T is the smoothing length, and η_σ is the learning rate for σ.
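The online center estimate above is just a sliding-window mean of the most recent T error samples. A minimal helper (the class name and window default are illustrative):

```python
from collections import deque

class CenterEstimator:
    """Sliding-window mean of the last T complex error samples (T = smoothing length)."""
    def __init__(self, T=50):
        self.buf = deque(maxlen=T)

    def update(self, e):
        # append the newest error sample and return the current center estimate c(k)
        self.buf.append(e)
        return sum(self.buf) / len(self.buf)
```

The deque evicts the oldest sample automatically once T samples have arrived, so each update costs O(T) at most for the sum and O(1) for the insertion.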
(2) When the kernel width σ(k) and the center position c(k) are fixed, the model M is optimized by MCCC-VC using (10).

Remark 2.
For the proposed MCCC-VC algorithm, the weight and the parameters are updated alternately at each time instant k using (10), (19) and (20), respectively.

Convergence Analysis
The MCCC-VC algorithm can be written in the form of a nonlinear function of the error:

w(k + 1) = w(k) + η_w f(e(k)) x(k),

with f(e(k)) = exp(−|e(k) − c(k)|²/(2σ²)) (e(k) − c(k))* being the scalar function of the error e(k).

Taking into consideration that the error can be written as e(k) = e_a(k) + v(k), where w̃(k) = w0 − w(k) is the weight error vector at time instant k, w0 is the system parameter, e_a(k) = w̃^H(k) x(k) is the prior error, and v(k) is the additive noise at time instant k, we get the following recursion:

w̃(k + 1) = w̃(k) − η_w f(e(k)) x(k).

By taking the squared 2-norm of both sides, we further get

||w̃(k + 1)||² = ||w̃(k)||² − 2η_w Re[f(e(k)) e_a(k)] + η_w² |f(e(k))|² ||x(k)||².

To guarantee the convergence of MCCC-VC, the weight error power should decrease gradually. Thus, we obtain the bound for the learning rate:

η_w < 2 E[Re(f(e(k)) e_a(k))] / E[|f(e(k))|² ||x(k)||²].
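Replacing the expectations with sample averages gives a numerical estimate of such a bound. The sketch below assumes the ratio form 2 E[Re(f(e) e_a)] / E[|f(e)|² ||x||²] that follows from requiring the weight error power to decrease, with an illustrative weight error vector and noise model:

```python
import numpy as np

rng = np.random.default_rng(2)
m, sigma, c = 4, 2.0, 0.5 + 0.5j
w_tilde = 0.3 * (rng.standard_normal(m) + 1j * rng.standard_normal(m))  # current weight error

num = den = 0.0
for _ in range(100_000):
    x = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    v = c + 0.1 * (rng.standard_normal() + 1j * rng.standard_normal())
    ea = np.vdot(w_tilde, x)                     # prior error e_a(k) = w_tilde^H x(k)
    e = ea + v                                   # e(k) = e_a(k) + v(k)
    f = np.exp(-abs(e - c) ** 2 / (2 * sigma ** 2)) * np.conj(e - c)
    num += 2 * (f * ea).real                     # sample sum for 2 E[Re(f(e) e_a)]
    den += abs(f) ** 2 * np.linalg.norm(x) ** 2  # sample sum for E[|f(e)|^2 ||x||^2]

eta_bound = num / den                            # estimated learning-rate bound
```

The numerator is dominated by E[g1(e)|e_a|²] and is positive while the filter has not yet converged, so the estimated bound is a positive step-size ceiling for the current operating point.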

Steady-State Mean Square Performance
If MCCC-VC arrives at the steady state, the weight error power remains unchanged, i.e., E[||w̃(k + 1)||²] = E[||w̃(k)||²]. Then, when k → ∞, we get

2 E[Re(f(e(k)) e_a(k))] = η_w E[|f(e(k))|² ||x(k)||²].   (28)

The steady-state excess mean square error (EMSE) is defined as S = lim_{k→∞} E[|e_a(k)|²]. To obtain the theoretical steady-state EMSE, we adopt the following two assumptions [21,22,29]:
(1) v(k) is zero-mean and independent of x(k), and x(k) is circular.
(2) e_a(k) is zero-mean and independent of v(k).
Since the distributions of x(k), v(k), e_a(k), and e(k) no longer depend on the time index k at the steady state, the time index is omitted in the following derivation.
The left side of (28) can be written as 2 E[Re(g1(e)(e − c)* e_a)], where

g1(e) = exp(−|e − c|²/(2σ²)).   (31)

We use a Taylor expansion of g1(e) to approximate it, keeping terms up to second order in e_a. Since x is circular, the expectations of the resulting cross terms can be evaluated, and if the higher-order terms are small enough, the left side of (28) can be rewritten in terms of the steady-state EMSE S = E[|e_a|²]. The right side of (28) can be written as η_w E[g2(e) ||x||²], where

g2(e) = |f(e)|² = exp(−|e − c|²/σ²) |e − c|².

In a similar way, we use a Taylor expansion to approximate g2(e); if the higher-order terms are small enough, the right side of (28) can also be expressed in terms of S. Equating the two sides finally yields the theoretical steady-state EMSE in (49). Furthermore, when η_w is small enough, (49) simplifies to (50). Since e ≈ v at the steady state, the theoretical value of σ² can be obtained from (51); because the right side of (51) depends on σ², this is a fixed-point equation for the theoretical σ². Note that (50) is accurate only when e_a is small enough, so that the higher-order terms are negligible. If the noise power or the step size is too large, or if the kernel center deviates from the mean of the noise, there will be a large deviation between the theoretical and simulated values of the steady-state EMSE.

Simulation
In this section, we present some simulations to show the validity of theoretical results and the superiority of MCCC-VC. We obtain all the simulation results by averaging over 300 Monte Carlo trials.

Steady-State Performance
In this part, the filter weight w0 = [w1 w2 ··· w10]^T is randomly generated, where wk = w_Rk + j w_Ik with w_Rk, w_Ik ∼ N(0, 0.1); w_Rk and w_Ik represent the real and imaginary components of wk, and N(µ, σ²) denotes a Gaussian distribution with mean µ and variance σ². The input signal x = x_R + j x_I is randomly generated. To show the robustness of MCCC-VC, additive complex noise v = v_R + j v_I is added in the simulation, whose real and imaginary parts are denoted by v_R and v_I, respectively. All algorithms initialize w with a zero vector.
Firstly, we illustrate the correctness of the theoretical steady-state EMSE. For each simulation, 30,000 iterations are carried out to make sure MCCC-VC reaches the steady state, and the last 1000 iterations are used to obtain the simulated steady-state EMSE. The theoretical kernel width and steady-state EMSE are calculated according to (51) and (50), respectively. Figures 2 and 3 show the simulated and theoretical steady-state EMSEs of MCCC-VC under various noise variances and learning rates, where v is Gaussian distributed with mean 3 + 3j. Both figures show that the theoretical results closely match the simulated ones. Then, we change the noise to binary noise, again with mean 3 + 3j, and obtain the simulated and theoretical steady-state EMSEs in the same way. Figures 4 and 5 show the results under various noise variances and learning rates; again, the theoretical and simulated results match well.
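The simulated steady-state EMSE described here is just the average of |e_a(k)|² over the final iterations. A minimal helper (names illustrative; w_hist and x_hist are the recorded weight and input trajectories):

```python
import numpy as np

def steady_state_emse(w0, w_hist, x_hist, last=1000):
    # EMSE = mean |e_a(k)|^2 over the final iterations, with the prior
    # error e_a(k) = (w0 - w(k))^H x(k)
    ea = [np.vdot(w0 - w, x) for w, x in zip(w_hist[-last:], x_hist[-last:])]
    return np.mean(np.abs(ea) ** 2)
```

Averaging over the tail of the run (and, as in the paper, over Monte Carlo trials) suppresses the fluctuation of the instantaneous squared prior error.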

Performance Comparison
In this part, we compare the performance of the proposed MCCC-VC algorithm with MCCC and the minimum complex kernel risk sensitive loss (MCKRSL) [22]. For a fair comparison, all three algorithms use the gradient descent method to search for the optimal solution. We measure the performance of all algorithms by the weight error power.
In this simulation, the noise v(k) is composed of two independent noises [16], i.e., v(k) = (1 − a(k))A(k) + a(k)B(k), where P(a(k) = 0) = 1 − c and P(a(k) = 1) = c (0 ≤ c ≤ 1). A(k) is ordinary noise with small variance σ²_v = 1, whose real and imaginary parts are denoted by A_R(k) and A_I(k), and B(k) models outliers with large variance, whose real and imaginary parts are denoted by B_R(k) and B_I(k).
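This contaminated-noise model can be sketched directly. The zero-mean Gaussian choice for A(k) below is one illustrative case (the paper varies A(k) across four cases), with c = 0.05 and B_R, B_I ∼ N(0, 100) as in the simulation setup:

```python
import numpy as np

def mixture_noise(n, c=0.05, rng=None):
    """v(k) = (1 - a(k)) A(k) + a(k) B(k): ordinary noise A plus rare large outliers B."""
    rng = rng or np.random.default_rng()
    a = rng.random(n) < c                                            # P(a(k) = 1) = c
    A = rng.standard_normal(n) + 1j * rng.standard_normal(n)         # small-variance noise
    B = 10 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))  # outliers: N(0, 100) parts
    return np.where(a, B, A)
```

Roughly one sample in twenty is drawn from the heavy outlier component, which is exactly the regime where MMSE-based updates are pulled off course and correntropy-type criteria remain robust.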
In this simulation, we set c = 0.05 and B_R, B_I ∼ N(0, 100). In addition, we consider four cases for A(k). The parameter settings of the different algorithms are listed in Table 1. It can be seen clearly that the convergence performance of MCCC-VC is better than that of the other two algorithms in all cases.
Table 1. Parameter setting of different algorithms. (For MCCC-VC, one setting reads: η_w = 4.8 × 10⁻⁴, η_σ = 4 × 10⁻⁴, σ(0) = 5.)
Notes: η and σ denote the learning rate and kernel width for MCCC and MCKRSL, and λ denotes the risk-sensitive parameter for MCKRSL. Moreover, η_w and η_σ denote the learning rates for the weight and the kernel width of MCCC-VC, and σ(0) denotes the initial kernel width of MCCC-VC.

Conclusions
The complex correntropy usually employs a Gaussian kernel whose center is zero, which is not the best choice for many situations. To overcome this defect, this paper proposes the maximum complex correntropy criterion with variable center (MCCC-VC). The complex correntropy is extended to the case where the center can be anywhere. Furthermore, this paper also proposes an effective method to optimize the center position and the kernel width. More significantly, we analyze the convergence and steady-state performance of MCCC-VC theoretically. Simulation results obtained in Section 4 support the reliability of theoretical analysis and show the excellent performance of MCCC-VC.
