An Efficient Algorithm by Kurtosis Maximization in Reference-based Framework

This paper deals with the optimization of kur-tosis for complex-valued signals in the independent component analysis (ICA) framework, where source signals are linearly and instantaneously mixed. Inspired by the recently proposed reference-based contrast schemes, a similar contrast function is put forward, based on which a new fast fixed-point (FastICA) algorithm is proposed. The new optimization method is similar in spirit to the former classical kurtosis-based FastICA algorithm but differs in the fact that it is much more efficient than the latter in terms of computational speed, which is significantly striking with large number of samples. The performance of this new algorithm is confirmed through computer simulations.


Introduction
Independent component analysis (ICA) is a statistical method for transforming an observed multidimensional random vector into components that are statistically as independent from each other as possible [1].In a multi-input/multioutput (MIMO) context, the problem of ICA has found interesting solutions through the optimization of so-called contrast (objective) functions [2], many of which rely on higherorder statistics (e.g., the kurtosis contrast [3], [4]) or can be linked to higher-order statistics (e.g., the constant modulus contrast function [5]).These criteria are known to provide good results.
Recently, contrast functions referred to as "referencebased" have been proposed [6], [7] which are based on crossstatistics or cross-cumulants between the estimated outputs and reference signals [6]- [12].Due to the indirect involvement of reference signals in the iterative optimization pro-cess, these reference-based contrast functions have an appealing feature in common: the corresponding optimization algorithms are quadratic with respect to the searched parameters.Taking advantage of this quadratic feature, a maximization algorithm based on singular value decomposition (SVD) has been proposed [6], [7] and was shown to be significantly quicker than other maximization algorithms.However, this method generally requires an additional "fixpoint" like iteration to improve the separation quality and often suffers from the need to have a good knowledge of the filter orders due to its sensitivity on the rank estimation.
A new kurtosis-based gradient algorithm with reference signals fixed has been proposed in [11], which overcomes the drawbacks of SVD based method well.Similarly, an improved algorithm with reference signals update after each one-dimensional optimization has been presented in [12], which shows better performance.Based on [11] and [12], a new family of algorithms to maximize a kurtosisbased contrast function is proposed [7], whose global convergence to a stationary point is proved.And a tradeoff can be adjusted between performance and speed of the optimization method.
Inspired by [7] and [12], a similar reference-based contrast function is constructed, based on which a novel algorithm is proposed for complex-valued signals.More precisely, the cross-cumulant between estimated signals and reference signals is utilized as the contrast criterion in terms of kurtosis.Due to the quadratic dependence on the searched parameter, the main advantage of our proposed algorithm consists in the fact that it is much more efficient than the classical one in terms of computational speed.The performance of our method is validated through computer simulations.Note that the work of this paper is the extension of that in [13] that only considers the real-valued signals.
The remaining of this paper is organized as follows.Section 2 introduces the system model and assumptions.The contrast criteria including reference signals and corresponding contrast functions are presented in Section 3. Our new proposed optimization algorithm is shown in Section 4. Computer simulations are performed in Section 5. Section 6 concludes this paper.

System Model
In this paper, we extend our work in [13] to the complex-valued signal separation model.The system model for ICA is shown in Fig. 1, in which the mixing system is noise-free.
The observed (mixed) signals are denoted by The input-output relationship can be described as where A is the mixture matrix of N 2 × N 1 , representing the linear mixing system, which is composed of The separated signals are denoted by where W is the separation matrix of N 2 × N 1 , representing the linear separating system, which contains N 1 column complex-valued vectors, i.e., W H stands for the Hermitian of W, that is W is transposed and conjugated.Without loss of generality, we assume the number of sources equal to that of observed signals, i.e., N 1 = N 2 = N in this paper.

Practical Model
For practical application of our algorithm for the complex-valued signals, we are planing to construct a wireless communication system with two transmitting and receiving antennas, which transmits multiple signals simultaneously in the same frequency band over wireless channel and recovers the source signals at the multiple-antenna receiver by utilizing the statistical characteristics of source signals and broadcasting characteristics of wireless channel.Therefore, the spectrum efficiency of this system is high compared to that of time division multiplexing (TDM), frequency division multiplexing (FDM), and code division multiplexing (CDM).If it works, our new algorithm can improve the spectrum efficiency and computational speed of wireless communication system.The corresponding work will be investigated and shown in our latter work.We believe this kind of wireless communication system is very significant and promising in the future.

Assumptions
In order to retrieve the sources blindly and successfully, we make two assumptions [7], which are shown as follows: A1.The sources processes s i (t), i ∈ {1, • • • , N} are statistically mutually independent.A2.All the components of s (t) are stationary, zeromean and they have unit variances and uncorrelated real and imaginary parts of equal variances, i.e., E{s (t) s(t) H } = I and E{s (t) s(t) T } = 0.

Contrast Criteria 3.1 Reference Signals
Before introducing the contrast function, we first give a brief introduction of the reference signals referred to above.As shown in [6]- [12], the reference signals are expressed in the form of where z(t) and V are set similarly as y (t) = W H x (t), which are denoted by However, the reference signals are artificially introduced in an algorithm for the purpose of facilitating the maximization of the contrast function [6], [7].In [11], the reference signals are initialized arbitrarily and kept the same during the whole optimization process.In [12], the reference signals are not directly involved in the iterative optimization computation.In other words, the reference signals update following the objective signals.More precisely, V updates following W in each loop iteration step.Then the separation quality of the algorithm in [10] is better than that in [11].In [14] and [15], we have done some corresponding work to investigate the impact of reference signals, which is similar to [12].Combined with [12], we only consider the case where the reference signals are successively updated after each one-dimensional optimization.In [13], we have done some similar work on real-valued signals, which is extended to the complex-valued signals in this paper.Therefore, the performance of our algorithms is much better than those in [14] and [15].Since no confusion is possible and for simplicity, in the following sections, we drop the time index and these vectors are denoted respectively by s, x, y and z.

Contrast Functions
In order to describe the contrast function used in this paper, we first introduce some notations.The cumulant of a set of random variables is denoted by Cum{•}.We consider the cumulant of complex-valued signals [7] where E{•} denotes the mean value and y * designates the complex conjugate of y.
Let us introduce the following criteria: where J is the well-known kurtosis contrast function, which has been proved to be a contrast function [3], [4].I is our reference-based contrast function, which is similar to those proposed in [7] and [12].The consistent convergency of our reference-based contrast function can be proved by referring to the Proposition 1 in [7].Recently, we also have done some corresponding work by introducing the reference signals to the negentropy contrast function, based on which a class of more efficient and robust algorithms has been proposed.These work will be presented in our latter work.

Optimization Algorithm
Based on the assumptions and the Proposition 1 in [7], it can be concluded that the new contrast function I we propose has local extrema, which means that the stationary points of it can be found through optimization schemes.In this section, we give the derivation of our proposed algorithm in detail for the extraction of one source component.
After whitening, the mixing signals x satisfies E{xx H } = I.According to the assumptions A1 and A2, E{xx T } = 0 follows straightforward from E{ss T } = 0.Under the constraint E{ w H x 2 } = w 2 = 1 and with v update following w, we can get According to the Lagrange conditions, we construct Lagrange function under the constraint E{ w H x 2 } = w 2 = 1: Then the extrema of I(w, v) with respect to w are obtained at points where where β ∈ R. Similar to the complex gradient in [16], we have where w = [w 1r , w 1i , • • • , w Nr , w Ni ], and we assume the following notations Then the first term in (10) can be expressed as and we use Newton method to solve (9).Similar to (13), the Jacobian matrix of I(w, v), with respect to w, can be expressed as where the approximation was done by separating the expectations.Also, E{xx H } = I and E{xx T } = 0 are used here.And the last term in ( 14) results from the constraint E{ w H x 2 } = w 2 = 1 in which v updates following w so that we have Then the total approximative Jacobian of ( 10) is now which is diagonal and thus easy to invert.We obtain the following approximative Newton iteration: If we multiply both sides of ( 16) by 2β, then ( 16) can be simplified to Finally, the derivation of our proposed algorithm is completed, which is summarized as follows: Step1.Eliminate the mean value of and prewhiten it. Step4.

H = y W x
In the algorithm above, w i k+1 corresponds to the ith source to be retrieved for the (k+1)th iteration step and v i k+1 is the corresponding reference vector, which updates following w i k+1 .From the whole process, we can see the source signals are restored one by one through each one-dimensional optimization in a deflationary manner.To prevent different one-dimensional optimization converging to the same maxima, a Gram-Schmidt-like decorrelation scheme [3], [4], [16] is adopted.

Separability of Our Algorithm
In this section, the separability of our proposed algorithm is investigated through simulations.Here we choose three independent 16QAM signals as sources, for which the samples T = 10000 and the iteration parameter k max = 1000.The source signals are shown in Fig. 2.  To verify the separability of our algorithm more precisely, the separation-mixture matrix is presented as −0.0059 − 0.0025i −0.0013 + 0.0017i 0.0048 − 0.0036i 0.0011 + 0.0023i −0.5823 + 0.2538i −0.0026 + 0.0004i 0.4633 + 0.4250i −0.0007 − 0.0041i where Q is the whitening matrix, i.e., y = W H x = W H QA = W H QAs. Note that the element labelled in black corresponds to one of sources.
It can be observed clearly that, compared Fig. 2 and Fig. 4, our proposed algorithm recovers the source signals successfully up to some ambiguities, i.e., permutation and scaling ambiguities.However, it is fortunate that the ambiguities are common and insignificant in most applications.

Performance Analysis
In this section, we choose three QAM signals as sources to investigate the performance of our proposed algorithm.The number of samples T varies from 1000 to 10000 and we set the iteration parameter k max = 1000.The mean value of mean square error (MSE) and median MSE between sources and corresponding estimations are chosen as the performance measure criterion of separation quality.The average execution time is chosen as the measure criterion of computational speed, for which the computer is Intel(R) Core 2 Duo CPU, E8400 @ 3.0 GHz, 2.99 GHz, 3.00 GB RAM.In order to make the simulation results more reliable and reasonable, we perform 100 independent execution runs for each simulation, in which the mixing matrix is randomly chosen.
In this paper, we consider the performance comparison and analysis among four algorithms, which are denoted by F1, F2, F3 and F4.F1 denotes the gradient maximization algorithm based on kurtosis, which is similar to the optimization schemes in [6]- [12] but differs in the fact that reference signals are not considered.F2 denotes the optimization algorithm proposed in [7] and [12].F3 denotes the classical FastICA algorithm based on kurtosis shown in [3], [4], [16].F4 denotes our proposed FastICA algorithm with reference signals introduced.
Note that all the algorithms are implemented on our computer under same condition.The simulation results are illustrated in Fig. 5, Fig. 6 and Fig. 7.
It can be observed clearly from Fig. 5 and Fig. 6, that the value of MSE and median MSE of our proposed algorithm F4 is very close to those of F1, F2 and F3, which means that our new method converges to the approximately identically stationary points as others.More precisely, our new algorithm recovers sources successfully on one hand, and shows similar performance in terms of separation quality on the other hand.Moreover, with the number of samples increasing, the performance of F1, F2, F3 and F4 is so close that there is no difference among them.Additionally, compared Fig. 5 and Fig. 6, we can see that our proposed method is very stable.However, in Fig. 7, it can be clearly observed that the execution time of F4 is much less than that of F1, F2 and F3, which means that our new algorithm is more efficient than others in terms of computational speed.And the advantage of our method is particularly more apparent over others with the number of samples increasing.Furthermore, the reference-based algorithms F2 and F4 show quicker convergent speed than F1 and F3, respectively.Note that we mainly focus on the fact that our new proposed FastICA algorithm F4 provides better performance than the classical one F3, which is the initial motivation of this paper.

Complexity Analysis
In this section, the computational complexity of our new algorithm F4 and corresponding classical one F3 is briefly analyzed, which reveals the reason why the former is much more efficient than the latter in terms of computational speed.The main difference between F3 and F4 is where (20) denotes the classical algorithm F3 and (21) denotes our proposed one F4.It is well accepted that the value of kurtosis has to be estimated from a measured sample in practice.So from (20) and (21), we can see that the computational complexity of F3 and F4 is thus an increasing function of the number of sources, sample size T and iterative parameter k max .More precisely, the computational complexity mainly lies in (w i k ) H x, for which w updates after each iteration.However, the reference signal vector v introduced in our algorithm updates following w, which means that (v i k ) H x is known advance to a certain extent and is not involved in the iteration optimization process.Therefore, the computational complexity of (21) for our algorithm only relates to w and the computational resource is reduced significantly compared with (20) of the classical one, which is significantly striking with the number of samples increasing.

Conclusion
In this paper, a new algorithm is proposed, which is based on our proposed reference-based contrast function.It is similar to the classical FastICA algorithm but is much more efficient than the latter in terms of computational speed.Computer simulations are performed to validate the performance of our algorithm.Our future work includes the extension of this new algorithm to more complex mixing models such as convolution and nonlinear, in which the noise will be considered.
About the Authors . . .Wei ZHAO was born in 1988 in China.He is beginning to pursue the PhD degree just now.His research interests include blind source separation, independent component analysis, compressed sensing, sparse component analysis, and their corresponding applications in wireless communication systems.
Yimin WEI was born in 1973 in China.He received the PhD degree in 2012 and is now an assistant professor in PLA University of Science and Technology.His research interests include wireless communication and signal processing.
Yuehong SHEN was born in 1959 in China.He received the PhD degree in 1999 and is now a professor in PLA University of Science and Technology.His research interests include wireless MIMO communication, digital signal processing, statistical signal analysis, wireless statistic division multiplexing.
Yufan CAO was born in 1991 in China.He is now pursuing the Master degree in PLA University of Science and Technology.His research interests include blind source separation, independent component analysis, and their corresponding applications in wireless communication systems.

Fig. 2 .
Fig. 2. Three source signals.The observation signals are mixed through linear matrix, which is chosen arbitrarily, i.e., A =
Zhigang YUAN was born in 1980 in China.He received the PhD degree in 2008 and is now a lecturer in PLA University of Science and Technology.His research interests include digital transmission technology and digital communication signal processing.Pengcheng XU was born in 1987 in China.He is now pursuing the PhD degree in PLA University of Science and Technology.His research interests include blind source separation, independent component analysis, and their corresponding applications in wireless communication systems.Wei JIAN was born in 1976 in China.He received the PhD degree in 2006 and is now a lecturer in PLA University of Science and Technology.His research interests include digital transmission technology and wireless communication signal processing.