A Fetal Electrocardiogram Signal Extraction Algorithm Based on the Temporal Structure and the Non-Gaussianity

Fetal electrocardiogram (FECG) extraction is an important issue in biomedical signal processing. In this paper, we develop an objective function for extraction of FECG. The objective function is based on the non-Gaussianity and the temporal structure of source signals. Maximizing the objective function, we can extract the desired FECG. Combining with the solution vector obtained by maximizing the objective function, we further improve the accuracy of the extracted FECG. In addition, the feasibility of the innovative methods is analyzed by mathematical derivation theoretically and the efficiency of the proposed approaches is illustrated with the computer simulations experimentally.


Introduction
In biomedical signal processing, fetal electrocardiogram (FECG) extraction is full of challenges. FECG provides important information about the health of the fetus. However, FECG is always buried in various interferences and noises. Among these interferences and noises, maternal electrocardiogram (MECG) is the strongest one with high amplitude. Besides, breathing artifact, electrode artifact, and other noises also affect the desired FECG. Therefore, it is a difficult task to extract accurate FECG.
Traditional methods cannot get satisfactory results, such as multireference adaptive noise cancellation [1] and singular value decomposition [2]. Recently, the blind source separation (BSS) model [3][4][5][6] is introduced to solve the extraction problem of the desired FECG and shows satisfactory results. BSS algorithms can separate all sources from mixtures without any priori knowledge. In BSS algorithms, the ICA model [7][8][9] utilizes the non-Gaussianity [10] of signals to separate all source signals. This model is suitable for biomedical signal processing [11] and non-Gaussianity becomes an important tool to process these kinds of signals. In [12], traditional BSS algorithms and joint BSS algorithms were used to separate the maternal signal and the fetal signal. Authors [13] proposed two methods based on hybrid BSS to get FECG signals. These methods get better results than traditional BSS algorithms. However, the extraction of the desired FECG with the BSS algorithms is not the best choice. On one hand, separating all sources is a waste of time and not necessary if only the desired FECG is needed. On the other hand, the prior knowledge about the desired FECG can be utilized by us. According to this situation, blind source extraction (BSE) rises in response to the proper time and conditions and becomes a better choice than BSS. BSE only separates one source signal every time, so it has higher efficiency when it extracts the desired FECG. FECG is a special signal with the temporal structure. As prior knowledge, this structure can help us recover source signals. Barros and Cichocki [14] proposed a BSE algorithm called BCBSE. This algorithm can extract the desired FECG by utilizing the temporal structure. However, the algorithm is very sensitive to the estimation error of the optimal time delay. To overcome the drawback of BCBSE, Shi and Zhang [15] developed another BSE algorithm called SemiBSE based on the non-Gaussianity and the mean squared error function that is described in [14]. This method improves the performance of BCBSE. It is more robust for the estimation error of the optimal time delay. However, the performance of this algorithm is up to the choice of the parameters. If we initialize the parameters randomly, the performance of the algorithm becomes weaker. A BSE method called MACBSE [16] was proposed based on several time-delay autocorrelations of primary sources. MACBSE adopts a fixed-point learning algorithm for extraction of the desired source signal without choosing the learning step size. Authors [17] proposed a BSE algorithm called GABSE based on the generalized linear or nonlinear autocorrelations of the sources. This method has fast convergence speed and good stability if any two sources are uncorrelated with each other and have different temporal structures. A fast and robust fixed-point algorithm based on the nonlinear autocorrelation was proposed in [18]. Its convergence speed is shown to be at least quadratic. Li and Liao [19] proposed an algorithm based on the eigenvalue decomposition of the cross-correlation of whitened source signals at a given time tag. This algorithm functions without iterations. Wang et al. [20] proposed a robust separation algorithm to recover original sources through a joint diagonalizer of several average delayed covariance matrices at positions of the optimal time delay and its integers.
In order to settle the above problem better, we put forward the method based on the non-Gaussianity and the temporal structure of source signals. This paper consists of two main parts. First, we design an objective function to get the weight vector for extraction of the desired FECG. Then, the weight vector obtained by maximizing the objective function is combined with the FastICA algorithm to further improve the performance of the algorithm. This paper is organized as follows. Section 2 introduces the basic theory of our algorithms. As the principle part, we highlight the algorithms and the analysis about the algorithms in Section 3. Simulation results are presented in Section 4, and conclusions are made in Section 5.

The Basic Model
The linear instantaneous mixed model of BSS problems can be denoted as is an unknown mixing matrix, s( ) = [ 1 ( ), 2 ( ), . . . , ( )] is the -dimension source vector, and n( ) is the noise vector. If the noise is ignored, the noiseless model of BSS problems can be expressed as Preprocessing for the mixtures is usually necessary before the BSS algorithms. First, the mixtures are made to own zero mean through the following formula: wherex( ) is the new mixture vector. Then, the prewhitening process makes any two variables of the new mixtures orthogonal and gets the whitened mixtures where V is the prewhitening matrix. For a BSE algorithm, only a source signal can be separated at a time. If the unmixing matrix is w = ( 1 , 2 , . . . , ) , the estimated source signal can be denoted as Meanwhile, the delayed estimated source signal can be expressed as where is the time delay that is some lag constant. The estimation error of the optimal time delay may have bad effect on the performance of the algorithms, so the reasonable estimation is important. The specific estimation method of the optimal time delay refers to [14].

The Proposed Algorithm 1
3.1.1. Objective Function. In order to avoid the blindness of choosing parameters, we combine the non-Gaussianity and the temporal structure to design the following constrained maximization problem: where is a differentiable nonlinear function. This function utilizes the non-Gaussianity, the temporal structure, and the nonlinear correlation.
where is the derivative of . According to the gradient ascent learning rule, a gradient method can be derived as follows: Computational and Mathematical Methods in Medicine   3 where is a learning rate. The desired vector can be obtained through this method, but the corresponding cost is that the convergence speed of the algorithm is slow and the performance of the algorithm is dependent on the proper choice of the learning rate. If the learning rate is chosen improperly, the convergence property will be destroyed. Therefore, how to find a new approach to radically improve the convergence speed and the reliability is an important issue that needs to be solved. The fixed-point algorithm is a choice to solve this issue. For getting a more efficient fixedpoint algorithm, we note that the gradient must point in the direction of w at a stable point of the gradient algorithm. It means the gradient must be equal to w multiplied by some scalar constant. In such a case, adding the gradient to w does not change its direction and the convergence can be obtained. Through normalization to the unit norm, the value of w is not changed except by changing its sign. Therefore, the following formula can be obtained: Through the above formula, the fixed-point algorithm can be updated as Utilizing the above fixed-point algorithm, we can extract the desired signal.

Stability Analysis.
In this part, we analyze the stability of the proposed algorithm.

Theorem 1.
Assume that the input data meet the model as (2). The data are prewhitened through equationx = VAs and is a quite smooth even function. Furthermore, we assume that where is the derivative of .
Proof. According to the above conditions, we make the orthogonal transform of coordinates p = A V w and obtain We analyze the stability at the point p = e 1 = (1, 0, 0, 0, . . . ) . The independency assumptions are utilized to evaluate the gradient and the Hessian matrix of (p) at the point p = e 1 . Then, through making a small perturbation = ( 1 , 2 , . . .), where 1 and 2 are the elements of , we get where (e 1 )/ p and 2 (e 1 )/ p 2 are expressed as Because of the constraint ‖w‖ = 1, we can , so the higher order terms can be neglected. Through the first-order approximation of 1 that is described above, we know 1 = − ∑ >1 2 /2 + (‖ ‖ 2 ) and obtain 4 Computational and Mathematical Methods in Medicine p = e 1 is an extremum that is implied in the condition of Theorem 1 if the following condition is satisfied: Based on the above descriptions, this condition can be expanded as

Convergence Analysis.
Convergence is also important like stability. In this part, we analyze the convergence of the algorithm.
Theorem 2. If the following two conditions are satisfied, the algorithm described in formula (11) can reach convergence: (1) { , } and { , } are mutually independent.
Proof. Based on the orthogonal coordinate transformation p = A V w, formula (11) is changed tô where is the number of iterations. Using a Taylor approximation for and , we have where p − is the vector p without its th component, s − is the vector s without its th component, and s − is the vector s without its th component. Similarly, we analyze the convergence at the point p = e, where is 1 and is 0 (∀ ̸ = ). Based on the above conditions, we get The above equations clearly show that the algorithm converges to such a vector e in which is 1 and is 0 if condition (2) is satisfied. It means that = ((VA) ) −1 q can converge without computing the sign. This shows that w converges as Theorem 2.

The Proposed Algorithm 2.
On the basis of the above algorithm, we propose another improved algorithm by utilizing the FastICA algorithm. The FastICA algorithm is a typical ICA algorithm. The source signal can be estimated through maximizing the non-Gaussianity function of signals. One of the representative non-Gaussianity functions is negentropy. However, this algorithm is based on the theory of projection pursuit, so any signal may be first extracted. According to this problem, we propose an improved algorithm by presetting the weight vector. The method of projection pursuit searches for a projection output to maximize the objective function. Based on the theory of ICA, a local extremum in the solution space of the projection direction is obtained if the negentropy is the objective function. The output of the corresponding data for this local extremum is an independent component. If the mixtures consist of source signals, the parameter space formed by ICA will have 2 local extrema that correspond to signed solutions of all source signals. Meanwhile, a local optimal solution owns its attraction region for a greedy optimization algorithm that is based on projection pursuit. In other words, the initial point in this region converges to this local optimal solution with this algorithm. The FastICA algorithm may not extract the desired signal if the initial weight matrix is arbitrary. However, if the initial weight matrix is near the local optimal solution that corresponds to It is the initial weight matrix of the proposed algorithm 2.
Then, we can extract the desired signal with the FastICA algorithm based on the negentropy. The approximate negentropy in literatures can be denoted as The gradient of ( ) is written as ; the learning algorithm is obtained as follows: The corresponding fixed-point algorithm is denoted as However, the convergence property of the FastICA algorithm based on the negentropy is unsatisfactory because the nonpolynomial moment does not own good algebra property. Therefore, the Newton iteration algorithm is usually utilized to improve the process. If the maximization of { [w x( )]} under the constraint ‖w‖ = 1 is considered, the optimization problem with the Lagrangian multiplier method can be changed to The first-order derivative and the second-order derivative can be denoted as In order to avoid calculating the inverse of the matrix, the simple approximation is made to get the following Newton iteration algorithm: Both sides of the above equation are multiplied by { [wx( )]} + 2 and then the learning algorithm after the algebraic simplification is denoted as Formula (30) is the basic formula of the fixed-point FastICA algorithm.
Here, we analyze the condition that the initial weight vector must meet in order to get the desired signal. According to formula (30), we get Based on formula (31) and q = A V w, the following formula is obtained:q  To analyze conveniently, we choose = 4 /4 to illustrate. Therefore, = 3 and = 3 2 can be obtained. Formula (33) can be simplified as . (34) Next, the following formula can be obtained: where 4 (⋅) denotes the kurtosis of the signal. Similarly, we obtain The following formula is assumed: Then, ( ) can be denoted as If we want to get the desired signal, ( ) must be close to 0 and ( ) must be close to 1. Therefore, the following formula must be met: The above formula (39) can be simplified as Combining the solution vector w 0 described in formula (22), we get In conclusion, the desired signal can be obtained if the following condition is satisfied: Similarly, if another function is chosen, there will be another condition that is similar to formula (42) helping us get the desired source signal. mixed with a matrix to get the mixtures that are shown in Figure 2. The mixing matrix in this paper is written as

Simulation Results and Analysis
where denotes the element of the global vector p = w VA.
If the desired signal is extracted perfectly, the value of PI is 0. In other words, the value of PI is lower if the performance is better. PI is always used to be the measurement index for the artificial data.
In the comparison algorithms, the MACBSE algorithm, the GABSE algorithm, the SemiBSE algorithm, and the FastNA algorithm are the algorithms that need iterations. Therefore, we compare these algorithms with the proposed algorithm through the relationship of the average performance index and the iterations. In this paper, the function ( ) = log(cosh( )) is chosen. The parameters of other algorithms are selected based on the references. In the GABSE algorithm and the FastNA algorithm, ( ) = log(cosh( )) is also chosen. In the SemiBSE algorithm, the nonlinearity is chosen as ( ) = sign( ) and the constant coefficient is initialized to 0.3. The learning rates are set to 0.1 and 0.0001. All  Figure 3.
It is obviously verified in Figure 3 that our algorithms have better performance than other algorithms when the algorithms achieve the convergence.
Meanwhile, we compare our algorithms with two noniterative algorithms which include Li's algorithm and the LAJD algorithm. In the LAJD algorithm, and are set to 3 and 2,  respectively. The performance indexes of the algorithms after the convergence are shown in Table 1.
From Table 1, it is easy to know that our algorithms have lower performance indexes than Li's algorithm and the LAJD algorithm.
The above comparison is done with the same mixing matrix. However, the mixing matrix is unknown and changing in practice. Therefore, Monte Carlo simulations need to be carried out with different random mixing matrices. We set the initial weight vector as [0, 0, 0, 0, 0, 1] and compare different algorithms. The average performance indexes over 100 independent trials against iteration numbers by the six iterative algorithms are shown in Figure 4. In 100 independent trials, the mixing matrices are different and random.
When the mixing matrix is random, the average performance indexes of our algorithms and two noniterative algorithms in 100 independent trials are shown in Table 2.
As shown in Figure 4 and Table 2, our algorithms still own better performance when the mixing matrix is random.
Based on the comparisons of the proposed algorithms and all other algorithms, we can know that our algorithms own good performance for processing the artificial data.

Experiments on Real-World Data.
In order to further demonstrate the practicability of our algorithm, the simulations of real-world EEG data which are shown in Figure 5   need to be performed. The well-known ECG data are measured from a pregnant woman and shared by De Moor [21]. The ECG data are mixed with maternal electrocardiogram (MECG), fetal electrocardiogram (FECG), breathing artifact, electrode artifact, and some other noises. When extracting FECG from the real-world ECG signals with various kinds of algorithms, we cannot measure the performance of the algorithms with PI because the mixing matrix is unknown. However, the robustness of the algorithms is also an important index. It is always used to be the measurement index for the real-world data. Because there may be estimation error of the optimal time delay in practice, the robustness of the algorithms becomes more important. The FECG signals extracted by all algorithms at the optimal delay 112 are shown in Figure 6. The FECG signals extracted by all algorithms at delays 106 and 120 are shown in Figures 7 and 8. From Figure 6, we can find that all algorithms can extract the FECG signal at the optimal delay 112. However, if the estimation error of the delay is too large, the FECG may not be extracted by the algorithms. According to Figures 7 and 8, we can know that only the proposed algorithms can extract the FECG signal. Our algorithms function when the delay ranges from 106 to 120, which directly illustrates that our algorithms are more robust than other algorithms. In conclusion, our algorithm is applicable to real-world data and owns better robustness than other algorithms.

Conclusion
We have proposed novel BSE algorithms for extraction of FECG. First, we design an objective function based on  the non-Gaussianity and the temporal structure of the desired FECG. Secondly, a clearer FECG can be obtained by the learning algorithm. Furthermore, the proposed algorithm is further improved through combining the vector obtained by the objective function with the FastICA algorithm. The algorithms of this paper are effective and robust. They can save more time and get more satisfactory results than traditional BSS methods through utilizing the prior information. Experimental results certify the effectiveness of the proposed algorithms. Due to the generality of the presented BSE algorithms, we believe that they can also be extended to other signal processing applications. With our algorithms, it is easy to design suitable functions to solve various kinds of problems.