Atrial Fibrillation Detection by the Combination of Recurrence Complex Network and Convolution Neural Network



Introduction
Atrial fibrillation (AF) is the most common type of cardiac arrhythmia in clinical settings, affecting about 1-2% of the general population [1]. Clinical evidence indicates that the presence of AF is associated with an increased risk of stroke, heart failure, hospitalization, and death [2]. However, AF often goes unnoticed because for many patients the condition is asymptomatic and thus remains undetected. As a result, there is a pressing need to develop AF detection methods.
Electrocardiogram (ECG) is commonly used as a diagnostic tool for AF detection, and considerable research has been conducted on ECG-based detection. These works are based either on RR interval (RRI, i.e., the interval between two adjacent QRS complexes) variability or on abnormal atrial activity (AA) (AlGhatri [3]). Previous results showed that RRI-based algorithms are more robust than AA-based algorithms (Kikillus [4] and Dash [5]). However, such methods are not effective if the patient has a pacemaker, is taking rate-control drugs, or has other concurrent heart problems, such as atrioventricular (AV) block [6]. Thus, it is necessary to develop AF detection algorithms based on AA features, that is, to design rate-independent methods [7].
In terms of atrial activity, the P-wave is replaced by fibrillatory waves during AF. Thus, a natural way to detect AF is to check for the absence of P-waves. Several algorithms have been proposed to address this issue [8][9][10]; however, the results were not satisfactory because P-wave fiducial point detection is challenging, especially in dynamic monitoring applications.
Recently, signal processing techniques have been employed to extract AA features from ECG waves for AF detection. Stridh et al. proposed a time-frequency distribution estimation method to estimate the fibrillation frequency of the ECG signal, in which a set of parameters describing the fundamental frequency, amplitude, shape, and signal-to-noise ratio of the atrial waveforms is derived based on the frequency shift of an adaptively updated spectral profile [11]. Lee et al. analyzed the dominant frequency of the atrial activity by using the variable frequency complex demodulation (VFCDM) method [12]. The value of the dominant frequency has been shown to be a distinctive feature for AF detection. In another ECG-based pattern analysis method for the classification of normal sinus rhythm and AF beats [13], the denoised and registered ECG beats were subjected to independent component analysis (ICA) for data reduction, and the ICA weights were used as features for classification using Naive Bayes and Gaussian mixture model (GMM) classifiers. All of these methods use handcrafted features for pattern recognition. Such features are not invariant across individuals. In their experiments, the classification accuracy was estimated by tenfold cross-validation, in which there is a high probability that the training set contains samples from every subject who also provides test ECG samples. We use the term individual variation to refer to this phenomenon. In ECG monitoring applications, however, it is crucial that the system is able to tackle this problem.
Magnitude-squared coherence, a frequency domain measure of the linear phase relation between two signals, has been shown to be a reliable discriminator of AF [14,15]. However, the accuracy of the corresponding AF detection algorithm is relatively low; thus, it has to be combined with the RRI feature in order to achieve acceptable accuracy. The Recurrence Complex Network has been employed to detect AF from dog epicardial signals recorded by an epicardial mapping system with 128 unipolar electrodes [16]. It has been demonstrated that the phase space of the Recurrence Complex Network is suitable for distinguishing between normal sinus rhythm and atrial fibrillation beats. However, in [16], only two numerical features calculated from the adjacency matrix of the complex network are used to detect AF. This may discard much of the discriminating information in the adjacency matrix.
The objective of this paper is to improve the performance of the AF detection algorithm by combining the Recurrence Complex Network (RCN) with the convolutional neural network (CNN). As one of the deep learning algorithms [17], CNN has great potential in feature extraction and has been applied to image processing and speech recognition with notable success [18][19][20]. In the proposed algorithm, CNN is exploited to learn robust AF features from the output of the RCN and then to detect AF signals with high accuracy. The proposed AF detection algorithm is composed of two procedures. The first is a heartbeat classification procedure that distinguishes between AF beats and normal beats based on the ECG waveform of a single heartbeat. The second is a voting procedure that improves classification performance by fusing the classification results of multiple beats. The first procedure is the crucial part, in which the synchronization feature of each heartbeat is first extracted by the RCN, and then a CNN is used to extract more abstract AF features and recognize an AF heartbeat. Experimental results on the MIT-BIH database show that the AF features learned by the CNN are robust to the variation of the ECG signals between different individuals, so that the proposed algorithm has good generalization ability.

The Data
The real data (surface ECGs) used in this study were provided by the MIT-BIH AF database (AFDB) [21]. The database is available from PhysioNet [22]; it includes 25 long-term (10-hour) annotated ECG recordings of human subjects with AF and contains 299 AF episodes. Each recording contains two ECG signals (ECG1 and ECG2), which are sampled at 250 Hz with 12-bit resolution. In this work, only the ECG1 signals are used to evaluate the AF detection methods.

Data Preprocessing.
For each data recording, a seventh-order Butterworth bandpass filter is applied with poles at 0.5 Hz and 49 Hz to reduce baseline wander (BW) and noise. Then, the onset of the QRS wave is detected by finding the local maxima of the convolution between the ECG recording and a set of predefined QRS models. At each QRS onset point, the QRS wave is canceled based on the best-matching model. The remaining signal is divided into segments, each of which is approximately the AA segment of one heartbeat. All the segments are interpolated into 128-point data samples with Fourier transform interpolation. The AF detection algorithm is then developed based on these samples. The main ECG preprocessing steps are illustrated in Figure 1, which shows how the data change at each step. Figure 1(a) shows that the Butterworth filter successfully corrects the baseline and reduces the effects of the noise. In Figure 1(b), the right panel contains only the information outside the QRS wave. It shows that the ventricular signals are almost completely removed; thus, the output signal essentially represents the AA signal. Figure 1(c) shows a single heartbeat before and after the interpolation operation.
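The interpolation step can be illustrated with a minimal sketch. It assumes that "Fourier transform interpolation" means resampling by zero-padding (or truncating) the spectrum of each segment; the function name `fourier_resample` and the toy sine segment are illustrative and do not come from the original implementation.

```python
import numpy as np

def fourier_resample(segment, n_out=128):
    """Resample a 1-D segment to n_out points by zero-padding or truncating its spectrum."""
    segment = np.asarray(segment, dtype=float)
    n_in = segment.size
    spectrum = np.fft.rfft(segment)
    padded = np.zeros(n_out // 2 + 1, dtype=complex)
    k = min(spectrum.size, padded.size)
    padded[:k] = spectrum[:k]            # keep the low-frequency bins
    # irfft normalizes by n_out, so rescale to preserve the original amplitude.
    return np.fft.irfft(padded, n=n_out) * (n_out / n_in)

# Toy AA segment of 100 points containing three full cycles (assumed data).
t = np.arange(100)
aa_segment = np.sin(2 * np.pi * 3 * t / 100)
resampled = fourier_resample(aa_segment)   # 128-point sample
```

Because every beat ends up with the same length, the later phase-space reconstruction produces a feature vector of fixed size regardless of the heart rate.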

Extracting Low-Level AF Features Based on the Recurrence Complex Network.
The ECG data is a nonstationary time series [23]; thus, it can be analyzed by the Recurrence Complex Network (RCN), a popular tool for processing nonstationary time series [23,24]. Traditionally, there are two issues that need to be explored when applying the RCN: the construction of the recurrence matrix and the extraction of the RCN features. This section mainly focuses on the construction of the recurrence matrix from the ECG data.
The recurrence matrix is obtained by the phase space reconstruction method. Generally, there are two kinds of phase space construction methods: the time delay method and the derivative reconstruction method [24]. In this study, the time delay method was selected because the derivative is sensitive to calculation error. Let $x(t) = \{x(t_1), x(t_2), \cdots, x(t_N)\}$ denote ECG data of length $N$; then, the vector $X(t_i) = [x(t_i), x(t_i + \tau), \cdots, x(t_i + (m-1)\tau)]$ represents a vector in the phase space. Here, $m$ is the embedding dimension, and $\tau$ is the embedding delay time. If the parameters $m$ and $\tau$ are properly specified, the dynamic characteristics of the data are transferred into the relationship between the vectors in the phase space, and it becomes much easier to observe and extract the dynamic features of the data than in the original space.
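The time delay method can be sketched in a few lines. The function name `time_delay_embed` and the toy ramp signal are illustrative; the values of the embedding dimension (10) and the delay (4) are the ones selected later in the experiments.

```python
import numpy as np

def time_delay_embed(x, m, tau):
    """Return the phase-space vectors of x as rows of an array.

    Row i is [x[i], x[i + tau], ..., x[i + (m - 1) * tau]].
    """
    x = np.asarray(x, dtype=float)
    n_vectors = x.size - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for this (m, tau)")
    idx = np.arange(n_vectors)[:, None] + tau * np.arange(m)[None, :]
    return x[idx]

# A 128-point sample with m = 10 and tau = 4 gives 128 - 9 * 4 = 92 vectors.
x = np.arange(128, dtype=float)
vectors = time_delay_embed(x, m=10, tau=4)
```

With these settings a 128-point heartbeat sample yields 92 phase-space vectors, which matches the 92 eigenvalues that are later fed to the CNN.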
The most common method for choosing the time delay parameter $\tau$ is based on the mutual information between the coordinates of the phase space (Fraser [25]). By the assignment $[s_i, q_i] = [x(t_i), x(t_i + \tau)]$, a pair of random variables $S$ and $Q$ is defined, where $s_i = x(t_i)$ is an instance of $S$, and $q_i = x(t_i + \tau)$ is an instance of $Q$. The average amount of information gained from a specific value of $S$, named the entropy $H(S)$, is defined as follows:

$$H(S) = -\sum_{i} P_s(s_i) \log P_s(s_i),$$

where $P_s(s_i)$ is the probability that the observed value of the random variable $S$ is $s_i$. The entropy $H(Q)$ is defined in the same way. Moreover, the joint entropy of the pair $[S, Q]$ can be defined as

$$H(S, Q) = -\sum_{i,j} P_{sq}(s_i, q_j) \log P_{sq}(s_i, q_j),$$

where $P_{sq}(s_i, q_j)$ is the probability that the observed value of $S$ is $s_i$ and that of $Q$ is $q_j$. Then, the mutual information $I(S, Q)$ between $S$ and $Q$ can be defined as

$$I(S, Q) = H(S) + H(Q) - H(S, Q).$$

Recalling the definitions of the random variables $S$ and $Q$, it can be seen that $I(S, Q)$ is a function of $\tau$. The work in [25] demonstrated that the proper value of the time delay $\tau$ corresponds to the first local minimum of $I(S, Q)$.
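The delay selection rule can be sketched as follows. This is a minimal illustration, not the original implementation: the probabilities are estimated with a simple equal-width histogram (the number of bins, 16, is an assumption), and the noisy sine wave is toy data.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a probability array, ignoring empty bins."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def mutual_information(x, tau, bins=16):
    """Histogram estimate of I(S, Q) = H(S) + H(Q) - H(S, Q) for delay tau."""
    x = np.asarray(x, dtype=float)
    s, q = (x, x) if tau == 0 else (x[:-tau], x[tau:])
    joint, _, _ = np.histogram2d(s, q, bins=bins)
    p_sq = joint / joint.sum()
    p_s, p_q = p_sq.sum(axis=1), p_sq.sum(axis=0)
    return entropy(p_s) + entropy(p_q) - entropy(p_sq.ravel())

def first_local_minimum(values):
    """Index of the first interior local minimum, or None if there is none."""
    for i in range(1, len(values) - 1):
        if values[i] < values[i - 1] and values[i] <= values[i + 1]:
            return i
    return None

# Toy signal (assumed data): a noisy sine wave with a period of 40 samples.
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * np.arange(2000) / 40) + 0.1 * rng.standard_normal(2000)
mi_curve = [mutual_information(signal, tau) for tau in range(30)]
best_tau = first_local_minimum(mi_curve)   # list index equals tau because the curve starts at 0
```

Histogram estimates of mutual information depend on the bin count, so in practice the location of the first minimum should be checked for stability against that choice.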
The problem of determining the embedding dimension $m$ was explored in depth in [26], in which an efficient method for determining $m$ was developed based on the fact that an embedding dimension that is too low causes points that are far apart in the high-dimensional phase space to be moved closer together in the reconstructed space [27]. This method was adopted in our AF detection algorithm, and in the following part, we briefly review it.
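A minimal sketch of this idea is given below, assuming the method of [26] is Cao's method as named in the experiments: for each phase-space vector, the distance to its nearest neighbour is compared before and after adding one more coordinate, and the quantity $E1(m) = E(m+1)/E(m)$ stops changing once $m$ is large enough. The function names, the maximum norm, and the toy series are assumptions made for illustration.

```python
import numpy as np

def embed(x, m, tau):
    """Time-delay embedding: row i is [x[i], x[i + tau], ..., x[i + (m - 1) * tau]]."""
    n = x.size - (m - 1) * tau
    return x[np.arange(n)[:, None] + tau * np.arange(m)[None, :]]

def cao_E(x, m, tau):
    """Mean growth of nearest-neighbour distances when going from m to m + 1 dimensions."""
    n = x.size - m * tau                    # vectors available in both embeddings
    low = embed(x, m, tau)[:n]
    high = embed(x, m + 1, tau)[:n]
    ratios = []
    for i in range(n):
        d_low = np.max(np.abs(low - low[i]), axis=1)   # maximum-norm distances
        d_low[i] = np.inf                               # exclude the point itself
        j = int(np.argmin(d_low))
        if d_low[j] > 0:                                # skip exact duplicates
            d_high = np.max(np.abs(high[i] - high[j]))
            ratios.append(d_high / d_low[j])
    return float(np.mean(ratios))

def cao_E1(x, m, tau):
    """E1(m) = E(m + 1) / E(m); it saturates near 1 once m is sufficient."""
    return cao_E(x, m + 1, tau) / cao_E(x, m, tau)

# Toy series (assumed data): two incommensurate sines plus weak noise.
rng = np.random.default_rng(1)
t = np.arange(600)
series = np.sin(0.2 * t) + 0.5 * np.sin(0.31 * t) + 0.01 * rng.standard_normal(600)
e1 = [cao_E1(series, m, tau=4) for m in range(1, 8)]
```

The embedding dimension is then read off as the smallest $m$ beyond which the `e1` curve no longer changes appreciably.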
Traditionally, the recurrence matrix is binarized, and some numerical features are extracted manually. Then, the input samples can be classified with algorithms such as fuzzy c-means (FCM). However, it is difficult to manually define appropriate features for ECG data. To solve this problem, we propose to extract features from the recurrence matrix automatically by using the convolutional neural network (CNN). First, we calculate the eigenvalues of the recurrence matrix, and then they are sent into the CNN. The CNN extracts the features and classifies the data. The eigenvalues of each data sample form a 92-dimensional feature vector. As illustrated in Figure 2, the convolution layer adopted in this study consists of a group of fully connected feature maps $O_j$ ($j = 1, 2, \cdots, N_O$, where $N_O$ is the total number). Each feature map $O_j$ is obtained by a summation of the convolutions between all the input feature maps $I_i$ ($i = 1, 2, \cdots, N_I$, where $N_I$ is the total number) and a series of weight vectors $w_{i,j}$, $i = 1, 2, \cdots, N_I$, $j = 1, 2, \cdots, N_O$, i.e.,

$$O_j = f\left(\sum_{i=1}^{N_I} I_i * w_{i,j}\right),$$

where $*$ denotes convolution and $f(\cdot)$ is a nonlinear activation function, for example, $f(x) = 1/(1 + \exp(-x))$. The term feature map is borrowed from image processing applications, in which the input and output of each layer of the CNN are 2-dimensional arrays. However, the input of the CNN here is a 1-dimensional vector, and as a result, all of the feature maps are 1-dimensional vectors in our AF detection algorithm. The weight vectors of the convolutional layer, $w_{i,j}$, $i = 1, 2, \cdots, N_I$, $j = 1, 2, \cdots, N_O$, can be seen as trainable feature extraction operators, each of which enhances one kind of feature and weakens the others. When the CNN is trained with sufficient training data, the feature maps obtained by using these weight vectors become an appropriate representation for recognition of the input data.
Each convolutional layer is followed by a pooling layer, as shown in Figure 2. The pooling layer is also composed of feature maps. Each feature map $P_j$ ($j = 1, 2, \cdots, N_O$) in the pooling layer is obtained by applying a pooling operation to the units of the corresponding convolution layer feature map $O_j$ ($j = 1, 2, \cdots, N_O$). There are usually two kinds of pooling operations: maximization pooling and averaging pooling. Here, averaging pooling was adopted, which is defined as

$$p_{j,m} = r \sum_{n=1}^{G} o_{j,(m-1)s+n},$$

where $o_{j,n}$ is the $n$-th unit of $O_j$; $p_{j,m}$ is the $m$-th unit of $P_j$; $G$ is the pooling size, which determines a pooling window; $s$ is the shift size, which determines the overlap of adjacent pooling windows; and $r$ is the scaling factor, which is selected as one. By the pooling operation, the resolutions of the feature maps are reduced so that the features learned by the CNN are robust to small variations in location.
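The two layer types can be sketched as a forward pass in NumPy. This is a minimal illustration under stated assumptions rather than the original implementation: no bias term is used because none is mentioned in the text, the weights are random placeholders, and the pooling scale defaults to the reciprocal of the pooling size so that the operation is an average (pass `scale=1.0` for a plain sum).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv_layer(inputs, weights):
    """1-D convolution layer.

    inputs:  array of shape (n_in, length), one row per input feature map.
    weights: array of shape (n_in, n_out, kernel), one kernel per map pair.
    Returns an array of shape (n_out, length - kernel + 1).
    """
    n_in, n_out, kernel = weights.shape
    maps = np.zeros((n_out, inputs.shape[1] - kernel + 1))
    for j in range(n_out):
        for i in range(n_in):
            # Sum the convolutions of every input map with its own kernel.
            maps[j] += np.convolve(inputs[i], weights[i, j], mode="valid")
    return sigmoid(maps)

def pool_layer(maps, size=2, shift=2, scale=None):
    """Scaled sum over windows of `size` units taken every `shift` units."""
    scale = 1.0 / size if scale is None else scale   # assumed default: averaging
    n_out = (maps.shape[1] - size) // shift + 1
    pooled = np.zeros((maps.shape[0], n_out))
    for m in range(n_out):
        pooled[:, m] = scale * maps[:, m * shift:m * shift + size].sum(axis=1)
    return pooled

# Placeholder input and weights: one 92-point input map, six kernels of length 13.
rng = np.random.default_rng(0)
eigenvalues = rng.standard_normal((1, 92))
w1 = 0.1 * rng.standard_normal((1, 6, 13))
c1 = conv_layer(eigenvalues, w1)   # shape (6, 80)
s2 = pool_layer(c1)                # shape (6, 40)
```

Note that `np.convolve` flips the kernel, which matches the mathematical definition of convolution; many CNN libraries use cross-correlation instead, which differs only in the orientation of the learned kernels.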
The CNN used for AF detection has six layers (as illustrated in Figure 3). It consists of one input layer; two convolutional layers, which are denoted as C1 and C3, respectively; two pooling layers, which are denoted as S2 and S4, respectively; and one output layer. The 92 eigenvalues of the reconstructed recurrence matrix form the ECG sample of a heartbeat and are fed into the input layer. The output layer has only two units $y_k$, where $k = 1, 2$; $y_1$ corresponds to the normal heartbeat class, and $y_2$ corresponds to the AF heartbeat class. Suppose that the units in the final pooling layer are denoted by $z_n$, where $n = 1, 2, \cdots, N_z$ ($N_z$ is the total number), and the weight between $z_n$ and $y_k$ is $u_{n,k}$, where $n = 1, 2, \cdots, N_z$, $k = 1, 2$. Then, the final outputs are calculated from the weighted sums of the final pooling units. The CNN can be trained with the back-propagation (BP) algorithm by minimizing a loss function over the training set, in which $W$ denotes all of the weights of the CNN, $x_i$ is an input sample, and $v_i$ is the binary encoded target label vector for $x_i$. Details of the training algorithm are available in [28].
The number of feature maps in each convolutional layer and the pooling parameters are chosen experimentally in Section 4.

AF Detection.
The input ECG data is preprocessed and segmented into 128-point samples, where each sample corresponds to the atrial activity (AA) signal of one heartbeat. Then, the recurrence matrix is calculated. The eigenvalues of the recurrence matrix, which form a 92-dimensional feature vector, are sent to a CNN. The details of the CNN are introduced below.
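Before turning to the individual layers, the feature extraction step just described can be sketched as follows. The text does not state how the recurrence matrix is built from the phase-space vectors, so this sketch makes explicit assumptions: Euclidean distances between vectors, a binary matrix obtained by thresholding at a distance quantile (`eps_quantile` is a hypothetical parameter), and the toy input is random data.

```python
import numpy as np

def recurrence_eigenvalues(segment, m=10, tau=4, eps_quantile=0.1):
    """Return an assumed binary recurrence matrix and its eigenvalues."""
    x = np.asarray(segment, dtype=float)
    n = x.size - (m - 1) * tau
    # Time-delay embedding: row i is [x[i], x[i + tau], ..., x[i + (m - 1) * tau]].
    vectors = x[np.arange(n)[:, None] + tau * np.arange(m)[None, :]]
    # Pairwise Euclidean distances between phase-space vectors.
    dist = np.linalg.norm(vectors[:, None, :] - vectors[None, :, :], axis=2)
    eps = np.quantile(dist, eps_quantile)        # assumed threshold rule
    recurrence = (dist <= eps).astype(float)     # symmetric 0/1 matrix
    # The matrix is symmetric, so eigvalsh returns real eigenvalues in ascending order.
    return recurrence, np.linalg.eigvalsh(recurrence)

# Toy 128-point AA sample (assumed data).
rng = np.random.default_rng(0)
sample = rng.standard_normal(128)
R, features = recurrence_eigenvalues(sample)     # R is 92 x 92, features has 92 entries
```

Using the eigenvalue spectrum gives a fixed-length, order-independent summary of the 92 x 92 matrix, which is what allows a 1-dimensional CNN to be applied.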
C1 layer: The C1 layer is a convolution layer. It consists of six feature maps, each of which is a 1 × 80 vector. Each unit of a feature map in this layer obtains its input from a local area. The size of the convolution kernel determines the size of the receptive field of the neurons; therefore, it is important to set an appropriate convolution kernel size. Here, the convolution kernel length is set to 13, and the size of the output feature map is 80 (92 - 13 + 1 = 80). The inner information of the input data is extracted through different convolution kernels.
S2 layer: The S2 layer is a pooling layer. The features obtained from C1 are subsampled according to the principle of local image characteristics. The sampling is achieved by applying a pooling function to several units in a region whose size is determined by the pooling size parameter. Based on the experiments, the pooling size is set to 2. Therefore, the size of the feature map obtained in this layer is 40 (80/2 = 40). This further feature extraction makes the features invariant to small variations in location. The resolution of the feature map is reduced, but most of the information is retained.
C3 layer: The C3 layer is similar to C1. The size of the obtained feature map is 28 (40 - 13 + 1 = 28). As mentioned above, the pooling layer increases the receptive field of the neurons. Therefore, the deeper layers acquire a better feature structure.
S4 layer: This layer is the same as the S2 layer. The size of the feature maps is 14 (28/2 = 14).
Output layer: The output layer is fully connected to the S4 layer. The number of S4 neurons is 12 × 14 = 168. Each neuron is connected to the output. There are 168 × 2 = 336 connections because the output layer consists of two neurons. The output moves closer to the desired output as the weights of the network are updated over repeated training iterations with the BP algorithm.
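The layer sizes quoted above follow from simple arithmetic, which the short sketch below reproduces. The helper names and the tuple layout are illustrative only; the kernel length, pooling size, and numbers of feature maps are the values given in the text.

```python
def conv_out(length, kernel):
    """Output length of a valid 1-D convolution."""
    return length - kernel + 1

def pool_out(length, size):
    """Output length of non-overlapping pooling."""
    return length // size

# (layer name, type, kernel length or pooling size, number of feature maps)
layers = [
    ("C1", "conv", 13, 6),
    ("S2", "pool", 2, 6),
    ("C3", "conv", 13, 12),
    ("S4", "pool", 2, 12),
]

length = 92                      # number of eigenvalues fed to the input layer
sizes = {}
for name, kind, param, n_maps in layers:
    length = conv_out(length, param) if kind == "conv" else pool_out(length, param)
    sizes[name] = (n_maps, length)

flat_units = sizes["S4"][0] * sizes["S4"][1]   # units feeding the output layer
connections = flat_units * 2                   # two output neurons
```

Running the same bookkeeping with other kernel lengths is a quick way to check which configurations are valid before training, since the feature map length must stay positive after every layer.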

Majority Voting.
Although the beat-wise AF detection algorithm is important in exploring the underlying features of AF, its classification accuracy is relatively low. To improve algorithm performance, the majority voting methodology was adopted. Before AF detection, the ECG data is segmented into beat-wise data samples. Each group of $K$ adjacent samples is used as a collective candidate for AF detection. The samples of one candidate are classified using the above method, and then the classification results are integrated by majority voting to determine whether the candidate is AF data. The parameter $K$ will be determined experimentally in the next section.
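The voting step can be sketched as follows. The text does not say whether adjacent groups overlap, so this sketch assumes non-overlapping groups and drops an incomplete trailing group; the function name, the label strings, and the group size are illustrative.

```python
from collections import Counter

def majority_vote(beat_labels, k):
    """Fuse beat-wise labels into one decision per group of k adjacent beats."""
    decisions = []
    for start in range(0, len(beat_labels) - k + 1, k):
        window = beat_labels[start:start + k]
        # most_common(1) returns the label with the highest count in the window.
        decisions.append(Counter(window).most_common(1)[0][0])
    return decisions

beat_labels = ["AF", "N", "AF", "N", "N", "N"]   # hypothetical beat-wise outputs
decisions = majority_vote(beat_labels, k=3)       # ["AF", "N"]
```

An odd group size avoids ties between the two classes; with an even size, a tie-breaking rule would have to be chosen explicitly.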

Experiments and Discussion
All programs and graphs were created in Matlab (R2015b, version 8.6.0.267246, MathWorks). The 23 recordings in the database were divided into two groups. The first group contains 15 recordings, and the second group contains 8 recordings. The recordings of the two groups were obtained from different subjects. From the first group, 120,000 normal sinus rhythm (NSR) heartbeat AA data samples and 120,000 AF samples were obtained with the preprocessing method detailed in Section 3.1. All of the 240,000 samples were used to construct the training set. From the second group, 40,000 NSR heartbeat AA data samples and 40,000 AF samples were obtained with the same preprocessing method. These 80,000 samples were used to construct the testing set. The goal of such an arrangement is to test whether the AF detection algorithms can be adapted to different individuals.
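The subject-disjoint arrangement can be sketched as follows. The recording identifiers, the sample tuple layout, and the toy features are hypothetical; only the 15/8 split of the 23 recordings comes from the text.

```python
def split_by_recording(samples, train_recordings):
    """Split (recording_id, features, label) samples so that no recording is shared."""
    train = [s for s in samples if s[0] in train_recordings]
    test = [s for s in samples if s[0] not in train_recordings]
    return train, test

recordings = [f"rec{i:02d}" for i in range(23)]      # hypothetical recording IDs
train_recordings = set(recordings[:15])              # first group: 15 recordings
# Four toy beats per recording, alternating labels 0 (NSR) and 1 (AF).
samples = [(rec, [0.0], beat % 2) for rec in recordings for beat in range(4)]
train, test = split_by_recording(samples, train_recordings)
```

Splitting at the recording level, rather than shuffling individual beats, is what prevents beats from the same subject from appearing in both sets.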

Choice of Time Delay and Embedding Dimension.
There are two parameters of the reconstructed phase space that need to be determined: the delay time $\tau$ and the embedding dimension $m$. Figure 4 plots the MI versus the delay time $\tau$. The delay time corresponding to the first local minimum of MI ($\tau = 4$) is selected for the phase space. Figure 5 plots the function $E1(m)$. It can be seen that when the dimension exceeds 9, the $E1$ value is close to 1 and does not change significantly as the embedding dimension increases. Based on Cao's method, $m$ is set to 10.

The Effects of Varying CNN Parameters.
In order to select the best parameters for the CNN, the performance of the CNN is evaluated using different parameters.
(1) Effects of Different Convolution Kernel Lengths. There are four parameters that need to be determined: the pooling size, the length of the convolution kernels, the number of feature maps in the C1 layer, and the number of feature maps in the C3 layer. In the present algorithm, the input vector of the CNN is not very long; thus, a large pooling size may result in information loss. Therefore, the pooling size is fixed at 2.
Initially, the numbers of feature maps in the C1 layer and the C3 layer are set to 6 and 12, respectively, according to ref. [18], and the effects of different convolution kernel lengths are observed. Table 1 shows the classification rates of the CNN under different convolution kernel lengths. A length of 13 produced the maximal classification rate.
(2) Effects of Various Numbers of Feature Maps. Table 2 shows the accuracy for different numbers of feature maps in the C1 layer ($N_1$) and the C3 layer ($N_3$). The results reveal that the CNN performs best when $N_1 = 6$ and $N_3 = 12$.

Experimental Results of Beat-Wise AF Detection.
To illustrate the effectiveness of the CNN, it is compared with other popular classification methods. Three measurements are used to evaluate the methods: accuracy (AC), sensitivity (SE), and specificity (SP). The inputs of all three classifiers are the low-level features obtained by the method detailed in Section 3.2. Table 3 demonstrates that the CNN greatly outperforms the others.
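The three measurements are not defined explicitly in the text; the sketch below assumes their standard definitions, with AF treated as the positive class. The function name and the toy label lists are illustrative.

```python
def beat_metrics(y_true, y_pred, positive="AF"):
    """Accuracy, sensitivity, and specificity with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "AC": (tp + tn) / len(pairs),   # fraction of all beats classified correctly
        "SE": tp / (tp + fn),           # fraction of AF beats detected
        "SP": tn / (tn + fp),           # fraction of normal beats recognized
    }

y_true = ["AF", "AF", "AF", "N", "N"]   # hypothetical reference labels
y_pred = ["AF", "AF", "N", "N", "AF"]   # hypothetical classifier outputs
metrics = beat_metrics(y_true, y_pred)
```

Reporting SE and SP alongside AC matters here because a detector can reach a high accuracy on imbalanced data while still missing many AF beats.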
Most of the rate-independent AF detection algorithms are unable to solve the problem of individual variation. According to our investigation, only the magnitude-squared coherence (MSC) algorithm [15] and the Recurrence Complex Network (RCN) algorithm [16] can recognize the samples of different individuals based on beat-wise AA samples. Table 4 presents a comparison of the proposed beat-wise AF detection algorithm (BWAD) with these two algorithms. For the comparison experiments, all of the ECG recordings are preprocessed with the method described in Section 3.1, and the training and testing sets are constructed as previously described. In the BWAD algorithm, the low-level features are extracted first, and then the CNN is used to extract the high-level features and classify them. For the MSC algorithm, the feature vectors are calculated between each data sample and the previous sample, and the samples are classified based on a handcrafted measure that is detailed in [15]. For the RCN algorithm, the recurrence matrix is calculated in the same way as in the proposed algorithm, and the samples are classified based on two handcrafted measures that are detailed in [16].
It can be seen that the proposed BWAD algorithm outperforms the traditional algorithms. Traditional algorithms perform poorly in beat-wise rate-independent AF detection because they rely on manually defined features. In contrast, the BWAD algorithm addresses this problem by using the CNN to extract high-level features for classification.

Experimental Results of Majority Voting.
The performance of the proposed algorithm can be improved by majority voting, in which the outputs of $K$ adjacent heartbeat samples are integrated to obtain a more accurate result. As for the testing process, removing the QRS wave and reducing the noise were found to be the most time-consuming steps. After several experiments, the time spent in each process for one sample was obtained, and the whole testing process was found to take about 0.1186 seconds per sample. Therefore, this method can be used in real-time signal processing.

Conclusion
In this paper, a novel rate-independent AF detection algorithm that combines RCN and CNN based on AA features is presented. First, the recurrence matrix is calculated with the RCN, and the eigenvalues of the matrix are extracted to characterize atrial activity. Then, the CNN is employed, which leverages its multilayer structure to build an increasingly abstract representation of the input. The network is optimized so as to extract high-level features and classify the input sample. Finally, majority voting is utilized to improve algorithm performance.
In the experiments, the training set and testing set are constructed with a special arrangement so that the data samples of the two sets are obtained from different subjects. The proposed algorithm achieves an accuracy of 94.59%, which is comparable to popular RRI-based methods. Moreover, the proposed rate-independent algorithm is applicable to patients who take rate-control drugs or have pacemakers. Furthermore, the developed method addresses the problem of individual variation. Therefore, the proposed method can detect AF with superior performance.
Figure 1(c): Interpolation of the AA segment.

Convolutional Neural Network
Architecture of the CNN.
The convolutional neural network (CNN) addresses the feature learning problem by calculating multiple levels of data representations through the operations of its multiple layers. Except for the first layer and the top layer, the main part of the CNN is composed of alternating layers of convolution and pooling.

Figure 2: Illustration of a convolution layer and the subsequent pooling layer.

Figure 3: Structure diagram of the CNN.

Figure 4: Selection of the delay time.

Figure 5: Selection of the embedding dimension.

Table 1: Classification rates of CNN under different lengths of the convolution kernel.

Table 2: Classification rates of CNN under different numbers of feature maps.

Table 3: Comparison of CNN with typical classification methods.

Table 4: Comparison with traditional rate-independent AF detection algorithms.

Table 5: Results of majority voting under different parameters.

Table 6: Comparison of the time spent in each process.

Table 5 lists the classification rates of the voting algorithm under different parameters ($K$). It can be seen that a larger $K$ value usually leads to better performance.

The Calculation of the Complexity.
The computer used to run the program has an Intel Pentium Dual-Core processor with a speed of 2.2 GHz and a memory size of 3.18 GB. For the proposed algorithm, training the CNN is a time-consuming process. However, the training process can be carried out off-line. The training process (i.e., the whole data preprocessing process and the CNN training process (10 times)) requires approximately 9.65 hours for the 24,000 samples. Table 6 lists the results.