From ECG signals to images: a transformation based approach for deep learning

Provocative heart disease is related to ventricular arrhythmias (VA). Ventricular tachyarrhythmia is an irregular and fast heart rhythm that emerges from inappropriate electrical impulses in the ventricles of the heart. Different types of arrhythmias are associated with different patterns, which can be identified. The electrocardiogram (ECG) is the major analytical tool used to record and interpret the heart's electrical activity. ECG signals are nonlinear and difficult to interpret and analyze. We propose a new deep learning approach for the detection of VA. Initially, the ECG signals are transformed into images, which has not been done before. Later, these images are normalized and utilized to train the AlexNet, VGG-16 and Inception-v3 deep learning models. Transfer learning is performed to train a model and extract the deep features from different output layers. After that, the features are fused by a concatenation approach, and the best features are selected using a heuristic entropy calculation approach. Finally, supervised learning classifiers are utilized for final feature classification. The results are evaluated on the MIT-BIH dataset, achieving an accuracy of 97.6% (using Cubic Support Vector Machine as a final stage classifier).


INTRODUCTION
According to a 2015 United Nations report, the world's population is aging. The number of people aged 60 years or more will rise by 56% by 2030 and double by 2050 (Escobar, 2011). Cardiovascular ailments are one of the main causes of death throughout the world. The human cardiovascular system weakens with age and becomes more likely to suffer from arrhythmias. A ventricular arrhythmia is an irregular heartbeat originating in the ventricles. If not treated in time, it can be life-threatening. Ventricular fibrillation (Vfib), atrial fibrillation (Afib) and atrial flutter (Afl) are recurrent dangerous arrhythmias that can affect the aging population (Van Huls Van Taxis, 2019). Ventricular arrhythmias (VA) reduce ventricular function. The occurrence of VA during long-term follow-up in patients with suspected myocarditis may require the implantation of a cardioverter defibrillator (Sharma et al., 2019b).
This paper introduces a new approach to predict ventricular tachyarrhythmia (VTA) and classify various arrhythmias. In this technique, we transform ECG signals into binary images. Our approach differs from other approaches known from the literature, in which ECG signals are commonly transformed into series data. Deep learning models such as the convolutional neural network (CNN) do not work properly on such ECG series data because minor signal values in the QRS complex are ignored, which prevents accurate recognition of arrhythmias. Converting the serial signal data into images and then proceeding to VTA detection is a big challenge.
The novelty and contribution of the article are as follows:
A novel approach to convert ECG signals into 32 × 32 binary images.
A fusion of features from several deep CNNs for VTA recognition.
Entropy-based feature selection is employed to obtain the best feature subset.
The selected features are finally used to train different classifiers, and higher accuracy is attained compared with the existing methods.
Here are the key advantages that are achieved using our proposed methods:
No need for complex pre-processing of ECG signals.
No need for QRS complex detection.
Higher accuracy than previous CNN-based arrhythmia detection techniques.
Less time consumption for arrhythmia detection.

RELATED WORK
Recently, several review articles have explored the importance of VTA detection using deep CNN models. Martis et al. (2014) presented a three-class learning approach to automatically identify Afl, ventricular flutter (Vfl), normal sinus rhythm (Nsr) and VTA ECG signals. They applied a higher-order spectra method to 641, 855 and 877 beats (Afib, Nsr and Afl) of ECG signals. These ECG beats were subjected to independent component analysis for the selection of significant features. The method produced an accuracy of 97.65%, a specificity of 98.75%, and a sensitivity of 98.15% using the k-Nearest Neighbor (KNN) classifier. Acharya et al. (2016) proposed a computer-aided diagnosis system to automatically detect and classify similar ECG signals into four classes. (2017) presented a convolutional neural network strategy that combined convolutional layers for feature extraction with long short-term memory layers for feature aggregation to recognize the different ECG segments. In their work, they utilized 5- and 2-s windows of ECG signals without QRS detection, achieving an F-score of 82.1%. Acharya et al. (2016) utilized ECG signal beats and presented a framework for the automated analysis of certain arrhythmias. They achieved an accuracy, sensitivity, and specificity of 92.50%, 98.09% and 93.13%, respectively, for the 2-s windows of ECG signals. The findings of the related literature analysis suggest that it is better to transform the signal data into images and then combine signal processing with image processing techniques using deep learning. As a result, the CNN works better and achieves higher accuracy with different classifiers.

Data used
The ECG signals were obtained from publicly available arrhythmia databases: MIT-BIH, CUDB (Creighton University VT Database) and Nsr. The signals acquired from the MIT-BIH dataset were identified and extracted according to the annotation file prepared by cardiologists (Moody & Mark, 2001). In this work, two-lead ECG signals are used. The details of the datasets are given in Table 1. The MIT arrhythmia dataset consists of signals and their annotation files. The signal data contains one series per patient, with a complete set of ECG patterns covering 24 h for each of 36 patients. Each patient's data has approximately 127,232 series points.

Transformation of signals into images
The proposed VTA detection technique consists of two phases. In the first phase, the signal data is transformed into binary images. It is a challenge to convert the serial signal data into images and then proceed to the detection of VTA. The following are the reasons for adopting the computer vision approach.
To automate the algorithm of VTA detection using deep CNN.
To eliminate the need for ECG signal pre-processing.
The 1D CNN does not work as well as the 2D CNN on signal data (Wu et al., 2018); therefore, the signal data needs to be transformed into images. The main problem occurs while finding the QRS complex in ECG data, whereas a CNN does not need the QRS complex to be found.
To increase the accuracy and specificity of the approach.
In the second phase, deep features are obtained from the images. Finally, these extracted features are fused, and entropy-based selection is applied. The selected features are then fed to Support Vector Machine (SVM) and KNN classifiers to obtain the classification results.
The following are the phases of transforming signal points into binary images.

Data normalization
Data normalization is an essential step in VTA detection. Before the transformation, the data must be normalized. Normalization consists of two phases: first, the signal data points are split into equal parts that divide the total number of signal points without any data loss. In the second phase, these parts are reshaped into 32 × 32 binary images. Each patient has 24 h of recorded ECG data, comprising 127,356 data values. After carefully examining the final values of the signal, which repeat from the S to the T peak, we discard the last 380 data values of every patient's signal data, which play no role in arrhythmia detection, thus obtaining 126,976 values. To perform the transformation, we first split each data series into 124 segments, so that every patient's data contains 124 × 1,024 sequences. Second, we reshape each 1,024-sized segment into an image of 32 × 32 size. The result is 124 images for each patient. Mathematically, we can describe the transformation as an inverse of the vectorization operation, which converts a matrix into a column vector. Specifically, the vectorization of an $m \times n$ matrix $A$, denoted $\mathrm{vec}_{m,n}(A)$, is the $mn \times 1$ column vector obtained by stacking the columns of $A$ as follows: $\mathrm{vec}_{m,n}(A) = [a_{1,1}, \ldots, a_{m,1}, a_{1,2}, \ldots, a_{m,2}, \ldots, a_{1,n}, \ldots, a_{m,n}]^T$.

Table 1 Publicly available databases. Columns: Database, Taken from. Surviving entry: Creighton University Ventricular Tachyarrhythmia (cudb).

We then define the proposed transformation formally as this inverse vectorization, which maps $S$ to $S'$, where $S$ is one patient's signal data and $S'$ is the patient's data converted to a square matrix, which can be represented as a 2D image. Figure 2 illustrates the splitting and describes in detail each step of transforming the ECG signal into a binary image. After splitting and reshaping, we save the new dataset. The new dataset contains normal and abnormal images of the serial signal data for multiple patients. Sample images of the new dataset and the details of the transformation are shown in Fig. 3.
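The trimming, splitting and reshaping steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `inverse_vec` and `signal_to_images` are hypothetical, the input is a synthetic stand-in for one patient's record, and the sketch assumes the column-stacking convention of the vectorization operator defined in the text. How the values are turned into binary pixels is not specified in the text, so the sketch leaves them unchanged.

```python
TRIM = 380       # trailing values discarded per patient (from the text)
SEG_LEN = 1024   # values per segment
SIDE = 32        # image side length; SIDE * SIDE == SEG_LEN


def inverse_vec(segment, rows=SIDE, cols=SIDE):
    """Inverse of column-stacking vec: element k lands at row k % rows, column k // rows."""
    if len(segment) != rows * cols:
        raise ValueError("segment length must equal rows * cols")
    return [[segment[c * rows + r] for c in range(cols)] for r in range(rows)]


def signal_to_images(signal, trim=TRIM):
    """Drop the trailing values, split into 1,024-value segments, reshape each to 32 x 32."""
    usable = signal[:len(signal) - trim]
    if len(usable) % SEG_LEN != 0:
        raise ValueError("trimmed signal length is not a multiple of the segment length")
    return [inverse_vec(usable[i:i + SEG_LEN]) for i in range(0, len(usable), SEG_LEN)]


# Synthetic stand-in for one patient's record (assumption: real data would be
# read from the database files, which this sketch does not attempt).
signal = [float(i % 97) for i in range(127_356)]
images = signal_to_images(signal)
print(len(images), len(images[0]), len(images[0][0]))  # 124 32 32
```

With 127,356 input values, the sketch yields 126,976 usable values and 124 images, matching the counts given in the text.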

Pre-trained CNN features
Deep CNN models have been successfully used for solving numerous tasks in computer vision. A CNN takes an input image and passes it through different layers, for instance convolutional, nonlinear, pooling, and fully connected layers, to produce an output. In computer vision, transfer learning (TL) is typically implemented using pre-trained models. Because training such models from scratch has a high computational cost, pre-trained models can be used instead.
For deep feature extraction, we adopt three pre-trained deep CNN models (VGG19, AlexNet and Inception-v3). These models were selected because of their high robustness and proven efficiency in biomedical data analysis applications. The purpose of adopting these three models is to process images of different sizes and obtain deep features. To complete this process, we first resize the 32 × 32 image to 224 × 224 × 3 for AlexNet and VGG19 and to 299 × 299 × 3 for Inception-v3. In addition, we convert the binary image $I_{binary}$ into a three-channel colour space so that it can serve as input to the pre-trained models.
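The resizing and channel conversion can be illustrated with a small, dependency-free sketch. Both design choices here are assumptions, since the text does not state them: nearest-neighbour interpolation is used for resizing, and the three channels are produced by copying the single binary channel. The function names are hypothetical.

```python
def resize_nearest(image, size):
    """Nearest-neighbour resize of a 2D list to size x size (interpolation choice is an assumption)."""
    rows, cols = len(image), len(image[0])
    row_idx = [r * rows // size for r in range(size)]
    col_idx = [c * cols // size for c in range(size)]
    return [[image[r][c] for c in col_idx] for r in row_idx]


def to_three_channels(image):
    """Copy each pixel into three identical channels, giving an H x W x 3 image (assumption)."""
    return [[[pixel, pixel, pixel] for pixel in row] for row in image]


def prepare_input(image, size):
    """Resize a 32 x 32 image and expand it to three channels."""
    return to_three_channels(resize_nearest(image, size))


# Toy 32 x 32 binary checkerboard standing in for one ECG image.
binary = [[(r + c) % 2 for c in range(32)] for r in range(32)]
x224 = prepare_input(binary, 224)  # input size stated for AlexNet and VGG19
x299 = prepare_input(binary, 299)  # input size stated for Inception-v3
print(len(x224), len(x224[0]), len(x224[0][0]))  # 224 224 3
```

Nearest-neighbour resizing keeps every pixel at 0 or 1, which preserves the binary character of the image; a smoother interpolation would introduce intermediate values.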
The proposed feature extraction using transfer learning (TL) is illustrated in Fig. 4. TL is described as the ability of a machine to apply knowledge and skills learned while solving one set of problems (source) to a different set of problems (target). The purpose of TL is to improve performance on a new dataset based on an existing model and to acquire useful features and classification. In its mathematical description, $I$ represents an image, and $S$ and $T$ represent the labels of the training data of the source and target domains.
The training on images is done using the pre-trained DCNN models, yielding 1 × 4,096 features from AlexNet using the FC7 layer, 1 × 4,096 features from VGG19 using the FC7 layer, and 1 × 2,048 features from Inception-v3 using the avg-pool layer, denoted $f_1$, $f_2$ and $f_3$, respectively.
Feature fusion is performed by concatenating the features from the three neural networks. We adopted an approach similar to the one proposed in Ma, Mu & Sha (2019). The concatenation places the three vectors side by side, $f = [f_1, f_2, f_3]$, which enriches feature diversity so that the classifier performs better. The fused vector contains 4,096 + 4,096 + 2,048 = 10,240 features. Here, FC represents a fully connected layer, which follows the structure of a fully connected feed-forward network; it computes its output from $x_i$, the input vector of the i-th class, using a weight $w$ and a bias $b$ of constant value. After fusion, the features are passed on for classification. Reducing the number of features decreases the execution time and increases performance. Here, an entropy-based feature reduction method is used, which reduces the number of features based on their entropy values. We compute the entropy of the fused features as $E = -\sum_{i=1}^{N} p_i \log p_i$, where $p_i$ is the probability of the extracted feature space, defined by $p_i = \Pr(X = i)$, and $N$ denotes the size of all feature spaces. This gives a new reduced 1 × 5,120 feature vector, which is 50% of the total features, and these features are fed to the classifier.
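The fusion and selection steps can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the entropy of each feature is estimated from a 10-bin histogram over the images (the text does not say how the probabilities are obtained), features are ranked in ascending order of entropy and the first half is kept (one reading of the description), and random numbers stand in for the deep features. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def fuse(f1, f2, f3):
    """Serial fusion: concatenate the three per-image feature vectors."""
    return list(f1) + list(f2) + list(f3)


def column_entropy(values, bins=10):
    """Shannon entropy (bits) of one feature column, estimated from a histogram (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant feature has zero entropy
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def select_by_entropy(fused_rows, keep_fraction=0.5):
    """Rank feature columns by ascending entropy and keep the first fraction of them."""
    n_features = len(fused_rows[0])
    scores = [column_entropy([row[j] for row in fused_rows]) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j])
    kept = sorted(order[:int(n_features * keep_fraction)])
    return kept, [[row[j] for j in kept] for row in fused_rows]


# Random stand-ins for the deep features of 20 images (not real CNN activations).
rng = random.Random(0)
rows = [fuse([rng.random() for _ in range(4096)],
             [rng.random() for _ in range(4096)],
             [rng.random() for _ in range(2048)]) for _ in range(20)]
kept, reduced = select_by_entropy(rows)
print(len(rows[0]), len(reduced[0]))  # 10240 5120
```

Changing `keep_fraction` reproduces the other subset sizes mentioned in the experiments, for example 0.25 for a quarter of the fused features.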
The overall model of the proposed fast VTA detection is depicted in Fig. 5.

RESULTS AND ANALYSIS
For performance measurement, accuracy, sensitivity, specificity and false negative rate (FNR) are determined, where a true positive represents a correctly recognized VTA, a false positive an incorrectly recognized VTA and a false negative an inaccurately rejected VTA. The results of the proposed method are computed using five different experiments. In the first experiment, DCNN features are extracted using the AlexNet model by computing activations at the fully connected layer FC7. A 50:50 data division is adopted for training and testing to validate the proposed technique. In addition, 10-fold cross-validation is applied to all experimental results. In this experiment, the best testing classification results for AlexNet are an accuracy of 91.2%, an FNR of 8.0%, a sensitivity of 91.9% and a specificity of 90.5%, with Cubic SVM as the final-stage classifier. The Cubic SVM results are cross-checked against nine other classifiers, as shown in Table 2.
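For reference, the four reported measures can be computed from confusion-matrix counts as in the sketch below. It uses the standard textbook definitions, which are assumed to match those of the paper; the true-negative count (a correctly rejected non-VTA sample) is needed by these definitions although the text does not name it, and the counts in the example are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "fnr": fn / (tp + fn),           # false negative rate = 1 - sensitivity
    }


# Hypothetical counts for illustration only; they are not taken from the paper.
metrics = classification_metrics(tp=92, fp=10, tn=90, fn=8)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under these definitions sensitivity and FNR sum to 100%, which agrees with the reported pairs such as 91.9% and 8.0% up to rounding.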
In the second experiment, the deep CNN features are extracted using the VGG19 model by computing activations at the fully connected layer FC7. As with AlexNet, a 50:50 data division is adopted for training and testing to validate the proposed technique. Moreover, 10-fold cross-validation is applied to all experimental results. In this experiment, the best testing classification accuracy for VGG19 was 92.1%, with an FNR of 7%, a sensitivity of 93.0% and a specificity of 92.0%, using Quadratic SVM as the classifier. The Quadratic SVM results are cross-checked against eight other classifiers, as depicted in Table 3.
In the third experiment, DCNN features are extracted using the InceptionV3 model by computing activations at the Avg-Pool layer. The same 50:50 data division is adopted for training and testing to validate the proposed technique, and 10-fold cross-validation is applied to all experimental results. In this experiment, the best testing classification accuracy for InceptionV3 was 91.5%, with an FNR of 7.7%, a sensitivity of 92.2% and a specificity of 90.9%, using Quadratic SVM. The Quadratic SVM results are cross-checked against seven other classifiers in Table 4.
In the next experiment, the features obtained from AlexNet, VGG19 and InceptionV3 are fused. These feature vectors are combined into a single feature vector representing all three pre-trained models. 10-fold cross-validation is applied to all experimental results. In this experiment, the best testing classification accuracy for the fused feature vector was 96.6%, with an FNR of 3.0%, a sensitivity of 97.12% and a specificity of 95.99%, using Cubic SVM. The Cubic SVM results are cross-checked against six other classifiers in Table 5.
In another experiment, we performed entropy-based feature selection. We calculated the entropy of the fused features, which yields an entropy feature vector, and sorted this vector in ascending order. We then selected features based on entropy, starting with the first 1,000 features and training on them, then taking the first 2,000, and so on up to 8,000 features. We obtain the highest accuracy when selecting 5,120 features, which is 50% of the fused features. When we take 25% of the features instead, the accuracy decreases. The Cubic SVM results are cross-checked against seven other classifiers in Table 6. We present the confusion matrix of the best-accuracy model in Fig. 6. In addition, we experimented with Cubic SVM by selecting the features in ascending order under 10-fold cross-validation, as shown in Table 7.

Comparison with other works
Both traditional approaches and recent techniques such as CNNs and deep learning exist for the diagnosis of VTA. However, it is well known from the literature that these computer-aided diagnostic (CAD) systems have a consistent workflow. For example, one system achieved accuracies of 94.07% and 91.5% on the MIT-BIH dataset (Acharya et al., 2017); moreover, such systems are also time-consuming. A more recent technique, which used a deep CNN to automate the algorithm, gained a higher accuracy of 97.6% on a similar dataset (Ullah et al., 2020). Ullah et al. (2020) […] Algorithm (GA), for ECG beat classification, and achieved an accuracy of 97.7% on the MIT-BIH dataset. Yang et al. (2021) proposed an ensemble multiclass classifier that combined a mixed-kernel-based extreme learning machine (MKELM) as the base learner and a random forest as the meta-learner, achieving an overall accuracy of 98.1% in classifying four types of heartbeats. The findings of the literature support our finding that transforming the signal data into images makes the CNN work better and yields the highest accuracy using different classifiers. We summarize the related works in Table 8.

DISCUSSION AND CONCLUSIONS
Many traditional approaches and recent techniques such as CNNs and deep learning are used for the diagnosis of VTA. The main problem arises when cofactors such as QRS complex detection and segmentation affect the result. If the data is not appropriately segmented, the accuracy of VTA prediction suffers. The significant problems that occur after applying pattern recognition techniques are: i) the amount of data to be processed is enormous, and it is difficult to manage and process such a large number of values; ii) traditional techniques and methods are limited, as they are restricted to searching for a single feature of the signals; iii) cardiac cycle dynamics reflect underlying physiological changes; iv) big data is required for training, which takes more time.
From the analysis of the related literature, we conclude that recent studies have used CNNs for the prediction of arrhythmias, but the highest accuracy they achieved is 91.2%. CNN models do not work well on signal data, as mentioned in the problem statement. To address this problem, we need to convert the one-dimensional signal data into a two-dimensional image (matrix). Normalizing the signal data and transforming it into a binary image without losing any information is a big challenge because ECG signals are non-stationary.
To overcome the above problems, we introduced a novel approach whose main contribution is to convert ECG signals into binary images and automate VTA detection using deep learning, achieving higher accuracy and lower time consumption. The proposed model is tested on MIT-BIH using the pre-trained models AlexNet, VGG19 and InceptionV3. It achieves higher accuracy than existing methods (97.6% using Cubic SVM as the final-stage classifier), and the execution time is also reduced by automating the algorithm using a CNN.

Future work
This study points to a future direction in which the aim is to develop a variant network architecture for the prediction of different arrhythmias, both ventricular and atrial. In future work, the framework will be trained and tested on big data. If this process of feature fusion and feature selection is applied to other domains after selecting the required features, it might improve performance in terms of effectiveness and efficiency. The proposed technique is not limited to ECG image classification. It can be applied to other domains, such as electroencephalography (EEG), that depend directly on efficient feature extraction, fusion and selection.

ADDITIONAL INFORMATION AND DECLARATIONS
Funding
The authors received no funding for this work.