Meta Structural Learning Algorithm with Interpretable Convolutional Neural Networks for Arrhythmia Detection of Multi-Session ECG

Detection of arrhythmia in electrocardiogram (ECG) signals recorded over several sessions for each person is a challenging problem that has not been properly investigated in the past. This detection is challenging because a classification model constructed and tested on such ECG signals must maintain its generalization when dealing with unseen samples. This article proposes a new interpretable meta structural learning algorithm for this detection task. To this end, a compound loss function is suggested for the convolutional neural network (CNN) models, combining a structural feature extraction loss and a label-space loss based on the Gumbel-Softmax distribution. Collaboration between the models creates a learning-to-learn capability by transferring knowledge among them when they are confronted with unseen samples. One deficiency of a meta-learning algorithm is the non-interpretability of its models. Therefore, to make the CNN models interpretable, they are encoded in this article as the evolutionary trees of the genetic programming (GP) algorithm. These trees learn the process of extracting deep structural features in the course of evolution in the GP algorithm. The experimental results suggest that the proposed detection model achieves an accuracy of 98% in the classification of 7 types of arrhythmia in the samples of the Chapman ECG dataset, recorded from 10646 patients in different sessions. Finally, the comparisons demonstrate the competitive performance of the proposed model with respect to other models based on large deep networks.


I. INTRODUCTION
Computer-aided detection (CADe), also known as computer-aided diagnosis (CADx), refers to systems that help physicians interpret medical images and signals. ECG recording techniques, X-ray, MRI, and ultrasound systems create a great deal of information, and a professional radiologist or physician has to analyze and assess it within a rather short period of time [1]. CAD systems process digital signals and images and highlight suspicious parts, such as possible diseases, to facilitate the physician's decision-making process. These assistants can help physicians reduce the human error caused by fatigue [2]. One of the most developed of these diagnosis systems is the system for the automatic diagnosis of cardiac diseases using ECG signals. The electrical signals generated by the various activities of the upper and lower heart muscles are measured with electrocardiograms (ECGs). However, the performance of each muscle is sometimes affected by external factors, such as excessive blood fat, which can cause arrhythmia in the electrical performance of the heart. For instance, a problem in the walls of the ventricles can be a cause of arrhythmia [3]. Moreover, supraventricular complications in the upper chambers of the heart, in a region called the atria, are another source of arrhythmia in the electrical signals of the heart. According to the statistics, heart complications account for one-third of the mortality rate in adults aged 35-90 years [4]. Hence, researchers have used computer systems to propose efficient methods for the early diagnosis and classification of arrhythmia through ECG signals. Deep learning models such as convolutional neural networks (CNNs) [5,6], AlexNet [7], VGGNet [8], and GoogleNet [9] have proven greatly efficient in recent years, and the relevant analyses have achieved high rates of accuracy in the diagnosis of arrhythmia through ECG signals [8,10,11].
From a medical standpoint, in multi-lead ECG signals every lead observes the heart from its own perspective. These signals fall into two general classes: the first includes the chest signals, which observe the heart from various angles of the axial plane; the second contains the body signals, which view the heart from various angles of the coronal plane. Each of these signals highlights a part of the overall reality of the heart [12]. Although the arrhythmia of a signal is stable during each ECG recording session, the range of this arrhythmia might undergo some changes between sessions. In general, the deep learning methods proposed for the diagnosis of arrhythmia handle ECG signals recorded in a single session. The outstanding characteristic of deep learning models developed on single-session ECG signals is their high accuracy. For instance, Zheng et al. [13] proposed a CNN-based method yielding an accuracy of 97% for the classification of seven types of arrhythmia in a dataset of 10646 patients and 35565 ECG signals, all of which were recorded in a single session. Big-ECG [14] is another well-known method; it first detected the QRS complexes of 45 patients and then classified the signals using the Random Trees model, obtaining an accuracy of 95.6%. In addition, Faust et al. [15] employed a fine-tuning method along with transfer learning in an ImageNet model and achieved an accuracy of 97% in the classification of seven types of arrhythmia in a dataset including 10646 patients and 35565 ECG signals, all recorded in a single session. Since these models were developed on single-session ECG signals, their main deficiency is their low generalizability when dealing with unseen samples.
As discussed previously, generalization is essential, for each patient's range of arrhythmia might undergo some changes in different sessions. Moreover, Da Silva et al. [16] proposed a dataset of ECG signals recorded in different sessions, called CYBHi. They indicated that the development and evaluation of an ECG signal detection model can only be accurate when the model is developed and tested on signals recorded in separate sessions. Hence, the advantage of developing an arrhythmia detection system from multi-session ECG signals is a comprehensive performance analysis of the arrhythmia diagnosis method on ECG signals from disjoint acquisition sessions. As opposed to the previous techniques, such a method is tested on different states of the subjects over several days of experimentation. A method developed in this way is therefore generalizable and avoids overfitting of the learning model when dealing with new and unseen samples; this can be regarded as an extension of permanence analysis. However, due to these generalization defects, there is a massive gap in the literature on the use of deep learning models for signals recorded in different sessions.
Meta-learning is an emerging approach for developing the ability of machine learning models to deal with new conditions [17]. Meta-learning emphasizes the preparation of guidance for machine learning models so that they can make the best decision in new conditions, including unseen samples. In other words, meta-learning seeks a way towards learning to learn; thus, it can play a key role in resolving the problem of arrhythmia classification in multi-session ECG signals [18]. In recent years, numerous methods have been proposed to develop models that demonstrate the same behavior when handling signals recorded in various sessions, and numerous algorithms have been introduced for meta-learning, including reinforcement learning, transfer learning, and active learning. The following two subjects should be examined when employing meta-learning for multi-session signals: 1. ECG signals are structural signals consisting of QRS complexes; therefore, ordinary meta-learning algorithms are not appropriate for these signals, and an approach combining meta-learning and structural learning should be used. 2. Interpretability is a crucial principle in meta-learning, since every model that seeks a higher level of generalization must be easy to interpret, rather than remaining a black box like common neural networks [19,20].
The challenge addressed in this article is the arrhythmia classification of multi-session ECG signals, a new problem in this field. Therefore, this article proposes a structural meta-learning method, realized in CNNs, to resolve the problems of arrhythmia classification in multi-session ECG signals. The proposed method rests on three main concepts: 1. meta-learning, 2. structural learning, and 3. interpretability. The solutions for each of these parts are explained in detail in this article. First, a compound loss function is designed in the CNN that is capable of implementing the learning-to-learn feature of meta-learning. In this loss function, guidance is provided for the central model to enable it to self-educate. Second, to add the interpretability feature to the CNN, this model is encoded by the evolutionary trees of genetic programming. These trees are CNN models that are learned in the course of the evolution process. Third, in order to add the structural learning feature to the proposed meta-learning algorithm, the CNN trees in GP carry out the deep feature learning process using morphological structural operators. In the experimental results section, the number of parameters and the computational complexity of the proposed model are evaluated for the classification of the samples of the Chapman ECG dataset, which includes 10646 patients [13]. The experimental results suggest that the student model achieved an average accuracy of 97% with the lead III input signal for the classification of 7 types of arrhythmia in the Chapman ECG dataset.
The main innovations of this article are as follows: 1. Providing a new method for arrhythmia classification of multi-session ECG signals for each person (i.e., training and testing the model on two separate sessions); 2. Introducing a new interpretable meta-learning algorithm via encoding the CNN model in the evolutionary trees of the GP algorithm; 3. Adding the structural learning feature to the proposed meta-learning algorithm to detect the QRS complex in ECG signals using morphological structural operators in the evolutionary trees of the GP algorithm. The rest of this article is organized as follows: Section II reviews the studies on arrhythmia classification of ECG signals; Section III describes the proposed structural meta-learning algorithm; Section IV comprises the experimental results; and Section V concludes the article.

II. RELATED LITERATURE
ECG signals include various arrhythmias, among which 11 types were mentioned in the previous section. The existing methods can be divided into different classes with respect to the number of arrhythmias they classify. Besides, it should be noted that all of the investigations introduced for diagnosing arrhythmia are assumed to be based on single-session ECG signals. Among the methods introduced so far, deep learning is one of the top approaches in the field of ECG signal classification. Employing more layers in deep learning yields a deeper model, and as the ECG signals pass through these layers, more distinguishing features can be extracted to facilitate the detection of arrhythmia patterns. The entire feature extraction process in deep learning models is carried out automatically, without the need for hand-crafted feature extraction. In recent years, a broad spectrum of deep learning methods has been introduced based on ECG datasets such as the MIT-BIH dataset and several private datasets. The methods introduced in [21], [22], and [23] are CNN-based deep learning models that classify various arrhythmias in ECG signals. The majority of these methods use 1D filters in the CNN, since the ECG signals in the MIT-BIH dataset are 1D as well. Several CNN-based methods are explained here. In [24], a CNN model with 8 convolution layers, 4 pooling layers, and one fully connected layer is used, which obtained an accuracy of 93.1% for 22 patients and 109449 beats. In addition, in [21], a CNN model with 8 convolution layers bearing 1D filters, 4 pooling layers, and one fully connected layer is proposed for the classification of 5 types of arrhythmia (non-ectopic, supraventricular ectopic, ventricular ectopic, fusion, and unknown beats). This method obtained an accuracy of 94%.
In [25], a CNN model comprising 8 convolution layers bearing 1D filters, 4 pooling layers, and two fully connected layers is used for the classification of 12 types of arrhythmia. In that article, the proposed model was examined on a dataset that is not publicly accessible, including 91232 ECG records from 23500 male and 2004 female patients; the results suggested that this method achieved an accuracy of 98.5%. In [26], a deep learning method with the U-Net architectural standard, comprising thirty-two 1D residual layers, was introduced to classify 5 types of arrhythmia.
In [22], a CNN model including 12 convolution layers was used for the classification of 44 records. In [27], CNN models consisting of 8 convolution layers bearing 1D filters, 4 pooling layers, and two fully connected layers with the end-to-end feature were suggested for the classification of 17 and 15 arrhythmias in ECG signals, respectively. Per the results, these methods achieved accuracies of 98.5% and 94%, respectively.
Besides these methods, some studies evaluated their models on 12-lead ECG signals. In [28], an LSTM model was used on 38,899 patients and 12 different arrhythmias; the results showed 90% accuracy. Moreover, another study used a CNN model on an unknown number of patients and 8 different arrhythmias, reporting an F1-score of 81%. Despite the advantages of the aforesaid models, such as accuracies higher than 90% and the end-to-end feature, the datasets used in these models, including MIT-BIH, were collected about 40 years ago and suffer from problems such as class imbalance and a small number of patients. Therefore, the results of these studies cannot be approved for practical use. Zheng et al. [13] collected a dataset called Chapman ECG, consisting of 12-lead ECG signals from more than 10,000 people. The interesting point about this dataset is that the signals were recorded from the patients over several different days and during different sessions. A primary evaluation based on the gradient boosting tree model was conducted on the classification of this dataset; this model obtained an accuracy of nearly 97% for each class of arrhythmia, separately. Besides these evaluations, in [24] a CNN model comprising three 1D convolution layers was proposed for the classification of the 12-lead signals of the Chapman ECG dataset, yielding an average AUC of 79.60%. Faust et al. [29] presented a newer and more integrated method on this dataset, in which a transfer learning-based fine-tuning method was implemented in the ImageNet deep learning model. Following the evaluations of that article, an accuracy of 92.24% was obtained for the classification of 7 types of arrhythmia and an accuracy of 96.13% for the classification of 4 types of arrhythmia. The great deficiency of this method is its use of the folding method for dividing the data between training and testing.
This approach results in data from the same session being used repetitively in both training and testing. All of the methods introduced so far share a fundamental presumption: the ECG signals of the training and testing stages were recorded for each person during the same session. As mentioned in the previous section, the only study in the related literature concerned with the challenge of classifying multi-session ECG signals was carried out by Da Silva et al. [16]. That article sought to demonstrate this challenge; it assessed its proposed CNN-based method on ECG signals from a single session for the training and testing stages and achieved an accuracy of 96%. However, the accuracy of the same model for multi-session ECG signals in the training and testing stages dropped to 88%. Thus, the classification of arrhythmia in multi-session ECG signals for individuals is a challenging subject that is still open to debate.

III. PROPOSED METHOD
The ECG dataset used in this article was collected by Chapman University and Shaoxing People's Hospital (Chapman ECG, in short) [13]. Table I shows the numerical details of the dataset. The ECG signals for each person were recorded over several days and during different sessions, which enables us to assess the proposed method of this article on multi-session ECG signals. In this dataset, 12-lead ECG signals were recorded from 10646 people with a sampling frequency of 500 Hz. Each 12-lead ECG signal is a 10-second strip. In addition, an initial pre-processing was applied to this dataset to smooth the ECG signals using the Butterworth filter and the non-local means technique. Figure 1 shows a general schematic of the meta-learning-based model proposed in this article. The input of this model is a dataset D = {(x_i, y_i)}, in which x_i signifies one of the ECG signals in the Chapman dataset and y_i indicates the arrhythmia label, which is one of SB, SR, AFIB, ST, AF, SI, SVT, AT, AVNRT, AVRT, and SAAWR. This model includes two phases for one task, i.e., meta training and meta testing. In meta training, the training dataset D_train = {(x_i, y_i)}, i = 1, ..., n, is used for training the classifiers, in which n denotes the number of training samples.

A. Meta Structural Learning Model
Meta testing aims at detecting the labels of the query samples Q = {x_q}, q = 1, ..., k, in which k indicates the number of queries. Taking into account that the ECG signals in the Chapman dataset were recorded during different sessions (i.e., the acquisition sessions of the training and query samples are disjoint), the meta-learning model must have the following two characteristics: 1. the non-linear mapping in the artificial neural networks must generalize to unseen samples from various sessions; 2. the mapping must preserve the relationship between the classes of the unseen samples in Q. Thus, this article seeks to transfer the knowledge related to ECG signals of various sessions between classification models. In light of that, per Fig. 1, a two-phase communication network is proposed for this problem. First, the model for the extraction of transferable features is meta-learned on the training samples using the CNN trees. Then, the features of the query sample are extracted by the same model and compared with those of the training samples to calculate their distance. Finally, a relationship between the training samples and the query is established using the label space.
In general, the loss related to the model shown in Fig. 1 is formalized as equation (1), which combines a distance term in the feature space with a distance term in the label space.
In this equation, the first term is the distance between the feature maps extracted by the CNN trees for the two input signals x_i and x_q, which is calculated as equation (2). In equation (2), φ(·) denotes the interpretable representation learning section applied to the input; it will be elaborated on in section III.C.
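Since the paper's equations are not reproduced here, the compound loss can be sketched as a weighted sum of a feature-space distance and a label-space distance. The Euclidean and KL choices and the weight names alpha and beta below are assumptions for illustration, not the authors' exact formulation:

```python
import numpy as np

def compound_loss(feat_i, feat_q, probs_i, probs_q, alpha=0.6, beta=0.3):
    """Weighted sum of a structural-feature distance and a label-space
    distance (a sketch; exact terms and weights are assumed)."""
    # structural term: Euclidean distance between CNN-tree feature maps
    d_feat = np.linalg.norm(np.asarray(feat_i, float) - np.asarray(feat_q, float))
    # label-space term: KL divergence between (soft) label distributions
    p = np.asarray(probs_i, float) + 1e-12
    q = np.asarray(probs_q, float) + 1e-12
    d_label = float(np.sum(p * np.log(p / q)))
    return alpha * d_feat + beta * d_label
```

With identical feature maps and identical label distributions, both terms vanish and the loss is zero, which is the behavior the knowledge-transfer scheme relies on.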
This article has been accepted for publication in IEEE Access. This is the author's version, which has not been fully edited, and content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3181727

In addition, the second term is the distance between the labels estimated by the model for the two input signals x_i and x_q, which is calculated through the Gumbel-Softmax distribution.
In this equation, p = (p_1, ..., p_C) are the soft labels that are used in the Gumbel-Softmax distribution. These labels are calculated using the following equation:

p_c = exp(z_c / τ) / Σ_{j=1}^{C} exp(z_j / τ)

In this equation, z_c signifies the score calculated by the model for class c, and p_c is the soft label of this model. With a hard label coded as one-hot 1, only the information of the corresponding class can be transferred among the models, and it is impossible to transfer other information, such as the relationships between the various arrhythmia classes. Therefore, in equation 4, a soft label and a hard label are used together for training the models. In equation 4, the temperature parameter τ indicates the extent of the softness of the labels. In normal classification applications, τ is set equal to 1. In the Gumbel-Softmax distribution, a higher value is considered for τ to obtain a softer probability distribution over the classes, which includes more information than one-hot coding. As τ increases, the distribution p inclines towards a uniform distribution [30,31].
1 In one-hot coding, the value of the true class is considered to be 1 and the rest of the classes are zero.
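The temperature-softened labels described above can be sketched as a softmax with temperature; tau = 1 recovers the ordinary softmax and larger tau flattens the distribution (a minimal sketch, not the authors' exact Gumbel-Softmax sampling procedure):

```python
import numpy as np

def soft_labels(logits, tau=1.0):
    """Temperature-scaled softmax: tau = 1 is the standard softmax;
    larger tau pushes the class distribution towards uniform."""
    z = np.asarray(logits, dtype=float) / tau
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```

For instance, soft_labels([4, 1, 0]) concentrates almost all probability mass on the first class, while soft_labels([4, 1, 0], tau=5) spreads the mass, exposing inter-class relationships that a one-hot hard label discards.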
This article uses the middle layers of CNN models to extract features automatically. As stated in equation (2), each mapping g_i, ∀i ∈ {1, ..., m}, is a deep network that uses convolution layers to extract deep features from the input signal. Figure 2 shows the architecture used for the i-th CNN model as an example. As mentioned in [20], deep learning methods are black-box models, and what happens inside them is not clear; thus, every meta-learning method designed based on these models will be uninterpretable. This reduces the possibility of using them in different sciences, as well as for non-professional use. Therefore, this article uses a new CNN network that is called evolutionary deep learning. In this approach, the CNN networks are coded as genes in the GP algorithm. In the GP algorithm, genes resemble trees with the ability to represent programs; they are interpretable like a decision tree and can be written as simple mathematical equations [32]. Figure 3 shows the general diagram of the stages of constructing the CNN trees in the GP algorithm. At first, the CNN models are encoded as GP trees. At the beginning of training, multi-gene samples are constructed to generate the population of the first generation of the GP algorithm. The initial population can be displayed as a collection P = {s_1, ..., s_j, ..., s_N}, in which s_j is a member of this population generated as a multi-gene sample.
Moreover, each s_j is a set of mapping functions that are coded as genes and can be expanded as s_j = {g_1, ..., g_m}. Here, m signifies the number of mapping functions, or genes, which equals the number of classifying models in Fig. 1. Each g_i is a mapping function that is displayed as a CNN tree. The process of representing an evolutionary CNN tree as a gene for the extraction of structural features will be elaborated in section III.C. In the course of the GP algorithm, in each generation these trees are used to transfer the input to a new space in the form of representation learning. The change of numerical space can be represented as

φ(x) = [g_1(x), ..., g_m(x)]^T

where φ(x) indicates the structural features in the new numerical space, T signifies the transpose operator, and m is the number of classifying models. When all samples of the current generation of the GP algorithm have been executed, a set of deep features is calculated in the new numerical space using the mapping functions. In the GP algorithm, the accuracy of the obtained deep features in the classification of ECG signals, as well as the complexity of the tree structure, is used as the fitness function [33]. To implement this compound fitness function, a pessimistic error estimate is used for each gene in the GP algorithm. In this estimate, the Ω function shows the computational complexity of a gene, calculated over the set of all nodes of all genes in s_j. Taking into account that this article examines the classification of various arrhythmias in ECG signals, the classification error of the signals in the dataset is a proper criterion to evaluate the CNN trees. Moreover, per the Occam's Razor principle, the Ω function is used in the fitness function of each tree to reduce the total computational overhead of the GP algorithm. If the termination condition is not met, the samples of the current population are transferred to the subsequent generation via the following three stages: 1. The best samples of the current population are selected based on their fitness function and the rest are eliminated, which is called the natural selection stage.
2. Afterward, the crossover and mutation genetic operators are applied to the remaining samples to construct a more mature set of samples. 3. The mature samples created in the previous stage, together with a set of randomly constructed samples, form the new generation, preserving the population size across all generations.
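The three-stage generational update, together with the error-plus-complexity fitness, can be sketched on toy fixed-length genomes standing in for CNN trees; the keep fraction, crossover rate, and omega penalty below are assumed values, not the paper's:

```python
import random

def fitness(error_rate, node_count, omega=0.01):
    """Compound fitness: classification error plus an Occam's-Razor
    penalty proportional to the tree's node count (omega is assumed)."""
    return error_rate + omega * node_count

def crossover(a, b):
    cut = random.randrange(1, len(a))      # one-point crossover
    return a[:cut] + b[cut:]

def mutate(a, sigma=0.1):
    g = list(a)
    g[random.randrange(len(g))] += random.gauss(0.0, sigma)
    return g

def next_generation(population, fit, keep_frac=0.4):
    """One generation: (1) truncation selection keeps the fittest,
    (2) crossover/mutation produce offspring, (3) random individuals
    refill the population to its original size."""
    size = len(population)
    survivors = sorted(population, key=fit)[: max(2, int(keep_frac * size))]
    offspring = []
    while len(survivors) + len(offspring) < size - 2:
        a, b = random.sample(survivors, 2)
        offspring.append(crossover(a, b) if random.random() < 0.9 else mutate(a))
    fresh = [[random.uniform(-1.0, 1.0) for _ in population[0]]
             for _ in range(size - len(survivors) - len(offspring))]
    return survivors + offspring + fresh
```

Because the best survivors are carried over unchanged, the minimum fitness in the population never worsens from one generation to the next.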

C. CNN Tree Structure
It was stated in the previous section that each gene g_i in the sample s_j is a mapping function in the form of a CNN tree, which is used to extract structural features from the input signal. The representation of the CNN tree encoded in a gene, as well as its functions and terminals, is explained here. The representation of the CNN model in the form of GP trees is addressed in [34]. This article uses a modified version of it per [20], which utilizes morphological convolutional functions in the tree and is appropriate for QRS complex analysis in ECG signals. Figure 4 shows an instance of the CNN trees. As displayed in this figure, the tree comprises a variety of layers. The leaf nodes of this tree form an input layer that receives the signal as input. Afterward, there is a morphological convolutional layer, which is normally followed by a pooling layer; a node of the tree can also be a combined morphological convolution/pooling layer. Before the root layer, there is a concatenation layer. Finally, the root layer is the output layer. In this tree, the convolution layer carries out the operation of extracting structural features using morphological operators, which is fully explained in the next section. The pooling layer is used after the convolution layer to reduce the output dimensions of the convolution. The concatenation layer connects two input layers together. The output layer is the same as the Flatten layer in a CNN, which forms the feature map. Table II shows the set of functions used in the CNN tree. The functions used to change the numerical space of the features include Conv, SQRT, ADD, ReLU, Sub, and Abs, among which Conv is the most important. This function is responsible for applying the morphological operators to the input ECG signals.
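A CNN tree of this kind can be evaluated recursively from the leaves to the root. The following toy sketch implements a few of Table II's functions with simplified, assumed signatures (the real Conv function is morphological and is covered in the next section):

```python
import numpy as np

def eval_tree(node, x):
    """Recursively evaluate a CNN tree encoded as nested tuples
    (op, child, ...); the leaf marker "X" stands for the input signal.
    Only a subset of Table II's functions is sketched here."""
    if node == "X":
        return np.asarray(x, dtype=float)
    op, *children = node
    vals = [eval_tree(c, x) for c in children]
    if op == "ReLU":                      # zero out negative values
        return np.maximum(vals[0], 0.0)
    if op == "Abs":
        return np.abs(vals[0])
    if op == "Add":                      # element-wise, cut to the shorter input
        n = min(len(vals[0]), len(vals[1]))
        return vals[0][:n] + vals[1][:n]
    if op == "MaxP":                     # down-sampling by pairwise maxima
        v = vals[0]
        return v[: len(v) // 2 * 2].reshape(-1, 2).max(axis=1)
    if op == "Concat":                   # concatenation layer
        return np.concatenate(vals)
    raise ValueError(f"unknown function: {op}")
```

For example, the tree ("Concat", ("ReLU", "X"), ("MaxP", ("Abs", "X"))) rectifies the input on one branch, takes pooled absolute values on the other, and concatenates both into one feature vector at the root.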
On account of their special geometrical characteristics, morphological functions such as dilation and erosion are capable of perfectly analyzing the complexities of ECG signals, including the QRS complex. Morphological operators such as erosion and dilation are beneficial for analyzing shape-oriented signals due to their theoretical framework and lower computational complexity [35,36]. The morphological operators, i.e., erosion and dilation, are limiting forms of the counter-harmonic mean [37]. The Conv function in equations (2) and (3) is the counter-harmonic mean morphology filter, which can be expanded as

Conv_q(f, w)(x) = (f^(q+1) * w)(x) / (f^q * w)(x)

where f is the input signal, w is the filter kernel, and * denotes convolution. The type of operation performed by this function is determined by the value considered for q (q = 0: linear convolution, q < 0: pseudo-erosion, q > 0: pseudo-dilation). The Sub and Add functions carry out weighted subtraction and addition of two signals based on their respective weights. Two input ECG signals might have different lengths; thus, the longer ECG signal is cut by these functions to obtain two ECG signals of the same size. The Sqrt, ReLU, and Abs functions are used to change the values of the ECG signal and the numerical space of the respective sample. In new networks, ReLU is preferred over the Sigmoid activation function for hidden layers for two reasons: first, it is simple and easy to use; second, it does not suffer from the saturation that slows learning with Sigmoid. In this function, if the input value is greater than zero, the output is the same as the input, and if the input is less than or equal to zero, the output is zero. The ReLU function has a fixed derivative for all inputs greater than zero, and this fixed derivative accelerates the network's learning. The Concat1, Concat2, Concat3, and Concat4 functions are used in the concatenation layer, which receives several ECG signals as input and concatenates them into one feature map in the output. The MaxP function performs a downsampling operation on the input ECG signal.
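The counter-harmonic mean behaviour described above can be illustrated on a 1-D signal; this sketch assumes a strictly positive signal and a simple flat kernel:

```python
import numpy as np

def chm_filter(signal, kernel, q=0.0):
    """Counter-harmonic mean filter on a strictly positive 1-D signal:
    q = 0 gives plain (normalised) linear filtering, large positive q
    approaches dilation (local maximum), and large negative q
    approaches erosion (local minimum)."""
    f = np.asarray(signal, dtype=float)
    w = np.asarray(kernel, dtype=float)
    num = np.convolve(f ** (q + 1), w, mode="same")
    den = np.convolve(f ** q, w, mode="same")
    return num / den
```

On a signal with a sharp peak, such as an R wave, q = 0 smooths the peak like a moving average, while a large positive q preserves it as a local maximum, which is why the pseudo-dilation regime suits QRS analysis.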
This function can reduce the dimensions of the received ECG signal. Table III shows the set of terminals and the ranges of values authorized for them in the leaf nodes of the CNN tree. The terminals include the input ECG signal, the convolution filter kernels, the weights of the Sub and Add functions, and the kernel sizes of the MaxP function. The input ECG signal is displayed as a vector of size n × 1. The filter-kernel terminals form the second input of the Conv function; considering that the convolution function performs a morphological operator, each of these filters is a diamond-, disk-, line-, rectangle-, or square-shaped matrix. The weight terminals are added to the Sub and Add functions as inputs, and their values range between 0.000 and 1.000. The kernel-size terminals have the same size as the kernel of the MaxP function. Their values are created randomly within the initial ranges and are evolved in the course of the execution of the GP algorithm. For the two weighting parameters of equation (2), the values of 0.6 and 0.3 are considered, respectively.

IV. EXPERIMENTAL RESULTS

A. RESULT ANALYSIS METHODOLOGY
To analyze the results, the confusion matrix is first calculated for the test samples. Table IV shows the formation of a confusion matrix for samples with true label c_i and predicted label c_j. The combination of 8 true and predicted labels results in an 8×8 confusion matrix. Four evaluation criteria created from the confusion matrix, i.e., ACC_cl, PREC_cl, SEN_cl, and SPEC_cl, are employed for the evaluation of the classification models. The total accuracy, ACC_cl, demonstrates the performance of the classification model for all sub-classes in a specific fold. This criterion is calculated by dividing the number of accurately classified samples by the total number of samples; it can be easily obtained from the confusion matrix by dividing the sum of the main diagonal by the total number of samples in the matrix. For instance, per the results of Table V, the proposed method for the classification of the samples of the AF class achieved mean values of PREC_cl = 94.92%, SEN_cl = 94.79%, SPEC_cl = 95.97%, and ACC_cl = 97.21% over the 12 leads. This table also demonstrates that the proposed method delivered a better performance for lead III (that is, from the class of the body leads) than for the other leads: for the classification of the samples of the AF class, it attained mean values of PREC_cl = 95.02%, SEN_cl = 95.56%, SPEC_cl = 95.10%, and ACC_cl = 97.15% for lead III in the Chapman dataset. The proposed method delivered a proper performance regarding the classification of the samples of the AF class for every constructed lead and test. For instance, the proposed method demonstrated a lower performance when trained on lead V1, obtaining mean values of PREC_cl = 92.18%, SEN_cl = 94.23%, SPEC_cl = 95.29%, and ACC_cl = 96.00% for the seven classes of arrhythmia in the Chapman dataset; these results are lower by means of PREC_cl = 2.84%, SEN_cl = 1.33%, SPEC_cl = 0.19%, and ACC_cl = 1.15% compared with training on lead III.
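The four criteria can be read directly off the confusion matrix; the sketch below assumes the common convention of rows as true labels and columns as predictions (the layout of the paper's Table IV may differ):

```python
import numpy as np

def overall_accuracy(cm):
    """Total accuracy: sum of the main diagonal over all samples."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()

def per_class_metrics(cm, c):
    """PREC, SEN, SPEC, and ACC for class c of a confusion matrix
    whose rows are true labels and columns are predictions."""
    cm = np.asarray(cm, dtype=float)
    tp = cm[c, c]
    fp = cm[:, c].sum() - tp          # predicted c, actually another class
    fn = cm[c, :].sum() - tp          # actually c, predicted another class
    tn = cm.sum() - tp - fp - fn
    return {"PREC": tp / (tp + fp), "SEN": tp / (tp + fn),
            "SPEC": tn / (tn + fp), "ACC": (tp + tn) / cm.sum()}
```

Note that the per-class ACC counts both true positives and true negatives, which is why its values in Tables V to XI can exceed the corresponding precision and sensitivity.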
This signifies that, regardless of the lead on which it is trained, the proposed method preserves an appropriate performance. In general, per Tables V to XI, the proposed method delivered a relatively better performance, and it is interesting to note that lead III performed best in all of these tables. Following the results of Tables V to XI, the proposed method trained on lead III provides better performance for all arrhythmia classes. Thus, the performance of the classification model constructed with lead III is investigated in more detail via a confusion matrix. Figure 5, section (a), provides the confusion matrix of the lead III-based classification model in the test stage for the various arrhythmia classes in the Chapman dataset, including Sinus Bradycardia (SB), Sinus Rhythm (SR), Atrial Fibrillation (AFIB), Sinus Tachycardia (ST), Atrial Flutter (AF), Sinus Irregularity (SI), and Supraventricular Tachycardia (SVT). In this matrix, the main diagonal indicates the correctly classified samples (TP), which is quite crucial for examining a classification model. In general, the matrix shows that the proposed method achieved an appropriate distribution over all classes, with no overfitting emphasis on a specific class and no poor performance on any specific arrhythmia class. The majority of misclassifications occur between the Atrial Flutter (AF) and Atrial Fibrillation (AFIB) classes, as well as between the Sinus Rhythm (SR) and Sinus Irregularity (SI) classes: on average, 1.8% of the AF and AFIB samples and 3.2% of the SR and SI samples are misclassified. It should be noted that, from the medical viewpoint, misclassification between these pairs of arrhythmias is not crucial.
To demonstrate the appropriate performance of the classification model constructed for lead III, the convergence diagram in Figure 5, section (b), applied to this dataset, is examined. Per this diagram, the accuracy increases as the epoch count increases for the various arrhythmia classes. The accuracy of this architecture increased between epochs 50 and 300 and reached 95.18%; it stabilized in the range of epochs 350-500 with no further increase, and the lead III-based classification model was fully stabilized by epoch 550, recording an accuracy of 95.18%. Across the various arrhythmia classes, this diagram indicated that the number of convolution layers resulted in an increase in accuracy and architectural performance. In general, the diagram in Figure 5, section (b), shows that the classification model constructed for lead III enjoys a suitable convergence for the classification of the arrhythmia samples in the Chapman dataset. In this section, the best CNN tree model of the proposed method is selected for classification in the GP algorithm and then evaluated on all 12 leads, and the Kappa agreement criterion is reported. If the classifiers are in full agreement, K = 1; if there is no agreement between the raters beyond what would be expected by chance, K = 0. Per the Table, the classifier of the proposed method obtained a Kappa criterion near 1 for all classes, which indicates the quality of the proposed method with regard to this criterion. Per the results of this Table, the classification model trained on all 12 leads obtained a mean PREC_cl 3.11%, SEN_cl 2.53%, SPEC_cl 2.80%, and ACC_cl 1.14% higher than the classification model trained on lead III. Therefore, it can be concluded that constructing the classification method for arrhythmia detection on the 12-lead ECG signal delivered a better performance; however, it incurred greater computational complexity and had a larger number of parameters.
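The Kappa criterion discussed above can be computed directly from the same confusion matrix. The following is a minimal generic sketch of Cohen's kappa, not the paper's own code:

```python
import numpy as np

def cohens_kappa(cm):
    """Cohen's kappa from a confusion matrix cm[true, predicted]:
    K = 1 means full agreement; K = 0 means agreement no better
    than chance."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                                   # observed agreement
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2   # chance agreement
    return (po - pe) / (1 - pe)
```

A matrix with all mass on the main diagonal gives K = 1, while a uniform matrix gives K = 0.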
Figure 6, sections (a) and (b), shows the confusion matrix and convergence diagram, respectively, of the classification model constructed on all 12 ECG leads in the test phase. In general, this confusion matrix demonstrates that the proposed method constructed on all 12 leads achieved an appropriate distribution over all classes, with no overfitting emphasis on a specific class and no poor performance on any specific arrhythmia class. The errors in this matrix occur between arrhythmia classes whose confusion is not crucial from the medical viewpoint. Figure 6, section (b), shows the convergence diagram of the classification model based on 12-lead ECG signals: the proposed method converges properly for the various arrhythmia classes in the Chapman dataset, and its convergence rate is even improved in comparison to Figure 5, section (b), reaching convergence rapidly. Table XIV shows the model complexity and execution duration of the proposed method during the training, validation, and testing phases. Per this table, the training phase of the proposed method had an execution duration of 25 minutes and 20 seconds. This time has no effect on the deployment of the proposed method, since the training phase is offline and matters only during the development of the system. The most important time is the duration of testing: per Table XIV, the testing phase of the proposed method takes 59 seconds, which signifies that the proposed method can diagnose the arrhythmia of each sample in a split second. In this section, the performance of the classification model constructed on lead III is also compared to other state-of-the-art methods based on deep learning. First, in Table XV, the classification model constructed on lead III is compared with methods whose input is a single-lead ECG.
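Per-phase execution durations like those in Table XIV can be measured with simple wall-clock timing. This is a generic illustrative helper (the function name and averaging scheme are assumptions, not the paper's protocol):

```python
import time

def timed(fn, *args, repeats=10):
    """Return fn(*args)'s result and its average wall-clock duration
    over several runs, in seconds. Averaging over repeats smooths out
    scheduler noise when timing a single phase (e.g. inference)."""
    start = time.perf_counter()
    for _ in range(repeats):
        out = fn(*args)
    elapsed = (time.perf_counter() - start) / repeats
    return out, elapsed
```

For a test set, dividing the measured phase duration by the number of samples gives the per-sample diagnosis time.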
This table includes details such as the number of patients, the number of ECG records, the number of diagnosed classes and rhythms, and the method used in each work. The performance of the previous methods is reported using the criterion stated in each of them, such as accuracy or F1-score. Various deep learning models such as LSTM, CNN, and RNN are used in these methods. Traditionally, a broad spectrum of these methods is evaluated on the MIT-BIH Arrhythmia dataset, which belongs to PhysioNet. For instance, in [39], a deep CNN model was employed for the classification of 12 rhythms; various stages such as batch normalization and data augmentation were used, and the results indicated an F1-score of 83% for 53,549 patients. In [26], a deep learning model with the standard U-Net architecture was used for the classification of 5 types of arrhythmia on the MIT-BIH Arrhythmia dataset, reporting an accuracy of 97.32% for the classification of the records of 47 patients. In [21], on the same dataset, a CNN model was proposed with a preprocessing stage including data normalization.
The CNN model consisted of five convolution layers, three pooling layers, and one fully connected layer. The results of this work revealed an accuracy of 98.45% in the evaluation section on 47 patients from the MIT-BIH Arrhythmia dataset. Besides these methods, a compound CNN+LSTM model was recently introduced for the classification of the Chapman ECG dataset with single-lead signals. This method used the CNN model to generate deep spatial features from the raw ECG signals; the output of the CNN model was then passed to the LSTM model to generate deep temporal features. Its results for 10,436 patients revealed an accuracy of 92.24% in the evaluation section.
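The CNN+LSTM idea described above can be sketched compactly in PyTorch: a 1-D CNN extracts spatial features from the raw ECG, and an LSTM then models their temporal dynamics. The layer sizes below are illustrative assumptions for a minimal sketch, not the architecture of the cited work:

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Minimal CNN+LSTM sketch for single-lead ECG classification.

    Assumed, illustrative hyperparameters; input is a raw signal of
    shape (batch, 1, samples)."""

    def __init__(self, n_classes=7):
        super().__init__()
        # 1-D CNN: deep spatial features from the raw signal
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(4),
        )
        # LSTM: deep temporal features over the CNN feature sequence
        self.lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                  # x: (batch, 1, samples)
        feats = self.cnn(x)                # (batch, 32, samples / 16)
        feats = feats.transpose(1, 2)      # (batch, time, 32) for the LSTM
        _, (h, _) = self.lstm(feats)
        return self.fc(h[-1])              # logits over arrhythmia classes
```

A forward pass on a batch of signals of length 320 would produce one logit vector per sample over the seven rhythm classes.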
In this article, the classification model constructed on lead III was proposed to classify single-lead ECG signals, and its performance was evaluated on the Chapman ECG dataset. Following Table XV, the classification model constructed on lead III obtained an accuracy of 98.55% for the records of 10,464 patients, while using a larger number of records than the majority of the previous methods for constructing the model. The single-lead-III model demonstrated 0.02% higher accuracy in comparison to [15], which also used the Chapman ECG dataset. In most of the previous investigations, similar records were used in the training and test datasets, which reduces the generalization of those methods and casts doubt on their behavior when dealing with unseen data, a subject that was taken into account in the proposed model of this article. For a better representation of the performance of the proposed method, the classification model developed on all 12 ECG leads is compared to the previous methods developed on 12-lead ECG in Table XVI. The previous methods listed in this table are based on deep learning models such as CNN and LSTM. In [29], by combining CNN+LSTM, in addition to the single-lead classification of seven rhythms illustrated in Table XV, a separate section was allocated to classification based on the 12-lead ECG signals.
Its results for 10,436 patients revealed an accuracy of 92.24% in the evaluation section. Moreover, in [15], published recently, a method based on a Detrending+ResNet model was introduced for 10,093 patients in the Chapman ECG dataset; its results for the classification of seven rhythms showed an accuracy of 92.24% in the evaluation section. Deep learning models such as CNNs deliver better performance as the number of inputs increases; thus, the methods listed in Table XVI are regarded as pioneers, and most of them enjoy quite a high accuracy. Subsection IV.D demonstrates the superiority of the proposed model over the other available methods for diagnosing arrhythmia from the ECG signal by presenting the numerical results and the necessary comparisons. This subsection presents several crucial reasons and justifications for this superiority. Deep learning models are statistical models that depend on the data distribution: they perform well only when faced with the data distribution on which they were trained. For instance, the method of [42] obtained an F-measure of almost 99% on its own training and testing data; however, in the experiments carried out in this study, it achieved an F-measure of only 82% on the Chapman dataset. In light of that, the generalization of a statistical model is directly dependent on the sample distribution. In the proposed method, the statistical model was trained under a data distribution that does not necessarily exist in the testing data section. As already mentioned in [19], latent medical variables cannot be directly obtained from medical data; methods such as frequency analysis can express these variables. In the proposed method, specific frequencies of the ECG signal were used, which helped extract the functional dependency among the ECG leads, something that was not examined in methods such as [15] and [29].
The final feature of the proposed method is its independence from detecting and extracting the QRS complex. The performance of methods such as Big-ECG [14] is directly dependent on detecting this complex, which reduces the flexibility of the model since, in some types of arrhythmia, the QRS peak undergoes changes and is quite difficult to locate. In general, these are several justifications for the superiority of the proposed method over the other previous methods.

VI. CONCLUSION
This article proposed an interpretable meta structural learning algorithm for the challenging problem of classifying the various arrhythmias of ECG signals recorded in several sessions for each person. To this end, a compound loss function was provided that included a structural feature extraction fault and a space label fault with the Gumbel-Softmax distribution in the CNN models. Collaboration was carried out between the models to create the learning-to-learn feature in them via transferring knowledge among them when dealing with unseen samples. This article encoded the models in the form of evolutionary trees of the GP algorithm to create the interpretability feature for the CNN models; these trees learn the process of extracting deep structural features over the course of the evolution of the GP algorithm. The experimental results suggested that the proposed classification model enjoys an accuracy of 98% for the classification of 7 types of arrhythmia in the samples of the Chapman ECG dataset on 10646 patients, which were recorded in different sessions. Finally, the comparisons demonstrated the competitive performance of the proposed model compared with state-of-the-art methods based on big deep models.