Identifying patients with atrial fibrillation during sinus rhythm on ECG: Significance of the labeling in the artificial intelligence algorithm

Highlights
• High performance of AI algorithm to detect AF using SR-ECG was confirmed in patients without structural heart disease.
• The performance of AI-enabled ECG to detect AF was high especially when the algorithm included SR-ECG taken after the index AF-ECG.
• A similar tendency was observed when the performance was tested in patients with structural heart diseases.


Introduction
Atrial fibrillation (AF) is among the most common cardiac rhythm disorders and is associated with increased morbidity (e.g., ischemic stroke) and mortality. One major challenge is promptly diagnosing AF after onset because of its silent nature in many patients. As tools for screening AF, many devices have been proposed in addition to the gold standard of 12-lead electrocardiography (ECG), including patient-initiated devices (oscillometric blood pressure cuff, intermittent ECG rhythm strip, or photoplethysmogram on smartphone), semi-continuous devices (smart watch ECG), continuous wearable devices (long-term Holter, wearable belts, or 1-2 week continuous ECG patches), and implanted devices [1,2]. In patients with cryptogenic stroke, in which the origin of the thrombus is unknown, continuous and repeated monitoring with implanted or wearable devices has demonstrated a substantial burden of undiagnosed AF [3,4].
From a practical viewpoint, simple methods to discriminate patients at a high risk of AF would help to identify candidates for such long-term monitoring devices. For example, precise analysis of the waveforms of resting 12-lead ECG using artificial intelligence (AI) has identified patients with AF on sinus rhythm ECG (SR-ECG) [5]. This method is unique because, although the presence of AF on 12-lead ECG is the gold standard for diagnosing AF, the AI-based method gains insight into the existence of AF from an ECG on which AF is absent. Attia et al. [5] reported a landmark study of AI-enabled ECG to predict AF on ECG with SR from the Mayo Clinic, and Raghunath et al. [6] reported AI-enabled ECG from the Geisinger Health System with a larger ECG database. These studies had a strong impact because of their high predictive ability, with an area under the curve (AUC) of around 0.900.
After the publication of these studies, additional important methodological issues have been raised, one being the labeling problem. For example, among ECG recordings with the AF label, ECG recordings in SR had different predictive abilities according to the length of time to incident AF [5,7]. Moreover, there remain methodological issues of AI-enabled ECG that have not been fully investigated. First, the performance of AI-enabled ECG may differ depending on the existence of structural heart disease because typical ECG findings may mask subtle AF-related features on SR-ECG. Second, whether SR-ECG for developing AI-enabled ECG should be taken before or after the index AF-ECG is unknown. Even when AF has never been detected, undetected AF may already exist. Therefore, AI-enabled ECG may need the information on SR-ECG recorded after AF occurs. Third, the performance of AI-enabled ECG for the SR label may differ according to the length of the observation period in the database because, simply put, the shorter the period over which sinus rhythm is apparently maintained, the lower the certainty that undetected AF is absent.
In the present study, we developed AI-enabled ECG to predict AF from SR-ECG using a single-center ECG database, aiming to enhance its performance with special reference to these unresolved issues.

Ethics and informed consent
This study was performed in accordance with the Declaration of Helsinki (revised in 2013) and Ethical Guidelines for Medical and Health Research Involving Human Subjects (Public Notice of the Ministry of Education, Culture, Sports, Science and Technology, and the Ministry of Health, Labour and Welfare, Japan, issued in 2017). Written informed consent was obtained from all participants. The study protocol was reviewed by the Institutional Review Board of the Cardiovascular Institute.

Total study population
The Shinken database includes all patients who newly visited the Cardiovascular Institute, Tokyo, Japan, excluding foreign travelers and patients with active cancer. This single-hospital database was established in June 2004. Details of this database have been described elsewhere [8]. In the present study, 19,170 patients registered between February 2010 and March 2018 were extracted from the Shinken database because a computerized electrocardiogram database has been available since February 2010. We excluded 1975 patients for one or more of the following reasons: AF on ECG at the initial visit (n = 1601), atrial flutter (n = 185; of which 8 were coincident with AF), atrial tachycardia (n = 3), paroxysmal supraventricular tachycardia (n = 190), and insufficient follow-up data (n = 4). The remaining 17,195 patients with SR-ECG were the target of the present study (Fig. 1).
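The exclusion counts above overlap, so they do not simply sum to 1975. The short check below reproduces the reported totals under the assumption (not stated explicitly in the text) that the only overlap between criteria is the 8 patients with both atrial flutter and AF; the variable names are illustrative.

```python
# Exclusion counts as reported in the text (a patient may meet more than one criterion).
excluded = {
    "af_at_initial_visit": 1601,
    "atrial_flutter": 185,
    "atrial_tachycardia": 3,
    "paroxysmal_svt": 190,
    "insufficient_follow_up": 4,
}

# Assumption: the only double-counted patients are the 8 with both flutter and AF.
overlap_flutter_and_af = 8

registered = 19170
n_excluded = sum(excluded.values()) - overlap_flutter_and_af
n_remaining = registered - n_excluded

print(n_excluded, n_remaining)
```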

Study population for development and evaluation of AI-enabled ECG (derivation dataset)
Out of 17,195 patients in the total study population, 2172 patients were selected as the derivation dataset which comprised 276 patients with AF label and 1896 patients with SR label (Fig. 1). In the present study, SR-ECGs were used for the analysis, where they were assigned to "AF label" when at least one ECG showing AF (AF-ECG) was found in the same patient in the ECG database during the follow-up, while they were assigned to "SR label" when no AF-ECG was found during the follow-up.
As shown in the flowchart (Fig. 1), patients with AF label in the derivation dataset (1) did not have structural heart disease, (2) had at least one AF-ECG in the ECG database during follow-up, and (3) had at least one SR-ECG within 31 days before or after the first AF-ECG. Meanwhile, patients with SR label in the derivation dataset (1) did not have structural heart disease, (2) had no AF-ECG in the ECG database during follow-up, (3) did not have a prior diagnosis of AF before the first visit to our hospital, and (4) had an observation period ≥ 1095 days.
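The inclusion criteria for the two labels can be read as a small decision rule. The following is a minimal sketch of that rule, not the authors' code; the `Patient` record, its field names, and the convention of expressing dates as day offsets from the first visit are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

WINDOW_DAYS = 31            # SR-ECG must lie within 31 days of the first AF-ECG
MIN_FOLLOW_UP_DAYS = 1095   # minimum observation period for the SR label


@dataclass
class Patient:
    # Hypothetical record layout; days are offsets from the first visit.
    structural_heart_disease: bool
    prior_af_diagnosis: bool
    observation_days: int
    first_af_ecg_day: Optional[int]  # None if no AF-ECG was ever recorded
    sr_ecg_days: List[int]           # days on which SR-ECGs were recorded


def derivation_label(p: Patient) -> Optional[str]:
    """Return 'AF', 'SR', or None if the patient is outside the derivation dataset."""
    if p.structural_heart_disease:
        return None
    if p.first_af_ecg_day is not None:
        # AF label: at least one SR-ECG within 31 days before or after the first AF-ECG.
        near = any(abs(day - p.first_af_ecg_day) <= WINDOW_DAYS for day in p.sr_ecg_days)
        return "AF" if near else None
    # SR label: no AF-ECG, no prior AF diagnosis, and a long enough observation period.
    if not p.prior_af_diagnosis and p.observation_days >= MIN_FOLLOW_UP_DAYS:
        return "SR"
    return None
```

Patients for whom this rule returns `None` are the ones the paper routes to the extra testing datasets rather than discarding.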

Study population for testing the performance of AI-enabled ECG (extra testing dataset)
Among the patients who were excluded in the process of selecting patients for the derivation dataset, the following were defined as extra testing datasets and used for testing the performance of AI-enabled ECG (Fig. 1): (1) patients with structural heart disease (extra testing dataset 1; n = 4338), out of which 340 with AF label and 1516 with SR label were identified using exclusion criteria similar to those for the derivation dataset, (2) patients who did not have structural heart disease, had at least one AF-ECG in the ECG database during follow-up, and did not have SR-ECG within 31 days before or after the first AF-ECG (extra testing dataset 2; n = 128), and (3) patients who did not have structural heart disease, had no AF-ECG in the ECG database during follow-up, did not have a prior diagnosis of AF before the first visit to our hospital, and had an observation period < 1095 days (extra testing dataset 3; n = 9735).
The patients in the extra testing dataset 2 were further divided according to the timing of the index SR-ECG from the first AF-ECG (

Data sampling
Twelve-lead ECG was recorded for 10 s in the supine position using an ECG machine (GE CardioSoft V6.71 and MAC 5500 HD; GE Healthcare, Chicago, IL, USA) at a sampling rate of 500 Hz, and raw data of digital records were stored using the MUSE data management system. Out of each 10-second ECG recording, 5-second ECG samples were extracted. We sampled with a 5-second duration because we employed oversampling to balance the number of samples between AF and SR labels [9,10]. The details of oversampling (the scientific background and the detailed process) are explained in the Supplementary document.

Derivation dataset
In patients with the SR label, the index SR-ECG for the analysis was the one obtained at the initial visit. For the SR label, each 10-second ECG was split in half, yielding two 5-second ECG samples.
In patients with the AF label, the index SR-ECG was the one nearest (within 31 days) the first AF-ECG in the ECG database. Here, the index SR-ECG with the AF label was chosen according to three patterns: the pre-(AF label 1, n = 167), post-(AF label 2, n = 242), or pre-or post-(AF label 3, n = 276) 31-day period of the first AF-ECG (Fig. 2). As three patterns of AF label were defined, three patterns of derivation dataset (SR label/AF label 1, SR label/AF label 2, and SR label/AF label 3) were consequently yielded (Fig. 2). In the AF label, 5-second ECGs were obtained to balance the number of samples between AF and SR labels by the data augmentation with sliding window (Supplementary Fig. 1).
In each derivation dataset, ECG samples were divided into the training, validation, and testing datasets at a ratio of 7:1:2 (details of the number of samplings are displayed in Supplementary Fig. 2A).

Fig. 2. Convolutional neural network (CNN) analysis. For each AF label, the index SR-ECG was chosen within 31 days before the first AF-ECG (AF label 1), within 31 days after the first AF-ECG (AF label 2), or within 31 days before or after the first AF-ECG (AF label 3). Using the AF label 1, 2, and 3, CNN algorithm 1, 2, and 3, respectively, were developed combined with fixed SR label. AF, atrial fibrillation; SR, sinus rhythm; ECG, electrocardiography.
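The two sampling schemes used in the derivation dataset (halving for the SR label, sliding-window oversampling for the AF label) can be sketched as index arithmetic on a 10-second recording sampled at 500 Hz. This is only an illustration: the actual window count and stride are given in the Supplementary document, so `n_windows` and the even spacing of windows below are assumptions.

```python
FS_HZ = 500                  # sampling rate stated in the text
RECORD_LEN = 10 * FS_HZ      # 5000 samples per 10-second recording
WINDOW_LEN = 5 * FS_HZ       # 2500 samples per 5-second sample


def split_in_half(n_samples=RECORD_LEN, window=WINDOW_LEN):
    """(start, end) indices of the two non-overlapping halves (SR label)."""
    return [(s, s + window) for s in range(0, n_samples - window + 1, window)]


def sliding_windows(n_windows, n_samples=RECORD_LEN, window=WINDOW_LEN):
    """(start, end) indices of evenly spaced, overlapping windows (AF label).

    Assumption: windows are spread evenly from the first to the last
    possible start position; the paper's exact stride may differ.
    """
    if n_windows < 2:
        return [(0, window)]
    last_start = n_samples - window
    starts = [round(i * last_start / (n_windows - 1)) for i in range(n_windows)]
    return [(s, s + window) for s in starts]


sr_slices = split_in_half()          # two samples per SR recording
af_slices = sliding_windows(11)      # 11 is a hypothetical window count
```

Because the AF label has far fewer patients than the SR label, drawing more overlapping windows per AF recording is what brings the two classes toward a similar sample count.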

Extra testing dataset
For patients in the extra testing datasets, each 10-second ECG was split in half, yielding two 5-second ECG samples per 10-second ECG (details of the number of samplings are displayed in Supplementary Fig. 2B).

Convolutional neural network (CNN) modeling
We constructed a CNN using the Keras framework with a TensorFlow (Google; Mountain View, CA, USA) backend and Python. Of the eight independent leads and four derived leads (III, aVR, aVL, and aVF) with a 10-second duration on 12-lead electrocardiography (ECG) recordings, we selected the eight independent leads (leads I, II, and V1-6) with a 5-second duration. Accordingly, the original 12 × 5000 matrix (i.e., 12 leads with a 10-second duration sampled at 500 Hz) was reduced to an 8 × 2500 matrix.
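The reduction from a 12 × 5000 matrix to an 8 × 2500 matrix amounts to selecting rows and cropping columns. The sketch below uses plain Python lists and assumes a conventional lead storage order (`LEAD_ORDER`), which the text does not specify; `reduce_ecg` is a hypothetical helper name.

```python
# Assumed storage order of the 12 leads (not stated in the paper).
LEAD_ORDER = ["I", "II", "III", "aVR", "aVL", "aVF",
              "V1", "V2", "V3", "V4", "V5", "V6"]
# The eight independent leads retained by the model.
KEPT_LEADS = ["I", "II", "V1", "V2", "V3", "V4", "V5", "V6"]


def reduce_ecg(ecg, start=0, window=2500):
    """Keep the eight independent leads and crop a 5-second window (2500 samples)."""
    rows = [LEAD_ORDER.index(name) for name in KEPT_LEADS]
    return [ecg[r][start:start + window] for r in rows]


# Dummy 12 x 5000 recording: every sample in a lead equals the lead's row index.
ecg = [[float(row)] * 5000 for row in range(12)]
reduced = reduce_ecg(ecg)
print(len(reduced), len(reduced[0]))
```

Dropping the other four leads loses no information because they are fixed linear combinations of leads I and II: III = II − I, aVR = −(I + II)/2, aVL = I − II/2, and aVF = II − I/2.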
The CNN model had layers for a temporal axis and a lead axis [5].
The layers for the temporal axis were composed of two parts: the convolution part and the residual part. The convolution part included a convolution layer, a batch-normalization layer, a layer for non-linear Rectified Linear Unit (ReLU) activation, and a maximum pooling layer [11]. The residual part included a combination of two residual blocks based on Residual Network (ResNet) [12] and average pooling, which was repeated N times, and the value of N was tuned to obtain the best performance (the method is outlined below). The layers for the lead axis were composed of a paired batch-normalization layer and a layer for non-linear ReLU activation, followed by a convolution layer. Thereafter, a second paired batch-normalization layer and a layer for non-linear ReLU activation were included. Finally, the data were fed to a dropout layer with global average pooling and to the final output layer activated by the softmax function, which generated the probability of AF. The architecture of the model is shown in Fig. 1B. The model was trained with Keras on a computer with 192 GB of RAM and a single Quadro P-2200 (NVIDIA) graphics processing unit.
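To make the layer ordering concrete, the following Keras sketch follows the sequence described above. It is an illustration only, not the authors' model: the filter count, kernel sizes, pooling sizes, dropout rate, default `n_repeats`, and the placement of dropout after global average pooling are all assumptions, since the text does not give them.

```python
from tensorflow import keras
from tensorflow.keras import layers


def residual_block(x, filters, kernel=(1, 7)):
    """Two temporal convolutions with an identity shortcut (ResNet-style)."""
    shortcut = x
    y = layers.Conv2D(filters, kernel, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, kernel, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])
    return layers.ReLU()(y)


def build_model(n_leads=8, n_samples=2500, n_repeats=3, filters=16, dropout=0.5):
    """Hypothetical hyperparameters; n_repeats plays the role of N in the text."""
    inputs = keras.Input(shape=(n_leads, n_samples, 1))

    # Temporal axis, convolution part: kernels of height 1 act on time only.
    x = layers.Conv2D(filters, (1, 7), padding="same")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.MaxPooling2D(pool_size=(1, 2))(x)

    # Temporal axis, residual part: (two residual blocks + average pooling) x N.
    for _ in range(n_repeats):
        x = residual_block(x, filters)
        x = residual_block(x, filters)
        x = layers.AveragePooling2D(pool_size=(1, 2))(x)

    # Lead axis: BN + ReLU, a convolution spanning all leads, then BN + ReLU.
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(filters, (n_leads, 1), padding="valid")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

    # Global average pooling with dropout, then a two-class softmax output.
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(dropout)(x)
    outputs = layers.Dense(2, activation="softmax")(x)
    return keras.Model(inputs, outputs)


model = build_model()
```

Treating the ECG as an 8 × 2500 single-channel image lets the same 2D convolution layer act along time (kernel height 1) or across leads (kernel width 1), which is one straightforward way to separate the two axes.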
A receiver operating characteristic (ROC) curve was created to validate and test the data to assess the AUC of AI-enabled ECG to determine whether AF was present. Using the ROC curve in the validation dataset, we tuned the number of repetitions for the combination of the two residual blocks and average pooling written above (N). Moreover, we determined the probability threshold of AF using the ROC curve. These parameters were used for the final evaluation using the testing dataset.
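The text does not state which criterion was used to pick the probability threshold from the ROC curve. As one common possibility, the sketch below computes the AUC from pairwise comparisons and selects the threshold that maximizes Youden's J (sensitivity + specificity − 1); the use of Youden's J, the function names, and the toy scores are all assumptions for illustration.

```python
def roc_points(y_true, y_score):
    """(threshold, sensitivity, specificity) for each distinct score used as a cut-off."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = []
    for thr in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= thr)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < thr)
        points.append((thr, tp / n_pos, tn / n_neg))
    return points


def auc(y_true, y_score):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_threshold(y_true, y_score):
    """Cut-off maximizing sensitivity + specificity - 1 (an assumed criterion)."""
    return max(roc_points(y_true, y_score), key=lambda t: t[1] + t[2] - 1)[0]


# Toy validation data (fabricated for illustration): 1 = AF label, 0 = SR label.
y_true = [0, 0, 0, 1, 0, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.35, 0.6, 0.7, 0.9]
val_auc = auc(y_true, y_score)
val_thr = youden_threshold(y_true, y_score)
```

In the workflow described above, both the number of repetitions N and the threshold would be fixed on the validation data and then applied unchanged to the testing data.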

Outcome measurement
The primary outcome of the study was the ability of AI-enabled ECG to identify patients with AF using SR-ECG, which was assessed by the AUC, sensitivity, specificity, accuracy, and F1 score of the model. In the derivation dataset and the extra testing dataset 1, the AUC, sensitivity, specificity, accuracy, and F1 score of the model were assessed. In the extra testing datasets 2 and 3, only the accuracy was tested because each dataset consisted of a single label (AF and SR label, respectively).
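The threshold-dependent metrics follow directly from the confusion-matrix counts, and the same arithmetic shows why only accuracy is meaningful in a single-label dataset. The sketch below is illustrative; the counts passed to `metrics` are made up and are not results from the study.

```python
def metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy, and F1 from confusion-matrix counts.

    A metric is None when its denominator is zero (e.g. specificity in a
    dataset that contains no SR-label samples).
    """
    sens = tp / (tp + fn) if (tp + fn) else None
    spec = tn / (tn + fp) if (tn + fp) else None
    acc = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else None
    return {"sensitivity": sens, "specificity": spec, "accuracy": acc, "f1": f1}


# Hypothetical AF-only dataset: every sample is truly AF, so fp = tn = 0.
af_only = metrics(tp=90, fp=0, tn=0, fn=38)

# Hypothetical SR-only dataset: every sample is truly SR, so tp = fn = 0.
sr_only = metrics(tp=0, fp=20, tn=80, fn=0)
```

With a single true label, accuracy collapses to sensitivity (AF-only) or specificity (SR-only), and the AUC is undefined because there are no positive-negative pairs to rank.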
We used two-sided 95% confidence intervals (CIs) to summarize sample variability in the estimates. We used exact CIs (Clopper-Pearson) to be conservative for accuracy, sensitivity, and specificity. The CIs for the AUC were estimated using Sun and Xu's optimization of the DeLong method in the pROC package [13], whereas the CIs for F1 were obtained using the bootstrap method with 2000 replications. Analyses of exact CIs were performed using Python version 3.7.6 (Python Software Foundation, DE, USA), and other analyses were performed using R version 4.0.3 (The R Foundation, Vienna, Austria).
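The Clopper-Pearson interval is defined by inverting the binomial tail probabilities, which can be shown with the standard library alone. The sketch below solves the two defining equations by bisection; it is meant to illustrate the definition for small n and is not the authors' implementation (for large n, a beta-quantile routine from a statistics library would be the practical choice).

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); direct summation, suitable for small n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo=0.0, hi=1.0, iters=100):
    """Root of a function that decreases from positive to negative on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided CI for a proportion with k successes out of n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect(
        lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p)))
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper


ci_low, ci_high = clopper_pearson(8, 10)
```

The interval is called conservative because its actual coverage is never below the nominal 95%, at the cost of being somewhat wider than approximate intervals.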

Development of AI-enabled ECG in the derivation datasets
Among all included patients in the derivation dataset (n = 2172; AF label, n = 276; SR label, n = 1896), the mean age was 60.1 ± 13.6 years at the initial visit, and 1170 patients (53.9%) were male.
The performance of AI-enabled ECG for CNN algorithm 1, 2, and 3 is shown in Table 1A and Fig. 4A. The performance was mostly similar between CNN algorithm 2 and 3, and was slightly higher than that of CNN algorithm 1.

Testing the performance of AI-enabled ECG in the extra testing datasets
The results of testing the performance of AI-enabled ECG for CNN algorithm 1, 2, and 3 in the extra testing dataset 1 are shown in Table 1B and Fig. 4B. The AUC of AI-enabled ECG was 0.75 (0.72-0.77) for CNN algorithm 1, 0.81 (0.79-0.83) for CNN algorithm 2, and 0.78 (0.76-0.80) for CNN algorithm 3. The accuracy was 0.72 (0.71-0.74) for CNN algorithm 1, 0.78 (0.77-0.80) for CNN algorithm 2, and 0.70 (0.68-0.71) for CNN algorithm 3. The performance of AI-enabled ECG in the extra testing dataset 1 was slightly lower than that in the derivation dataset, whereas the pattern across the three CNN algorithms was similar to that in the derivation dataset.
The results of testing the performance of AI-enabled ECG for CNN algorithm 1, 2, and 3 in the extra testing dataset 2 are shown in Table 2A and Fig. 5A. In all three CNN algorithms, when SR-ECG was taken before AF-ECG, the accuracy of AI-enabled ECG increased as the SR-ECG was recorded closer in time to the AF-ECG. Meanwhile, when SR-ECG was taken after AF-ECG, the accuracy of AI-enabled ECG was mostly similar irrespective of the length of time between SR-ECG and AF-ECG. The accuracy of AI-enabled ECG was generally higher in CNN algorithm 3 compared with that in CNN algorithm 1 and 2.
The results of testing the performance of AI-enabled ECG for CNN algorithm 1, 2, and 3 in the extra testing dataset 3 are shown in Table 2B and Fig. 5B. In all three CNN algorithms, the accuracy of AI-enabled ECG was mostly similar irrespective of the length of the observation period. The accuracy of AI-enabled ECG was generally higher in CNN algorithm 1 and 2 compared with that in CNN algorithm 3.

Major findings
In the present study, we developed AI-enabled ECG to predict AF using 12-lead SR-ECG in patients without structural heart disease, with three patterns according to the timing of the index SR-ECG, and thereafter confirmed the performance in patients with structural heart disease. The AUC of AI-enabled ECG was higher when the algorithm included SR-ECG taken after the AF-ECG (0.88 and 0.86 for CNN algorithm 2 and 3 compared to 0.83 for CNN algorithm 1). A similar tendency was observed when the AI-enabled ECG was tested in patients with structural heart disease (0.81 and 0.78 for CNN algorithm 2 and 3 compared to 0.75 for CNN algorithm 1).

Comparison with previous studies
AI-enabled ECG to predict AF using 12-lead SR-ECG has already been reported by multiple study groups [5-7]. These groups reported a high predictive ability for AF using the AUC, which was 0.90 in the study by Attia et al. and 0.87 in the study by Raghunath et al. It was quite surprising that SR-ECG can predict AF with such a high predictive capability. In the present study, we obtained similar AUCs of 0.88 and 0.86 when the SR-ECG with the AF label was taken after, or either before or after, the index AF-ECG, respectively.
Such studies were based on the hypothesis that the AF signature due to structural changes in the atria can be identified by 12-lead ECG during SR [5,14] because structural changes in the atria predispose to atrial arrhythmia [15]. Moreover, in our previous study using hundreds of ECG parameters analyzed with the random forest algorithm, the importance of ECG parameters in predicting AF was similar in the P wave, QRS complex, and ST-T segment, which suggested that structural changes in the ventricle, presumably due to aging or atherosclerosis, seem to be similarly important [16].

Characteristics and clinical implications of the AI-enabled ECG in the present study
Attia et al. [5] enrolled 180,922 patients with 649,931 SR-ECGs in their landmark paper, and Raghunath et al. [6] enrolled 430,909 patients with 1,623,487 ECG recordings in their prospective study. In contrast, in the present study, we used only 276 and 1896 patients for the AF label and SR label, respectively, with, at most, 2,994 and 3,792 ECG samples, and still attained AUCs over 0.8 and close to 0.9. Of course, it is not surprising that our AI-enabled ECG showed a high performance with a small number of samples because we simplified our model by 1) excluding patients with structural heart disease, 2) restricting SR-ECG with the AF label to patients within 31 days from the first AF-ECG, 3) restricting SR-ECG with the SR label to patients with a follow-up period of ≥ 1,095 days, and 4) balancing the number of samples using an oversampling method. Of note, we believe our model offers some lessons on how to increase the performance of AI-enabled ECG to predict AF on SR-ECG.
First, the timing of the SR-ECG relative to the AF-ECG in the AF label appears to be important. In the present study, we developed three patterns of AI-enabled ECG derived from three patterns of AF label, where the index SR-ECG was taken before, after, or before-or-after the first AF-ECG (CNN algorithm 1, 2, and 3, respectively). When we compared their performance, the AUC was higher when the algorithm included SR-ECG taken after the AF-ECG (0.88 and 0.86 for CNN algorithm 2 and 3) than when it did not (0.83 for CNN algorithm 1). Based on this result, we can learn two points: (1) AI-enabled ECG can detect the structural changes in the atrium before AF incidence because the AUC was over 0.8 even when the model included only the information before the first AF-ECG (CNN algorithm 1), and (2) the performance of AI-enabled ECG was enhanced remarkably when the information after the first AF-ECG was included (the AUC increased to nearly 0.9 in CNN algorithm 2 and 3). These results are supported by previous clinical reports. In a study that investigated the changes in the atrium before and after direct cardioversion for AF, left atrial dimension did not decrease at 1 week after the cardioversion (before and after cardioversion, 44.8 mm and 44.3 mm) [17]. Meanwhile, another similar study showed that left atrial volume index decreased at 1 month after cardioversion (41.12 mL/m² and 37.56 mL/m²) [18]. Therefore, significant structural changes occur in the atrium when AF happens; they may persist for at least 1 week after sinus rhythm is restored [17] but improve remarkably by 1 month [18]. On the other hand, in a sub-analysis of the Multi-Ethnic Study of Atherosclerosis, an increment of left atrial volume index or a decrease of left atrial physiological function over time predicted the incidence of AF [19]. It would be an attractive concept that AI-enabled ECG can predict the future incidence of AF, but this concept is not very realistic. For example, although Attia et al. demonstrated a high AUC (0.900) with SR-ECG within 31 days of AF-ECG [5], the AUC decreased (0.71-0.73) for prediction after approximately 2 years and further decreased to ~ 0.60 for prediction at ≥ 4 years [7]. From a practical viewpoint, AI-enabled ECG should instead have a more important role in detecting already-existent AF (sometimes asymptomatic AF) with a high sensitivity.
When the aim of AI-enabled ECG is to increase the sensitivity to detect already-existent AF, our results suggest that the information of the SR-ECG after the first AF-ECG should be included in the development of AI-enabled ECG. In addition, when we tested the accuracy of the AI-enabled ECG in diagnosing AF on SR-ECG at different timings relative to the first AF-ECG (Fig. 5), the accuracy was consistently high in CNN algorithm 3 (SR-ECG was taken both before and after AF-ECG). Therefore, to maximize the sensitivity to detect already-existent AF among the three patterns of AF label, the pattern of AF label in CNN algorithm 3 seems to be the best.

Fig. 4. ROC curves for prediction of AF by AI-enabled ECG using SR-ECG. A. Derivation dataset. B. Extra testing dataset 1. The detailed measurement of the performance is displayed in Table 1. ROC, receiver operating characteristic; AF, atrial fibrillation; AI, artificial intelligence; ECG, electrocardiography; SR, sinus rhythm.

Second, the tendency in the performance of the three patterns of CNN algorithm in patients without structural heart disease was similar when they were applied to those with structural heart disease. Because patients with structural heart disease have various patterns of typical ECG features, especially in the ST-T segment, we assumed that the existence of structural heart disease would increase the variety of patterns that AI-enabled ECG must learn and would affect its performance. Therefore, in the present study, by excluding patients with structural heart disease from our model, we intended to limit the variations in the derivation dataset. When we applied our model to patients with structural heart disease, the AUC was 0.75, 0.81, and 0.78 for CNN algorithm 1, 2, and 3, which were lower than the AUCs in patients without structural heart disease, but the tendency in the performance of the three patterns of CNN algorithm was similar. Although the AUCs of AI-enabled ECG developed in patients without structural heart disease were lower in those with structural heart disease, they remained around 0.8, suggesting that the AI-enabled ECG works to some extent regardless of the absence or existence of structural heart disease. This may be because, even among patients with structural heart disease, typical ECG characteristics are present only in a limited number of patients with relatively severe conditions. These results possibly suggest that including patients with structural heart disease may not be a problem for developing AI-enabled ECG to detect AF on SR-ECG. Therefore, our model could be extrapolated to those with structural heart disease to some extent, but its utility requires further investigation.
The AI-enabled ECG to detect AF on SR-ECG may be a candidate to be incorporated in the technology tools supporting the guideline-recommended integrated management of AF [1] for the following reasons. First, our findings suggest that AI-enabled ECG to predict AF on sinus-rhythm ECG can provide an aid for screening paroxysmal AF, especially in those who are strongly suspected of having AF (i.e., history of embolic stroke of undetermined source and accumulation of risk factors for AF). Patients who have a high AF probability on the AI-enabled ECG would be candidates for more vigorous AF screening. Second, even in patients who are already diagnosed with paroxysmal AF, AI-enabled ECG to predict AF on sinus-rhythm ECG can provide information on whether AF has occurred recently (i.e., within 1 month). This would serve as simple information for managing AF patients under rhythm control therapy.

Limitations
There are several limitations of the present study that should be highlighted. First, although we limited SR-ECG recordings with the SR label to patients followed up for ≥ 1,095 days, there remained a possibility that undetected AF existed in patients with the SR label. Second, our study excluded patients with structural heart disease and can thus only be applied to similar populations. Third, our model of AI-enabled ECG should be verified against external datasets to confirm the generalizability.

Conclusions
We confirmed high performance of AI-enabled ECG to detect AF on SR-ECG in patients without structural heart disease. The performance was enhanced especially when SR-ECG taken after the index AF-ECG was included in the algorithm, which was consistent in patients with structural heart disease.

Bold numbers with shadow indicate the data of the derivation dataset. Given that the accuracy for the single label (AF label in extra testing dataset 2 and SR label in extra testing dataset 3) was equal to sensitivity (for AF label) or specificity (for SR label), the bold numbers in Table 2A and 2B were the sensitivity and the specificity, respectively, in the derivation dataset for each CNN algorithm. These data are visualized in Fig. 5. CNN, convolutional neural network; AF, atrial fibrillation; SR, sinus rhythm.

Ethics and informed consent
This study was performed in accordance with the ethical norms based on the Declaration of Helsinki (revised in 2013) and Ethical Guidelines for Medical and Health Research Involving Human Subjects (Public Notice of the Ministry of Education, Culture, Sports, Science and Technology and the Ministry of Health, Labour and Welfare, Japan, issued in 2017). Written informed consent was obtained from all participants. The study protocol was reviewed by the Institutional Review Board of the Cardiovascular Institute.

Consent for publication
Not applicable.

Availability of data and materials
Data cannot be shared publicly because of lack of such a description in the study protocol and informed consent. Data are available from the Ethics Review Committee at the Cardiovascular Institute for researchers who meet the criteria for access to confidential data (contact via the corresponding author).

Funding
This study was partially supported by the Practical Research Project for Life-Style related Diseases including Cardiovascular Diseases and