Data for a population based cohort study on abnormal findings of electrocardiograms (ECG), recorded during follow-up periodic examinations, and their association with long-term cardiovascular morbidity and all-cause mortality

In this Data in Brief article, we provide the cohort data and statistical methods of the research article "Incidental abnormal ECG findings and long-term cardiovascular morbidity and all-cause mortality: a population based prospective study" (Goldman et al., 2019). An extended description of the statistical analysis is presented, together with data on the baseline characteristics and incidental abnormal baseline ECG findings of 2601 Israeli men and women without known cardiovascular disease (CVD). The cohort is part of the Israel study of Glucose Intolerance, Obesity and Hypertension (GOH) (Dankner et al., 2007). Furthermore, we provide data on the performance assessment of the 23-year CVD risk and 31-year all-cause mortality prediction models, which include Receiver Operating Characteristic (ROC) curves, reclassification-based measures and a calibration curve.

© 2019 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Data
In this Data in Brief article, we provide the baseline characteristics of the total Glucose Intolerance, Obesity and Hypertension (GOH) Israel cohort [2] and the Phase 3 CVD incidence for the active follow-up subsample (Table 1). We describe the frequencies of incidental ECG abnormalities in the cohort at baseline (Table 2) and summarize CVD and all-cause mortality according to normal vs. abnormal ECG status (Table 3). The statistical methods for assessing the performance measures of the CVD and all-cause mortality risk prediction models are detailed in Section 2.1, followed by a summary of these measures (Table 4). The full data on the Net Reclassification Improvement (NRI) following the addition of incidental ECG findings to the CVD risk prediction models are also presented (Table 5). Fig. 1 shows the ROC curves of CVD risk prediction with vs. without incidental ECG findings. Fig. 2 presents the calibration curve of the Cox model for all-cause mortality risk prediction.

Specifications Table

Subject: Cardiology and Cardiovascular Medicine
Specific subject area: ECG testing as a primary prevention screening tool in adults without known CVD, for early detection of CVD risk and all-cause mortality
Type of data: Tables, graph, figure
How data were acquired: Questionnaires, interviews, physical examination (including anthropometric measurements), laboratory blood tests and ECG recording, performed at regional medical centres or at the homes of the cohort members.
Data format: Analysed, filtered
Parameters for data collection: CVD incidence was determined according to self-reported past myocardial infarction (MI), cerebrovascular accident, peripheral artery disease (PAD) or "other cardiovascular disease", or Phase 3 ECG findings of "past MI" or "evidence of myocardial ischemia". All-cause mortality and date of death were recorded from the Israel population registry (May 2017).
Description of data collection: Prospective cohort of 2769 adult men and women randomly selected from the Israel population registry. They were invited to regional clinics at baseline (1979–1984) and during active follow-up (1999–2008), when the data parameters were collected. Several individuals were visited at their homes during the active follow-up because they were too old or had difficulty travelling to the regional clinic.

Value of the data
These data are important for understanding and interpreting the potential benefits of the ECG as a screening tool, as described in our study [1].
Clinicians and researchers working in the fields of CVD and diabetes primary prevention, CVD risk prediction and individual CVD risk stratification may benefit from these data.
The full description of the methods, results and performance measures of the prediction models provides deeper insights into CVD risk factors and CVD primary prevention.
These data provide a unique opportunity to follow high-validity data from a representative cohort of healthy women and men over four decades for CVD prognostic factors, including baseline ECG findings.

Table 1. Baseline characteristics of the total Glucose Intolerance, Obesity and Hypertension (GOH) Israel cohort and the Phase 3 CVD incidence active follow-up subsample.

More than one finding was recorded for some individuals. Individuals with the following findings were excluded: single chamber pacemaker, dual chamber pacemaker and past MI.

2. Experimental design, materials, and methods

2.1. Assessment of performance measures for CVD and all-cause mortality risk prediction models: statistical methods
To evaluate improvement in discrimination, we compared the C-index of the prediction model based on traditional CVD risk factors with that of a model that additionally included ECG findings. For the CVD prediction model, fitted by logistic regression, the C-index was calculated as the area under the receiver operating characteristic curve. For all-cause mortality prediction, it was calculated using the adaptation of the C-index to Cox proportional hazards regression proposed by Harrell et al. [4], with the confidence interval obtained by bootstrap resampling with 200 repetitions.

We assessed the net reclassification improvement (NRI) in individual risk stratification when incidental ECG findings were added to traditional CVD risk factors. The NRI was estimated as described by Pencina et al. [5]:

\[
\mathrm{NRI} = \frac{(\text{number of events reclassified higher}) - (\text{number of events reclassified lower})}{\text{number of events}} - \frac{(\text{number of non-events reclassified higher}) - (\text{number of non-events reclassified lower})}{\text{number of non-events}}
\]

For this purpose, we defined cutoffs for the likelihood of reaching the outcome of interest by adjusting the ACC/AHA [6] risk categories (low, intermediate and high risk) to the longer duration of follow-up, from 10% and 20% to 20% and 30%, similar to the Framingham study extension method [7]. We also estimated the improvement in reclassification using the continuous NRI and the integrated discrimination improvement (IDI), which, unlike the categorical NRI, are not affected by the chosen cutoff values. The continuous NRI relies on the proportion of individuals with the outcome who are correctly assigned a higher probability by the new model and the proportion of individuals without the outcome who are correctly assigned a lower probability. The IDI reflects the average increase in predicted risk among cases plus the analogous average decrease among controls [5].

Fig. 2. Calibration curve of 2520 model 2 participants in all-cause death multivariable analysis. Bootstrap resampling with 200 repetitions for 30-year survival prediction.
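
The reclassification measures described in Section 2.1 can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names, the toy probabilities and the convention that a risk exactly equal to a cutoff falls in the higher category are all assumptions made for illustration. Only the 20% and 30% cutoffs and the definitions of the categorical NRI, continuous NRI and IDI follow the text.

```python
from bisect import bisect_right
from statistics import mean


def risk_category(p, cutoffs=(0.20, 0.30)):
    """Map a predicted risk to 0 (low), 1 (intermediate) or 2 (high)."""
    # Assumption: a risk exactly equal to a cutoff goes to the higher category.
    return bisect_right(cutoffs, p)


def _net_reclassification(old_scores, new_scores, outcome):
    """Net upward movement among events minus net upward movement among non-events."""
    n_events = sum(outcome)
    n_nonevents = len(outcome) - n_events
    net_events = net_nonevents = 0
    for old, new, y in zip(old_scores, new_scores, outcome):
        direction = (new > old) - (new < old)  # +1 higher, -1 lower, 0 unchanged
        if y:
            net_events += direction
        else:
            net_nonevents += direction
    return net_events / n_events - net_nonevents / n_nonevents


def categorical_nri(p_old, p_new, outcome, cutoffs=(0.20, 0.30)):
    """NRI based on movement between risk categories."""
    old_cat = [risk_category(p, cutoffs) for p in p_old]
    new_cat = [risk_category(p, cutoffs) for p in p_new]
    return _net_reclassification(old_cat, new_cat, outcome)


def continuous_nri(p_old, p_new, outcome):
    """NRI counting any increase or decrease in predicted risk (no cutoffs)."""
    return _net_reclassification(p_old, p_new, outcome)


def idi(p_old, p_new, outcome):
    """Mean change in predicted risk among events minus that among non-events."""
    gain_events = mean(n - o for o, n, y in zip(p_old, p_new, outcome) if y)
    gain_nonevents = mean(n - o for o, n, y in zip(p_old, p_new, outcome) if not y)
    return gain_events - gain_nonevents


# Hypothetical toy data: predicted risks without (old) and with (new) ECG findings.
p_old = [0.10, 0.25, 0.36, 0.15, 0.25, 0.40]
p_new = [0.25, 0.35, 0.31, 0.10, 0.15, 0.45]
outcome = [1, 1, 1, 0, 0, 0]  # 1 = event, 0 = non-event

print(round(categorical_nri(p_old, p_new, outcome), 3))
print(round(continuous_nri(p_old, p_new, outcome), 3))
print(round(idi(p_old, p_new, outcome), 3))
```

Because the categorical NRI depends on where the cutoffs are placed, changing the `cutoffs` argument can change its value, whereas `continuous_nri` and `idi` take no cutoffs at all, which matches the contrast drawn in the text.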