Classification of electrocardiogram and auscultatory blood pressure signals using machine learning models

doi:10.1016/j.eswa.2014.12.023

Expert Systems with Applications

Volume 42, Issue 7, 1 May 2015, Pages 3643-3652

https://doi.org/10.1016/j.eswa.2014.12.023

Highlights

• Medical data classification problems with two real data sets are investigated.
• A literature review on biomedical signal processing techniques is provided.
• The data sets are corrupted with noise to assess the robustness of different models.
• The logistic regression model produces the best results in noise-free environments.
• Ensemble-based learning model yields the best results in noisy environments.

Abstract

In this paper, two real-world medical classification problems using electrocardiogram (ECG) and auscultatory blood pressure (Korotkoff) signals are examined. A total of nine machine learning models are applied to classify the medical data sets. Several performance metrics are computed, including accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve. In addition to the original data sets, noisy data sets are generated to evaluate the robustness of the classifiers against noise. The 10-fold cross validation method is used to compute the performance statistics, so that statistically reliable results for the classification of the ECG and Korotkoff signals are produced. The outcomes indicate that while logistic regression models perform the best with the original data sets, ensemble machine learning models achieve good accuracy rates with noisy data sets.
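The four metrics named above can be illustrated with a minimal sketch for the two-class case. The function names (`confusion_counts`, `binary_metrics`, `auc`) and the choice of `1` as the positive label are assumptions made for illustration only; they are not taken from the paper, which reports results obtained with Weka. The area under the ROC curve is computed here through its rank-based interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a two-class problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true positive rate), specificity (true negative rate)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else math.nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else math.nan,
        "specificity": tn / (tn + fp) if (tn + fp) else math.nan,
    }


def auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        return math.nan
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form of `auc` is quadratic in the number of samples, which is acceptable for a small illustration; a sort-based computation would be preferable for large data sets.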

Introduction

Data classification constitutes one of the fundamental requirements in undertaking many decision-making tasks (Örkcü & Bal, 2011). A classification task involves building a model that depicts a mapping from the input feature space to the target output space (Oza & Tumer, 2008). In general, there are a number of classification methods, which include statistical methods, mathematical programming methods, and a variety of machine learning methods (Örkcü & Bal, 2011). Researchers in the medical domain have used many methods to perform data classification. Methods with higher classification accuracy are desirable to correctly identify potential diseases, thereby improving diagnostic accuracy (Fan, Chang, Lin, & Hsieh, 2011).

The main contribution of this study is a comprehensive performance evaluation and analysis pertaining to a number of machine learning models for undertaking real medical data classification problems. Specifically, we use two sets of real data collected from patients, i.e., the electrocardiogram (ECG) and auscultatory blood pressure (Korotkoff) signals. ECGs are signals related to the electrical activity of the heart, which can be recorded by placing surface electrodes on a patient’s body (Mitra, Mitra, & Chaudhuri, 2006). The ECG is an effective non-invasive clinical tool for the diagnosis of certain cardiovascular diseases, and it provides useful information pertaining to the pathological physiology of heart activity (Chen & Yu, 2012). ECG signals carry valuable information about the heart function, and provide a cardiologist with useful insight about the rhythm and functioning of the heart (Chen & Yu, 2012).

As stated in Mele (2008), an estimated 300 million ECGs are performed each year. As such, there is a clear need for reliable and accurate tools for interpreting ECG readings. Although trained cardiologists can discover different cardiac abnormalities in ECG recordings, it is time-consuming and laborious for them to examine a large number of ECG recordings (Kiranyaz, Ince, Pulkkinen, & Gabbouj, 2011). Moreover, visual inspection can take considerable time, and some vital information can be neglected due to fatigue in carrying out the tedious manual procedure (Sun, Lu, Yang, & Li, 2012). As such, automated tools to help accurately analyze a large number of ECG data samples are required. Similarly, blood pressure (BP) is an established link in determining coronary heart disease and cardiovascular incidents (Mendiola, Luna, Guerra, & Ramírez, 2013). The most commonly used method for measuring BP in clinical practice is the use of a manual sphygmomanometer and a stethoscope to detect the Korotkoff sounds (Kurl et al., 2001). Korotkoff waveforms are one of the most reliable means for monitoring blood pressure (Mendiola et al., 2013). To exploit the advantages of using computerized tools to support medical prognosis and diagnosis tasks, an empirical study to evaluate a variety of machine learning models for classification of both ECG and Korotkoff signals is undertaken in this study.

With respect to machine learning models, artificial neural networks (ANNs) are popular methods for tackling medical diagnosis problems (Al-Shayea, 2011). There are a number of advantages of using ANNs for medical data classification, e.g. a detailed mathematical model that relates the input features and the target outputs is not necessary. In addition, ANNs have the ability to learn complex relationships in data samples. Recent advancements in ANNs have shown their usefulness in analyzing signals, which has opened up the possibility of solving problems typically not possible with some existing signal processing techniques (Übeyli & Güler, 2005). On the other hand, model-based signal processing is a new method to describe physiological systems (Porta, Baselli, & Cerutti, 2006). The method is useful for short-term cardiovascular control and analysis of cardiovascular regulation mechanisms (Porta et al., 2006). Besides that, other medical signal processing methods have been used in many different applications, e.g. in synthesized and real biomedical signals using frequency-domain methods (Mitov, 1998), medical ultrasound signals using a fast wavelet-based edge method (Nes, 2012), and a multi-sensor fetal movement detection system based on a time–frequency signal processing method (Boashash, Khlif, Ben-Jabeur, East, & Colditz, 2014). A review of different machine learning and related methods for medical applications is presented in Section 2.

The organization of this paper is as follows. After a literature review in Section 2, a description of signal pre-processing, experimental setup, and the background of different classifiers used for experimentation is presented in Section 3. In Section 4, two real medical case studies consisting of ECG and Korotkoff signals are detailed. Finally, conclusions and suggestions for further work are presented in Section 5.

Section snippets

Literature review

Biomedical signal processing is becoming an essential feature of much advanced medical equipment, and is widely used in clinical and biomedical research (Simpson, De Stefano, Allen, & Lutman, 2005). In this review, a number of statistical methods, machine learning models, and other related techniques for medical signal processing are reviewed. The details are as follows.

Data pre-processing and classification

In this section, the acquisition and pre-processing steps of the ECG and Korotkoff signals are detailed. This is then followed by an explanation of the types of classifiers used in the study.

Results and discussion

The experimental study was conducted using Weka (Waikato Environment for Knowledge Analysis) (Hall et al., 2009), a popular suite of machine learning and other models. The k-fold cross validation technique, where k = 10, was employed. As such, the data samples were split into ten subsets of equal size. Each subset contained approximately the same proportion of data samples from each target class. Nine subsets were used for training, while the remaining one was used for performance evaluation. This procedure
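The stratified split described in this snippet can be sketched in a few lines of Python. This is an illustration only and not Weka's implementation: the function names (`stratified_k_folds`, `train_test_splits`), the `seed` parameter, and the round-robin assignment of samples to folds are assumptions introduced here. Each class is shuffled separately and its samples are dealt across the k folds, so that every fold holds approximately the same class proportions as the full data set.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels, k=10, seed=0):
    """Partition sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the samples of this class across the folds one at a time.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def train_test_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices), holding out each fold once."""
    folds = stratified_k_folds(labels, k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


if __name__ == "__main__":
    # Hypothetical data set: 60 samples of class 0 and 40 of class 1.
    labels = [0] * 60 + [1] * 40
    for train_idx, test_idx in train_test_splits(labels, k=10):
        positives = sum(labels[i] for i in test_idx)
        print(len(train_idx), len(test_idx), positives)
```

With k = 10, each iteration trains on nine folds and evaluates on the remaining one, and every sample is used for evaluation exactly once across the ten iterations.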

Conclusions

As presented in this study, machine learning models are useful for biomedical signal processing, whereby abnormal patterns suggesting risks of cardiovascular diseases can be recognized. The main contributions of this study are twofold. The first contribution is a comprehensive review focusing on recent advances pertaining to biomedical signal processing, as presented in Section 2 and summarized in Table 1. The second contribution is an extended empirical evaluation and analysis pertaining to a