Dual-Stage Learning Approach Towards Continuous Cuffless Blood Pressure Monitoring

Abstract - Hypertension, one of the factors associated with cardiovascular diseases, needs to be monitored continuously to track rapid blood pressure (BP) changes. This paper proposes a dual-stage BP estimation approach using suitable features from the Photoplethysmogram and machine learning models. The method first classifies the given data into the 4 classes given by the British Hypertension Society (BHS). The classified data are then passed to one of four models for prediction. The 105 medical records, consisting of clinical data and digitized signal data sampled at 125 Hz, are taken from the MIMIC-III database. The dual-stage approach for the classification and estimation of BP outperforms the existing method, with relative improvements in MAE and RMSE of 64.4% and 36.37% for systolic BP and 40.1% and 22.9% for diastolic BP, respectively.


Introduction
Hypertension, or raised Blood Pressure (BP), is one of the fatal complications associated with cardiovascular diseases. The control and management of BP play a vital role because maintaining BP variability over the long term is essential in slowing down target organ damage. Research on BP monitoring devices has led to methods that use features from physiological biosignals to estimate BP. The mathematical model that describes this relationship is the Moens-Korteweg formula, which relates BP to the velocity at which the pulse wave propagates. It is given by $PWV = \sqrt{\frac{E a}{2 \rho d}}$, where E is the elastic modulus of the artery, a is the arterial wall thickness, d is the radius of the artery and ρ is the blood density. Taking ρ to be approximately 1.055 g/cm³, the velocity of the pulse wave depends on 3 factors: E, a and d. The Photoplethysmogram (PPG) provides an efficient way of measuring blood volume changes in order to study the properties of the hemodynamic system.
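The dependence of the pulse wave velocity on E, a and d can be checked numerically. The following is a minimal sketch, assuming the radius-based form of the Moens-Korteweg relation, PWV = sqrt(E a / (2 ρ d)). The function name, the SI units and the example artery values are illustrative assumptions and are not taken from this work.

```python
import math


def pulse_wave_velocity(E, a, d, rho=1055.0):
    """Pulse wave velocity (m/s) from the Moens-Korteweg relation.

    E   : elastic modulus of the arterial wall, in Pa
    a   : arterial wall thickness, in m
    d   : arterial radius, in m
    rho : blood density, in kg/m^3 (1.055 g/cm^3 = 1055 kg/m^3)
    """
    if min(E, a, d, rho) <= 0:
        raise ValueError("all inputs must be positive")
    return math.sqrt(E * a / (2.0 * rho * d))


# Illustrative (assumed) values, not measurements from the paper.
pwv_soft = pulse_wave_velocity(E=4.0e5, a=1.0e-3, d=5.0e-3)
pwv_stiff = pulse_wave_velocity(E=8.0e5, a=1.0e-3, d=5.0e-3)
print(f"PWV soft wall: {pwv_soft:.2f} m/s, stiff wall: {pwv_stiff:.2f} m/s")
```

With ρ held fixed, doubling E in this sketch raises the velocity by a factor of sqrt(2), which illustrates why the pulse wave carries information about the state of the arterial wall.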
Some of the preliminary models are regression models that relate BP to features of the PPG. Continuous BP estimation with a well-optimized regression model performs better on a subject-specific basis, i.e., the system is customized for each subject by training on the first few data instances of the record and predicting BP for the subsequent instances. The methodologies reported above work in this manner and also illustrate the need for periodic calibration against measurements taken from standard BP monitors. A generalized model that can continuously estimate BP without subject-specific bias and with reduced calibration is essential for implementation in home BP monitoring systems.
Very few works address the formation of a generalized model. [1] proposed using features extracted from ECG and PPG to form a generalized regression model using the MIMIC-II database. Wavelet decomposition is reported to work well for processing this type of medical data. The models based on AdaBoost regression and random forests outperform the other regression models. However, the problem of one-to-many feature mapping can be observed, i.e., the same range of feature values is mapped to different BP values, which increases the estimation error. While He et al. implemented a random forest model that employed pulse width features, the patient-specific details were unavailable for further processing.
Patient-specific factors such as gender, age, height and weight influence the variability of BP. The effect of age and gender on the autonomic control of BP was studied by [2]. The study was conducted on 41 males and 48 females aged between 20 and 83. It was observed that female gender is strongly associated with the low-frequency variability of BP due to the level of estrogen and lower plasma norepinephrine. Hormonal differences between the genders affect the autonomic control of BP. Aging reduces the sympathetic vasomotor responsiveness, which in turn reduces the low-frequency BP oscillations [12][13]. Therefore, gender and age are critical factors in the regulation of BP. Non-parametric approaches for continuous BP estimation have been tested on different datasets, but the performance of such systems for long-term monitoring still needs to improve. A dual-stage approach for the classification and estimation of Blood Pressure is proposed in this paper.

Methodology
A dual-stage approach for the beat-to-beat estimation of Blood Pressure is proposed in this work. It first classifies the input feature vector according to the different stages of hypertension, which is treated as a multiclass classification problem [3]. The classified feature vector is then applied to the regression model that is specific to its class for the beat-to-beat estimation of BP.

Advances in Computing, Communication, Automation and Biomedical Technology
The process flow of the dual-stage approach for the classification and estimation of BP is shown in Figure 1. The main processes in the dual-stage approach for beat-to-beat BP estimation include feature extraction, Blood Pressure group classification and multi-class regression model generation [4].

Feature Extraction
A database that houses both the waveform and the patient clinical data is essential for the implementation of the algorithm. The waveform data include the Pulse Wave (PPG) and the Arterial Blood Pressure (ABP) waveform, whereas the clinical data include patient specifications such as demographic details (age, gender, height, weight, etc.), the clinical summary, drug dosage requirements and treatment plan. For the first-level implementation of the system, we suggest including the demographic details of the subject along with the contour-based features for the BP classification process [5]. Figure 2 shows the pulse contour features extracted from the pulse wave.

Blood Pressure Classification Stage
The next stage in the BP estimation process is the classification of the data based on the different stages of hypertension. The class separation is based on the Seventh Report of the Joint National Committee (JNC) on Prevention, Detection, Evaluation and Treatment of High Blood Pressure [6]. For the evaluation and early detection of blood pressure related ailments, the committee has defined classes of BP based on ranges of the systolic and diastolic BP. They are categorized as normal, prehypertension, Stage-I hypertension and Stage-II hypertension [7]. The BP ranges for these categories are listed in Table 1. The pulse wave data, along with the subject specifics, are employed to classify each data instance into a particular class. The classification process here is modeled as a multi-class problem. The inclusion of patient-specific details can improve the classifier performance, as age, gender and other specifics influence BP variability.
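The class labels used for training can be derived from the reference systolic and diastolic values. The sketch below assumes the thresholds published in the JNC 7 report and an assumed numbering of 0 (normal) to 3 (Stage-II hypertension); the ranges actually used in Table 1 of this work may differ, and the function name is hypothetical.

```python
def jnc7_class(sbp, dbp):
    """Map a systolic/diastolic pair (mmHg) to an assumed class index.

    Thresholds follow the JNC 7 report; the higher of the two
    categories implied by SBP and DBP determines the class.
    0 = normal, 1 = prehypertension, 2 = Stage-I, 3 = Stage-II.
    """
    if sbp >= 160 or dbp >= 100:
        return 3
    if sbp >= 140 or dbp >= 90:
        return 2
    if sbp >= 120 or dbp >= 80:
        return 1
    return 0


# Example beats with (SBP, DBP) taken from a reference ABP waveform.
beats = [(118, 76), (130, 70), (150, 85), (120, 105)]
labels = [jnc7_class(sbp, dbp) for sbp, dbp in beats]
print(labels)
```

Checking the most severe category first keeps the rule short and ensures that a beat with, for example, normal systolic but elevated diastolic pressure is assigned to the higher class.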

Multi-Class Regression Stage
The second stage of the process builds class-specific regression models for the continuous prediction of BP. Unlike the methods mentioned previously, this stage creates a separate regression model from the data instances that belong to each class [8]. The data are first randomly divided into a training dataset and a testing dataset. The training data are then divided into 4 groups based on the BP ranges [9]. The categorized training data are used to create the class-specific regression models. Once the testing data have been categorized by the classification stage, the classified data instances are fed into their respective regression models. An n-class problem thus involves n regression models, and each data instance passes through the regression model that is specific to its class [10], [11]. This enables a generalized BP estimation method that can improve the estimation performance of the system.
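The routing between the two stages can be sketched as follows. This is only an illustration of the control flow: a nearest-centroid classifier and a per-class mean predictor are used as simple stand-ins for the AdaBoost classifier and random forest regressors evaluated later in the paper, and all class and variable names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


class NearestCentroidClassifier:
    """Stand-in stage-1 classifier (the paper uses AdaBoost)."""

    def fit(self, X, labels):
        rows_by_class = defaultdict(list)
        for x, label in zip(X, labels):
            rows_by_class[label].append(x)
        # One centroid per class: the column-wise mean of its rows.
        self.centroids = {
            label: [mean(col) for col in zip(*rows)]
            for label, rows in rows_by_class.items()
        }
        return self

    def predict(self, X):
        def sq_dist(x, c):
            return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        return [
            min(self.centroids, key=lambda lb: sq_dist(x, self.centroids[lb]))
            for x in X
        ]


class MeanRegressor:
    """Stand-in stage-2 regressor (the paper uses random forests)."""

    def fit(self, X, y):
        self.value = mean(y)
        return self

    def predict(self, X):
        return [self.value for _ in X]


class DualStageEstimator:
    """Stage 1 picks a BP class; stage 2 applies that class's regressor."""

    def __init__(self, make_classifier, make_regressor):
        self.make_classifier = make_classifier
        self.make_regressor = make_regressor

    def fit(self, X, bp_class, bp):
        self.classifier = self.make_classifier().fit(X, bp_class)
        grouped = defaultdict(lambda: ([], []))
        for x, label, y in zip(X, bp_class, bp):
            grouped[label][0].append(x)
            grouped[label][1].append(y)
        # One regression model per class, trained only on that class's data.
        self.regressors = {
            label: self.make_regressor().fit(xs, ys)
            for label, (xs, ys) in grouped.items()
        }
        return self

    def predict(self, X):
        labels = self.classifier.predict(X)
        return [self.regressors[lb].predict([x])[0] for x, lb in zip(X, labels)]


# Toy data (assumed): one feature, two classes, systolic BP targets.
X_train = [[0.0], [0.2], [1.0], [1.2]]
class_train = [0, 0, 1, 1]
sbp_train = [110, 114, 150, 154]
model = DualStageEstimator(NearestCentroidClassifier, MeanRegressor)
model.fit(X_train, class_train, sbp_train)
predictions = model.predict([[0.1], [1.1]])
```

Passing the two model types as factories keeps the routing logic independent of the learners, so any classifier and regressor exposing `fit` and `predict` could be substituted.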

Data Sources
The MIMIC-II matched subset database is used for the study. The chosen database provides each subject's waveform record and its associated clinical record. The clinical record is required to collect the patient demographics, namely age, gender, height and weight. The subject's Body Mass Index (BMI) is also calculated and included as a feature. The BMI is calculated as $BMI = \frac{\text{weight (kg)}}{\text{height (m)}^2}$. A total of 104 records with waveform and clinical data are chosen for the study (72 males and 32 females). In addition to the demographic information, the pulse contour based features are used as input for the classification and regression processes. The features are extracted from a 1-minute segment of the pulse wave in each record. The resulting feature set is randomly divided into a training dataset (80%, 5780 data instances) and a testing dataset (20%, 1925 data instances). The classifiers are built using a 10-fold cross-validation process. The feature vector consists of 18 parameters, of which 13 are pulse wave features and 5 are patient-specific features.
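The assembly of the feature vector and the random split can be sketched as below. It is assumed here that the 5 patient-specific features are age, gender, height, weight and BMI, that gender is encoded as 0/1, and that the 13 pulse wave features are already available; the function names, the encoding and the seed are illustrative choices, not details from this work.

```python
import random


def bmi(weight_kg, height_m):
    """Body Mass Index: weight in kg divided by squared height in m."""
    return weight_kg / height_m ** 2


def patient_features(age, is_male, height_m, weight_kg):
    """Assumed set of 5 patient-specific features, gender encoded as 0/1."""
    return [age, 1 if is_male else 0, height_m, weight_kg,
            bmi(weight_kg, height_m)]


def random_split(instances, test_fraction=0.2, seed=0):
    """Shuffle a copy of the instances and split off the test fraction."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder for the 13 pulse contour features of one beat.
pulse_features = [0.0] * 13
feature_vector = pulse_features + patient_features(54, True, 1.75, 70.0)

# 100 dummy instances split into training and testing sets.
train, test = random_split(range(100))
print(len(feature_vector), len(train), len(test))
```

Using a seeded `random.Random` instance makes the split reproducible without touching the global random state.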

Results and Discussion
The classification process assigns the PPG features to one of 4 classes. Five classifiers, namely Multi-Layer Perceptron (MLP), SVM with Radial Basis Function kernel, Random Forests, Bagging and AdaBoost, were implemented and tested. The performance of these algorithms in learning the BP class is listed in Table 2.

In Table 2, MLP denotes Multi-Layer Perceptron.
The beat-to-beat estimation of blood pressure is performed using the class-wise regression models created from the data instances that fall within each class. The test data instances categorized by the AdaBoost classifier are fed to the regression system to obtain the beat-to-beat BP predictions. The performance of the BP prediction is evaluated using the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) measures calculated from the actual and estimated BP.
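The two error measures follow their standard definitions and can be computed as in this short sketch. The example values are made up for illustration and are not results from this work.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative beat-to-beat systolic values in mmHg (assumed data).
actual_sbp = [120, 130, 140]
estimated_sbp = [118, 133, 140]
print(f"MAE = {mae(actual_sbp, estimated_sbp):.2f} mmHg")
print(f"RMSE = {rmse(actual_sbp, estimated_sbp):.2f} mmHg")
```

Because RMSE squares each error before averaging, it penalizes occasional large deviations more heavily than MAE, which is why the two are usually reported together.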
The comparison of different regression algorithms shows that the random forest performs best in the beat-to-beat estimation process. Combining the predictions from multiple base estimators suppresses the variance in the error measures. The performance measures of the random forest regression for each class are tabulated in Table 3. The MAE and RMSE measures for every class conform to the AAMI standards (MAE ± RMSE ≤ 5 ± 8 mmHg).
As shown in Figure 3, the performance measures of the AdaBoost classifier for each class show that Class 0 and Class 3 are highly distinctive, with higher sensitivity and specificity. In contrast, data instances under Class 1 and Class 2 can be misclassified as Class 0 and Class 3 respectively, owing to slight variations in the feature ranges. This slight overlap of the feature ranges across more than two classes results in the reduced sensitivity and specificity for Class 1 and Class 2.
The maximum depth of the regression trees and the number of regression trees in the random forest influence the prediction of BP. A tree depth of half the number of features is used because greater depth can introduce overfitting to the training data. Similarly, the number of trees is restricted to 500, beyond which overfitting occurs.
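The per-class sensitivity and specificity reported for the classifier can be obtained by treating each class in turn as the positive class (one-vs-rest). The sketch below uses made-up labels purely for illustration; the function name is hypothetical.

```python
def sensitivity_specificity(y_true, y_pred, cls):
    """One-vs-rest sensitivity and specificity for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    tn = sum(t != cls and p != cls for t, p in pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Assumed labels: one Class 1 beat is predicted as Class 0 and one
# Class 2 beat as Class 3, mimicking confusion between adjacent classes.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 0, 2, 3, 3, 3]
for cls in range(4):
    sens, spec = sensitivity_specificity(y_true, y_pred, cls)
    print(f"class {cls}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy example the misclassified beats lower the sensitivity of the middle classes, while the classes that receive them lose specificity.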
The dual-stage approach for the classification and estimation of BP outperforms the existing method, with relative improvements in MAE and RMSE of 64.4% and 36.37% for systolic BP and 40.1% and 22.9% for diastolic BP, respectively.
The regression models can be evaluated graphically and by computing error measures. The graphical evaluation uses the Bland-Altman plot, a graphical method for visualizing the comparison of two methods of BP measurement. The difference between the BP from the reference method and the BP from the proposed method is plotted against the average of the two methods. These plots depict the degree of agreement between the standard BP measurement method and the proposed method, which determines the clinical acceptability of the method. If the data points lie within the 95% Confidence Intervals (CI) specified by (Mean ± 1.96*SD), then the method is acceptable. Based on the AAMI standards, the 95% CI are defined as (5 ± 1.96*8) mm Hg, that is, from -10.68 to 20.68 mm Hg. Figures 4 and 5 illustrate that the majority of the data points lie within the 95% CI and that the boundary margins are smaller than the specified standards. Therefore, the proposed method is shown to be clinically acceptable for the given data, and further testing could lead to its practical use in clinical care.
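The quantities behind a Bland-Altman plot can be computed as in the sketch below, which returns the per-beat averages and differences together with the Mean ± 1.96*SD limits, and evaluates the AAMI-based interval from the stated mean of 5 and SD of 8. The sample values and function names are assumptions made for illustration, and plotting is left out.

```python
from statistics import mean, stdev


def bland_altman(reference, estimate, z=1.96):
    """Return per-point averages, differences and the limits of agreement."""
    diffs = [r - e for r, e in zip(reference, estimate)]
    averages = [(r + e) / 2 for r, e in zip(reference, estimate)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return averages, diffs, (bias - z * sd, bias + z * sd)


def fraction_within(diffs, limits):
    """Fraction of differences that fall inside the given limits."""
    low, high = limits
    return sum(low <= d <= high for d in diffs) / len(diffs)


# Interval implied by the AAMI-based definition (5 ± 1.96*8) mm Hg.
aami_limits = (5 - 1.96 * 8, 5 + 1.96 * 8)

# Assumed reference and estimated systolic BP values in mm Hg.
reference = [120, 135, 150, 110, 142]
estimate = [118, 139, 147, 112, 140]
averages, diffs, limits = bland_altman(reference, estimate)
print("limits of agreement:", limits)
print("fraction inside AAMI interval:", fraction_within(diffs, aami_limits))
```

Plotting `diffs` against `averages` with horizontal lines at the two limits gives the usual Bland-Altman view of agreement between the reference and the proposed method.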

Conclusion
This paper proposes a dual-stage approach to estimate BP from features extracted from the Photoplethysmogram. The class-specific separation significantly reduces the error measures, which shows that separating the data by BP class is essential to improving the performance of the beat-to-beat estimation system. Further, the use of the Photoplethysmogram as the sole input for continuous BP monitoring can be extended to e-health based systems.