Evaluation of electrohysterogram measured from different gestational weeks for recognizing preterm delivery: a preliminary study using random forest

Developing a computational method for recognizing preterm delivery is important for its timely diagnosis and treatment. The main aim of this study was to evaluate electrohysterogram (EHG) signals recorded at different gestational weeks for recognizing preterm delivery using random forest (RF). EHG signals from 300 pregnant women were divided into two groups depending on when the signals were recorded: i) preterm and term delivery with EHG recorded before the 26th week of gestation (denoted by PE and TE group), and ii) preterm and term delivery with EHG recorded during or after the 26th week of gestation (denoted by PL and TL group). Thirty-one linear and nonlinear features were derived from each EHG signal and then compared comprehensively within the PE and TE group and within the PL and TL group. After applying the adaptive synthetic sampling approach and six-fold cross-validation, accuracy (ACC), sensitivity, specificity and area under the curve (AUC) were used to evaluate RF classification. For the PL and TL group, RF achieved an ACC of 0.93, sensitivity of 0.89, specificity of 0.97, and AUC of 0.80. The corresponding values were 0.92, 0.88, 0.96 and 0.88 for the PE and TE group, indicating that RF could be used to recognize preterm delivery effectively with EHG signals recorded before the 26th week of gestation.


Introduction
Preterm delivery, defined as birth before 37 completed weeks of gestation, is a leading cause of neonatal morbidity and mortality, and has long-term adverse consequences for fetal health [1]. Accurate diagnosis of preterm delivery is one of the most significant problems faced by obstetricians. The existing measurement techniques for diagnosing preterm delivery include tocodynamometer (TOCO), ultrasound and fetal fibronectin. However, they are subjective, or suffer from high measurement variability and inaccurate diagnosis or prediction of preterm delivery [2]. TOCO is often influenced by sensor position, the tightness of binding by the examiner and maternal movement. Short cervical length measured by transvaginal ultrasonography has been associated with an increased risk of preterm delivery, but its accuracy for predicting preterm delivery is unsatisfactory because of the high false positive rate. The fetal fibronectin test, which is performed like a pap smear, has not been shown to accurately predict preterm delivery in women who are at low risk or who have no obvious symptoms. By comparison, the electrohysterogram (EHG), which reflects the sum of the electrical activities of the uterine cells, can be recorded noninvasively from the abdominal surface. The parameters of EHG signals might provide an effective tool for the diagnosis and prediction of preterm delivery [3]. EHG is therefore regarded as a reliable method for evaluating uterine activity, and it has also been used to analyze the uterine activity of non-pregnant women [4].
Many features have been extracted from EHG signals to recognize preterm delivery. They can be grouped into three classes: linear features, nonlinear features and features related to EHG propagation [5]. Time, frequency and time-frequency features, such as root mean square, median frequency, peak frequency and energy distribution, have been used to characterize EHG signals and to distinguish between term and preterm delivery [5-7]. In addition, nonlinear features, including correlation dimension (CorrDim) [8], sample entropy (SampEn) [9], Lyapunov exponent (LE) [10], and multivariate multiscale fuzzy entropy [11], have been applied to describe the nonlinear interactions between billions of myometrium cells [12,13]. In recent years, the propagation velocity and direction of the EHG signals, and intrinsic mode functions from empirical mode decomposition (EMD) [14], have been proposed as potential discriminators to predict the progress of pregnancy. However, the selection of EHG features was somewhat arbitrary in these published studies. A comprehensive analysis of the differences in these features between preterm and term delivery would therefore be clinically and physiologically useful.
Machine-learning algorithms have been investigated for recognizing preterm delivery using EHG signals [15]. Conventional classifiers include the K-nearest neighbors (K-NN), linear and quadratic discriminant analysis (LDA and QDA, respectively), support vector machine (SVM) [6], artificial neural network (ANN) classifiers [8,16,17], decision tree (DT) [18], penalized logistic regression, rule-based classifier [19] and stacked sparse autoencoder (SSAE) [20]. However, the K value of the K-NN classifier is set subjectively, LDA and QDA are affected by sample distribution, ANN and SSAE have high computational complexity [16], and SVM requires additional steps to reduce the dimension of the extracted features [21]. Published studies have reported that ANN, SSAE, Adaboost, DT, SVM, logistic and polynomial classifiers achieved good performance in recognizing preterm delivery. However, these classifiers were evaluated on different databases using different EHG features, so the most significant features for predicting preterm delivery could not be determined. Random forest (RF) is an ensemble learning method for classification. DT is the base learner in RF, which has been employed in data mining and feature selection [22]. Classification accuracy can be improved by growing an ensemble of trees and letting them vote for the most popular class. Ren et al. reported that RF, with a simpler structure, achieved the same accuracy as ANN for classifying preterm delivery with EHG signals [17]. Idowu et al. [19] also indicated that RF performed best and showed robust learning ability.
The main aim of this study was to evaluate the EHG signals recorded at different gestational weeks for recognizing preterm and term delivery using RF. In addition, the importance of EHG features for predicting preterm delivery was ranked.

Materials and methods
The overview flowchart of the proposed method is shown in Fig. 1. Briefly, EHG signals from 300 pregnant women were divided into two groups depending on whether the EHG signals were recorded before or after the 26th week of gestation. Thirty-one linear and nonlinear features were then derived from each EHG signal and fed to an RF classifier for automatic identification of term and preterm delivery, and the importance of features was ranked by DTs. The performance of RF for recognizing preterm delivery was then evaluated and compared between EHG signals recorded at different gestational weeks. The details of each step are presented in the following sections.
The EHG signals were obtained from the TPEHG database, recorded at ... Ljubljana, Ljubljana [23]. Three channels of EHG signals were recorded from the abdominal surface using four electrodes, as shown in Fig. 2. The three-channel EHG signals were measured between the topmost electrodes (channel 1: E2-E1), the leftmost electrodes (channel 2: E2-E3) and the lower electrodes (channel 3: E4-E3), respectively. The recording time was 30 min with a sampling frequency of 20 Hz. A previously published study found that the EHG from channel 3 was the most discriminative signal for classifying preterm and term delivery [17]. Therefore, as a pilot study, channel 3 was selected for further analysis. EHG signals from 300 pregnant women (262 cases of term delivery and 38 cases of preterm delivery) were divided into two groups depending on when the signals were recorded: i) preterm and term delivery with EHG recorded before the 26th week of gestation (denoted by PE and TE group, 19 and 143 cases respectively), and ii) preterm and term delivery with EHG recorded during or after the 26th week of gestation (denoted by PL and TL group, 19 and 119 cases respectively). Table 1 shows the number of EHG recordings in the PE and TE group and in the PL and TL group. Fig. 3 shows four typical examples of EHG segments from each group.

EHG signal preprocessing
The main frequency component of the EHG signal ranges between 0 and 5 Hz [24]. The EHG signals preprocessed by the 0.08-4 Hz bandpass filter were selected from the TPEHG database, in which the interferences from fetal and maternal electrocardiogram, respiratory movement, motion artifacts and 50/60 Hz power noise had been removed [25]. Furthermore, the first and last 5 min of each EHG recording were discarded to avoid the transient effects of the filtering process [18], and the remaining 20 min of each EHG signal were used for further analysis.

EHG feature extraction
Thirty-one features were extracted with time domain, frequency domain, time-frequency domain and nonlinear analysis as follows.

Root mean square (RMS)
RMS is a conventional measure of signal amplitude changes. For a signal x(i), i = 0, ..., N-1, where N is the signal length (here N = 600), RMS was calculated as:
$$\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{i=0}^{N-1} x(i)^{2}}$$

Autocorrelation zero-crossing (τ_Rxx)
Autocorrelation zero-crossing, τ_Rxx, is defined as the first zero-crossing starting at the peak in the autocorrelation R_xx(τ) of the signal x(t) [26]. Considering the data distribution, τ_Rxx was calculated from the autocorrelation of the sampled signal, where x(i) is the amplitude of the EHG signal at sampling point i.
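The definition above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `autocorr_zero_crossing`, the mean-centring step and the synthetic sine input are assumptions made for illustration only.

```python
import math

def autocorr_zero_crossing(x, fs):
    """Lag (in seconds) at which the autocorrelation first drops to zero or below.

    Assumption: the signal is mean-centred before the autocorrelation is taken.
    Returns None if no zero-crossing is found within the record.
    """
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    for lag in range(1, n):
        # Unnormalised autocorrelation R_xx at this lag.
        r = sum(xc[i] * xc[i + lag] for i in range(n - lag))
        if r <= 0.0:
            return lag / fs
    return None

# Synthetic test signal (not EHG data): a 0.5 Hz sine sampled at 20 Hz.
fs = 20.0
signal = [math.sin(2 * math.pi * 0.5 * i / fs) for i in range(400)]
tau = autocorr_zero_crossing(signal, fs)
print(tau)  # close to a quarter of the 2 s period
```

For a pure sinusoid the autocorrelation is cosine-like, so the first zero-crossing falls near a quarter period; slower signals give larger values.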

Peak frequency (PF)
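PF corresponds to the largest amplitude peak of the EHG signal power spectrum p, which was calculated using the fast Fourier transform of each signal. PF was calculated as follows:
$$\mathrm{PF} = \frac{f}{N}\,\arg\max_{i}\; p_i$$
where f = 20 Hz is the sampling frequency, N is the signal length and p_i is the i-th line of the power spectrum.

Median frequency (MDF)
MDF is defined as the frequency at which the sums of the parts of the power spectrum above and below it are the same. MDF was calculated as follows:
$$\sum_{i=0}^{i_m} p_i = \sum_{i=i_m}^{I} p_i, \qquad \mathrm{MDF} = \frac{f}{N}\, i_m$$
where i indexes the lines of the power spectrum, i_m is the index of the median line and I = N/2 is the highest harmonic.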

Mean frequency (MNF)
MNF is the centroid frequency of the power spectrum and is defined as follows:
$$\mathrm{MNF} = \frac{\sum_{i=0}^{I} f_i\, p_i}{\sum_{i=0}^{I} p_i}$$
where p_i is the i-th line of the power spectrum, f_i is the frequency variable, and I is the highest harmonic (I = N/2). N is the signal length, here N = 600.
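The three spectral features (PF, MDF and MNF) can be sketched together as follows. This is a hedged illustration, not the study's code: the helper names `power_spectrum` and `spectral_features`, the direct DFT (used here to avoid external libraries) and the two-tone test signal are all assumptions.

```python
import cmath
import math

def power_spectrum(x, fs):
    """One-sided power spectrum via a direct DFT (O(N^2), fine for short records)."""
    n = len(x)
    half = n // 2
    freqs = [k * fs / n for k in range(half + 1)]
    power = []
    for k in range(half + 1):
        coeff = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        power.append(abs(coeff) ** 2 / n)
    return freqs, power

def spectral_features(x, fs):
    """Return (peak, median, mean) frequency in Hz."""
    freqs, power = power_spectrum(x, fs)
    total = sum(power)
    # Peak frequency: frequency of the largest spectral line.
    pf = freqs[max(range(len(power)), key=power.__getitem__)]
    # Mean frequency: power-weighted centroid of the spectrum.
    mnf = sum(f * p for f, p in zip(freqs, power)) / total
    # Median frequency: first line where cumulative power reaches half the total.
    cumulative = 0.0
    mdf = freqs[-1]
    for f, p in zip(freqs, power):
        cumulative += p
        if cumulative >= total / 2:
            mdf = f
            break
    return pf, mdf, mnf

# Synthetic two-tone signal (not EHG data): 1 Hz plus a weaker 3 Hz component.
fs = 20.0
x = [math.sin(2 * math.pi * 1.0 * i / fs) + 0.5 * math.sin(2 * math.pi * 3.0 * i / fs)
     for i in range(200)]
pf, mdf, mnf = spectral_features(x, fs)
print(pf, mdf, mnf)
```

In this toy signal the 1 Hz tone carries four times the power of the 3 Hz tone, so PF and MDF sit at 1 Hz while MNF is pulled upward by the weaker component.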

Features extracted from autoregressive (AR) model
AR is a time series model that uses observations from previous time steps as input to a regression equation to predict the value at the next time step:
$$x_m = \sum_{k=1}^{p} a_k\, x_{m-k} + e_m$$
where p is the order of the AR model, here p = 5. The coefficients a_1, a_2, a_3, a_4, a_5 and the residual e were the model features. e_m is the white noise.
... [33]. Tr was calculated from x and τ, where x is a time series with M samples, M = 24,000 (20 Hz × 60 s/min × 20 min), and τ is the time delay, here τ = 1.

Lyapunov exponent (LE)
LE characterizes the rate of separation between adjacent trajectories in the phase space. λ is a measure of how fast a trajectory converges from a given point into some other trajectory; in its definition, d_0 represents the Euclidean distance between two states of the system at some arbitrary time i.

Sample entropy (SampEn)
SampEn measures the irregularity of a time series of finite length. The more unpredictable the time series is, the higher its SampEn.
where u[·] is the Heaviside function, t is the limit for the distance between two points on the system trajectory, M is the number of trajectory points, and y is the EHG time series. M_C = 23,999 (M - 1, with M = 24,000).
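The idea that a more unpredictable series has a higher SampEn can be illustrated with a minimal sketch of the standard sample entropy estimator. The embedding dimension `m = 2` and tolerance `r = 0.2 × SD` are common defaults assumed here, not values reported in this study, and the test signals are synthetic.

```python
import math
import random

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy -ln(A/B) with Chebyshev distance and tolerance r = r_factor * SD."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    r = r_factor * sd

    def count_matches(length):
        # The same n - m template start points are used for both lengths.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if all(abs(x[i + k] - x[j + k]) <= r for k in range(length)):
                    count += 1
        return count

    b = count_matches(m)      # matching template pairs of length m
    a = count_matches(m + 1)  # pairs that still match at length m + 1
    if a == 0 or b == 0:
        return float("inf")
    return -math.log(a / b)

# Synthetic signals (not EHG data): a regular sine versus Gaussian noise.
rng = random.Random(0)
regular = [math.sin(2 * math.pi * i / 20) for i in range(300)]
noisy = [rng.gauss(0.0, 1.0) for _ in range(300)]
se_regular = sample_entropy(regular)
se_noisy = sample_entropy(noisy)
print(se_regular, se_noisy)
```

The periodic signal yields a low value because most matching templates keep matching one sample later, whereas for noise few matches survive the extension.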

Comparison of EHG features between term and preterm delivery
The mean ± SD of the derived EHG features were calculated across all the cases in the PE and TE group, and in the PL and TL group. A non-parametric test (Mann-Whitney U test) was performed using SPSS 22 (IBM Corporation, New York, United States) to assess the difference in EHG features between PE and TE, and between PL and TL. A p-value below 0.05 was considered statistically significant.
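The study used SPSS for this test; the sketch below only illustrates how the Mann-Whitney U statistic and an approximate two-sided p-value can be computed. It assumes a normal approximation without tie or continuity correction, so its p-values may differ slightly from those of a statistics package, and the sample values are made up for illustration.

```python
import math

def mann_whitney_u(a, b):
    """Return (U, two-sided p) using average ranks and a normal approximation."""
    combined = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n = len(combined)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # tied values share the mean 1-based rank
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(a), len(b)
    rank_sum_a = sum(rank for rank, (_, group) in zip(ranks, combined) if group == 0)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    u = min(u_a, n1 * n2 - u_a)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p

# Made-up feature values for two groups (not taken from the study).
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [6.0, 7.0, 8.0, 9.0, 10.0]
u_sep, p_sep = mann_whitney_u(group_a, group_b)
u_mix, p_mix = mann_whitney_u([1.0, 3.0, 5.0, 7.0], [2.0, 4.0, 6.0, 8.0])
print(u_sep, p_sep, u_mix, p_mix)
```

Completely separated groups give U = 0 and a small p-value, while interleaved groups give a p-value well above the 0.05 threshold.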

Term and preterm classification

Adaptive synthetic sampling approach (ADASYN)
The TPEHG dataset is not balanced in terms of sample size between term deliveries (majority class, 262 cases) and preterm deliveries (minority class, 38 cases). Classifiers are often more sensitive to the majority class and less sensitive to the minority class, leading to biased classification [27]. ADASYN was employed in this study to oversample the minority class (preterm) to balance the term and preterm samples [28]. As a result, the sample size of PE increased from 19 to 135 cases, and that of PL increased from 19 to 111 cases. In total, there were 278 cases in the PE and TE group, and 230 cases in the PL and TL group (Fig. 5).
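The oversampling step can be sketched as follows. This is a simplified illustration of the ADASYN idea, not the implementation used in the study: the neighbourhood size `k = 5`, the balance level `beta = 1.0`, the choice of interpolating towards nearest minority neighbours, and the random two-feature data are all assumptions.

```python
import math
import random

def adasyn(minority, majority, k=5, beta=1.0, seed=0):
    """Generate synthetic minority samples, more of them where majority points dominate."""
    rng = random.Random(seed)
    n_needed = int((len(majority) - len(minority)) * beta)
    everything = [(p, 0) for p in minority] + [(p, 1) for p in majority]

    # Step 1: share of majority points among the k nearest neighbours of each minority point.
    ratios = []
    for idx, x in enumerate(minority):
        others = [(math.dist(x, p), label)
                  for j, (p, label) in enumerate(everything) if j != idx]
        others.sort(key=lambda item: item[0])
        ratios.append(sum(label for _, label in others[:k]) / k)
    total = sum(ratios)
    if total == 0:
        weights = [1 / len(minority)] * len(minority)
    else:
        weights = [r / total for r in ratios]

    # Step 2: interpolate between each minority point and a random minority neighbour.
    synthetic = []
    for idx, x in enumerate(minority):
        n_new = round(weights[idx] * n_needed)
        neighbours = sorted((p for j, p in enumerate(minority) if j != idx),
                            key=lambda p: math.dist(x, p))[:k]
        for _ in range(n_new):
            z = rng.choice(neighbours)
            lam = rng.random()
            synthetic.append([xi + lam * (zi - xi) for xi, zi in zip(x, z)])
    return synthetic

# Random two-feature data with the PE/TE class sizes (19 vs 143); not EHG features.
data_rng = random.Random(42)
minority = [[data_rng.gauss(1.0, 0.5), data_rng.gauss(1.0, 0.5)] for _ in range(19)]
majority = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(143)]
synthetic = adasyn(minority, majority)
print(len(synthetic))
```

Because the per-sample counts are rounded, the number of synthetic cases is close to, but not necessarily equal to, the difference between the class sizes.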

Random forest
The feature matrices (31 features/case × 278 cases from the PE and TE group, and 31 features/case × 230 cases from the PL and TL group) were each divided into subsets 1 to n and fed randomly to the base learner DTs (tree-1, tree-2, ..., tree-m). The value of n was determined by the number of features. The number of features in each subset was chosen randomly but did not exceed the preset maximum. The value of m is the number of base learner DTs. The depth d determines the maximum layer each tree can reach. Each DT, which is applied to select features, is formed from a randomly selected subset of features. The feature importance is ranked based on its influence on the DT prediction.
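The ensemble idea described above (random feature subsets, m base trees, majority vote) can be sketched in a few lines. This is a deliberately reduced illustration and not the study's classifier: each tree is limited to depth d = 1 (a decision stump), and the values `n_trees = 25` and `max_features = 2`, the bootstrap resampling and the synthetic three-feature data are assumptions.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature_ids):
    """Depth-1 tree: the single threshold split with the fewest training errors."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in feature_ids:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not right:
                continue  # a threshold at the maximum value splits nothing off
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    if best is None:  # no valid split: always predict the majority label
        majority = Counter(labels).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, max_features=2, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]    # bootstrap sample of cases
        feats = rng.sample(range(n_features), max_features)  # random feature subset
        forest.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample], feats))
    return forest

def predict(forest, row):
    """Majority vote over all trees."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

# Synthetic data (not EHG features): feature 0 decides the label,
# feature 1 is a noisy copy of it, feature 2 is pure noise.
data_rng = random.Random(1)
rows, labels = [], []
for _ in range(60):
    x0 = data_rng.gauss(0.0, 1.0)
    rows.append([x0, x0 + data_rng.gauss(0.0, 0.3), data_rng.gauss(0.0, 1.0)])
    labels.append(1 if x0 > 0 else 0)
forest = fit_forest(rows, labels)
train_acc = sum(predict(forest, r) == y for r, y in zip(rows, labels)) / len(rows)
usage = Counter(f for f, _, _, _ in forest)  # how often each feature was chosen
print(train_acc, dict(usage))
```

Counting how often each feature is selected for a split is only a crude proxy for feature importance; full RF implementations typically rank features by the impurity reduction or accuracy loss attributable to them.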

Classification evaluation
Six-fold cross-validation was applied to evaluate the RF performance for classifying preterm and term delivery, independently for the PE and TE group and for the PL and TL group. Each group was randomly partitioned into six subsets, five of which were used to train the RF while the remaining one was used to test it. The cross-validation process was repeated six times, with each of the six subsets used once as test data. The accuracy (ACC), sensitivity and specificity [29] from the six-fold cross-validation were averaged to evaluate the performance of the RF classification, independently for the PE and TE group, and for the PL and TL group. The area under the curve (AUC) from the receiver operating characteristic (ROC) curve was also calculated and compared between the PE and TE group and the PL and TL group.
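The evaluation procedure can be sketched as below. The helper names, the use of label 1 for preterm delivery and the small prediction vectors are assumptions for illustration; only the partition size of 278 cases is taken from the text.

```python
import random

def k_fold_indices(n, k=6, seed=0):
    """Randomly partition indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def binary_metrics(y_true, y_pred):
    """ACC, sensitivity and specificity, with label 1 (assumed: preterm) as positive."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "acc": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def auc(y_true, scores):
    """AUC as the probability that a positive case scores above a negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Partition the 278 cases of the PE and TE group into six folds.
folds = k_fold_indices(278, k=6)
for test_fold in folds:
    train_idx = [i for fold in folds if fold is not test_fold for i in fold]
    # A classifier would be trained on train_idx and tested on test_fold here.

# Made-up labels, predictions and scores to show the metric calculations.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6]
metrics = binary_metrics(y_true, y_pred)
area = auc(y_true, scores)
print(metrics, area)
```

In the study the per-fold ACC, sensitivity and specificity are averaged over the six test folds; the sketch shows the calculation for a single set of predictions.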

Results

Comparison of EHG features between groups of term and preterm delivery
The 31 EHG features from the PE and TE group and the PL and TL group are summarized in Table 2. PF; SV2, SV3, SV4, SV5, SE3, SE4, SE5, SM3, SM4, SM5, SS2, SS3, SS4 and SS5 of the wavelet decomposition; a1 of the AR model; and CorrDim from PE were significantly larger than those from TE (all p < 0.05), while RMS, MNF and SampEn from PE were significantly smaller than those from TE (all p < 0.05). SampEn of PL was significantly larger than that of TL (p < 0.05). No other significant difference was found. The features with significant differences are shown in Fig. 6.

Feature importance
Table 3 shows the 15 key features that were identified as the best features for recognizing preterm delivery in both the PE and TE group and the PL and TL group. The features whose importance was below 0.1 % were a2, SM3, SV3, SV4 and SS3 in the PE and TE group, and a2, a3, SE5, SM5, SV4 and SV5 in the PL and TL group. SampEn, MDF, MNF, SE4, SM2 and SM4 played important roles in the classification of preterm and term delivery in both the PE and TE and the PL and TL groups. In particular, SampEn accounted for nearly 70 % of the importance for recognizing preterm delivery.
Fig. 6 - EHG features from PE and TE, PL and TL groups with significant difference in median (p < 0.05).
Biocybernetics and Biomedical Engineering 40 (2020) 1-11

Evaluation of RF classifier
ROC curves for classifying preterm delivery in the PE and TE group and in the PL and TL group are shown in Fig. 7. There was no significant difference between the two AUCs from the ROC curves (p = 0.70). As shown in Table 4, RF achieved an ACC of 0.92, sensitivity of 0.88, specificity of 0.96 and AUC of 0.88 for the PE and TE group, and an ACC of 0.93, sensitivity of 0.89, specificity of 0.97 and AUC of 0.80 for the PL and TL group.

Discussion
Table 4 summarizes the performance of the RF model in this study in terms of ACC, sensitivity, specificity and AUC, in comparison with previously published papers using the TPEHG database [8,9,11,16-19,21,28]. All the studies achieved over 80 % ACC and sensitivity. Compared with other studies using the TPEHG database, the current study extracted a more comprehensive set of EHG features, comprising 27 linear and 4 nonlinear features. The RF classifier, which has low computational complexity, gave promising results without an additional feature pre-selection step, using signals from a wider pass band of 0.08-4 Hz.
The feature importance was ranked by RF based on classification accuracy. After the importance of the different features was ranked by DT, SampEn was found to be the most important feature for recognizing preterm delivery. Previous studies concluded that nonlinear methods such as sample entropy [9,20], approximate entropy [8,20] and Shannon entropy [17] can provide better discrimination between pregnancy and labor contractions than linear methods [34]. This is probably because entropy reflects the complex and nonlinear dynamic interactions between myometrium cells [8,23]. SampEn was considered to be particularly suitable for revealing EHG changes in relation to pregnancy progression and labor [33].
As previous studies have illustrated [17,19], the RF classifier can give promising results. The performance of recognizing preterm delivery was influenced by the cut-off frequency of the filter and by the extracted features. Jager et al. [28] obtained the highest classification ACC of 100 % with features from the frequency band of 0.08-5 Hz when using the entire records of the TPEHG database. Most studies used specific features [9,11,30] or selected features [8,16-18,21] for the prediction of preterm delivery, while RF used the extracted features without an additional feature selection algorithm. Similar to the other studies in Table 4, the current study extracted features from the entire records because there were no annotated contraction intervals, or even no contractions, during early recordings.
Recently, various features and classifiers have been proposed to recognize uterine contraction (UC) with the Icelandic 16-electrode database [20,31-33]. As UC detection is necessary for monitoring labor progress, some studies extracted features from EHG bursts [20,32,33] and achieved reliable UC detection with machine learning and deep learning algorithms [35-37]. A multi-channel system for recognizing uterine activity with EHG signals has also been developed in clinical research [38]. These approaches also offer useful routes to recognizing preterm delivery from UC.
The ADASYN technique was applied to address the problem of unbalanced data in our study, whereas the synthetic minority oversampling technique (SMOTE) has been employed in previous studies [16-18,23]. Compared with ADASYN, the synthetic samples generated by SMOTE may increase the likelihood of data overlapping, which will not provide more useful information [12,27]. ADASYN achieved better results for the classification of preterm delivery in the current study.
The present work has the following limitations. The synthetic data generated by ADASYN are less convincing than clinically collected EHG data. More clinical EHG signals are essential, in particular from preterm deliveries. Although a comprehensive set of EHG features was studied, sixteen of them came from wavelet decomposition coefficients. Therefore, the AAR model [9], the EMD technique [17], multivariate multiscale entropy features [8] and the combination of multi-channel EHG signals [5,11,39] could be investigated to improve the prediction of preterm delivery [39]. Nevertheless, as a pilot study, the positive results from using channel 3 were the first step in evaluating the effectiveness of an RF model. Furthermore, a comparison of different classifiers for recognizing preterm delivery could be considered in future studies.

Conclusion
In the current study, sample entropy played the most important role in recognizing preterm delivery among the 31 extracted features. The RF classifier was a promising method that required no additional feature selection steps. EHG signals recorded before the 26th week of gestation achieved results similar to those recorded after the 26th week. This study may be helpful for the early prediction of preterm delivery and for early clinical intervention.