
1 Introduction

Pain is a valuable indicator that calls for medical attention. A widely accepted definition considers pain as an unpleasant sensory and emotional experience associated with actual or potential tissue damage, or described in terms of such damage [1]. Pain helps identify harmful situations, avoid tissue damage, and promote healing [2]. However, each individual learns the meaning of pain through previous experiences related to injury, which can confound subjective pain ratings [3], so the criteria for evaluating pain can differ from person to person. Consequently, pain is difficult to assess accurately because of its subjectivity.

As reported in [4], pain is a complex and not entirely understood phenomenon. Self-report is considered the gold standard, but it is not always reliable and valid [5]. For instance, assessing pain in children depends on their cognitive development, the clinical context, and the type of pain [6], so the awareness of pain may vary among children of different ages. For the elderly, effective pain evaluation can also be a challenge [7]. Moreover, pain cannot be self-reported in some situations. For example, neonates cannot use words to describe their pain [8], and individuals who cannot clearly perceive or express their feelings, such as patients with dementia [9], cannot report their pain either. An objective system for automatically recognizing pain can help detect pain in such situations, which in turn helps improve people's physical health and mental well-being.

A number of machine learning methods have been developed for pain recognition, such as random forest [11] and support vector machines (SVM) [10]. However, these classifiers depend on hand-selected features and empirical knowledge, and therefore struggle with complex tasks. Deep learning is now well known to provide accurate recognition or prediction for complex data science problems [12], as it can discover hidden patterns in data and model complex relationships between variables.

In light of the above, we explore four tree-based classification models: eXtreme Gradient Boosting (XGBoost) [13], random forest, Adaptive Boosting (AdaBoost) [14], and TabNet, a recently proposed deep-learning-based ensemble of decision trees [15]. To improve classification accuracy, we extract new electrodermal activity (EDA) and electrocardiogram (ECG) features from the physiological signals. The main contributions of this work are: 1) novel features extracted from biomedical signals, which boost current models in pain assessment; 2) the application of TabNet, the first attempt to use this model for pain detection; and 3) the outstanding performance of TabNet on most detection tasks compared with existing models.

2 Related Work

In this section, we introduce previous studies related to this work. Those with the best reported performance are compared with our method in Sect. 4.2.

Early research in pain assessment mainly focused on the fusion of multi-modal signals, involving facial expressions and biomedical signals from the BioVid Heat Pain Database (BVDB) (Part-A) [16]. For example, Werner et al. [17] applied a random forest classifier to multi-modal signals to detect the pain level, and Kächele et al. [18, 19] used random forest classifiers for continuous prediction of pain intensity. However, facial expression-based pain recognition requires tracking the facial regions, which can be cumbersome and complex in clinical applications. Pain research indicates that pain significantly interacts with the autonomic nervous system and thus leads to changes in EDA and heart rate [20]. BVDB records the following bio-physiological signals: 1) EDA, also referred to as skin conductance (SC) or galvanic skin response (GSR), which measures changes in the electrical properties of the skin and is strongly associated with emotional arousal [16]; 2) ECG, which records the electrical activity of the heart and provides a significant amount of information about heart function [21]; and 3) electromyogram (EMG), which records the electrical activity of muscles; activity of the trapezius muscle indicates a high stress level, which is to be expected during pain stimulation [16].

Given the correlation between physiological signals and pain, recent pain assessment research has shifted its focus to physiological signals alone. Gruss et al. [22] extracted a total of 159 features and used SVM classifiers for binary pain classification. Deep learning models on physiological signals have also shown promising results in pain recognition. Lopez-Martinez et al. [23] fed EDA and ECG features into multi-task neural networks with two hidden layers, one shared and one person-specific, and showed that they perform better than single-task neural networks. Wang et al. [24] proposed hybrid classifiers based on deep recurrent neural networks to classify pain intensity: a bidirectional Long Short-Term Memory (LSTM) network learns the temporal dynamics of the physiological signals, and its output is fused with handcrafted features. Thiam et al. [25] proposed a multi-modal information aggregation approach based on Deep Denoising Convolutional Auto-Encoders for pain assessment on two pain databases, including the BioVid Heat Pain Database. Thiam et al. [26] designed a Convolutional Neural Network (CNN) on physiological signals (EDA, ECG, and EMG) for pain classification. Pouromran et al. [27] computed features from EDA, ECG, and EMG signals and trained machine learning models (Linear Regression, Support Vector Regression, Neural Network, and XGBoost) on these features for pain intensity estimation. Subramaniam et al. [28] proposed a hybrid CNN-LSTM classifier based on ECG and EDA signals for binary pain detection.

In [26], pain classification with EDA signals was much more accurate than with the other signals or with the fusion of EDA and ECG signals. In this work, however, our new model led to different results. For ECG signals, our extracted features provide additional information that improves pain detection. For instance, in the classification task \(B_0\) vs \(P_4\) with a random forest classifier, Kächele et al. [18, 19] reported an accuracy of 53.90% and Werner et al. [17] 62%, whereas we obtained an average accuracy of 67.18% using our new ECG features. Furthermore, TabNet with ECG features improved the accuracy to 81.12%.

3 Proposed Model

Fig. 1. (a) Pain stimulation, represented by the heat levels from \(T_0\) to \(T_4\). The features are extracted from the blue window of length 5.5 s. (b) The standard ECG signal with P, Q, R points and amplitudes.

3.1 BioVid Heat Pain Database (BVDB)

The BioVid Heat Pain Database [16] contains multidimensional datasets, both video signals and biopotentials, which can help advance automated pain recognition systems. The data were collected from 90 subjects aged 18 to 65 years, using a thermode on the right arm for pain elicitation. The experimenters heated the participants in random order at four calibrated intensities (\(T_1, T_2, T_3, T_4\)), and each stimulus was held for up to 4 s (Fig. 1a). This yields four pain intensity levels (i.e., \(P_1, P_2, P_3, P_4\)), each of which was applied 20 times. Between stimuli, the temperature was kept at the baseline (\(T_0\)) for about 8 s to 12 s, which corresponds to no pain (\(B_0\)). Each sample is a 5.5 s window starting 1 s after stimulus onset and contains 2816 data points (i.e., a sampling rate of 512 Hz). Consequently, the experimenters retained 20 samples/class \(\times \) 5 classes \(=100\) samples per subject.
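The window arithmetic above can be made concrete with a short sketch. It assumes a continuous recording with a known stimulus-onset sample index; `extract_window` is a hypothetical helper for illustration, not part of BVDB or the authors' code, and the 512 Hz rate is derived from 2816 points over 5.5 s rather than quoted.

```python
import numpy as np

# Values stated in the text; the sampling rate is derived, not quoted.
WINDOW_S = 5.5             # window length in seconds
DELAY_S = 1.0              # window starts 1 s after stimulus onset
N_POINTS = 2816            # points per window
FS = N_POINTS / WINDOW_S   # implied sampling rate: 512.0 Hz

SAMPLES_PER_SUBJECT = 20 * 5  # 20 repetitions x 5 classes (B0, P1..P4)


def extract_window(signal: np.ndarray, onset_idx: int, fs: float = FS) -> np.ndarray:
    """Hypothetical helper: cut the 5.5 s window that starts 1 s after onset_idx."""
    start = onset_idx + int(round(DELAY_S * fs))
    stop = start + int(round(WINDOW_S * fs))
    if stop > len(signal):
        raise ValueError("recording too short for a full window")
    return signal[start:stop]
```

For example, `extract_window(recording, onset_idx=100)` returns the 2816 samples from index 612 onwards.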

3.2 Feature Extraction

BVDB consists of five parts, Parts A to E [29]. Besides Part-A, we also evaluated our methods on Part-B with EDA and ECG signals. Before extracting features from the ECG signals, we applied a Butterworth band-pass filter with a pass band of [0.1, 250] Hz to remove muscle noise, baseline wander, and other interference [30]. In Part-B, the EDA signals of 6 samples contain a large number of null values, and the corresponding ECG signals are also invalid. We therefore removed these 6 samples, all of which belong to the no-pain class (\(B_0\)), from our experiments.
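The preprocessing described above can be sketched with SciPy. The paper gives only the pass band, so the filter order (4) and the zero-phase forward-backward filtering are assumptions; at the implied 512 Hz sampling rate, the 250 Hz upper edge lies just below the 256 Hz Nyquist frequency. `drop_invalid` is a hypothetical helper that treats NaN values as the "nulls" mentioned in the text.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 512.0  # sampling rate implied by 2816 points per 5.5 s window


def bandpass_ecg(ecg, low_hz=0.1, high_hz=250.0, fs=FS, order=4):
    """Zero-phase Butterworth band-pass over [low_hz, high_hz] Hz.

    The order and the forward-backward (sosfiltfilt) scheme are assumptions.
    Second-order sections keep the very low 0.1 Hz edge numerically stable.
    """
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, np.asarray(ecg, dtype=float), axis=-1)


def drop_invalid(eda, ecg, labels):
    """Hypothetical helper: drop samples whose EDA window contains NaN values."""
    eda, ecg, labels = np.asarray(eda), np.asarray(ecg), np.asarray(labels)
    valid = ~np.isnan(eda).any(axis=1)
    return eda[valid], ecg[valid], labels[valid]
```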

In addition to the EDA and ECG features proposed in [23], we designed new features. We observed that some EDA signals oscillate like a sine function while others do not, so we counted the crests and troughs and related the first and last values of each window to its minimum and maximum. These four SC features were appended. For ECG, the signal is mainly characterized by its P, R, and T points: an ECG signal consists of the P-QRS-T waves, and the P, R, and T points are the peaks of the P, R, and T waves (Fig. 1b) [31]. We identified falsely detected R points by visual inspection and removed them based on their amplitudes, because such points can seriously degrade the precision of subsequent feature extraction. We then extracted information about the R, P, and T points as ECG features.

Skin Conductance (SC). We computed the 12 SC features from [23]: (1) maximum; (2) range; (3) standard deviation; (4) interquartile range; (5) root mean square; (6) mean; (7) mean absolute value of the first differences (Mav1d): \(\frac{1}{N-1} \sum _{i=1}^{N-1}|x_{i+1} - x_{i}|\); (8) mean absolute value of the second differences: \(\frac{1}{N-2} \sum _{i=1}^{N-2}|x_{i+2} - x_{i}|\); (9) mean absolute value of the first differences of the standardized signal \(\tilde{x} = \frac{x-\mathrm{mean}(x)}{\mathrm{std}(x)}\); (10) mean absolute value of the second differences of the standardized signal; (11) skewness; and (12) kurtosis. In addition, we appended four novel features: (13) the difference among the maximum, the first state, and the last state (Max1n): \(\max(x) - x_1 - x_{N}\); (14) the number of troughs; (15) the difference among the last state, the first state, and the minimum (N1Min): \(x_{N} - x_1 - \min(x)\); and (16) the number of crests. The abbreviations of the SC features are summarized in Table 1.
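The 16 SC features can be computed in a few lines of NumPy/SciPy. This is a sketch under stated assumptions: the paper does not say how crests and troughs are counted, so local maxima/minima from `scipy.signal.find_peaks` (with an optional, hypothetical `prominence` threshold to suppress noise) stand in for that step, and kurtosis uses SciPy's default excess (Fisher) definition.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import kurtosis, skew


def sc_features(x, prominence=None):
    """Features (1)-(16) for one SC window x (assumes x is not constant)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()          # standardized signal for (9), (10)
    d1 = np.abs(np.diff(x))               # |x_{i+1} - x_i|
    d2 = np.abs(x[2:] - x[:-2])           # |x_{i+2} - x_i|
    q75, q25 = np.percentile(x, [75, 25])
    crests, _ = find_peaks(x, prominence=prominence)    # local maxima (assumption)
    troughs, _ = find_peaks(-x, prominence=prominence)  # local minima (assumption)
    return {
        "max": x.max(),                                  # (1)
        "range": x.max() - x.min(),                      # (2)
        "std": x.std(),                                  # (3)
        "iqr": q75 - q25,                                # (4)
        "rms": np.sqrt(np.mean(x ** 2)),                 # (5)
        "mean": x.mean(),                                # (6)
        "mav1d": d1.mean(),                              # (7)
        "mav2d": d2.mean(),                              # (8)
        "mav1d_std": np.abs(np.diff(z)).mean(),          # (9)
        "mav2d_std": np.abs(z[2:] - z[:-2]).mean(),      # (10)
        "skewness": skew(x),                             # (11)
        "kurtosis": kurtosis(x),                         # (12)
        "max1n": x.max() - x[0] - x[-1],                 # (13) Max1n
        "n_troughs": len(troughs),                       # (14)
        "n1min": x[-1] - x[0] - x.min(),                 # (15) N1Min
        "n_crests": len(crests),                         # (16)
    }
```

For the toy window `[0, 1, 0, 1, 0]`, this gives two crests, one trough, Max1n = 1, and N1Min = 0; note that `find_peaks` never counts the window endpoints.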

Electrocardiogram. First, we located the P, R, and T points and kept their amplitudes and positions. The R points were detected with the Pan-Tompkins algorithm [32]. After locating the R peaks, the T-wave search region was defined as the interval from \(R(i) + 0.16 \times RR(i+1)\) to \(R(i) + 0.57 \times RR(i+1)\) [21], where \(R(i)\) is the position of the \(i^{th}\) R point and \(RR(i+1) = R(i+1) - R(i)\) is the corresponding RR interval. The peak of the phase signal within this segment was taken as the T point. Finally, P waves were demarcated in a similar way, from \(R(i) + 0.7 \times RR(i+1)\) to \(R(i) + 0.07 \times RR(i+1)\) [21], with the range modified to suit the BVDB data. The P, R, and T points were located correctly; a sample ECG signal is shown in Fig. 2.
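The T-point search can be illustrated as follows, assuming R-peak indices are already available (for example from a Pan-Tompkins detector, which is not reimplemented here). The phase signal is assumed to follow the phasor transform, in which each sample becomes the phasor \(R_v + j\,x[n]\) with phase \(\arctan(x[n]/R_v)\); the constant `rv` is a hypothetical choice. Because arctan is monotonic, the argmax of the phase inside a window coincides with the argmax of the raw signal; the transform matters mainly for enhancing low-amplitude waves when thresholds are used.

```python
import numpy as np


def phasor_phase(ecg, rv=0.1):
    """Phase of the phasor rv + j*ecg[n] (assumed form of the phasor transform)."""
    return np.arctan2(np.asarray(ecg, dtype=float), rv)


def locate_t_points(ecg, r_idx, lo=0.16, hi=0.57, rv=0.1):
    """Return one T-point index per RR interval, searched in
    [R(i) + lo*RR(i+1), R(i) + hi*RR(i+1)] with RR(i+1) = R(i+1) - R(i)."""
    phase = phasor_phase(ecg, rv)
    r_idx = np.asarray(r_idx, dtype=int)
    t_idx = []
    for i in range(len(r_idx) - 1):
        rr = r_idx[i + 1] - r_idx[i]
        start = r_idx[i] + int(round(lo * rr))
        stop = r_idx[i] + int(round(hi * rr))
        segment = phase[start:stop + 1]          # inclusive search window
        t_idx.append(start + int(np.argmax(segment)))
    return np.array(t_idx, dtype=int)
```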

Fig. 2. Raw ECG signal of a sample with detected T, P, Q, R, S points.

Second, we calculated the following ECG features. Four of the five ECG features proposed in [23] are included in our work. They are (17) the mean of the IBIs; (18) the root mean square of the successive differences (RMSSD); (19) the mean of the standard deviations of the IBIs (SDNN); (20) the mean of the slope of the linear regression of the IBIs over time; and (21) the ratio of SDNN to RMSSD. The new features that we computed are (22) the number of R peaks in the 5.5 s window (Fig. 1a); (23) the range of R amplitudes; (24) the standard deviation of R amplitudes; (25) the mean PT duration (Fig. 1b): \(\frac{1}{M} \sum _{i=1}^{M}(T(i) - P(i))\), where M is the minimum of the number of P points and the number of T points, and T(i) and P(i) are the positions of the \(i^{th}\) T point and P point; (26) the root mean square of the successive differences of the PT duration (PTxRMSSD); (27) the mean of the standard deviation of the PT duration (PTxSDNN); (28) the ratio of PTxSDNN to PTxRMSSD; (29) the mean PT amplitude difference: \(\frac{1}{M} \sum _{i=1}^{M}(Amplitude(T_i) - Amplitude(P_i))\); (30) the root mean square of the successive differences of the PT amplitude difference (PTyRMSSD); (31) the mean of the standard deviation of the PT amplitude difference (PTySDNN); and (32) the ratio of PTySDNN to PTyRMSSD. They are summarized in Table 1.
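Given located R, P, and T points, the ECG features reduce to simple statistics. The sketch below makes three assumptions not stated in the text: the "mean of the standard deviations" in (19), (27), and (31) is computed as a single population standard deviation within one window; durations are converted to seconds with the sampling rate; and the IBI slope regresses each interval on the time of its closing beat. All function names are illustrative.

```python
import numpy as np


def _rmssd_sdnn(v):
    """RMSSD and population standard deviation (ddof=0) of a 1-D series."""
    v = np.asarray(v, dtype=float)
    rmssd = float(np.sqrt(np.mean(np.diff(v) ** 2)))
    sdnn = float(np.std(v))
    return rmssd, sdnn


def r_peak_features(r_idx, r_amp, fs=512.0):
    """Features (17)-(24) from R-peak positions (in samples) and amplitudes."""
    r_idx = np.asarray(r_idx, dtype=float)
    r_amp = np.asarray(r_amp, dtype=float)
    ibi = np.diff(r_idx) / fs                 # inter-beat intervals in seconds
    rmssd, sdnn = _rmssd_sdnn(ibi)
    beat_t = r_idx[1:] / fs                   # time of each interval's closing beat
    slope = np.polyfit(beat_t, ibi, 1)[0]     # linear trend of the IBIs
    return {
        "mean_ibi": float(ibi.mean()),                        # (17)
        "rmssd": rmssd,                                       # (18)
        "sdnn": sdnn,                                         # (19), one window
        "ibi_slope": float(slope),                            # (20)
        "sdnn_rmssd": sdnn / rmssd if rmssd > 0 else np.nan,  # (21)
        "n_r_peaks": int(len(r_idx)),                         # (22)
        "r_amp_range": float(r_amp.max() - r_amp.min()),      # (23)
        "r_amp_std": float(r_amp.std()),                      # (24)
    }


def pt_features(p_idx, t_idx, p_amp, t_amp, fs=512.0):
    """Features (25)-(32): PT duration (x) and PT amplitude difference (y)."""
    m = min(len(p_idx), len(t_idx))                          # M in the text
    dur = (np.asarray(t_idx[:m], float) - np.asarray(p_idx[:m], float)) / fs
    amp = np.asarray(t_amp[:m], float) - np.asarray(p_amp[:m], float)
    feats = {}
    for name, v in (("PTx", dur), ("PTy", amp)):
        rmssd, sdnn = _rmssd_sdnn(v)
        feats[f"{name}_mean"] = float(v.mean())              # (25) / (29)
        feats[f"{name}RMSSD"] = rmssd                        # (26) / (30)
        feats[f"{name}SDNN"] = sdnn                          # (27) / (31)
        feats[f"{name}_ratio"] = sdnn / rmssd if rmssd > 0 else np.nan  # (28) / (32)
    return feats
```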

Table 1. Extracted features

3.3 Methodology

The main goal is to discriminate between no pain (\(B_0\)) and the presence of pain (\(P_1\), \(P_2\), \(P_3\), \(P_4\)) based on physiological signals. For this, we establish four binary classification tasks. For each task, we extract features from the EDA and ECG signals, i.e., we create tabular EDA and ECG features. Based on these features, we explore classifiers built on decision trees (DTs): random forest, AdaBoost, XGBoost, and TabNet. Ensemble methods generally yield better results than single DTs, which is why we chose tree-based ensemble learners for tabular data. Random forest [11] is an ensemble of decorrelated decision trees. AdaBoost [14] is a classic boosting algorithm; boosting is a general ensemble method that builds a strong classifier from an ensemble of weak learners. XGBoost [13] is an ensemble DT approach that also follows the boosting principle. TabNet [15] integrates a deep neural network with DT-like decision making. Its architecture contains a batch normalization layer applied to the raw input features and several feature transformer blocks that learn relevant feature representations. It also uses a sequential attention mechanism with learnable masks to choose which features to process at each decision step. This characteristic enables efficient learning, because the model's learning capacity is spent on the most salient features. Finally, we select the best combination of signals, features, and models based on their performance.
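TabNet's feature selection can be illustrated with a small NumPy sketch. Following the TabNet paper [15], the mask at step \(i\) is \(M[i] = \mathrm{sparsemax}(P[i-1] \cdot h_i(a[i-1]))\), and the prior scale is updated as \(P[i] = P[i-1] \cdot (\gamma - M[i])\), so features used heavily in earlier steps are discouraged later. Here the learned transform \(h_i\) is replaced by given logits; the function names are hypothetical, and this is not the pytorch-tabnet implementation.

```python
import numpy as np


def sparsemax(z):
    """Euclidean projection of z onto the probability simplex (last axis)."""
    z = np.asarray(z, dtype=float)
    z_sorted = -np.sort(-z, axis=-1)                 # descending order
    k = np.arange(1, z.shape[-1] + 1)
    cumsum = np.cumsum(z_sorted, axis=-1)
    support = 1.0 + k * z_sorted > cumsum            # true on a prefix of sorted entries
    k_z = support.sum(axis=-1, keepdims=True)        # size of the support
    tau = (np.take_along_axis(cumsum, k_z - 1, axis=-1) - 1.0) / k_z
    return np.maximum(z - tau, 0.0)                  # sparse, sums to 1


def attentive_masks(logits_per_step, gamma=1.3):
    """Sequential masks with prior-scale relaxation gamma (logits stand in for h_i)."""
    prior = np.ones(np.shape(logits_per_step[0])[-1])
    masks = []
    for logits in logits_per_step:
        mask = sparsemax(prior * np.asarray(logits, dtype=float))
        masks.append(mask)
        prior = prior * (gamma - mask)               # down-weight reused features
    return masks
```

With `gamma=1.0`, a feature that receives the full mask weight in one step gets a prior of zero and cannot be selected again; larger values of `gamma` allow features to be reused across steps.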

4 Experiments

4.1 Setup

On the Part-A and Part-B datasets, we conducted four binary classification tasks that discriminate the no-pain condition (\(B_0\)) from the pain conditions (\(P_1, P_2, P_3, P_4\)), i.e., \(B_0\) vs. \(P_1\), \(B_0\) vs. \(P_2\), \(B_0\) vs. \(P_3\), and \(B_0\) vs. \(P_4\). Since Part-A includes 87 subjects and Part-B 86 subjects, each Part-A task has \(87 \times 20 \times 2 = 3480\) samples and each Part-B task has \((86 \times 20 - 6) + 86 \times 20 = 3434\) samples. We used the PyTorch implementation of TabNet [15], which is available at [33]. We evaluated the models' performance with stratified 10-fold cross-validation, because the Part-B data are imbalanced due to the 6 invalid samples removed from the no-pain level (\(B_0\)). To improve the models' performance, we used grid search to select the best hyperparameters from the following candidates:

  • Random forest: n_estimators ranges from 60 to 280 in steps of 20.

  • AdaBoost: With a decision tree as base_estimator, the search selects max_depth \(\in \{10, 11\}\), n_estimators from the range [90, 240], and learning_rate from {0.001, 0.01, 0.1, 0.5, 1}.

  • XGBoost: With max_depth=7, the value of n_estimators is selected from {100, 200, 300, 400, 500}, learning_rate from {0.01, 0.1, 0.2, 0.3, 0.7}, and gamma from {0.1, 0.2, 1, 2, 5}.

  • TabNet: For TabNetClassifier [15], the batch_size is either 8% or 10% of the training set size. For the learning-rate scheduler, we set gamma=0.9 and step_size=10. A validation-based early stopping strategy was employed.
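The random forest search above can be reproduced in outline with scikit-learn. This is a minimal sketch assuming a feature matrix `X` and binary labels `y` for one task; shuffling and the fixed seed are assumptions, and the XGBoost grid is kept only as data (its keys are standard `XGBClassifier` parameters) so that the block does not depend on the xgboost package.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Candidate grids from the list above.
RF_N_ESTIMATORS = list(range(60, 281, 20))   # 60, 80, ..., 280
XGB_GRID = {
    "max_depth": [7],
    "n_estimators": [100, 200, 300, 400, 500],
    "learning_rate": [0.01, 0.1, 0.2, 0.3, 0.7],
    "gamma": [0.1, 0.2, 1, 2, 5],
}


def tune_random_forest(X, y, n_estimators=RF_N_ESTIMATORS, n_splits=10, seed=0):
    """Grid-search n_estimators with stratified k-fold CV; return (params, mean accuracy)."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid={"n_estimators": list(n_estimators)},
        scoring="accuracy",
        cv=cv,
    )
    search.fit(np.asarray(X), np.asarray(y))
    return search.best_params_, search.best_score_
```

Stratification keeps the slight \(B_0\) imbalance of Part-B equal across folds. Reporting `best_score_` reuses the folds that were used for tuning, so it can be slightly optimistic; nested cross-validation would separate model selection from evaluation.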

4.2 Results and Discussion

In this section, we present and discuss the classification results on Part-A and Part-B. Since this is the first attempt to apply machine learning algorithms to Part-B, we compare our work with previous studies only on the physiological signals of Part-A.

  • Table 2 summarizes the pain detection performance of the four models on Part-A. With EDA, the best accuracies for the four tasks are 65.57% for \(B_0\) vs. \(P_1\), 68.39% for \(B_0\) vs. \(P_2\), 76.15% for \(B_0\) vs. \(P_3\), and 85.23% for \(B_0\) vs. \(P_4\). TabNet consistently outperforms the random forest and AdaBoost classifiers. For the lowest pain level (\(P_1\)), TabNet is superior to XGBoost, whereas for the higher pain levels (\(P_2\), \(P_3\), and \(P_4\)), XGBoost performs better than TabNet. With ECG, TabNet consistently outperforms all other models, achieving 72.18%, 71.81%, 77.04%, and 81.12% accuracy for the four tasks, respectively. With the fusion of EDA and ECG signals, TabNet achieves the best results overall, with accuracies of 75.71%, 83.97%, 88.93%, and 94.51% for the four tasks, respectively.

  • Table 3 presents the results of the four models on the four classification tasks (\(B_0\) vs. \(P_1\), \(B_0\) vs. \(P_2\), \(B_0\) vs. \(P_3\), \(B_0\) vs. \(P_4\)) on Part-B. With EDA signals, the best accuracies are 61.18%, 64.83%, 69.22%, and 79.22%, respectively. TabNet performs better than the other classifiers on the lowest pain level (\(P_1\)), while XGBoost achieves the best accuracy on the three higher pain levels (\(P_2, P_3, P_4\)). TabNet performs best on the remaining eight tasks (ECG and EDA+ECG), with best accuracies of 69.19%, 74.34%, 76.82%, and 82.85% with ECG signals, and 75.24%, 78.74%, 82.44%, and 88.44% with the fusion of EDA and ECG. These results follow the same trend as on Part-A.

Table 2. Pain recognition with physiological signals of Part-A. Mean accuracies are reported for stratified 10-fold cross-validation, for representative models and binary classification tasks (Mean% ± Standard Deviation%).
Table 3. Pain recognition with physiological signals of Part-B. Mean accuracies are reported for stratified 10-fold cross-validation, for representative models and binary classification tasks (Mean% ± Standard Deviation%).
Table 4. Performance comparison of the binary classification task \(B_0\) vs. \(P_4\)
Table 5. Comparing the best binary classification accuracy of each study on Part-A

In Table 4, we compare our results with previous studies on single signals of Part-A for classifying no pain versus the highest pain level. For the EDA signal, Thiam et al. [26] applied a CNN and obtained an average accuracy of 84.57%, the highest among previous work. With XGBoost, we achieved an accuracy of 85.23%, and TabNet also outperforms most previous work. For the ECG signal, TabNet with our selected features yields 72.18%, 71.81%, 77.04%, and 81.12% accuracy for the four tasks, outperforming the current best model [28] (68.7%, 62.61%, 67.86%, and 75.21%). The performance with ECG signals is thus substantially improved. These results show that ECG signals are important for pain detection, with performance comparable to that of EDA signals; this finding has not been reported previously.

Table 5 compares the best binary classification accuracy reported by previous studies with ours for discriminating no pain (\(B_0\)) from pain tolerance (\(P_4\)) on Part-A. According to the results in [17,18,19], facial expression (video) contributes little to pain detection. In contrast, physiological signals, especially EDA and ECG, boost pain detection. Using the fusion of EDA and ECG signals, TabNet achieves the best pain detection result, with an accuracy of 94.51%.

To investigate the contribution of the EDA and ECG features to the classification of no pain versus pain tolerance, we plot the importance of the features extracted from the fusion of EDA and ECG for XGBoost (Fig. 3a) and TabNet (Fig. 3b). Based on XGBoost's feature importances, the extracted EDA features contribute 65% to the classification and the ECG features 35%. (The computation is not presented explicitly here due to space limits.) Moreover, the 4 new EDA features (i.e., features #13–16 in Table 1) and the 10 ECG features #23–32 in Table 1 together contribute 37.48% to the classification. For TabNet, the extracted EDA and ECG features contribute 61.15% and 38.85%, respectively, while these same 14 features contribute 39.76%. This comparison reveals that 1) in addition to EDA, ECG is important for pain detection; and 2) previous research has overlooked important features.
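Group contributions such as those above can be obtained by summing per-feature importances over a group and normalizing by the total. The following is one plausible way to compute such shares, assuming a 32-element importance vector in Table 1 order (for example, the `feature_importances_` attribute exposed by both XGBoost's scikit-learn wrapper and pytorch-tabnet's classifiers); `group_share` is a hypothetical helper.

```python
import numpy as np

# Feature numbers as in Table 1 (1-based).
EDA_FEATURES = list(range(1, 17))                        # features #1-16
ECG_FEATURES = list(range(17, 33))                       # features #17-32
NEW_SUBSET = list(range(13, 17)) + list(range(23, 33))   # features #13-16 and #23-32


def group_share(importances, group):
    """Percentage of total importance carried by the given 1-based feature numbers."""
    imp = np.asarray(importances, dtype=float)
    idx = np.asarray(group) - 1                           # convert to 0-based indices
    return 100.0 * imp[idx].sum() / imp.sum()
```

With uniform importances, EDA and ECG would each receive 50%, and the 14-feature subset 43.75%; the measured shares reported above differ from these baselines.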

Fig. 3. The importance of features (EDA and ECG) determined by XGBoost (a) and TabNet (b) for the classification of \(B_0\) and \(P_4\).

5 Conclusion

In this paper, we proposed an automatic pain detection method that discriminates between no-pain and pain conditions. For each of the four classification tasks, we explored the performance of four tree-based models trained on features extracted from a single physiological signal (EDA or ECG) and from their fusion (EDA + ECG). We removed noise from the ECG signals and added P-QRS-T wave information to the ECG features. Our method closes the gap left by previous work and reveals the importance of ECG for pain detection. The experimental results demonstrate that our method achieves the highest classification accuracy when using a single signal (EDA or ECG). The 94.51% accuracy obtained on physiological signals shows the promise of our method for practical use, e.g., detecting pain from real-time data collected by wearable devices.