Abstract
This paper addresses the selection of physiological functional variables for classifying a driver’s stress level using random forests. The analysis is performed on experimental data extracted from the drivedb open database available on the PhysioNet website. The physiological measurements of interest are electrodermal activity captured on the driver’s left hand and foot, electromyogram, respiration, and heart rate, collected from ten driving experiments carried out on three types of routes (rest area, city, and highway). The contributions of this work concern both the method and the application. From a methodological viewpoint, the physiological signals are treated as functional variables, decomposed on a wavelet basis, and then analyzed in search of the most relevant variables. On the application side, the proposed approach provides a “blind” procedure for driver’s stress level classification that, when applied to the drivedb database, performs close to the expert-based approach. It also suggests new physiological features based on the wavelet levels of the functional variables’ wavelet decomposition. In addition, the proposed approach provides a ranking of physiological variables according to their importance in stress level classification. For the case under study, results suggest that the electromyogram and heart rate signals are less relevant than the electrodermal and respiration signals. Furthermore, the electrodermal activity measured on the driver’s foot was found to be more relevant than that captured on the hand. Finally, the proposed approach also provides an order of relevance of the wavelet features.
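As a rough illustration of what features "based on the wavelet levels" could look like, the sketch below decomposes a signal with an orthonormal Haar wavelet and summarizes each level by its energy. The Haar wavelet, the per-level energy summary, and the function names `haar_dwt` and `level_energies` are assumptions made for illustration only; the abstract does not state which wavelet or which summary the authors use.

```python
import math


def haar_dwt(signal, levels):
    """Orthonormal Haar DWT (illustrative choice of wavelet, not the paper's).

    Returns (approximation coefficients, {level: detail coefficients}).
    """
    if levels < 1 or len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = [float(x) for x in signal]
    details = {}
    for level in range(1, levels + 1):
        # Pair up neighbouring samples of the current approximation.
        pairs = list(zip(approx[0::2], approx[1::2]))
        details[level] = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, details


def level_energies(signal, levels):
    """One hypothetical feature per wavelet level: the sum of squared coefficients."""
    approx, details = haar_dwt(signal, levels)
    features = {f"d{level}": sum(c * c for c in coeffs)
                for level, coeffs in details.items()}
    features[f"a{levels}"] = sum(c * c for c in approx)
    return features


if __name__ == "__main__":
    toy_signal = [4, 6, 10, 12, 8, 6, 5, 5]  # made-up values, not drivedb data
    print(level_energies(toy_signal, 3))
```

Because the transform is orthonormal, the level energies add up to the energy of the original signal, so each feature can be read as the share of signal energy carried by one scale. Such per-level features, grouped by signal, are the kind of input on which a grouped importance ranking could then operate.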
Acknowledgements
The authors gratefully acknowledge Dr. Chiraz Ben Abdelkader and Dr. Hassine Saidane for proofreading the paper. They also thank the anonymous referees for their useful suggestions and meaningful comments which led to a considerable improvement of this paper.
About this article
Cite this article
El Haouij, N., Poggi, JM., Ghozi, R. et al. Random forest-based approach for physiological functional variable selection for driver’s stress level classification. Stat Methods Appl 28, 157–185 (2019). https://doi.org/10.1007/s10260-018-0423-5
DOI: https://doi.org/10.1007/s10260-018-0423-5
Keywords
- Physiological signals
- Functional data
- Random forests
- Recursive feature elimination
- Wavelets
- Grouped variable importance