Triage prediction in pediatric patients with respiratory problems
Introduction
The prevalence of respiratory diseases is increasing in large, densely populated areas, which also attract industry and heavy traffic, and these diseases are the leading cause of child hospitalization in developing countries. Children are a sector of the population that is very vulnerable to air pollution for several reasons: their respiratory system is still developing; they take in a greater volume of air per unit of body weight per breath than adults; and they often breathe rapidly, deeply, and through the mouth [1]. There is evidence that areas with industries emitting fine particles (PM2.5), sulfur dioxide (SO2), and nitrogen dioxide (NO2) have a greater prevalence of respiratory complications and more related hospital admissions [2], [3]. This effect is also significant in regions with high ambient dust, such as desert zones [4]. Diesel engine traffic has also been found to clearly increase critical respiratory hospital admissions [5].
Increasingly automated monitoring systems are desirable, though they require advanced equipment [6] that may not be widely available, such as continuous-reading physiological sensors connected directly to computer-based decision support systems [7], [8]. Moreover, some systems that use computational intelligence tools, such as Markov models [9], have not taken into account the specifics of the pediatric population, i.e., the stronger dependence of many physiological variables on patient age in children, which invalidates solutions proposed for adult populations. Monitoring and respiratory assistance have received attention in the care of acute or chronic cases, such as children with muscular weakness [10] or ventilated newborns [11], but much less for milder cases that can become acute when untreated, as is often the case for low-income children [12].
The main focus of this article is the search for predictive models of the risk levels of pediatric patients hospitalized in the pediatric intensive care unit due to respiratory diseases. These risk levels are determined by a “triage” process [13]. In the data considered in this paper, risk levels are coded by numbers from 1 to 4, with 1 being the mildest and 4 the most severe. Predictive models are built by machine learning approaches [14], [15], [16], [17], following a cross-validation methodology to assess the generalization of prediction results. The available data come from health records, often not kept in electronic form; therefore, they are noisy and need some preprocessing to remove erroneous values and records with unknown variable values. We also carry out a feature selection process, which can be exhaustive because of the small dimensionality of the data. We have not performed any transformation, such as Principal Component Analysis, in order to preserve the original meaning of the variables and, therefore, to be able to discuss the value of the findings with the experts.
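Because the number of candidate features is small, every non-empty feature subset can be scored. The Python sketch below illustrates such an exhaustive search under stated assumptions: the function name is hypothetical, and the `score` callable (for instance, the mean cross-validated accuracy of a classifier restricted to the subset) is left to the user; it is not the authors' code.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple

Subset = Tuple[str, ...]


def exhaustive_feature_selection(
    features: Sequence[str],
    score: Callable[[Subset], float],
) -> Tuple[Subset, float]:
    """Evaluate all 2**d - 1 non-empty feature subsets and return the best.

    Hypothetical helper: `score` maps a tuple of feature names to a quality
    value (higher is better), e.g. a cross-validated accuracy. Subsets are
    visited by increasing size and only a strictly better score replaces the
    current best, so on a tie the subset found first (never a larger one) wins.
    """
    best_subset: Subset = ()
    best_score = float("-inf")
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            value = score(subset)
            if value > best_score:
                best_subset, best_score = subset, value
    return best_subset, best_score
```

With d = 5 features this means only 31 evaluations, but the count doubles with every added feature, which is why exhaustive search is only practical for low-dimensional data such as this.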
The experimental data for this work is based on records obtained in pediatric units and CPU (Critical Patient Unit) of the Hospital Dr. Exequiel González Cortés (HEGC), which is a pediatric medical center belonging to the public health system of Chile, located in the municipality of San Miguel in Santiago de Chile. These records belong to patients who have different diagnoses, but all of them fall in the class of diseases considered as respiratory conditions.
The structure of the paper is as follows. Section 2 gives some background information and a short review of the state of the art of data mining applications in health care, with the aim of setting the stage for applications in respiratory disease assistance. Section 3 explains the variables included in the study and their preprocessing. Section 3.1 provides descriptive statistics of the data prior to applying machine learning approaches; specifically, we look closely at the dependence of some variables on patient age. Section 3.2 discusses the two ways we have explored to normalize the age-dependent variables so that they are uncorrelated with patient age. Section 4 describes the computational experiments conducted to predict the triage at each time instant. Section 5 presents the results of these experiments. Finally, Section 6 gives our conclusions.
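One way to make a physiological variable uncorrelated with age, and a plausible reading of the normalization by linear regression that the results call Norm1, is to fit an ordinary least-squares line of the variable against age and keep the residuals. The sketch below assumes that reading; the function name is hypothetical and the authors' exact procedure may differ.

```python
from statistics import mean
from typing import List, Sequence


def regress_out_age(values: Sequence[float], ages: Sequence[float]) -> List[float]:
    """Return residuals of the OLS fit values ~ intercept + slope * age.

    Hypothetical helper (assumed reading of "normalization by linear
    regression"). OLS residuals have zero sample covariance with the
    regressor, so the output is linearly uncorrelated with age on these data.
    """
    if len(values) != len(ages) or len(values) < 2:
        raise ValueError("need two equally long sequences with >= 2 items")
    mean_age, mean_val = mean(ages), mean(values)
    sxx = sum((a - mean_age) ** 2 for a in ages)
    if sxx == 0:
        raise ValueError("ages must not all be equal")
    sxy = sum((a - mean_age) * (v - mean_val) for a, v in zip(ages, values))
    slope = sxy / sxx
    intercept = mean_val - slope * mean_age
    # Residual = observed value minus the value predicted from age alone.
    return [v - (intercept + slope * a) for a, v in zip(ages, values)]
```

Inside a cross-validation loop, the slope and intercept would naturally be fitted on the training folds only and then applied to the held-out records.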
Background
This paper details a study based on data on children hospitalized because of respiratory complications in the city of Santiago de Chile. The prevalence of respiratory diseases worldwide is increasing alarmingly [18], and the problem is particularly severe in Santiago, the capital of Chile [19]. Children are among the groups most affected by these diseases in Santiago [20], as happens in other major cities such as São Paulo (Brazil) [21].
There are a host
Description of the dataset
The initial number of records is 22,025, corresponding to 45 patients (29 boys and 16 girls) aged from a few months to 16 years, with physiological measurements and record annotations between 2012 and 2014. Before analyzing the starting data, we preprocessed them with respect to both the variables they contain and the records. Regarding the variables, we have the following ones:
- Ancillary information: the patient Age,
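The record-level cleaning mentioned in the introduction (removing erroneous values and records with unknown variable values) can be sketched as a simple filter. The field names and plausibility bounds below are hypothetical placeholders chosen for illustration, not the authors' criteria and not clinical reference ranges; only the triage coding from 1 to 4 comes from the paper.

```python
from typing import Dict, List, Mapping, Optional, Tuple

Record = Mapping[str, Optional[float]]

# Hypothetical field names and bounds, for illustration only (NOT clinical ranges).
PLAUSIBLE_RANGES: Dict[str, Tuple[float, float]] = {
    "age_years": (0.0, 18.0),
    "resp_freq": (5.0, 120.0),    # breaths per minute
    "heart_rate": (30.0, 250.0),  # beats per minute
}
TRIAGE_LEVELS = {1, 2, 3, 4}  # 1 = mildest, 4 = most severe


def is_valid(record: Record,
             ranges: Mapping[str, Tuple[float, float]] = PLAUSIBLE_RANGES) -> bool:
    """True if the triage label is valid and every field is present and in range."""
    if record.get("triage") not in TRIAGE_LEVELS:
        return False
    for field, (low, high) in ranges.items():
        value = record.get(field)
        if value is None or not low <= value <= high:
            return False  # unknown (missing) or implausible (erroneous) value
    return True


def clean_records(records: List[Record]) -> List[Record]:
    """Keep only the records that pass every check in `is_valid`."""
    return [r for r in records if is_valid(r)]
```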
Experimental design
To build Triage predictors from the normalized physiological features, we have applied four kinds of classification algorithms: Multilayer Perceptron (MLP), Decision Trees (DT), k-Nearest Neighbors (k-NN), and Naive Bayes (NB). We have used the caret package (short for Classification and Regression Training) in the R programming environment to carry out repeated k-fold cross-validation. In our case, we applied 10-fold cross-validation repeated 3 times.
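In caret, this resampling scheme corresponds to `trainControl(method = "repeatedcv", number = 10, repeats = 3)`. Purely as an illustration, the dependency-free Python sketch below implements 10-fold cross-validation repeated 3 times around one of the four classifiers, k-NN. The function names and the choice of 3 neighbours are hypothetical; also, unlike caret, which balances class proportions across folds, this sketch draws plain random folds.

```python
import math
import random
from collections import Counter
from statistics import mean
from typing import Iterator, List, Sequence, Tuple

# Hypothetical helper names; a sketch, not the authors' caret pipeline.
Point = Sequence[float]


def repeated_kfold(n: int, k: int = 10, repeats: int = 3,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists: `repeats` shuffles, each cut into k folds."""
    if n < k:
        raise ValueError("need at least k samples")
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        for fold in range(k):
            test = order[fold::k]  # every k-th shuffled index forms one fold
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test


def knn_predict(train_x: List[Point], train_y: List[int], x: Point, k: int = 3) -> int:
    """Majority vote among the k nearest training points (Euclidean distance).

    Counter.most_common lists equal counts in first-seen order, so vote ties
    go to the label of the nearest neighbour among the tied labels.
    """
    neighbours = sorted(zip((math.dist(x, p) for p in train_x), train_y))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def cv_accuracy(x: List[Point], y: List[int], k_folds: int = 10,
                repeats: int = 3, n_neighbours: int = 3) -> float:
    """Mean test accuracy over all k_folds * repeats train/test splits."""
    scores = []
    for train, test in repeated_kfold(len(x), k_folds, repeats):
        tx = [x[i] for i in train]
        ty = [y[i] for i in train]
        hits = sum(knn_predict(tx, ty, x[i], n_neighbours) == y[i] for i in test)
        scores.append(hits / len(test))
    return mean(scores)
```

Averaging over 30 splits rather than 10 reduces the variance that a single random partition into folds would add to the accuracy estimate.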
Results
The average accuracy obtained by the cross-validation experiments with each feature independently is presented in Table 2. The only feature that yields an accuracy above 0.70 is the Respiratory Frequency after normalization by linear regression (Norm1). Normalization by linear regression is significantly better than categorization in age intervals (Norm2), as confirmed by a one-sided t-test (p < 0.01) over all feature selections and classifiers.
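The setup of the t-test is not detailed here. Assuming a paired design, in which each (feature selection, classifier) configuration yields one accuracy under Norm1 and one under Norm2, the one-sided statistic is the mean of the differences divided by its standard error, t = mean(d) / (s_d / sqrt(n)), with n - 1 degrees of freedom. The standard-library sketch below computes it; the function name and the pairing are assumptions. The p-value can then be read from a Student t table or, if SciPy is available, obtained with `scipy.stats.ttest_rel(norm1, norm2, alternative="greater")`.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic for H1: mean(a - b) > 0, and its degrees of freedom.

    Hypothetical helper: a[i] and b[i] are assumed to come from the same
    configuration (e.g. the same feature subset and classifier, evaluated
    with Norm1 and with Norm2 respectively).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```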
Conclusions
The aim of this work was to develop automated systems for monitoring children suffering from respiratory diseases in a pediatric intensive care unit. We proceeded by trying to emulate the physicians' Triage decisions, as recorded in a dataset containing the physiological variable measurements and the Triage decision. The original recordings in the dataset are quite noisy, with many missing values and some inconsistent variable values. Direct recording of physiological sensors and storage
Acknowledgments
The Pediatric Unit and the Unit of Critical Patients of the Hospital Dr. Exequiel González Cortés (HEGC) have kindly provided the anonymized data. The Computational Intelligence Group holds grant IT874-13 from the Basque Government and participates in UIF 11/07 of the UPV/EHU.
Asier Garmendia is assistant professor at the Universidad del Pais Vasco (UPV/EHU). Currently, he is pursuing Ph.D. research. His current interests are in machine learning for health care data.
References
- et al. Respiratory hospital admissions in young children living near metal smelters, pulp mills and oil refineries in two Canadian provinces. Environ. Int. (2016).
- et al. Respiratory hospitalizations of children living near a hazardous industrial site adjusted for prevalent dust: a case-control study. Int. J. Hyg. Environ. Health (2015).
- et al. Respiratory hospitalizations of children and residential exposure to traffic air pollution in Jerusalem. Int. J. Hyg. Environ. Health (2015).
- et al. Lung ultrasound in the diagnosis and monitoring of community acquired pneumonia in children. Respir. Med. (2015).
- et al. Management of respiratory disease in children with muscular weakness. Paediatrics Child Health (2015).
- et al. Low-income children, adolescents, and caregivers facing respiratory problems: support needs and preferences. J. Pediatric Nurs. (2016).
- et al. Logistic regression in the medical literature: standards for use and reporting, with particular attention to one medical domain. J. Clin. Epidemiol. (2001).
- et al. Respiratory disease and particulate air pollution in Santiago, Chile: contribution of erosion particles from fine sediments. Environ. Pollut. (2014).
- Special issue: geography of health air pollution and respiratory diseases in children in São Paulo, Brazil. Soc. Sci. Med. (1989).
- Potential application of machine learning in health outcomes research and some statistical cautions. Value Health (2015).
- The-muss: mobile u-health service system. Comput. Methods Program Biomed.
- Application of different scoring systems and their value in pediatric intensive care unit. Egypt. Pediatric Assoc. Gaz.
- Centile-based early warning scores derived from statistical distributions of vital signs. Resuscitation.
- Normal ranges of heart rate and respiratory rate in children from birth to 18 years of age: a systematic review of observational studies. The Lancet.
- Children’s response to air pollutants. J. Toxicol. Environ. Health Part A.
- Wireless patch sensor for remote monitoring of heart rate, respiration, activity, and falls. Proceedings of the 2013 Thirty-fifth Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC).
- Robust remote monitoring of breathing function by using infrared thermography. Proceedings of the 2015 Thirty-seventh Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC).
- An early respiratory distress detection method with Markov models. Proceedings of the 2014 Thirty-sixth Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC).
Sebastián A. Ríos has been Assistant Professor at the Industrial Engineering Department of the University of Chile since 2008. He received the B.E. in Industrial Engineering in 2001, and the B.E. in Computer Science and the P.E. in Industrial Engineering in 2003, from the University of Chile, Chile, and the Ph.D. in Knowledge Engineering from the University of Tokyo, Japan, in 2007. Since 2012 he has been the Founder and Director of the Business Intelligence Research Center (CEINE) at the University of Chile, a collaborative applied research effort between private companies and the University. His research interests include data mining algorithms for big datasets and their applications to different industry domains (medicine, marketing, management, etc.); he is also interested in generative topic models for text mining in social networks and in knowledge representation using semantic web technologies.
Jose Manuel Lopez-Guede, Ph.D., is an assistant professor at the Universidad del Pais Vasco (UPV/EHU). He received his Ph.D. from the UPV/EHU in 2012. His research interests include robotics and reinforcement learning, as well as machine learning applications to health care data.
Manuel Graña Romay received the M.Sc. and Ph.D. degrees in computer science from the Universidad del Pais Vasco (UPV/EHU), Donostia, Spain, in 1982 and 1989, respectively. He is currently a Full Professor (Catedrático de Universidad) with the Computer Science and Artificial Intelligence Department of the Universidad del Pais Vasco (UPV/EHU) and the head of the Computational Intelligence Group (Grupo de Inteligencia Computacional). His current research interests are in applications of computational intelligence to linked multicomponent robotic systems, medical imaging in the neurosciences, multimodal human-computer interaction, remote sensing image processing, content-based image retrieval, lattice computing, semantic modelling, data processing, classification, and data mining.