Elsevier

Neurocomputing

Volumes 326–327, 31 January 2019, Pages 161-167
Triage prediction in pediatric patients with respiratory problems

https://doi.org/10.1016/j.neucom.2017.01.122

Abstract

Respiratory diseases have an increasing prevalence in the large urban concentrations of the world, due to the apparently unstoppable increase of air pollution from a diversity of sources. Children are an especially fragile part of the population suffering these conditions. Improved monitoring of critical patients by means of automated data gathering and processing, i.e. alarm raising, aims to reduce the risks these patients face. Pediatric respiratory critical care has not received much attention in the literature, even though the care of children has specific conditions, such as the strong dependence of some physiological signals on patient age. We approach the problem as a triage prediction problem, formulated as a multi-class classification problem, with special care taken over the age normalization of physiological variables. Data that can be used as classification features are scarce, in the sense that measurements of only a few variables are available and that much of the qualitative information used by the medical doctors is not available. In this paper, we report the experimental results obtained on a data sample covering patients treated in a local pediatric hospital over two years. The results show that it is possible to successfully predict the triage that the medical doctors will assign to critical patients. Success depends mostly on the features selected; specifically, it is critical to include the triage in the previous record of the patient. This suggests that the caregivers follow very conservative decision policies. In addition, we have found that respiratory frequency is more discriminant than blood oxygen saturation.

Introduction

The prevalence of respiratory diseases is increasing in densely populated areas, which also attract industries and heavy traffic, and these diseases are the leading cause of child hospitalization in developing countries. Children are a sector of the population very vulnerable to air pollution for several reasons: their respiratory system is still developing; they take in a greater volume of air per unit of body weight per breath than adults; and their breathing is often rapid, deep, and through the mouth [1]. There is evidence that areas with industries that emit fine particles (PM2.5), sulfur dioxide (SO2) and nitrogen dioxide (NO2) have a greater prevalence of respiratory complications, increasing related hospital admissions [2], [3]. This effect is also significant in regions with high ambient dust, such as desert zones [4]. Diesel engine traffic has also been found to have a clear effect on the increase of critical respiratory hospital admissions [5].

Increasingly automated monitoring systems are desirable, though they require advanced equipment [6] that may not be widely available, such as the direct connection of continuous-reading physiological sensors to computer-based decision support systems [7], [8]. Besides, some systems that use computational intelligence tools, such as Markov models [9], have not taken into consideration the specifics of the pediatric population, i.e. the stronger dependence of many physiological variable values on patient age in children, which invalidates the solutions proposed for adult populations. Monitoring and respiratory assistance have received attention in the care of acute or chronic cases, such as children with muscular weakness [10] or ventilated newborns [11], but much less for milder cases that can become acute when untreated, as is often the case for low-income children [12].

The main focus of this article is the search for predictive models of the risk levels of pediatric patients hospitalized in the pediatric intensive care unit due to diseases related to respiratory problems. These risk levels are determined by a “triage” process [13]. In the data considered in this paper, risk levels are coded by numbers from 1 to 4, with 1 being the mildest and 4 the most severe. Predictive models are built by machine learning approaches [14], [15], [16], [17], following a cross-validation methodology to assess the generalization of prediction results. The available data come from health records, often without electronic implementation; therefore, they are noisy and need some preprocessing to remove erroneous values and records with unknown variable values. We also carry out a feature selection process, which can be exhaustive because of the small dimensionality of the data. We have not performed any transformation, such as Principal Component Analysis, in order to preserve the original meaning of the variables and, therefore, be able to discuss the value of the findings with the experts.
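With only a handful of candidate variables, exhaustive feature selection amounts to scoring every non-empty subset of features. The Python sketch below illustrates the enumeration only. The feature names and the `score` callback are hypothetical placeholders chosen for illustration; they are not the variables or the evaluation procedure of the study, which was carried out in R.

```python
from itertools import combinations

def all_feature_subsets(features):
    """Yield every non-empty subset of `features` as a tuple."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)

def best_subset(features, score):
    """Return the subset with the highest score.

    `score` is any function mapping a tuple of feature names to a number,
    e.g. a cross-validated accuracy (assumed here, not taken from the paper).
    """
    return max(all_feature_subsets(features), key=score)

# Hypothetical feature names, for illustration only.
FEATURES = ["resp_freq", "spo2", "prev_triage"]
subsets = list(all_feature_subsets(FEATURES))
```

For n features there are 2^n - 1 non-empty subsets, so this brute-force search is only practical when n is small, as in the low-dimensional setting described above.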

The experimental data for this work are based on records obtained in the pediatric units and the CPU (Critical Patient Unit) of the Hospital Dr. Exequiel González Cortés (HEGC), a pediatric medical center belonging to the public health system of Chile, located in the municipality of San Miguel in Santiago de Chile. These records belong to patients who have different diagnoses, but all of them fall in the class of diseases considered as respiratory conditions.

The structure of the paper is as follows. Section 2 gives some background information and a short review of the state of the art of data mining applications in health care, with the aim of setting the stage for applications in respiratory disease assistance. Section 3 explains the variables included in the study and their preprocessing. Section 3.1 provides descriptive statistics of the data prior to the machine learning approaches. Specifically, we look closely at the dependence of some variables on patient age. Section 3.2 discusses the two ways we have explored to normalize the dependent variables to make them uncorrelated with patient age. Section 4 describes the computational experiments conducted seeking to predict the triage at each time instant. Section 5 presents the results of these experiments. Finally, Section 6 gives our conclusions.

Section snippets

Background

This paper details a study based on data related to respiratory diseases of children hospitalized because of respiratory complications in the city of Santiago de Chile. The prevalence of respiratory diseases in the world is increasing alarmingly [18], and the situation is particularly severe in the city of Santiago, capital of Chile [19]. Children are one of the groups most affected by these diseases in Santiago [20], as happens in other major world cities such as Sao Paulo (Brazil) [21].

There are a host

Description of the dataset

The initial number of records is 22,025, corresponding to 45 patients (29 boys and 16 girls) aged from a few months to 16 years, with physiological measurements and record annotations taken between 2012 and 2014. Before analyzing the data, we preprocessed them, considering, on the one hand, the variables they contain and, on the other hand, the records. Regarding the variables, we have the following ones:

  • Ancillary information: the patient Age,

Experimental design

To build Triage predictors from the normalized physiological features, we have applied four kinds of classification algorithms: Multilayer Perceptron (MLP), Decision Trees (DT), k-Nearest Neighbors (k-NN), and Naive Bayes (NB). We have used the Caret package (short for Classification and Regression Training) in the R programming environment to carry out repeated k-fold cross-validation. In our case, we applied 10-fold cross-validation repeated 3 times.
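To make the resampling scheme concrete, the following Python sketch generates the train/test index splits for 10-fold cross-validation repeated 3 times, which gives 30 evaluations per model. It is an illustration under stated assumptions rather than the code of the study: the function name `repeated_kfold`, the seed, and the sample size are hypothetical, and the splits are plain shuffled folds, whereas the fold construction performed by the R package may differ (for example, by balancing classes across folds).

```python
import random

def repeated_kfold(n_samples, k=10, repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation.

    Each repeat reshuffles the sample indices and partitions them into k
    folds; every fold serves once as the test set within its repeat.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # k disjoint folds
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test

# Hypothetical sample size, for illustration only.
splits = list(repeated_kfold(n_samples=100, k=10, repeats=3))
```

The reported accuracy of a classifier would then be the average over all `k * repeats` test folds.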

Results

The average accuracy obtained by the cross-validation experiments with each feature independently is presented in Table 2. It can be seen that the only feature that provides a result above 0.70 accuracy is the Respiratory Frequency after normalization by linear regression (Norm1). The normalization by linear regression is significantly better than the categorization in age intervals (Norm2), as confirmed by a one-sided t-test (p < 0.01) extended to all feature selection and classifiers
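One plausible reading of normalization by linear regression is to fit a straight line of the physiological variable against age and keep the residual, which is uncorrelated with age by construction. The Python sketch below illustrates that idea only. The exact form of Norm1 is not given in this excerpt, so the residual formulation is an assumption, and the age and respiratory frequency values are synthetic numbers invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def age_residuals(ages, values):
    """Subtract the fitted linear age trend from each value (assumed Norm1-like)."""
    a, b = fit_line(ages, values)
    return [v - (a + b * age) for age, v in zip(ages, values)]

# Synthetic data, for illustration only: age in years, breaths per minute.
AGES = [0.5, 1.0, 2.0, 4.0, 8.0, 12.0, 16.0]
RESP = [44.0, 40.0, 34.0, 28.0, 22.0, 19.0, 17.0]
residuals = age_residuals(AGES, RESP)
```

Least squares residuals have zero mean and zero sample covariance with the regressor, which is the sense in which the normalized variable no longer depends linearly on age.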

Conclusions

The aim of this work was to develop automated systems for the monitoring of children suffering from respiratory diseases in a pediatric intensive care unit. We proceed by trying to emulate the Triage decisions of the physicians as recorded in a dataset containing the physiological variable measurements and the Triage decision. The original recordings in the dataset are quite noisy, with many missing values and some inconsistent variable values. Direct recording of physiological sensors and storage

Acknowledgments

The Pediatric Unit and the Unit of Critical Patients of the Hospital Dr. Exequiel González Cortés (HEGC) have kindly provided the anonymized data. The Computational Intelligence Group has grant IT874-13 from the Basque Government, and participates in UIF 11/07 of UPV/EHU.

Asier Garmendia is assistant professor at the Universidad del Pais Vasco (UPV/EHU). Currently, he is pursuing Ph.D. research. His current interests are in machine learning for health care data.

References (40)

  • D. Han et al.

    The-muss: mobile u-health service system

    Comput. Methods Program Biomed.

    (2010)
  • H.I. Rady et al.

    Application of different scoring systems and their value in pediatric intensive care unit

    Egypt. Pediatric Assoc. Gaz.

    (2014)
  • C.P. Subbe

    Centile-based early warning scores derived from statistical distributions of vital signs

    Resuscitation

    (2011)
  • S. Fleming et al.

    Normal ranges of heart rate and respiratory rate in children from birth to 18 years of age: a systematic review of observational studies

    The Lancet

    (2011)
  • T.F. Bateson et al.

    Children’s response to air pollutants

    J. Toxicol. Environ. Health Part A

    (2007)
  • R.J. Moore, J.L. Hotchkiss, The importance of toxicity in determining the impact of hazardous air pollutants on the...
  • A.M. Chan et al.

    Wireless patch sensor for remote monitoring of heart rate, respiration, activity, and falls

    Proceedings of the 2013 Thirty-fifth Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)

    (2013)
  • C.B. Pereira et al.

    Robust remote monitoring of breathing function by using infrared thermography

    Proceedings of the 2015 Thirty-seventh Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)

    (2015)
  • H. Ravishankar et al.

    An early respiratory distress detection method with Markov models

    Proceedings of the 2014 Thirty-sixth Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)

    (2014)
  • G. Schmalisch, Basic principles of respiratory function monitoring in ventilated newborns: a review, Paediatric Respir....
Sebastián A. Ríos has been Assistant Professor at the Industrial Engineering Department of the University of Chile since 2008. He received the B.E. in Industrial Engineering in 2001, and the B.E. in Computer Science and the P.E. in Industrial Engineering in 2003, from the University of Chile, Chile; and the Ph.D. in Knowledge Engineering from the University of Tokyo, Japan, in 2007. He is the Founder and, since 2012, Director of the Business Intelligence Research Center (CEINE) at the University of Chile, a collaborative applied research effort between private companies and the University. His research interests include data mining algorithms for large datasets and their applications to different industry domains (medicine, marketing, management, etc.); he is also interested in generative topic models for text mining in social networks and knowledge representation using semantic web technologies.

Jose Manuel Lopez-Guede, Ph.D., is an assistant professor at the Universidad del Pais Vasco (UPV/EHU). He received his Ph.D. from the UPV/EHU in 2012. His research interests include robotics and reinforcement learning as well as machine learning applications to health care data.

Manuel Graña Romay received the M.Sc. and Ph.D. degrees from Universidad del Pais Vasco (UPV/EHU), Donostia, Spain, in 1982 and 1989, respectively, both in computer science. He is currently a Full Professor (Catedrático de Universidad) with the Computer Science and Artificial Intelligence Department of the Universidad del Pais Vasco (UPV/EHU). He is the head of the Computational Intelligence Group (Grupo de Inteligencia Computacional). His current research interests are in applications of computational intelligence to linked multicomponent robotic systems, medical imaging in the neurosciences, multimodal human computer interaction, remote sensing image processing, content based image retrieval, lattice computing, semantic modelling, data processing, classification, and data mining.
