Prediction model for patients with acute respiratory distress syndrome: use of a genetic algorithm to develop a neural network model

Background Acute respiratory distress syndrome (ARDS) is associated with a significantly increased risk of death, and early risk stratification may help in choosing the appropriate treatment. The study aimed to develop a neural network model by using a genetic algorithm (GA) for the prediction of mortality in patients with ARDS. Methods This was a secondary analysis of two multicenter randomized controlled trials conducted in forty-four hospitals participating in the National Heart, Lung, and Blood Institute Acute Respiratory Distress Syndrome Clinical Trials Network. Model training and validation were performed using the SAILS and OMEGA studies, respectively. A GA was employed to screen variables for the prediction of 90-day mortality, and a neural network model was trained for the prediction. This machine learning model was compared to the logistic regression model and APACHE III score in the validation cohort. Results A total of 1,071 ARDS patients were included for analysis. The GA search identified seven important variables, which were age, AIDS, leukemia, metastatic tumor, hepatic failure, lowest albumin, and FiO2. A representative neural network model was constructed using the forward selection procedure. The area under the curve (AUC) of the neural network model evaluated with the validation cohort was 0.821 (95% CI [0.753–0.888]), which was greater than that of the APACHE III score (0.665; 95% CI [0.590–0.739]; p = 0.002 by DeLong's test) and that of the logistic regression model, although the latter difference was not statistically significant (0.743; 95% CI [0.669–0.817]; p = 0.130 by DeLong's test). Conclusions The study developed a neural network model using a GA, which outperformed conventional scoring systems for the prediction of mortality in ARDS patients.

life-threatening hypoxia (Abdel Hakim et al., 2016). A variety of mechanical ventilation strategies, such as low tidal volume ventilation, prone positioning and paralytics, have been developed over the past few decades in order to improve clinical outcomes of ARDS (Carron, 2016). However, the improvement in mortality rate has been less than satisfactory (Zhang, Chen & Ni, 2015;Mezidi & Guérin, 2016), and there is still much work to be done in this area. Risk stratification for ARDS can be a useful tool in medical decision making and the design of clinical trials; thus, strenuous efforts have been made to derive a model for the prediction of ARDS mortality (Cooke et al., 2009;Frenzel et al., 2011;Balzer et al., 2016;Zhao et al., 2017). The Acute Physiology and Chronic Health Evaluation (APACHE) III score is a severity-of-disease classification system that is applied within the first 24 h of admission to an ICU; higher scores correspond to more severe disease and a higher risk of mortality. For decades, APACHE III has been widely used for the prediction of ARDS mortality (Knaus et al., 1991). However, most of these studies employed conventional regression methods to develop prediction models, which require preexisting domain knowledge to specify interactions and/or higher-order terms, whereas sophisticated machine learning methods can capture these complex relationships automatically from the data.
A genetic algorithm (GA) is an adaptive heuristic search algorithm based on the evolutionary ideas of natural selection and genetics. As such it represents an intelligent exploitation of a random search used to solve optimization problems (Lucasius & Kateman, 1993). Variable selection in building prediction models is a problem of optimization. A GA is suitable for large-scale searches of candidate predictors, and it is a popular method in many fields such as chemistry, computer science and economics (Lucasius & Kateman, 1993;Las Heras et al., 2016;Escalona-Vargas et al., 2016). However, GAs have not yet been widely used in clinical research, mainly due to their complexity in computations. In the present study, I aimed to develop a neural network model for the prediction of ARDS mortality, with predictor selection being performed using a GA. The final model was compared to the model developed using a conventional logistic regression approach and the existing risk prediction score APACHE III.

Training and validation cohorts
The study was a secondary analysis of two randomized controlled trials (RCTs) involving ARDS. The Statins for Acutely Injured Lungs from Sepsis (SAILS, NCT00979121) study enrolled 745 patients with sepsis-induced ARDS. Patients were randomized to receive either rosuvastatin or a placebo in a double-blind manner. The result was neutral in that rosuvastatin was not able to reduce mortality in comparison to the placebo (Truwit et al., 2014). The other study was the OMEGA study (NCT00609180), which enrolled 272 adults within 48 h of developing ARDS (Rice et al., 2011). The OMEGA study also failed to identify beneficial effects of the intervention. The SAILS trial was used for model development and the OMEGA trial was used for model validation. All the data were de-identified and were openly accessible from the Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC, https://biolincc.nhlbi.nih.gov/home/). The study was approved by the ethics committee of Sir Run-Run Shaw Hospital (approval number: 20170313-2) and was performed in accordance with the Declaration of Helsinki.

Descriptive statistics
The variables included for analysis were compared between survivors and non-survivors. Normally distributed numeric variables were expressed as mean and standard deviation, and they were compared using Student's t-test. Otherwise, they were described as median and interquartile range, and compared using Mann-Whitney U tests. Categorical data were expressed as numbers and percentages, and the differences were compared using Chi-square test or Fisher's exact test as appropriate. A two-tailed p value <0.05 was considered to be statistically significant.
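The choice between the Chi-square test and Fisher's exact test for a 2 x 2 table can be illustrated with a minimal Python sketch. The switching rule used here (Fisher's exact test when any expected cell count is below 5) is a common convention and an assumption on my part, as the text only says "as appropriate"; the function names and the example tables are hypothetical and are not taken from the study data.

```python
import math

def expected_counts(table):
    """Expected cell counts of a 2x2 table under independence."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[r * k / n for k in cols] for r in rows]

def chi_square_p(table):
    """Pearson chi-square test with 1 df and no continuity correction."""
    exp = expected_counts(table)
    chi2 = sum((table[i][j] - exp[i][j]) ** 2 / exp[i][j]
               for i in range(2) for j in range(2))
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

def fisher_exact_p(table):
    """Two-sided Fisher's exact test: sum the tables no more likely than the observed one."""
    (a, b), (c, d) = table
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

def compare_categorical(table):
    """Assumed rule: use Fisher's exact test if any expected count is below 5."""
    if min(min(row) for row in expected_counts(table)) < 5:
        return "fisher", fisher_exact_p(table)
    return "chi-square", chi_square_p(table)

# Hypothetical tables: rows = exposure present/absent, columns = died/survived.
small_table = [[3, 1], [1, 3]]
large_table = [[20, 30], [30, 20]]
```

Calling `compare_categorical(small_table)` selects Fisher's exact test because every expected count is 2, whereas `compare_categorical(large_table)` selects the Chi-square test.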

Variables included for GA
The primary outcome of the study was 90-day mortality, which was coded as present if the patient died prior to discharge with unassisted breathing or died prior to achieving unassisted breathing at home for 48 h.
All variables collected during the 24 h before randomization in the original RCTs were included for the GA search. A total of 88 variables were included, which contained information on demographics, admission sources, admission type, laboratory findings, vital signs, parameters of mechanical ventilation, and the outcome status during the study period (Table S1). Hospital admission type referred to the category of hospital admission. Admission source referred to the location where the patient was immediately prior to the ICU admission, including operating room (OR), recovery room, emergency room (ER), hospital floor, another special care unit, another hospital, direct admit, and step-down unit. The place of residence was the place of residence prior to the admission to hospital. Chronic health information was updated at any time during the admission, and included: acquired immunodeficiency syndrome (AIDS), leukemia (including acute myeloid leukemia, chronic myeloid leukemia, all lymphocytic leukemia, and multiple myeloma), Non-Hodgkin's Lymphoma, solid tumor with metastasis, immune suppression (e.g., the patient is immunocompromised secondary to chemotherapy, radiation therapy, use of anti-rejection drugs taken after organ transplant, or the daily use of high doses of steroids (0.3 mg/kg/day of prednisone or equivalent therapy) within part of or the entire previous six months), hepatic failure (e.g., the patient has decompensated cirrhosis, as evidenced by one or more episodes of jaundice and ascites, upper gastrointestinal bleeding or hepatic encephalopathy or coma), and dementia. Ventilator variables included minute ventilation volume measured as the total tidal volume summed over one minute. Physiological variables included temperature, systolic blood pressure, mean arterial pressure, heart rate, respiratory rate, and urine output.
Laboratory variables included hematocrit, white blood cell count, platelet count, serum sodium, potassium, creatinine, albumin and bicarbonate. All variables were obtained during the 24 h preceding randomization. Where there were several measurements of one variable, the ones associated with the worst illness severity were used. For example, both the lowest and highest body temperatures were included for analysis because both low and high temperatures were associated with an increased risk of mortality as compared with the normal temperature. Intraoperative values or values related to death or cardiac arrest situations were not included for analysis. If no values were obtained for clinical purposes during the 24 h preceding randomization, the laboratory tests were obtained after obtaining patient/surrogate consent, but before initiating study procedures. The ventilator parameters were obtained on day 0. The delivered tidal volume was calculated as the inspired tidal volume (ml) set on the ventilator, minus any additional tidal volume added to correct for compression and ventilator tube expansion (note that the Puritan-Bennett 7200 and some other ventilators make this correction automatically). The plateau pressure was measured with a 0.5-second inspiratory pause. Peak inspiratory pressure was obtained while the patient was relaxed, not coughing or moving in bed. Continuous variables were included in their original forms. Categorical variables were converted to dummy variables. Variables with >10% missing values were excluded, whereas variables with <10% missing values (bilirubin, albumin, glucose, sodium and inspiratory oxygen fraction) were completed using single imputation (the mice package version 3.3.0) (Van Buuren & Groothuis-Oudshoorn, 2011).
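The two preprocessing rules described above (exclusion of variables with more than 10% missing values, and dummy coding of categorical variables) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the record layout, the helper names and the example values are hypothetical, the handling of a variable with exactly 10% missing values is my choice because the text does not specify it, and the imputation step performed with the mice package in R is not reproduced here.

```python
def missing_fraction(rows, name):
    """Share of records in which variable `name` is missing (None)."""
    return sum(r[name] is None for r in rows) / len(rows)

def drop_sparse(rows, names, threshold=0.10):
    """Keep variables whose missing fraction does not exceed the threshold."""
    return [n for n in names if missing_fraction(rows, n) <= threshold]

def dummy_code(rows, name):
    """One 0/1 indicator per level, omitting the first (reference) level."""
    levels = sorted({r[name] for r in rows if r[name] is not None})
    return {f"{name}_{lv}": [int(r[name] == lv) for r in rows] for lv in levels[1:]}

# Hypothetical records: 10 patients, 2 of whom lack a lactate value.
records = [
    {"age": 50 + i,
     "lactate": None if i < 2 else 1.5,
     "admitfrom": ["ER", "OR", "floor"][i % 3]}
    for i in range(10)
]

kept = drop_sparse(records, ["age", "lactate"])   # lactate is 20% missing
indicators = dummy_code(records, "admitfrom")     # "ER" is the reference level
```

With three admission sources, `dummy_code` returns two indicator columns, which avoids the redundancy of coding every level.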
Since GAs were originally developed for the selection of genes, the terms "gene" and "chromosome" are widely used in the field of bioinformatics. However, the use of these terms in this manner might be confusing in the present study. Herein, I clarify that the GA was employed to search for important clinical variables related to mortality: one clinical variable was regarded as a "gene" and a group of clinical variables (genes) was regarded as a "chromosome". The chromosome size was 15 in the search for candidate predictors. The whole process of GA evolution is shown in Fig. 1. A neural network with one hidden layer of six units was used as the classification method. The terms evolution epochs and chromosomes refer to different things. In one evolution epoch, hundreds of models can be developed to form the chromosome pool, and I selected the one with the best fitness value. A maximum of 200 solutions was used, indicating that a total of 200 independent evolutions/cycles would take place. Studies had reported that the area under the curve for ARDS mortality prediction ranged between 0.67 and 0.74 (Damluji et al., 2011;Klinzing et al., 2015;Zhao et al., 2017). Since I hypothesized that the prediction accuracy could be improved by using the GA search, the fitness goal was set to 0.77. Furthermore, the fitness goal was chosen so that most evolution epochs could reach the goal, but not too quickly with a small number of epochs/generations (e.g., a number of 200 epochs was used in the study). The area under the curve (AUC) fitness goal was set by trying several iterations. The GA search was performed in the training dataset. The study employed the GALGO package (version 1.4) in R to perform the GA search (Trevino & Falciani, 2006).
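To make the vocabulary concrete, the following toy Python sketch evolves chromosomes of 15 variables drawn from 88 candidates until a fitness goal of 0.77 is reached. It is not the GALGO implementation: the fitness function is a hypothetical stand-in for the AUC of a neural network, the set of "informative" variables is invented, and the population size, the generation limit and the crossover and mutation operators are illustrative assumptions.

```python
import random

N_VARIABLES = 88       # candidate predictors, as in the text
CHROMOSOME_SIZE = 15   # variables per chromosome, as in the text
FITNESS_GOAL = 0.77    # AUC goal, as in the text
INFORMATIVE = set(range(7))  # hypothetical: pretend variables 0-6 carry the signal

def toy_fitness(chromosome):
    """Hypothetical stand-in for the AUC of a classifier trained on these variables."""
    hits = len(INFORMATIVE & set(chromosome))
    return 0.5 + 0.3 * hits / len(INFORMATIVE)

def random_chromosome(rng):
    return rng.sample(range(N_VARIABLES), CHROMOSOME_SIZE)

def mutate(chromosome, rng):
    """Swap one gene for a variable that is not yet on the chromosome."""
    child = list(chromosome)
    unused = [v for v in range(N_VARIABLES) if v not in child]
    child[rng.randrange(CHROMOSOME_SIZE)] = rng.choice(unused)
    return child

def crossover(a, b, rng):
    """The child draws its genes from the union of both parents."""
    pool = list(dict.fromkeys(a + b))  # de-duplicate while keeping order
    return rng.sample(pool, CHROMOSOME_SIZE)

def evolve(rng, pop_size=30, max_generations=200):
    """One evolution: (best chromosome, generations used), or (None, limit) for 'no solution'."""
    population = [random_chromosome(rng) for _ in range(pop_size)]
    for generation in range(1, max_generations + 1):
        population.sort(key=toy_fitness, reverse=True)
        if toy_fitness(population[0]) >= FITNESS_GOAL:
            return population[0], generation
        parents = population[: pop_size // 2]  # the fitter half survives
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return None, max_generations

def run_evolutions(n_evolutions, seed=0):
    """Collect the best chromosome of every evolution that reached the goal."""
    rng = random.Random(seed)
    results = [evolve(rng) for _ in range(n_evolutions)]
    return [best for best, _ in results if best is not None]
```

Each call to `evolve` corresponds to one independent evolution, and `run_evolutions` gathers the winning chromosomes whose gene frequencies can then be tabulated.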

Developing a representative model
The initial search identified 200 chromosomes that were the best ones in their respective evolution cycles (i.e., a total of 200 GA evolution cycles were performed, and each cycle resulted in one best-fit model with AUC >0.77 if the fitness goal of 0.77 was reached). Although these models all reached the fitness goal, it was not clear which one should be chosen for developing a classifier. Thus, it was reasonable to develop a representative model. The frequency of genes in the population of chromosomes was used as a criterion for inclusion in a pre-selection procedure. I chose the model with the smallest number of covariates, as long as its fitness was within 99% of the maximum fitness value. Alternative models with high classification accuracy were also stored in the Galgo object for reference.
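The selection rule described above can be sketched in a few lines of Python: variables are ranked by how often they occur in the stored chromosomes, nested models are built from the top-ranked variables, and the smallest model within 99% of the maximum fitness is retained. This is a minimal illustration rather than the GALGO procedure itself; the chromosomes and the fitness values are hypothetical, and in the study the fitness would come from the neural network classifier.

```python
from collections import Counter

def rank_by_frequency(chromosomes):
    """Order variables by how often they appear across the stored chromosomes."""
    counts = Counter(gene for chrom in chromosomes for gene in chrom)
    return [gene for gene, _ in counts.most_common()]

def forward_selection(ranked, fitness):
    """Fitness of the nested models built from the top 1, 2, ... ranked variables."""
    return [(k, fitness(ranked[:k])) for k in range(1, len(ranked) + 1)]

def representative_model(ranked, fitness, tolerance=0.99):
    """Smallest nested model whose fitness is within `tolerance` of the maximum."""
    curve = forward_selection(ranked, fitness)
    best = max(f for _, f in curve)
    size = min(k for k, f in curve if f >= tolerance * best)
    return ranked[:size], best

# Hypothetical chromosomes (short for readability) and fitness by model size.
chromosomes = [
    ["age", "fio2", "aids"],
    ["age", "fio2", "tumor"],
    ["age", "albuml", "pip"],
]
FITNESS_BY_SIZE = {1: 0.70, 2: 0.78, 3: 0.795, 4: 0.80, 5: 0.80, 6: 0.79}

def toy_fitness(variables):
    """Hypothetical stand-in for the test-set fitness of a trained classifier."""
    return FITNESS_BY_SIZE[len(variables)]

ranked = rank_by_frequency(chromosomes)
model, best_fitness = representative_model(ranked, toy_fitness)
```

In this toy example the maximum fitness is reached with four variables, but the three-variable model already lies within 99% of it and is therefore the one returned.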

Model validation and comparison with other prediction models
Subjects from the OMEGA trial were used for model validation. The AUC of the model was computed to show the diagnostic performance of the model. Furthermore, I compared the neural network model with the APACHE III score and with a logistic regression model developed by a stepwise procedure. The APACHE III score was used because it is a widely used prediction score for unselected ICU patients (Knaus et al., 1991). I hypothesized that the GA/NN model would be better than the APACHE III score. Since the logistic regression model is the most widely used statistical tool in predictive analytics in clinical research, the GA model was also compared with the logistic regression model. The DeLong method was used to compare the difference between two receiver operating characteristic (ROC) curves (DeLong, DeLong & Clarke-Pearson, 1988;Robin et al., 2011). A two-tailed p value less than 0.05 was considered to be statistically significant.
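For readers unfamiliar with the comparison of two ROC curves measured on the same patients, the following Python sketch computes the AUC and a two-sided DeLong test from the placement values of DeLong, DeLong & Clarke-Pearson (1988). It is a didactic, quadratic-time re-implementation written for this illustration, not the pROC code used in R; the function names and the example scores are hypothetical.

```python
import math
from statistics import NormalDist

def _psi(x, y):
    """Kernel of the Mann-Whitney statistic: 1 if x > y, 0.5 if tied, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0

def _components(scores, labels):
    """AUC and DeLong placement values for one model (label 1 = died)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    v10 = [sum(_psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(_psi(p, q) for p in pos) / len(pos) for q in neg]
    return sum(v10) / len(pos), v10, v01

def auc(scores, labels):
    return _components(scores, labels)[0]

def _cov(u, mu_u, v, mu_v):
    """Sample covariance of two placement-value vectors with known means."""
    return sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v)) / (len(u) - 1)

def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test for two correlated AUCs; returns (auc_a, auc_b, z, p)."""
    auc_a, v10_a, v01_a = _components(scores_a, labels)
    auc_b, v10_b, v01_b = _components(scores_b, labels)
    m, n = len(v10_a), len(v01_a)
    var = (
        (_cov(v10_a, auc_a, v10_a, auc_a) + _cov(v10_b, auc_b, v10_b, auc_b)
         - 2 * _cov(v10_a, auc_a, v10_b, auc_b)) / m
        + (_cov(v01_a, auc_a, v01_a, auc_a) + _cov(v01_b, auc_b, v01_b, auc_b)
           - 2 * _cov(v01_a, auc_a, v01_b, auc_b)) / n
    )
    if var <= 0:  # identical placement values: the test is undefined
        return auc_a, auc_b, float("nan"), float("nan")
    z = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, z, 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical predicted risks from two models for six patients (1 = died).
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
model_b = [0.6, 0.3, 0.5, 0.7, 0.4, 0.2]
result = delong_test(model_a, model_b, labels)
```

Because both models are scored on the same patients, the covariance terms matter: ignoring them would treat the two AUCs as independent and misstate the variance of their difference.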

GA search
The GA search identified seven important variables associated with mortality (Fig. 2). These variables were age, AIDS, leukemia, metastatic tumor, hepatic failure, lowest albumin, and FiO2. Figure 2A shows the frequency of each variable (gene) presented in the stored chromosomes. The top 50 variables were colored, and the top seven variables were named. Figure 2B displays the stability of the rank of the top 50 variables. It appears that the top four variables stabilized quickly. The red colored variables such as albumin, immunodeficiency, residence prior to admission and chronic dialysis stabilized after approximately 100 epochs/generations. At the right side of Fig. 2B, variables had many changes in rank (i.e., there were different colors under their names). These variables were considered to be unstable. Perhaps the low ranked "gray" variables require thousands of evolutions to be stabilized. Since they were not important for mortality prediction, I did not run thousands of cycles for them to be stabilized. Figure 2C shows the distribution of the number of generations required for an evolution to achieve the fitness goal.

Developing a representative model
A representative model was selected by using the forward selection method (Fig. 3). The criterion for choosing a model was that it consisted of the smallest number of covariates, as long as its fitness value was within 99% of the maximum fitness value. The selection was done by evaluating the test error using the fitness function in all test sets. The figure shows 14 models with the best predictive accuracy. The model labelled 8, containing the 24 most frequent variables, was the best model in terms of accuracy. The other 13 models displayed in the figure are within 99% of the maximum fitness value. Model 8, which included variables such as immunodeficiency, metastatic tumor, hepatic failure, residing at home independently, FiO2, chronic dialysis, ventilation mode, albumin, age, highest glucose, highest bilirubin, minute ventilation volume (i.e., the product of tidal volume and respiratory rate) and admission source (i.e., the location where the patient was immediately prior to ICU admission), showed the highest fitness value and was selected as the representative model. The neural network model was trained with these variables. The hyperparameter tuning is shown in Fig. 4. The selected variables were compared between survivors and non-survivors by univariable analysis in Table 1. The results showed that most of these variables were significantly different between survivors and non-survivors (p < 0.05). Figure 5 shows the importance of the variables in the neural network model: age was the most important variable, followed by creatinine kinase, hematocrit and so on. Table 2 shows the result of the logistic regression model, which showed that most variables were independently associated with mortality.
Figure 2 (caption, beginning not recovered) It appeared that the top four variables stabilized quicker. (C) shows the distribution of the number of generations required for an evolution to achieve the fitness goal. If an evolution epoch cannot reach the fitness goal of AUC = 0.77, the iteration is considered as "no solution" and the current iteration stopped. The training sample was split into the training and test sets in 2:1 ratio. Annotations: aids: acquired immunodeficiency syndrome; tumor: metastatic tumor; leuk: leukemia; hepa: hepatic failure; bilih: highest bilirubin; fio2: Fraction of Inspired Oxygen; albuml: lowest albumin; immune: immunodeficiency; hcth: highest value of hematocrit; reside: residence prior to admission; admitfrom: admission source; gluch: highest glucose; pip: peak inspiratory pressure on day 0; resp: respiratory rate on day 0; sodiumh: highest sodium value. Full-size DOI: 10.7717/peerj.7719/fig-2

External validation of the model
Figure 3 Forward selection using the most frequent variables. Horizontal axis represents the variables ordered by their rank. Vertical axis shows the classification accuracy. Solid line represents the overall misclassification (misclassified samples divided by the total number of samples). Colored dashed lines represent the accuracy per class. One model resulted from the selection whose fitness value is maximum (black thick line), but 9 models were finally reported because they were very similar in absolute value. Legend: (745), alive (540), die (205), average (2).
Fig. 6). The results showed that the most important predictors of mortality included immunodeficiency, metastatic tumor, hepatic failure, residing at home independently, FiO2, chronic dialysis, ventilation mode, albumin, age, highest glucose, highest bilirubin, minute ventilation volume and admit source. The model showed a significantly higher predictive performance than APACHE III scoring. Although the discrimination of the neural network model was higher than that developed by the logistic regression model, the difference was not statistically significant. In real clinical practice, the model can be used to stratify patients into risk subgroups. Furthermore, the variables used in the model were obtained within 24 h after ICU admission, which is fast enough to allow adequate time for interventions to take effect. Mortality prediction in patients with ARDS has been extensively investigated in the literature. The APPS score incorporated the variables of age, plateau pressure and arterial oxygen partial pressure to fractional inspired oxygen ratio (PaO2/FiO2) (Villar et al., 2016). The score was a 9-point scale, in which a value greater than 7 had a mortality rate of 83.3% and a value below 5 had a mortality rate of 14.5% (p < 0.001). While the AUC was 0.755 in the original cohort, it was 0.62 in an independent cohort (Bos et al., 2016).
Both peak and plateau inspiratory pressures were selected as important variables in predicting ARDS mortality in the stepwise regression model in the current study. Consistent with our findings, Panico et al. (2015) also showed that peak airway pressure (OR: 1.13; 95% CI [1.03-1.25]), rather than plateau pressure, was associated with mortality in a multivariable logistic regression model. Similar results were replicated in other studies (Erickson et al., 2007;Walkey & Wiener, 2011;Patel et al., 2016). Interestingly, Zhao and colleagues combined surfactant protein D (SP-D), interleukin-8, age and APACHE III score for the prediction of ARDS mortality, and reported a diagnostic performance comparable to that of our study. The addition of novel biomarkers significantly increased the predictive performance compared to a model incorporating simple clinical variables (Zhao et al., 2017). I propose that the major drawback of their study was that mechanical ventilation variables were not included. Since ARDS patients are primarily characterized by pulmonary dysfunction, parameters of mechanical ventilation, such as peak inspiratory pressure, driving pressure and tidal volume, must play an important role (Villar et al., 2017). The patient type also plays an important role in determining ARDS mortality. In the present study, I found that living independently at home was associated with a lower risk of mortality. Patients residing in an intermediate care facility had worse outcomes than those who resided independently at home when they developed ARDS. This is not unique to ARDS, but has been reported in various conditions, such as ischemic colitis and pulmonary conditions (Peixoto et al., 2017;Duarte et al., 2017). However, most prediction models for ARDS failed to incorporate this factor (Frenzel et al., 2011;Zhang & Ni, 2015;Balzer et al., 2016;Luo et al., 2017), which might be responsible for their lack of satisfactory accuracy.
Furthermore, the admission source (e.g., admission from the operating room, emergency room, floor, step-down unit or another hospital) was also an important predictor of mortality. There was evidence showing that patients admitted from the emergency room had less ventilator-associated lung injury than those admitted from other sources (Choudhuri, Chakravarty & Uppal, 2017). Furthermore, the mortality rates for overall ICU patients were quite different across various admission sources (Valentini et al., 2013). More recently, diffuse alveolar damage was found to be an important factor influencing clinical outcome (Kao et al., 2015;Cardinal-Fernández et al., 2017). However, this variable cannot be quantified routinely at the bedside, and thus the current analysis could not include it. Perhaps the inclusion of this pathological variable can further improve the diagnostic performance of the predictive model.
Several limitations need to be acknowledged. Firstly, the study was retrospective and observational in design, which has inherent limitations, such as selection bias, loss to follow-up and the presence of confounding factors. Further prospective studies are needed to evaluate the effectiveness of the prediction model in improving clinical outcomes. Secondly, the study employed only variables collected within the 24 h after ICU admission, failing to account for the dynamic process of disease progression. Dynamic indices have been shown to be superior to static indices in predicting clinical outcomes. In critical care medicine, such indices include stroke volume variation, lactate clearance rate and glucose variability (Pisarchik, Pochepen & Pisarchyk, 2012;Lee et al., 2015;Chao et al., 2017;Yi et al., 2017). It is not surprising that variables measured late in the disease course have better predictive performance than early ones. However, early predictions are more clinically useful than late ones, because an early prediction allows clinicians enough time to take action in order to reduce the mortality risk. There is a compromise between timeliness and accuracy, in that an improvement in accuracy comes at the cost of delay. Thirdly, artificial intelligence and machine learning are suitable for prediction, but not necessarily for clinical decision making. Rather, there are many barriers to the implementation of "black box" methods into the clinical workflow, which remain a relatively novel concept within medicine (Harrison et al., 2017). It is necessary to pilot prospective implementation studies of a tool based on this system in the critical care setting for patients with ARDS. This is a difficult task. However, the lack of implementation research is a major limiting factor for models such as this one, and the move from the realm of biomedical research to widespread use and application as clinical decision-making tools is challenging.

CONCLUSION
In conclusion, the current study developed and validated a neural network model using a GA for the prediction of mortality in patients with ARDS. The most important predictors of mortality were age, AIDS, leukemia, metastatic tumor, hepatic failure, lowest albumin, and FiO2. The external validation of the model showed that the AUC was 0.821, which was greater than that of the APACHE III score and that of the logistic regression model, although the latter difference was not statistically significant.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
The study was funded by the Zhejiang Engineering Research Center of Intelligent Medicine (2016E10011) from the First Affiliated Hospital of Wenzhou Medical University, the public welfare research project of Zhejiang province (LGF18H150005), the National Natural Science Foundation of China (Grant No. 81901929) and the Scientific Research Project of Zhejiang Education Commission (Y201737841). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Grant Disclosures
The following grant information was disclosed by the author: