Severity Index for Suspected Arbovirus (SISA): Machine learning for accurate prediction of hospitalization in subjects suspected of arboviral infection

Background
Dengue, chikungunya, and Zika are arboviruses of major global health concern. Decisions regarding the clinical management of suspected arboviral infection are challenging in resource-limited settings, particularly when deciding on patient hospitalization. The objective of this study was to determine whether hospitalization of individuals with suspected arboviral infections could be predicted using subject intake data.

Methodology/Principal findings
Two prediction models were developed using data from a surveillance study in Machala, a city in southern coastal Ecuador with a high burden of arboviral infections. Data were obtained from subjects who presented at sentinel medical centers with suspected arboviral infection (November 2013 to September 2017). The first prediction model, called the Severity Index for Suspected Arbovirus (SISA), used only demographic and symptom data. The second prediction model, called the Severity Index for Suspected Arbovirus with Laboratory (SISAL), incorporated laboratory data. These models were selected by comparing the prediction ability of seven machine learning algorithms; the area under the receiver operating characteristic curve from the prediction of a test dataset was used to select the final algorithm for each model. After eliminating subjects with missing data, the SISA dataset had 534 subjects and the SISAL dataset had 98 subjects. For SISA, the best prediction algorithm was the generalized boosting model, with an AUC of 0.91. For SISAL, the best prediction algorithm was the elastic net, with an AUC of 0.94. A sensitivity analysis revealed that SISA and SISAL are not directly comparable to one another.

Conclusions/Significance
Both SISA and SISAL were able to predict arbovirus hospitalization with a high degree of accuracy in our dataset. These algorithms will need to be tested and validated on new data from future patients. Machine learning is a powerful prediction tool and provides an excellent option for new management tools and clinical assessment of arboviral infection.

Introduction Undifferentiated febrile illness is a common clinical scenario in tropical medicine, with a long list of potential pathogens sharing similar symptoms. Arthropod-borne viruses (arboviruses), including dengue virus (DENV), chikungunya virus (CHIKV), and Zika virus (ZIKV), share common mosquito vectors (Aedes aegypti and Ae. albopictus) and often present with fever, rash, myalgias, and arthralgias. Dengue virus is endemic in the tropical Americas, and the emergence of CHIKV in 2013 in Saint Martin and ZIKV in 2015 in Brazil has brought these arboviruses to the forefront of international attention [1][2][3]. There were over 400,000 cases of dengue fever in Andean Latin America in 2013 [4], with transmission risk expected to increase sharply over the next 50 years [5]. Ecuador in particular has a high burden of arboviral illness, with 86,306 total cases of dengue from 2014-2018 [6][7][8]. There is also a high prevalence of asymptomatic DENV infections and infections with other arboviruses in coastal Ecuador [1,9]. In 2014, CHIKV was introduced to Ecuador, with 35,555 cases from 2014-2018, followed by the introduction of ZIKV in 2016, with 5,304 cases from 2016-2018 [6][7][8].
Clinical decision-making in the context of arboviral infection is particularly challenging in resource-limited settings such as Ecuador. For instance, there may be too few healthcare professionals relative to the high disease burden, which may impact the ability to provide optimal subject care. Ecuador has 22 physicians for every 10,000 people, though this ranges by province from 13 to 32 physicians per 10,000 people [10]. This exceeds the World Health Organization (WHO) benchmark of 1 physician per 1,000 people [11], but physicians are likely concentrated in urban areas. Moreover, molecular diagnostics are often unavailable outside of large urban centers. Of Ecuador's 4,168 healthcare establishments, 1,045 (25.1%) have a clinical laboratory [10], leaving many healthcare providers in Ecuador without crucial diagnostic tools (e.g. PCR or ELISA). These infrastructural limitations create a challenging clinical environment, especially as healthcare providers need to determine whether a patient with suspected arboviral illness should be hospitalized or not. Efficient and effective triaging is essential for good clinical care in resource-limited settings [12].
Patients with DENV, CHIKV, or ZIKV infections often present with similar symptoms. Fever, lethargy, and arthralgia are common [1,13] and acute febrile illness is a typical manifestation for many patients. In Latin America, DENV, CHIKV, and ZIKV are the three most common infections among acute febrile illness patients [14]. Moreover, co-infection is common if multiple viruses are circulating [13]. Current practice in Ecuador is to hospitalize subjects suspected of dengue infection when they exhibit any of the WHO 2009 dengue warning signs, any signs of shock, or severe thrombocytopenia [15]. While treatment for dengue is supportive, proper inpatient management of severe dengue can reduce mortality dramatically [16,17]. Dengue has a wide spectrum of clinical presentations, with the majority of patients recovering following a self-limited clinical course, and a small percentage progressing to severe disease characterized by plasma leakage. In the latter cases, prompt intravenous rehydration can reduce the case fatality rate to less than 1% [16]. Similarly, management of CHIKV and ZIKV infections is largely supportive, but both infections may result in potentially serious complications, such as adverse neonatal effects [18][19][20][21]. Deciding whether to hospitalize a subject with a suspected DENV, CHIKV, or ZIKV infection is thus an important clinical decision, which often must be made before a clear diagnosis has been determined. This decision has other non-clinical and indirect consequences, including the utilization of hospital resources that could otherwise be used for other patients, as well as increasing the financial cost of the case when compared to less costly outpatient care. Globally, an estimated 18% of dengue cases are admitted to the hospital, with 48% managed as outpatients and 34% not seeking medical attention [22]. The average cost to manage a case of dengue is tripled if the patient is hospitalized [16,22].
Machine learning is a tool that combines statistics with computer science to make efficient use of massive datasets [23]. It differs from traditional statistical modeling (e.g. regression models) in that there are fewer assumptions about the underlying distribution of the data and the relationships between variables. While model interpretability is often a goal of traditional statistical models, it is typically secondary in machine learning, where the primary goal is to create highly accurate predictions of an outcome of interest, often using as many variables as possible [24,25]. In modeling relationships with a machine learning approach, the computer incorporates connections not obvious to human beings to successfully predict an outcome of interest. Machine learning is applicable in many fields and has previously been used in medical applications to estimate clinical risk, guide triage, or diagnose disease [23,[26][27][28]. Clinical applications of machine learning for arboviral illnesses, specifically, have included analysis of patient genomes for dengue prognosis [29], scanning of patient sera for DENV [30] or Zika diagnosis [31], thermal image scanning for detection of hemodynamic shock [32], analysis of body temperature patterns for diagnosis of undifferentiated fever etiology [33], and analysis of patient data for dengue fever diagnosis [27]. No studies have yet attempted to use machine learning to predict hospitalization among arboviral illness or undifferentiated fever patients, although it has been used to predict critical care and hospitalization outcomes based on emergency department triage data in children and adults [34,35].
The objective of this study was to determine whether the hospitalization of individuals with suspected arboviral infections could be predicted using subject intake data. This information (i.e. initial clinical details and no diagnostic testing data) replicates the information available to clinicians deciding whether to hospitalize a patient. In this study, we take a retrospective view of arboviral infection management in a tropical city in southern coastal Ecuador using data from an ongoing prospective surveillance study. Using actual clinical practice as a guide, we assessed the ability of seven machine learning algorithms to predict hospitalization using basic symptom and demographic data that were collected via standard intake of subjects with suspected DENV, CHIKV, or ZIKV infections. The machine learning approach and algorithms developed here could potentially support physicians faced with complex clinical management decisions in areas where multiple arboviruses co-circulate, such as Ecuador.

Ethics statement
This study protocol was reviewed and approved by Institutional Review Boards at the State University of New York (SUNY) Upstate Medical University, Cornell University, and the Luis Vernaza Hospital in Guayaquil, Ecuador, the Human Research Protection Office of the U.S. Department of Defense, and the Ecuadorean Ministry of Health (MoH). Clinical and demographic data from study subjects was obtained following written informed consent, and/or assent (as applicable) as per the study protocol (described previously) [1]. For those subjects unable to participate in the consent and/or assent process, an adult representative documented consent. Parents signed a written informed consent for children aged 6 months to 17 years, and children aged 7 to 17 additionally signed a written assent.

Study design and data source
We conducted a retrospective analysis of data from a prospective arbovirus surveillance study, which included subjects (age ≥6 months) recruited from Ecuadorean MoH clinical sites from November 2013 to September 2017 in the city of Machala, Ecuador. Subjects were identified as part of an ongoing, multi-year arbovirus surveillance project, a description of which has been published previously [1]. Briefly, subjects were invited to enroll in the study if they presented at the reference hospital or one of four outpatient clinics and were diagnosed with arboviral infection by MoH physicians. In 2014 and 2015, we recruited subjects who were clinically diagnosed with dengue fever by MoH physicians based on their individual clinical suspicion for DENV infection. We assume that diagnostic standards for each respective infection were similar across study sites, as all physicians receive the same training from the MoH. Following the local emergence of CHIKV (2015) and ZIKV (2016), the inclusion criterion in 2016 and 2017 was expanded to include subjects clinically diagnosed with DENV, CHIKV, or ZIKV infection. At the time of enrollment, subject demographic information, clinical history, and symptoms present during the current illness were collected using a questionnaire administered by trained study personnel. Subjects were asked about symptoms in the past 7 days, including the following: headache, anorexia or nausea, muscle or joint pain, rash, bleeding (defined as bleeding from respiratory, digestive, or genitourinary mucosa), rhinorrhea, vomiting, lethargy or drowsiness, cough, abdominal pain, diarrhea, and retro-orbital pain. Conjunctivitis was later added to the enrollment survey after the emergence of ZIKV but was not included in this analysis.
Laboratory data (hematocrit, white blood cell count, neutrophils, lymphocytes, and platelet count) were collected at the time of enrollment from copies of recent laboratory evaluations (for outpatients) or from the first labs on admission to the hospital (for hospitalized subjects). Additional laboratory data were available for hospitalized subjects, but analysis was limited to the aforementioned parameters, as these were consistently available among a subset of the non-hospitalized subjects. Laboratory arboviral diagnostic data were not included, as these data are often not available at the time that a physician decides whether or not to hospitalize a patient, and we utilized only the data that would realistically be available. Data from enrollment surveys were used for the analysis of the non-hospitalized outpatients in the current study. Laboratory data on the hospitalized subjects were verified by review of medical records and managed using REDCap software [36] hosted at SUNY Upstate Medical University.

Exclusion criteria
Hospitalized subjects whose physical medical records could not be located and subjects with incomplete enrollment survey data (i.e. missing hospitalization status, symptom survey questions) were excluded. The subset of the non-hospitalized subjects who had available laboratory data were included in a second analysis with the same hospitalized cohort, all of whom had available laboratory data.

Statistical analysis
The outcome variable was hospitalization status. Variables of interest included demographic data, presenting symptoms, past medical history, and laboratory data (hematocrit, white blood cell count, neutrophils, lymphocytes, and platelet count). A prediction algorithm was developed using demographic and symptom data only (28 total predictors), called the Severity Index for Suspected Arbovirus (SISA, in Spanish the Severidad de Infecciones Sospechosas por Arbovirus). A second prediction algorithm was developed using demographic, symptom, and laboratory data (33 total predictors), called the Severity Index for Suspected Arbovirus with Laboratory (SISAL, in Spanish the Severidad de Infecciones Sospechosas por Arbovirus con datos del Laboratorio). Characteristics for hospitalized and non-hospitalized subjects among these subject groups were compared using a two-sample t-test (continuous) or Fisher's exact test (categorical).
In machine learning, 10-fold cross-validation with holdout data results in an unbiased estimate of model validity and accuracy [24,37]. Thus, our datasets were divided by random sampling into training and testing (holdout) datasets. For SISA, the training set was 85% of the full dataset and the testing set was the remaining 15%. For SISAL, the training set was 70% of the full dataset and the testing set was the remaining 30% (the SISAL dataset was given a larger testing fraction to ensure a sufficient sample size in the testing set). With the training dataset, we used repeated 10-fold cross-validation to estimate the ability of six algorithms with diverse statistical approaches to predict hospitalization: bagged trees (bags) [38], k nearest neighbors regression (knn) [39], random forest [40], elastic net regression [41], generalized boosting models (gbm) [42], and neural networks [43]. Because we had no prior assumptions about the nature of the relationship between the available predictors and the outcome, we used a variety of statistical approaches to improve the likelihood of finding an algorithm that works well with these data. Following a published criticism of machine learning prediction compared to logistic regression [44], we added logistic regression to our list of algorithms to test (seven total algorithms; model descriptions in S1 Table). For models with tuning parameters (knn, random forest, elastic net, and gbm), tuning was performed using another layer of repeated 10-fold cross-validation [45]. The final model for each algorithm was created based on all training data. Each algorithm was then used to predict hospitalization outcomes within the holdout testing dataset, resulting in the final performance measure for the model. Model predictions were probabilities that the given subject would be hospitalized or not, with 0.5 used as the probability cut-off for hospitalization classification.
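The selection procedure above (train/holdout split, repeated 10-fold cross-validation, AUC-based comparison of candidate algorithms) can be sketched in Python with scikit-learn. The study itself used R with the caret package, so the library, the reduced model set, the repetition count, and the synthetic data below are illustrative assumptions only, not the authors' code.

```python
# Sketch of the model-comparison workflow with scikit-learn (synthetic data).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (RepeatedStratifiedKFold, cross_val_score,
                                     train_test_split)

# Synthetic stand-in for the SISA data: 534 subjects, 28 predictors,
# roughly 11% hospitalized (59/534)
X, y = make_classification(n_samples=534, n_features=28,
                           weights=[0.89], random_state=0)

# 85%/15% train/holdout split, as used for SISA
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.15,
                                          stratify=y, random_state=0)

# Repeated 10-fold CV on the training set (repeats reduced here for brevity)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)
models = {
    "gbm": GradientBoostingClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "logistic": LogisticRegression(max_iter=1000),
}
mean_auc = {}
for name, model in models.items():
    scores = cross_val_score(model, X_tr, y_tr, cv=cv, scoring="roc_auc")
    mean_auc[name] = scores.mean()
    print(f"{name}: mean CV AUC = {mean_auc[name]:.3f}")
```

The best candidate by mean cross-validated AUC would then be refit on all training data and scored once against the holdout set, mirroring the final-performance step described above.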
Measures of discrimination [46], including accuracy, Cohen's kappa, and area under the curve (AUC) for the receiver operating characteristic (ROC), were calculated to determine the performance of each algorithm. Each algorithm's classification predictions were compared to the true outcomes in the data. Accuracy is the percentage of correct classifications out of all classifications made; Cohen's kappa (potential values: -1 to 1) also reflects this percentage but compares each algorithm's performance to classifications made by random guessing [47]. The AUC (potential values: 0-1) considers both the true and false positive predictions, with a higher AUC indicating a high true positive rate and a low false positive rate (i.e. the algorithm is sensitive and specific) [46]. For each fold of the cross-validation, performance measures were calculated and averaged across all folds and repetitions (100 preliminary models for algorithms with no tuning parameters), resulting in a mean cross-validation performance measure that estimates how the algorithm will perform on a new dataset. The best algorithm for SISA/SISAL was chosen based on the highest AUC as calculated from the holdout test set. A flow chart of the entire approach is available in S1 Fig. Model residual plots were examined. The relative contribution of each variable to the model (i.e. variable influence or influence on prediction) was calculated using model- or non-model-specific methods as appropriate (see caret [39] documentation for details). Calibration plots provide a method to graphically evaluate the predictive ability of a prediction model [48]. Subjects in the holdout test set were separated into deciles (SISA) or quintiles (SISAL), and the mean predicted hospitalization probability and the proportion of actual hospitalizations were calculated for each decile/quintile.
These values were plotted to create calibration plots; the distance of the points from the diagonal (perfect prediction) shows whether the prediction model is over- or under-predicting among certain risk groups [48]. Data analysis and visualization were performed using SAS version 9.2 (SAS Institute, Cary, NC) and R version 3.2.2 (R Foundation for Statistical Computing, Vienna, Austria) in RStudio (RStudio, Inc., Boston, MA), including the packages haven [49], caret [39], MASS [43], ipred [38], randomForest [40], elasticnet [41], gbm [42], nnet [43], mgcv [50,51], kernlab [52], glmnet [53], and pROC [54]. Code for the machine learning analyses is available at https://github.com/rsippy/SISASISAL.
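The discrimination and calibration checks described above can be sketched as follows. This is a Python illustration (scikit-learn and pandas, not the study's R code) using simulated predictions; all variable names are hypothetical.

```python
# Discrimination metrics and a decile calibration table on simulated holdout
# predictions (the study used SAS/R; this sketch is illustrative only).
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, cohen_kappa_score, roc_auc_score

rng = np.random.default_rng(0)
pred_prob = rng.uniform(0, 1, size=500)   # model-predicted P(hospitalized)
observed = rng.binomial(1, pred_prob)     # simulated true outcomes

# Discrimination: accuracy, Cohen's kappa, and AUC, with 0.5 as class cut-off
pred_class = (pred_prob >= 0.5).astype(int)
acc = accuracy_score(observed, pred_class)
kappa = cohen_kappa_score(observed, pred_class)
auc = roc_auc_score(observed, pred_prob)
print(f"accuracy={acc:.2f} kappa={kappa:.2f} AUC={auc:.2f}")

# Calibration: bin subjects into deciles of predicted probability, then
# compare mean predicted probability with the observed hospitalization rate
df = pd.DataFrame({"pred": pred_prob, "obs": observed})
df["decile"] = pd.qcut(df["pred"], q=10, labels=False)
calib = df.groupby("decile").agg(mean_pred=("pred", "mean"),
                                 obs_rate=("obs", "mean"))
print(calib)
```

Plotting `obs_rate` against `mean_pred` gives the calibration plot: bins falling below the diagonal indicate over-prediction in that risk group, bins above it indicate under-prediction.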
We compared the prediction abilities of SISA versus SISAL to assess whether laboratory data could improve our ability to predict subject hospitalization status. Because there may be some selection bias for subjects with available laboratory data (e.g. more severe symptoms, more similar subject data, or different socioeconomic status compared to typical patients with clinical arboviral diagnosis), the subject groups in SISA and SISAL may not be exchangeable [55]. We performed a sensitivity analysis to determine if the selected algorithm and prediction ability of SISA is the same when using all SISA subjects or SISAL subjects (without laboratory data) for the training and testing steps.

General characteristics
Between November 20, 2013, and September 13, 2017, 592 subjects were recruited into the arboviral surveillance study. After exclusions (Fig 1), 534 subjects were included in the dataset for SISA, of which 59 were hospitalized and 475 were not hospitalized. The SISA training dataset included 455 subjects and the holdout test dataset included 79 subjects. The SISAL dataset included 98 subjects, of which 59 were hospitalized and 39 were outpatients. The SISAL training dataset included 70 subjects and the holdout test dataset included 28 subjects. Demographics and symptoms for the two datasets are in Table 1. Presenting temperature was significantly higher, and mucosal bleeding, vomiting, and abdominal pain were significantly more common, in hospitalized subjects in the SISA dataset.

Prediction of hospitalization status
Accuracy, Cohen's kappa, and AUC for the training set (from repeated 10-fold cross-validation) and the holdout test set (final performance) are shown in Fig 2. For SISA, using only symptoms and demographics, the generalized boosting model, elastic net, neural networks, and logistic regression performed well with the test set (accuracy: 89.8-96.2%, Cohen's kappa: 0.00-0.77, AUC: 0.50-0.91). The generalized boosting model had the best final AUC (0.91) in the test dataset and was the second-best algorithm in the training set. The sensitivity for this algorithm was 95.8%, and the specificity was 87.5%, when predicting hospitalization of subjects in the test dataset. The variables with the greatest influence on the final SISA model were drowsiness, bleeding, vomiting, and temperature. The calibration plot for this prediction is in Fig 3; the SISA model shows under-prediction of hospitalization risk among low-risk groups and over-prediction among high-risk groups. For SISAL, the elastic net algorithm had the best final AUC (0.94) in the test dataset. The variables with the greatest influence on the final SISAL model were drowsiness, orbital pain, and platelet count. The calibration plot for this prediction is in Fig 5; the SISAL model shows under-prediction of hospitalization risk among low-risk groups and over-prediction among high-risk groups.
The results for SISA when trained with the SISAL subjects (without laboratory data) are shown in S2 Fig. All models except k nearest neighbors and logistic regression performed well (test set accuracy: 53.5-92.9%, Cohen's kappa: 0.04-0.86, AUC: 0.51-0.94). The bagged trees, random forest, generalized boosting model, and elastic net algorithms had identical final AUC values (0.94). The sensitivity was 100% and the specificity was 88.2% when predicting hospitalization of subjects in the test dataset. If the SISA and SISAL subjects were exchangeable, we would expect the SISAL subject group (without laboratory data) to produce the same results as the SISA analysis. Because these results differ from those obtained in the SISA analysis, we conclude that the SISAL subjects are not exchangeable with the SISA subjects.

Discussion
Suspected arboviral infections impose large health and financial burdens on populations in which the diseases are endemic. In 2013, the estimated global cost of dengue illness was US $8-9 billion [22]. In many arbovirus endemic regions, DENV, CHIKV, and ZIKV infections are diagnosed based on clinical presentation and basic laboratory results, which can be difficult due to nonspecific symptoms and limited availability of definitive diagnostic tools [56]. In this study, we demonstrate that our machine learning algorithms were able to predict hospitalization status among our cohort of subjects with suspected arboviral illness with up to 96% accuracy using only symptom and demographic data. We thus describe the early development of a new tool, SISA/SISAL, which in the future may be utilized by clinicians in resource-limited settings when triaging subjects with suspected arboviral illness.

The final SISA model used the generalized boosting model. These models are also called stochastic gradient boosting or gradient boosting machines and were developed by Jerome Friedman [57,58]. This ensemble-type model is based on a sequentially built series of simple classification trees, and its final predictions are based on the collective ensemble of trees, with some trees weighted more heavily than others [59]. Generalized boosting models are particularly adept at handling hard-to-predict observations; the "boosting" component forces the model to improve these predictions (i.e. reduce prediction error) by building additional trees that focus on previously misclassified observations.
The final SISA model makes predictions from a weighted set of 150 weak (single-split) trees and included information from 23 of the original 28 predictors (all except history of hypertension, history of asthma, history of diabetes, and history of dengue), with drowsiness, bleeding, vomiting, and temperature providing most of the predictive information (i.e. the highest variable influence). Calibration plots showed that the SISA model under-predicted hospitalization risk among low-risk groups and over-predicted hospitalization among high-risk groups. Because the holdout test set for SISA was relatively small (n = 79), it is unclear if these prediction trends would hold in a larger validation set of subjects.
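As an illustration of how variable influence is read from a fitted boosting model, the following Python/scikit-learn sketch fits a 150-stump gradient boosting classifier on synthetic data and ranks feature importances. The study used R's gbm package; the feature names here are placeholders, not the study's predictors.

```python
# Fit a boosting model of 150 single-split trees and rank variable influence.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=534, n_features=28, random_state=0)
names = [f"predictor_{i}" for i in range(28)]  # placeholder names

# max_depth=1 builds single-split "stump" trees, mirroring the 150 weak
# trees described for the final SISA model
gbm = GradientBoostingClassifier(n_estimators=150, max_depth=1,
                                 random_state=0).fit(X, y)

# Relative variable influence, ranked from most to least important;
# scikit-learn normalizes importances to sum to 1
influence = sorted(zip(names, gbm.feature_importances_),
                   key=lambda t: -t[1])
for name, imp in influence[:4]:
    print(f"{name}: {imp:.3f}")
```

Predictors with an importance of zero are effectively dropped from the ensemble, which is how a 28-predictor model can end up using only 23 of them.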
The final SISAL model was an elastic net regression. This type of regression applies a penalty (controlled by the parameters alpha and lambda) to the regression coefficients, resulting in some coefficients being set to zero (i.e. eliminated) and others being "shrunk" (i.e. reduced in magnitude), particularly coefficients from highly correlated predictors [60]. The final SISAL model was an elastic net regression with an alpha of 0.5 and a lambda of 0.25 and included information from three of the original 33 predictors (drowsiness, retro-orbital pain, and platelet count). The SISAL model showed the same under- and over-prediction trends as SISA, and like SISA, the holdout test set was small (n = 28). The prediction trends of SISAL should be assessed with a larger validation set to determine if there are prediction weaknesses for the model among specific patient groups.
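The sparsity-inducing behavior described above can be sketched with scikit-learn's elastic-net-penalized logistic regression. The study used R's elasticnet/glmnet; in this sketch, `l1_ratio` plays the role of the mixing parameter alpha, `C` is the inverse of the penalty strength lambda, and the data are synthetic.

```python
# Elastic-net-penalized logistic regression: the L1 part of the penalty
# zeroes out some coefficients entirely, leaving a sparse model (SISAL
# retained only 3 of 33 predictors).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the SISAL data: 98 subjects, 33 predictors
X, y = make_classification(n_samples=98, n_features=33, random_state=0)

enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=1.0, max_iter=5000).fit(X, y)

n_kept = int(np.sum(enet.coef_ != 0))
print(f"{n_kept} of 33 predictors retained")
```

How many predictors survive depends on the penalty strength; tuning lambda (here, `C`) by cross-validation is what drove the final SISAL model down to three predictors.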
In our cohort, we found that hospitalized cases had statistically significant, though clinically insignificant, elevations in temperature at presentation in both SISA and SISAL. This demonstrates an algorithm's ability to make use of small differences in data. In the SISA dataset, mucosal bleeding, vomiting, and abdominal pain were more common in hospitalized subjects than in outpatients. In the SISAL dataset, while hospitalized subjects experienced more mucosal bleeding and vomiting than outpatients, the presence of abdominal pain did not differ between groups. This could suggest that the outpatient subjects who were sent for laboratory testing represented cases of serious concern, as abdominal pain may qualify those with suspected or confirmed dengue for hospital admission [15,16]. For SISAL, hospitalized subjects had lower hematocrit and platelet counts when compared to non-hospitalized subjects; lower platelet counts are to be expected in hospitalized dengue cases.
Our sensitivity analysis revealed that SISA produced different results when its training/testing dataset was restricted to those subjects with laboratory data available. This result is unsurprising, as we expect that selection bias contributed to the subjects available for the SISA and SISAL datasets. All hospitalized subjects had laboratory data available, and we would additionally expect that subjects with laboratory data had some signs or symptoms that prompted the attending physician to request laboratory diagnostics, setting them apart from subjects without laboratory data. These signs and symptoms are also likely linked to whether subjects were eventually hospitalized or not, meaning these groups of subjects are not directly comparable to one another. When we used the SISA approach (symptoms and demographics only) for a dataset comprising the SISAL group of subjects (without laboratory data), we found that the AUC was identical to the AUC from the SISAL approach. This suggests that we were unable to improve our prediction of hospitalization status by using subject laboratory data, though a study implementing the use of laboratory tests among a general population could potentially find that laboratory tests provide an added benefit for prediction of hospitalization status. In our dataset, the AUC was higher for the SISAL group of subjects, but these improvements are likely due to fundamental differences between the SISA/SISAL groups of subjects. These patient groups should continue to be analyzed with separate algorithms.

This is the first use of machine learning to predict hospitalization status of subjects with clinically diagnosed arboviral infections. Our models exhibit high accuracy, sensitivity, and specificity in a region with a high burden of co-circulating DENV, CHIKV, and ZIKV.
These algorithms, particularly SISA, use information that could easily be obtained in resource-limited settings, suggesting the potential to develop a useful tool for clinicians. Our model's accuracy is consistent with tools previously reported in the literature. Past predictive modeling of disease with a machine learning approach has been efficacious in the diagnosis of pneumonia (95% sensitivity, 96.5% specificity), dengue (70% sensitivity, 80% specificity), hepatitis (96% accuracy), and tuberculosis (95% accuracy) using clinical and laboratory parameters [27,[61][62][63].
There has been criticism regarding the use of machine learning in prediction models. A recent systematic review found that machine learning predictions had no advantage over logistic regression predictions, on average [44]. Christodoulou et al. do an excellent job of outlining some common missteps in the use of machine learning for prediction and the somewhat alarming lack of transparency in many published machine learning prediction models. We agree with many of the assertions made by the authors and strive to improve reporting and validation in our own work accordingly. However, in this specific study, we did not find that logistic regression performed better than other algorithms. Our overall approach differs from that of most machine learning papers in that we did not assume that one particular algorithm would have superior prediction abilities for our data. We rigorously compared multiple algorithms with the goal of finding an algorithm that functions well with our predictors and outcome of interest, to be further validated with a new dataset in future research. We have no illusions about the potential lack of generalizability of our data and caution against any strong conclusions about the future utility of SISA/SISAL in predicting hospitalization status for future patients. In the current study, we present preliminary yet promising results in the development of a future tool that will need additional, rigorous validation using future sets of subject data before use in the real world.
Numerous studies have looked at clinical and laboratory findings specific to certain arbovirus diagnoses, yet few have proposed tools that can aid in the management of unconfirmed febrile illness [64][65][66][67]. A study in Puerto Rico of acute febrile illness emergency room cases found the tourniquet test and leukopenia to be predictive of dengue diagnosis, yet dengue was confirmed in only 11% of their total 284 cases [68]. In Thailand, fever, a positive tourniquet test, and leukopenia differentiated confirmed dengue from other febrile illnesses initially suspected as dengue [69]. Also in Thailand, among a sample of 172 children with acute fever without obvious cause, those with dengue had several laboratory parameters that differentiated them from other febrile illnesses [56]. While these studies were able to distinguish dengue from other acute febrile illnesses, they highlight the large proportion of cases that do not get a confirmed diagnosis, and most studies have not moved beyond initial reports to demonstrate predictive abilities. With SISA/SISAL, the approach is more empirical. Clinical diagnosis of DENV, CHIKV, or ZIKV infection was a starting point for the machine learning used here. Given that timely laboratory diagnostics may not be available, grouping these suspected subjects reflects the reality that physicians face in the clinic in arbovirus-endemic regions. That such a model can accurately predict hospitalization outcome suggests that SISA/SISAL could be expanded to undifferentiated febrile illness. The ability of machine learning models to predict hospital admission outcomes using only emergency department triage data lends support to expanding our approach to undifferentiated fever [34,35]. Of the suspected arboviral cases analyzed here, approximately 54% were confirmed as acute or recent DENV infection, 17% had acute CHIKV infection, and 29% were negative for DENV, CHIKV, or ZIKV (based on analysis of subjects in 2014 and 2015) [1].
Results for the 2016 and 2017 subject samples are pending, but preliminary PCR testing suggests a predominance of CHIKV in 2016 and ZIKV in 2017.
Clinicians rely on tools to help make decisions about patient management, and simple tools can benefit physicians in resource-limited settings [70,71]. Smartphones are commonly used in Ecuador, and mobile health tools are a practical option for physicians; several popular apps, such as MDCalc, include various triage rules and scores [72]. After further development and validation of our algorithmic approach, and evaluation of its potential benefit in the clinic, we envision incorporating it into a user-friendly mobile application to aid in the decision to hospitalize patients with undifferentiated fever.

Limitations
The variables with the greatest influence on the final SISA model were drowsiness, temperature, and nausea; for the SISAL model they were drowsiness, orbital pain, and platelet count. An important caveat inherent to the nature of machine learning is that the exact weight of each variable in the final prediction model is difficult to assess and interpret, thus we cannot propose a causal relationship or correlation between these variables and our outcome of hospitalization.
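Variable influence of the kind reported above can be estimated in a model-agnostic way with permutation importance, which measures how much held-out performance drops when one feature's values are shuffled. The sketch below uses Python/scikit-learn with synthetic data and is only one possible approach, not necessarily the procedure used in this study; as noted above, such rankings describe influence on the prediction, not causation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for subject data with a handful of informative predictors
X, y = make_classification(n_samples=534, n_features=6, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure the drop in held-out AUC;
# larger drops indicate greater influence on the model's predictions.
result = permutation_importance(model, X_test, y_test, scoring="roc_auc",
                                n_repeats=20, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
print("features ranked by influence:", ranking)
```

Because the importance is computed by degrading inputs to a fixed fitted model, it reflects how the model uses each variable, which is exactly why it cannot support causal claims.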
The SISA/SISAL models are presented here in their first iteration. They have not yet been validated beyond the current datasets, but the use of holdout data and 10-fold cross-validation provides an unbiased estimate of model validity and prediction accuracy. External validation of these algorithms with a new dataset is ongoing, as is the testing of fewer prediction variables, with the eventual goal of an easy-to-use online or mobile app for use in the clinic.
In this study, we used subject hospitalization as the outcome for both prediction models. The sensitivity and specificity of SISA/SISAL rely on the assumption that the subjects in this dataset were correctly hospitalized. It is possible that some subjects were treated as outpatients when they should have been hospitalized, or that some were hospitalized unnecessarily. Subjects incorrectly treated as outpatients would likely have returned to a clinic for care, as their symptoms would have driven them to do so. Because our collection of medical records was retrospective, we were able to capture subject hospitalization at any point, even if a subject was initially treated as an outpatient. Hospital Teofilo Davila is the reference MoH hospital in the province, and it is unlikely that these subjects would have sought care at a hospital elsewhere. Some subjects may have been hospitalized unnecessarily; we have no way of identifying them or of truly knowing whether it would have been safe to treat them as outpatients. As a result, our algorithms could recommend hospitalization unnecessarily. Although hospitalization can place an undue financial burden on some patients and on the health system, failure to hospitalize a serious case could result in grave consequences, and we prefer a cautious approach to hospitalization decision-making. Moreover, these algorithms are intended as a tool to inform clinical judgement, not to replace clinical triage decisions [73].
The period during which our data were collected (2014-2017) included the emergence of two important new arboviruses, CHIKV and ZIKV. The MoH trained its personnel (including those working at the hospital and clinics in this study) to identify and diagnose these patients. For patients, the potential severity of these infections and their novelty may have increased willingness to be hospitalized or to seek healthcare in the first place. With ZIKV infection, physicians may have been more likely to hospitalize pregnant women. These factors may limit the generalizability of SISA/SISAL to future subject datasets, though as viral diseases continue to emerge globally, it is important to test the ability of decision-making tools to function under such dynamic scenarios. For new diseases with clear warning signs of potentially severe disease, we would expect SISA/SISAL to work well.

Conclusions
Clinicians in resource-limited settings commonly encounter subjects with a suspected diagnosis of DENV, CHIKV, or ZIKV infection and often have limited tools at their disposal. A subject may be unable or unwilling to provide a laboratory specimen, and diagnostic testing may not always be available. The SISA/SISAL models are promising clinical tools, given the high sensitivity and specificity for both models. Machine learning, if used thoughtfully, can be a powerful method for building such prediction models, making the best use of real-world available clinical data.
Supporting information

S1 Table. Classification algorithms used for prediction. Predictors and outcomes are the actual data that are put into the model; for this manuscript, the predictors are the variables from each subject and the outcome is whether the subject was hospitalized or not. The final prediction is determined by each algorithm, i.e., the algorithm predicts whether the subject was hospitalized based on the predictor variable values. These final predictions are compared to the true outcome to determine how well each algorithm performed. (DOCX)

S2 Table. Final SISAL model. The final SISAL model was an elastic net regression model with an alpha value of 0.5, a lambda value of 0.25, and three coefficients (all other coefficients were reduced to zero). (DOCX)

S1 Fig. Flow chart of approach. This chart shows the algorithm development, training, and testing processes and the flow of data, using an example algorithm with no tuning parameters with the SISA dataset. Repeated 10-fold cross-validation is used for algorithm development to produce an estimate of the final model performance (mean CV-AUC). The final performance for the algorithm is calculated from the holdout test data. This process was repeated for each algorithm. (DOCX)

S2 Fig. SISA analysis of SISAL dataset. Accuracy (blue), Cohen's kappa (red), and AUC (green) were calculated for the repeated 10-fold cross-validation (left) and the holdout test dataset (right) for prediction of hospitalization status in clinically diagnosed DENV, CHIKV, or ZIKV infections. bag = bagged trees, knn = k nearest neighbors, rf = random forest, gbm = generalized boosting models, enet = elastic net, nnet = neural networks, log = logistic regression, DENV = dengue virus, CHIKV = chikungunya virus, ZIKV = Zika virus. (DOCX)
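The coefficient shrinkage described in S2 Table (an elastic net that retained only three nonzero coefficients) can be illustrated with a minimal Python/scikit-learn sketch on synthetic data. Note that glmnet-style alpha corresponds to scikit-learn's l1_ratio and that lambda is inversely related to C; the data and values below are illustrative assumptions, not the fitted SISAL model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in: 98 subjects (as in SISAL), 20 candidate predictors,
# only a few of which are truly informative
X, y = make_classification(n_samples=98, n_features=20, n_informative=3,
                           random_state=0)

# Elastic net logistic regression: l1_ratio=0.5 mixes L1 and L2 penalties
# (glmnet's alpha=0.5); smaller C means stronger shrinkage (roughly 1/lambda)
enet = LogisticRegression(penalty="elasticnet", l1_ratio=0.5, C=0.1,
                          solver="saga", max_iter=10000)
enet.fit(X, y)

# The L1 component drives most coefficients exactly to zero
nonzero = int(np.sum(enet.coef_ != 0))
print(f"{nonzero} of {X.shape[1]} coefficients retained")
```

This sparsity is what makes an elastic net model attractive for a bedside tool: only the handful of predictors with nonzero coefficients need to be collected.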