Machine Learning Models for Analysis of Vital Signs Dynamics: A Case for Sepsis Onset Prediction

Objective. To achieve accurate prediction of the sepsis detection moment based on bedside monitor data in the intensive care unit (ICU). A good clinical outcome is more probable when onset is suspected and treated on time; thus, early insight into sepsis onset may save lives and reduce costs. Methodology. We present a novel approach for feature extraction, which focuses on the hypothesis that unstable patients are more prone to develop sepsis during their ICU stay. These features are used in machine learning algorithms to provide a prediction of a patient's likelihood of developing sepsis during the ICU stay, hours before it is diagnosed. Results. Five machine learning algorithms were implemented using R software packages. The algorithms were trained and tested with a set of 4 features which represent the variability in vital signs. These algorithms aimed to calculate a patient's probability of becoming septic within the next 4 hours, based on recordings from the last 8 hours. The best area under the curve (AUC), 88.38%, was achieved with a support vector machine (SVM) with a radial basis function kernel. Conclusions. The high level of predictive accuracy, along with the simplicity and availability of the input variables, presents great potential if applied in ICUs. Variability of a patient's vital signs proves to be a good indicator of one's chance of becoming septic during the ICU stay.


Introduction
The sepsis syndrome occurs when an infectious agent produces a systemic response in the host [1]. This condition may progress to severe sepsis with the presence of multiple organ dysfunction or septic shock when there is a profound decrease in systemic blood pressure [2]. Both these latter conditions are associated with significant morbidity and mortality, and sepsis remains the most expensive condition treated in the hospital [3]. Timely intervention with appropriate antibiotic administration and hemodynamic optimization has been shown to improve outcomes and decrease costs [4]. This in turn requires early recognition, which depends on the vigilance of the treating personnel in identifying the signals heralding the onset of the syndrome.
However, many demands are made on the staff of busy intensive care units, where these patients are typically treated, so that delays in the administration of life-saving treatments invariably occur.
To date, the diagnosis of sepsis has largely relied on identifying the presence of the Systemic Inflammatory Response Syndrome (SIRS) together with the presence of infection, hemodynamic variables, and organ dysfunction [5]. In addition, screening laboratory tests are often required to confirm the diagnosis. However, the SIRS criteria do not have high sensitivity and specificity while laboratory tests require time and so further delay treatment [6].
For this reason, alternative modalities for the early detection of sepsis have been sought. This has been facilitated by the increasingly widespread use of Electronic Medical Records (EMRs), which collect and display patient data in real time. However, a multitude of parameters are generated every second so that a more focused sepsis-recognizing approach is required. In this regard, automated electronic alert systems have been described which typically rely on the presence of the SIRS criteria as the basis for the alert. A recent systematic review of automated electronic sepsis alert systems concluded that they had a poor positive predictive value and did not improve mortality or length of stay [7,8].
Traditional interpretations of the physiologic events that follow exposure to bacterial endotoxin have focused on absolute changes in measured end-points [9]. However, unlike in health, where physiologic systems act like biological oscillators that are coupled, during systemic inflammation, this state may be lost (uncoupled), resulting in both absolute changes in the functional intensity of physiologic end points and a generalized loss of physiologic variability [10]. Recently, it has been increasingly recognized that this altered autonomic regulation in sepsis may be related to the concept of cholinergic anti-inflammatory pathways.
Thus, for example, studies have suggested that early reduction of heart rate variability may serve as a noninvasive and sensitive marker of the systemic inflammatory syndrome, thereby widening the therapeutic window for early interventions [11]. Heart rate variability has been used in the prediction of cardiovascular and cerebrovascular events, sudden cardiac death, and epileptic seizures but has yet to be used for sepsis detection [12][13][14]. Godin et al. [15] recently reported that experimental human endotoxemia induces an increase in heart rate regularity, using time series analysis and the statistical technique of approximate entropy (ApEn). Using ApEn as a measure of regularity, other clinical studies have shown that increased regularity predicts postoperative ventricular dysfunction [16], the ability to wean from mechanical ventilation [17], and the occurrence of cardiac dysrhythmias [18].
Several works have concentrated on leveraging data accumulated from bedside monitors to identify the propensity for sepsis acquisition in the ICU. Guillén et al. [19] used vital sign measurements and lab test results in order to predict septic patients' likelihood of developing severe sepsis during the ICU stay. The mean, median, maximum, minimum, and standard deviation were computed for each vital sign or lab result measured during an individual stay, and these features were used to train a logistic regression (LR) model, support vector machine (SVM) models with various kernels, and logistic model trees (LMT). The study demonstrated accuracy, measured by maximal area under the curve (AUC), of 0.84 as derived from an SVM with radial basis function (RBF) kernel using vital signs only and of 0.882 as derived from LMT using vital signs and lab results. Calvert et al. [20] investigated the correlations between pairs and triplets of vital sign measurements as well as the overall trend of the measurements over time (i.e., increase, decrease, and no change) in order to predict sepsis in an adult ICU population, up to 3 hours before the first SIRS episode. Their results demonstrated accuracy, measured by average AUC, of 0.83 but required a rather large dataset, which usually entails greater processing time.
We hypothesized that the change in variability of a number of physiological parameters commonly measured by EMRs might provide an early alert for impending sepsis. In this study, we present a novel approach to assess the magnitude of instability in 4 common vital signs and incorporate these findings into a prediction model for the development of sepsis within an adult ICU population.

Data Collection and Inclusion Criteria.
This is a retrospective study using the electronic medical records (EMRs) of patients admitted to the general intensive care unit (ICU) of the tertiary-level, university-affiliated Rabin Medical Center (RMC), Petah Tikva, Israel, over the period 2007-2014. Our ICU uses a specialized EMR system (Metavision, iMDsoft, Israel) which allows running queries. The EMRs document in real time all clinical as well as laboratory data, drug administration, and medical notes for all patients admitted to the ICU. For this study, the data were anonymized prior to analysis to exclude all specifics of patient identity. The trial was approved by the hospital's institutional review board with a waiver of informed consent, as the study did not affect clinical care and all data were anonymized.
Systemic inflammatory response syndrome (SIRS) is the systemic inflammatory response to a variety of severe clinical insults. The response is manifested by two or more of the following conditions (SIRS criteria): (1) temperature >38°C or <36°C, obtained continuously using a temperature probe placed in the nasopharynx (Deloyal, USA); (2) heart rate >90 beats per minute; (3) respiratory rate >20 breaths per minute or PaCO2 <32 mm Hg; and (4) white blood cell count >12,000/cu mm, <4,000/cu mm, or >10% immature (band) forms. The condition of sepsis as referred to in this study is defined as the presence of at least 2 SIRS criteria within a consecutive 24-hour interval together with a diagnosis of an infection [1].
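To make the screening rule concrete, the SIRS criteria listed above can be written as a simple counting function. The following Python sketch is illustrative only and is not the query used in the study: the function and parameter names are hypothetical, laboratory values are treated as optional, and the function evaluates a single snapshot of measurements rather than the 24-hour interval required by the sepsis definition.

```python
def sirs_criteria_count(temp_c, heart_rate, resp_rate,
                        paco2_mmhg=None, wbc_per_cu_mm=None, band_fraction=None):
    """Count how many of the four SIRS criteria one set of measurements meets.

    All names are hypothetical; optional lab values may be None when unavailable.
    """
    count = 0
    # (1) Temperature above 38 C or below 36 C
    if temp_c > 38.0 or temp_c < 36.0:
        count += 1
    # (2) Heart rate above 90 beats per minute
    if heart_rate > 90:
        count += 1
    # (3) Respiratory rate above 20 breaths per minute or PaCO2 below 32 mm Hg
    if resp_rate > 20 or (paco2_mmhg is not None and paco2_mmhg < 32):
        count += 1
    # (4) White cell count out of range, or more than 10% immature (band) forms
    wbc_abnormal = wbc_per_cu_mm is not None and (
        wbc_per_cu_mm > 12000 or wbc_per_cu_mm < 4000)
    bands_abnormal = band_fraction is not None and band_fraction > 0.10
    if wbc_abnormal or bands_abnormal:
        count += 1
    return count


def meets_sirs(*args, **kwargs):
    """SIRS is considered present when two or more criteria are met."""
    return sirs_criteria_count(*args, **kwargs) >= 2
```

A complete implementation of the study's sepsis label would additionally require the criteria to be met within a consecutive 24-hour interval and a documented infection diagnosis.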
Inclusion criteria for this study were as follows:
(i) Adult patients >18 years admitted to the general intensive care department
(ii) Patients stayed a minimum of 12 hours in the ICU
(iii) Patients did not meet SIRS criteria at the time of admission to the ICU
(iv) Continuous documented measurements were available for at least 12 hours for the following vital signs: heart rate, temperature, mean arterial blood pressure as recorded from an arterial line, and respiratory rate as recorded from the mechanical ventilator

Target and Control Groups.
A process of backward labeling was performed in order to identify and label the target population, i.e., those who developed sepsis during their ICU stay, in the following manner. Out of 4,534 patients admitted to the ICU between 2007 and 2014, only 1,605 were diagnosed with a sepsis-related infection (the first requirement for sepsis diagnosis). Out of these, only 1,593 met the sepsis definition, and only 401 were admitted to the ICU at least 12 hours before the sepsis detection moment, i.e., the time at which antibiotics were administered to treat the detected sepsis. Finally, only 300 patients had complete data records in the data collection period (Figure 1). These patients were selected as the target group, with the sepsis detection moment (the time of antibiotics administration by the attending physicians) denoted as $T_0$. From the control group, which consisted of patients who were not diagnosed with a sepsis-related infection during their ICU stay, 300 patients were randomly selected so that the groups were balanced in number of patients, average age, and gender distribution (Table 1). For these patients, who were not treated with antibiotics, $T_0$ was assigned arbitrarily to a time point at least 12 hours after admission to the ICU.

Feature Extraction.
In this study, our choice to focus on the analysis of 4 vital signs stems from the fact that these parameters are typically available in all ICUs, are clinically recognized signs of sepsis, and are collected at frequent intervals. The information systems in the ICU record vital sign data into the electronic medical records: every 10 minutes, the system samples the current measurement and records its absolute value, giving a frequency of 6 records per hour.
In order to test our hypothesis that the development of sepsis is preceded by a period of instability, we developed a method to quantify the magnitude of variability in vital signs prior to $T_0$. We divided the 12-hour period prior to $T_0$ into two time intervals: the interval of data collection, of length $T$ hours, and the interval between the prediction moment and the sepsis detection moment, of length $12 - T$ hours (Figure 2). Thus, in the $T$-hour interval before the sepsis prediction moment, $N = 6 \cdot T$ discrete measurements of each vital sign were documented. For each patient $i$, $X_i \in \mathbb{R}^N$ represents one of the following vital sign measurements: mean arterial pressure, heart rate, respiratory rate, and temperature.
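As a worked illustration of this windowing, the following Python sketch extracts the data collection interval from a regularly sampled series. It assumes one record every 10 minutes and that the position of the detection moment in the series is known; the function name, argument names, and indexing convention are hypothetical and not taken from the study.

```python
SAMPLES_PER_HOUR = 6  # one record every 10 minutes


def observation_window(series, t0_index, collection_hours, total_hours=12):
    """Return the N = 6 * T samples of the data collection interval.

    series           -- measurements of one vital sign, one value per 10 minutes
    t0_index         -- index of the sample taken at the detection moment
    collection_hours -- length T of the data collection interval, in hours
    total_hours      -- length of the whole period before detection (12 here)
    """
    start = t0_index - total_hours * SAMPLES_PER_HOUR
    if start < 0:
        raise ValueError("not enough data recorded before the detection moment")
    return series[start:start + collection_hours * SAMPLES_PER_HOUR]


# With T = 8 the window holds 48 samples and ends 4 hours (24 samples) before detection.
window = observation_window(list(range(100)), t0_index=80, collection_hours=8)
print(len(window))
```

With $T = 8$, the remaining $12 - T = 4$ hours between the end of the window and the detection moment form the prediction horizon.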
For each $X_i$, we defined a corresponding vector $Y_i$ as the vector of local minimum and maximum values of $X_i$. Each vector $Y_i = (y_1, \ldots, y_n)$ indicates events of trend change in the given vital sign. The values in $Y_i$ are sorted according to their appearance in the series $X_i$ (this process is detailed in Algorithm 1). The following features are then extracted from each of the vectors $Y_i$:
(1) Number of trend changes ($f_1$) = the number of local extreme values of $X_i$ = $|Y_i|$, i.e., the size of the vector $Y_i$. Each value in $Y_i$, be it a local maximum or a local minimum, corresponds to a change in the dynamics of the vital sign; e.g., there is an increasing trend before a local maximum and a decreasing trend after it. Therefore, every extreme value marks a trend change. This feature allows us to compare instability between vital sign recordings: a vital sign with more trend changes is considered less stable than one with fewer changes.
(2) Mean intensity of changes ($f_2$) = $\mathrm{mean}_{1 \le i \le n-1} |y_{i+1} - y_i|$. This feature indicates the mean magnitude of changes in a vital sign. A vital sign with a higher mean intensity of change is considered less stable than one with a lower mean.
(3) Median change ($f_3$) = $\mathrm{median}_{1 \le i \le n-1} |y_{i+1} - y_i|$. This feature indicates the median magnitude of changes in a vital sign, i.e., the value below which the lower 50% of the change magnitudes fall.
(4) Minimal change ($f_4$) = $\min_{1 \le i \le n-1} |y_{i+1} - y_i|$. This feature indicates the minimal magnitude of change within the measurement interval of the vital sign.
(5) Maximal change ($f_5$) = $\max_{1 \le i \le n-1} |y_{i+1} - y_i|$. This feature indicates the maximal magnitude of change within the measurement interval of the vital sign.
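The feature extraction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the study's Algorithm 1: consecutive equal readings (plateaus) are collapsed before extrema are searched, the first and last samples are not counted as extrema, and all change features default to 0 when fewer than two extrema exist. The function and key names are hypothetical.

```python
from statistics import mean, median


def local_extrema(x):
    """Return the local minima and maxima of x, in order of appearance.

    Assumptions: plateaus are collapsed to a single value first, and the
    endpoints of the series are never reported as extrema.
    """
    collapsed = [x[0]] if x else []
    for value in x[1:]:
        if value != collapsed[-1]:
            collapsed.append(value)
    extrema = []
    for prev, cur, nxt in zip(collapsed, collapsed[1:], collapsed[2:]):
        is_max = cur > prev and cur > nxt
        is_min = cur < prev and cur < nxt
        if is_max or is_min:
            extrema.append(cur)
    return extrema


def variability_features(x):
    """Compute the five variability features of one vital sign series."""
    y = local_extrema(x)
    # Magnitudes of change between consecutive extrema: |y[i+1] - y[i]|
    changes = [abs(b - a) for a, b in zip(y, y[1:])]
    if not changes:
        return {"n_trend_changes": len(y), "mean_change": 0.0,
                "median_change": 0.0, "min_change": 0.0, "max_change": 0.0}
    return {"n_trend_changes": len(y), "mean_change": mean(changes),
            "median_change": median(changes), "min_change": min(changes),
            "max_change": max(changes)}


# Hypothetical mean arterial pressure readings (mm Hg), one per 10 minutes
example = [70, 72, 75, 73, 71, 74, 74, 78, 76, 77]
print(local_extrema(example))         # [75, 71, 78, 76]
print(variability_features(example))
```

Applying `variability_features` to each of the four vital signs yields the 20 values per patient described in the text.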
A collection of 5 features was extracted per vital sign, resulting in 20 features per patient. These features addressed both the number of changes and their intensity (or magnitude) throughout a specific time interval. To check our features' ability to evaluate instability or variability of a vital sign, we compared Guillén's features for predicting severe sepsis (mean, median, maximum, minimum, and standard deviation of a vital sign) [19] to ours. The values of Guillén's features varied very little between very unstable vital sign recordings and those which were more stable. Figure 3 shows an example of the behavior of the mean arterial pressure (MAP) during the first 8 hours in two patients, one who developed sepsis during the following four hours and another who did not. The values of Guillén's features as well as those of our features are given in Table 2. When comparing these same time series with respect to our features, a great difference is evident in the quantitative measures. Our features capture the variability in the behavior of MAP, whereas Guillén's features are very similar for patients with very distinct MAP behavior. A trend in the MAP features curve (Figure 3, bottom) does not indicate the development of sepsis and could be attributed to other conditions.
In an attempt to separate patients who developed sepsis from those who did not, we examined the statistics (mean and standard deviation) of our features for both groups. These values are presented in Table 3 with their corresponding p values. The values in the table indicate that the measured features belong to different distributions with high probability (low p values).

Dimensionality Reduction.
In order to reduce the dimensionality of the problem, we selected the four features which contributed the most to creating a separation between the target and control populations. The most important features were selected by analyzing the feature importance from all tested models. The feature selection process was conducted in two phases. During the first phase, we trained 5 different models and estimated the importance of the features using model-dependent importance metrics as defined by the R caret package [21]. In the second phase, the top two most important features were selected for each model. The combined set of all model-specific features was used as the final feature set. Naturally, in most cases, there was an overlap between the features selected by different models.
Thus, the merged set of features consists of only 4 different features. This process is illustrated in Figure 4, where the most important features for the SVM with RBF kernel are presented. The most important features of this model also coincide with the final set of all merged features. The x-axis on the graph represents the normalized model-dependent measure of accuracy (in the case of the SVM, AUC). The chosen features were as follows: the number of trend changes in respiratory rate and arterial pressure, the minimal change in respiratory rate, and the median change in heart rate. This left us with a compact model consisting of 4 features instead of 20. Figure 5 provides further visualization of the distinction between groups based on these 4 features.
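The two-phase selection can be summarized as taking the union of the top-ranked features of every model. The Python sketch below illustrates only this merging step; the importance scores in the example are invented for illustration and are not the study's values, and the function and feature names are hypothetical.

```python
def merge_top_features(importance_by_model, k=2):
    """Union of the k most important features of every model, in first-seen order.

    importance_by_model maps a model name to a dict {feature name: importance}.
    """
    selected = []
    for scores in importance_by_model.values():
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        for name in top_k:
            if name not in selected:
                selected.append(name)
    return selected


# Invented importance scores for two hypothetical models (illustration only)
example_importance = {
    "model_a": {"rr_trend_changes": 1.0, "map_trend_changes": 0.8, "hr_median_change": 0.3},
    "model_b": {"rr_min_change": 0.9, "rr_trend_changes": 0.7, "hr_median_change": 0.2},
}
print(merge_top_features(example_importance))
# ['rr_trend_changes', 'map_trend_changes', 'rr_min_change']
```

Because models tend to agree on their top features, the merged set stays much smaller than the number of models times `k`.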

Training and Testing.
The task of predicting sepsis onset is in fact a classification problem: to decide whether a given patient example would be diagnosed as septic or not at a given time. A supervised learning algorithm learns the relationships between the input (features extracted from vital signs) and the actual outcome (sepsis or not at a given time). It builds a mathematical representation, i.e., a model, of these relationships and calculates a decision when given new input data without an outcome. In order to solve the binary classification problem of whether sepsis develops in the next X hours, we trained and tested the following five machine learning classification models: logistic regression (LR), support vector machine (SVM) with linear, radial, and polynomial kernels, and artificial neural networks (ANNs). There are other well-known classification methods (e.g., random forest) that can be used in these settings. We selected a few methods with different levels of interpretability, ranging from the completely interpretable logistic regression model to the powerful but not-so-easy-to-interpret ANN. The five machine learning algorithms were implemented using R software packages (open access). The reader who is unfamiliar with these basic machine learning models can find an introductory description in [22]. The input to these models is the dataset containing 600 feature vectors, which comprise both the study and control groups. The dataset was divided into mutually exclusive training and test sets of 75% (450 records) and 25% (150 records), respectively. The ratio between positive (septic patients) and negative (nonseptic patients) examples was maintained in both sets. We aimed to select the algorithm that produces the best area under the curve (AUC), a measure commonly used to examine the predictive performance of machine learning in medical applications. A more thorough description of these models is provided in supplement 1.
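A class-preserving 75%/25% partition of this kind can be sketched as follows. The Python example uses only the standard library and is not the R tooling used in the study; the function name, the fixed random seed, and the rounding rule are assumptions made for illustration.

```python
import random


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Split record indices into disjoint train and test sets, class by class.

    Each class is shuffled and cut separately, so the ratio between positive
    and negative examples is maintained in both sets.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# 300 septic (1) and 300 nonseptic (0) records, as in the dataset described above
labels = [1] * 300 + [0] * 300
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 450 150
```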

Logistic Regression.
Logistic regression is a common tool for medical data analysis, including the prediction of mortality or morbidity outcomes. It is commonly used as a benchmark against more advanced machine learning models. It addresses the binary classification problem, i.e., the classification between two options, for example, dead or alive. The input may consist of many parameters, measured or calculated, and the output is a value between 0 and 1 that may be interpreted as the probability of belonging to one of the two predefined classes:
$$P(y = 1 \mid x) = \frac{1}{1 + e^{-(w^T x + b)}} \quad (1)$$
where $x$ is the input feature vector, $w$ is the vector of coefficients, and $b$ is the intercept.

Support Vector Machines.
Support vector machines (SVMs) are models that can be applied when the data behave nonlinearly, a situation that limits the applicability of highly interpretable models. An SVM can be viewed as a black box, meaning that it lacks transparency and clinical interpretability, which potentially restricts the ability to make inferences. It produces a binary output, i.e., 0 or 1. This classification model is commonly utilized for medical applications. The goal is to find a hyperplane of the form $w^T x + b = 0$ which provides the best separation between two classes of examples in the space. The best hyperplane is determined by the widest possible margins which separate it from the closest examples of both classes. Labels of classes are denoted as $y \in \{-1, 1\}$ and the decision function is as follows:
$$f(x_i) = \mathrm{sign}(w^T x_i + b)$$
where each $x_i$ which fulfils $w^T x_i + b > 0$ will be classified as 1 and those which fulfil $w^T x_i + b < 0$ will be classified as $-1$. In order to produce a probability output in the range [0, 1], we pass the SVM's output to a sigmoid function. In some cases, a linear hyperplane separating the two classes does not exist, so a kernel function is used. A kernel function maps the features into a higher-dimensional space in which a separating hyperplane exists. The input $x_i$ is replaced by its mapping $\Phi(x_i)$:
$$f(x_i) = \mathrm{sign}(w^T \Phi(x_i) + b)$$
Two different kernels were used in this study. The polynomial basis function is of the following form:
$$K(x_i, x_j) = (\gamma\, x_i^T x_j + r)^d$$
where $d$ is the degree, $\gamma$ is a scale factor, and $r$ is a constant offset. A radial basis function is of the form
$$K(x_i, x_j) = \exp\left(-\sigma \lVert x_i - x_j \rVert^2\right)$$
where $\sigma > 0$ controls the width of the kernel.
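For readers who prefer code to notation, the two kernels and a kernelized decision value can be sketched in Python as below. The source text does not preserve the exact kernel formulas, so the parameterization here is an assumption: it follows the common convention in which the radial kernel is $\exp(-\sigma \lVert a - b \rVert^2)$ and the polynomial kernel is $(\text{scale} \cdot a^T b + \text{offset})^{\text{degree}}$. The dual-form decision function, the support vectors, and the coefficients are hypothetical illustrations, not the fitted model.

```python
import math


def rbf_kernel(a, b, sigma):
    """Radial basis function kernel, assuming the form exp(-sigma * ||a - b||^2)."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sigma * squared_distance)


def polynomial_kernel(a, b, degree=3, scale=0.1, offset=1.0):
    """Polynomial kernel, assuming the form (scale * <a, b> + offset) ** degree."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    return (scale * dot + offset) ** degree


def decision_value(x, support_vectors, coefficients, bias, kernel):
    """Kernelized decision value: sum_j coef_j * K(sv_j, x) + bias.

    A positive value is classified as 1 and a negative value as -1.
    """
    return sum(c * kernel(sv, x)
               for sv, c in zip(support_vectors, coefficients)) + bias
```

A toy model with one support vector per class, for example `decision_value([0.0], [[0.0], [2.0]], [1.0, -1.0], 0.0, lambda u, v: rbf_kernel(u, v, 1.0))`, gives a positive value near the first support vector and a negative value near the second.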

Artificial Neural Network.
Much like SVMs, these methods have a high predictive ability but are restricted in transparency and interpretability. An ANN is a multilayered mathematical representation of a learning network which maps the correlation between inputs and outputs, using backpropagation to evaluate and minimize errors. This network contains neurons and arcs which comprise the net's architecture, which can be generally described as follows:
$$y(k) = F\left(\sum_{i=1}^{m} w_i(k)\, x_i(k) + b\right)$$
where $x_i(k)$ is the input of the $k$th neuron with $i = 1, \ldots, m$, $w_i(k)$ is the value of correlation between the $k$th and $(k-1)$th neurons, $F$ is the propagation function (for classification, usually a sigmoid function), $b$ is the bias of the mentioned neuron, and $y(k)$ is the output of the $k$th neuron. New examples are then run through the net from the input neurons to the outputs.
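The forward pass of such a network can be illustrated with a short Python sketch. It assumes a single hidden layer and a sigmoid propagation function for every neuron; the weights are placeholders to be supplied by the caller, not the trained network, and the function names are hypothetical.

```python
import math


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def neuron_output(inputs, weights, bias):
    """Output of one neuron: F(sum_i w_i * x_i + b), with F taken as the sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def forward(inputs, hidden_layer, output_neuron):
    """Run one example through a network with a single hidden layer.

    hidden_layer  -- list of (weights, bias) pairs, one pair per hidden neuron
    output_neuron -- (weights, bias) pair of the single output neuron
    """
    hidden = [neuron_output(inputs, w, b) for w, b in hidden_layer]
    out_weights, out_bias = output_neuron
    return neuron_output(hidden, out_weights, out_bias)
```

The returned value lies strictly between 0 and 1 and can be read as the predicted probability of the positive class.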

Performance Measures.
In statistics, a receiver operating characteristic curve (ROC curve) is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The AUC equals the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming "positive" ranks higher than "negative"). The AUC is generally given by
$$\mathrm{AUC} = \int_0^1 \mathrm{TPR}\; d(\mathrm{FPR})$$
The model with the maximal AUC is considered the most favorable.
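The ranking interpretation of the AUC leads directly to a simple way of computing it: compare every positive example with every negative example. The Python sketch below does exactly that, counting ties as one half; it is an illustration of the definition rather than the routine used in the study, and the function name is hypothetical.

```python
def auc_by_ranking(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores -- predicted probabilities or decision values
    labels -- 1 for positive (septic) and 0 for negative (nonseptic) examples
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


print(auc_by_ranking([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

In the example, three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75.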
In addition to AUC, we also compared sensitivity, specificity, accuracy, negative predictive value (NPV), positive predictive value (PPV), and area under the precision-recall curve (AUC-PR), all of which are common performance indicators for comparison of predictive models.
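The threshold-dependent measures listed above all derive from the four cells of the confusion matrix. The following Python sketch computes them from binary predictions; the function name and the convention of returning NaN for an undefined ratio are assumptions made for illustration.

```python
def confusion_metrics(predicted, actual):
    """Sensitivity, specificity, accuracy, PPV, and NPV from binary labels (1/0)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)  # true positives
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)  # true negatives
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)  # false positives
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)  # false negatives

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "accuracy": ratio(tp + tn, len(pairs)),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

Binary predictions are obtained by thresholding the predicted probability, so these values change with the chosen threshold, whereas the AUC does not.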

Results
In order for each algorithm to build the best mathematical representation (model) of the problem, we used 10-fold cross validation on the training set (75% of the records), from which we derived the optimal initialization parameters for each model. The optimal parameters we obtained for $T = 8$ in each model were as follows (logistic regression and SVM with linear kernel have no parameters which require tuning):
(i) SVM with radial basis function: $\sigma = 0.440227$, $c = 0.25$
(ii) SVM with polynomial basis function: deg = 3, scale = 0.1, $c = 0.25$
(iii) ANN: hidden layers = 5, decay = 0.1
These models were run with the test set (the remaining 25% of records), and the results were calculated by examining each model's ability to correctly classify the outcome of each input case. From the results summarized in Table 4, it is evident that the SVM with the radial basis function provided the highest AUC, 88.38%. This model also provided the highest PPV, i.e., the accuracy of a given sepsis prediction, as well as the highest specificity, i.e., the true negative rate of the prediction. Figure 6 presents the ROC plots of all tested models, and Figure 7 displays the PR curve for each model; the area under both curves is greatest for the SVM-RBF model. The length of the data collection interval $T$ was set to 8 hours for two reasons: first, the number of patients with complete data records was reduced significantly when using 9 hours or more. Second, the models' performance was lower when the data collection period was shorter. The best performing model was built on an 8-hour interval of data collection (Table 5).
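The 10-fold cross validation used for tuning can be pictured as repeatedly holding out one tenth of the training records. The Python sketch below generates such folds with the standard library only; it is a generic illustration, not the resampling code of the R packages used in the study, and the function name and seed are hypothetical.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds.

    Records are shuffled once and dealt into k folds; every record is used
    for validation exactly once.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, validation


# 450 training records give ten folds of 45 validation records each
for train, validation in k_fold_indices(450):
    assert len(train) == 405 and len(validation) == 45
```

For each candidate parameter setting, a model is fitted on every `train` part and scored on the matching `validation` part; the setting with the best average score is then kept.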
There is a need for a prediction model which gives ICU staff enough time to act on the prediction. That is, if the model predicts sepsis onset in the next hour, even if the result is highly accurate, ICU staff still need more time in advance to complete intensive treatment processes. Because of this tradeoff between accuracy and practicality, five SVM-RBF models were trained to predict the probability of sepsis onset within the following 1 to 5 hours. Models for 1-4 hours in advance performed similarly (AUC 86-88%), while the model for 5 hours in advance provided only 81.41% (Table 6). The reduction in performance may be due to a reduction in the number of patients with complete data records for a 13-hour interval (data collection interval + 5). According to these findings, a 4-hour prediction interval was determined to be the most suitable choice, being both accurate and actionable.

Comparison to Previous Work.
Previous work on the problem was presented by Guillén et al., in which a prediction of severe sepsis onset in the following 2 hours was provided based on a 22-hour data collection period [19]. The features used were descriptive statistics of the measurements: the median, standard deviation, and minimum and maximum values.
We compared the predictive power of these features in our settings: to predict 4 hours into the future based on 8 hours of data collection. Table 7 shows that the best performing model is again the SVM-RBF, but accuracy values of the model are lower than those achieved with variability features, as can be seen from the comparison of ROC curves (Figure 8).

Discussion
Our study succeeded in predicting, with a high AUC (0.88), the onset of sepsis 4 hours before the start of antibiotics prescribed by the physician, using the variability of simple vital signs (heart rate, arterial pressure, respiratory rate, and temperature) available from an electronic medical record system. Other centers have recently presented similar approaches. In a study comparing the heart rate to systolic pressure ratio with systemic inflammatory response syndrome (SIRS) after emergency department admission, Danner et al. included more than 50,000 patients [23]. Eight hundred eighty-four patients were septic, and the heart rate to systolic blood pressure ratio had 73.8% sensitivity for prediction of sepsis. Chiew et al. [24] selected patients admitted to an emergency department and used heart rate variability for risk prediction of suspected sepsis. The sample was small, and AUC did not exceed 0.33. However, in-hospital mortality prediction was improved. Nemati et al. [25] used the MIMIC-III ICU database, analyzing 65 variables using the artificial intelligence sepsis expert algorithm (AISE), and were also able to predict sepsis onset between 12 and 4 hours in advance, albeit with a slightly lower AUC (0.83 to 0.85) compared to our results. Most of the 65 variables were low-resolution data, and only high-resolution data from heart rate and arterial blood pressure were used. Mao et al. [26] conducted an interesting study on more than 90,350 patients from the University of California San Francisco database and used 6 vital signs (systolic and diastolic blood pressure, heart rate, respiratory rate, peripheral oxygen blood saturation, and temperature). The InSight algorithm, generated by gradient tree-boosting, was verified in the MIMIC-III dataset with a population of short stayers (ICU population only). They obtained an AUROC of 0.92 for sepsis onset, 0.87 for severe sepsis onset, and 0.99 for septic shock. However, measurements involved in the gold standard were included in the algorithm. When these gold standard measurements were removed from the model training, InSight had an AUROC value of 0.84, slightly lower than that of our algorithm.
Our study has limitations. It was conducted using EMRs from a regional hospital's general ICU. Since only patients from this hospital were included, the dataset was rather small. We might have been able to predict sepsis onset farther into the future (next 5 or 6 hours) if more patient data were available. In addition, physicians determine sepsis onset as the moment at which antibiotics are administered to a patient (sepsis diagnosis). This is a limitation of the medical documentation process, and our study relies on the detection moment as available from this documented history. The new definition of sepsis was published after the end of our study [27]. Finally, the information systems in this ICU record vital sign measurements every 10 minutes, meaning 48 discrete measurements per 8-hour time interval. The number of measurements may vary (every 1, 5, or 10 minutes) according to the collection rate of the information systems in other ICUs.

Conclusions
We have developed a model which is able to predict the onset of sepsis 4 hours prior to the decision by the attending physician to initiate antibiotic treatment. The prediction was calculated using 3 commonly monitored and collected patient parameters, without the need for time-consuming and expensive laboratory investigations. This fact makes the model relevant for almost any ICU or hospital setting, especially where laboratories are limited in resources or unreachable.
In addition, since the model input is collected from individual 8-hour intervals, a prediction of sepsis onset can be made very early in a patient's hospitalization course, as well as at any later point throughout it. This promotes the model as a useful tool in the ICU.
The model was constructed to predict the probability of sepsis onset within the following 4 hours, and it currently classifies a predicted probability of over 50% as sepsis (and a lower probability as no sepsis). Further work is therefore needed to test the model in real time in the ICU setting in order to optimize the selection of the classification threshold.

Data Availability
The datasets used for this study were extracted from the archives of Rabin Medical Center's Intensive Care Unit. They include confidential and personal patient data, which cannot be shared publicly online.

Conflicts of Interest
The authors declare that they have no conflicts of interest.