Infectious disease/original research
Development and Evaluation of a Machine Learning Model for the Early Identification of Patients at Risk for Sepsis

https://doi.org/10.1016/j.annemergmed.2018.11.036

Study objective

The Third International Consensus Definitions (Sepsis-3) Task Force recommended the use of the quick Sequential [Sepsis-related] Organ Failure Assessment (qSOFA) score to screen patients for sepsis outside of the ICU. However, subsequent studies raise concerns about the sensitivity of qSOFA as a screening tool. We aim to use machine learning to develop a new sepsis screening tool, the Risk of Sepsis (RoS) score, and compare it with a slate of benchmark sepsis-screening tools, including the Systemic Inflammatory Response Syndrome, Sequential Organ Failure Assessment (SOFA), qSOFA, Modified Early Warning Score, and National Early Warning Score.

Methods

We used retrospective electronic health record data from adult patients who presented to 49 urban community hospital emergency departments during a 22-month period (N=2,759,529). We used the Rhee clinical surveillance criteria as our standard definition of sepsis and as the primary target for developing our model. The data were randomly split into training and test cohorts to derive and then evaluate the model. A feature selection process was carried out in 3 stages: first, we reviewed existing models for sepsis screening; second, we consulted with local subject matter experts; and third, we used a supervised machine learning method called gradient boosting. Key metrics of performance included alert rate, area under the receiver operating characteristic curve, sensitivity, specificity, and precision. Performance was assessed at 1, 3, 6, 12, and 24 hours after an index time.
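For readers less familiar with this class of model, the sketch below illustrates the general workflow described above: a gradient boosting classifier is fit to a labeled training cohort and then evaluated on a held-out test cohort with the metrics reported here (alert rate, area under the receiver operating characteristic curve, sensitivity, specificity, and precision). This is a minimal illustration in Python with scikit-learn, not the authors' implementation; the alert threshold, split proportion, and feature matrix are assumptions.

```python
# Minimal sketch of the modeling workflow described in the Methods:
# gradient boosting on a labeled cohort, evaluated on a held-out test set.
# Not the authors' implementation; threshold and split are illustrative.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, roc_auc_score


def train_and_evaluate(X, y, alert_threshold=0.5, random_state=42):
    """X: encounter-level feature matrix; y: 1 if the encounter met the
    sepsis surveillance definition (the label/target), else 0."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )

    model = GradientBoostingClassifier()      # supervised gradient boosting
    model.fit(X_train, y_train)

    scores = model.predict_proba(X_test)[:, 1]          # continuous risk score
    alerts = (scores >= alert_threshold).astype(int)    # binary alert decision

    tn, fp, fn, tp = confusion_matrix(y_test, alerts).ravel()
    return {
        "alert_rate": alerts.mean(),
        "auroc": roc_auc_score(y_test, scores),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }
```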

Results

The RoS score was the most discriminant screening tool at all time thresholds (area under the receiver operating characteristic curve 0.93 to 0.97). Compared with the next most discriminant benchmark (Sequential Organ Failure Assessment), RoS was significantly more sensitive (67.7% versus 49.2% at 1 hour and 84.6% versus 80.4% at 24 hours) and precise (27.6% versus 12.2% at 1 hour and 28.8% versus 11.4% at 24 hours). The sensitivity of qSOFA was relatively low (3.7% at 1 hour and 23.5% at 24 hours).

Conclusion

In this retrospective study, RoS was more timely and discriminant than benchmark screening tools, including those recommended by the Sepsis-3 Task Force. Further study is needed to validate the RoS score at independent sites.

Introduction

Sepsis is a syndrome without a criterion standard diagnostic test,1 and the challenges associated with defining it have made it difficult to quantify the associated morbidity and mortality.2 The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3) incorporated nearly two decades of advances in pathobiology, epidemiology, and management into a new definition of sepsis.1, 3, 4 The Sepsis-3 definition has the potential to benefit researchers and public health officials interested in consistently measuring sepsis incidence and trends.5 In addition to the potential epidemiologic benefits, researchers interested in developing models for the early identification of sepsis welcome the new definitions. The development of screening tools for sepsis that use statistical and, more recently, machine learning methods is an active and important area of investigation,6, 7, 8, 9, 10 and the development of these tools depends on the availability of “labeled data,” which in machine learning parlance refers to a data set that includes a reliable “target” (ie, the outcome of interest). Although not a criterion standard diagnostic, the Sepsis-3 definition does provide a consistent consensus-based target on which machine learning models can be developed and validated.
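To make the notion of "labeled data" concrete for this setting, the brief sketch below (Python/pandas, with hypothetical column names) pairs encounter-level features with a binary target derived from a surveillance definition of sepsis; the criteria logic that produces the label is applied elsewhere and is not shown.

```python
# Illustration of "labeled data": each encounter's features are paired with a
# binary target (1 = met the sepsis surveillance definition). Column names are
# hypothetical and for illustration only.
import pandas as pd

features = pd.DataFrame({
    "encounter_id": [101, 102, 103],
    "heart_rate":   [118, 82, 104],
    "lactate":      [4.1, 1.2, 2.8],
})

labels = pd.DataFrame({
    "encounter_id": [101, 102, 103],
    "sepsis":       [1, 0, 1],   # target produced by applying the criteria to the EHR
})

labeled_data = features.merge(labels, on="encounter_id")  # features + target
```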

Machine learning models are a departure from traditional screening tools for sepsis that are based on strong conceptual models. Traditional screening tools have the advantage of being relatively easy to describe and can often be calculated without assistance at the bedside. However, there is evidence suggesting that machine learning algorithms outperform traditional alternatives in contexts in which data inputs are abundant and where there is high potential for complex variable interactions.11, 12, 13 Because of these attractive features, machine learning models are supplanting rule-based models in many industries.12, 14 Given the complexity and acknowledged gaps in our understanding of sepsis, screening for sepsis seems like an ideal use case for machine learning. In fact, some studies have already shown that machine learning models offer improved sepsis prognostication compared with the Systemic Inflammatory Response Syndrome (SIRS), quick Sequential [Sepsis-related] Organ Failure Assessment (qSOFA), and the Modified Early Warning Score (MEWS) among ICU patients.6, 7, 8, 10
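By way of contrast with the machine learning approach, a traditional rule-based screening tool such as qSOFA can be computed without assistance at the bedside from three inputs, as in the sketch below (published qSOFA thresholds: respiratory rate of 22 breaths/min or greater, systolic blood pressure of 100 mm Hg or less, and altered mentation; a score of 2 or more is considered positive). The function and example values are illustrative.

```python
# Example of a traditional rule-based screening tool: the qSOFA score.
# Thresholds follow the published qSOFA criteria; the code itself is a sketch.
def qsofa(respiratory_rate: float, systolic_bp: float, gcs: int) -> int:
    score = 0
    score += respiratory_rate >= 22   # tachypnea
    score += systolic_bp <= 100       # hypotension
    score += gcs < 15                 # altered mentation
    return int(score)

# A score of 2 or more flags the patient as being at higher risk.
positive_screen = qsofa(respiratory_rate=24, systolic_bp=96, gcs=14) >= 2
```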

There is evidence that early initiation of treatment for sepsis is associated with significant reductions in morbidity and mortality.1, 15, 16 A screening tool that provides more accurate and timely assessment of patients’ risk for sepsis could facilitate earlier treatment and improved patient outcomes. In this article, we describe the development and evaluation of a new screening tool for sepsis, the Risk of Sepsis (RoS) score. We aimed to develop a tool that would incorporate the latest definition of sepsis, be applicable to all adult patients presenting to an emergency department (ED), and use machine learning methods to identify sepsis with a high degree of sensitivity and specificity in a timely fashion. We compared the performance of the RoS score with a number of benchmarks, including SIRS, the Sequential Organ Failure Assessment (SOFA), qSOFA, the National Early Warning Score, and MEWS.

Section snippets

Study Design, Setting, and Selection of Participants

A retrospective cohort study was performed among all patients aged 18 years and older who presented to the ED at 49 urban community hospitals operated by Tenet Healthcare between January 1, 2016, and October 31, 2017. The hospitals are located in 39 cities across 9 states. In total, 2,856,060 patient encounters were eligible for inclusion. A patient encounter is defined as a continuous interaction between a patient and a facility (ie, a scenario in which a patient presents to the ED and is

Characteristics of Study Subjects

Average annual ED volumes at the study hospitals ranged from 7,314 to 85,286 (median volume 33,960; interquartile range 25,238 to 43,601). Table 1 illustrates the demographic and geographic diversity of our large cohort of patients. We did not observe significant differences in the demographics, patient type, comorbid diagnoses, or outcomes between the testing and training cohorts. Differing from previous studies, the majority (79.5%) of our denominator population were not admitted as

Limitations

We used a set of well-documented clinical criteria for sepsis as the target for this model.5 We believe that this approach for identifying sepsis-positive patients was the best of available options and enabled us to assemble a large cohort of labeled data. However, we acknowledge that the lack of a criterion standard diagnostic for sepsis continues to be a challenge and is a limitation of this analysis.

Machine learning models may be perceived as “black boxes,” whereas the benchmark models are

Discussion

In this study, we sought to build a screening tool that would incorporate the latest definitions and surveillance tools for sepsis, be applicable to all adult patients who present to an ED, use machine learning methods, and be more discriminant, sensitive, precise, and timely than available benchmarks. Our objectives were informed by the limitations we identified through our review of the literature.

Table E5 (available online at http://www.annemergmed.com) includes summaries of 11 recent

References (31)

  • C.W. Seymour et al. Assessment of clinical criteria for sepsis: for the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA (2016)
  • C. Rhee et al. Incidence and trends of sepsis in US hospitals using clinical vs claims data, 2009-2014. JAMA (2017)
  • K.E. Henry et al. A targeted real-time early warning score (TREWScore) for septic shock. Sci Transl Med (2015)
  • S. Horng et al. Creating an automated trigger for sepsis clinical decision support at emergency department triage using machine learning. PLoS One (2017)
  • T. Desautels et al. Prediction of sepsis in the intensive care unit with minimal electronic health record data: a machine learning approach. JMIR Med Inform (2016)

    Please see page 335 for the Editor’s Capsule Summary of this article.

    Supervising editor: Alan E. Jones, MD. Specific detailed information about possible conflict of interest for individual editors is available at https://www.annemergmed.com/editors.

    Author contributions: RJD and SSJ conceived and designed the study. SSJ supervised the model development and analysis. JA implemented the code to identify sepsis-positive cases. RJD developed and evaluated the machine learning model and created the tables and figures. SSJ drafted the article, and all authors contributed substantially to its revision. SSJ takes responsibility for the paper as a whole.

    All authors attest to meeting the four ICMJE.org authorship criteria: (1) Substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work; AND (2) Drafting the work or revising it critically for important intellectual content; AND (3) Final approval of the version to be published; AND (4) Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

    Funding and support: By Annals policy, all authors are required to disclose any and all commercial, financial, and other relationships in any way related to the subject of this article as per ICMJE conflict of interest guidelines (see www.icmje.org). The authors have stated that no such relationships exist. Dr. Sherwin has received funding from the Agency for Healthcare Research and Quality (PA-14-001, Exploratory and Developmental Grant to Improve Health Care Quality through Health Information Technology [IT]–R21) for the project titled “Enhancing an EMR-Based Real-Time Sepsis Alert System Performance Through Machine Learning.”


    A podcast for this article is available at www.annemergmed.com.
