A Least Absolute Shrinkage and Selection Operator-Derived Predictive Model for Postoperative Respiratory Failure in a Heterogeneous Adult Elective Surgery Patient Population

BACKGROUND: Postoperative respiratory failure (PRF) is associated with increased hospital charges and worse patient outcomes. Reliable prediction models can help to guide postoperative planning to optimize care, to guide resource allocation, and to foster shared decision-making with patients.
RESEARCH QUESTION: Can a predictive model be developed to accurately identify patients at high risk of PRF?
STUDY DESIGN AND METHODS: In this single-site proof-of-concept study, we used structured query language to extract, transform, and load electronic health record data from 23,999 consecutive adult patients admitted for elective surgery (2014–2021). Our primary outcome was PRF, defined as mechanical ventilation for > 48 h after surgery. Predictors of interest included demographics, comorbidities, and intraoperative factors. We used logistic regression to build a predictive model and the least absolute shrinkage and selection operator procedure to select variables and to estimate model coefficients. We evaluated model performance using optimism-corrected area under the receiver operating characteristic curve and area under the precision-recall curve and calculated sensitivity, specificity, positive and negative predictive values, and Brier scores.
RESULTS: Two hundred twenty-five patients (0.94%) demonstrated PRF. The 18-variable predictive model included: operations on the cardiovascular, nervous, digestive, urinary, or musculoskeletal system; surgical specialty orthopedic (nonspine); Medicare or Medicaid (as the primary payer); race unknown; American Society of Anesthesiologists class ≥ III; BMI of 30 to 34.9 kg/m2; anesthesia duration (per hour); net fluid at end of the operation (per liter); median intraoperative FIO2, end-tidal CO2, heart rate, and tidal volume; and intraoperative vasopressor medications. The optimism-corrected area under the receiver operating characteristic curve was 0.835 (95% CI, 0.808–0.862) and the area under the precision-recall curve was 0.156 (95% CI, 0.105–0.203).
INTERPRETATION: This single-center proof-of-concept study demonstrated that a structured query language extract, transform, and load process, based on readily available patient and intraoperative variables, can be used to develop a prediction model for PRF. This PRF prediction model is scalable for multicenter research. Clinical applications include decision support to guide postoperative level of care admission and treatment decisions.

Postoperative respiratory failure (PRF), defined as requiring mechanical ventilation (MV) for > 48 h after surgery, is a major source of morbidity. [1][2][3][4][5][6][7][8] Risk factors for PRF in patients undergoing a broad spectrum of surgical procedures have been analyzed in prior predictive models. 1,9,10 However, consensus among these models is lacking because of differences in PRF definition, population, and predictors of interest. Other studies have focused on homogeneous patient populations, such as abdominal, 11 neurological, 12 or cardiovascular 13 surgery patients, often including both elective and emergent surgical procedures. The Centers for Medicare & Medicaid Services includes PRF that occurs after elective surgery in the Hospital-Acquired Condition Reduction and Hospital Compare Public Reporting Programs, yet progress in reducing the incidence of PRF has been hindered by this lack of consensus in identifying the most at-risk patients. Identifying patients at increased risk of PRF after elective surgery is an important step toward developing clinical workflows to improve postoperative care and outcomes while appropriately allocating hospital resources. Such workflows include postoperative level of care, admission location, monitoring, and treatment orders for at-risk patients.
Herein we describe an automated structured query language (SQL)-based extract, transform, and load (ETL) procedure that enables rapid acquisition of data exclusively from an electronic health record (EHR). We then used the selected and validated data to develop a single-site proof-of-concept predictive model 14 for PRF after elective surgery in adults. Our aim was to develop a model that considered a patient's pre-existing risk factors, intraoperative care and physiologic parameters, and status on exiting the operating room to identify patients at risk of PRF. We hypothesized that our model would have at least good discrimination and would be well calibrated across its range of predicted probabilities. Our methods will allow us to expand our SQL ETL process across the five centers of our University of California Critical Care Research Collaborative for further model development and validation. Generating standardized, automated approaches to large-scale multicenter research using real-world data is crucial in predictive modeling of rare adverse events, such as PRF.

Study Design and Methods
This retrospective cohort study was approved by the institutional review board at the University of California, Davis; the requirement for informed consent was waived. This article adheres to the Strengthening the Reporting of Observational Studies in Epidemiology Statement 15 and the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis 16 guidelines.

Study Design, Setting, and Population
We analyzed 23,999 consecutive adult patients undergoing elective surgery at a single academic center (2014-2021). The start date was selected based on the conversion from paper to EHR clinical documentation for perioperative services and the end date was selected to provide access to 8 full calendar years of data for ETL. Inclusion criteria were adults aged 18 years and older, elective surgical admissions, undergoing an operation within 24 h of admission, and general anesthesia. Exclusion criteria were transfers from another hospital and a tracheostomy present on admission. The primary outcome was PRF. Secondary outcomes included hospital and ICU length of stay and discharge disposition.

Data ETL Procedure
PRF was defined as > 48 h of MV, from the anesthesia end time to hospital discharge. Predictors of interest spanned the preoperative and intraoperative care continuum and included demographics, pre-existing comorbidities, and preoperative and intraoperative factors (e-Table 1). We used SQL coding to perform the data ETL procedure from our Epic EHR (e-Appendix 1). Two clinicians validated data acquisition by comparing ETL output for 100% of patients with PRF and a random 10% of patients without PRF via manual independent chart review until agreement reached 100%. All variables had < 2.5% missing data; missing values were imputed with the cohort mode for categorical variables and the cohort median for continuous variables. Although other studies have included preoperative laboratory values (despite > 50% missing data) 17 and emergency surgery [18][19][20] in their models, we opted to include neither. Although our health system, like others, has used an SQL ETL process for clinical data, this was our first use of this method for perioperative flow sheet data from the Epic OpTime module.
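The imputation rule described above can be sketched in a few lines of Python. This is an illustrative sketch only: the study's ETL was written in SQL and its analyses used Stata and R, and the `impute` helper and the example columns below are hypothetical, not taken from the study data.

```python
from statistics import median, mode


def impute(values, kind):
    """Replace None with the cohort median (continuous) or cohort mode (categorical)."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if kind == "continuous" else mode(observed)
    return [fill if v is None else v for v in values]


# Hypothetical example columns (not study data).
bmi = [22.0, None, 31.5, 27.0]
asa_class = ["II", "III", None, "III"]

bmi_filled = impute(bmi, "continuous")        # missing BMI takes the median
asa_filled = impute(asa_class, "categorical")  # missing class takes the mode
```

Because every variable had < 2.5% missing data, a single-value fill of this kind changes few records; with heavier missingness a different strategy would probably be needed.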

Descriptive Statistics
We report the median and interquartile range for continuous variables and total number and percentage for categorical variables. We used Pearson's χ2 test and the Wilcoxon rank-sum test to compare patients with PRF with patients without PRF for categorical and continuous variables, respectively. Significance was set a priori at P < .05. Data were analyzed using Stata MP version 18 software (StataCorp) and R version 4.2.2 software (R Foundation for Statistical Computing).

Predictive Model Development and Evaluation
We used logistic regression to build the predictive model 14 and least absolute shrinkage and selection operator (LASSO) 21 regularization to select variables and estimate model coefficients (e-Table 2). Our conceptual model for the analysis considered a patient's preexisting risk factors, intraoperative factors, and status on exiting the operating room to identify patients at risk of PRF (Fig 1).
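For readers less familiar with the method, the L1-penalized logistic regression that LASSO fits can be written in its standard textbook form. The notation here (n encounters, p candidate predictors, outcome y_i, standardized predictor vector x_i, penalty weight λ) is introduced for illustration and does not appear in the article.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\,\beta}
    \left\{
      -\frac{1}{n}\sum_{i=1}^{n}
        \Bigl[\, y_i \log \pi_i + (1 - y_i)\log(1 - \pi_i) \Bigr]
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\},
\qquad
\pi_i = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + x_i^{\top}\beta)\bigr)}
```

Larger values of λ shrink more coefficients exactly to zero, which is why the same procedure both selects variables and estimates coefficients.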
Before model fitting, we dichotomized all categorical variables and standardized all numeric variables to have a mean of 0 and an SD of 1. To select the regularization parameter in the logistic LASSO model, we used a 10-fold cross-validation procedure and application of the 1-SE rule. This helps to ensure the generalizability and interpretability of the model by encouraging parsimony. 22 We retained variables with nonzero coefficients from the fitted logistic LASSO model in the final prediction model. Given the small number of patients with PRF and the need to develop a model representative of the real-world prevalence of PRF, we used the entire data set in model development. To evaluate the performance of the model while controlling for overfitting, we used an optimism-corrected bootstrap procedure. 23 We drew 250 bootstrap samples from the training data stratified by PRF group, maintaining the overall sample prevalence, and repeated the logistic LASSO modeling procedure on each bootstrap sample. We estimated optimism-corrected performance using the bootstrap models following Steyerberg. 23 We additionally used a bootstrap procedure in combination with the logistic LASSO 24 model fitting procedure to evaluate the stability of the variable selection procedure by calculating the frequency at which each variable was selected in the bootstrap models. This approach has the advantage of providing robust feature selection and more accurate coefficient estimates. By training multiple LASSO models on different bootstrap samples of data, this method accounts for data variability and helps to identify features that are consistently important across different samples. We evaluated model performance using area under the receiver operating characteristic curve (AUC) and area under the precision-recall curve (AUPRC). Sensitivity, specificity, positive and negative predictive values, and Brier scores were calculated using a cutoff that maximized Youden's index (Fig 2).
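The optimism-corrected bootstrap described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the analyses were run in Stata and R, the `fit_direction` function is a hypothetical stand-in for the full logistic LASSO procedure, and the data are simulated.

```python
import random


def auc(scores, labels):
    """Probability that a random case outranks a random non-case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_bootstrap(rows, rng):
    """Resample with replacement within each outcome group, preserving prevalence."""
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    return rng.choices(pos, k=len(pos)) + rng.choices(neg, k=len(neg))


def optimism_corrected_auc(rows, fit, n_boot=250, seed=0):
    """rows: list of (x, y); fit(rows) returns a function mapping x to a risk score."""
    rng = random.Random(seed)

    def score(model, data):
        return auc([model(x) for x, _ in data], [y for _, y in data])

    apparent = score(fit(rows), rows)
    optimism = 0.0
    for _ in range(n_boot):
        boot = stratified_bootstrap(rows, rng)
        model = fit(boot)  # repeat the whole modeling procedure on the resample
        optimism += score(model, boot) - score(model, rows)
    return apparent, apparent - optimism / n_boot


def fit_direction(rows):
    """Hypothetical stand-in for the LASSO fit: orient a single predictor."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    sign = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
    return lambda x: sign * x


# Simulated data: 30 events and 270 non-events with one informative predictor.
data_rng = random.Random(1)
rows = [(data_rng.gauss(1.0, 1.0), 1) for _ in range(30)]
rows += [(data_rng.gauss(0.0, 1.0), 0) for _ in range(270)]

apparent, corrected = optimism_corrected_auc(rows, fit_direction, n_boot=50)
```

With a one-parameter stand-in model the correction is small; the gap between apparent and corrected performance is expected to grow as the fitting procedure becomes more flexible, which is the reason the full selection step is repeated inside each bootstrap sample.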

Sensitivity and Robustness Analyses
We conducted secondary analyses to verify the optimism-corrected bootstrap procedure results and to evaluate robustly the model's performance. For these analyses, data were split temporally into a training set (2014-2018) and a test set (2019-2021). First, the training set was used to develop a model in the same manner as the primary analysis and was evaluated on the test set. Second, again using the training set, we developed models using 1,000 bootstrapped data sets with equal numbers of patients with PRF and patients without PRF by randomly sampling from among patients without PRF. These models also were evaluated on the test set (e-Appendix 2). We also conducted a sensitivity analysis to determine the effect of the Elixhauser comorbidity count and score on model performance (e-Appendix 3).
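The temporal split and the balanced resampling can be illustrated with a short Python sketch. The details are assumptions: the text does not say whether patients with PRF were themselves resampled, so this version keeps every case and draws an equal number of patients without PRF; the field names `year` and `prf` and the toy cohort are hypothetical.

```python
import random


def temporal_split(encounters, last_train_year=2018):
    """Training set: 2014-2018 encounters; test set: 2019-2021 encounters."""
    train = [e for e in encounters if e["year"] <= last_train_year]
    test = [e for e in encounters if e["year"] > last_train_year]
    return train, test


def balanced_sample(train, rng):
    """Keep all PRF cases and draw an equal number of non-PRF encounters at random."""
    cases = [e for e in train if e["prf"] == 1]
    controls = [e for e in train if e["prf"] == 0]
    return cases + rng.sample(controls, k=len(cases))


# Toy cohort (not study data): 50 encounters per year, 2 with PRF.
encounters = [
    {"year": year, "prf": 1 if i < 2 else 0}
    for year in range(2014, 2022)
    for i in range(50)
]

train, test = temporal_split(encounters)
rng = random.Random(0)
# The study used 1,000 balanced data sets; 5 are drawn here for brevity.
balanced_sets = [balanced_sample(train, rng) for _ in range(5)]
```

Each balanced data set would then be used to fit a model that is scored on the untouched test set, so the test-set prevalence still reflects the real-world rate.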

Pre-existing Patient and Intraoperative Characteristics
Among 23,999 consecutive surgical encounters, PRF developed in 225 patients (0.94%). Patients with PRF were older and were more often male, covered by Medicare, not obese, and admitted with multiple comorbidities (Table 1). 25,26 Patients with PRF underwent longer anesthesia and surgery durations and more often underwent surgery on the cardiovascular system (Table 2). Patients with PRF also showed lower operative tidal volume and greater net positive fluid balance at the end of surgery and 24 h after surgery. Patients with PRF received more morphine equivalent units and more often received vasopressor medications. The most frequently used vasopressor medication in patients with PRF was norepinephrine and in patients without PRF was phenylephrine. Among all patients, the most common first oxygen device outside of the operating room was supplemental oxygen (47.3%), followed by room air (45.9%), MV (4.9%), noninvasive positive pressure ventilation (0.8%), and high-flow nasal cannula (0.05%). Patients with PRF left the operating room while receiving MV more often than patients without PRF (49.8% vs 4.5%) and while receiving room air less often (14.2% vs 46.3%; P < .001). Patients with PRF underwent a median of 164 h of postoperative MV (Table 2). Nearly one-half of patients with PRF continued to receive MV for > 48 h immediately after surgery, whereas 52% were reintubated and returned to MV for > 48 h. The median time to reintubation for patients with PRF was 51.4 h.
Ninety-nine percent of patients with PRF were admitted to an ICU from the operating room, compared with only 17% of patients without PRF (P < .001). Patients with PRF underwent longer hospital and ICU lengths of stay (Table 3). Twenty-four percent of patients with PRF died in the hospital, compared with ≤ 1% of patients without PRF. Of the 171 patients with PRF who survived to discharge, 95 (42% of all patients with PRF) were discharged to another facility (eg, skilled nursing, long-term acute care), rather than home.

Predictive Model Performance
The LASSO procedure retained 18 predictors in the logistic regression (Table 4). Duration of anesthesia (hours), net fluid balance at operating room departure (liters), operations on the cardiovascular system, Medicare (as the primary payer), and American Society of Anesthesiologists class of ≥ III were selected as predictors in all bootstrap samples and increased the odds of PRF. Other predictors included operations on the cardiovascular, nervous, digestive, urinary, or musculoskeletal system; surgical specialty orthopedic (nonspine); Medicaid (as the primary payer); race unknown; BMI of 30 to 34.9 kg/m2; median FIO2, end-tidal CO2 (EtCO2), heart rate, and tidal volume; and intraoperative vasopressor medications. All predictors except race unknown and EtCO2 were retained in ≥ 80% of bootstrap samples (Table 4). This model achieved an observed AUC of 0.851 (95% CI, 0.824-0.878) and an optimism-corrected AUC of 0.835 (95% CI, 0.808-0.862). We used Youden's index 27 to identify a potential threshold for discriminating patients with PRF from patients without PRF. A predicted probability of PRF of 1.315% maximized Youden's index, achieving an optimism-corrected sensitivity of 0.647 (95% CI, 0.593-0.713) and specificity of 0.858 (95% CI, 0.851-0.860) (Table 5). Other performance metrics (positive predictive value, negative predictive value, Brier score) are provided in Table 5. The confusion matrix shows 3,372 false-positive findings among the 23,774 patients without PRF and 69 false-negative findings among the 225 patients with PRF (Table 6).
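As a worked example, the confusion-matrix counts quoted above can be turned into threshold metrics directly. The short Python sketch below uses only the four counts given in the text; the variable names are ours. Because these values come straight from the counts, they are apparent estimates and need not match the optimism-corrected figures reported in Table 5.

```python
# Counts reported in the text for the Youden-optimal threshold.
n_prf, n_no_prf = 225, 23_774
false_neg, false_pos = 69, 3_372

true_pos = n_prf - false_neg      # cases correctly flagged
true_neg = n_no_prf - false_pos   # non-cases correctly not flagged

sensitivity = true_pos / n_prf
specificity = true_neg / n_no_prf
ppv = true_pos / (true_pos + false_pos)   # positive predictive value
npv = true_neg / (true_neg + false_neg)   # negative predictive value
youden_j = sensitivity + specificity - 1

print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
print(f"PPV={ppv:.3f} NPV={npv:.3f} Youden J={youden_j:.3f}")
```

The low positive predictive value alongside a high negative predictive value follows from the 0.94% prevalence: even at a specificity near 0.86, false-positive findings greatly outnumber true-positive findings.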

Sensitivity and Robustness Analyses
In the secondary analyses (e-Appendix 2), the predictors retained in the LASSO logistic regression and their coefficients were like those of the primary model (e-Table 2). Performance metrics of models developed with the training set and applied to the holdout test sets were slightly worse than the optimism-corrected metrics for the primary model. The AUC declined from 0.835 to between 0.763 and 0.786 in the supplementary analyses, whereas the AUPRC values increased from 0.156 for the primary model to 0.172 in the comparable secondary analysis (e-Table 3, approach 1). We also performed a sensitivity analysis to determine the effect of including Elixhauser comorbidity count and score on model performance (e-Appendix 3; e-Table 5, e-Figure 1, e-Figure 2). This resulted in a 13-variable predictive model with a negligible increase in optimism-corrected AUC from 0.835 to 0.840 and in AUPRC from 0.156 to 0.162 (e-Table 4).

Discussion
We developed a prediction model for PRF that used readily available patient preoperative and intraoperative data from 23,999 consecutive adult elective surgeries using an automated SQL ETL process. Our model includes 18 variables; duration of anesthesia, net fluid balance at operating room departure, operations on the cardiovascular system, Medicare coverage, and American Society of Anesthesiologists class ≥ III were selected as predictors in all bootstrap samples. Other predictors included operations on the cardiovascular, nervous, digestive, urinary, or musculoskeletal system; surgical specialty orthopedic (nonspine); Medicaid coverage; race unknown; BMI of 30 to 34.9 kg/m2; median FIO2, EtCO2, heart rate, and tidal volume; and intraoperative vasopressor medications. The model showed good discrimination and calibration. Secondary analyses validated our primary findings.
This study extends prior work in several important ways. In contrast to our previous PRF research that used manual chart abstraction, [28][29][30] our current study developed and validated an automated ETL process to enable efficient, standardized acquisition of real-world data from the EHR. The potentially extensible nature of SQL ETL processes should allow adaptation of our methods to the EHRs of other research sites, thereby enabling data acquisition and large-scale research into rare events like PRF that would not be feasible if data collection were restricted to manual chart review. Although our prior work focused on developing an explanatory model, our current study aimed to develop a model optimized for prediction that eventually might be incorporated into clinical decision support (CDS)-aided clinical workflows. Our work is distinct from the work of others in that we excluded emergent surgical procedures and preoperative laboratory findings and focused exclusively on elective surgical procedures. We also narrowed our outcome of interest to PRF, rather than the broad continuum of all postoperative pulmonary complications.
In this predictive model, we aimed to estimate accurately the probability that PRF would develop based on preoperative and intraoperative factors. Other published predictive models (eg, Assess Respiratory Risk in Surgical Patients in Catalonia [ARISCAT], 18 Prospective Evaluation of a Risk Score for Postoperative Pulmonary Complications in Europe [PERISCOPE], 19 and Local Assessment of Ventilatory Management During General Anesthesia for Surgery [LAS VEGAS] 20 ) focused on all postoperative pulmonary complications, ranging from atelectasis to respiratory failure, which occurred in 5% to 11% of patients. These models also included emergency surgeries. Despite the good discrimination of all three models, the focus on all postoperative pulmonary complications and the inclusion of emergency surgeries make extrapolation to elective surgery populations challenging and external validation of the models in this patient population impossible. Importantly, the ARISCAT and PERISCOPE studies did not include intraoperative fluid, medications, or MV parameters in their predictive models. The LAS VEGAS study evaluated intraoperative predictors, but the inclusion of emergency surgeries precludes direct comparison with our model. The more recent Respiratory Support, Prolonged Intubation, or Reintubation.Accuracy (RESPIRE) 17 single-site predictive model for PRF was EHR based and had good accuracy; however, in addition to using a consensus definition for PRF that differed from ours, it included outpatient, same-day, and emergency surgeries and did not include intraoperative treatment factors, although surgical site was included.
To create a targeted and readily interpretable model for CDS, we chose a fundamentally different approach by considering both pre-existing patient comorbidities and intraoperative treatment. Our goal was to consider the effect of a patient's pre-existing risk factors, intraoperative care and physiologic parameters, and status on exiting the operating room to determine risk and to assist in postoperative level of care and treatment decisions. This approach is congruent with the theory of cascade iatrogenesis, 31,32 in which adverse events may occur if trigger events are not recognized and addressed. An example of cascade iatrogenesis is intraoperative fluid overload in a patient with pre-existing heart failure, leading to pulmonary edema, respiratory failure, and invasive MV. We also chose a different statistical approach than others, logistic regression, because we sought to develop a model that was readily interpretable by clinicians and that could be developed into a risk score-based, real-time CDS tool.
Possible clinical applications of our model include identification of at-risk patients who could benefit from postoperative admission or upgrade to the ICU; implementation and monitoring of adherence to the daily Assess, Prevent, and Manage Pain, Both Spontaneous Awakening Trials and Spontaneous Breathing Trials, Choice of Analgesia and Sedation, Delirium: Assess, Prevent, and Manage, Early Mobility and Exercise, and Family Engagement and Empowerment bundle 33 ; and the postoperative application of procedure-specific, evidence-based enhanced recovery after surgery 34 protocols. For example, although enhanced recovery after surgery implementation has been shown to improve outcomes in almost all major surgical specialties, 34 as a multidisciplinary and multimodal approach, it can be resource intensive, thus limiting its widespread use. Application of well-calibrated PRF prediction models may allow patient-level risk stratification and subsequent ICU admission; Assess, Prevent, and Manage Pain, Both Spontaneous Awakening Trials and Spontaneous Breathing Trials, Choice of Analgesia and Sedation, Delirium: Assess, Prevent, and Manage, Early Mobility and Exercise, and Family Engagement and Empowerment bundle implementation; and enhanced recovery after surgery application for only those patients identified as at risk, simultaneously optimizing patient outcomes and the efficiency of care delivery by avoiding underuse or overuse of critical care resources. 35 Early identification of patients at risk of PRF, creation of supportive infrastructure, and implementation of prevention strategies helped one health system reduce PRF by 35%. 36
Strengths of our study include our easily interpretable statistical approach, use of a large and diverse patient population, and restriction to elective surgeries and the outcome of PRF to reduce heterogeneity. Our development of an SQL ETL data extraction method enabled us to analyze all 23,999 consecutive elective surgical encounters over an 8-year period. This approach could improve the ability to build scale in studies of PRF and to support implementation and validation of predictive models across health systems. Our focus on a more narrowly defined population and single serious adverse event should enable future researchers both to refine predictive models and to test the effects of incorporating model outputs into CDS-enabled clinical workflows designed to prevent adverse outcomes such as PRF in at-risk patients. 37,38
Limitations of our current study include the single-center proof-of-concept design and a relatively small number of patients with PRF, which we addressed through optimism-corrected analyses. With our SQL ETL, we were limited to analyses of data found in discrete fields, rather than free-text notes. This constrained our definition of the primary outcome to MV after surgery of > 48 h without further qualification of the reason for prolonged MV. Thus, it is possible this cohort of 225 patients with PRF includes patients who required prolonged MV for airway protection, not respiratory failure. In our prior work, 4.3% of patients flagged for PRF had airway compromise, not respiratory failure. 28 We also acknowledge that not all cases of PRF can be prevented. Patients at risk may still opt to undergo an elective surgical intervention to address quality-of-life issues such as chronic pain or reduced life expectancy (eg, laminectomy, lung resection). Furthermore, our ETL procedure was developed in a standard EHR deployment from a single vendor, and it is possible that extension of our methods to a nonstandard Epic implementation or another EHR vendor's data model would require cost-prohibitive adaptation of our methods. Finally, the model was developed using data from one hospital, and external validation in other cohorts is needed to confirm its performance.
Feasible multicenter analysis is key to the study of rare adverse events such as PRF. We have described a method using an SQL ETL that could be deployed at other centers effectively to automate the abstraction of tens of thousands of charts, work that would not be feasible through manual chart abstraction. The ability to predict patients at risk of PRF reliably using readily available patient preoperative and intraoperative variables is valuable for clinicians and may afford individualized, optimized postoperative planning. Future research is needed to validate our findings in other centers, to conduct clustered machine learning to identify subgroups (eg, low, moderate, and high risk), and to develop, test, and operationalize a risk score for real-time use by clinicians.
In conclusion, we developed a prediction model for PRF based on readily available patient, preoperative, and intraoperative data using an automated procedure to extract large volumes of data from the EHR. If validated in other centers, our model may represent an intuitive and practical tool for prediction of PRF. With improved prediction, clinician scientists can understand PRF better, can begin to classify phenotypes, and can discern if heterogeneity of treatment effect exists. This eventually might lead to improved care and outcomes for PRF, which is associated with high morbidity and mortality.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.

Study Question:
In this study, we sought to determine if a predictive model, using readily available patient and intraoperative factors, could identify patients at high risk of postoperative respiratory failure accurately.

Results:
We developed an 18-variable predictive model for PRF that included operations on the cardiovascular, nervous, digestive, urinary, or musculoskeletal system; surgical specialty orthopedic (nonspine); Medicare or Medicaid (as the primary payer); race unknown; American Society of Anesthesiologists class ≥ III; BMI of 30 to 34.9 kg/m2; anesthesia duration (per hour); net fluid at end of the operation (per liter); median intraoperative FIO2, end-tidal carbon dioxide, heart rate, and tidal volume; and intraoperative vasoactive medications.

Interpretation:
A predictive model for postoperative respiratory failure, based on readily available patient and intraoperative variables, achieved an optimism-corrected area under the receiver operating characteristic curve of 0.835 (95% CI, 0.808-0.862) and an area under the precision-recall curve of 0.156 (95% CI, 0.105-0.203).

Calibration plot for the least absolute shrinkage and selection operator logistic regression model predicting postoperative respiratory failure. To create this plot, predicted probabilities were binned into 10 equally sized groups. The mean predicted probability and 95% CI were calculated for each bin and were plotted against the observed proportion of events in each bin. Because of the very low prevalence of events, the mean predicted probability remains small (approximately 5%), even for the bin containing the largest predicted probabilities. The mean predicted probabilities are close to the 45° line, reflecting good agreement between predicted probabilities and observed probabilities, and hence good calibration.

Figure 2 - Diagram showing steps in the model derivation and validation process. M = mean; LASSO = least absolute shrinkage and selection operator.

TABLE 2
Perioperative Characteristics: Patients Who Demonstrated PRF Compared With Patients Who Did Not Demonstrate PRF
CHEST Crit Care. Author manuscript; available in PMC 2024 March 01.

TABLE 3
Outcomes for Patients Who Demonstrated PRF Compared With Patients Who Did Not Demonstrate PRF
Data are presented as No. (%) or median (interquartile range), unless otherwise indicated. LTAC = long-term acute care; PRF = postoperative respiratory failure; SNF = skilled nursing facility. a Wilcoxon rank-sum test for continuous variables and Pearson

TABLE 4
Variables Retained by the LASSO Procedures in the Logistic Regression for Predicting Occurrence of PRF
ASA = American Society of Anesthesiologists; LASSO = least absolute shrinkage and selection operator; PRF = postoperative respiratory failure. a Probability selected is the percentage of bootstrap samples in which the variable was retained.