
Predicting overdose among individuals prescribed opioids using routinely collected healthcare utilization data

  • Jenny W. Sun ,

    Roles Conceptualization, Data curation, Formal analysis, Writing – original draft, Writing – review & editing

    jennysun@mail.harvard.edu

    Affiliations Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, United States of America, Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, United States of America

  • Jessica M. Franklin,

    Roles Conceptualization, Supervision, Writing – review & editing

    Affiliation Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, United States of America

  • Kathryn Rough,

    Roles Conceptualization, Data curation, Writing – review & editing

    Affiliations Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, United States of America, Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, United States of America

  • Rishi J. Desai,

    Roles Conceptualization, Writing – review & editing

    Affiliation Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, United States of America

  • Sonia Hernández-Díaz,

    Roles Conceptualization, Funding acquisition, Writing – review & editing

    Affiliation Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, United States of America

  • Krista F. Huybrechts,

    Roles Conceptualization, Funding acquisition, Writing – review & editing

    Affiliation Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, United States of America

  • Brian T. Bateman

    Roles Conceptualization, Funding acquisition, Supervision, Writing – review & editing

    Affiliations Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, United States of America, Department of Anesthesiology, Perioperative, and Pain Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, United States of America

Abstract

Introduction

With increasing rates of opioid overdoses in the US, a surveillance tool to identify high-risk patients may help facilitate early intervention.

Objective

To develop an algorithm to predict overdose using routinely-collected healthcare databases.

Methods

Within a US commercial claims database (2011–2015), patients with ≥1 opioid prescription were identified. Patients were randomly allocated into the training (50%), validation (25%), or test set (25%). For each month of follow-up, pooled logistic regression was used to predict the odds of incident overdose in the next month based on patient history from the preceding 3–6 months (time-updated), using elastic net for variable selection. As secondary analyses, we explored whether using simpler models (few predictors, baseline only) or different analytic methods (random forest, traditional regression) influenced performance.

Results

We identified 5,293,880 individuals prescribed opioids; 2,682 patients (0.05%) had an overdose during follow-up (mean: 17.1 months). On average, patients who overdosed were younger and had more diagnoses and prescriptions. The elastic net model achieved good performance (c-statistic 0.887, 95% CI 0.872–0.902; sensitivity 80.2%, specificity 80.1%, PPV 0.21%, NPV 99.9% at the optimal cut point). It outperformed simpler models based on few predictors (c-statistic 0.825, 95% CI 0.808–0.843) and baseline predictors only (c-statistic 0.806, 95% CI 0.787–0.826). Different analytic techniques did not substantially influence performance. In the final algorithm based on elastic net, the strongest predictors were age 18–25 years (OR: 2.21), prior suicide attempt (OR: 3.68), and opioid dependence (OR: 3.14).

Conclusions

We demonstrate that sophisticated algorithms using healthcare databases can be predictive of overdose, creating opportunities for active monitoring and early intervention.

Introduction

Over the past two decades, the abuse, dependence, and misuse of prescription opioids have become one of the most widely recognized public health problems in the United States [1–4]. Since 2000, the rate of overdose deaths involving opioids has tripled, with over 70,000 deaths in 2017 and an accumulation of over 700,000 deaths to date [5–9]. The rate of overdose deaths involving prescription opioids is now five times higher than it was in 1999, making it a leading cause of injury-related death in the United States [10–12]. In recent years, the combined economic burden of the opioid epidemic has cost the United States over $50 billion annually [13–16].

Routinely collected healthcare databases, which provide a rich source of longitudinal patient information on medical diagnoses and procedures, medication prescriptions, and healthcare utilization [17], could be leveraged as a resource for surveilling and intervening on patients at high risk of aberrant opioid-related behaviors. Several automated algorithms to detect opioid-related adverse events have been proposed [18, 19], including two algorithms developed to predict overdose [20, 21]. Such claims-based algorithms have already been implemented in practice as tools for routine surveillance. Examples include the Centers for Medicare and Medicaid Services’ Overutilization Monitoring System in Medicare Part D to help prevent overutilization of prescription opioid medications [22] and a private company’s platform that has licensed its algorithm to organizations for use in identifying patients at risk of opioid overdose [23]. Because overdose is such a potentially catastrophic outcome and there are low-cost interventions that can be directed to at-risk patients (e.g., naloxone), even algorithms with modest performance may have clinical utility in flagging at-risk patients for intervention.

Previous studies have highlighted that the performance of existing opioid-related algorithms could be improved [18, 24]. Most algorithms currently used in practice are based on simple models with few predictors and have not fully taken advantage of the rich data available in healthcare databases. Recently, one study found that machine learning algorithms based on claims data performed well for risk prediction of opioid overdose in Medicare patients (Medicare insures elderly patients in the United States) [20]. However, recent reports suggest that over 90% of opioid overdoses occur in patients <65 years old [5, 25], and the performance of more sophisticated algorithms based on data-driven techniques has not been evaluated in younger patients. Additionally, machine learning methods for predicting opioid overdose have not been directly compared to traditional multivariate regression.

In a nationwide healthcare database of commercially-insured patients, we used a data-driven approach to develop an algorithm to identify patients prescribed opioids who may be at high-risk of overdose. Specifically, we were interested in developing an approach that could use routinely collected healthcare utilization data to identify high-risk patients who have received prescription opioids and may benefit from interventions that can prevent overdose such as naloxone, a potentially life-saving medication that can be administered to patients suspected to have an overdose [26], or medication-assisted treatment (methadone, buprenorphine, or naltrexone). Such evidence-based practices can be effective in reducing the risk of overdose and will play an important role in preventing future overdoses [27]. To develop this tool, we applied two data-driven approaches and compared their performance to traditional multivariate regression. First, we utilized elastic net penalized regression to empirically select strong predictors of opioid overdose [28]. Then, we evaluated whether random forest, a machine learning method that also automates the identification of interactions between predictors or nonlinear associations between predictors and the outcome, could enhance prediction [29, 30]. To fully take advantage of the information available in the database, we produced a time-updating algorithm. Each month, the patient’s recent medical history was re-assessed, allowing us to capture temporal changes in clinically important risk factors and emulate real-time safety surveillance.

Materials and methods

Study population

This study used data from the Optum© Clinformatics® Data Mart, which comprises de-identified US healthcare claims for beneficiaries of a large, national commercial insurance provider. At any given time, Optum covers approximately 13 million people in the United States and reflects a geographically diverse population with beneficiaries from several health plans that have different benefit structures. The database contains individual-level information on inpatient and outpatient diagnoses and procedures, as well as records of outpatient prescription dispensing. Data from October 2011 to September 2015 were used in the analysis.

We identified a cohort of patients at least 18 years old who filled at least 1 prescription for any of the following opioids: buprenorphine, butorphanol, codeine, fentanyl, hydrocodone, hydromorphone, levorphanol, methadone, meperidine, morphine, oxycodone, oxymorphone, pentazocine, tapentadol, and tramadol. Both incident and prevalent users were eligible for inclusion. The date of the first observed dispensing of any prescription opioid was defined as the index date. Patients with a cancer diagnosis or overdose at any point prior to the index date were excluded. Patients with a prior overdose were excluded because they are at high risk for recurrent overdose and prescription of naloxone or other interventions is clearly indicated. In our approach, we were interested in identifying patients who have not yet overdosed but could potentially benefit from preventative interventions. To develop our prediction model, we split the sample into 3 datasets. Patients were randomly allocated into the training set (50% of cohort), validation set (25%), or test set (25%).

Outcome and candidate predictors

We followed patients until first opioid overdose, which was defined as the presence of an inpatient or outpatient diagnosis code for prescription opioid poisoning (International Classification of Diseases, Ninth Revision, Clinical Modification [ICD-9-CM] codes 965.00, 965.02, 965.09) or heroin poisoning (ICD-9-CM code 965.01). A previous validation study showed that these ICD-9-CM diagnosis codes for opioid overdoses and poisonings accurately identify opioid overdose events reported in medical records (PPV = 81–84%) [31]. Patients were censored at the end of insurance enrollment, death, cancer diagnosis, or end of follow-up (September 30, 2015).

Seventy-eight candidate predictors were selected a priori based on subject matter knowledge. We considered variables related to demographics, medical diagnoses, medication prescriptions, and healthcare utilization. The ICD-9-CM codes (inpatient or outpatient, any position) used to define the medical diagnoses are displayed in S1 Table. All time-varying predictors were updated monthly. Demographic variables were captured on the index date and modeled categorically. Medical diagnoses were modeled as binary variables. All other candidate predictors were modeled as continuous variables. For each person-month of follow-up, medical diagnoses were defined during the preceding 6-month period and all other candidate predictors (medication prescriptions, healthcare utilization) were assessed during the preceding 3-month period. A longer covariate assessment period was used for medical diagnoses to allow sufficient time for diagnoses to be captured. However, six months of available data was not required since the goal was to mimic how active surveillance would be conducted in healthcare databases. If less than 3–6 months of data were available, all obtainable information was used. Therefore, each patient’s recent medical history was updated for each month they were enrolled to account for changes in risk factors over time. This study design is summarized in Fig 1.
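The study’s analyses were implemented in R on claims data; as an illustration of the windowing logic just described, the following minimal Python sketch assesses time-updated covariates for a single person-month. The function name, event encodings, and the two example covariates are hypothetical, not the study’s actual variable definitions.

```python
def person_month_features(dispensings, diagnoses, month):
    """Time-updated covariates for one person-month.

    dispensings, diagnoses: lists of (month_index, code) events.
    month: current follow-up month (1-indexed from the index date).
    Medications/utilization use the preceding 3 months; diagnoses use
    the preceding 6; with less history available, all data are used.
    """
    rx_window = range(max(1, month - 3), month)
    dx_window = range(max(1, month - 6), month)
    return {
        "n_opioid_rx": sum(1 for m, code in dispensings
                           if m in rx_window and code == "opioid"),
        # ICD-9-CM 304.0 = opioid dependence (modeled as binary)
        "opioid_dependence": any(m in dx_window and code == "304.0"
                                 for m, code in diagnoses),
    }
```

Re-running this for every enrolled month yields the time-updated design matrix; early months simply use whatever history exists, mirroring the all-available-data rule above.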

Fig 1. Study design diagram.

Abbreviations: FU = follow up. The covariate assessment period was 3 months for medication dispensings and healthcare utilization and 6 months for medical diagnoses.

https://doi.org/10.1371/journal.pone.0241083.g001

Statistical analysis

Model development.

Pooled logistic regression models were used to predict the odds of opioid overdose in the next month, based on patient history from the prior 3–6 months. We used elastic net regularization, which minimizes overfitting through parameter shrinkage and variable selection, to create a parsimonious algorithm [28]. Our candidate model contained all candidate predictors, as well as quadratic transformations of total number, days supplied, and dose for opioid prescriptions. Inclusion of quadratic transformations was determined a priori to accommodate potential non-linear relationships between key candidate predictors and the outcome. Continuous variables were standardized to improve optimization and convergence of the models. Extreme outliers (>4 standard deviations of the mean) were imputed as the mean. To account for time, number of months since first observed opioid dispensing was included as a covariate. Models were fit using data exclusively from patients in the training set.
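The paper fit this model with R’s glmnet; to illustrate how the elastic net penalty shrinks coefficients and drives weak predictors exactly to zero, here is a minimal pure-Python sketch using proximal (sub)gradient descent. glmnet itself uses coordinate descent, and all names and parameter defaults here are illustrative.

```python
import math

def elastic_net_logistic(X, y, lam=0.1, alpha=0.5, lr=0.1, epochs=500):
    """Elastic-net-penalized logistic regression (illustrative).

    lam scales the overall penalty; alpha mixes L1 (alpha) and
    L2 (1 - alpha), as in glmnet's parameterization.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(epochs):
        gw = [0.0] * p
        gb = 0.0
        # gradient of the mean log-loss (intercept is not penalized)
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            pred = 1.0 / (1.0 + math.exp(-z))
            err = pred - yi
            gb += err / n
            for j in range(p):
                gw[j] += err * xi[j] / n
        b -= lr * gb
        for j in range(p):
            # smooth L2 part of the penalty
            gw[j] += lam * (1 - alpha) * w[j]
            w[j] -= lr * gw[j]
            # soft-thresholding (proximal step) for the L1 part:
            # this is what zeroes out weak predictors
            thr = lr * lam * alpha
            w[j] = math.copysign(max(abs(w[j]) - thr, 0.0), w[j])
    return w, b
```

With a perfectly predictive first feature and an uninformative second, the fit keeps a positive coefficient on the first and leaves the second at exactly zero, mimicking the variable selection used to arrive at the 40-predictor final model.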

Decisions to optimize model performance were made using patient data in the validation set. Specifically, mean absolute error (MAE) was used for the tuning of λ, which controls the magnitude of regularization (a smaller λ value imposes less penalization). The elastic net procedure generated 72 candidate values for λ from the training set. For each potential λ value, MAE was assessed by computing the mean difference between observed and predicted probabilities. The λ value that minimized the MAE was used in the final model. While K-fold cross-validation is typically used to select the optimal tuning parameter, the size of our data prevented use of this procedure (~50 million rows of person-month data in the training set). However, the differences among validation approaches decrease as sample size increases [32].
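The λ-selection step can be sketched as follows, assuming the predicted probabilities on the validation set have already been computed for each candidate λ (names are illustrative):

```python
def tune_lambda(candidates, y_val):
    """Pick the penalty strength minimizing validation-set MAE.

    candidates: {lambda_value: list of predicted probabilities on
                 the validation set for the model fit at that lambda}
    y_val: observed 0/1 outcomes on the validation set.
    """
    def mae(preds):
        return sum(abs(y - p) for y, p in zip(y_val, preds)) / len(y_val)
    # the lambda whose predictions deviate least, on average, wins
    return min(candidates, key=lambda lam: mae(candidates[lam]))
```

In the study this comparison ran over 72 candidate λ values generated by the elastic net procedure, in place of the usual K-fold cross-validation.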

For the final model, beta coefficients and odds ratios (OR) were reported. 95% confidence intervals (95% CI) were not provided because elastic net regularization does not provide an accurate estimate of precision [33].

Internal validation.

Model performance was assessed in the test set. Discrimination was evaluated using c-statistics, which can be interpreted as the probability that the model correctly classifies a random patient who experienced an overdose in a given month as higher risk than a random patient who did not overdose in a given month. Model accuracy was evaluated using the Brier score, which calculates the squared differences between the actual outcomes and the model’s predicted probabilities. A lower Brier score suggests better accuracy [34].
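Both performance measures follow directly from their definitions above. The pairwise c-statistic below is O(cases × controls) and purely illustrative; at the study’s scale a rank-based computation would be used instead.

```python
def c_statistic(y, p):
    """Probability a random case receives a higher predicted risk
    than a random non-case; ties count one half (equals ROC AUC)."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if c > nc else 0.5 if c == nc else 0.0
               for c in cases for nc in controls)
    return wins / (len(cases) * len(controls))

def brier_score(y, p):
    """Mean squared difference between outcomes and predictions;
    lower is better."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)
```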

Calibration was assessed visually at the person-month level. We compared the mean observed and predicted probabilities in 29 strata: deciles of predicted probability, with the highest decile further split into 20 additional strata based on percentiles. The highest decile of predicted probability included patients at both high and moderate risk, so further stratification allowed closer examination of patients at the highest risk of overdose and ensured comparable risks for patients within the same strata [34]. A perfectly calibrated model would form a diagonal line, suggesting that the observed incidence of the outcome is equal to the predicted risk of the outcome.
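The decile-based calibration comparison can be sketched as below; for brevity the sketch omits the additional percentile split of the top decile described above (names are illustrative):

```python
def calibration_strata(y, p, n_deciles=10):
    """Mean observed vs. mean predicted risk within strata of
    predicted probability; returns a list of (observed, predicted)
    pairs, one per stratum, for plotting against the diagonal."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    size = len(order) // n_deciles
    strata = []
    for d in range(n_deciles):
        # the last stratum absorbs any remainder
        idx = (order[d * size:(d + 1) * size]
               if d < n_deciles - 1 else order[d * size:])
        obs = sum(y[i] for i in idx) / len(idx)
        pred = sum(p[i] for i in idx) / len(idx)
        strata.append((obs, pred))
    return strata
```

Plotting observed against predicted risk per stratum yields the calibration plot; points on the diagonal indicate perfect calibration.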

Predicted probabilities from the final elastic net model were used to classify patients into high and low risk groups using several potential thresholds, ranging from 0.0015% to 0.15% probability of having an overdose in the next month. Our time-updating approach means that each patient’s risk of overdose may change each month. However, we anticipate that for most clinical applications, interest will be in intervening at the patient level when high-risk individuals are flagged, as opposed to identifying high-risk person-months. In the primary analysis, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were computed at the person-level, instead of the person-month level, for each threshold. A person was therefore classified as high-risk if at least 1 follow-up month was flagged as high-risk and classified as a true positive if an overdose occurred at any point during follow-up. As an additional analysis, we estimated potential classification at the person-month level, where person-months were considered high-risk if that month was flagged and were considered a true positive if an overdose event occurred in the next month. This classification considers risk for each month separately.
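The person-level classification rule can be sketched as follows, given per-person monthly flags and overdose indicators (all names are hypothetical):

```python
def person_level_metrics(flags_by_person, overdose_by_person):
    """Person-level confusion counts: a person is 'high risk' if ANY
    follow-up month was flagged, and a true positive if an overdose
    occurred at any point during follow-up."""
    tp = fp = fn = tn = 0
    for pid, monthly_flags in flags_by_person.items():
        high = any(monthly_flags)
        event = overdose_by_person[pid]
        if high and event:
            tp += 1
        elif high:
            fp += 1
        elif event:
            fn += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return sens, spec, ppv, npv
```

The person-month analysis differs only in the counting unit: each month is scored separately against an overdose in the following month.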

Subgroup and secondary analyses.

To evaluate the robustness of our primary results, the final elastic net model was validated across subgroups of age (18–25, 26–35, 36–50, 50–65, >65 years) and gender (male, female).

As secondary analyses, we evaluated whether use of different analytic approaches may alter performance. First, we used random forest, a non-parametric data-mining technique, to consider prediction models with all possible variable transformations (polynomials, logarithms, etc.) and interaction terms [35]. Briefly, random forest is a supervised classification method that builds many decision trees to predict the outcome. At each split in a tree, a random sample of predictors is chosen. The overall prediction is the proportion of trees predicting overdose. Then, to consider the potential gain in using data-driven approaches, we used traditional logistic regression. The random forest and traditional regression models were fit in the training set using the same set of candidate predictors as the elastic net model, and performance was evaluated in the test set. The validation set was not used because there were no hyper-parameters that required tuning, as there were in elastic net.
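The vote-aggregation step just described can be sketched in a few lines; fitting the trees themselves (done in the paper with R’s randomForest) is omitted, and each "tree" here is simply any callable returning a 0/1 vote:

```python
def forest_predict(trees, x):
    """Random forest prediction as described above: the predicted
    probability of overdose is the proportion of fitted trees that
    classify observation x as an overdose."""
    votes = [tree(x) for tree in trees]
    return sum(votes) / len(votes)
```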

Additionally, we evaluated whether inclusion of many time-updated predictors outperformed simpler models. First, to examine the potential gain in updating predictors over time, we fit a traditional logistic model using baseline predictors only (measured during the period prior to first opioid prescription) to predict overdose at any point during follow-up. Then, to assess whether including many predictors improved performance, we fit a traditional logistic model containing only the top 10 most important predictors (updated over time). These predictors were identified based on variable importance (largest mean decrease in accuracy) from the random forest model. They include age, gender, region, back and neck pain, opioid dependence, psychosis, depression, anxiety disorder, number of prescribers for non-opioids, and neuropathic pain.

Elastic net and random forest analyses were performed in R version 3.4.3 using the packages “glmnet” version 2.0–16 and “randomForest” version 4.6–14, respectively [36, 37].

Results

We identified 5,293,880 individuals who were prescribed opioids, of whom 2,682 patients (0.05%) had an observed opioid overdose during follow-up (S1 Fig). Patients were followed for a total of 99,174,018 person-months (0.003% of person-months with an overdose) and an average of 12.9 months among patients with an overdose and 17.1 months among patients without an overdose.

For each candidate predictor, descriptive statistics are displayed in Table 1. Each individual’s characteristics are updated each month, so the statistics shown reflect measurements during the 3–6 month window prior to overdose or a censoring event (final covariate assessment period). Patients who overdosed were younger than those who did not overdose (25.4% in 18–25 years among overdose vs. 12.6% among no overdose), but other demographic characteristics were relatively similar between groups. Compared to those who did not overdose, patients who overdosed were more likely to have at least 1 diagnosis of opioid dependence (16.6% vs. 0.7%) and opioid abuse without dependence (5.4% vs. 0.1%) during the 6 months prior to overdose or censoring. Additionally, compared to those who did not overdose, patients who overdosed had a higher number of total opioid dispensings (mean [sd], 2.42 [2.61] vs. 0.48 [1.11]), total dose for opioid prescriptions in oral morphine equivalents (mean [sd], 763.95 [1753.85] vs. 96.83 [596.06]), number of unique prescribers of opioids (mean [sd], 1.15 [1.11] vs. 0.32 [0.61]), and number of unique pharmacies for opioid dispensings (mean [sd], 1.05 [0.98] vs. 0.30 [0.55]) during the 3 months prior to censoring. Descriptive statistics at the person-month level are shown in S2 Table.

Table 1. Characteristics during the window prior to censoring for patients who have filled at least 1 opioid prescription between October 2011 to September 2015.

https://doi.org/10.1371/journal.pone.0241083.t001

The elastic net model had strong discrimination, with a c-statistic of 0.888 (95% CI: 0.872–0.902), and good accuracy (Brier score: 2.662 x 10−5; Table 2). Performance of the elastic net model was largely consistent across age and gender subgroups. Using different analytic approaches had little influence on model discrimination (traditional logistic regression c-statistic: 0.881, 95% CI: 0.866–0.896; random forest c-statistic: 0.862, 95% CI: 0.845–0.878). However, simpler models based on baseline predictors only (c-statistic: 0.806, 95% CI: 0.787–0.826) and the top 10 predictors only (c-statistic: 0.825, 95% CI: 0.808–0.843) had weaker performance.

Table 2. Comparative performance of models and validation in subgroups in the test set.

https://doi.org/10.1371/journal.pone.0241083.t002

The elastic net model’s predicted probability of opioid overdose in the next month closely tracked the observed risk, with slight underestimation (Fig 2). However, the predicted probabilities were higher than the true risk of overdose in patients in the highest percentile of risk (>99.99th percentile). Random forest was slightly better calibrated compared to elastic net and traditional regression, particularly for patients with the highest predicted probabilities of opioid overdose (Fig 2).

Fig 2. Calibration plot for models predicting opioid overdose in the next month: Comparison of analytic approaches.

All models based on baseline and time-updated predictors.

https://doi.org/10.1371/journal.pone.0241083.g002

The final elastic net model identified 40 predictors for opioid overdose in the next month based on the previous 3–6 months of medical history (Table 3). All identified predictors were associated with increased odds of opioid overdose. Based on model coefficients, the strongest predictors for opioid overdose were age 18–25 years old at first opioid prescription (OR = 2.21 compared to age 26+ years) and at least 1 diagnosis of the following conditions during the 6 months prior to overdose: suicide attempt (OR = 3.68), opioid dependence (OR = 3.14), opioid abuse without dependence (OR = 2.63), and other substance use (OR = 2.58). Prescriptions for six types of opioids during the 3 months prior to overdose (compared to no prescriptions of the opioid type during the prior 3 months) were identified as predictors of overdose: fentanyl (OR = 1.14), hydrocodone (OR = 1.10), hydromorphone (OR = 1.20), methadone (OR = 1.16), morphine (OR = 1.14), and oxycodone (OR = 1.15). Prescriptions for five non-opioid medications during the 3 months prior to overdose were identified as predictors, including benzodiazepines (OR = 1.18) and gabapentinoids (OR = 1.11). Several indicators of healthcare utilization were also identified as predictors, including number of unique pharmacies for opioid dispensings (OR = 1.18) and number of hospitalizations (OR = 1.16). Elastic net does not provide an accurate estimate of precision [33], so for reference, we provided ORs and 95% CIs from the conventional logistic regression that was estimated prior to implementing elastic net (Table 3).

Table 3. Model coefficients for the traditional multivariate logistic regression model and the elastic net model.

https://doi.org/10.1371/journal.pone.0241083.t003

At the person-level, several cut points based on predicted probabilities could be used to dichotomize patients into high and low risk groups (Table 4 and S3 Table). Among all potential cut points, our algorithm had a high NPV (99.9% for all cut points) and a low PPV, ranging from 0.06% to 3.66%, which was driven by the very low incidence of opioid overdose. Diagnostics at the person-month level performed similarly, with a high NPV and low PPV (S3 Table). However, the PPV was much smaller, ranging from 0.003% to 0.26%, as the risk of opioid overdose in each month (0.003%) is much lower than the risk of overdose at any point during follow-up (0.05%).

Table 4. Performance of elastic net model predictions for classifying patients into high-risk and low-risk groups.

https://doi.org/10.1371/journal.pone.0241083.t004

Discussion

Using data-driven methods and a time-updating approach, we developed a prediction model for opioid overdose using data routinely collected in healthcare utilization claims. The final algorithm based on elastic net had strong performance with respect to discrimination and was well calibrated. Use of different analytic techniques (elastic net vs. traditional regression vs. random forest) had a relatively small impact on model performance, whereas inclusion of many time-updating predictors substantially improved prediction in the traditional regression. The final algorithm identified 40 characteristics from a patient’s previous 3–6 month medical history that were predictive of opioid overdose in the next month. These findings suggest that high-dimensional algorithms for opioid overdose based on many time-updated predictors could be used by health systems or payers to monitor patients and help identify those at high-risk for opioid overdose and then target them for interventions, including naloxone prescribing.

Implementation could provide an opportunity for real-time surveillance and early intervention. Details on how to calculate the predicted probability of opioid overdose in the next month for an example patient are shown in S4 Table. The predicted probability could be used to determine whether the patient is at high-risk of overdose. This process could be automated and repeated each month to ensure active surveillance. Thus, this algorithm could be used to monitor patients or automate the detection of high-risk individuals, whose risk factors are already being routinely collected. As a result, this may facilitate early intervention to mitigate the risk of overdose through prescription of naloxone or other clinical strategies, such as opioid tapering or substance abuse treatment referral.
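The calculation in S4 Table amounts to applying the logistic function to the model’s linear predictor. A minimal sketch is below; the coefficient names and values in any example call are placeholders, not the published estimates from Table 3 or S4 Table.

```python
import math

def predicted_probability(intercept, coefs, features):
    """Monthly overdose probability from a fitted logistic model:
    p = 1 / (1 + exp(-(b0 + sum_j b_j * x_j))).
    coefs: {predictor_name: beta}; features: {predictor_name: value}
    (predictors absent from `features` contribute 0)."""
    z = intercept + sum(beta * features.get(name, 0.0)
                        for name, beta in coefs.items())
    return 1.0 / (1.0 + math.exp(-z))
```

Recomputing this each month as the 3–6 month covariate windows roll forward gives the time-updated risk score that a surveillance system would compare against the chosen cut point.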

We proposed several potential cut points for classifying high-risk patients, allowing those who implement the algorithm to determine the optimal thresholds for their intervention of interest. Patients could have been flagged as high-risk during multiple months. However, clinical intervention happens at the patient level, so we considered patients as high-risk if any of their person-months were flagged as high-risk. The selection of the cut point inevitably results in a tradeoff between sensitivity and specificity. For example, the optimal cut point of 0.004% would provide a sensitivity of 80.2% and specificity of 80.1%, while a cut point of 0.15% would maximize specificity (99.8%) at the expense of decreasing sensitivity (13.1%). Since the incidence of overdose is low, the PPV was relatively low among all potential thresholds (PPV<3.66%). A highly specific cut point could be used to maximize the PPV, although most of the patients flagged as high-risk would not go on to have an observed opioid overdose event. However, given the seriousness of the outcome of overdose and the availability of low-cost and low-risk interventions for overdose, a low PPV may still have clinical utility. Over an average 17-month follow up, the risk of overdose was 0.05% among the general population of patients with at least 1 opioid prescription, which is much lower than the risk of overdose among patients flagged as high-risk by our algorithm (0.20% using the optimal cut point, 3.66% when maximizing specificity).

Our algorithm builds on previously proposed algorithms for identifying patients at high-risk of opioid overdose in claims data [20]. In addition to considering baseline factors, we constructed our algorithm to accommodate a large number of time-updating predictors in a setup that closely resembles active safety surveillance and to provide information on the predicted probability of overdose over time with monthly updates. Including a large number of time-updated predictors enhanced algorithm performance. Further, we utilized machine learning approaches to address the issue of overfitting that is routinely encountered in prediction models. Recently, an algorithm to predict opioid overdose in Medicare patients was published [21]. Using similar methods, this study found that machine learning algorithms performed well with respect to risk prediction and stratification of overdose. We demonstrate that data-driven algorithms using administrative data are predictive of overdose not only in the elderly, publicly-insured population, but also in commercially-insured populations, where this surveillance tool can be applied to a broader range of patients who may be at higher risk of overdose and may have slightly different risk factors for overdose.

Our study has several limitations. First, opioid overdose events may be under-recorded in claims data [38]. We only detect overdoses that result in presentation to an emergency department or inpatient admission. Fatal opioid overdoses may also be undercaptured since death is not a billable event, but the proportion of fatal overdoses is likely small relative to nonfatal overdoses [39, 40]. The underestimated incidence of overdose suggests that our model’s PPV may be underestimated. Second, our study focuses on a population who are dispensed prescription opioids. Many individuals may receive opioids from nonmedical settings, such as family and friends [41]. These exposures are not well captured in claims data. Future work will be needed to determine if information available in claims may be useful for evaluating the risk of overdose in patients who use opioids illicitly. Third, our study consisted of patients who received prescription opioids, including buprenorphine or methadone. This study population may have captured different types of patients (e.g., those who are doctor shopping and those who are receiving treatment for opioid use disorder) who may benefit from different interventions. Additionally, the data were left-censored at October 2011. Although we used an all-available lookback window to exclude patients with a prior overdose, we do not know whether the patients in our study population had a diagnosed overdose prior to October 2011. Another limitation is that the list of candidate predictors does not encompass all of the important risk factors of overdose, such as behavioral health and criminal justice variables that are poorly captured in claims data [42]. Predictors were also measured during the previous 3–6 months, and we did not assess whether a longer covariate assessment period could have resulted in better prediction.
Despite the potentially incomplete capture of all relevant predictors, we highlight that high-dimensional, time-updated algorithms can outperform simpler models based on a few predictors, and this methodology can be extended to other data sources. Future work will need to explore whether further expanding the range of predictors improves model performance. Next, internal validation was used to assess model performance; generalizability to other populations, such as those insured by Medicaid, will need to be assessed in future studies. Finally, we defined medical conditions using ICD-9-CM codes, but the US has since transitioned to the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) [43]. Our goal was to capture the relevant medical conditions, as opposed to the specific codes, that predict overdose. Therefore, our algorithm can be implemented in ICD-10-CM by mapping the ICD-9-CM codes to ICD-10-CM.
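The code-mapping step described above can be sketched as a simple crosswalk lookup. The example below is deliberately abbreviated and illustrative: it covers only a few opioid-poisoning codes at the category level, whereas a complete implementation would use the CMS General Equivalence Mappings (GEMs) and include the full ICD-10-CM codes with seventh-character encounter extensions.

```python
# Abbreviated, illustrative ICD-9-CM -> ICD-10-CM crosswalk for opioid
# poisoning, at the three-character ICD-10-CM category level. A production
# mapping would be derived from the CMS General Equivalence Mappings.
ICD9_TO_ICD10 = {
    "965.00": ["T40.0"],           # poisoning by opium
    "965.01": ["T40.1"],           # poisoning by heroin
    "965.02": ["T40.3"],           # poisoning by methadone
    "965.09": ["T40.2", "T40.4"],  # other opioids / other synthetic narcotics
}

def translate_codes(icd9_codes):
    """Expand a predictor's ICD-9-CM code list into ICD-10-CM equivalents."""
    translated = []
    for code in icd9_codes:
        # Codes with no entry in the crosswalk are dropped; a production
        # implementation would flag them for manual review instead.
        translated.extend(ICD9_TO_ICD10.get(code, []))
    return sorted(set(translated))
```

Because the algorithm targets medical conditions rather than specific codes, each condition-level predictor can be re-expressed this way without refitting the model.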

In conclusion, this study suggests that sophisticated algorithms using data routinely collected in healthcare utilization claims can be predictive of opioid overdose. It highlights the feasibility of payers using high-dimensional algorithms to create monitoring programs that prospectively identify high-risk patients and create an opportunity for intervention, such as administering naloxone, before an overdose occurs.

Supporting information

S1 Fig. Flow diagram of cohort assembly.

https://doi.org/10.1371/journal.pone.0241083.s001

(PDF)

S1 Table. ICD-9-CM codes for candidate predictors.

https://doi.org/10.1371/journal.pone.0241083.s002

(DOCX)

S2 Table. Characteristics for each person-month of follow-up among patients with ≥1 opioid prescription, October 2011 to September 2015.

https://doi.org/10.1371/journal.pone.0241083.s003

(DOCX)

S3 Table. Diagnostics for dichotomizing into “high” and “low” risk groups using different cutpoints.

https://doi.org/10.1371/journal.pone.0241083.s004

(DOCX)

S4 Table. Example calculation for the predicted probability of opioid overdose in the next month.

https://doi.org/10.1371/journal.pone.0241083.s005

(DOCX)

References

  1. Manchikanti L, Fellows B, Ailinani H, Pampati V. Therapeutic use, abuse, and nonmedical use of opioids: a ten-year perspective. Pain Physician. 2010;13(5):401–35.
  2. White House Office of National Drug Control Policy. National Drug Control Strategy 2014. 2014.
  3. US Executive Office of the President, Office of National Drug Control Policy. Epidemic: Responding to America's Prescription Drug Abuse Crisis. 2011.
  4. President's Commission on Combating Drug Addiction and the Opioid Crisis. Interim Report. Washington, DC; 2017.
  5. Rudd RA, Aleshire N, Zibbell JE, Gladden RM. Increases in Drug and Opioid Overdose Deaths—United States, 2000–2014. MMWR Morb Mortal Wkly Rep. 2016;64(50–51):1378–82.
  6. National Institutes of Health, National Institute on Drug Abuse. Overdose death rates 2019 [Available from: https://www.drugabuse.gov/related-topics/trends-statistics/overdose-death-rates.
  7. Olfson M, Rossen LM, Wall MM, Houry D, Blanco C. Trends in Intentional and Unintentional Opioid Overdose Deaths in the United States, 2000–2017. JAMA. 2019;322(23):2340–2.
  8. Blau M. Opioids could kill nearly 500,000 Americans in the next decade June 27, 2017 [Available from: https://www.statnews.com/2017/06/27/opioid-deaths-forecast/.
  9. National Institutes of Health, National Institute on Drug Abuse. Overdose death rates 2017 [Available from: https://www.drugabuse.gov/related-topics/trends-statistics/overdose-death-rates.
  10. Seth P, Scholl L, Rudd RA, Bacon S. Overdose Deaths Involving Opioids, Cocaine, and Psychostimulants—United States, 2015–2016. MMWR Morb Mortal Wkly Rep. 2018;67(12):349–58.
  11. Seth P, Rudd RA, Noonan RK, Haegerich TM. Quantifying the Epidemic of Prescription Opioid Overdose Deaths. Am J Public Health. 2018;108(4):500–2.
  12. Scholl L, Seth P, Kariisa M, Wilson N, Baldwin G. Drug and Opioid-Involved Overdose Deaths—United States, 2013–2017. MMWR Morb Mortal Wkly Rep. 2018.
  13. Oderda GM, Lake J, Rudell K, Roland CL, Masters ET. Economic Burden of Prescription Opioid Misuse and Abuse: A Systematic Review. J Pain Palliat Care Pharmacother. 2015;29(4):388–400.
  14. Hsu DJ, McCarthy EP, Stevens JP, Mukamal KJ. Hospitalizations, costs and outcomes associated with heroin and prescription opioid overdoses in the United States 2001–12. Addiction. 2017;112(9):1558–64.
  15. Gomes T, Tadrous M, Mamdani MM, Paterson J, Juurlink DN. The burden of opioid-related mortality in the United States. JAMA Network Open. 2018;1(2):e180217. pmid:30646062
  16. Florence CS, Zhou C, Luo F, Xu L. The Economic Burden of Prescription Opioid Overdose, Abuse, and Dependence in the United States, 2013. Med Care. 2016;54(10):901–6.
  17. Schneeweiss S, Avorn J. A review of uses of health care utilization databases for epidemiologic research on therapeutics. J Clin Epidemiol. 2005;58(4):323–37.
  18. Canan C, Polinski JM, Alexander GC, Kowal MK, Brennan TA, Shrank WH. Automatable algorithms to identify nonmedical opioid use using electronic data: a systematic review. J Am Med Inform Assoc. 2017;24(6):1204–10. pmid:29016967
  19. Reps JM, Cepeda MS, Ryan PB. Wisdom of the CROUD: Development and validation of a patient-level prediction model for opioid use disorder using population-level claims data. PLoS One. 2020;15(2):e0228632. pmid:32053653
  20. Cochran G, Gordon AJ, Lo-Ciganic WH, Gellad WF, Frazier W, Lobo C, et al. An Examination of Claims-based Predictors of Overdose from a Large Medicaid Program. Med Care. 2017;55(3):291–8.
  21. Lo-Ciganic WH, Huang JL, Zhang HH, Weiss JC, Wu Y, Kwoh CK, et al. Evaluation of Machine-Learning Algorithms for Predicting Opioid Overdose Risk Among Medicare Beneficiaries With Opioid Prescriptions. JAMA Netw Open. 2019;2(3):e190968.
  22. Tudor CG. Memorandum: Medicare Part D Overutilization Monitoring System. Centers for Medicare and Medicaid Services; 2013.
  23. Venebio. [Available from: https://venebio.com/technologies/venebio-opioid-advisor/.
  24. Rough K, Huybrechts KF, Hernandez-Diaz S, Desai RJ, Patorno E, Bateman BT. Using prescription claims to detect aberrant behaviors with opioids: comparison and validation of 5 algorithms. Pharmacoepidemiol Drug Saf. 2018.
  25. Lippold KM, Jones CM, Olsen EO, Giroir BP. Racial/Ethnic and Age Group Differences in Opioid and Synthetic Opioid-Involved Overdose Deaths Among Adults Aged ≥18 Years in Metropolitan Areas—United States, 2015–2017. MMWR Morb Mortal Wkly Rep. 2019;68(43):967–73.
  26. Boyer EW. Management of opioid analgesic overdose. N Engl J Med. 2012;367(2):146–55.
  27. Evidence-Based Strategies for Preventing Opioid Overdose: What's Working in the United States. National Center for Injury Prevention and Control, Centers for Disease Control and Prevention, U.S. Department of Health and Human Services; 2018 [Available from: https://www.cdc.gov/drugoverdose/pdf/pubs/2018-evidence-based-strategies.pdf.
  28. Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Statist Soc B. 2005;67(Part 2):301–20.
  29. Varian HR. Big Data: New Tricks for Econometrics. Journal of Economic Perspectives. 2014;28(2):3–28.
  30. Segal MR. Machine learning benchmarks and random forest regression. Center for Bioinformatics & Molecular Biostatistics. 2004.
  31. Green CA, Perrin NA, Janoff SL, Campbell CI, Chilcoat HD, Coplan PM. Assessing the accuracy of opioid overdose and poisoning codes in diagnostic information from electronic health records, claims data, and death records. Pharmacoepidemiol Drug Saf. 2017;26(5):509–17. pmid:28074520
  32. Steyerberg EW, Harrell FE Jr., Borsboom GJ, Eijkemans MJ, Vergouwe Y, Habbema JD. Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J Clin Epidemiol. 2001;54(8):774–81.
  33. Tibshirani R. Regression shrinkage and selection via the lasso: a retrospective. Journal of the Royal Statistical Society. 2011;73(3):273–82.
  34. Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21(1):128–38.
  35. Breiman L. Random forests. Machine Learning. 2001;45(1):5–32.
  36. Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software. 2010;33(1).
  37. Liaw A, Wiener M. Classification and Regression by randomForest. R News. 2002;2(3):18–22.
  38. Rowe C, Vittinghoff E, Santos GM, Behar E, Turner C, Coffin PO. Performance Measures of Diagnostic Codes for Detecting Opioid Overdose in the Emergency Department. Acad Emerg Med. 2017;24(4):475–83.
  39. Elzey MJ, Barden SM, Edwards ES. Patient Characteristics and Outcomes in Unintentional, Non-fatal Prescription Opioid Overdoses: A Systematic Review. Pain Physician. 2016;19(4):215–28.
  40. Dunn KM, Saunders KW, Rutter CM, Banta-Green CJ, Merrill JO, Sullivan MD, et al. Opioid prescriptions for chronic pain and overdose: a cohort study. Ann Intern Med. 2010;152(2):85–92.
  41. Lipari RN, Hughes A. How People Obtain the Prescription Pain Relievers They Misuse. The CBHSQ Report. Rockville (MD); 2013. p. 1–7.
  42. Larochelle MR, Bernstein R, Bernson D, Land T, Stopka TJ, Rose AJ, et al. Touchpoints—Opportunities to predict and prevent opioid overdose: A cohort study. Drug Alcohol Depend. 2019;204:107537.
  43. International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM): Centers for Disease Control and Prevention; 2018 [updated July 26, 2018. Available from: https://www.cdc.gov/nchs/icd/icd10cm.htm#FY%202019%20release%20of%20ICD-10-CM.