Predictive modeling of nontuberculous mycobacterial pulmonary disease epidemiology using German health claims data

https://doi.org/10.1016/j.ijid.2021.01.003
Under a Creative Commons license
open access

Highlights

  • Machine learning using historical claims data may predict previously undiagnosed NTM-PD.

  • A random forest model with a risk threshold >99% performed best (AUC 0.847; total error 19.4%).

  • Prevalence increased 5-fold to 19/100,000 when coded and non-coded cases were counted together, compared with coded cases alone.

  • Correspondingly, incidence increased 9-fold to 15/100,000 population in 2016.

  • A relevant number of previously unreported NTM-PD cases were identified with high probability.

Abstract

Objectives

Administrative claims data are prone to underestimate the burden of non-tuberculous mycobacterial pulmonary disease (NTM-PD).

Methods

We developed machine learning-based algorithms using historical claims data from cases with NTM-PD to predict patients with a high probability of having previously undiagnosed NTM-PD and to assess actual prevalence and incidence. Adults with incident NTM-PD were identified in a representative 5% sample of the German population covered by statutory health insurance during 2011–2016 using the International Classification of Diseases, 10th revision code A31.0. Pre-diagnosis characteristics (patient demographics, comorbidities, diagnostic and therapeutic procedures, and medications) were extracted and compared with those of a control group without NTM-PD to identify risk factors.

Results

Applying a random forest model (area under the curve 0.847; total error 19.4%) with a risk threshold of >99%, prevalence and incidence rates in 2016 increased 5-fold and 9-fold to 19 and 15 cases/100,000 population, respectively, when coded and non-coded cases were counted together, compared with coded cases alone.
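As a rough illustration of how a risk threshold feeds into a rate per 100,000 population, the sketch below counts people without an A31.0 code whose predicted risk exceeds the threshold, adds them to the coded cases, and compares the combined rate with the coded-only rate. The function names, population size, case counts, and risk scores are all hypothetical and are not taken from the study; the study's random forest model itself is not reproduced here.

```python
def rate_per_100k(cases: int, population: int) -> float:
    """Return the number of cases per 100,000 population."""
    return cases * 100_000 / population


def count_above_threshold(risk_scores, threshold=0.99):
    """Count people whose predicted risk strictly exceeds the threshold."""
    return sum(1 for score in risk_scores if score > threshold)


# Hypothetical inputs for illustration only (not study data).
population = 200_000
coded_cases = 8  # people carrying the diagnosis code
# Predicted risk scores for a few people without the diagnosis code.
risk_scores = [0.12, 0.995, 0.40, 0.999, 0.991, 0.99, 0.75]

# A score of exactly 0.99 is not counted, because the threshold is ">99%".
predicted_cases = count_above_threshold(risk_scores)

coded_rate = rate_per_100k(coded_cases, population)
combined_rate = rate_per_100k(coded_cases + predicted_cases, population)
fold_increase = combined_rate / coded_rate

print(predicted_cases, coded_rate, combined_rate, fold_increase)
```

The strict comparison mirrors the ">99%" wording of the threshold; a looser threshold would add more predicted cases and raise the combined rate accordingly.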

Conclusions

The use of a machine learning-based algorithm applied to German statutory health insurance claims data predicted a considerable number of previously unreported NTM-PD cases with high probability.

Keywords

Epidemiology
Insurance claims analysis
Machine learning
Nontuberculous mycobacteria
Nontuberculous mycobacterium infections
Probability learning


Preliminary results from this study were presented at the Annual Congress of the German Respiratory Society; March 13–16, 2019; Munich, Germany.