Flexible Modelling of Longitudinal Medical Data: A Bayesian Nonparametric Approach

Authors:
Alexis Bellot (University of Cambridge and the Alan Turing Institute)
Mihaela Van Der Schaar (University of Cambridge, the Alan Turing Institute, and University of California Los Angeles)

ACM Transactions on Computing for Healthcare, Volume 1, Issue 1, Article No. 3, pp. 1–15. https://doi.org/10.1145/3377164

Published: 02 March 2020

Abstract

Using electronic medical records to learn personalized risk trajectories poses significant challenges because often very few samples are available in a patient’s history, and, when available, their information content is highly diverse. In this article, we consider how to integrate sparsely sampled longitudinal data, missing measurements informative of the underlying health status, and static information to estimate (dynamically, as new information becomes available) personalized survival distributions. We achieve this by developing a nonparametric probabilistic model that generates survival trajectories, and corresponding uncertainty estimates, from an ensemble of Bayesian trees in which time is incorporated explicitly to learn variable interactions over time, without needing to specify the longitudinal process beforehand. As such, the changing influence on survival of variables over time is inferred from the data directly, which we analyze with post-processing statistics derived from our model.
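
The abstract does not spell out how time enters the trees or how survival curves are read off the ensemble. As a minimal sketch, assuming a discrete-time formulation (an assumption, not a detail confirmed by the abstract), time can be added as an explicit input by expanding each patient into one row per interval, and a survival curve follows from interval hazards via S(t_k) = prod_{j<=k} (1 - h_j). The helper names below are hypothetical, and the hazard draws are random placeholders standing in for posterior samples from a fitted model.

```python
import numpy as np


def expand_person_period(event_time, event, covariates, grid):
    """Expand one patient into person-period rows with time as an input.

    Hypothetical helper: each row is (t_j, covariates) for every grid time
    t_j up to the patient's last follow-up. The label is 1 only in the
    interval where the event occurs. Assumes event_time lies on the grid.
    """
    rows, labels = [], []
    for t in grid:
        if t > event_time:
            break
        rows.append(np.concatenate(([t], covariates)))
        labels.append(1 if (event and t == event_time) else 0)
    return np.array(rows), np.array(labels)


def survival_from_hazards(hazard_draws):
    """Map hazard draws of shape (n_draws, n_intervals) to survival draws.

    Uses S(t_k) = prod_{j<=k} (1 - h_j), applied to each draw separately.
    """
    return np.cumprod(1.0 - hazard_draws, axis=1)


def summarize(survival_draws, level=0.95):
    """Pointwise mean and equal-tailed interval across draws."""
    alpha = (1.0 - level) / 2.0
    mean = survival_draws.mean(axis=0)
    lower, upper = np.quantile(survival_draws, [alpha, 1.0 - alpha], axis=0)
    return mean, lower, upper


# One patient with an event at t=3 and two static covariates (made-up values).
grid = [1, 2, 3, 4]
X, y = expand_person_period(3, True, np.array([0.5, 62.0]), grid)

# Placeholder "posterior" hazard draws; a real model would supply these.
rng = np.random.default_rng(0)
hazard_draws = rng.beta(2.0, 20.0, size=(1000, len(grid)))
mean, lower, upper = summarize(survival_from_hazards(hazard_draws))
```

Computing the interval per draw and only then summarizing keeps the uncertainty of the whole trajectory, rather than of each hazard in isolation. New measurements would be handled by re-evaluating the hazards with updated covariates, which is what makes the estimate dynamic.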

Published in
ACM Transactions on Computing for Healthcare Volume 1, Issue 1
January 2020
99 pages
EISSN:2637-8051
DOI:10.1145/3386261
Editors:
Insup Lee (University of Pennsylvania, USA)
John A. Stankovic (University of Virginia, USA)
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 2 March 2020
- Received: 1 October 2019
- Accepted: 1 October 2019
Author Tags
Bayesian nonparametrics
survival analysis
Qualifiers
- research-article
- Research
- Refereed