research-article
Open Access

Flexible Modelling of Longitudinal Medical Data: A Bayesian Nonparametric Approach

Published: 02 March 2020

Abstract

Using electronic medical records to learn personalized risk trajectories poses significant challenges because often very few samples are available in a patient’s history, and, when available, their information content is highly diverse. In this article, we consider how to integrate sparsely sampled longitudinal data, missing measurements informative of the underlying health status, and static information to estimate (dynamically, as new information becomes available) personalized survival distributions. We achieve this by developing a nonparametric probabilistic model that generates survival trajectories, and corresponding uncertainty estimates, from an ensemble of Bayesian trees in which time is incorporated explicitly to learn variable interactions over time, without needing to specify the longitudinal process beforehand. As such, the changing influence on survival of variables over time is inferred from the data directly, which we analyze with post-processing statistics derived from our model.


        • Published in

          ACM Transactions on Computing for Healthcare  Volume 1, Issue 1
          January 2020
          99 pages
          EISSN:2637-8051
          DOI:10.1145/3386261

          Copyright © 2020 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 2 March 2020
          • Received: 1 October 2019
          • Accepted: 1 October 2019

          Qualifiers

          • research-article
          • Research
          • Refereed
