Abstract
Precision medicine, also called patient tailoring, is important in drug development because it can increase efficacy and/or reduce adverse reactions for the right patients, identified by genomic or other types of biomarkers. The next two chapters discuss statistical methods for biomarkers, which are a key ingredient in developing precision medicine.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Man, M., Nguyen, T.S., Battioui, C., Mi, G. (2019). Predictive Subgroup/Biomarker Identification and Machine Learning Methods. In: Fang, L., Su, C. (eds) Statistical Methods in Biomarker and Early Clinical Development. Springer, Cham. https://doi.org/10.1007/978-3-030-31503-0_1
DOI: https://doi.org/10.1007/978-3-030-31503-0_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31502-3
Online ISBN: 978-3-030-31503-0
eBook Packages: Mathematics and Statistics (R0)