Predictive Subgroup/Biomarker Identification and Machine Learning Methods

Man, M.; Nguyen, T. S.; Battioui, C.; Mi, G.

doi:10.1007/978-3-030-31503-0_1

M. Man³,
T. S. Nguyen⁴,
C. Battioui³ &
…
G. Mi³

980 Accesses
1 Citations

Abstract

Precision medicine or patient tailoring is very important for drug development due to its potential of increasing efficacy and/or reducing adverse reaction for the right patients (with genomic or other types of biomarker). The next two chapters will discuss statistical methods pertinent to biomarkers that are the important ingredient in developing precision medicine.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 119.00; Price excludes VAT (USA)

Softcover Book: USD 159.00; Price excludes VAT (USA)

Hardcover Book: USD 159.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Alemayehu D, Chen Y, Markatou M. A comparative study of subgroup identification methods for differential treatment effect: performance metrics and recommendations. Statistical Methods in Medical Research 0 (0): 1–21 (2017).
Google Scholar
Battioui C, Shen L, Ruberg S. A Resampling-based Ensemble Tree Method to Identify Patient Subgroups with Enhanced Treatment Effect. JSM proceedings (2014).
Google Scholar
Berger J, Wang X, Shen L. A Bayesian approach to subgroup identification. Journal of Biopharmaceutical statistics 24: 110–129 (2014).
Article MathSciNet Google Scholar
Boyiadzis MM, Kirkwood JM, Marshall JL, Pritchard CC, Azad NS, Gulley JL. Significance and implications of FDA approval of pembrolizumab for biomarker-defined disease. Journal of ImmunoTherapy of Cancer 6:35 (2018).
Google Scholar
Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees. Wadsworth: Belmont, CA (1984).
MATH Google Scholar
Breiman, L. Bagging predictors. Machine Learning 24: 123–140 (1996).
MATH Google Scholar
Breiman, L. Random forests. Machine Learning 45: 5–32 (2001).
Article MATH Google Scholar
Buettner R, Wolf J, Thomas RK. Lessons learned from lung cancer genomics: the emerging concept of individualized diagnostics and treatment. Journal of Clinical Oncology 31: 1858–1865 (2013).
Article Google Scholar
Button KS, Ioannidis JPA, Mokrysz C, Nosek BA, Flint J, Robinson ESJ & Munafò MR. Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience 14: 365–376 (2013).
Article Google Scholar
Carbone DP, Reck M, Paz-Ares L et al. First-line Nivolumab in stage IV or recurrent non-small-cell lung cancer. New England Journal of Medicine 376: 2415–26 (2017).
Article Google Scholar
Cardoso F, van’t Veer LJ, Bogaerts J, Slaets L, Viale G, Delaloge S, Pierga JY, Brain E, Causeret S, DeLorenzi M, Glas AM. 70-gene signature as an aid to treatment decisions in early-stage breast cancer. New England Journal of Medicine. 2016 Aug 25; 375(8):717–29.
Article Google Scholar
Chen T, Guestrin C. XGBoost: a scalable tree boosting algorithm. ACM Digital Library (2016).
Google Scholar
Chen JH, Asch SM. Machine learning and prediction in medicine – beyond the peak of inflated expectations. New England Journal of Medicine 376: 2507–2509 (2017).
Article Google Scholar
Chen JH, Alagappan M, Goldstein MK, Asch SM, Altman RB. Decaying relevance of clinical data towards future decisions in data-driven inpatient clinical order sets. International Journal of Medical Informatics 102: 71–79 (2017).
Article Google Scholar
Chipman HA, George EI, McCulloch RE BART: Bayesian additive regression trees. The Annals of Applied Statistics 4: 266–298 (2010).
Article MathSciNet MATH Google Scholar
Christensen JG, Zou HY, Arango ME, et al. Cytoreductive antitumor activity of PF-2341066, a novel inhibitor of anaplastic lymphoma kinase and c-Met, in experimental models of anaplastic large-cell lymphoma. Molecular Cancer Therapeutics 6: 3314–22 (2007).
Article Google Scholar
Deo RC. Machine learning in medicine. Circulation 132: 1920–1930 (2015).
Article Google Scholar
Dmitrienko A, Muysers C, Fritsch A, Lipkovich I. General guidance on exploratory and confirmatory subgroup analysis in late-stage clinical trials. Journal of Biopharmaceutical Statistics 26: 71–98 (2016).
Article Google Scholar
Dobashi Y, Goto A, Kimura M, Nakano T. Molecularly Targeted Therapy: Past, Present and Future. Chemotherapy. 2012;1(105):2.
Google Scholar
Domingos, P. The master algorithm. Basic Books, a member of Perseus Books Group, New York (2015).
Google Scholar
Dusseldorf E, Conversano C, Van Os BJ. Combining an additive and tree-based regression model simultaneously: STIMA. Journal of Computational and Graphical Statistics 19: 514–530 (2010).
Article MathSciNet Google Scholar
Dusseldorf E, Van Mechelen I. Qualitative interaction trees: a tool to identify qualitative treatment-subgroup interactions. Statistics in Medicine 33: 219–237 (2014).
Article MathSciNet Google Scholar
Efron, B. Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7, 1–26 (1979).
Article MathSciNet MATH Google Scholar
Ein-Dor L, Kela I, Getz G, Givol D, Domany E. Outcome signature genes in breast cancer: is there a unique set? Bioinformatics 21(2): 171–178 (2004).
Article Google Scholar
Fisher, RA. The Design of Experiments. New York: Hafner (1935).
Google Scholar
Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of American Statistical Association 96: 1348–1360 (2001).
Article MathSciNet MATH Google Scholar
“FDA grants accelerated approval to first drug for Duchenne muscular dystrophy”. Press Announcements. U.S. Food & Drug Administration. September 19, 2016. Retrieved September 19, 2016.
Google Scholar
Foster JC, Taylor JMC, Ruberg SJ. Subgroup identification from randomized clinical trial data. Statistics in Medicine 30: 2867–2880 (2011).
Article MathSciNet Google Scholar
Foster JC, Nan B, Shen L, Kaciroti N, Taylor JMC. Permutation testing for treatment-covariate interactions and subgroup identification. Statistics in Biosciences 8 (1): 77–98 (2016).
Article Google Scholar
Freund Y, Schapire RE. A decision-theoretic generalization of online learning and an application to boosting. Journal of Computer and System Sciences 55: 119–139 (1997).
Article MathSciNet MATH Google Scholar
Freidlin B, Simon R. Adaptive signature design: an adaptive clinical trial design for generating and prospectively testing a gene expression signature for sensitive patients. Clinical Cancer Research 2005; 11:7872–7878.
Article Google Scholar
Friedman JH, Hastie T, Tibshirani R. Additive logistic regression: a statistical view of boosting. The Annals of Statistics 28: 337–407 (2000).
Article MathSciNet MATH Google Scholar
Friedman JH. Greedy function approximation: a gradient boosting machine. The Annals of Statistics 29: 1189–1232 (2001).
Article MathSciNet MATH Google Scholar
Frueh FW. Personalized medicine: What is it? How will it affect health care? 11^th Annual FDA Science Forum, 2005.
Google Scholar
Fu WJ. The Bridge vs Lasso. Journal of Computational and Graphical Statistics 7 (3). Taylor & Francis: 397–416 (1998).
Google Scholar
Garon EB, Rizvi NA, Hui R, Leighl N, Balmanoukian AS, Eder JP, et al. Pembrolizumab for the treatment of non-small-cell lung cancer. New England Journal of Medicine 372: 2018–2028 (2015).
Article Google Scholar
Gombar C and Loh E. Drug Discovery & Development magazine 10 (2): 22–27 (2007).
Google Scholar
Gu X, Yin G, Lee JJ. Bayesian two-step lasso strategy for biomarker selection in personalized medicine development for time-to-event endpoints. Contemporary Clinical Trials 36: 642–650 (2013).
Article Google Scholar
Halsey LG, Curran-Everett D, Vowler SL & Drummond GW. The fickle P value generates irreproducible results. Nature Methods 12: 179–185 (2015).
Article Google Scholar
Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer (2001).
Google Scholar
Hellmann MD, Ciuleanu TE, Pluzanski A, Lee JS, Otterson GA, Audigier-Valette C, Minenza E, Linardou H, Burgers S, Salman P, Borghaei H. Nivolumab plus ipilimumab in lung cancer with a high tumor mutational burden. New England Journal of Medicine. 2018 Apr 16.
Google Scholar
Hothorn T, Hornik K, Zeileis A. Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 15(3): 651–674 (2006).
Article MathSciNet Google Scholar
Ishwaran H, Kogalur UB, Lauer MS. Random survival forests. Annals of Applied Statistics 2: 841–860 (2008).
Article MathSciNet MATH Google Scholar
Jia J, Tang Q, Xie W, Rode R. A Novel Method of Subgroup Identification by Combining Virtual Twins with GUIDE (VG) for Development of Precision Medicines. Presented at ICSA, and eprint arXiv: 1708.04741 2017
Google Scholar
Johnson DR, Bachan LK. What can we learn from studies based on small sample sizes? Psychological Reports 113(1): 1233–1236 (2013).
Article Google Scholar
Kursa MB, Rudnicki WR. Feature selection with the Boruta package. Journal of Statistical Software 36 (11) (2010).
Google Scholar
Kwak, EL et al. Anaplastic lymphoma kinase inhibition in non-small-cell lung cancer. New England Journal of Medicine 363: 1693–1703 (2010).
Article Google Scholar
Li Q, Lin N. The Bayesian elastic net. Bayesian Analysis 5 (1): 151–170 (2010).
Article MathSciNet MATH Google Scholar
Lipkovich I, Dmitrienko A, Denne J, Enas G. Subgroup identification based on differential effect search (SIDES): a recursive partitioning method for establishing response to treatment in patient subpopulations. Statistics in Medicine 30: 2601–2621 (2011).
MathSciNet Google Scholar
Lipkovich I, Dmitrienko A. Biomarker identification in clinical trials. In Clinical and Statistical Considerations in Personalized Medicine, Carini C, Chang M (eds). Chapman and Hall/CRC Press: New York: 211–264 (2014).
Google Scholar
Lipkovich I, Dmitrienko A, D’Agostino RB. Tutorial in biostatistics: data-driven subgroup identification and analysis in clinical trials. Statistics in Medicine 36: 136–196 (2017).
Article MathSciNet Google Scholar
Lockhart R, Taylor J, Tibshirani RJ, Tibshirani R. A significance test for the lasso. The Annals of Statistics 42: 413–463 (2014).
Article MathSciNet MATH Google Scholar
Loh WY, Shih YS. Split selection methods for classification trees. Statistica Sinica 7: 815–840 (1997).
MathSciNet MATH Google Scholar
Loh WY. Variable selection for classification and regression in large p, small n problems. In Probability Approximations and Beyond. Barbour A, Chan HP, Siegmund D (eds), Lecture Notes in Statistics -Proceedings 205: 133–157 (2012).
Google Scholar
Loh WY. Fifty Years of Classification and Regression Trees. International Statistical Review 82 (3): 329–348 (2014).
Article MathSciNet MATH Google Scholar
Loh WY, He X, and Man M. A regression tree approach to identifying subgroups with differential treatment effects. Statistics in Medicine 34: 1818–1833 (2015).
Article MathSciNet Google Scholar
Loh WY, Fu H, Man M, Champion V, Yu M. Identification of subgroups with differential treatment effects for longitudinal and multiresponse variables. Statistics in Medicine 35: 4837–4855 (2016).
Article MathSciNet Google Scholar
Loh WY, Man M, Wang S. Subgroups from regression trees with adjustment for prognostic effects: identification and inference. Statistics in Medicine, accepted (2018).
Google Scholar
McDermott U, Iafrate AJ, Gray NS, et al. Genomic alterations of anaplastic lymphoma kinase may sensitize tumors to anaplastic lymphoma kinase inhibitors. Cancer Res 68: 3389–95 (2008).
Article Google Scholar
Meinshausen N, Meier L, Buhlmann P. P-values for high-dimensional regression. Journal of the American Statistical Associations 104: 1671–1681 (2009).
Article MathSciNet MATH Google Scholar
Mi G. Enhancement of the adaptive signature design for learning and confirming in a single pivotal trial. Pharmaceutical statistics. 2017 Sep 1; 16(5):312–321.
Article Google Scholar
Morik K. Medicine: applications of machine learning. In Encyclopedia of machine learning. Sammut C, Webb GI (eds). (2011).
Google Scholar
Negassa A, Ciampi A, Abrahamowicz M, Shapiro S, Boivin JF. Tree-structured subgroup analysis for censored survival data: validation of computationally inexpensive model selection criteria. Statistics and Computing 15: 231–239 (2005).
Article MathSciNet MATH Google Scholar
Obermeyer Z, Emanuel EJ. Predicting the future – big data, machine learning and clinical medicine. New England Journal of Medicine 375: 1216–1219 (2016).
Article Google Scholar
Park T, Casella G. The Bayesian lasso. Journal of the American Statistical Association 103: 681–686 (2008).
Article MathSciNet MATH Google Scholar
Reck M, et al. “Pembrolizumab versus chemotherapy for PD-L1–positive non-small-cell lung cancer”. The New England Journal of Medicine 375 (19): 1824–1833 (2016).
Article Google Scholar
Peters S, Camidge DR, Shaw AT, Gadgeel S, Ahn JS, Kim DW, Ou SH, Pérol M, Dziadziuszko R, Rosell R, Zeaiter A. Alectinib versus crizotinib in untreated ALK-positive non–small-cell lung cancer. New England Journal of Medicine. 2017 Aug 31; 377(9):829–38.
Article Google Scholar
R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Ross JS, Hatzis C, Symmans WF, Pusztai L, Hortobagyi GN. Commercialized multigene predictors of clinical outcome for breast cancer. The Oncologist 13 (5): 477–493 (2008).
Article Google Scholar
Ruberg S and Shen L. Personalized Medicine. Four Perspectives of Tailored Medicine. Statistics in Biopharmaceutical Research 7 (3): 214–229 (2015).
Article Google Scholar
Soda M et al. Identification of the transforming EML4–ALK fusion gene in non-small-cell lung cancer. Nature 448: 561–567 (2007).
Article Google Scholar
Strobl C. Data mining. In The Oxford Handbook on Quantitative Methods, Ed. T. Little pp. 678–700. USA, Chapter 29: Oxford University Press (2013).
Google Scholar
Su X, Tsai CL, Wang H, Nickerson DM, Li B. Subgroup analysis via recursive partitioning. Journal of Machine Learning Research 10: 141–158 (2009).
Google Scholar
Sutton CD. Classification and regression trees. Handbook of Statistics 24: 303–329 (2005).
Article Google Scholar
Tibshirani R. Regression Shrinkage and Selection via the lasso. Journal of the Royal Statistical Society. Series B (methodological). Wiley. 58 (1): 267–88 (1996).
Google Scholar
Tibshirani R, Saunders M, Rosset S, Zhu J, and Knight K. Sparsity and Smoothness via the Fused lasso. Journal of the Royal Statistical Society. Series B (statistical Methodology) 67 (1). Wiley: 91–108 (2005).
Google Scholar
US Food and Drug Administration, “FDA Clears Breast Cancer Specific Molecular Prognostic Test,” news release, February 6, 2007.
Google Scholar
US Food and Drug Administration. FDA labeling information — Xalkori. FDA website (2011).
Google Scholar
Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B 68: 49–67 (2007).
Article MathSciNet MATH Google Scholar
Zou H, Hastie T. Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society. Series B (statistical Methodology). Wiley. 67 (2): 301–20 (2005).
Article MathSciNet MATH Google Scholar
Zou H. The adaptive Lasso and its oracle properties. Journal of the American Statistical Associations 101: 1418–1429 (2006).
Article MathSciNet MATH Google Scholar
Zou H, Zhang HH. On the adaptive elastic-net with a diverging number of parameters. Annals of Statistics 37 (4): 1733–1751 (2009).
Article MathSciNet MATH Google Scholar

Download references

Author information

Authors and Affiliations

Eli Lilly and Company, Indianapolis, IN, USA
M. Man, C. Battioui & G. Mi
Nektar Therapeutics, San Francisco, CA, USA
T. S. Nguyen

Authors

M. Man
View author publications
You can also search for this author in PubMed Google Scholar
T. S. Nguyen
View author publications
You can also search for this author in PubMed Google Scholar
C. Battioui
View author publications
You can also search for this author in PubMed Google Scholar
G. Mi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to M. Man .

Editor information

Editors and Affiliations

MyoKardia Inc., South San Francisco, CA, USA
Liang Fang
BioMarin Pharmaceutical Inc., San Rafael, CA, USA
Cheng Su

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Man, M., Nguyen, T.S., Battioui, C., Mi, G. (2019). Predictive Subgroup/Biomarker Identification and Machine Learning Methods. In: Fang, L., Su, C. (eds) Statistical Methods in Biomarker and Early Clinical Development. Springer, Cham. https://doi.org/10.1007/978-3-030-31503-0_1

Download citation

DOI: https://doi.org/10.1007/978-3-030-31503-0_1
Published: 27 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31502-3
Online ISBN: 978-3-030-31503-0
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)

Publish with us

Policies and ethics