Validation of Prognostic Marker Tests: Statistical Lessons Learned From Regulatory Experience

  • Statistics: Review
Therapeutic Innovation & Regulatory Science

Abstract

Despite concerted efforts to discover and validate prognostic biomarkers or signatures, few medical tests indicated for prognostic uses have been widely accepted by the clinical community. Even fewer, perhaps, are covered by public or private health plans. We were able to identify 6 prognostic marker tests that have been approved or cleared by the US Food and Drug Administration. The pivotal clinical studies for these prognostic marker tests exhibited a wide variety of designs and statistical analyses. From these experiences, we develop statistical points to consider for design, conduct, and analysis of successful clinical validation studies of prognostic tests. In particular, we review broad themes regarding prospective and retrospective study designs, sample size, clinical performance evaluation, and handling of missing data. Our review emphasizes the distinction between a prognostic biomarker and the medical test used to measure it. For this purpose, a section on test measurement validation is also provided.

Author information

Corresponding author

Correspondence to Rong Tang PhD.

Additional information

Author Note

No official support or endorsement by the US Food and Drug Administration of this paper is intended or should be inferred.

About this article

Cite this article

Tang, R., Pennello, G. Validation of Prognostic Marker Tests: Statistical Lessons Learned From Regulatory Experience. Ther Innov Regul Sci 50, 241–252 (2016). https://doi.org/10.1177/2168479015601721

  • DOI: https://doi.org/10.1177/2168479015601721
