Recent progresses in outcome-dependent sampling with failure time data

Lifetime Data Analysis

Abstract

An outcome-dependent sampling (ODS) design is a retrospective sampling scheme in which the primary exposure variables are observed with a probability that depends on the observed value of the outcome variable. When the outcome of interest is a failure time, the observed data are often censored. By allowing the selection of supplemental samples to depend on whether the event of interest has occurred, and by oversampling subjects from the most informative regions, an ODS design for time-to-event data can reduce the cost of a study and improve its efficiency. We review recent progress and advances in research on ODS designs with failure time data, including ODS-related designs such as the case–cohort design, generalized case–cohort design, stratified case–cohort design, general failure-time ODS design, length-biased sampling design and interval sampling design.
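
As an illustration of selection that depends on the event indicator, the following minimal sketch mimics a simple case–cohort selection: a random subcohort is drawn from the full cohort, and every subject who experienced the event is then added. The function name `case_cohort_sample`, the Bernoulli subcohort sampling, and the inverse-probability weights are assumptions made for this example only; they show one common weighting choice, not the specific estimators covered in the review.

```python
import random


def case_cohort_sample(events, subcohort_fraction, seed=0):
    """Return per-subject selection flags and inverse-probability weights.

    events: list of 0/1 event indicators (1 = failure observed, 0 = censored).
    Each subject enters the subcohort independently with probability
    subcohort_fraction (an illustrative Bernoulli scheme); every case
    outside the subcohort is then added to the sample.
    """
    rng = random.Random(seed)
    selected, weights = [], []
    for delta in events:
        in_subcohort = rng.random() < subcohort_fraction
        if delta == 1:
            # Cases are always sampled, so their inclusion probability is 1.
            selected.append(True)
            weights.append(1.0)
        elif in_subcohort:
            # Censored subjects are observed only through the subcohort.
            selected.append(True)
            weights.append(1.0 / subcohort_fraction)
        else:
            # Exposure is never measured for unselected censored subjects.
            selected.append(False)
            weights.append(0.0)
    return selected, weights


if __name__ == "__main__":
    events = [1, 0, 0, 1, 0, 0, 0, 0]
    selected, weights = case_cohort_sample(events, subcohort_fraction=0.5, seed=1)
    for delta, s, w in zip(events, selected, weights):
        print(delta, s, w)
```

In this sketch the expensive exposure would be measured only for subjects flagged as selected. Setting `subcohort_fraction=1.0` selects every subject with weight 1.0, which corresponds to observing the full cohort.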

Acknowledgments

This research is supported in part by the U.S. National Institutes of Health (R01ES021900, P01CA142538 to H.Z. and J.C.) and the National Science Foundation of China (11101314 to J.D.).

Author information

Corresponding author

Correspondence to Haibo Zhou.

Cite this article

Ding, J., Lu, TS., Cai, J. et al. Recent progresses in outcome-dependent sampling with failure time data. Lifetime Data Anal 23, 57–82 (2017). https://doi.org/10.1007/s10985-015-9355-7
