A review of h-likelihood for survival analysis

  • Original Paper
  • Recent Statistical Methods for Survival Analysis
Japanese Journal of Statistics and Data Science

Abstract

Statistical models with unobservable random variables, such as random-effect models, have recently been studied for analyzing complex data types (e.g. longitudinal and time-to-event data) in various areas. The hierarchical likelihood [h-likelihood; Lee and Nelder (Journal of the Royal Statistical Society: Series B, 58, 619–678, 1996)] provides a unified framework for inference in such models. In this paper, we review the h-likelihood framework for survival analysis. We also demonstrate how to analyze survival data using the web-based software Albatross Analytics, which was recently developed based on the h-likelihood procedure. Furthermore, we discuss recent extensions of the h-likelihood.

References

  • Aitkin, M., & Foxall, R. (2003). Statistical modelling of artificial neural networks using the multi-layer perceptron. Statistics and Computing, 13, 227–239.

  • Austin, P. C. (2017). A tutorial on multilevel survival analysis: Methods, models and applications. International Statistical Review, 85, 185–203.

  • Balan, T. A., & Putter, H. (2020). A tutorial on frailty models. Statistical Methods in Medical Research, 29, 3424–3454.

  • Breslow, N. E. (1972). Discussion of Professor Cox’s paper. Journal of the Royal Statistical Society: Series B, 34, 216–217.

  • Breslow, N. E. (1974). Covariance analysis of censored survival data. Biometrics, 30, 89–99.

  • Chee, C.-S., Ha, I.D., Seo, B., & Lee, Y. (2021). Semiparametric estimation for nonparametric frailty models using nonparametric maximum likelihood approach. revision submitted to Statistical Methods in Medical Research.

  • Ching, T., Zhu, X., & Garmire, L. X. (2018). Cox-nnet: an artificial neural network method for prognosis prediction of high-throughput omics data. PLoS Computational Biology, 14(4), e1006076.

  • Christian, N. J., Ha, I. D., & Jeong, J.-H. (2016). Hierarchical likelihood inference on clustered competing risks data. Statistics in Medicine, 35, 251–267.

  • Duchateau, L., & Janssen, P. (2008). The frailty model. Berlin: Springer.

  • Efron, B. (2020). Prediction, estimation, and attribution. Journal of the American Statistical Association, 115(530), 636–655.

  • Elbers, C., & Ridder, G. (1982). True and spurious duration dependence: the identifiability of the proportional hazard model. Review of Economic Studies, 49, 403–409.

  • Elghafghuf, A., & Stryhn, H. (2017). Robust Poisson likelihood estimation for frailty cox models: a simulation study. Communications in Statistics—Simulation and Computation, 46, 2907–2923.

  • Emura, T., Nakatochi, M., Murotani, K., et al. (2017). A joint frailty-copula model between tumour progression and death for meta-analysis. Statistical Methods in Medical Research, 26, 2649–2666.

  • Emura, T., Matsui, S., & Rondeau, V. (2019). Survival analysis with correlated endpoints, joint frailty-copula models. JSS Research series in statistics. Singapore: Springer.

  • Emura, T., Shih, J.-H., Ha, I. D., & Wilke, R. A. (2020). Comparison of the marginal hazard model and the sub-distribution hazard model for competing risks under an assumed copula. Statistical Methods in Medical Research, 29, 2307–2327.

  • Fan, J., Ma, C., & Zhong, Y. (2019). A selective overview of deep learning. [stat.ML], 14 2019.

  • Fine, J. P., & Gray, R. J. (1999). A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association, 94, 548–560.

  • Goethals, K., Janssen, P., & Duchateau, L. (2008). Frailty models and copulas: similarities and differences. Journal of Applied Statistics, 35, 1071–1079.

  • Guo, X., & Carlin, B. P. (2004). Separate and joint modeling of longitudinal and event time data using standard computer packages. American Statistician, 58, 16–24.

  • Ha, I. D., Lee, Y., & Song, J.-K. (2001). Hierarchical likelihood approach for frailty models. Biometrika, 88, 233–243.

  • Ha, I. D., Lee, Y., & Song, J.-K. (2002). Hierarchical likelihood approach for mixed linear models with censored data. Lifetime Data Analysis, 8, 163–176.

  • Ha, I. D., & Lee, Y. (2003). Estimating frailty models via Poisson hierarchical generalized linear models. Journal of Computational and Graphical Statistics, 12, 663–681.

  • Ha, I. D., Park, T., & Lee, Y. (2003). Joint modelling of repeated measures and survival time data. Biometrical Journal, 45, 647–658.

  • Ha, I. D., & Lee, Y. (2005a). Comparison of hierarchical likelihood versus orthodox best linear unbiased predictor approaches for frailty models. Biometrika, 92, 717–723.

  • Ha, I. D., & Lee, Y. (2005b). Multilevel mixed linear models for survival data. Lifetime Data Analysis, 11, 131–142.

  • Ha, I. D., Lee, Y., & MacKenzie, G. (2007a). Model selection for multi-component frailty models. Statistics in Medicine, 26, 4790–4807.

  • Ha, I. D., Lee, Y., & Pawitan, Y. (2007b). Genetic mixed linear models for twin survival data. Behavior Genetics, 37, 621–630.

  • Ha, I. D., Noh, M., & Lee, Y. (2010). Bias reduction of likelihood estimators in semi-parametric frailty models. Scandinavian Journal of Statistics, 37, 307–320.

  • Ha, I. D., Sylvester, R., Legrand, C., & MacKenzie, G. (2011). Frailty modelling for survival data from multi-centre clinical trials. Statistics in Medicine, 30, 28–37.

  • Ha, I. D., Noh, M., & Lee, Y. (2012). frailtyHL: a package for fitting frailty models with h-likelihood. R Journal, 4, 307–320.

  • Ha, I. D., Pan, J., Oh, S., & Lee, Y. (2014a). Variable selection in general frailty models using penalized h-likelihood. Journal of Computational and Graphical Statistics, 23, 1044–1060.

  • Ha, I. D., Lee, M., Oh, S., Jeong, J.-H., Sylvester, R., & Lee, Y. (2014b). Variable selection in subdistribution hazard frailty models with competing risks data. Statistics in Medicine, 33, 4590–4604.

  • Ha, I. D., Christian, N. J., Jeong, J.-H., Park, J., & Lee, Y. (2016a). Analysis of clustered competing risks data using subdistribution hazard models with multivariate frailties. Statistical Methods in Medical Research, 25, 2488–2505.

  • Ha, I. D., Vaida, F., & Lee, Y. (2016b). Interval estimation of random effects in proportional hazards models with frailties. Statistical Methods in Medical Research, 25, 936–953.

  • Ha, I. D., Jeong, J.-H., & Lee, Y. (2017a). Statistical modelling of survival data with random effects: h-likelihood approach. Singapore: Springer.

  • Ha, I. D., Noh, M., & Lee, Y. (2017b). H-likelihood approach for joint modelling of longitudinal outcomes and time-to-event data. Biometrical Journal, 59, 1122–1143.

  • Ha, I. D., Noh, M., Kim, J., & Lee, Y. (2018). frailtyHL: frailty models using h-likelihood. R package version 2.1. http://CRAN.R-project.org/package=frailtyHL

  • Ha, I. D., Kim, J., & Emura, T. (2019). Profile likelihood approaches for semiparametric copula and frailty models for clustered survival data. Journal of Applied Statistics, 46, 2553–2571.

  • Ha, I. D., Lee, Y., Xiang, L., Peng, M., & Jeong, J.-H. (2020). Frailty modelling approaches for semi-parametric risks data. Lifetime Data Analysis, 26, 109–133.

  • Hao, L., Kim, J., Kwon, S., & Ha, I.D. (2021). Deep learning-based survival analysis for high-dimensional survival data. submitted to Mathematics, in press.

  • Hougaard, P. (2000). Analysis of multivariate survival data. New York: Springer.

  • Huang, X., & Wolfe, R. (2002). A frailty model for informative censoring. Biometrics, 58, 510–520.

  • Huang, R., Xiang, L., & Ha, I. D. (2019). Frailty proportional mean residual life regression for clustered survival data: A hierarchical quasi-likelihood method. Statistics in Medicine, 38, 4854–4870.

  • Jin, S., & Lee, Y. (2020). A review of h-likelihood and hierarchical generalized linear model. WIREs Computational Statistics, in press.

  • Kalbfleisch, J. D., & Prentice, R. L. (1980). The statistical analysis of failure time data. New York: Wiley.

  • Katzman, J. L., Shaham, U., Cloninger, A., Bates, J., Jiang, T., & Kluger, Y. (2018). DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Medical Research Methodology, 18(1), 24.

  • LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521, 436–444.

  • Lee, M., Ha, I. D., & Lee, Y. (2017). Frailty modeling for clustered competing risks data with missing cause of failure. Statistical Methods in Medical Research, 26, 356–373.

  • Lee, Y., & Nelder, J. A. (1996). Hierarchical generalized linear models (with discussion). Journal of the Royal Statistical Society: Series B, 58, 619–678.

  • Lee, Y., & Nelder, J. A. (2001). Hierarchical generalised linear models: a synthesis of generalised linear models, random-effect models and structured dispersions. Biometrika, 88, 987–1006.

  • Lee, Y., Nelder, J. A., & Pawitan, Y. (2017). Generalised linear models with random effects: unified analysis via h-likelihood (2nd ed.). Boca Raton: Chapman and Hall.

  • Lee, Y., & Noh, M. (2018). dhglm: Double Hierarchical Generalized Linear Models. R package version 2.0. Retrieved from https://CRAN.R-project.org/package=dhglm.

  • Lee, Y., & Kim, G. (2016). H-likelihood predictive intervals for unobservables. International Statistical Review, 84, 487–505.

  • Lee, Y., & Kim, G. (2020). Properties of h-likelihood estimators in clustered data. International Statistical Review, 88, 380–395.

  • Liu, L., Wolfe, R. A., & Huang, X. (2004). Shared frailty models for recurrent events and a terminal event. Biometrics, 60, 747–756.

  • Paik, M. C., Lee, Y., & Ha, I. D. (2015). Frequentist inference on random effects based on summarizability. Statistica Sinica, 25, 1107–1132.

  • Park, E., & Ha, I. D. (2019). Penalized variable selection for accelerated failure time models with random effects. Statistics in Medicine, 38, 878–892.

  • Rakhmawati, T.W., Ha, I.D., Lee, H., & Lee, Y. (2021). Penalized variable selection for cause-specific frailty models with clustered competing-risks data. revision submitted to Statistics in Medicine.

  • Ripatti, S., & Palmgren, J. (2000). Estimation of multivariate frailty models using penalized partial likelihood. Biometrics, 56, 1016–1022.

  • Rizopoulos, D. (2012). Joint models for longitudinal and time-to-event data, with applications in R. Boca Raton: Chapman and Hall.

  • Rondeau, V., Filleul, L., & Joly, P. (2006). Nested frailty models using maximum penalized likelihood estimation. Statistics in Medicine, 25, 4036–4052.

  • Rondeau, V., Schaffner, E., Corbiere, F., Gonzalez, J. R., & Mathoulin-Pelissier, S. (2013). Cure frailty models for survival data: application to recurrences for breast cancer and to hospital readmissions for colorectal cancer. Statistical Methods in Medical Research, 22, 243–260.

  • Sylvester, R. J., van der Meijden, A. P., Oosterlinck, W., et al. (2006). Predicting recurrence and progression in individual patients with stage Ta T1 bladder cancer using EORTC risk tables: a combined analysis of 2596 patients from seven EORTC trials. European Urology, 49, 466–477.

  • Tawiah, R., McLachlan, G. J., & Ng, S. K. (2020a). Mixture cure models with time-varying and multilevel frailties for recurrent event data. Statistical Methods in Medical Research, 29, 1368–1385.

  • Tawiah, R., McLachlan, G. J., & Ng, S. K. (2020b). A bivariate joint frailty model with mixture framework for survival analysis of recurrent events with dependent censoring and cure fraction. Biometrics, 76, 753–766.

  • Therneau, T. M., & Grambsch, P. M. (2000). Modeling survival data: extending the Cox model. New York: Springer.

  • Vaida, F., & Blanchard, S. (2005). Conditional Akaike information for mixed-effects models. Biometrika, 92, 351–370.

  • Zheng, M., & Klein, J. P. (1995). Estimates of marginal survival for dependent competing risks based on an assumed copula. Biometrika, 82, 127–138.

  • Zhou, B., & Latouche, A. (2015). crrSC: Competing risks regression for stratified and clustered data. R package version 1.1.

Acknowledgements

The research of Dr. Il Do Ha was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. NRF-2020R1F1A1A01056987). The research of Dr. Youngjo Lee was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1A2C1002408).

Author information

Corresponding author

Correspondence to Il Do Ha.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Appendix A: Derivation of h-likelihood (1)

Under the assumptions of conditional independence and non-informative censoring, the h-(log-)likelihood based on the (ij)th observation is defined as the logarithm of the joint density function of \((Y_{ij}, \Delta _{ij}, V_{i})\):

$$\begin{aligned} h_{ij}= & {} \log f(y_{ij}, \delta _{ij}, v_{i})\\= & {} \log f(y_{ij}, \delta _{ij}|u_{i}) + \log g(v_{i}), \end{aligned}$$

where the first term of \(h_{ij}\) can be expressed as

$$\begin{aligned} \log f(y_{ij}, \delta _{ij}|u_{i})= & {} \delta _{ij} \log f(y_{ij}|u_{i}) + (1-\delta _{ij}) \log S(y_{ij}|u_{i})\\= & {} \delta _{ij} \log \lambda (y_{ij}|u_{i}) - \Lambda (y_{ij}|u_{i}). \end{aligned}$$

Here \(S(\cdot |u_{i})=\exp \{- \Lambda (\cdot |u_{i})\}\), \(\Lambda (\cdot |u_{i})\) and \(\lambda (\cdot |u_{i})\) are the conditional survival function, cumulative hazard function and hazard function of \(T_{ij}|u_{i}\), respectively, and the second equality follows from \(f(\cdot |u_{i})=\lambda (\cdot |u_{i})S(\cdot |u_{i})\). Thus, the h-likelihood (1) of all observations becomes

$$\begin{aligned} h=\sum _{ij} h_{ij}= \sum _{ij}\ell _{1ij}+\sum _{i} \ell _{2i}, \end{aligned}$$

where \(\ell _{1ij}= \log f(y_{ij}, \delta _{ij}|u_{i})\) and \(\ell _{2i}= \log g(v_{i})\). Note that if \(v_{i}=\nu (u_{i})\) is a strictly monotone function of \(u_{i}\), then

$$\begin{aligned} \log g(v_{i})= \log g(u_{i}) + \log \biggl | \frac{du_{i}}{d v_{i}} \biggr |. \end{aligned}$$

That is, a Jacobian term enters the h-likelihood when the random effect is transformed from the u-scale to the v-scale.
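As an illustration of the derivation above, the following minimal Python sketch evaluates \(h\) for a toy clustered data set and checks the Jacobian identity numerically. It assumes, purely for illustration, a constant baseline hazard `lam0`, a single covariate, and a log-normal frailty, so that \(v_{i}=\log u_{i}\) is normal with mean 0 and variance `var`. These modelling choices, the toy data and all function names are hypothetical and are not taken from the paper.

```python
import math

def log_normal_pdf(v, var):
    """Log-density of N(0, var) at v, i.e. log g(v_i) on the v-scale."""
    return -0.5 * math.log(2 * math.pi * var) - v * v / (2 * var)

def log_lognormal_pdf(u, var):
    """Log-density of U = exp(V) with V ~ N(0, var), i.e. log g(u_i) on the u-scale."""
    return log_normal_pdf(math.log(u), var) - math.log(u)

def h_likelihood(y, delta, x, cluster, beta, lam0, v, var):
    """h = sum_ij {delta_ij * log(hazard) - cumulative hazard} + sum_i log g(v_i).

    Assumed toy model: hazard(t | u_i) = lam0 * exp(x * beta) * u_i, u_i = exp(v_i).
    """
    h = 0.0
    for y_ij, d_ij, x_ij, i in zip(y, delta, x, cluster):
        log_hazard = math.log(lam0) + x_ij * beta + v[i]
        cum_hazard = y_ij * math.exp(log_hazard)  # constant hazard times follow-up time
        h += d_ij * log_hazard - cum_hazard
    h += sum(log_normal_pdf(v_i, var) for v_i in v)
    return h

# Toy data: two clusters with two observations each (hypothetical values).
y = [1.2, 0.7, 2.5, 3.1]
delta = [1, 0, 1, 1]
x = [0.0, 1.0, 0.0, 1.0]
cluster = [0, 0, 1, 1]
v = [0.3, -0.2]
h_value = h_likelihood(y, delta, x, cluster, beta=0.5, lam0=0.4, v=v, var=1.0)

# Jacobian check with u = exp(v), so that du/dv = u.
v0 = 0.3
u0 = math.exp(v0)
lhs = log_normal_pdf(v0, 1.0)                     # log g(v)
rhs = log_lognormal_pdf(u0, 1.0) + math.log(u0)   # log g(u) + log|du/dv|
```

Here `lhs` and `rhs` agree up to floating-point error, which is the identity displayed above for the particular choice \(v_{i}=\log u_{i}\).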

Appendix B: Derivation of the PHL under the joint model

Let \(C_{i}\) denote the independent censoring time. We assume that, given \(v_{i}\), \(C_{i}\) is independent of \((T_{ik},\Delta _{ik})\) for \(k=1,2\). Then the observed event time and event indicator are, respectively, given by

$$\begin{aligned} T_{i}^{*}=\mathrm {min}(T_{i1},T_{i2},C_{i})~\mathrm {and}~\Delta _{ik}=I(T_{i}^{*}=T_{ik}). \end{aligned}$$
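A minimal Python sketch of this construction follows. The function name `observe` and the numerical inputs are hypothetical, and the sketch assumes continuous times so that ties between the two latent event times and the censoring time do not occur.

```python
def observe(t1, t2, c):
    """Return (t_star, (delta_1, delta_2)) from latent times t1, t2 and censoring time c."""
    t_star = min(t1, t2, c)
    deltas = (int(t_star == t1), int(t_star == t2))
    return t_star, deltas

type1_first = observe(2.0, 5.0, 4.0)   # Type 1 event observed
censored = observe(6.0, 5.0, 4.0)      # censoring occurs first, both indicators are 0
```

At most one of the two indicators equals 1 for each subject, and both are 0 when the subject is censored.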

Thus, all observable random variables are \((Y_{ij},T_{i}^{*},\Delta _{ik})\) with their observed values \((y_{ij},t_{i}^{*},\delta _{ik})\) \((i=1,\ldots ,q;j=1,\ldots ,n_{i};k=1,2)\). Here the h-likelihood for the general joint models with competing-risk data becomes

$$\begin{aligned} h=\sum _{ij}\ell _{1ij}+\sum _{ik}\ell _{2ik}+\sum _{i}\ell _{3i}, \end{aligned}$$

where \(\ell _{1ij}\) is the conditional normal log-likelihood for \(y_{ij}\) given \(v_{i1}\), and for \(k=1,2\)

$$\begin{aligned} \ell _{2ik}=\ell _{2ik}(\beta _{k+1},\lambda _{0k};(t_{i}^{*},\delta _{ik})|v_{i})=\delta _{ik}\{\log \lambda _{0k}(t_{i}^{*})+\eta _{2ik}\}-\Lambda _{0k}(t_{i}^{*})\exp (\eta _{2ik}), \end{aligned}$$

where \(\eta _{2ik}=x_{i2}^{T}\beta _{k+1}+ v_{i,k+1}\), and \(\ell _{3i}\) is the log-likelihood for \(v_{i}=(v_{i1}, v_{i2}, v_{i3})^T\) with trivariate normal distribution, given by

$$\begin{aligned} \ell _{3i}(\sigma ;v_{i})=-\frac{1}{2}\log |2\pi \Sigma _{3} |-\frac{1}{2} v_{i}^{T}\Sigma _{3}^{-1}v_{i}. \end{aligned}$$
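The log-density \(\ell _{3i}\) can be evaluated without forming \(\Sigma _{3}^{-1}\) explicitly. The sketch below does this in plain Python through a Cholesky factorisation, which is a standard numerical device assumed here for illustration and not something the paper prescribes. The function names and test values are hypothetical.

```python
import math

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def forward_solve(low, b):
    """Solve low z = b for z by forward substitution."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return z

def ell3(v, sigma):
    """-0.5 * log|2 pi Sigma| - 0.5 * v^T Sigma^{-1} v for a mean-zero normal vector v."""
    n = len(v)
    low = cholesky(sigma)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    z = forward_solve(low, v)          # z^T z equals v^T Sigma^{-1} v
    quad = sum(z_i * z_i for z_i in z)
    return -0.5 * (n * math.log(2 * math.pi) + log_det) - 0.5 * quad

# With a diagonal Sigma, ell3 reduces to a sum of univariate normal log-densities.
v_i = [0.5, -1.0, 0.2]
variances = [1.0, 4.0, 0.25]
sigma3 = [[variances[r] if r == c else 0.0 for c in range(3)] for r in range(3)]
value = ell3(v_i, sigma3)
univariate_sum = sum(
    -0.5 * math.log(2 * math.pi * s2) - x * x / (2 * s2)
    for x, s2 in zip(v_i, variances)
)
```

The diagonal case is used only as a sanity check; the same function accepts a correlated \(\Sigma _{3}\) as long as it is positive definite.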

Following (5) and (6), it can be easily shown that the corresponding PHL \(h_{p}\) is given by

$$\begin{aligned} h_{p}=\sum _{ij}\ell _{1ij}+\sum _{ik}\delta _{ik}\eta _{2ik}-\sum _{kr}d_{(kr)}\log \biggl \{\sum _{i\in R_{(kr)}}\exp (\eta _{2ik}) \biggr \}+\sum _{i}\ell _{3i}, \end{aligned}$$

where \(t_{(kr)}\) is the rth (\(r=1,\ldots ,D_{k}\)) smallest distinct event time for Type k events among the \(t_{i}^{*}\)s, \(d_{(kr)}\) is the number of Type k events at time \(t_{(kr)}\), and \( R_{(kr)}=\{i:t_{i}^{*}\ge t_{(kr)}\}\) is the risk set at \(t_{(kr)}\).
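The survival part of \(h_{p}\) can be checked numerically for a single event type. Equations (5) and (6) are not reproduced in this appendix, so the sketch below assumes that the baseline hazard is profiled out with a Breslow-type step estimator, \(\hat{\lambda }_{0k}(t_{(kr)})=d_{(kr)}/\sum _{i\in R_{(kr)}}\exp (\eta _{2ik})\). Under that assumption, \(\sum _{i}\ell _{2ik}\) evaluated at the estimator equals the profiled term plus the constant \(\sum _{r}d_{(kr)}\{\log d_{(kr)}-1\}\), which does not depend on the parameters. The toy data and function names are hypothetical.

```python
import math

def event_summary(times, deltas, etas):
    """For each distinct event time t_(r): (t_(r), d_(r), sum of exp(eta_i) over the risk set)."""
    summary = []
    for t_r in sorted({t for t, d in zip(times, deltas) if d == 1}):
        d_r = sum(1 for t, d in zip(times, deltas) if d == 1 and t == t_r)
        risk_sum = sum(math.exp(e) for t, e in zip(times, etas) if t >= t_r)
        summary.append((t_r, d_r, risk_sum))
    return summary

def profiled_term(times, deltas, etas):
    """sum_i delta_i * eta_i - sum_r d_(r) * log(sum over risk set of exp(eta_i))."""
    value = sum(d * e for d, e in zip(deltas, etas))
    for _, d_r, risk_sum in event_summary(times, deltas, etas):
        value -= d_r * math.log(risk_sum)
    return value

def full_term_at_breslow(times, deltas, etas):
    """sum_i [delta_i * {log lambda0(t_i) + eta_i} - Lambda0(t_i) * exp(eta_i)] at the step estimator."""
    jump = {t_r: d_r / risk_sum for t_r, d_r, risk_sum in event_summary(times, deltas, etas)}
    value = 0.0
    for t, d, e in zip(times, deltas, etas):
        cum_baseline = sum(j for t_r, j in jump.items() if t_r <= t)
        if d == 1:
            value += math.log(jump[t]) + e
        value -= cum_baseline * math.exp(e)
    return value

# Toy data for one event type: observed times, type-specific indicators, linear predictors.
times = [2.0, 3.0, 3.0, 5.0, 7.0]
deltas = [1, 1, 1, 0, 1]
etas = [0.1, -0.3, 0.5, 0.0, 0.2]

constant = sum(d_r * (math.log(d_r) - 1.0) for _, d_r, _ in event_summary(times, deltas, etas))
gap = full_term_at_breslow(times, deltas, etas) - profiled_term(times, deltas, etas) - constant
```

The quantity `gap` is zero up to floating-point error, including at the tied event time, which is why the constant can be dropped when writing \(h_{p}\).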

Cite this article

Ha, I.D., Lee, Y. A review of h-likelihood for survival analysis. Jpn J Stat Data Sci 4, 1157–1178 (2021). https://doi.org/10.1007/s42081-021-00125-z
