Abstract
A case-cohort design is a cost-effective biased-sampling scheme for studies of survival data. We study the regression analysis of credit risk by fitting the proportional hazards model to data collected via the case-cohort design. Using the minorization-maximization principle, we develop a new quadratic upper-bound algorithm for computing the estimators and establish its convergence. The proposed algorithm inverts the derived upper-bound matrix only once over the whole procedure, and the upper-bound matrix does not depend on the parameters. These features give the algorithm a simple update and a low per-iteration cost, especially in large-dimensional problems. Rcpp is an R package that enables users to write R extensions in C++. We implement the proposed algorithm via Rcpp, which improves the execution efficiency of the R program and enables fast computation. We conduct simulation studies to illustrate the performance of the proposed algorithm and analyze a real mortgage dataset to evaluate credit risk.
References
Baesens, B., Roesch, D., Scheule, H.: Credit risk analytics: measurement techniques, applications, and examples in SAS. Wiley, New York (2016)
Banasik, J., Crook, J.N., Thomas, L.C.: Not if but when will borrowers default. J. Op. Res. Soc. 50(12), 1185–1190 (1999)
Becker, M.P., Yang, I., Lange, K.: EM algorithms without missing data. Stat. Methods Med. Res. 6(1), 38–54 (1997)
Bellotti, T., Crook, J.: Credit scoring with macroeconomic variables using survival analysis. J. Op. Res. Soc. 60(12), 1699–1707 (2009)
Bellotti, T., Crook, J.: Retail credit stress testing using a discrete hazard model with macroeconomic factors. J. Op. Res. Soc. 65(3), 340–350 (2014)
Breslow, N.E., Wellner, J.A.: Weighted likelihood for semiparametric models and two-phase stratified samples with application to Cox regression. Scand. J. Stat. 34(1), 86–102 (2007)
Böhning, D., Lindsay, B.: Monotonicity of quadratic-approximation algorithms. Ann. Inst. Stat. Math. 40(02), 641–663 (1988)
Cai, J., Zeng, D.: Sample size/power calculation for case-cohort studies. Biometrics 60(4), 1015–1024 (2004)
Cai, J., Zeng, D.: Power calculation for case-cohort studies with nonrare events. Biometrics 63(4), 1288–1295 (2007)
Chen, K., Lo, S.-H.: Case-cohort and case-control analysis with Cox’s model. Biometrika 86(4), 755–764 (1999)
Cox, D.R.: Regression models and life-tables. J. Roy. Stat. Soc. Ser. B (Methodol.) 34(2), 187–202 (1972)
De Pierro, A.R.: A modified expectation maximization algorithm for penalized likelihood estimation in emission tomography. IEEE Trans. Med. Imaging 14(1), 132–137 (1995)
Ding, J., Zhou, H., Liu, Y., Cai, J., Longnecker, M.P.: Estimating effect of environmental contaminants on women’s subfecundity for the MoBa study data with an outcome-dependent sampling scheme. Biostatistics 15(4), 636–650 (2014)
Ding, J., Tian, G.-L., Yuen, K.C.: A new MM algorithm for constrained estimation in the proportional hazards model. Comput. Stat. Data Anal. 84, 135–151 (2015)
Dirick, L., Claeskens, G., Baesens, B.: An Akaike information criterion for multiple event mixture cure models. Eur. J. Oper. Res. 241(2), 449–457 (2015)
Dirick, L., Claeskens, G., Baesens, B.: Time to default in credit scoring using survival analysis: a benchmark study. J. Op. Res. Soc. 68(6), 652–665 (2017)
Eddelbuettel, D., Francois, R.: Rcpp: seamless R and C++ integration. J. Stat. Softw. 40(8), 1–18 (2011)
Eddelbuettel, D., Sanderson, C.: RcppArmadillo: accelerating R with high-performance C++ linear algebra. Comput. Stat. Data Anal. 71, 1054–1063 (2014)
Efron, B., Tibshirani, R.J.: An introduction to the bootstrap. Chapman and Hall/CRC (1994)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001)
Fan, J., Li, R.: Variable selection for Cox’s proportional hazards model and frailty model. Ann. Stat. 30(1), 74–99 (2002)
Fan, J., Lv, J.: Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B (Statistical Methodology) 70(5), 849–911 (2008)
Huang, J., Horowitz, J.L., Wei, F.: Variable selection in nonparametric additive models. Ann. Stat. 38(4), 2282–2313 (2010)
Hunter, D.R., Lange, K.: Computing estimates in the proportional odds model. Ann. Inst. Stat. Math. 54(1), 155–168 (2002)
Hunter, D.R., Lange, K.: A tutorial on MM algorithms. Am. Stat. 58(1), 30–37 (2004)
Im, J.-K., Apley, D.W., Qi, C., Shan, X.: A time-dependent proportional hazards survival model for credit risk analysis. J. Op. Res. Soc. 63(3), 306–321 (2012)
Kang, S., Cai, J., Chambless, L.: Marginal additive hazards model for case-cohort studies with multiple disease outcomes: an application to the Atherosclerosis Risk in Communities (ARIC) study. Biostatistics 14(1), 28–41 (2013)
Kang, S., Lu, W., Liu, M.: Efficient estimation for accelerated failure time model under case-cohort and nested case-control sampling. Biometrics 73(1), 114–123 (2017)
Kong, L., Cai, J., Sen, P.K.: Weighted estimating equations for semiparametric transformation models with censored data from a case-cohort design. Biometrika 91(2), 305–319 (2004)
Kulich, M., Lin, D.Y.: Additive hazards regression for case-cohort studies. Biometrika 87(1), 73–87 (2000)
Kulich, M., Lin, D.Y.: Improving the efficiency of relative-risk estimation in case-cohort studies. J. Am. Stat. Assoc. 99(467), 832–844 (2004)
Lange, K.: Optimization, vol. 1. Springer, New York, NY (2004)
Lange, K.: Numerical analysis for statisticians. Springer, Berlin (2010)
Lange, K., Hunter, D.R., Yang, I.: Optimization transfer using surrogate objective functions. J. Comput. Graph. Stat. 9(1), 1–20 (2000)
Liu, D., Cai, T., Lok, A., Zheng, Y.: Nonparametric maximum likelihood estimators of time-dependent accuracy measures for survival outcome under two-stage sampling designs. J. Am. Stat. Assoc. 113(522), 882–892 (2018)
Lu, W., Tsiatis, A.A.: Semiparametric transformation models for the case-cohort study. Biometrika 93(1), 207–214 (2006)
Narain, B.: Survival analysis and the credit granting decision. In: Thomas, L.C., Crook, J.N., Edelman, D.B. (eds.) Credit scoring and credit control (1992)
Prentice, R.L.: A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73(1), 1–11 (1986)
Scheike, T.H., Martinussen, T.: Maximum likelihood estimation for Cox’s regression model under case-cohort sampling. Scand. J. Stat. 31(2), 283–293 (2004)
Self, S.G., Prentice, R.L.: Asymptotic distribution theory and efficiency results for case-cohort studies. Ann. Stat. 16(1), 64–81 (1988)
Steingrimsson, J.A., Strawderman, R.L.: Estimation in the semiparametric accelerated failure time model with missing covariates: improving efficiency through augmentation. J. Am. Stat. Assoc. 112(519), 1221–1235 (2017)
Stepanova, M., Thomas, L.: Survival analysis methods for personal loan data. Oper. Res. 50(2), 277–289 (2002)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodological) 58(1), 267–288 (1996)
Tibshirani, R.: The lasso method for variable selection in the Cox model. Stat. Med. 16(4), 385–395 (1997)
Tong, E.N.C., Mues, C., Thomas, L.C.: Mixture cure models in credit scoring: if and when borrowers default. Eur. J. Oper. Res. 218(1), 132–139 (2012)
Yu, J., Liu, Y., Cai, J., Sandler, D.P., Zhou, H.: Outcome-dependent sampling design and inference for Cox’s proportional hazards model. J. Stat. Plan. Inference 178, 24–36 (2016)
Zeng, D., Lin, D.Y.: Efficient estimation of semiparametric transformation models for two-phase cohort studies. J. Am. Stat. Assoc. 109(505), 371–383 (2014)
Zhang, C.-H., Huang, J.: The sparsity and bias of the lasso selection in high-dimensional linear regression. Ann. Stat. 36(4), 1567–1594 (2008)
Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101(476), 1418–1429 (2006)
Acknowledgements
This research is supported in part by the National Natural Science Foundation of China (11671310 to J.D.).
Appendix A: proofs of the theorems
The asymptotic properties of \(\widehat{{\varvec{\beta }}}_P\) were established by Self and Prentice (1988) for statistical inference. Before presenting these results, we introduce some notation. Let \({\varvec{\beta }}_0\) denote the true value of \({\varvec{\beta }}\) and let \(\tau \) denote the time at which the study ends or is discontinued. Let \(C = \{i: \xi _i = 1 \text { or } \Delta _i = 1\}\) and \(\widetilde{C} = \{i:\xi _i = 1\}\). For \(d = 0, 1, 2,\)
where \(a^{\otimes 0} = 1\), \(a^{\otimes 1} = a\), and \(a^{\otimes 2} = aa^{\top }\) for a vector \(a\). Define
Conditions (C1)-(C7) ensure the asymptotic convergence of \(\widehat{{\varvec{\beta }}}_P\):
- (C1) \(\widetilde{n} / N \rightarrow \alpha \) for some \(\alpha \in (0, 1)\).
- (C2) \(\int _0^{\tau } \lambda _0(t)dt < \infty \).
- (C3) There exists \(\delta >0\) such that \(n^{-\frac{1}{2}} \sup _{i, t \in [0, \tau ]} Y_{i}(t)\left| Z_{i}\right| I\left\{ Z_{i}^{\prime } {\varvec{\beta }}_{0}>-\delta \left| Z_{i}\right| \right\} {\mathop {\rightarrow }\limits ^{P}} 0\).
- (C4) There exist \(s^{(d)}({\varvec{\beta }}, t)\) defined on \(\mathbb {B} \times [0, \tau ]\) satisfying:
  - (1) \(\sup _{{\varvec{\beta }} \in \mathbb {B}, t \in [0, \tau ]}\left\| S^{(d)}({\varvec{\beta }}, t)-s^{(d)}({\varvec{\beta }}, t)\right\| {\mathop {\longrightarrow }\limits ^{P}} 0;\)
  - (2) \(s^{(d)}({\varvec{\beta }}, t)\) are continuous functions of \({\varvec{\beta }}\) uniformly in \(t \in [0, \tau ]\); \(s^{(d)}({\varvec{\beta }}, t)\) are bounded on \(\mathbb {B} \times [0, \tau ]\); \(s^{(0)}({\varvec{\beta }}, t)\) is bounded away from 0; and \(s^{(1)}({\varvec{\beta }}, t)=\nabla _{{\varvec{\beta }}} s^{(0)}({\varvec{\beta }}, t), s^{(2)}({\varvec{\beta }}, t)=\nabla _{{\varvec{\beta }}}^{2} s^{(0)}({\varvec{\beta }}, t)\) for \({\varvec{\beta }} \in \mathbb {B}, t \in [0, \tau ]\);
  - (3) The matrix
$$\begin{aligned} \Sigma =\int _{0}^{\tau } v\left( {\varvec{\beta }}_{0}, t\right) s^{(0)}\left( {\varvec{\beta }}_{0}, t\right) \lambda _{0}(t) \textrm{d} t \end{aligned} \tag{A.3}$$
is positive definite, where \(v({\varvec{\beta }}, t)=s^{(2)}({\varvec{\beta }}, t) / s^{(0)}({\varvec{\beta }}, t) -\left( s^{(1)}({\varvec{\beta }}, t) / s^{(0)}({\varvec{\beta }}, t)\right) ^{\otimes 2}\).
- (C5) The sequence of distributions of \(n^{\frac{1}{2}}(\widetilde{E}({\varvec{\beta }}_{0}, t)-E ({\varvec{\beta }}_{0}, t))\) is tight on the space of càdlàg functions equipped with the product Skorohod topology.
- (C6) There exist \(q^{(d)}({\varvec{\beta }}, t, w)\) defined on \(\mathbb {B} \times [0, \tau ]^{2}\) satisfying:
  - (1) \(\sup _{{\varvec{\beta }} \in \mathbb {B},(t, w) \in [0,\tau ]^{2}}\Vert Q^{(d)}({\varvec{\beta }}, t, w)-q^{(d)}({\varvec{\beta }}, t, w)\Vert {\mathop {\longrightarrow }\limits ^{P}} 0;\)
  - (2) \(q^{(d)}({\varvec{\beta }}, t, w), d = 0,1,2\), are continuous functions of \({\varvec{\beta }}\) uniformly in \((t, w) \in [0, \tau ]^{2}\), and \(q^{(d)}({\varvec{\beta }}, t, w)\) are bounded on \(\mathbb {B} \times [0, \tau ]^{2}\);
  - (3) \(\sup _{n \ge 1} E\left( Q^{(d)}({\varvec{\beta }}, t, w)\right) \) is a bounded sequence.
- (C7) For \(d=0,1,2\),
$$\begin{aligned} \sup _{{\varvec{\beta }} \in \mathbb {B}, t \in [0, \tau ]}\left\| \tilde{S}^{(d)}({\varvec{\beta }}, t)-s^{(d)}({\varvec{\beta }}, t)\right\| {\mathop {\longrightarrow }\limits ^{P}} 0, \end{aligned} \tag{A.4}$$
and
$$\begin{aligned} \sup _{{\varvec{\beta }} \in \mathbb {B},(t, w) \in [0, \tau ]^{2}}\left\| \tilde{Q}^{(d)}({\varvec{\beta }}, t, w)-q^{(d)}({\varvec{\beta }}, t, w)\right\| {\mathop {\longrightarrow }\limits ^{P}} 0. \end{aligned} \tag{A.5}$$
Lemma A.1
Under the regularity conditions (C1)-(C7), as \(n\rightarrow \infty \), \(\widehat{{\varvec{\beta }}}_P\) converges to \({\varvec{\beta }}_0\) in probability and
where
and
Proof of Theorem 3.1:
The observed information matrix is
Define
Then \(H_P({\varvec{\beta }})\) can be written as
For any \(x \ne 0\), we have
where \(W\) is a discrete random variable with support \(\{W_j|W_j = x^\top {\varvec{Z}}_j, j \in \widetilde{R}(T_i)\}\) and the expectation \(\textbf{E}_i[\cdot ]\) is taken with respect to the probabilities \(\{p_k^i,\; k \in \widetilde{R}(T_i)\}\).
Let \(w_{\textrm{min}}\) and \(w_{\textrm{max}}\) denote the minimum and maximum values of \(W_{j}\). The variance of \(W\) is maximized when \(w_{\textrm{min}}\) and \(w_{\textrm{max}}\) are each taken with probability 1/2, so (A.9) is bounded by:
Hence, \(M_i \le \sum _{j \in \widetilde{R}_{i}} {\varvec{Z}}_j{\varvec{Z}}_j^\top \), and we can deduce that:
Let \(p\) be an \(n\)-dimensional vector and let \(D(p)\) be the diagonal matrix whose \(k\)-th diagonal element is the \(k\)-th element of \(p\). By (A.8), this quadratic form can be rewritten as:
where \({\varvec{Z}} = [{\varvec{Z}}_1, \ldots , {\varvec{Z}}_{n_i}]^{\top }\), \(p^i = [p_1^i, \ldots , p_{n_i}^i]^{\top }\), and \(n_i\) is the number of elements of \(\widetilde{R}(T_i)\). Let \(w = [W_1,\ldots , W_{n_i}]^{\top } = {\varvec{Z}} x\) and \(G(p) = D(p) - pp^{\top }\). Then (A.11) can be rewritten as
We seek an upper bound of \(M_i\) that is independent of \({\varvec{\beta }}\). In (A.12), \({\varvec{\beta }}\) enters only through \(G(p^i)\), so it suffices to bound \(G(p^i)\). Consider the following ratio:
where \(p^{*} = [1/n_i, \ldots , 1/n_i]^{\top }\). If this ratio is bounded by a constant independent of \({\varvec{\beta }}\), we obtain an upper bound. Since in the proof of Theorem 2 we have found that
we need only find a lower bound of \(w^{\top } G(p^{*}) w \).
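The two facts used up to this point can be checked numerically: the quadratic form \(w^{\top } G(p) w\) equals the variance of \(W\) under the probabilities \(p\), and that variance never exceeds \((w_{\max } - w_{\min })^2/4\). The following is a minimal sketch in plain Python; the function names and the random values standing in for \(W_j = x^\top {\varvec{Z}}_j\) are assumptions made for illustration and are not taken from the paper's Rcpp code.

```python
import random

def g_matrix(p):
    """Build G(p) = D(p) - p p^T for a probability vector p."""
    n = len(p)
    return [[(p[r] if r == c else 0.0) - p[r] * p[c] for c in range(n)]
            for r in range(n)]

def quadratic_form(w, g):
    """Compute w^T G w for a vector w and a square matrix g."""
    n = len(w)
    return sum(w[r] * g[r][c] * w[c] for r in range(n) for c in range(n))

def weighted_variance(w, p):
    """Variance of a variable taking value w[k] with probability p[k]."""
    mean = sum(pk * wk for pk, wk in zip(p, w))
    return sum(pk * (wk - mean) ** 2 for pk, wk in zip(p, w))

random.seed(1)
w = [random.uniform(-3.0, 3.0) for _ in range(6)]  # hypothetical W_j values
range_bound = (max(w) - min(w)) ** 2 / 4.0          # (w_max - w_min)^2 / 4

max_identity_error = 0.0
max_variance = 0.0
for _ in range(500):
    raw = [random.random() for _ in w]
    p = [r / sum(raw) for r in raw]                 # arbitrary probabilities
    qf = quadratic_form(w, g_matrix(p))
    var = weighted_variance(w, p)
    max_identity_error = max(max_identity_error, abs(qf - var))
    max_variance = max(max_variance, var)

# Putting mass 1/2 on each extreme value attains the bound.
p_extreme = [0.5 if wk in (min(w), max(w)) else 0.0 for wk in w]
extreme_variance = quadratic_form(w, g_matrix(p_extreme))
```

In this sketch `max_identity_error` stays at rounding level, `max_variance` stays below `range_bound`, and `extreme_variance` matches `range_bound`, which is the sense in which the two-point distribution is the worst case.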
Rewriting \(w^{\top } G(p^{*}) w \) as:
This is the variance of a random variable that takes each value \(W_j\), \(j = 1, 2, \ldots , n_i\), with probability \(1/n_i\), so our goal is equivalent to finding a lower bound of this variance. Keep the maximum and the minimum of \(W\) fixed and treat all other components of the vector \(w\) as free variables of the variance function. The variance is then minimized when all remaining components take the value \((w_{\min } + w_{\max }) / 2\).
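The calculation described in this paragraph can be sketched as follows. The symbols \(R = w_{\max } - w_{\min }\) and \(\bar{W} = n_i^{-1}\sum _k W_k\) are introduced here only for brevity, the sketch assumes \(n_i \ge 2\) and \(R > 0\), and the paper's own displays may be arranged differently.

```latex
\begin{aligned}
w^{\top} G(p^{*}) w
  &= \frac{1}{n_i}\sum_{k=1}^{n_i}\left(W_k-\bar{W}\right)^2
   \;\ge\; \frac{1}{n_i}\left[\left(w_{\min}-\frac{w_{\min}+w_{\max}}{2}\right)^2
        +\left(w_{\max}-\frac{w_{\min}+w_{\max}}{2}\right)^2\right]
   = \frac{R^2}{2n_i},\\
\frac{w^{\top} G(p^{i}) w}{w^{\top} G(p^{*}) w}
  &\le \frac{R^2/4}{R^2/(2n_i)} = \frac{n_i}{2}.
\end{aligned}
```

Under this reading, \(w^{\top } G(p^{i}) w \le (n_i/2)\, w^{\top } G(p^{*}) w\) for every \(w\); when \(R = 0\) both quadratic forms vanish and the inequality holds trivially. The right-hand side no longer involves \({\varvec{\beta }}\).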
Hence we bound the ratio:
According to this, we can obtain that
Hence
and we can deduce that
\(\square \)
Proof of Theorem 3.2:
Let \({\varvec{\beta }}^{(m+1)} = {\varvec{\beta }}^{(m)} + \textbf{B}^{-1}U_P({\varvec{\beta }}^{(m)})\). We first prove that \(l_P({\varvec{\beta }}^{(m)})\) is a nondecreasing sequence. Consider the quadratic approximation of \(l_P({\varvec{\beta }}^{(m+1)}) - l_P({\varvec{\beta }}^{(m)})\):
where the inequality is strict if \(U_P({\varvec{\beta }}^{(m)})\ne 0\). Since \(l_P({\varvec{\beta }}^{(m+1)})\ge Q({\varvec{\beta }}^{(m+1)}|{\varvec{\beta }}^{(m)})\) because \(H_P({\varvec{\beta }})\le \textbf{B}\), we have \(l_P({\varvec{\beta }}^{(m + 1)}) \ge l_P({\varvec{\beta }}^{(m)})\).
Suppose, for the purpose of contradiction, that \(\Vert U_P({\varvec{\beta }}^{(m)})\Vert \) is bounded away from 0. Then
which means that the increments of \(l_P({\varvec{\beta }}^{(m)})\) are bounded below by a positive constant, contradicting the boundedness of \(l_P({\varvec{\beta }})\). Therefore, \(U_P({\varvec{\beta }}^{(m)})\) converges to 0, which implies that \({\varvec{\beta }}^{(m)}\) converges to \(\widehat{{\varvec{\beta }}}_P\). \(\square \)
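The ascent property proved above can be illustrated on a toy problem. The sketch below is not the paper's QUB1 or QUB2 algorithm: it assumes a single scalar covariate, the ordinary full-cohort partial likelihood without case-cohort weights, no tied event times, and the scalar curvature bound \(B = \sum _i (z_{\max ,i} - z_{\min ,i})^2/4\) summed over events, which follows from the variance bound on each risk set. The data and all function names are hypothetical.

```python
import math

def risk_sets(times, events):
    """For each observed event i, list the indices j with T_j >= T_i."""
    return [(i, [j for j, t in enumerate(times) if t >= times[i]])
            for i, d in enumerate(events) if d == 1]

def log_partial_likelihood(beta, z, sets):
    """Cox log partial likelihood for a scalar covariate z."""
    total = 0.0
    for i, rs in sets:
        total += beta * z[i] - math.log(sum(math.exp(beta * z[j]) for j in rs))
    return total

def score(beta, z, sets):
    """Derivative of the log partial likelihood with respect to beta."""
    total = 0.0
    for i, rs in sets:
        weights = [math.exp(beta * z[j]) for j in rs]
        mean = sum(wt * z[j] for wt, j in zip(weights, rs)) / sum(weights)
        total += z[i] - mean
    return total

def curvature_bound(z, sets):
    """Parameter-free bound B >= H(beta): sum of (range of z in risk set)^2 / 4."""
    return sum((max(z[j] for j in rs) - min(z[j] for j in rs)) ** 2 / 4.0
               for _, rs in sets)

def qub_fit(z, times, events, tol=1e-8, max_iter=10000):
    """Iterate beta <- beta + B^{-1} U(beta) with a fixed bound B."""
    sets = risk_sets(times, events)
    inv_b = 1.0 / curvature_bound(z, sets)  # computed once, reused every step
    beta, path = 0.0, []
    for _ in range(max_iter):
        path.append(log_partial_likelihood(beta, z, sets))
        u = score(beta, z, sets)
        if abs(u) < tol:
            break
        beta += inv_b * u
    return beta, path

# Hypothetical toy data: follow-up times, event indicators, one covariate.
times = [2.0, 3.5, 1.2, 4.8, 0.7, 5.1, 2.9, 3.3]
events = [1, 0, 1, 1, 1, 0, 1, 1]
z = [0.5, -1.0, 1.2, -0.3, 0.8, -1.5, 0.1, 0.9]
beta_hat, path = qub_fit(z, times, events)
```

Here `path` records the log partial likelihood at each iterate and should never decrease, while the score at `beta_hat` should be close to zero. Because `inv_b` does not depend on `beta`, it is computed once before the loop, which mirrors the single matrix inversion emphasized in the abstract.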
Proof of Theorem 3.3:
From the proof of Theorem 1, we have
Note that the rate of convergence, measured by the degree of improvement at each iteration, is monotonically decreasing in the upper-bound matrix \(\textbf{B}\), by (A.19). Moreover,
Since \(\textbf{B}_{1} \ge \textbf{B}_{2}\), the QUB2 algorithm converges faster than the QUB1 algorithm. \(\square \)
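To see informally why a smaller bound matrix helps, consider an idealized case that is assumed here only for illustration and is not the setting of the theorem: the information matrix is constant, \(H_P({\varvec{\beta }}) \equiv H\) with \(H\) positive definite (the symbol \(H\) is introduced for this sketch), so that \(U_P({\varvec{\beta }}) = -H({\varvec{\beta }} - \widehat{{\varvec{\beta }}}_P)\). The update then gives:

```latex
\begin{aligned}
{\varvec{\beta}}^{(m+1)} - \widehat{{\varvec{\beta}}}_P
  &= {\varvec{\beta}}^{(m)} - \widehat{{\varvec{\beta}}}_P
     + \textbf{B}^{-1} U_P\left({\varvec{\beta}}^{(m)}\right)
   = \left(I - \textbf{B}^{-1} H\right)
     \left({\varvec{\beta}}^{(m)} - \widehat{{\varvec{\beta}}}_P\right).
\end{aligned}
```

When \(\textbf{B} \ge H\), the eigenvalues of \(I - \textbf{B}^{-1}H\) lie in \([0, 1)\), and they move toward 0 as \(\textbf{B}\) shrinks toward \(H\). With \(\textbf{B}_{1} \ge \textbf{B}_{2} \ge H\), the iteration based on \(\textbf{B}_{2}\) therefore contracts the error at least as fast as the one based on \(\textbf{B}_{1}\) in this idealized case.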
About this article
Cite this article
Huang, C., Ding, J. & Feng, Y. A quadratic upper bound algorithm for regression analysis of credit risk under the proportional hazards model with case-cohort data. Stat Comput 33, 78 (2023). https://doi.org/10.1007/s11222-023-10248-w
DOI: https://doi.org/10.1007/s11222-023-10248-w