Abstract
This paper investigates an adapted loss function for estimating a linear regression model with right-censored responses. The adapted loss can be used in composite quantile regression, which handles responses with a high censoring rate well. Under regularity conditions, we establish the consistency and asymptotic normality of the resulting estimator. To compute the regression parameter estimates, we propose the MMCD algorithm, which produces satisfactory results for the proposed estimator. The algorithm also extends to a fused adaptive lasso penalized method for identifying interquantile commonality. The finite-sample performance of the methods is illustrated by numerical results and the analysis of two real datasets.
References
Bang H, Tsiatis AA (2002) Median regression with censored cost data. Biometrics 58(3):643–649
Birke M, Van Bellegem S, Van Keilegom I (2017) Semi-parametric estimation in a single-index model with endogenous variables. Scand J Stat 44(1):168–191
Chen X, Linton O, Van Keilegom I (2003) Estimation of semiparametric models when the criterion function is not smooth. Econometrica 71(5):1591–1608
Delsol L, Van Keilegom I (2020) Semiparametric M-estimation with non-smooth criterion functions. Ann Inst Stat Math 72(2):577–605
De Backer M, Ghouch AE, Van Keilegom I (2019) An adapted loss function for censored quantile regression. J Am Stat Assoc 114(527):1126–1137
Hunter DR, Lange K (2000) Quantile regression via an MM algorithm. J Comput Graph Stat 9(1):60–77
Hyde J (1980) Testing survival with incomplete observations. In: Biostatistics casebook, pp 31–46
Jiang L, Wang HJ, Bondell HD (2013) Interquantile shrinkage in regression models. J Comput Graph Stat 22(4):970–986
Jiang L, Bondell HD, Wang HJ (2014) Interquantile shrinkage and variable selection in quantile regression. Comput Stat Data An 69:208–219
Jiang R, Qian W, Zhou Z (2012) Variable selection and coefficient estimation via composite quantile regression with randomly censored data. Stat Probabil Lett 82(2):308–317
Jiang R, Hu X, Yu K (2018) Composite quantile regression for massive datasets. Statistics 52(5):980–1004
Koenker R (2015) Quantile regression. Cambridge University Press, New York
Koenker R, Bilias Y (2002) Quantile regression for duration data: a reappraisal of the Pennsylvania reemployment bonus experiments. In: Economic applications of quantile regression. Physica, Heidelberg
Koenker R, Geling O (2001) Reappraising medfly longevity: a quantile regression survival analysis. J Am Stat Assoc 96(454):458–468
Leng C, Tong X (2013) A quantile regression estimator for censored data. Bernoulli 19(1):344–361
Li KC, Wang JL, Chen CH (1999) Dimension reduction for censored regression data. Ann Stat 27:1–23
Lopez O (2011) Nonparametric estimation of the multivariate distribution function in a censored regression model with applications. Commun Stat-Theor M 40(15):2639–2660
Pohar M, Stare J (2006) Relative survival analysis in R. Comput Methods Programs Biomed 81(3):272–278
Portnoy S (2003) Censored regression quantiles. J Am Stat Assoc 98(464):1001–1012
Powell J (1986) Censored regression quantiles. J Econom 32:143–155
Stigler S (1984) Boscovich, Simpson and a 1760 manuscript note on fitting a linear relation. Biometrika 71:615–620
Sun J, Ma Y (2017) Empirical likelihood weighted composite quantile regression with partially missing covariates. J Nonparametr Stat 29(1):137–150
Tang Y, Wang HJ (2015) Penalized regression across multiple quantiles under random censoring. J Multivar Anal 141:132–146
van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer, New York
Wang HJ, Zhou J, Li Y (2013) Variable selection for censored quantile regression. Stat Sin 23(1):145–167
Wang HJ, Wang L (2009) Locally weighted censored quantile regression. J Am Stat Assoc 104(487):1117–1128
Wey A, Wang L, Rudser K (2014) Censored quantile regression with recursive partitioning-based weights. Biostatistics 15(1):170–181
Yuan X, Li Y, Dong X, Liu T (2022) Optimal subsampling for composite quantile regression in big data. Stat Pap 63(5):1649–1676
Ying Z, Jung SH, Wei LJ (1995) Survival analysis with median regression models. J Am Stat Assoc 90(429):178–184
Zou H, Yuan M (2008) Composite quantile regression and the oracle model selection theory. Ann Stat 36(3):1108–1126
Acknowledgements
We are grateful to the two reviewers and the associate editor for a number of constructive and helpful comments and suggestions that have clearly improved our manuscript.
Appendix
The assumptions required for the proofs of the theorems are as follows:
- (C1) \((Y_i,X_i,\delta _i),i=1,\cdots , n\) are an i.i.d. multivariate random sample, and the censoring time \(C_i\) is conditionally independent of the survival time \(Y^*_i\) given the covariates \(X_i\).
- (C2) The support \(\text{ supp }(X)\) of X is contained in a compact subset of \(R^p\), and Var(X) is positive definite.
- (C3) Let \(f_{Y^*|X}(\cdot |x)\) denote the conditional density function of \(Y^*\) given \(X=x\). For \(\theta\) in a neighborhood \(\Theta = B \times \Delta _1 \times \cdots \times \Delta _q\) of \(\theta ^*\), \(\inf _{k}\inf _{\beta \in B,b_k\in \Delta _k } \inf _{x \in \text{ supp }(X)} f_{Y^*|X}(x^T\beta +b_k|x) > 0\); furthermore, \(\sup _{t,x} f_{Y^*|X}(t|x)<\infty\).
- (C4) Define the (possibly infinite) time \(\tau _{x}= \inf \{t: F_{Y|X}(\cdot |x)=1\}\), where \(F_{Y|X}\) denotes the conditional c.d.f. of Y given X. Suppose first that there exists a real number \(\upsilon < \tau _{x}\) for all x. Denote next by \(\mathscr {G}\) the class of functions \(G(t,x): (-\infty ,\upsilon ]\times \text{ supp }(X) \rightarrow [0,1]\) of bounded variation with respect to t (uniformly in x) that have first-order partial derivatives with respect to x of bounded variation in t (uniformly in x), and bounded (uniformly in t) second-order partial derivatives with respect to x which are, uniformly in t, Lipschitz of order \(\eta\) for some \(0<\eta <1\). Suppose that \(G_C \in \mathscr {G}\). Moreover, \(\sup _{y } \big |G_C(y|x)-G_C(y|x') \big |=O_p(\Vert x-x'\Vert )\).
- (C5) For \(x \in \text{ supp }(X)\) and for \(\beta \in B\), \(b_k \in \Delta _k\), \(k=1,\cdots , q\), the point \(x^T\beta +b_k\) lies below \(\upsilon\).
- (C6) \(\sup _{x\in \text{ supp }(X)}\sup _{y\le \upsilon } \big |\hat{G}_C(y|x)-G_C(y|x) \big |=o_p(1)\), and \(P(\hat{G}_C\in \mathscr {G})\rightarrow 1\) as \(n\rightarrow \infty\).
- (C7) \(E\left[ X\left( \hat{G}_C(b_k+X^T\beta |X)-G_C(b_k+X^T\beta |X)\right) \right] =n^{-1} \sum _{i=1}^n X_i \xi (X_i,Y_i,\delta _i,\theta |X_i)+o_p(n^{-1/2})\), uniformly in \(\theta \in \Theta\), where \(\xi (X_i,Y_i,\delta _i,\theta |X_i)\), \(i=1,\cdots , n\) are i.i.d. with \(E\xi (X_i,Y_i,\delta _i,\theta |X_i)=0\) and \(\sup _{\theta \in \Theta }E[\Vert \xi (X_i,Y_i,\delta _i,\theta |X_i)\Vert ^2 ]<\infty\).
- (C8) The kernel function \(K(\cdot )>0\) is compactly supported and Lipschitz continuous of order 1. Furthermore, \(\int K(u)du=1\), \(\int uK(u)du=0\), and \(\int K^2(u)du < \infty\). The bandwidth \(h_n\) satisfies \(h_n=O(n^{-\upsilon })\) for \(1/4<\upsilon <1/3\).
- (C9) For each \(j=1,\cdots ,D\), there exists an EDR direction \(\gamma _{0j}\in R^p\) such that \(\hat{\gamma }_j-\gamma _{0j}=O_p(n^{-1/2})\) and \(n^{-1/2}(\hat{\gamma }_j-\gamma _{0j})=n^{-1}\sum _{i=1}^n d_{ki}\), where the \(d_{ki}\) are independent \(p\)-dimensional vectors with mean zero and finite variances.
Assumption (C4) defines a general class of functions embedding \(G_C\), coming from the work of Lopez (2011). Assumption (C5) is required for the asymptotic properties of \(\hat{\beta }\). Assumptions (C6) and (C7) are required for the asymptotic distribution of our estimator and implicitly demand a general linear representation of \(\hat{G}_C\). Assumption (C7) can be deduced from Assumption (C8) by Lemma A.1. Assumption (C9) states the \(\sqrt{n}\) consistency of the estimated EDR directions and the linear representation of \(\hat{\gamma }_j\), which are needed to establish the normality of \(\hat{\beta }\). These conditions hold for the sliced inverse regression estimator of Li et al. (1999).
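Assumptions (C6)-(C8) are consistent with \(\hat{G}_C\) being a kernel-smoothed conditional Kaplan–Meier (Beran) estimator of the censoring distribution, as in Lopez (2011) and De Backer et al. (2019). The paper does not reproduce the estimator here, so the following is a hedged one-covariate sketch under that assumption (Epanechnikov kernel; function and variable names are ours):

```python
import numpy as np

def beran_censoring_cdf(t, x0, X, Y, delta, h):
    # Sketch of a Beran-type estimate of G_C(t | x0), the conditional c.d.f.
    # of the censoring time, from observed times Y, censoring indicators
    # delta (1 = event, 0 = censored), scalar covariates X, bandwidth h.
    u = (X - x0) / h
    # Epanechnikov kernel weights, then Nadaraya-Watson normalization
    w = np.where(np.abs(u) <= 1, 0.75 * (1.0 - u ** 2), 0.0)
    if w.sum() == 0:
        return 0.0
    w = w / w.sum()
    order = np.argsort(Y)
    Ys, ds, ws = Y[order], delta[order], w[order]
    cum_w = np.concatenate(([0.0], np.cumsum(ws)))  # mass strictly before i
    surv = 1.0
    for i in range(len(Ys)):
        if Ys[i] > t:
            break
        at_risk = 1.0 - cum_w[i]  # total weight with Y_j >= Y_i
        if ds[i] == 0 and at_risk > 1e-12:  # censoring plays the "event" role
            surv *= 1.0 - ws[i] / at_risk
    return 1.0 - surv
```

With no censoring the estimate is identically zero; with all observations censored it approaches one past the largest observed time.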
We list a preliminary lemma which is also used in the proofs of the main results.
Lemma A.1
Suppose Assumptions (C3), (C4) and (C8) hold. Then, uniformly in \(\theta \in \Theta\),
and
where \(\xi (X_i,Y_i,\delta _i,t|X_i)=(1-G_C(t|X_i))\left[ \int _0^{Y_i\wedge t}\frac{-d H_0(s|X_i)}{\{ 1-F_{Y|X}(s|X_i)\}^2} + \frac{(1-\delta _i)I(Y_i\le t)}{1-F_{Y|X}(Y_i|X_i)}\right]\), \(H_0(t|x)=P(Y\le t,\delta =0|X=x)=\int _{0}^t (1-F_{Y|X}(s|x))dG(s|x)\).
Proof of Lemma A.1 See Lemma 1 in De Backer et al. (2019).
Lemma A.2
Suppose Assumptions (C1)-(C9) hold. Then for any \(D\ge 1\), we have
Proof of Lemma A.2 See Lemma A.1 in Wang et al. (2013).
Proof of Theorem 1 We use Theorem 1 of Delsol and Van Keilegom (2020) (hereafter DVK) on the consistency of M-estimators, which depends on their conditions (A1)-(A5). We thus need to verify conditions (A1)-(A5) to prove the consistency of \(\hat{\beta }\).
Condition (A1) in DVK is satisfied by construction of \(\hat{\beta }\), and condition (A3) is satisfied given Assumption (C6). We therefore need to verify conditions (A2), (A4) and (A5).
Condition (A2) ensures the uniqueness of \(\theta\). We need to verify that for any \(\epsilon >0\),
Using the definition of \(M_{ni}\), we have
By the conditional expectation, we have
Therefore,
the latter expectation is positive, whereby condition (A2) is satisfied.
Next, for (A4) to hold, it suffices by Remark 1(ii) in DVK and assumption (C3) to show that the class
is Glivenko-Cantelli. For this, by Theorem 2.4.1 in van der Vaart and Wellner (1996), we need to prove that for all \(\epsilon >0\), the \(\epsilon\)-bracketing number \(N_{[]}(\epsilon ,\mathscr {F}, L_1(P))\) of the class \(\mathscr {F}\) is finite, where P is the probability measure of (Y, X). Let
and
we have \(M_{ni} (\theta , G_C)= \sum _{k=1}^q \psi _{\tau _k}(X,Y,\beta ,b_k)\). From this decomposition, it is easy to see that
By the proof of Theorem 3.1 in De Backer et al. (2019), we have
for some \(K>0\). It follows that condition (A4) holds.
Lastly, for condition (A5), we need to establish that
Note that
Under assumption (C5), this expression can be bounded above by
which converges to 0 when \(d_{\mathscr {G}}(G,G_C)\rightarrow 0\), provided assumption (C4) holds. Thus condition (A5) holds.
Hence the assumptions of Theorem 1 in DVK are met. The weak consistency of \(\hat{\beta }\) follows. Theorem 1 is then proved.
Proof of Theorem 2 Given that \(\hat{\beta }\) is shown to be weakly consistent in Theorem 1 and that \(\hat{G}_C\) is assumed by (C6) to be a uniformly consistent estimator of \(G_C\), to prove that our proposed estimator is asymptotically normally distributed we now restrict the spaces \(\Theta\) and \(\mathscr {G}\) to shrinking neighborhoods around the true \(\theta ^*\) and \(G_C\). Define the spaces \(\Theta _\delta =\{\theta \in \Theta :\Vert \theta - \theta ^*\Vert \le \delta _n \}\) and \(\mathscr {G}_\delta =\{G\in \mathscr {G}: d(G,G_C) \le \delta _n \}\) for some \(\delta _n=o(1)\).
We use the results of Proposition 2 in Birke et al. (2017) to establish the asymptotic normality of our proposed estimator. Thus we need to verify conditions (C.1)-(C.6) of Proposition 2 in Birke et al. (2017). Define
where
\(m_1(Y_i,X_i,\theta ,G)=X_i\sum _{k=1}^q\left\{ (1-\tau _k)(1-G(X^T_i\beta +b_k|R_i))-I(Y_i>X^T_i\beta +b_k) \right\} ,\)
and \(m_{2k}(Y_i,X_i,\theta ,G)=(1-\tau _k)\left( 1-G(X^T_i\beta +b_k|R_i)\right) -I(Y_i>X^T_i\beta +b_k)\), \(k=1,\cdots ,q\). Furthermore, let
where \(M_j(\theta ,G)=E(m_j(Y_i,X_i,\theta ,G))\), \(j=1,2\) and
and observe that \(M(\theta ^*,G_C)=0\).
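To make the estimating function concrete, a per-observation evaluation of \(m_1\) might look as follows. This is a sketch only: the conditional censoring c.d.f. is passed as a callable, and we condition on \(X_i\) in place of \(R_i\).

```python
import numpy as np

def m1(Yi, Xi, beta, b, taus, G):
    # m_1(Y_i, X_i, theta, G) = X_i * sum_k { (1 - tau_k) * (1 - G(X_i'beta + b_k | X_i))
    #                                         - I(Y_i > X_i'beta + b_k) }
    s = 0.0
    for tau, bk in zip(taus, b):
        t = float(Xi @ beta) + bk
        s += (1.0 - tau) * (1.0 - G(t, Xi)) - float(Yi > t)
    return Xi * s
```

Setting the sample average of this component (together with the \(m_{2k}\) components) to zero characterizes the estimator, consistent with \(M(\theta ^*,G_C)=0\) above.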
We now verify the conditions of Proposition 2 in Birke et al. (2017). First, note that (C.1) trivially holds by construction of our estimator. Next, for \(\theta \in \Theta _\delta\), let \(\Gamma _1(\theta ,G_C)\) denote the ordinary derivative of \(M(\theta ,G)\) with respect to \(\theta\), that is,
where
and
Under assumptions (C1)-(C4), \(\Gamma _1(\theta ,G_C)\) is then observed to be continuous and of full rank at \(\theta ^*\). Hence, condition (C.2) is satisfied.
For condition (C.3), define first for all \(\theta \in \Theta _\delta\) the functional derivative of \(M(\theta ,G)\) at \(G_C\) in the direction \([G-G_C]\) as
Observe that for all \((\theta ,G) \in \Theta _\delta \times \mathscr {G}_\delta\), \(M(\theta ,G)\) is linear in G since
This verifies the first part of (C.3). For the second part, we have to show that
To prove this, by Assumption (C7), we have uniformly in \(\theta \in \Theta _\delta\) that
where
By the central limit theorem,
where V is finite under Assumptions (C1) and (C8). This implies that condition (C.3) in Birke et al. (2017) is satisfied. Next, condition (C.4) in Birke et al. (2017) is satisfied by Assumption (C6). To establish that condition (C.5) in Birke et al. (2017) holds as well, we need to verify the conditions (3.1-3.3) of Theorem 3 in Chen et al. (2003).
Let \(m=m_c+m_{lc}\), where
and
Then, condition (3.1) is easily seen to hold for some \(s_j\), \(s_{1j} \in (0,1]\) and \(r=2\) under Assumption (C2). For condition (3.2), since \(m_{lc}\) does not depend on G here, we first note via the proof of Theorem 3 in Chen et al. (2003) that the constant \(s_j\), controlling the regularity of the nuisance parameter and appearing in condition (3.2), may in fact be replaced by the constant \(s_{1j}\) already appearing in condition (3.1). Therefore, we verify condition (3.2) with respect to \(s_{1j}\) instead of \(s_j\) as initially stated in Chen et al. (2003). To that end, first observe that, for all positive values \(\epsilon _n=o(1)\),
Hence, we have that
and
for some finite constant \(K_4\) under Assumptions (C3) and (C4). Hence, provided Assumption (C2) is satisfied, we may observe that condition (3.2) holds for \(s_{1j} = 1/2\). For the last condition of Theorem 3 in Chen et al. (2003), for \(\epsilon >0\) denote first by \(N(\epsilon ,\mathscr {G}_\delta ,\Vert \cdot \Vert _\mathscr {G})\) the covering number (van der Vaart and Wellner 1996, p. 83) of the class \(\mathscr {G}_\delta\) under the sup-norm metric that, with a slight abuse of notation, we consider on the latter. Now, keeping in mind that \(N(\epsilon ,\mathscr {G}_\delta ,\Vert \cdot \Vert _\mathscr {G}) \le N_{[]}(\epsilon ,\mathscr {G}_\delta ,\Vert \cdot \Vert _\mathscr {G})\), and since all the functions in the class \(\mathscr {G}_\delta\) take values between 0 and 1 by (C4), we first observe that only one \(\epsilon\)-bracket suffices to cover \(\mathscr {G}_\delta\) if \(\epsilon >1\). Then, using Lemma 6.1 in Lopez (2011) for a bound on the bracketing number for the case \(\epsilon \le 1\), we have that
for some finite constant \(K_5\), thereby satisfying condition (3.3) in Chen et al. (2003) for \(s_j=1\). It then follows from their Theorem 3 that condition (C.5) in Birke et al. (2017) holds in our context.
Lastly, for condition (C.6) we need to establish that
for some positive definite matrix \(\varSigma\). Recalling that
is the average of independent random vectors with mean 0, this follows easily by the same arguments as in the verification of condition (C.3), applied to the particular case \(\theta =\theta ^*\). Hence, we obtain
where \(\varSigma =Cov(\varLambda _i)\) with
Theorem 2 then follows directly from an application of Proposition 2 in Birke et al. (2017).
Let \(\varSigma _1\) be the top left-hand \(p \times p\) submatrix of \(\varSigma\), then we have
where \(V=\Gamma _{1,11}^{-1} (\theta ^*,G_C)\varSigma _1\Gamma _{1,11}^{-1}(\theta ^*,G_C)\).
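The limiting covariance has the usual sandwich form. Given consistent estimates of \(\Gamma _{1,11}(\theta ^*,G_C)\) and \(\varSigma _1\) (how to estimate them is not shown here), it could be assembled as in this sketch:

```python
import numpy as np

def sandwich_variance(Gamma11, Sigma1):
    # V = Gamma11^{-1} @ Sigma1 @ Gamma11^{-1}
    Ginv = np.linalg.inv(Gamma11)
    return Ginv @ Sigma1 @ Ginv
```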
Lemma A.3
Suppose Assumptions (C1)-(C9) in the Appendix hold. Let
Denote \(\hat{\vartheta }=\arg \min L(\vartheta )\). As \(n\rightarrow \infty\), we have
where
and \(\varSigma _2=Cov(\varLambda _i)\), where \(\varLambda _i\) has a form similar to that in (A.1).
Proof of Lemma A.3 Similar to the proof of Theorem 2.
Proof of Theorem 3 We first prove the sparsity. The parameter \(\vartheta\) can be decomposed as \((\vartheta _{A_1}^T,\vartheta _{A_0}^T)^T\). Denote the estimator by \((\hat{\vartheta }_{FAL,A_1}^T,\hat{\vartheta }_{FAL,A_0}^T)^T\) and the true value by \((\vartheta ^{*T}_{A_1},0^T)^T\). Suppose there exists an \(l \in A_0\) such that \(\hat{\vartheta }_{FAL,l}\ne 0\). Let \(\tilde{\vartheta }\) be the vector constructed by replacing \(\hat{\vartheta }_{FAL,l}\) with 0 in \(\hat{\vartheta }_{FAL}\).
Note that for any \(\tau \in (0,1)\),
Then, we have
where the last inequality holds since \(\sum _{i=1}^n\sum _{k=1}^q \Vert Z_{ki}\Vert =O_p(n)\) and \(\lambda _n\tilde{w}_{l}\ge C_1 n^{r/2}\lambda _n\rightarrow \infty\). Then (A.2) contradicts the fact that \(L_{FAL}(\hat{\vartheta }_{FAL})\le L_{FAL}(\tilde{\vartheta })\). Part (a) is thus proved.
We next prove (b). By Theorem 3(a), as \(n\rightarrow \infty\), \(P(\hat{\vartheta }_{A_0}=0)\rightarrow 1\). Let \(\bar{A}_1=\{j: d^*_j \ne 0\}\). The objective function \(L_{AL}(\vartheta )\) is then asymptotically equivalent to \(L_{AL}(\vartheta _{A_1})\).
Define the negative subgradient function of \(L_{AL}(\vartheta _{A_1})\) with respect to \(\vartheta _{A_1}\) as
where
and \(m_{AL,2k}(Y_i,X_i,\vartheta _{A_1},{G}_C)=(1-\tau _k)\left( 1-{G}_C(b_k+Z_{ki,\bar{A}_1}^Td_{\bar{A}_1}|X_i)\right) -I(Y_i>b_k+Z_{ki,\bar{A}_1}^Td_{\bar{A}_1})\) for \(k=1,\cdots ,q\). For \(l\in \bar{A}_1\), we have \(\hat{d}_l \ne 0\), so \(w_{l}=|\breve{d}_l|^{-r}\) is bounded. Since \(n^{1/2}\lambda _n\rightarrow 0\), we have \(\lambda _n \sum _{l\in \bar{A}_1} w_{l}|d_{l}|=o_p(n^{-1/2})\). Similar to the proof of Theorem 2, we can then establish the asymptotic normality of \(\hat{\vartheta }_{A_1}\): \(n^{1/2}(\hat{\vartheta }_{A_1}-\vartheta ^*_{A_1}){\mathop {\longrightarrow }\limits ^{d}}N(0, \varSigma _{A_1})\), with \(\varSigma _{A_1}=\left( \Gamma _2^{-1} \varSigma _2 \Gamma _2^{-1}\right) _{A_1\times A_1}\), where \(\Gamma _2\) and \(\varSigma _2\) are as defined in Lemma A.3 and the subscript restricts to the index set \(A_1\).
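The sparsity argument above hinges on the adaptive weights \(w_l=|\breve{d}_l|^{-r}\) diverging for truly-zero coefficients while staying bounded on \(\bar{A}_1\). A minimal sketch (function names are ours; `d_init` stands for the initial \(\sqrt{n}\)-consistent estimate \(\breve{d}\)):

```python
import numpy as np

def adaptive_weights(d_init, r=1.0, eps=1e-10):
    # w_l = |d_init_l|^{-r}: near-zero initial estimates receive very large
    # weights, which drives the penalized estimates of those coordinates to 0
    # (eps guards against division by an exact zero)
    return 1.0 / (np.abs(d_init) + eps) ** r

def al_penalty(d, lam, w):
    # adaptive lasso penalty term: lam * sum_l w_l * |d_l|
    return lam * np.sum(w * np.abs(d))
```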
Cite this article
Yuan, X., Zhang, X., Guo, W. et al. An adapted loss function for composite quantile regression with censored data. Comput Stat 39, 1371–1401 (2024). https://doi.org/10.1007/s00180-023-01352-6