An adapted loss function for composite quantile regression with censored data


Abstract

This paper investigates an adapted loss function for the estimation of a linear regression model with right-censored responses. The adapted loss function can be used in composite quantile regression, which handles responses with a high censoring rate well. Under some regularity conditions, we establish the consistency and asymptotic normality of the resulting estimator. For the estimation of the regression parameters, we propose an MMCD algorithm, which produces satisfactory results for the proposed estimator. The algorithm also extends to the fused adaptive lasso penalized method for identifying interquantile commonality. The finite-sample performance of the methods is further illustrated by numerical results and the analysis of two real datasets.

References

  • Bang H, Tsiatis AA (2002) Median regression with censored cost data. Biometrics 58(3):643–649

  • Birke M, Van Bellegem S, Van Keilegom I (2017) Semi-parametric estimation in a single-index model with endogenous variables. Scand J Stat 44(1):168–191

  • Chen X, Linton O, Van Keilegom I (2003) Estimation of semiparametric models when the criterion function is not smooth. Econometrica 71(5):1591–1608

  • De Backer M, Ghouch AE, Van Keilegom I (2019) An adapted loss function for censored quantile regression. J Am Stat Assoc 114(527):1126–1137

  • Delsol L, Van Keilegom I (2020) Semiparametric M-estimation with non-smooth criterion functions. Ann Inst Stat Math 72(2):577–605

  • Hunter DR, Lange K (2000) Quantile regression via an MM algorithm. J Comput Graph Stat 9(1):60–77

  • Hyde J (1980) Testing survival with incomplete observations. In: Biostatistics casebook, pp 31–46

  • Jiang L, Wang HJ, Bondell HD (2013) Interquantile shrinkage in regression models. J Comput Graph Stat 22(4):970–986

  • Jiang L, Bondell HD, Wang HJ (2014) Interquantile shrinkage and variable selection in quantile regression. Comput Stat Data Anal 69:208–219

  • Jiang R, Qian W, Zhou Z (2012) Variable selection and coefficient estimation via composite quantile regression with randomly censored data. Stat Probab Lett 82(2):308–317

  • Jiang R, Hu X, Yu K (2018) Composite quantile regression for massive datasets. Statistics 52(5):980–1004

  • Koenker R (2005) Quantile regression. Cambridge University Press, New York

  • Koenker R, Bilias Y (2002) Quantile regression for duration data: a reappraisal of the Pennsylvania reemployment bonus experiments. In: Economic applications of quantile regression. Physica, Heidelberg

  • Koenker R, Geling O (2001) Reappraising medfly longevity: a quantile regression survival analysis. J Am Stat Assoc 96(454):458–468

  • Leng C, Tong X (2013) A quantile regression estimator for censored data. Bernoulli 19(1):344–361

  • Li KC, Wang JL, Chen CH (1999) Dimension reduction for censored regression data. Ann Stat 27:1–23

  • Lopez O (2011) Nonparametric estimation of the multivariate distribution function in a censored regression model with applications. Commun Stat Theory Methods 40(15):2639–2660

  • Pohar M, Stare J (2006) Relative survival analysis in R. Comput Methods Programs Biomed 81(3):272–278

  • Portnoy S (2003) Censored regression quantiles. J Am Stat Assoc 98(464):1001–1012

  • Powell J (1986) Censored regression quantiles. J Econom 32:143–155

  • Stigler S (1984) Boscovich, Simpson and a 1760 manuscript note on fitting a linear relation. Biometrika 71:615–620

  • Sun J, Ma Y (2017) Empirical likelihood weighted composite quantile regression with partially missing covariates. J Nonparametr Stat 29(1):137–150

  • Tang Y, Wang HJ (2015) Penalized regression across multiple quantiles under random censoring. J Multivar Anal 141:132–146

  • van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer, New York

  • Wang HJ, Zhou J, Li Y (2013) Variable selection for censored quantile regression. Stat Sin 23(1):145–167

  • Wang HJ, Wang L (2009) Locally weighted censored quantile regression. J Am Stat Assoc 104(487):1117–1128

  • Wey A, Wang L, Rudser K (2014) Censored quantile regression with recursive partitioning-based weights. Biostatistics 15(1):170–181

  • Yuan X, Li Y, Dong X, Liu T (2022) Optimal subsampling for composite quantile regression in big data. Stat Pap 63(5):1649–1676

  • Ying Z, Jung SH, Wei LJ (1995) Survival analysis with median regression models. J Am Stat Assoc 90(429):178–184

  • Zou H, Yuan M (2008) Composite quantile regression and the oracle model selection theory. Ann Stat 36(3):1108–1126

Acknowledgements

We are grateful to the two reviewers and the associate editor for a number of constructive and helpful comments and suggestions that have clearly improved our manuscript.

Author information

Corresponding author

Correspondence to Qian Hu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

The assumptions required for the proofs of the theorems are as follows:

  1. (C1)

\((Y_i,X_i,\delta _i),\ i=1,\cdots , n\) form an i.i.d. multivariate random sample, and the censoring time \(C_i\) is conditionally independent of the survival time \(Y^*_i\) given the covariates \(X_i\).

  2. (C2)

    The support \(\text{ supp }(X)\) of X is contained in a compact subset of \(R^p\), and Var(X) is positive definite.

  3. (C3)

Let \(f_{Y^*|X}(\cdot |x)\) denote the conditional density function of \(Y^*\) given \(X=x\). For \(\theta\) in a neighborhood \(\Theta = B \times \Delta _1 \times \cdots \times \Delta _q\) of \(\theta ^*\), \(\inf _{k}\inf _{\beta \in B,b_k\in \Delta _k } \inf _{x \in \text{ supp }(X)} f_{Y^*|X}(x^T\beta +b_k|x) > 0\); furthermore, \(\sup _{t,x} f_{Y^*|X}(t|x)<\infty\).

  4. (C4)

Define the (possibly infinite) time \(\tau _{x}= \inf \{t: F_{Y|X}(t|x)=1\}\), where \(F_{Y|X}\) designates the conditional c.d.f. of \(Y\) given \(X\). Suppose that there exists a real number \(\upsilon\) with \(\upsilon < \tau _{x}\) for all \(x\). Denote next by \(\mathscr {G}\) the class of functions \(G(t,x): (-\infty ,\upsilon ]\times \text{ supp }(X) \rightarrow [0,1]\) of bounded variation with respect to \(t\) (uniformly in \(x\)) that have first-order partial derivatives with respect to \(x\) of bounded variation in \(t\) (uniformly in \(x\)), and bounded (uniformly in \(t\)) second-order partial derivatives with respect to \(x\) that are, uniformly in \(t\), Lipschitz of order \(\eta\) for some \(0<\eta <1\). Suppose that \(G_C \in \mathscr {G}\) and that \(\sup _{y } \big |G_C(y|x)-G_C(y|x') \big |=O_p(\Vert x-x'\Vert )\).

  5. (C5)

    For \(x \in \text{ supp }(X)\) and for \(\beta \in B\), \(b_k \in \Delta _k\), \(k=1,\cdots , q\), the point \(x^T\beta +b_k\) lies below \(\upsilon\).

  6. (C6)

\(\sup _{x\in \text{ supp }(X)}\sup _{y\le \upsilon } \big |\hat{G}_C(y|x)-G_C(y|x) \big |=o_p(1)\), and \(P(\hat{G}_C\in \mathscr {G})\rightarrow 1\) as \(n\rightarrow \infty\).

  7. (C7)

    \(E\left[ X\left( \hat{G}_C(b_k+X^T\beta |X)-G_C(b_k+X^T\beta |X)\right) \right] =n^{-1} \sum _{i=1}^n X_i \xi (X_i,Y_i,\delta _i,\theta |X_i)+o_p(n^{-1/2})\), uniformly in \(\theta \in \Theta\), where \(\xi (X_i,Y_i,\delta _i,\theta |X_i)\), \(i=1,\cdots , n\) are i.i.d., \(E\xi (X_i,Y_i,\delta _i,\theta |X_i)=0\) and \(\sup _{\theta \in \Theta }E[\Vert \xi (X_i,Y_i,\delta _i,\theta |X_i)\Vert ^2 ]<\infty\).

  8. (C8)

    The kernel function \(K(\cdot )>0\) is compactly supported and Lipschitz continuous of order 1. Furthermore, \(\int K(u)du=1\), \(\int uK(u)du=0\), \(\int K^2(u)du < \infty\). The bandwidth \(h_n\) satisfies \(h_n=O(n^{-\upsilon })\), for \(1/4<\upsilon <1/3\).

  9. (C9)

There exist EDR directions \(\gamma _{0j}\in R^p\) such that, for any \(j=1,\cdots ,D\), \(\hat{\gamma }_j-\gamma _{0j}=O_p(n^{-1/2})\) and \(\hat{\gamma }_j-\gamma _{0j}=n^{-1}\sum _{i=1}^n d_{ji}+o_p(n^{-1/2})\), where the \(d_{ji}\) are independent \(p\)-dimensional vectors with mean zero and finite variance.

Assumption (C4) defines a general class of functions embedding \(G_C\), following Lopez (2011). Assumption (C5) is required for the asymptotic properties of \(\hat{\beta }\). Assumptions (C6) and (C7) are required for the asymptotic distribution of our estimator, and they implicitly require a general linear representation of \(\hat{G}_C\). Assumption (C7) can be deduced from Assumption (C8) via Lemma A.1. Assumption (C9) states the \(\sqrt{n}\)-consistency of the estimated EDR directions and the linear representation of \(\hat{\gamma }_j\), which are needed to establish the normality of \(\hat{\beta }\). These conditions hold, for instance, for the sliced inverse regression estimator of Li et al. (1999). A kernel satisfying (C8) is illustrated below.
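
For instance (an illustration of ours, not a choice prescribed by the paper), the Epanechnikov kernel \(K(u)=0.75(1-u^2)I(|u|\le 1)\) is compactly supported, Lipschitz continuous of order 1, and satisfies the moment conditions of (C8). A minimal numpy sketch checking the three integrals numerically:

```python
import numpy as np

# Epanechnikov kernel: compactly supported on [-1, 1] and Lipschitz of
# order 1, as required by Assumption (C8). The kernel choice is ours,
# for illustration only.
u = np.linspace(-1.0, 1.0, 200001)
K = 0.75 * (1.0 - u**2)

print(np.trapz(K, u))      # ~1.0 : int K(u) du = 1
print(np.trapz(u * K, u))  # ~0.0 : int u K(u) du = 0
print(np.trapz(K**2, u))   # ~0.6 : int K(u)^2 du < infinity
```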

We first list two preliminary lemmas that are used in the proofs of the main results.

Lemma A.1

Suppose Assumptions (C3), (C4) and (C8) hold, then uniformly in \(\theta \in \Theta\),

$$\begin{aligned} E\left[ X_i\left( \hat{G}_C(X^T_i\beta +b_k|X_i)-G_C(X^T_i\beta +b_k|X_i)\right) \right] = n^{-1}\sum _{i=1}^n X_i \xi (X_i,Y_i,\delta _i,\theta |X_i)+o_p(n^{-1/2}), \end{aligned}$$

and

$$\begin{aligned} E\left[ \hat{G}_C(X^T_i\beta +b_k|X_i)-G_C(X^T_i\beta +b_k|X_i) \right] =n^{-1}\sum _{i=1}^n\xi (X_i,Y_i,\delta _i,\theta |X_i)+o_p(n^{-1/2}), \end{aligned}$$

where \(\xi (X_i,Y_i,\delta _i,t|X_i)=(1-G_C(t|X_i))\left[ \int _0^{Y_i\wedge t}\frac{-d H_0(s|X_i)}{\{ 1-F_{Y|X}(s|X_i)\}^2} + \frac{(1-\delta _i)I(Y_i\le t)}{1-F_{Y|X}(Y_i|X_i)}\right]\), \(H_0(t|x)=P(Y\le t,\delta =0|X=x)=\int _{0}^t (1-F_{Y|X}(s|x))dG(s|x)\).

Proof of Lemma A.1: See Lemma 1 in De Backer et al. (2019).
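
As a sanity check (a verification of ours, not part of the cited proof), one can confirm directly that \(E[\xi (X,Y,\delta ,t|X)\,|\,X]=0\), as required by Assumption (C7). Assuming \(F_{Y|X}\) is continuous, so that \(P(Y\ge s|X)=1-F_{Y|X}(s|X)\),

$$\begin{aligned} E\left[ \int _0^{Y\wedge t}\frac{-d H_0(s|X)}{\{ 1-F_{Y|X}(s|X)\}^2} \,\Big |\, X\right] =-\int _0^{t}\frac{(1-F_{Y|X}(s|X))\, d H_0(s|X)}{\{ 1-F_{Y|X}(s|X)\}^2} =-\int _0^{t}\frac{d H_0(s|X)}{ 1-F_{Y|X}(s|X)}, \end{aligned}$$

while \(E\left[ \frac{(1-\delta )I(Y\le t)}{1-F_{Y|X}(Y|X)} \,\Big |\, X\right] =\int _0^{t}\frac{d H_0(s|X)}{ 1-F_{Y|X}(s|X)}\) by the definition of \(H_0\), so the two terms cancel.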

Lemma A.2

Suppose Assumptions (C1)–(C9) hold. Then for any \(D\ge 1\), we have

$$\begin{aligned} \sup _{y}\sup _{x} \big |\hat{G}_C(y|x)-G_C(y|x)\big |=\sup _{y}\sup _{r} \big |\hat{G}_C(y|r)-G_C(y|r)\big |=O_p\left( \{\log n /(n h^p_n)\}^{1/2}+h_n^v \right) , \end{aligned}$$

where \(r\) denotes the reduced covariate associated with the EDR directions of (C9), so that \(G_C(\cdot |x)=G_C(\cdot |r)\).

Proof of Lemma A.2: See Lemma A.1 in Wang et al. (2013).

Proof of Theorem 1: We use Theorem 1 of Delsol and Van Keilegom (2020) (hereafter DVK) on the consistency of M-estimators, which relies on their conditions (A1)–(A5). We verify these conditions to prove the consistency of \(\hat{\beta }\).

Condition (A1) in DVK is satisfied by construction of \(\hat{\beta }\), and condition (A3) is satisfied given assumption (C6). We therefore need to verify conditions (A2), (A4) and (A5).

Condition (A2) ensures the identifiability of \(\theta ^*\). We need to verify that for any \(\epsilon >0\),

$$\begin{aligned} \inf _{\Vert \theta -\theta ^*\Vert>\epsilon } E\left[ M_{ni} (\theta ^*, G_C)-M_{ni}(\theta , G_C) \right] >0. \end{aligned}$$

Using the definition of \(M_{ni}\), we have

$$\begin{aligned} \inf _{\Vert \theta -\theta ^*\Vert>\epsilon } E\left[ M_{ni} (\theta ^*, G_C)-M_{ni}(\theta , G_C) \right] = \inf _{\Vert \theta -\theta ^*\Vert >\epsilon }\sum _{k=1}^q E\left[ \int _{X^T\beta +b_k}^{X^T\beta ^*+b_{k}^*}\left( 1(Y\ge s)-(1-\tau _k)(1-G_C(s|X)) \right) ds\right] . \end{aligned}$$

Taking the conditional expectation given X, we have

$$\begin{aligned} E\left[ \int _{X^T\beta +b_k}^{X^T\beta ^*+b_{k}^*}\left( 1(Y\ge s)-(1-\tau _k)(1-G_C(s|X)) \right) ds\right] = E \left[ \int _{X^T\beta +b_k}^{X^T\beta ^*+b_{k}^*}\left( (1-G_C(s|X))(1-F_{Y|X}(s|X)) -(1-\tau _k)(1-G_C(s|X)) \right) ds\right] . \end{aligned}$$

Therefore,

$$\begin{aligned} \inf _{\Vert \theta -\theta ^*\Vert>\epsilon } E\left[ M_{ni} (\theta ^*, G_C)-M_{ni}(\theta , G_C) \right] = \inf _{\Vert \theta -\theta ^*\Vert >\epsilon }\sum _{k=1}^q E\left[ \int _{X^T\beta +b_k}^{X^T\beta ^*+b_{k}^*} (1-G_C(s|X))( \tau _k-F_{Y|X}(s|X))\, ds \right] . \end{aligned}$$

The latter expectation is positive: the integrand \((1-G_C(s|X))(\tau _k-F_{Y|X}(s|X))\) and the orientation of the integration limits change sign together at the \(\tau _k\)-th conditional quantile \(X^T\beta ^*+b_k^*\), so each summand is nonnegative and the infimum over \(\Vert \theta -\theta ^*\Vert >\epsilon\) is positive. Hereby condition (A2) is satisfied.

Next, for (A4) to hold, it suffices by Remark 1(ii) in DVK and assumption (C3) to show that the class

$$\begin{aligned} \mathscr {F}=\left\{ (y,x)\mapsto M_{ni} (\theta , G):\ \theta \in \Theta ,\ G\in \mathscr {G} \right\} \end{aligned}$$

is Glivenko–Cantelli. For this, by Theorem 2.4.1 in van der Vaart and Wellner (1996), it suffices to show that for all \(\epsilon >0\), the \(\epsilon\)-bracketing number \(N_{[]}(\epsilon ,\mathscr {F}, L_1(P))\) of the class \(\mathscr {F}\), where \(P\) denotes the probability measure of \((Y,X)\), is finite. Let

$$\begin{aligned} \psi _{\tau _k}(X,Y,\beta ,b_k)= \rho _{\tau _k} (Y-X^T\beta -b_k)-(1-\tau _k) \int _{0}^{X^T\beta +b_k} G_C(s|X)\,ds, \end{aligned}$$

where \(\rho _{\tau }(u)=u\{\tau -I(u<0)\}\) denotes the quantile check function, and

$$\begin{aligned} \mathscr {F}_k=\left\{ (y,x)\mapsto \psi _{\tau _k}(X,Y,\beta ,b_k):\ \beta \in B,\ b_k \in \Delta _k,\ G\in \mathscr {G} \right\} ,\quad k=1,\cdots ,q, \end{aligned}$$

we have \(M_{ni} (\theta , G_C)= \sum _{k=1}^q \psi _{\tau _k}(X,Y,\beta ,b_k)\). From this decomposition, it is easy to see that

$$\begin{aligned} N_{[]}(\epsilon ,\mathscr {F}, L_1(P))\le \prod _{k=1}^q N_{[]}(\epsilon ,\mathscr {F}_k, L_1(P)). \end{aligned}$$

By the proof of Theorem 3.1 in De Backer et al. (2019), we have

$$\begin{aligned} N_{[]}(\epsilon ,\mathscr {F}_k, L_1(P))=O(\exp \{K\epsilon ^{-2/(1+\eta )} \}) \end{aligned}$$

for some \(K>0\). It follows that condition (A4) holds.

Lastly, for condition (A5), we need to establish that

$$\begin{aligned} \lim _{d_{\mathscr {G}}(G,G_C)\rightarrow 0} \sup _{\theta \in \Theta } \Big |E\left[ M_{ni} (\theta ,G)-M_{ni} (\theta , G_C) \right] \Big |=0. \end{aligned}$$

Note that

$$\begin{aligned} \sup _{\theta \in \Theta } \Big |E\left[ M_{ni} (\theta ,G)-M_{ni} (\theta , G_C) \right] \Big | \le \sum _{k=1}^q (1-\tau _k) \sup _{\beta \in B,\, b_k \in \Delta _k} E\left[ \int _{0}^{X^T\beta +b_k}\big |G(s|X)- {G}_C(s|X) \big |\,ds\right] . \end{aligned}$$

Under assumption (C5), this expression can be bounded above by

$$\begin{aligned} \sum _{k=1}^q (1-\tau _k)\int _{0}^{\upsilon }\sup _{x\in \text{ supp }(X)} \big |G(s|x)- {G}_C(s|x) \big |\,ds \le \sum _{k=1}^q (1-\tau _k)\, \upsilon \sup _{x\in \text{ supp }(X)}\sup _{y\le \upsilon } \big |G(y|x)- {G}_C(y|x) \big |, \end{aligned}$$

which converges to 0 when \(d_{\mathscr {G}}(G,G_C)\rightarrow 0\), provided assumption (C4) holds. Thus condition (A5) holds.

Hence the assumptions of Theorem 1 in DVK are met, and the weak consistency of \(\hat{\beta }\) follows. This proves Theorem 1.

Proof of Theorem 2: Given that \(\hat{\beta }\) is weakly consistent by Theorem 1 and that \(\hat{G}_C\) is, by (C6), a uniformly consistent estimator of \(G_C\), to prove that our proposed estimator is asymptotically normally distributed we now restrict the spaces \(\Theta\) and \(\mathscr {G}\) to shrinking neighborhoods around the true \(\theta ^*\) and \(G_C\). Define the spaces \(\Theta _\delta =\{\theta \in \Theta :\Vert \theta - \theta ^*\Vert \le \delta _n \}\) and \(\mathscr {G}_\delta =\{G\in \mathscr {G}: d_{\mathscr {G}}(G,G_C) \le \delta _n \}\) for some \(\delta _n=o(1)\).

We use Proposition 2 of Birke et al. (2017) to establish the asymptotic normality of our estimator; thus we need to verify its conditions (C.1)–(C.6). Define

$$\begin{aligned} M_n(\theta ,G)=n^{-1} \sum _{i=1}^n m(Y_i,X_i,\theta ,G), \end{aligned}$$

where

$$\begin{aligned} m(Y_i,X_i,\theta ,G)= \left( \begin{array}{c} m_1(Y_i,X_i,\theta ,G) \\ m_{2}(Y_i,X_i,\theta ,G) \end{array}\right) , \end{aligned}$$

\(m_1(Y_i,X_i,\theta ,G)=X_i\sum _{k=1}^q\left\{ (1-\tau _k)(1-G(X^T_i\beta +b_k|R_i))-I(Y_i>X^T_i\beta +b_k) \right\} ,\)

$$\begin{aligned} m_2(Y_i,X_i,\theta ,G)= \left( \begin{array}{c} m_{21}(Y_i,X_i,\theta ,G)\\ \vdots \\ m_{2q}(Y_i,X_i,\theta ,G) \end{array}\right) , \end{aligned}$$

and \(m_{2k}(Y_i,X_i,\theta ,G)=(1-\tau _k)\left( 1-G(X^T_i\beta +b_k|R_i)\right) -I(Y_i>X^T_i\beta +b_k)\), \(k=1,\cdots ,q\). Furthermore, let

$$\begin{aligned} M(\theta ,G)= \left( \begin{array}{c} M_1(\theta ,G) \\ M_2(\theta ,G) \end{array} \right) =\left( \begin{array}{c} E(m_1(Y_i,X_i,\theta ,G)) \\ E(m_2(Y_i,X_i,\theta ,G)) \end{array}\right) , \end{aligned}$$

where \(M_j(\theta ,G)=E(m_j(Y_i,X_i,\theta ,G))\) for \(j=1,2\). In particular,

$$\begin{aligned} E[m_1(Y,X,\theta ,G)]=\,& E \left[ X\sum _{k=1}^q (1-\tau _k)\left( 1-G(X^T\beta +b_k|X)\right) \right] \\&-E \left[ X\sum _{k=1}^q \left( 1- F_{T|X}(X^T\beta +b_k|X) \right) \left( 1-G_C(X^T\beta +b_k|X)\right) \right] , \end{aligned}$$

with the analogous scalar expressions for \(E[m_{2k}(Y,X,\theta ,G)]\), \(k=1,\cdots ,q\), and observe that \(M(\theta ^*,G_C)=0\).

We now verify the conditions of Proposition 2 in Birke et al. (2017). First, note that (C.1) trivially holds by construction of our estimator. Next, for \(\theta \in \Theta _\delta\), let \(\Gamma _1(\theta ,G_C)\) denote the ordinary derivative of \(M(\theta ,G_C)\) with respect to \(\theta\), that is,

$$\begin{aligned} \Gamma _1(\theta ,G_C)= \left( \begin{array}{cc} \Gamma _{1,11} (\theta ,G_C) & 0 \\ 0 & \Gamma _{1,22} (\theta ,G_C) \end{array}\right) , \end{aligned}$$

where

$$\begin{aligned} \Gamma _{1,11} (\theta ,G_C)&= E \left[ \frac{\partial m_1(\theta ,G_C)}{\partial \beta }\right] = E \left[ XX^T\sum _{k=1}^q a_k(\theta , G_C )\right] ,\\ \Gamma _{1,22} (\theta ,G_C)&= \text{ diag }\left( \Gamma _{1,22(1)}(\theta ,G_C),\cdots ,\Gamma _{1,22(q)}(\theta ,G_C) \right) ,\\ \Gamma _{1,22(k)}(\theta ,G_C)&= E \left[ \frac{\partial m_{2k}(\theta ,G_C)}{\partial b_k}\right] = E \left[ a_k(\theta , G_C )\right] ,\quad k=1,\cdots ,q, \end{aligned}$$

and

$$\begin{aligned} a_k(\theta , G_C )=\,& f_{T|X}(X^T\beta +b_k|X) \left( 1-G_C(X^T\beta +b_k|X)\right) \\&+ g_{C}(X^T\beta +b_k|X) \left( \tau _k-F_{T|X}(X^T\beta +b_k|X)\right) . \end{aligned}$$
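
For completeness (a derivation of ours; the cited references leave it implicit), \(a_k\) is obtained by differentiating the \(k\)-th summand of \(M(\theta ,G_C)\) with respect to \(t=X^T\beta +b_k\), using \(E[I(Y>t)|X]=(1-G_C(t|X))(1-F_{T|X}(t|X))\) as in the proof of Theorem 1:

$$\begin{aligned} \frac{\partial }{\partial t}\left\{ (1-\tau _k)\left( 1-G_C(t|X)\right) -\left( 1-G_C(t|X)\right) \left( 1-F_{T|X}(t|X)\right) \right\} = f_{T|X}(t|X)\left( 1-G_C(t|X)\right) + g_{C}(t|X)\left( \tau _k-F_{T|X}(t|X)\right) , \end{aligned}$$

where \(g_C\) denotes the density associated with \(G_C\); evaluating at \(t=X^T\beta +b_k\) gives \(a_k(\theta ,G_C)\).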

Under assumptions (C1)-(C4), \(\Gamma _1(\theta ,G_C)\) is then observed to be continuous and of full rank at \(\theta ^*\). Hence, condition (C.2) is satisfied.

For condition (C.3), define first for all \(\theta \in \Theta _\delta\) the functional derivative of \(M(\theta ,G)\) at \(G_C\) in the direction \([G-G_C]\) as

$$\begin{aligned} \Gamma _2(\theta ,G_C)[G-G_C]&= \lim _{\eta \rightarrow 0} \frac{1}{\eta } \left[ M(\theta ,G_C+\eta (G-G_C))-M(\theta ,G_C) \right] \\&= \left( \begin{array}{c} \sum _{k=1}^q (1-\tau _k) E\left[ X\left( G_C(X^T\beta +b_k |X)-G(X^T\beta +b_k|X)\right) \right] \\ (1-\tau _1) E\left[ G_C(X^T\beta +b_1 |X)-G(X^T\beta +b_1|X) \right] \\ \vdots \\ (1-\tau _q) E\left[ G_C(X^T\beta +b_q |X)-G(X^T\beta +b_q|X) \right] \end{array} \right) . \end{aligned}$$

Observe that for all \((\theta ,G) \in \Theta _\delta \times \mathscr {G}_\delta\), \(M(\theta ,G)\) is linear in G since

$$\begin{aligned} M(\theta ,G)-M(\theta ,G_C)- \Gamma _2(\theta ,G_C)[G-G_C]= 0. \end{aligned}$$

This verifies the first part of (C.3). For the second part, we have to show that

$$\begin{aligned} \left\| \Gamma _2(\theta ,G_C)[\hat{G}_C-G_C] - \Gamma _2(\theta ^*,G_C)[\hat{G}_C-G_C] \right\| = O_p(n^{-1/2}). \end{aligned}$$

To prove this, note that by Assumption (C7) we have, uniformly in \(\theta \in \Theta _\delta\),

$$\begin{aligned} \Gamma _2(\theta ,G_C)[\hat{G}_C-G_C] = n^{-1} \sum _{i=1}^n \phi (Y_i,\delta _i,\theta ,X_i)+o_p(n^{-1/2}), \end{aligned}$$

where

$$\begin{aligned} \phi (Y_i,\delta _i,\theta ,X_i)= \left( \begin{array}{c} -\sum _{k=1}^q (1-\tau _k) X_i \xi (Y_i,\delta _i,X_i^T\beta +b_k|X_i) \\ -(1-\tau _1)\xi (Y_i,\delta _i,X_i^T\beta +b_1|X_i) \\ \vdots \\ -(1-\tau _q)\xi (Y_i,\delta _i,X_i^T\beta +b_q|X_i) \end{array} \right) . \end{aligned}$$

By the central limit theorem,

$$\begin{aligned} n^{1/2}\, \Gamma _2(\theta ,G_C)[\hat{G}_C-G_C] {\mathop {\longrightarrow }\limits ^{d}}N(0,V_0), \end{aligned}$$

where \(V_0\) is finite under Assumptions (C1) and (C8). This implies that condition (C.3) in Birke et al. (2017) is satisfied. Next, condition (C.4) is satisfied by Assumption (C6). To establish that condition (C.5) holds as well, we need to verify conditions (3.1)–(3.3) of Theorem 3 in Chen et al. (2003).

Let \(m=m_c+m_{lc}\), where

$$\begin{aligned} m_c(Y_i,X_i,\theta ,G)= \left( \begin{array}{c} \sum _{k=1}^q X_i (1-\tau _k)(1-G(X_i^T\beta +b_k|X_i)) \\ (1-\tau _1) (1-G(X_i^T\beta +b_1|X_i)) \\ \vdots \\ (1-\tau _q)(1-G(X_i^T\beta +b_q|X_i)) \end{array} \right) , \end{aligned}$$

and

$$\begin{aligned} m_{lc}(Y_i,X_i,\theta ,G )= \left( \begin{array}{c} -\sum _{k=1}^q X_i I(Y_i> X_i^T\beta +b_k ) \\ -I(Y_i> X_i^T\beta +b_1 ) \\ \vdots \\ -I(Y_i> X_i^T\beta +b_q ) \end{array} \right) . \end{aligned}$$

Then, condition (3.1) is easily seen to hold for some \(s_j, s_{1j} \in (0,1]\) and \(r=2\) under Assumption (C2). For condition (3.2), since \(m_{lc}\) does not depend on \(G\) here, we first note from the proof of Theorem 3 in Chen et al. (2003) that the constant \(s_j\), which controls the regularity of the nuisance parameter in condition (3.2), may in fact be replaced by the constant \(s_{1j}\) already appearing in condition (3.1). We therefore verify condition (3.2) with respect to \(s_{1j}\) instead of \(s_j\). To that end, first observe that for all positive values \(\epsilon _n=o(1)\),

$$\begin{aligned} \sup _{\theta ' : \Vert \theta -\theta '\Vert \le \epsilon _n} \Big |I(Y>X^T\beta '+b_{k}' )-I(Y>X^T\beta +b_{k} )\Big | \le I(Y>X^T\beta +b_{k} - \epsilon _n \Vert X\Vert )-I(Y>X^T\beta +b_{k}+ \epsilon _n \Vert X\Vert ). \end{aligned}$$

Hence, writing \(H(\cdot |X)\) for the conditional c.d.f. of \(Y\) given \(X\), we have that

$$\begin{aligned} & E\left[ \sup _{\theta ' : \Vert \theta -\theta '\Vert \le \epsilon _n} \left\| X\left( I(Y>X^T\beta '+b_{k}')-I(Y>X^T\beta +b_{k})\right) \right\| ^2\right] \\ &\quad \le E\left[ \Vert X\Vert ^2\left\{ I(Y>X^T\beta +b_{k}-\epsilon _n \Vert X\Vert )-I(Y>X^T\beta +b_{k}+ \epsilon _n \Vert X\Vert )\right\} \right] \\ &\quad = E\left[ \Vert X\Vert ^2 \{H(X^T\beta +b_k+ \epsilon _n\Vert X\Vert \,|X)-H(X^T\beta +b_k-\epsilon _n\Vert X\Vert \,|X)\} \right] \\ &\quad \le E\left[ \Vert X\Vert ^3 K_4 \epsilon _n\right] , \end{aligned}$$

and

$$\begin{aligned} E\left[ \sup _{\theta ': \Vert \theta -\theta '\Vert \le \epsilon _n} \Bigr | I(Y>X^T\beta '+b_{k}' )-I(Y>X^T\beta +b_{k}) \Bigr |^2\right] < E\left[ \Vert X\Vert K_4 \epsilon _n\right] \end{aligned}$$

for some finite constant \(K_4\) under assumptions (C3) and (C4). Hence, provided Assumption (C2) is satisfied, condition (3.2) holds for \(s_{1j} = 1/2\). For the last condition of Theorem 3 in Chen et al. (2003), for \(\epsilon >0\) denote first by \(N(\epsilon ,\mathscr {G}_\delta ,\Vert \cdot \Vert _\mathscr {G})\) the covering number (van der Vaart and Wellner 1996, p. 83) of the class \(\mathscr {G}_\delta\) under the sup-norm metric, with a slight abuse of notation. Now, keeping in mind that \(N(\epsilon ,\mathscr {G}_\delta ,\Vert \cdot \Vert _\mathscr {G}) \le N_{[]}(\epsilon ,\mathscr {G}_\delta ,\Vert \cdot \Vert _\mathscr {G})\), and since all the functions in the class \(\mathscr {G}_\delta\) take values between 0 and 1 by (C4), we first observe that a single \(\epsilon\)-bracket suffices to cover \(\mathscr {G}_\delta\) when \(\epsilon >1\). Then, using Lemma 6.1 in Lopez (2011) to bound the bracketing number for \(\epsilon \le 1\), we have that

$$\begin{aligned} \int _0^\infty \sqrt{\log N(\epsilon ,\mathscr {G}_\delta ,\Vert \cdot \Vert _\mathscr {G}) }\,d\epsilon \le \int _0^1 \sqrt{\log N(\epsilon ,\mathscr {G}_\delta ,\Vert \cdot \Vert _\mathscr {G}) }\,d\epsilon \le K_5 \int _{0}^1 \epsilon ^{-\frac{1}{1+\eta }}d\epsilon < \infty , \end{aligned}$$

for some finite constant \(K_5\), hereby satisfying condition (3.3) in Chen et al. (2003) for \(s_j=1\). It then follows from their Theorem 3 that condition (C.5) in Birke et al. (2017) holds in our context.

Lastly, for condition (C.6) we need to establish that

$$\begin{aligned} n^{1/2}\left\{ M_n(\theta ^*,G_C)+\Gamma _2(\theta ^*,G_C)[\hat{G}_C-G_C]\right\} {\mathop {\longrightarrow }\limits ^{d}}N(0,\varSigma ) \end{aligned}$$

for some positive definite matrix \(\varSigma\). Recalling that

$$\begin{aligned} M_n(\theta ^*,G_C)= n^{-1} \sum _{i=1}^n m(Y_i,X_i,\theta ^*,G_C) \end{aligned}$$

is an average of independent random vectors with mean 0, this follows easily by the same arguments used to verify condition (C.3), applied at the particular case \(\theta =\theta ^*\). Hence, we obtain

$$\begin{aligned} n^{1/2}\left\{ M_n(\theta ^*,G_C)+\Gamma _2(\theta ^*,G_C)[\hat{G}_C-G_C]\right\} {\mathop {\longrightarrow }\limits ^{d}}N(0,\varSigma ), \end{aligned}$$

where \(\varSigma =Cov(\varLambda _i)\) with

$$\begin{aligned} \varLambda _i=m(Y_i,X_i,\theta ^*,G_C)- \phi (Y_i,\delta _i,\theta ^*,X_i) . \end{aligned}$$
(A.1)

Theorem 2 then follows directly from an application of Proposition 2 in Birke et al. (2017):

$$\begin{aligned} n^{1/2}(\hat{\theta }-\theta ^*){\mathop {\longrightarrow }\limits ^{d}}N(0, \Gamma _1^{-1}(\theta ^*,G_C) \varSigma \Gamma _1^{-1}(\theta ^*,G_C)). \end{aligned}$$

Let \(\varSigma _1\) be the top-left \(p \times p\) submatrix of \(\varSigma\); then we have

$$\begin{aligned} n^{1/2} (\hat{\beta }-\beta ^*){\mathop {\longrightarrow }\limits ^{d}}N(0, V), \end{aligned}$$

where \(V=\Gamma _{1,11}^{-1} (\theta ^*,G_C)\varSigma _1\Gamma _{1,11}^{-1}(\theta ^*,G_C)\).
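
In practice, \(V\) can be estimated by plugging consistent estimates of \(\Gamma _{1,11}(\theta ^*,G_C)\) and \(\varSigma _1\) into the sandwich form. A minimal numpy sketch (the function name and inputs are ours; the paper does not prescribe this implementation):

```python
import numpy as np

def sandwich_variance(gamma_hat: np.ndarray, sigma_hat: np.ndarray) -> np.ndarray:
    """Plug-in sandwich estimate V = Gamma^{-1} Sigma Gamma^{-1}.

    gamma_hat : (p, p) plug-in estimate of Gamma_{1,11}(theta*, G_C)
    sigma_hat : (p, p) plug-in estimate of Sigma_1
    Illustrative sketch only; consistent plug-in estimators are assumed.
    """
    gamma_inv = np.linalg.inv(gamma_hat)
    return gamma_inv @ sigma_hat @ gamma_inv
```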

Lemma A.3

Suppose Assumptions (C1)–(C9) hold, and let

$$\begin{aligned} L(\vartheta )= \sum _{i=1}^n\sum _{k=1}^q\left\{ \rho _{\tau _k} (Y_i-b_k-Z_{ki}^Td)-(1-\tau _k) \int _{0}^{b_k+Z_{ki}^Td} \hat{G}_C(s)ds\right\} . \end{aligned}$$

Denote \(\breve{\vartheta }=\arg \min _{\vartheta } L(\vartheta )\). Then, as \(n\rightarrow \infty\), we have

$$\begin{aligned} n^{1/2} ( \breve{\vartheta }-\vartheta ^*) {\mathop {\longrightarrow }\limits ^{d}} N(0,\Gamma _2^{-1} \varSigma _2 \Gamma _2^{-1} ), \end{aligned}$$

where

$$\begin{aligned} \Gamma _2= \left( \begin{array}{cc} \Gamma _{2,11} & 0 \\ 0 & \text{ diag }\{ E(a_k)\}_{1\le k \le q} \end{array}\right) , \qquad \Gamma _{2,11} = E_X \left( Z_{ki}Z_{ki}^T\sum _{k=1}^q a_k \right) , \end{aligned}$$
$$\begin{aligned} a_k=\,& f_{T|X}(b_k^*+Z_{ki}^Td^*|X) \left( 1-G_C(b_k^*+Z_{ki}^Td^*|X)\right) \\&+ g_{C}(b_k^*+Z_{ki}^Td^*|X) \left( \tau _k-F_{T|X}(b_k^*+Z_{ki}^Td^*|X)\right) , \end{aligned}$$

and \(\varSigma _2=\mathrm {Cov}(\varLambda _i)\), where \(\varLambda _i\) takes the same form as in (A.1).

Proof of Lemma A.3: The proof is similar to that of Theorem 2.
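
To fix ideas, the objective \(L(\vartheta )\) of Lemma A.3 can be evaluated numerically as follows. This is a minimal sketch of ours: the function names are illustrative, \(Z_{ki}\) is simplified to not vary with \(k\), and \(\hat{G}_C\) is assumed to be available as a vectorized function, e.g. from a Kaplan–Meier fit of the censoring distribution.

```python
import numpy as np

def check_loss(u, tau):
    # Quantile check function rho_tau(u) = u * (tau - 1{u < 0}).
    return u * (tau - (u < 0.0))

def adapted_cqr_loss(b, d, Y, Z, taus, G_hat, n_grid=200):
    """Adapted composite quantile loss L(vartheta) of Lemma A.3 (a sketch).

    b: (q,) intercepts b_k; d: (p,) common slope; Y: (n,) observed responses;
    Z: (n, p) covariates (taken here not to vary with k, a simplification);
    taus: (q,) quantile levels tau_k; G_hat: vectorized estimate of G_C.
    """
    lin = Z @ d
    total = 0.0
    for k, tau in enumerate(taus):
        t = b[k] + lin                      # upper integration limits b_k + Z'd
        total += check_loss(Y - t, tau).sum()
        for ti in t:
            # Correction term (1 - tau_k) * int_0^{t_i} G_hat(s) ds via the
            # trapezoidal rule; nonpositive limits contribute nothing here.
            s = np.linspace(0.0, max(ti, 0.0), n_grid)
            total -= (1.0 - tau) * np.trapz(G_hat(s), s)
    return total
```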

Proof of Theorem 3: We first prove the sparsity part (a). The parameter \(\vartheta\) can be decomposed as \((\vartheta _{A_1}^T,\vartheta _{A_0}^T)^T\); denote the estimator by \((\hat{\vartheta }_{FAL,A_1}^T,\hat{\vartheta }_{FAL,A_0}^T)^T\) and the true value by \((\vartheta ^{*T}_{A_1},0^T)^T\). Suppose there exists an \(l \in A_0\) such that \(\hat{\vartheta }_{FAL,l}\ne 0\), and let \(\tilde{\vartheta }\) be the vector obtained from \(\hat{\vartheta }_{FAL}\) by replacing \(\hat{\vartheta }_{FAL,l}\) with 0. Then

$$\begin{aligned} L_{FAL}(\hat{\vartheta }_{FAL})-L_{FAL}(\tilde{\vartheta })=\,& \{L(\hat{\vartheta }_{FAL})-L(\tilde{\vartheta })\}+n\lambda _n\tilde{w}_{l}|\hat{\vartheta }_{FAL,l}| \\ =\,& \sum _{i=1}^n\sum _{k=1}^q \{\rho _{\tau _k} (Y_i-\hat{b}_k-Z_{ki}^T\hat{d})-\rho _{\tau _k} (Y_i-\tilde{b}_k-Z_{ki}^T\tilde{d})\} \\ & +\sum _{i=1}^n\sum _{k=1}^q(1-\tau _k)\left\{ \int _{0}^{\tilde{b}_k+Z_{ki}^T\tilde{d}} \hat{G}_C(s)ds-\int _{0}^{\hat{b}_k+Z_{ki}^T\hat{d}} \hat{G}_C(s)ds\right\} \\ & +n\lambda _n\tilde{w}_{l}|\hat{\vartheta }_{FAL,l}| \\ \ge \,& \sum _{i=1}^n\sum _{k=1}^q \{\rho _{\tau _k} (Y_i-\hat{b}_k-Z_{ki}^T\hat{d})-\rho _{\tau _k} (Y_i-\tilde{b}_k-Z_{ki}^T\tilde{d})\} \\ & - \sum _{k=1}^q(1-\tau _k)\, n|\hat{\vartheta }_{FAL,l}| +n\lambda _n\tilde{w}_{l}|\hat{\vartheta }_{FAL,l}|. \end{aligned}$$

Note that for any \(\tau \in (0,1)\), since \(\rho _\tau\) is piecewise linear with slopes \(\tau\) and \(\tau -1\),

$$\begin{aligned} |\rho _\tau (a)-\rho _\tau (b)|\le |a-b|\max \{\tau ,1-\tau \}\le |a-b|. \end{aligned}$$

Then, we have

$$\begin{aligned} & L_{FAL}(\hat{\vartheta }_{FAL})-L_{FAL}(\tilde{\vartheta }) \\ &\quad \ge -\sum _{i=1}^n\sum _{k=1}^q \Vert Z_{ki}\Vert \,|\hat{\vartheta }_{FAL,l}| - \sum _{k=1}^q(1-\tau _k)n |\hat{\vartheta }_{FAL,l}| +n\lambda _n\tilde{w}_{l}|\hat{\vartheta }_{FAL,l}|>0, \end{aligned}$$
(A.2)

where the last inequality holds because \(\sum _{i=1}^n\sum _{k=1}^q \Vert Z_{ki}\Vert =O_p(n)\) and \(\lambda _n\tilde{w}_{l}\ge C_1 n^{r/2}\lambda _n\rightarrow \infty\). Then (A.2) contradicts the fact that \(L_{FAL}(\hat{\vartheta }_{FAL})\le L_{FAL}(\tilde{\vartheta })\), and part (a) is proved.

We next prove (b). By Theorem 3(a), \(P(\hat{\vartheta }_{A_0}=0)\rightarrow 1\) as \(n\rightarrow \infty\). Let \(\bar{A}_1=\{j: d^*_j \ne 0\}\). The objective function \(L_{FAL}(\vartheta )\) is then asymptotically equivalent to \(L_{FAL}(\vartheta _{A_1})\),

$$\begin{aligned} L_{FAL}(\vartheta _{A_1})=\,& \sum _{i=1}^n\sum _{k=1}^q\left\{ \rho _{\tau _k} (Y_i-b_k-Z_{ki,\bar{A}_1}^Td_{\bar{A}_1})-(1-\tau _k) \int _{0}^{b_k+Z_{ki,\bar{A}_1}^Td_{\bar{A}_1}} \hat{G}_C(s)ds\right\} \\ & +n\lambda _n \sum _{l\in \bar{A}_1} w_{l}|d_{l}|. \end{aligned}$$

Define the negative subgradient of \(L_{FAL}(\vartheta _{A_1})\) with respect to \(\vartheta _{A_1}\) as

$$\begin{aligned} M_{AL}(\vartheta _{A_1},\hat{G}_C)=n^{-1} \sum _{i=1}^n m_{AL}(Y_i,X_i,\vartheta _{A_1},\hat{G}_C), \end{aligned}$$

where

$$\begin{aligned} m_{AL}(Y_i,X_i,\vartheta _{A_1},\hat{G}_C)&= \left( \begin{array}{c} m_{AL,1}(Y_i,X_i,\vartheta _{A_1},{G}_C) \\ m_{AL,2}(Y_i,X_i,\vartheta _{A_1},{G}_C) \end{array}\right) ,\\ m_{AL,1}(Y_i,X_i,\vartheta _{A_1},{G}_C) &= Z_{ki,\bar{A}_1} \sum _{k=1}^q\left\{ (1-\tau _k)(1-{G}_C(b_k+Z_{ki,\bar{A}_1}^Td_{\bar{A}_1}|X_i))-I(Y_i>b_k+Z_{ki,\bar{A}_1}^Td_{\bar{A}_1}) \right\} \\ &\quad -n\lambda _n \sum _{l\in \bar{A}_1} w_{l}\, \text{ sgn }(d_{l}),\\ m_{AL,2}(Y_i,X_i,\vartheta _{A_1},{G}_C)&= \left( \begin{array}{c} m_{AL,21}(Y_i,X_i,\vartheta _{A_1},{G}_C)\\ \vdots \\ m_{AL,2q}(Y_i,X_i,\vartheta _{A_1},{G}_C) \end{array}\right) , \end{aligned}$$

and \(m_{AL,2k}(Y_i,X_i,\vartheta _{A_1},{G}_C)=(1-\tau _k)\left( 1-{G}_C(b_k+Z_{ki,\bar{A}_1}^Td_{\bar{A}_1}|X_i)\right) -I(Y_i>b_k+Z_{ki,\bar{A}_1}^Td_{\bar{A}_1})\) for \(k=1,\cdots ,q\). For \(l\in \bar{A}_1\), we have \(\hat{d}_l \ne 0\), and \(w_{l}=|\breve{d}_l|^{-r}\) is bounded. Since \(n^{1/2}\lambda _n\rightarrow 0\), we have \(\lambda _n \sum _{l\in \bar{A}_1} w_{l}|d_{l}|=o_p(n^{-1/2})\). Similar to the proof of Theorem 2, we can then establish the asymptotic normality of \(\hat{\vartheta }_{A_1}\): \(n^{1/2}(\hat{\vartheta }_{A_1}-\vartheta ^*_{A_1}){\mathop {\longrightarrow }\limits ^{d}}N(0, \varSigma _{A_1})\), where \(\varSigma _{A_1}=\left( \Gamma _2^{-1} \varSigma _2 \Gamma _2^{-1}\right) _{A_1\times A_1}\) is the submatrix of the covariance in Lemma A.3 associated with the index set \(A_1\).
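
Finally, to make the role of the adaptive weights \(w_l=|\breve{d}_l|^{-r}\) concrete, here is a minimal sketch of the fused adaptive lasso objective, reusing adapted_cqr_loss from the sketch above. The names are ours, and the penalty is written on the slope components for illustration, whereas the paper penalizes the components of \(\vartheta\) encoding interquantile differences.

```python
import numpy as np

def fal_objective(b, d, Y, Z, taus, G_hat, pilot_d, lam, r=1.0):
    """L_FAL = L(vartheta) + n * lambda_n * sum_l w_l |d_l|, with adaptive
    weights w_l = |pilot_d_l|^(-r) built from the unpenalized pilot estimate
    of Lemma A.3. Illustrative sketch, not the paper's implementation."""
    n = len(Y)
    w = np.abs(pilot_d) ** (-r)                # adaptive weights from pilot fit
    penalty = n * lam * np.sum(w * np.abs(d))  # shrinks small pilot effects to 0
    return adapted_cqr_loss(b, d, Y, Z, taus, G_hat) + penalty
```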

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yuan, X., Zhang, X., Guo, W. et al. An adapted loss function for composite quantile regression with censored data. Comput Stat 39, 1371–1401 (2024). https://doi.org/10.1007/s00180-023-01352-6

