Abstract
This paper investigates an adapted loss function for estimating a linear regression model with right-censored responses. The adapted loss can be used in composite quantile regression, which handles responses with a high censoring rate well. Under regularity conditions, we establish the consistency and asymptotic normality of the resulting estimator. To compute the regression parameter estimates, we propose the MMCD algorithm, which produces satisfactory results for the proposed estimator. The algorithm also extends to a fused adaptive lasso penalized method for identifying interquantile commonality. The finite-sample performance of the methods is illustrated by numerical results and the analysis of two real datasets.
References
Bang H, Tsiatis AA (2002) Median regression with censored cost data. Biometrics 58(3):643–649
Birke M, Van Bellegem S, Van Keilegom I (2017) Semi-parametric estimation in a single-index model with endogenous variables. Scand J Stat 44(1):168–191
Chen X, Linton O, Van Keilegom I (2003) Estimation of semiparametric models when the criterion function is not smooth. Econometrica 71(5):1591–1608
Delsol L, Van Keilegom I (2020) Semiparametric M-estimation with non-smooth criterion functions. Ann Inst Stat Math 72(2):577–605
De Backer M, Ghouch AE, Van Keilegom I (2019) An adapted loss function for censored quantile regression. J Am Stat Assoc 114(527):1126–1137
Hunter DR, Lange K (2000) Quantile regression via an MM algorithm. J Comput Graph Stat 9(1):60–77
Hyde J (1980) Testing survival with incomplete observations. In: Biostatistics casebook, pp 31–46
Jiang L, Wang HJ, Bondell HD (2013) Interquantile shrinkage in regression models. J Comput Graph Stat 22(4):970–986
Jiang L, Bondell HD, Wang HJ (2014) Interquantile shrinkage and variable selection in quantile regression. Comput Stat Data An 69:208–219
Jiang R, Qian W, Zhou Z (2012) Variable selection and coefficient estimation via composite quantile regression with randomly censored data. Stat Probabil Lett 82(2):308–317
Jiang R, Hu X, Yu K (2018) Composite quantile regression for massive datasets. Statistics 52(5):980–1004
Koenker R (2015) Quantile regression. Cambridge University Press, New York
Koenker R, Bilias Y (2002) Quantile regression for duration data: a reappraisal of the Pennsylvania reemployment bonus experiments. In: Economic applications of quantile regression. Physica, Heidelberg
Koenker R, Geling O (2001) Reappraising medfly longevity: a quantile regression survival analysis. J Am Stat Assoc 96(454):458–468
Leng C, Tong X (2013) A quantile regression estimator for censored data. Bernoulli 19(1):344–361
Li KC, Wang JL, Chen CH (1999) Dimension reduction for censored regression data. Ann Stat 27:1–23
Lopez O (2011) Nonparametric estimation of the multivariate distribution function in a censored regression model with applications. Commun Stat-Theor M 40(15):2639–2660
Pohar M, Stare J (2006) Relative survival analysis in R. Comput Methods Programs Biomed 81(3):272–278
Portnoy S (2003) Censored regression quantiles. J Am Stat Assoc 98(464):1001–1012
Powell J (1986) Censored regression quantiles. J Econom 32:143–155
Stigler S (1984) Boscovich, Simpson and a 1760 manuscript note on fitting a linear relation. Biometrika 71:615–620
Sun J, Ma Y (2017) Empirical likelihood weighted composite quantile regression with partially missing covariates. J Nonparametr Stat 29(1):137–150
Tang Y, Wang HJ (2015) Penalized regression across multiple quantiles under random censoring. J Multivar Anal 141:132–146
van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer, New York
Wang HJ, Zhou J, Li Y (2013) Variable selection for censored quantile regression. Stat Sin 23(1):145–167
Wang HJ, Wang L (2009) Locally weighted censored quantile regression. J Am Stat Assoc 104(487):1117–1128
Wey A, Wang L, Rudser K (2014) Censored quantile regression with recursive partitioning-based weights. Biostatistics 15(1):170–181
Yuan X, Li Y, Dong X, Liu T (2022) Optimal subsampling for composite quantile regression in big data. Stat Pap 63(5):1649–1676
Ying Z, Jung SH, Wei LJ (1995) Survival analysis with median regression models. J Am Stat Assoc 90(429):178–184
Zou H, Yuan M (2008) Composite quantile regression and the oracle model selection theory. Ann Stat 36(3):1108–1126
Acknowledgements
We are grateful to the two reviewers and the associate editor for a number of constructive and helpful comments and suggestions that have clearly improved our manuscript.
Appendix
The assumptions required for the proofs of the theorems are as follows:
- (C1) \((Y_i,X_i,\delta _i),i=1,\cdots , n\) are an i.i.d. multivariate random sample, and the censoring time \(C_i\) is conditionally independent of the survival time \(Y^*_i\) given the covariates \(X_i\).
- (C2) The support \(\text{ supp }(X)\) of X is contained in a compact subset of \(R^p\), and Var(X) is positive definite.
- (C3) Let \(f_{Y^*|X}(\cdot |x)\) denote the conditional density function of \(Y^*\) given \(X=x\). For \(\theta\) in a neighborhood \(\Theta = B \times \Delta _1 \times \cdots \times \Delta _q\) of \(\theta ^*\), \(\inf _{k}\inf _{\beta \in B,b_k\in \Delta _k } \inf _{x \in \text{ supp }(X)} f_{Y^*|X}(x^T\beta +b_k|x) > 0\); furthermore, \(\sup _{t,x} f_{Y^*|X}(t|x)<\infty\).
- (C4) Define the (possibly infinite) time \(\tau _{x}= \inf \{t: F_{Y|X}(\cdot |x)=1\}\), where \(F_{Y|X}\) denotes the conditional c.d.f. of Y given X. Suppose first that there exists a real number \(\upsilon < \tau _{x}\) for all x. Denote next by \(\mathscr {G}\) the class of functions \(G(t,x): (-\infty ,\upsilon ]\times \text{ supp }(X) \rightarrow [0,1]\) of bounded variation with respect to t (uniformly in x) that have first-order partial derivatives with respect to x of bounded variation in t (uniformly in x), and bounded (uniformly in t) second-order partial derivatives with respect to x which are, uniformly in t, Lipschitz of order \(\eta\) for some \(0<\eta <1\). Suppose that \(G_C \in \mathscr {G}\). Moreover, \(\sup _{y } \big |G_C(y|x)-G_C(y|x') \big |=O_p(\Vert x-x'\Vert )\).
- (C5) For \(x \in \text{ supp }(X)\) and for \(\beta \in B\), \(b_k \in \Delta _k\), \(k=1,\cdots , q\), the point \(x^T\beta +b_k\) lies below \(\upsilon\).
- (C6) \(\sup _{x\in \text{ supp }(X)}\sup _{y\le \upsilon } \big |\hat{G}_C(y|x)-G_C(y|x) \big |=o_p(1)\), and \(P(\hat{G}_C\in \mathscr {G})\rightarrow 1\) as \(n\rightarrow \infty\).
- (C7) \(E\left[ X\left( \hat{G}_C(b_k+X^T\beta |X)-G_C(b_k+X^T\beta |X)\right) \right] =n^{-1} \sum _{i=1}^n X_i \xi (X_i,Y_i,\delta _i,\theta |X_i)+o_p(n^{-1/2})\), uniformly in \(\theta \in \Theta\), where \(\xi (X_i,Y_i,\delta _i,\theta |X_i)\), \(i=1,\cdots , n\) are i.i.d. with \(E\xi (X_i,Y_i,\delta _i,\theta |X_i)=0\) and \(\sup _{\theta \in \Theta }E[\Vert \xi (X_i,Y_i,\delta _i,\theta |X_i)\Vert ^2 ]<\infty\).
- (C8) The kernel function \(K(\cdot )>0\) is compactly supported and Lipschitz continuous of order 1. Furthermore, \(\int K(u)du=1\), \(\int uK(u)du=0\), and \(\int K^2(u)du < \infty\). The bandwidth \(h_n\) satisfies \(h_n=O(n^{-\upsilon })\) for \(1/4<\upsilon <1/3\).
- (C9) For each \(j=1,\cdots ,D\), there exists an EDR direction \(\gamma _{0j}\in R^p\) such that \(\hat{\gamma }_j-\gamma _{0j}=O_p(n^{-1/2})\) and \(n^{-1/2}(\hat{\gamma }_j-\gamma _{0j})=n^{-1}\sum _{i=1}^n d_{ki}\), where the \(d_{ki}\) are independent \(p\)-dimensional vectors with mean zero and finite variances.
Assumption (C4) defines a general class of functions embedding \(G_C\), coming from the work of Lopez (2011). Assumption (C5) is required for the asymptotic properties of \(\hat{\beta }\). Assumptions (C6) and (C7) are required for the asymptotic distribution of our estimator and implicitly demand a general linear representation of \(\hat{G}_C\). Assumption (C7) can be deduced from Assumption (C8) by Lemma A.1. Assumption (C9) states the \(\sqrt{n}\) consistency of the estimated EDR directions and the linear representation of \(\hat{\gamma }_j\), which are needed to establish the normality of \(\hat{\beta }\). These conditions hold for the sliced inverse regression estimator of Li et al. (1999).
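Assumptions (C6)-(C8) are consistent with \(\hat{G}_C\) being a kernel-smoothed conditional Kaplan–Meier (Beran) estimator of the censoring distribution, as in Lopez (2011) and De Backer et al. (2019). The paper does not reproduce the estimator here, so the following is a hedged one-covariate sketch under that assumption (Epanechnikov kernel; function and variable names are ours):

```python
import numpy as np

def beran_censoring_cdf(t, x0, X, Y, delta, h):
    # Sketch of a Beran-type estimate of G_C(t | x0), the conditional c.d.f.
    # of the censoring time, from observed times Y, censoring indicators
    # delta (1 = event, 0 = censored), scalar covariates X, bandwidth h.
    u = (X - x0) / h
    # Epanechnikov kernel weights, then Nadaraya-Watson normalization
    w = np.where(np.abs(u) <= 1, 0.75 * (1.0 - u ** 2), 0.0)
    if w.sum() == 0:
        return 0.0
    w = w / w.sum()
    order = np.argsort(Y)
    Ys, ds, ws = Y[order], delta[order], w[order]
    cum_w = np.concatenate(([0.0], np.cumsum(ws)))  # mass strictly before i
    surv = 1.0
    for i in range(len(Ys)):
        if Ys[i] > t:
            break
        at_risk = 1.0 - cum_w[i]  # total weight with Y_j >= Y_i
        if ds[i] == 0 and at_risk > 1e-12:  # censoring plays the "event" role
            surv *= 1.0 - ws[i] / at_risk
    return 1.0 - surv
```

With no censoring the estimate is identically zero; with all observations censored it approaches one past the largest observed time.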
We list a preliminary lemma which is also used in the proofs of the main results.
Lemma A.1
Suppose Assumptions (C3), (C4) and (C8) hold. Then, uniformly in \(\theta \in \Theta\),
and
where \(\xi (X_i,Y_i,\delta _i,t|X_i)=(1-G_C(t|X_i))\left[ \int _0^{Y_i\wedge t}\frac{-d H_0(s|X_i)}{\{ 1-F_{Y|X}(s|X_i)\}^2} + \frac{(1-\delta _i)I(Y_i\le t)}{1-F_{Y|X}(Y_i|X_i)}\right]\), \(H_0(t|x)=P(Y\le t,\delta =0|X=x)=\int _{0}^t (1-F_{Y|X}(s|x))dG(s|x)\).
Proof of Lemma A.1 See Lemma 1 in De Backer et al. (2019).
Lemma A.2
Suppose Assumptions (C1)-(C9) hold. Then for any \(D\ge 1\), we have
Proof of Lemma A.2 See Lemma A.1 in Wang et al. (2013).
Proof of Theorem 1 We use Theorem 1 of Delsol and Van Keilegom (2020) (hereafter DVK) on the consistency of M-estimators, which depends on their conditions (A1)-(A5). We thus need to verify conditions (A1)-(A5) to prove the consistency of \(\hat{\beta }\).
Condition (A1) in DVK is satisfied by construction of \(\hat{\beta }\), and condition (A3) is satisfied given Assumption (C6). We therefore need to verify conditions (A2), (A4) and (A5).
Condition (A2) ensures the uniqueness of \(\theta\). We need to verify that for any \(\epsilon >0\),
Using the definition of \(M_{ni}\), we have
By the conditional expectation, we have
Therefore,
the latter expectation is positive, whereby condition (A2) is satisfied.
Next, for (A4) to hold, it suffices by Remark 1(ii) in DVK and assumption (C3) to show that the class
is Glivenko-Cantelli. For this, by Theorem 2.4.1 in van der Vaart and Wellner (1996), we need to prove that for all \(\epsilon >0\), the \(\epsilon\)-bracketing number \(N_{[]}(\epsilon ,\mathscr {F}, L_1(P))\) of the class \(\mathscr {F}\) is finite, where P is the probability measure of (Y, X). Let
and
we have \(M_{ni} (\theta , G_C)= \sum _{k=1}^q \psi _{\tau _k}(X,Y,\beta ,b_k)\). From this decomposition, it is easy to see that
By the proof of Theorem 3.1 in De Backer et al. (2019), we have
for some \(K>0\). It follows that condition (A4) holds.
Lastly, for condition (A5), we need to establish that
Note that
Under assumption (C5), this expression can be bounded above by
which converges to 0 when \(d_{\mathscr {G}}(G,G_C)\rightarrow 0\), provided assumption (C4) holds. Thus condition (A5) holds.
Hence the assumptions of Theorem 1 in DVK are met. The weak consistency of \(\hat{\beta }\) follows. Theorem 1 is then proved.
Proof of Theorem 2 Given that \(\hat{\beta }\) is shown to be weakly consistent in Theorem 1 and that \(\hat{G}_C\) is assumed by (C6) to be a uniformly consistent estimator of \(G_C\), to prove that our proposed estimator is asymptotically normally distributed we now restrict the spaces \(\Theta\) and \(\mathscr {G}\) to shrinking neighborhoods around the true \(\theta ^*\) and \(G_C\). Define the spaces \(\Theta _\delta =\{\theta \in \Theta :\Vert \theta - \theta ^*\Vert \le \delta _n \}\) and \(\mathscr {G}_\delta =\{G\in \mathscr {G}: d(G,G_C) \le \delta _n \}\) for some \(\delta _n=o(1)\).
We use the results of Proposition 2 in Birke et al. (2017) to establish the asymptotic normality of our proposed estimator. Thus we need to verify conditions (C.1)-(C.6) of Proposition 2 in Birke et al. (2017). Define
where
\(m_1(Y_i,X_i,\theta ,G)=X_i\sum _{k=1}^q\left\{ (1-\tau _k)(1-G(X^T_i\beta +b_k|R_i))-I(Y_i>X^T_i\beta +b_k) \right\} ,\)
and \(m_{2k}(Y_i,X_i,\theta ,G)=(1-\tau _k)\left( 1-G(X^T_i\beta +b_k|R_i)\right) -I(Y_i>X^T_i\beta +b_k)\), \(k=1,\cdots ,q\). Furthermore, let
where \(M_j(\theta ,G)=E(m_j(Y_i,X_i,\theta ,G))\), \(j=1,2\) and
and observe that \(M(\theta ^*,G_C)=0\).
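To make the estimating function concrete, a per-observation evaluation of \(m_1\) might look as follows. This is a sketch only: the conditional censoring c.d.f. is passed as a callable, and we condition on \(X_i\) in place of \(R_i\).

```python
import numpy as np

def m1(Yi, Xi, beta, b, taus, G):
    # m_1(Y_i, X_i, theta, G) = X_i * sum_k { (1 - tau_k) * (1 - G(X_i'beta + b_k | X_i))
    #                                         - I(Y_i > X_i'beta + b_k) }
    s = 0.0
    for tau, bk in zip(taus, b):
        t = float(Xi @ beta) + bk
        s += (1.0 - tau) * (1.0 - G(t, Xi)) - float(Yi > t)
    return Xi * s
```

Setting the sample average of this component (together with the \(m_{2k}\) components) to zero characterizes the estimator, consistent with \(M(\theta ^*,G_C)=0\) above.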
We now verify the conditions of Proposition 2 in Birke et al. (2017). First, note that (C.1) trivially holds by construction of our estimator. Next, for \(\theta \in \Theta _\delta\), let \(\Gamma _1(\theta ,G_C)\) denote the ordinary derivative of \(M(\theta ,G)\) with respect to \(\theta\), that is,
where
and
Under assumptions (C1)-(C4), \(\Gamma _1(\theta ,G_C)\) is then observed to be continuous and of full rank at \(\theta ^*\). Hence, condition (C.2) is satisfied.
For condition (C.3), define first for all \(\theta \in \Theta _\delta\) the functional derivative of \(M(\theta ,G)\) at \(G_C\) in the direction \([G-G_C]\) as
Observe that for all \((\theta ,G) \in \Theta _\delta \times \mathscr {G}_\delta\), \(M(\theta ,G)\) is linear in G since
This verifies the first part of (C.3). For the second part, we have to show that
To prove this, by Assumption (C7), we have uniformly in \(\theta \in \Theta _\delta\) that
where
By the central limit theorem,
where V is finite under Assumptions (C1) and (C8). This implies that condition (C.3) in Birke et al. (2017) is satisfied. Next, condition (C.4) in Birke et al. (2017) is satisfied by Assumption (C6). To establish that condition (C.5) in Birke et al. (2017) holds as well, we need to verify the conditions (3.1-3.3) of Theorem 3 in Chen et al. (2003).
Let \(m=m_c+m_{lc}\), where
and
Then, condition (3.1) is easily seen to hold for some \(s_j\), \(s_{1j} \in (0,1]\) and \(r=2\) under Assumption (C2). For condition (3.2), since \(m_{lc}\) does not depend on G here, we first note via the proof of Theorem 3 in Chen et al. (2003) that the constant \(s_j\), controlling the regularity of the nuisance parameter and appearing in condition (3.2), may in fact be replaced by the constant \(s_{1j}\) already appearing in condition (3.1). Therefore, we verify condition (3.2) with respect to \(s_{1j}\) instead of \(s_j\) as initially stated in Chen et al. (2003). To that end, first observe that, for all positive values \(\epsilon _n=o(1)\),
Hence, we have that
and
for some finite constant \(K_4\) under Assumptions (C3) and (C4). Hence, provided Assumption (C2) is satisfied, we may observe that condition (3.2) holds for \(s_{1j} = 1/2\). For the last condition of Theorem 3 in Chen et al. (2003), for \(\epsilon >0\) denote first by \(N(\epsilon ,\mathscr {G}_\delta ,\Vert \cdot \Vert _\mathscr {G})\) the covering number (van der Vaart and Wellner 1996, p. 83) of the class \(\mathscr {G}_\delta\) under the sup-norm metric that, with a slight abuse of notation, we consider on the latter. Now, keeping in mind that \(N(\epsilon ,\mathscr {G}_\delta ,\Vert \cdot \Vert _\mathscr {G}) \le N_{[]}(\epsilon ,\mathscr {G}_\delta ,\Vert \cdot \Vert _\mathscr {G})\), and since all the functions in the class \(\mathscr {G}_\delta\) take values between 0 and 1 by (C4), we first observe that only one \(\epsilon\)-bracket suffices to cover \(\mathscr {G}_\delta\) if \(\epsilon >1\). Then, using Lemma 6.1 in Lopez (2011) for a bound on the bracketing number for the case \(\epsilon \le 1\), we have that
for some finite constant \(K_5\), thereby satisfying condition (3.3) in Chen et al. (2003) for \(s_j=1\). It then follows from their Theorem 3 that condition (C.5) in Birke et al. (2017) holds in our context.
Lastly, for condition (C.6) we need to establish that
for some positive definite matrix \(\varSigma\). Recalling that
is the average of independent random vectors with mean 0, this follows easily by the same arguments as in the verification of condition (C.3), applied to the particular case \(\theta =\theta ^*\). Hence, we obtain
where \(\varSigma =Cov(\varLambda _i)\) with
Theorem 2 then follows directly from an application of Proposition 2 in Birke et al. (2017).
Let \(\varSigma _1\) be the top left-hand \(p \times p\) submatrix of \(\varSigma\), then we have
where \(V=\Gamma _{1,11}^{-1} (\theta ^*,G_C)\varSigma _1\Gamma _{1,11}^{-1}(\theta ^*,G_C)\).
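The limiting covariance has the usual sandwich form. Given consistent estimates of \(\Gamma _{1,11}(\theta ^*,G_C)\) and \(\varSigma _1\) (how to estimate them is not shown here), it could be assembled as in this sketch:

```python
import numpy as np

def sandwich_variance(Gamma11, Sigma1):
    # V = Gamma11^{-1} @ Sigma1 @ Gamma11^{-1}
    Ginv = np.linalg.inv(Gamma11)
    return Ginv @ Sigma1 @ Ginv
```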
Lemma A.3
Suppose Assumptions (C1)-(C9) in the Appendix hold. Let
Denote \(\hat{\vartheta }=\arg \min L(\vartheta )\). As \(n\rightarrow \infty\), we have
where
and \(\varSigma _2=Cov(\varLambda _i)\), where \(\varLambda _i\) has a form similar to that in (A.1).
Proof of Lemma A.3 Similar to the proof of Theorem 2.
Proof of Theorem 3 We first prove the sparsity. The parameter \(\vartheta\) can be decomposed as \((\vartheta _{A_1}^T,\vartheta _{A_0}^T)^T\). Denote the estimator by \((\hat{\vartheta }_{FAL,A_1}^T,\hat{\vartheta }_{FAL,A_0}^T)^T\) and the true value by \((\vartheta ^{*T}_{A_1},0^T)^T\). Suppose there exists an \(l \in A_0\) such that \(\hat{\vartheta }_{FAL,l}\ne 0\). Let \(\tilde{\vartheta }\) be the vector constructed by replacing \(\hat{\vartheta }_{FAL,l}\) with 0 in \(\hat{\vartheta }_{FAL}\).
Note that for any \(\tau \in (0,1)\),
Then, we have
where the last inequality holds since \(\sum _{i=1}^n\sum _{k=1}^q \Vert Z_{ki}\Vert =O_p(n)\) and \(\lambda _n\tilde{w}_{l}\ge C_1 n^{r/2}\lambda _n\rightarrow \infty\). Then (A.2) contradicts the fact that \(L_{FAL}(\hat{\vartheta }_{FAL})\le L_{FAL}(\tilde{\vartheta })\). Part (a) is thus proved.
We next prove (b). By Theorem 3(a), as \(n\rightarrow \infty\), \(P(\hat{\vartheta }_{A_0}=0)\rightarrow 1\). Let \(\bar{A}_1=\{j: d^*_j \ne 0\}\). The objective function \(L_{AL}(\vartheta )\) is then asymptotically equivalent to \(L_{AL}(\vartheta _{A_1})\).
Define the negative subgradient function of \(L_{AL}(\vartheta _{A_1})\) with respect to \(\vartheta _{A_1}\) as
where
and \(m_{AL,2k}(Y_i,X_i,\vartheta _{A_1},{G}_C)=(1-\tau _k)\left( 1-{G}_C(b_k+Z_{ki,\bar{A}_1}^Td_{\bar{A}_1}|X_i)\right) -I(Y_i>b_k+Z_{ki,\bar{A}_1}^Td_{\bar{A}_1})\) for \(k=1,\cdots ,q\). For \(l\in \bar{A}_1\), we have \(\hat{d}_l \ne 0\), so \(w_{l}=|\breve{d}_l|^{-r}\) is bounded. Since \(n^{1/2}\lambda _n\rightarrow 0\), we have \(\lambda _n \sum _{l\in \bar{A}_1} w_{l}|d_{l}|=o_p(n^{-1/2})\). Similar to the proof of Theorem 2, we can then establish the asymptotic normality of \(\hat{\vartheta }_{A_1}\): \(n^{1/2}(\hat{\vartheta }_{A_1}-\vartheta ^*_{A_1}){\mathop {\longrightarrow }\limits ^{d}}N(0, \varSigma _{A_1})\), with \(\varSigma _{A_1}=\left( \Gamma _2^{-1} \varSigma _2 \Gamma _2^{-1}\right) _{A_1\times A_1}\), where \(\Gamma _2\) and \(\varSigma _2\) are as defined in Lemma A.3 and the subscript restricts to the index set \(A_1\).
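The sparsity argument above hinges on the adaptive weights \(w_l=|\breve{d}_l|^{-r}\) diverging for truly-zero coefficients while staying bounded on \(\bar{A}_1\). A minimal sketch (function names are ours; `d_init` stands for the initial \(\sqrt{n}\)-consistent estimate \(\breve{d}\)):

```python
import numpy as np

def adaptive_weights(d_init, r=1.0, eps=1e-10):
    # w_l = |d_init_l|^{-r}: near-zero initial estimates receive very large
    # weights, which drives the penalized estimates of those coordinates to 0
    # (eps guards against division by an exact zero)
    return 1.0 / (np.abs(d_init) + eps) ** r

def al_penalty(d, lam, w):
    # adaptive lasso penalty term: lam * sum_l w_l * |d_l|
    return lam * np.sum(w * np.abs(d))
```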
Cite this article
Yuan, X., Zhang, X., Guo, W. et al. An adapted loss function for composite quantile regression with censored data. Comput Stat 39, 1371–1401 (2024). https://doi.org/10.1007/s00180-023-01352-6