Robust optimal subsampling based on weighted asymmetric least squares

  • Regular Article
  • Published in: Statistical Papers

Abstract

With the development of contemporary science, massive generated datasets often contain heterogeneity and outliers in the response and/or covariates. Subsampling is an effective method to overcome the limitation of computational resources. However, when the data contain heterogeneity and outliers, poorly chosen subsampling probabilities may select inferior subdata, and statistical inference based on such subdata can perform far worse. Combining asymmetric least squares with \(L_2\) estimation, this paper proposes a double-robustness framework (DRF), which can simultaneously tackle heterogeneity and outliers in the response and/or covariates. Poisson subsampling is implemented based on the DRF for massive data, and more robust subsampling probabilities are derived to select the subdata. Under some regularity conditions, we establish the asymptotic properties of the subsampling estimator based on the DRF. Numerical studies and real data demonstrate the effectiveness of the proposed method.
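As a point of reference, the asymmetric squared error (expectile) loss \(\rho _{\tau }\) underlying weighted asymmetric least squares can be sketched in a few lines. This is a generic illustration of the loss, not the authors' code; the function name `rho_tau` is ours.

```python
def rho_tau(u, tau):
    """Asymmetric squared error loss: rho_tau(u) = |tau - 1(u <= 0)| * u**2.

    For tau = 0.5 this is half the ordinary squared error, so its
    minimizer is the mean; other tau values target expectiles.
    """
    weight = tau if u > 0 else 1.0 - tau
    return weight * u * u

# Residuals above zero are weighted by tau, those below by 1 - tau,
# so tau > 0.5 penalizes positive residuals more heavily.
losses = [rho_tau(u, 0.9) for u in (-1.0, 1.0)]
```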



Acknowledgements

The authors would like to thank the Editor and two referees for the constructive suggestions that led to a significant improvement of the article. This research is supported in part by the National Natural Science Foundation of China (12171277, 12271294, 12071248).

Corresponding author

Correspondence to Mingqiu Wang.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Technical details

Proof of Proposition 1

Direct calculation yields

$$\begin{aligned}&\int _{-\infty }^{+\infty } f(u)^{2} du\\&\quad =\int _{-\infty }^{+\infty } \frac{4 \tau (1-\tau )}{\pi \sigma ^{2}(\sqrt{\tau }+\sqrt{1-\tau })^{2}} \exp \left\{ -2 \rho _{\tau }\left( \frac{u-\mu _{\tau }}{\sigma }\right) \right\} du \\&\quad =\frac{4 \tau (1-\tau )}{\pi \sigma ^{2}(1+2 \sqrt{\tau (1-\tau )})} \int _{-\infty }^{+\infty } \exp \left\{ -2|\tau -\mathbb {1}\left( u \le \mu _{\tau }\right) |\frac{\left( u-\mu _{\tau }\right) ^{2}}{\sigma ^{2}}\right\} du\\&\quad =\frac{4 \tau (1-\tau )}{\pi \sigma ^{2}(1+2 \sqrt{\tau (1-\tau )})}\left[ \int _{\mu _{\tau }}^{+\infty } \exp \left\{ -2 \tau \left( \frac{u-\mu _{\tau }}{\sigma }\right) ^{2}\right\} du\right. \\&\qquad +\left. \int _{-\infty }^{\mu _{\tau }} \exp \left\{ -2(1-\tau ) \left( \frac{u-\mu _{\tau }}{\sigma }\right) ^{2}\right\} du\right] \\&\quad =\frac{4 \tau (1-\tau )}{\pi \sigma ^{2}(1+2 \sqrt{\tau (1-\tau )})}\times \left( \frac{\sigma \sqrt{\pi }}{2\sqrt{2\tau }}+\frac{\sigma \sqrt{\pi }}{2\sqrt{2(1-\tau )}} \right) \\&\quad =\frac{\sqrt{2(1-\tau )\tau }}{\sigma \sqrt{\pi }(\sqrt{\tau }+\sqrt{1-\tau })} \end{aligned}$$

and

$$\begin{aligned}&\int _{-\infty }^{+\infty } f(u)^{2}du-\frac{2}{N}\sum _{i=1}^{N}f(u_i)\\&\quad =\frac{\sqrt{2(1-\tau )\tau }}{\sigma \sqrt{\pi }(\sqrt{\tau }+\sqrt{1-\tau })}\\&\qquad -\frac{2}{N}\sum _{i=1}^{N}\frac{2}{\sqrt{\pi \sigma ^2}}\frac{\sqrt{\tau (1-\tau )}}{\sqrt{\tau }+\sqrt{1-\tau }}\exp \left\{ -|\tau -\mathbb {1}\left( u_{i} \le \mu _{\tau }\right) |\left( \frac{u_i-\mu _{\tau }}{\sigma } \right) ^{2}\right\} \\&\quad =\frac{1}{N}\sum _{i=1}^{N} \frac{\sqrt{2(1-\tau )\tau }}{\sigma \sqrt{\pi }(\sqrt{\tau }+\sqrt{1-\tau })}\left[ 1-2\sqrt{2}\exp \left\{ -|\tau -\mathbb {1}(u_i\le \mu _{\tau }) |\left( \frac{u_i-\mu _{\tau }}{\sigma }\right) ^2 \right\} \right] . \end{aligned}$$

The result can be derived. \(\square \)
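The closed form of \(\int f(u)^2\,du\) above can be checked numerically. The sketch below assumes f is the asymmetric normal density written out in the first display of the proof, with \(\rho _{\tau }(u)=|\tau -\mathbb {1}(u\le 0)|u^2\); the quadrature grid and the parameter values are our own choices.

```python
import math

def f(u, tau, mu, sigma):
    # Asymmetric normal density from the first display of the proof.
    c = (2.0 / math.sqrt(math.pi * sigma ** 2)
         * math.sqrt(tau * (1.0 - tau)) / (math.sqrt(tau) + math.sqrt(1.0 - tau)))
    w = abs(tau - (1.0 if u <= mu else 0.0))
    return c * math.exp(-w * ((u - mu) / sigma) ** 2)

def int_f_squared(tau, mu, sigma, lo=-60.0, hi=60.0, steps=120000):
    # Plain trapezoidal quadrature of f(u)^2 over a wide interval.
    h = (hi - lo) / steps
    total = 0.5 * (f(lo, tau, mu, sigma) ** 2 + f(hi, tau, mu, sigma) ** 2)
    for k in range(1, steps):
        total += f(lo + k * h, tau, mu, sigma) ** 2
    return total * h

tau, mu, sigma = 0.3, 1.0, 2.0
closed_form = (math.sqrt(2.0 * tau * (1.0 - tau))
               / (sigma * math.sqrt(math.pi) * (math.sqrt(tau) + math.sqrt(1.0 - tau))))
assert abs(int_f_squared(tau, mu, sigma) - closed_form) < 1e-5
```

At \(\tau =0.5\) the closed form reduces to \(1/(2\sigma \sqrt{2\pi })\cdot \sqrt{2}=1/(2\sigma \sqrt{\pi })\), the usual value for a normal density.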

Lemma 1

(Gu and Zou 2016) Denote \(r(v_{i}) = \rho _{\tau }(\varepsilon _{i} - v_{i}) - \rho _{\tau }(\varepsilon _{i}) + 2\varepsilon _{i}v_{i}\psi _{\tau }(\varepsilon _{i})\), \(i = 1, \ldots , N\). The asymmetric squared error loss \(\rho _{\tau }(\cdot )\) is continuously differentiable, but is not twice differentiable at zero when \(\tau \ne 0.5\). Moreover, for any \(\varepsilon _{i}\), \(v_{i} \in {\mathbb {R}}\) and \(\tau \in (0,1)\), we have

$$\begin{aligned} (\tau \wedge (1-\tau ))v_{i}^{2} \le r(v_{i}) \le (\tau \vee (1-\tau ))v_{i}^{2}, \end{aligned}$$

where \(\tau \wedge (1-\tau ) = \min \{\tau , 1-\tau \}\) and \(\tau \vee (1-\tau ) = \max \{\tau , 1-\tau \}\). It follows that \(\rho _{\tau }(\cdot )\) is strongly convex.
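The sandwich inequality of Lemma 1 can be verified numerically. The sketch below assumes \(\rho _{\tau }(u)=|\tau -\mathbb {1}(u\le 0)|u^2\) and \(\psi _{\tau }(u)=|\tau -\mathbb {1}(u\le 0)|\), the convention consistent with the displays in this appendix; the sampling ranges are arbitrary.

```python
import random

def psi_tau(u, tau):
    # psi_tau(u) = |tau - 1(u <= 0)|, the asymmetric weight.
    return tau if u > 0 else 1.0 - tau

def rho_tau(u, tau):
    return psi_tau(u, tau) * u * u

random.seed(0)
for tau in (0.1, 0.3, 0.5, 0.9):
    for _ in range(5000):
        eps = random.uniform(-5.0, 5.0)
        v = random.uniform(-5.0, 5.0)
        # r(v) = rho(eps - v) - rho(eps) + 2 eps v psi(eps), as in Lemma 1.
        r = rho_tau(eps - v, tau) - rho_tau(eps, tau) + 2.0 * eps * v * psi_tau(eps, tau)
        assert min(tau, 1.0 - tau) * v * v - 1e-9 <= r <= max(tau, 1.0 - tau) * v * v + 1e-9
```

When \(\varepsilon \) and \(\varepsilon -v\) fall on the same side of zero, \(r(v)\) equals exactly \(\psi _{\tau }(\varepsilon )v^2\), so both bounds are sharp.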

Lemma 2

(Corollary of Hjort and Pollard (2011)) Suppose \(\varvec{Z}_{n}(\varvec{d})\) is convex and can be represented as \(\frac{1}{2} \varvec{d}' \varvec{V} \varvec{d} + \varvec{W}'_{n}\varvec{d} + C_{n} + a_{n}(\varvec{d})\), where \(\varvec{V}\) is symmetric and positive definite, \(\varvec{W}_{n}\) is stochastically bounded, \(C_{n}\) is an arbitrary constant and \(a_{n}(\varvec{d})\) goes to zero in probability for each \(\varvec{d}\). Then \(\varvec{\beta }_{n} = \arg \min \varvec{Z}_{n}\) is only \(o_P(1)\) away from \(\varvec{\alpha }_{n} = -\varvec{V}^{-1}\varvec{W}_{n}\), where \(\varvec{\alpha }_{n} = \arg \min (\frac{1}{2}\varvec{d}'\varvec{V} \varvec{d} + \varvec{W}'_{n}\varvec{d} + C_{n})\). If \(\varvec{W}_{n} \overset{d}{\rightarrow }\ \varvec{W}\), then \(\varvec{\beta }_{n} \overset{d}{\rightarrow }\ -\varvec{V}^{-1}\varvec{W}\).

Lemma 3

If Conditions 1, 3, 4, 5 hold, as \(n, N \rightarrow \infty \), then

  (a) \(\underset{\varvec{\beta }\in \Lambda _\textrm{B}}{\textrm{sup}}|Q_n(\varvec{\beta }) - Q_N(\varvec{\beta })|\rightarrow 0\) in conditional probability for given \({\mathcal {F}}_{N}\),

  (b) \(\parallel \tilde{\varvec{\beta }} - \varvec{\beta }_{t}\parallel = o_{P}(1)\).

Proof

Direct calculation yields

$$\begin{aligned} {\mathbb {E}}\left\{ Q_n(\varvec{\beta })\mid {\mathcal {F}}_{N} \right\} = \frac{1}{N}\sum _{i=1}^{N}\frac{{\mathbb {E}}(R_i)\omega _i(\varvec{\beta }_0)\rho _{\tau }(y_i-\varvec{x}_i'\varvec{\beta })}{\pi _i}=Q_N(\varvec{\beta }). \end{aligned}$$

Since \((\varvec{x}_{i}, y_{i})\)’s are i.i.d., we have

$$\begin{aligned}&{\mathbb {E}}\left( Q_n(\varvec{\beta })-Q_N(\varvec{\beta })\mid {\mathcal {F}}_{N}\right) ^{2}\nonumber \\&\quad =\frac{1}{N^2}{\mathbb {V}}\left\{ \sum _{i=1}^{N}\frac{R_i}{\pi _i}\omega _i(\varvec{\beta }_0)\rho _{\tau }(y_i-\varvec{x}_i'\varvec{\beta })\mid {\mathcal {F}}_{N}\right\} \nonumber \\&\quad =\frac{1}{N^2}\sum _{i=1}^{N}\frac{1}{\pi _i}\omega _i^2(\varvec{\beta }_0)\rho _{\tau }^2(y_i-\varvec{x}_i'\varvec{\beta })-\frac{1}{N^2}\sum _{i=1}^{N}\omega _i^2(\varvec{\beta }_0)\rho _{\tau }^2(y_i-\varvec{x}_i'\varvec{\beta })\nonumber \\&\quad \le \underset{1\le i \le N}{\max }\left( \frac{1}{N\pi _i}\right) \left\{ \frac{1}{N}\sum _{i=1}^{N}\omega _i^2(\varvec{\beta }_0)\rho _{\tau }^2(y_i-\varvec{x}_i'\varvec{\beta })\right\} -\frac{1}{N^2}\sum _{i=1}^{N}\omega _i^2(\varvec{\beta }_0)\rho _{\tau }^2(y_i-\varvec{x}_i'\varvec{\beta })\nonumber \\&\quad =O_{P}\left( \frac{1}{n}\right) \left\{ \frac{1}{N}\sum _{i=1}^{N}\omega _i^2(\varvec{\beta }_0)\rho _{\tau }^2(y_i-\varvec{x}_i'\varvec{\beta })\right\} -\frac{1}{N^2}\sum _{i=1}^{N}\omega _i^2(\varvec{\beta }_0)\rho _{\tau }^2(y_i-\varvec{x}_i'\varvec{\beta })\nonumber \\&\quad =O_{P}\left( \frac{1}{n}\right) , \end{aligned}$$
(A.1)

where the last equality is derived by

$$\begin{aligned} \frac{1}{N}\sum _{i=1}^{N}\omega _i^2(\varvec{\beta }_0)\rho _{\tau }^2(y_i-\varvec{x}_i'\varvec{\beta })&\le \frac{1}{N}\sum _{i=1}^{N}\omega _i^2(\varvec{\beta }_0)(\tau \vee (1-\tau ))^2(y_i-\varvec{x}_i'\varvec{\beta })^4\\&\le \frac{2}{N}\sum _{i=1}^{N}(y_i-\varvec{x}_i'\varvec{\beta }_t)^4+\frac{2}{N}\sum _{i=1}^{N}[\varvec{x}_i'(\varvec{\beta }_t-\varvec{\beta })]^4\\&\le \frac{2}{N}\sum _{i=1}^{N}\varepsilon _i^4+\frac{2}{N}\sum _{i=1}^{N}\Vert \varvec{x}_i\Vert ^4\Vert \varvec{\beta }_t-\varvec{\beta }\Vert ^4\\&= \frac{2}{N}\sum _{i=1}^{N}\varepsilon _i^4+2({\mathbb {E}}\Vert \varvec{x}_i\Vert ^4+o_P(1))\Vert \varvec{\beta }_t-\varvec{\beta }\Vert ^4\\&=O_P(1), \end{aligned}$$

where the last equality is due to Conditions 1, 3, 4. So \({\mathbb {E}}\left\{ Q_n(\varvec{\beta })-Q_N(\varvec{\beta })\mid {\mathcal {F}}_{N} \right\} ^{2}\rightarrow 0\) as \(N\rightarrow \infty \) and \(n\rightarrow \infty \). Combining (A.1) and the Chebyshev inequality, \(Q_n(\varvec{\beta }) - Q_N(\varvec{\beta }) \rightarrow 0\) in conditional probability for given \({\mathcal {F}}_{N}\). Since \(Q_n(\varvec{\beta })\) is a convex function of \(\varvec{\beta }\), by the Convexity Lemma of Pollard (1991), \(\underset{\varvec{\beta }\in \Lambda _{\textrm{B}}}{\text {sup}}|Q_n(\varvec{\beta }) - Q_N(\varvec{\beta })|\rightarrow 0\) in conditional probability for given \({\mathcal {F}}_{N}\).

\(Q_N(\varvec{\beta })\) has a unique minimizer \(\hat{\varvec{\beta }}_{f}\) by Lemma A of Newey and Powell (1987). Thus, based on Theorem 5.9 and its remark of van der Vaart (1998), we have

$$\begin{aligned} \Vert \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_{f}\Vert = o_{P\mid {\mathcal {F}}_{N}}(1). \end{aligned}$$

Xiong and Li (2008) showed that if a sequence is bounded in conditional probability, then it is bounded in unconditional probability, so we have \(\Vert \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_{f}\Vert = o_{P}(1)\). Newey and Powell (1987) proved that \(\Vert \hat{\varvec{\beta }}_{f} - \varvec{\beta }_{t} \Vert = o_{P}(1)\). By the triangle inequality,

$$\begin{aligned} \Vert \tilde{\varvec{\beta }}-\varvec{\beta }_{t}\Vert \le \Vert \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_{f}\Vert + \Vert \hat{\varvec{\beta }}_{f} - \varvec{\beta }_{t}\Vert = o_{P}(1). \end{aligned}$$

This completes the proof. \(\square \)
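As a sanity check on the first step of this proof, one can simulate Poisson subsampling and observe that the inverse-probability-weighted subsample objective tracks the full-data objective. The sketch below uses a simple linear model with uniform probabilities \(\pi _i = n/N\) and weights \(\omega _i \equiv 1\); all names, the model, and the parameter values are our own illustrative choices.

```python
import random

random.seed(1)
N, n, tau = 20000, 2000, 0.3

# Hypothetical full data: y = 1 + 2x + noise.
data = [(x, 1.0 + 2.0 * x + random.gauss(0.0, 1.0))
        for x in (random.gauss(0.0, 1.0) for _ in range(N))]

def rho_tau(u):
    return (tau if u > 0 else 1.0 - tau) * u * u

def Q_full(b0, b1):
    # Full-data objective Q_N at beta = (b0, b1).
    return sum(rho_tau(y - b0 - b1 * x) for x, y in data) / N

def Q_sub(b0, b1):
    # Poisson subsampling: keep unit i with probability pi = n/N,
    # weight each kept loss by 1/pi (Horvitz-Thompson correction).
    pi = n / N
    return sum(rho_tau(y - b0 - b1 * x) / pi
               for x, y in data if random.random() < pi) / N

full, sub = Q_full(0.5, 1.5), Q_sub(0.5, 1.5)
# E{Q_sub | data} = Q_full, and the gap is O_P(n^{-1/2}).
assert abs(full - sub) < 0.2
```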

Lemma 4

Denote \(\varvec{L}^{*}(\varvec{\beta })=\frac{\partial Q_n(\varvec{\beta })}{\partial \varvec{\beta }}\). If Conditions 1, 3, 4, 5 hold, as \(n, N \rightarrow \infty \), then

$$\begin{aligned} \varvec{L}^{*}(\varvec{\beta }_{t})=O_{P}\left( \frac{1}{\sqrt{n}}\right) . \end{aligned}$$

Proof

For any \(\varvec{\beta } \in \Lambda _{\textrm{B}}\), direct calculation yields

$$\begin{aligned} \varvec{L}^{*}(\varvec{\beta }) =&-\frac{2}{N}\sum _{i=1}^{N}\frac{\omega _i(\varvec{\beta }_0)\psi _{\tau }(y_i-\varvec{x}_i'\varvec{\beta })(y_i-\varvec{x}_i'\varvec{\beta })R_i\varvec{x}_i}{\pi _i}. \end{aligned}$$
(A.2)

Substituting \(\varvec{\beta }=\varvec{\beta }_{t}\) into (A.2), we have

$$\begin{aligned} \varvec{L}^{*}(\varvec{\beta }_t) =&-\frac{2}{N}\sum _{i=1}^{N}\frac{\omega _i(\varvec{\beta }_0)\psi _{\tau }(\varepsilon _i)R_i\varepsilon _i\varvec{x}_i}{\pi _i}. \end{aligned}$$

Let \(L_{j_1}^{*}(\varvec{\beta }_{t})\), \(L_{j_2}^{*}(\varvec{\beta }_{t})\) be the elements of \(\varvec{L}^{*}(\varvec{\beta }_{t})\) and \(x_{ij_1}\), \(x_{ij_2}\) be the elements of \(\varvec{x}_i\), \(j_1,j_2 = 0, 1, \ldots , p\), then the conditional expectation and conditional covariance are

$$\begin{aligned}&{\mathbb {E}}(\varvec{L}^{*}(\varvec{\beta }_t)\mid {\mathcal {F}}_{N})=-\frac{2}{N}\sum _{i=1}^{N}\omega _i(\varvec{\beta }_0)\psi _{\tau }(\varepsilon _i)\varepsilon _i\varvec{x}_i=O_{P}\left( \frac{1}{\sqrt{N}}\right) , \end{aligned}$$
(A.3)
$$\begin{aligned}&Cov(L_{j_1}^{*}(\varvec{\beta }_t),L_{j_2}^{*}(\varvec{\beta }_t)\mid {\mathcal {F}}_{N})\nonumber \\&\quad =\frac{4}{N^2}\sum _{i=1}^{N}\frac{\omega _i^2(\varvec{\beta }_0)\psi ^2_{\tau }(\varepsilon _i){\mathbb {V}}(R_i)\varepsilon ^2_ix_{ij_1}x_{ij_2}}{\pi _i^2}\nonumber \\&\quad =\frac{4}{N^2}\sum _{i=1}^{N}\frac{\omega _i^2(\varvec{\beta }_0)\psi ^2_{\tau }(\varepsilon _i)\varepsilon ^2_ix_{ij_1}x_{ij_2}}{\pi _i}-\frac{4}{N^2}\sum _{i=1}^{N} \omega _i^2(\varvec{\beta }_0)\psi ^2_{\tau }(\varepsilon _i)\varepsilon ^2_ix_{ij_1}x_{ij_2}\nonumber \\&\quad \le \underset{1\le i\le N}{\max }\left\{ \frac{1}{N\pi _i}\right\} \frac{4}{N}\sum _{i=1}^{N}\omega _i^2(\varvec{\beta }_0)\psi ^2_{\tau }(\varepsilon _i)\varepsilon ^2_ix_{ij_1}x_{ij_2}+O_{P}\left( \frac{1}{N}\right) \nonumber \\&\quad =O_{P}\left( \frac{1}{n}\right) , \end{aligned}$$
(A.4)

where the last equality is due to Conditions 1, 4, 5. Equality (A.3) holds because

$$\begin{aligned} {\mathbb {E}}\{{\mathbb {E}}(\varvec{L}^{*}(\varvec{\beta }_t)\mid {\mathcal {F}}_{N})\}={\mathbb {E}}\left\{ -\frac{2}{N}\sum _{i=1}^{N}\omega _i(\varvec{\beta }_0)\psi _{\tau }(\varepsilon _i)\varepsilon _i\varvec{x}_i\right\} =\varvec{0}, \end{aligned}$$

and

$$\begin{aligned} Cov\left\{ {\mathbb {E}}(\varvec{L}^{*}(\varvec{\beta }_t)\mid {\mathcal {F}}_{N})\right\}&= \frac{4}{N^2}\sum _{i=1}^{N}{\mathbb {E}}\left( \omega _i^2(\varvec{\beta }_0)\psi _{\tau }^2(\varepsilon _i)\varepsilon _i^2\varvec{x}_i\varvec{x}_i'\right) \\&\quad -\frac{4}{N^2}\sum _{i=1}^{N}\left\{ {\mathbb {E}}\left( \omega _i(\varvec{\beta }_0)\psi _{\tau }(\varepsilon _i)\varepsilon _i\varvec{x}_i\right) {\mathbb {E}}\left( \omega _i(\varvec{\beta }_0)\psi _{\tau }(\varepsilon _i)\varepsilon _i\varvec{x}_i'\right) \right\} \\&=\frac{4}{N^2}\sum _{i=1}^{N}{\mathbb {E}}\left( \omega _i^2(\varvec{\beta }_0)\psi _{\tau }^2(\varepsilon _i)\varepsilon _i^2\varvec{x}_i\varvec{x}_i'\right) \\&=O\left( \frac{1}{N}\right) . \end{aligned}$$

By Chebyshev’s inequality, (A.3) can be obtained. Therefore, from (A.3), (A.4) and Chebyshev’s inequality, the result follows. \(\square \)

Lemma 5

Denote \(Z=\frac{1}{N}\sum _{i=1}^{N}[\rho _{\tau }(\varepsilon _{i}-v_{i})-\rho _{\tau }(\varepsilon _{i})]\). Under Conditions 1 and 3, Z can be split into two terms, i.e.

$$\begin{aligned} Z \simeq -\frac{2}{N}\sum _{i=1}^{N}v_{i}\varepsilon _{i}\psi _{\tau }(\varepsilon _{i})+\frac{1}{N}\sum _{i=1}^{N}v_{i}^{2}\psi _{\tau }(\varepsilon _{i}), \end{aligned}$$

where \(v_{i}=\varvec{x}_{i}'\left( \varvec{\beta }-\varvec{\beta }_{t} \right) \), \(\varvec{\beta }\in \Lambda _{\textrm{B}}\).

Proof

From Lemma 1, then we have

$$\begin{aligned} Z&=\frac{1}{N}\sum _{i=1}^{N}[\rho _{\tau }(\varepsilon _{i}-v_{i})-\rho _{\tau }(\varepsilon _{i})]\\&=\frac{1}{N}\sum _{i=1}^{N}[-2\varepsilon _{i}v_{i}\psi _{\tau }(\varepsilon _{i})+r(v_{i})]\\&=\frac{1}{N}\sum _{i=1}^{N}[-2\varepsilon _{i}v_{i}\psi _{\tau }(\varepsilon _{i})+r(v_{i})-v_{i}^{2}\psi _{\tau }(\varepsilon _{i})+v_{i}^{2}\psi _{\tau }(\varepsilon _{i})]\\&=\frac{1}{N}\sum _{i=1}^{N}[-2v_{i}\varepsilon _{i}\psi _{\tau }(\varepsilon _{i})+v_{i}^{2}\psi _{\tau }(\varepsilon _{i})]+O_P\left( \frac{1}{N}\sum _{i=1}^{N}v_{i}^{2} \right) . \end{aligned}$$

This completes the proof. \(\square \)

Proof of Theorem 3

The first part of Theorem 3 has been shown in Lemma 3. Now we prove the second part. Let \(\varvec{\xi } = \varvec{\beta }-\varvec{\beta }_t\) and

$$\begin{aligned} \varvec{Z}(\varvec{\xi })=\sum _{i=1}^{N}\frac{\omega _i(\varvec{\beta }_0)\left\{ \rho _{\tau }(\varepsilon _{i}-\varvec{x}_i'\varvec{\xi })-\rho _{\tau }(\varepsilon _{i})\right\} R_i}{N\pi _i}. \end{aligned}$$

Note that \(\varvec{Z}(\varvec{\xi })\) is convex and minimized by \(\tilde{\varvec{\beta }}-\varvec{\beta }_t\). Thus, we focus on \(\varvec{Z}(\varvec{\xi })\) when assessing the properties of \(\tilde{\varvec{\beta }}-\varvec{\beta }_t\). Denote \(\varvec{Z}_{N2}=\sum _{i=1}^{N}\frac{\omega _i(\varvec{\beta }_0)\psi _{\tau }(\varepsilon _{i})R_i\varvec{x}_i\varvec{x}_i'}{N\pi _i}\), then

$$\begin{aligned} \varvec{Z}_{N2}&=\varvec{Z}_{N2}-{\mathbb {E}}(\varvec{Z}_{N2}\mid {\mathcal {F}}_{N})+{\mathbb {E}}(\varvec{Z}_{N2}\mid {\mathcal {F}}_{N}) \\&=o_{P\mid {\mathcal {F}}_{N}}(1)+\varvec{D}_N, \end{aligned}$$

where \(\varvec{Z}_{N2}-{\mathbb {E}}(\varvec{Z}_{N2}\mid {\mathcal {F}}_{N})=o_{P\mid {\mathcal {F}}_{N}}(1)\) can be derived by (A.5), (A.6) and Chebyshev’s inequality. Denote \(\varvec{Z}_{N3}=\varvec{Z}_{N2}-{\mathbb {E}}(\varvec{Z}_{N2}\mid {\mathcal {F}}_{N})\), then

$$\begin{aligned} {\mathbb {E}}(\varvec{Z}_{N3}\mid {\mathcal {F}}_{N})=\varvec{0}, \end{aligned}$$
(A.5)

and let \(Z_{N3j_1}\), \(Z_{N3j_2}\) be the elements of \(\varvec{Z}_{N3}\) and \(x_{ij_1}\), \(x_{ij_2}\) be the elements of \(\varvec{x}_i\), \(j_1,j_2 =0, 1, \ldots , p\),

$$\begin{aligned} Cov(Z_{N3j_1},Z_{N3j_2}\mid {\mathcal {F}}_{N})&\le \sum _{i=1}^{N}\frac{\omega _i^2(\varvec{\beta }_0)\psi _{\tau }^2(\varepsilon _{i})(x_{ij_1}x_{ij_2})^2\pi _i(1-\pi _i)}{N^2\pi _{i}^2}\nonumber \\&\le \sum _{i=1}^{N}\frac{\omega _i^2(\varvec{\beta }_0)\psi _{\tau }^2(\varepsilon _{i})(x_{ij_1}x_{ij_2})^2}{N^2\pi _{i}}\nonumber \\&\le \underset{1\le i\le N}{\max }\left( \frac{1}{N\pi _i}\right) \left( \sum _{i=1}^{N}\frac{\omega _i^2(\varvec{\beta }_0)\psi _{\tau }^2(\varepsilon _{i})(x_{ij_1}x_{ij_2})^2}{N}\right) \nonumber \\&=O_P\left( \frac{1}{n}\right) . \end{aligned}$$
(A.6)

From Lemma 5, we have

$$\begin{aligned} \varvec{Z}(\varvec{\xi })&= -\varvec{\xi }'\sum _{i=1}^{N}\frac{2\omega _i(\varvec{\beta }_0)\varepsilon _{i}\psi _{\tau }(\varepsilon _{i})R_i\varvec{x}_i}{N\pi _i}+\varvec{\xi }'\sum _{i=1}^{N}\frac{ \omega _i(\varvec{\beta }_0)\psi _{\tau }(\varepsilon _{i})\varvec{x}_i\varvec{x}_i'}{N}\varvec{\xi }+o_P(1) \\&= \varvec{\xi }'\varvec{L}^{*}(\varvec{\beta }_t)+\varvec{\xi }'\varvec{D}_N\varvec{\xi }+o_P(1). \end{aligned}$$

Since \(\varvec{Z}(\varvec{\xi })\) is convex, and from Lemma 2,

$$\begin{aligned} \tilde{\varvec{\beta }}-\varvec{\beta }_t=-{\frac{1}{2}}\varvec{D}_N^{-1}\varvec{L}^{*}(\varvec{\beta }_t)+o_P(1). \end{aligned}$$
(A.7)

By Condition 2 and Lemma 4, we have

$$\begin{aligned} \tilde{\varvec{\beta }}-\varvec{\beta }_t=O_{P\mid {\mathcal {F}}_{N}}\left( \frac{1}{\sqrt{n}}\right) . \end{aligned}$$

This completes the proof of Theorem 3. \(\square \)

Proof of Theorem 4

By Lemma 4,

$$\begin{aligned} {\mathbb {E}}\{\varvec{L}^{*}(\varvec{\beta }_t)\mid {\mathcal {F}}_{N}\}=O_{P\mid {\mathcal {F}}_{N}}\left( \frac{1}{\sqrt{N}}\right) , \quad {\mathbb {V}}\{\varvec{L}^{*}(\varvec{\beta }_t)\mid {\mathcal {F}}_{N}\}=4\varvec{V}_{\pi }+o_{P\mid {\mathcal {F}}_{N}}(1). \end{aligned}$$
(A.8)

Now we check the Lindeberg-Feller condition. Note that

$$\begin{aligned} \varvec{L}^{*}(\varvec{\beta }_t)=-\frac{2}{N}\sum _{i=1}^{N}\frac{\omega _i(\varvec{\beta }_0)\psi _{\tau }(\varepsilon _i)R_i\varepsilon _i\varvec{x}_i}{\pi _i}:=-2\sum _{i=1}^{N}\varvec{\eta }_i. \end{aligned}$$

For every \(\epsilon >0\), we have

$$\begin{aligned} \sum _{i=1}^{N}{\mathbb {E}}\{\Vert \varvec{\eta }_i\Vert ^2\mathbb {1}(\Vert \varvec{\eta }_i\Vert >\epsilon )\mid {\mathcal {F}}_{N}\}&\le \frac{1}{\epsilon }\sum _{i=1}^{N}{\mathbb {E}}\{\Vert \varvec{\eta }_i\Vert ^3\mid {\mathcal {F}}_{N}\}\\&\le \frac{1}{\epsilon }\sum _{i=1}^{N}{\mathbb {E}}\left\{ \frac{\Vert \omega _i(\varvec{\beta }_0)R_i\psi _{\tau }(\varepsilon _i)\varepsilon _i\varvec{x}_i\Vert ^3}{N^3\pi _i^3}\mid {\mathcal {F}}_{N}\right\} \\&\le \frac{1}{\epsilon }\underset{1\le i\le N}{\max }\left\{ \frac{1}{(N\pi _i)^2}\right\} \frac{1}{N}\sum _{i=1}^{N}\Vert \omega _i(\varvec{\beta }_0)\psi _{\tau }(\varepsilon _i)\varepsilon _i\varvec{x}_i\Vert ^3\\&\le \frac{1}{\epsilon }O_P\left( \frac{1}{n^2}\right) =o_P(1), \end{aligned}$$

where the last inequality holds by Conditions 1, 4, 5.

Given \({\mathcal {F}}_{N}\), using (A.8) and Lindeberg-Feller central limit theorem,

$$\begin{aligned} \varvec{V}_{\pi }^{-1/2}\{\varvec{L}^{*}(\varvec{\beta }_t)-\sqrt{n}{\mathbb {E}}(\varvec{L}^{*}(\varvec{\beta }_t)\mid {\mathcal {F}}_{N})\}\overset{d}{\rightarrow } {\mathbb {N}}(\varvec{0},\varvec{I}) \end{aligned}$$
(A.9)

with \(\sqrt{n}{\mathbb {E}}\left( \varvec{L}^{*}(\varvec{\beta }_t)\mid {\mathcal {F}}_{N} \right) = O_{P}\left( \sqrt{\frac{n}{N}}\right) = o_{P}(1)\).

By Theorem 3, (A.7), (A.9) and Slutsky’s theorem, we conclude that as \(n\rightarrow \infty \), \(N\rightarrow \infty \), conditional on \({\mathcal {F}}_{N}\), with probability approaching one,

$$\begin{aligned} \{\varvec{D}^{-1}_N\varvec{V}_{\pi }\varvec{D}^{-1}_N\}^{-1/2}(\tilde{\varvec{\beta }}-\varvec{\beta }_t)\overset{d}{\rightarrow } {\mathbb {N}}(\varvec{0},\varvec{I}), \end{aligned}$$

where \(\varvec{D}_N\) and \(\varvec{V}_{\pi }\) are defined in (12) and (13), respectively. \(\square \)

Proof of Theorem 5

Define \(h_i^{\textrm{Aopt}}=\Vert \omega _i(\varvec{\beta }_0)\varepsilon _i\psi _{\tau }(\varepsilon _i)\varvec{D}_{N}^{-1}\varvec{x}_{i}\Vert \), \(i=1,\ldots ,N\). Without loss of generality, we assume that \(h_i^{\textrm{Aopt}}>0\) for any i, and set \(h_{N+1}^{\textrm{Aopt}}=+\infty \). Minimizing \(\text {tr}(\varvec{D}^{-1}_N\varvec{V}_{\pi }\varvec{D}^{-1}_N)\) amounts to solving the following optimization problem:

$$\begin{aligned}&\min \ {\tilde{H}}:=\text {tr}(\varvec{D}^{-1}_N\varvec{V}_{\pi }\varvec{D}^{-1}_N) \\&s.t. \ \sum _{i=1}^{N}\pi _i=n, 0\le \pi _i\le 1 \ \text {for } i=1,\ldots ,N. \end{aligned}$$

Without loss of generality, we assume that \(h_1^{\textrm{Aopt}}\le h_2^{\textrm{Aopt}}\le \ldots \le h_N^{\textrm{Aopt}}\). Then

$$\begin{aligned} \text {tr}(\varvec{D}^{-1}_N\varvec{V}_{\pi }\varvec{D}^{-1}_N)&=\frac{1}{N^2}\sum _{i=1}^{N}\left\{ \frac{1}{\pi _i}\Vert \omega _i(\varvec{\beta }_0)\varepsilon _i\psi _{\tau }(\varepsilon _i)\varvec{D}_{N}^{-1}\varvec{x}_{i}\Vert ^2 \right\} \\&=\frac{1}{N^2}\sum _{i=1}^{N}\frac{1}{\pi _i}(h_i^{\textrm{Aopt}})^2\\&=\frac{1}{N^2}\frac{1}{n}\left( \sum _{i=1}^{N}\pi _i\right) \left( \sum _{i=1}^{N}\frac{1}{\pi _i}(h_i^{\textrm{Aopt}})^2\right) \\&\ge \frac{1}{N^2}\frac{1}{n}\left( \sum _{i=1}^{N}h_i^{\textrm{Aopt}}\right) ^2, \end{aligned}$$

where the last step is from the Cauchy-Schwarz inequality and the equality holds if and only if \(\pi _{i}\propto h_i^{\textrm{Aopt}}\). Now we consider two cases:

Case 1. If all \(\frac{nh_i^{\textrm{Aopt}}}{\sum _{j=1}^{N}h_j^{\textrm{Aopt}}}\le 1\), then \(\pi _i^{\textrm{Aopt}}=\frac{nh_i^{\textrm{Aopt}}}{\sum _{j=1}^{N}h_j^{\textrm{Aopt}}}\), where \(i=1,\ldots ,N\).

Case 2. Assume that there exists some i such that \(\pi _i^{\textrm{Aopt}}=\frac{nh_i^{\textrm{Aopt}}}{\sum _{j=1}^{N}h_j^{\textrm{Aopt}}}>1\); by the definition of k, the number of such i is k. Therefore, the original optimization turns into the following optimization problem:

$$\begin{aligned}&\min \ \frac{1}{N^2}\sum _{i=1}^{N-k}\left\{ \frac{1}{\pi _i}\Vert \omega _i(\varvec{\beta }_0)\varepsilon _i\psi _{\tau }(\varepsilon _i)\varvec{D}_{N}^{-1}\varvec{x}_{i}\Vert ^2 \right\} \\&s.t. \ \sum _{i=1}^{N-k}\pi _i=n-k, 0\le \pi _i\le 1 \ \text {for } i=1,\ldots ,N-k,\ \pi _{N-k+1}=\ldots =\pi _{N}=1. \end{aligned}$$

Similarly to the calculation of \(\pi _i^{\textrm{Aopt}}\) in Case 1, from the Cauchy–Schwarz inequality,

$$\begin{aligned}&\frac{1}{N^2}\sum _{i=1}^{N-k}\left\{ \frac{1}{\pi _i}\Vert \omega _i(\varvec{\beta }_0)\varepsilon _i\psi _{\tau }(\varepsilon _i)\varvec{D}_{N}^{-1}\varvec{x}_{i}\Vert ^2 \right\} \\&\quad =\frac{1}{N^2}\frac{1}{(n-k)}\left( \sum _{i=1}^{N-k}\pi _i\right) \left( \sum _{i=1}^{N-k}\frac{1}{\pi _i}(h_i^{\textrm{Aopt}})^2\right) \\&\quad \ge \frac{1}{N^2}\frac{1}{(n-k)}\left( \sum _{i=1}^{N-k}h_i^{\textrm{Aopt}}\right) ^2, \end{aligned}$$

and the equality holds if and only if \(\pi _{i}\propto h_i^{\textrm{Aopt}}\), i.e. \(\pi _{i}^{\textrm{Aopt}}=\frac{(n-k)h_i^{\textrm{Aopt}}}{\sum _{j=1}^{N-k}h_j^{\textrm{Aopt}}}\), \(i=1,\ldots ,N-k\). Assume there exists \(\tilde{\textrm{M}}\) such that

$$\begin{aligned} \underset{1\le i\le N}{\max }\pi _{i}^{\textrm{Aopt}}=\underset{1\le i\le N}{\max }\frac{n(h_i^{\textrm{Aopt}}\wedge \tilde{\textrm{M}})}{\sum _{j=1}^{N}(h_j^{\textrm{Aopt}}\wedge \tilde{\textrm{M}})}=1, \end{aligned}$$

and \(h_{N-k}^{\textrm{Aopt}}\le \tilde{\textrm{M}}\le h_{N-k+1}^{\textrm{Aopt}}\), so \(\sum _{i=1}^{N-k}h_i^{\textrm{Aopt}}=(n-k)\tilde{\textrm{M}}\) holds. Thus, the set \(\{1, \ldots , N\}\) can be divided into two parts, i.e. \(\{1, \ldots , N-k\}\) and \(\{N-k+1, \ldots , N\}\), which correspond to \(\pi _{i}^{\textrm{Aopt}}=\frac{h_i^{\textrm{Aopt}}}{\tilde{\textrm{M}}}\) and \(\pi _{i}^{\textrm{Aopt}}=1\). So we have

$$\begin{aligned} \text {tr}(\varvec{D}^{-1}_N\varvec{V}_{\pi }\varvec{D}^{-1}_N)&=\frac{1}{N^2}\sum _{i=1}^{N}\frac{1}{\pi _i}(h_i^{\textrm{Aopt}})^2\nonumber \\&=\frac{1}{N^2}\left\{ (n-k)\tilde{\textrm{M}}^2+\sum _{i=N-k+1}^{N}(h_i^{\textrm{Aopt}})^2\right\} . \end{aligned}$$
(A.10)

Equality (A.10) shows that the lower bound of \(\text {tr}(\varvec{D}^{-1}_N\varvec{V}_{\pi }\varvec{D}^{-1}_N)\) is attained when equality holds in the Cauchy-Schwarz inequality.

When \(\pi _{i}^{\textrm{Aopt}}=\frac{n(h_i^{\textrm{Aopt}}\wedge \tilde{\textrm{M}})}{\sum _{j=1}^{N}(h_j^{\textrm{Aopt}}\wedge \tilde{\textrm{M}})}\), substituting \(\pi _{i}^{\textrm{Aopt}}\) into \({\tilde{H}}\) yields the following equation:

$$\begin{aligned} {\tilde{H}}&:=\frac{1}{N^2}\sum _{i=1}^{N}\left\{ \frac{1}{\pi _{i}^{\textrm{Aopt}}}\Vert \omega _i(\varvec{\beta }_0)\varepsilon _i\psi _{\tau }(\varepsilon _i)\varvec{D}_{N}^{-1}\varvec{x}_{i}\Vert ^2 \right\} \nonumber \\&=\frac{1}{N^2}\sum _{i=1}^{N}\frac{1}{\pi _{i}^{\textrm{Aopt}}}(h_i^{\textrm{Aopt}})^2\nonumber \\&=\frac{1}{N^2}\left\{ (n-k)\tilde{\textrm{M}}^2+\sum _{i=N-k+1}^{N}(h_i^{\textrm{Aopt}})^2\right\} , \end{aligned}$$
(A.11)

which matches the lower bound of \(\text {tr}(\varvec{D}^{-1}_N\varvec{V}_{\pi }\varvec{D}^{-1}_N)\) in (A.10). Therefore, by (A.10) and (A.11), \(\pi _{i}^{\textrm{Aopt}}=\frac{n(h_i^{\textrm{Aopt}}\wedge \tilde{\textrm{M}})}{\sum _{j=1}^{N}(h_j^{\textrm{Aopt}}\wedge \tilde{\textrm{M}})}\) is the optimal solution.

Now, we prove the existence and rationality of \(\tilde{\textrm{M}}\) when \(\tilde{\textrm{M}} \in (h_{N-k}^{\textrm{Aopt}}, h_{N-k+1}^{\textrm{Aopt}}]\). The definition of k implies that

$$\begin{aligned} \frac{(n-k+1)h_{N-k+1}^{\textrm{Aopt}}}{\sum _{i=1}^{N-k+1} h_{i}^{\textrm{Aopt}}} \ge 1 \quad \text{ and } \quad \frac{(n-k) h_{N-k}^{\textrm{Aopt}}}{\sum _{i=1}^{N-k} h_{i}^{\textrm{Aopt}}}<1. \end{aligned}$$

Taking \(\tilde{\textrm{M}}_{1}=h_{N-k+1}^{\textrm{Aopt}}\) and \(\tilde{\textrm{M}}_{2}=h_{N-k}^{\textrm{Aopt}}\), we have

$$\begin{aligned} \frac{(n-k+1) h_{N-k+1}^{\textrm{Aopt}}+(k-1) \tilde{\textrm{M}}_{1}}{\sum _{i=1}^{N-k+1} h_{i}^{\textrm{Aopt}}+(k-1) \tilde{\textrm{M}}_{1}} \ge 1 \quad \text{ and } \quad \frac{(n-k) h_{N-k}^{\textrm{Aopt}}+k \tilde{\textrm{M}}_{2}}{\sum _{i=1}^{N-k} h_{i}^{\textrm{Aopt}}+k \tilde{\textrm{M}}_{2}}<1, \end{aligned}$$

which implies that

$$\begin{aligned} n\frac{h_{i}^{\textrm{Aopt}} \wedge \tilde{\textrm{M}}_{1}}{\sum _{j=1}^{N}(h_{j}^{\textrm{Aopt}} \wedge \tilde{\textrm{M}}_{1})} \ge 1 \text{ and } n\frac{h_{i}^{\textrm{Aopt}} \wedge \tilde{\textrm{M}}_{2}}{\sum _{j=1}^{N}(h_{j}^{\textrm{Aopt}} \wedge \tilde{\textrm{M}}_{2})}<1. \end{aligned}$$

Since \(\underset{1\le i\le N}{\max }\frac{h_{i}^{\textrm{Aopt}}\wedge \tilde{\textrm{M}}}{\sum _{j=1}^{N}(h_{j}^{\textrm{Aopt}} \wedge \tilde{\textrm{M}})}\) is continuous in \(\tilde{\textrm{M}}\), the existence of \(\tilde{\textrm{M}}\) follows.

For the rationality of \(\tilde{\textrm{M}}\), we only need to prove that \(\underset{1\le i\le N}{\max }\frac{h_{i}^{\textrm{Aopt}}\wedge \tilde{\textrm{M}}}{\sum _{j=1}^{N}(h_{j}^{\textrm{Aopt}} \wedge \tilde{\textrm{M}})}=\frac{1}{n}\), i.e. \(\frac{h_{N}^{\textrm{Aopt}}\wedge \tilde{\textrm{M}}}{\sum _{j=1}^{N}(h_{j}^{\textrm{Aopt}} \wedge \tilde{\textrm{M}})}\) is nondecreasing on \(\tilde{\textrm{M}} \in (h_{1}^{\textrm{Aopt}},h_{N}^{\textrm{Aopt}})\). For any \(h_N^{\textrm{Aopt}}\ge \tilde{\textrm{M}}'\ge \tilde{\textrm{M}}\), \(\tilde{\textrm{M}}'\wedge h_N^{\textrm{Aopt}}\ge \tilde{\textrm{M}}\wedge h_N^{\textrm{Aopt}}\), and \(\left( \tilde{\textrm{M}}' / \tilde{\textrm{M}}\right) \sum _{i=1}^{N}(h_{i}^{\textrm{Aopt}} \wedge \tilde{\textrm{M}}) \ge \sum _{i=1}^{N}(h_{i}^{\textrm{Aopt}} \wedge \tilde{\textrm{M}}')\). So, the rationality is proved. \(\square \)
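The capped probabilities of Theorem 5 are straightforward to compute: sort h in descending order, force the k largest units to \(\pi =1\), and allocate the remaining budget \(n-k\) proportionally (the threshold \(\tilde{\textrm{M}}\) equals the remaining total divided by \(n-k\)). The sketch below is our own implementation of that rule, assuming \(0 < n \le N\) and all \(h_i > 0\).

```python
def capped_probabilities(h, n):
    """Probabilities pi_i = n * min(h_i, M) / sum_j min(h_j, M), with the
    threshold M chosen so that max_i pi_i = 1 and sum_i pi_i = n."""
    N = len(h)
    order = sorted(range(N), key=lambda i: h[i], reverse=True)
    total = sum(h)          # running sum of the uncapped h values
    k = 0                   # number of units forced to pi = 1
    # Cap units while the largest remaining proportional probability exceeds 1.
    while k < n and (n - k) * h[order[k]] > total:
        total -= h[order[k]]
        k += 1
    pi = [0.0] * N
    for rank, i in enumerate(order):
        pi[i] = 1.0 if rank < k else (n - k) * h[i] / total
    return pi

pi = capped_probabilities([10.0, 1.0, 1.0, 1.0, 1.0], n=2)
# The dominant unit is capped at 1; the rest share the remaining budget n - k.
assert pi == [1.0, 0.25, 0.25, 0.25, 0.25]
assert abs(sum(pi) - 2.0) < 1e-12
```

Case 1 of the proof corresponds to k = 0: with h = [1, 2, 3, 4] and n = 2 no unit is capped and the probabilities are simply proportional to h.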

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ren, M., Zhao, S., Wang, M. et al. Robust optimal subsampling based on weighted asymmetric least squares. Stat Papers (2023). https://doi.org/10.1007/s00362-023-01480-7

