1 Introduction

Arriaza et al. (2019) have introduced two functions, the left shape function and the right shape function, which, in some stochastic sense, synthesize the form of the distribution and can be employed to study the behavior of the tails and the symmetry of a random variable. Specifically, let X be an absolutely continuous random variable with probability density function f and distribution function F. For each \(u \in (0,1)\), let

$$\begin{aligned} x_u = F^{-1}(u) = \inf \{x :\, F(x) \geqslant u\}. \end{aligned}$$

The left shape function and the right shape function of X are defined as

$$\begin{aligned} L_X(u) = {\mathbb {E}}\{(X-x_u)^-f(X)\}, \quad u \in (0,1), \end{aligned}$$

and

$$\begin{aligned} R_X(u) = {\mathbb {E}}\{(X-x_u)^+f(X)\}, \quad u \in (0,1), \end{aligned}$$

respectively, provided that the expectations exist, where \(x^-=\max \{0,-x\}\) and \(x^+=\max \{0,x\}\), \(\forall x \in {\mathbb {R}}\). Since \(L_X(u)=R_{-X}(1-u)\), \(\forall u \in (0,1)\) (see Lemma 2.2 in Arriaza et al. 2019), from now on we will restrict our attention to the right shape function, \(R_X\). In order to simplify notation, the subindex X will be suppressed from the right shape function, so, from now on, we write R(u) instead of \(R_X(u)\) when there is no possibility of confusion.

Notice that if \({\mathbb {E}}|X|<\infty \) and f is bounded, then R(u) is a well-defined quantity for each \(u\in (0,1)\) since

$$\begin{aligned} R(u)= {\mathbb {E}}[\{X-F^{-1}(u)\}^+f(X)]\leqslant M \left\{ {\mathbb {E}}|X|+|F^{-1}(u)|\right\} <\infty ,\quad \forall u\in (0,1), \end{aligned}$$

where M is a positive constant. Moreover, R is a positive and strictly decreasing function with \(\displaystyle \lim _{u \rightarrow 1^-}R(u)=0\) (see Remark 2.4 in Arriaza et al. 2019). So we define \(R(1)=0\) and, if the limit exists, \(R(0)=\displaystyle \lim _{u \rightarrow 0^+}R(u)\).

The right shape function has several remarkable properties. For example, the limit when u approaches 1 of the quotient of the right shape functions of two random variables provides useful information on the relative behavior of their residual Rényi entropies of order 2, a measure of interest in reliability and other fields; if \({\mathcal {F}}\) is a location-scale family of distribution functions, that is,

$$\begin{aligned} {\mathcal {F}}=\{F:\, F(x)=F_0((x-\mu )/\varsigma ), \, \forall x \in {\mathbb {R}},\, \mu \in {\mathbb {R}}, \varsigma >0\}, \end{aligned}$$
(1)

for some fixed distribution function \(F_0\), then for any random variables X and Y with distribution function in \({\mathcal {F}}\), we have that \(R_X(u)=R_Y(u)\), \(\forall u \in (0,1)\); in other words, the right shape function characterizes location-scale families. Many other properties can be found in Arriaza et al. (2019). Moreover, if

$$\begin{aligned}S_X(u)=R_X(u)-R_{-X}(u), \quad u \in (0,1),\end{aligned}$$

then \(S_X(u)=S_Y(u)\), \(\forall u \in (0,1)\), for all X and Y with distribution function in \({\mathcal {F}}\), and \(S_X(u)=0\), \(\forall u \in (0,1)\), if and only if the distribution of X is symmetric. These properties can be used for inference. For example, since R characterizes location-scale families, it may be used to build goodness-of-fit tests for such families. A key step towards the development of statistical procedures based on the right shape function is the study of an estimator of that function, which is precisely the objective of this paper.

Remark 1

The definition of the function S (as before, the subindex X will be skipped when there is no possibility of confusion) is a bit different from that given in Arriaza et al. (2019), which is \({S^{\textrm{Arr}}(u)}=R_{-X}(1-u)-R_{X}(1-u)\). Both definitions are closely related: \(S(u)=0\), \(\forall u \in (0,1)\), if and only if \(S^{\textrm{Arr}}(u)=0\), \(\forall u \in (0,1)\), and \(S(u) \geqslant 0\), \(\forall u \in (0,1)\), if and only if \(S^{\textrm{Arr}}(u) \leqslant 0\), \(\forall u \in (0,1)\).

Let \(X_1, \ldots , X_n\) be a random sample from X, that is, \(X_1, \ldots , X_n\) are independent with the same distribution as X. Since

$$\begin{aligned} R(u)= {\mathbb {E}}\{(X-x_u)^+f(X)\}=\int \{x-F^{-1}(u)\}^+f(x)dF(x), \end{aligned}$$

where \(\int \) stands for the integral on the whole real line, to estimate R(u) we propose to replace F with the empirical distribution function,

$$\begin{aligned} F_n(x)=\frac{1}{n}\sum _{i=1}^n \textbf{1}\{ X_i \leqslant x \}, \end{aligned}$$

where \(\textbf{1}\{ \cdot \}\) denotes the indicator function (that is, \(\textbf{1}\{ X_i \leqslant x \}=1\) if \(X_i \leqslant x \) and \(\textbf{1}\{ X_i \leqslant x \}=0\) if \(X_i > x \)), and f with a kernel estimator

$$\begin{aligned} {\hat{f}}_n(x)=\frac{1}{n{\hat{h}}}\sum _{i=1}^n K\left( \frac{x-X_i}{{\hat{h}}}\right) , \end{aligned}$$

where \({\hat{h}}\) is the bandwidth and K is a kernel. We take \({\hat{h}}={\hat{\sigma }}\times g(n)\), where \({\hat{\sigma }}={\hat{\sigma }}(X_1, \ldots , X_n)\) is an estimator of \(\sigma =\sigma (X)\), a spread measure of X, both of them satisfying \(\sigma (aX+b)=|a|\sigma (X)\) and \({\hat{\sigma }}(aX_1+b, \ldots , aX_n+b)=|a|{\hat{\sigma }}(X_1, \ldots , X_n)\), \(\forall a,\, b \in {\mathbb {R}}\), and g is a decreasing function. Further assumptions on \({\hat{h}}\) and K will be specified later. For \(u\in (0,1)\), the empirical quantile function, \(F_n^{-1}(u)\), is defined as follows

$$\begin{aligned} F_n^{-1}(u)=\inf \{x:\, F_n(x) \geqslant u\}=\left\{ \begin{array}{lll} X_{1:n} \quad &{} \text{ if } \quad &{} u\in I_1=[0,1/n],\\ X_{k:n} \quad &{} \text{ if } \quad &{} u \in I_k=((k-1)/n, k/n], \quad 2 \leqslant k \leqslant n, \end{array} \right. \end{aligned}$$

where \(X_{1:n} \leqslant \cdots \leqslant X_{n:n}\) denote the order statistics.

Therefore, we consider the following plug-in estimator of R(u),

$$\begin{aligned} R_n(u)= \frac{1}{n}\sum _{i=1}^n \{X_i-F^{-1}_n(u)\}^+{\hat{f}}_n(X_i). \end{aligned}$$

Notice that

$$\begin{aligned} R_n(u)= \left\{ \begin{array}{ll}\displaystyle \frac{1}{n}\sum _{i=k+1}^n (X_{i:n}-X_{k:n}){\hat{f}}_n(X_{i:n}), \quad &{} u \in I_k, \quad 1 \leqslant k \leqslant n-1,\\ 0 &{} u \in I_n, \end{array} \right. \end{aligned}$$

and thus \(R_n\) is a piecewise constant function. Observe that \(R_n(1)=0\), \(\forall n\). The behavior of \(R_n\) at \(u=1\) is consistent with that of R since, as seen before, \(\displaystyle \lim _{u \rightarrow 1^-}R(u)=0\). Observe also that, by construction, \(R_n(u)=R_n(u; X_1, \ldots , X_n)= R_n(u; aX_1+b, \ldots , aX_n+b)\), \(\forall a>0,\, b \in {\mathbb {R}}\); thus \(R_n(u)\) is location-scale invariant, just like R(u).
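For concreteness, the following R sketch evaluates \(R_n(u)\) at a single \(u\in (0,1)\). The function name Rn is ours; the Epanechnikov kernel rescaled to unit variance and the bandwidth \({\hat{h}}=sd\times n^{-\tau }\) are the choices used later in Sect. 4, not part of the definition.

```r
# Sketch of the plug-in estimator R_n(u) for a scalar u in (0,1).
# Kernel and bandwidth follow Sect. 4 (Epanechnikov rescaled to variance 1,
# h = sd * n^(-tau)); any choices fulfilling Assumptions 1 and 2 would do.
Rn <- function(u, x, tau = 0.45) {
  n <- length(x)
  h <- sd(x) * n^(-tau)
  K <- function(t) 0.75 / sqrt(5) * pmax(1 - (t / sqrt(5))^2, 0)  # variance-1 Epanechnikov
  fhat <- sapply(x, function(xi) mean(K((xi - x) / h)) / h)       # hat{f}_n(X_i)
  xq <- sort(x)[max(ceiling(u * n), 1)]                           # F_n^{-1}(u)
  mean(pmax(x - xq, 0) * fhat)                                    # (1/n) sum {X_i - F_n^{-1}(u)}^+ hat{f}_n(X_i)
}
# Example: for U(0,1) data, Rn(0.5, runif(1000)) should be close to R(0.5) = 0.125.
```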

Analogously, we consider the following plug-in estimator of S(u),

$$\begin{aligned} S_n(u)= R_{X,n}(u)-R_{-X,n}(u), \end{aligned}$$

where \(R_{X,n}(u)\) stands for \(R_n(u)\), the estimator of \(R_X(u)\) calculated from the sample \(X_1, \ldots , X_n\), and \(R_{-X,n}(u)\) stands for the estimator of \(R_{-X}(u)\) calculated from the sample \(-X_1, \ldots , -X_n\).
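A sketch of the corresponding estimator of S simply applies the function Rn above to the samples \(X_1, \ldots , X_n\) and \(-X_1, \ldots , -X_n\); the function name Sn is ours.

```r
# S_n(u) = R_{X,n}(u) - R_{-X,n}(u); note that sd(-x) = sd(x), so both
# estimations use the same bandwidth.
Sn <- function(u, x, tau = 0.45) Rn(u, x, tau) - Rn(u, -x, tau)
```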

Section 5 of Arriaza et al. (2019) proposes another estimator of R(u) that consists of replacing both f(x) and F(x) with kernel estimators \(f_n(x)\) and \({\hat{F}}_n(x)=\int _{-\infty }^x f_n(y)dy\), respectively. No properties of the resulting estimators of R and S were studied there. A major drawback of those estimators is that they do not have an easily computable expression and must be approximated numerically.

The paper unfolds as follows. Section 2 derives asymptotic properties related to the pointwise and uniform consistency of the proposed estimator of the right shape function. Several results are given under different regularity assumptions. In Sect. 3 results related to the pointwise asymptotic normality and global weak convergence are detailed. In Sect. 4, a simulation study and an application to a real data set illustrate the practical performance of the estimator. This section also contains an application to a goodness-of-fit testing problem for a location-scale family that can be solved by employing the proposed estimator. All computations have been programmed and run in R (R Core Team 2020). Some conclusions and further research possibilities are discussed in Sect. 5. Finally, all proofs are deferred to Sect. 6.

Throughout the paper it will be tacitly assumed that X is an absolutely continuous random variable with cumulative distribution function F and bounded probability density function f; all limits are taken when \(n \rightarrow \infty \), where n denotes the sample size; \({\mathop {\longrightarrow }\limits ^{{\mathcal {L}}}}\) stands for the convergence in law; \({\mathop {\longrightarrow }\limits ^{P}}\) stands for the convergence in probability; \({\mathop {\longrightarrow }\limits ^{a.s.}}\) stands for the almost sure convergence; for a function \(w: (a,b) \subseteq {\mathbb {R}} \mapsto {\mathbb {R}}\) and \(x \in (a,b]\), \(w(x-)\) denotes the one-sided limit \(\displaystyle \lim _{y \rightarrow x-}w(y)\); \(O_P(1)\) refers to a stochastic sequence bounded in probability and \(o_P(1)\) refers to a stochastic sequence that converges to zero in probability; the kernel function \(K:{\mathbb {R}} \mapsto {\mathbb {R}}\) is a probability density function satisfying some of the following assumptions:

Assumption 1

 

  1. (i)

    K has compact support and is Lipschitz continuous.

  2. (ii)

    K has bounded variation.

  3. (iii)

    K is symmetric, \(K(x)=K(-x)\), \(\forall x \in {\mathbb {R}}\).

  4. (iv)

    The support of K is \([c,d]\), for some \(-\infty<c<d<\infty \), \(K(c)=K(d)=0\), and K is twice differentiable on \((c,d)\), with bounded derivatives \(K'\) and \(K''\).

The bandwidth \(h=\sigma \times g(n)\) will be assumed to satisfy some of the following assumptions:

Assumption 2

 

  1. (i)

    \(h \rightarrow 0\) and \(\sum _{n\geqslant 1}\exp \{-\varepsilon n h^2\}<\infty \), \(\forall \varepsilon >0\).

  2. (ii)

    \(h \rightarrow 0\), \(nh \rightarrow \infty \), \(nh^4 \rightarrow 0\).

  3. (iii)

    \(h \rightarrow 0\), \(nh^2 \rightarrow \infty \), \(nh^4 \rightarrow 0\).

In Assumption 2, notice that (iii) is stronger than (ii). On the other hand, the condition \(\sum _{n\geqslant 1}\exp \{-\varepsilon n h^2\}<\infty \) in (i) implies \(nh^2 \rightarrow \infty \) in (iii), but does not entail \(nh^4 \rightarrow 0\).
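For instance, for bandwidths of the form \(h=c\,n^{-\tau }\) with \(c>0\) (the form used in Sect. 4), these conditions reduce to simple restrictions on \(\tau \):

$$\begin{aligned} nh^2=c^2n^{1-2\tau }\rightarrow \infty \; \Longleftrightarrow \; \tau <1/2, \qquad nh^4=c^4n^{1-4\tau }\rightarrow 0 \; \Longleftrightarrow \; \tau >1/4, \end{aligned}$$

and \(\sum _{n\geqslant 1}\exp \{-\varepsilon c^2 n^{1-2\tau }\}<\infty \), \(\forall \varepsilon >0\), whenever \(\tau <1/2\); hence any \(\tau \in (1/4,1/2)\) satisfies Assumption 2 (i), (ii) and (iii).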

2 Almost sure limit

The next theorem gives the almost sure limit of \(R_n(u)\), for each \(u \in (0,1)\).

Theorem 1

Suppose that \({\mathbb {E}}|X|<\infty \), that \({\hat{\sigma }} {\mathop {\longrightarrow }\limits ^{a.s.}} \sigma >0\), that f is uniformly continuous, that K satisfies Assumption 1 (i) and that h satisfies Assumption 2 (i). Let \(u \in (0,1)\). Suppose that there is a unique solution in x of \(F(x-) \leqslant u \leqslant F(x)\). Then, \(R_n(u) {\mathop {\longrightarrow }\limits ^{a.s.}} R(u)\).

Let (a, b) denote the support of F, that is, \(a=\sup \{x\,:\, F(x)=0\}\) and \(b=\inf \{x\,:\, F(x)=1\}\). A key assumption in Theorem 1 to obtain the a.s. convergence is the uniform continuity of f, which is necessary to get the uniform convergence of \({\hat{f}}_n\) to f. This assumption may not hold, especially if either \(a>-\infty \) or \(b<\infty \). Nevertheless, if this assumption fails, we can still obtain the a.s. convergence under other assumptions, as stated in the next theorem.

Theorem 2

Suppose that f is twice continuously differentiable on (ab), that \({\mathbb {E}}(X^2)<\infty \), \({\mathbb {E}}\{f'(X)^2\}<\infty \), \({\mathbb {E}}\{f''(X)^2\}<\infty \) and \({\mathbb {E}}\{X^2f''(X)^2\}<\infty \), that \({\hat{\sigma }} {\mathop {\longrightarrow }\limits ^{a.s.}} \sigma >0\), that K satisfies Assumption 1 (i) and (iii) and that h satisfies Assumption 2 (i) and (ii). Let \(u \in (0,1)\). Suppose that there is a unique solution in x of \(F(x-) \leqslant u \leqslant F(x)\). Then, \(R_n(u) {\mathop {\longrightarrow }\limits ^{a.s.}} R(u)\).

In general, it is not possible to get the uniform a.s. convergence of \(R_n\) to R because the convergence of the empirical quantile function to the population quantile function is not uniform, unless F has bounded support. In that case, the next theorem shows that we also have the uniform convergence of \(R_n\) to R.

Theorem 3

Suppose that \(-\infty<a<b<\infty \), f is continuous in (ab) and \(\displaystyle \inf _{0 \leqslant u \leqslant 1} f\left( F^{-1}(u)\right) >0\). Suppose also that K satisfies Assumption 1 (i), that h satisfies Assumption 2 (i), and that \({\hat{\sigma }} {\mathop {\longrightarrow }\limits ^{a.s.}} \sigma >0\). Then,

$$\begin{aligned}\sup _{0 \leqslant u \leqslant 1}|R_n(u)-R(u)| {\mathop {\longrightarrow }\limits ^{a.s.}} 0.\end{aligned}$$

Let \(w:[0,1] \mapsto {\mathbb {R}}\) be a measurable positive function and let \(L^2(w)\) denote the separable Hilbert space of (equivalence classes of) measurable functions \(f:[0,1] \mapsto {\mathbb {R}}\) satisfying \(\int _0^1 f(u)^2 w(u)du<\infty \); the scalar product and the resulting norm in \(L^2(w)\) will be denoted by \( \langle f, g \rangle _{w}=\int _0^1 f(u)g(u) w(u)du\) and \(\Vert f \Vert _{w}=\sqrt{ \langle f, f \rangle _{w}}\), respectively. If \(w(u)=1\), \(0 \leqslant u \leqslant 1\), then we simply denote \(L^2(w)\), \(\langle \cdot , \cdot \rangle _{w}\) and \(\Vert \cdot \Vert _{w}\) by \(L^2\), \(\langle \cdot , \cdot \rangle \) and \(\Vert \cdot \Vert \), respectively.

As said before, in general, it is not possible to obtain the uniform convergence of \(R_n\) to R. However, under quite general assumptions, it can be shown that \(R_n\) converges to R in \(L^2(w)\). A first result in this sense is given in the following theorem.

Theorem 4

Suppose that \({\mathbb {E}}(X^2)<\infty \), that \({\hat{\sigma }} {\mathop {\longrightarrow }\limits ^{a.s.}} \sigma >0\), that f is uniformly continuous, that K satisfies Assumption 1 (i) and that h satisfies Assumption 2 (i). Then \(R \in L^2\) and \(\Vert R_n-R\Vert {\mathop {\longrightarrow }\limits ^{a.s.}} 0.\)

It readily follows that if \(w:[0,1] \mapsto {\mathbb {R}}\) is a measurable bounded positive function, then the statement in the previous theorem also holds in \(L^2(w)\).

Corollary 1

Let \(w:[0,1] \mapsto {\mathbb {R}}\) be a measurable bounded positive function. Suppose that assumptions in Theorem 4 hold. Then \(R \in L^2(w)\) and \(\Vert R_n-R\Vert _w {\mathop {\longrightarrow }\limits ^{a.s.}} 0.\)

The uniform continuity of f can be replaced with other assumptions.

Theorem 5

Let \(w:[0,1] \mapsto {\mathbb {R}}\) be a measurable bounded positive function. Suppose that f is twice continuously differentiable on (ab), that \({\mathbb {E}}(X^2)<\infty \), \({\mathbb {E}}\{f'(X)^2\}<\infty \), \({\mathbb {E}}\{f''(X)^2\}<\infty \) and \({\mathbb {E}}\{X^2f''(X)^2\}<\infty \), that \({\hat{\sigma }} {\mathop {\longrightarrow }\limits ^{a.s.}} \sigma >0\), that K satisfies Assumption 1 (i) and (iii) and that h satisfies Assumption 2 (i) and (ii). Then \(R \in L^2(w)\) and \(\Vert R_n-R\Vert _w {\mathop {\longrightarrow }\limits ^{a.s.}} 0.\)

Remark 2

The assumptions in the statements of the previous asymptotic results exclude the optimal rate of the bandwidth for density estimation, to wit, \(n^{-1/5}\). Notice, however, that the objective here differs from the mere estimation of the density. This situation is not uncommon in nonparametric literature. For instance, when the target is to estimate a distribution function using the kernel method, Azzalini (1981) showed that the optimal bandwidth is of order \(n^{-1/3}\). Other examples can be found in Pardo-Fernández et al. (2015) and Pardo-Fernández and Jiménez-Gamero (2019).

Remark 3

All properties studied so far were stated for \(R_n\) as an estimator of R. Clearly, these properties carry over to \(S_n\) as an estimator of S; the corresponding statements are omitted to save space. The finite sample performance of \(R_n(u)\) and \(S_n(u)\) as estimators of R(u) and S(u), respectively, will be numerically studied in Sect. 4 for data coming from a uniform distribution.

3 Weak limit

We first study the weak limit of \(\sqrt{n}\left\{ R_{n}(u)-R(u)\right\} \) at each \(u\in (0,1)\).

Theorem 6

Suppose that f is twice continuously differentiable on (ab), that \({\mathbb {E}}(X^2)<\infty \), \({\mathbb {E}}\{f'(X)^2\}<\infty \), \({\mathbb {E}}\{f''(X)^2\}<\infty \) and \({\mathbb {E}}\{X^2f''(X)^2\}<\infty \), that \(\sqrt{n}({\hat{\sigma }}-\sigma )=O_P(1)\), that K satisfies Assumption 1 (i), (iii) and (iv), and that h satisfies Assumption 2 (i) and (iii). Let \(u \in (0,1)\) be such that \(f(F^{-1}(u))>0\), then

$$\begin{aligned}\sqrt{n}\left\{ R_{n}(u)-R(u)\right\} =\frac{1}{\sqrt{n}}\sum _{i=1}^n Y_i(u)+o_P(1),\end{aligned}$$

where

$$\begin{aligned} Y_i(u)=2\big [\{X_i-F^{-1}(u)\}^+ f(X_i)-R(u)\big ]+\frac{\textbf{1}\{X_i \leqslant F^{-1}(u)\}-u }{f(F^{-1}(u))} \mu (u), \end{aligned}$$

\(1\leqslant i \leqslant n\), with

$$\begin{aligned} \mu (u)={\mathbb {E}}\left[ f(X)\textbf{1}\{F^{-1}(u)<X\} \right] , \end{aligned}$$

and therefore

$$\begin{aligned} \sqrt{n}\left\{ R_{n}(u)-R(u)\right\} {\mathop {\longrightarrow }\limits ^{{\mathcal {L}}}} Z\sim N(0, \varrho ^2(u)), \end{aligned}$$

where \(\varrho ^2(u)={\mathbb {E}}\{Y_1(u)^2\}\).

Recall that to estimate R(u) we replaced the population quantile function \(F^{-1}\) with the empirical quantile function \(F_n^{-1}\) and the population density function f with the kernel estimator \({\hat{f}}_n\). Each of these two replacements has an effect on the asymptotic behavior of \(\sqrt{n}\left\{ R_{n}(u)-R(u)\right\} \): (a) the first replacement is responsible for the term \(\frac{\textbf{1}\{X_i \leqslant F^{-1}(u)\}-u }{f(F^{-1}(u))} \mu (u)\) in the expression of \(Y_i(u)\); and (b) the second replacement is responsible for the coefficient 2 in the first part of \(Y_i(u)\). Notice that, under the assumptions made, taking a data-dependent bandwidth, \({\hat{h}}={\hat{\sigma }}g(n)\), has no effect on the asymptotic distribution of \(\sqrt{n}\left\{ R_{n}(u)-R(u)\right\} \).

The result in Theorem 6 can be used to give approximate (asymptotic) confidence intervals for R(u). Let \({\hat{\varrho }}(u)\) denote any consistent estimator of \({\varrho (u)}\) (see the explanation below for a candidate) and let \(z_{v}\) be such that \(\Phi (z_{v})=v\), where \(\Phi \) stands for the cumulative distribution function of the standard normal distribution. Then, for a given \(\alpha \in (0,1)\),

$$\begin{aligned} (R_n(u)-z_{1-\alpha /2}{\hat{\varrho }}(u)/\sqrt{n}, \, R_n(u)+z_{1-\alpha /2}{\hat{\varrho }}(u)/\sqrt{n}) \end{aligned}$$
(2)

is a random confidence interval for R(u) with asymptotic confidence level \(1-\alpha \). If \(Y_1(u), \ldots , Y_n(u)\) were observed, since \(\varrho ^2(u)={\mathbb {E}}\{Y_1(u)^2\}\), which also coincides with the variance of \(Y_1(u)\), \({\mathbb {V}}\{Y_1(u)\}\), one could consistently estimate \(\varrho ^2(u)\) by means of the sample variance of \(Y_1(u), \ldots , Y_n(u)\). The point is that \(Y_1(u), \ldots , Y_n(u)\) depend on unknown quantities. Taking into account that \({\mathbb {V}}\{Y_1(u)\}={\mathbb {V}}\{W_1(u)\}\), where \( W_i(u)=2\{X_i-F^{-1}(u)\}^+ f(X_i)+\textbf{1}\{X_i \leqslant F^{-1}(u)\}\mu (u)/f(F^{-1}(u))\), \(1\leqslant i \leqslant n\), we propose to replace f by \({\hat{f}}_n\) and F by \(F_n\) in the expression of \(W_1(u), \ldots , W_n(u)\), giving rise to

$$\begin{aligned} {\hat{W}}_i(u)= & {} 2\{X_i-F_n^{-1}(u)\}^+ {\hat{f}}_n(X_i)+\textbf{1}\{X_i \leqslant F_n^{-1}(u)\}{\hat{\mu }}(u)/{\hat{f}}_n(F_n^{-1}(u)), \quad 1\leqslant i \leqslant n, \nonumber \\ \end{aligned}$$
(3)

where

$$\begin{aligned} {\hat{\mu }}(u)=\int {\hat{f}}_n(x) \textbf{1}\{F_n^{-1}(u)<x\}dF_n(x)=\frac{1}{n}\sum _{i=1}^n {\hat{f}}_n(X_i) \textbf{1}\{F_n^{-1}(u)<X_i\}, \end{aligned}$$

and then estimate \(\varrho ^2(u)\) by means of the sample variance of \({\hat{W}}_1(u), \ldots , {\hat{W}}_n(u)\), which we denote by \({\hat{\varrho }}^2(u)\). The finite sample performance of the confidence interval in (2), as well as the goodness of \({\hat{\varrho }}^2(u)/n\) as an approximation to the variance of \(R_n(u)\), will be examined in Sect. 4 for data coming from a uniform distribution.
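A sketch of the interval (2) in R, with \({\hat{\varrho }}^2(u)\) computed as the sample variance of the \({\hat{W}}_i(u)\) in (3), could be as follows; the function name ci_R is ours, and the kernel and bandwidth choices are the same as in the function Rn above.

```r
# Asymptotic confidence interval (2) for R(u) at level 1 - alpha (sketch).
ci_R <- function(u, x, alpha = 0.05, tau = 0.45) {
  n <- length(x)
  h <- sd(x) * n^(-tau)
  K <- function(t) 0.75 / sqrt(5) * pmax(1 - (t / sqrt(5))^2, 0)
  fhat <- function(z) sapply(z, function(zi) mean(K((zi - x) / h)) / h)
  xq <- sort(x)[max(ceiling(u * n), 1)]            # F_n^{-1}(u)
  fx <- fhat(x)                                    # hat{f}_n(X_i)
  Rn_u <- mean(pmax(x - xq, 0) * fx)               # R_n(u)
  muh <- mean(fx * (x > xq))                       # hat{mu}(u)
  W <- 2 * pmax(x - xq, 0) * fx + (x <= xq) * muh / fhat(xq)   # hat{W}_i(u) in (3)
  Rn_u + c(-1, 1) * qnorm(1 - alpha / 2) * sd(W) / sqrt(n)     # interval (2)
}
```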

Finally, we study the convergence in law of \(n\Vert R_n-R\Vert _w^2\), for some adequate w. In general, \(n\Vert R_n-R\Vert ^2\) does not possess a weak limit unless rather strong assumptions on F are imposed. This is because the derivations to study such a limit involve those of \(\sqrt{n}\{F_n^{-1}-F^{-1}\}\) in \(L^2\), and therefore, it inherits the same limitations (see, for example, del Barrio et al. 2000, 2005). A convenient way to overcome these difficulties is to consider, instead of \(\Vert \cdot \Vert ^2\), the norm in \(L^2(w)\), with \(w(u)=f^2(F^{-1}(u))\). This weight function is taken for analytical convenience. It may seem a bit odd, because f (and hence F) is unknown in practical applications. Nevertheless, as stated in the Introduction, \(n\Vert R_n-R\Vert _w^2\) could be used as a test statistic for testing goodness-of-fit to a location-scale family (1), and in that case, under the null hypothesis, \(f(F^{-1}(u))=\frac{1}{\varsigma }f_0(F_0^{-1}(u))\), so we can take \(w(u)=f_0(F_0^{-1}(u))\), which is known in that testing framework. Later on, we will discuss other weight functions.

We first show that, under some conditions, the linear approximation for \(\sqrt{n}\{R_{n}(u)-R(u)\} \) given in Theorem 6 for a fixed \(u\in (0,1)\), is valid for all u in certain intervals, in the \(L^2(w)\) sense.

Theorem 7

Suppose that f is twice continuously differentiable on (ab), that \({\mathbb {E}}(X^2)<\infty \), \({\mathbb {E}}\{f'(X)^2\}<\infty \), \({\mathbb {E}}\{f''(X)^2\}<\infty \) and \({\mathbb {E}}\{X^2f''(X)^2\}<\infty \), that \(\sqrt{n}({\hat{\sigma }}-\sigma )=O_P(1)\), that K satisfies Assumption 1 (i), (iii) and (iv), and that h satisfies Assumption 2 (i) and (iii). Suppose also that \(f\left( F^{-1}(u)\right) >0\), \(u \in (0,1)\) and

$$\begin{aligned}\sup _{0< u < 1} u(1-u)\frac{\left| f'\left( F^{-1}(u)\right) \right| }{f^2\left( F^{-1}(u)\right) } \leqslant \gamma ,\end{aligned}$$

for some finite \(\gamma >0\). Let

$$\begin{aligned} A=\lim _{x \downarrow a}f(x)<\infty . \end{aligned}$$

Suppose also that if \(A = 0\) then f is nondecreasing on an interval to the right of a. Let \(w(u)=f^2(F^{-1}(u))\), then

$$\begin{aligned} \sqrt{n}\left\{ R_{n}(u)-R(u)\right\}= & {} \frac{1}{\sqrt{n}}\sum _{i=1}^n Y_i(u)+r_n(u),\quad u \in {\mathbb {I}}_n=\left( 0,\frac{n-1}{n}\right] , \\ \int _{{\mathbb {I}}_n}r_n(u)^2w(u)du= & {} o_P(1). \end{aligned}$$

Corollary 2

Under assumptions in Theorem 7, \(n\Vert R_n-R\Vert _w^2 {\mathop {\longrightarrow }\limits ^{{\mathcal {L}}}} \Vert Z\Vert ^2_w\), where \(\{Z(u), 0 \leqslant u \leqslant 1\}\) is a zero-mean Gaussian process on \(L^2(w)\) with \(Z(1)=0\) and covariance function \(cov \{Z(u),\, Z(s)\}={\mathbb {E}}\{ Y_1(u) \, Y_1(s) \}\), \(u, s \in (0,1)\).

From the proof of Theorem 7 and Theorem 4.6 (i) of del Barrio et al. (2005), the results in Theorem 7 and Corollary 2 keep on being true for any bounded weight function w satisfying

$$\begin{aligned}\int _0^1\frac{u(1-u)}{f^2(F^{-1}(u))}w(u)du<\infty , \quad \lim _{u\uparrow 1}\frac{(1-u)\int _u^1w(s)ds}{f^2(F^{-1}(u))}=0.\end{aligned}$$

Although this result may seem more general than those stated in Theorem 7 and Corollary 2, notice that the choice of an adequate weight function requires a strong knowledge of f.

As observed after Theorem 6, the replacement of the population quantile function \(F^{-1}\) with the empirical quantile function \(F_n^{-1}\) in the expression of R to build the estimator \(R_n\) is responsible for the term \(\frac{\textbf{1}\{X_i \leqslant F^{-1}(u)\}-u }{f(F^{-1}(u))} \mu (u)\) in the expression of \(Y_i(u)\), which makes \(Y_i(u)\) inherit the properties and difficulties of estimating the quantile function. So, one may wonder if there is any advantage in using the right (left) shape function instead of the quantile function. From a theoretical point of view the answer is yes: in order to use the quantile function one must assume certain conditions on the right tail and on the left tail of the distribution, while if one uses the right (left) shape function, then only assumptions on the left (right) tail are necessary. This is because both \(R_n(u)\) and R(u) (respectively, \(L_n(u)\) and L(u)) go to 0 as \(u\uparrow 1\) (respectively, \(u\downarrow 0\)).

Remark 4

The properties studied in this section for \(R_n-R\) are inherited by \(S_n-S\). Specifically, let \(u \in (0,1)\), then

$$\begin{aligned} (S_n(u)-z_{1-\alpha /2}{\hat{\varrho }}_S(u)/\sqrt{n}, \, S_n(u)+z_{1-\alpha /2}{\hat{\varrho }}_S(u)/\sqrt{n}) \end{aligned}$$
(4)

is an approximate (asymptotic) confidence interval for S(u), where \({\hat{\varrho }}_S^2(u)\) is the sample variance of \(V_1(u), \ldots , V_n(u)\), with \(V_i(u)={\hat{W}}_{X,i}(u)-{\hat{W}}_{-X,i}(u)\), where \({\hat{W}}_{X,i}(u)\) are the quantities defined in (3) calculated on the sample \(X_1, \ldots , X_n\), and \({\hat{W}}_{-X,i}(u)\) are the quantities defined in (3) calculated on \(-X_1, \ldots , -X_n\), \(i=1,\ldots ,n\). Notice that if 0 does not belong to the confidence interval (4), one may conclude that the law of X is not symmetric.
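The interval (4) can be sketched along the same lines, computing the \({\hat{W}}\) quantities of (3) on the samples \(X_1, \ldots , X_n\) and \(-X_1, \ldots , -X_n\) and reusing the functions Rn and Sn above; the function names W_hat and ci_S are ours.

```r
# hat{W}_i(u) of (3) for a given sample (sketch; same kernel/bandwidth as above).
W_hat <- function(u, x, tau = 0.45) {
  n <- length(x)
  h <- sd(x) * n^(-tau)
  K <- function(t) 0.75 / sqrt(5) * pmax(1 - (t / sqrt(5))^2, 0)
  fhat <- function(z) sapply(z, function(zi) mean(K((zi - x) / h)) / h)
  xq <- sort(x)[max(ceiling(u * n), 1)]
  fx <- fhat(x)
  2 * pmax(x - xq, 0) * fx + (x <= xq) * mean(fx * (x > xq)) / fhat(xq)
}
# Confidence interval (4) for S(u); V_i(u) = hat{W}_{X,i}(u) - hat{W}_{-X,i}(u).
ci_S <- function(u, x, alpha = 0.05, tau = 0.45) {
  V <- W_hat(u, x, tau) - W_hat(u, -x, tau)
  Sn(u, x, tau) + c(-1, 1) * qnorm(1 - alpha / 2) * sd(V) / sqrt(length(x))
}
```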

4 Some numerical illustrations

4.1 Estimation of R and S

If X has a uniform distribution on the interval (a, b), \(X\sim U(a,b)\), then \(R(u)=0.5(1-u)^2\) (see Table 1 in Arriaza et al. 2019). For several values of n, we have generated 10,000 random samples with size n from a U(0, 1) distribution. For each sample, we have estimated R using \(R_n\), taking as K the Epanechnikov kernel (scaled so that it has variance 1) and \(h=sd\times n^{-\tau }\), with \(\tau =0.35, 0.40, 0.45, 0.49\), and \(sd^2\) denoting the sample variance. Notice that for these choices of h Assumption 2 (i), (ii) and (iii) are met. Tables 1 and 2 show the value of R(u), the bias and the standard deviation of the values of \(R_n(u)\), the mean of the standard deviation estimator \(\hat{\varrho }(u)/\sqrt{n}\) (recall that it is based on asymptotic arguments), and the coverage of the confidence interval (2) calculated at the nominal level 95%, for \(u=0.1, \ldots , 0.9\) and \(n=100, \,250, \,500, \,1000\). Figure 1 displays the graph of 1000 estimations for \(\tau =0.45\) in grey together with the population shape function in black. Looking at these tables and the figure, we see that the bias and the variance become smaller as u approaches 1; the standard deviation estimator is, on average, a bit larger than the true standard deviation, especially for smaller sample sizes; the bias depends on the values of \(\tau \) and u, being negative for smaller values of \(\tau \) and for larger values of u; as for the coverage of the confidence interval, it also depends on the values of \(\tau \) and u: it is rather poor for smaller values of \(\tau \) and larger values of u, because in such cases the value of the bias in relation to the standard deviation estimator is non-negligible. As expected from Theorem 3, the differences between \(R_n(u)\) and R(u) become smaller as n increases, uniformly in u.
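The benchmark value \(R(u)=0.5(1-u)^2\) follows from a one-line computation: for \(X\sim U(a,b)\), \(f(x)=1/(b-a)\) on (a, b) and \(F^{-1}(u)=a+u(b-a)\), so

$$\begin{aligned} R(u)=\int _{F^{-1}(u)}^{b}\frac{x-F^{-1}(u)}{(b-a)^2}\,dx=\frac{\{b-F^{-1}(u)\}^2}{2(b-a)^2}=\frac{(1-u)^2}{2}, \quad u \in (0,1). \end{aligned}$$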

Table 1 Values of R(u), bias \(\times 10^3\) (bi) of \(R_n(u)\), standard deviation \(\times 10^3\) (se) of \(R_n(u)\), the mean of the standard deviation estimator \(\hat{\varrho }(u)/\sqrt{n}\) \(\times 10^3\) (ms), and the coverage of the confidence interval (2) calculated at the nominal level 95% (co), calculated with \(h=sd\times n^{-\tau }\), for \(u=0.1, \ldots , 0.9\), and sample sizes \(n=100,250\)
Table 2 Values of R(u), bias \(\times 10^3\) (bi) of \(R_n(u)\), standard deviation \(\times 10^3\) (se) of \(R_n(u)\), the mean of the standard deviation estimator \(\hat{\varrho }(u)/\sqrt{n}\) \(\times 10^3\) (ms), and the coverage of the confidence interval (2) calculated at the nominal level 95% (co), calculated with \(h=sd\times n^{-\tau }\), for \(u=0.1, \ldots , 0.9\), and sample sizes \(n=500,1000\)

A similar experiment was carried out for the estimation of S, whose results are summarized in Tables 3 and 4 and Fig. 2. Looking at these tables and the figure, we see that the bias is quite small in all cases; that the variance becomes smaller as u approaches 1 and also decreases with n; and that the standard deviation estimator is, on average, a bit larger than the true standard deviation, with the differences becoming smaller as n increases; this explains why the coverage of the confidence intervals is larger than the nominal value, especially for small sample sizes.

4.2 Glass fibre breaking strengths

As an illustration, we will analyse a real data set already considered in Arriaza et al. (2019) and previously introduced by Smith and Naylor (1987). The set consists of 63 observations of the breaking strength of glass fibres of length 1.5 cm collected by the National Physical Laboratory in England (for more details about the data set, see Smith and Naylor 1987). The left panel of Fig. 3 depicts the histogram and the kernel density estimator obtained from the data. As discussed in Arriaza et al. (2019), this exploratory analysis suggests a certain negative skewness in the distribution. The right panel of Fig. 3 displays the estimator \(S_n(u)\), \(u \in (0,1)\), with \({\hat{h}}=sd\times n^{-0.45}\) and taking as K the Epanechnikov kernel (scaled so that it has variance 1). Other values for \({\hat{h}}\) have been investigated and similar results were obtained. The graph also displays the confidence intervals in (4) for S(u) calculated at the nominal level 90% for \(u=0.1, \ldots , 0.9\). The confidence intervals are also detailed in Table 5. Notice that the estimator of the function S tends to lie above the horizontal axis, which indicates asymmetry in the distribution. Moreover, the confidence intervals for S(0.1), S(0.2) and S(0.4) do not contain zero. This conclusion is in agreement with Arriaza et al. (2019).

Fig. 1
figure 1

Graphs of 1000 generated estimations calculated with \(h=sd \times n^{-0.45}\) in grey together with the population shape function R(u) in black

4.3 An application to testing goodness-of-fit

Now we consider the problem of testing goodness-of-fit to a uniform distribution, that is, we want to test

$$\begin{aligned} \begin{array}{ll} H_0: &{} X\sim U(a,b), \quad \text{ for } \text{ some } a<b, \,\, a, b \in {\mathbb {R}},\\ H_1: &{} X\not \sim U(a,b), \quad \text{ for } \text{ all } a<b, \,\, a, b \in {\mathbb {R}}, \end{array} \end{aligned}$$

on the basis of a sample of size n from X. Let \(f_0\) and \(F_0\) denote the probability density function and the cumulative distribution function of the U(0, 1) law, respectively. Then \(w(u)=f_0(F_0^{-1}(u))=\textbf{1}\{0 \leqslant u \leqslant 1\}\), and thus \(\Vert \cdot \Vert =\Vert \cdot \Vert _w\). Let R denote the right shape function of X and let \(R_0\) denote the right shape function of a uniform distribution, \(R_0(u)=0.5(1-u)^2\). From Theorem 3, it follows that \(\Vert R_n-R_0\Vert \) converges to \(\Vert R-R_0\Vert \), which is equal to 0 under the null and to a positive quantity under alternatives. Thus, it seems reasonable to consider the test that rejects \(H_0\) for large values of \(T_n=n\Vert R_n-R_0\Vert ^2\). The critical region is \(T_n \geqslant t_{n,\alpha }\), where \( t_{n,\alpha }\) is the \(\alpha \) upper percentile of the null distribution of \(T_n\), whose value can be calculated by simulation by generating data from a U(0, 1) law, since the null distribution of \(T_n\) does not depend on the values of a and b, but only on \(F_0\). Notice that \(T_n\) has the readily computable expression

$$\begin{aligned} T_n = \sum _{k=1}^{n-1}r_k^2+\frac{n}{20}-\frac{n}{3}\sum _{k=1}^{n-1}c_k r_k, \end{aligned}$$

where

$$\begin{aligned} r_k = \frac{1}{n}\sum _{i=k+1}^n (X_{i:n}-X_{k:n}){\hat{f}}_n(X_{i:n}) \quad \text { and } \quad c_k = \left( 1-(k-1)/n\right) ^3-(1-k/n)^3. \end{aligned}$$
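A sketch of the computation of \(T_n\) from this closed-form expression, with the kernel and bandwidth choices used elsewhere in this section, could be as follows; the function name Tn is ours.

```r
# T_n = sum r_k^2 + n/20 - (n/3) sum c_k r_k (sketch; Epanechnikov kernel with
# variance 1 and h = sd * n^(-tau), as in the rest of this section).
Tn <- function(x, tau = 0.45) {
  n <- length(x)
  h <- sd(x) * n^(-tau)
  xs <- sort(x)
  K <- function(t) 0.75 / sqrt(5) * pmax(1 - (t / sqrt(5))^2, 0)
  fhat <- sapply(xs, function(xi) mean(K((xi - x) / h)) / h)     # hat{f}_n(X_{i:n})
  k <- 1:(n - 1)
  r <- sapply(k, function(j) sum((xs[(j + 1):n] - xs[j]) * fhat[(j + 1):n]) / n)
  ck <- (1 - (k - 1) / n)^3 - (1 - k / n)^3
  sum(r^2) + n / 20 - (n / 3) * sum(ck * r)
}
# The critical point t_{n,alpha} can be approximated by simulation, e.g.
# quantile(replicate(10000, Tn(runif(50))), 0.95).
```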

There are many tests in the statistical literature for testing \(H_0\) against \(H_1\), and the objective of this section is not to provide an exhaustive list of such tests, but only to suggest a possible application of the results stated in the previous sections. In our view, this as well as other possible applications deserve further separate research, out of the scope of this manuscript. Nevertheless, since \(T_n\) is closely related to the Wasserstein distance between \(F_n\) and \(F_0\), which equals the \(L^2\) norm of the difference between the empirical quantile function and the quantile function of \(F_0\), we carried out a small simulation experiment in order to compare the powers of the newly proposed tests and the one based on the Wasserstein distance. To make the Wasserstein distance invariant with respect to location and scale changes, we consider as test statistic \(W_n=n\Vert F_n^{-1}-F_0^{-1}\Vert ^2/{\hat{\sigma }}^2\), with \({\hat{\sigma }}^2\) denoting the sample variance, and reject the null hypothesis for large values of \(W_n\) (see del Barrio et al. 2000). The critical region is \(W_n \geqslant w_{n,\alpha }\), where \( w_{n,\alpha }\) is the \(\alpha \) upper percentile of the null distribution of \(W_n\), whose value can be calculated by simulation by generating data from a U(0, 1) law, since the null distribution of \(W_n\) does not depend on the values of a and b, but only on \(F_0\).
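Since \(F_0^{-1}(u)=u\) and \(F_n^{-1}\) is piecewise constant, \(W_n\) also admits a closed form; a sketch (the function name Wn is ours) is:

```r
# Location-scale invariant Wasserstein statistic W_n for the uniform null
# (sketch): the integral of (X_{k:n} - u)^2 over ((k-1)/n, k/n] is explicit.
Wn <- function(x) {
  n <- length(x)
  xs <- sort(x)
  k <- 1:n
  int <- ((xs - (k - 1) / n)^3 - (xs - k / n)^3) / 3
  n * sum(int) / var(x)
}
```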

Table 3 Values of S(u), bias \(\times 10^3\) (bi) of \(S_n(u)\), standard deviation \(\times 10^3\) (se) of \(S_n(u)\), the mean of the standard deviation estimator \(\hat{\varrho }_S(u)/\sqrt{n}\) \(\times 10^3\) (ms), and the coverage of the confidence interval (4) calculated at the nominal level 95% (co), calculated with \(h=sd\times n^{-\tau }\), for \(u=0.1, \ldots , 0.9\), and sample sizes \(n=100,250\)
Table 4 Values of S(u), bias \(\times 10^3\) (bi) of \(S_n(u)\), standard deviation \(\times 10^3\) (se) of \(S_n(u)\), the mean of the standard deviation estimator \(\hat{\varrho }_S(u)/\sqrt{n}\) \(\times 10^3\) (ms), and the coverage of the confidence interval (4) calculated at the nominal level 95% (co), calculated with \(h=sd\times n^{-\tau }\), for \(u=0.1, \ldots , 0.9\), and sample sizes \(n=500,1000\)

Table 6 displays the values \(t_{n,\alpha }\) and \(w_{n,\alpha }\) for \(n=30, \, 50\) and \(\alpha =0.05\), calculated by generating 100,000 samples. To calculate \(T_n\) we used the Epanechnikov kernel (scaled so that it has variance 1) and \({\hat{h}}=sd \times n^{-\tau }\), \(\tau =0.26,\, 0.30, \, 0.35, \, 0.40, \, 0.45, \, 0.49\). As alternatives, we considered several members of the log-Lindley distribution, a family with support on (0,1) with a large variety of shapes (see Gómez-Déniz et al. 2014; Jodrá and Jiménez-Gamero 2016) and probability density function

$$\begin{aligned} f(x; \kappa , \lambda )=\frac{\kappa ^2}{1+\kappa \lambda }(\lambda -\log (x))x^{\kappa -1}, \quad x \in (0,1), \end{aligned}$$

for some \(\kappa >0\) and \(\lambda \geqslant 0\). Figure 4 represents the empirical power of the two tests for \(\kappa =1.5, \, 2, \, 2.5\) and \(0.6 \leqslant \lambda \leqslant 5\), calculated by generating 10,000 samples for each combination of the parameter values. Looking at Fig. 4 we see that, although according to Corollary 2 the asymptotic null distribution of \(T_n\) does not depend on the value of \({\hat{h}}\), for finite sample sizes the bandwidth does have an effect on the power of the test, which is higher for larger values of \({\hat{h}}\). As expected from the results in Janssen (2000), no test has the largest power against all alternatives. The power increases with the sample size.

Fig. 2
figure 2

Graphs of 1000 generated estimations of S(u) calculated with \(h=sd \times n^{-0.45}\) in grey together with the population function S(u) in black

Fig. 3
figure 3

Left: histogram and kernel density estimator for Glass fibre breaking strengths data. Right: graph of \(S_n(u)\), \(u \in (0,1)\), and confidence intervals at the nominal level 90% calculated with \({\hat{h}}=sd\times n^{-0.45}\) for \(u=0.1, \ldots , 0.9\)

Table 5 Confidence intervals (4) for S(u) at nominal level 90%, calculated with \({\hat{h}}=sd\times n^{-0.45}\), for \(u=0.1, \ldots , 0.9\), for the Glass fibre breaking strengths data
Table 6 Critical points for \(n=30,\, 50\) with \(h=sd \times n^{-\tau }\) and \(\alpha =0.05\)

5 Conclusions and further research

As seen in the Introduction, shape functions have been shown to have interesting properties. From a practical point of view, estimators of these functions need to be proposed and studied. This paper has focused on nonparametric estimators of the shape functions. The proposed estimators have been studied both theoretically and numerically. They exhibit nice asymptotic properties, and the numerical experiments show reasonable practical behavior. The optimal choice of the smoothing parameter involved in the construction of the estimator has not been addressed in this work. This issue deserves more investigation and will be considered in future studies.

6 Proofs

This section sketches the proofs of the results stated in the previous sections. Throughout this section, M is a generic positive constant that may take different values across the proofs, and \(f_n(x)\) is defined as \({\hat{f}}_n(x)\) with \({\hat{h}}={\hat{\sigma }} \times g(n)\) replaced with \(h=\sigma \times g(n)\).

Fig. 4
figure 4

Empirical power based on 10,000 samples with size \(n=30, \, 50\) from a log-Lindley distribution

Lemma 1

Suppose that \({\hat{\sigma }} {\mathop {\longrightarrow }\limits ^{a.s.}} \sigma >0\), that K satisfies Assumption 1 (i) and that h satisfies Assumption 2 (i). Then \(\displaystyle \sup _x |{\hat{f}}_n(x)-f_n(x)| {\mathop {\longrightarrow }\limits ^{a.s.}} 0\).

Proof

We have that

$$\begin{aligned} {\hat{f}}_n(x)-{f}_n(x)=\delta _1(x)+\delta _2(x), \end{aligned}$$
(5)

with

$$\begin{aligned} \delta _1(x)= & {} \left( \frac{1}{n{\hat{h}}}- \frac{1}{nh} \right) \sum _{j=1}^nK\left( \frac{X_j-x}{h}\right) , \end{aligned}$$
(6)
$$\begin{aligned} \delta _2(x)= & {} \frac{1}{n{\hat{h}}}\sum _{j=1}^n \left\{ K\left( \frac{X_j-x}{{\hat{h}}}\right) -K\left( \frac{X_j-x}{h}\right) \right\} . \end{aligned}$$
(7)

Since

$$\begin{aligned} \delta _1(x)=\frac{\sigma -{\hat{\sigma }}}{{\hat{\sigma }}}\frac{1}{nh}\sum _{j=1}^nK\left( \frac{X_j-x}{h}\right) =\frac{\sigma -{\hat{\sigma }}}{{\hat{\sigma }}}f_n(x), \end{aligned}$$

we can write

$$\begin{aligned} |\delta _1(x)|\leqslant \left| \frac{\sigma -{\hat{\sigma }}}{{\hat{\sigma }}}\right| \left\{ \sup _x |f_n(x)-{\mathbb {E}}\{f_n(x)\}|+\sup _x{\mathbb {E}}\{f_n(x)\} \right\} . \end{aligned}$$
(8)

Recall that if K is Lipschitz continuous and has compact support, then it has bounded variation. Under the assumptions made on K and h we have that (see, for example, the proof of Theorem 2.1.3 of Prakasa Rao 1983)

$$\begin{aligned} \sup _x |f_n(x)-{\mathbb {E}}\{f_n(x)\}| {\mathop {\longrightarrow }\limits ^{a.s.}} 0. \end{aligned}$$
(9)

Since f is bounded,

$$\begin{aligned} {\mathbb {E}}\{f_n(x)\}=\int _{-\infty }^{\infty }K(u)f(x-hu)du\leqslant M \int _{-\infty }^{\infty }K(u)du=M, \quad \forall x. \end{aligned}$$
(10)

Using (8), (9), (10) and that \({\hat{\sigma }} {\mathop {\longrightarrow }\limits ^{a.s.}} \sigma >0\), it follows that

$$\begin{aligned} \sup _x |\delta _1(x)| {\mathop {\longrightarrow }\limits ^{a.s.}} 0. \end{aligned}$$
(11)

Now we study \(\delta _2(x)\). Since K is Lipschitz continuous and has compact support, \(\textrm{Sup}(K)\), we have that

$$\begin{aligned}{} & {} D(x) = \frac{1}{n{\hat{h}}}\sum _{j=1}^n \left| K\left( \frac{X_j-x}{{\hat{h}}}\right) -K\left( \frac{X_j-x}{h}\right) \right| \\{} & {} \leqslant M \frac{\sigma }{{\hat{\sigma }}} \left| \frac{\sigma }{{\hat{\sigma }}} -1 \right| \frac{1}{nh} \sum _{j=1}^n \left| \frac{X_j-x}{h}\right| \textbf{1}\{ (X_j-x)/h\in \textrm{Sup}(K) \text{ or } (X_j-x)/{\hat{h}} \in \textrm{Sup}(K) \}. \end{aligned}$$

Since \({\hat{\sigma }} {\mathop {\longrightarrow }\limits ^{a.s.}} \sigma >0\), it follows that, for n large enough, \(\textbf{1}\{ (X_j-x)/h \in \textrm{Sup}(K) \text{ or } (X_j-x)/{\hat{h}} \in \textrm{Sup}(K) \} \leqslant \textbf{1}\{ |X_j-x|/h \leqslant C \}\), for a certain positive constant C. Therefore

$$\begin{aligned} D(x) \leqslant M \frac{\sigma }{{\hat{\sigma }}} \left| \frac{\sigma }{{\hat{\sigma }}} -1 \right| f_{U,n}(x), \end{aligned}$$
(12)

where \(f_{U,n}(x)\) is the kernel estimator of f built by using as kernel the probability density function of the uniform law on the interval \([-C,C]\). Proceeding as before, we have that

$$\begin{aligned} \sup _x f_{U,n}(x) \leqslant M \quad {a.s.}, \end{aligned}$$
(13)

Using that \( |\delta _2(x)|\leqslant D(x)\), (12), (13) and that \({\hat{\sigma }} {\mathop {\longrightarrow }\limits ^{a.s.}} \sigma >0\), it follows that

$$\begin{aligned} \sup _x |\delta _2(x)| {\mathop {\longrightarrow }\limits ^{a.s.}} 0. \end{aligned}$$
(14)

The result follows from (5), (11) and (14). \(\square \)

Remark 5

From the previous proof, notice that if in the statement of Lemma 1 the assumption \({\hat{\sigma }} {\mathop {\longrightarrow }\limits ^{a.s.}} \sigma \) is replaced with \({\hat{\sigma }} {\mathop {\longrightarrow }\limits ^{P}} \sigma \) then it is concluded that \(\displaystyle \sup _x |{\hat{f}}_n(x)-f_n(x)| {\mathop {\longrightarrow }\limits ^{P}} 0\).

Proof of Theorem 1

We have that

$$\begin{aligned} R_n(u)=R_{1n}(u)+R_{2n}(u)+R_{3n}(u)+R_{4n}(u), \end{aligned}$$
(15)

where

$$\begin{aligned} R_{1n}(u)= & {} \frac{1}{n}\sum _{i=1}^n\{X_i-F^{-1}(u)\}^+f(X_i), \\ R_{2n}(u)= & {} \frac{1}{n}\sum _{i=1}^n\{X_i-F^{-1}(u)\}^+\left\{ f_n(X_i)-f(X_i)\right\} , \\ R_{3n}(u)= & {} \frac{1}{n}\sum _{i=1}^n\left[ \{X_i-F^{-1}_n(u)\}^+-\{X_i-F^{-1}(u)\}^+\right] {\hat{f}}_n(X_i), \\ R_{4n}(u)= & {} \frac{1}{n}\sum _{i=1}^n\{X_i-F^{-1}(u)\}^+\left\{ {\hat{f}}_n(X_i)-f_n(X_i)\right\} . \\ \end{aligned}$$

From the SLLN, it follows that

$$\begin{aligned} R_{1n}(u) {\mathop {\longrightarrow }\limits ^{a.s.}} R(u). \end{aligned}$$
(16)

As for \(R_{2n}(u)\), we have that

$$\begin{aligned} \left| R_{2n}(u) \right| \leqslant \sup _x |f_n(x)-f(x)| \left\{ \frac{1}{n}\sum _{i=1}^n|X_i|+|F^{-1}(u)|\right\} . \end{aligned}$$

Under the assumptions made (see, for example, Theorem 2.1.3 in Prakasa Rao 1983)

$$\begin{aligned} \sup _x |f_n(x)-f(x)| {\mathop {\longrightarrow }\limits ^{a.s.}} 0. \end{aligned}$$
(17)

Taking into account that \(\frac{1}{n}\sum _{i=1}^n|X_i| {\mathop {\longrightarrow }\limits ^{a.s.}} {\mathbb {E}}|X|<\infty \), it follows that

$$\begin{aligned} R_{2n}(u) {\mathop {\longrightarrow }\limits ^{a.s.}} 0. \end{aligned}$$
(18)

In order to study \(R_{3n}(u)\) we first observe that

$$\begin{aligned}{} & {} \left\{ X_i-F_n^{-1}(u) \right\} ^+-\left\{ X_i-F^{-1}(u) \right\} ^+\nonumber \\ {}{} & {} \quad = \left\{ \begin{array}{lll} 0 &{} \text{ if } &{} X_i \leqslant \min \{F_n^{-1}(u), F^{-1}(u)\},\\ X_i-F_n^{-1}(u) &{} \text{ if } &{} F_n^{-1}(u)<X_i\leqslant F^{-1}(u),\\ -X_i+F^{-1}(u) &{} \text{ if } &{} F^{-1}(u)<X_i \leqslant F^{-1}_n(u),\\ F^{-1}(u)-F_n^{-1}(u) &{} \text{ if } &{} X_i > \max \{F_n^{-1}(u), F^{-1}(u)\}, \end{array} \right. \end{aligned}$$
(19)

which implies

$$\begin{aligned} \left| \left\{ X_i-F_n^{-1}(u) \right\} ^+-\left\{ X_i-F^{-1}(u) \right\} ^+ \right| \leqslant \left| F^{-1}(u)-F_n^{-1}(u) \right| , \quad \forall i, \end{aligned}$$

and therefore

$$\begin{aligned} \left| R_{3n}(u) \right| \leqslant |F_n^{-1}(u)-F^{-1}(u)| \frac{1}{n}\sum _{i=1}^n {\hat{f}}_n(X_i). \end{aligned}$$

Under the assumptions made \(|F_n^{-1}(u)-F^{-1}(u)| {\mathop {\longrightarrow }\limits ^{a.s.}} 0\) (see, for example display (1.4.9) in Csörgő 1983). We also have that

$$\begin{aligned} \frac{1}{n} \sum _{i=1}^n {\hat{f}}_n(X_i) \leqslant \frac{1}{n}\sum _{i=1}^nf(X_i)+ \sup _x |f_n(x)-f(x)|+ \sup _x |{\hat{f}}_n(x)-f_n(x)|. \end{aligned}$$
(20)

From (17), Lemma 1 and taking into account that f is bounded, it follows that the right hand-side of inequality (20) is bounded a.s. Therefore,

$$\begin{aligned} R_{3n}(u) {\mathop {\longrightarrow }\limits ^{a.s.}} 0. \end{aligned}$$
(21)

We have that

$$\begin{aligned} R_{4n}(u)=R_{41n}(u)+R_{42n}(u), \end{aligned}$$
(22)

where

$$\begin{aligned} R_{41n}(u)=\frac{1}{n}\sum _{i=1}^n\{X_i-F^{-1}(u)\}^+\delta _1(X_i) \end{aligned}$$

and

$$\begin{aligned} R_{42n}(u)=\frac{1}{n}\sum _{i=1}^n\{X_i-F^{-1}(u)\}^+\delta _2(X_i), \end{aligned}$$

where \(\delta _1(x)\) and \(\delta _2(x)\) are as defined in (6) and (7), respectively. Since

$$\begin{aligned} R_{41n}(u)=(\sigma -{\hat{\sigma }})\{R_{1n}(u)+R_{2n}(u)\}/{\hat{\sigma }} \end{aligned}$$
(23)

and \({\hat{\sigma }} {\mathop {\longrightarrow }\limits ^{a.s.}} \sigma \), from (16) and (18), it follows that

$$\begin{aligned} R_{41n}(u) {\mathop {\longrightarrow }\limits ^{a.s.}} 0. \end{aligned}$$
(24)

Using (12), we get that

$$\begin{aligned} |R_{42n}(u)| \leqslant M \frac{\sigma }{{\hat{\sigma }}} \left| \frac{\sigma }{{\hat{\sigma }}} -1 \right| R_{U,n}(u), \end{aligned}$$
(25)

where

$$\begin{aligned} R_{U,n}(u)=\frac{1}{n}\sum _{i=1}^n\{X_i-F^{-1}(u)\}^+f_{U,n}(X_i). \end{aligned}$$

Proceeding as before, it can be seen that \( R_{U,n}(u) {\mathop {\longrightarrow }\limits ^{a.s.}} R(u)\), and hence

$$\begin{aligned} R_{42n}(u) {\mathop {\longrightarrow }\limits ^{a.s.}} 0. \end{aligned}$$
(26)

The result follows from (15), (16), (18), (21), (22), (24) and (26). \(\square \)

Proof of Theorem 2

Let us consider decomposition (15). We have that \(R_{1n}(u)+R_{2n}(u)=T_{1n}(u)+T_{2n}(u)\) with

$$\begin{aligned} T_{1n}(u)=K(0)\frac{1}{n^2h}\sum _{i=1}^n\{X_i-F^{-1}(u)\}^+, \end{aligned}$$

and \(T_{2n}(u)=\frac{n-1}{n}U_n(u)\), where \(U_n(u)=\frac{1}{n(n-1)}\sum _{i \ne j}H_n(X_i,X_j;u)\) is a degree two U-statistic with symmetric kernel

$$\begin{aligned} H_n(X_i,X_j; u)=\frac{1}{2} \left[ \{X_i-F^{-1}(u)\}^+ + \{X_j-F^{-1}(u)\}^+\right] \frac{1}{h}K\left( \frac{X_i-X_j}{h}\right) . \end{aligned}$$

We first see that

$$\begin{aligned} T_{1n}(u) {\mathop {\longrightarrow }\limits ^{a.s.}} 0. \end{aligned}$$
(27)

Notice that under the assumptions made,

$$\begin{aligned} {\mathbb {E}}\left[ \{X-F^{-1}(u)\}^+\right] <\infty , \end{aligned}$$
(28)

therefore, by the SLLN,

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\{X_i-F^{-1}(u)\}^+ {\mathop {\longrightarrow }\limits ^{a.s.}} {\mathbb {E}}\left[ \{X-F^{-1}(u)\}^+\right] <\infty . \end{aligned}$$

Finally, since K is bounded and \(nh \rightarrow \infty \), (27) follows.

Next we will see that

$$\begin{aligned} T_{2n}(u) {\mathop {\longrightarrow }\limits ^{a.s.}} R(u). \end{aligned}$$
(29)

With this aim, we first calculate \({\mathbb {E}}\{U_n(u)\}\).

$$\begin{aligned} {\mathbb {E}}\{U_n(u)\}={\mathbb {E}}\{H_n(X_1, X_2; u)\}=\int \{x-F^{-1}(u)\}^+ \left( \int K(y)f(x-hy)dy\right) f(x)dx. \end{aligned}$$

Since \(\int K(y)f(x-hy)dy \rightarrow f(x)\), \(\{x-F^{-1}(u)\}^+ \left( \int K(y)f(x-hy)dy\right) f(x) \leqslant M \{x-F^{-1}(u)\}^+\) and (28), by the dominated convergence theorem we have that \({\mathbb {E}}\{U_n(u)\} \rightarrow R(u)\).

Now, let \(A_n(x; u)={\mathbb {E}}\left\{ H_n(x,X;u) \right\} \) and \(\tau =\int x^2K(x)dx\). Routine calculations show that

$$\begin{aligned} A_n(x; u)= & {} A(x; u) +a_n(x; u),\\ A(x; u)= & {} \{x-F^{-1}(u)\}^+ f(x),\\ a_n(x; u)= & {} 0.5\tau h^2f''(x)\{x-F^{-1}(u)\}^+ +0.5 \tau h^2f'(x)+o(h^2). \end{aligned}$$

Let \(\varepsilon >0\). From the stated assumptions \({\mathbb {E}}\{a_n(X, u)^2\} = O(h^4)\), therefore

$$\begin{aligned} P\left( \frac{1}{n}\sum _{i=1}^n a_n(X_i; u)>\varepsilon \right) \leqslant \frac{1}{\varepsilon ^2}{\mathbb {E}}\left[ \left\{ \frac{1}{n}\sum _{i=1}^n a_n(X_i; u)\right\} ^2\right] =O(h^4). \end{aligned}$$

Since \(nh^4 \rightarrow 0\), it follows that

$$\begin{aligned} \sum _{n \geqslant 1} P\left( \frac{1}{n}\sum _{i=1}^n a_n(X_i; u)>\varepsilon \right) <\infty , \end{aligned}$$

which implies that

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n a_n(X_i; u) {\mathop {\longrightarrow }\limits ^{a.s.}} 0. \end{aligned}$$

Let \(B_n(u)=U_n(u)+ {\mathbb {E}}\{H_n(X_1,X_2; u)\}-\frac{2}{n}\sum _{i=1}^nA_n(X_i; u)\). It can be seen that \( {\mathbb {E}} \left\{ B_n(u)^2 \right\} =O\{1/(n^2h)\}. \) Reasoning as before, we get that

$$\begin{aligned} B_n(u) {\mathop {\longrightarrow }\limits ^{a.s.}} 0. \end{aligned}$$

Summarizing,

$$\begin{aligned}T_{2n}(u)=\frac{2}{n}\sum _{i=1}^nA(X_i; u)-R(u)+t_{2n}(u),\end{aligned}$$

with \(t_{2n}(u) {\mathop {\longrightarrow }\limits ^{a.s.}} 0\). By the SLLN we have that \(\frac{1}{n}\sum _{i=1}^nA(X_i; u) {\mathop {\longrightarrow }\limits ^{a.s.}} R(u)\), and thus (29) is proven.

Finally, proceeding as in the proof of Theorem 1, one gets that \(R_{in}(u) {\mathop {\longrightarrow }\limits ^{a.s.}} 0\), \(i=3,4\). This completes the proof. \(\square \)

Lemma 2

Let f be a probability density function with bounded support (a, b), continuous on (a, b). Suppose also that K satisfies Assumption 1 (ii) and that h satisfies Assumption 2 (ii). Then \( \displaystyle \sup _{x \in (a,b)}|f_n(x)-f(x)|{\mathop {\longrightarrow }\limits ^{a.s.}} 0\).

Proof

From the proof of Theorem 2.1.3 in Prakasa Rao (1983), we have that

$$\begin{aligned} \sup _{x}|f_n(x)-{\mathbb {E}}\{f_n(x)\}| {\mathop {\longrightarrow }\limits ^{a.s.}} 0. \end{aligned}$$

So, it suffices to see that

$$\begin{aligned} \sup _{x \in (a,b)}|{\mathbb {E}}\{f_n(x)\}-f(x)|{\longrightarrow } 0. \end{aligned}$$
(30)

We have that

$$\begin{aligned}{\mathbb {E}}\{f_n(x)\}=\int K(y)f(x-hy)dy.\end{aligned}$$

From the continuity of f, for each fixed \(y \in {\mathbb {R}}\)

$$\begin{aligned} f(x-hy) \rightarrow f(x), \quad \forall x \in (a,b), \end{aligned}$$

and

$$\begin{aligned} K(y) f(x-hy) \leqslant M K(y).\end{aligned}$$

Thus, (30) holds true by the dominated convergence theorem. \(\square \)

Proof of Theorem 3

First of all, we see that, under the assumptions made, R is a continuous function on [0, 1]. Since the function

$$\begin{aligned} G:&{\mathbb {R}}\times (0,1)&\mapsto {\mathbb {R}}\\&(x,u)&\mapsto G(x,u)= \{x-F^{-1}(u)\}^+f^2(x) \end{aligned}$$

is continuous, it follows that \(R(u)=\int G(x,u)dx\) is continuous on (0, 1). Recall that \(\displaystyle \lim _{u \rightarrow 1^-}R(u)=0=R(1)\) and, if the limit exists, \(R(0)=\displaystyle \lim _{u \rightarrow 0^+}R(u)\). Thus, R is continuous at \(u=1\). To see that it is also continuous at \(u=0\) it suffices to see that the limit \(\displaystyle \lim _{u \rightarrow 0^+}R(u)\) exists, which is true since

$$\begin{aligned}{} & {} \lim _{u \rightarrow 0^+} \{x-F^{-1}(u)\}^+f^2(x)=(x-a)f^2(x), \\{} & {} \{x-F^{-1}(u)\}^+f^2(x) \leqslant c(x)=M(x-a)\textbf{1}\{a \leqslant x \leqslant b\} \end{aligned}$$

and \(\int c(x)dx<\infty \); therefore, by the dominated convergence theorem,

$$\begin{aligned} \lim _{u \rightarrow 0^+}R(u)= \int \{x-a\}^+f^2(x)dx<\infty . \end{aligned}$$

Next, we consider decomposition (15) and study each term on the right hand-side of that expression. Since R is a continuous function on [0, 1] and \(R_{1n}\) is nonincreasing in u, the pointwise convergence of \(R_{1n}(u)\) to R(u) implies the uniform convergence on [0, 1], that is

$$\begin{aligned} \sup _{0 \leqslant u \leqslant 1} \left| R_{1n}(u)-R(u)\right| {\mathop {\longrightarrow }\limits ^{a.s.}} 0. \end{aligned}$$
(31)

As for \(R_{2n}(u)\), taking into account that \(\frac{1}{n}\sum _{i=1}^n \left\{ X_i-F^{-1}(u)\right\} ^+ \leqslant {\bar{X}}-a\), with \({\bar{X}}=(1/n)\sum _{i=1}^n X_i\), we have that

$$\begin{aligned} \sup _{0 \leqslant u \leqslant 1} \left| R_{2n}(u) \right| \leqslant \sup _{x \in [a,b]} |f_n(x)-f(x)| ({\bar{X}}-a). \end{aligned}$$

From Lemma 2, taking into account that Assumption 1 (i) implies Assumption 1 (ii) and that \({\bar{X}}-a {\mathop {\longrightarrow }\limits ^{a.s.}} {\mathbb {E}}(X)-a < \infty \), we conclude that

$$\begin{aligned} \sup _{0 \leqslant u \leqslant 1} \left| R_{2n}(u) \right| {\mathop {\longrightarrow }\limits ^{a.s.}} 0. \end{aligned}$$
(32)

From (31) and (32), we conclude that

$$\begin{aligned} \sup _{0 \leqslant u \leqslant 1} \left| R_{1n}(u)+R_{2n}(u)-R(u)\right| {\mathop {\longrightarrow }\limits ^{a.s.}} 0. \end{aligned}$$
(33)

For \(R_{3n}(u)\) we have that

$$\begin{aligned} \sup _{0 \leqslant u \leqslant 1}\left| R_{3n}(u) \right| \leqslant \sup _{0 \leqslant u \leqslant 1} |F_n^{-1}(u)-F^{-1}(u)| \frac{1}{n}\sum _{i=1}^n{\hat{f}}_n(X_i). \end{aligned}$$

In the proof of Theorem 1 we saw that \((1/n)\sum _{i=1}^n {\hat{f}}_{n}(X_i)\) is bounded a.s. Under the assumptions made (see, for example, p. 6 of Csörgő 1983),

$$\begin{aligned} \sup _{0 \leqslant u \leqslant 1} |F_n^{-1}(u)-F^{-1}(u)| {\mathop {\longrightarrow }\limits ^{a.s.}} 0. \end{aligned}$$

Therefore

$$\begin{aligned} \sup _{0 \leqslant u \leqslant 1} \left| R_{3n}(u) \right| {\mathop {\longrightarrow }\limits ^{a.s.}} 0. \end{aligned}$$

Finally, taking into account decomposition (22), it suffices to show that \(\sup _{0 \leqslant u \leqslant 1} \left| R_{4in}(u) \right| {\mathop {\longrightarrow }\limits ^{a.s.}} 0,\) \(i=1,2\). Using (23), (33) and that \({\hat{\sigma }} {\mathop {\longrightarrow }\limits ^{a.s.}} \sigma >0\), one gets that

$$\begin{aligned} \sup _{0 \leqslant u \leqslant 1} \left| R_{41n}(u)\right| {\mathop {\longrightarrow }\limits ^{a.s.}} 0. \end{aligned}$$

Using (25), (33) and that \({\hat{\sigma }} {\mathop {\longrightarrow }\limits ^{a.s.}} \sigma >0\), one similarly gets that

$$\begin{aligned} \sup _{0 \leqslant u \leqslant 1} \left| R_{42n}(u)\right| {\mathop {\longrightarrow }\limits ^{a.s.}} 0. \end{aligned}$$

This concludes the proof. \(\square \)

Proof of Theorem 4

First of all, we see that, under the assumptions made, \(R \in L^2\). We have that

$$\begin{aligned} \int _0^1R^2(u)du= & {} \int _0^1\left( \int _{F^{-1}(u)}^{\infty }\{x-F^{-1}(u)\}f^2(x)dx\right) \\{} & {} \times \left( \int _{F^{-1}(u)}^{\infty }\{y-F^{-1}(u)\}f^2(y)dy \right) du=I_1-2I_2+I_3, \end{aligned}$$

with (recall that f is a bounded function and that \(\int _0^1 F^{-1}(u)^2 du={\mathbb {E}}(X^2)\))

$$\begin{aligned} I_1= & {} \int _0^1\left( \int _{F^{-1}(u)}^{\infty }xf^2(x)dx\right) ^2du \leqslant M {\mathbb {E}}(X^2)<\infty ,\\ I_2= & {} \int _0^1F^{-1}(u) \int _{F^{-1}(u)}^{\infty }xf^2(x)dx du \leqslant M\left( \int _0 ^1F^{-1}(u)^2du \int x^2 f(x)dx \right) ^{1/2}\\= & {} M {\mathbb {E}}(X^2)<\infty ,\\ I_3= & {} \int _0^1 F^{-1}(u)^2 \left( \int _{F^{-1}(u)}^{\infty }f^2(x)dx\right) ^2du \leqslant M \int _0^1 F^{-1}(u)^2 du=M{\mathbb {E}}(X^2) <\infty , \end{aligned}$$

and thus \(R \in L^2\).

Next, we consider decomposition (15) and study each term on the right hand-side of such expression. Since \(R_{1n}\) is an average of integrable i.i.d. random elements whose expectation is R(u), applying the SLLN in Hilbert spaces we obtain that (see, for example Theorem 2.4 of Bosq 2000),

$$\begin{aligned} R_{1n} {\mathop {\longrightarrow }\limits ^{a.s.}} R, \end{aligned}$$

in \(L^2\). As for \(R_{2n}(u)\), we have that

$$\begin{aligned} \left| R_{2n}(u) \right| \leqslant \sup _x |f_n(x)-f(x)| \frac{1}{n}\sum _{i=1}^n \{X_i-F^{-1}(u)\}^+. \end{aligned}$$

Let \(V(u)={\mathbb {E}}[\{X-F^{-1}(u)\}^+]\). A parallel reasoning to that used to prove that \(R \in L^2\) shows that \(V \in L^2\). Now, from the SLLN in Hilbert spaces it follows that \(\frac{1}{n}\sum _{i=1}^n \{X_i-F^{-1}(u)\}^+ {\mathop {\longrightarrow }\limits ^{a.s.}} V\), in \(L^2\). This fact and (17) give

$$\begin{aligned} R_{2n} {\mathop {\longrightarrow }\limits ^{a.s.}} 0, \end{aligned}$$

in \(L^2\). Combining the \(L^2\) convergence of \(R_{1n}\) and \(R_{2n}\), we conclude that

$$\begin{aligned} \Vert R_{1n}+R_{2n}-R \Vert {\mathop {\longrightarrow }\limits ^{a.s.}} 0. \end{aligned}$$
(34)

For \(R_{3n}(u)\) we have that

$$\begin{aligned} \left| R_{3n}(u) \right| \leqslant |F_n^{-1}(u)-F^{-1}(u)| \frac{1}{n}\sum _{i=1}^n{\hat{f}}_n(X_i). \end{aligned}$$

As shown in the proof of Theorem 1, the second factor on the right hand-side of the above inequality is a.s. bounded. Since \({\mathbb {E}}(X^2)<\infty \), from Lemma 8.3 in Bickel and Freedman (1981), it follows that \(\Vert F_n^{-1}-F^{-1}\Vert {\mathop {\longrightarrow }\limits ^{a.s.}} 0\), and thus \(R_{3n} {\mathop {\longrightarrow }\limits ^{a.s.}} 0\), in \(L^2\). Finally, taking into account decomposition (22), using (23), (25), (34) and that \({\hat{\sigma }} {\mathop {\longrightarrow }\limits ^{a.s.}} \sigma >0\), one also gets that \(R_{4n} {\mathop {\longrightarrow }\limits ^{a.s.}} 0\), in \(L^2\). This concludes the proof. \(\square \)

Proof of Theorem 5

Similar developments to those made in the proof of Theorem 2, and sharing the notation used there, show that

$$\begin{aligned}R_{1n}(u)+R_{2n}(u)=\frac{2}{n}\sum _{i=1}^nA(X_i; u)-R(u)+t_{2n}(u),\end{aligned}$$

with \(t_{2n} {\mathop {\longrightarrow }\limits ^{a.s.}} 0\) in \(L^2(w)\). By the SLLN in Hilbert spaces we have that \(\frac{1}{n}\sum _{i=1}^nA(X_i; \cdot ) {\mathop {\longrightarrow }\limits ^{a.s.}} R\), in \(L^2(w)\). Finally, proceeding as in the proof of Theorem 4 we get that \(\Vert R_{in} \Vert _w {\mathop {\longrightarrow }\limits ^{a.s.}} 0\), \(i=3,4\), which completes the proof. \(\square \)

Proof of Theorem 6

Similar developments to those made in the proof of Theorem 2, and sharing the notation used there, show that

$$\begin{aligned} \sqrt{n}\left\{ R_{1n}(u)+R_{2n}(u)-R(u)\right\} =\frac{2}{\sqrt{n}}\sum _{i=1}^n\left[ \{X_i-F^{-1}(u)\}^+ f(X_i) -R(u)\right] +o_P(1). \nonumber \\ \end{aligned}$$
(35)

Now, taking into account (19) we can write

$$\begin{aligned} \sqrt{n}R_{3n}(u)=T_{1n}(u)+T_{2n}(u)+T_{3n}(u), \end{aligned}$$
(36)

where

$$\begin{aligned} T_{1n}(u)= & {} \frac{1}{\sqrt{n}}\sum _{i=1}^n\left\{ X_i-F_n^{-1}(u) \right\} {\hat{f}}_n(X_i)\textbf{1}\{F_n^{-1}(u)<X_i\leqslant F^{-1}(u)\}, \\ T_{2n}(u)= & {} -\frac{1}{\sqrt{n}}\sum _{i=1}^n \left\{ X_i-F_n^{-1}(u) \right\} {\hat{f}}_n(X_i)\textbf{1}\{F^{-1}(u)<X_i \leqslant F^{-1}_n(u)\},\\ T_{3n}(u)= & {} \sqrt{n} \left\{ F^{-1}(u)- F^{-1}_n(u) \right\} \frac{1}{n}\sum _{i=1}^n {\hat{f}}_n(X_i)\textbf{1}\{F^{-1}(u)<X_i \}. \end{aligned}$$

We have that

$$\begin{aligned} 0 \leqslant T_{1n}(u) \leqslant \left| \sqrt{n} \left\{ F^{-1}(u)- F^{-1}_n(u) \right\} \right| \frac{1}{n}\sum _{i=1}^n {\hat{f}}_n(X_i) \textbf{1}\{F_n^{-1}(u) <X_i\leqslant F^{-1}(u)\}. \end{aligned}$$

Under the assumptions made, \(\sqrt{n} \left\{ F^{-1}_n(u)- F^{-1}(u) \right\} =O_P(1)\) and \(F^{-1}_n(u) {\mathop {\longrightarrow }\limits ^{a.s.}} F^{-1}(u)\). This implies that, for each \(\varepsilon >0\), there exists \(n_0\) such that \(\textbf{1}\{F_n^{-1}(u)<X_i\leqslant F^{-1}(u)\} \leqslant \textbf{1}\{F^{-1}(u)-\varepsilon <X_i\leqslant F^{-1}(u)\}\), \(\forall n \geqslant n_0\), and hence \({\mathbb {E}}[ \textbf{1}\{F_n^{-1}(u)<X_i\leqslant F^{-1}(u)\} ]=O(\varepsilon )\). Proceeding as in the proof of Theorem 1, it can be seen that \((1/n)\sum _{i=1}^n {\hat{f}}_n(X_i)=O_P(1)\). Therefore \(T_{1n}(u) {\mathop {\longrightarrow }\limits ^{P}} 0\). Analogously, it can be seen that \(T_{2n}(u) {\mathop {\longrightarrow }\limits ^{P}} 0\) and that \(\frac{1}{n}\sum _{i=1}^n f_n(X_i)\textbf{1}\{F^{-1}(u)<X_i \leqslant F_n^{-1}(u)\}=o_P(1)\). Now, proceeding as in the proof of Theorem 2, it can be seen that

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n f_n(X_i)\textbf{1}\{F^{-1}(u)<X_i\} {\mathop {\longrightarrow }\limits ^{P}} \mu (u)={\mathbb {E}}\left[ f(X)\textbf{1}\{F^{-1}(u)<X\} \right] . \end{aligned}$$
(37)
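A hedged numerical check of (37) may be useful here (an illustration only; it assumes a standard normal X, a Gaussian kernel and a rule-of-thumb bandwidth, which need not be the kernel and bandwidth used in the paper): the kernel-based average on the left-hand side of (37) is compared with \(\mu(u)=\int_{F^{-1}(u)}^{\infty}f(x)^2dx\).

# Numerical check of (37) for a standard normal sample (illustrative choices:
# Gaussian kernel, rule-of-thumb bandwidth).
import numpy as np
from scipy import stats, integrate

rng = np.random.default_rng(0)
n = 2000
X = rng.standard_normal(n)
h = 1.06 * X.std(ddof=1) * n ** (-1 / 5)  # rule-of-thumb bandwidth (assumption)

def f_n(x):
    # kernel density estimate evaluated at the points in x
    return stats.norm.pdf((x[None, :] - X[:, None]) / h).mean(axis=0) / h

u = 0.7
q = stats.norm.ppf(u)
empirical = np.mean(f_n(X) * (X > q))
exact = integrate.quad(lambda x: stats.norm.pdf(x) ** 2, q, np.inf)[0]
print(empirical, exact)  # the two values should be close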

From Lemma 1,

$$\begin{aligned} \left| \frac{1}{n}\sum _{i=1}^n \left\{ {\hat{f}}_n(X_i)-f_n(X_i)\right\} \textbf{1}\{F^{-1}(u)<X_i\} \right| \leqslant \sup _x |{\hat{f}}_n(x)-f_n(x)|{\mathop {\longrightarrow }\limits ^{a.s.}} 0. \end{aligned}$$

We also have that (see, e.g. Theorem 2.5.2 in Serfling 1980)

$$\begin{aligned} \sqrt{n} \left\{ F^{-1}(u)- F^{-1}_n(u) \right\} = \frac{1}{\sqrt{n}}\sum _{i=1}^n\frac{\textbf{1}\{X_i \leqslant F^{-1}(u)\} -u}{f(F^{-1}(u))}+o_P(1). \end{aligned}$$
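The quantile representation just quoted can also be illustrated numerically. The minimal Python sketch below is not part of the proof and assumes a standard normal X; for large n both sides of the display should be close.

# Illustration of the Bahadur-type representation above (standard normal X assumed):
# sqrt(n){F^{-1}(u) - F_n^{-1}(u)}  versus  (1/sqrt(n)) sum [1{X_i <= F^{-1}(u)} - u] / f(F^{-1}(u)).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, u = 100000, 0.7
X = rng.standard_normal(n)
q = stats.norm.ppf(u)
lhs = np.sqrt(n) * (q - np.quantile(X, u))
rhs = np.sum((X <= q) - u) / (np.sqrt(n) * stats.norm.pdf(q))
print(lhs, rhs)  # their difference is o_P(1), so the values should be close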

Thus, by Slutsky's theorem,

$$\begin{aligned} T_{3n}(u)=\frac{1}{\sqrt{n}}\sum _{i=1}^n \frac{\textbf{1}\{X_i \leqslant F^{-1}(u)\}-u }{f(F^{-1}(u))} {\mathbb {E}} \left[ f(X)\textbf{1}\{F^{-1}(u)<X \}\right] +o_P(1). \end{aligned}$$

Therefore, it has been shown that

$$\begin{aligned} \sqrt{n}\{R_{1n}(u)+R_{2n}(u)+R_{3n}(u)-R(u)\}=\frac{1}{\sqrt{n}} \sum _{i=1}^n Y_i(u)+o_P(1). \end{aligned}$$
(38)

To prove the result, it remains to see that \(\sqrt{n} R_{4n}(u)=o_P(1)\). Recall decomposition (22). From (23) and (35), it follows that

$$\begin{aligned} \sqrt{n}R_{41n}(u)=-\sqrt{n}\{{\hat{\sigma }}-\sigma \}R(u)/\sigma +o_P(1). \end{aligned}$$
(39)

Now we study \(R_{42n}(u)\). A Taylor expansion of \(K((X_j-X_i)/{{\hat{h}}})\) around \(K\big ((X_j-X_i)/{{h}}\big )\) gives,

$$\begin{aligned} \sqrt{n}R_{42n}(u)= & {} \frac{\sigma }{{\hat{\sigma }}^2} \sqrt{n}(\sigma -{\hat{\sigma }}) \frac{1}{n}\sum _{i=1}^n\{X_i-F^{-1}(u)\}^+ \\{} & {} \times \frac{1}{n{h}} \sum _{j=1}^n \frac{X_j-X_i}{h}K'\left( \frac{X_j-X_i}{h}\right) +Q_n(u), \end{aligned}$$

where

$$\begin{aligned} Q_n(u)= & {} \frac{1}{2\sqrt{n}}\frac{\sigma }{{\hat{\sigma }}^3} \left\{ \sqrt{n}(\sigma -{\hat{\sigma }})\right\} ^2 \frac{1}{n}\sum _{i=1}^n\{X_i-F^{-1}(u)\}^+ \\{} & {} \times \frac{1}{n{h}} \sum _{j=1}^n \left( \frac{X_j-X_i}{{h}}\right) ^2 K''\left( \frac{X_j-X_i}{\tilde{h}}\right) , \end{aligned}$$

with \(\tilde{h}=\alpha h + (1-\alpha ) {\hat{h}}\), for some \(\alpha \in (0,1)\). The assumptions made imply that \(Q_n(u)=o_P(1)\). Now, proceeding as in the proof of Theorem 2, and taking into account that \(\int u K'(u)f(x+hu)du\rightarrow f(x)\int u K'(u)du=-f(x)\) (by integration by parts, \(\int u K'(u)du=-\int K(u)du=-1\)), we obtain that

$$\begin{aligned} \sqrt{n}R_{42n}(u)=\sqrt{n}({\hat{\sigma }}-\sigma )R(u)/\sigma +o_P(1). \end{aligned}$$
(40)

Finally, the result follows from (38), (39) and (40). \(\square \)

Proof of Theorem 7

Similar developments to those made in the proof of Theorem 2, and sharing the notation used there, show that

$$\begin{aligned} \sqrt{n}\left\{ R_{1n}(u)+R_{2n}(u)-R(u)\right\}= & {} \frac{2}{\sqrt{n}}\sum _{i=1}^n\left[ \{X_i-F^{-1}(u)\}^+ f(X_i) -R(u)\right] \\{} & {} +r_{1n}(u), \quad u \in (0,1),\end{aligned}$$

with \(\int _0^1r_{1n}(u)^2w(u)du {\mathop {\longrightarrow }\limits ^{P}} 0\).

As for \(R_{3n}(u)\), we consider decomposition (36). In the proof of Theorem 6 it was shown that \(T_{1n}(u) {\mathop {\longrightarrow }\limits ^{P}} 0\), for each \(u \in (0,1)\). We have that

$$\begin{aligned} 0 \leqslant T_{1n}(u) \leqslant \left| \sqrt{n}\left\{ F^{-1}_n(u)- F^{-1}(u) \right\} \right| \,\frac{1}{n}\sum _{i=1}^n {\hat{f}}_n(X_i). \end{aligned}$$

From Theorems 2.1, 3.1.1 and 3.2.1 in Csörgő (1983), it follows that

$$\begin{aligned} n\int _0^{(n-1)/n}\left\{ F^{-1}_n(u)- F^{-1}(u) \right\} ^2w(u)du=O_P(1). \end{aligned}$$

Under the assumptions made,

$$\begin{aligned} 0 \leqslant \frac{1}{n}\sum _{i=1}^n {\hat{f}}_n(X_i) {\mathop {\longrightarrow }\limits ^{P}} {\mathbb {E}}\{f(X)\}<\infty . \end{aligned}$$

Thus, by the dominated convergence theorem, it follows that

$$\begin{aligned} \int _0^{(n-1)/n} T_{1n}(u)^2w(u)du {\mathop {\longrightarrow }\limits ^{P}} 0. \end{aligned}$$

Analogously it can be seen that

$$\begin{aligned} \int _0^{(n-1)/n} T_{2n}(u)^2w(u)du {\mathop {\longrightarrow }\limits ^{P}} 0. \end{aligned}$$

From Theorems 2.1 and 3.2.1 in Csörgő (1983), it follows that

$$\begin{aligned} \sup _{0<u\leqslant \frac{n-1}{n}}\left| \sqrt{n}\left\{ F^{-1}_n(u)- F^{-1}(u) \right\} f(F^{-1}(u))- \frac{1}{\sqrt{n}} \sum _{i=1}^n \Big [ \textbf{1}\{X_i \leqslant F^{-1}(u)\}-u \Big ] \right| {\mathop {\longrightarrow }\limits ^{a.s.}}0. \end{aligned}$$

Notice that the convergence in (37) holds for each \(u \in [0,1]\). Since the averages on the left-hand side of (37) are non-increasing in u and the limit \(\mu (u)\) is a continuous function, the pointwise convergence is in fact uniform on the interval [0, 1] (a Pólya-type argument for monotone functions). Therefore,

$$\begin{aligned} \int _0^{(n-1)/n} \left\{ T_{3n}(u)-\frac{1}{\sqrt{n}} \sum _{i=1}^n \frac{\textbf{1}\{X_i \leqslant F^{-1}(u)\}-u }{f(F^{-1}(u))} \mu (u) \right\} ^2w(u)du {\mathop {\longrightarrow }\limits ^{P}} 0. \end{aligned}$$

Summarizing, it has been shown that

$$\begin{aligned} \sqrt{n}\{R_{1n}(u)+{} & {} R_{2n}(u)+R_{3n}(u)-R(u)\}=\frac{1}{\sqrt{n}} \sum _{i=1}^n Y_i(u)+r_{2n}(u), \\{} & {} \int _0^{(n-1)/n}r_{2n}(u)^2w(u)du {\mathop {\longrightarrow }\limits ^{P}} 0. \end{aligned}$$

The same steps given in the proof of Theorem 6 to show that \(\sqrt{n}R_{4n}(u)=o_P(1)\) can be used to see that \(\int _0^{(n-1)/n}n R_{4n}(u)^2w(u)du {\mathop {\longrightarrow }\limits ^{P}} 0\). This completes the proof. \(\square \)

Proof of Corollary 2

Define

$$\begin{aligned} {\widetilde{Y}}_i(u)=\left\{ \begin{array}{ll} Y_i(u) &{} \displaystyle \text{ if } u \in {\mathbb {I}}_n=\left( 0, \, (n-1)/n \right] ,\\ 0 &{} \displaystyle \text{ if } u \in \left( (n-1)/n, \, 1 \right] , \end{array} \right. \end{aligned}$$

\(1 \leqslant i \leqslant n\), and \(W_n(u)=\frac{1}{\sqrt{n}} \sum _{i=1}^n {\widetilde{Y}}_i(u)\), \(u \in [0,1]\).

From Theorem 7, we have that

$$\begin{aligned} n\int _0^1\left\{ R_n(u)-R(u)\right\} ^2w(u)du= & {} \int _0^1 W_n(u)^2w(u)du \nonumber \\{} & {} +n\int _{(n-1)/n}^1R(u)^2w(u)du+o_P(1). \end{aligned}$$
(41)

We first see that \({\widetilde{Y}}_1 \in L^2(w)\). To this end, we write \({\widetilde{Y}}_1(u)={\widetilde{Y}}_{11}(u)-{\widetilde{Y}}_{12}(u)+{\widetilde{Y}}_{13}(u)\), with

$$\begin{aligned} {\widetilde{Y}}_{11}(u)=2 \{X_1-F^{-1}(u)\}^+ f(X_1)\textbf{1}(u \in {\mathbb {I}}_n), \qquad {\widetilde{Y}}_{12}(u)=2R(u)\textbf{1}(u \in {\mathbb {I}}_n), \end{aligned}$$

and

$$\begin{aligned} {\widetilde{Y}}_{13}(u)=\frac{\textbf{1}\{X_1 \leqslant F^{-1}(u)\}-u }{f(F^{-1}(u))} \mu (u) \textbf{1}(u \in {\mathbb {I}}_n). \end{aligned}$$

In the proof of Theorem 4 we saw that R, \(\{X_1-F^{-1}(u)\}^+ \in L^2\); since f is bounded, we also have that \({\widetilde{Y}}_{11}, \, {\widetilde{Y}}_{12} \in L^2(w)\). As for \({\widetilde{Y}}_{13}\),

$$\begin{aligned} \int _0^1{\widetilde{Y}}_{13}^2(u)w(u)du=\int _{{\mathbb {I}}_n}\left[ \textbf{1}\{X_1 \leqslant F^{-1}(u)\}-u \right] ^2\mu ^2(u)du \leqslant \int _0^1\mu ^2(u)du<\infty , \end{aligned}$$

because \(\mu (u) \leqslant M\), \(\forall u \in [0,1]\), since f is bounded. Thus, \({\widetilde{Y}}_1 \in L^2(w)\).

From the central limit theorem in Hilbert spaces and the continuous mapping theorem,

$$\begin{aligned} \int _0^1W_n(u)^2w(u)du {\mathop {\longrightarrow }\limits ^{{\mathcal {L}}}} \Vert Z\Vert ^2_w. \end{aligned}$$
(42)

Since R is a decreasing function with \(\displaystyle \lim _{u\uparrow 1} R(u)=0\), and w is bounded,

$$\begin{aligned} 0 \leqslant n\int _{(n-1)/n}^1R(u)^2w(u)du \leqslant M n\int _{(n-1)/n}^1R(u)^2du \leqslant M R^2((n-1)/n) \rightarrow 0. \nonumber \\ \end{aligned}$$
(43)

The result follows from (41)–(43). \(\square \)