Multivariate Generalized Linear-statistics of short range dependent data

Generalized linear (GL-)statistics are defined as functionals of a U-quantile process and unify different classes of statistics such as U-statistics and L-statistics. We derive a central limit theorem for GL-statistics of strongly mixing sequences and arbitrary dimension of the underlying kernel. For this purpose we establish a limit theorem for U-statistics and an invariance principle for U-processes, together with a convergence rate for the remainder term of the Bahadur representation. As an application we consider the generalized median estimator for the tail parameter of the Pareto distribution, which is commonly used to model exceedances of high thresholds. We use subsampling to calculate confidence intervals and investigate its behaviour under independence and strong mixing in simulations.


Introduction
Generalized linear statistics (GL-statistics) form a broad class of statistics which unifies not only the widely used U-statistics but also other classes like L-statistics, and even statistics which cannot be assigned to any of the classical classes. GL-statistics were first developed by Serfling (1984), who showed a central limit theorem under independence. In this paper we develop results for GL-statistics of random variables which are short range dependent. An important tool for obtaining a central limit theorem for GL-statistics are U-statistics with multivariate kernels. Up to now there are many results for bivariate U-statistics of short range dependent data (cf. Borovkova et al. (2001), Dehling and Wendler (2010) and Wendler (2011a)), but in the multivariate case additional difficulties occur, caused by the dependencies in the kernel structure.

Let us now introduce some basic assumptions and definitions which we will use throughout the paper. Let $X_1, \ldots, X_n$ be a sequence of random variables with distribution function $F$. We will assume the random variables to be short range dependent; a detailed definition is given later on. Moreover, let $F_n$ be the empirical distribution function of $X_1, \ldots, X_n$ and let $h(x_1, \ldots, x_m)$, for given $m \geq 2$, be a kernel, that is, a measurable, symmetric function. We define the empirical distribution function $H_n$ of $h(X_{i_1}, \ldots, X_{i_m})$ as
$$H_n(x) = \binom{n}{m}^{-1} \sum_{1 \le i_1 < \ldots < i_m \le n} \mathbb{1}_{[h(X_{i_1}, \ldots, X_{i_m}) \le x]}, \qquad -\infty < x < \infty,$$
and $H_n^{-1}(p) = \inf\{x \mid H_n(x) \ge p\}$ as the related generalized inverse. Furthermore, let $H_F$ with $H_F(y) = P_F(h(Y_1, \ldots, Y_m) \le y)$ be the distribution function of the kernel $h$ for independent copies $Y_1, \ldots, Y_m$ of $X_1$, and let $0 < h_F < \infty$ be the related density (this implies that $H_F$ is continuous). We define $h_{F; X_{i_2}, \ldots, X_{i_k}}$ as the density of $h(Y_{i_1}, X_{i_2}, \ldots, X_{i_k}, Y_{i_{k+1}}, \ldots, Y_{i_m})$ for $2 \le k \le m$ and $i_1 < i_2 < \ldots < i_m$.
Definition 1.1. A generalized L-statistic (GL-statistic) with kernel $h$ is given by
$$T(H_n) = \int_0^1 H_n^{-1}(t) J(t)\, dt + \sum_{j=1}^d a_j H_n^{-1}(p_j)$$
for a measurable weight function $J$, constants $a_1, \ldots, a_d$ and $p_1, \ldots, p_d \in (0,1)$. The GL-statistic $T(H_n)$ is a natural estimator of $T(H_F)$, which is defined analogously.
Let $h : \mathbb{R}^m \to \mathbb{R}$ be a measurable function. A U-statistic with kernel $h$ is defined as
$$U_n = \binom{n}{m}^{-1} \sum_{1 \le i_1 < \ldots < i_m \le n} h(X_{i_1}, \ldots, X_{i_m}).$$
If the random variables are independent and identically distributed, $U_n$ is an unbiased estimator of $\theta = E(h(X_1, \ldots, X_m))$. A U-statistic can be written as a GL-statistic by setting $d = 0$ and $J \equiv 1$.
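For readers who want to experiment numerically, the following minimal Python sketch computes a U-statistic and the empirical U-distribution function $H_n$ by brute force over all $\binom{n}{m}$ subsets; all names and the example kernel are illustrative and not part of the paper.

```python
from itertools import combinations
import numpy as np

def u_statistic(x, h, m):
    """U-statistic U_n: average of the kernel h over all m-element subsamples.

    Brute force over all C(n, m) subsets -- for illustration with small n only."""
    vals = [h(*x[list(idx)]) for idx in combinations(range(len(x)), m)]
    return float(np.mean(vals))

def empirical_u_df(x, h, m):
    """Empirical U-distribution function H_n and its generalized inverse H_n^{-1}."""
    vals = np.sort([h(*x[list(idx)]) for idx in combinations(range(len(x)), m)])
    N = len(vals)
    H_n = lambda t: np.searchsorted(vals, t, side="right") / N
    H_inv = lambda p: vals[max(int(np.ceil(p * N)) - 1, 0)]  # inf{t : H_n(t) >= p}
    return H_n, H_inv

# Example with the Gini mean difference kernel h(x1, x2) = |x1 - x2|, m = 2
rng = np.random.default_rng(0)
x = rng.normal(size=40)
print(u_statistic(x, lambda a, b: abs(a - b), 2))
```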

Example 1.2.
A widely known L-statistic is the $\alpha$-trimmed mean
$$T_n = \frac{1}{n - 2\lfloor n\alpha \rfloor} \sum_{i = \lfloor n\alpha \rfloor + 1}^{n - \lfloor n\alpha \rfloor} X_{(i)},$$
where $X_{(i)}$ is the $i$th value of the order statistic $X_{(1)} \le X_{(2)} \le \ldots \le X_{(n)}$. To rewrite it as a GL-statistic we choose $J(t) = \frac{1}{1-2\alpha}$ for $\alpha < t < 1-\alpha$ and $J(t) = 0$ everywhere else. As kernel we set $h(x) = x$ and let the sum vanish by the choice $d = 0$.
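As a numerical illustration of this representation (our own sketch; the grid approximation of the integral is an assumption, not part of the paper), one can check that the order-statistics form and the GL-form of the trimmed mean nearly agree:

```python
import numpy as np

def trimmed_mean(x, alpha):
    """alpha-trimmed mean based on the order statistics."""
    xs = np.sort(x)
    k = int(np.floor(len(x) * alpha))
    return xs[k:len(x) - k].mean()

def gl_trimmed_mean(x, alpha, grid=10**5):
    """GL-form: integral of J(t) * F_n^{-1}(t) dt approximated on a fine grid,
    with J(t) = 1/(1 - 2*alpha) on (alpha, 1 - alpha) and 0 elsewhere."""
    t = (np.arange(grid) + 0.5) / grid
    q = np.quantile(x, t, method="inverted_cdf")  # generalized inverse, NumPy >= 1.22
    J = np.where((t > alpha) & (t < 1 - alpha), 1.0 / (1.0 - 2.0 * alpha), 0.0)
    return float(np.mean(J * q))

x = np.random.default_rng(1).normal(size=200)
print(trimmed_mean(x, 0.1), gl_trimmed_mean(x, 0.1))  # the two values should nearly agree
```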
In the following we will consider a special form of short range dependence: strong mixing.
Definition 1.2. Let $(X_n)_{n \in \mathbb{N}}$ be a stationary process. The strong mixing coefficients of $(X_n)$ are defined as
$$\alpha(k) = \sup\big\{ |P(A \cap B) - P(A)P(B)| : A \in \mathcal{F}_1^n,\ B \in \mathcal{F}_{n+k}^{\infty},\ n \in \mathbb{N} \big\},$$
where $\mathcal{F}_a^b$ is the $\sigma$-field generated by $X_a, \ldots, X_b$. $(X_n)_{n \in \mathbb{N}}$ is called strongly mixing (or $\alpha$-mixing) if $\alpha(k) \to 0$ as $k \to \infty$.
Strong mixing is the weakest of the common mixing conditions, since the $\alpha$-mixing coefficients are always smaller than, for example, the $\beta$-mixing coefficients (cf. Bradley (2007)).
After stating the main results, among them the central limit theorem for GL-statistics, we also provide some results concerning U-statistics and U-processes. In a second step we give an application, the generalized median estimator (GM-estimator) for the tail parameter of the Pareto distribution (cf. Brazauskas and Serfling (2000a,b) for the independent case). The Pareto distribution is commonly used for modelling heavy tails and exceedances over a threshold (peaks over threshold, POT). Especially in hydrology it finds wide application when only extreme floods above a certain threshold are to be considered in the analysis. There is also a need for a robust estimator which downweights the influence of extreme floods in short time series. Simulations verify that the generalized median estimator is almost as efficient as the maximum likelihood estimator under independence and for autocorrelated data, but more robust. Short range dependence has up to now seldom been modelled in the estimation of parameters under POT, but when considering, for example, monthly discharges it is very likely that such dependencies are present. Our investigation of the generalized median estimator aims at closing this gap and can be extended to other situations where a robust estimator for dependent data is needed.
Results needed for the proofs of the main results are given in Section 4, the proofs of the results given in Section 2 can be found in Section 5.

Main Results
An important and well-known result concerning quantiles is the representation proposed by Bahadur, which expresses the quantile through the empirical distribution function. A key role is played by the remainder term, for which Ghosh (1971) showed convergence for ordinary quantiles under independence. In our case we need the convergence for generalized quantiles under strong mixing. The result is stated in the following theorem.
Theorem 2.1. Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of strongly mixing random variables with distribution function $F$, $E|X_1|^\rho < \infty$ for a $\rho \ge 1$ and mixing coefficients $\alpha(l) = O(l^{-\delta})$ for a $\delta > \frac{2\rho+1}{\rho}$. Moreover, let $h(x_1, \ldots, x_m)$ be a Lipschitz-continuous kernel with distribution function $H_F$ and related density $0 < h_F < \infty$, and for all $2 \le k \le m$ let $h_{F; X_2, \ldots, X_k}$ be bounded. Then we have for the Bahadur representation, with $\hat{\xi}_p = H_n^{-1}(p)$ and $\xi_p = H_F^{-1}(p)$,
$$\hat{\xi}_p = \xi_p + \frac{p - H_n(\xi_p)}{h_F(\xi_p)} + R_n, \qquad \sqrt{n}\, R_n \xrightarrow{P} 0.$$

Now we will state the main theorem of our paper, the asymptotic normality of GL-statistics under strong mixing. Under independence this result was proved by Serfling (1984).

Theorem 2.2.
Let $h(x_1, \ldots, x_m)$ be a Lipschitz-continuous kernel with distribution function $H_F$ and related density $0 < h_F < \infty$, and for all $2 \le k \le m$ and all $i_1 < i_2 < \ldots < i_m$ let $h_{F; X_{i_2}, \ldots, X_{i_k}}$ be bounded. Moreover, let $J$ be a function with $J(t) = 0$ for $t \notin [\alpha, \beta]$, $0 < \alpha < \beta < 1$, and on $[\alpha, \beta]$ let $J$ be bounded and a.e. continuous with respect to the Lebesgue measure and a.e. continuous with respect to $H_F^{-1}$. Additionally, let $X_1, \ldots, X_n$ be a sequence of strongly mixing random variables with $E|X_1|^\rho < \infty$ for a $\rho \ge 1$ and mixing coefficients $\alpha(n)$ with $\alpha(n) = O(n^{-\delta})$ for a $\delta \ge 8$. Then, with independent copies $Y_1, \ldots, Y_m$ of $X_1$,
$$\sqrt{n}\,\big(T(H_n) - T(H_F)\big) \xrightarrow{\mathcal{D}} N(0, \sigma^2),$$
where the asymptotic variance $\sigma^2$ is determined by the kernel $A$ introduced in Section 5.
For the proof of this theorem, which is given in Section 5, a key tool will be the representation of the kernel $A$ as a U-statistic (see Example 1.1). Additionally, the functional $H_n$ also belongs to the class of U-statistics, and therefore we make use of several results from the theory of U-statistics. In the following section we extend some known results for bivariate U-statistics under strong mixing to the multivariate case. We will see that this extension causes some problems concerning the dependencies in the kernels, and the solution of these problems is not straightforward.
Remark 2.1. In the case of bivariate kernels, results similar to Theorems 2.3 and 2.4 can be found in Borovkova et al. (2001), Dehling and Philipp (2002) and Wendler (2011a) for NED-sequences of absolutely regular processes. We conjecture that an extension to the multivariate case is possible also under this other type of weak dependence, but detailed proofs are beyond the scope of this paper.

U -statistics and U -processes
When examining U-statistics, a technique called the Hoeffding decomposition (Hoeffding (1948)) is often used. It decomposes the U-statistic into a sum of different terms which can be examined separately.
Definition 2.1 (Hoeffding decomposition). Let $U_n$ be a U-statistic with kernel $h = h(x_1, \ldots, x_m)$ and $\theta = E(h(Y_1, \ldots, Y_m))$. Then one can write $U_n$ as
$$U_n = \theta + \frac{m}{n} \sum_{i=1}^n g_1(X_i) + \sum_{k=2}^m \binom{m}{k} \binom{n}{k}^{-1} \sum_{1 \le i_1 < \ldots < i_k \le n} g_k(X_{i_1}, \ldots, X_{i_k}),$$
where
$$g_1(x) = E\big(h(x, Y_2, \ldots, Y_m)\big) - \theta$$
and, recursively for $2 \le k \le m$,
$$g_k(x_1, \ldots, x_k) = E\big(h(x_1, \ldots, x_k, Y_{k+1}, \ldots, Y_m)\big) - \theta - \sum_{j=1}^{k-1} \sum_{1 \le i_1 < \ldots < i_j \le k} g_j(x_{i_1}, \ldots, x_{i_j}).$$
The term $\frac{m}{n} \sum_{i=1}^n g_1(X_i)$ is called the linear part; the remaining parts are called degenerate.
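The first Hoeffding term $g_1$ rarely has a closed form; a simple Monte Carlo approximation (our own sketch, with an illustrative kernel and sampler) is:

```python
import numpy as np

def g1_monte_carlo(x, h, m, sampler, n_mc=10**4):
    """Monte Carlo approximation of g_1(x) = E[h(x, Y_2, ..., Y_m)] - theta,
    where sampler(shape) draws independent copies of X_1."""
    Y = sampler((n_mc, m))                           # n_mc independent m-tuples
    theta = np.mean([h(*row) for row in Y])          # theta = E h(Y_1, ..., Y_m)
    cond = np.mean([h(x, *row[1:]) for row in Y])    # E h(x, Y_2, ..., Y_m)
    return cond - theta

# Example: Gini mean difference kernel with standard normal marginals
rng = np.random.default_rng(2)
h = lambda a, b: abs(a - b)
print(g1_monte_carlo(1.0, h, 2, lambda shape: rng.normal(size=shape)))
```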
For most of the results in this section we need a regularity condition for the kernel $h$, which was first introduced by Denker and Keller (1986) and which we extend for our purposes.
Definition 2.2. A kernel $h$ satisfies the variation condition if there exist a constant $L > 0$ and an $\varepsilon_0 > 0$ such that for all $\varepsilon \in (0, \varepsilon_0)$
$$E\Big( \sup_{\|(y_1, \ldots, y_m) - (X_1', \ldots, X_m')\| \le \varepsilon} \big| h(y_1, \ldots, y_m) - h(X_1', \ldots, X_m') \big| \Big) \le L \varepsilon,$$
where the $X_i'$ are independent with the same distribution as $X_i$ and $\|\cdot\|$ is the Euclidean norm.
A kernel $h$ satisfies the extended variation condition if additionally there exist constants $L' > 0$ and $\delta_0 > 0$ such that for all $\delta \in (0, \delta_0)$ and all $2 \le k \le m$
$$E\Big( \sup_{\|(y_1, \ldots, y_m) - (Y_{i_1}, X_{i_2}, \ldots, X_{i_k}, Y_{i_{k+1}}, \ldots, Y_{i_m})\| \le \delta} \big| h(y_1, \ldots, y_m) - h(Y_{i_1}, X_{i_2}, \ldots, X_{i_k}, Y_{i_{k+1}}, \ldots, Y_{i_m}) \big| \Big) \le L' \delta$$
for independent copies $(Y_n)_{n \in \mathbb{N}}$ of $(X_n)_{n \in \mathbb{N}}$ and all $i_1 < i_2 < \ldots < i_m$. If the kernel has dimension $m = 1$, we say that it satisfies the extended variation condition if it satisfies the variation condition.
Remark 2.2. Every Lipschitz-continuous kernel satisfies the variation condition.
Now we state another main result of this paper, the asymptotic normality of U-statistics under strong mixing. For bivariate U-statistics this result is already known (see Wendler (2011a)), but not for arbitrary dimension $m$ of the kernel $h$.
Theorem 2.3. Let $h : \mathbb{R}^m \to \mathbb{R}$ be a bounded kernel satisfying the extended variation condition. Moreover, let $(X_n)_{n \in \mathbb{N}}$ be a sequence of strongly mixing random variables with $E|X_1|^\rho < \infty$ for a $\rho > 0$ and mixing coefficients $\alpha(l)$ with $\sum_{l=1}^{\infty} l\, \alpha^{\frac{\rho}{2\rho+1}}(l) < \infty$. Then
$$\sqrt{n}\,(U_n - \theta) \xrightarrow{\mathcal{D}} N(0, m^2\sigma^2) \qquad \text{with} \qquad \sigma^2 = \operatorname{Var}(g_1(X_1)) + 2\sum_{j=1}^{\infty} \operatorname{Cov}(g_1(X_1), g_1(X_{1+j})).$$
If $\sigma = 0$, the statement means convergence to 0 in probability.
The key tool for the proof of this theorem is the Hoeffding decomposition: the first term converges to the given normal distribution, while all remaining terms converge to zero.
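In practice, the long-run variance $\sigma^2$ from Theorem 2.3 can be estimated from observed values $g_1(X_i)$ by truncating the autocovariance series. The following sketch uses a plain truncation at a user-chosen lag; bandwidth selection and kernel weights (e.g. Bartlett) are deliberately omitted and would be needed in a serious implementation.

```python
import numpy as np

def long_run_variance(g1_values, max_lag):
    """Estimate sigma^2 = Var(g_1(X_1)) + 2 * sum_{j>=1} Cov(g_1(X_1), g_1(X_{1+j}))
    by truncating the covariance series at max_lag (plain, unweighted truncation)."""
    g = np.asarray(g1_values, dtype=float)
    g = g - g.mean()
    n = len(g)
    sigma2 = np.mean(g ** 2)
    for j in range(1, max_lag + 1):
        sigma2 += 2.0 * np.mean(g[: n - j] * g[j:])
    return sigma2
```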
As an extension of U-statistics we also analyse U-processes and their convergence. In other words, the U-statistic no longer has a fixed kernel $h$; instead we consider a process $(U_n(t))_{t \in \mathbb{R}}$. We have already encountered $(H_n(t))_{t \in \mathbb{R}}$ as an example of such a process.
Definition 2.3. Let $h : \mathbb{R}^{m+1} \to \mathbb{R}$ be a measurable and bounded function, symmetric in the first $m$ arguments and non-decreasing in the last. For every $t \in \mathbb{R}$ set
$$U_n(t) = \binom{n}{m}^{-1} \sum_{1 \le i_1 < \ldots < i_m \le n} h(X_{i_1}, \ldots, X_{i_m}, t).$$
We call the process $(U_n(t))_{t \in \mathbb{R}}$ the empirical U-distribution function. As U-distribution function we define $U(t) := E(h(Y_1, \ldots, Y_m, t))$ for independent copies $Y_1, \ldots, Y_m$ of $X_1$. The empirical U-process is then defined as
$$\big( \sqrt{n}\, (U_n(t) - U(t)) \big)_{t \in \mathbb{R}}.$$

Analogously to simple U-statistics, the Hoeffding decomposition is an important technique in our proofs. For fixed $t$, $h(\cdot, \ldots, \cdot, t)$ is a kernel in the above sense, and therefore we can decompose $U_n(t)$ analogously to Definition 2.1, with terms $g_k(x_1, \ldots, x_k, t)$. Likewise we will need a new form of the extended variation condition.
Definition 2.4. We say that $h$ satisfies the extended uniform variation condition if the extended variation condition holds for $h(x_1, \ldots, x_m, t)$ with a constant not depending on $t$.
A typical result for processes is the invariance principle, a result we also need for our U-processes. For near epoch dependent sequences on absolutely regular processes it was proved by Dehling and Philipp (2002). A result under strong mixing can be found in Wendler (2011a). Under independence, a strong invariance principle can be found in Dehling et al. (1987). Nevertheless, these results only consider the bivariate case, whereas we also admit multivariate kernels. For our purposes we only need the convergence of the first term of the Hoeffding decomposition, so the proof will be somewhat different.
From now on consider the case where $H_n$ is our empirical U-process, that is, $U_n(t)$ has the kernel $g(x_1, \ldots, x_m, t) = \mathbb{1}_{[h(x_1, \ldots, x_m) \le t]}$.

Theorem 2.4. Let $h$ be a kernel with distribution function $H_F$ and related density $h_F < \infty$. Moreover, let $g_1$ be the first term of the Hoeffding decomposition of $H_n$. Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of strongly mixing random variables with mixing coefficients $\alpha(l) = O(l^{-\delta})$ for a $\delta \ge 8$. Then
$$\Big( \frac{m}{\sqrt{n}} \sum_{i=1}^n g_1(X_i, t) \Big)_{t \in \mathbb{R}} \xrightarrow{\mathcal{D}} \big( W(t) \big)_{t \in \mathbb{R}},$$
where $W$ is a continuous Gaussian process.
This theorem can be proved in the same way as Theorem 4.1 of Dehling and Philipp (2002) and is therefore omitted.
Using results on the convergence of the remaining terms of the Hoeffding decomposition, given in Lemma 4.4, we can state the following corollary.
Corollary 2.1. Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of strongly mixing random variables with mixing coefficients $\alpha(l) = O(l^{-\delta})$ for $\delta \ge 8$ and $E|X_1|^\rho < \infty$ for a $\rho > \frac{1}{4}$. Moreover, let $h$ be a Lipschitz-continuous kernel with distribution function $H_F$ and related density $h_F < \infty$, and for all $2 \le k \le m$ let $h_{F; X_2, \ldots, X_k}$ be bounded. Then
$$\big( \sqrt{n}\, (H_n(t) - H_F(t)) \big)_{t \in \mathbb{R}} \xrightarrow{\mathcal{D}} \big( W(t) \big)_{t \in \mathbb{R}},$$
where $W$ is a continuous Gaussian process.

The proofs of all results in this section are given in Section 5.

Application: The Generalized Median Estimator
The generalized median (GM-)estimator was developed by Brazauskas and Serfling under independence as a robust estimator of the parameters of different distributions, for example the Pareto distribution or the log-normal distribution (Brazauskas and Serfling (2000a), Brazauskas and Serfling (2000b) and Serfling (2002)). We will concentrate on the Pareto distribution, a very heavy-tailed distribution often used in hydrology and other fields for modelling the tail of a distribution. Its distribution function is given by
$$F(x) = 1 - \left( \frac{x}{\sigma} \right)^{-\alpha}, \qquad x \ge \sigma,$$
where $\alpha > 0$ and $\sigma > 0$. We assume $\sigma$ to be unknown and estimate it by the minimum of the sample. We want to extend the GM-estimator to sequences of strongly mixing random variables with Pareto distributed margins and estimate the tail index $\alpha$. For this we have to choose a kernel which is median unbiased. Like Brazauskas and Serfling (2000a) we choose the modified maximum likelihood estimator as kernel,
$$h(x_1, \ldots, x_m) = \frac{M_{2m-2}}{2 \big( \sum_{i=1}^m \log x_i - m \log \min(x_1, \ldots, x_m) \big)},$$
where $M_{2m-2}$ is the median of the $\chi^2_{2m-2}$-distribution. This kernel was shown to be median unbiased under independence, and we use this result to show its asymptotic median unbiasedness under strong mixing.

Lemma 3.1. For a sequence of strongly mixing, Pareto distributed random variables $(X_n)_{n \in \mathbb{N}}$ with $E|X_1|^\rho < \infty$ for a $\rho \ge 1$, the GM-estimator with the kernel $h(x_1, \ldots, x_m)$ above is asymptotically median unbiased.

Proof.
We have $E(H_n - H_F)^2 \to 0$ using the same arguments as in Lemma 2.1. With arguments of Glivenko-Cantelli type this implies $\sup_{t \in \mathbb{R}} |H_n(t) - H_F(t)| \to 0$ in probability. Following Example 1 of Pollard (1984), the proof is completed.
The GM-estimator of the parameter $\alpha$ is then given by
$$\hat{\alpha}_{GM} = H_n^{-1}\big(\tfrac{1}{2}\big) = \operatorname{median}\big\{ h(X_{i_1}, \ldots, X_{i_m}) : 1 \le i_1 < \ldots < i_m \le n \big\},$$
which can be expressed as a GL-statistic by choosing $J = 0$, $d = 1$, $a_1 = 1$ and $p_1 = \frac{1}{2}$. The results concerning robustness given by Brazauskas and Serfling (2000b) remain valid since the kernel is unchanged. Additionally, one can show that the influence function of the GM-estimator is bounded (cf. Serfling (1984)).
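A direct Python implementation of this estimator (a brute-force sketch over all $\binom{n}{m}$ subsets, feasible only for small $n$ and $m$; the paper itself uses the algorithm of Wilde and Grimshaw (2013) in R) could look as follows:

```python
from itertools import combinations
import numpy as np
from scipy.stats import chi2

def gm_kernel(xs):
    """Kernel h(x_1, ..., x_m) = M_{2m-2} / (2 * (sum_i log x_i - m * log min_i x_i)),
    the median-unbiased modification of the ML estimator for the Pareto tail index."""
    xs = np.asarray(xs, dtype=float)
    m = len(xs)
    M = chi2.median(2 * m - 2)  # median of the chi^2_{2m-2} distribution
    return M / (2.0 * (np.sum(np.log(xs)) - m * np.log(xs.min())))

def gm_estimator(x, m):
    """GM-estimator: median of the kernel over all m-element subsamples.

    Brute force over all C(n, m) subsets; the Wilde-Grimshaw algorithm avoids
    enumerating them all."""
    vals = [gm_kernel(x[list(idx)]) for idx in combinations(range(len(x)), m)]
    return float(np.median(vals))
```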
In the following simulations we compute confidence intervals for the tail index α using subsampling (cf. Politis and Romano (1994)). We show the coverage probability and the length of the confidence interval for different block lengths in subsampling and three different kernel dimensions of the generalized median estimator, that is m = 2, 3, 4.
The underlying $n = 100$ random variables are generated as independent, identically Pareto-distributed with $\alpha = 1$ and $\sigma = 2$, and also from an AR(1)-process with autocorrelation coefficient $\rho = 0.2$ and Pareto-distributed margins. The simulation is repeated 500 times.
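The paper does not spell out how the AR(1) process with Pareto margins is constructed; one standard possibility (our assumption) is to transform a stationary Gaussian AR(1) through the probability integral transform:

```python
import numpy as np
from scipy.stats import norm

def pareto_ar1(n, phi, alpha, sigma, rng=None):
    """AR(1) with Pareto(alpha, sigma) margins via the probability integral transform.

    One possible construction (our assumption): a stationary Gaussian AR(1) is
    mapped to Uniform(0,1) margins and then through the Pareto quantile function.
    Note that the autocorrelation of the transformed series differs somewhat
    from phi."""
    rng = rng or np.random.default_rng()
    z = np.empty(n)
    z[0] = rng.normal(scale=1.0 / np.sqrt(1.0 - phi ** 2))  # stationary start
    for i in range(1, n):
        z[i] = phi * z[i - 1] + rng.normal()
    u = norm.cdf(z * np.sqrt(1.0 - phi ** 2))   # standardize, then Uniform(0,1)
    return sigma * (1.0 - u) ** (-1.0 / alpha)  # Pareto quantile function

x = pareto_ar1(100, phi=0.2, alpha=1.0, sigma=2.0, rng=np.random.default_rng(3))
```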
The procedure of subsampling is as follows. Because $\sqrt{n}(\hat{\alpha}_{GM} - \alpha)$ converges to an unknown distribution, we estimate the quantiles of this distribution in the following way: we first choose a block length $b = b_n$ with $b_n \to \infty$ and $\frac{b_n}{n} \to 0$ for $n \to \infty$. Then we calculate the GM-estimator of $\alpha$ for each of the $n - b + 1$ subsamples consisting of $b$ consecutive data values, getting a vector of estimates $\hat{\alpha}^1_{GM}, \ldots, \hat{\alpha}^{n-b+1}_{GM}$. Using the empirical distribution function $L_n$ of the values $\sqrt{b}\,(\hat{\alpha}^i_{GM} - \hat{\alpha})$, the quantiles $q^*_\gamma = L_n^{-1}(\gamma)$ are calculated, where $\hat{\alpha}$ is the GM-estimate of $\alpha$ derived from the whole sample. The confidence interval CI for a confidence level $1 - \gamma$ is then
$$CI = \Big[ \hat{\alpha} - \frac{q^*_{1-\gamma/2}}{\sqrt{n}},\ \hat{\alpha} - \frac{q^*_{\gamma/2}}{\sqrt{n}} \Big].$$
These results are compared with the case $m = n$ corresponding to the maximum likelihood (ML) estimator. All simulations were done in R 3.0.1 using the packages VGAM and fExtremes and the algorithm of Wilde and Grimshaw (2013) for the generalized median estimator.

First we investigate the efficiency of the GM-estimator in comparison with the classical maximum likelihood estimator, corresponding to the case $m = n$. For this we look at the coverage probability and the length of the confidence interval under data from the ideal model. As expected, we see in Tables 1 and 2 that the GM-estimator is almost as efficient as the ML estimator. We also tested the case $\rho = 0.8$, but the results for a sample size $n = 100$ were very poor for all choices of $m$, with a coverage probability always around 0.3 and a length of the confidence interval between 3 and 10; they are therefore omitted here.
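The subsampling step itself is straightforward to implement; the following sketch mirrors the procedure described above (the estimator argument is any callable, e.g. the gm_estimator from the previous sketch, and the data x from the AR(1) sketch):

```python
import numpy as np

def subsampling_ci(x, estimator, b, gamma=0.05):
    """Subsampling confidence interval in the spirit of Politis and Romano (1994):
    the law of sqrt(n)*(estimate - truth) is approximated by the empirical law of
    sqrt(b)*(block estimate - full-sample estimate) over all blocks of length b."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    full = estimator(x)
    stats = np.array([np.sqrt(b) * (estimator(x[i:i + b]) - full)
                      for i in range(n - b + 1)])
    q_lo, q_hi = np.quantile(stats, [gamma / 2.0, 1.0 - gamma / 2.0])
    return full - q_hi / np.sqrt(n), full - q_lo / np.sqrt(n)

# Example: 95% interval for alpha with the GM-estimator (m = 2), block length 15
ci = subsampling_ci(x, lambda s: gm_estimator(s, 2), b=15)
```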
Additionally, we compared the robustness of the ML-estimator ($m = n$) with the GM-estimator for $m = 2$, the most robust case. We contaminate a sample by adding a value $y_i$ from the interval $(0, 100]$ and calculate the average coverage probability, that is, the fraction of contaminated samples whose confidence interval still covers the true parameter,
$$\frac{1}{100} \sum_{j=1}^{100} \mathbb{1}_{[CI1_j(i) \le \alpha \le CI2_j(i)]},$$
where $CI1_j$ and $CI2_j$ are the bounds of the confidence interval calculated for the $j$th sample $(X_j^{(1)}, \ldots, X_j^{(n)})$ and $CI1_j(i)$ and $CI2_j(i)$ are the bounds of the confidence interval calculated for the $j$th sample contaminated by $y_i$, $(X_j^{(1)}, \ldots, X_j^{(n)}, y_i)$, for a confidence level of 0.95 and $j = 1, \ldots, 100$. The confidence intervals were again computed by subsampling with a block length of 15. This method is analogous to classical sensitivity curves, but focuses on the coverage probability. The results can be found in Figure 1. Examining the robustness for data contaminated by a value $y$, we see that for the ML-estimator, in all three dependence cases, the coverage probability flattens for increasing $y$ but does not reach a constant value. This indicates non-robust behaviour. The opposite can be seen for the GM-estimator, whose coverage probability becomes constant once $y$ exceeds 5 and only fluctuates between two values. The behaviour of both estimators close to zero is similar: when $y$ decreases towards the lower bound of the distribution, both estimators show large deviations between the contaminated and the uncontaminated coverage probability. Nevertheless, the results concerning the robustness of the GM-estimator with $m = 2$ are confirmed by the simulations. The results for $m = 3, 4, 5$ were very similar, also showing robust behaviour of the estimator through a constant coverage probability, and are therefore omitted here.
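The contamination experiment can be reproduced along the following lines (our reading of the averaged-coverage criterion; assumes the subsampling_ci function from the previous sketch):

```python
import numpy as np

def avg_coverage_under_contamination(samples, estimator, b, y, alpha_true, gamma=0.05):
    """Fraction of contaminated subsampling intervals that still cover alpha_true
    (hypothetical helper illustrating the averaged-coverage criterion)."""
    hits = 0
    for x in samples:
        lo, hi = subsampling_ci(np.append(x, y), estimator, b, gamma)
        hits += int(lo <= alpha_true <= hi)
    return hits / len(samples)
```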
Altogether, we can say that the GM-estimator is a good alternative to the ML-estimator, with similar coverage probability and length of the confidence interval even for small choices of $m$. These small choices give an estimator which is easy to calculate and which we have shown to be robust, in contrast to the ML-estimator. This is underlined by the results in Figure 1.

Preliminary Results
In this section we state some results which will help us to prove our main results. First we want to use the (extended) variation condition not only for the kernel $h$ but also for the kernels $g_k$, $1 \le k \le m$, of the Hoeffding decomposition. For that, the following lemma is helpful.
Lemma 4.1. If the kernel $h$ satisfies the extended variation condition, then the kernels $g_k$, $1 \le k \le m$, satisfy it as well.
Proof. The proof proceeds by mathematical induction. Initially let $k = 1$. We had defined $g_1(x) = E(h(x, Y_2, \ldots, Y_m)) - \theta$, so
$$E\Big( \sup_{|y - X_1'| \le \varepsilon} |g_1(y) - g_1(X_1')| \Big) \le E\Big( \sup_{\|(y, y_2, \ldots, y_m) - (X_1', Y_2, \ldots, Y_m)\| \le \varepsilon} |h(y, y_2, \ldots, y_m) - h(X_1', Y_2, \ldots, Y_m)| \Big) \le L\varepsilon,$$
because $h$ satisfies the variation condition. So $g_1$ satisfies the extended variation condition. Now let $g_{k-1}$ satisfy the extended variation condition; we show that $g_k$ also satisfies it. The space of functions satisfying the (extended) variation condition is a vector space (cf. Wendler (2011a)), and since we know that all kernels up to $g_{k-1}$ satisfy the condition, it is sufficient to show that $E(h(x_1, \ldots, x_k, Y_{k+1}, \ldots, Y_m)) - \theta$ satisfies the extended variation condition. This follows by bounding the supremum over the first $k$ arguments by the supremum over all $m$ arguments, as above, since $h$ satisfies the extended variation condition.
Remark 4.1. All results shown before for the extended variation condition without parameter t remain true for the extended uniform variation condition.
To ultimately show the asymptotic normality of U-statistics of strongly mixing random variables, we first generalize some lemmas proved by Wendler (2011a), Dehling and Wendler (2010) and Wendler (2011b) from the case $m = 2$ to arbitrary $m$.
First we need a covariance inequality, which we establish by a coupling technique. A similar result for absolutely regular sequences can be found in Yoshihara (1976). Here we follow Wendler (2011a) and extend the lemma to the case $m \ge 2$, meaning that we treat $g_k$ for $2 \le k \le m$. The proof is analogous to Wendler (2011a), using the extended variation condition instead of the ordinary one, and is therefore omitted.

Lemma 4.2.
Let $(X_n)_{n \in \mathbb{N}}$ be a strongly mixing sequence of random variables with $E|X_1|^\rho < \infty$ for a $\rho > 0$ and $h$ a bounded kernel which satisfies the extended variation condition. Moreover, let $2 \le k \le m$, $i_1 \le i_2 \le \ldots \le i_k$ and $r = \max_j (i_{j+1} - i_j)$. Then there is a constant $C$ such that
$$\big| E\big( g_k(X_{i_1}, \ldots, X_{i_k}) \big) \big| \le C\, \alpha^{\frac{\rho}{2\rho+1}}(r).$$

Lemma 4.3.
Let the kernel $h$ be bounded and satisfy the extended variation condition. Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of strongly mixing random variables with $E|X_1|^\rho < \infty$ for a $\rho > 0$ and let $\sum_{l=0}^n l\, \alpha^{\frac{\rho}{2\rho+1}}(l) = O(n^\gamma)$ for a $\gamma \ge 0$. Then for all $2 \le k \le m$
$$\sum_{i_1, \ldots, i_{2k} = 1}^n \big| E\big( g_k(X_{i_1}, \ldots, X_{i_k})\, g_k(X_{i_{k+1}}, \ldots, X_{i_{2k}}) \big) \big| = O\big(n^{2k-2+\gamma}\big).$$

Proof. We can rewrite the above sum by sorting the tuples $(i_1, \ldots, i_{2k})$ according to their maximal gap $l$ and applying Lemma 4.2 to each summand. For a further simplification we calculate by combinatorial arguments the number of summands of the inner sum, that is, the number of tuples $(i_1, \ldots, i_{2k})$ whose maximal gap equals $l$: there are at most $(2k)! \cdot n^2 \cdot l \cdot n^{2k-4}$ of them. Consequently the inner sum altogether has at most $(2k)! \cdot n^2 \cdot l \cdot n^{2k-4} = l \cdot (2k)! \cdot n^{2k-2}$ summands, and therefore
$$\sum_{i_1, \ldots, i_{2k} = 1}^n \big| E\big( g_k(X_{i_1}, \ldots, X_{i_k})\, g_k(X_{i_{k+1}}, \ldots, X_{i_{2k}}) \big) \big| \le C\, (2k)!\, n^{2k-2} \sum_{l=0}^n l\, \alpha^{\frac{\rho}{2\rho+1}}(l) = O\big(n^{2k-2+\gamma}\big).$$

We also need results concerning the remaining terms of the Hoeffding decomposition for U-processes. In this case we do not merely need pointwise convergence to zero; since we consider processes, we need convergence of the supremum.
The following lemma was proved by Wendler (2011a) for the case $m = 2$. We modify the main idea of the proof to obtain a similar result for the degenerate terms of higher dimensional U-processes.

Lemma 4.4.
Let $h$ be a kernel satisfying the extended uniform variation condition, such that the U-distribution function $U$ is Lipschitz-continuous. Moreover, let $(X_n)_{n \in \mathbb{N}}$ be a sequence of strongly mixing random variables with mixing coefficients $\alpha(l) = O(l^{-\delta})$ for $\delta \ge 8$ and $E|X_i|^\rho < \infty$ for a $\rho > \frac{1}{4}$. Then for all $2 \le k \le m$ and $\gamma = \frac{\delta-2}{\delta}$ we have
$$\sup_{t \in \mathbb{R}} \Big| \sum_{1 \le i_1, \ldots, i_k \le n} g_k(X_{i_1}, \ldots, X_{i_k}, t) \Big| = o\big( n^{k - \frac{1}{2} - \frac{\gamma}{8}} \big) \quad \text{a.s.}$$
Proof. We proceed by induction; from now on suppose that the statement of the lemma is valid for $k - 1$. Together with the above considerations we have, for every $t \in [t_{r-1,l}, t_{r,l}]$ and $2^l \le n < 2^{l+1}$, a decomposition of $\sum_{1 \le i_1 < i_2 \le n} \big( g_2(X_{i_1}, X_{i_2}, t_{r,l}) - g_2(X_{i_1}, X_{i_2}, t_{r-1,l}) \big)$ into three summands, which we treat separately.

For the first summand we use the so-called chaining technique: via the triangle inequality we split the term $Q_n$ into differences $Q_{2^l + i 2^d} - Q_{2^l + (i-1) 2^d}$. Applying the Chebyshev inequality, we obtain for every $\varepsilon > 0$ a summable bound, and with the Borel-Cantelli lemma it follows that
$$\max_{r = 0, \ldots, s} \big| Q_n^k(t_{r,l}) \big| = o\big( n^{k - \frac{1}{2} - \frac{\gamma}{8}} \big) \quad \text{a.s.}$$

Now we treat the second summand, for which we want to apply Lemma 4.2.1 of Wendler (2011a). For $2^l \le n < 2^{l+1}$, using the assumption $|U(t_{r,l}) - U(t_{r-1,l})| \ge 2^{-\frac{5}{8}l} \ge C 2^{-\frac{3}{4}l} \ge C n^{-\frac{3}{4}}$, the corresponding term simplifies; thereby we used Corollary 1 of Moricz (1983) and the assumption $s = O(2^{\frac{5}{8}l})$. Analogously to the above calculation we again apply the generalized Chebyshev inequality and the Borel-Cantelli lemma and obtain the same rate.

For the last summand, using the assumptions and the fact that $\gamma < 1$, the maximum over $r = 0, \ldots, s$ is of the required order as well.

Now the terms including $g_2, \ldots, g_{k-1}$ remain. For these we know, for $2 \le j \le k - 1$, by the induction hypothesis
$$\sup_{t \in \mathbb{R}} \Big| \sum_{1 \le i_1, \ldots, i_j \le n} g_j(X_{i_1}, \ldots, X_{i_j}, t) \Big| = o\big( n^{j - \frac{1}{2} - \frac{\gamma}{8}} \big) \quad \text{a.s.},$$
and consequently the same bound holds after summation. So we have shown for arbitrary $k$ and all summands that they are of order $o\big( n^{k - \frac{1}{2} - \frac{1}{8}\frac{\delta-2}{\delta}} \big)$. By mathematical induction the proof is completed.

Proofs
In this section we give the missing proofs of the main results stated in Section 2.
Theorem 2.2. For the main proof we have to show that the following three conditions are fulfilled. Serfling (1984) has already proved that these conditions together are sufficient for asymptotic normality, and from his proof one can see that independence is not required once these conditions hold. Some of the lemmas used for proving this theorem can also be found in Choudhury and Serfling (1988).
(i) The functional $T_1(H_n) = \int_0^1 H_n^{-1}(t) J(t)\, dt$ admits an asymptotically linear expansion in terms of $H_n - H_F$, and the empirical U-process $\sqrt{n}(H_n - H_F)$ converges weakly.
(ii) The convergence rate of the remainder term of the Bahadur representation of an empirical quantile holds.
(iii) For a U-statistic with kernel $A$, defined below, the central limit theorem holds.

Proofs of the conditions
Now we show that the conditions (i)-(iii) are satisfied.
For the first part of condition (i) we refer to Lemma 8.2.4.A of Serfling (1980). Although he demands independence of the random variables, this property is not needed in his proof. The second part of condition (i) follows from Corollary 2.1.
Condition (ii) is fulfilled by Lemma 2.1.
It remains to show that condition (iii) is satisfied. For this we apply Theorem 2.3. We merely have to verify that $A$ satisfies the assumptions on the kernel, that is, (a) $A$ is bounded and (b) $A$ satisfies the extended variation condition. We consider again the kernel
$$A(x_1, \ldots, x_m) = -\int_{-\infty}^{\infty} J(H_F(y)) \big( \mathbb{1}_{[h(x_1, \ldots, x_m) \le y]} - H_F(y) \big)\, dy + \sum_{j=1}^d a_j\, \frac{p_j - \mathbb{1}_{[h(x_1, \ldots, x_m) \le H_F^{-1}(p_j)]}}{h_F(H_F^{-1}(p_j))}.$$
(a) The boundedness follows from the continuity of $H_F$, the boundedness of $J$ and the fact that $J$ vanishes off the interval $[\alpha, \beta]$.
(b) Now we want to show that $A$ satisfies the extended variation condition. We treat the two summands $A_1$ and $A_2$ of $A$ separately, considering first arbitrary $y_1, \ldots, y_m$ with $\|(y_1, \ldots, y_m) - (x_1, \ldots, x_m)\| \le \varepsilon$, so that $|h(y_1, \ldots, y_m) - h(x_1, \ldots, x_m)| \le L\varepsilon$ by the Lipschitz continuity of $h$.
For the verification of the simple variation condition we first treat $A_1$, getting
$$\sup_{\|(y_1, \ldots, y_m) - (X_1, \ldots, X_m)\| \le \varepsilon} \big| \mathbb{1}_{[h(y_1, \ldots, y_m) \le y]} - \mathbb{1}_{[h(X_1, \ldots, X_m) \le y]} \big| = \begin{cases} 1, & \text{if } h(X_1, \ldots, X_m) \in [y - L\varepsilon,\ y + L\varepsilon], \\ 0, & \text{else.} \end{cases}$$
One can easily see that $C := \int_{-\infty}^{\infty} J(H_F(y))\, dy$ is bounded. Therefore the expectation of the supremum above, integrated against $J(H_F(y))$, is bounded by a constant times $\varepsilon$, so $A_1$ satisfies the variation condition. The treatment of $A_2$ is analogous, using the same notation for the supremum as above. Therefore $A$ satisfies the variation condition, and using the same arguments for the extended variation condition the proof is finished.
We have shown conditions (i)-(iii) and so the proof of asymptotic normality is completed.
Theorem 2.1. Using $|p - H_n(\hat{\xi}_p)| \le \frac{1}{n}$ we obtain the corresponding statement with $H_n(\hat{\xi}_p)$ in place of $p$. Next we will show that $Z_n(t) - Z_n(0) \xrightarrow{P} 0$, where $Z_n(t) := \sqrt{n}\big( H_n(\xi_p + t n^{-1/2}) - H_F(\xi_p + t n^{-1/2}) \big)$. One can easily see that this difference can be bounded by U-statistics: we define $U_n$ and $U_n'$ as the U-statistics with the kernels $g$ and $g'$ given below, where $g_k$ and $g_k'$ are the related terms of the Hoeffding decomposition as used before. We have shown in the proof of Theorem 2.3 that the degenerate parts vanish asymptotically for all $2 \le k \le m$ at rate $n^{\gamma-1}$ for a $\gamma < 1$, if the kernel is bounded and satisfies the extended variation condition. Analogously to the proof of Corollary 2.1 we know that $g(x_1, \ldots, x_m) = \mathbb{1}_{[h(x_1, \ldots, x_m) \le \xi_p + t n^{-1/2}]}$ and $g'(x_1, \ldots, x_m) = \mathbb{1}_{[h(x_1, \ldots, x_m) \le \xi_p]}$ satisfy the extended variation condition.
Applying Proposition 1 of Doukhan et al. (2010) to $g_1(X_i) - g_1'(X_i)$ with $p = 2$ and $b = 3$, and using $\|g_1(X_i) - g_1'(X_i)\|_3 < \infty$ (since the kernels are bounded), we obtain by Doukhan and Lang (2009)
$$E\Big( \sum_{i=1}^n \big( g_1(X_i) - g_1'(X_i) \big) \Big)^2 \le C\, n\, \|g_1(X_i) - g_1'(X_i)\|_3,$$
where the constant $C$ only depends on $\|g_1(X_i) - g_1'(X_i)\|_3$. Since $|g_1(X_i) - g_1'(X_i)| \le 1$ for all $X_i$ and $\|g_1(X_i) - g_1'(X_i)\|_3$ converges to zero, the right-hand side is $o(n)$. Applying the Chebyshev inequality, we then have $Z_n(t) - Z_n(0) \xrightarrow{P} 0$. Altogether we have for $t \in \mathbb{R}$ and every $\varepsilon > 0$
$$P\big( \sqrt{n}(\hat{\xi}_p - \xi_p) \le t,\ Z_n(0) \ge t + \varepsilon \big) \le P\big( |Z_n(t) - Z_n(0)| \ge \tfrac{\varepsilon}{2} \big) + P\big( |V_n(t) - t| \ge \tfrac{\varepsilon}{2} \big) \longrightarrow 0,$$
and analogously $P\big( \sqrt{n}(\hat{\xi}_p - \xi_p) \ge t,\ Z_n(0) \le t \big) \longrightarrow 0$.
Using Lemma 1 of Ghosh (1971), the proof is completed.

Theorem 2.3. We show that the linear part $\frac{m}{\sqrt{n}} \sum_{i=1}^n g_1(X_i)$ is asymptotically normal and that the remaining terms converge to 0 in probability. If $(X_i)_{i \in \mathbb{N}}$ is strongly mixing then this also applies to $(g_1(X_i))_{i \in \mathbb{N}}$, because $g_1$ is measurable (Korolyuk and Borovskikh (1993)), and the mixing coefficients are smaller than or equal to the original ones. With these considerations, and observing that $(g_1(X_i))_{i \in \mathbb{N}}$ is strongly mixing with mixing coefficients $\alpha(l) = O(l^{-\delta})$ for a $\delta > 2$, that $E(g_1(X_i)) = 0$ and that $g_1(X_i)$ is bounded (because $h$ is bounded), we can apply Theorem 1.6 of Ibragimov (1961) and obtain the asymptotic normality of the linear part. The remaining terms converge to zero in probability by Lemma 4.3 together with the Chebyshev inequality. Using the theorem of Slutsky we get the result of the theorem.

Corollary 2.1. For the first summand we get, using Theorem 2.4 and the continuous mapping theorem, the weak convergence of the linear part of $\sqrt{n}(H_n - H_F)$. Since $W$ is a continuous Gaussian process, we have $\|W\|_\infty = O_p(1)$.
For the remaining terms we want to apply Lemma 4.4. For this, the kernel of the U-process, $g(x_1, \ldots, x_m, t) = \mathbb{1}_{[h(x_1, \ldots, x_m) \le t]}$, has to satisfy the extended uniform variation condition. This can be shown using the Lipschitz continuity of $h$, analogously to the verification for the kernel $A$ above. Applying Lemma 4.4 we get for $2 \le k \le m$
$$\sup_{t \in \mathbb{R}} \sqrt{n}\, \binom{m}{k} \binom{n}{k}^{-1} \Big| \sum_{1 \le i_1, \ldots, i_k \le n} g_k(X_{i_1}, \ldots, X_{i_k}, t) \Big| \longrightarrow 0 \quad \text{a.s.}$$
With Slutsky's theorem the proof is completed.