Limit theorems for non-degenerate U-statistics of block maxima for time series

The block maxima method is a classical and widely applied statistical method for time series extremes. It has recently been found that respective estimators whose asymptotics are driven by empirical means can be improved by using sliding rather than disjoint block maxima. Similar results are derived for general non-degenerate U-statistics of arbitrary order, in the multivariate time series case. Details are worked out for selected examples: the empirical variance, the probability weighted moment estimator and Kendall's tau statistic. The results are also extended to the case where the underlying sample is piecewise stationary. The finite-sample properties are illustrated by a Monte Carlo simulation study.


Introduction
A common target parameter in various domains of application is the distribution of componentwise yearly or seasonal maxima calculated from some underlying multivariate time series [26,1]. Statistical inference on the target distribution typically involves the assumption that the block maximum distribution is an extreme value distribution. The latter is justified by probabilistic results from extreme value theory: under broad conditions on the time series, the only possible limit distributions of affinely standardized componentwise maxima, as the block size goes to infinity, are extreme value distributions; see [30] for the univariate case and [25] for multivariate extensions.
The statistical literature on estimation and testing for extreme-value distributions is abundant, ranging from univariate estimators for the parameters of the generalized extreme value distribution [35,24] to nonparametric estimators for extreme value copulas [21] and parametric estimators for max-stable process models [33].
Mathematically, statistical methods are typically validated under the additional assumption that the block maxima sample is serially independent. However, heuristically, both the independence assumption and the assumption that block maxima genuinely follow an extreme-value distribution should only be satisfied asymptotically, for the block size tending to infinity. [17,20,18] have shown that specific univariate estimators are consistent and asymptotically normal in a sampling scheme where the block size tends to infinity, while maintaining an i.i.d. assumption on the underlying time series. For specific univariate and multivariate estimators, [8,9] also relax the i.i.d. assumption and allow for more general stationary time series satisfying certain mixing conditions. It has moreover been found that estimators based on block maxima may be made more efficient by considering sliding rather than disjoint block maxima, both in the univariate [36,10,11] and in the multivariate case [42].
In general, the field of asymptotic statistics is based on a number of fundamental theoretical tools like the central limit theorem, the delta-method, the empirical process or the concept of U-statistics [39]. While the efficiency gain of the sliding block maxima method over its disjoint blocks counterpart mentioned in the previous paragraph has been established for classical empirical means as well as empirical (copula) processes, it has not yet been studied for the case of U-statistics. The present paper aims at filling this gap by studying non-degenerate U-statistics of disjoint and sliding block maxima samples. The topic is related to, but different from, [32], who study U-statistics in the univariate case where the kernel of order m is evaluated blockwise in the largest m order statistics of a (disjoint) block of observations.
In general, U-statistics comprise a number of important estimators like the empirical covariance, Wilcoxon's statistic or Kendall's tau statistic. A prominent example from extremes is the probability weighted moment estimator [24]. Mathematical theory for i.i.d. random variables dates back to [23]; since then, several favorable statistical properties have been demonstrated [39, Chapter 12]. Asymptotic results on U-statistics have also been generalized to the time series context [38,41,15]; unbiasedness then only holds asymptotically.
The main result of this paper is Theorem 3.5, where we establish a central limit theorem on the estimation error of U-statistics for multivariate disjoint and sliding block maxima under mild assumptions on the serial dependence and the kernel function. As in the papers mentioned before, the disjoint blocks version is found to be at most as efficient as the sliding blocks version. In selected examples, it is in fact found to be less efficient. The results are extended to a sampling scheme involving piecewise stationarity, which is used to capture certain applications from environmental extremes where maxima are calculated based on, for example, summer days [11]. The model is interesting mathematically because, unlike the disjoint block maxima sample, the sliding block maxima sample is no longer stationary.
The remaining parts of this paper are organized as follows: the underlying model assumptions and the definition of the respective U-statistics for disjoint and sliding block maxima are presented in Section 2. The main limit results are discussed in Section 3 and illustrated for three selected examples in Section 4. Extensions to piecewise stationary time series are presented in Section 5. Results from a Monte Carlo simulation study illustrate the behavior in finite-sample situations (Section 6). Finally, the proofs are deferred to Section 7. Additional limit results under strong mixing assumptions and lengthy calculations of some asymptotic variances are postponed to a supplement.
An extension of the classical extremal types theorem to strictly stationary time series [30] implies that, under suitable broad conditions, affinely standardized maxima extracted from a stationary time series converge to a GEV distribution. This was generalized to the multivariate case in [25], where the marginals are necessarily GEV-distributed. We make this an assumption, and additionally require the scaling sequences to exhibit some common regularity inspired by the max-domain of attraction condition in the i.i.d. case [12].

Condition 2.1 (Multivariate max-domain of attraction). Let (X_t)_{t∈Z} denote a stationary time series in R^d with continuous margins. There exist sequences (a_r)_r = (a_r^{(1)}, ..., a_r^{(d)})_r ⊂ (0, ∞)^d, (b_r)_r = (b_r^{(1)}, ..., b_r^{(d)})_r ⊂ R^d and a vector γ = (γ^{(1)}, ..., γ^{(d)}) ∈ R^d such that, for any s > 0 and j ∈ {1, ..., d},

lim_{r→∞} a_{⌊rs⌋}^{(j)} / a_r^{(j)} = s^{γ^{(j)}},   lim_{r→∞} (b_{⌊rs⌋}^{(j)} − b_r^{(j)}) / a_r^{(j)} = (s^{γ^{(j)}} − 1) / γ^{(j)},

where the second limit is interpreted as log(s) if γ^{(j)} = 0. Moreover, for r → ∞,

( max_{t=1,...,r} X_t − b_r ) / a_r ⇝ Z ∼ G   (maxima and standardization taken componentwise),

where G denotes a d-variate extreme-value distribution with marginal c.d.f.s G_{γ^{(1)}}, ..., G_{γ^{(d)}} and where G_{γ^{(j)}}(x) = exp(−(1 + γ^{(j)} x)^{−1/γ^{(j)}}) for 1 + γ^{(j)} x > 0, j ∈ {1, ..., d}.
In the case d ≥ 2, let C denote the unique extreme value copula associated with Z. As is well known, C can be written as

C(u) = exp{ −L(−log u^{(1)}, ..., −log u^{(d)}) },   u ∈ (0, 1]^d,

where L denotes the stable tail dependence function, which satisfies
(L1) L is homogeneous: L(cx) = cL(x) for all c > 0;
(L2) L(x e_j) = x for all x ≥ 0, where e_j denotes the j-th unit vector in R^d;
(L3) max(x^{(1)}, ..., x^{(d)}) ≤ L(x) ≤ x^{(1)} + ... + x^{(d)};
see, e.g., [22,34]. Note that (2.1) and (2.2) may for instance be deduced from Leadbetter's D(u_n) condition, a domain-of-attraction condition on the associated i.i.d. sequence with stationary distribution equal to that of X_0 and a weak requirement on the convergence of the c.d.f. of Z_r; see Theorem 10.22 in [1].
From now on, we assume to observe X_1, ..., X_n, an excerpt from a strictly stationary d-dimensional time series (X_t)_t satisfying Condition 2.1 (some generalizations will be discussed in Section 5). For a block size parameter r ≪ n, define componentwise block maxima of size r by

M_{r,i} := max(X_i, ..., X_{i+r−1})   (componentwise),

where i ∈ {1, ..., n − r + 1} denotes the first observation within each block. The traditional block maxima method is based on applying statistical methods to the sample of disjoint block maxima. The latter is given by

M^{(db)}_{n,r} := { M_{r,i} : i ∈ I^{db}_n },   I^{db}_n := {1, r + 1, 2r + 1, ..., (m − 1)r + 1},   m := ⌊n/r⌋.

Note that m is the number of disjoint blocks of size r that fit into the sampling period. Under Condition 2.1, the sample of disjoint block maxima is stationary and approximately follows the multivariate extreme value distribution G.
Instead of partitioning the observation period into disjoint blocks, one may alternatively slide the blocks through the observation period, thereby shifting the block for successive maxima by only one observation to the right instead of r. The resulting sliding block maxima sample is given by M^{(sb)}_{n,r} = { M_{r,i} : i ∈ I^{sb}_n }, where I^{sb}_n := {1, ..., n − r + 1}. Under Condition 2.1, the sliding block maxima sample is stationary as well, with approximate c.d.f. G. Hence, statistical methods that are based on estimating unknown expectations by empirical means are meaningful.
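As an illustration of the two sampling schemes, the disjoint and sliding block maxima samples can be computed as follows (a minimal numpy sketch; the function names are ours, not from the paper):

```python
import numpy as np

def disjoint_block_maxima(x, r):
    """Disjoint block maxima: componentwise maxima of the blocks
    (X_1,...,X_r), (X_{r+1},...,X_{2r}), ...; returns m = n // r maxima."""
    x = np.asarray(x)
    m = x.shape[0] // r
    return x[: m * r].reshape(m, r, *x.shape[1:]).max(axis=1)

def sliding_block_maxima(x, r):
    """Sliding block maxima: componentwise maxima of all n - r + 1
    blocks (X_i, ..., X_{i+r-1}), i = 1, ..., n - r + 1."""
    x = np.asarray(x)
    n = x.shape[0]
    return np.stack([x[i : i + r].max(axis=0) for i in range(n - r + 1)])
```

Both functions accept univariate arrays of shape (n,) as well as multivariate arrays of shape (n, d), in which case maxima are taken componentwise.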
The case of classical empirical means has been treated in varying generality in [10,42,11]. It was found that estimators based on sliding block maxima are typically more efficient than their disjoint block maxima counterparts, despite the fact that the sample M^{(sb)}_{n,r} is heavily dependent over time, even if (X_t)_t is an i.i.d. sequence. In this paper we generalize these results to U-statistics of order p ∈ N, with p = 1 corresponding to classical empirical means.
More precisely, let h : (R^d)^p → R be a known symmetric measurable function of p d-dimensional input variables, subsequently referred to as a kernel of order p. The main objects of interest in this paper are the associated U-statistics of order p given by, for mb ∈ {db, sb},

U^{mb}_{n,r} = U^{mb}_{n,r}(h) := (n_mb choose p)^{−1} Σ_{i_1 < ... < i_p, i_1,...,i_p ∈ I^{mb}_n} h(M_{r,i_1}, ..., M_{r,i_p}),   (2.4)

where n_mb = |I^{mb}_n| denotes the length of the block maxima sample (i.e., n_db = m if mb = db and n_sb = n − r + 1 if mb = sb). A standard heuristic argument suggests that, for the majority of summands in (2.4), the underlying block maxima can be considered as asymptotically independent. As a consequence, U^{mb}_{n,r} should be considered as an estimator for

θ_r := E[ h(M^{(1)}_{r,1}, ..., M^{(p)}_{r,1}) ],   (2.5)

where M^{(1)}_{r,1}, ..., M^{(p)}_{r,1} are i.i.d. copies of M_{r,1}. We are interested in obtaining asymptotic results for the estimation error in an asymptotic framework where r = r_n → ∞ such that r = o(n) for n → ∞.
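For p = 2, the U-statistic in (2.4) is simply the average of the kernel over all unordered pairs of block maxima. A direct quadratic-time sketch (our own helper, for illustration only):

```python
import numpy as np
from itertools import combinations

def u_statistic_order2(maxima, kernel):
    """Order-2 U-statistic: (n_mb choose 2)^{-1} * sum over pairs i < j of
    kernel(M_i, M_j), where `maxima` is a block maxima sample."""
    maxima = np.asarray(maxima)
    vals = [kernel(maxima[i], maxima[j])
            for i, j in combinations(range(len(maxima)), 2)]
    return float(np.mean(vals))
```

For instance, with the variance kernel h(x, y) = (x − y)^2/2 this returns the empirical variance of the sample.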

Limit theorems for U-statistics of block maxima
We start by introducing further conditions and notation. First, throughout the proofs we will use traditional blocking techniques relying on mixing coefficients. The latter are well known to control the serial dependence of the underlying time series. A similar condition has been imposed in [8], among others. Here, α and β denote the alpha- and beta-mixing coefficients; see [7] for exact definitions and basic properties. Subsequently, we often write r = r_n and ℓ = ℓ_n.
The expectation and higher order moments of h(M_{r,i_1}, ..., M_{r,i_p}) in (2.4) will be controlled by uniform integrability and by relying on the convergence of rescaled block maxima from Condition 2.1. For that purpose, we need the kernel function h to behave well under location-scale transformations; see also [37], Chapter 5, and [32] for a similar, slightly more restrictive assumption.

Condition 3.2 (Location-scale property of the kernel function). There exist functions f : (0, ∞)^d × R^d → (0, ∞) and ℓ : (0, ∞)^d × R^d → R such that, for all a ∈ (0, ∞)^d, b ∈ R^d and x_1, ..., x_p ∈ R^d,

h(x_1, ..., x_p) = f(a, b) { h((x_1 − b)/a, ..., (x_p − b)/a) + ℓ(a, b) },

where x/y := (x^{(1)}/y^{(1)}, ..., x^{(d)}/y^{(d)}) denotes componentwise division.

Example 3.3. Condition 3.2 is met for the following kernel functions. Note that the kernels in (5) to (7) may be used to construct tests for stochastic independence; see, for instance, [31]. In the current case, this corresponds to testing asymptotic independence of the coordinates of X_1.
(1) The mean kernel: h(x) = x, with p = 1, d = 1, f(a, b) = a and ℓ(a, b) = b/a.
(2) The variance kernel: h(x, y) = (x − y)^2/2, with p = 2, d = 1, f(a, b) = a^2 and ℓ ≡ 0.
(3) The modified probability weighted moment kernel of degree k ∈ N (see also Section 4.2): h(x_1, ..., x_k) = max(x_1, ..., x_k)/k, with p = k and d = 1.
(4) …
(5) Kendall's tau kernel: h(x, y) = sign((x^{(1)} − y^{(1)})(x^{(2)} − y^{(2)})), with d = 2, p = 2, f ≡ 1 and ℓ ≡ 0.
(6) Spearman's rho kernel, defined via an average over the symmetric group, with d = 2, p = 3, f ≡ 1 and ℓ ≡ 0; here S_n denotes the symmetric group of order n.
(7) Hoeffding's D kernel and Bergsma and Dassios's t* kernel: we refer to [31] for the kernel definitions, which satisfy d = 2, f ≡ 1, ℓ ≡ 0 and p = 5 and p = 4, respectively.

From now on, for the ease of notation, we only consider the case p = 2 (see also [15], among others). For i ∈ {1, ..., n − r + 1}, let

Z_{r,i} := (M_{r,i} − b_r)/a_r,

with a_r and b_r from Condition 2.1. Note that (Z_{r,i})_i is stationary with Z_{r,1} ⇝ Z ∼ G as n → ∞. Further, under Condition 3.2 one has

U^{mb}_{n,r}(h) = f(a_r, b_r) { U^{mb}_{n,Z}(h) + ℓ(a_r, b_r) },

where U^{mb}_{n,Z} denotes the U-statistic from (2.4) with M_{r,i} replaced by Z_{r,i}; this will ultimately allow us to deduce asymptotic results on U^{mb}_{n,r} defined in (2.4) from respective results on U^{mb}_{n,Z}. Heuristically, the expectation of U^{mb}_{n,Z} is close to

ϑ_r := E[ h(Z_{r,1}, Z̃_{r,1}) ],   (3.4)

with Z̃_{r,1} an independent copy of Z_{r,1}. The sequence ϑ_r in turn converges to

ϑ := E[ h(Z, Z̃) ],   (3.5)

under suitable integrability assumptions; here Z, Z̃ ∼ G are independent (see Lemma 8.1 below). The necessary integrability condition, which will also ensure convergence of higher order moments, is as follows.
Finally, for (Z_{1,ξ}, Z_{2,ξ}) ∼ G_ξ, define the asymptotic variances σ^2_db and σ^2_sb from (3.9) in terms of the projection h_1(z) := E[h(z, Z)] − ϑ, with Z ∼ G and ϑ from (3.5). The following result is the main result of this paper.
Theorem 3.5. Suppose Conditions 3.1, 3.2 and 3.4 are met. Furthermore, let h be λ^{2d}-a.e. continuous and bounded on compact sets. Then, for mb ∈ {db, sb},

√(n/r) { U^{mb}_{n,r} − θ_r } / f(a_r, b_r) ⇝ N(0, σ^2_mb),

with θ_r from (2.5) and σ^2_mb from (3.9). Moreover, σ^2_sb ≤ σ^2_db.

Note that, under Condition 3.2, θ_r = f(a_r, b_r){ϑ_r + ℓ(a_r, b_r)} with ϑ_r from (3.4). In certain situations (in particular when ℓ ≡ 0 and f ≡ const; see, e.g., Kendall's tau), one may be willing to regard U^{mb}_{n,r} as an estimator for the asymptotic analogue

θ̃_r := f(a_r, b_r){ϑ + ℓ(a_r, b_r)}.   (3.11)

For instance, in the case of the variance kernel (see also Section 4.1), θ̃_r is the variance of the GEV(b_r, a_r, γ) distribution, which is exactly the GEV distribution approximating the distribution of M_{r,1}; see Condition 2.1. Under an additional bias condition, we may deduce the following result on the estimation error.
Corollary 3.6. In addition to the assumptions made in Theorem 3.5, suppose that the limit B = lim_{n→∞} B_n exists, where

B_n := √(n/r) { θ_r − θ̃_r } / f(a_r, b_r).   (3.12)

Then, for mb ∈ {db, sb}, √(n/r) { U^{mb}_{n,r} − θ̃_r } / f(a_r, b_r) ⇝ N(B, σ^2_mb), with σ^2_mb from (3.9) and θ̃_r from (3.11).

Remark 3.7 (Generalizations). Using the Cramér-Wold theorem it is possible to generalize the limit theorems to the case of joint convergence involving a finite number of kernel functions. Moreover, as mentioned before and at the cost of a more complicated notation, one might extend the results to higher kernel degrees p ∈ N. Joint weak convergence then even holds for kernels of different degrees. These generalizations allow, for example, to handle the joint convergence of probability weighted moment estimators of different order, which would be needed to deduce the asymptotics of the PWM estimator for the parameters of the GEV distribution. Further generalizations concerning different model assumptions are worked out in Section 5 and Section A in the supplement.

Remark 3.8 (A bias-reduced version of the sliding blocks estimator). In view of Lemma 8.6 below, the block maxima M_{r,i} and M_{r,j} are asymptotically independent for |i − j| ≥ r and asymptotically dependent otherwise. As a consequence, the summands h(M_{r,i}, M_{r,j}) with |i − j| < r induce a dependency bias, which suggests replacing U^{sb}_{n,r} by a version that averages h(M_{r,i}, M_{r,j}) only over the pairs in J̃^{sb}_n := {(i, j) ∈ I^{sb}_n × I^{sb}_n : j − i ≥ r}. Since the number of excluded pairs is of smaller order than the total number of pairs, the two estimators are typically asymptotically equivalent. Throughout, we only consider U^{sb}_{n,r} for simplicity.
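The bias-reduced variant discussed in Remark 3.8, which discards pairs of overlapping blocks, can be sketched as follows (our own illustration for p = 2 and a generic kernel):

```python
import numpy as np

def sliding_u_statistic_bias_reduced(maxima, kernel, r):
    """Average of kernel(M_i, M_j) over pairs of sliding block maxima whose
    underlying blocks do not overlap, i.e. j - i >= r (cf. Remark 3.8)."""
    n = len(maxima)
    vals = [kernel(maxima[i], maxima[j])
            for i in range(n) for j in range(i + r, n)]
    return float(np.mean(vals))
```

Since only O(nr) of the O(n^2) pairs are dropped, the difference to the plain sliding blocks U-statistic is asymptotically negligible in the regime r = o(n).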

Examples
Details are worked out for specific kernel functions of interest.

Variance estimation
The variance is one of the most fundamental parameters to describe a distribution of interest, which, in our case, is σ^2_r := Var(M_{r,1}). The respective empirical variance, based on either disjoint or sliding block maxima, is given by

σ̂^2_{n,r,mb} := (n_mb − 1)^{−1} Σ_{i ∈ I^{mb}_n} (M_{r,i} − M̄_mb)^2,  where  M̄_mb := n_mb^{−1} Σ_{i ∈ I^{mb}_n} M_{r,i}.

As is well known, the empirical variance can be written as a U-statistic of order p = 2 based on the variance kernel h(x, y) = (x − y)^2/2 from Example 3.3(2), that is, σ̂^2_{n,r,mb} = U^{mb}_{n,r}(h). The following result is a direct consequence of Theorem 3.5.
Corollary 4.1. Suppose Condition 3.1 is met with γ < 1/4 and that the bias limit B = lim_{n→∞} B_n from Corollary 3.6 exists. Then the weak convergence from Theorem 3.5 holds, where σ^2_db and σ^2_sb only depend on the tail index γ. Explicit formulas are provided in (B.1) and (B.2) in the supplement, respectively. Moreover, σ^2_sb < σ^2_db.

The assumption γ < 1/4 is natural, as asymptotic normality results on empirical variances require finite fourth moments; in the case of the GEV distribution, this exactly corresponds to γ < 1/4. Figure 1 shows the ratio of the asymptotic variances, σ^2_db/σ^2_sb, as a function of γ. We observe that the estimator based on sliding blocks has a significantly smaller variance for negative γ, say γ < −0.25, while hardly any difference is visible for positive γ.
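The U-statistic representation of the empirical variance can be verified numerically: the average of the variance kernel h(x, y) = (x − y)^2/2 over all pairs coincides with the empirical variance with denominator n_mb − 1 (a numpy check with made-up data):

```python
import numpy as np
from itertools import combinations

def variance_kernel(x, y):
    # h(x, y) = (x - y)^2 / 2, the variance kernel from Example 3.3(2)
    return 0.5 * (x - y) ** 2

z = np.array([3.1, 0.7, 2.4, 1.9, 5.0])  # stand-in for a block maxima sample
u_stat = np.mean([variance_kernel(z[i], z[j])
                  for i, j in combinations(range(len(z)), 2)])
# agrees with the empirical variance using denominator n - 1
agree = np.isclose(u_stat, np.var(z, ddof=1))
```

The identity holds exactly (up to floating point error) for any sample, which is the classical pairwise representation of the sample variance.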
The previous results may be made more explicit when imposing a specific time series model. As an example, we work out details for a marginally transformed version of the ARMAX model. The model is defined as follows: for an i.i.d. sequence (W_t)_{t∈Z} of Fréchet(1)-distributed random variables and α ∈ (0, 1], consider the ARMAX(1) recursion

Y_t = max( α Y_{t−1}, (1 − α) W_t ),  t ∈ Z.

The recursion has the stationary solution Y_t := max_{j≥0} (1 − α) α^j W_{t−j}, which has Fréchet(1)-distributed marginals and extremal index θ = 1 − α; see Example 10.5 in [1]. Define X_t as the transformed random variables X_t := F_γ^←(F_W(Y_t)), where F_W is the c.d.f. of a Fréchet(1) distribution, F_γ is the c.d.f. of the Pareto family defined in (4.2) and where F^← is the left-continuous generalized inverse of F. By [3] and [7] the untransformed time series (Y_t)_t is exponentially β-mixing, which implies the same for (X_t)_t. This results in a large spectrum of choices for r_n and ℓ_n satisfying Condition 3.1, which can hence be regarded as non-restrictive. We will prove in Section 7.2 that, if γ < 1/4 and if r = o(n), n = o(r^3), all assumptions from Corollary 4.1 are met, with a_r as asserted. Moreover, one may show that the bias condition is met with B = 0, whence σ^2_r may be replaced by its asymptotic analogue.
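The ARMAX(1) recursion above is straightforward to simulate; the following sketch (our code, with a burn-in period to approximate the stationary solution) generates a trajectory with Fréchet(1) margins:

```python
import numpy as np

def simulate_armax1(n, alpha, rng, burn_in=500):
    """ARMAX(1): Y_t = max(alpha * Y_{t-1}, (1 - alpha) * W_t), W_t ~ Frechet(1).

    The stationary solution has Frechet(1) margins and extremal index 1 - alpha."""
    # Frechet(1) innovations via inverse transform: F^{-1}(u) = -1 / log(u)
    w = -1.0 / np.log(rng.uniform(size=n + burn_in))
    y = np.empty(n + burn_in)
    y[0] = w[0]
    for t in range(1, n + burn_in):
        y[t] = max(alpha * y[t - 1], (1 - alpha) * w[t])
    return y[burn_in:]
```

A marginal transformation to the Pareto family, as in the text, can then be applied via the probability integral transform.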

The probability weighted moment estimator
It is well known that η is a one-to-one function of the first three probability weighted moments [24]. Replacing the moments in (4.4) by suitable estimators and plugging those into the one-to-one function results in the PWM estimator for η. One version, as proposed in [29], is based on the ordered sample, where M = (M_1, ..., M_n) is a sample of random variables distributed as M and M_{(1)} ≤ ... ≤ M_{(n)} is the ordered sample. If M is an i.i.d. sample, then there are no ties with probability 1, whence β̂_k = β̃_k, where β̃_k denotes, for k ∈ N, the U-statistic with the permutation-invariant kernel function h_pwm,k. In this section, we apply Theorem 3.5 to derive limit results for the respective estimators with mb ∈ {db, sb}. For simplicity, we restrict attention to the case k = 2, which yields a U-statistic of order 2. Since the function h_pwm,2 does not satisfy Condition 3.2, we will need the modified kernel function h̃_pwm,2(x, y) := max(x, y)/2 from Example 3.3.
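With the modified kernel h̃_pwm,2(x, y) = max(x, y)/2, the corresponding U-statistic is the average of max(M_i, M_j)/2 over all pairs; for independent Z, Z̃ ∼ G it targets E[max(Z, Z̃)]/2 = E[Z G(Z)]. A minimal sketch (our function names, not from the paper):

```python
import numpy as np
from itertools import combinations

def pwm_u_statistic(maxima):
    """Order-2 U-statistic with the modified PWM kernel h(x, y) = max(x, y) / 2."""
    idx = range(len(maxima))
    return float(np.mean([max(maxima[i], maxima[j]) / 2.0
                          for i, j in combinations(idx, 2)]))
```

In the absence of ties this coincides with the order-statistics version of the PWM estimator mentioned above.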

Proposition 4.2. Suppose that (X t ) t∈Z from Condition 2.1 does not contain ties with probability 1 and that Condition 3.1 is met. If there exists
with 0 < σ^2_sb < σ^2_db. Note that similar asymptotics have also been worked out in [11], where the derivation was based on explicit expansions of the kernel function involving empirical cumulative distribution functions. Comparing our result with their Theorem 3.5, we observe that our result is slightly more restrictive, since we impose β-mixing rather than α-mixing. An extension to α-mixing is given in Section A in the supplement.

Estimation of Kendall's tau
Kendall's tau statistic is a well-known nonparametric distribution-free measure of rank correlation that quantifies the degree of association between two variables [27]. The population version τ = τ(X) for a bivariate vector X = (X^{(1)}, X^{(2)}) is defined as follows: for i.i.d. copies X_1, X_2 of X, we have

τ := P( (X_1^{(1)} − X_2^{(1)})(X_1^{(2)} − X_2^{(2)}) > 0 ) − P( (X_1^{(1)} − X_2^{(1)})(X_1^{(2)} − X_2^{(2)}) < 0 ),

where the two probabilities are the probabilities of concordance and discordance of X_1, X_2, respectively. Applied to bivariate extreme value distributions, Kendall's tau provides a useful summary of extremal dependence; see [1, pp. 274-275] and the references therein.
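The sample version used below is the order-2 U-statistic with the sign kernel sign((x^{(1)} − y^{(1)})(x^{(2)} − y^{(2)})); a compact sketch (our helper, assuming no ties):

```python
import numpy as np
from itertools import combinations

def kendalls_tau(sample):
    """Empirical Kendall's tau: average of sign((x1 - y1) * (x2 - y2))
    over all pairs of bivariate observations (no ties assumed)."""
    s = np.asarray(sample, dtype=float)  # shape (n, 2)
    vals = [np.sign((s[i, 0] - s[j, 0]) * (s[i, 1] - s[j, 1]))
            for i, j in combinations(range(len(s)), 2)]
    return float(np.mean(vals))
```

Since the kernel is invariant under strictly increasing marginal transformations, the statistic depends on the data only through the ranks, matching the distribution-free nature of the population version.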

Extensions to piecewise stationarity
Environmental data typically involve different forms of non-stationarity. A particular source is seasonality, which may statistically be approached by restricting attention to seasons rather than years, bearing in mind that the inner-season variability should be approximately stationary. This idea may be approached mathematically by working with data satisfying the following assumption taken from [11].

Condition 5.1 (Piecewise stationary observation scheme). For sample size n ∈ N, we have observations X_{n,1}, ..., X_{n,n} taking values in R^d. Moreover, for some block length sequence (r_n)_n ⊂ N diverging to infinity such that r_n = o(n), we have

X_{n,(j−1)r_n + t} = Y_{j,t}  for  t ∈ {1, ..., r_n},

where n_db = ⌊n/r_n⌋ and where (Y_{1,t})_t, (Y_{2,t})_t, ... denote i.i.d. copies of a stationary time series satisfying Condition 2.1 with continuous marginal c.d.f. F. Note that Y_{j,t} should be regarded as the t-th observation in the j-th season.
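Condition 5.1 amounts to concatenating independent "seasons", each of which is an excerpt of length r_n from a fresh copy of the same stationary series. A generative sketch (our helper, for an arbitrary user-supplied season simulator):

```python
import numpy as np

def piecewise_stationary_sample(simulate_season, n, r, rng):
    """Concatenate i.i.d. seasons of length r (independent copies of one
    stationary series) and truncate to n observations, cf. Condition 5.1."""
    n_seasons = -(-n // r)  # ceil(n / r): enough seasons to cover n points
    x = np.concatenate([simulate_season(r, rng) for _ in range(n_seasons)])
    return x[:n]
```

Any stationary simulator, such as an ARMAX(1) or Cauchy AR trajectory, can be plugged in as `simulate_season`.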
We refer to [11] for further discussions of Condition 5.1; see in particular Remark 2.3. For the rest of this section, we tacitly assume Condition 5.1 and write X_j := X_{n,j} for simplicity. Note that the triangular array (X_n)_n is r_n-dependent, which in fact simplifies the analysis of the disjoint block maxima method. For the sliding block maxima method, however, mathematical challenges arise from the fact that the sliding block maxima sample is typically non-stationary. Indeed, for x ∈ R^d, generally P(M_{r,i} ≤ x) ≠ P(M_{r,j} ≤ x) for i ≠ j. In [11], Lemma 2.4, it is shown that this non-stationarity disappears asymptotically, which suggests that statistical methodology derived under stationarity assumptions (as in Section 3) may also be applicable under Condition 5.1. For deriving respective limit results, some modifications of the previous conditions are necessary. First of all, the integrability conditions from Condition 3.4 take the following, slightly more involved form.

Condition 5.2. There exists ν > 2/ω with ω from Condition 3.1 such that
(a) lim sup_{r→∞} sup_{1≤i≤j≤r} ∫∫ |h(x, y)|^{2+ν} dP_{Z_{r,i}}(x) dP_{Z_{r,j}}(y) < ∞.
It is worth noting that, if there exist monotone functions g_1, g_2 such that |h(x, y)| ≤ |g_1(x)| + |g_2(y)|, the inner supremum may be omitted; examples can be found in Section 4.
Next, we quantify the average non-stationarity for the sliding block maxima. For i, j ∈ {1, ..., r}, let

ϑ_{r,i,j} := E[ h(Z_{r,i}, Z̃_{r,j}) ],

where (Z̃_{r,j})_{j=1,...,r} is an independent copy of (Z_{r,j})_{j=1,...,r}. Note that ϑ_{r,1,1} = ϑ_r with ϑ_r from (3.4), while ϑ_{r,i,j} ≠ ϑ_r in general. Averages of the ϑ_{r,i,j} do, however, converge. This result suggests that the non-stationarity of the sliding block maxima method under Condition 5.1 may show up in the asymptotic bias of the U-statistic U^{sb}_{n,r}. The following assumption requires r to be sufficiently large to make this bias negligible.
Condition 5.4 (Negligibility of the bias due to non-stationarity). The limit D := lim_{n→∞} D_n exists, where

Simulation study
A Monte Carlo simulation study was conducted to evaluate the finite-sample performance of two selected estimators based on U-statistics: the empirical variance (univariate) and Kendall's τ statistic (bivariate). The study mainly aimed at comparing the disjoint and sliding block maxima methods for various extreme value indices and time series models.

Estimating the block maxima variance
In Section 4.1, the empirical variance based on sliding block maxima, σ̂^2_{n,r,sb}, was found to be an asymptotically more efficient estimator of σ^2_r := Var(M_{r,1}) than its disjoint blocks counterpart, σ̂^2_{n,r,db}. We assess the performance in finite-sample situations for data-generating processes made up from the following marginal and temporal models:

Stationary distribution of X_t: We consider the generalized Pareto distribution GPD(0, 1, γ) with shape parameter γ ∈ {−0.4, −0.2, 0, 0.1}; see (4.2). Note that the largest value of γ = 0.1 is close to the non-integrability point 0.25 for the variance estimator.

Time series models:
In addition to the i.i.d. case, two time series models were considered, each with three parameter choices. The first model is the (transformed) ARMAX(1) model, see Section 4.1, with time series parameter α ∈ {0.25, 0.5, 0.75}; note that the extremal index is θ = 1 − α. As the second model we chose the Cauchy AR model (CAR), defined as the stationary solution (Y_t)_t of the CAR recursion Y_t = ϕ Y_{t−1} + W_t with i.i.d. innovations W_t ∼ Cauchy(0, 1), with time series parameter ϕ ∈ {0.25, 0.5, 0.75}. This corresponds to the extremal index θ = 1 − ϕ; see, e.g., Problem 7.9 in [28]. Realizations from the model were transformed to the GPD(0, 1, γ) distribution by setting X_t := F_γ^←(F_Y(Y_t)), where F_Y and F_γ denote the c.d.f.s of the Cauchy(0, 1) and the GPD(0, 1, γ) distribution, respectively.
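A sketch of this data-generating process (our code; a burn-in period approximates the stationary solution, and, following the text, F_Y is taken as the Cauchy(0, 1) c.d.f.):

```python
import numpy as np

def simulate_car1(n, phi, rng, burn_in=500):
    """Cauchy AR(1): Y_t = phi * Y_{t-1} + W_t with W_t ~ Cauchy(0, 1);
    extremal index 1 - phi."""
    w = rng.standard_cauchy(n + burn_in)
    y = np.empty(n + burn_in)
    y[0] = w[0]
    for t in range(1, n + burn_in):
        y[t] = phi * y[t - 1] + w[t]
    return y[burn_in:]

def to_gpd_margins(y, gamma):
    """X_t = F_gamma^{<-}(F_Y(Y_t)) with F_Y the Cauchy(0, 1) c.d.f. and
    F_gamma^{<-} the GPD(0, 1, gamma) quantile function."""
    u = 0.5 + np.arctan(y) / np.pi            # Cauchy(0, 1) c.d.f.
    if gamma == 0.0:
        return -np.log1p(-u)                  # exponential limit case
    return (np.power(1.0 - u, -gamma) - 1.0) / gamma
```

For γ < 0 the transformed observations live on the bounded support [0, −1/γ] of the GPD(0, 1, γ) distribution.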
Combining each marginal model with each of the time series models results in a total of 4 × 7 = 28 different models. Throughout, we chose to fix the block size to r = 90, which roughly corresponds to the number of days in the summer months and which is a common block length in environmental applications. The number of blocks, denoted by m, ranged from 10 to 100, resulting in corresponding sample sizes ranging from n = 900 to n = 9,000 observations. The performance of the estimators was assessed by approximating the MSE, the squared bias and the variance of the estimators based on N = 10,000 simulation repetitions. For assessing the bias, the true variance σ^2_90 was determined in a preliminary simulation experiment involving a huge sample of size 10^6 drawn from the distribution of M_{r,1}, with one such sample for each of the 28 models.
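The core of the study can be reproduced in a few lines; the following scaled-down sketch (i.i.d. GPD data only, far fewer repetitions than the N = 10,000 used in the paper) approximates the sampling variances of the two empirical-variance estimators:

```python
import numpy as np

def compare_variance_estimators(n=900, r=90, n_rep=200, gamma=-0.3, seed=0):
    """Monte Carlo sketch: sampling variances of the disjoint- and sliding-blocks
    empirical variances of block maxima for i.i.d. GPD(0, 1, gamma) data."""
    rng = np.random.default_rng(seed)
    est_db, est_sb = [], []
    for _ in range(n_rep):
        u = rng.uniform(size=n)
        x = (np.power(1.0 - u, -gamma) - 1.0) / gamma  # GPD(0,1,gamma) quantiles
        m = n // r
        db = x[: m * r].reshape(m, r).max(axis=1)
        sb = np.lib.stride_tricks.sliding_window_view(x, r).max(axis=1)
        est_db.append(np.var(db, ddof=1))   # empirical variance = U-statistic
        est_sb.append(np.var(sb, ddof=1))
    return np.var(est_db), np.var(est_sb)
```

For negative γ the sliding-blocks estimator should come out with a noticeably smaller sampling variance, in line with Figure 2; the exact numbers depend on the seed and the number of repetitions.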
The results for the i.i.d. and the ARMAX models are illustrated in Figure 2, where we depict the ratio MSE(σ̂^2_{n,r,db})/MSE(σ̂^2_{n,r,sb}) as a function of the number of seasons (results for the CAR model are omitted because they are qualitatively similar). Across all considered numbers of seasons, tail indices and time series parameters, the sliding blocks estimator consistently outperforms its disjoint blocks counterpart. Notably, the depicted ratio is significantly larger than one for small tail indices and for small sample sizes. This particular observation is promising because obtaining large sample sizes is sometimes challenging in the area of extreme value statistics. Also, it should be noted that the serial dependence does not substantially influence the relative performance (as was to be expected from the asymptotic results). Finally, we would like to report that the estimation variance was found to be of much larger order than the squared bias, whence the MSE ratio is nearly the same as the respective variance ratio.

Estimating Kendall's tau
We investigate the finite-sample performance in the bivariate case for the estimation of Kendall's τ = τ_r = τ(M_{r,1}) based on the estimators τ̂^db_{n,r} and τ̂^sb_{n,r} from Section 4.3. Note that both Kendall's τ and its estimators do not depend on the marginal distributions of X_t^{(1)} and X_t^{(2)} (in case they are continuous). The data-generating processes are as follows:

Time series models: Three types of time series models were considered: bivariate versions of the ARMAX(1) and CAR(1) models from the previous section as well as i.i.d. observations. The bivariate ARMAX(1) model is defined as the stationary bivariate solution of the componentwise recursion

Y_t^{(j)} = max( α Y_{t−1}^{(j)}, (1 − α) W_t^{(j)} ),  j ∈ {1, 2},

where α ∈ (0, 1] and where (W_t)_t is an i.i.d. sequence with Fréchet(1)-distributed margins and with copula as specified below. Throughout, the value of α was fixed to 0.5, and the i.i.d. case is obtained for α = 0. The bivariate CAR(1) model is defined as the stationary solution of the bivariate CAR(1) recursion

Y_t^{(j)} = ϕ Y_{t−1}^{(j)} + W_t^{(j)},  j ∈ {1, 2},

where ϕ ∈ (0, 1] and where (W_t)_t is an i.i.d. sequence with Cauchy(0, 1) margins and with copula as specified below. Throughout, the value of ϕ was fixed to 0.5.
Copula of W_t: Seven different copulas were considered: the independence copula, the Gaussian copula, the t_ν-copula with ν = 4 degrees of freedom, and the Gumbel-Hougaard copula, where the parameters of the three last-named copulas were chosen in such a way that the associated value of Kendall's tau lies in {0.3, 0.6}. Note that the Gaussian copula is tail independent, while the t- and Gumbel copulas exhibit upper tail dependence. The upper tail dependence coefficients as a function of Kendall's tau are given by 2·t_5(−√(5(1 − sin(πτ/2))/(1 + sin(πτ/2)))) ∈ {0.23, 0.5} for the t_4 copula and 2 − 2^{1−τ} ∈ {0.375, 0.68} for the Gumbel-Hougaard copula; see [19]. Overall, we obtain 3 × 7 = 21 different models. As in the previous section, we fix the block length to r = 90 and vary m between 10 and 100, resulting in sample sizes n = mr ranging from 900 to 9,000 observations. The estimators are evaluated in terms of the mean squared error (MSE), the bias and the variance, based on N = 1,000 simulation repetitions. The true value of τ_r was assessed in a preliminary simulation involving a sample of size 100,000 from the distribution of M_{r,1}.
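The quoted tail dependence coefficients can be checked numerically; the sketch below integrates the t_5 density with plain numpy instead of calling a stats library (our helper functions):

```python
import numpy as np
from math import gamma as gamma_fn, sqrt, pi, sin

def t_cdf(x, nu):
    """Student-t c.d.f. via trapezoidal integration of the density."""
    s = np.linspace(-50.0, x, 200001)
    c = gamma_fn((nu + 1) / 2) / (gamma_fn(nu / 2) * sqrt(nu * pi))
    pdf = c * (1.0 + s ** 2 / nu) ** (-(nu + 1) / 2)
    dx = s[1] - s[0]
    return float(0.5 * dx * np.sum(pdf[:-1] + pdf[1:]))

def utdc_t4(tau):
    """Upper tail dependence coefficient of the t_4 copula given Kendall's tau."""
    rho = sin(pi * tau / 2)
    return 2.0 * t_cdf(-sqrt(5.0 * (1.0 - rho) / (1.0 + rho)), nu=5)

def utdc_gumbel(tau):
    """Upper tail dependence coefficient of the Gumbel-Hougaard copula,
    using the relation tau = 1 - 1/theta."""
    return 2.0 - 2.0 ** (1.0 - tau)
```

Evaluating these at τ ∈ {0.3, 0.6} reproduces the rounded values {0.23, 0.5} and {0.375, 0.68} quoted above.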
The results are presented in Figure 3, where we restrict attention to the CAR(1) model, as the performance for the other two time series models is nearly identical. As in the previous section, the bias was found to be of much smaller order than the variance, whence we further restrict attention to MSE(τ̂^db_{n,r}) and to the MSE ratio MSE(τ̂^db_{n,r})/MSE(τ̂^sb_{n,r}). We observe that the sliding blocks estimator consistently outperforms its disjoint blocks counterpart. The level of dependence impacts the performance in that the estimation is more precise for higher dependence (for both estimators), and in that the advantage of the sliding blocks estimator over its disjoint blocks counterpart is highest for low levels of dependence or independence. Furthermore, as in the previous section, the sliding blocks estimator's advantage slowly decreases in the number of blocks.

Proofs for Section 3
Proof of Theorem 3.5. Recall the definition of U^{mb}_{n,Z}. It suffices to show (7.1) for mb ∈ {db, sb} and (7.2). For the proof of (7.1) we will use a Hoeffding decomposition and verify weak convergence of the linear part to the normal limit and L^2-convergence to zero of the asymptotically degenerate part. For both parts we will employ common blocking techniques to deal with the serial dependence (see, e.g., [14], page 31). Define the corresponding Hoeffding decomposition and note the resulting algebraic identity.

Proof of (7.1) for mb = db: we start by treating the linear part. By Lemma 8.5, we may switch to i.i.d. copies of Z_{r,i}, denoted by Z̃_{r,i}. Hence, Lyapunov's central limit theorem becomes applicable. Consider first the case Var(h_1(Z)) > 0. In view of the fact that h_{1,r}(Z̃_{r,i}) is centered, we need to check the Lyapunov condition.
In the next part we show that the (asymptotically) degenerate part converges to zero in L^2. The Cauchy-Schwarz inequality, standard inequalities for the expectation and Condition 3.4 imply that the relevant moments are uniformly bounded. Consider first those tuples (i, j) such that both the distance between the smallest index, min(i, j), and the second smallest index and the distance between the largest index, max(i, j), and the second largest index is at most 2r. Clearly, the cardinality of the set of all those (i, j) is of the order O(m^2), whence the expression in (7.6) with the sum restricted to those tuples is of the order O(m^{−1}). It is hence sufficient to consider the sum over those summands for which either the distance between the smallest index and all other indices is strictly larger than 2r, or the distance between the largest index and all other indices is strictly larger than 2r. We only consider the first case, as the other can be treated similarly. Without loss of generality, let i_1 be the smallest index, and let J_{n,db} denote the respective set of indices. For each tuple (i, j) ∈ J_{n,db}, we may use Berbee's coupling lemma [2] to construct a random variable Z*_{r,i_1} having the same distribution as Z_{r,i_1} that is independent of (Z_{r,i_2}, Z_{r,j_1}, Z_{r,j_2}) and whose coupling error is controlled by the β-mixing coefficients. Using stationarity, basic properties of the conditional expectation and the properties of Z*_{r,i_1}, we obtain a bound via conditioning on (Z_{r,i_2}, Z_{r,j_1}, Z_{r,j_2}). Next, repeated applications of Hölder's inequality imply a bound that is uniform in (i, j) ∈ J_{n,db}, where we have used Condition 3.4. Overall, the resulting bound converges to zero by Condition 3.1(c), as 2/ν < ω. This implies (7.6), and in combination with (7.4) and (7.5) we obtain (7.1).
Proof of (7.1) for mb = sb: in order to show that the degenerate part of the rescaled sliding blocks U-statistic converges to zero, we may proceed analogously to the disjoint case: again, we may restrict the sum to tuples in J_{n,sb} := {(i, j) : i_2 − i_1 > 2r, j_1 − i_1 > 2r}, as the set of the remaining tuples is of the order O((nr)^2). We can then copy the disjoint blocks proof verbatim by replacing J_{n,db} and n_db with J_{n,sb} and n_sb.
It remains to show the weak convergence of the linear part. For this purpose, use Theorem 8.7 with f_{r,s} := h_{1,r} and f := h_1, and note that all conditions are satisfied, where we use Lemma 8.6 and an easy adaptation of Lemma B.15 in [11] to obtain the weak convergence condition in (8.3).

Proof of (7.2):
The inequality follows from Lemma A.10 in [42] with X_{n,i} := h_{1,r}(Z_{r,i}); the preconditions of Lemma A.10 can be deduced from Conditions 3.1(a), (c) and 3.4(a).
Proof of Corollary 3.6. By Condition 3.2 and the assumption on B_n, the required convergence holds. Hence, the assertion follows from Theorem 3.5 and Slutsky's theorem.
Proof of Equation (4.3). Fix γ < 1/4 and omit the lower index 1 everywhere; e.g., write Z_r instead of Z_{r,1}. We need to verify the conditions of Corollary 4.1.
We start by proving Assumption 2.1, for which we restrict attention to the case γ > 0, since the other cases can be treated similarly. Using F_W^←(F_γ(t)) = −1/log{1 − (1 + γt)^{−1/γ}} for t > 0 and equation (10.5) from [1], we obtain a representation of the distribution of interest after substituting the scaling sequence a_r. Define a'_r := r^γ and b'_r := (r^γ − 1)/γ, where the latter is defined by continuity as b'_r = log r if γ = 0. The p.d.f. of Z'_r is then given accordingly for t ∈ supp(Z'_r). We will only present the case γ > 0, as the other cases use similar ideas. Substituting 1 + tγ, we obtain a sum of integrals. By the monotone convergence theorem, the first integral converges to the desired limit. Using similar ideas as before, one can show that n = o(r^3) implies lim_{n→∞} B_n = 0.
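The continuity convention for b'_r can be checked numerically: (r^γ − 1)/γ → log r as γ → 0. A small sketch (illustration only):

```python
import math

def b_prime(r, gamma):
    """b'_r = (r**gamma - 1) / gamma, extended by continuity to log(r) at gamma = 0."""
    if gamma == 0.0:
        return math.log(r)
    return (r ** gamma - 1.0) / gamma

# (r^gamma - 1)/gamma approaches log(r) = b'_r at gamma = 0
approx = [b_prime(100, g) for g in (0.1, 0.01, 0.001)]
```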
Proof of Proposition 4.2. Write h_{pwm,2} = h_{pwm} and ĥ_{pwm,2} = ĥ_{pwm}. First of all, we have a decomposition involving a remainder R_n^{mb}. The number of summands in R_n^{mb} is of the order O(nr) for mb = sb and of the order O(m) for mb = db, whence R_n^{mb} is negligible by the integrability assumption.
Next, we consider a difference term which is zero with probability one by the no-ties assumption; note that all indices in the sum refer to blocks that do not overlap.
The second statement follows from Corollary 3.6, applied to U_{n,r}^{mb}(ĥ_{pwm}). Finally, the inequality for the asymptotic variances can be found in [11].
Proof of Proposition 4.3. Recall Example 3.3(5) and apply Theorem 3.5. A short calculation yields the formulas for the asymptotic variances.
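For concreteness, the order-two U-statistic behind Kendall's tau can be sketched as follows (an illustration under the assumption that the kernel is the usual concordance kernel; function names are hypothetical):

```python
from itertools import combinations

def kendalls_tau(pairs):
    """Order-two U-statistic with the concordance kernel
    h((x1, y1), (x2, y2)) = sign((x1 - x2) * (y1 - y2)),
    averaged over all unordered index pairs (no ties assumed)."""
    def sign(v):
        return (v > 0) - (v < 0)
    terms = [sign((p[0] - q[0]) * (p[1] - q[1]))
             for p, q in combinations(pairs, 2)]
    return sum(terms) / len(terms)
```

Applied to the disjoint or to the sliding block maxima sample of a bivariate series, this yields the two statistics compared in Proposition 4.3.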
For the first convergence, assume that n = mr for simplicity. We have a decomposition in which the O-term is due to leaving out nearby summands. Next, the limit follows by independence, piecewise stationarity and including the nearby summands again.

Proof of Theorem 5.5. For mb = db, (Z_{r,i})_{i ∈ I_n^{db}} is an i.i.d. sample. Thus, the proof is essentially an easier version of the proof of Theorem 3.5.
For mb = sb note that, by Lemma 5.3 and Conditions 5.4 and 3.2, it is sufficient to show the corresponding asymptotic normality. Note that we may replace n − r + 1 with n, since (n − r + 1)/n = 1 + O(m^{−1}). Unlike in the situation of Theorem 3.5, the sliding block maxima sample is no longer stationary, which requires a different version of the Hoeffding decomposition. For 1 ≤ i, j ≤ n, introduce functions h_{1,r,i}: R^d → R and h_{2,r,i,j}: R^{2d} → R, with ϑ_{r,i,j} from (5.1). Note that, by Lemma 5.3, the corresponding centering is as required. The asymptotic normality of √m L_{n,r}^{sb} follows from Theorem 8.8 with f_{r,i} := 2h_{1,r,i}, where the preconditions are met since the time series is piecewise stationary and by assumption; moreover, (8.3) is a consequence of Lemma 8.11. We omit the proof of √m D_{n,r}^{sb} = o_P(1), as it is similar to the proof of the respective statement in the proof of Theorem 3.5.
Proof. We have h(Z_{r,1}, Z̃_{r,1}) ⇝ h(Z, Z̃), with Z̃_{r,1} and Z̃ independent copies of Z_{r,1} and Z, by independence and the continuous mapping theorem. The assertion then follows from Example 2.21 in [39] and the integrability assumption.
Moreover, for any p < 2 + ν with p ∈ N, we have convergence of the corresponding p-th moments. Proof. Since ϑ_r → ϑ by Lemma 8.1, we may assume ϑ_r ≡ 0. We will use Wichura's theorem [4, Theorem 4.2]. Note the corresponding truncation at a set B = B(b) and define, for b > 0, the truncated versions T_r(b) and T(b). In order to show weak convergence of T_r(b) to T(b), we use the extended continuous mapping theorem (Theorem 1.11.1 in [40]). Let x_r → x ∈ R^d and note that the map (x, y) ↦ h(x, y)1{y ∈ B} is P_Z^{⊗2}-a.e. continuous. By the ordinary continuous mapping theorem we obtain weak convergence of h(x_r, Z_{r,1})1{Z_{r,1} ∈ B} to h(x, Z)1{Z ∈ B}. Next, since there exists a compact set A containing (x_r)_r, we obtain uniform integrability, which in turn implies moment convergence of h(x_r, Z_{r,1})1{Z_{r,1} ∈ B}. This shows continuous convergence of the mapping sequence x ↦ ∫_B h(x, y) dP_{Z_{r,1}}(y), and the extended continuous mapping theorem finally implies weak convergence of T_r(b) to T(b), as asserted.
Next, we have weak convergence of T(b) to T, for b → ∞. Indeed, with Z̃ an independent copy of Z, the claimed convergence follows as b → ∞.
We finally verify lim_{b→∞} lim sup_{r→∞} P(|T_r(b) − T_r| > ε) = 0 for any fixed ε > 0. Let Z̃_{r,1} be an independent copy of Z_{r,1}. Applying the Markov inequality, then the Cauchy-Schwarz inequality, and taking the limit over r results in an upper bound that vanishes as b → ∞. Wichura's theorem implies weak convergence of h_{1,r}(Z_{r,1}) to h_1(Z), for r → ∞. The stated convergence of moments follows from Example 2.21 in [39], using Jensen's inequality and Condition 3.4(a).
Let ℓ = ℓ_n ∈ N denote the sequence from Condition 3.1(b). We may assume that 1 < ℓ < r. For j ∈ N, recall the definition of Z_{r,j}. Since Z_{r,1} converges weakly to Z by assumption, it suffices to show that Z_{r−ℓ,1} − Z_{r,1} = o_P(1). In particular, we may assume d = 1 and note that (M_{r−ℓ,1} − b_{r−ℓ})/a_{r−ℓ} converges weakly to Z. Condition 2.1 yields local uniform convergence, see the proof of Lemma B.15 in [11], hence a_r/a_{r−ℓ} = 1 + o(1) and (b_{r−ℓ} − b_r)/a_r = o(1). By Lemma B.15 from [11] we obtain a bound, for any ε > 0. Using the convergence of the rescaling sequences and the fact that Z_{r−ℓ} is stochastically bounded, the claim follows.

For the next results, let ∆_{r,ℓ}(j) := h_{1,r}(Z_{r,j}) − h_{1,r−ℓ}(Z_{r−ℓ,j}) for j ∈ I_n^{db}. Furthermore, let (X̃_j, …, X̃_{j+r−1})_{j ∈ I_n^{db}} be i.i.d. copies of (X_j, …, X_{j+r−1})_{j ∈ I_n^{db}}.

Proof. Let φ_Y denote the characteristic function of a real-valued random variable Y. First note that, by Lemma 3.9 in [14], we have, for t ∈ R, |Cov(e^{i t h_{1,r−ℓ}(Z_{r−ℓ,1})}, e^{i t h_{1,r−ℓ}(Z_{r−ℓ,1+r})})| ≤ 2πα(ℓ), as there is a lag of length ℓ between X_{r−ℓ} and X_{r+1}. An induction over the number of summands yields a bound that vanishes by Condition 3.1(b). Hence, by Lévy's continuity theorem, the weak limit of m^{−1/2} Σ_{j ∈ I_n^{db}} h_{1,r−ℓ}(Z_{r−ℓ,j}) exists if and only if the weak limit of m^{−1/2} Σ_{j ∈ I_n^{db}} h_{1,r−ℓ}(Z̃_{r−ℓ,j}) exists, and in that case, the limits coincide.
Finally, by Lemma 8.4, the differences ∆_{r,ℓ}(j) are asymptotically negligible. Thus, the weak limit of m^{−1/2} Σ_{j ∈ I_n^{db}} h_{1,r}(Z_{r,j}) exists if and only if the weak limit of m^{−1/2} Σ_{j ∈ I_n^{db}} h_{1,r−ℓ}(Z̃_{r−ℓ,j}) exists, and the limits coincide. The same line of argumentation as in the proof of Lemma 8.4 yields the result.

Sliding blocks - stationary case
Recall the definitions of G_ξ, L_ξ and C_ξ from (3.8), (3.7) and (3.6), respectively, and recall the convention stated there. The following is a generalization of Lemma B.3 in the supplementary material to [11] to dimensions d ≥ 1, where ξ_r = 1 + ⌊rξ⌋. Furthermore, G_ξ is the cdf of a 2d-variate extreme value distribution with copula C_ξ and stable tail dependence function L_ξ.
Proof. We only consider the case ξ ∈ [0, 1]; the case ξ > 1 can be treated similarly. By the same arguments as in the proof of Lemma B.3 in [11], we obtain a representation involving terms of the form x^{(1)} ∧ y^{(1)}. Since −log G_γ(x) = (1 + γx)^{−1/γ}, we may rewrite this expression, and a straightforward calculation then shows that the expression on the right-hand side of (8.2) can be written as G_ξ(x, y). In particular, C_ξ is a copula, which can easily be seen to be max-stable, i.e., C_ξ(u^s, v^s) = C_ξ(u, v)^s for all s > 0 and u, v ∈ [0, 1]^d. It is hence an extreme-value copula with the given stable tail dependence function L_ξ, and G_ξ is the cdf of an extreme-value distribution.

Proof. The proof is similar to the one of Theorem 3.6 in [9]. For j ∈ {1, …, m}, let I_j := {(j − 1)r + 1, …, jr}. Choose m* = m*_n ∈ N with 3 ≤ m* ≤ m such that m* → ∞ and m* = o(m^{ν/(2(1+ν))}). Next, define q := q_n := m/m* and assume without loss of generality that q ∈ N and n/r ∈ N. For j ∈ N, define J_j^+ := I_{(j−1)m*+1} ∪ … ∪ I_{jm*−2} as the index set making up the big blocks, and J_j^− := I_{jm*−1} ∪ I_{jm*} as the index set making up the small blocks. Note that #J_j^+ = (m* − 2)r and #J_j^− = 2r. The previous definitions allow us to rewrite the statistic as a sum of big-block and small-block contributions, where S_{n,j}^+ := √(q/(nr)) Σ_{s ∈ J_j^+} f_{r,s}(Z_{r,s}) and S_{n,j}^− := √(q/(nr)) Σ_{s ∈ J_j^−} f_{r,s}(Z_{r,s}).
Note that the random variables (S_{n,j}^±)_j are stationary by (i). Properties (ii), (iii) and the definitions of m* and q yield that R_{n,1} is asymptotically negligible. For R_{n,2}, note that, by property (iii) and Lemma 3.11 from [14], the relevant supremum is o(1) by assumption. Therefore, in view of E[S_{n,j}^−] = 0 for all j by (ii), we obtain q^{−1/2} Σ_{j=1}^q S_{n,j}^− = o_P(1). Concerning the sum over S_{n,j}^+, note that we may assume that S_{n,1}^+, S_{n,2}^+, … are independent by arguing as in the proof of Lemma 8.5, since there is a lag of r between any two big blocks. Hence, we may subsequently apply Lyapunov's central limit theorem.
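The big-block/small-block bookkeeping can be made explicit; the following sketch (a hypothetical helper, for illustration only) constructs the index sets J_j^+ and J_j^- from the disjoint blocks I_1, …, I_m and verifies the stated cardinalities #J_j^+ = (m* − 2)r and #J_j^- = 2r:

```python
def big_small_blocks(m, r, m_star):
    """Group the disjoint blocks I_1, ..., I_m (each of length r) into q = m/m_star
    groups; in each group, the first m_star - 2 blocks form a big block J_j^+ and
    the last two blocks form a small block J_j^-."""
    assert m_star >= 3 and m % m_star == 0
    I = [list(range((j - 1) * r + 1, j * r + 1)) for j in range(1, m + 1)]
    q = m // m_star
    J_plus = [[t for blk in I[(j - 1) * m_star:j * m_star - 2] for t in blk]
              for j in range(1, q + 1)]
    J_minus = [[t for blk in I[j * m_star - 2:j * m_star] for t in blk]
               for j in range(1, q + 1)]
    return J_plus, J_minus
```

The lag of r between consecutive big blocks (created by the intervening small block) is what allows the coupling argument to treat the S_{n,j}^+ as independent.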

Sliding blocks - non-stationary case
The following theorem is an adaptation of Theorem 8.7 to the non-stationary setting of Section 5.
Proof. The proof is essentially the same as for Theorem 8.7, with the following simple adaptation: the independence of S_{n,1}^+, S_{n,2}^+, … is a direct consequence of the imposed sampling scheme.
The following result is an extension of Lemma 2.4 from [11] to multivariate time series. Proof. Note first that, for univariate x ∈ R and s > 0, γ ∈ R, we have an elementary scaling identity; see also (L1) from Condition 2.1. By piecewise stationarity and Condition 2.1 it suffices to show the result for ξ ∈ (0, 1). Analogously to the proof of Lemma 2.4 from [11], we obtain the claimed representation, where the last equality follows from the identity above.
Lemma 8.10. Suppose the sampling scheme from Condition 5.1 is met and that the underlying time series (Y_t)_t satisfies Conditions 3.1(a) and (b). Then, for any ξ, ξ' ≥ 0 and x, y ∈ R^d, the corresponding joint convergence holds. Proof. This is a slight adaptation of the proof of Lemma 8.6, using standard clipping techniques and Lemma 8.9.

Fig 2. For the estimation of σ_r^2 = Var(M_{r,1}), the ratio MSE(σ̂_{n,r,db}^2)/MSE(σ̂_{n,r,sb}^2) is depicted as a function of the number of blocks m.
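The two estimators compared in Fig 2 can be sketched as follows (illustration only; both are taken here to be the empirical variances of the respective block maxima samples, which is an assumption about the estimators' definitions):

```python
import random

def empirical_variance(z):
    """Plain empirical variance (normalization by the sample size)."""
    zbar = sum(z) / len(z)
    return sum((v - zbar) ** 2 for v in z) / len(z)

def variance_estimators(x, r):
    """Disjoint- and sliding-blocks estimators of sigma_r^2 = Var(M_{r,1})."""
    m = len(x) // r
    db = [max(x[i * r:(i + 1) * r]) for i in range(m)]
    sb = [max(x[i:i + r]) for i in range(len(x) - r + 1)]
    return empirical_variance(db), empirical_variance(sb)

random.seed(2)
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
v_db, v_sb = variance_estimators(x, 40)
```

Repeating this over many simulated series and averaging the squared errors against a Monte Carlo approximation of Var(M_{r,1}) gives the MSE ratio shown in the figure.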

The resulting upper bound is of the form C(…)^{1/2} by the Portmanteau theorem and Condition 3.4(a), for a constant C not depending on b. The bound goes to 0 for b → ∞, since B(b)^c ↓ ∅.

Theorem 8.7 (CLT for sliding blocks). Suppose that Conditions 3.1(a), (b) are satisfied and that there exists an ω > 0 with m_n^{1+ω} α(r_n) → 0. For each r = r_n, let F_r = {f_{r,i}: R^d → R | i ∈ N} be a family of deterministic maps with the following properties: (i) f_{r,r+s} = f_{r,s} for all s ∈ N and r = r_n with n ∈ N; (ii) the random variables f_{r,i}(Z_{r,i}) are centered for all i ∈ N and r = r_n with n ∈ N; (iii) there exists a ν > 2/ω with lim sup_{n→∞} …

Theorem 8.8. Suppose that the sampling scheme from Condition 5.1 is met and that the underlying time series (Y_t)_t satisfies Conditions 3.1(a), (b) and m_n^{1+ω} α(r_n) → 0 for some ω > 0. For each r = r_n, let F_r = {f_{r,i}: R^d → R | i ∈ N} be a family of deterministic maps satisfying Conditions (i)-(iv) of Theorem 8.7. Then, for n …

Lemma 8.9. Suppose the sampling scheme from Condition 5.1 is met and that the underlying time series (Y_t)_t satisfies Conditions 3.1(a) and (b). Then, for every ξ ≥ 0 and x ∈ R^d, lim_{n→∞} P(Z_{r,ξ_r} ≤ x) = G(x), with G from Condition 2.1 and with ξ_r := 1 + ⌊rξ⌋.