Non-parametric estimation of time-varying AR(1) processes with local stationarity and periodicity

Abstract: Extending the ideas of [7], this paper provides a kernel-based non-parametric estimation of a new class of time-varying AR(1) processes $(X_t)$, with local stationarity and periodic features (with a known period $T$), induced by the definition $X_t = a_t(t/(nT))\, X_{t-1} + \xi_t$ for $t \in \mathbb{N}$, with $a_{t+T} \equiv a_t$. Central limit theorems are established for the kernel estimators $\widehat{a}_s(u)$, reaching classical minimax rates and only requiring low-order moment conditions on the white noise $(\xi_t)_t$, up to the second order.

This paper is dedicated to the memory of Jean Bretagnolle

Introduction
Since the seminal paper [5], the local-stationarity property has provided new models and approaches for introducing non-stationarity in time series. The recently published handbook [7] gives a complete survey of the results obtained over the last twenty years on this topic. An interesting new class of models is obtained from a natural extension of the usual ARMA processes, the so-called tvARMA(p, q) processes defined in [8] as

$$\sum_{j=0}^{p} \alpha_j\!\Big(\frac{t}{n}\Big)\, X^{(n)}_{t-j} \;=\; \sum_{k=0}^{q} \beta_k\!\Big(\frac{t}{n}\Big)\, \xi_{t-k}, \qquad \alpha_0 \equiv 1,$$

where the $\alpha_j$ and $\beta_k$ are bounded functions. This is a special case of the locally stationary linear processes defined by $X^{(n)}_t = \sum_{j=0}^{\infty} \gamma_j\big(\frac{t}{n}\big)\, \xi_{t-j}$. Such models have been studied in many papers, especially concerning the parametric, semi-parametric or non-parametric estimation of the functions $\alpha_j$, $\beta_k$ or $\gamma_j$, or of other functions depending on them; see, for instance, [6], [8], [7], [3], [11], [15] or [2].
Bardet and Doukhan / Non-parametric estimation of periodic time varying AR(1) processes

For simplicity, we restrict this first work to time-varying AR(1) processes $(X^{(n)}_t)$ including a periodic component:

$$X^{(n)}_t = a_t\!\Big(\frac{t}{nT}\Big)\, X^{(n)}_{t-1} + \xi_t, \quad \text{with } a_{t+T} \equiv a_t, \text{ for any } 1 \le t \le nT,\ n \in \mathbb{N}, \qquad (1.2)$$

where $T \in \mathbb{N}^*$ is a fixed and known integer and $(\xi_t)$ a white noise. The choice of such an extension of the tvAR(1) processes is driven by modelling considerations: for instance, in the climatic framework, [4] considered models of air temperatures where the function of interest writes as the product of a periodic sequence by a locally varying function. This choice provides an interesting extension of more classical periodic models of air temperature, such as those proposed in [12]. Other periodic representations of locally stationary processes can also be found, for instance, in [16], but there the seasonal component is treated as an additive deterministic trend and is not included in the dynamics of the process, as it is in model (1.2).
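To fix ideas, here is a minimal simulation sketch of model (1.2); the function name `simulate_tvar1`, the Gaussian choice for the noise and the stability requirement $\sup_{s,u}|a_s(u)| < 1$ are our own assumptions, not prescriptions of the paper.

```python
import numpy as np

def simulate_tvar1(n, T, a_funcs, sigma=1.0, seed=0):
    """Simulate X_t = a_t(t/(nT)) X_{t-1} + xi_t for t = 1, ..., nT,
    with X_0 = 0, a_{t+T} = a_t and i.i.d. centred noise of variance sigma^2.

    a_funcs -- list of T functions u -> a_s(u), s = 1, ..., T (stored at index s-1),
               assumed bounded away from 1 in absolute value (stability).
    Returns the trajectory (X_1, ..., X_{nT})."""
    rng = np.random.default_rng(seed)
    N = n * T
    X = np.zeros(N + 1)                 # X[0] is X_0 = 0
    xi = rng.normal(0.0, sigma, N + 1)
    for t in range(1, N + 1):
        s = (t - 1) % T                 # periodic phase of the index t
        X[t] = a_funcs[s](t / N) * X[t - 1] + xi[t]
    return X[1:]
```

For instance, with $T = 2$, $a_1(u) = 0.5\cos(3u)$ and $a_2(u) = -0.3$, the call `simulate_tvar1(500, 2, [lambda u: 0.5*np.cos(3*u), lambda u: -0.3])` returns a trajectory of length 1000.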
We then study non-parametric estimators $\widehat{a}_s(u)$, for $s = 1, \ldots, T$ and $u \in (0,1)$, from an observed trajectory $(X_1, \ldots, X_{nT})$. We consider kernel-based estimators which are naturally induced by covariance relationships satisfied by the process (see Section 2). Central limit theorems are established for these estimators under some regularity conditions on the functions $a_s(\cdot)$. The results only require second-order moments of the white noise $(\xi_t)$. This is a main improvement with respect to the usual limit theorems on locally stationary processes, which are obtained under the assumption that all moments of $(\xi_t)$ exist. This is due to the new ideas developed in our proof, which combines a central limit theorem for arrays of martingale increments with an embedding in an Orlicz space (see details in Section 4). The obtained convergence rate is optimal with respect to the minimax rate, up to a logarithmic term. Simulations based on Monte-Carlo experiments exhibit the accuracy of the estimators. This paper is also a first step towards new results for new classes of non-stationary processes. Indeed, we can extend the definition (1.2) to processes $(X^{(n)}_t)$ depending on $X^{(n)}_{t-1}, X^{(n)}_{t-2}, \ldots$, where $(Z_t)$ is a sequence of i.i.d. random vectors modelling, for instance, exogenous inputs. This more difficult case is deferred to forthcoming papers. Other time-varying models with infinite memory, such as GARCH-type models (see for instance [9]), may also be processed. Note also that [10] introduced INGARCH models, which are GLM models; non-stationary versions of these may also be considered. They will be studied in further works.
The structure of the paper is as follows. In Section 2, we define and study the asymptotic properties of non-parametric estimators for the process (1.2). Section 3 provides some Monte-Carlo results, while the proofs are reported in Section 4.

Definition and first properties of the process
Here we denote by $T \in \mathbb{N}^*$ a fixed and known period.
The paper is dedicated to the simplest case $X = (X^{(n)}_t)$ defined by (1.2). Here $(\xi_t)_{t \in \mathbb{N}}$ is a sequence of i.i.d. r.v. satisfying $E(\xi_t) = 0$ and $\mathrm{Var}(\xi_t) = \sigma^2$ for any $t \in \mathbb{N}$. The functions $(a_s(\cdot))_{1 \le s \le T} : [0,1] \to \mathbb{R}$ are supposed to satisfy some regularity. Hence, we provide the forthcoming Definition 2.1, as usually done in a non-parametric framework. Remark that, with this unusual definition, a Lipschitz function is in $C^1$. As a consequence, we specify the assumptions on the functions $(a_t)$ using a fixed positive real number $\rho > 0$:

Assumption (A(ρ)): The functions $\{a_t(\cdot);\ t \in \mathbb{N}\}$ are such that:
First, it is clear that the conditions on the functions $(a_s)$ ensure the existence of a causal linear process $(X^{(n)}_t)$ as in (2.2), under a finite moment condition and $E(\xi_0^3) = 0$ (this holds e.g. if $\xi_0$ admits a symmetric distribution). For $s \in \{1, \ldots, T\}$, there exist functions $\gamma_s(v)$ (2.3). We will now assume $X_0 = 0$. In addition to the previous proposition, another relation can easily be established. Indeed, for $t \in \{0, 1, \ldots, nT\}$, with $s = t\,[T]$, multiplying (2.1) by $X_{t-1}$ and taking the expectation yields

$$E(X_t X_{t-1}) = a_s\!\Big(\frac{t}{nT}\Big)\, E\big(X_{t-1}^2\big). \qquad (2.5)$$

The relation (2.5) is the foundation of the definition of the following non-parametric estimators of the functions $a_s(\cdot)$.
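To spell out the covariance step (a reconstruction consistent with the causality of the process, since the displayed computation was lost in extraction):

```latex
E(X_t X_{t-1})
  \;=\; a_s\!\Big(\tfrac{t}{nT}\Big)\, E\big(X_{t-1}^2\big) + E(\xi_t X_{t-1})
  \;=\; a_s\!\Big(\tfrac{t}{nT}\Big)\, E\big(X_{t-1}^2\big),
```

since $\xi_t$ is independent of $X_{t-1}$ and centred; hence $a_s(t/(nT)) = E(X_t X_{t-1}) / E(X_{t-1}^2)$ whenever $E(X_{t-1}^2) > 0$, which is the ratio that the kernel estimators localise around $t/(nT) \approx u$.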

Asymptotic normality of the estimator
Assume that the sample $(X_1, \ldots, X_{nT})$ is observed for some $n \ge 1$; this assumption entails a reasonable loss of at most $T$ data points and allows a more comprehensive study.
Assumption (K): Let $K : \mathbb{R} \to \mathbb{R}_+$ be a Borel bounded function such that:
• there exists some $B > 0$ such that …

Typical examples of kernel functions are the Gaussian kernel $K(t) = (2\pi)^{-1/2} e^{-t^2/2}$ and the Epanechnikov kernel $K(t) = \tfrac{3}{4}(1 - t^2)\,\mathbf{1}_{[-1,1]}(t)$. Assume that a sequence of positive bandwidths $(b_n)_{n \in \mathbb{N}}$ is chosen in such a way that $\lim_{n \to \infty} b_n = 0$ and $\lim_{n \to \infty} n b_n = \infty$. Now, keeping in mind the expression (2.5), for $s \in \{1, \ldots, T\}$ and $u \in (0,1)$, we set

$$\widehat{a}_s(u) = \frac{\displaystyle\sum_{t \in I_{n,s}} K\!\Big(\frac{u - t/(nT)}{b_n}\Big)\, X_t X_{t-1}}{\displaystyle\sum_{t \in I_{n,s}} K\!\Big(\frac{u - t/(nT)}{b_n}\Big)\, X_{t-1}^2}, \qquad I_{n,s} = \{s, s+T, \ldots, s+(n-1)T\}; \qquad (2.6)$$

since the extremities are omitted, we avoid the corresponding edge effects. The case $u = 0$ does not make any contribution, while the case $u = 1$ corresponds to simple periodic behaviours; such results may be found in [12].
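A minimal numerical sketch of the ratio estimator described above (the 0-based indexing convention, the default Gaussian kernel and the name `hat_a` are our own choices):

```python
import numpy as np

def hat_a(X, s, u, T, b, K=lambda z: np.exp(-z ** 2 / 2) / np.sqrt(2 * np.pi)):
    """Kernel ratio estimator of a_s(u) from X = (X_1, ..., X_{nT}).

    Weighted empirical cross-covariance over weighted empirical variance,
    along the sub-grid t = s + T, s + 2T, ..., s + (n-1)T (extremities
    omitted to limit edge effects).  X is 0-based: X[t-1] stores X_t."""
    n = len(X) // T
    num = den = 0.0
    for j in range(1, n):
        t = s + j * T                    # time index on the s-th sub-grid
        w = K((u - j / n) / b)           # localisation around u
        num += w * X[t - 1] * X[t - 2]   # X_t * X_{t-1}
        den += w * X[t - 2] ** 2         # X_{t-1}^2
    return num / den
```

On a stationary AR(1) trajectory with constant coefficient $a = 0.5$, the estimator should return a value close to $0.5$ for any interior $u$.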
Using essentially a martingale central limit theorem (the steps of the proof are precisely detailed in Section 4), we obtain, for any $u \in (0,1)$ and $s \in \{1, \ldots, T\}$, a central limit theorem for $\widehat{a}_s(u)$, with an asymptotic variance involving $\gamma^{(2)}_s(u)$ (2.7). Note that for $\rho \le 1$ the classical optimal semi-parametric minimax rate is reached. This is not the case if $\rho \in (1, 2]$: then another moment condition is needed in order to improve the convergence rate of $\widehat{a}_s(u)$. Moreover, in the case $\rho = 2$, if $b_n = c\, n^{-1/5}$ then the central limit theorem still holds but the limit is now non-centred.

Remark 2.2. Optimal bandwidths write as $b_n \sim c\, n^{-1/(2\rho+1)}$; thus the above result holds with a suboptimal bandwidth. Moreover, the symmetry assumption is discussed in Remark 4.2. For the case $\rho = 2$, if the derivatives of $a_s$ are regular around the point $u$, then the optimal bandwidth may actually be used and the central limit theorem again holds, with a non-centred Gaussian limit.
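The bandwidth claims of Remark 2.2 follow the standard nonparametric bias-variance heuristics (this summary is ours, not a quotation of the paper: bias of order $b_n^\rho$, standard deviation of order $(n b_n)^{-1/2}$):

```latex
b_n^{\rho} \;\asymp\; (n b_n)^{-1/2}
\;\Longrightarrow\;
b_n^{\mathrm{opt}} \sim c\, n^{-\frac{1}{2\rho+1}},
\qquad
\sqrt{n\, b_n^{\mathrm{opt}}} \;\asymp\; n^{\frac{\rho}{2\rho+1}},
```

which is the classical minimax rate for $\rho$-regular functions; for $\rho = 2$ this gives $b_n = c\, n^{-1/5}$ and the rate $n^{2/5}$, at the price of the non-centred Gaussian limit mentioned above.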
Remark 2.3. Of course, if $T = 1$, Theorems 2.1 and 2.2 hold. These results provide another minimax estimation of the function $u \in [0,1] \mapsto a(u)$, requiring sharper moment and regularity conditions than the ones proposed in Theorem 4.1 of [8].

Monte-Carlo experiments
In this section, numerous Monte-Carlo experiments have been carried out to study the accuracy of the new non-parametric estimator $\widehat{a}_s(\cdot)$. Firstly, we considered three typical functions; for instance, $a_s(u) = 0.9\,\cos\!\big(2\pi \frac{nu}{T}\big)\cos(3u)$. Figure 1 exhibits the graph of the function $a$ and an example of its estimation (for $n = 1000$).
• For $\rho = 1.5$, we chose $a^{(1.5)}_s$ built from an observed trajectory of a Brownian motion;
• For $\rho = 0.5$, we chose $a^{(0.5)}_s$ built from an observed trajectory of a Brownian motion.
Finally, for each $n$, each family of functions $a_s$, each kernel $K$ and each probability distribution of $\xi_0$, we present the results computed from 1000 replications, following this methodology:
1. For each replication $j$, we set $b_n = n^{-\lambda}$ with $\lambda = 0.10, 0.11, \ldots, 0.80$, $(u_i)_{1 \le i \le 99} = 0.01, 0.02, \ldots, 0.99$, $s = 1, 2, \ldots, T$, and the estimators $\widehat{a}_s(u_i)$ are computed.
2. For each replication $j$ and each $\lambda = 0.10, 0.11, \ldots, 0.80$, an estimator of the MISE is computed.
3. For each replication $j$, we minimised an estimator of the global square root of the MISE, yielding $\widehat{\lambda}_j$.
4. Then we computed $\bar{\lambda} = \frac{1}{1000} \sum_{j=1}^{1000} \widehat{\lambda}_j$ over all the replications.
5. Finally, we computed the estimator of the minimal global square root of the MISE.

As a consequence, $\bar{\lambda}$ and $\mathrm{MISE}^{1/2}$ are two interesting statistics relative to Theorems 2.1 and 2.2. The first one specifies the link between the choice of an optimal bandwidth $b_n$ and the regularity $\rho$ of the functions $a_s(\cdot)$. The second one measures the optimal convergence rate of the estimators $\widehat{a}_s(\cdot)$ to $a_s(\cdot)$. All the results are reported in Tables 1 and 2.
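Steps 1–3 above can be sketched as a grid search (one replication); the helper name `select_bandwidth` and the generic estimator signature `a_hat(X, s, u, T, b)` are our own conventions, and the true functions `a_true` are known here, as in a Monte-Carlo setting:

```python
import numpy as np

def select_bandwidth(X, T, a_true, a_hat, lambdas=np.arange(0.10, 0.805, 0.01)):
    """One Monte-Carlo replication of steps 1-3: for each exponent lambda,
    set b = n^(-lambda), estimate a_s(u_i) on the grid u_i = 0.01, ..., 0.99
    and for all phases s, then return the lambda minimising the square root
    of the empirical MISE, together with that minimal value."""
    n = len(X) // T
    u_grid = np.arange(0.01, 1.0, 0.01)
    best_rmise, best_lam = np.inf, None
    for lam in lambdas:
        b = n ** (-lam)
        mise = np.mean([(a_hat(X, s, u, T, b) - a_true[s - 1](u)) ** 2
                        for s in range(1, T + 1) for u in u_grid])
        if np.sqrt(mise) < best_rmise:
            best_rmise, best_lam = np.sqrt(mise), lam
    return best_lam, best_rmise
```

Averaging the returned exponents over the 1000 replications gives the quantity $\bar{\lambda}$ of step 4.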
Conclusions of the simulations: Firstly, and as could be deduced from Theorems 2.1 and 2.2, we observed that the larger the regularity $\rho$, the smaller $\bar{\lambda}$, and therefore the larger the optimal bandwidth $b_n = n^{-\bar{\lambda}}$ and the faster the convergence rate of $\widehat{a}_s$. Secondly, even if the choice of the optimal bandwidth differs significantly with the choice of the kernel (clearly smaller with the Epanechnikov kernel), the optimal convergence rate is almost the same for both kernels. Finally, in accordance with Theorem 2.2, the convergence rate is clearly slower with a heavy-tailed distribution ($t(3)$) than with a Gaussian distribution, and this phenomenon increases when $\rho$ increases.

Table 1: Results of the Monte-Carlo experiments providing the accuracy of $\widehat{a}_s$ for the three chosen functions, with $\xi_0$ following a $\mathcal{N}(0,4)$ distribution; 1000 independent replications are generated.

Proofs
We first provide the proof of Proposition 2.1.
Proof of Proposition 2.1.
1. We have $E(X_t) = 0$ for any $t$, since $E(\xi_t) = 0$ and $X_0 = 0$.
2. Below, for ease of reading, we omit the exponent $n$. Set $v_t = E(X_t^2)$ and $v = \sup_s v_s \in [0, +\infty]$; also write $\alpha_t = a_t^2\big(\frac{t}{nT}\big)$. We have: … Moreover, with $\delta_t = v_t - v_{t-T}$ for any $t > T$, we have: … from (4.1) and since $\alpha$ … As a consequence of (4.2), we also obtain: … Now use again the definition (2.1) of the model and, by iterating (4.1), we derive: … Hence, … Now, noting that $\alpha_{t-j} = a^2_{t-j}\big(\frac{t-j}{nT}\big)$, we set $\bar{\alpha}_{t-j} = a^2_{t-j}\big(\frac{t}{nT}\big)$ for $1 \le j < T$; then, since $\rho \ge 1$ and from (4.5), we derive … The conclusion follows.

The proof mimics the case of
Since $\mu_3 = 0$, we have: … with $r(t) = 6\sigma^2 v_t + \mu_4$, and this implies, as previously, $\sup_t w_t < \infty$. We also obtain: … Finally, by iterating (4.7), we obtain: … from (4.8). Hence, always following the previous case, … and this implies (2.3). Finally, for any $t < t'$ with $t, t' \in \{1, \ldots, nT\}$, since $(X_t)$ is a causal process and by iteration, … This completes the proof.

Now we establish a technical lemma which we were not able to find in the literature (even if variants of this result may be found) and which will be extremely useful in the sequel. For a bounded continuous function $c$ defined on $[0,1]$ and a kernel function $H$ (see details below), a Riemann-sum approximation yields (as for [17]'s estimator; see [18] for further developments): … where $u \in (0,1)$ and $I_{n,s} = \{s, s+T, \ldots, s+(n-1)T\}$, with $s \in \{1, \ldots, T\}$ and $T \in \mathbb{N}^*$. More precisely, we would like to provide expansions of …

Lemma 4.1. Let $u \in (0,1)$, $\rho > 0$, $c \in C^\rho(V_u)$ a bounded function and $H$ satisfying Assumption (K)(ρ). Then there exists $C > 0$, depending only on $\|H\|_\infty$, $\|c\|_\infty$, $\mathrm{Lip}(H)$ and $\mathrm{Lip}(c)$, such that, for $n$ large enough and $b_n > 0$, … (4.10). Finally, if $\rho \in \mathbb{N}^*$, we have: …
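The Riemann-sum approximation behind Lemma 4.1 can be checked numerically; this is only an illustrative sketch with our own choices of $c$, $H$, $u$ and grid sizes (the claim being that the normalised kernel sum over $I_{n,s}$ approaches $c(u)\int H(z)\,dz$ when $b_n \to 0$ and $n b_n \to \infty$):

```python
import numpy as np

def kernel_riemann_sum(c, H, u, n, T, s, b):
    """(1/(n b)) * sum over t in I_{n,s} of H((u - t/(nT)) / b) * c(t/(nT)),
    with I_{n,s} = {s, s + T, ..., s + (n-1)T}."""
    t = s + T * np.arange(n)
    v = t / (n * T)
    return np.sum(H((u - v) / b) * c(v)) / (n * b)

# Gaussian kernel (integral 1) and a smooth test function
H = lambda z: np.exp(-z ** 2 / 2) / np.sqrt(2 * np.pi)
c = lambda v: np.cos(3 * v)

approx = kernel_riemann_sum(c, H, u=0.5, n=100_000, T=3, s=2, b=0.01)
# approx should be close to c(0.5) = cos(1.5)
```

Shrinking $b$ while growing $n b$ makes the error decrease, in line with the $O(b_n^\rho) + O((n b_n)^{-1})$ expansion the lemma quantifies.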
• First assume that the function $c \equiv 1$ is constant. Write … and $L_{n,s} = I_{n,s} \setminus K_{n,s}$. Then, for $n$ large enough, … with $C > 0$, using the assumptions on $H$. Then, if $A_n \ge \beta^{-1} \log n$, we have $\exp(-\beta A_n) \le 1/n$ and we deduce (4.10).
• We now turn to the case of a non-constant function $c$. First, if $\rho > 0$, for $(u,v) \in (0,1)^2$ the Taylor–Lagrange formula implies: … with $|R(u,v)| \le C_\rho |u-v|^\rho$. Then, for any $u \in (0,1)$, using Assumption (K)(ρ), and especially the relation $\int z^p H(z)\,dz = 0$ for $p = 1, \ldots$, … with $C > 0$. Here we denote … and therefore, using the previous results: … from (4.14), and this implies (4.10) since $n b_n \to \infty$, so that $n^{-\rho}$ is negligible with respect to $b_n^\rho$. Now, if $\rho \ge 1$, since $H$ and $c$ are bounded continuous Lipschitz functions, we obtain the inequality … Then, using the same computations as previously (replace $h_n$ by $h_n \times c$), … from (4.14), and this completes the first item since $b_n$ is supposed to converge to 0. The proof is now easily completed.
• Finally, in the case $\rho \in \mathbb{N}^*$, we can use the previous case and a Taylor expansion. Then, using (4.13) and with $\mu_u(z) \in [0,1]$, … and the Lebesgue dominated convergence theorem concludes.
But if $0 \le x \le 1$ and $y \ge x$, then $xy \le y$; therefore $g(xy) \le g(y) \le g(x) g(y)$, since $g$ is an increasing function and $g(x) \ge 1$ for any $x \ge 0$. Moreover, if $1 < x \le y$, there exist $k \ge 0$ and $\lambda \in [0,1]$ such that … is a convex function, since $h \ge 0$ a.e. As a consequence, … is a piecewise function, and we finally obtain $g(y^2) \le g(y) + 1$. We conclude with $g(xy) \le g(y^2)$ for any $1 \le x \le y$ and $g(x) \ge 2$ (since $c_1 = 2$). Hence the function $\psi$ is an Orlicz function and $\|\xi_0\|_\psi < \infty$, with, for any random variable $V$, … (4.21). Now Theorem 1.1 in [14] implies: … Then, from the definition of $(X_t)$ and the triangular inequality, $\|X_t\|_\psi \le \sum_{j \ge 0} \alpha^j \|\xi_{t-j}\|_\psi$ for any $t \in \mathbb{N}^*$, with $0 \le \alpha < 1$. Since $\|\xi_s\|_\psi = \|\xi_0\|_\psi$ for any $s \in \mathbb{N}$, we finally obtain … Thus (4.19) implies, with the independence of $\xi_t$ and $X_{t-1}$, that: … Thus, with $t = s + (j-1)T$, we have from (4.22): … As a consequence, for any $\varepsilon > 0$, we decompose it as: … Therefore we obtain: … with … We are going to derive the consistency of the estimator $\widehat{a}_s(u)$ of $a_s(u)$ in two parts.
1/ We first prove that … Let $s \in \{1, \ldots, T\}$ and $u \in (0,1)$. Denote, for $n \in \mathbb{N}^*$ and $j \in \{1, \ldots, n\}$, … It is clear that $(Y_{n,j})_{1 \le j \le n,\ n \in \mathbb{N}^*}$ is a triangular array of martingale increments with respect to the $\sigma$-algebras …, since $(X_t)_{t \ge 0}$ is a process which is causal with respect to $(\xi_t)_{t \ge 0}$. This implies that $\xi_t$ is independent of $(X^{(n)}_i)_{i \le t-1}$ and that $E(\xi_0) = 0$. We are going to use a central limit theorem for triangular arrays of martingale increments; see for example [13]. Denote … since $E(\xi_0^2) = \sigma^2$. Using Lemma 4.2, we obtain: … where $\widehat{\gamma}^{(2)}_s(u)$ is defined from (4.24) and satisfies … As a consequence, the conditions of the central limit theorem for triangular arrays of martingale increments in [13] are satisfied, and this implies that … Therefore the Slutsky lemma entails: … 2/ … we have … Using Lemma 4.1 twice, firstly with $c(x) = \gamma_s(x)(a_s(x) - a_s(u))$ and secondly with $c(x) = a_s(x) - a_s(u)$, we deduce: … In the case $\rho \in \{1, 2\}$, we also obtain from (4.11), and with …, using the Markov inequality. Indeed, … (and, for $\rho = 1$, the Lipschitz property of $z \mapsto |z|$ allows to conclude), and $c(u) = 0$, we derive: … From its expression given in (4.25), $J_n$ is a quadratic form of $(X_t)$; therefore, as $X_t$ is a linear process with innovations $(\xi_t)$, $J_n$ is also a quadratic form of $(\xi_t)$. As a consequence, the fourth-order moment can be injected, such that there exists a sequence $z_n \downarrow 0$ (as $n \uparrow \infty$) satisfying: …

… and $\xi_0$ admits a symmetric distribution. Then (2.7) holds for a sequence $(b_n)_{n \in \mathbb{N}}$ of positive real numbers such that $b_n \to 0$ and $n b_n \to \infty$.

Figure 2 exhibits the graph of the chosen function $a^{(0.5)}_1$.

Table 2: Results of the Monte-Carlo experiments providing the accuracy of $\widehat{a}_s$ for the three chosen functions, with $\xi_0$ following a $t(3)$ distribution; 1000 independent replications are generated.