Change-point detection in the marginal distribution of a linear process

The subject of this paper is the detection of a change in the marginal distribution of a stationary linear process. By considering the marginal distribution, the change-point model can simultaneously incorporate any change in the coefficients and/or the innovations of the linear process. Furthermore, the change point can be random and data dependent. The key is an analysis of the asymptotic behaviour of the sequential empirical process, both with and without a change point. Our results hold under very mild conditions on the existence of any moment of the innovations and a corresponding condition of summability of the coefficients.


Introduction
In recent years, there has been ever increasing interest in statistical methods addressing the problem of structural stability in a time series environment. The literature on change-point detection is vast, including many different time series models, both short and long range dependent, sequential and retrospective detection, and single or multiple change-points. Here, we consider one of the most widely applied time series models, the causal linear process, which is defined as follows:
X_i = Σ_{j≥0} a_j ξ_{i−j}, i ∈ Z,
where (ξ_j : j ∈ Z) is a sequence of independent and identically distributed (i.i.d.) random variables and (a_j : j ∈ N) is an absolutely summable sequence of constants. When E[ξ_i²] is finite, the summability of the coefficients ensures that the linear model has a summable covariance structure (i.e. is short range dependent). However, the model we consider is much more general and includes a wide range of stationary ARMA time series, many of which are non-mixing. Further, our methods can be applied to discretely observed continuous time models such as the Lévy-driven Ornstein-Uhlenbeck process, where the sampled process can be represented as a discrete-time AR(1) process (cf. [1]).
The goal is to detect a change in the marginal distribution of the linear process (X_1, ..., X_n) at an unobserved time [nθ_n], θ_n ∈ (0, 1). Our approach is retrospective, and the test statistics are based on the sequential empirical distribution
F_[ns](x) = (1/[ns]) Σ_{i=1}^{[ns]} I(X_i ≤ x), for −∞ < x < ∞ and 0 ≤ s ≤ 1.
The novelty of our approach is in its generality: our change-point model incorporates any change in the innovations (ξ_i) and/or the coefficients (a_i). Furthermore, the change point can be data-dependent. Provided that the nature of the change ensures a change in the marginal distribution of the X_i's, it will be reflected in the asymptotic behaviour of the sequential empirical distribution, thereby ensuring consistency of the proposed test statistics. Our conditions are mild; in fact, it is not even required that E[|X_i|] be finite. All that is needed is the existence of a moment of some order δ > 0 for the innovations (ξ_i), combined with a corresponding condition on the summability of the coefficients (a_i). In contrast with most of the literature on short range dependent time series, we impose no conditions on mixing or association, and indeed our model includes many non-mixing sequences. A classic example due to Ibragimov is the following: let the ξ_i's be i.i.d. N(0, 1) and let a_i be the coefficient of z^i in the power series expansion of the function h(z) = (1 − z)^p, where p > 4 is non-integer. In this case, |a_i| = O(i^{−1−p}), but (X_i) is not strong mixing (cf. [10]). Therefore, we take a different approach to change-point detection for linear processes.
As noted above, the change-point literature in a time series framework is vast. In most of the literature, the tests developed detect a specific type of change (in location, scale, covariance structure, or spectrum, to name a few examples). A good review and discussion may be found in [16] and [6]. For instance, a linear model is considered in [2]; although mixing is not required there, the summability and moment conditions imposed are stronger than ours, and the change considered is reflected in the covariance structure of the process. On the other hand, for a change in the marginal distribution, the most relevant references are [8] and [11].
The test statistics that we propose here are also discussed in [8] and [11]. In all cases, the key to studying the asymptotic behaviour of the test statistics is a functional central limit theorem (FCLT) for the sequential empirical process, both with and without a change point. The simple model considered in [8] is a change from one stationary time series to another at a fixed time, with no assumptions about the relation between the pre- and post-change time series. Both short and long range dependent time series are considered. However, it is assumed a priori that appropriate FCLTs are satisfied by the sequential empirical process. Conditions under which this is true are discussed briefly for independent observations, mixing sequences, converging alternatives and long memory processes. In the case of the linear model, there is a detailed analysis of the long range dependent case when there is a change in the coefficients (a_i), but not in the innovations (ξ_i). Here our model includes, but is not limited to, short range dependent linear processes, and we are able to prove the necessary FCLTs both with and without a change point. As noted above, we can incorporate changes in both the coefficients and innovations at the same time. Furthermore, we allow random change points.
On the other hand, the change point model in [11] is more complex. While no specific time series model is imposed, there are very precise conditions on the alternative, which involves a gradual change in the marginal distribution at a specific rate throughout the observation period. These conditions control not only the way in which the marginal distribution changes, but also impose restrictions on changes in the bivariate distributions P (X i ≤ x, X i+h ≤ y). Furthermore, very strong mixing conditions are imposed on the time series (X i ). The requisite FCLTs are verified, and under further mixing assumptions a bootstrap procedure is proposed that produces suitable critical values for the test statistics. While these results are highly nontrivial and of significant theoretical interest, the mixing conditions exclude a wide range of linear processes and can be difficult to verify in practice.
As evidenced by [8] and [11], there is inevitably a trade-off between the model and the conditions required for the FCLTs: the more general the model, the more stringent the conditions. By restricting ourselves to the linear process, we are able to strike a balance whereby the alternative is as broad as possible (it is simply a change from one linear process to any other linear process) while at the same time the FCLTs can be proven under much weaker conditions than in [11] (see Comment 2.2.5).
We are able to avoid mixing conditions when deriving the asymptotic behaviour of the empirical distribution by using an elegant martingale method introduced by Gordin [9], which allows one to approximate the empirical process √n (F_n(x) − F(x)) by a martingale and then apply one of the classical martingale central limit theorems. This was the approach taken by Doukhan and Surgailis, who in [5] proved an invariance principle for the empirical process of the linear model under the moment and summability conditions alluded to above. Under exactly the same conditions, we are able to extend the FCLT of [5] to the sequential empirical process of the linear model, both with and without a change point. It is the martingale approach that allows us to simultaneously incorporate arbitrary changes in both the coefficients and the innovations, as well as a possibly random change point.
We proceed as follows: we introduce the stationary linear model and the simple conditions of [5] in the next section. Our main results, functional central limit theorems for the sequential empirical process both with and without a change-point, will then be stated. We end Section 2 with an analysis of the asymptotic behaviour under the null and the alternative hypotheses of both a Kolmogorov-Smirnov type statistic and a Cramér-Von Mises statistic under two different scenarios. In Section 3, the proposed framework and performance of the tests will be illustrated by simulations for both scenarios. Concluding comments and directions for further research are presented in Section 4, and all proofs appear in Section 5.

Main results
For the linear model X_i = Σ_{j≥0} a_j ξ_{i−j} defined in (1), let F and F_ξ denote the respective distribution functions of X_0 and ξ_0. In the sequel, we will proceed under the following assumptions, as in Doukhan and Surgailis [5].
1. Let {a_j : j ≥ 0} be a sequence of non-random weights, infinitely many of which are non-zero, satisfying Σ_{j≥0} |a_j|^γ < ∞ for some γ ∈ (0, 1].
2. There exist constants C < ∞ and Δ ∈ (2/3, 1] such that, for all u ∈ R, |E[e^{iuξ_0}]| ≤ C(1 + |u|)^{−Δ}. In particular, this implies that the distribution function of a partial sum of the a_j ξ_{i−j} terms is differentiable, with a bounded density satisfying a uniform Lipschitz condition, provided that sufficiently many terms with non-zero a_j are included in the moving average (cf. [5]). Obviously, the distribution F of X_0 is uniformly Lipschitz as well. We note that Doukhan and Surgailis assumed that Δ ∈ (1/2, 1], but there is a small error in the tightness argument in [5]: the right-hand side of equation (17) of [5] should be CN²|x − y|^{3Δ/2}.
4. The assumption that infinitely many of the coefficients (a_i) are non-zero is not required if F_ξ has a uniformly Lipschitz derivative; in this case, all the results that follow remain valid.
5. Any linear process with Gaussian innovations and summable coefficients satisfies Assumptions 2.1. More generally, if the innovations have a bounded density and a finite second moment, then Assumptions 2.1 are satisfied whenever |a_k| = O(k^{−(2+ρ)}) for some ρ > 0. However, as pointed out in the introduction, such a linear process is not necessarily mixing. Furthermore, even if the linear process can be shown to be strong mixing, for a_k = O(k^{−(2+ρ)}) the sharpest bounds on the mixing coefficients given in [10] are α(k) = O(k^{−2ρ/3}). However, Assumption A of [11] requires that Σ_{j≥1} j² α(j)^{δ/(4+δ)} < ∞ for some δ ∈ (0, 2), and so is far more restrictive than the summability condition 2.1.1.
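The assumptions above can be explored numerically. The following minimal sketch (our own illustration, not taken from the paper) simulates a causal linear process by truncating the moving-average representation; the geometric weights a_j = ρ^j correspond to a stationary AR(1) model and satisfy the summability condition 2.1.1 for every γ ∈ (0, 1]. The function name and the truncation level are assumptions of this sketch.

```python
import random

def simulate_linear_process(n, coeffs, innov):
    """Approximate X_i = sum_{j>=0} a_j * xi_{i-j} by truncating the
    moving-average representation at len(coeffs) terms."""
    q = len(coeffs)
    xi = [innov() for _ in range(n + q)]          # innovations xi_{i-j}
    return [sum(coeffs[j] * xi[q + i - j] for j in range(q))
            for i in range(n)]

random.seed(0)
rho = 0.5
coeffs = [rho ** j for j in range(40)]            # AR(1) weights: sum |a_j|^g < inf for all g in (0,1]
x = simulate_linear_process(1000, coeffs, lambda: random.gauss(0.0, 1.0))
# stationary variance of the (untruncated) process is 1/(1 - rho^2) = 4/3
```

The truncation at 40 terms is harmless here because the weights decay geometrically; heavier-tailed innovations (e.g. Cauchy) can be plugged in for `innov` without changing the construction.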
Before we state the functional central limit theorems for the sequential empirical process, we need to introduce some basic notation. We consider weak convergence, denoted by →_D, of random elements taking values in the space D(R × [0, 1]) equipped with Skorokhod's J_1-topology (cf. [3] and [12] for more details).

The sequential empirical CLT under the null hypothesis
Define now the two-parameter sequential empirical process for (x, s) ∈ R × [0, 1]:
W_n(x, s) = (1/√n) Σ_{i=1}^{[ns]} (I(X_i ≤ x) − F(x)).
Here we provide a sequential version of the functional central limit theorem of [5].
Theorem 2.3. Under Assumptions 2.1, W_n(·, ·) →_D W^{(1)}(·, ·) in D(R × [0, 1]), where W^{(1)}(·, ·) is a centred Gaussian process with finite covariance
Cov(W^{(1)}(x, s), W^{(1)}(y, t)) = (s ∧ t) Σ_{k∈Z} Cov(I(X_0 ≤ x), I(X_k ≤ y)).
The proof of Theorem 2.3 appears in Section 5. While the martingale techniques of [5] are readily adapted to prove convergence of the finite dimensional distributions, it will be seen that the proof of tightness becomes more complex.
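For intuition, the sequential empirical process can be evaluated directly on a finite sample. The sketch below is our own illustration (the function name and the i.i.d. example are assumptions, not the paper's code); it computes W_n(x, s) = n^{−1/2} Σ_{i≤[ns]} (I(X_i ≤ x) − F(x)) for a sample whose marginal F is known.

```python
import math
import random

def seq_emp_process(sample, x, s, F):
    """W_n(x, s) = n^{-1/2} * sum_{i <= [ns]} (1{X_i <= x} - F(x))."""
    n = len(sample)
    m = int(n * s)                                # [ns]
    count = sum(1.0 for v in sample[:m] if v <= x)
    return (count - m * F(x)) / math.sqrt(n)

random.seed(1)
n = 2000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]   # trivial linear process: a_0 = 1
Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
w = seq_emp_process(xs, 0.0, 1.0, Phi)            # O(1) under the null, by Theorem 2.3
```

Under the null, W_n(x, s) stays stochastically bounded in n, which is exactly what the weak limit W^{(1)} formalizes.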

The sequential empirical CLT under the alternative
In this section, we consider the following change-point model. Let {θ_n, n ∈ N} be a sequence of random variables in [0, 1], converging in probability to a non-random θ. We have a causal linear process with a change-point at [nθ_n]. More precisely, consider the following stationary processes: for i ∈ Z and a_j^{(1)}, a_j^{(2)} ∈ R,
Y_i = Σ_{j≥0} a_j^{(1)} ξ_{i−j} and Z_i = Σ_{j≥0} a_j^{(2)} ξ′_{i−j},
where the vectors (ξ_i, ξ′_i) are i.i.d. We do not make any assumption about the relation between ξ_i and ξ′_i; they can have any sort of dependence structure.
We make the assumption that θ n is either independent of the innovations or that [nθ n ] is a stopping time with respect to the filtration generated by the innovations.
Denote by F and G the respective distribution functions of Y_0 and Z_0. Borrowing the notation of [8], we write X_n := (X_1, ..., X_n) ∈ Ψ_n(θ_n, F, G) if
X_i = Y_i for 1 ≤ i ≤ [nθ_n] and X_i = Z_i for [nθ_n] < i ≤ n.
The model considered here is very general and readily includes fixed change-points (θ_n = θ), random change points, as well as the so-called "converging alternatives" θ_n →_p 0 or 1 as n → ∞. The change-point can be data dependent. Our model includes any change in the parameters and/or the innovations. Provided that the nature of the change ensures that F ≠ G, it should be detected by the test statistics proposed in the next section. In the case of no change, we write X_n ∈ Ψ_n(F). Now consider the asymptotic behaviour of the sequential empirical distribution under this change-point model (Theorem 2.4), in which the limit W^{(2)}(·, ·) is a centred Gaussian process with finite covariance.
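The change-point model Ψ_n(θ_n, F, G) is straightforward to simulate. A minimal sketch follows (our own illustration: it draws the pre- and post-change series independently and i.i.d., whereas the paper allows general linear processes and arbitrary dependence between the innovation pairs).

```python
import random

def change_point_sample(n, theta, pre_draw, post_draw):
    """X_i = Y_i for i <= [n*theta] and X_i = Z_i afterwards; here the
    two series are drawn independently, although the model allows any
    dependence between the innovation pairs."""
    k = int(n * theta)
    return ([pre_draw() for _ in range(k)] +
            [post_draw() for _ in range(n - k)])

random.seed(2)
x = change_point_sample(1000, 0.5,
                        lambda: random.gauss(0.0, 1.0),   # F = N(0, 1)
                        lambda: random.gauss(1.0, 1.0))   # G = N(1, 1)
```

Any change that makes F ≠ G (here, a mean shift) is visible in the marginal distribution of the concatenated sample.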

F. El Ktaibi and B. G. Ivanoff
The proof of Theorem 2.4 appears in Section 5. Comment: Note that under the null hypothesis, θ_n = 1 for all n. In this case, comparing Theorems 2.3 and 2.4, we have that W^{(2)}(·, ·) coincides in distribution with W^{(1)}(·, ·).

The test statistics
We consider appropriate test statistics under two scenarios: first, we assume that the pre-change model is known, and second, that the pre-change model is unknown. In each case, we propose both Kolmogorov-Smirnov and Cramér-Von Mises-type test statistics. In each instance we find appropriate critical values and show consistency under the alternative. All proofs appear in Section 5.

Scenario 1: The pre-change model is known.
In this scenario, prior information may ensure that the linear model before the change is known; in particular, F is specified. In this case, the null hypothesis and alternative are H_0: X_n ∈ Ψ_n(F) versus H_1: X_n ∈ Ψ_n(θ_n, F, G) for some G ≠ F. The test is based on a process built from the empirical distribution function of X_{m+1}, . . . , X_n, compared with the specified F.

Proposition 2.5. Under the null hypothesis H 0 and the assumptions of Theorem 2.3, we have for every
where W (1) (·, ·) is the limiting process in Theorem 2.3.
The following proposition deals with consistency of the test statistics T_1 and T_2. Note that under the converging alternative θ_n →_p 1, the test statistics will be consistent provided that the rate of convergence of θ_n is slower than 1/√n.
Proposition 2.6. Suppose the sequence {θ_n : n ∈ N} satisfies one of the following assumptions. Then, under the assumptions of Theorem 2.4, the test statistics T_1 and T_2 are consistent.
Finally, we note that since the pre-change model is assumed to be known, appropriate critical values can be determined empirically by simulation. This will be illustrated, and the performance of the test statistics compared, in Section 3 for various examples.
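To make Scenario 1 concrete, the following sketch computes Kolmogorov-Smirnov- and Cramér-Von Mises-type statistics by comparing the post-m empirical distribution with a known F over a grid of candidate change points. This is our own illustration: the exact weighting and grids in the paper's T_1 and T_2 may differ, and all names here are assumptions.

```python
import math
import random

def scenario1_stats(sample, F, xgrid):
    """KS- and CvM-type statistics with known pre-change F: for each
    candidate change point m, D(x, m) = n^{-1/2} * sum_{i > m}
    (1{X_i <= x} - F(x)) compares the post-m observations with F."""
    n = len(sample)
    ks, cvm = 0.0, 0.0
    for m in range(0, n, max(1, n // 50)):        # coarse grid of split points
        tail = sample[m:]
        for x in xgrid:
            d = (sum(1.0 for v in tail if v <= x)
                 - len(tail) * F(x)) / math.sqrt(n)
            ks = max(ks, abs(d))
            cvm += d * d
    return ks, cvm

random.seed(3)
Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
xgrid = [i / 10.0 for i in range(-30, 31)]
null_sample = [random.gauss(0.0, 1.0) for _ in range(400)]
alt_sample = null_sample[:200] + [v + 2.0 for v in null_sample[200:]]
t_null = scenario1_stats(null_sample, Phi, xgrid)
t_alt = scenario1_stats(alt_sample, Phi, xgrid)    # much larger under the change
```

Because F is fully specified, critical values for such statistics can be tabulated by repeating the null computation many times, exactly as described above.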

Scenario 2: The pre-change model is unknown.
In practice, the more common situation is that no assumptions are made about the pre-change model. In this case, the null hypothesis and alternative become H_0: X_n ∈ Ψ_n(F) for some (unspecified) F, versus H_1: X_n ∈ Ψ_n(θ_n, F, G) with F ≠ G. The test statistics will be based on a process which compares the (suitably weighted) empirical distributions before and after [ns], for 0 ≤ s ≤ 1.

To test the pair (H_0, H_1), we use the following statistics:
• the weighted Kolmogorov-Smirnov statistic T_3;
• the weighted Cramér-Von Mises statistic T_4.
Again, we reject the null hypothesis H_0 for large values of T_i, i = 3, 4.
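A hedged numerical sketch of the Scenario 2 idea follows. The CUSUM-type weighting used below is one standard way of comparing the pre- and post-[ns] empirical distributions; it may differ in detail from the paper's T_3 and T_4, and the function and variable names are our own.

```python
import math
import random

def scenario2_stats(sample, xgrid):
    """CUSUM-type comparison of the empirical d.f.s before and after
    each split m: K(x, m) = n^{-1/2} * (sum_{i<=m} 1{X_i<=x}
    - (m/n) * sum_{i<=n} 1{X_i<=x}).  Returns sup |K| and mean of K^2."""
    n = len(sample)
    sup_k, mean_k2, npts = 0.0, 0.0, 0
    for m in range(1, n, max(1, n // 50)):
        head = sample[:m]
        for x in xgrid:
            total = sum(1.0 for v in sample if v <= x)
            k = (sum(1.0 for v in head if v <= x)
                 - m * total / n) / math.sqrt(n)
            sup_k = max(sup_k, abs(k))
            mean_k2 += k * k
            npts += 1
    return sup_k, mean_k2 / npts

random.seed(4)
xgrid = [i / 5.0 for i in range(-15, 16)]
null_sample = [random.gauss(0.0, 1.0) for _ in range(300)]
alt_sample = null_sample[:150] + [v + 2.0 for v in null_sample[150:]]
s_null = scenario2_stats(null_sample, xgrid)
s_alt = scenario2_stats(alt_sample, xgrid)         # larger under the change
```

No knowledge of the pre-change F is used here, which is precisely why calibrating critical values requires the bootstrap discussed below.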
We next deal with consistency of the test statistics T_3 and T_4. There are now two cases of converging alternatives to consider: θ_n →_p 0 and θ_n →_p 1. As before, the test statistics are consistent under a converging alternative provided that the rate of convergence of θ_n is slower than 1/√n.
Proposition 2.8. Suppose the sequence {θ_n : n ∈ N} satisfies one of the following assumptions. Then, under the assumptions of Theorem 2.4, the test statistics T_3 and T_4 are consistent.
Finally, we observe that since no assumptions are made about the pre-change model, the problem of finding suitable critical values for the test statistics is much more difficult in this scenario. We end this section with a brief discussion of the moving block bootstrap and a heuristic explanation of how it can be applied to yield consistent tests. The technical details and a rigorous justification of the technique are lengthy and complex, and are therefore reserved for a separate publication.
For the remainder of this discussion, we strengthen Assumption 2.1.1 slightly and assume that Σ_{j≥0} j|a_j|^γ < ∞, while 2.1.3 can be weakened to E[|ξ_0|^{2γ}] < ∞ for some γ ∈ (0, 1] (these assumptions are made for both the pre- and post-change sequences (Y_i) and (Z_i)). We note that these conditions are satisfied by the example discussed in Comment 2.2.5 and also by a wide range of non-mixing linear processes.
We will be using the same version of the moving block bootstrap (MBB) as presented in [15] and [7]. Consider first a sequence X_i, i = 1, . . . , n, such that n = lk for some integers l and k. Secondly, we extend the sample of size n by the first l − 1 observations X_1, . . . , X_{l−1}; that is, we define the extended sequence X_{n,i}, i = 1, . . . , n + l − 1, by X_{n,i} = X_i for 1 ≤ i ≤ n and X_{n,i} = X_{i−n} for n < i ≤ n + l − 1. Let I_{n1}, I_{n2}, . . . , I_{nk} be independent and identically distributed random variables, each having the uniform distribution on {1, 2, . . . , n}. The intuitive idea behind the MBB is to concatenate k randomly chosen blocks of length l of the form {X_{n,I_{nj}}, X_{n,I_{nj}+1}, . . . , X_{n,I_{nj}+l−1}}, 1 ≤ j ≤ k, to construct the bootstrap sample of size n. The (non-sequential) bootstrapped empirical process is then defined from the bootstrapped empirical distribution in the usual way, and this representation suggests a corresponding definition of a sequential bootstrapped empirical process. The bootstrapped versions of the Kolmogorov-Smirnov and Cramér-Von Mises statistics T_3 and T_4 are calculated as before, using the bootstrapped process.
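The MBB construction just described translates directly into code. A minimal sketch (our own illustration; indices are 0-based, so the uniform start point plays the role of I_{nj} − 1):

```python
import random

def mbb_sample(sample, l, rng):
    """Moving block bootstrap: extend the sample of size n = l*k by its
    first l-1 observations, draw k uniform block start points, and
    concatenate the k blocks of length l."""
    n = len(sample)
    assert n % l == 0
    k = n // l
    extended = sample + sample[:l - 1]            # X_{n,i}, i = 1, ..., n+l-1
    out = []
    for _ in range(k):
        start = rng.randrange(n)                  # I_{nj} - 1 (0-based)
        out.extend(extended[start:start + l])
    return out

rng = random.Random(5)
data = [float(i) for i in range(20)]
boot = mbb_sample(data, l=5, rng=rng)             # bootstrap sample of size 20
```

Resampling whole blocks preserves the short-range dependence within each block, which is what makes the bootstrapped statistics mimic the dependent-data limit distributions.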

As in [15], we assume the following relationship between the block lengths l_n and the number of blocks k_n. Following the notation used in [15], we write a_n ≾ b_n to indicate a_n = O(b_n).

Assumption 2.9. Let (l_n) and (k_n) be sequences of natural numbers satisfying l_n = l_{2^k} for 2^k ≤ n < 2^{k+1}, l_n → ∞ as n → ∞, and n = k_n l_n.
Let W^{(F)}(·, ·) and W^{(G)}(·, ·) denote, respectively, the limiting Gaussian processes of Theorem 2.3 for the Y_i's and the Z_i's. It can be argued that, for almost all realizations X_1(ω), X_2(ω), . . ., as n → ∞, where B is a standard Brownian motion on [0, 1]. We have two situations. First, if l_n θ_n(1 − θ_n) →_p 0, we have a converging alternative: θ_n →_p θ = 0 or θ = 1. In this case, the asymptotic distribution of W_n^{(b)} is determined by (15). In particular, W_n^{(b)} converges weakly almost surely to W^{(F)} if θ = 1 and to W^{(G)} if θ = 0. This is proven in [6] (the doctoral thesis of the first author), where it is pointed out that if, for instance, n^{h−1/2} ≾ θ_n ≾ n^{−1/3} or, analogously, n^{h−1/2} ≾ 1 − θ_n ≾ n^{−1/3} for some 0 < h < 1/6, then both test statistics T_3 and T_4 based on the original sample converge to infinity, while their bootstrapped counterparts converge weakly to finite limits. This allows us to tabulate critical values by constructing repeated moving block bootstrap samples.
In the second situation, θ_n →_p θ ∈ (0, 1), and the asymptotic behaviour of W_n^{(b)} is dominated by (16). Recalling that l_n/n → 0, the bootstrapped test statistics again converge weakly to finite limits, ensuring that the tests are consistent using bootstrapped critical values.

Simulations
In this section, Examples 1-4 illustrate the performance of the proposed test statistics for Scenario 1, in which the pre-change model is known. Although a detailed investigation of Scenario 2 will appear separately, we briefly demonstrate the sequential bootstrap technique via simulations in Examples 5 and 6.
In our simulation studies, we will consider the following stationary autoregressive processes:
Y_i = ρ_1 Y_{i−1} + ξ_i and Z_i = ρ_2 Z_{i−1} + ξ′_i, (17)
where the vectors of innovations (ξ_i, ξ′_i) are i.i.d. and ρ_1, ρ_2 ∈ (−1, 1). The change-point model satisfies X_i = Y_i for i ≤ [nθ_n] and X_i = Z_i for i > [nθ_n].

Scenario 1
In Examples 1-4 (Scenario 1), we will investigate both normal and Cauchy innovations, since Assumptions 2.1 are satisfied in both cases: with γ ∈ (0, 1] for normal innovations and γ ∈ (0, 1/4) for Cauchy innovations. For both the normal (Example 1) and Cauchy (Example 2) cases, we will consider separately changes in the location or scale of the innovations, and a change in the coefficients.
We observe that our model assumes an abrupt change from one stationary process to another at [nθ_n]. A more natural assumption for the AR(1) model would be a change from ρ_1 to ρ_2 or from ξ_i to ξ′_i at [nθ_n] (see (17)), in which case stationarity would be lost immediately after the change-point. This violation of the stationarity assumption is investigated in Examples 3 and 4 for normal and Cauchy innovations, respectively.
In all cases, we will assume that θ_n = 0.5, so that the change occurs at [0.5n].
We compare the performance of the two test statistics, the Kolmogorov-Smirnov (K.S.) and the Cramér-Von Mises (C.V.M.). The nominal level of significance is α = 5% and critical values were determined by 100,000 simulations of each test statistic. For the analysis of power, each simulation was repeated 400 times.
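The Monte Carlo recipe described here (simulate the null to tabulate a critical value, then estimate power by repeated sampling) can be sketched at small scale as follows. This is our own illustration: the simple mean-CUSUM statistic below is a stand-in for the paper's K.S. and C.V.M. statistics, and the replication counts are deliberately far smaller than the 100,000 and 400 used in the paper.

```python
import math
import random

def cusum_stat(x):
    """Max absolute CUSUM of centred observations, scaled by n^{-1/2}:
    a simple stand-in for a K.S.-type change-point statistic."""
    n = len(x)
    mean = sum(x) / n
    s, best = 0.0, 0.0
    for v in x:
        s += v - mean
        best = max(best, abs(s))
    return best / math.sqrt(n)

def simulate_ar1_change(n, rho, shift, theta, rng):
    """AR(1) with N(0,1) innovations whose mean shifts by `shift` at
    [n*theta] (no burn-in, so the pre-change segment is only
    approximately stationary)."""
    x, prev = [], 0.0
    for i in range(n):
        mu = shift if i >= int(n * theta) else 0.0
        prev = rho * prev + rng.gauss(mu, 1.0)
        x.append(prev)
    return x

rng = random.Random(6)
null_stats = sorted(cusum_stat(simulate_ar1_change(300, 0.5, 0.0, 0.5, rng))
                    for _ in range(200))
crit = null_stats[int(0.95 * 200)]                # empirical 95% critical value
rejections = sum(cusum_stat(simulate_ar1_change(300, 0.5, 1.0, 0.5, rng)) > crit
                 for _ in range(100))
power = rejections / 100.0                        # empirical power at theta = 0.5
```

Scaling up the replication counts (and swapping in the empirical-distribution statistics) reproduces the design of Examples 1-4.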

Example 1
Here we investigate the performance of our test statistics in detecting a change in an AR(1) process with normal innovations. We consider changes in the mean or variance of the innovations and, finally, in the coefficient ρ_1.
The parameters used throughout this analysis are n = 5000 for the sample size and θ_n = 0.5 for the break location.
• Change in the mean of the innovations: we consider model (17) with ρ_1 = ρ_2 = 0.5 and pre-change innovations distributed as N(0, 1), with the mean of the innovations shifting after the break.
• Change in the variance of the innovations: again ρ_1 = ρ_2 = 0.5 with N(0, 1) pre-change innovations, and the variance of the innovations changes after the break.
• Change in the coefficient:
Under the null hypothesis, ρ_1 = ρ_2 = 0.5; under the alternatives, ρ_2 varies from 0.1 to 0.9. The size and power of the tests in this case are illustrated in Fig. 3 for both the K.S. and C.V.M. statistics. The power function is now quite asymmetric, and we observe lower power when ρ_2 < ρ_1. This is possibly due to the fact that the larger the value of ρ_1, the greater the influence of an extreme value of ξ_i (and therefore of Y_i) on subsequent values Y_{i+1}, Y_{i+2}, . . ., and hence on the simulated critical value of the test statistic. For ρ_2 < ρ_1, extreme values of ξ_i will have less influence on Z_i, Z_{i+1}, Z_{i+2}, . . . and on the value of the test statistic.

Example 2
Similarly to the preceding example, we now consider a stationary autoregressive model with Cauchy innovations. The break point will again be chosen at θ_n = 0.5, with a sample size n = 3000.
• Change in the location parameter: the location of the Cauchy innovations shifts after the break.
• Change in the scale parameter: the scale of the Cauchy innovations changes after the break.
• Change in the coefficient: here we consider a model analogous to that of Fig. 3, but with Cauchy innovations.
The performance of the K.S. and C.V.M. tests in this case is illustrated in the following figure. We note less asymmetry than in the case of normal innovations. In all cases, we can see that the rejection rate under the null hypothesis is close to the nominal level of significance α = 0.05 and that we achieve good power under the alternatives. We note that, contrary to what is frequently observed, the Cramér-Von Mises statistic does not consistently outperform the Kolmogorov-Smirnov statistic. In fact, the performance of the two test statistics is overall very similar and often indistinguishable, except when testing for a change in the coefficient in the normal case, where the empirical power showed considerable asymmetry.
Moreover, we note that the sample sizes chosen for these examples, while quite large, are appropriate for many types of financial data. In fact, we observed from many more simulations not presented here that a shift in location is easily detected with much smaller sample sizes. Furthermore, when there is a switch from normal innovations to Cauchy, and vice versa, almost perfect power is achieved. This is not surprising, since the change would involve moving from innovations with finite variance to ones with infinite variance, or vice versa. Indeed, in this situation the tests perform well with much smaller sample sizes. However, we observe that the degree of asymmetry in the empirical power increases as the sample size decreases.

Example 3
Our change-point model assumes an immediate change from one stationary process to another at the time of the change. We now investigate the behaviour of our test statistics if the assumption of stationarity after the change-point is violated. In particular, we consider the model in which the recursion (17) simply switches from (ρ_1, ξ_i) to (ρ_2, ξ′_i) at time [nθ_n], so that the post-change process is initialized at the last pre-change observation rather than at its stationary distribution. Despite the fact that we now have a gradual change and stationarity of the process is lost after [nθ_n], we will see that our testing approach still allows us to detect simultaneously any change in the coefficients of the process, or in the mean or variance of the innovations. For comparison, we will use the same parameters for each case as in the previous examples. We begin with normal innovations and consider first a shift in the mean, as in Example 1. The following figures show the empirical size and power of the tests when there is a change, respectively, in the variance of the innovations or in the coefficient of the AR(1) model (17) with normal innovations.

Example 4
Similarly to Fig. 4, Fig. 5 and Fig. 6, the following figures illustrate the performance of the tests in the case of Cauchy innovations in the AR(1) model (17). Comparing Examples 1 and 2 with Examples 3 and 4, we see that even when the assumption of stationarity is violated after the change, the tests still perform very well. Convergence to stationarity for the post-change process is rapid enough that it does not affect the power in these particular examples. Naturally, with small samples the pre-change parameters have a greater influence, and therefore power is lost and more asymmetry is observed in the power functions.

Scenario 2
In Examples 5 and 6, we briefly consider Scenario 2, when the pre-change model is unknown. We illustrate the performance of the K.S. and C.V.M. statistics T_3 and T_4, assuming Gaussian innovations in the AR(1) model and using the sequential bootstrap described at the end of Section 2.3. Let us first recall our change-point model: X_i = Y_i for i ≤ [nθ_n] and X_i = Z_i for i > [nθ_n], where Y_i and Z_i are the two stationary autoregressive processes of (17), with i.i.d. innovation vectors (ξ_i, ξ′_i).
In both Examples 5 and 6, we assume that the pre-change innovations are N (0, 1) and examine power for a change in the mean, a change in the variance, and a change in the coefficient ρ 1 . More extensive simulations will appear separately with a rigorous theoretical justification of the sequential bootstrap procedure.
In both examples, we consider a sample size n = 10000, block length l_n = 5, number of blocks k_n = 2000, and number of bootstrap replications B = 500.
The simulations are made at a nominal level of significance α = 5% and each case is performed p = 400 times for the power analysis.

Example 5
Here we consider a converging alternative and assume that θ_n = 0.05. The next two figures illustrate the power of the test statistics when there is a change in the mean or in the variance; we assume ρ_1 = ρ_2 = 0.5. The following two figures then illustrate detection of a change in the coefficients when θ_n = 0.05: in the first, we assume that ρ_1 = 0.75, and in the second, ρ_1 = −0.75. From Example 5 it can be observed that the test statistics achieve acceptable power under a converging alternative. Example 6 illustrates much better performance when the alternative is not converging, despite the fact that the bootstrap test statistics diverge. We see that the Cramér-Von Mises statistic outperforms the Kolmogorov-Smirnov statistic when there is a change in the coefficient with a converging alternative, but otherwise the two statistics achieve very similar power. Once again, we observe that this unified approach allows us to detect virtually any sort of change in the model that results in a change in the marginal distribution.

Conclusion
In this paper we have investigated the behaviour of the sequential empirical process of a causal linear model under a time change. We have proven a functional CLT (Theorem 2.4) under very general conditions that include non-mixing processes and random time changes. This allows us to test the hypothesis of no change against the very general alternative that the marginal distribution of the linear process changes at some unobserved time. We have proposed appropriate test statistics under two scenarios, first when the pre-change model is specified and second, when it is not. The performance of the test statistics is investigated for the first scenario under various types of changes. The tests are shown to continue to perform well when the change is more gradual and the assumption of stationarity after the change is violated. For the second scenario, a bootstrap procedure is proposed and illustrated via simulations.
There are many open questions and directions for future research that are beyond the scope of this paper.
• When the pre-change model is specified, critical values for the test statistics are easily found by Monte Carlo techniques. However, when the model before the change is unknown, bootstrap techniques are required. This is a complex question addressed in [6], and a detailed justification of the discussion in Scenario 2 will be the subject of a separate publication.
• Our model includes a random change-point [nθ_n] under the assumption that θ_n →_P θ, where θ ∈ [0, 1] is a constant. Conditions under which this occurs should be investigated. Further, a random limit can likely be introduced into the model, as in [4], pp. 144-145.
• The tests proposed here are retrospective. Sequential methods should be investigated. • The spatial causal linear process has been studied in [13]. Detection of a change-point or change-set for this process is currently under investigation.

Proofs
Throughout this section, C will denote a generic constant which may be different at each appearance.

Proof of Theorem 2.3
We will prove: (a) convergence of the finite dimensional distributions of W_n, using the Cramér-Wold device; and (b) tightness of the sequence W_n(·, ·). First we introduce some notation. Let F_i = σ(ξ_j : j ≤ i) be the σ-algebra generated by the innovations up to time i, and for h ≥ 0 define the martingale differences; define the corresponding truncated quantities for N < ∞. We observe that the first and second lines of the resulting identity follow, respectively, from the reverse martingale convergence theorem and the 0-1 law; thus the convergence holds almost surely and in L_p for all p > 0.

Proof of (a): convergence of finite dimensional distributions
The proof follows the same lines as in [5], and so we give only a brief outline; for full details, refer to [6]. Exactly as is proven in [5], the limiting finite dimensional distributions of W_n(·, ·) as n → ∞ are the same as those of W_n^N(·, ·) as first n → ∞ and then N → ∞. For each x, (M_i^N(x), F_i) is a martingale difference sequence. Using arguments similar to those in [5], it can be shown that there exist b_j > 0, independent of N, such that Σ_{j∈Z} b_j < ∞ and |Cov(R_0^N(x), R_j^N(y))| < b_j.

Define the martingale Q_n^N(x, s). As shown in [5], for N, x, s fixed, the difference between W_n^N(x, s) and Q_n^N(x, s) is negligible as n → ∞. Therefore, the limiting finite dimensional distributions as n → ∞ of (W_n(x, s) : x ∈ R, s ∈ [0, 1]) are the same as those of (Q_n^N(x, s) : x ∈ R, s ∈ [0, 1]) as first n → ∞ and then N → ∞.
The Cramér-Wold device and McLeish's martingale CLT ([14], Theorem 2.3) are used to verify convergence of the finite dimensional distributions of Q_n^N(·, ·). We illustrate the proof by showing the result for only two points, since the general case follows similarly. Let s ≤ t and consider aQ_n^N(x, s) + bQ_n^N(y, t); the required limit holds by the ergodic theorem.
Applying Theorem 2.3 of McLeish [14] and combining it with equation (22) yields the limiting distribution of aQ_n^N(x, s) + bQ_n^N(y, t). Finally, let N → ∞; convergence of the finite dimensional distributions of W_n(·, ·) follows from (23) and (24).

Proof of (b): tightness
The proof of (b) is presented in more detail because demonstrating tightness of the sequential process W_n(·, ·) in D(R × [0, 1]) is more complex than the non-sequential case considered in [5] and involves extending some arguments in [4] to two dimensions. For complete details, see [6].
The proof of (26) is based on the following two lemmas.
The proof of Lemma 5.1 is lengthy and technical, so we defer it to the end of the section. It is used to prove the following lemma, which extends an argument of Billingsley ([4], pp. 198-199) to two dimensions.

Lemma 5.2. If Assumptions 2.1 hold, then there exists
for all sufficiently large n.

Proof
Recall that $|x - y| \le 1$ and assume without loss of generality that $0 < \varepsilon < 1$. If $\varepsilon_n \le (|x - y||t - s|)^{3\Delta/4}$, then Lemma 5.1 gives the required moment bound, and the so-called condition $(\beta, \gamma)$ in [3] is satisfied with $\beta = 3\Delta/2$ and $\gamma = 4$. Let $p$ be a number satisfying $\varepsilon_n^{2/(3\Delta)} \le p \le 1$, and remark that the corresponding bound holds for $i, j = 1, \ldots, m$. Now apply the argument used in the proof of Theorem 1 of [3] to get

We now show that (30) holds for $x \le y \le x + p$ and $s \le t \le s + p$. Let the count in question be the number among $X_1, \ldots, X_{[ns]}$ that satisfy $X_i \le x$; then, recalling that $F$ is Lipschitz, (30) follows immediately. Note that $1/\sqrt{n} < \varepsilon$ for all sufficiently large $n$. In addition, in the complementary case (29) applies. Choose $\delta$ such that $C\delta^{3\Delta - 2}/\varepsilon^{6} < \eta$; then (27) follows from (32), provided there exist a real $p$ satisfying (31) and an integer $m$ such that $\delta = mp$. This is equivalent to the existence of an integer $m$ satisfying the displayed inequality, which is true for all sufficiently large $n$. This completes the proof of Lemma 5.2.
Returning to the proof of (b) and arguing as in the proof of Theorem 8.3 in [4], we define the appropriate modulus of continuity. If $|x - y| < \delta$ and $|t - s| < \delta$, then for $n$ sufficiently large we obtain (26), up to the constant factors preceding $\varepsilon$ and $\eta$. This concludes the proof of (b), thereby completing the proof of Theorem 2.3.
We now return to the proof of Lemma 5.1.

Proof of Lemma 5.1:
Given a function g(x), x ∈ R, define g(x, y) = g(y) − g(x) and suppose for instance that s ≤ t. Then

Next, define, for $k \le [nt]$, the sums $V_k(x)$. We will make use of the methodology and the results obtained in [5]. First, we establish a formula for the fourth moment via the sums $V_k(x)$. For the other terms, we will use the following inequalities, proven in [5].
For $h > h_0$, the first inequality holds, and for $0 \le h \le h_0$ and $p \ge 1$, the second. Consider now the term $I_2$ and remark that it can be bounded directly for $k \le -h_0$. On the other hand, for $-h_0 < k \le [nt]$, we need to take into consideration the position of $i - k$ and $j - k$ with respect to $h_0$; in each case we make use of equations (34) and (35). Denote the upper bounds found in (36) and (37) by $T_k^{(1)}$ and $T_k^{(2)}$, respectively. By orthogonality we obtain a bound for each $k \le [nt]$, and from (38) and (39) we deduce the following inequality, which will be used later in the proof. Next, consider

In what follows, we will use equations (34) and (35) repeatedly, in addition to the independence of $\xi_k$ from $\mathcal{F}_{k-1}$ and the moment condition on the innovations. If $-h_0 < k \le [nt]$, then we need to consider the position of $i_j - k$ with respect to $h_0$ for $j = 1, 2, 3$, and we obtain the corresponding bound. Denote the upper bounds defined in (41) and (42) by $B_k^{(1)}$ and $B_k^{(2)}$, respectively. Then (40) and (43) imply the required estimate. Finally, consider the last term, for which we use (34). Combining (33), (39), (44) and (47) completes the proof of Lemma 5.1.
If it can be shown that the limiting processes in (50) and (51) are independent, the proof is complete, since $W_n(\cdot,\cdot) \xrightarrow{D} W^{(F)}(\cdot,\, \cdot \wedge \theta) + W^{(G)}(\cdot,\, (\cdot - \theta)^{+})$ and the Gaussian limit on the right has the correct covariance structure. It remains to show independence of the limits in (50) and (51). If, for all $n$, $[n\theta_n]$ is a stopping time with respect to the innovations, define $\mathcal{G}_i := \sigma\{(\xi_j, \xi_j'),\, j \le i\}$. On the other hand, if, for all $n$, $[n\theta_n]$ is independent of the innovations, let $\mathcal{G}_i := \sigma\{(\xi_j, \xi_j'),\, j \le i\} \vee \sigma\{\theta_n,\, n \ge 1\}$. Denoting $\mathcal{F}^F_i := \sigma\{\xi_j,\, j \le i\}$ and $\mathcal{F}^G_i := \sigma\{\xi_j',\, j \le i\}$, define the corresponding martingale differences. By stationarity and independence, and arguing as in the proof of Theorem 2.3, the orthogonality of the martingales shows that the limiting Gaussian processes $W^{(F)}$ and $W^{(G)}$ in (50) and (51) are independent, completing the proof of Theorem 2.4.

Comment:
In the proof above, the time-change argument found in [4] is applicable provided that $\theta_n \xrightarrow{p} \theta$; there is no requirement that $[n\theta_n]$ be a stopping time. The assumption that $[n\theta_n]$ is a $\mathcal{G}$-stopping time is used only to ensure independence of $W^{(F)}$ and $W^{(G)}$.
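To see the resulting retrospective procedure at work, the sketch below is a purely illustrative simulation: the Kolmogorov-Smirnov-type functional, the mean-shift alternative, the sample size, and all helper names are our own choices, not prescribed by the paper. It compares a CUSUM-type statistic built from the sequential empirical distribution on a series with and without a change in the marginal distribution at time $[n\theta]$.

```python
import numpy as np

rng = np.random.default_rng(4)
a = 0.5 ** np.arange(30)                   # absolutely summable coefficients

def linear_process(n):
    xi = rng.standard_normal(n + len(a))
    return np.convolve(xi, a, mode="valid")[:n]

def cusum_stat(X):
    """sup over (k, x) of k(n-k)/n^{3/2} * |F_{1:k}(x) - F_{k+1:n}(x)|."""
    n = len(X)
    grid = np.sort(X)                      # evaluate sup over observed points
    best = 0.0
    for k in range(1, n):
        F1 = np.searchsorted(np.sort(X[:k]), grid, side="right") / k
        F2 = np.searchsorted(np.sort(X[k:]), grid, side="right") / (n - k)
        best = max(best, k * (n - k) / n ** 1.5 * float(np.max(np.abs(F1 - F2))))
    return best

n = 500
X_null = linear_process(n)                 # stationary, no change
X_alt = linear_process(n)
X_alt[n // 2:] += 2.5                      # marginal distribution shifts at theta = 1/2

T_null = cusum_stat(X_null)
T_alt = cusum_stat(X_alt)
```

Under no change the statistic stays bounded in probability (its limit is a functional of the Kiefer-type Gaussian process), while any change in the marginal distribution drives it to infinity at rate $\sqrt{n}$, which is the consistency mechanism the theorems above guarantee.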