Empirical likelihood for break detection in time series

Abstract: Structural breaks have become an important research topic in time series analysis and in many fields of application, e.g. econometrics, hydrology, seismology, engineering and industry, chemometrics, and medicine. For most phenomena encountered in the real world, modeling under the assumption of no structural change may lead to inconsistent estimates and poor forecasts. Much research has been carried out in the past decades, mostly concentrating on the identification of points in time where structural breaks possibly occur, and on the development of statistical tests for the absence of breaks. Common procedures are the CUSUM-based statistics and the F-based statistics proposed by Bai and Perron in several papers and technical reports. The need to specify a probability distribution for conducting the statistical tests has often been a drawback in practical applications: if the distribution is misspecified, the test may turn out to be unreliable. In this paper a distribution-free procedure for the identification of change points and testing for structural breaks is proposed, based on the empirical likelihood and the empirical likelihood ratio statistics. A comparison with the F test and the CUSUM test is performed on both simulated and real-world data. The size and power of the test are comparable with existing procedures, and the identification procedure is generally accurate and effective.


Introduction
The analysis of structural breaks, or change points, has attracted a great deal of interest in many fields, and a vast literature is available about this topic. For a comprehensive review, see e. g. [2,35]. Independent data, regression models and time series have been considered, with the aim of detecting whether one or more structural changes are present in the data, and estimating their dates and the parameters characterizing the different data subsets.
Structural break detection is usually based on hypothesis tests, where the null hypothesis states absence of breaks, and the alternative specifies one or more breaks. Proposed test statistics are likelihood ratios, Lagrange multipliers or Wald statistics, and are therefore essentially based on the sum of squared residuals under the two hypotheses.
In recent years a new framework has been proposed for similar test problems, and related confidence intervals, known as empirical likelihood [e.g. 34]. It has the advantage of being essentially non-parametric in nature, not requiring any strong assumption on the probability distribution of the data. In this paper we propose the use of empirical likelihood methods for addressing the problem of detection and estimation of level changes and structural breaks both in independent and dependent data, and compare this proposal with the most popular alternative, the multiple structural change analysis in linear models of [4].
The plan of the paper is as follows. Section 2 briefly reviews the structural change analysis literature for independent data and linear models. Section 3 reviews the empirical likelihood methods relevant to our problem. Section 4 is concerned with the use of empirical likelihood for detecting level changes in independent data. In Section 5 an empirical likelihood based method for detecting structural breaks in regression and autoregression models is proposed. Section 6 discusses some implementation issues, Section 7 presents some simulation results and applications, and Section 8 concludes.

Structural change
We begin with level shifts, i.e., changes in the mean. Let y_1, y_2, …, y_n denote a sample of independent data with common variance σ², and a level shift at an unknown date q such that the series is broken into two sub-samples with different means: E(y_i) = μ, i = 1, 2, …, q and E(y_i) = μ + ω, i = q+1, …, n. We want to test the null hypothesis of constant mean, H_0: ω = 0. The most used tool in this framework is the so-called CUSUM process [see e.g. 15]. Assuming that the break date is a fixed proportion of the number of observations, q = ⌊nx⌋, the CUSUM is a scaled difference between the observed averages of the two subsets:

z_n(q/n) = (1/(σ√n)) { Σ_{i=1}^q y_i − (q/n) Σ_{i=1}^n y_i }.   (2.1)

However, since the variance of z_n(q/n) is (q/n)(1 − q/n), change points occurring near the beginning or the end of the data series are much more difficult to detect, and a weighted version has been suggested. Moreover, in the case of a Gaussian distribution the LR test statistic for a level change at time q is easily seen to be

−2 log Λ_q = z_n(q/n)² {(q/n)(1 − q/n)}^{−1},   (2.2)

thus one may consider using the adjusted CUSUM (2.2) rather than the original one. A supplementary problem arises since, on writing T_n(x) = z_n(⌊nx⌋/n)/√{x(1 − x)}, we have that max{T_n(x)², 0 ≤ x ≤ 1} diverges to infinity as n → ∞: this may be solved by considering only a truncated interval {ε ≤ x ≤ 1 − ε}, which amounts to allowing only sub-series with at least h = nε observations. With this limitation, it may be shown that max_{nε≤q≤n(1−ε)} −2 log Λ_q converges in distribution to sup_{ε≤x≤1−ε} B(x)²/{x(1 − x)}, where B(·) denotes a standard Brownian bridge.
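As an illustration, the CUSUM scan over admissible break dates can be sketched as follows. This is our own sketch, not the authors' code; in particular, σ is here estimated by the overall sample standard deviation, which is inflated under the alternative and therefore conservative.

```python
import numpy as np

def cusum_stats(y, eps=0.15):
    """CUSUM z_n(q/n) and the adjusted statistic (2.2) for every
    admissible break date q (at least n*eps observations per side).
    sigma is estimated by the full-sample standard deviation
    (an assumption of this sketch)."""
    y = np.asarray(y, float)
    n = len(y)
    sigma = y.std(ddof=1)
    total = y.sum()
    h = int(np.ceil(n * eps))
    qs = np.arange(h, n - h + 1)
    csum = np.cumsum(y)
    # z_n(q/n) = (sum_{i<=q} y_i - (q/n) sum_i y_i) / (sigma sqrt(n))
    z = (csum[qs - 1] - (qs / n) * total) / (sigma * np.sqrt(n))
    x = qs / n
    adj = z**2 / (x * (1 - x))        # adjusted CUSUM, cf. (2.2)
    return qs, z, adj

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 100)])
qs, z, adj = cusum_stats(y)
q_hat = qs[np.argmax(adj)]            # estimated break date
```

The date maximizing the adjusted CUSUM serves as the change-point estimate; with a shift of two standard deviations at q = 100 the maximum lands near the true break.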
It has been shown that a similar result also holds for dependent data under some assumptions [see 2, and references therein], with σ replaced by the square root of the so-called long-run variance v = lim_{n→∞} n^{−1} Var(Σ_{t=1}^n y_t). Obviously, to compute the CUSUM statistic in practice a consistent estimate of the variance σ², or of v, is needed, and this may be difficult in the dependent case, see e.g. [38].
The case when {y_t} is a causal linear process, y_t = Σ_{j=0}^∞ ψ_j u_{t−j}, where {u_t} is a white noise process, is addressed by [37]. They fit an ARMA model to the data, consider the adjusted CUSUM statistic computed on the observed residuals, and find that its performance is generally better than that of the CUSUM computed directly on the data. Moreover, as in the independent data case, the LR and F tests are asymptotically equivalent to the CUSUM test, and their finite sample behavior in some simulated AR(p) series is essentially equivalent.
Suppose now that the data {y_i} are generated by a linear regression model

y_i = Σ_{j=1}^p x_{ij} θ_j + ε_i,   i = 1, …, n,   (2.3)

where the parameters θ_j may be subject to change. This framework was addressed in several papers by Andrews, Bai, Perron and co-authors [e.g. 1,4].
Given a partition of the series induced by m breaks at dates t_1 < t_2 < ⋯ < t_m, with regime-specific parameter vectors θ^(k), the model may be written in expanded form

y = X̃θ̃ + ε,   θ̃ = (θ^(1)′, θ^(2)′, …, θ^(m+1)′)′,   (2.4)

where X̃ is the n × (m+1)p block-diagonal expansion of the design matrix X,

X̃ = diag(X_1, X_2, …, X_{m+1}),   (2.5)

with X_k collecting the rows of X belonging to the k-th regime. The hypothesis of absence of breaks may be phrased in terms of the parameters as H_0: θ_j^(k) = θ_j^(k+1), j = 1, …, p; k = 1, …, m, and expressed as a linear constraint Rθ̃ = 0. Therefore, such a hypothesis may be tested via the usual F test:

F(t_1, …, t_m) = (‖X̃θ̂ − X̃θ̄‖² / ‖y − X̃θ̂‖²) × (n − (m+1)p)/(mp),   (2.6)

where θ̂ = (X̃′X̃)^{−1}X̃′y is the unrestricted least squares estimate, and θ̄ is the least squares estimate conditional on Rθ̃ = 0. In the case of a level shift in independent data with only one break, it may be readily seen that the statistic F(q/n) is equivalent to the adjusted CUSUM (2.2).
Since the break dates are unknown, one may consider the maximum value of (2.6) over all possible partitions of the series into (m+1) sub-series. Again, to avoid divergence we must restrict to partitions where each sub-series has at least h = nε observations, with ε > 0. To reduce computations, Bai and Perron [5] have proposed an efficient algorithm, based on the principle of dynamic programming, which requires at most O(n²) least-squares operations. This algorithm will be extended in Section 6 to the use of empirical likelihood.
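The dynamic-programming principle can be sketched for the simplest case of level shifts, where the segment cost is the within-segment sum of squared residuals. This is a simplified illustration of the idea, not the Bai and Perron implementation; all names are ours.

```python
import numpy as np

def bai_perron_dp(y, m, h):
    """Break dates for m breaks minimizing the total within-segment SSR,
    each segment having at least h observations (dynamic-programming sketch)."""
    y = np.asarray(y, float)
    n = len(y)
    c1 = np.concatenate([[0.0], np.cumsum(y)])
    c2 = np.concatenate([[0.0], np.cumsum(y**2)])

    def cost(s, t):
        # SSR of segment y[s:t] about its own mean, in O(1) via cumulative sums
        return c2[t] - c2[s] - (c1[t] - c1[s])**2 / (t - s)

    INF = float("inf")
    # best[k][t]: minimal SSR when fitting k+1 segments to y[:t]
    best = [[INF] * (n + 1) for _ in range(m + 1)]
    arg = [[0] * (n + 1) for _ in range(m + 1)]
    for t in range(h, n + 1):
        best[0][t] = cost(0, t)
    for k in range(1, m + 1):
        for t in range((k + 1) * h, n + 1):
            for s in range(k * h, t - h + 1):
                v = best[k - 1][s] + cost(s, t)
                if v < best[k][t]:
                    best[k][t], arg[k][t] = v, s
    breaks, t = [], n
    for k in range(m, 0, -1):          # backtrack the optimal break dates
        t = arg[k][t]
        breaks.append(t)
    return sorted(breaks), best[m][n]

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0, .5, 50), rng.normal(3, .5, 50),
                    rng.normal(-3, .5, 50)])
breaks, ssr = bai_perron_dp(y, m=2, h=15)
```

The recursion evaluates each segment cost in constant time after one pass of cumulative sums; organizing the search this way is what keeps the number of segment evaluations of order n², independently of m.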
The asymptotic distribution of the resulting sup F statistic has been derived in [4] under some regularity assumptions, and is a generalization of the sup Brownian Bridge. It depends on ǫ, m and p and tabulated quantiles by simulation may be found in [6].
A partial structural break is also considered by [4], where only some of the regression parameters are allowed to change, while the others are constrained to remain equal in each regime. To be specific, denote the regression parameter vector by θ = (β′, δ′)′, where β = (β_1, β_2, …, β_p)′ are the first p parameters, assumed constant, and δ = (δ_1, δ_2, …, δ_q)′ are the last q regression parameters, which are subject to change, and partition the design matrix into blocks: X = (W, Z), where W: (n × p) corresponds to the β parameters, and Z: (n × q) to the last q parameters δ. Then, given any partition into sub-series induced by m breaks, the analogue of model (2.4) may be written

y = Wβ + Z̃δ̃ + ε,   (2.7)

where δ̃ = (δ^(1)′, δ^(2)′, …, δ^(m+1)′)′ and Z̃ is the n × (m+1)q matrix corresponding to the expansion of Z, analogous to (2.5).
The hypothesis of absence of breaks may be tested using the F statistic related to the linear restrictions on the parameters δ̃ specifying that δ_j^(k) = δ_j^(k+1), j = 1, …, q; k = 1, …, m, expressed as a linear system Sδ̃ = 0. The conditional F statistic is then

F_c(t_1, …, t_m) = (‖Wβ̂ + Z̃δ̂ − Wβ̄ − Z̃δ̄‖² / ‖y − Wβ̂ − Z̃δ̂‖²) × (n − (m+1)q − p)/(mq),   (2.8)

where β̂, δ̂ are the unrestricted LS estimates of (β, δ̃), and β̄, δ̄ are the estimates under the hypothesis Sδ̃ = 0. In this case also, to avoid divergence we restrict to the cases where the sub-series have at least h = nε observations, and compute the maximum of (2.8) over all segmentations of the series into (m+1) segments. For this problem too, [5] proposed a dynamic programming algorithm, which is however more complicated than in the previous case. The resulting sup F_c statistic is used to test the null hypothesis of no break. Its asymptotic distribution has been derived by [4] and is similar to that of the complete structural break (2.6), but depends now on m, ε and q (the number of regression parameters subject to change). Several other methods for identifying and estimating change points (including e.g. Bayesian methods, least absolute deviation and LASSO procedures) are discussed in [20].

Empirical likelihood
Empirical likelihood methods were originally derived for building confidence intervals for the mean of a random sample, without assumptions on the form of the probability distribution. They have been later extended to cover inference on parameters of linear models and time series. A comprehensive introduction is [34]. Further applications of empirical likelihood have been recently proposed for many statistical problems, among them for example generalized linear models [23,40], conditional heteroskedasticity models [11,17,12], unstable autoregressive models [14], and long-memory time series [41].
Consider an observed random sample (y_1, y_2, …, y_n) from the random variable y with E(y) = μ and Var(y) = σ². That sample may be thought of as drawn from a discrete probability distribution concentrated only at the values y_1, y_2, …, y_n with probabilities p_i (p_i > 0, Σ p_i = 1), called the empirical distribution. In this case the probability of getting exactly that sample would be p_1 p_2 ⋯ p_n. The mean of that distribution is μ = Σ_i p_i y_i; therefore, given μ, the possible discrete distributions generating the observed sample are the set {(p_1, p_2, …, p_n): p_i > 0, Σ p_i = 1, Σ p_i y_i = μ}, and the largest probability of getting the observed sample, given μ, is

π(μ) = max{ Π_{i=1}^n p_i : p_i > 0, Σ p_i = 1, Σ p_i y_i = μ }.   (3.1)

Expression (3.1) as a function of μ is called the empirical likelihood profile for μ. The maximum is reached for μ = ȳ = Σ y_i/n and equals n^{−n}, corresponding to p_i = 1/n, i = 1, …, n. Thus, ȳ is the maximum empirical likelihood estimator of μ. The empirical likelihood ratio is obtained by dividing (3.1) by its maximum n^{−n}:

ELR(μ) = max{ Π_{i=1}^n n p_i : p_i > 0, Σ p_i = 1, Σ p_i y_i = μ }.   (3.2)

The solution of the maximum problem (3.2) may be obtained through Lagrange multipliers:

p_i = 1/{n[1 + λ(y_i − μ)]},   where λ ≡ λ(μ) solves Σ_{i=1}^n (y_i − μ)/[1 + λ(y_i − μ)] = 0.

If we define the empirical log likelihood ratio as ℓ(μ) = −2 log{ELR(μ)}, then

ℓ(μ) = 2 Σ_{i=1}^n log{1 + λ(y_i − μ)},

and [32] has shown that if μ_0 is the true mean of y, then as n → ∞, ℓ(μ_0) converges in distribution to a χ²_1 variable, a result similar to the parametric likelihood. Therefore ℓ(μ) may be used for defining asymptotic confidence intervals, and for hypothesis testing, for the mean.
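The Lagrange condition above is a one-dimensional root-finding problem in λ, so ℓ(μ) is cheap to evaluate. A minimal sketch (our own, not taken from the paper) using Newton's method:

```python
import numpy as np

def el_logratio_mean(y, mu, tol=1e-12, max_iter=100):
    """l(mu) = -2 log ELR(mu) = 2 sum_i log(1 + lam*(y_i - mu)), with lam
    solving sum_i (y_i - mu)/(1 + lam*(y_i - mu)) = 0 (Newton's method).
    Returns inf when mu lies outside the convex hull of the sample."""
    z = np.asarray(y, float) - mu
    if z.min() >= 0 or z.max() <= 0:
        return np.inf
    lam = 0.0
    for _ in range(max_iter):
        w = 1.0 + lam * z
        f = np.sum(z / w)                  # the Lagrange condition
        fp = -np.sum(z**2 / w**2)          # its derivative (always < 0)
        new = lam - f / fp
        while np.any(1.0 + new * z <= 0):  # damp to keep all weights positive
            new = (lam + new) / 2
        if abs(new - lam) < tol:
            lam = new
            break
        lam = new
    return 2.0 * np.sum(np.log(1.0 + lam * z))

rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, 200)
```

Since the condition is monotone in λ, the root is unique and Newton steps with a damping guard converge reliably; ℓ(ȳ) = 0 and ℓ(μ) grows as μ moves away from ȳ.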
A similar result holds also for random vectors. The following theorem is proved in [34].
Theorem 3.1 (Owen [34]). Let y_1, y_2, …, y_n be i.i.d. random vectors with mean μ_0 and finite variance-covariance matrix V_0 of rank d. Then −2 log ELR(μ_0) converges in distribution to a χ²_d as n → ∞.

The key step in the proof is to show that −2 log ELR(μ_0) may be written as n(ȳ − μ_0)′ S^{−1} (ȳ − μ_0) + o_p(1), where ȳ = (y_1 + y_2 + ⋯ + y_n)/n and S = (1/n) Σ_i (y_i − μ_0)(y_i − μ_0)′.

The empirical likelihood framework has been generalized to linear regression and analysis of variance [33], and also to data generated by a statistical parametric model through estimating equations [36,29,34]. Let θ = (θ_1, …, θ_p)′ ∈ Θ and suppose that the relationship between the parameters θ and the data is summarized by r estimating equations E{g_k(y, θ)} = 0, k = 1, 2, …, r, that determine θ uniquely. Usually r = p, but the case r > p has also been discussed in [36]. The empirical likelihood ratio is defined by analogy:

ELR(θ) = max{ Π_{i=1}^n n p_i : p_i > 0, Σ p_i = 1, Σ_i p_i g_k(y_i, θ) = 0, k = 1, …, r }.   (3.3)

It may be shown that

ℓ(θ) = −2 log ELR(θ) = 2 Σ_{i=1}^n log{1 + Σ_{k=1}^p λ_k g_k(y_i, θ)},

where the multipliers λ_k ≡ λ_k(θ) are the solutions of

Σ_{i=1}^n g_j(y_i, θ) / [1 + Σ_{k=1}^p λ_k g_k(y_i, θ)] = 0,   j = 1, 2, …, p.
Finally, if θ_0 denotes the true parameter value, then ℓ(θ_0) converges in distribution, as n → ∞, to a χ²_p under some regularity conditions (see Theorem 5.1 below), the most important of which is that the matrix S with entries S_jk = E{g_j(y, θ_0) g_k(y, θ_0)} be of full rank. It may be shown [see 36] that, with g(y, θ) = {g_1(y, θ), g_2(y, θ), …, g_p(y, θ)}′,

ℓ(θ_0) = { n^{−1/2} Σ_i g(y_i, θ_0) }′ S^{−1} { n^{−1/2} Σ_i g(y_i, θ_0) } + o_p(1).

In the case of the linear regression model (2.3), which will be of special interest here, the estimating functions g are provided by the normal equations:

g_j(y_i, θ) = x_{ij} (y_i − Σ_{k=1}^p x_{ik} θ_k),   j = 1, …, p.

Since the maximum of ELR(θ) is obtained at p_1 = p_2 = ⋯ = p_n = 1/n, the LS estimator θ̂ = (X′X)^{−1}X′y is also the maximum empirical likelihood estimator, and ℓ(θ̂) = 0.
It follows that an empirical likelihood ratio test for the hypothesis H 0 : θ = θ 0 may be based on the statistic ℓ(θ 0 ), which under H 0 is asymptotically chi square distributed with p degrees of freedom.
The empirical likelihood framework has been applied to testing the difference between two samples [21,25], and to the difference of regression coefficients in two samples [44]. The use of empirical likelihood for detecting change points has received less attention in the literature. To the best of our knowledge, the first proposals are [45], which considers an independent sample with one change in some distribution moments at an unknown time, and [26], where a regression model is considered with one possible change in the parameter values at an unknown point. The present paper extends those contributions in three directions: allowing more than one break, considering parametric models defined by estimating equations and autoregressions in addition to regression models, and including partial structural breaks.
Finally, related contributions are [18,42,19,16]. The first paper proposed empirical likelihood for detecting one change point in a sample where the distribution before the change is F, and the distribution after the change is a weighted version of F with weight function w(·, θ), where w is known and depends on a parameter θ. The other papers [42,19,16] deal with partially time-varying regression models, in which the time-varying coefficients α(t) are assumed continuous and differentiable, and propose empirical likelihood confidence intervals for the constant parameters β.

Testing level changes with empirical likelihood
Let us consider first the simple case of a random sample (y_1, …, y_n) with a single level shift at a known time q, assuming E(y_i) = μ_1 (i = 1, …, q), E(y_i) = μ_2 (i = q+1, …, n), Var(y_i) = σ² for all i, and independent y_i's. The empirical likelihood profile is

π(m_1, m_2) = max{ Π_{i=1}^n p_i : p_i > 0, Σ p_i = 1, Σ_{i=1}^q p_i (y_i − m_1) = 0, Σ_{i=q+1}^n p_i (y_i − m_2) = 0 },

and the maximum is n^{−n}, obtained for m_1 = ȳ_1, m_2 = ȳ_2 (the averages of the two sub-samples). Therefore the maximum empirical likelihood under the hypothesis H_0: μ_1 = μ_2 = μ is obtained as π(μ, μ), and the empirical likelihood ratio test statistic is −2 log π(μ, μ) − 2n log n; under H_0, keeping q/n fixed, it converges as n → ∞ to a central chi square variable with one degree of freedom [see e.g. 21]. The computation of π(μ, μ) is simple because its logarithm equals the sum of the log empirical likelihoods computed separately on the two sub-series (y_1, …, y_q) and (y_{q+1}, …, y_n). We state this result in the more general case of m breaks.
Given m breaks at dates t_1 < t_2 < ⋯ < t_m, with t_0 = 0 and t_{m+1} = n, the sum of the log empirical likelihood ratios over the (m+1) sub-series is

ℓ(μ) = −2 Σ_{k=0}^m log ELR(y_{t_k+1}, …, y_{t_{k+1}}; μ).   (4.1)

Then, the hypothesis of absence of breaks may be tested using the inf of (4.1) with respect to the mean:

ℓ_0 = inf_μ ℓ(μ).   (4.2)

The statistic (4.2) equals the empirical likelihood test statistic for an analysis of variance model with one factor and the null hypothesis of equal means across groups; thus under H_0 it is asymptotically chi square distributed with m degrees of freedom [see 33, p. 1738]. This idea is very similar to the CUSUM framework, and indeed leads to asymptotically equal results, as detailed in the following Corollary.
Corollary 4.1. In the case of a test of the null hypothesis of absence of breaks against an alternative of a single break at time q, if q/n → ρ as n → ∞ with 0 < ρ < 1 and E|y|⁴ < ∞, the difference between the ELR test statistic (4.2) and the adjusted CUSUM statistic (2.2) converges to zero in probability as n → ∞ under H_0.
The problem of multiple level changes may obviously also be expressed in terms of a regression model (2.3) with only one parameter (p = 1) and design matrix coefficients x_{i1} = 1 for all i. The hypothesis of absence of breaks may then be tested using the Bai and Perron method. In this case too an asymptotic equivalence holds.

Corollary 4.2.
For a test of the null hypothesis of absence of breaks against an alternative of m breaks at dates t_j, if t_j/n → ρ_j (j = 1, …, m) as n → ∞ with 0 < ρ_1 < ρ_2 < ⋯ < ρ_m < 1 and E|y|⁴ < ∞, the difference between the ELR test statistic (4.2) and m times the F test statistic (2.6) converges to zero in probability as n → ∞ under H_0.
Given the asymptotic equivalence of the test statistics, the empirical likelihood-based test for breaks may be carried out following the same strategy as proposed by [4]:

1. Select the number of breaks m, and a minimum length of the resulting sub-series, h = nε.
2. Compute the maximum of the ELR test statistic over all partitions of the series into (m+1) sub-series (y_1, …, y_{t_1}), (y_{t_1+1}, …, y_{t_2}), …, (y_{t_m+1}, …, y_n) with at least h observations each; call it ℓ*(m).
Under the null hypothesis of absence of breaks, the statistic ℓ*(m) has the same asymptotic distribution as Bai and Perron's F(m|0) test, derived in Proposition 6 of [4], and quantiles for selected values of ε are obtained by simulation in [6]. For the alternative of only one break, the asymptotic distribution is that of the sup of a Brownian bridge, and the approximate theoretical quantiles of [37] may be used.
In the case of scalar y_i's, further insight into the nature of the difference between the ELR and F statistics is easily gained. Given the segmentation 1 < t_1 < t_2 < ⋯ < t_m < n, and denoting by n_i the length of the i-th sub-series, with ȳ_i its average and ȳ the overall average, the Bai and Perron statistic is

F(t_1, …, t_m) = {1/(m σ̂²)} Σ_{i=1}^{m+1} n_i (ȳ_i − ȳ)²,

where σ̂² is an estimate of the variance under the hypothesis of m level changes:

σ̂² = {1/(n − m − 1)} Σ_{i=1}^{m+1} Σ_{t=t_{i−1}+1}^{t_i} (y_t − ȳ_i)².

On the other side, for each sub-series {y_{t_{i−1}+1}, …, y_{t_i}} the ELR tends asymptotically to a quadratic form:

−2 log ELR_i(μ) = n_i (ȳ_i − μ)²/σ̃_i² + o_p(1),   where σ̃_i² = (1/n_i) Σ_{t=t_{i−1}+1}^{t_i} (y_t − μ)²

is the variance of the i-th sub-series under the hypothesis of absence of breaks, usually larger than σ̂². Therefore one can expect the ELR test statistic to be smaller than the corresponding F test statistic; a closer value might be obtained with a weighted ELR, in which the contribution of each sub-series is weighted proportionally to σ̃_i². In this case the optimal value of μ is simply ȳ; for the unweighted ELR, and for vector y_i's, the search is more complicated and will be discussed in Section 6. When only one break is allowed, our procedure is equivalent to that proposed in [45], but the computations are different. The method of [45] is based on solving a non-linear system of equations, while ℓ_0 in (4.2) requires evaluation of the ELR on two sub-series, and may be computed using standard empirical likelihood routines.
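For given break dates, the statistic ℓ_0 of (4.2) only requires an outer one-dimensional search over μ and, for each μ, one empirical likelihood evaluation per sub-series. A sketch of this computation (function and variable names are ours; the search interval is restricted to the range of the sub-series averages, cf. Section 6):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg2_log_elr(y, mu):
    """-2 log ELR for the mean of one sub-series (Newton on the dual)."""
    z = np.asarray(y, float) - mu
    if z.min() >= 0 or z.max() <= 0:
        return np.inf                       # mu outside the convex hull
    lam = 0.0
    for _ in range(60):
        w = 1.0 + lam * z
        f, fp = np.sum(z / w), -np.sum(z**2 / w**2)
        new = lam - f / fp
        while np.any(1.0 + new * z <= 0):   # damp to keep weights positive
            new = (lam + new) / 2
        if abs(new - lam) < 1e-12:
            lam = new
            break
        lam = new
    return 2.0 * np.sum(np.log(1.0 + lam * z))

def level_break_stat(y, breaks):
    """Statistic (4.2): inf over mu of the summed -2 log ELR of the
    sub-series delimited by `breaks` (sketch for scalar data)."""
    cuts = [0] + list(breaks) + [len(y)]
    segs = [y[cuts[i]:cuts[i + 1]] for i in range(len(cuts) - 1)]
    lo = min(s.mean() for s in segs)
    hi = max(s.mean() for s in segs)
    res = minimize_scalar(lambda mu: sum(neg2_log_elr(s, mu) for s in segs),
                          bounds=(lo, hi), method="bounded")
    return res.fun

rng = np.random.default_rng(2)
y_alt = np.concatenate([rng.normal(0, 1, 100), rng.normal(1.5, 1, 100)])
y_null = rng.normal(0, 1, 200)
s_alt = level_break_stat(y_alt, [100])
s_null = level_break_stat(y_null, [100])
```

With a level shift of 1.5 standard deviations the statistic is far in the tail of its null χ² distribution, while for data without a break it stays small.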

Testing structural breaks with empirical likelihood
We now consider the case where the data {y_1, …, y_n} are generated by a model described by the estimating equations

E{g_k(y_i, θ)} = 0,   k = 1, 2, …, p,   (5.1)

where θ = (θ_1, …, θ_p)′ ∈ Θ are the model parameters, which may be subject to m breaks at times 1 < t_1 < ⋯ < t_m < n, so that each i-th sub-series (y_{t_{i−1}+1}, …, y_{t_i}) is generated by a model with parameters θ^(i) = (θ_1^(i), …, θ_p^(i))′, while under the null hypothesis of absence of breaks all the observations are generated by the same model with parameters θ^0 = (θ_1^0, …, θ_p^0)′. We denote again by ELR(θ, s, t) the ELR statistic (3.3) computed on the sub-series (y_{s+1}, …, y_t). The empirical likelihood test statistic for the null hypothesis of absence of breaks may be computed as the sum of the ELR statistics on each sub-series, as stated in the following Theorem.
The null hypothesis of absence of breaks, H_0: θ^(1) = θ^(2) = ⋯ = θ^(m+1), may be tested considering the inf of the summed ELR statistic under H_0:

ℓ_0(t_1, …, t_m) = inf_θ Σ_{i=0}^m { −2 log ELR(θ, t_i, t_{i+1}) },   (5.2)

with t_0 = 0 and t_{m+1} = n. Here Corollary 5 of [36] (discussed at the end of Section 3) may be applied, taking as the free parameter (previously denoted by θ_2) the regression coefficients of the first sub-series θ^(1), and as the constrained parameters (previously denoted by θ_1) the differences θ^(k) − θ^(1), k = 2, …, m+1. The hypothesis H_0 corresponds to θ_1 = θ_1^0 = 0, and from the Corollary, the statistic (5.2) is under H_0 asymptotically chi square distributed with mp degrees of freedom.
The simplest and most interesting case is when the generating model is linear:

y_i = Σ_{j=1}^p x_{ij} θ_j + ε_i,   (5.3)

where we assume that the ε_i's are independent of the x_{ij} and mutually independent, while only mild boundedness conditions on the x_{ij} are required. The estimating functions are given by the left-hand sides of the normal equations:

g_j(y_i, θ) = x_{ij} (y_i − Σ_{k=1}^p x_{ik} θ_k),   j = 1, …, p.

Thus, the asymptotic value of the ELR statistic at the parameter value θ^0 is

ℓ(θ^0) = { n^{−1/2} Σ_i g(y_i, θ^0) }′ S^{−1} { n^{−1/2} Σ_i g(y_i, θ^0) } + o_p(1),   S_{jk} = E{g_j(y_i, θ^0) g_k(y_i, θ^0)}.

When only one break is allowed, our procedure is equivalent to that proposed in [26], but again the computation strategy is different.
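With several estimating functions the multiplier λ becomes a vector, but the dual problem is still smooth and concave, so a few Newton steps suffice. A self-contained sketch for the regression estimating functions above (our own illustration; all names are hypothetical):

```python
import numpy as np

def neg2_log_elr_ee(G, max_iter=50, tol=1e-12):
    """-2 log ELR at a given parameter value, where G is the n x r matrix
    whose rows are the estimating-function values g(y_i, theta).
    Solves sum_i g_i / (1 + lambda' g_i) = 0 by damped Newton steps."""
    n, r = G.shape
    lam = np.zeros(r)
    for _ in range(max_iter):
        w = 1.0 + G @ lam                      # weights' denominators, kept > 0
        grad = (G / w[:, None]).sum(axis=0)
        hess = -(G / w[:, None]**2).T @ G
        new = lam - np.linalg.solve(hess, grad)
        while np.any(1.0 + G @ new <= 0):      # damp to stay feasible
            new = (lam + new) / 2
        if np.linalg.norm(new - lam) < tol:
            lam = new
            break
        lam = new
    return 2.0 * np.sum(np.log(1.0 + G @ lam))

# regression example: g_j(y_i, theta) = x_ij * (y_i - x_i' theta)
rng = np.random.default_rng(3)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
theta0 = np.array([1.0, 2.0])
y = X @ theta0 + rng.normal(size=n)
theta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
G_ls = X * (y - X @ theta_ls)[:, None]   # ELR is maximal (statistic 0) here
G_0 = X * (y - X @ theta0)[:, None]      # at the true value: ~ chi-square(2)
```

At the least squares estimate the estimating functions sum to zero, so λ = 0 and the statistic vanishes, in agreement with ℓ(θ̂) = 0 of Section 3.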
The asymptotic similarity between empirical and Gaussian likelihood ratios implies an asymptotic relationship between the ELR test statistic and Bai and Perron's F statistic for linear models as well.

Corollary 5.1. If the data are generated by model (5.3) and the assumptions of Theorem 5.1 are satisfied, then for a test of the null hypothesis of absence of breaks against an alternative of m breaks at dates t_j, if t_j/n → ρ_j (j = 1, …, m) as n → ∞ with 0 < ρ_1 < ρ_2 < ⋯ < ρ_m < 1, the difference between the ELR test statistic (5.2) and mp times the F test statistic (2.6) converges to zero in probability under H_0.
For linear models too, an essential difference, for finite n, between ELR and F resides in the evaluation of the residual variance, which is computed under the null hypothesis in ELR, and under the alternative in F . However, here a weighted version for reducing such difference is not viable, since the residuals appear inside the matrix G(θ) to be inverted.
Corollary 5.1 suggests that strategies similar to those proposed by [4] may be employed for testing breaks using the empirical likelihood:

1. Select a maximum number of breaks m, and a minimum length of the sub-series, h = nε.
2. Compute the maximum of the ELR test statistic (5.2) over all possible partitions. Under H_0 the maximum statistic has the same asymptotic distribution as Bai and Perron's F(m|0) statistic, and its quantiles may be used.
More elaborate strategies, also proposed in [5], and the important issue of maximizing ELR(θ) with respect to θ and with respect to the break dates, are discussed in Section 6.
Let us now address the problem of structural breaks for autoregressive processes. Consider an AR(p) model:

y_t = θ_1 y_{t−1} + θ_2 y_{t−2} + ⋯ + θ_p y_{t−p} + ε_t.   (5.4)

The empirical likelihood for these processes has been studied by [14], extending results on the dual likelihood of [29]. Since (5.4) is a linear model, it may be put in the form of a regression model (2.3) with x_{ij} = y_{i−j}, and letting z_t = (y_t, y_{t−1}, …, y_{t−p})′ the estimating equations may be written

g(z_t, θ) = (y_t − θ_1 y_{t−1} − ⋯ − θ_p y_{t−p}) (y_{t−1}, …, y_{t−p})′,

so that, if θ^0 is the vector of the true parameter values, g(z_t, θ^0) = ε_t (y_{t−1}, …, y_{t−p})′, and an empirical likelihood ratio analogous to (3.3) may be defined:

ELR(θ) = max{ Π_t n p_t : p_t > 0, Σ p_t = 1, Σ_t p_t g_k(z_t, θ) = 0, k = 1, 2, …, p }.   (5.5)
The results of [36] cannot be directly applied since the g(z_t, θ) are not independent, but [14] have shown that under the following assumptions:

AR1 The process {ε_t} forms a martingale difference with respect to the sequence of σ-fields F_t = σ{…, ε_{t−2}, ε_{t−1}, …, y_{t−2}, y_{t−1}};

AR2 E{ε_t² | F_t} = σ² for every t, E{ε_t^{4+c} | F_t} < ∞ for some c > 0, and the initial values y_0, …, y_{1−p} are F_t-measurable;

AR3 All roots of the autoregressive polynomial θ(z) = 1 − θ_1^0 z − ⋯ − θ_p^0 z^p lie outside the unit circle;

the statistic −2 log ELR(θ^0) is asymptotically equivalent to

{ n^{−1/2} Σ_t g(z_t, θ^0) }′ S^{−1} { n^{−1/2} Σ_t g(z_t, θ^0) },   S = (1/n) Σ_t g(z_t, θ^0) g(z_t, θ^0)′,

and converges in distribution to a chi square with p degrees of freedom, as in the case of independent data. Turning now to the structural change analysis, Bai and Perron's [4] sup F test may also be applied to autoregressive processes under some additional assumptions, see [4, page 57] and [3, page 302]. Let us denote any partition of the data into m+1 sub-series by {(y_{t_i+1}, …, y_{t_{i+1}}), i = 0, 1, …, m}, with t_0 = 0 and t_{m+1} = n as in Sections 4 and 5, and let n_i = t_{i+1} − t_i; the assumptions, with x_t = (y_{t−1}, …, y_{t−p})′, are as follows:

AR4 For each sub-series, (1/n_i) Σ_{t=t_i+1}^{t_i+⌊n_i s⌋} x_t x_t′ converges in probability to s Q_i, uniformly in s ∈ [0,1], with Q_i positive definite;

AR5 n_i^{−1/2} Σ_{t=t_i+1}^{t_i+⌊n_i s⌋} x_t ε_t converges weakly to W_i(s), where W_i is a vector Wiener process in [0, 1] with variance σ² v Q_i, and W_0, W_1, …, W_m are independent of each other.
The above assumptions are sufficient for extending to autoregressive processes our results concerning the asymptotic distribution of the ELR test statistic, and the asymptotic equivalence between the sup F test and the supremum of the ELR statistic under H_0, as stated in the following Theorem.
Theorem 5.2. For an AR(p) process with parameters θ^0 and under assumptions AR1–AR5, let the empirical likelihood ratio test statistic for the hypothesis H_0: θ^(1) = θ^(2) = ⋯ = θ^(m+1) = θ^0 be

ℓ(t_1, …, t_m) = Σ_{i=0}^m { −2 log ELR(θ^0, t_i, t_{i+1}) },   (5.6)

and suppose that t_i/n → ρ_i (i = 1, …, m) as n → ∞ with 0 < ρ_1 < ρ_2 < ⋯ < ρ_m < 1. Then:

(i) as n → ∞, under H_0, the statistic (5.6) converges in distribution to a central chi square with (m+1)p degrees of freedom;

(ii) for a test of the null hypothesis of absence of breaks against an alternative of m breaks at dates t_i, the difference between the ELR test statistic (5.6), with θ^0 replaced by its maximum empirical likelihood estimate, and mp times the F test statistic (2.6) converges to zero in probability under H_0.
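To fix ideas, the estimating functions g(z_t, θ) of an autoregression can be assembled from lagged values and residuals exactly as in the regression case. A small self-contained sketch (our own illustration on simulated data; variable names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
# simulate a stationary AR(2): y_t = 0.5 y_{t-1} - 0.3 y_{t-2} + eps_t
n, p = 500, 2
theta0 = np.array([0.5, -0.3])
y = np.zeros(n + 100)
for t in range(2, n + 100):
    y[t] = theta0[0] * y[t - 1] + theta0[1] * y[t - 2] + rng.normal()
y = y[100:]                                   # discard burn-in

# lagged design: row t holds x_t = (y_{t-1}, ..., y_{t-p})
X = np.column_stack([y[p - 1:-1], y[p - 2:-2]])
yt = y[p:]
theta_ls = np.linalg.lstsq(X, yt, rcond=None)[0]

# estimating functions g(z_t, theta) = (y_t - theta' x_t) * x_t
G = X * (yt - X @ theta_ls)[:, None]
```

At the least squares estimate the columns of G sum to zero (the normal equations), so the empirical likelihood is maximal there, and the Lagrange machinery of Section 3 applies row-wise to G.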

Partial structural breaks
We address here the hypothesis that the breaks involve only some of the parameters, while the others remain unchanged throughout the whole dataset. We believe, however, that only very rarely does one have enough a priori information to exclude that some parameters of the linear model are subject to possible breaks. Thus it is advisable to start by testing for a complete structural break. Only if the hypothesis of absence of breaks is rejected, and given the best specification of the number of breaks m and their dates t_1, …, t_m, do we proceed to examine the hypothesis that some of the model parameters remain constant, selecting them among those whose estimates are most variable across the (m+1) sub-series. Such a procedure has the advantage of requiring only one maximization stage with respect to the break dates, and only for the complete break structure. We shall assume, without loss of generality, that the possibly constant parameters are the first p, while there are q more parameters (labeled from p+1 to p+q) that are subject to m breaks at dates t_1, t_2, …, t_m.
Consider the regression model analogous to (5.3): y = Xθ + ε, where now θ has (p+q) parameters. Partition the design matrix as in Section 2: X = (W, Z), where W: n × p corresponds to the first p parameters and Z: n × q to the last q parameters, and write θ = (β′, δ′)′, where β: p × 1 relates to W and δ: q × 1 relates to Z. Now, as in (2.4), we consider a partition of the whole series into (m+1) sub-series delimited by the break dates, and write the expanded model y = X̃θ̃ + ε, where θ̃ = (θ^(1)′, θ^(2)′, …, θ^(m+1)′)′ and X̃ is the corresponding expanded design matrix. The partial break model, assuming that the first p parameters remain fixed, may be written as in (2.7), where Z̃ is the n × (m+1)q matrix corresponding to the expansion of Z, analogous to (2.5).
If the null hypothesis is rejected, then given m and the dates t_1, …, t_m, an empirical likelihood ratio statistic for the partial break hypothesis may be obtained by maximizing the sum of the EL ratios, i.e., computing

ℓ_1 = inf Σ_{i=0}^m { −2 log ELR(θ^(i+1), t_i, t_{i+1}) }

on the set {θ̃: β^(1) = β^(2) = ⋯ = β^(m+1)}. The results of the previous Section imply that, under the hypothesis of no break and given the dates, ℓ_1 is asymptotically chi square distributed with mp degrees of freedom. The results concerning partial empirical likelihood of [36, Corollary 5], referred to in Section 3, also ensure that ℓ* − ℓ_1 is asymptotically distributed as a chi square with mq degrees of freedom under the hypothesis that the first p parameters remain constant. This difference may be used as a test statistic for the partial break, rejecting, if it is significant, the hypothesis that the change does not involve the first p parameters.
A similar strategy may also be adopted when detecting breaks with F-type statistics. Three sub-spaces may be defined:

• S: the space generated by the columns of X̃ without restrictions; let θ̂ denote the least squares estimate, so that P(y) = X̃θ̂ is the projection of the data onto S.

• S_1: the space generated by the columns of X̃ with the restriction that the first p coefficients are constant. S_1 is isomorphic to the space generated by the columns of (W, Z̃). We denote by θ̄ the least squares estimate under the conditions {θ_j^(k) = θ_j^(k+1), j = 1, 2, …, p; k = 1, 2, …, m}; thus the projection of y onto S_1 is P_1(y) = X̃θ̄.

• S_0: the space generated by the columns of X̃ with the restriction that all coefficients remain constant, i.e., absence of breaks. S_0 is isomorphic to the space generated by the columns of X. Let θ* denote the least squares estimate in this case, subject to {θ_j^(k) = θ_j^(k+1), j = 1, 2, …, p+q; k = 1, 2, …, m}, so that the projection of y onto S_0 is P_0(y) = X̃θ*.
In the framework of Gaussian likelihood test theory, the hypothesis of absence of breaks against a complete structural break may be tested using the statistic

F(t_1, …, t_m) = (‖P(y) − P_0(y)‖² / ‖y − P(y)‖²) × (n − (m+1)(p+q)) / (m(p+q)).
This is the Bai and Perron statistic for the complete structural break. The hypothesis that the first p parameters remain constant in the (m+1) sub-series could be tested using the statistic

F_1(t_1, …, t_m) = (‖P(y) − P_1(y)‖² / ‖y − P(y)‖²) × (n − (m+1)(p+q)) / (mp).

The Bai and Perron F test statistic for the partial structural break is easily seen to be

F_c(t_1, …, t_m) = (‖P_1(y) − P_0(y)‖² / ‖y − P_1(y)‖²) × (n − (m+1)q − p) / (mq),   (5.7)

and may be seen as a conditional test of absence of breaks given that the first p parameters are not subject to change. The hypothesis of partial break, i.e., that the change does not involve the first p parameters, may be rejected if the statistic F_c (5.7) is large compared with the quantiles of an F variable with mq and n − (m+1)q − p degrees of freedom. In this case too, our procedure has the advantage of requiring the search for the number of breaks m, and their dates t_1, …, t_m, only for a complete break structure, which is much simpler and easier than maximizing the partial break F statistic.
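In the projection notation, the partial-break statistic F_c of (5.7) reduces to a few lines of linear algebra. A sketch with illustrative names of our own (`proj`, `partial_break_F`), for given break dates:

```python
import numpy as np

def proj(M, y):
    """Orthogonal projection of y onto the column space of M."""
    beta, *_ = np.linalg.lstsq(M, y, rcond=None)
    return M @ beta

def partial_break_F(y, W, Z, breaks):
    """F_c of (5.7): conditional test of breaks in the delta-parameters,
    given that the beta-parameters (columns of W) stay constant."""
    n, q = Z.shape
    p = W.shape[1]
    m = len(breaks)
    cuts = [0] + list(breaks) + [n]
    # Z expanded regime by regime, as the matrix Z-tilde
    Zt = np.zeros((n, (m + 1) * q))
    for i in range(m + 1):
        Zt[cuts[i]:cuts[i + 1], i * q:(i + 1) * q] = Z[cuts[i]:cuts[i + 1]]
    P1 = proj(np.hstack([W, Zt]), y)   # beta constant, delta breaking
    P0 = proj(np.hstack([W, Z]), y)    # no breaks at all
    num = np.sum((P1 - P0)**2) / (m * q)
    den = np.sum((y - P1)**2) / (n - (m + 1) * q - p)
    return num / den

rng = np.random.default_rng(6)
n = 200
W = np.ones((n, 1))                          # constant-coefficient regressor
Z = rng.normal(size=(n, 1))                  # regressor whose coefficient breaks
delta = np.where(np.arange(n) < 100, 1.0, 3.0)
y = 0.5 * W[:, 0] + delta * Z[:, 0] + rng.normal(size=n)
F_c = partial_break_F(y, W, Z, [100])
y0 = 0.5 * W[:, 0] + 1.0 * Z[:, 0] + rng.normal(size=n)
F_0 = partial_break_F(y0, W, Z, [100])       # no break: should stay small
```

Under the simulated break in δ the statistic is far above typical F(mq, n − (m+1)q − p) quantiles, while without a break it stays near its null range.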

Implementation of the empirical likelihood-based procedure
We start by noting a boundedness issue that is typical of the empirical likelihood framework and does not arise in procedures based on squared residuals. Consider the level change case. The statistic for testing no level change against m changes at times t_1, …, t_m is (4.2), the sum of the logarithms of the partial empirical likelihood ratios

ELR_k(μ) = max{ Π_{j=t_k+1}^{t_{k+1}} n_k w_j : w_j > 0, Σ_j w_j = 1, Σ_j w_j (y_j − μ) = 0 },   k = 0, 1, …, m,   (6.1)

where n_k = t_{k+1} − t_k. Their definition depends on the existence of weights w_j satisfying the constraints in (6.1) for each k; in other words, on the existence of a number μ belonging to the convex hull of each sub-series (y_{t_k+1}, …, y_{t_{k+1}}), k = 0, 1, …, m. If this does not happen, at least one of the ratios (6.1) is not defined, and may be conventionally taken equal to zero [e.g. 34], so that the test statistic (4.2) becomes arbitrarily large. This implies that if the data contain any sub-series whose values are all larger than the remaining observations, the null hypothesis of no level change will be rejected at any significance level; this does not happen with the CUSUM or F-type tests, unless the difference is significantly large. In our view, this is a distinctive feature of the empirical likelihood, and not necessarily a shortcoming.

However, to avoid this problem, the definition of empirical likelihood itself may be slightly modified in order to ensure the existence of a solution with all positive weights in every case. An adjusted empirical likelihood is proposed in [13], consisting in the addition of an artificial observation with the aim of satisfying the constraint. Suppose that the series is (y_1, …, y_n) and the empirical likelihood is expressed in terms of the estimating functions g(y, θ).
Then we add an artificial observation y_{n+1}, defined through

g(y_{n+1}, θ) = −(a_n / n) Σ_{i=1}^{n} g(y_i, θ),     (6.2)

and, instead of the empirical likelihood ratio ELR(θ) in (3.3), we consider the adjusted empirical likelihood ratio. The behavior of the adjusted ELR and the choice of the term a_n are discussed in [13], where asymptotic coincidence is shown provided a_n = o_p(√n). The authors suggest choosing a_n = max{1, (1/2) log n} and using a truncated mean in (6.2). In our experience this adjustment proved satisfactorily effective, and the difference with ELR(θ) is negligible when the hypothesis of absence of breaks is true.
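A minimal sketch of the adjustment of [13], with hypothetical function and variable names: given the values of the estimating functions g(y_i, θ) at a candidate θ, we append the artificial observation (6.2).

```python
import numpy as np

def adjusted_scores(g_vals, a_n=None):
    """Append the artificial observation of the adjusted empirical
    likelihood: g(y_{n+1}, theta) = -(a_n/n) * sum_i g(y_i, theta).
    This places zero inside the convex hull of the adjusted values,
    so the adjusted ELR is always well defined."""
    g = np.asarray(g_vals, dtype=float)
    if g.ndim == 1:
        g = g[:, None]          # one estimating function per column
    n = g.shape[0]
    if a_n is None:
        a_n = max(1.0, 0.5 * np.log(n))   # the choice suggested in [13]
    extra = -(a_n / n) * g.sum(axis=0)
    return np.vstack([g, extra[None, :]])
```

(For robustness, [13] also suggest a truncated mean in (6.2); the plain mean is used here for brevity.)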
A further issue is Bartlett correctability. A Bartlett correction may be applied to the empirical likelihood [see e.g. Chapter 13 of 34], both in the case of a level change and for structural change in linear models, resulting essentially in dividing the ELR by (1 + A/n). However, the appropriate value of A may be difficult to select; a rough approximation based on the Gaussian distribution is A = d^2 + d/2 if the data, or the estimating functions, belong to R^d, but more research is still needed on this topic.
An open problem in the implementation of an empirical likelihood-based procedure is the actual maximization, for finite n, of the sum of the log empirical likelihood ratios with respect to the mean µ in the case of level change (4.2), and the maximization of (5.2) with respect to the parameter θ in the case of estimating functions. Consider first level change identification when only one break is allowed, at time q. The statistic to be minimized with respect to z is −2 log ELR(z, 0, q) − 2 log ELR(z, q, n). Under the hypothesis of no break, and for z = µ + O_p(n^{−1/2}), the asymptotic expansion (6.3) holds [33, p. 1727]. The choice of the overall average ȳ = [q ȳ_1 + (n − q) ȳ_2]/n, which would be the minimizer if S_1 = S_2 = S independent of z, may be appropriate for large n, but for smaller n the optimal value of z may differ. In the case of scalar data it is easy to show that the minimizer is a convex linear combination of ȳ_1 and ȳ_2, as stated in the following theorem.

Theorem 6.1. Let {y_1, y_2, ..., y_n} be n independent identically distributed random variables with finite variance, and 1 < q < n. The minimizer of the asymptotic part of (6.3) is a convex linear combination of ȳ_1 and ȳ_2.

For k > 1 breaks it easily follows that, under the assumptions of Theorem 6.1, the optimal choice of z belongs to the convex hull of the sub-series averages. We conjecture that also in the multivariate case the optimal z is a linear combination of ȳ_1 and ȳ_2, as some simulation experience suggests, but no proof is available.
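To make the discussion concrete, the following sketch computes −2 log ELR for the mean of a scalar sample by bisection on the Lagrange multiplier (Owen's formulation), and then minimizes the one-break statistic over candidate values z taken, following Theorem 6.1, as convex combinations of ȳ_1 and ȳ_2. A crude grid is used, and the function names and grid size are our own choices.

```python
import numpy as np

def neg2_log_elr_mean(y, mu, tol=1e-10):
    """-2 log empirical likelihood ratio for the mean of a scalar sample,
    obtained by bisection on the Lagrange multiplier.  Returns np.inf when
    mu lies outside the open interval (min y, max y), i.e. when the ELR is
    conventionally zero."""
    d = np.asarray(y, dtype=float) - mu
    if not (d.min() < 0.0 < d.max()):
        return np.inf
    lo = -1.0 / d.max() + 1e-12   # feasible multipliers keep 1 + lam*d > 0
    hi = -1.0 / d.min() - 1e-12
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        # f(lam) = sum d_i / (1 + lam*d_i) is strictly decreasing in lam
        if np.sum(d / (1.0 + lam * d)) > 0.0:
            lo = lam
        else:
            hi = lam
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log1p(lam * d))

def one_break_stat(y, q, grid=101):
    """Minimize -2 log ELR(z, 0, q) - 2 log ELR(z, q, n) over candidate
    values z on a grid of convex combinations of the sub-series averages,
    as Theorem 6.1 indicates for scalar data."""
    y = np.asarray(y, dtype=float)
    y1, y2 = y[:q], y[q:]
    best = np.inf
    for a in np.linspace(0.0, 1.0, grid):
        z = a * y1.mean() + (1.0 - a) * y2.mean()
        best = min(best, neg2_log_elr_mean(y1, z) + neg2_log_elr_mean(y2, z))
    return best
```

In applications a one-dimensional numerical optimizer over the combination weight would replace the grid; the sketch only illustrates that restricting z to the segment between ȳ_1 and ȳ_2 suffices.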
We turn now to the structural break case, considering a linear model (5.3), y = Xθ + ε, with only one break, which splits the data into two sub-series (y_1, y_2, ..., y_q) and (y_{q+1}, ..., y_n). Partition accordingly the data vector, the design matrix and the residuals, and write the corresponding estimating equations. Then, if θ_0 is the true parameter value, an expansion analogous to (6.3) holds for θ = θ_0 + O_p(n^{−1/2}). On denoting the least squares estimates on the two sub-series by θ̂_1 = (X_1′X_1)^{−1}X_1′y_1 and θ̂_2 = (X_2′X_2)^{−1}X_2′y_2, the test statistic (6.4) may be written as (6.5). As in the case of level change, minimization of (6.5) with respect to θ is made difficult by the dependence of the matrices G_1 and G_2 on θ. To obtain an approximate solution, one can ignore this dependence and put G_1(θ)^{−1}(X_1′X_1) ≃ G_2(θ)^{−1}(X_2′X_2) ≃ σ^2 I. Then, equating the derivatives to zero yields θ̂ = (X′X)^{−1}X′y, the least squares estimate under absence of breaks. In practice, however, for small n the dependence of G_1(θ) and G_2(θ) on θ may not be negligible, and a numerical optimization routine could be advisable.

A further required stage is determining the best segmentation, i.e., given the number of breaks m, determining their dates t_1, t_2, ..., t_m. The additive nature of the statistics (4.1) and (5.1) makes it possible to solve this problem by a dynamic programming procedure similar to that introduced by [5] for the sup F tests. The key idea is that if 1 < t_1 < ... < t_m < n is the best segmentation into (m + 1) sub-series, then t_1, t_2, ..., t_{m−1} are the optimal dates for the segmentation of the "truncated" series (y_1, y_2, ..., y_{t_m}) into m sub-series. Thus, having already computed, for any t, the best break dates for k breaks of the series (y_1, ..., y_t), with λ(t) the resulting test statistic, the best dates for the whole series (y_1, ..., y_n) with (k + 1) breaks are obtained by taking as last break date the time t_{k+1} that solves the one-dimensional problem (6.6), and as remaining dates the best segmentation of the series (y_1, ..., y_{t_{k+1}}) into k + 1 sub-series. This allows us to search iteratively for the break dates with an increasing number of breaks, solving only one-dimensional maximization problems similar to (6.6). The value of the parameter θ is set, for simplicity, equal to its approximate optimum θ* (the least squares estimate under the hypothesis of no break, and ȳ in the case of level changes). Thus, the proposed procedure for determining the optimal dates, allowing up to m breaks, runs in the following stages (recall that we assume at least h observations in each sub-series):

• 1st stage. Assuming only one break, for any length τ from 2h to n − h, and for τ = n, the series (y_1, ..., y_τ) is split into 2 parts by means of the date t_1(τ) maximizing log ELR(θ, 0, t) + log ELR(θ, t, τ) over h ≤ t ≤ τ − h, and ℓ_1(1, τ) denotes the resulting value of the test statistic.
• 2nd stage. Assuming two breaks, for any length τ from 3h to n − h, and for τ = n, the series (y_1, ..., y_τ) is split into 3 parts. The second break date t_2(τ) is obtained by maximizing ℓ_1(1, t) + log ELR(θ, t, τ) over 2h ≤ t ≤ τ − h, while the first break date is set equal to t_1[t_2(τ)], and ℓ_2(1, τ) denotes the resulting value of the test statistic.
• k-th stage. For k breaks, for any length τ from (k + 1)h to n − h, and for τ = n, the series (y_1, ..., y_τ) is split into (k + 1) parts. The last break date t_k(τ) maximizes ℓ_{k−1}(1, t) + log ELR(θ, t, τ) over kh ≤ t ≤ τ − h, and the preceding break dates are the optimal ones for (k − 1) breaks of the data from 1 to t_k(τ).
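The staged procedure just described can be sketched as a dynamic program. In the sketch below, ℓ_k(1, τ) is stored in ell[k][tau], and seg_score(a, b) plays the role of log ELR(θ, a, b) evaluated at the approximate optimum θ*; the default segment score (negative within-segment sum of squares, a level-change surrogate) and all names are our own assumptions for the sketch, not the paper's implementation.

```python
import numpy as np

def best_segmentation(y, m, h, seg_score=None):
    """Dynamic-programming search for the optimal break dates in the spirit
    of [5]: ell[k][tau] stores the best total score of (y_1, ..., y_tau)
    split into (k + 1) sub-series, each of length at least h.  seg_score(a, b)
    scores the segment y[a:b]; the default (negative within-segment sum of
    squares) is a stand-in for the log ELR of the segment."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    if seg_score is None:
        def seg_score(a, b):
            seg = y[a:b]
            return -np.sum((seg - seg.mean()) ** 2)
    ell = {0: {t: seg_score(0, t) for t in range(h, n + 1)}}
    arg = {}
    for k in range(1, m + 1):
        ell[k], arg[k] = {}, {}
        for tau in range((k + 1) * h, n + 1):
            # last break date t ranges over k*h <= t <= tau - h
            cand = [(ell[k - 1][t] + seg_score(t, tau), t)
                    for t in range(k * h, tau - h + 1)]
            ell[k][tau], arg[k][tau] = max(cand)
    # back-track the optimal dates for the whole series
    dates, tau = [], n
    for k in range(m, 0, -1):
        tau = arg[k][tau]
        dates.append(tau)
    return sorted(dates), ell[m][n]
```

Passing a log ELR-based seg_score recovers the empirical likelihood version; the tables ell[k] for k < m also provide directly the statistics needed when fewer than m breaks are allowed.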
Though this dynamic programming strategy involves only one-dimensional optimization problems like (6.6), it may be computationally heavy, especially if n is not small; an alternative is the search for nearly optimal solutions by means of a meta-heuristic method. Genetic algorithms have been successfully employed for similar problems, e.g. in building threshold autoregressive models [see 8, for a review].
The final problem is identifying the number of breaks, if any, given a maximum number m of allowable breaks. The simplest way is to compute the ELR test statistic for absence of breaks against the alternative of k breaks (k = 1, 2, ..., m) and select the most significant result, or to use an information criterion. Identification criteria such as AIC and BIC applied to the empirical likelihood are discussed, e.g., in [24] and [40]. Alternatively, [4] proposed a sequential method, denoted F(ℓ + 1|ℓ), to decide whether to accept the hypothesis of ℓ + 1 breaks against that of ℓ breaks. Essentially, in its simpler version, it amounts to considering the (ℓ + 1) optimal sub-series obtained under ℓ breaks, and testing each of them for the presence of an additional break.
Given the asymptotic equivalence between the F and ELR test statistics, such strategies may also be adopted for the empirical likelihood-based procedure. Practical recommendations are provided in [5]: rather than stopping the sequential procedure as soon as a test is not significant, they suggest computing the F(ℓ + 1|ℓ) statistic for sufficiently many values of ℓ, and selecting the number of breaks m so that the tests F(ℓ + 1|ℓ) are insignificant for all ℓ ≥ m.
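As a small sketch, that selection rule can be written as follows, taking as input the p-values of the sequential tests F(ℓ + 1|ℓ) for ℓ = 0, 1, ..., up to a chosen maximum (the function name and the p-value interface are our own simplifications):

```python
def select_num_breaks(pvalues, alpha=0.05):
    """pvalues[l] is the p-value of the sequential test F(l+1 | l) of
    (l + 1) breaks against l breaks.  Return the smallest m such that
    F(l+1 | l) is insignificant at level alpha for every l >= m, i.e.
    one plus the largest l whose test is significant (0 if none is)."""
    for l in range(len(pvalues) - 1, -1, -1):
        if pvalues[l] < alpha:
            return l + 1
    return 0
```

Note that an isolated insignificant test does not stop the search: a significant F(3|2), say, still yields m = 3 even if F(2|1) was insignificant, in line with the recommendation of [5].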

Simulations and applications
A simulation study was conducted to check the behavior of the two tests on finite length samples (n = 50, 100, 200). One thousand random samples were simulated from each of the following probability distributions: standard Gaussian N(0, 1), standardized Gamma distributions G(5, 1) and G(10, 1), exponential G(1, 1), and Student's t with different degrees of freedom, t(2), t(5), t(10).
The tests of the hypothesis of absence of breaks against the alternative of one level change were applied to the simulated samples with nominal sizes 0.10, 0.05 and 0.01. Since the essential findings were similar, only results for size 0.05 are presented. Moreover, since the two tests yielded practically identical results for series length n = 200, we report only findings for n = 50 and 100. The frequencies of rejection of H_0 when it is true, for a nominal size α = 0.05, are shown in Table 1. An overall observation is that the observed size of F tends to be slightly smaller, and that of ELR slightly larger, than 0.05. A moderate leptokurtosis, as in t(5) and t(10), does not modify this behavior, while the phenomenon is more visible for asymmetric distributions like G(5, 1) and G(10, 1), and especially for exponential data, though it is lighter for n = 100. Notably different are the results for t(2) data, where the ELR shows an exceptionally large frequency of rejection, indicating that the hypothesis of finite moments (this distribution does not have finite variance) is critical for convergence to the asymptotic distribution.
The difference between the F and ELR statistics in the case of absence of breaks is generally moderate and decreases rapidly as n increases. The Gaussian, G(5, 1), G(10, 1), t(5) and t(10) distributions give similar results, while the behavior is somewhat different, especially for Student's t with infinite variance t(2) and partly for the exponential G(1, 1), where ELR tends to be slightly larger than F.
In order to check the power of the tests, we inserted a level change at the midpoint of the simulated series and repeated the exercise. The power of the tests as a function of the break size (from 0.1 to 0.9 times the series standard error) is reported in Figure 1 for the Gaussian case and series length n = 50, showing a marginal advantage in power for the ELR test; when n = 100 the two lines are indistinguishable.
The results for a moderate break size of 0.5 are now described in more detail. The overall behavior is depicted in Figure 2, where the distribution of results for Gaussian data, in the cases of no break and one break, is presented for n = 100.
To make the graph more readable, we have computed histograms on 100 equal width classes, and plotted the class frequencies against the central class values.
The frequencies of rejection of the null hypothesis of absence of breaks are shown in Table 2. For n = 50 the ELR test has a slightly larger observed power, while for n = 100 the power of the two tests is essentially equivalent. In the level change case the differences between F and ELR tend to increase slightly: ELR is often a bit larger than F, especially for the exponential G(1, 1) and the t distributions with 2 and 5 degrees of freedom, rather than in the Gaussian case.
Finally, the distributions of the selected break dates according to the two tests appear similar; an example is Figure 3, concerning the case of t(10) data with n = 100 (here also the frequency of each histogram class is plotted against the central value of the class).
We have also simulated data according to two linear models (among those considered in [7]), with n = 100 and 1000 replications. The errors are Gaussian, with α = 10, β = 0.5 and a change in the parameter α at t = 50. The power of the two tests for break sizes from 0 to 0.9 in steps of 0.1 is shown in Figure 4. Here also the behavior of the two tests is similar, though the ELR test has an observed size slightly larger than the nominal 0.05.
The tests for partial structural change were also considered. Sets of one thousand replications of series generated according to Model 1, with breaks of different sizes in the first parameter, were simulated with n = 200, and the partial ELR and F tests of the hypothesis that the second parameter remains constant were applied. The experiment was repeated with series also containing a moderate increase in the second parameter at the break date. The results are reported in Table 3. Figures are relative frequencies of rejection of the hypothesis of constancy of the second parameter, computed only on the replications where the null hypothesis of absence of a complete structural change was rejected. The first rows (bs2 = 0.0) show the observed test size, which is always close to the nominal value 0.05, though a little larger for ELR and smaller for F. The second and third row sets relate to cases where the structural change is complete, since the break has an effect, though moderate, on the second parameter too; thus they denote observed power. The figures for ELR are generally slightly larger than those for F. An overall comment is that the results are not much influenced by the break size of the first parameter, implicitly indicating that its estimation bias is not serious.
We now present two applications of the break detection methods to real data. The two series are not satisfactorily described by linear models, since they exhibit non-linear or non-stationary features, possibly due to structural changes, and more complicated models have been proposed for them.
The first dataset is the help-wanted advertising index (monthly, from January 1960 to December 1996, n = 444) analyzed by [27], who fitted a smooth transition time-varying model with a structural change at December 1979. The same series was considered by [9], who proposed a non-linear non-stationary two-regime model of a similar type but with the change at April 1969. We tried a model with a constant, a linear trend T_t = (t/n) and two autoregressive terms at lags 1 and 2; the Bai and Perron and ELR procedures gave the same results, indicating three regimes with structural changes at t = 106 (May 1969) and t = 359 (June 1990). The sum of squared residuals is 0.766, slightly larger than for the model of [9] (0.649) and the model of [27] (0.691), both of which additionally allow for non-linearity. Figure 5 shows the data and the break dates; the first one nearly coincides with that of [9], while the second structural change is less clear-cut. The Bai and Perron procedure indicated a third possible break at October 1979 (similar to the break date proposed by [27]), but the F(3|2) statistic was not significant.
The second series contains the average annual temperatures at Innsbruck, 1803-2005. This dataset was analyzed by [10], who proposed an autoregressive model with linearly time-varying autoregressive coefficients and a regime change at year 1973. We fitted a simple linear model with a constant, a linear trend T_t = (t/n) and an autoregressive term at lag 1. Here also Bai and Perron and ELR suggest essentially the same structural changes: three breaks at t = 55 (t = 57 for ELR), 93 and 158 (years 1857, 1895, 1960). The data are shown in Figure 6. The residual variance of the resulting 3-break model is about 43, while the residual variance of the model in [10] is 46.7; it seems that their method privileges parsimony, as found also for the help-wanted data.

Conclusions
We have discussed how the empirical likelihood paradigm may be adopted for the problem of identifying and estimating structural changes in time series. The proposed empirical likelihood ratio test is shown to be asymptotically equivalent to the tests based on F-type and CUSUM statistics.
Therefore, strategies similar to those proposed by Bai and Perron [4] may be used for identifying the number of breaks and their dates with the empirical likelihood. In particular, a dynamic programming algorithm analogous to that proposed in [5] is introduced for detecting the optimal break dates.
When testing the null hypothesis of absence of breaks against the alternative of m breaks, the ELR test statistic is simply the sum of the ELR statistics computed on each of the (m + 1) sub-series having a different structure under the alternative. However, computing the required ELR test statistic may be computationally heavy, since a maximization with respect to the model parameters under the null hypothesis is needed, and for some models the solution is not close to the maximum likelihood, or least squares, estimates of the parameters, but must be obtained through an optimization algorithm.
We have also suggested how to use the empirical likelihood test in the case of partial structural breaks, when the possible change involves only some of the model parameters.
Experience on a set of simulated series of finite length suggests that the results of the two tests are equivalent for long series, and not very different for the smaller lengths n = 50 and 100, where the ELR test generally shows a slightly larger power, but also a slightly larger size than the nominal one. Finally, we applied the tests to some real series with non-linear or non-stationary features induced by structural changes, and found that both the ELR and F procedures are able to detect the breaks.
The basic idea of the present paper may in principle be applied to any model for which an empirical likelihood inference has been proposed. Thus, future research may be concerned with break detection in GARCH models using the results of [11], or in random coefficient INAR(p) processes based on [43]. Moreover, it may be possible to consider periodogram ordinates rather than the data themselves, extending to break detection the frequency domain empirical likelihood inference proposed by [28] for ARMA models, extended to long-memory ARFIMA models by [41] and further developed for more general models by [31,30,22].