Multiple breaks detection in general causal time series using penalized quasi-likelihood

This paper is devoted to the off-line multiple breaks detection for a general class of models. The observations are supposed to fit a parametric causal process (such as classical models AR(∞), ARCH(∞) or TARCH(∞)) with distinct parameters on multiple periods. The number and dates of breaks, and the different parameters on each period are estimated using a quasi-likelihood contrast penalized by the number of distinct periods. For a convenient choice of the regularization parameter in the penalty term, the consistency of the estimator is proved when the moment order r of the process satisfies r ≥ 2. If r ≥ 4, the length of each approximative segment tends to infinity at the same rate as the length of the true segment and the parameters estimators on each segment are asymptotically normal. Compared to the existing literature, we added the fact that a dependence is possible over distinct periods. To be robust to this dependence, the chosen regularization parameter in the penalty term is larger than the ones from BIC approach. We detail our results which notably improve the existing ones for the AR(∞), ARCH(∞) and TARCH(∞) models. For the practical applications (when n is not too large) we use a data-driven procedure based on the slope estimation to choose the penalty term. The procedure is implemented using the dynamic programming algorithm. It is an $O(n^2)$ complexity algorithm that we apply on AR(1), AR(2), GARCH(1, 1) and TARCH(1) processes and on the FTSE index data.


Introduction
Break detection is a classical problem in both the statistics and the signal processing communities. The first important result on this topic was obtained by Page [20] in 1955, and real advances were made during the seventies, notably with the results of Hinkley (see for instance [12]). Break detection then became a distinct and important area of research in statistics (see the book of Basseville and Nikiforov [3] for a broad overview).
Two approaches are generally considered for solving a break detection problem: an 'on-line' approach leading to sequential estimation and an 'off-line' approach when the series of observations is complete. Concerning the latter, numerous results were obtained for independent random variables in a parametric frame (see for instance Bai and Perron [1]). The off-line detection of multiple change-points in a parametric or semiparametric frame for dependent variables or time series has also generated an important literature. The present paper is a new contribution to this problem.
In the problem of change-point detection, numerous papers were devoted to the CUSUM procedure (see for instance Kokoszka and Leipus [14] in the specific case of ARCH(∞) processes). In Lavielle and Ludena [17] a "Whittle" contrast is used for estimating the break dates in the spectral density of piecewise long-memory processes (in a semi-parametric framework). Davis et al. [6] proposed a likelihood ratio as the estimator of breaks for an AR(p) process. Lavielle and Moulines [18] consider a general contrast using the mean square errors for estimating the parameters. In Davis et al. [7], the criterion called Minimum Description Length (MDL) is applied to a large class of nonlinear time series.
We consider here a semiparametric estimator based on a penalized contrast (called penQLIK in the sequel) using the quasi-likelihood function. For usual stationary time series, the conditional quasi-likelihood (called QLIK in the sequel) is constructed as follows:
1. Assume the process (ξ t ) t∈Z is a Gaussian sequence and compute the conditional likelihood (with respect to σ{X 0 , X −1 , . . .}) based on the unobservable infinite realization of (X t ) t∈Z ;
2. Approximate this computation for a sample (X 1 , . . . , X n );
3. Apply this approximation even if the process of the innovations is not a Gaussian sequence.
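The three steps above can be illustrated on a GARCH(1, 1) process. The following is only a minimal sketch, not the implementation used in the paper: the function name `garch11_qlik` and the initialization of the unobserved past by zeros are assumptions made here for illustration, and the additive constant of the Gaussian density is dropped. It returns the Gaussian conditional quasi-log-likelihood $-\frac12\sum_t q_t$ with $q_t = X_t^2/h_t + \log h_t$.

```python
import math


def garch11_qlik(x, a0, a1, b1):
    """Gaussian conditional quasi-log-likelihood of a GARCH(1,1) sample.

    Assumption of this sketch: the unobserved past is replaced by zeros
    (previous observation and previous conditional variance both set to 0),
    which is one simple way to carry out the approximation of step 2.
    """
    h_prev, x_prev = 0.0, 0.0
    total = 0.0
    for xt in x:
        # conditional variance h_t = a0 + a1 * X_{t-1}^2 + b1 * h_{t-1}
        h = a0 + a1 * x_prev ** 2 + b1 * h_prev
        # Gaussian contribution q_t = X_t^2 / h_t + log(h_t)
        total += xt ** 2 / h + math.log(h)
        h_prev, x_prev = h, xt
    # step 3: the same formula is used even if the innovations are not Gaussian
    return -0.5 * total
```

A contrast to be minimized, as in the penalized criterion, is obtained by changing the sign (for instance taking $-2$ times the returned value).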
The quasi-maximum likelihood estimator (QMLE) obtained by maximizing the QLIK has convincing asymptotic properties in the case of GARCH processes (see Jeantheau [13], Berkes et al. [5], Francq and Zakoïan [10]) or generalizations of GARCH processes (see Mikosch and Straumann [22], Robinson and Zaffaroni [21]). Bardet and Wintenberger [2] study the asymptotic normality of the QMLE of θ applied to the class of models considered here. Thus, when $K^*$ is known, a natural estimator of the parameter $(t^*, \theta^*) = ((t^*_j)_{1\le j\le K^*-1}, (\theta^*_j)_{1\le j\le K^*})$ for a process satisfying (1) is the QMLE computed over every interval $\{t_j+1, \ldots, t_{j+1}\}$ and every parameter $\theta_j$ for $1 \le j \le K^*$. However, we consider here that $K^*$ is unknown, so that such a method cannot be directly used. The chosen solution is to penalize the contrast by an additional term $\kappa_n K$, where the regularization parameters $\kappa_n$ form an increasing sequence of real numbers (see the final expression of the penalized contrast in (4)). Such a penalization procedure was previously used for instance by Yao [23] to estimate the number of change-points with the Schwarz criterion and by Lavielle and Moulines [18]. Hence the minimization of the penalized contrast leads to an estimator (see (5)) of the parameters $(K^*, t^*, \theta^*)$.
Classical heuristics such as BIC lead to the choice $\kappa_n \propto \log n$. In our study, such penalty terms are excluded in some cases, namely when the models in (1) depend strongly on their whole past; see Section 3 (and the simulation results) for more details. Roughly speaking, an explanation of this can be provided by the simple relation
$$\mathrm{penQLIK}(K,t,\theta) = \mathrm{QLIK}(K,t,\theta) + \kappa_n K = \mathrm{QLIK}(K,t,\theta) - \widetilde{\mathrm{QLIK}}(K,t,\theta) + \widetilde{\mathrm{QLIK}}(K,t,\theta) + \kappa_n K,$$
where $\widetilde{\mathrm{QLIK}}$ is the conditional quasi-likelihood of a process following (1) except that it is composed of stationary time series on each period which are independent of the stationary processes defined on the other periods. Using moment bounds we will prove in Section 6 that $\mathrm{QLIK}(K,t,\theta) - \widetilde{\mathrm{QLIK}}(K,t,\theta) = O_P(u_n)$ with $u_n \to \infty$ and $u_n/n \to 0$, where $(u_n)_{n\in\mathbb{N}}$ depends on the Lipschitzian behavior of $g_\theta$. Since $\widetilde{\mathrm{QLIK}}(K,t,\theta) \sim C\,n$ a.s. when $n \to \infty$ from results obtained in [2], the penalty term can play a role only if $\kappa_n \gg u_n$. Finally, we will show that under weak conditions on the model, the regularization parameter $\kappa_n \propto \sqrt{n}$ over-penalizes the number of breaks, which avoids artificial breaks for models that depend strongly on their whole past (see Section 3 for details). Such a choice of $\kappa_n$ is robust to the (possibly strong) dependence.
The main results of the paper are the following: under a Lipschitzian condition on $g_\theta$ and when the moments of order $r \ge 2$ of the innovations and of X are finite, the estimator $\big(\hat K_n, (\hat t_j/n)_{1\le j\le \hat K_n-1}, (\hat\theta_j)_{1\le j\le \hat K_n}\big)$ is consistent. If moreover Lipschitzian conditions are also satisfied by the derivatives of $g_\theta$ and if $r \ge 4$, then the convergence rate of $(\hat t_j/n)_{1\le j\le \hat K_n-1}$ is $O_P(w_n)$ for any sequence $(w_n)_n$ such that $w_n \gg n^{-1}$, and a Central Limit Theorem (CLT) for $(\hat\theta_j)_{1\le j\le \hat K_n}$ with a $\sqrt{n}$-convergence rate is established. These results are "optimal" in the sense that the convergence rate is the same as in an independent setting.
After detailing the particular cases of AR(∞), ARCH(∞) and TARCH(∞) satisfying the break-point problem (1), the estimator is applied to generated trajectories of such time series. Two difficulties appeared. Firstly, the computation time was very long and increased exponentially with K. We solved this problem by using a dynamic programming algorithm, which is an $O(n^2)$ complexity algorithm. We also considered only trajectories of small length (n ≤ 2000). Secondly, we obtained the consistency of the estimator of $K^*$ as a theoretical result in all considered models, regardless of their dependence properties, when $\kappa_n \propto \sqrt{n}$ as $n \to \infty$. We will see that for particular models such as ARMA(p, q) or GARCH(p, q) a BIC-type penalty with $\kappa_n \propto \log n$ is also possible, but $\kappa_n \propto \sqrt{n}$ ensures the convergence for a larger class of models (including AR(∞), ARCH(∞) or TARCH(∞) processes).
However, for n not too large (for instance n = 1000) the choice of $\kappa_n = \sqrt{n}$ very often led to $\hat K_n \ne K^*$. Hence we chose to implement a data-driven procedure for estimating $\kappa_n$ (the estimate is denoted $\hat\kappa_n$ in the sequel) using a slope estimation method (see [4]); such a procedure is nowadays often used in the model selection frame. In this way, the results of simulations are clearly satisfying (see Section 5). The estimation procedure is also applied to financial data, and it provides estimated break dates corresponding to key dates of the financial crisis.
The following Section 2 is devoted to the assumptions and the study of the existence of a nonstationary solution of the change point problem (1). The definition of the estimator and its asymptotic properties are studied in Section 3.
The particular examples of AR(∞), ARCH(∞) and TARCH(∞) processes are detailed in Section 4, while the concrete estimation procedure and numerical applications are presented in Section 5. Finally, Section 6 contains the main proofs.

Notation and assumptions
Let θ ∈ R d and let M θ and f θ be real-valued measurable functions such that for all (x i ) i∈N ∈ R N , M θ ((x i ) i∈N ) ≠ 0. In this paper, we consider a general class M T (M θ , f θ ) of causal (non-anticipative) time series. Let T ⊂ Z and (ξ t ) t∈Z be a sequence of centered independent and identically distributed (iid) random variables called the innovations and satisfying var(ξ 0 ) = 1. Define The process X = (X t ) t∈Z belongs to M T (M θ , f θ ) if it satisfies the relation: The existence and properties of these general affine processes were studied in Bardet and Wintenberger [2] as a particular case of chains with infinite memory considered in Doukhan and Wintenberger [8]. Numerous classical real-valued time series are included in M Z (M, f ): for instance AR(∞), ARCH(∞), TARCH(∞), ARMA-GARCH or bilinear processes.
For obtaining conditions of existence of a process included in M T (M θ , f θ ) first define the following different norms: 1. · applied to a vector denotes the Euclidean norm of the vector; 2. for any compact set K ⊆ R d and for any g : Let Ψ θ = M θ , f θ and i = 0, 1, 2, then for any compact set K ⊆ R d , define In the sequel we refer to the particular case called "ARCH-type process", if f θ = 0 and the following assumption holds on h θ = M 2 θ : Now, for any i = 0, 1, 2 and any compact K ⊂ R d , under Assumptions A i (f θ , K) and A i (M θ , K), denote:

and under Assumption
The dependence with respect to r of the coefficients β (i) andβ (i) are omitted for notational convenience. From now on let us fix Θ a compact subset of R d satisfying some contraction properties: From [2] we have: which is stationary, ergodic and satisfies X 0 r < ∞.
The assumption A is classical when studying the existence of stationary solutions of general models. For instance, Duflo [9] used such a Lipschitz-type inequality to show the existence of Markov chains. The elements of the compact set Θ satisfy one Lipschitz-type condition, specified either for general causal models or for ARCH-type models. This distinction is adequate since for ARCH-type models A 0 (h θ , {θ}) is less restrictive than A 0 (M θ , {θ}). Remark that the assumption $\tilde\beta^{(0)}(\theta) < 1$ is optimal for the stationarity of order r ≥ 1 but not for the strict stationarity of the solution of an ARCH-type model. Let θ ∈ Θ and let X = (X t ) t∈Z be a stationary solution included in M Z (f θ , M θ ). For studying QMLE properties, it is convenient to make the following assumptions: Assumption Var(Θ): For all θ ∈ Θ, one of the families $\big(\frac{\partial f_\theta}{\partial \theta_i}(X_0, X_{-1}, \ldots)\big)_{1\le i\le d}$ or $\big(\frac{\partial h_\theta}{\partial \theta_i}(X_0, X_{-1}, \ldots)\big)_{1\le i\le d}$ is a.e. linearly independent. Assumption D(Θ) will be required to define the QMLE, Id(Θ) to show the consistency of the QMLE and Var(Θ) to show the asymptotic normality.
The following proposition establishes the existence of the non stationary solution of the problem (3) and its moments properties.
Proposition 2.2. Consider the problem (3) with $\theta^*_j \in \Theta$ for all $j = 1, \ldots, K^*$, where Θ satisfies A for some r ≥ 1. Then (i) there exists a solution $X = (X_t)_{t\in\mathbb{Z}}$ of the model (3) and X is a causal time series; (ii) there exists a constant C > 0 such that for all $t \in \mathbb{Z}$ we have $\|X_t\|_r \le C$.
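To make the setting of Proposition 2.2 concrete, the sketch below simulates a piecewise AR(1) trajectory in which each new period starts from the last value of the previous one, so that the periods are not independent of each other. This is an illustrative assumption-laden example, not the paper's code: the function name, the Gaussian innovations and the initialization $X_0 = 0$ are all choices made here.

```python
import random


def simulate_piecewise_ar1(n, breaks, phis, seed=0):
    """Simulate X_t = phi_j * X_{t-1} + xi_t on period j, for t = 1..n.

    breaks: increasing break instants t_1 < ... < t_{K-1} (period j ends at t_j).
    phis:   the K autoregressive coefficients, one per period.
    The recursion is never restarted at a break, so the past of one period
    feeds into the next one (dependence across periods).
    Assumptions of this sketch: standard Gaussian innovations and X_0 = 0.
    """
    rng = random.Random(seed)
    bounds = list(breaks) + [n]
    x, prev, j = [], 0.0, 0
    for t in range(1, n + 1):
        if t > bounds[j]:
            j += 1  # move on to the next period, keeping the past value
        prev = phis[j] * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


# Hypothetical example: two breaks at t = 150 and t = 350, n = 500
sample = simulate_piecewise_ar1(500, [150, 350], [0.5, -0.3, 0.5])
```

With $|\phi_j| < 1$ on every period the recursion stays bounded in $L^r$ whenever the innovations have a finite moment of order r, which is the kind of uniform moment bound stated in (ii).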
The problem (3) distinguishes the case t ∈ T * 1 = {1, . . . , t * 1 } from the others, since it is easy to see that (X t ) t∈T * 1 is a stationary process while (X t ) t>t * 1 is not. However, all the results of this paper hold if (X t ) t∈T * 1 is defined like the other (X t ) t∈T * j , j ≥ 2 (for instance by defining a break at t = 0 and setting X t = 0 for t ≤ 0). Now, for any number of periods K ≥ 1, any instants of breaks t ∈ F K and any parameters on each period θ ∈ Θ K , the global QLIK contrast J n is given by the expression: Since K * has to be estimated, define the QLIK contrast penalized by the number of periods, called penQLIK contrast, by where κ n ≤ n is called the regularization parameter and will be fixed later.
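Suppose that an upper bound $K_{\max} > 0$ of the number of periods is known. Our estimator is defined as one of the minimizers of the penalized contrast:
$$(\hat K_n, \hat t_n, \hat\theta_n) \in \operatorname*{Argmin}_{1\le K\le K_{\max}}\ \operatorname*{Argmin}_{(t,\theta)\in F_K\times\Theta^K} \widetilde J_n(K, t, \theta) \quad\text{and}\quad \hat\tau_n = \frac{\hat t_n}{n}.$$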
The assumption H i is interesting as it links the decay rate of the Lipschitz coefficients and the penalty term of (4). The classical BIC corresponds to regularization parameters of the order of log(n). This choice is possible if the Lipschitz coefficients decrease exponentially fast, which holds for all models M(f θ , M θ ) with finite order (see below). However, if the decay of the Lipschitz coefficients is polynomial, only regularization parameters satisfying $\kappa_n \gg \log(n)$ satisfy H i . Moreover, whatever the decay of the Lipschitzian coefficients, the estimation is more robust (with respect to the dependence over distinct segments) for the largest regularization parameter. More precisely, consider the following two paradigmatic examples for which (κ n ) satisfies conditions (7) (also used in [16]): with 0 ≤ a < 1, then any choice of regularization parameters (κ n ) such that κ n → ∞ and κ n = o(n) satisfies (7) (for instance κ n of order log n as in the BIC approach).
Remark 3.2. The sequence (δ n ) with δ n := nc * / log n appearing in (7) is the size of "small" blocks that are excluded from the original observations to deal with the possible dependence between periods. It is the theoretical size below which we do not distinguish the breaks due to the dependence. This size depends on the real model and is unknown.

Consistency of the estimators
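For establishing the consistency, we add the following two classical assumptions of the break detection problem: Assumption B: $\min_{j=1,\ldots,K^*-1} \|\theta^*_{j+1} - \theta^*_j\| > 0$. Furthermore, the distance between instants of breaks cannot be too small: Assumption C: there exists a vector $\tau^* = (\tau^*_1, \ldots, \tau^*_{K^*-1})$ with $0 < \tau^*_1 < \cdots < \tau^*_{K^*-1} < 1$ such that $t^*_j = [n\tau^*_j]$ for $j = 1, \ldots, K^*-1$ (where $[x]$ is the floor of x). The vector $\tau^*$ is called the vector of breaks.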
Even if the length of $T^*_j$ has asymptotically the same order as n, the dependence on n of $t^*_j$, $t_k$, $T^*_j$ and $T_k$ is omitted for notational convenience.
The assumption C implies that the length of each segment tends to infinity at the same rate as n. We will introduce a size $u_n \ll n$ which represents the lower bound on the accuracy of the approximation of the lengths of the segments. This minimum size is needed for the numerical computation of the criterion. For the ARMA and GARCH models, $u_n = O((\log n)^{\delta})$ can be chosen for $1 \le \delta \le 2$.
We are now ready to state the consistency of the penalized QLIK estimator:
Theorem 3.1. Assume that D(Θ), Id(Θ), B, C and H 0 are satisfied with r ≥ 2. If $K_{\max} \ge K^*$ then:
$$(\hat K_n, \hat\tau_n, \hat\theta_n) \xrightarrow[n\to\infty]{P} (K^*, \tau^*, \theta^*).$$
Note that if $K^*$ is known, we can relax the assumptions for the consistency by taking $\kappa_n = 1$ for all n, as the penalty term in (4) does not matter. If $K^*$ is unknown and r = 2, then a choice that is robust to any geometric or Riemannian dependence is $\kappa_n \propto n/\log n$. However, such large regularization parameters always over-penalize in practice.
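To get a feeling for how differently the candidate regularization parameters scale at the sample sizes used in practice, one can simply tabulate them. The snippet below is only an illustrative sketch (the helper name `penalties` and the proportionality constants, all set to 1, are assumptions made here).

```python
import math


def penalties(n):
    """Three candidate regularization parameters, with constants set to 1."""
    return {
        "log n": math.log(n),          # BIC-type choice
        "sqrt n": math.sqrt(n),        # robust choice discussed in the text
        "n / log n": n / math.log(n),  # choice robust to Riemannian decay
    }


for n in (500, 1000, 2000):
    print(n, {name: round(value, 1) for name, value in penalties(n).items()})
```

For these values of n the three penalties are ordered as log n < √n < n/log n, which is consistent with the remark that the largest choices tend to over-penalize on samples of moderate size.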

Rates of convergence of the estimators
To state the rates of convergence of the estimators $\hat\tau_n$ and $\hat\theta_n$, we need to work under stronger moment and regularity assumptions. By convention, if the vectors $\hat t_n$ and $t^*$ do not have the same length, complete the shorter of the two vectors with n before computing the norm $\|\hat t_n - t^*\|_m$.
This theorem implies that $w_n^{-1}\,\|\hat t_n - t^*\|_m \xrightarrow{P} 0$ for any sequence $(w_n)_n$ such that $w_n \to \infty$, and therefore $\|\hat t_n - t^*\|_m = o_P(w_n)$: the convergence rate is arbitrarily close to $O_P(1)$. This is the same convergence rate as in the case where $(X_t)_t$ is a sequence of independent r.v. (see for instance [1]). Such a convergence rate was already reached in the frame of piecewise linear regression with innovations satisfying a mixing property in [18].
Let us turn now to the convergence rate of the estimator of parameters θ * j . By convention if K n < K * , set T j = T Kn for j ∈ { K n , . . . , K * }. Then, Theorem 3.3. Assume that D(Θ), Id(Θ), B, C and H 2 are satisfied with r ≥ 4 where, using q 0,j defined in (12), the matrix F and G are such that In Theorem 3.3, a condition on the rate of convergence of κ n is added. The most robust choice for the regularization parameter corresponds to κ n ∝ √ n as it corresponds to the most general problem (3) (see above). However, by assumption H 2 it excludes models with finite moments r ≥ 4 satisfying: ℓ (h θ , Θ)) with 1 < γ ≤ 3/2 for some i = 0, 1, 2. For these models the consistency for τ n holds but we do not get any rate of convergence for θ n .

AR(∞) models
Consider AR(∞) with K * − 1 breaks defined by the equation: the choice κ n = n/ log n ensures the strong consistency of ( K n , τ n , θ n ).
, the choice κ n = √ n ensures the convergence (8) of t n and the CLT (9) satisfied by θ n ( T j ) for all j.
Note that this problem of change detection was considered by Davis et al. in [6] but moments r > 4 are required. In Davis et al. [7], the same problem for another break model for AR processes is studied. However, in both these papers, the process is supposed to be independent from one block to another and stationary on each block.

ARCH(∞) models
Consider an ARCH(∞) model with K * − 1 breaks defined by: where (ψ k (θ)) k≥0 is a sequence of positive real numbers and E( the choice of κ n = n/ log n leads to the consistency of ( K n , τ n , θ n ) when θ * j ∈ Θ for all j.
, then the choice of κ n = √ n ensures the convergence (8) and the CLT (9) satisfied by θ n ( T j ) for all j.
This problem of break detection was already studied by Kokoszka and Leipus in [14] but they obtained the consistency of their procedure under stronger assumptions.
Example 1. Let us detail the GARCH(p, q) model with $K^* - 1$ breaks defined by: Remark that this sequence is twice differentiable with respect to θ and that its derivatives are exponentially decreasing. Moreover for any Then if $\sum_{k=1}^{q} a^*_{k,j} + \sum_{k=1}^{p} b^*_{k,j} < 1$ for all j (case r ≥ 2), our estimation procedure associated with a penalty term $\kappa_n K$ for any $1 \ll \kappa_n \ll n$ is consistent. Moreover, if $(E|\xi_0|^4)^{1/2} \sum_{k=1}^{q} a^*_{k,j} + \sum_{k=1}^{p} b^*_{k,j} < 1$ for all j, then our procedure with a regularization parameter $1 \ll \kappa_n = O(\sqrt{n})$ allows the same rates of convergence as in the case where $(X_t)$ are independent random variables. For example, a BIC-type regularization parameter $\kappa_n \propto \log n$ as in [7] can be chosen in this case.
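The two conditions of Example 1 are easy to check numerically for a given parameter vector. The helper below is a hypothetical illustration (its name and signature are not from the paper); it evaluates $\sum_k a_k + \sum_k b_k < 1$ when no fourth moment is supplied, and $(E|\xi_0|^4)^{1/2}\sum_k a_k + \sum_k b_k < 1$ otherwise.

```python
def garch_moment_condition(a, b, xi_fourth_moment=None):
    """Check the sufficient condition of Example 1 on one period.

    a: ARCH coefficients (a_1, ..., a_q); b: GARCH coefficients (b_1, ..., b_p).
    xi_fourth_moment: value of E|xi_0|^4 for the r >= 4 condition,
                      or None for the r >= 2 condition.
    """
    scale = 1.0 if xi_fourth_moment is None else xi_fourth_moment ** 0.5
    return scale * sum(a) + sum(b) < 1.0


# Parameters (a0, a1, b1) = (0.4, 0.1, 0.5); a0 does not enter the condition.
print(garch_moment_condition([0.1], [0.5]))                        # r >= 2 case
print(garch_moment_condition([0.1], [0.5], xi_fourth_moment=3.0))  # Gaussian xi_0
```

For standard Gaussian innovations $E|\xi_0|^4 = 3$, so the second condition is noticeably more restrictive on the ARCH coefficients than the first.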
To our knowledge, these results are the first ones concerning change detection for TARCH(∞) processes.

Some simulations results
The procedure is implemented in the R software (available from CRAN). Since we work with samples that are not very large (n ≤ 2000), the consistency of $\hat K_n$ is often not obtained for the most robust theoretical choice $\kappa_n = \sqrt{n}$. As a consequence, for numerical applications, we chose a data-driven procedure for computing the regularization parameter: $\hat\kappa_n$ is calibrated using the slope estimation procedure of Baudry et al. [4]. Once the regularization parameter is obtained, the dynamic programming algorithm (see [15]) is used to minimize the criterion. Remark that we could also use the genetic algorithm and the approximated likelihood of [7] to speed up the procedure.

The slope estimation procedure
The heuristic of the procedure is that the criterion (here QLIK) is a linear function of the penalty (here the number of periods K) for the most complex models (with K close to $K_{\max}$). The slope of this linear part should be close to $-\kappa_n/2$. This procedure has already been used in [4] for break detection in an i.i.d. context. We adapt it to the case of dependence (details are omitted for the part common with the i.i.d. case, and we refer the interested reader to [4]).
By construction, the procedure is very sensitive to the choice of $K_{\max}$, as only complex models are used to estimate the slope. As discussed in Remark 3.3, we only consider periods of length larger than $u_n$ and we can a priori fix $K_{\max}$ smaller than $[n/u_n]$. Therefore, the slope estimation procedure considers only the linear part of −QLIK with $K \le K_{\max}$. The concrete procedure (see examples below) is:
1. Draw $(K, -\min_{t,\theta} \mathrm{QLIK}(K))_{1\le K\le K_{\max}}$. Then compute the slope of the linear part: this slope is $\hat\kappa_n/2$.
2. Using $\kappa_n = \hat\kappa_n$, draw $(K, \min_{t,\theta} \mathrm{penQLIK}(K))_{1\le K\le K_{\max}}$. This curve has a global minimum at $\hat K_n$.
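The slope estimation can be sketched in a few lines once the minimal QLIK values are available for each K. The code below is a simplified illustration, not the procedure of [4]: the function names are hypothetical, and the "linear part" is taken, by assumption, to be the last `n_tail` values of K, whereas in practice it is read from the plot.

```python
def slope_kappa(qlik_min, n_tail):
    """Estimate the regularization parameter from the linear part of -QLIK.

    qlik_min[K - 1] is the minimum over (t, theta) of QLIK with K periods,
    for K = 1..K_max. A least-squares line is fitted to the points
    (K, -qlik_min[K - 1]) for the n_tail largest values of K (assumed to be
    the linear part); the function returns twice the fitted slope.
    """
    k_max = len(qlik_min)
    ks = list(range(k_max - n_tail + 1, k_max + 1))
    ys = [-qlik_min[k - 1] for k in ks]
    k_bar = sum(ks) / n_tail
    y_bar = sum(ys) / n_tail
    num = sum((k - k_bar) * (y - y_bar) for k, y in zip(ks, ys))
    den = sum((k - k_bar) ** 2 for k in ks)
    return 2.0 * num / den


def select_k(qlik_min, kappa):
    """Number of periods minimizing penQLIK(K) = QLIK(K) + kappa * K."""
    pen = [q + kappa * (k + 1) for k, q in enumerate(qlik_min)]
    return pen.index(min(pen)) + 1


# Synthetic values: large gains up to K = 3, then a linear decrease of 5 per period
qlik = [1000.0, 800.0, 600.0, 595.0, 590.0, 585.0, 580.0, 575.0, 570.0, 565.0]
kappa_hat = slope_kappa(qlik, n_tail=5)
k_hat = select_k(qlik, kappa_hat)
```

On this synthetic curve the fitted slope of −QLIK is 5, so the estimated parameter is 10 and the penalized curve is minimized at three periods.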

Implementation details
We assume that the regularization parameter is known (for instance $\kappa_n = \hat\kappa_n$, $\kappa_n = \log n$ or $\kappa_n = \sqrt{n}$). In this section, we give more details on how to compute $\hat K_n$ and the optimal configuration of the breaks by using the dynamic programming algorithm. The basic idea of this algorithm is that: for a given . . , l} and let ML be the upper triangular matrix of dimension n×n with $ML_{i,l} = L(T_{i,l}, \hat\theta_n(T_{i,l}))$ for i ≤ l. The estimated number of segments $\hat K_n$ and the corresponding optimal configuration can be obtained as follows: 1. Let C be an upper triangular matrix of dimension $K_{\max} \times n$. For $1 \le K \le K_{\max}$ and $K \le t \le n$, $C_{K,t}$ will be the minimum penalized criterion of . . , $K_{\max} - 1$ and the break-points are obtained as follows: set $\hat t_{\hat K_n} = n$, $\hat t_1 = 1$ and for Note that the above procedure requires $O(n^2)$ operations, instead of $O(n^{K_{\max}})$ if the standard procedure is used.
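A generic version of this dynamic programming recursion can be sketched as follows. It is not the authors' R implementation: the function names are hypothetical, the segment cost is passed as an arbitrary function, and, purely for illustration, the example uses a least-squares cost around the segment mean as a stand-in for the quasi-likelihood contrast of a segment.

```python
def best_segmentation(cost, n, k_max, kappa):
    """Minimize sum of segment costs + kappa * K over K <= k_max segments.

    cost(i, l) is the contrast of the segment {i, ..., l} (1-indexed, inclusive).
    Returns (k_hat, breaks) where breaks are the last instants of the first
    k_hat - 1 segments.
    """
    inf = float("inf")
    # C[K][t]: minimal total cost of splitting {1..t} into exactly K segments
    C = [[inf] * (n + 1) for _ in range(k_max + 1)]
    back = [[0] * (n + 1) for _ in range(k_max + 1)]
    C[0][0] = 0.0
    for K in range(1, k_max + 1):
        for t in range(K, n + 1):
            for s in range(K - 1, t):  # s = end of the (K-1)-th segment
                c = C[K - 1][s] + cost(s + 1, t)
                if c < C[K][t]:
                    C[K][t], back[K][t] = c, s
    # add the penalty kappa * K and pick the best number of segments
    k_hat = min(range(1, k_max + 1), key=lambda K: C[K][n] + kappa * K)
    breaks, t = [], n
    for K in range(k_hat, 1, -1):  # backtrack from t = n
        t = back[K][t]
        breaks.append(t)
    return k_hat, breaks[::-1]


# Stand-in cost (an assumption of this example): squared deviations from the mean
x = [0.0] * 5 + [10.0] * 5


def sse(i, l):
    seg = x[i - 1:l]
    m = sum(seg) / len(seg)
    return sum((v - m) ** 2 for v in seg)


print(best_segmentation(sse, n=len(x), k_max=3, kappa=1.0))
```

The triple loop evaluates the segment cost of order $K_{\max} n^2$ times, i.e. $O(n^2)$ for a fixed $K_{\max}$, instead of enumerating all configurations of breaks; in practice the costs would be precomputed once in the matrix of segment contrasts.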
Remark 5.1. The minimum description length (MDL in the sequel) criterion (see [7]) is defined in our setting by: We can also write: Hence, the MDL criterion can be seen as a penalized criterion and the dynamic programming algorithm described above can be used to find the optimal configuration.
The regularization parameter is chosen by using the slope estimation presented above (Subsection 5.1). Figure 1 represents the slope of the linear part of the −QLIK criterion (minimized in (t, θ)) in scenario $A_4$ for n = 500 and n = 1000. Thus, referring to Figure 1, we obtain $\hat\kappa_n \approx 7.0$ for n = 500 and $\hat\kappa_n \approx 9.8$ for n = 1000. We then minimize the penQLIK in (K, t, θ), with $1 \le K \le K_{\max}$ and $\kappa_n = \hat\kappa_n$. Figure 2 represents the points $(K, \min_{t,\theta} \mathrm{penQLIK}(K))$ for $1 \le K \le K_{\max} = 10$.
One can easily read on Figure 2 the estimated values $\hat K_n = 4$ for n = 500 and $\hat K_n = 3$ for n = 1000 (the estimated number of breaks is $\hat K_n - 1$). Moreover, the estimated instants of break are $\hat t_n = (146, 228, 357)$ ($t^* = (150, 350)$) for n = 500 and $\hat t_n = (282, 687)$ ($t^* = (300, 700)$) for n = 1000. Figure 3 shows the estimated break points for the trajectories (n = 500 and n = 1000) of AR(1) processes following scenario $A_4$ with two changes. Now, 100 independent replications of an AR(1) process are generated following the scenarios $A_0$-$A_4$. For each replication, the estimated number of segments is computed using the QLIK criterion with $\kappa_n = \hat\kappa_n$, $\kappa_n = \log n$, $\kappa_n = \sqrt{n}$ and using the MDL procedure. Table 1 indicates the proportions of replications (frequencies) for which the true number of breaks is recovered under the scenarios $A_0$-$A_4$. For the replications of scenario $A_4$ where the true number of breaks is fitted ($\hat K_n = 3$), the averages of the estimated parameters are computed and shown in Table 2.
AR(2) models: we consider the problem (3) for an AR(2): Denote $\theta^*(j) = (\phi^*_1(j), \phi^*_2(j))$. For n = 500 and n = 1000, we generate a sample $(X_1, \ldots, X_n)$ in the following situations:  100 independent replications of an AR(2) process are generated following the scenarios $B_0$-$B_3$. The performance of the procedure using the QLIK criterion with $\kappa_n = \hat\kappa_n$, $\kappa_n = \log n$, $\kappa_n = \sqrt{n}$ is evaluated, together with that of the MDL procedure. Table 3 indicates the proportions of replications (frequencies) for which the true number of breaks is recovered under the scenarios $B_0$-$B_3$.
GARCH(1,1) models: we consider examples of problem (3) when X is a GARCH(1, 1) process on each period:   Thus $\theta^* = (a^*_0, a^*_1, b^*_1)$. For n = 500 and n = 1000, we generate $(X_1, \ldots, X_n)$ in the following situation:  shows an example of scenario $G_4$ where one break is fitted with $\hat\kappa_n \approx 12.7$ for n = 500 and two breaks with $\hat\kappa_n \approx 18.3$ for n = 1000; we obtain $\hat t_n = 168$ (while $t^* = (150, 350)$) for n = 500 and $\hat t_n = (307, 725)$ (while $t^* = (300, 700)$) for n = 1000. Now, 100 independent replications of GARCH(1, 1) processes are generated following the scenarios $G_0$-$G_4$. For each replication, the estimated number of segments is computed using the QLIK criterion with $\kappa_n = \hat\kappa_n$, $\kappa_n = \log n$, $\kappa_n = \sqrt{n}$ and using the MDL procedure. Table 4 indicates the proportions of replications (frequencies) for which the true number of breaks is recovered under the scenarios $G_0$-$G_4$. For the replications of the scenario $G_2$ where the true number of breaks is fitted ($\hat K_n = 2 = K^*$), the averages of the estimated parameters are computed and shown in Table 5.
Finally, recall that in [7] the process is stationary on each segment and assumed to be independent from one segment to another. Davis et al. (2008) used the genetic algorithm to approximate the optimal values of the MDL criterion. We consider three of their scenarios with n = 1000 for GARCH(1, 1) processes:
• scenario A: $\theta^*(1) = (0.4, 0.1, 0.5)$ is constant ($K^* = 1$);

Table 4
Frequencies of the number of breaks estimated after 100 replications for GARCH(1,1) processes following the scenarios $G_0$-$G_4$

Table 6 shows the results obtained with our penQLIK method with $\kappa_n = \hat\kappa_n$, $\kappa_n = \log n$, $\kappa_n = \sqrt{n}$ and the results of the MDL procedure (obtained after 500 replications) taken from Table I of [7].

Table 5
The estimated parameters for the replications of GARCH(1, 1) processes following the scenario $G_2$ and satisfying $\hat K_n = 2 = K^*$ (one break fitted)

Table 6
Frequencies of the number of breaks estimated after 100 replications for GARCH(1,1) processes with n = 1000 following the scenarios A, C and J of Davis et al. (2008) [7]. The results of the MDL procedure were taken from Table I of [7].

Conclusion of simulations for AR(1), AR(2) and GARCH(1, 1) processes: The results of the QLIK criterion with the $\hat\kappa_n$ and $\sqrt{n}$ penalties show that the probability $P(\hat K_n = K^*)$ increases as n increases in all scenarios, as can be deduced from the theory. This is not the case for the log n penalty (see for instance the scenario $A_2$). Comparing the results of scenarios $A_1$ and $A_2$ (or scenarios $A_3$ and $A_4$), the BIC penalty ($\kappa_n = \log n$) under-penalizes the number of breaks when the process is sufficiently dependent on its own past. The more dependent the process, the larger the probability of fitting the true number of breaks with the $\sqrt{n}$ or $\hat\kappa_n$ penalty (except in the case $G_2$ for the $\sqrt{n}$ penalty). However, in the case of two breaks, the $\sqrt{n}$ penalty over-penalizes the number of breaks, contrary to the $\hat\kappa_n$ penalty, which provides the best results for both AR(1) and GARCH(1,1) processes.
For the scenarios $A_1$-$A_4$, the change in the parameter induces a change in the variance of the stationary solution of the model. In these cases, Table 1 shows that the MDL procedure provides satisfactory results when there is one break in the model, but this procedure is not really efficient in the case of two breaks (see scenarios $A_3$ and $A_4$). In Table 3, we also consider two scenarios ($B_2$ and $B_3$) of AR(2) processes where there is a change in the parameters but the variances of the stationary solutions are very close. As can be seen, the penalty $\hat\kappa_n$ still works well whereas the MDL procedure provides poor results. Moreover, Table 6 shows that the MDL procedure sometimes provides excellent results (scenarios A and J), but also very weak results (scenario C).
Finally, one can see that although the $\hat\kappa_n$ penalty does not always provide the best results, its results remain satisfactory in all scenarios, in the sense that the estimated probability of fitting the true number of breaks is always greater than 0.50 for n = 1000. The use of our method with $\hat\kappa_n$ is clearly the best possible trade-off for one-break models. In the case of two breaks, the $\hat\kappa_n$ penalty provides the best results. Contrary to the MDL procedure, the QLIK criterion with the $\hat\kappa_n$ penalty works well in the AR models even when the changes in the parameters do not induce a change in the variance of the stationary solution. For all these reasons, we recommend using our procedure with the penalty term $\kappa_n = \hat\kappa_n$.
We can see that the results obtained for AR(1) and GARCH(1, 1) models are much better than those obtained for the TARCH(1) process, even when $K^*$ is known and $K^* = 2$ instead of $K^* = 3$. This is explained by the fact that this model provides an asymmetric function of the past observations. Thus, some asymmetric effects can be confused with breaks.

Table 7
The estimated parameters for a TARCH (1)

However, Table 5 shows that the change is correctly detected and the decay rate of the error $|\hat\tau_n - \tau^*|$ is confirmed. Figure 5 presents an example of such a TARCH(1) process with one break.

Application to financial data: FTSE index analysis
Now we apply our change detection methodology to the series of the log-returns of the closing values of the FTSE index (the share index of the 100 most highly capitalized UK companies listed on the London Stock Exchange), with the aim of investigating whether and how any detected breakpoints correspond to the milestones of the recent financial crisis. The trajectory is composed of n = 1428 observations ranging from 27 July 2005 to 18 March 2011, i.e. roughly 6 trading years, downloaded from Yahoo Finance (see Figure 6). We studied the log-ratio of the closing daily prices. Remark that we completely cover the period studied in [11]. The penQLIK contrast is applied for a GARCH(1, 1) model (see (11) for a formal definition). The slope estimation procedure applied with $u_n = [n/(4\log(n))] = 49$ and $K_{\max} = 25$ returns the values $\hat\kappa \approx 15$ and $\hat K = 4$, i.e. three breaks $\hat t_1 = 499$, $\hat t_2 = 792$ and $\hat t_3 = 853$. These values are close to the three breaks obtained in [11]: Remark that our first two breaks are closer to the events identified in [11] than their own breaks. Analyzing the estimated values of the coefficients (see Figure 7), breaks are due to changes of the coefficients $a_1$ and $b_1$ in the GARCH(1, 1) model (11). There is no break for the mean µ and the $a_0$ coefficients, whose values are close to 0. Next, we compare the fitted volatilities of the parameters estimations over the whole sequence and within the 4 periods. The third period, corresponding to a change of the value of the parameter $a_1$ ($a_1(3)$ is not significantly different from 0), leads to an estimated volatility satisfying the recurrence equation $\sigma_t \approx a_0(3) + b_1(3)\,\sigma_{t-1}$. In this period of high volatility, the estimated volatilities behave differently depending on whether we take the break into account or not.

Proofs of the main results
In the sequel C denotes a positive constant whose value may differ from one inequality to another and (v n ) is a sequence such that v n = n/κ n for all n ≥ 1.

Proof of Proposition 2.2
(i) It is clear that {X t , t ≤ t * 1 } exists and is causal, stationary with finite moments of order r (see [2]). Therefore, X is defined by induction as follows: Thus, X t is independent of (ξ j ) j>t and it suffices to prove (ii), which immediately leads to the existence of moments.
and Lemma 6.2 is proved.
In the ARCH-type case when $f_\theta = 0$ and A 0 (h θ , {θ}) holds with $\tilde\beta^{(0)}(\theta) < 1$, we follow the same reasoning as previously, remarking that (14) has the simplified form: Then We easily obtain the result by choosing $p = u_k$ as above.

2-) We detail the proof for one order derivation in the general case where
A 0 (f θ , {θ}) and A 0 (M θ , {θ}) hold with β (0) (θ) < 1. The proofs of the other cases follow the same reasoning.

Consistency when the breaks are known
When the breaks are known, we can choose v n = 1 for all n; in (4), the penalty term does not matter at all. Proof. Let us first give the following useful corollary of Lemma 6.3 Corollary 6.1. i-) under the assumptions of Lemma 6.3 1-) we have: ii-) Under assumptions of Lemma 6.3 2-) we have: We conclude the proof of Proposition 6.1 using L j (θ) = − 1 2 E (q 0,j (θ)) has a unique maximum in θ * j (see [13]). From the almost sure convergence of the quasi-likelihood in i-) of Corollary 6.1, it comes: Proof of Corollary 6.1. Note that the proof of Lemma 6.3 can be repeated by replacing L n by the quasi-likelihood L n . Thus, we obtain for i = 0, 1, 2, i-) Let j ∈ 1, . . . , K * . From [2], we have: Using (22), the convergence to the limit likelihood follows.

Proof of Theorem 3.1
This proof is divided into two parts. In part (1), $K^*$ is assumed to be known and we show $(\hat\tau_n, \hat\theta_n) \xrightarrow[n\to\infty]{P} (\tau^*, \theta^*)$. In part (2), $K^*$ is unknown and we show $\hat K_n \xrightarrow[n\to\infty]{P} K^*$, which ends the proof of Theorem 3.1.