Inference for a mean-reverting stochastic process with multiple change points

The use of an Ornstein-Uhlenbeck (OU) process is ubiquitous in business, economics and finance to capture various price processes and evolution of economic indicators exhibiting mean-reverting properties. When structural changes happen, economic dynamics drastically change and the times at which these occur are of particular interest to policy makers, investors and financial product providers. This paper addresses the change-point problem under a generalised OU model and investigates the associated statistical inference. We propose two estimation methods to locate multiple change points and show the asymptotic properties of the estimators. An informational approach is employed in detecting the change points, and the consistency of our methods is also theoretically demonstrated. Estimation is considered under the setting where both the number and location of change points are unknown. Three computing algorithms are further developed for implementation. The practical applicability of our methods is illustrated using simulated and observed financial market data.


Introduction
We examine the change-point detection problem on the drift parameters of a generalised version of the Ornstein-Uhlenbeck (OU) process introduced in Dehling et al. (2010); see also Dehling et al. (2014) and Nkurunziza and Zhang (2016), which is a compact version of Zhang (2015). Such a process X t is a solution to the stochastic differential equation (SDE) . . , p, θ = (μ 1 , . . . , μ p , −a) and denotes the transpose of a matrix. Here, W t is a one-dimensional standard Brownian motion defined on some probability space (Ω, F, P ). In particular, if L(t) = μ then (1.1) is the SDE of the classical OU process, which is commonly used to model the stochastic dynamics of various financial variables.
The OU process was initially introduced for problems in the physical sciences (e.g., Lansky and Sacerdote (2001)). Further, many economic indicators, prices in the financial market as well as processes in the natural and engineering are captured sufficiently by the OU model. The classical work of Vasiček (1977) employs an OU model for bond valuation. The importance of this stochastic process is also demonstrated by its ubiquity in many fields. For instance, the OU process is used in mathematical models of the electricity market (e.g., Erlwein et al. (2010)), commodity futures market (e.g., Date et al. (2013)), weather derivatives (e.g., Elias et al. (2014)), central-bank rate setting policy (e.g., Elliott and Wilson (2007)), spreads between pairs of securities (e.g., Elliott et al. (2005)), stochastic control-driven insurance problems (e.g., Liang et al. (2011)), spot freight rates in the shipping industry (e.g., Benth et al. (2015)), risk management (e.g., Date and Bustreo (2016)), and power generation (e.g., Howell et al. (2011)). In the OU-modelling context,  proposed a signal processing-based approach to determine presence of market liquidity regimes. Various applications of the OU process are also highlighted in biology (e.g., Rohlfs et al. (2010)), neurology (e.g., Shinomoto et al. (1999)), survival analysis (e.g., Aalen and Gjessing (2004)), and chemistry (e.g., Lu (2003Lu ( , 2004).
We note that the mean-reverting level of an OU process is constant, which can be a notable weakness for many financial datasets. This may be rectified by introducing a generalised OU process where a time-dependent function describes its level of mean reversion. Such a generalised version incorporates timeinhomogeneity and seasonality of mean reversion simultaneously. Dehling et al. (2014) developed the framework to study a change-point phenomenon under the generalised OU process. This allows the model to capture drastic changes at certain time points (e.g., dramatic changes to interest rates due to the outbreaks of financial crisis or war). In practice, many data series are characterised by some potential changes in their evolution structure, i.e., a sudden change in mean or variance and other model parameters. It is then of interest to determine the (i) existence and (ii) location of the change point. This implies segregating the data series into different segments and analysing them in a less efficient but more accurate way. Thus, our research contributions support and complement the objective of papers employing regime-switching OU-process as we provide a methodology to verify the switching phenomenon in the data. We go further by precisely estimating where the switch occurred and how many switches are possible given a data set. An instance of this support and complementarity are depicted in Subsection 3.1 of , where a simple statistical testing of regime-switching in the data was performed.
Pioneering contributions to this field of change-point detection were spearheaded by Page (1954) and Shiryaev (1963). Advances in recent years have tackled the (i) estimation of change points and coefficients of linear regression models with multiple change points (Bai and Perron (1998); Perron and Qu (2006); Lu and Lund (2007), Gombay (2010), and Chen and Nkurunziza (2015)); (ii) change-point testing for the drift parameters of a periodic mean-reverting process (cf. Dehling et al. (2014)); (iii) applications in finance (cf. Spokoiny (2009)); (iv) detection of malware within software (Yan et al. (2008)); (v) climatology (Reeves et al. (2007), Robbins et al. (2011), Gallagher et al. (2012)); and epidemiology (Yu et al. (2013)). The analysis of change points could be described more generally as a hypothesis-testing problem for the existence of change points in various locations. This could be viewed, from another perspective, as a model selection problem where the change points are additional unknown parameters to be estimated.
The change-point problems are typically examined depending on two alternatives: (i) the number of change points is known but their exact locations are unknown (Perron and Qu (2006) and Chen and Nkurunziza (2015)) and (ii) both the number and the exact locations of the change points are unknown. The estimation methods under the first scenario only require the identification of the exact locations of the change points. Clearly, the first alternative is more tractable but less realistic than the second. Closed-form solutions for the direct calculation of the change point are usually not available. Current changepoint estimation approaches are normally constructed to perform a search at every possible location of unknown candidate change points via some efficient computational algorithms subject to some constraint or criteria. Examples of well-known algorithms for change point detection include: (i) the binary segmentation type algorithm (Scott and Knott (1974); Sen and Srivastava (1975)), (ii) the segment-neighbourhood type algorithm (Auger and Lawrence (1989); Bai and Perron (1998)) with adaption to the restricted regression model (Perron and Qu (2006)); and (iii) the optimal partitioning type algorithm (Jackson et al. (2002)) and its pruned version, PELT method by Killick et al. (2012). Further details of these algorithms can be found in Killick et al. (2012) and Maidstone et al. (2014).
The intents of our work are motivated by two major research results. The first motivation is from Dehling et al. (2010) that derives a maximum likelihood estimator (MLE) for the drift parameters of the diffusion process and establishes its asymptotic properties. This was extended in Dehling et al. (2014), where there is one unknown change point and a likelihood-ratio test statistic was constructed to determine such change point. The second motivation is from Nkurunziza and Zhang (2016) that establishes the asymptotic properties of both the unrestricted and restricted MLE for the drift parameters of the generalised OU process with a single change point. A James-Stein-type shrinkage estimator for the drift parameters is proposed in Nkurunziza and Zhang (2016) as an improvement and it is also shown that the previously established asymptotic properties also hold for any consistent estimator for the rate of the change point.
Neither Dehling et al. (2014) nor Nkurunziza and Zhang (2016) offer a specific methodology to identify the change point, although Nkurunziza and Zhang (2016) suggest using a method in Chen and Nkurunziza (2015). The Chen and Nkurunziza (2015) method is, however, designed to find change points in multiple regression models, and is not particularly designed for the detection of change points in OU processes. This hole in the literature led us to develop the three main contributions of the current paper. First, we extend the singlechange point framework to the multiple-change point setting and present two consistent methods to estimate the unknown locations of change points. Second, we prove the asymptotic normality of the drift parameters' MLE. Third, we employ information-based statistics to resolve the issue of estimating the unknown number of change points and then created three algorithms to implement the calculations. We validate the performance of our estimation techniques using simulated and real market data. This paper is structured as follows. In Section 2, we present the formulation of the multiple change-point problem. Section 3 summarises the results of Dehling et al. (2014) and Nkurunziza and Zhang (2016) on MLE and the related asymptotic properties, which provide an impetus on the asymptotic performance of our proposed methods. Two estimation methods are put forward to determine the unknown locations of change points in section 4 along with the discussion of the asymptotic properties of the estimators; we find that the asymptotic properties obtained in Nkurunziza and Zhang (2016) also hold in our proposed techniques. Section 5 deals with the problem of both the existence issue and location of the change points using an information approach. We develop computing algorithms in section 6 in order to implement the proposed methods. In section 7, we assess the applicability of our methods through numerical examples on simulated and observed financial market data. Finally, section 8 provides some concluding remarks.
We start by assuming that the number of change points m is known, but the exact value of each change point denoted by τ 0 1 , . . . , τ 0 m (and correspondingly the exact rates s 0 j , j = 1, . . . , m) are unknown. Furthermore, considering that we have multiple change points in the model, we posit that these change points are asymptotically distinct. We further impose the following assumptions.
points' arrival rate, and if we have s j , the value τ 0 j is immediate. Assumption 1 implies that the length of each regime [τ 0 j−1 , τ 0 j ] is proportional to T . The structure of the model in each regime is similar to that of the nochange point process studied in Dehling et al. (2010); see also Nkurunziza and Zhang (2016) for the case of a single change point. This means that the results established in the existing literature could also be adapted to the case of multiple change points.
MLEs for the drift parameters and their asymptotic properties were shown in Dehling et al. (2010) for the case of no change point and in Nkurunziza and Zhang (2016) for the case of one change point. Certainly, Nkurunziza and Zhang (2016) is a special case of our study with m = 1. The next section reviews previous results and extends them to the multiple change points problem.

Prior MLE-based results and our extension
The asymptotic normality for the MLE estimator of the drift parameters in Nkurunziza and Zhang (2016) assumes that the estimator is already consistent. In our case, we shall prove (rather than simply assume) that such an estimator of the change point is consistent.
In the subsequent discussion, we write " to mean convergence in probability, convergence in distribution, and convergence almost surely, respectively. The notation ||.|| denotes the Frobenius norm for matrices. We use bold, unitalicised English or Greek letters in lowercase for vectors; and bold, unitalicised English or Greek letters in upper case for matrices.
The "O(·)" denotes the Landau symbol, also known as the "Big O" notation, which is used to describe the asymptotic behaviour of functions. So, for a set of random variables U n and a corresponding set of constants a n , U n = O p (a n ) means U n /a n is stochastically bounded. Formally, this means ∀ > 0, ∃ M > 0, P (|U n /a n | > M) < , ∀n. On the other hand, the symbol involving "small o", i.e., U n = o p (a n ) means U n /a n converges in probability to zero as n approaches an appropriate limit. So, since U n = o p (a n ) is equivalent to U n /a n = o p (1), convergence in probability to zero is here defined as lim n→∞ (P (|U n /a n )| ≥ ) = 0.

Log-likelihood function
The following assumption from Dehling et al. (2010) is also retained here.
Under Assumptions 1-2 and Theorem 7.6 of Lipster and Shiryaev (2001), the corresponding likelihood function in our modelling framework is * (θ, The log-likelihood function is therefore Note that the (i 1 , i 2 ) entry (i.e., the element in the i 1 -th row and i 2 -th column)

Maximum likelihood estimators for the drift parameters
By setting the first partial derivatives with respect to each of the parameters of (θ, X t ) to 0, we obtain the MLE of the drift parameters, provided Q (τ 0 Dehling et al. (2010) shows the existence of lim T →∞ (s 0 examines its asymptotic properties (note that the term ' 1 T Q −1 ' in the remark may be a typo and the corrected form, as one can confirm from Proposition 4.5 in Dehling et al. (2010), should be T Q −1 ). Further, Nkurunziza and Zhang (2016) revisits this topic and employs the incompleteness condition of basis functions (see Assumption 3) to guarantee that the denominator of expression (14) in Dehling et al. (2010) does not equal to 0, which is an essential condition for lim T →∞ to be positive definite and thus invertible. Nkurunziza and Zhang (2016) also extends the asymptotic results established in Dehling et al. (2010) to the case of single change point. The results in Nkurunziza and Zhang (2016) can be further adapted to give the asymptotic properties of j ) (as well as its inverse) under the following assumption. Assumption 3 (Assumption 2 in Nkurunziza and Zhang (2016)). For any T > 0, the basis functions {ϕ k (t), k = 1, . . . , p} are Riemann-integrable on [0, T ] and satisfy three properties.
(1) Periodicity. That is, Dehling et al. (2014). Also, for T large, the family of basis functions {ϕ k (t) : k = 1, . . . , p} is incomplete, and this is imposed in Nkurunziza and Zhang (2016) to establish the positive definiteness of 1 T Q (0,T ) . Note that the link between the incomplete basis functions and positive definiteness of 1 T Q (0,T ) is discussed in Zhang (2015), and Nkurunziza and Zhang (2016) applied this result directly. The set of basis functions is incomplete in the sense that there are smooth functions which can not be represented as their linear combination (to fix ideas, consider the set {1, cos( 2πjx T )} p j=1 on the interval [0, 2π]). Another condition used in Nkurunziza and Zhang (2016) in proving the asymptotic properties of the proposed estimators is the boundedness of the basis function (see the proof of Theorem 6.1 in Nkurunziza and Zhang (2016) for more details). Following the methodologies in Nkurunziza and Zhang (2016), we require the incomplete basis function {ϕ k (t), k = 1, . . . , p} to be sufficiently well behaved so that the asymptotic results similar to those in Nkurunziza and Zhang (2016) can be established. A restructure was to guarantee this which still admits practical choices for basis functions, for example, Fourier basis, to require ϕ k (t), k = 1, . . . , p, t ∈ [0, T ] to be bounded on R + (i.e. ϕ k (t) ≤ K ϕ for some 0 < K ϕ < ∞ ) as for every k the basis function ϕ k (t) is bounded on [0, T ] and v-periodic. Moreover, for the rest of this paper, we assume without loss of generality that the sample size T is an integral multiple of the period length v, i.e., T = Nv for some integer N . Without loss of generality, we let v = 1, which implies that ϕ k (t + 1) = ϕ k (t).

Remark 1. The two points under Assumption 3 correspond to similar assumptions in
Using the results in Dehling et al. (2010) and Nkurunziza and Zhang (2016), the MLE of the drift parameters based on the log likelihood function provided above are given byθ = (θ Substituting (1.1) into (3.1) and going through some algebraic computations will lead toθ

Asymptotic properties of the MLE
To study the asymptotic proprieties of the MLE in the next section, equation (3.1) to be precise, we review the established asymptotic results in Dehling et al. (2010) for the case where there is no change point (m = 0) and also the results in Nkurunziza and Zhang (2016) when there exists one change point (m = 1).
For more details about the above properties, readers may refer to Dehling et al. (2010) or Nkurunziza and Zhang (2016). These properties, together with Slutsky's theorem, yield Nkurunziza and Zhang (2016) extend the above asymptotic properties to the case of a single change point. Using similar arguments, we extend these results in the context of multiple change points. We first present a result covering the coefficients of SDE (1.1).

Proposition 3.1. Under Assumptions 1-3, the coefficients in SDE (1.1) for m ≥ 1 satisfy both the space-variable Lipschitz and the spatial growth conditions.
Proof. See Appendix A.
Using Proposition 3.1 and the mean-reverting property of OU process, together with the similar arguments employed in the proof of Theorem 3.1 (as well as Lemma 3.3) in Nkurunziza and Zhang (2016), it may be verified that SDE (1.1) admits a solution that is uniformly bounded in L 2 and Employing (3.8), we (3.3) and (3.4) to their representations in context of multiple change points (m ≥ 1) given by (3.12) Using (3.6) and (3.8) and similar arguments in the proof of Theorem 3.1 in Nkurunziza and Zhang (2016) with the estimated change point replaced by the exact value of the change points, the following properties hold: Further, by the Continuous Mapping Theorem, So long as Assumptions 1-3 hold and with the aid of similar argument used in the proof of Proposition 2.1 in Nkurunziza and Zhang (2016), it may be shown that Σ j is positive definite.
Note that (3.13) and (3.15) are key elements in analysing the asymptotic properties of Q (τj−1,τj ) and its inverse, whereτ j is the estimator of τ 0 j . Invoking the boundedness property of ϕ k (t), we have for k = 1, . . . , p and j = 1, . . . , m. Similarly, using (3.8), From Proposition 1.21 in Kutoyants (2004), we get where 0 is a vector of zeros andΣ = diag(s 0 . Combining (3.13), (3.15) and (3.18), along with the Slutsky's Theorem and some algebraic computations (see Proposition 6.3 in Nkurunziza and Zhang (2016) for more details), The above asymptotic properties are established based on the exact values of the locations of the change points τ 0 j , j = 1, . . . , m. However, in practice, τ 0 j are often unknown. Hence, we shall devise methods to estimate the unknown τ 0 j and investigate whether the above asymptotic normality still hold for the estimated change points.

Estimation of change points and pertinent asymptotic properties
In Subsections 4.1 and 4.2, we develop two techniques to estimate the unknown locations of change points. The asymptotic normality of θ based on the estimated change points is discussed in Subsection 4.3.

Least sum squared error method
We introduce the least sum of squared errors (LSSE) method then investigate the consistency of our proposed estimator. Consider a partition Due to the uncertain locations of estimated change points, the exact value of the drift parameters θ and the MLE may have different indices. For example, ifτ j > τ 0 j , then for all t i ∈ (τ 0 j ,τ j ], the associated exact value of the drift parameters is θ (j+1) but the MLE isθ (j) . So here, θ i andθ i refer to the exact value and MLE of the drift parameters at time point t i , respectively. In this case, = Q −1 (τj−1,τj )r (τj−1,τj ) for j = 1, . . . , m + 1. Then by the Euler-Maruyama discretisation method, where i is the error term σ √ Δ t ω i , and ω i is the ith independent draw from a standard normal variable. Therefore, we could now use the LSSE method to estimate the change points.

Consistency of the proposed estimator
Under Assumptions 1-3, are both positive definite with probability 1 provided that the basis functions {ϕ k (t), i = 1, . . . , p} are incomplete. Moreover, using (3.13) (see also Proposition 2.1 in Nkurunziza and Zhang (2016)), we have that converges in probability to some positive definite matrices for large T , as do their respective discretised versions. Hence, for large T , it is reasonable to impose a useful assumption in proving the consistency of the estimators of the change points.
Assumption 4. For every j = 1, . . . , m, there exists an L 0 > 0 such that for all L > L 0 the minimum eigenvalues of For the motivation of the above assumption, see Perron and Qu (2006) and Chen and Nkurunziza (2015). The next two propositions provide results characterising consistency. Proposition 4.1 shows that the estimated rateŝ j =τ j T is consistent for s 0 j , for j = 1, . . . , m. Proposition 4.2 gives the convergence rate T ofτ j , j = 1, . . . , m. In reality we may encounter the case where the shift is time-dependent, and, in particular as T tends to infinity, the shift may shrink towards 0 at rate v T , i.e., In this case, the validity of Propositions 4.1 and 4.2 depends on the speed v T . In fact, using similar arguments as in the proofs of Propositions 4.1 and 4.2, we have the following corollary.

Maximum log-likelihood method
We introduce an alternative method to estimate the location of the change points based on the maximum of log-likelihood function. Recall that the log-likelihood function for (1.1) with the exact change points τ 0 1 , . . . τ 0 m is given by From (4.4), the estimator of the location of the change points is given whereθ(τ ) is the MLE of θ based on the given change points τ = (τ 1 , . . . , τ m ).
In practice, the calculation of log (τ ,θ(τ )) in (4.5) relies on numerical approximation methods (see Auger and Lawrence (1989)) to compute the integrals inside log (τ ,θ(τ )). For example, by approximating the Riemann sum based on The approximated version of (4.5) is then

Consistency of the proposed estimator
We now link results (4.7) and (4.2), which are the respective results from the LSSE-and MLE-based methods. Using the consistency properties, we can establish the asymptotic normality for the MLE of drift parameters based on the estimated change points.

Asymptotic normality ofθ based on the estimated change points
Previously, we established the T -rate consistency of the estimated change points. Based on these asymptotic consistency results, we extend some of the asymptotic normality results in Nkurunziza and Zhang (2016) that are related to our study to the case of multiple change points.
Proposition 4.5. Under the same conditions as in Proposition 4.4, we have that for i = 1, . . . , p, Proof. This follows from (3.8) and similar arguments in the proofs of Lemmas 3.1 and 3.2 in Nkurunziza and Zhang (2016).
Employing Propositions 4.4 and 4.5, we have the following results.
where Σ j is defined in (3.14).
Proof. See Appendix C (or Proposition 6.2 in Nkurunziza and Zhang (2016) for the single change-point case).
The next result following from Propositions 4.6 and 4.7 plays an essential role in proving the asymptotic normality of MLE of the drift parameter based on the proposed estimated change points.
Proof. See Appendix C.

Estimating the number of change points
In the last section, we developed two consistent estimation methods for the case when the number of change points is known. In this section, we extend our examination of the change-point problem when the number of change points is also unknown. Hence, we are interested in knowing the number of change points as well as their exact locations. One popular methodology for detecting the unknown number of change points is to treat this issue as a model-selection problem. For instance, adding one change point into (1.1) brings p + 1 extra drift parameters into the model. Thus, detecting the number of change points can be considered as selecting the most suitable statistical model from a series of candidate models with different number of change points, and this can be solved using an informational approach. Such approach deems the most appropriate model as the one which minimises the log-likelihood-based information criterion IC(m) = −2 log (τ ,θ) + (m + 1)h(p)φ(T ). (5.1) In (5.1), log (τ ,θ) is defined in (4.4);τ is obtained via (4.5) corresponding to each m; h(p) = p + 1 if there is no change in σ (or p + 2 if there is a change in σ); φ(T ) is a non-decreasing function of T , the length of the data set; and m is the potential number of change points to be determined. Based on the asymptotic results for the Riemann sum approximation of the log-likelihood function log (τ ,θ), we use the criterion where log * ([0, T ], τ ,θ(τ )) is given in (4.6).
Note that, if the number of change points is known, the term (m+1)h(p)φ(T ) is fixed and the approach covering (5.2) is equivalent to the maximum loglikelihood method introduced in the previous section. The efficiency of information criterion depends on the choice of the penalty criterion φ(T ). For example, if φ(T ) = 2, then (5.2) reduces to the well-known Akaike information criterion (AIC) Akaike (1973). However, in practice, a model selected by minimising the AIC may not be asymptotically consistent in terms of the model order; see Schwarz (1978). Modified information criteria were, thus, proposed to overcome this problem. One example is the Schwarz information criterion (SIC) Schwarz (1978), which sets φ(T ) as the logarithm of the sample size. The SIC has been successfully applied to the change-point analysis in the literature. As it gives an asymptotically consistent estimate of the order of the true model, we also adapt the SIC for our theoretical development.
Note that the penalty term (m + 1)(p +1)φ(T ) in SIC increases as the sample size increases. Hence, for large sample size, SIC tends to ignore the relatively small changes in the process. This feature makes it useful for those who are mainly interested in studying only the major changes within certain time period. Further, based on the SIC, we have the following asymptotic results for (5.2). Proof. See Appendix D.
Proposition 5.1 tells us that, for large T, IC(m) reaches its minimum value when m = m 0 and this allows us to detect the exact value of m 0 .

Computational algorithms
In this section, we put forward algorithms for computing (4.2) and (4.7) when m 0 is known. We also provide algorithms for computing (5.2) when m 0 is unknown. Based on these algorithms, a simulation study to examine the efficiency of the proposed methods for different time periods T is presented in Section 7.1. Our numerical results will show that, with our sample parameter set, the proposed methods perform well for values of T as small as T = 5.

Algorithm for (4.2) and (4.7) (with known m 0 )
In estimating the unknown locations of change points, a standard searching method is to compute the criteria, i.e., least squared errors for (4.2) or maximum log likelihoods for (4.7), through all possible locations of change points and search for the one that returns the optimal value. However, for m change points the associated costs for the above searching procedure are of order O((T/Δ t ) m ). Thus, for large T and small Δ t , the computations can be time consuming. To overcome this problem, we adopt to our two proposed LSSE and MLE methods a dynamic programming algorithm due to Bai and Perron (1998), see also Perron and Qu (2006), to reduce the computational cost to O(m(T/Δ t ) 2 ) for m change points. This algorithm is very efficient when m ≥ 2. Algorithm 1. Let H 1 (r, T r ) be either H 1 (r, T r ) = min τ SSE([0, T r ], τ ,θ(τ )), the least sum squared error for (4.2) or H 1 (r, T r ) = max τ log * ([0, T r ], τ ,θ(τ )), the maximum Riemann sum approximation of log likelihood for (4.7)) computed based on the optimal partition of time interval [0, T r ] that contains r change points. Also, let H 2 (a, b) be the SSE for (4.2) or Riemann sum approximation of log likelihood for (4.7)) computed based on a time regime (a, b]. Further, we assume that Assumption 1 holds and let h = T be the minimal permissible length of a time regime. Then (4.2) or (4.7) with known m = m 0 can be computed as follows.
The steps in Algorithm 1 can be viewed as a combination of two components.
Step 1 computes all possible choices of H 2 , and the computations in this step are at most of order O((T/Δ t ) 2 ) as there are at most (T/Δ t ) 2 different time periods (a, b] in the dataset. This step is useful, since in the succeeding steps some pairs of (a, b] will be visited more than once during the optimisation process, so using previously saved results for H 2 (a, b) will be helpful to reduce computations.
Steps 2-4 can be treated as an application of the Segment Neighbourhood Search (SNS) method introduced by Auger and Lawrence (1989). The goal of Steps 2-4 is to search for the global optimal locations of the m change points and the total computation costs in these steps are also of O(m(T/Δ t ) 2 ). Note that when m = m 0 = 1 (a single change point), only the last step is needed to search for the optimal location of a change point, and the related computations costs are of O(T/Δ t ).
When m 0 > 1 and T is large, Algorithm 1 can be extremely time-consuming because of the O(m(T/Δ t ) 2 ) computations, we aim to decrease the computational costs in this case. Apparently, some computations in Step 1 may be redundant. For example, in Step 2, the domain for the optimisation problem is Step 1 become unnecessary as these results will not be used. Thus, the computations in Step 1 could be moved into Steps 2-4 so that only the necessary H 2 are computed and stored. This means one could begin the algorithm from Step 2 by computing and storing H 2 (0, b) Step 3, for r = 2, . . . , m − 1, one only needs to compute and save H Step 4, we compute and store H 2 (a, T ) for a ∈ [mh, T − h] before solving for H 1 (m, T ).

Algorithm for (5.2) (with unknown m 0 )
When m 0 is unknown, one may compute and compare the m values of (5.2) and m varies from 0 up to m max for some 0 ≤ m max ≤ [T/h] . The upper bound m max can also be predetermined from the descriptive analysis of the observed processes. For each m, one can first apply Algorithm 1 to obtain the estimated change points and compute (5.2) accordingly. After the m max computations, the desiredm is the one that returns the minimum value of (5.2). By Proposition 5.1, m is consistent when T is large, provided m 0 ∈ [0, m max ].
If we directly apply Algorithm 1 to m = 1, . . . , m max , the total computations will be of order O((1 + 2 + . . . + m max )(T/Δ t ) 2 ). To further simplify the computations, we study the behaviour of Step 2-3 in Algorithm 1 when m increases from m * to m * + 1. In this case, the ranges of T r reduces from This implies that the stored optimisation results of Steps 2 and 3 in Algorithm 1 at the previous step (m = m * ) can also be used in the current step (m = m * +1). Therefore, with the previously stored results, the only step that needs to be updated for each m is when r = m−1 and r = m, and the associated computations are of order O((T/Δ t ) 2 ) at r = m − 1 and O(T/Δ t ) at r = m. Based on these considerations, we tailor the SNS algorithm for (5.2) and the related computations for m = 0 . . . , m max are of order O(m max (T/Δ t ) 2 ).
Step 1: Follows all steps in Algorithm 1 to search for the optimal locations of the m estimated change points then store the computed value of (5.2) for m = 0, 1, 2. Note that the results of H 2 (a, b) for all (a, b] such that a − b ≥ h as well as the optimisation results of H 1 (r, T r ) for all r = 1, . . . , m and T r ∈ [(r + 1)h, T − (m − r)h] need to be stored for future use. Step 3:m is obtained from m = 1, . . . , m max that returns the smallest value of (5.2).
The advantage of SNS method is that it returns the optimal locations of change points for every m = 1, . . . , m max . Hence, it is useful if one interested in investigating the relationships between the locations of change points and m. However, for large T and m max , the O(m max (T/Δ t ) 2 ) computational costs in the SNS method may be high. In addition to SNS method, another dynamic programming algorithm for finding the unknown number of change points is called the Optimal Partitioning (OP) algorithm introduced by Jackson et al. (2002). The related computational costs are of order O(T/Δ t ) 2 ) for any m; hence, it is more efficient than SNS when m max is large. Based on the OP algorithm, Killick et al. (2012) introduced the Pruned Exact Linear Time (PELT) method. Although the maximum computational costs for the PELT algorithm is still up to O(n 2 ) for a data set with size n, the computations in the PELT method, which involved pruning of the solution space under some conditions can be much less than those required in the OP.
However, the PELT method introduced in Killick et al. (2012) may not satisfy Assumption 1, which is essential for most of the theoretical properties developed in this paper. In fact, under Assumption 1: (i) there is no change point in time period [0, h), where h is defined in Algorithm 1; (ii) if there is a change point τ * ∈ [h, 2h), then there is no non-zero change point prior to τ * ; and (iii) for any potential change point τ * ∈ [2h, T ], the minimal distance between it and the most recent change point prior to this change point is at least h. Based on these considerations, we use the following modified version of the PELT algorithm.
Algorithm 3 (Modified PELT method). Let n = T/Δ t be the length of the data set based on the partition 0 = t 0 < t 1 < . . . < t n = T of time period [0, T ] with increment Δ t , and let t h = h/Δ t . Set SS 2t h = {0} and F (t i ) = 0 for i = 1, . . . , t h − 1. Then, for i = t h , . . . , n, compute and store the values obtained from the following steps.

Numerical demonstrations
The Monte-Carlo simulation technique will be used in Subsection 7.1 (i) to evaluate the comparative performance of the two estimation methods, viz. LSSE in (4.2) and MLL in (4.7) to determine the unknown location of m 0 change points assumed to already exist; and (ii) to test the method in (5.2) for detecting the unknown number of change points. In Subsection 7.2, we illustrate the various implementation details of our proposed methods on some observed financial market data.

Monte-Carlo simulation study
Our simulation considers two different scenarios (or cases). In the first case, we study the performance of the proposed methods under the classical OU process defined by In the second case, the performance evaluation of the proposed methods is tested assuming a periodic mean-reverting OU process, with 2-dimensional periodic incomplete set of basis functions 1, √ 2 cos πt 2Δt (which are orthogonal on [0,T ] with weight fixed to 1), given by Each case consists of 500 iterations. Although an exact solution is available, we choose to use the Euler-Maruyama discretisation scheme to be consistent with the results of Nkurunziza and Zhang (2016). In each iteration, we first generate a desired simulated process based on a given period T with pre-assigned "true" parameters such as the number and location of change points and the model coefficients. To evaluate the performance of (4.2) and (4.7), we specify the number of change points to be known but the rate is unknown. Then, we estimate and record the change points' arrival rates by applying (4.2) and (4.7) on the simulated process. The detailed simulation setup and results are reported in Subsection 7.1.1.
In Subsection 7.1.2, we also use the Monte-Carlo simulation method to investigate the performance of (5.2). That is, we assume that m 0 is unknown and apply (5.2) with m ranging from 0 to m 0 + 3. Then, the m that returns the minimum value of (5.2) is chosen as the estimated value for the number of change points. After 500 iterations, we analyse the performance of the proposed methods based on the recorded results.

Estimating the rate s j of change points
We first study the performance of (4.2) and (4.7) in estimating the rates of the change points with known m 0 . For the simulation setup, we consider the case where m 0 = 2 and 3 with different time periods T = 5, 10, 20, respectively. The pre-assigned values of the coefficients are provided in Table 1.
For each case, after 500 iterations we record the mean of the estimates based on (4.2) and (4.7), together with the 95% empirical confidence interval (i.e., locating the 2.5 and 97.5 percentiles) and also the mean-squared error. The results are reported in Tables 2-5. In this section, we also report the following     histograms: Figure 1 presents the histograms of the estimated rates based on MLL method for Case 1 with m 0 = 3 when T increases from 5 to 20, which shows the behavior of the estimated rates as T increases. Figure 2 shows the histograms of the estimated rates for the two cases under the same conditions (i.e. m 0 = 3, T = 10 and the rates are estimated by LSSE method). Moreover, for the convenience of the reader, all histograms of the estimated rates for m 0 = 3 are provided in E (Figures 8-13) for reference. From Tables 2-5 (along with Figures 1, 2 and 8-13 in E), we see that in Cases 1 and 2, both proposed methods (4.2) and (4.7) estimate very accurately the exact rates of change points. In particular, the sample means of the estimated change points' arrival rates are close to the exact values, and the results obtained by the 2 proposed methods are very close, which confirms Proposition 4.3. We also clearly observe that as T increases from 5 to 20, the lengths of the 95% empirical confidence intervals and MSEs of the two estimators all decrease. This is well substantiated for example by the pertinent histograms in Figure 1, which shows that when MLL method is employed to estimate the change points' arrival rates in Case 1 with m 0 = 3, the central tendencies of the estimated rates are all close to their exact values, and the sample variances decrease as T becomes larger. Similar evidences are shown for other choices of scenarios (different combinations of cases, methods and time period T ) as illustrated in Figures 8-13. Although not shown in this paper, the histograms for the case m 0 = 2 exhibit similar features. These outcomes confirm the theoretical findings regarding the asymptotic consistency of our two proposed methods.
Also, from Tables 2-5 and Figure 1 (see E as well), the lengths of the 95% confidence intervals (C.I.'s), with corresponding MSEs, of the estimated rates for the first and the last unknown change points (ŝ 1 andŝ 3 ) are more accurate than those of the middle change point (ŝ 2 ). Moreover, the improved accuracy of s 1 andŝ 3 is more sensitive to the increase of T as compared to that ofŝ 2 . This is because the unknown change point's arrival rate s j satisfies s j−1 < s j < s j+1 for j = 1, . . . , m. For s 1 and s m , one of their boundaries is known (which are s 0 = 0 and s m+1 = 1, respectively), whilst for the intermediate rates both the upper and lower bounds s j−1 and s j+1 are unknown. Therefore, under the same condition, the uncertainties of the first and the last change points' arrival rates would be lower than that in the middle.
For the comparisons between Case 1 and 2, we can see from the selected scenario (m 0 = 3, T = 10, LSSE method) shown in Figure 2, along with the results in Tables 2-5 that under the same conditions, the lengths of the 95% empirical C.I. and MSEs in Case 2 are all smaller than those in Case 1. However, since the simulated processes in the 2 cases are generated from different SDEs, it may be inappropriate to make conclusion based only on the results shown in the provided tables and figures. In fact, note that the main difference between (7.1) and (7.2) is the number of coefficients in the models. Therefore, the comparison between the 2 models when fitting them to the same process can be considered as a model selection problem that has been well studied in the literature.

Estimating the number of change points
In this subsection, we study the performance of (5.2) in estimating the unknown number of the change points based on Algorithms 2 (SNS) and 3 (Modified PELT). For the simulation setup, we assume the exact value of m 0 = 2, with different time periods T = 5, 10, 15, 20. The pre-assigned coefficients are provided in Table 6.
Based on the simulated process, we apply Algorithms 2 and 3 with m ranging from 0 to 5 to estimate the unknown number of change points. In Tables 7, we count and report the cumulative frequency (CF) of 500 iterations that return the correct estimates 500 i=1 1(m i = m 0 ) and the relative frequency (RF)  For the estimated numberm of change points, one could see from Table 7 that, when m 0 = 2, the proposed methods perform very well in both cases with different time periods. Furthermore, the accuracy of the estimating results in different cases all increase as T increases. These results suggest that our proposed method is asymptotically consistent, which confirms the theoretical finding in Proposition 5.1.

Implementation on observed financial market data with discussion
We apply the estimation methods to the Brent oil one-month futures settlement daily price data for the period 18 March 1993 to 25 September 2015. The data set is available at www.quandl.com. The empirical studies of Schwartz (1997) and Chen (2010) showed that meanreversion features hold for prices of several commodities including oil. Hence, we use OU-types processes to model such price behaviour. We first fit the classical OU process without any change point, i.e., using the dynamics dX t = (μ − αX t )dt + σdW t ), to the log-transformed data series with σ's estimate as the data's realised volatility σ = ti∈[0,T ] (X ti+1 − X ti ) 2 /T . The MLE of the drift parameters are given byμ = 0.48 andα = 0.12. Based on these MLE values, the log likelihood (via the Riemann-sum approximation) is about 1.02 and IC(m = 0) = 15.27.
However, from the plot of the price series in Figure 3, there are several changes in the shapes of the price evolution. This observed feature suggests that it may be more appropriate to use (7.1) with unknown (m > 0) change points.
The data set covers approximately 22.5 years giving a sample size of 5735 trading days. The recorded yearly number of trading days varies from year to year; so, for convenience, we let Δ t = 22.5/5735. Since the size of the data set is large, we apply Algorithm 3 using a minimum permissible regime-time length h of 0.25/Δ t ≈ 63 trading days (quarterly). The algorithm detectsm = 2 change points, which occurs on 24 September 2008 and 23 December 2008, with a corresponding log likelihood increase (via Riemann-sum approximation) of 26.81 and IC(m = 2) = −1.69 lower than IC(m = 0). To confirm the results, we also apply Algorithm 2 with LSSE and MLL methods respectively and m max set to be 10. The results are the same as that obtained from Algorithm 3.
The plot of the price series, with change points indicated, is depicted in Figure 4. It shows that, from September 2008 to March 2009, there is a decreasing trend in the log-transformed futures prices and then the trend is slightly increasing after this period. Further, based on the estimated change points, the MLE of the drift parameters, and the associated statistics such as (long-term) means (i.e.,μ (j) /α (j) ) and variances (i.e.,σ 2 /(2α (j) )) are given in Table 8. From Table 8, there are huge changes in the MLEs of the drift parameters under different T 's. Based on the MLEs, we also plot two simulated series based on the OU process with and without change points; see Figures 5 and 6, respectively. As most notably expected, the simulated series based on the OU process with two change points is closer to the original series, especially during the period spanning 25 September-23 December 2008, than the simulated series based on the OU process without change point. Judging from these observed characteristics and taking into account the above-mentioned substantial improvements in the log likelihood and SIC values, we conclude that the OU-process with two change points occurring on 25 September 2008 and 23 December 2008 is the appropriate model for the data set that we analysed. Moreover, we see from Figure 4 that from 18 March 1993 to 25 September 2008 (the first estimated change point), there are several noticeable changes in the series. For example, from 1996 to 1998 there was a fall in the futures price. However, based on the SIC, these changes are not significant enough to warrant the inference of a regime change and they are therefore ignored by the proposed methods. We take a closer look concentrating only on the 18- Mar-1993-to-25-Sep-2008 data set to see if there is any change point at all. This analysis is equivalent to reducing the sample size and the penalty term of the SIC accordingly. Algorithm 3 is re-applied to the reduced data set with a sample size 3928 and maintaining the same Δ t and σ as those in our other experiments. The resulting SIC indicates still no change point during the shortened period. Furthermore, we employ the estimated parameters and sample size in our reduced data set to run a simulation similar to that in Subsection 7.1.2 in assessing the performance of the proposed methods, and we obtained an RF of 86.2% in producing the correct estimates.  The above result tells us that an OU process without a change point would be appropriate to model the data series from 18 March 1993 to 25 September 2008. However, from Table 8 the respective long-term mean and variance (of values, not returns)μ α and σ 2 2α are 25.794 and 10.889, which are both higher than those in the two other time periods. Such high statistics may be less preferable in practice, although they reasonably explain the increasing trend in the investigated time period. On the other hand, we know that imposing more change points, which is equivalent to increasing the number of coefficients in the model, can reduce the variance. Hence, we examine the potential reduction in the variance by imposing a change point into this period. To this end, we fit an OU process with one change point to the series and use Algorithm 1 to estimate the location in the OU process. The estimated change point is at 15 February 1999, which   is near the bottom of the series; see Figure 7. Based on Table 9, by imposing a change point at 15 February 1999, the meansμ α before and after the change point both strikingly decrease to 2.646 and 4.407, respectively, in comparison to the previous result of 25.794. Additionally, the variances σ 2 2α for the time periods before and after the change point also markedly go down to respective values of 0.085 and 0.163. The approximate log likelihood increases correspondingly to 4.199 for the period 16 February 1999 -25 September 2008. This implies that imposing a change point (15 February 1999) into the model does improve the accuracy. Nevertheless, recall that our SIC-based method shows no change point at this time period. Therefore, we demonstrated a strong potential for an over-fitting problem to arise when a change-point assumption is unnecessarily introduced into the model. This also reminds us of a trade-off between accuracy improvement and issue of over fitting that must be avoided whenever possible.

Conclusion
The main contribution of this paper is the development of MLL-and LSSEbased methods in detecting the unknown number of multiple change points along with the identification of their locations in a generalised univariate OU process. Additionally, we showed that our proposed estimators for the change points' locations satisfy the asymptotically consistent and normality properties under certain suitable conditions that we painstakingly imposed. These results guided the design of three computing algorithms customised for the efficient implementation of our proposed methods. The numerical applications we showcased covering both simulated and observed data illustrated the excellent performance and accuracy of the estimation approaches that we created to handle change points detection. The usefulness of our results have relevance to regulatory authorities' policy-making, trading strategy's construction by investors and provider's of financial products and services, and other scientific endeavours in the natural and social sciences impacted by sudden and significant changes (e.g., break, jumps, shifts, etc) in the time-series data. This work provides impetus for the investigation and development of methodology suited in tackling further the multiple-change point problem for a multivariate OU process and other closely related modelling challenges in the research literature and practice that entail the statistical inference of stochastic processes.

Appendix A: Proof of Proposition 3.1
Proof of Proposition 3.1. We first need to prove that the coefficients of (1.1) satisfy the space-variable Lipschitz condition. That is, Given that σ(t, x) is constant in our modelling framework, the second term above is 0. Hence, Since a (j) < ∞ for j = 1, . . . , p, there exists a K a > 0 such that m+1 j=1 (a (j) ) 2 ≤ K a . Then, Next, we prove the spatial growth condition. That is, Using the identity (a + b) 2 ≤ 2a 2 + 2b 2 , we have Since {ϕ k (t), i = 1, . . . , p} are bounded, we can find a constant K b > 0 such that This gives Then

Appendix B: Proof of the propositions in section 4
Proof of Proposition 4.1. Letû i be the residual of the ith element based on the estimated change points {τ j }, j = 1, . . . , m + 1, i.e., where θ i andθ i are defined in Section 4 as the true value and MLE of the parameters based on the assigned estimated change points of the coefficients associated with the ith element. Also, letû 0 i be the residual of the ith element based on the exact change points {τ 0 j , j = 1, . . . , m + 1} and θ . The proof relies on investigating the behaviour of By (4.2), (B.1) ≤ 0 with probability 1. Hence, it remains to show that if one of the change points is not consistently estimated, (B.1) > 0 with positive probability yielding a contradiction.
Using the quadratic expansion (a + b) 2 = a 2 + 2ab + b 2 and some algebraic computations, (B.1) can be expressed as We aim to show that if there exists a change point τ 0 j , j = 1 . . . , m, and it is not consistently estimated, the first term 1 T ti∈[0,T ] z i θ i −θ i 2 is larger than a positive constant with positive probability, whilst the rest of terms is of o p (1) and hence, (B.1) > 0 with positive probability. To this end, we first provide a lemma, which will be useful in deriving the asymptotic consistency of the estimated change points.
with positive probability.
Proof. If the change point τ 0 j is not consistently estimated, then with some positive probability there exists an η > 0 such that there is no estimated change point in [τ 0 j − ηT, τ 0 j + ηT ] for some η > 0. Without loss of generality, let Let γ 1 and γ 2 be the smallest eigenvalues of 1 Chen et al. Using the convexity of a quadratic function, we have Hence, Under Assumption 4, γ 1 and γ 2 are both bounded away from 0 and min(γ 1 , γ 2 ) is also bounded away from 0. Therefore, the right-hand side of the above inequality is positive. Then, the proof is complete by letting C 0 = η min(γ 1 , γ 2 ) 2 .

Lemma B.2. Under Assumptions
For the proof of Lemma B.2, one may refer to Proposition 6.2 in Nkurunziza and Zhang (2016). To emphasise again, both lemmas B.1 and B.2 are key in proving the asymptotic properties of the proposed estimators.

(B.15)
Using the asymptotic results in Proposition 4.1, we have that Similarly, is of order o p (1). Thus, (B.15)/(τ 2 − τ 0 2 ) = o p (1). Furthermore, using the same argument above together with (B.6), we have Proof of Proposition 4.6. By Propositions 4.4-4.5 (see also Proposition 6.2 in Nkurunziza and Zhang 2016), it may be shown that Proof of Proposition 4.7. Here we only prove that The convergence of the remaining components may be proved analogously by the same approach. Since by Proposition 4.2, lim for some C * > 0 with probability 1. Therefore, IC(m < m 0 ) > IC 0 (m = m 0 ) with probability 1. This completes the proof of part (i).
Part ( Note that for the case where m = m * > m 0 and the estimated locations of the m * change points are given byτ 1 <τ 2 <, . . . , <τ m * , we have . We note that m * > m 0 , and from m * of these estimated change points, there are m * − m 0 estimated change points that divide the time interval [0, T ] into m * − m 0 + 1 regimes such that within each regime, the number of estimated change points is equal to the number of exact change points. For example, suppose that m 0 = 2 and m * = 3 with 0 <τ * 1 < τ 0 1 <τ * 2 < τ 0 2 <τ * 3 < T. Then, if we divide the given time interval into [0,τ * 2 ] and (τ * 2 , T ], we can see that within these two intervals, the number of estimated change points is equal to the number of exact change points.
Since within (τ * j−1 ,τ * j ], the number of estimated change points and the number of exact change points are the same, we first consider the case where there is no any change points within (τ * j−1 ,τ * j ]. In this case, we have τ 0 k * −1 <τ * j−1 < τ * j < τ 0 k * for some j and k * . Then,θ * . Substituting the above expressions into (D.7), we have