Detecting Changes in Covariance via Random Matrix Theory

Abstract A novel method is proposed for detecting changes in the covariance structure of moderate dimensional time series. This nonlinear test statistic has a number of useful properties. Most importantly, it is independent of the underlying structure of the covariance matrix. We discuss how results from Random Matrix Theory, can be used to study the behavior of our test statistic in a moderate dimensional setting (i.e., the number of variables is comparable to the length of the data). In particular, we demonstrate that the test statistic converges point wise to a normal distribution under the null hypothesis. We evaluate the performance of the proposed approach on a range of simulated datasets and find that it outperforms a range of alternative recently proposed methods. Finally, we use our approach to study changes in the amount of water on the surface of a plot of soil which feeds into model development for degradation of surface piping.


Introduction
There is considerable infrastructure within the oil and gas industry that is on the surface of the ground.These are exposed to changing weather conditions and models are used to estimate the degradation of the assets and direct monitoring.These degradation models rely on estimates of the conditions that the assets are under.As the surrounding soils evolve, so do the absorption properties of the soil.This may result in assets sitting in waterlogged soil when the models are assuming that water drains away.Thus, estimating how these soil properties vary over time, specifically regarding water absorption, is an important task.Soil scientists typically measure water absorption through ground sensors at an individual location.This is not desirable as there may be large areas where assets are located.Recent research has explored video capture to estimate changes in surface water as a surrogate for absorption.Ambient lighting due to time of day and cloud cover hamper traditional mean-based methods for identifying surface water.In this article, we develop a method for identifying changes within an image sequence (viewed as multivariate time series) that are linked to the changing presence of surface water.
One approach to modeling changing behavior in data is to assume that the changes occur at a small number of discrete time points known as changepoints.Between changes, the data can be modeled as a set of stationary segments that satisfy standard modeling assumptions.Changepoint methods are relevant in a wide range of applications including genetics (Hocking et al. 2013), network traffic analysis (Rubin-Delanchy, Lawson, and Heard 2016) and oceanography (Carr et al. 2017).We consider the specific case where the covariance structure of the data changes at each changepoint.While our focus here is on a specific application, the problem has wide applicability.For example, Stoehr, Aston, and Kirch (2020) examine changes in the covariance structure of functional Magnetic Resonance Imaging (fMRI) data, where a failure to satisfy stationarity assumptions can significantly contaminate any analysis, while Wied, Ziggel, and Berens (2013) and Berens, Weiß, and Wied (2015) examine how changes in the covariance of financial data can be used to improve stock portfolio optimization.
The changepoint problem has a long history in the statistical literature, and contains two distinct but related problems, depending on whether the data is observed sequentially (online setting) or as a single batch (offline setting).We focus on the latter and direct readers interested in the former to Tartakovsky, Nikiforov, and Basseville (2014) for a thorough review.In the univariate setting there is a vast literature on different methods for estimating changepoints, and there are a number of state of the art methods (Killick, Fearnhead, and Eckley 2012;Frick et al. 2014;Fryzlewicz 2014;Maidstone et al. 2017).
The literature on detecting changes in multivariate time series has grown substantially in the last few years.In particular, many authors consider changes in the moderate dimensional setting, that is, where the number of the parameters of the model, is of the order of the number of data points.Much of this work considers changes in expectation where the series are uncorrelated (Grundy, Killick, and Mihaylov 2020;Horváth and Hušková 2012).Furthermore a number of authors have examined detecting changes in expectation where only a subset of variables under observation change (Jirak 2015;Wang and Samworth 2018;Enikeeva and Harchaoui 2019).Separately a number of authors have considered changes in second-order structure of moderate dimensional time series models including auto-covariance and cross-covariance (Cho and Fryzlewicz 2015), changes in graphical models (Gibberd andNelson 2014, 2017) and changes in network structure (Wang, Yu, and Rinaldo 2018).
The problem of detecting changes in the covariance structure has been examined in both the low dimensional and high dimensional setting.In the low dimensional setting (p << n) Chen and Gupta (2004) and Lavielle and Teyssiere (2006) use a likelihood based test statistic and the Schwarz Information Criterion (SIC) to detect changes in covariance of normally distributed data.Aue et al. (2009) consider a nonparameteric test statistic for changes in the covariance of linear and nonlinear multivariate time series.Matteson and James (2014) study changes in the distribution of (possibly) multivariate time series using a clustering inspired nonparametric test statistic that claims to handle covariances.In the high-dimensional setting, Avanesov and Buzun (2018) and Wang, Yu, and Rinaldo (2017) study test statistics based on the distance between sample covariances, using the operator norm and ∞ norm, respectively.Crucially all of these approaches are focused on exploring the theoretical aspects of the proposed test statistics rather than the practical implications.
In this work we propose a novel method for detecting changes in the covariance structure of moderate dimensional time series motivated by the practical challenges of implementing the approach for estimating changes in soil.In Section 2, we introduce a test statistic inspired by a distance metric intuitively defined on the space of positive definite matrices.The primary advantage of this metric is that under the null hypothesis of no change, it is independent of the underlying covariance structure.This is not the case for other methods in the literature which require users to estimate this.In Section 3, we study the asymptotic properties of this test statistic when, the dimension of the data is of comparable size to (but still smaller than) the sample size.In Section 4, we use these results to propose a new method for detecting multiple changes in the covariance structure of multivariate time series.In Section 5, we study the finite sample performance of the proposed approach on simulated datasets.Finally in Section 6, we use our method to examine how changes in the covariance structure of pixel intensities can be used to detect changes in surface water.

Two Sample Tests for the Covariance
where each i,p ∈ R p×p is full rank.Furthermore, let X n,p denote an n × p matrix defined by X n,p := (X T 1 , . . ., X T n ) T .Our primary interest in this article is to develop a testing procedure that can identify a change in the covariance structure of the data over time.For now, let us consider the case of a single changepoint.We compare a null hypothesis of the data sharing the same covariance versus an alternative setting that allows a single change at time τ .Formally we have 2) where τ is unknown.We are interested in distinguishing between the null and alternative hypothesis, and under the alternative locating the changepoint τ , when the dimension of the data p, is of comparable size to the length of the data, n.In particular we require that for all pairs n, p, the set is nonempty, where > p is a problem dependent positive constant.Note T n,p ( ) defines the set of possible candidate changepoints, while is the minimum distance between changepoints or minimum segment length.Then for each candidate changepoint t ∈ T n,p ( ), a two sample test statistic T(t) can be used to determine if the data to the left and right of the changepoint have different distributions.If the two sample test statistic for a candidate exceeds some threshold, then we say a change has occurred and an estimator for τ is given by the value t ∈ T n,p ( ) that maximizes T(t).
Let X (a+1):b denote the subset of data (X T a+1 , . . ., X T b ) T and ¯ (a, b) (or ¯ (a, b)) be a plug in estimator for the covariance (or precision) of data X (a+1):b .Then to test for a changepoint at time τ , we can measure the magnitude of the matrix are sequences of normalizing constants.If the spectral norm of this matrix is large, then there is evidence for a change and vice versa.This approach is well represented in the literature, for instance Wang, Yu, and Rinaldo (2017), Aue et al. (2009), Galeano and Peña (2007) measure the difference between sample covariance estimates, while in the high-dimensional setting, Avanesov and Buzun (2018) measures the difference between debiased graphical LASSO estimates.Although this approach is intuitive, it can be challenging to use in practice.A good estimator, ¯ (a, b), will depend on the true covariance of X p+1:q which implies that the difference matrix above is dependent on the true covariance of X 1:n .As a result, any test statistic based on a difference matrix must be a function of the underlying covariance, 0 , and should be corrected to account for this.For example, Aue et al. (2009), Galeano and Peña (2007) normalize their test statistic using the sample covariance for the whole data, Avanesov and Buzun (2018) use a bootstrap procedure which assumes knowledge of the measure of X i and Wang, Yu, and Rinaldo (2017) use a threshold which is a function of * 0 .All these approaches require estimating 0 in practice.This is impractical under the alternative setting, since estimating the segment covariances requires knowledge of the changepoint.
Therefore, it is natural to ask whether there are alternative ways of measuring the distance between covariance metrics.In the univariate setting, a common approach is to evaluate the logarithm of the ratio of the segment variances (Inclan and Tiao 1994;Chen and Gupta 1997;Killick et al. 2010).This is in contrast with the change in expectation problem where it is more common to measure the difference between sample means.In the variance setting a ratio is more appropriate for two reasons.First, since variances are strictly positive, if the underlying variance is quite small then the absolute difference between the values will also be small whereas the ratio is not affected by the scale.Second, under the null hypothesis of no change, the variances will cancel and the test statistic will be independent of the variance.Thus, there is no need to estimate the variance when calculating the threshold.
We propose to extend this ratio idea from the univariate setting to the multivariate setting by studying the multivariate ratio matrix, R(A, B) := B −1 A, where A, B ∈ R p×p .Ratio matrices are widely used in multivariate analysis to compare covariance matrices (Finn 1974).In particular, we are often interested in functions of the eigenvalues of these matrices (Wilks 1932;Lawley 1938;Potthoff and Roy 1964).Here we are interested in the following test statistic, where λ j (R(A, B)) is the jth largest eigenvalue of the matrix R(A, B).The function T has valuable properties that may not be immediately obvious.
Proposition 2.1.Let 1 , 2 ∈ R p×p , be the covariance matrices of data Z 1 ∈ R n 1 ×p and Z 2 ∈ R n 2 ×p , respectively, and define T as in (2.5).Then we have that, for any covariance matrix 0 : T is symmetric with respect to inversion of matrices that is, The symmetry property is important for a changepoint analysis as the segmentation should be the same regardless of whether the data is read forwards or backward.The second property states that T is the same whether we examine the covariance matrix or the precision matrix.This ensures that differences between both small and large eigenvalues can be detected.The third property is particularly important as we can translate Proposition 2.1 from two separate datasets Z 1 , Z 2 to two subsets of a single dataset X n,p .This implies that T provides a test statistic which is independent of the underlying covariance of the data.In particular, let ¯ (a, b) be the sample covariance estimate for a subset of data X a+1:b that is, Under the null hypothesis for any 1 0 Z a:b where the covariance of Z i is the identity matrix.Then property 3 implies that for any 1 In other words, under the null hypothesis the underlying covariance cancels out (as occurs with the ratio approach in the univariate variance setting).Furthermore, due to the square term, T is a positive definite function.This is necessary to prevent changes canceling out.These properties are clearly not unique to our chosen test statistic T and in fact, there are other possible choices (such as log 2 x).However, we argue that for an alternative function T to be appropriate in the changepoint setting, it would also require these properties.Furthermore, this choice of T allows us to analytically derive relevant quantities such as the limiting moments.
It is both possible and interesting to study the properties of this test statistic in the finite dimensional setting (i.e., where p is fixed).However in this work, our focus is on problems where the dimension of the data is of comparable size to the length of the data.Under this asymptotic setting, the eigenvalues of random matrices (and by extension the properties of T) have different limiting behavior and a proper test should take this into account.For example, the two sample Likelihood Ratio Test (as used in Galeano and Peña (2007)) is a function of the log of the determinant of the covariance, or equivalently the sum of the log eigenvalues.In the moderate dimensional setting, this test has been shown to breakdown due to the differing limiting behavior (Bai et al. 2009).Therefore in the next section, we consider the properties of T as a two sample test, and derive the asymptotic distribution under moderate dimensional asymptotics.

Random Matrix Theory
We now describe some foundational concepts in Random Matrix Theory (RMT), before discussing how these ideas are used to idenfity the asymptotic distribution of our test statistic under the null hypothesis.RMT concerns the study of matrices where each entry is a random variable.In particular, RMT is often concerned with the behavior of the eigenvalues and eigenvectors of such matrices.Interested readers should see Tao (2012) for an introduction and Anderson, Guionnet, and Zeitouni (2010) for a more thorough review.
A key object of study in the field is the Empirical Spectral Distribution (ESD), defined for a p × p matrix A as where I is an indicator function.In other words the ESD of A is a discrete uniform distribution placed on the eigenvalues of A. Several authors have established results on the limiting behavior of the ESD as the dimension tends to infinity, the so-called Limiting Spectral Distribution (LSD).For example, Wigner (1967) demonstrate that if the upper triangular entries of a Hermitian matrix A have mean zero and unit variance, then F 1/ √ pA (x) converges to the Wigner semicircular distribution.
The LSD of the ratio matrix R, was shown to exist in Yin, Bai, and Krishnaiah (1983) and computed analytically in Silverstein (1985).The following two assumptions are sufficient for the LSD of an F matrix to exist.
Assumption 3.2.The sample sizes n 1 , n 2 , and the dimension p grow to infinity such that We refer to the limiting scheme described in Assumption 3.2 as n → ∞.
Let X n 1 ,p , X n 2 ,p be matrices satisfying Assumptions 3.1 and 3.2.These assumptions place restrictions on the mean of the data, and the tails of the data.The mean assumption is standard in the literature.If the data has nonzero mean, the data should be standardized, typically by removing the sample mean (assuming the mean is constant).The impact of this on our proposed method is examined in Appendix B.3, supplementary materials.Note that although the matrices X n 1 ,p , X n 2 ,p have identity covariance, these results also hold for data with general covariance p , since by property 3 of Proposition 2.1 the covariance term cancels out under the null hypothesis and we do not require knowledge of p .Furthermore, let Then Silverstein (1985) demonstrate that F n converges almost surely to the nonrandom distribution function (3.10) The LSD, F γ provides an asymptotic centering term for functions of the eigenvalues of random ratio matrices.In particular, for any function f , we have that, as n → ∞, by the definition of weak convergence.This allows us to account for bias in the statistic as seen in Figure 1.
The rate of convergence of E F n f − E F γ (f ) to zero was studied in Zheng (2012) and found to be 1/p.In particular, the authors establish a central limit theorem for the quantity, G n (x) := p F n (x) − F γ (x) .We can apply this result to our problem in order to demonstrate that our two sample test statistic converges to a normal distribution with known mean and variance terms.
be random matrices satisfying Assumptions 3.1 and 3.2 and T be defined as in (2.5).Then we have that as n → ∞, Using Theorem 3.3, we can properly normalize T such that it can be used within a changepoint analysis.In particular, we have that under the null hypothesis weakly as n, p tend to infinity, where γ t/n := (p/t, p/(n − t)) and f * is as defined in Theorem 3.3.Thus, we use the normalized test statistic, T, which under the null hypothesis converges pointwise to a standard normal random variable.The asymptotic moments of the test statistic, T, depend on the parameter γ t/n and, as t approaches p (or equivalently n − p) the mean and variance of the test statistic dramatically increase.In the context of changepoint analysis, this implies that the mean and variance increase at the edges of the data.We note that this is a common feature of changepoint test statistics.We can significantly reduce the impact of this by the above standardization.This can be seen empirically in Figure 1.After standardization, the test statistics for the series with no change, do not appear to have any structure.Similarly, the test statistics for the series with a change show a clear peak at the changepoint.Importantly we can now easily distinguish the test statistic under the null and alternative hypotheses, and this normalization does not require knowledge of the underlying covariance structure.

Practical Considerations
Before we can apply our method to real and simulated data, we need to address three practical concerns, namely we must select a threshold for rejecting the null hypothesis, determine an appropriate minimum segment length and address the issue of multiple changepoints.

Threshold for Detecting a Change
First, we need to select an appropriate threshold for rejecting the null hypothesis.We choose to use the asymptotic distribution of the test statistic on a pointwise basis, that is for each < t < n − we say that T(t) ≈ Z t , where Z t is a standard normal variable.This do not take into account whether or not we are in the limiting regime and as a result, the method may be unreliable if p is small (indeed we observe this pattern in Section 5).We then use a Bonferroni correction (Haynes 2013) to control the probability that any Z t exceeds a threshold α.In particular, for a given significance level α, we reject the null hypothesis for a single change in data of length n if T(t) > q(1 − α/n) for some < t < n − , where q(α) is the αth quantile of the standard normal distribution.In the case of multiple changepoints, we use q(1 − 2α/n(n + 1)) to account for the extra hypothesis tests.We note that a Bonferroni correction is known to be conservative and as a result, using this approach may have poor size (again results from the simulation study validate this concern).Ideally, one would take account of the strong dependence between consecutive test statistics to get a better threshold, but this is challenging given the nonlinear nature of the test statistic.Further work may wish to investigate whether finite sample results which exploit this dependence can be derived.Alternatively practitioners could use several different thresholds and ascertain the appropriate threshold for a particular application at hand as demonstrated in Lavielle (2005).

Minimum Segment Length
The test statistic proposed relies on an appropriate choice for the minimum segment length parameter, .Too small and the covariance estimates in the small segments will elicit false detection, too large and the changepoints will not be localized enough to be useful.
In many applications, domain specific knowledge may be used to increase this parameter.However, it is also important to consider smallest value that will give reliable results in the general case.The minimum segment length must grow sufficiently fast to ensure that T(t) converges to a normal distribution.Outside the asymptotic regime, it is possible for the ratio matrix to have very large eigenvalues.Thus, for candidate changepoints t close to p (or by symmetry n − p), the probability of observing spuriously large values of T(t) becomes much larger.This can be seen in Figure 2. When = p (the smallest possible value), we observe extremely large values of the test statistic, that would make identifying a true change almost impossible.However, when = 4p, the test statistic behaves reliably.Thus, we need p/(p + n,p ) to converge to γ ∈ (0, 1) or equivalently n,p = O(p) for the asymptotic results to hold.In Appendix B.1, supplementary materials, we analyze the effect of different sequences in the finite sample setting via a simulation study.Based on these results, we recommend using a default value of max{4p, 30}, however note that as p grows to a moderate size, smaller values can be taken without any corresponding decrease in performance.

Multiple Changepoints
Finally, we also consider the extension to multiple changes.In this setting, extend the single changepoint setting considered previously to a set of m unknown ordered changepoints, i ≤ τ k+1 , 1 ≤ k ≤ m + 1, and i is the covariance matrix of the ith time-vector.We are interested in estimating the number of changes m, and the set of changepoints τ .The classic approach to this problem, is to extend a method defined for the single changepoint setting to the multiple changepoint setting, via an appropriate search method such as dynamic programming or binary segmentation.For this work, we cannot apply the dynamic programming approach (Killick, Fearnhead, and Eckley 2012), which minimizes the within segment variability through a cost function for each segment.This is because our distance metric is formulated as a two-sample test and cannot be readily expressed as cost function for a single segment.Therefore, we use the classic binary segmentation procedure (Scott and Knott 1974).The binary segmentation method extends a single changepoint test as follows.First, the test is run on the whole data.If no change is found then the algorithm terminates.
If a changepoint is found, it is added to the list of estimated changepoints and, the binary segmentation procedure is then run on the data to the left and right of the candidate change.This process continues until no more changes are found.Note the threshold, v, and the minimum segment length, , remain the same.We note that a number of extensions of the traditional binary segmentation procedure have been proposed in recent years (Olshen et al. 2004;Fryzlewicz 2014).Although we do not use these search methods in our simulations, due to additional optional parameters that affect performance, it is not difficult to incorporate our proposed test statistic into these adaptations of the original binary segmentation approach.The full proposed procedure is described in Algorithm 1.

Simulations
In this section, we compare our method with existing methods in the literature, namely the methods of Wang, Yu, and Rinaldo (2017), Aue et al. (2009), and Galeano and Peña (2007), which we refer to as the Aue, Galeano and Wang methods, respectively.We do not consider Avanesov and Buzun (2018) as this method is intended for the high-dimensional setting and so would be an unfair comparison.Software implementing these methods is not currently available and as a result, we have implemented each of these methods according to the descriptions in their respective papers.All methods, simulations, visuals and analysis have been implemented in the R programming language (R Core Team 2020).The code to repeat our experiments is available at https:// github.com/s-ryan1/Covariance_RMT_simulations.
Simulation studies in the current literature for changes in covariance structure are very limited.Wang, Yu, and Rinaldo (2017) do not include any simulations.Aue et al. (2009) and Avanesov and Buzun (2018) only consider the single changepoint setting, and do not consider random parameters for the changes.Furthermore to our knowledge, no papers compare the performance of different methods.While theoretical results are clearly important, it is also necessary to consider the finite sample performance of any estimator, and we now study the finite sample properties of our approach on simulated datasets.Further details on the general setup of our simulations are given in Appendix A, supplementary materials.Note that the significance thresholds for each method are set to be favorable to competing methods and we anticipate that performance would decrease in practical settings.
We begin by analyzing the performance of our approach (which we refer to as the Ratio method) in the single changepoint setting.This allows us to directly examine the finite sample properties of the method, such as the power and size, as well as investigate how violations to our assumptions, such as autocorrelation and heavy tailed errors impact the method.We then compare our approach with current state of the art methods for detecting multiple changepoints.Results for assessing the chosen default values for the minimum segment length parameter in Section 4, as well as a comparison of different methods in the single changepoint setting, are given in Appendix B, supplementary materials.These demonstrate that, overall, the Ratio, Galeano and Aue methods are well peaked whereas the Wang method is not leading to accurate changepoint localization.In both settings, the localization of the changepoints is more accurate for our approach than the Aue et al. (2009) method while also being applicable to larger values of p.However, the Galeano method does produce more defined peaks than the Ratio method.This results in better localization but unfortunately does not produce better overall results due to the peaks not exceeding the asymptotic threshold.This could be rectified by using a smaller (nonasymptotic) threshold.Finally, comparisons for the Ratio method based on whether or not the mean is known (under the null and alternate hypotheses) are provided in Appendix B, supplementary materials.These results show that centering the data by subtracting the sample mean has a small impact on the performance of the method.
Performance Metrics.In the single changepoint setting, we are interested in whether the Ratio approach provides a valid hypothesis test.Therefore, for a given set of simulations, we measure how often the method incorrectly rejects the null (Type 1 error) and how often the method fails to correctly reject the null (Type 2 error).Furthermore, under the alternative hypothesis, we measure the absolute difference between the estimated changepoint and the true change, and refer to this throughout as the Changepoint Error.For the multiple changepoint setting, we use τ := {τ 1 , . . ., τ m } and τ := {τ 1 , . . ., τ m} to denote the set of true changepoints and the set of estimated changepoints, respectively.We say that the changepoint τ i has been detected correctly if | τj − τ i | ≤ h for some 1 ≤ j ≤ m and denote the set of correctly estimated changes by τ c .Then we define the True Positive Rate (TPR) and False Positive Rate (FPR) as follows, A perfect method will have a TPR of 1 and FPR of 0. We set h = 20 although it should be noted that in reality the desired accuracy would be application specific and dependent on the minimum segment length l.Although, while the specific values vary with h the conclusions of the study do not.We also consider whether or not the resulting segmentation allows us to estimate the true underlying covariance matrices, and define the Mean Absolute Error (MAE) as

Finite Sample Properties
We begin by examining the performance of our proposed approach on normally distributed datasets with length n = {500, 1000, 2000, 5000} and dimension p = {10, 50, 100}.Based on results from Section 4.2, the minimum segment length was set to 4p.For each n, p pair, we generated 1000 datasets with a single change at time n/2 as follows X i ∼ N (0, I p ) for 1 ≤ i ≤ n/2 and X i ∼ δN (0, I p ) for i > n/2, (5.12) where I p is the identity matrix of dimension p and δ = {1, 1.05, 1.1, 1.15, 1.2}.Note since our approach is invariant to the covariance of the data under the null hypothesis, this is equivalent to generating the data with some unknown covariance.We use the approach described in Section 4.1 to select the threshold with α = 0.05.A histogram of the test statistic values under the null hypothesis (δ = 1) for n = 2000 and p = 10, 50 is shown in Figure 3.For p = 10, we observe a right-skewed distribution, however, this effect is not present for p = 50, indicating that we may not entered the limiting regime when p = 10.However, when we compute the FPR for each n, p pair (Table E.4, supplementary materials) we note that across all dimensions, we observe low numbers of false positives.In particular, for p = 50 and p = 100 the method is conservative.This highlights an important difference between theoretical convergence and practical performance.Although at p = 10 the theoretical limiting distribution has not been reached, the practical results show that the false positive rate is not inflated.We measure the power and accuracy of our method via the True Positive Rate (TPR) and the absolute difference between the estimated change and the true change (Changepoint Error).The TPR is given in Table E.4, supplementary materials for δ = 1.1.
As n, p increases the probability of detection increases.However, for smaller values of n/p, the method can have less power, such as n = 1000, p = 100 and n = 500, p = 50.In these cases, using less data gives a better detection rate, implying that the method inefficiently uses the data and a high dimensional approach may be preferable.Changepoint errors are given in Figure 3.As n, p, δ increase, the method more accurately locates the changepoint, as we would expect.
Serial Dependence.Our method does not allow for dependence between successive data points.To measure the impact of serial dependence, we generated data,  in the univariate changepoint literature (Shi et al. 2021) and can be mitigated by scaling the threshold by the autocorrelation observed.
Similarly the power and accuracy of our method decreases when changepoints are present.Figure 4 shows the separation between test statistic value under the null and alternative hypothesis.If these values are well separated then the method will have good power given a valid threshold.We find that the separation between the null and alternative distributions decreases and thus changepoint error increases as φ increases.These results show that as autocorrelation increases, our method becomes less accurate as expected.
Model Misspecification.Our method places assumptions on the data which may not hold in practice.To measure the impact of this we generated data 5), n = 2000, p = 50 and δ = 1, 1.1, 1.2.Note the exponential and t-distributions do not satisfy Assumption 3.1 and as a result, the threshold is invalid producing FPRs of 0.512 and 0.248, respectively.Interestingly, although the t-distribution has heavier tailed errors than the exponential distribution, it gives a lower FPR.This is likely due to the skewness of the exponential distribution.We find that the method has less power for the heavier tailed distributions (Figure 4), as again, the distributions under the null and alternative overlap more.This pattern is repeated for the changepoint error.Thus, the method will be less accurate in the presence of heavy tailed errors.Note that the test statistic can still be used for heavy tailed data, the threshold for identification of a change simply needs to be modified to account for this.This could be achieved by theoretically identifying the new limiting distribution or by using data-driven threshold methods as in Lavielle (2005).

Multiple Change Points
We now compare the Ratio approach with other methods on simulated datasets with multiple changepoints.We consider datasets with 4 changepoints, uniformly sampled with minimum segment length p log n, where p = {3, 10, 30, 100} and n = {200, 500, 1000, 2000, 5000}.Covariance matrices are generated so that the distance between consecutive covariance matrices is sufficiently large to detect a change.We consider two separate metrics, where d 1 matches the assumptions from Wang, Yu, and Rinaldo (2017), while d 2 matches the distance metric used by T. As such, the first set of simulations should favor the Wang method.Full details for how the covariances were generated are provided in Appendix A.2, supplementary materials.For each (n, p) pair and distance metric, we generated 1000 datasets and applied our method, the Aue et al. ( 2009) method, the Wang, Yu, and Rinaldo (2017) method and Galeano and Peña (2007) method to each dataset.Due to its computational complexity, we do not run the Aue method for p > 10.Using the resulting segmentations, we then calculated the error metrics for each method and report these in Tables 1 and 2. The worst performers across all metrics are the Galeano and Wang methods.Notably the true positive rate for the Wang method decreases as p grows.This is in striking contrast with the other methods which become more accurate for larger values of p as one may expect.This may be due to the fact that the Wang method only considers the first principal component of the difference matrix, ignoring the remainder of the spectrum or the bias issue identified in Appendix B.2, supplementary materials.The high false positive rate indicates that adapting the threshold would not lead to improved results.The Galeano performance is expected due to the threshold issue highlighted in the single changepoint section (Appendix B.2, supplementary materials) being exacerbated by the smaller sample sizes between changepoints.What is not expected is the larger increase in FPR relative to TPR.This indicates that the test statistic may struggle to maintain its clear peaks when multiple changepoints are present.The Ratio and Aue methods are more closely matched.We can see that the Ratio method is the more conservative of the two, producing a lower FPR and corresponding lower TPR when p and n are smaller.This is unsurprising since our results in the single changepoint case found that the Ratio method can be less reliable when p < 10.For scenarios with p > 30, the Ratio approach is extremely accurate.This indicates that for problems with smaller dimensions, the Aue method may be preferable, while the approach presented here is suitable for larger dimensions.

Detecting Changes in Moisture Levels in Soil
In this section, we investigate whether changes in the covariance structure of soil data correspond with shifts in the amount of moisture on the soil.There is significant interest in developing new techniques to better understand when water is present, how water is absorbed and the general modeling of surface water (Cândido et al. 2020;Burak et al. 2021).This is an important question and is relevant to a variety of industrial applications such as farming, construction and the oil and gas industry (Hillel 2003).A widely used approach is to place probes at different depths and locations in the soil which measure the level of moisture.To measure across a site more easily, scientists are investigating the use of cameras to capture the soil surface as a surrogate for moisture.
We analyze images from a preliminary experiment studying moisture on the surface of the soil provided privately to the authors by Anette Eltner from TU Dresden following discussions around modeling challenges for soil erosion with the Lancaster Environment Centre.A camera was placed over a large plot of soil and took a set of 589 pictures over the experiment.The images are taken at 10 sec intervals of a total of just over 98 min.At different times, different amounts of rainfall are artificially added to the field site and the amount of water on the soil surface changes (Eltner et al. 2017).We wish to segment the data into periods of dry, wet and surface water.The intensity of a set of pixels over time is shown in Figure 5 .We can see that the mean level is clearly nonstationary.This nonstationary behavior may be attributed to two causes, changes in the background light intensity (due to a cloud passing by) and changes in the wetness of the soil which changes how much light is reflected.Since changes in the mean intensity are not necessarily associated with changes in the wetness, we instead focus on changes in the covariance structure.If pixels become wet we would expect that the correlation between the pixels should increase as they become more alike as the surface becomes uniformly water instead of the variable soil surface.Thus, changes in the covariance structure of the pixels may correspond with changes in the wetness.
The data consists of 589 images and we analyze two subsets with p = 10, 30 pixels within a target area of the image.The original images are in color but were transferred to grayscale for computational purposes.We run a multiple changepoint   49, 77, 244, 347, 452, 493, 532,562 64, 125, 184, 255, 340, 450, 527 NOTE: The dimension of the larger subset means the Aue method can detect at most one changepoint.
analysis on the smaller subset using our approach as well as the Aue and Wang methods.We also ran a multiple changepoint analysis on the larger subset.
In order to analyze the covariance structure of the data, we first need to transform the data to have stationary mean.Estimating the mean of this series is challenging as there is stochastic volatility and the smoothness of the function appears to change over time.As a result, standard smoothing methods such as LOESS and windowed mean estimators may be inappropriate.We use a Bayesian Trend Filter with Dynamic Shrinkage (Kowal, Matteson, and Ruppert 2019), implemented in the DSP R package (Kowal 2020), which is robust to these issues.We then take the residuals.The transformed data for a subset of the pixels can be seen in Figure 5.The minimum segment length is set to 25. Considering the results in Figure B.1, choosing a minimum segment length of 25 will increase the false positive rate under no changepoints.We need to balance this against the need to detect potentially short-lived periods of change.The thresholds for significance for each method were again set to the defaults as discussed in the previous section.The results of this analysis are shown in Table 3.
To validate our results we worked with scientists studying this data and identified three clear time points where there is a substantial change in the amount of water on the surface at the relevant pixels.The first change is somewhat gradual going from very dry at time t = 64 to very wet from time t = 76.The second and third changes are more abrupt, with a substantial increase in the amount of water at time t = 350 and a corresponding sharp decrease at time t = 450.The Aue method reports 7 changepoints, the Wang method reports 5 changepoints and our method locates 8 changepoints.All methods detect the first and last changes.However, the Wang method does not detect any change near the second anticipated changepoint.All of the methods appear to overfit changepoints, in the sense that they report changes that do not correspond with clear changes in the amount of water on the surface.For our method and the Aue method, the majority of these overfitted changes occur when the soil is dry (before t = 64 and after t = 450).During these periods the scientists were moving around the site increasing the variability in the amount of light exposure from image to image which may explain these nuisance changes.However, we also acknowledge that, for the Ratio method, this could be due to the modified segment length increasing the false alarm rate.
For the larger dimension dataset, the minimum segment length was set to 60 (2p) and the thresholds were set to their defaults.As p is of a moderate size, the 4p minimum segment length is quite conservative (Figure B.1) so we can reduce the minimum segment length without increasing the false positive rate.For p = 30, a minimum segment length of 2p gives a null false positive rate around 0.05.The results were broadly similar for our method and quite different for the Wang method.Our approach reports 6 changes again detecting the three obvious changes in the video.We note that the reduced number of changepoints is primarily due to the increased minimum segment length.The Wang method only reports a single changepoint.This drop in reported changes is caused by the largest eigenvalue of the sample covariance being much larger.As a result, the threshold for detecting a change is 3.5 times larger to account for this and consequently, it appears that the method loses power.
On a practical note, when reducing the minimum segment length one needs to be careful on interpretation of changepoints that are closer than 4p.If the changes in covariance structure are small then these could be false alarms due to the variance of the covariance estimate for small samples rather than true changepoints.Practitioners can use Figure B.1 to guide decisions around minimum segment lengths and the risk of false alarms.

Conclusion
In this work, we have presented a novel test statistic for detecting changes in the covariance structure of moderate dimensional data.This geometrically inspired test statistic has a number of desirable properties that are not features of competitor methods.Most notably our approach does not require knowledge of the underlying covariance structure.We use results from Random Matrix Theory to derive a limiting distribution for our test statistic.The proposed method outperforms other methods on simulated datasets, in terms of both accuracy in detecting changes and estimation of the underlying covariance model.We then use our method to analyze changes in the amount of surface water on a plot of soil.We find that our approach is able to detect changes in this dataset that are visible to the eye and locates a number of other changes.It is not clear whether these changes correspond to true changes in the surface water and we are investigating this further.
While our method has a number of advantages, it is important to recognize some limitations.First, our method requires calculating the inverse of a matrix at each time point, which is a computationally and memory intensive operation.As a result, our approach is challenging for larger datasets that can be considered by other methods, which only require the first principle component.This could be mitigated by novel solutions to solving inverse matrices alongside GPU computation.However, as we demonstrate through simulations, there are a wide range of settings where our method produces considerable better results for a marginal increase in computational time.Finally we note that a limitation of our method is that the minimum segment length is bounded below by the dimension of the data.This means that the method cannot be applied to tall datasets (p > n) or datasets with short segments.It may be that our method could be adapted to these settings by applying recent advances in high dimensional covariance estimation, and replacing the empirical estimate of the covariance matrix with these.

Figure 4 .
Figure 4. Clockwise from Top Left: Test statistic values for AR(1) data with δ = 1, 1.1, dashed line indicates threshold.Changepoint Error for AR(1) data (δ = 1.1).Changepoint Error for different error distributions.Test values for different error distributions, dashed line indicates threshold.As autocorrelation and probability of outliers increases the method becomes less accurate.

Figure 5 .
Figure 5. (a) Raw grayscale intensities for three pixels.(b) Standardized intensities for the same three pixels.

Table 1 .
Results (to 2dp) from multiple changepoint simulations based on Ratio constraints described in Appendix A.2, supplementary materials.For smaller values of p, the Ratio provides lower FPR and lower TPR than the Aue method.However, for larger values of p the Ratio method is the top performer, indicating a tradeoff between the methods.Confidence intervals for mean values are provided in TableE.2,supplementary materials.The bold values highlight the best performer in each comparison.

Table 2 .
Wang, Yu, and Rinaldo (2018)point simulations based on assumptions inWang, Yu, and Rinaldo (2018)described in Appendix A.2, supplementary materials.For smaller values of p, the Ratio provides lower FPR and lower TPR than the Aue method.However, for larger values of p the Ratio method is the top performer, indicating a tradeoff between the methods.Confidence intervals for mean values are provided in TableE.3,supplementary materials.The bold values highlight the best performer in each comparison.

Table 3 .
Detected changepoints for each of the three methods when applied to the soil image data.