Bounds for the Distance to the Nearest Correlation Matrix

In a wide range of practical problems correlation matrices are formed in such a way that, while symmetry and a unit diagonal are ensured, they may lack semidefiniteness. We derive a variety of new upper bounds for the distance from an arbitrary symmetric matrix to the nearest correlation matrix. The bounds are of two main classes: those based on the eigensystem and those based on a modified Cholesky factorization. Bounds from both classes have a computational cost of O(n^3) flops for a matrix of order n but are much less expensive to evaluate than the nearest correlation matrix itself. For unit diagonal A with |a_ij| ≤ 1 for all i ≠ j the eigensystem bounds are shown to overestimate the distance by a factor at most 1 + n√n. We show that for a collection of matrices from the literature and from practical applications the eigensystem-based bounds are often good order of magnitude estimates of the actual distance; indeed the best upper bound is never more than a factor 5 larger than a related lower bound. The modified Cholesky bounds are less sharp but also less expensive, and they provide an efficient way to test for definiteness of the putative correlation matrix. Both classes of bounds enable a user to identify an invalid correlation matrix relatively cheaply and to decide whether to revisit its construction or to compute a replacement, such as the nearest correlation matrix.


1. Introduction.
In many applications involving statistical modeling the first step in the analysis is to compute the sample correlation matrix, a real symmetric positive semidefinite matrix with unit diagonal, from empirical or experimental data [28, p. 25]. Because of missing observations, indefinite approximations to the sample correlation matrix frequently arise [19, sec. 2.2]. A lack of definiteness is also encountered in several other contexts. Stress testing in finance requires certain elements of a valid correlation matrix to be replaced by new values, which may result in the new matrix becoming indefinite [11], [23]. The sensitivity of sample correlation matrices to outliers in the data has led to the development of robust estimators. Devlin, Gnanadesikan, and Kettenring [9] propose several possibilities and note that some methods that compute the estimator in an elementwise manner can produce matrices with negative eigenvalues.
A further example of how indefiniteness can arise is in aggregation methods used in large-scale resource assessment, for example in geology [5] or finance [1]. These methods combine reliable estimates of correlation matrices for each group, say a geographical region or a market portfolio, into a global correlation matrix. The combination is achieved either by embedding small, "within group" correlation matrices as diagonal blocks into a crudely estimated global correlation matrix, or by constructing a block-diagonal matrix from the individual group correlation matrices and filling out the off-diagonal blocks by assigning the "between group" correlation coefficients according to expert judgement. Again, there is no guarantee that the newly constructed matrix is in fact positive semidefinite.
A popular approach to correcting an indefinite approximation A ∈ R^{n×n} to a correlation matrix is to replace A by the nearest correlation matrix in the Frobenius norm, that is, by

(1.1) ncm(A) = argmin{ ‖A − X‖_F : X is a correlation matrix },

where ‖A‖_F^2 = Σ_{i,j} a_{ij}^2. The first method with guaranteed convergence for solving (1.1) was the alternating projections method proposed by Higham [15, Alg. 3.3], which iteratively projects onto the set U_n of matrices with unit diagonal and onto the convex cone S_n of symmetric positive semidefinite matrices, supplementing the projection onto S_n with a correction due to Dykstra [10]. The rate of convergence of this method is at best linear, but we have recently shown that the convergence can be accelerated significantly using Anderson acceleration [16]. The alternating projections method remains widely used in applications (see, for example, the references in [16], [17]), though a faster, globally quadratically convergent Newton algorithm was later developed by Qi and Sun [24], to which practical improvements were made by Borsdorf and Higham [6]. The algorithm of Borsdorf and Higham requires an eigendecomposition of a symmetric matrix on each iteration and so it costs at least 10n^3/3 flops per iteration. The Newton algorithm typically needs about 7 iterations, so the total cost of computing ncm(A) is at least 70n^3/3 flops.
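The alternating projections scheme just described is simple to sketch in code. The following Python/NumPy function is an illustrative reimplementation of the idea, not Higham's Algorithm 3.3 verbatim; the function names and the stopping test are our own choices.

```python
import numpy as np

def proj_unit_diag(X):
    # Projection onto U_n, the set of matrices with unit diagonal.
    Y = X.copy()
    np.fill_diagonal(Y, 1.0)
    return Y

def proj_psd(X):
    # Projection onto S_n, the symmetric positive semidefinite cone.
    w, Q = np.linalg.eigh((X + X.T) / 2)
    return (Q * np.maximum(w, 0)) @ Q.T

def ncm_alternating_projections(A, tol=1e-8, maxit=1000):
    # Alternating projections with Dykstra's correction applied to the
    # projection onto the (non-affine) cone S_n; convergence is at best linear.
    Y = np.array(A, dtype=float)
    dS = np.zeros_like(Y)
    for _ in range(maxit):
        R = Y - dS
        X = proj_psd(R)
        dS = X - R            # Dykstra's correction for the next sweep
        Y_new = proj_unit_diag(X)
        if np.linalg.norm(Y_new - Y, 'fro') <= tol * max(1.0, np.linalg.norm(Y, 'fro')):
            return Y_new
        Y = Y_new
    return Y
```

As the surrounding text notes, the iterates are not themselves correlation matrices: the returned matrix has an exactly unit diagonal but may fail semidefiniteness at the level of the convergence tolerance.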
While methods for computing the nearest correlation matrix to a given symmetric matrix A are well developed, little attention has been given to estimating the distance d_corr(A) = ‖A − ncm(A)‖_F without computing ncm(A). Importantly, the iterates produced by the alternating projections method are not themselves correlation matrices: with P denoting projection, the matrix P_{U_n}(P_{S_n}(X)) might be indefinite and the matrix P_{S_n}(P_{U_n}(X)) might not have an exactly unit diagonal. For the Newton method, the iterates do not satisfy the constraint of having a unit diagonal, as discussed in [6, sec. 3.4]. Hence for both methods the iterates do not provide upper bounds on d_corr(A). As the Newton method solves the dual problem of (1.1) [24], on each iteration the value of the dual function provides a lower bound for d_corr(A) [20].
Our goal is to obtain bounds on d_corr(A) that are inexpensive to compute and are of the correct order of magnitude. Indeed, bounds correct to within a small constant factor are entirely adequate for practical applications. In this work we summarize the few existing bounds for d_corr(A) and derive several new upper bounds. While the best bounds have a computational cost of O(n^3) flops, they are significantly less expensive to compute than ncm(A) itself.
Bounds on d_corr(A) that can be easily evaluated using standard computational tools will certainly be of interest to practitioners, as illustrated by the discovery by Xu and Evers [35] that several matrices thought to be correlation matrices in the work of Tyagi and Das [33] actually had some negative eigenvalues. While attempting to compute the Cholesky factorization is sufficient to determine whether a matrix is positive semidefinite, we propose using a modified Cholesky factorization instead. The standard and modified Cholesky factorizations have the same computational cost, but for an indefinite matrix the modified Cholesky factorization provides additional information that can be used to construct an upper bound on d_corr(A); this bound can help the user to decide whether to revisit the construction of the matrix, perhaps by acquiring more data or by refining the statistical analysis. In our experiments, the best modified Cholesky bound is at most two orders of magnitude larger than d_corr(A).
Sharper bounds are available based on spectral information. We present several bounds based only on knowledge of the eigenvalues of A, but the best bound in this class, which in our experiments is at most one order of magnitude larger than d_corr(A), uses the nearest positive semidefinite matrix to A, and so knowledge of eigenvectors is also required.
The paper is organized as follows. In section 2 we summarize existing upper and lower bounds on the distance to the nearest correlation matrix. In section 3 we derive our new upper bounds and give a result bounding the overestimation by a factor that does not exceed 1 + n√n. We analyze the computational cost of the bounds in section 4. In section 5 we illustrate the quality of the bounds on several small examples of invalid correlation matrices found in the literature and on larger, real-life matrices from practical applications. Conclusions are given in section 6.

2. Existing bounds.
We first summarize currently available bounds for the distance to the nearest correlation matrix. We will need the following result on the nearest positive semidefinite matrix, found in, for example, [14].

Lemma 2.1. Let A ∈ R^{n×n} be symmetric with spectral decomposition A = QΛQ^T, where Q = [q_1, . . ., q_n] is orthogonal and Λ = diag(λ_i). Then the unique solution to min{ ‖A − X‖_F : X is symmetric positive semidefinite } is

(2.1) A_+ = Q diag(max(λ_i, 0)) Q^T.

The next result is [15, Lem. 1.1].
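Lemma 2.1 translates directly into a few lines of NumPy. The helper below is an illustrative sketch (our own function name), with A_+ computed from a full eigendecomposition as in (2.1); the distance ‖A − A_+‖_F it induces is the quantity α_2 used repeatedly below.

```python
import numpy as np

def nearest_psd(A):
    # Lemma 2.1: A_+ = Q diag(max(lambda_i, 0)) Q^T is the nearest symmetric
    # positive semidefinite matrix to A in the Frobenius norm.
    w, Q = np.linalg.eigh((A + A.T) / 2)   # symmetrize to guard against rounding
    return (Q * np.maximum(w, 0)) @ Q.T
```

Note that ‖A − A_+‖_F equals the square root of the sum of the squared negative eigenvalues of A, so the distance itself needs no eigenvectors.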
Lemma 2.2 (Higham). For symmetric A ∈ R^{n×n} with eigenvalues λ_1 ≥ ··· ≥ λ_n,

max(α_1, α_2) ≤ d_corr(A) ≤ min(β_1, β_2, β_3),

where

(2.2) α_1 = ( Σ_{|a_ij|>1, i≠j} (|a_ij| − 1)^2 )^{1/2},
(2.3) α_2 = ‖A − A_+‖_F = ( Σ_i min(λ_i, 0)^2 )^{1/2},
(2.4) β_1 = ‖A − I‖_F,
(2.5) β_2 = min{ ‖A − zz^T‖_F : z ∈ {−1, 1}^n },
(2.6) β_3 = min{ ‖A − T(ρ)‖_F : −1 ≤ ρ ≤ 1 }, with T(ρ) the matrix with (i, j) element ρ^{|i−j|}.

The lower bound α_1 follows from the fact that the elements of a correlation matrix are bounded in modulus by 1. The equivalence of the two formulas for α_2 is shown by Lemma 2.1. The upper bounds in Lemma 2.2 are obtained as the distance to certain classes of correlation matrices. In particular, β_3 arises from the matrices with (i, j) element ρ^{|i−j|}, known as Kac-Murdock-Szegő Toeplitz matrices [31], which are positive semidefinite for −1 ≤ ρ ≤ 1.
© 2016 SIAM. Published by SIAM under the terms of the Creative Commons 4.0 license.
Travaglia [30, Prop. 3.1] obtained a further lower bound on d_corr(A) using the circulant mean A_c, defined as the circulant matrix with first row (c_0, c_1, . . ., c_{n−1}), where c_k is the average of the elements of A along its kth cyclic (wrap-around) diagonal. This lower bound and a trivial upper bound are combined in the next result.

Lemma 2.3. For symmetric A ∈ R^{n×n} with circulant mean A_c,

(2.7) d_corr(A_c) ≤ d_corr(A) ≤ d_corr(A_c) + ‖A − A_c‖_F.
3. New bounds. In this section we derive new upper bounds on the distance to the nearest correlation matrix that do not require the solution of a minimization problem, unlike the bounds β_2 and β_3 from Lemma 2.2 and the upper bound in Lemma 2.3. Our first bound is the distance to the correlation matrix obtained by scaling A_+ from (2.1) to have unit diagonal.
Theorem 3.1. Let A ∈ R^{n×n} be symmetric with positive diagonal elements. Then

(3.1) α_2 ≤ d_corr(A) ≤ ‖A − Â_+‖_F, where Â_+ = D^{−1/2} A_+ D^{−1/2} and D = diag((A_+)_{ii}).

Proof. The lower bound is α_2 in (2.3). The upper bound is immediate if we can show that Â_+ is a correlation matrix. The only question is whether it is defined, that is, whether the positive semidefinite matrix A_+ has positive diagonal elements, so that D is nonsingular and positive definite. From Lemma 2.1 we see that A_+ − A is positive semidefinite, and it follows that the diagonal elements of A_+ are at least as large as the corresponding diagonal elements of A; hence they are positive.
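In code, the upper bound of Theorem 3.1 amounts to scaling the nearest positive semidefinite matrix to unit diagonal and measuring the distance from A. The sketch below uses our own function name and assumes the scaling D^{−1/2} A_+ D^{−1/2} with D the diagonal of A_+, as we read the theorem.

```python
import numpy as np

def upper_bound_31(A):
    # Upper bound (3.1): distance from A to the correlation matrix obtained by
    # scaling A_+ to unit diagonal (assumes diag(A) > 0, as in Theorem 3.1).
    w, Q = np.linalg.eigh((A + A.T) / 2)
    A_plus = (Q * np.maximum(w, 0)) @ Q.T
    d = np.sqrt(np.diag(A_plus))        # positive, by the proof of Theorem 3.1
    A_hat = A_plus / np.outer(d, d)     # D^{-1/2} A_+ D^{-1/2}
    return np.linalg.norm(A - A_hat, 'fro')
```

For the 3-by-3 example of Higham [15] the bound evaluates to about 0.54, close to the true distance of roughly 0.53.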
In the next result we obtain an alternative upper bound that, while weaker than that in (3.1), is less expensive to compute, as we explain in section 4. Note that the theorem is valid for t = n, that is, for a positive semidefinite matrix A.

Theorem 3.2. Let A ∈ R^{n×n} be symmetric with positive diagonal elements and eigenvalues λ_1 ≥ ··· ≥ λ_n, of which the first t are nonnegative, and let m = min_i a_ii and M = max_i a_ii. Then

(3.2) α_2 ≤ d_corr(A) ≤ α_2 + max(1/m − 1, 1 − 1/(M − min(λ_n, 0))) ( Σ_{i=1}^t λ_i^2 )^{1/2}.

Proof. The lower bound is (2.3). By Theorem 3.1 and the triangle inequality we have

(3.3) d_corr(A) ≤ ‖A − Â_+‖_F ≤ ‖A − A_+‖_F + ‖A_+ − Â_+‖_F = α_2 + ‖A_+ − Â_+‖_F.

Write d_i = (A_+)_{ii}. With e_i the ith unit vector and Λ_− = diag(min(λ_i, 0)), we have d_i = a_ii − δ_i, where δ_i = e_i^T QΛ_−Q^T e_i. With y = Q^T e_i, we have ‖y‖_2 = 1 and δ_i = y^TΛ_−y ≤ 0, and we can bound δ_i below by δ_i ≥ min(λ_n, 0). Note that we must write min(λ_n, 0) rather than λ_n for this inequality to hold, since λ_n could be positive. It follows that m ≤ d_i ≤ M − min(λ_n, 0) for every i and so |1 − (d_i d_j)^{−1/2}| ≤ max(1/m − 1, 1 − 1/(M − min(λ_n, 0))). Since (A_+ − Â_+)_{ij} = (1 − (d_i d_j)^{−1/2})(A_+)_{ij} and ‖A_+‖_F = ( Σ_{i=1}^t λ_i^2 )^{1/2}, the upper bound in (3.2) then follows from (3.3).
For t = n, Theorem 3.2 yields the following corollary, which quantifies the effect on the distance d_corr(A) of the departure of the diagonal elements of a positive semidefinite matrix A from 1.
Corollary 3.3. Let A ∈ R^{n×n} be symmetric positive semidefinite with positive diagonal elements. Then

(3.4) d_corr(A) ≤ max(1/m − 1, 1 − 1/M) ‖A‖_F, where m = min_i a_ii and M = max_i a_ii.

If all the diagonal elements of a positive semidefinite matrix A are at most 1 then ncm(A) is easy to compute directly, as it is obtained from A by replacing each diagonal element by 1. However, (3.4) applies more generally.
In many applications the invalid approximation to a correlation matrix has unit diagonal and at least one negative eigenvalue. In this case Theorem 3.2 simplifies as follows.
Corollary 3.4. Let A ∈ R^{n×n} be symmetric with unit diagonal and eigenvalues λ_1 ≥ ··· ≥ λ_n, of which the first t are nonnegative, where λ_n < 0. Then

(3.5) α_2 ≤ d_corr(A) ≤ α_2 + (−λ_n/(1 − λ_n)) ( Σ_{i=1}^t λ_i^2 )^{1/2}.

The next result gives a sharper upper bound than (3.5). The proof uses the idea of shrinking from Higham, Strabić, and Šego [17].
Theorem 3.5. Let A ∈ R^{n×n} be symmetric with unit diagonal and smallest eigenvalue λ_n < 0. Then

(3.6) d_corr(A) ≤ (λ_n/(λ_n − 1)) ‖A − I‖_F,

and this bound is no larger than the upper bound in (3.5).
Proof. Let S(α) = αI + (1 − α)A for α ∈ [0, 1]. The smallest eigenvalue of S(α) is α + (1 − α)λ_n, so the smallest α for which S(α) is positive semidefinite is α_* = λ_n/(λ_n − 1) ∈ (0, 1). Since S(α_*) is then a correlation matrix, d_corr(A) ≤ ‖A − S(α_*)‖_F = α_* ‖A − I‖_F, which is (3.6).

Now we compare the bound with (3.5). The triangle inequality gives ‖A − I‖_F ≤ ‖A − A_+‖_F + ‖A_+ − I‖_F = α_2 + ‖A_+ − I‖_F, where A_+ from (2.1) is the nearest positive semidefinite matrix to A. For the second term, we have ‖A_+ − I‖_F^2 = ‖A_+‖_F^2 − 2 trace(A_+) + n. We noted in the proof of Theorem 3.1 that A_+ − A is positive semidefinite, and since A has unit diagonal it follows that trace(A_+) ≥ n. Therefore −2 trace(A_+) + n ≤ −n < 0 and so ‖A_+ − I‖_F < ‖A_+‖_F. Hence, since α_* ≤ 1 and α_* = −λ_n/(1 − λ_n),

α_* ‖A − I‖_F ≤ α_* α_2 + α_* ‖A_+ − I‖_F ≤ α_2 + (−λ_n/(1 − λ_n)) ‖A_+‖_F.

This completes the proof, since the right-hand side of the latter inequality is the upper bound in (3.5).
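Taking the shrinking bound in the form α_* ‖A − I‖_F with α_* = λ_n/(λ_n − 1), the smallest shrinking parameter making αI + (1 − α)A positive semidefinite, it can be evaluated from the smallest eigenvalue alone. An illustrative sketch, with our own function name:

```python
import numpy as np

def shrinking_bound(A):
    # Bound (3.6): alpha_* = lambda_n/(lambda_n - 1) is the smallest alpha for
    # which alpha*I + (1 - alpha)*A is psd when A has unit diagonal.
    lam_n = np.linalg.eigvalsh(A).min()
    if lam_n >= 0:
        return 0.0                      # A is already positive semidefinite
    alpha = lam_n / (lam_n - 1.0)
    return alpha * np.linalg.norm(A - np.eye(len(A)), 'fro')
```

Since 0 < α_* < 1, the bound is always smaller than ‖A − I‖_F, the distance to the identity.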
Note that the bound (3.6) is also sharper than β_1 given in (2.4). We now have several upper bounds and a natural question is "how sharp are they?" For the most practically important case of A with unit diagonal, the next result gives a limit on the overestimation for the upper bound of Theorem 3.1.
Theorem 3.6. Let A ∈ R^{n×n} be symmetric with unit diagonal, t nonnegative eigenvalues, largest eigenvalue λ_1, and smallest eigenvalue λ_n < 0. Then, in the notation of Theorem 3.1,

(3.8) ‖A − Â_+‖_F / α_2 ≤ 1 + t^{1/2} λ_1 / (1 − λ_n),

and, if in addition |a_ij| ≤ 1 for i ≠ j,

(3.9) ‖A − Â_+‖_F / α_2 ≤ 1 + n√n.

Proof. Using the triangle inequality and the relation A = A_+ + A_−, where A_− = QΛ_−Q^T, we have

(3.10) ‖A − Â_+‖_F ≤ ‖A_−‖_F + ‖A_+ − Â_+‖_F = α_2 + ‖A_+ − Â_+‖_F.

As in the proof of Theorem 3.2 (with m = M = 1), we have ‖A_+ − Â_+‖_F ≤ (−λ_n/(1 − λ_n)) ‖A_+‖_F. Since ‖A_+‖_F ≤ t^{1/2} λ_1 and α_2 ≥ −λ_n, it follows that (noting that the assumptions of the theorem imply λ_1 > 0, because trace(A) = n > 0)

‖A_+ − Â_+‖_F / α_2 ≤ t^{1/2} λ_1 / (1 − λ_n).

Substituting this bound into (3.10) yields (3.8). The bound (3.9) follows because λ_1 ≤ n for any matrix with elements bounded in modulus by 1, along with t ≤ n and 1 − λ_n ≥ 1. Since α_2 ≤ d_corr(A) ≤ ‖A − Â_+‖_F, the ratios in (3.8) and (3.9) bound the factor by which the upper bound of Theorem 3.1 can overestimate d_corr(A).
It is easy to see that the upper bound (3.8) also holds for the ratio of the upper and lower bounds from Corollary 3.4, and hence also for the ratio of the shrinking bound (3.6) to (2.3), by Theorem 3.5. Another way to obtain an upper bound on the distance to the nearest correlation matrix is to modify Theorem 3.1 by replacing the nearest positive semidefinite matrix A_+ by a more cheaply computable approximation to A_+. To construct such an approximation we will use modified Cholesky factorizations, originally developed in the context of nonlinear optimization as a way of dealing with indefinite Hessians. Given a symmetric, possibly indefinite matrix A these algorithms construct a factorization

(3.11) P^T(A + E)P = LDL^T,

where P is a permutation matrix, L is unit lower triangular, and E and A + E are positive semidefinite. The algorithms of Gill, Murray, and Wright [12, sec. 4.4.2.2] and Schnabel and Eskow [26], [27] produce a diagonal D and a diagonal E, while the algorithm of Cheng and Higham [8] produces a block diagonal D with diagonal blocks of order 1 or 2 and an E that is generally full. The cost of all three algorithms is the same as the cost of computing the Cholesky factorization to highest order terms, which is substantially less than the cost of computing A_+. Note that bounds based on modified Cholesky factorizations provide an efficient way to determine whether the matrix is positive semidefinite to start with, as in this case E = 0.
Theorem 3.7. Let A ∈ R^{n×n} be symmetric with positive diagonal elements. Then

(3.12) d_corr(A) ≤ ‖A − Â_mc‖_F, where Â_mc = D^{−1/2} A_mc D^{−1/2} is a correlation matrix with A_mc = A + E from the modified Cholesky factorization (3.11) and D = diag(A_mc).

Proof. The proof is essentially the same as that of the upper bound in Theorem 3.1 and uses the fact that the diagonal elements of A_mc are at least as large as those of A.
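The published GMW, SE, and CH algorithms are beyond the scope of a short sketch, but the shape of the bound (3.12) can be illustrated with a deliberately naive modified Cholesky: an unpivoted LDL^T factorization that raises any pivot below a threshold delta to delta and records the boost in a diagonal E. Without the pivoting and growth control of the published algorithms, E can be far larger than necessary, so this is a toy stand-in for [8], [12], [26], [27], not a replacement; the function names are ours.

```python
import numpy as np

def modified_cholesky_naive(A, delta=1e-8):
    # Unpivoted LDL^T with diagonal boosting: A + diag(e) = L D L^T holds
    # exactly, with D = diag(d) positive definite, so A + E is psd.
    A = np.array(A, dtype=float)
    n = len(A)
    L = np.eye(n)
    d = np.zeros(n)
    e = np.zeros(n)
    for k in range(n):
        dk = A[k, k] - (L[k, :k] ** 2) @ d[:k]
        if dk < delta:
            e[k] = delta - dk           # record the boost in E's diagonal
            dk = delta
        d[k] = dk
        for i in range(k + 1, n):
            L[i, k] = (A[i, k] - (L[i, :k] * L[k, :k]) @ d[:k]) / dk
    return L, d, np.diag(e)

def mod_chol_bound(A):
    # Upper bound in the spirit of (3.12): A_mc = A + E is psd, so scaling it
    # to unit diagonal gives a correlation matrix within reach of A.
    L, d, E = modified_cholesky_naive(A)
    A_mc = A + E
    s = np.sqrt(np.diag(A_mc))
    return np.linalg.norm(A - A_mc / np.outer(s, s), 'fro')
```

If A is positive definite with pivots above delta, the factorization returns E = 0, giving the cheap definiteness test mentioned above.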
As a final new upper bound on d_corr(A) we make use of one of the rare explicitly known solutions to the nearest correlation matrix problem, for the so-called one-parameter model. Here, a matrix C(w) ∈ R^{n×n} is defined for a real parameter w as a unit diagonal matrix with all off-diagonal elements equal to w:

C(w) = (1 − w)I + w ee^T, where e is the vector of ones.
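For reference, the eigenvalues of C(w) are 1 + (n − 1)w (once) and 1 − w (with multiplicity n − 1), so C(w) is a correlation matrix precisely when −1/(n − 1) ≤ w ≤ 1. A small NumPy sketch, with our own function name:

```python
import numpy as np

def one_parameter_matrix(n, w):
    # C(w) = (1 - w) I + w e e^T: unit diagonal, all off-diagonals equal to w.
    # Eigenvalues: 1 + (n - 1) w once and 1 - w with multiplicity n - 1, so
    # C(w) is a correlation matrix iff -1/(n - 1) <= w <= 1.
    return (1.0 - w) * np.eye(n) + w * np.ones((n, n))
```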

4. Computing the bounds.
The main criteria for judging a bound are its cost and its accuracy. In this section we discuss the cost of the bounds presented above, and in the next section we carry out numerical experiments to test their accuracy.
Table 4.1 summarizes the bounds, their applicability, and their cost. We will comment only on the nontrivial entries in the table.
We can evaluate the lower bound α_2 in (2.3) and the upper bounds in (3.2) and (3.5) without computing A_+ explicitly, but rather by computing all the positive eigenvalues or all the negative eigenvalues, whichever are fewer in number, and then using Σ_{λ_i<0} λ_i^2 = ‖A‖_F^2 − Σ_{λ_i>0} λ_i^2. We can assume t ≥ n/2 without loss of generality and therefore we compute the n − t negative eigenvalues by tridiagonalizing A at a cost of 4n^3/3 flops [13, p. 459] and then computing the n − t negative eigenvalues of the tridiagonal matrix by the bisection method at a cost of O(n(n − t)) flops [3, p. 50], which makes the total cost for the bounds α_2, (3.2), and (3.5) at most 4n^3/3 flops. The cost of (3.6) is the same.
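With LAPACK-backed libraries this "fewer of the two sets of eigenvalues" strategy corresponds to a subset eigenvalue computation. An illustrative SciPy sketch (scipy.linalg.eigh with the subset_by_value option, which tridiagonalizes and then extracts just the requested eigenvalues); the function name is ours:

```python
import numpy as np
from scipy.linalg import eigh

def alpha2_partial(A):
    # alpha2 = (sum of squared negative eigenvalues)^{1/2}, computed from the
    # negative eigenvalues only, without forming A_+.
    neg = eigh((A + A.T) / 2, eigvals_only=True,
               subset_by_value=(-np.inf, 0.0))
    return np.sqrt(np.sum(neg ** 2))
```

When there are fewer positive than negative eigenvalues one would instead request the positive ones and use the identity above with ‖A‖_F^2.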
As noted in [15], computing the upper bound β_2 from Lemma 2.2 is equivalent to maximizing z^T A z over all vectors z with elements ±1, which is an NP-hard problem [25]. For a matrix A of order n there are 2^n positive semidefinite matrices zz^T for such z, which makes an exhaustive search algorithm infeasible unless n is very small.
A formula for the distance d_corr(A_c) in Lemma 2.3 is given in [30, Thm. 4.1]. However, it requires not only all the eigenvalues but also their multiplicities, which are not reliably computable in floating point arithmetic. We therefore have to regard d_corr(A_c) as no more easily computable in general than d_corr(A), and so the bounds of Lemma 2.3 are of limited interest.
Next we turn to the bound (3.1), which requires Â_+, and hence A_+. Recall that we order the eigenvalues λ_1 ≥ ··· ≥ λ_n and assume without loss of generality that t ≥ n/2, so that the majority of eigenvalues are nonnegative. We first compute the tridiagonalization A = QTQ^T and do not form Q explicitly but keep it in factored form. By applying bisection and inverse iteration to the tridiagonal matrix T we compute λ_{t+1}, . . ., λ_n and the corresponding eigenvectors, which are placed in the columns of Z = [z_{t+1}, . . ., z_n]. We then compute the matrix W = Z diag(|λ_{t+1}|^{1/2}, . . ., |λ_n|^{1/2}) ∈ R^{n×(n−t)} and apply Q to get B = QW. Finally, A_+ = A + BB^T. The total cost is 4n^3/3 flops for T, O(n^2) flops to compute λ_{t+1}, . . ., λ_n and form Z and W, 2n^2(n − t) flops to form B [13, sec. 5.1.6], and n^2(n − t) flops to form A_+, exploiting symmetry throughout. The total cost is therefore 4n^3/3 + 3n^2(n − t) ≤ 4n^3/3 + 3n^3/2 = 17n^3/6 flops.
Three different bounds are obtained from (3.12), corresponding to the three different modified Cholesky algorithms. While E in (3.11) is explicitly produced by the algorithms of Gill, Murray, and Wright and of Schnabel and Eskow, the algorithm of Cheng and Higham does not explicitly produce E, so this algorithm requires an extra matrix multiplication to form L(DL^T). The cost stated in Table 4.1 includes the latter step. Alternatively, ‖A − PLDL^TP^T‖ can be estimated in O(n^2) flops, using the algorithm of [18] for the 1-norm or the power method for the 2-norm.
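For the 2-norm alternative, a few steps of the power method applied to the residual operator v ↦ Av − PL(D(L^T(P^T v))) cost O(n^2) flops each, because the residual matrix is never formed. A sketch with our own names; the iteration count is an arbitrary choice:

```python
import numpy as np

def est_norm2_residual(A, P, L, D, iters=30, seed=0):
    # Power-method estimate of ||A - P L D L^T P^T||_2.  Each iteration costs
    # O(n^2) flops since the residual is applied as a sequence of
    # matrix-vector products.  The residual is symmetric, so its 2-norm is
    # its spectral radius, which the power method approaches from below.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(len(A))
    x /= np.linalg.norm(x)
    apply_residual = lambda v: A @ v - P @ (L @ (D @ (L.T @ (P.T @ v))))
    nrm = 0.0
    for _ in range(iters):
        y = apply_residual(x)
        nrm = np.linalg.norm(y)
        if nrm == 0.0:
            return 0.0
        x = y / nrm
    return nrm
```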
In [15] an approximation for β_3 from Lemma 2.2 was computed as the approximate local minimum obtained with the MATLAB fminbnd minimizer. We propose an alternative. Note that the function we are minimizing for the bound β_3 is a polynomial in the variable ρ:

f(ρ) = ‖A − T(ρ)‖_F^2 = Σ_{i,j} (a_{ij} − ρ^{|i−j|})^2,

which has degree 2(n − 1). We compute the stationary points of f, that is, the zeros of the derivative f'(ρ), which has degree 2n − 3. Then we obtain β_3 as the minimum value of f^{1/2} over all stationary points in [−1, 1] along with the endpoints ±1. The dominant cost for this bound is computing the stationary points, which are the real eigenvalues in [−1, 1] of a companion matrix of order 2n − 3; these can be computed in O(n^2) flops [2].
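The coefficient assembly and root-finding can be sketched as follows. Note that numpy.polynomial's roots() computes companion-matrix eigenvalues by a dense method, not with the O(n^2) structured algorithm of [2]; the function name is ours.

```python
import numpy as np

def beta3(A):
    # beta3 = min over rho in [-1, 1] of ||A - T(rho)||_F, where
    # T(rho)_{ij} = rho^{|i-j|}.  f(rho) = ||A - T(rho)||_F^2 is a polynomial
    # of degree 2(n - 1); we minimize it over the real roots of f' in [-1, 1]
    # together with the endpoints.
    n = len(A)
    coeffs = np.zeros(2 * n - 1)           # ascending powers of rho
    for k in range(n):
        dsum = np.sum(np.diag(A, k))
        dsq = np.sum(np.diag(A, k) ** 2)
        mult = 2.0 if k > 0 else 1.0       # count both triangles for k > 0
        coeffs[0] += mult * dsq            # constant terms sum a_ij^2
        coeffs[k] -= 2.0 * mult * dsum     # cross terms -2 a_ij rho^k
        coeffs[2 * k] += mult * (n - k)    # rho^{2k} terms, one per element
    f = np.polynomial.Polynomial(coeffs)
    crit = [r.real for r in f.deriv().roots()
            if abs(r.imag) < 1e-10 and -1.0 <= r.real <= 1.0]
    candidates = np.array(crit + [-1.0, 1.0])
    return np.sqrt(max(f(candidates).min(), 0.0))
```

By construction β_3 vanishes when A is itself a Kac-Murdock-Szegő matrix.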
To summarize, we can separate the new upper bounds into three main categories. The most expensive bound to compute is (3.1), which uses Â_+; the less expensive bounds (3.2), (3.5), and (3.6) are based on knowledge of the eigenvalues only; and the least expensive is the modified Cholesky bound (3.12), which has half the cost of the eigenvalue-only bounds. In the next section we analyze the accuracy of the bounds on examples collected from the literature and from real-life applications.

Although thought by those authors to be correlation matrices, as pointed out by Xu and Evers [35] they have some negative eigenvalues.
usgs13 A matrix of order 94 corresponding to carbon dioxide storage assessment units for the Rocky Mountains region of the United States that was generated during the national assessment of carbon dioxide storage resources [34].
RiskMetrics1-RiskMetrics6 Six matrices from the RiskMetrics database, as used in [6] and [16]. Each matrix has dimension 387.
cor1399, cor3120 Two large matrices constructed from stock data, the first of order 1399 and the second of order 3120. The matrices were provided by the investment company Orbis.
The nearest correlation matrix required to determine the true distance d_corr(A) is computed by the code nag_correg_corrmat_nearest (g02aa) from the NAG Toolbox for MATLAB Mark 24 [22], which implements the preconditioned Newton algorithm of [6]; we chose tolerance tol = 10^{-10}.
In our first test we analyze the performance of the modified Cholesky algorithms used for the bound (3.12) for all the indefinite matrices listed above. In Table 5.1 the matrix A_mc from (3.12) corresponding to the algorithms of Gill, Murray, and Wright [12, sec. 4.4.2.2], Schnabel and Eskow [26] and [27], and Cheng and Higham [8] is denoted by GMW, SE90, SE99, and CH, respectively. The results show two main features. First, the four modified Cholesky algorithms provide bounds of similar quality for all but the RiskMetrics matrices, and these bounds are often of the correct order of magnitude. Second, for all the RiskMetrics matrices except RiskMetrics4, d_corr(A) is relatively small and the revised Schnabel and Eskow [27] algorithm and the Cheng and Higham algorithm provide bounds three or four orders of magnitude smaller than those from the other two algorithms. Since the Cheng and Higham algorithm gives the best bounds overall, we use it in the remaining experiments.
We next compute all our bounds. The results are given in Tables 5.2 and 5.3. The ordering of the bounds is the same as in Table 4.1, but note that we exclude the bound (3.5), as for our test matrices it is the same as (3.2). The bound α_1 is zero for all examples, since a_ii ≡ 1 and |a_ij| ≤ 1. Moreover, the circulant mean A_c of several of these matrices turns out to be a correlation matrix and so d_corr(A_c) = 0 in (2.7).

6. Conclusions. This is the first thorough treatment of upper and lower bounds for the distance of a symmetric matrix A to the nearest correlation matrix. For the most common case in practice, in which A is indefinite with unit diagonal and |a_ij| ≤ 1 for i ≠ j, we have obtained upper bounds (3.1), (3.5), and (3.6) that differ from the lower bound (2.3) by a factor at most 1 + n√n. For the sharpest bound (3.1) we found the ratio to be always less than 5 in our experiments with matrices of dimension up to 3120, so the upper bound was demonstrably of the correct order of magnitude in every case. The cost of computing the pair (2.3) and (3.1) is 17n^3/6 flops, which is substantially less than the 70n^3/3 or more flops required to compute ncm(A) by the preconditioned Newton algorithm of [6].
The upper bound (3.6) based on shrinking has about half the cost of (3.1) and, while less sharp, it still performed well in our tests.
The modified Cholesky bound (3.12) has the attraction that it provides an inexpensive test for definiteness (n^3/3 flops) along with an upper bound (costing another n^3/3 flops) that, while sometimes two orders of magnitude larger than (3.1), can still provide useful information.
We conclude that our bounds are well suited to gauging the size of d_corr(A). The information they provide enables a user to identify an invalid correlation matrix relatively cheaply and to decide whether to revisit its construction or proceed to compute a replacement directly from it. A natural replacement is the nearest correlation matrix itself; an alternative is to use shrinking [17].
Our test matrices can be downloaded in MATLAB form from https://github.com/higham/matrices-correlation-invalid, with the exception of the RiskMetrics matrices, which we do not have permission to distribute.The MATLAB implementation of the modified Cholesky algorithm from [8] that we used is available from https://github.com/higham/modified-cholesky.



Table 4.1
Approximate cost in flops of the bounds for a symmetric A ∈ R^{n×n}. For the bound α_1, k is the number of elements |a_ij| > 1, i ≠ j. For the bound (3.4), m = min_i a_ii and M = max_i a_ii.
5. Numerical experiments. Our tests were carried out in MATLAB R2014a on a machine with an Intel Core i7-4910MQ 2.90GHz processor and 16GB RAM. As test matrices we use the following indefinite symmetric matrices with unit diagonal.
high02 A matrix of order 3 from Higham [15, p. 334].
tec03 A matrix of order 4 from Turkay, Epperlein, and Christofides [32, Ω on p. 86].
bhwi01 A matrix of order 5 from Bhansali and Wise [4, sec. 2, second matrix].
mmb13 A matrix of order 6 constructed from foreign exchange trading data supplied by the Royal Bank of Scotland [21, p. 36].
fing97 A matrix of order 7 from Finger [11, Table 4].
tyda99R1-tyda99R3 The matrices R_1, R_2, and R_3 of order 8 from Tyagi and Das [33, Table 1].

Table 5
Several observations can be made about the results.
(a) Of the lower bounds, only (2.3) provides useful information. Moreover, in all examples this bound is within a factor 2.4 of d_corr (the worst case being cor3120).
(b) Of the upper bounds, (3.1), the most expensive bound to compute, is the most accurate and is always within a factor 4 of d_corr (the worst case being RiskMetrics2).
(c) Over all the test matrices, the upper bound (3.1) exceeded the lower bound (2.3) by at most a factor 4.9 (the worst case being cor3120).
(d) Of the other eigenvalue-based upper bounds, the bound from shrinking (3.6) is better than the bound (3.2), as we already know from Theorem 3.5. The shrinking bound (3.6) is typically an order of magnitude larger than (3.1) on real-life examples.
(e) The upper bounds (3.6) and (3.12), based on shrinking and the modified Cholesky factorizations, respectively, are of similar quality and they overestimate d_corr by at most one or two orders of magnitude. The modified Cholesky bound has the advantage of being computable in half the number of operations of the bound based on shrinking.
(f) The upper bounds (2.4), (2.6), and (3.13), which are computable in O(n^2) operations, are poor in these tests, the more so when the distance is small.