RESTORING DEFINITENESS VIA SHRINKING, WITH AN APPLICATION TO CORRELATION MATRICES WITH A FIXED BLOCK

. Indeﬁnite approximations of positive semideﬁnite matrices arise in many data analysis applications involving covariance matrices and correlation matrices. We propose a method for restoring positive semideﬁniteness of an indeﬁnite matrix M 0 that constructs a convex linear combination S ( α ) = αM 1 + (1 − α ) M 0 of M 0 and a positive semideﬁnite target matrix M 1 . In statistics, this construction for improving an estimate M 0 by combining it with new information in M 1 is known as shrinking. We make no statistical assumptions about M 0 and deﬁne the optimal shrinking parameter as α ∗ = min { α ∈ [0 , 1] : S ( α ) is positive semideﬁnite } . We describe three algorithms for computing α ∗ . One algorithm is based on the bisection method, with the use of Cholesky factorization to test deﬁniteness, a second employs Newton’s method, and a third ﬁnds the smallest eigenvalue of a symmetric deﬁnite generalized eigenvalue problem. We show that weights that reﬂect conﬁdence in the individual entries of M 0 can be used to construct a natural choice of the target matrix M 1 . We treat in detail a problem variant in which a positive semideﬁnite leading principal submatrix of M 0 remains ﬁxed, showing how the ﬁxed block can be exploited to reduce the cost of the bisection and generalized eigenvalue methods. Numerical experiments show that when applied to indeﬁnite approximations of correlation matrices shrinking can be at least an order of magnitude faster than computing the nearest correlation matrix.

1. Introduction.Covariance matrices and correlation matrices constructed from discrete sets of empirical data play a key role in many applications.These matrices are symmetric positive semidefinite, with a correlation matrix also having unit diagonal.However, in practice the matrices obtained may lack definiteness due to missing or asynchronous observations or by the nature of their construction, such as in stress testing in finance or when correlations are determined within groups and then the groups are joined together.To ensure the validity of the subsequent analysis the indefinite approximation needs to be replaced by a valid covariance or correlation matrix, which we call a replacement matrix.This restoration of definiteness is needed in a very wide variety of applications, of which some recent examples include modelling public health [8] and dietary intakes [36], determination of insurance premiums for crops [12], simulation of wireless links in vehicular networks [37], reservoir modelling [26], oceanography [32], and horse breeding [35].
The matrices arising in these applications are generally dense, with the order ranging from the tens to the tens of thousands.Approaches to repairing an invalid covariance matrix include computing the nearest positive semidefinite matrix in the Frobenius norm, which amounts to shifting all the negative eigenvalues to zero while keeping the eigenvectors fixed [15], and adding a small positive multiple of the identity to the matrix to make it positive semidefinite.For repairing invalid correlation matrices a simple approach is to compute the nearest positive semidefinite matrix and then to diagonally scale it to restore the unit diagonal.As with increasing the diagonal of an invalid covariance matrix, this approach may change the matrix more than necessary.A widely used alternative is to compute the nearest correlation matrix to the original matrix in the Frobenius norm [4], [18], [30].
In this paper we develop a new method to repair invalid covariance and correlation matrices, inspired by an idea from statistics called shrinking.Shrinking has a long history going back to the work of Stein beginning in the 1950s, and is widely used in statistical estimation; see, for example, [6], [23], [24], [34], [38] and the references therein.A basic idea of shrinking is to form a convex linear combination αM 1 + (1 − α)M 0 of two correlation or covariance matrices, where α ∈ [0, 1] is chosen based on statistical considerations in order to obtain an estimator that has better properties than the extremes M 0 and M 1 .
Our use of shrinking differs from this standard usage in two respects.1.For us, M 0 is indefinite, not positive semidefinite, so it is not a covariance matrix or a correlation matrix.2. We make no statistical assumptions about M 0 or M 1 and choose α so that αM 1 + (1 − α)M 0 is positive semidefinite based solely on information in the matrices M 0 and M 1 .The possibility of using shrinking for restoring definiteness was mentioned by Devlin, Gnanadesikan, and Kettenring [10, sec. 4.4] and also by Kupiec [21, sec. 5], who suggests a "trial and error" way of choosing the shrinking parameter α.Rebonato and Jäckel [33] criticize Kupiec's suggestion on the grounds that it is expensive, since each trial requires a full eigenvalue decomposition, that a target matrix must be chosen, and that "there is no way of determining to what extent the resulting matrix is optimal in any easily quantifiable sense." We present a shrinking method that overcomes the drawbacks pointed out by Rebonato and Jäckel.We define an optimal shrinking parameter that produces a minimal elementwise perturbation to M 0 in the direction of the difference between the target matrix M 1 and the initial approximation M 0 .We propose three algorithms for computing the optimal parameter, none of which requires repeated full eigenvalue decompositions.We also show how a target matrix can be constructed in a natural way based on the confidence with which the elements of the given matrix are known.
Often, additional requirements are put on a replacement matrix, and in this paper we focus on the case where a positive semidefinite leading principal submatrix of an approximate correlation matrix is known to be exact and needs to be preserved.To illustrate the importance of this specific problem variant, we next present two applications where this requirement occurs: dealing with incomplete data sets and correlation stress testing.
To see how keeping a block fixed naturally appears as a condition in dealing with incomplete data sets, suppose that the data from K observations of N random variables is collected in a matrix X ∈ R K×N and that some of the entries are missing.Without loss of generality, we can permute the columns of X so that there are no missing elements in the first m columns.The pairwise deletion method [25, sec.2.2], which in calculating the correlation between a pair of vectors uses only the components available in both vectors, leads to an approximate sample correlation matrix of the form where A ∈ R m×m is a valid correlation matrix since it contains the exact correlations between variables 1 : m, which have no missing observations.In contrast, Y ∈ R m×n and B ∈ R n×n , with m + n = N , contain approximate correlations because their elements are computed from an incomplete data set.The matrix M 0 is symmetric with unit diagonal and off-diagonal elements in [−1, 1], but there is no guarantee that it is positive semidefinite.Our goal is to modify M 0 to make it positive semidefinite while preserving the unit diagonal and the (1, 1) block A. We take as the target the matrix M 1 = diag(A, I), for which the optimal α can be interpreted as the minimal relative change applied uniformly to all unfixed elements of M 0 .Stress testing is a method used in finance to explore the impact on a financial instrument of pushing risk parameters toward more extreme levels.In correlation stress testing [11], [29] we have a valid correlation matrix where C 11 and C 22 are correlation matrices corresponding to a first and second group of assets and C 12 is the cross-group correlation matrix, and wish to modify it by replacing C 22 with a new correlation matrix C 22 that reflects changes to the second group of assets.If the modified matrix is indefinite it needs to be replaced with a valid correlation matrix, but the C 11 block should remain unchanged since the first group of assets was not affected.Again, shrinking can be applied, with M 1 = diag(C 11 , I).
An application that generalizes keeping just one diagonal block fixed is risk aggregation [1], [20].Here, diagonal blocks of a global covariance or correlation matrix represent individual markets and these can be individually updated with more refined local forecasts.Replacing the old diagonal blocks with the new ones can result in the global matrix becoming indefinite, so the covariances or correlations in the off-diagonal blocks have to be readjusted while keeping the new diagonal blocks unchanged.Shrinking can be used to solve this problem by selecting as target the matrix comprised of the new diagonal blocks and the optimal α can again be interpreted as the minimal relative change applied uniformly to all unfixed elements of the global matrix.
The paper is organized as follows.In the next section we define the shrinking problem, characterize the solution, and discuss the choice of target matrix.We present three methods to compute the optimal shrinking parameter in section 3: one based on the bisection method, a second based on Newton's method, and a third that solves a symmetric definite generalized eigenvalue problem.In section 4 we explain how weights can be incorporated into the choice of a target matrix.In section 5 we focus on restoring the definiteness of a correlation matrix while preserving a specified positive semidefinite leading principal submatrix.We show how the bisection and generalized eigenvalue methods can be adapted to exploit the problem structure, explain how the case of a singular fixed block can be reduced to the nonsingular case, show how a lower bound on the smallest eigenvalue can be incorporated, and describe how multiple fixed diagonal blocks can be treated.Numerical experiments are presented in section 6, which include a comparison of shrinking with the solution of the nearest correlation matrix problem.Concluding remarks are given in section 7.
2. The shrinking problem.Given a real symmetric indefinite matrix M 0 of order N our task is to modify M 0 to make it positive semidefinite by computing a convex linear combination of M 0 and a chosen positive semidefinite target matrix M 1 .Hence we consider the matrix (2.1) Clearly, S(α) is symmetric for every α, S(0) = M 0 is indefinite, and S(1) = M 1 is positive semidefinite.We define the optimal shrinking parameter as Since S(α) = M 0 +α(M 1 −M 0 ), it is clear that we are seeking the elementwise minimal change to M 0 in the direction M 1 − M 0 .For another interpretation, note that fixed, this means that S(α * ) is the nearest positive semidefinite matrix to M 0 of the form S(α), in any norm.
We now characterize the optimal shrinking parameter α * .The following results form the basis for the bisection and Newton methods for computing α * proposed in sections 3.1 and 3.2.
Since a symmetric matrix is positive semidefinite if and only if its smallest eigenvalue is nonnegative, we focus on the function f : R → R defined by where λ min denotes the smallest eigenvalue of a symmetric matrix.Note that f is a continuous function, since the eigenvalues of a matrix are continuous functions of its elements.Hence α * is characterized as Recall that a function g : R → R is concave if for every α 1 , α 2 ∈ R and t ∈ [0, 1], In the following lemma we show that the function f defined in (2.3) is concave.In the proof below we will use the characterization [19,Thm. 4.2.2] for symmetric C of order N , (2.4) Proof.Let α 1 , α 2 ∈ R and t ∈ [0, 1] be arbitrary and note that S(α) is an affine function of α.Then we have Since f (0) < 0, f (1) = λ min (S(1)) = λ min (M 1 ) and f is concave and continuous, it follows that α * is the unique zero of f in (0, 1) if the matrix M 1 is positive definite.In principle we need to allow M 1 to be positive semidefinite and singular, as can happen in our correlation matrix application discussed in section 5, but as we show in section 5.4 in that case the problem can be reduced to the case in which M 1 is positive definite.
When the only goal is to repair the indefiniteness of the matrix M 0 , the target matrix M 1 can be chosen as any positive semidefinite matrix.When M 0 is an invalid correlation matrix, which we take to mean that it is indefinite but has unit diagonal, and we want S(α * ) to be a valid correlation matrix, then from (2.1) it follows that the target matrix needs to be a correlation matrix, so that the unit diagonal is preserved.Hence the simplest target in this case is the identity matrix.
In a setting with time-varying matrices a natural target is the (shrunk) matrix from the previous time step.
3. Computing the optimal shrinking parameter.We present three methods to compute the optimal shrinking parameter α * when M 0 is symmetric indefinite and the target matrix M 1 is positive definite.Recall that in this case α * is the unique zero in (0, 1) of f in (2.3).

Bisection method.
The simplest iterative method to find a zero of a function on a given interval is the bisection method, which yields the following algorithm for our problem.
Algorithm 3.1 (Bisection method).Given the indefinite matrix M 0 ∈ R N ×N , a positive definite target matrix M 1 ∈ R N ×N , and a convergence tolerance tol this algorithm uses the bisection method to compute the optimal shrinking parameter α * defined by (2.2).
In the last line of the algorithm we have set α * to α r rather than to the generally more accurate value (α + α r )/2 in order to ensure that S(α * ) is positive semidefinite.
The main computational task in Algorithm 3.1 is testing for positive semidefiniteness.As argued in [15, sec.5], an arbitrarily small perturbation can make a singular positive semidefinite matrix become positive definite and hence in finite precision arithmetic testing for positive semidefiniteness is equivalent to testing for positive definiteness.One way to test for definiteness is to compute the eigenvalues of S(α m ) and check whether the smallest one is positive.A less expensive approach is to attempt to compute the Cholesky factorization of S(α m ) and declare the matrix positive definite if the process succeeds.Although it might seem numerically dangerous to apply Cholesky factorization to a potentially indefinite matrix, this approach is numerically stable [15, sec. 5].Hence, we replace step 4 in Algorithm 3.1 with "if the Cholesky factorization of S(α m ) breaks down".
The number of steps needed by Algorithm 3.1 is | log 2 tol| , where the ceiling function α denotes the smallest integer greater than or equal to α, reflecting the linear convergence of the bisection method.However, in practical applications the data is often accurate only to three or four significant digits, in which case the tolerance tol will be of order 10 −4 or larger and bisection will need less than 15 iterations.The cost per step of Algorithm 3.1 depends on the number of successful elimination stages in the Cholesky factorization of S(α m ) and is at most N 3 /3 flops.
3.2.Newton's method.For a method with faster convergence than the bisection method it is natural to turn to Newton's method, defined by where x(α) is a unit norm eigenvector for λ min (S(α)) and, from (2.1), The Newton iteration for finding the zero of f (α) = λ min (S(α)), where S(α) is defined in (2.1), can be written as where x(α k ) is a unit norm eigenvector for λ min (S(α k )) and it is assumed that λ min (S(α k )) is simple for each k.
Proof.Dropping the index k for simplicity, let us look at the quotient f (α)/f (α).We have, from (2.1), and, from (3.1), which yields the result.
Recall that we are looking for α * ∈ (0, 1) such that f (α * ) = 0, where f is continuous and concave with f (0) < 0 and f (1) > 0. The function f is monotone increasing on an interval [0, β] ⊆ [0, 1] that contains α * , but β is not necessarily equal to 1 (and it is possible that f decreases on [β, 1]).Therefore we have the geometrically obvious result that for any α 0 < α * the Newton iterates converge monotonically to α * and hence the Newton method for our problem is globally and quadratically convergent.In practice, we can set α 0 = 0.
Taking all of this into consideration, we have the following algorithm.Algorithm 3.3 (Newton method).Given the indefinite matrix M 0 ∈ R N ×N , a positive definite target matrix M 1 ∈ R N ×N , and a convergence tolerance tol this algorithm uses Newton's method to compute the optimal shrinking parameter α * defined by (2.2).
which corresponds to the bisection stopping criterion.
The main computational work in the algorithm is computing a unit norm eigenvector for the smallest eigenvalue at each step, and of the many methods that compute one or a few of the (extremal) eigenvalues and their corresponding eigenvectors we have chosen tridiagonalization followed by the bisection method and inverse iteration [9, sec. 5. Note that there is no guarantee that the computed α * from Algorithm 3.3 will in fact define a positive semidefinite S(α * ) since the iterates stay to the left of α * .
We do not consider the secant method.While its convergence rate is lower than for the Newton's method it has the general advantage that it avoids the need for derivatives.However, for Newton's method the cost of computing λ min (S(α k )) and x(α k ) is dominated by the cost of the tridiagonalization, so avoiding the computation of x(α k ) produces no significant saving.

Generalized eigenvalue problem.
The third method for computing the optimal shrinking parameter is essentially different from the root-finding methods presented above and provides the most elegant description of α * .Recall that we are looking for the smallest α ∈ (0, 1) for which the matrix is positive semidefinite.The matrix S(α) is a symmetric matrix for every α, which means that it has real eigenvalues.Then α → λ 1 (S(α)) , . . ., α → λ N (S(α)) is a continuous parametrization of the N eigenvalue functions If α is such that λ k (S(α)) = 0 for some k then the matrix S(α) is singular which means, by definition, that α is a generalized eigenvalue of the pencil E −αF .It follows that α * , the zero of λ N , is a generalized eigenvalue of the matrix pencil E − αF , and among all generalized eigenvalues in (0, 1), α * is the rightmost one.
The matrices E = M 0 and F = M 0 − M 1 are symmetric indefinite, and the QZ algorithm for computing the generalized eigenvalues of E − αF cannot exploit the symmetry.However, some symmetric pencils, known as definite pencils, can be transformed into pencils in which one of the matrices is positive definite, and there are algorithms for attempting to find such a transformation [5], [14].In our case, it is trivial to obtain a definite pencil.We write and, since α * < 1, S(α) is singular precisely at the generalized eigenvalues of the definite pencil M 0 − µM 1 , where µ = α/(α − 1).We find α * by computing the Bisection Newton Generalized eigenvalue smallest generalized eigenvalue of this pencil.To do so we transform it to a standard symmetric eigenvalue problem C − µI, where Cholesky factorization of M 1 ; see, for example, [7].To compute the smallest eigenvalue of the matrix C we use tridiagonalization followed by the bisection method.
The algorithm can be summarized as follows.Algorithm 3.4 (Generalized eigenvalue method).Given the indefinite matrix M 0 ∈ R N ×N and a positive definite target matrix M 1 ∈ R N ×N , this algorithm uses the generalized eigenvalue interpretation to compute the optimal shrinking parameter α * defined by (2.2).
1 Compute the Cholesky factorization by multiple right-hand side triangular solves.
3 Find µ * , the smallest eigenvalue of C, by tridiagonalization followed by bisection.4 α * = µ * /(µ * − 1).3.1 summarizes the approximate costs in flops of the three algorithms.Which method is the cheapest depends on the desired accuracy, with relatively large values of tol (corresponding to low precision data) favoring the bisection algorithm.The bisection algorithm also has the advantage of being the easiest to implement and it guarantees a positive semidefinite solution.

Comparison. Table
Note that when M 1 = I, the first two lines of Algorithm 3.4 are empty and the cost reduces to 4N 3 /3 flops.
4. Introducing weights.In practice, different elements of M 0 may be known to different levels of accuracy or confidence [2].This can be reflected by introducing a symmetric matrix W ∈ R N ×N of nonnegative weights w ij ∈ [0, 1] and defining the target matrix as M 1 = W • M 0 , where • is the Hadamard (elementwise) product.Then Therefore a weight w ij = 1 signifies that the (i, j) element of M 0 must not be changed, while a weight w ij = 0 allows that element to be changed as much as necessary.Intermediate values w ij ∈ (0, 1) put a greater restriction (for larger w ij ) or lesser restriction (for smaller w ij ) on the relative amount by which the (i, j) elements of M 0 can change.The unit diagonal in correlation problems poses no difficulties as it is simply obtained for W with a unit diagonal.
Weighting provides a natural answer to the question of how to choose the target matrix: the target is based on the original information in M 0 and the trust that can be put in each individual entry.However, there is no guarantee that M 1 obtained this way is positive semidefinite, which is a requirement for a target matrix in the shrinking method.If the target matrix turns out to be indefinite then the weights are too restrictive and W should be modified.
Since weighting is reflected entirely in the target matrix M 1 , all the methods from section 3 apply without change.This is in contrast to the nearest correlation matrix problem in the Frobenius norm, where the so-called H-weighted problem min{ W •(M 0 −X) 2 F : X is a correlation matrix } is much more challenging to solve than the unweighted problem with w ij = 1 [18], [31].For example, the NAG code g02aj/nag_nearest_correlation_h_weight solves the H-weighted nearest correlation matrix problem, but the documentation says that if the weights vary by several orders of magnitude then the underlying algorithm may fail to converge [28].
5. Correlation matrix with fixed block.We now consider an important special case of weighting in which the given matrix M 0 is an invalid correlation matrix and has a positive semidefinite leading principal submatrix that must remain fixed.As explained in section 1, this problem arises when a correlation matrix is formed from incomplete data sets or through stress testing.
One approach is to replace M 0 by the nearest correlation matrix in the Frobenius norm with a prescribed block fixed, which can be computed by a simple modification of the alternating projections method from [18].The modified algorithm and numerical experiments with it can be found in [3], [25].The algorithm is guaranteed to converge to the unique solution of the problem but the convergence is at best linear and so it can be slow.
We derive a new alternative based on shrinking.Recall from section 1 that we have is the simplest matrix that satisfies these conditions.We are looking for (5.4) In addition to the interpretation mentioned in section 2 that the resulting matrix S(α * ) is the elementwise minimal change of M 0 in the direction M 1 − M 0 , here we also have that α * is the minimal relative change applied uniformly to all the unfixed elements of M 0 .A desirable property is that if the rows and columns of A and of B are symmetrically permuted then S(α * ) is permuted in the same way.It is easy to show that this is the case, using the formulas in section 5.4.
We first assume that A is positive definite, so that we have a positive definite target matrix.The matrix S(α) has special structure: its leading positive definite block A does not change with α and this can be very efficiently exploited in the bisection method (Algorithm 3.1) and the generalized eigenvalue approach (Algorithm 3.4) to compute α * , as we show in the next two sections.
Having the fixed block A singular leads to significant changes to both the problem and the proposed methods for computing the optimal shrinking parameter, as discussed further in section 5.4 and illustrated by the differing plots in Figure 5.1.
We now show how to modify the bisection and generalized eigenvalue methods to exploit the fixed block.

Bisection method.
Let us look more closely at the Cholesky-based test for definiteness in the case where S(α) is given by (5.4) is the Cholesky factor of S(α) then 1. R 11 is the Cholesky factor of the fixed block A.
2. R 12 is the solution of the multiple right-hand side triangular system R T Note that R 11 is independent of α and so needs to be computed only once.Also, since R 12 = (1 − α)R −T 11 Y , once we have computed the solution X of R T 11 X = Y then for each α we do not need to solve a linear system for R 12 , but can instead set R 12 = (1 − α)X.Hence, to determine if the matrix S(α) is positive definite or not for a given α we attempt to compute the Cholesky factor of the matrix Taking all this into account, our optimized bisection algorithm for the case when A is positive definite is as follows.
Algorithm 5.1 (Bisection method).Given the indefinite matrix M 0 in (5.1) with positive definite (1, 1) block A and a convergence tolerance tol, this algorithm uses the bisection method with Cholesky factorization to compute the optimal shrinking parameter α * defined by (5.3) for the target matrix (5.2).
3 Compute the solution X of R T 11 X = Y and form Z = X T X. 4 while α r − α > tol

Generalized eigenvalue problem.
Recall from section 3.3 that we are looking for the smallest generalized eigenvalue of the definite pencil M 0 − µM 1 , with M 0 and M 1 now given by (5.1) and (5.2).If A = R T 11 R 11 is the Cholesky factorization then Hence we obtain a standard symmetric eigenvalue problem for the matrix (5.5) at the cost of one Cholesky factorization and one multiple right-hand side triangular system solve.
In summary, we have the following algorithm.Algorithm 5.2 (Generalized eigenvalue method).Given the indefinite matrix M 0 in (5.1) with positive definite (1, 1) block A this algorithm uses the generalized eigenvalue interpretation to compute the optimal shrinking parameter α * defined by (5.3) for the target matrix (5.2).
1 Compute R 11 , the Cholesky factor of A.
2 Compute the solution X of R T 11 X = Y and form C from (5.5). 3 Find µ * , the smallest eigenvalue of the matrix C, by tridiagonalization (exploiting the identity block) followed by the bisection method.4 α * = µ * /(µ * − 1).The cost of Algorithm 5.2 is at most m 3 /3 + m 2 n + 4(m + n) 3 /3 flops.The essential difference between Algorithm 3.4 and Algorithm 5.2 is that we need the Cholesky factorization of M 1 in the former but only that of A in the latter.

Enforcing a lower bound on the smallest eigenvalue.
In applications it may be required that a replacement correlation matrix is strictly positive definite.When the (1, 1) block A is positive definite we therefore generalize the problem to Proof.Since f (0) < 0, f (1) = λ min (A) > 0 and f is concave and continuous, it is sufficient to show that for every α ∈ [0, 1] we have From (5.4), since A is a leading principal submatrix of S(α), for every α we have, using (2.4), λ min (A) ≥ λ min (S(α)) = f (α).Since f (1) = λ min (A) by (5.7), we have f (α) ≤ f (1).
Clearly, finding α ψ * is equivalent to finding the minimal α such that the matrix For ψ < λ min (A) the matrix A ψ is positive definite and with only minor changes the methods from sections 5.1 and 5.2 for computing α * in (5.3) can be used to compute α ψ * in (5.6).
The extreme case, when θ = 1, significantly changes the nature of the problem.Here we are asking that α * is such that λ min (S(α * )) = λ min (A) and it follows that the matrix A ψ is singular and positive semidefinite.In this case, α * = 1 might be the only solution or all values on the interval [α * , 1], with α * < 1, might be solutions.These two cases are illustrated in plots (c) and (d) of Figure 5.1, with A there representing A ψ .
In the next section we discuss the problems, both theoretical and computational, that arise from the singularity of the leading block in S(α).

Singular
A. We now suppose that A in (5.1) is positive semidefinite and singular.For the bisection method, none of the computational savings discussed in section 5.1 are now applicable, since they were derived under the assumption that A is positive definite.We still have the basic Algorithm 3.1, but f may now be zero on an interval [α * , 1] (see Figure 5.1(d)).In this case we cannot use the standard Cholesky factorization, since its success or failure for a singular positive semidefinite matrix is unpredictable.Instead we need to use the Cholesky factorization with complete pivoting.If the factorization runs to completion with positive pivots we declare the matrix positive semidefinite.If the factorization terminates with a nonpositive pivot, we declare the matrix positive semidefinite if the norm of the remaining block S k (the Schur complement) satisfies S k ≤ c n u A , where c n is a constant and u is the unit roundoff, and indefinite otherwise.A weakness of this approach is that c n could potentially have to be of order 4 n−1 in order to not misdiagnose a positive semidefinite matrix as indefinite [16], [17, sec.10.3.2].
The case of an interval of zeros is not problematic for the Newton method.However, the method might run into difficulties when α = 1 is the only solution and the function f is slowly increasing near that point, because then the computed iterates might leave the [0,1] bracket.Therefore the Newton method should be safeguarded.
The most severe difficulties arise in the generalized eigenvalue method.If A is singular then M 1 is singular and we no longer have a definite pencil; moreover, if f has infinitely many zeros then the pencil (3.3) is singular, which means that every α is a generalized eigenvalue and we cannot characterize α * as before.
Our preferred way to handle the case of singular A is to employ a deflation method that reduces the problem to the nonsingular case.As a bonus, this analysis also allows us to distinguish the case when f has infinitely many zeros in [0, 1] from the case when its only zero is 1.
Since A is singular and positive semidefinite it has the eigenvalue decomposition A = QDQ T , where Q is orthogonal and D = diag(0, D + ), with D + a nonsingular diagonal matrix of size r = rank(A), containing all the positive eigenvalues of A. Then A necessary condition for S(α) to be positive semidefinite is that the first m − r rows of (1 Otherwise, α * < 1 and α * is the smallest α such that is positive semidefinite, where Z comprises the last r rows of Q T Y .Since the leading (1,1) block of this matrix is now positive definite we can find α * by either of the methods from sections 5.1 and 5.2 with S(α) replaced by S + (α).
Note that the condition that the first m − r rows of Q T Y are zero means that each column of Y is in the column space of A. 5.5.Generalization to multiple fixed blocks.The problem of this section generalizes naturally to applications where a large correlation matrix needs to be constructed from blocks, as in the risk aggregation problem described in section 1. Shrinking can easily be used to solve this problem by choosing as target the matrix comprising the diagonal blocks of the large matrix: M 1 = diag(A 11 , A 22 , . . ., A kk ).As in the case of keeping just one block fixed, α * is characterized as the minimal elementwise relative change in the cross-correlations.
When k = 2 it is easy to show that the optimized bisection algorithm is a simple modification of Algorithm 5.1, where in step 6 the matrix T is now For optimal efficiency the matrix should be reordered, if necessary, so that the larger of the two diagonal blocks is in the (1, 1) position.For the generalized eigenvalue method, Algorithm 3.4 leads to computing the smallest eigenvalue of 22 is formed by solving linear systems with the Cholesky factors R 11 of A 11 and R 22 of A 22 .Note that the required smallest eigenvalue of C is equal to 1 − σ * , where σ * is the largest singular value of the matrix Z, so instead of computing the smallest eigenvalue of a matrix of order m + n we can compute the largest singular value of an m × n matrix.For k > 2, the general algorithms from section 3 should be used.
When some of the diagonal blocks are singular, deflation analogous to that in section 5.4 can be done by employing the eigenvalue decomposition of each singular diagonal block.
6. Numerical experiments.Our tests were carried out in MATLAB R2014a on a machine with an Intel Core i7-4910MQ 2.90GHz processor and 16GB RAM.
We first compare the performance of our methods on a correlation matrix problem with a fixed block.We generate M 0 in (5.1) and M 1 in (5.2) by forming A ∈ R m×m using the MATLAB function gallery('randcorr',m) and the elements of the blocks Y ∈ R m×n and B ∈ R n×n are taken from the uniform distribution on [−1, 1], with B symmetric and forced to have unit diagonal.The size N = m + n of M 0 varies from 300 to 1500 and the test matrices are split into three groups.In the first group the matrices A and B (the diagonal blocks of M 0 ) are of the same size, in the second A is twice the size of B, and in the third B is twice the size of A. Unless specified otherwise, we use tolerance tol = 10 −6 , which is small enough for most practical applications.
We use the following algorithms.1. bisection: Algorithm 3.1 with Cholesky factorization.2. bisection fb: Algorithm 5.1, the optimized bisection algorithm for the fixed block case.3. newton: Algorithm 3.3.On each iteration the required eigenvector is computed by tridiagonalization followed by the bisection method (with the same tolerance as for the Newton iteration itself), using routines from the NAG Toolbox for MATLAB Mark 24 [27].4. GEP: Algorithm 3.4, the algorithm based on solving a generalized eigenvalue problem.The tridiagonalization and bisection is again done using the NAG Toolbox. 5. GEP fb: Algorithm 5.2, the optimized generalized eigenvalue problem algorithm for the fixed block case.The computation times for the five methods averaged over 10 matrices of each size are presented in Table 6.1.The average number of steps for newton varies from 7 to 10 and the bisection methods always take 20 steps.
The experiments confirm the merit of using the optimized versions of bisection and the generalized eigenvalue method in applications where we keep a block fixed.Newton's method is the slowest of the three methods.GEP is a little faster than bisection, while bisection fb is faster than GEP fb for m = n and m = 2n and of similar speed for n = 2m.To illustrate the effect of weighting, we consider an example with M 0 and W defined by 1.000 0.900 0.450 0.300 0.225 0.900 1.000 0.900 0.450 0.300 0.450 0.900 1.000 0.900 0.450 0.300 0.450 0.900 1.000 0.900 0.225 0.300 0.450 0.900 1.000 The eigenvalues of M 0 are, to the digits shown, −0.18, 0.05, 0.50, 1.27, 3.36, therefore M 0 is indefinite.We see that 1.00 0.90 0.00 0.00 0.00 0.90 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.45 0.00 0.00 0.00 1.00 0.45 0.00 0.00 0.45 0.45 1.00 , with eigenvalues 0.10, 0.36, 1.00, 1.64, 1.90.Hence, M 1 is a valid target matrix.Using bisection we obtain α * = 0.24 and  To explore this example further, denote the elements of M 0 and S(α * ) by s ij and s ij , respectively.Then s 34 = 0.9 = s 45 , but since w 34 = 0 and w 45 = 0.5 the elements s 34 and s 45 are not the same.The relative change in the (3, 4) element is but that in the (4, 5) element is confirming that each element s ij in S(α * ) was obtained by multiplying the corresponding element s ij in M 0 by 1 + α * (w ij − 1).
In our next example, we use two invalid correlation matrices from the finance industry.The matrix C 1 is of order 1399 and the matrix C 2 is of order 3120.We use the identity matrix as the target in the shrinking method, thus we are not fixing any off-diagonal elements.We compare the execution times of bisection and GEP with that for computation of the nearest correlation matrix (NCM) to C 1 and C 2 by NAG code g02aa/nag_nearest_correlation which implements a Newton method [4], [30].We also use three further matrices constructed as block 2 × 2 matrices with diagonal blocks C 1 and C 1 , C 1 and C 2 , and C 2 and C 2 , with remaining off-diagonal elements from the random uniform distribution on [−1, 1].Convergence tolerances of both 10 −3 and 10 −6 are taken for bisection and g02aa, as well as for the bisection part of GEP.The times are shown in Table 6.2, where N denotes the size of the matrix.The shrinking solution is computed one to two orders of magnitude faster than the NCM.It is clear that g02aa and GEP do not benefit significantly from a relaxed tolerance, whereas the time for bisection is proportional to the logarithm of the tolerance.The table also shows that the Frobenius norm distances from the original matrix to the shrinking solution range from being similar to the distance to the NCM to much larger than it.
Our final experiment provides some insight into how the Frobenius norm distance from the original matrix to the shrinking solution compares with the distance to the NCM when the smallest eigenvalue varies in size.We take for M 0 a random symmetric indefinite matrix with unit diagonal of size 500, constructed by a diagonal scaling of a random orthogonal similarity applied to a diagonal matrix D; the diagonal elements of D are generated from the uniform distribution on [0, 1] and half of them are multiplied by −10 −p for some p.We generate 10 random target correlation matrices M 1 using Table 6.3Comparison of the distances in the Frobenius norm of the NCM and the solution computed by shrinking for matrices M 0 of size 500 with varying order of magnitude for the smallest eigenvalue.Distance λ min (M 0 ) Avg. α * NCM shrinking (max) shrinking (I) -4.6635e-1 6.0586e-1 5.5041e0 2.3506e1 1.0936e1 -4.2839e-2 9.7077e-2 5.0532e-1 3.5461e0 1.2367e0 -4.2466e-3 1.0144e-2 4.9173e-2 3.5748e-1 1.2255e-1 -4.0378e-4 9.4919e-4 4.6492e-3 3.3126e-2 1.1789e-2 -4.3278e-5 1.0900e-4 5.3498e-4 3.9736e-3 1.2897e-3 -4.0634e-6 1.0014e-5 4.5978e-5 3.4107e-4 1.3969e-4 MATLAB function gallery('randcorr',500) and apply bisection.For each M 0 , Table 6.3 shows the average shrinking parameter, the NCM distance, the maximum distance for the shrinking solution, and the distance for shrinking with M 1 = I.We see that the distance with M 1 = I is smaller than the worst-case for the random targets M 1 and the shrinking distance is one order of magnitude larger than the NCM distance.This experiment gives some feel for the trade-off between the speed of shrinking versus the optimality of the NCM as measured by distance, at least for the case where there are no fixed elements.It is clear from the last two experiments that in applications where it is not essential to compute the nearest correlation matrix shrinking provides an attractive and much faster alternative for restoring definiteness.
7. Concluding remarks.The motivation for this work was the growing number of applications in which matrices that are supposed to be (semi)definite turn out to be indefinite.We aimed to develop an alternative to the popular, but relatively expensive, approach of replacing the given matrix by the nearest positive semidefinite matrix or nearest correlation matrix.We have shown that shrinking is an attractive way of restoring definiteness.The method is flexible as it allows the practitioner to choose a target matrix that best serves the needs of the application; all that is required is to make sure that the chosen matrix is positive semidefinite or, in the correlation matrix case, a correlation matrix.We have described how to define a target matrix in the case of fixed diagonal blocks, as occur in stress testing, for example and have shown that shrinking can also take advantage of this structure.Weighting is a popular feature of the nearest correlation matrix methods used in practice and we have shown that with shrinking weights can be incorporated in the target matrix without any effect on the solution techniques.
Shrinking can be achieved in at least three different ways, all of which are straightforward to implement.Of our three shrinking methods we favor the bisection and generalized eigenvalue methods.Bisection is perhaps preferable for convergence tolerances of 10 −6 and larger, whereas the generalized eigenvalue method is preferable for more stringent tolerances, since its cost is essentially independent of the tolerance.Bisection also has the advantage that it produces a numerically semidefinite matrix (one for which the Cholesky factorization succeeds).
The problems for which we believe shrinking is of interest range in size from order 10-100, as arise for example in foreign exchange trading, and which may need to be solved thousands of times in a simulation, to orders in the thousands or millions, as for example in bioinformatics [34].In the case of invalid correlation matrices an attractive feature of shrinking is that it is at least an order of magnitude faster than computing the nearest correlation matrix.This is due to the fact that computing the nearest correlation matrix requires at least several full eigenvalue decompositions, while we can completely avoid computing any eigenvalues in the bisection method and need to compute only one for the generalized eigenvalue method.Therefore, in applications where the computation time is the governing factor, shrinking is preferable.
An implementation of Algorithm 3.1 is to appear in Mark 25 of the NAG Library (2015) as the routine g02anf [28].MATLAB implementations of the algorithms discussed here are available at https://github.com/higham/shrinkingand Python implementations, which do not require access to the NAG Library, are available at https://github.com/vsego/shrinking.

4
and we wish for A and the unit diagonal of B to remain unchanged.Hence the (1, 1) block of the target matrix M 1 must equal A and the (2, 2) block must have a unit diagonal.The target matrix (5.2) M 1 = diag(A, I)

Table 3 . 1
Approximate costs, in flops, of k 1 iterations of the bisection algorithm (Algorithm 3.1 with Cholesky factorization), k 2 iterations of Newton's method (Algorithm 3.3), and the generalized eigenvalue-based algorithm (Algorithm 3.4), all for M 0 of size N .

Table 6 . 1
Computation times in seconds for the three general algorithms and the two algorithms optimized for the fixed block problem, for invalid correlation matrices of size m + n with fixed leading block of size m.
Note that the elements corresponding to weight w ij = 1 (typeset in bold) are unchanged, as required by the interpretation of weights.

Table 6 . 2
Times in seconds to compute shrinking solution by bisection and GEP and nearest correlation matrix using g02aa, for tolerances 10 −3 and 10 −6 , and Frobenius norm distances to original matrix.