The Dispersion Bias

Estimation error has plagued quantitative finance since Harry Markowitz launched modern portfolio theory in 1952. Using random matrix theory, we characterize a source of bias in the sample eigenvectors of financial covariance matrices. Unchecked, the bias distorts weights of minimum variance portfolios and leads to risk forecasts that are severely biased downward. To address these issues, we develop an eigenvector bias correction. Our approach is distinct from the regularization and eigenvalue shrinkage methods found in the literature. We provide theoretical guarantees on the improvement our correction provides as well as estimation methods for computing the optimal correction from data.


Introduction
Harry Markowitz transformed finance in 1952 by framing portfolio construction as a tradeoff between mean and variance of return. This application of mean-variance optimization is the basis of theoretical breakthroughs as fundamental as the Capital Asset Pricing Model (CAPM) and Arbitrage Pricing Theory (APT), as well as practical innovations as impactful as Exchange Traded Funds. 1 Still, all financial applications of mean-variance optimization suffer from estimation error in covariance matrices, and we highlight two difficulties. First, a portfolio that is optimized using an estimated covariance matrix is never the true Markowitz portfolio. Second, in current practice, the forecasted risk of the optimized portfolio is typically too low, sometimes by a wide margin. Thus, investors end up with the wrong portfolio, one that is riskier, perhaps a lot riskier, than anticipated.
In this article, we address these difficulties by correcting a systematic bias in the first eigenvector of a sample covariance matrix. Our setting is that of a typical factor model, 2 but our statistical setup differs from most recent literature. In the last two decades, theoretical and empirical emphasis has been on the case when the number of assets N and number of observations T are both large. In this regime, consistency of principal component analysis (PCA) estimates may be established (Bai & Ng 2008). Motivated by many applications, we consider the setting of relatively few observations (in asymptotic theory: N grows and T is fixed). Indeed, an investor often has a portfolio of thousands of securities but only hundreds of observations. 3 PCA is applied in this environment in the early, pioneering work by Connor & Korajczyk (1986) and Connor & Korajczyk (1988), but also very recently (Wang & Fan 2017). In this high dimension, low sample-size regime, PCA factor estimates necessarily carry a finite-sample bias. This bias is further amplified by the optimization procedure that is required to compute a Markowitz portfolio.
An elementary simulation experiment reveals that in a large minimum variance portfolio, errors in portfolio weights are driven by the first principal component, not its variance. 4 The fact that the eigenvalues of the sample covariance matrix are not important requires some nontrivial analysis, which we carry out. In particular, we show (in our asymptotic regime) that the bias in the dominant sample eigenvalue does not affect the performance of the estimated minimum variance portfolio. Only the bias in the dominant sample eigenvector needs to be addressed. We measure portfolio performance using two well-established metrics. Tracking error, the workhorse of financial practitioners, measures deviations in weights between the estimated (optimized) and optimal portfolios. We use the variance forecast ratio, familiar to both academics and practitioners, to measure the accuracy of the risk forecast of the portfolio, however right or wrong that portfolio may be.
To develop some intuition for the results to come, consider a simplistic world where all security exposures to the dominant (market) factor are identical. With probability one, a PCA estimate of our idealized, dominant factor will have higher dispersion (variation in its entries). Decreasing this dispersion, obviously, mitigates the estimation error. We denote our idealized, dominant factor by z. We prove that the same argument applies to any other dominant factor along the direction of z with high probability for N large. Thus moving our PCA estimate towards z, by some amount, is very likely to decrease estimation error. In the limit (N ↑ ∞), the estimation error is reduced with probability one. The larger the component of the true dominant factor along z is, the more we can decrease the estimation error.
While a careful proof of our result relies on some recent theory on sample eigenvector asymptotics, rule of thumb versions have been known to practitioners since the 1970s (see footnote 10 and the corresponding discussion). Indeed, the dominant risk factor consistently found in the US and many other developed public equity markets has most (if not all) individual equities positively exposed to it. In other words, empirically, the dominant risk factor has a significant component in z. Our characterization of the dispersion bias may then be viewed as a formalization of standard operating procedure.
The remainder of the introduction discusses our contributions and the related literature. Section 2 describes the problem and fundamental results around the sample covariance matrix and PCA. In Section 3, we present our main results on producing a bias corrected covariance estimate. Section 4 discusses the implementation of our correction for obtaining data-driven estimates. Finally, in Section 5 we present numerical results illustrating the performance of our method in improving the estimated portfolio and risk forecasts.

Our contributions
We contribute to the literature by providing a method that significantly improves the performance of PCA-estimated minimum-variance portfolios. Our approach and perspective appear to be new. We summarize some of the main points.
Several authors (see above) have noted that sample eigenvectors carry a bias in the statistical and model setting we adopt. We contribute in this direction by, first, recognizing that it is the bias in the first sample eigenvector that drives the performance of PCA-based, minimum-variance portfolios. Second, we show that this bias may in fact be corrected to some degree (cf. discussion below (3.7) in Wang & Fan (2017)). In our domain of application this degree is material. We point out that eigenvalue bias, which has been the focus in most literature, does not have a material impact on minimum-variance portfolio performance. This motivates lines of research into more general Markowitz optimization problems. Finally, our correction can be framed geometrically in terms of the spherical law of cosines. This perspective illuminates possible extensions of our work. We discuss this further in our concluding remarks.
We also develop a bias correction and show that it outperforms standard PCA. Minimum variance portfolios constructed with our corrected covariance matrix are materially closer to optimal, and their risk forecasts are materially more accurate. In an idealized one-factor setting, we provide theoretical guarantees for the size of the improvement. Our theory also identifies some limitations. We demonstrate the efficacy of the method with an entirely data-driven correction. In an empirically calibrated simulation, its performance is far closer to the theoretically optimal than to standard PCA.

Related literature
The impact of estimation error on optimized portfolios has been investigated thoroughly in simulation and empirical settings. For example, see Jobson & Korkie (1980), Britten-Jones (1999), Bianchi, Goldberg & Rosenberg (2017) and the references therein. DeMiguel, Garlappi & Uppal (2007) compare a variety of methods for mitigating estimation error, benchmarking against the equally weighted portfolio in out-of-sample tests. They conclude that unreasonably long estimation windows are required for current methods to consistently outperform the benchmark. We review some methods most closely related to our approach. 5 Early work on estimation error and the Markowitz problem was focused on Bayesian approaches. Vasicek (1973) and Frost & Savarino (1986) were perhaps the first to impose informative priors on the model parameters. 6 More realistic priors incorporating multi-factor modeling are analyzed in Pástor (2000) (sample mean) and Gillen (2014) (sample covariance). Formulae for Bayes' estimates of the return mean and covariance matrix based on normal and inverted Wishart priors may be found in Lai & Xing (2008, Chapter 4, Section 4.4.1).
A related approach to the Bayesian framework is that of shrinkage or regularization of the sample covariance matrix. 7 Shrinkage methods have been proposed in contexts where little underlying structure is present (Bickel & Levina 2008) as well as those in which a factor or other correlation structure is presumed to exist (e.g. Ledoit & Wolf (2003), Ledoit & Wolf (2004), Fan, Liao & Mincheva (2013) and Bun, Bouchaud & Potters (2016)). Perhaps surprisingly, shrinkage methods turn out to be related to placing constraints on the portfolio weights in the Markowitz optimization. Jagannathan & Ma (2003) show that imposing a positivity constraint typically shrinks the large entries of the sample covariance downward. 8
As already mentioned, factor analysis and PCA in particular play a prominent role in the literature. It appears that while eigenvector bias is acknowledged, direct 9 bias corrections are made only to the eigenvalues corresponding to the principal components (e.g. Ledoit & Péché (2011) and Wang & Fan (2017)). Some work on characterizing the behavior of sample eigenvectors may be found in Paul (2007) and Shen, Shen, Zhu & Marron (2016). In the setting of Markowitz portfolios, the impact of eigenvalue bias and optimal corrections are investigated in El Karoui et al. (2010) and El Karoui (2013).
5 The literature on this topic is extensive. We briefly mention a few important references that do not overlap at all with our work. Michaud & Michaud (2008) recommend the use of bootstrap resampling. Lai, Xing & Chen (2011) reformulate the Markowitz problem as one of stochastic optimization with unknown moments. Goldfarb & Iyengar (2003) develop a robust optimization procedure for the Markowitz problem by embedding a factor structure in the constraint set.
6 Preceding work analyzed diffuse priors and was shown to be inefficient (Frost & Savarino 1986). The latter, instead, presumes all stocks are identical and have the same correlations. Vasicek (1973) specified a normal prior on the cross-sectional market betas (dominant factor).
7 In the Bayesian setup, sample estimates are "shrunk" toward the prior (Lai & Xing 2008).
Our approach also builds upon several profound contributions in the literature on portfolio composition. In an influential paper, Green & Hollifield (1992) observe the importance of the structure of the dominant factor to the composition of minimum variance portfolios. In particular, the "dispersion" of the dominant factor exposures drives the extreme positions in the portfolio composition. This dispersion is further amplified by estimation error, as pointed out in earlier work by Blume (1975) (see also Vasicek (1973)). These early efforts have led to a number of heuristics 10 to correct the sample bias of dominant factor estimates.

Problem formulation
We address the impact of estimation error on Markowitz portfolio optimization. To streamline the exposition, we restrict our focus to the minimum variance portfolio. In particular, given a covariance matrix $\hat\Sigma$ estimated from $T$ observations of returns to $N$ securities, we consider the optimization problem,
$$\min_{w \in \mathbb{R}^N} \; w^\top \hat\Sigma\, w \quad \text{subject to} \quad 1_N^\top w = 1. \tag{1}$$
We denote by $\hat w$ the solution to (1), the estimated minimum variance portfolio. Throughout, $1_N$ is the $N$-vector of all ones. It is well-known that the portfolio weights, $\hat w$, are extremely sensitive to errors in the estimated model, and risk forecasts for the optimized portfolio tend to be too low. 11 We aim to address these issues in a high-dimension, low sample-size regime, i.e., $T \ll N$. We are also interested in the equally weighted portfolio $w_e = 1_N/N$, a very simple non-optimized portfolio frequently employed as a benchmark. We use this benchmark to check that the corrections we make for improving the minimum variance portfolio are not offset by degraded performance elsewhere.
8 This is generalized and analyzed further in DeMiguel, Garlappi, Nogales & Uppal (2009).
9 Several approaches to alter the sample eigenvectors indirectly (e.g. shrinking the sample towards some structured covariance) do exist. However, the analysis of these approaches is not focused on characterizing the bias inherent to the sample eigenvectors themselves.
10 For example, the Blume and Vasicek (beta) adjustments. See the discussion of Exhibit 3 and footnote 7 in Clarke, De Silva & Thorley (2011).
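For a positive definite covariance estimate, problem (1) has the familiar closed-form solution proportional to the inverse covariance applied to the vector of ones. The following is a minimal sketch in Python with NumPy; the function name `min_variance_weights` and the three-asset toy inputs are illustrative assumptions, not part of the analysis.

```python
import numpy as np

def min_variance_weights(cov):
    """Solve min_w w' cov w subject to sum(w) = 1 for a positive definite cov."""
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)   # x = cov^{-1} 1_N
    return x / (ones @ x)            # rescale so the weights sum to one

# Toy one-factor covariance sigma^2 * beta beta' + delta^2 * I (illustrative values).
beta = np.array([0.6, 0.8, 0.0])     # unit-length exposures
cov = 4.0 * np.outer(beta, beta) + 1.0 * np.eye(3)

w = min_variance_weights(cov)        # fully invested minimum variance weights
w_equal = np.ones(3) / 3             # equally weighted benchmark
```

By construction, the variance `w @ cov @ w` is no larger than that of any other fully invested portfolio under the same covariance, including `w_equal`.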

Model specification & assumptions
We consider a one-factor, linear model for returns to $N \in \mathbb{N}$ securities. Here, a generating process for the $N$-vector of excess returns $R$ takes the form
$$R = \beta\phi + \epsilon, \tag{2}$$
where $\phi$ is the return to the factor, $\beta = (\beta_1, \dots, \beta_N)$ is the vector of factor exposures, and $\epsilon = (\epsilon_1, \dots, \epsilon_N)$ is the vector of diversifiable specific returns. While the returns $(\phi, \epsilon) \in \mathbb{R} \times \mathbb{R}^N$ are random, we treat each exposure $\beta_n \in \mathbb{R}$ as a constant to be estimated. Assuming $\phi$ and the $\{\epsilon_n\}$ are mean zero and pairwise uncorrelated, the $N \times N$ covariance matrix of $R$ can be expressed as
$$\Sigma = \sigma^2 \beta\beta^\top + \Delta. \tag{3}$$
Here, $\sigma^2$ is the variance of $\phi$ and $\Delta$ a diagonal matrix with $n$th entry $\Delta_{nn} = \delta_n^2$, the variance of $\epsilon_n$. Estimation of $\Sigma$ is central to numerous applications.
We consider a setting in which $T$ observations $\{R_t\}_{t=1}^T$ of the vector $R$ are generated by a latent time-series $\{\phi_t, \epsilon_t\}_{t=1}^T$ of $(\phi, \epsilon)$. It is standard to assume the observations are i.i.d. and we do so throughout. Finite-sample error distorts measurement of the parameters $(\sigma^2, \beta, \delta^2)$ leading to the estimate,
$$\hat\Sigma = \hat\sigma^2 \hat\beta\hat\beta^\top + \hat\Delta, \tag{4}$$
which approximates (3) by using the estimated model $(\hat\sigma^2, \hat\beta, \hat\delta^2)$. Without loss of generality we assume the following condition throughout.
We require further statistical and regularity assumptions on our model for our technical results. These conditions stem from our use of recent work on spiked covariance models (Wang & Fan 2017). Some may be relaxed in various ways. Our numerical results (Section 5) investigate a much more general setup.
11 Extreme sensitivity of portfolio weights to estimation error and the downward bias of risk forecasts are also found in the optimized portfolios constructed by asset managers. Portfolio specific corrections of the dispersion bias discussed in this article are useful in addressing these practical problems. The focus on the global minimum variance portfolio in this article highlights the essential logic of our analysis in the simplest possible setting.
Assumption 2.4. For $z = 1_N/\sqrt{N}$ we have $\sup_N \gamma_{\beta,z} < 1$ and $\sup_N \gamma_{\hat\beta,z} < 1$. Also, $\beta$ and $\hat\beta$ are oriented in such a way that $\beta^\top z$ and $\hat\beta^\top z$ are nonnegative.
The requirement in Assumption 2.2 that the factor variance grow in dimension while the specific variance stays bounded is pervasive in the factor modeling literature (Bai & Ng 2008). The extra requirement that the specific risk is homogeneous (i.e. $\Delta$ is a scalar matrix) is restrictive but shortens the technical proofs significantly. It is also commonplace in the spiked covariance model literature. 12 We discuss the adjustments required for heterogeneous specific risk (i.e. $\Delta$ diagonal) in Section 4.2. The distributional Assumption 2.3 facilitates several steps in the proofs. In particular, it allows for an elegant characterization of the systematic bias in PCA, i.e., the bias in the first eigenvector of the sample covariance matrix of the returns $\{R_t\}$ (see Section 2.2). Assumption 2.4 is not much of a restriction. First, all results can be easily extended to the case $\beta = z$, which is simply a point of singularity and thus requires a separate treatment. Second, the orientation requirement is essentially without loss of generality: we will see (in Section 3) that the vector $z$ plays a special role in our bias correction procedure, and if $\beta^\top z < 0$, we would simply consider $-z$.
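To make the setup concrete, the following minimal sketch simulates returns from the one-factor model with homogeneous specific risk and builds the population covariance (3). It is an illustration under stated assumptions: Gaussian factor and specific returns, a unit-length exposure vector, and the particular parameter values and the helper name `simulate_returns` are all choices made here for the example.

```python
import numpy as np

def simulate_returns(beta, sigma2, delta2, T, rng):
    """Draw T i.i.d. Gaussian observations of R = beta * phi + eps (an N x T matrix)."""
    N = beta.shape[0]
    phi = rng.normal(0.0, np.sqrt(sigma2), size=T)         # factor returns, variance sigma2
    eps = rng.normal(0.0, np.sqrt(delta2), size=(N, T))    # specific returns, variance delta2
    return np.outer(beta, phi) + eps

rng = np.random.default_rng(0)
N, T = 200, 50
raw = 1.0 + 0.5 * rng.standard_normal(N)    # mostly positive exposures (illustrative)
beta = raw / np.linalg.norm(raw)            # normalize exposures to unit length
sigma2 = 1.0 * N                            # factor variance grows with dimension
delta2 = 4.0                                # homogeneous specific variance

Sigma = sigma2 * np.outer(beta, beta) + delta2 * np.eye(N)   # population covariance
R = simulate_returns(beta, sigma2, delta2, T, rng)           # data matrix, N x T
```

With unit-length exposures the trace of `Sigma` equals `sigma2 + N * delta2`, which gives a quick sanity check on the construction.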

PCA bias characterization
We consider PCA as the starting point for our analysis as its use for risk factor identification is widespread in the literature. It is appropriate when $\sigma^2$ is much larger than the entries of $\Delta$ (e.g., Assumption 2.2). Assembling a data matrix $R = (R_1, \dots, R_T)$, we will denote the (data) sample covariance matrix by $S = T^{-1} R R^\top$. PCA identifies $\hat\sigma^2$ with the largest eigenvalue of $S$ and $\hat\beta$ with the corresponding eigenvector (the first principal component). The diagonal matrix of specific risks $\Delta$ is estimated as $\hat\Delta = \mathrm{diag}(S - \hat\sigma^2\hat\beta\hat\beta^\top)$, which corresponds to least-squares regression of $R$ onto the estimated factors. Finally, the estimate $\hat\Sigma$ of the covariance $\Sigma$ is assembled as in (4).
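The PCA recipe just described can be sketched in a few lines. The code below is illustrative rather than a reference implementation: it assumes an $N \times T$ data matrix, obtains the leading eigenpair of the sample covariance from a thin SVD of the data (which avoids forming the $N \times N$ matrix), and uses a hypothetical function name `pca_one_factor` and placeholder data.

```python
import numpy as np

def pca_one_factor(R):
    """PCA estimates (sigma2_hat, beta_hat, specific variances) from an N x T matrix R."""
    N, T = R.shape
    # The top left singular vector of R is the top eigenvector of S = R R' / T.
    U, s, _ = np.linalg.svd(R, full_matrices=False)
    beta_hat = U[:, 0]
    if beta_hat.sum() < 0:                 # orient so that beta_hat' z >= 0
        beta_hat = -beta_hat
    sigma2_hat = s[0] ** 2 / T             # largest eigenvalue of S
    diag_S = np.einsum("ij,ij->i", R, R) / T
    delta2_hat = diag_S - sigma2_hat * beta_hat ** 2   # diag(S - sigma2_hat * beta_hat beta_hat')
    return sigma2_hat, beta_hat, delta2_hat

rng = np.random.default_rng(1)
R = rng.standard_normal((30, 8))           # placeholder data with N = 30, T = 8
sigma2_hat, beta_hat, delta2_hat = pca_one_factor(R)

# Assemble the estimated covariance from the three pieces.
Sigma_hat = sigma2_hat * np.outer(beta_hat, beta_hat) + np.diag(delta2_hat)
```

By construction the diagonal of `Sigma_hat` matches the diagonal of the sample covariance, so individual security variances are fit exactly while the off-diagonal structure comes from the single estimated factor.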
Bias in the basic PCA estimator above arises from the use of the sample 13 covariance matrix $S$. We focus on the high-dimension, low sample-size regime which is most appropriate for the practical applications we consider. Asymptotically, $T$ is fixed and $N \uparrow \infty$, a regime in which the PCA estimates of $\sigma^2$ and $\beta$ are not consistent. We summarize some recent results from the literature below. We also state our characterization of PCA bias as it pertains to its principal components. Our result is novel in that it suggests a remedy.
12 In particular, our need for this condition stems from our use of results from Shen et al. (2016). The assumption may be relaxed by imposing regularity conditions on the entries of $\Delta$ at the expense of a more cumbersome exposition. It may also be removed entirely if we consider the regime $N, T \uparrow \infty$ as is more common in the literature. We do not pursue this because many important practical applications are restricted to only a small number of observations.
13 If Σ replaces S, sample bias vanishes and the estimator is asymptotically exact as N ↑ ∞.
(a) $\hat\beta$ lies near the cone defined by the $\Psi$ in (5). There is no reference frame to detect the bias since $\beta$ is unknown.
(b) PCA estimates exhibit a systematic (dispersion) bias relative to the vector $z$.
Figure 1: A unit sphere with example vectors $\beta$, $\hat\beta$ and $z = 1_N/\sqrt{N}$. The vector $z$ provides a reference frame in which PCA bias may be identified. Note, the angle $\theta_{x,y}$ is also the arc-length between points $x$ and $y$ on the unit sphere.
Let $\theta_{\beta,\hat\beta}$ denote the angle between $\beta$ and its PCA estimate $\hat\beta$. Shen et al. (2016) showed, under Assumption 2.2 and mild conditions on the moments of $\{R_t\}$, that there exist random variables $\Psi > 1$ and $\xi \ge 0$ such that 14
$$\cos\theta_{\beta,\hat\beta} \overset{a.s.}{\longrightarrow} \Psi^{-1} \quad \text{and} \quad \frac{\hat\sigma^2}{\sigma^2} \overset{a.s.}{\longrightarrow} \xi\,\Psi^2 \tag{5}$$
as $N \uparrow \infty$, where $\xi$ and $\Psi$ depend on $T$ (fixed). The pair $(\xi, \Psi)$ characterizes the error in the PCA estimated model asymptotically. As the sample size $T$ increases, both $\xi$ and $\Psi$ approach one almost surely, i.e., PCA supplies consistent estimates. Since $\Psi > 1$, the estimate $\hat\sigma^2$ tends to be biased upward (whenever $\xi$ fluctuates around one). Under Assumption 2.3, $\xi = \chi^2_T/T$ where $\chi^2_T$ has the chi-square distribution with $T$ degrees of freedom. Here, $\xi$ is concentrated around one with high probability (w.h.p.) for even moderate values of $T$.
Identifying a (systematic) bias in PCA estimates of $\beta$ requires a more subtle analysis. Observe from (5) that $\Psi$ defines a cone near which the estimate $\hat\beta$ will lie with high probability for large $N$ (see panel (a) of Figure 1). However, this does not point to a systematic error that can be corrected. 15 Indeed, it is not a priori clear where on the cone the vector $\hat\beta$ resides as (5) provides information only about the angle $\theta_{\beta,\hat\beta}$ away from the unknown vector $\beta$. We provide a result (see Theorem 2.5 below) that sheds light on this problem.
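Recall the vector $z = 1_N/\sqrt{N}$. We consider, not the angle $\theta_{\beta,\hat\beta}$ between $\beta$ and $\hat\beta$, but their respective angles $\theta_{\beta,z}$ and $\theta_{\hat\beta,z}$ to this reference vector.
Theorem 2.5. Under our assumptions,
$$\cos\theta_{\beta,z} \overset{a.s.}{\sim} \Psi \cos\theta_{\hat\beta,z}$$
as $N \uparrow \infty$ and, in particular, we have that $\theta_{\hat\beta,z}$ exceeds $\theta_{\beta,z}$ with high probability for large $N$.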
The proof of this result is deferred to Appendix C. It applies the spiked covariance model results of Shen et al. (2016) and Paul (2007) on sample eigenvectors using a decomposition in terms of the reference vector z.
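The dispersion bias is easy to observe in simulation. The sketch below is an illustration under assumptions that are not taken from the text: Gaussian returns, homogeneous specific variance, and arbitrary parameter values with few observations relative to securities. It compares the cosine of the true exposure vector with $z$ against that of its PCA estimate.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T = 2000, 10                         # many securities, few observations
mu2, delta2 = 1.0, 4.0                  # illustrative: sigma^2 = mu2 * N, Delta = delta2 * I

z = np.ones(N) / np.sqrt(N)             # dispersionless reference vector
raw = 1.0 + 0.5 * rng.standard_normal(N)
beta = raw / np.linalg.norm(raw)        # true exposures on the unit sphere

phi = rng.normal(0.0, np.sqrt(mu2 * N), size=T)
eps = rng.normal(0.0, np.sqrt(delta2), size=(N, T))
R = np.outer(beta, phi) + eps           # N x T matrix of simulated returns

U, _, _ = np.linalg.svd(R, full_matrices=False)
beta_hat = U[:, 0] * np.sign(U[:, 0] @ z)    # first principal component, oriented toward z

gamma_beta_z = beta @ z                 # cos(theta_{beta, z})
gamma_hat_z = beta_hat @ z              # cos(theta_{beta_hat, z})
psi_implied = 1.0 / abs(beta @ beta_hat)     # finite-N stand-in for Psi
```

In runs of this kind `gamma_hat_z` falls below `gamma_beta_z`, i.e. the estimate sits at a larger angle from $z$ and has more dispersed entries than the truth, and the ratio `gamma_beta_z / gamma_hat_z` is close to `psi_implied`.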

Errors in optimized portfolios
Estimation error causes two types of difficulties in optimized portfolios. It distorts portfolio weights, and it biases the risk forecasts of optimized portfolios downward. Both effects are present for the minimum variance portfolio $\hat w$, constructed as the solution to (1) using some estimate $\hat\Sigma$ of the returns covariance $\Sigma$. We now define the metrics for assessing the magnitude of these two errors.
We denote by $w^*$ the optimal portfolio, i.e., the solution of (1) with $\hat\Sigma$ replaced by $\Sigma$. Since the latter is positive definite, the optimal portfolio weights $w^*$ may be given explicitly. 16 We define,
$$\mathcal{T}^2_{\hat w} = (w^* - \hat w)^\top \Sigma\, (w^* - \hat w), \tag{8}$$
the (squared) tracking error of $\hat w$. Here, $\mathcal{T}^2_{\hat w}$ measures the distance between the optimal and estimated portfolios, $w^*$ and $\hat w$. Specifically, it is the variance of the return to the difference portfolio $w^* - \hat w$, i.e., the square of the width of the distribution of return differences.
The forecast variance of portfolio $\hat w$ is given by $\hat w^\top \hat\Sigma\, \hat w$ and its true variance is $\hat w^\top \Sigma\, \hat w$. We define,
$$\mathcal{R}_{\hat w} = \frac{\hat w^\top \hat\Sigma\, \hat w}{\hat w^\top \Sigma\, \hat w}, \tag{9}$$
the variance forecast ratio. Ratio (9) is less than one when the risk of the portfolio $\hat w$ is underforecast. 17 Metrics (8) and (9) quantify the errors in portfolio weights and risk forecasts induced by estimation error. 18 We analyze $\mathcal{T}_{\hat w}$ and $\mathcal{R}_{\hat w}$ asymptotically.
16 Indeed, the Sherman-Morrison-Woodbury formula yields an explicit solution even for a multi-factor model and with a guaranteed mean return (El Karoui 2008). In our setting,
17 With respect to the equally weighted portfolio, $w_e$, (the tracking error of which is zero) we only consider the variance forecast ratio.
18 For a relationship to more standard error norms see Wang & Fan (2017).
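Both metrics are direct to compute once the true and estimated covariance matrices are in hand. The sketch below is illustrative: the function names and the four-asset toy model (a true exposure vector and a deliberately perturbed estimate of it) are assumptions made for the example.

```python
import numpy as np

def min_variance_weights(cov):
    """Fully invested minimum variance weights for a positive definite cov."""
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)
    return x / (ones @ x)

def tracking_error_sq(w_hat, w_star, Sigma):
    """Squared tracking error: true variance of the difference portfolio w_star - w_hat."""
    d = w_star - w_hat
    return d @ Sigma @ d

def variance_forecast_ratio(w_hat, Sigma_hat, Sigma):
    """Forecast variance of w_hat divided by its true variance."""
    return (w_hat @ Sigma_hat @ w_hat) / (w_hat @ Sigma @ w_hat)

# Toy model: true exposures versus a perturbed estimate (illustrative numbers).
beta = np.array([1.0, 2.0, 3.0, 4.0])
beta /= np.linalg.norm(beta)
beta_hat = np.array([1.0, 1.0, 2.0, 4.0])
beta_hat /= np.linalg.norm(beta_hat)
Sigma = 40.0 * np.outer(beta, beta) + np.eye(4)
Sigma_hat = 40.0 * np.outer(beta_hat, beta_hat) + np.eye(4)

w_star = min_variance_weights(Sigma)       # optimal portfolio
w_hat = min_variance_weights(Sigma_hat)    # estimated portfolio
te2 = tracking_error_sq(w_hat, w_star, Sigma)
vfr = variance_forecast_ratio(w_hat, Sigma_hat, Sigma)
```

When the estimate coincides with the truth the tracking error is zero and the ratio is one; any error in the estimated model moves `te2` above zero.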
Again recall $z = 1_N/\sqrt{N}$ and let $\gamma_{x,y} = x^\top y$. Note, $\gamma_{x,y} = \cos\theta_{x,y}$ whenever $x$ and $y$ lie on the surface of a unit sphere as in Figure 1. Define, The variable $E$ drives the asymptotics of our error metrics. Note that $E = 0$ when $\hat\beta = \beta$. Indeed, it is not difficult to show that for any estimates $(\hat\sigma, \hat\beta, \hat\delta)$ satisfying Assumption 2.4, if $E$ is bounded away from zero (i.e. $\inf_N E > 0$), where $\mu^2 = \sigma^2/N$, which is bounded in $N$. This result (with all our assumptions above relaxed) is given by Proposition D.1 of Appendix D.
Corollary 2.6 (PCA performance).
Proof. Expression (12) follows from (5), Theorem 2.5 (i.e. $\gamma_{\beta,z} \overset{a.s.}{\sim} \Psi\gamma_{\hat\beta,z}$) and the identity $1 - \gamma^2_{\hat\beta,z} = \sin^2\theta_{\hat\beta,z}$. The remaining claims follow from (11).
The result states that the variance forecast ratio for the minimum variance portfolio $\hat w$ that uses the PCA estimator of Section 2.2 is asymptotically 0. The estimated risk will be increasingly optimistic as $N \to \infty$. This is entirely due to estimation error between the sample eigenvector and the population eigenvector. As $N$ grows, the forecast risk becomes negligible relative to the true risk, rendering the PCA estimated minimum variance portfolio worse and worse. The tracking error is also driven by the error in the sample eigenvector: as the dimension grows, the tracking error of the estimated portfolio relative to the true minimum variance portfolio remains bounded below (away from zero).

Bias correction
Our Theorem 2.5 characterizes the bias of the PCA estimator in terms of the vector $z$. This is the unique (up to negation) dispersionless vector on the unit sphere, i.e., its entries do not vary. Of course, when $z = \beta$, its PCA estimate $\hat\beta$ will have higher dispersion with probability one. The argument works along the projection of any $\beta$ onto $z$, given by $\gamma_{\beta,z}$. Our PCA bias characterization implies that $\gamma_{\beta,z} > \gamma_{\hat\beta,z}$, or equivalently, $\theta_{\hat\beta,z} > \theta_{\beta,z}$, with high probability (for large $N$). Figure 1b illustrates this systematic PCA bias and clearly suggests a correction: a shrinkage of the PCA estimate $\hat\beta$ towards $z$.

Intuition for the correction
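Given an estimate $\hat\beta$, we analyze a parametrized correction of the form
$$\hat\beta(\rho) = \frac{\hat\beta + \rho z}{\|\hat\beta + \rho z\|} \tag{14}$$
for $\rho \in \mathbb{R}$. The curve $\hat\beta(\rho)$ represents a geodesic between $-z$ and $z$ that passes through $\hat\beta$ as $\rho$ passes the origin. We select the optimal "shrinkage" parameter $\rho^*$ as the $\rho$ that minimizes the error in our metrics $\mathcal{T}^2_{\hat w}$ and $\mathcal{R}_{\hat w}$ asymptotically. With $S^{N-1}$ denoting the unit $N$-sphere, we define the space
$$S_\beta = \{x \in S^{N-1} : \gamma_{\beta,z} - \gamma_{\beta,x}\gamma_{x,z} = 0\}.$$
Lemma 3.1. Let $\beta \neq z$. There is a $\rho^*$ such that $\hat\beta(\rho^*) \in S_\beta$ and $\hat\beta(\rho^*) \neq z$.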
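Proof. By the spherical law of cosines, we obtain
$$\cos\theta_{\beta,z} = \cos\theta_{\beta,\hat\beta}\cos\theta_{\hat\beta,z} + \sin\theta_{\beta,\hat\beta}\sin\theta_{\hat\beta,z}\cos\kappa,$$
where $\kappa$ is the angle (in $S^{N-1}$) between the geodesic from $\hat\beta$ to $z$ and the one from $\hat\beta$ to $\beta$ (see Figure 2a). Write $x = \hat\beta(\rho)$ and $\kappa = \kappa_\rho$ for the same identity with $x$ in place of $\hat\beta$. Then, $\cos\kappa_\rho = 0$ when $\rho = \rho^*$, for which the geodesic between $\hat\beta(\rho^*)$ and $\beta$ is perpendicular to that between $\hat\beta$ and $z$. 19 By construction, $\hat\beta(\rho^*)$ is not $z$ and is in $S_\beta$.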
In a setting where $\hat\beta$ is the PCA estimator, observe that our selection of $\hat\beta(\rho^*)$ does more than just correct PCA bias. Theorem 2.5, under the proper assumptions, states that $\gamma_{\beta,z} \overset{a.s.}{\sim} \Psi\gamma_{\hat\beta,z}$ for a random variable $\Psi > 1$. This suggests taking $\rho$ such that $\gamma_{\beta,z}$ equals $\gamma_{\hat\beta(\rho),z}$. This choice is not optimal, however, as it lies on the contour $\{x \in S^{N-1} : \gamma_{x,z} = \gamma_{\beta,z}\}$ (see Figure 2b). It is not in $S_\beta$ unless $\beta = z$, and its asymptotic error $E$ is bounded away from zero.
(b) Setting the angle $\kappa = \pi/2$ corresponds to setting the error driving term $E = 0$. Thus, $\hat\beta(\rho^*) \in S_\beta$.
Figure 2: Illustration of the spherical law of cosines and the geodesic $\hat\beta(\rho)$ between the points $\hat\beta$ and $z$ along which we shrink the PCA estimate $\hat\beta$.
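The perpendicularity condition can be solved for the shrinkage parameter in closed form. Assuming the correction takes the normalized form $(\hat\beta + \rho z)/\|\hat\beta + \rho z\|$, substituting it into $\gamma_{\beta,z} = \gamma_{\beta,x}\gamma_{x,z}$ and cancelling common terms gives $\rho^* = (\gamma_{\beta,z} - \gamma_{\beta,\hat\beta}\gamma_{\hat\beta,z})/(\gamma_{\beta,\hat\beta} - \gamma_{\beta,z}\gamma_{\hat\beta,z})$. This expression is our own algebra from that condition and should be read as a sketch, not as a quotation of the oracle formula in the theorem. The Python below checks it numerically on three hypothetical unit vectors.

```python
import numpy as np

def unit(v):
    """Rescale a vector to unit length."""
    return v / np.linalg.norm(v)

def shrink_toward(beta_hat, z, rho):
    """Point on the geodesic through beta_hat toward z (assumed normalized form)."""
    return unit(beta_hat + rho * z)

def oracle_rho(beta, beta_hat, z):
    """rho at which the shrunk vector x satisfies gamma_{beta,z} = gamma_{beta,x} * gamma_{x,z}."""
    a, b, g = beta @ beta_hat, beta @ z, beta_hat @ z
    return (b - a * g) / (a - b * g)

# Hypothetical three-dimensional example.
z = unit(np.ones(3))
beta = unit(np.array([1.0, 2.0, 3.0]))
beta_hat = unit(np.array([1.0, 3.0, 2.0]))

rho_star = oracle_rho(beta, beta_hat, z)
x = shrink_toward(beta_hat, z, rho_star)
residual = beta @ z - (beta @ x) * (x @ z)   # vanishes when x solves the condition
```

The same value of $\rho$ is also the stationary point of $\gamma_{\beta,\hat\beta(\rho)}$ along the geodesic, which is why, in this example, the shrunk vector `x` is closer in angle to `beta` than `beta_hat` is.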

Statement of the main theorem
As noted above, we can improve tracking error and variance forecast ratio by reducing the angle between the estimated eigenvector and the true underlying eigenvector, $\beta$, or equivalently, by replacing $\hat\beta$ with an appropriate choice of $\hat\beta_\rho$. We find an optimal value $\rho^*_N$ for a particular $N$. We present our method and its impact on tracking error and variance forecast ratio for a minimum variance portfolio below. We restrict to the case of homogeneous specific risk where $\Delta = \delta^2 I$ for expositional purposes but consider the full-fledged case in empirical results in Section 5.
In the following theorem, we also provide a correction to the sample eigenvalue. Our bias correction for the sample eigenvector introduces a bias in the variance forecast ratio for the equally weighted portfolio. We shrink the sample eigenvalue, treated as the variance of the estimated factor, to debias the variance forecast ratio for the equally weighted portfolio.
ii. For the oracle value $\rho^*_N$, the tracking error and forecast variance ratio for the minimum variance portfolio for $\hat\Sigma_{\rho^*_N} = \hat\sigma^2\hat\beta_{\rho^*}\hat\beta_{\rho^*}^\top + \hat\delta^2 I$ satisfy, That is, after the optimal correction, the forecast variance ratio for the minimum variance portfolio no longer converges to 0 while the tracking error to the true minimum variance portfolio does.
iii. where $\Psi^2 = 1 + \delta^2 c_1/\xi$, $\xi = \chi^2_T/T$, and $\chi^2_T$ has the chi-squared distribution with $T$ degrees of freedom. Also, $\bar\rho_N > 0$ almost surely if $\gamma_{\beta,z} > 0$. And the asymptotic improvement of the optimal angles $\theta_{\beta,\hat\beta_{\rho^*}}$ and $\theta_{\beta,\hat\beta_{\bar\rho_N}}$ over the original angle $\theta_{\beta,\hat\beta}$ as $N \to \infty$ is,
Remark 3.3. Geometrically, there are two views of $\hat\beta_{\rho^*}$. One is that $\hat\beta_{\rho^*}$ is the projection of $\hat\beta$ onto $S_\beta$. The other is that $\hat\beta_{\rho^*}$ is the projection of $\beta$ onto the geodesic defined by $\hat\beta_\rho$. In either case, our goal is to find the intersection of the geodesic and the space $S_\beta$.
Remark 3.4. While we consider a specific target, $z$, in principle the target does not matter. It is possible that these kinds of factor corrections can be applied beyond the first factor, given enough information to create a reasonable prior.
The first takeaway from this result is that in the high dimensional limit, it is always possible to improve on the PCA estimate by moving along the geodesic between $\hat\beta$ and $z$. As $\gamma_{\beta,z}$ approaches 0 or for a larger $T$, the optimal correction approaches 0. Conversely, as $\gamma_{\beta,z}$ approaches 1 or for smaller $T$, the magnitude of the correction is larger. For $\gamma_{\beta,z} = 1$, the proper choice is naturally to choose $z$ since $\beta$ and $z$ are aligned in that case.
The improvement in the angle as measured by the ratio of squared sines is bounded in the interval $(1 - \gamma^2_{\beta,z}, 1)$. As $\gamma_{\beta,z}$ approaches 0 or for larger $T$, the improvement diminishes and the ratio approaches 1. Conversely, for large values of $c_1$, the ratio approaches $1 - \gamma^2_{\beta,z}$, indicating that improvement is naturally constrained by how close $\beta$ is to $z$ in the first place.
In the application to the minimum variance portfolio, the initial idea is to correct the sample eigenvector so that we reduce the angle to the population eigenvector. However, it is not immediately clear that this should have a dramatic effect. Even more surprising is that underestimation of risk has a large component due to sample eigenvector bias and not any sample eigenvalue bias. While an improved estimate $\hat\beta_\rho$ has the potential to greatly improve forecast risk, this represents only a single dimension on which to evaluate the performance of a portfolio. We could be sacrificing small tracking error to the true long-short minimum variance portfolio in exchange for better forecasting. That however is not the case here.
Since ξ and Ψ are unobservable non-degenerate random variables, determining their realized values, even with asymptotic estimates, is an impossible task. Hence perfect corrections to kill off the driving term of underestimation of risk are not possible. However, it is possible to make corrections that materially improve risk forecasts.

Eigenvalue corrections
Our bias correction, based on formula (14), adjusts the dominant eigenvector of the sample covariance matrix S. It does not involve standard corrections to the eigenvalues of S, which are well known to be biased. This distinguishes our results from the existing literature (see Section 1.2).
For the purpose of improving accuracy of minimum variance portfolios, there is no need to adjust the dominant eigenvalue. As shown in formulas (11) of Section 2.3, the main drivers of $\mathcal{T}^2$ and $\mathcal{R}$ for large minimum variance portfolios do not depend on the dominant eigenvalue of $S$ when returns follow a one-factor model. This is a particular feature of minimum variance, where the dominant factor is effectively hedged.
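A small numerical experiment illustrates the point. The sketch below uses illustrative assumptions throughout: a one-factor covariance with homogeneous specific variance, arbitrary parameter values, and a hypothetical perturbation of the exposure vector by 0.3 radians. It compares how far the minimum variance weights move when the dominant eigenvalue is doubled versus when the dominant eigenvector is rotated.

```python
import numpy as np

def min_variance_weights(sigma2, beta, delta2):
    """Minimum variance weights for the covariance sigma2 * beta beta' + delta2 * I."""
    N = beta.shape[0]
    cov = sigma2 * np.outer(beta, beta) + delta2 * np.eye(N)
    ones = np.ones(N)
    x = np.linalg.solve(cov, ones)
    return x / (ones @ x)

rng = np.random.default_rng(3)
N = 1000
raw = 1.0 + 0.5 * rng.standard_normal(N)
beta = raw / np.linalg.norm(raw)
sigma2, delta2 = 1.0 * N, 4.0

# Rotate beta by 0.3 radians toward a random direction orthogonal to it.
u = rng.standard_normal(N)
u -= (u @ beta) * beta
u /= np.linalg.norm(u)
beta_rot = np.cos(0.3) * beta + np.sin(0.3) * u

w_base = min_variance_weights(sigma2, beta, delta2)
w_eigval = min_variance_weights(2.0 * sigma2, beta, delta2)   # eigenvalue doubled
w_eigvec = min_variance_weights(sigma2, beta_rot, delta2)     # eigenvector rotated

rel_eigval = np.linalg.norm(w_eigval - w_base) / np.linalg.norm(w_base)
rel_eigvec = np.linalg.norm(w_eigvec - w_base) / np.linalg.norm(w_base)
```

In this setup doubling the eigenvalue changes the weights only marginally, while the rotation of the eigenvector changes them substantially, which is consistent with the factor being effectively hedged in the minimum variance portfolio.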
Since our correction removes a systematic form of bias, it can be used to improve accuracy of other portfolios. If these portfolios have substantial factor exposure, however, a compatibility adjustment to the dominant eigenvalue may be required. Like our eigenvector adjustment, the compatibility adjustment to the eigenvalue is distinct from the eigenvalue corrections in the literature.
Here, we provide some discussion of compatible eigenvalue corrections for "simple" portfolios, i.e., those that do not depend on the realized matrix of returns $R$. Note that for these weights, the tracking error $\mathcal{T}$ is zero, so we treat the risk forecast ratio $\mathcal{R}$ only.
Under assumptions on our one-factor model that hold for most cases of interest, one can write, for a simple portfolio $w$, that where $C(w, \beta, \hat\beta) = (w^\top\beta)/(w^\top\hat\beta)$. Our correction addresses only the quantity $C$, but asymptotic formula (20) reveals that for simple portfolios, sample eigenvalues play a material role. Another difference between simple portfolios and minimum variance is that the estimate $\hat\delta^2$ of $\delta^2$ does not play any role; the factor risk is all that matters. From the discussion in Section 2.2 we know that $\hat\sigma^2_N$ tends to be larger than $\sigma^2_N$ for large $N$, but it is not a priori clear that an eigenvalue correction should aim to lower $\hat\sigma^2_N$. This would depend on the behavior of the coefficient $C(w, \beta, \hat\beta)$ given the estimate $\hat\beta$ and the simple portfolio $w$ at hand. Moreover, a correction that decreases $\hat\sigma^2_N$ will adjust risk forecast ratios downward, potentially leading to unintended underforecasts. Thus, sample eigenvalue corrections should be coupled with those for the sample eigenvectors to balance their respective terms in (20).
We state a sharp result in this direction for the equally weighted portfolio, a widely used simple portfolio.

Algorithm and extensions
For the precise statement of our algorithm, see Appendix A. In what follows, we address data-driven corrections for the case $\Delta = \delta^2 I$ of Theorem 3.2 as well as the extension to heterogeneous specific risk.

Data-driven estimator for homogeneous specific risk
We introduce a procedure for constructing an estimator $\hat\rho$ of the asymptotic oracle correction parameter $\bar\rho_N$ given in (19). It is based on estimates of the specific variance $\delta^2$ and of $c_1$ from Assumption 2.2. From Yata & Aoshima (2012), we have an estimator $\hat\delta^2$ of $\delta^2$ built from $S$ and $\hat\lambda_1$, where $S$ is the sample covariance matrix for the data matrix $R$ and $\hat\lambda_1 = \hat\sigma^2$ is the first eigenvalue of $S$. A natural estimate for the true eigenvalue $\lambda_1(\Sigma)$ is $\lambda^S_1 = \max\{\hat\lambda_1 - \hat\delta^2 N/T,\, 0\}$. For $\hat\lambda_1$ sufficiently large, an estimate $\hat c_1$ of $c_1$ is also available. Given the estimates of $\delta^2$ and $c_1$, we need a precise value for $\xi$, as well as $\Psi$, in order to have a data-driven estimator. We approximate $\xi$ by its expectation $E[\xi] = 1$ to obtain a completely data-driven correction parameter estimate $\hat\rho$, in which $\Psi^2 = 1 + \hat\delta^2 \hat c_1$. The factor variance is then computed from $\hat\sigma^2 = \hat\lambda_1$, the first sample eigenvalue of $S$.
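The plug-in steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the $T \times N$ layout of the return array, and the $1/T$ normalization without demeaning are choices made here, and the values of $\hat\delta^2$ and $\hat c_1$ are passed in as inputs rather than computed, since their formulas are not reproduced in this section.

```python
import numpy as np

def plug_in_quantities(returns, delta2_hat, c1_hat):
    """Hypothetical helper: returns is a T x N array (rows are observations)."""
    T, N = returns.shape
    # Sample covariance with 1/T normalization and no demeaning (an assumption).
    S = returns.T @ returns / T
    # eigvalsh returns eigenvalues in ascending order; the last one is the largest.
    lam1_hat = np.linalg.eigvalsh(S)[-1]
    # Eigenvalue estimate max{lam1_hat - delta2_hat * N / T, 0} from the text.
    lam1_adj = max(lam1_hat - delta2_hat * N / T, 0.0)
    # Psi^2 = 1 + delta2_hat * c1_hat, i.e. xi replaced by its expectation 1.
    psi2 = 1.0 + delta2_hat * c1_hat
    return lam1_hat, lam1_adj, psi2
```

The truncation at zero guards against the case where the estimated specific variance term exceeds the leading sample eigenvalue.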

An extension to heterogeneous specific risk
Our analysis thus far has rested on the simplifying assumption that security return specific variances have a common value. Empirically, this is not the case, and the numerical experiments discussed below in Section 5 allow for the more complex and realistic case of heterogeneous specific variances. To address the issue, we modify both the oracle estimator and the data-driven estimator by rescaling betas by specific variance. Under heterogeneous specific variance, the oracle value $\rho^*_N$ is given by a formula in terms of $\gamma^{\Delta}_{x,y} = x^\top \Delta^{-1} y / \sqrt{(x^\top \Delta^{-1} x)(y^\top \Delta^{-1} y)}$, a weighted inner product. Furthermore, the risk-adjusted returns $R\Delta^{-1/2}$ have covariance $\tilde\Sigma = \sigma^2 \Delta^{-1/2}\beta\beta^\top\Delta^{-1/2} + I$, so Theorem 3.2 holds for the risk-adjusted returns. The oracle formula coupled with the risk-adjusted returns suggests that we use $\tilde R = R\hat\Delta^{-1/2}$ as the data matrix, where $\hat\Delta$ is the specific risk estimate from the standard PCA method. Why should we expect this to work? The purpose of the scaling $R\Delta^{-1/2}$ is to make the specific return distribution isotropic, and $\tilde R$ approximates that. Since we are only trying to obtain an estimator $\hat\rho$ that is close to $\rho^*_N$, this approximation is adequate. For ellipses specified by $\Delta$ with relatively low eccentricity, the estimator in (23) works in practice, since the distribution is relatively close to isotropic. For larger eccentricity, we require an adjustment that brings the data closer to an isotropic specific return distribution. The updated formulas for the heterogeneous specific risk correction estimators, which we use in our numerical experiments, are expressed in terms of an initial estimate $\hat\Delta$ of specific risk. We use the PCA estimate $\hat\Delta = \mathrm{diag}(S - \hat\sigma^2\hat\beta\hat\beta^\top)$ as the initial estimator.
Once we have the estimate $\hat\rho$, we return to the original data matrix $R$ and apply the correction as before to $\hat\beta$, the first eigenvector of the sample covariance matrix. The method for correcting the sample eigenvalue remains the same, and we opt to recompute the specific variances using the corrected factor exposures and variance.
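The rescaling and correction steps can be illustrated as follows. This is a rough sketch and not the algorithm of Appendix A: all function names are hypothetical, the return array is assumed to be $T \times N$ so that the column scaling matches $R\Delta^{-1/2}$, and the shrinkage is parameterized as the normalized vector $\hat\beta + \rho z$ with $z = \mathbf{1}/\sqrt{N}$, which is one assumed way to move along the geodesic from $\hat\beta$ toward $z$. The value of $\rho$ is taken as given.

```python
import numpy as np

def pca_specific_variances(returns):
    """Initial PCA estimate diag(S - sigma2_hat * beta_hat beta_hat^T)."""
    T, N = returns.shape
    S = returns.T @ returns / T          # 1/T normalization (assumption)
    eigvals, eigvecs = np.linalg.eigh(S)  # ascending eigenvalues
    sigma2_hat = eigvals[-1]
    beta_hat = eigvecs[:, -1]
    if beta_hat.sum() < 0:               # fix the sign so exposures are mostly positive
        beta_hat = -beta_hat
    delta2 = np.diag(S) - sigma2_hat * beta_hat**2
    return beta_hat, sigma2_hat, delta2

def risk_adjust(returns, delta2, floor=1e-12):
    """Divide each security's returns by its estimated specific volatility."""
    return returns / np.sqrt(np.maximum(delta2, floor))

def shrink_toward_z(beta_hat, rho):
    """Assumed parameterization: normalize beta_hat + rho * z."""
    N = beta_hat.size
    z = np.ones(N) / np.sqrt(N)
    v = beta_hat + rho * z
    return v / np.linalg.norm(v)
```

In this sketch, the correction parameter would be estimated from `risk_adjust(returns, delta2)` and then applied to the eigenvector of the original data through `shrink_toward_z`.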

Numerical study
We use simulation to quantify the dispersion bias and its mitigation in minimum variance and equally weighted portfolios. We design our simulations around a return generating process that is more realistic than the single-factor, homogeneous-specific-risk model featured in Theorems 2.6 and 3.2. Returns to $N \in \mathbb{N}$ securities are generated by a multi-factor model with heterogeneous specific risk, $R = B\phi + \epsilon$, where $\phi$ is the vector of factor returns, $B$ is the matrix of factor exposures, and $\epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_N)$ is the vector of diversifiable specific returns. Our examples are based on a model with $K = 4$ factors. For consistency with Sections 2 and 3, we continue to adopt the mathematically convenient convention of scaling exposure vectors to have $L^2$ norm 1, and we denote the vector of exposures to the first factor (the first column of the exposure matrix $B$) by $\beta$. The recipe for constructing $\beta$ with a target value $\gamma_{\beta,z}$ is to generate a random vector, rescale the vector to length 1, and then modify the component in the $z$ direction to have the correct magnitude (while maintaining length 1). A similar approach would be to construct a random vector $\tilde\beta$ with average equal to 1 and variance equal to $\tau^2$, where $\tau^2$ is the dispersion of the "market beta," and then rescale $\tilde\beta$ to length 1 to obtain $\beta$. The parameter $\tau^2$ would control the concentration of the market betas, just like $\gamma_{\beta,z}$, and tends to be greater in calmer regimes. The connection between $\tau^2$ and the dispersion parameter $\gamma_{\beta,z}$ is given by $\gamma_{\beta,z} = 1/\sqrt{1 + \tau^2}$. The three remaining factors are fashioned from equity styles such as volatility, earnings yield and size. We draw the exposure of each security to each factor from a mean 0, variance 0.75 normal distribution and again normalize each vector of factor exposures to have $L^2$ length 1.
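Both recipes for the market exposure vector can be illustrated with a short Python sketch. The function names are hypothetical, normal draws are an assumption (the text does not fix a distribution), and the first recipe is implemented by replacing the $z$ component of a random unit vector with the target value and rescaling the orthogonal part, which is one reading of the description above.

```python
import numpy as np

def beta_with_target_gamma(N, gamma, rng):
    """Unit vector whose inner product with z = 1/sqrt(N) equals gamma."""
    z = np.ones(N) / np.sqrt(N)
    g = rng.standard_normal(N)
    u = g - (g @ z) * z                  # part of g orthogonal to z
    u /= np.linalg.norm(u)
    return gamma * z + np.sqrt(1.0 - gamma**2) * u

def beta_from_dispersion(N, tau2, rng):
    """Raw betas with mean 1 and variance tau2, rescaled to length 1."""
    raw = 1.0 + np.sqrt(tau2) * rng.standard_normal(N)
    return raw / np.linalg.norm(raw)

rng = np.random.default_rng(0)
N = 2000
z = np.ones(N) / np.sqrt(N)
b1 = beta_with_target_gamma(N, 0.9, rng)
b2 = beta_from_dispersion(N, tau2=0.25, rng=rng)
# The second inner product should be close to 1 / sqrt(1 + tau2) for large N.
print(b1 @ z, b2 @ z, 1.0 / np.sqrt(1.25))
```

The first construction hits the target exactly, while the second matches $1/\sqrt{1+\tau^2}$ only up to sampling error that shrinks as $N$ grows.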
We calibrate the risk of the market factor in accordance with Clarke et al. (2011) and Goldberg, Leshem & Geddes (2014); both report the annualized volatility of the US market to be roughly 16%, based on estimates that rely on decades of data. We calibrate the risk of factors 2, 3 and 4 in accordance with Morozov, Wang, Borda & Menchero (2012, Table 4.3), by setting their annualized volatilities to be 8%, 4% and 4%. We assume that the returns to the factors are pairwise uncorrelated.
We draw annualized specific volatilities $\{\delta_n\}$ from a uniform distribution on [32%, 64%]. This range is somewhat broader than the estimated range in Clarke et al. (2011). In each experiment, we simulate a year's worth of daily returns, $T = 250$, to $N$ securities. From this data set, we construct a sample covariance matrix, $S$, from which we extract three estimators of the factor covariance matrix $\Sigma$. The first is the data-driven estimator, the implementation specifics of which are discussed in Section 4.2 and precisely summarized in Appendix A. The second is the oracle estimator, which is the same as the data-driven estimator but with the true value of $\gamma_{\beta,z}$, the projection of the true factor onto the $z$-vector, supplied. Our third estimator is classical PCA, which is specified in detail in Section 2.2. We use the three estimated covariance matrices to construct minimum variance portfolios and to forecast their risk. We also use these covariance matrices to forecast the risk of an equally weighted portfolio.
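A single simulation run with the uncorrected PCA estimator might look like the sketch below. Several choices here are assumptions made for illustration: Gaussian factor and specific returns, conversion of annualized volatilities to daily ones by dividing by $\sqrt{250}$, unnormalized exposures (which leave the covariance matrix unchanged relative to the unit-length convention once factor variances are rescaled), a one-factor PCA model, and error metrics defined as forecast variance over true variance and as the true variance of the weight difference.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 500, 250, 4

# Exposures: market betas with mean 1 and dispersion tau2, styles N(0, 0.75).
tau2 = 1.0 / 0.9**2 - 1.0                # corresponds to gamma_{beta,z} = 0.9
B = np.empty((N, K))
B[:, 0] = 1.0 + np.sqrt(tau2) * rng.standard_normal(N)
B[:, 1:] = np.sqrt(0.75) * rng.standard_normal((N, K - 1))

# Daily volatilities from the annualized calibration (250 trading days assumed).
factor_vol = np.array([0.16, 0.08, 0.04, 0.04]) / np.sqrt(250)
spec_vol = rng.uniform(0.32, 0.64, N) / np.sqrt(250)

# Simulated T x N return array and the true covariance matrix.
phi = rng.standard_normal((T, K)) * factor_vol
eps = rng.standard_normal((T, N)) * spec_vol
R = phi @ B.T + eps
Sigma_true = B @ np.diag(factor_vol**2) @ B.T + np.diag(spec_vol**2)

# One-factor PCA covariance estimate built from the sample covariance.
S = R.T @ R / T
eigvals, eigvecs = np.linalg.eigh(S)
sigma2_hat, beta_hat = eigvals[-1], eigvecs[:, -1]
delta2_hat = np.diag(S) - sigma2_hat * beta_hat**2
Sigma_pca = sigma2_hat * np.outer(beta_hat, beta_hat) + np.diag(delta2_hat)

def min_variance_weights(Sigma):
    """Fully invested minimum variance weights, proportional to Sigma^{-1} 1."""
    ones = np.ones(Sigma.shape[0])
    x = np.linalg.solve(Sigma, ones)
    return x / (ones @ x)

w_hat = min_variance_weights(Sigma_pca)
w_true = min_variance_weights(Sigma_true)
forecast_ratio = (w_hat @ Sigma_pca @ w_hat) / (w_hat @ Sigma_true @ w_hat)
tracking_error2 = (w_hat - w_true) @ Sigma_true @ (w_hat - w_true)
print(forecast_ratio, tracking_error2)
```

Repeating such a run over many seeds and over a grid of $N$ and $\gamma_{\beta,z}$ values would give the kind of distributional summaries discussed in the text.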
In the experiments described below, we vary the number of securities, $N$, and the concentration of the market factor, $\gamma_{\beta,z}$. We run 50 simulations for each pair $(N, \gamma_{\beta,z})$. Results are shown in Figures 3 and 4.

In Figure 3, the concentration of the market factor is $\gamma_{\beta,z} = 0.9$. Panels (a), (b) and (c) show that for minimum variance portfolios optimized with dispersion bias-corrected PCA models, tracking error and volatility decline materially as $N$ grows from 500 to 3000, ranges of outcomes compress, and variance forecast ratios are near 1 for all $N$ considered. These desirable effects are less pronounced, or even absent, in a PCA model without dispersion bias mitigation. A comparison of panels (c) and (d) highlights the difference in accuracy of risk forecasts between a minimum variance portfolio and an equally weighted portfolio. Dispersion bias mitigation materially improves the variance forecast ratio for the former and has no discernible impact for the latter.

In Figure 4, the number of securities is $N = 500$. Panels (a), (b) and (c) show that for minimum variance portfolios optimized with PCA models, tracking error and volatility increase materially as $\gamma_{\beta,z}$ grows from 0.5 to 0.9, the variance forecast ratio diminishes, and the ranges of outcomes expand. These undesirable effects are diminished when the dispersion bias is mitigated.
Results for values $\gamma_{\beta,z} \in [0.9, 1.0]$ are shown separately in Figure 5. The severe decline in performance for all risk metrics for $\gamma_{\beta,z}$ in this range is a consequence of the sea change in the composition of a minimum variance portfolio that can occur when the dominant eigenvector of the covariance matrix is sufficiently concentrated. Here, the true minimum variance portfolio tends to lose its short positions. Errors in estimation of the dominant factor lead to long-short optimized portfolios approximating long-only optimal portfolios. The market factor is hedged in the former but not in the latter, and this discrepancy propagates to the error metrics.
A comparison of panels (c) and (d) in Figures 4 and 5 highlights, again, the difference in accuracy of risk forecasts between a minimum variance portfolio and an equally weighted portfolio. Dispersion bias mitigation materially improves the variance forecast ratio for the former and has no discernible impact on the latter. A casual inspection of the figures suggests that the data-driven estimator performs nearly as well as the oracle. However, there can be substantial differences between the two estimators in some cases, as shown by the tracking error and variance forecast ratio of minimum variance portfolios in Figure 5 when $\gamma_{\beta,z} \in [0.9, 1.0]$. The origin of the differences can be seen in Lemma 3.1. In order for the tracking error and variance forecast ratio to have good asymptotic properties, the estimator $\hat\beta$ must lie in the null space $S_\beta$. This condition is guaranteed for the oracle estimators but will not generally be satisfied for data-driven estimators.

Summary
In this article, we develop a correction for bias in PCA-based covariance matrix estimators. The bias is excess dispersion in a dominant eigenvector, and the form of the correction is suggested by formulas for estimation error metrics applied to minimum variance portfolios.
We identify an oracle correction that optimally shrinks the sample eigenvector along a spherical geodesic toward the distinguished zero-dispersion vector, and we provide asymptotic guarantees that oracle shrinkage reduces both types of error. These findings are especially relevant to equity return covariance matrices, which feature a dominant factor whose overwhelmingly positive exposures tend to be overly dispersed by PCA estimators.
Our results fit into two streams of academic literature. The first is the large-$N$, fixed-$T$ branch of random matrix theory, which belongs to statistics. The second is empirical finance, which features results about Bayesian adjustments to estimated betas.
To enable practitioners to use our results, we develop a data-driven estimator of the oracle. Simulation experiments support the practical value of our correction, but much work remains to be done. That includes the development of estimates of the size and likelihood of the exposure bias in finite samples, the identification and correction of biases in other risk factors, and empirical studies. Explicit formulas for error metrics in combination with the geometric perspective in this article provide a way to potentially improve construction and risk forecasts of investable optimized portfolios.

A Algorithm
Our corrected covariance matrix algorithm is given in Algorithm 1 where the input is a data matrix of returns R.

C Proof of main results
We start with some foundational asymptotic results from the literature. Let $X \sim N(0, \Lambda)$, where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_2)$ is a diagonal matrix with $\lambda_1$ satisfying Assumption C1 in Shen et al. (2016). Let $S_X = \frac{1}{T} X X^\top$ be the sample covariance for $X$ with eigendecomposition $S_X = \hat V \hat\Lambda \hat V^\top$. Further, let $\hat v_1 = (\hat v_{11}, \tilde v^\top)^\top$ be the first sample eigenvector, with first entry $\hat v_{11}$ and remaining entries $\tilde v$. Via a simple scaling by $\lambda_2$, Shen et al. (2016, Theorem 6.3) gives $e_1^\top \hat v_1 = \hat v_{11} \to \Psi^{-1}$ almost surely, where $\Psi^2 = 1 + \lambda_2 c_1/\xi$ and $\xi = \chi^2_T/T$. We introduce the following lemma, which we use in the proofs of our results. We leave its proof until the end.
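Proof of Theorem 2.5. Let $X = U^\top R$, where $U = (\beta, u_2, \ldots, u_N)$ is the matrix of eigenvectors of $\Sigma$, so that $\mathrm{Cov}(X) = \Lambda$ as introduced at the beginning of this section. As before, let $S_X = \frac{1}{T} X X^\top = \hat V \hat\Lambda \hat V^\top$ be the sample covariance of $X$ and its eigendecomposition. Then $S = \frac{1}{T} R R^\top = U \hat V \hat\Lambda \hat V^\top U^\top$, so that the first sample eigenvector of $S$ is $\hat\beta = U \hat v_1$. It follows that $\gamma_{\hat\beta,\beta} = \hat v_{11}$ and $\gamma_{\hat\beta,z} = \hat v_{11}\gamma_{\beta,z} + \tilde v^\top \omega_N$, where $\omega_N = (u_2^\top z, \ldots, u_N^\top z)^\top$. As noted before, by Shen et al. (2016, Theorem 6.3), $\hat v_{11} \to \Psi^{-1}$ almost surely. Both $\|\tilde v\|_2$ and $\|\omega_N\|_2$ are bounded by 1, since $\tilde v$ and $\omega_N$ are subvectors of the unit vectors $\hat v_1$ and $U^\top z$. Therefore, from Lemma C.1 with $X_N = \tilde v/\|\tilde v\|_2$ and $Y_N = \omega_N/\|\omega_N\|_2$, we have that $\tilde v^\top \omega_N$ converges almost surely to 0, and we conclude the result.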
Proof of Theorem 3.2.
For the asymptotic improvement due to shrinkage, we rely on Björck & Golub (1973, Theorem 1), which shows that the principal angle can be derived from the singular value decomposition of a product of orthonormal basis matrices for the two subspaces. By maximizing $\gamma_{\beta,\hat\beta_\rho}$ (or equivalently minimizing the angle between $\beta$ and $\hat\beta_\rho$), we are directly choosing $\rho^*_N$ such that $\hat\beta_{\rho^*_N}$ is the principal vector with corresponding principal angle to $\beta$. Finding the vector in terms of the correction quantity is easier through direct maximization of $\gamma_{\beta,\hat\beta_\rho}$, and finding the improvement is easier through the principal angle computation, although the results are equivalent.
From the above product, one obtains the squared cosine of the principal angle and its asymptotic value.

Proof of Lemma C.1. By orthogonal invariance of $X_n$ and the independence of $Y_n$, $X_n^\top Y_n = X_n^\top Q_n^\top Q_n Y_n = (Q_n X_n)_1 \overset{d}{=} X_{n1}$, where $Q_n$ is an orthogonal matrix such that $Q_n Y_n = e_1$, $e_1$ is the first canonical vector, and $X_{n1}$ is the first entry of $X_n$. We know from Muller (1959) that $X_{n1} \overset{d}{=} Z_1/\sqrt{Z_1^2 + \chi^2_{n-1}}$, where $Z_1 \sim N(0, 1)$ is independent of $\chi^2_{n-1}$. This gives a bound on the tail probabilities of $X_{n1}$ that is summable in $n$, where the inverse chi-squared distribution has finite moments for $n$ large enough and $C$ is some constant related to the moments of the standard normal distribution and the inverse chi-squared distribution. By the Borel-Cantelli lemma, we conclude the result.
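The mechanism behind Lemma C.1 can be checked numerically. The sketch below, which is an illustration and not part of the proof, draws vectors uniformly from the unit sphere by normalizing Gaussian vectors and records the second moment of their inner product with a fixed unit vector. The sample size, the dimensions, and the helper name are arbitrary choices; the known value of the second moment is $1/n$.

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_on_sphere(n, size, rng):
    """Rows are independent draws from the uniform distribution on S^{n-1}."""
    g = rng.standard_normal((size, n))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

second_moments = {}
for n in (10, 100, 1000):
    X = uniform_on_sphere(n, 2000, rng)
    y = np.zeros(n)
    y[0] = 1.0                           # by orthogonal invariance, any fixed unit vector works
    inner = X @ y                        # equals the first coordinate of each draw
    second_moments[n] = np.mean(inner**2)
    print(n, second_moments[n], 1.0 / n)
```

The printed second moments track $1/n$, consistent with the inner product concentrating at zero as the dimension grows.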
Proof of Proposition 3.5. For the equally weighted portfolio $w = \frac{1}{N}\mathbf{1}_N$ using $\hat\beta_{\rho^*_N}$,
Note that all quantities involved depend on $N$. We suppress this dependence to ease the notation. Recall the quantity $E$ defined in (10).

Proposition D.1 (Asymptotics). Suppose Assumption 2.2 holds. Let $\mu^2 = \sigma^2/N$ and $\hat\mu^2 = \hat\sigma^2/N$, and assume that $(\hat\mu, \hat\delta)$ are bounded in $N$.
Proof. All claims follow from the collection of lemmas below.