Structural shrinkage of nonparametric spectral estimators for multivariate time series

In this paper we investigate the performance of periodogram-based estimators of the spectral density matrix of possibly high-dimensional time series. We suggest and study shrinkage as a remedy against numerical instabilities due to deteriorating condition numbers of (kernel) smoothed periodogram matrices. Moreover, shrinking the empirical eigenvalues in the frequency domain towards one another simultaneously improves the Mean Squared Error (MSE) of these widely used nonparametric spectral estimators. Compared to some existing time domain approaches, restricted to i.i.d. data, in the frequency domain it is necessary to take the size of the smoothing span into account as "effective or local sample size". While Böhm and von Sachs (2007) propose a multiple of the identity matrix as optimal shrinkage target in the absence of knowledge about the multidimensional structure of the data, here we consider "structural" shrinkage. We assume that the spectral structure of the data is induced by underlying factors. However, in contrast to actual factor modelling, which suffers from the need to choose the number of factors, we suggest a model-free approach. Our final estimator is the asymptotically MSE-optimal linear combination of the smoothed periodogram and the parametric estimator based on an underfitting (and hence deliberately misspecified) factor model. We complete our theoretical considerations with some extensive simulation studies. For data generated from a higher-order factor model, we compare all four types of involved estimators (including the one of Böhm and von Sachs (2007)).


Hilmar Böhm, Rainer von Sachs

Introduction
Spectral analysis of multivariate time series is a useful tool to analyse not only serial but also cross-correlations of dynamic data of possibly high dimension (Shumway and Stoffer, 2000). In the absence of possibly restrictive parametric assumptions on the dynamics of the time series (such as a vector autoregressive moving average model of finite order), the standard nonparametric approach of smoothing the periodogram matrix over frequency has well-established properties that are generally satisfactory even for moderate sample sizes, such as approximate unbiasedness, approximate uncorrelatedness over different frequencies and the usual variance-bias trade-off known from classical nonparametric theory (Brillinger, 1975). What is less known and explored, however, and highly relevant for the increasingly frequent situations of large dimensionality of the time series, is the deterioration of the condition number of the resulting nonparametric estimator (the smoothed periodogram matrix). A high condition number of such a matrix, i.e. the ratio $l_{\max}/l_{\min}$ of its largest to its smallest eigenvalue, is known to lead to numerical instabilities, in particular when the (estimated) spectral density matrix is used subsequently in sensitive functionals such as its inverse or its determinant. A prominent example of such functionals is the Kullback-Leibler discrimination information (Kullback and Leibler, 1952), used as a measure of disparity between several estimated multivariate spectra (as in, e.g., Kakizawa, Shumway and Taniguchi, 1998) for the classification of multivariate time series.
In many fields of application, including economic panel data (Bai and Ng, 2002; Forni, Hallin, Lippi and Reichlin, 2000), but also genetic engineering or neuropsychology, the dimension of the data can come close to the sample size, which in particular makes the smoothed periodogram close to a singular matrix.
In this paper we suggest a remedy to improve upon the smoothed periodogram as an estimator of the multivariate spectrum using regularization, i.e. shrinkage, techniques. It is known from the statistical literature on estimation in i.i.d. data situations (Haff, 1977, 1979, 1980) that shrinkage helps to correct the following effect: the dispersion of the sample eigenvalues can be much larger than the dispersion of the population eigenvalues of the spectrum, as the large eigenvalues are biased upwards and the small ones downwards (Jolliffe, 2002). Thus, by shrinking the eigenvalues towards one another, the quality of an estimator of a high-dimensional target can be improved not only numerically but even in terms of the widely used mean squared error criterion (Beran and Dümbgen, 1998; Ledoit and Wolf, 2004).
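The dispersion effect is easy to reproduce in a small simulation. The following sketch (Python with NumPy; the dimension p = 10 and the sample size n = 20 are arbitrary illustrative choices, not taken from the paper) draws i.i.d. Gaussian data whose population covariance is the identity, so that every population eigenvalue equals one, and reports the sorted eigenvalues of the sample covariance together with their ratio, the condition number.

```python
import numpy as np


def sample_eigenvalues(p: int, n: int, seed: int = 0) -> np.ndarray:
    """Sorted eigenvalues of the sample covariance of n i.i.d. N(0, I_p) vectors."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, p))   # rows are observations, population covariance = I_p
    s = x.T @ x / n                   # sample covariance (the mean is known to be zero)
    return np.linalg.eigvalsh(s)      # real eigenvalues in ascending order


lam = sample_eigenvalues(p=10, n=20)
print("smallest / largest sample eigenvalue:", lam[0], lam[-1])
print("condition number:", lam[-1] / lam[0])  # equals 1 for the population covariance
```

Although all population eigenvalues equal one, the smallest sample eigenvalue falls well below one and the largest well above it; this is the spread that shrinking the eigenvalues towards one another counteracts.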
We note that this technique is more than just standardizing each dimension of the time series. Standardization would improve the condition number for minimally coherent data, but not for potentially highly cross-correlated data, where the interdependence across dimensions is responsible for the aforementioned dispersion effect.
Compared to existing work on shrinkage in the time domain, we show that in the frequency domain it is necessary to take the size of the smoothing span m into account as "effective or local sample size". We note that simply choosing the smoothing span of the smoothed periodogram sufficiently large is not a reasonable solution to the problem: depending on the roughness of the true spectral density to be estimated, this might result in substantial oversmoothing.
For notational simplicity, we consider in this work the simplest smoothing method, the averaged periodogram, i.e. a symmetric kernel smoother of finite support ("boxcar") with equal weights for each periodogram ordinate within the smoothing span. One can easily check that all the results of our paper carry over directly to the kernels more frequently used in the smoothing literature.
Our proposed shrinkage estimator is, pointwise at frequency $\omega \in (0, 2\pi]$, a convex combination of the averaged periodogram $\hat f^0_T(\omega)$ with some shrinkage target $\hat f^1_T(\omega)$ in the frequency domain, i.e. our estimators are of the form
$$ (1 - r_T)\,\hat f^0_T(\omega) + r_T\,\hat f^1_T(\omega), \qquad r_T \in [0,1], $$
where, in order to reduce the dispersion of the eigenvalues of $\hat f^0_T(\omega)$, the factor $r_T$ is chosen such that the sample eigenvalues are shrunk towards each other linearly. The most direct target to use would be (a multiple of) the identity matrix, i.e. $\hat f^1_T(\omega) = \mu(\omega)\,\mathrm{Id}$. This set-up has been treated by the authors in a companion paper (Böhm and von Sachs, 2007), where the optimal amount of shrinkage is determined by a data-driven approach in a framework in which the dimension grows asymptotically with the sample size.
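For the identity target, the linear movement of the eigenvalues can be checked directly. The sketch below is illustrative only: it assumes the target scale $\mu = \operatorname{tr}(\hat f^0_T(\omega))/p$ (a common choice that preserves the trace, not necessarily the one used by Böhm and von Sachs, 2007), uses a fixed, hypothetical intensity r = 0.3, and stands in a random Hermitian positive definite matrix for a smoothed periodogram.

```python
import numpy as np


def shrink_to_identity(f0: np.ndarray, r: float) -> np.ndarray:
    """Convex combination (1 - r) * f0 + r * mu * I of a Hermitian matrix and a scaled identity.

    mu = trace(f0) / p is an assumed choice of the target scale; it leaves the
    trace (the sum of the eigenvalues) unchanged.
    """
    p = f0.shape[0]
    mu = np.trace(f0).real / p
    return (1.0 - r) * f0 + r * mu * np.eye(p)


def condition_number(h: np.ndarray) -> float:
    """Ratio of the largest to the smallest eigenvalue of a Hermitian positive definite matrix."""
    lam = np.linalg.eigvalsh(h)   # real eigenvalues in ascending order
    return float(lam[-1] / lam[0])


rng = np.random.default_rng(1)
z = rng.standard_normal((12, 8)) + 1j * rng.standard_normal((12, 8))
f0 = z.conj().T @ z / 12          # Hermitian positive definite stand-in for a smoothed periodogram
fs = shrink_to_identity(f0, r=0.3)

lam0 = np.linalg.eigvalsh(f0)
lam_s = np.linalg.eigvalsh(fs)
mu = lam0.mean()
print(np.allclose(lam_s, 0.7 * lam0 + 0.3 * mu))   # True: each eigenvalue moves linearly towards mu
print(condition_number(f0), condition_number(fs))   # the second value is smaller
```

Because every eigenvalue is moved by the same fraction of its distance to μ, the ordering of the eigenvalues is preserved while the condition number strictly decreases whenever the eigenvalues are not all equal.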
Obviously, the technique of shrinkage is related to ridge regression. In fact, a linear combination of a sample covariance and the identity matrix served as the original motivation for ridge regression (Hoerl and Kennard, 1970). However, shrinkage of empirical covariance matrices or spectral density estimators is not the same thing as ridge estimation. In the first approach, the eigenvalues of the matrices under consideration are shrunk towards each other, and hence their dispersion is reduced. When constructing a ridge, all eigenvalues are moved away from zero (either by the same amount for ordinary ridge regression or by some individual constant for generalized ridge regression) in order to regularize the estimator. Recent theoretical work, e.g. by Bickel and Levina (2007) and Rothman, Levina and Zhu (2008), which uses considerably more refined techniques such as the Lasso or thresholding for the regularization of large-dimensional covariance matrices, is in this latter spirit.
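The distinction can be made explicit on the eigenvalues. Assume, for illustration only, that the eigenvalues of $\hat f^0_T(\omega)$ are $\lambda_1 \ge \dots \ge \lambda_p > 0$ with mean $\bar\lambda = p^{-1}\sum_i \lambda_i$, that the identity target is scaled by $\bar\lambda$, that $r \in [0,1]$ is a shrinkage intensity and that $c > 0$ is an ordinary ridge constant:

```latex
\begin{aligned}
\text{shrinkage:}\quad \lambda_i &\mapsto (1-r)\,\lambda_i + r\,\bar\lambda,
  &\quad \text{so}\quad \bigl((1-r)\lambda_i + r\bar\lambda\bigr) - \bar\lambda &= (1-r)\,(\lambda_i - \bar\lambda),\\
\text{ridge:}\quad \lambda_i &\mapsto \lambda_i + c,
  &\quad \text{so}\quad (\lambda_i + c) - (\bar\lambda + c) &= \lambda_i - \bar\lambda .
\end{aligned}
```

Hence the dispersion $p^{-1}\sum_i(\lambda_i - \bar\lambda)^2$ around the new mean is multiplied by $(1-r)^2$ under shrinkage but left unchanged by an ordinary ridge. Both lower the condition number, e.g. $(\lambda_1 + c)/(\lambda_p + c) < \lambda_1/\lambda_p$ whenever $\lambda_1 > \lambda_p$, but only shrinkage acts on the spread itself.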
Using the identity matrix as a shrinkage target is reasonable if there is little or no knowledge about the underlying multidimensional structure of the data. In this case, a shrinkage target should be used that imposes the least possible amount of structure and which, at the same time, has the best possible condition number (equal to one). In many settings, however, it is reasonable to assume that the covariance or spectral structure of the data is induced by underlying, known or hidden, factors. The general idea underlying factor models is that $p$ observed random variables can be expressed, up to an error term, as linear functions of $q < p$ random factors. For instance, in econometrics, markets are usually assumed to be driven by underlying global variables such as the interest rate, the employment rate or the gross national product. The models range from simple one-factor models, as in Sharpe (1963), to sophisticated approaches that use multiple, possibly intercorrelated, global and industry-specific factors, as, e.g., in Forni et al. (2000).
A disadvantage of factor models is that the number of factors is usually a parameter that must either be specified a priori or be chosen by fairly sophisticated data-driven procedures akin to model selection. Research on a generally satisfactory criterion is still ongoing (Bai and Ng, 2002; Hallin and Liška, 2007), and it would be attractive to avoid this problem while still exploiting the structure imposed by a factor model as a remedy for the curse of dimensionality.
We have developed a hybrid approach that circumvents the dilemma of model choice and still retains the advantages of factor analysis. We combine a nonparametric estimator, in our case the averaged periodogram $\hat f^0_T(\omega)$, with a parametric estimator $\hat f^1_T(\omega)$ of the spectral matrix. The latter is our new shrinkage target. It is given by fitting a one-factor model to the data. However, we do not assume that this model is true; rather, we believe that the data follow a more complicated structure. This may be a $q$-factor model ($q > 1$), a model driven by different layers of factors, or the model may be completely unknown. By combining a shrinkage target, which is in fact underfitted, with a nonparametric estimator of the spectrum, we circumvent the problem of model choice. In a data-driven approach, the weights are chosen such that the new, hybrid estimator is the asymptotically optimal linear combination of the two conventional estimators. The first component, the averaged periodogram, is asymptotically unbiased but has high variance. The second component is biased due to misspecification but, by imposing structure, has low variance.
We note that, instead of choosing a one-factor model as our shrinkage target, we might as well opt for something more complicated, e.g. a $q$-factor model with $q > 1$. The only prerequisite for doing this is background knowledge that the underlying structure is more complicated than the shrinkage target, e.g. a $\tilde q$-factor model with $\tilde q > q$. The theory we give in section 3.1 can easily be adapted to such a case.
To the best of the authors' knowledge, there is no literature on shrinkage towards a factor model in time series analysis. In the finance literature, shrinkage towards a factor model has been developed in the context of portfolio selection (Ledoit and Wolf, 2003) under i.i.d. assumptions on the data. However, the idea of shrinking a nonparametric fit towards a parametric estimator underlies, quite generally, a variety of existing approaches, among them the work of Daniels and Cressie (2001) and, to some extent, that of Botts and Daniels (2006), in the context of Bayesian covariance and spectral estimation, respectively.
The remainder of this paper is organized as follows. In the next section, we develop the theoretical background for data-driven shrinkage towards a 'market' one-factor model, where 'market' is just a placeholder term that does not necessarily mean that we are in an economic context. We first give the basic assumptions and definitions in the following subsection. In section 3.2, we introduce the shrinkage target, which is a one-factor model. The model assumes that the $p$-dimensional process is driven by a dynamic, hidden or known, underlying process with spectral density $f_0(\omega)$. We fit this model to the data while at the same time assuming that the model is misspecified. The philosophy behind this is that the model is just a parsimonious tool for describing the data. In sections 3.4 and 3.5 we derive the MSE-optimal shrinkage intensity, which is a function of the true spectral density $f(\omega)$. In section 3.6, we examine the asymptotic behaviour of the MSE-optimal shrinkage intensity $r_T(\omega)$, which helps us to develop a data-driven estimator in section 3.7. Comprehensive Monte Carlo studies in section 4 show the usefulness of our estimator. Most proofs are relegated to an appendix.

Multivariate spectral analysis
We assume that we observe a realization $(X_t)_{t=1}^T$ of a $p$-dimensional, real-valued, centered, stationary Gaussian time series $(X_t)$. We aim at estimating the $p \times p$ spectral density matrix function at frequency $\omega \in (0, 2\pi]$,
$$ f(\omega) = \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} \mathbb{E}\bigl[X_{t+h} X_t'\bigr]\, e^{-\iota \omega h}, \tag{1} $$
where $\iota = \sqrt{-1}$. The most common nonparametric estimators of (1) are based on the periodogram. If we denote by
$$ d_T(\omega) = \frac{1}{\sqrt{2\pi T}} \sum_{t=1}^{T} X_t\, e^{-\iota \omega t} \tag{2} $$
the vector-valued discrete Fourier transform of the realization $(X_t)_{t=1}^T$, then the $p \times p$ periodogram matrix is defined as
$$ I_T(\omega) = d_T(\omega)\, d_T(\omega)^*, \tag{3} $$
where $^*$ denotes the conjugate complex transpose. Furthermore, we denote the complex conjugate of a scalar by an overline. The periodogram is not a consistent estimator of the spectrum (1), but it is asymptotically unbiased. Moreover, for $p > 1$, the periodogram is a singular matrix: if $d_T(\omega) = (d_1(\omega), \ldots, d_p(\omega))'$, then (3) can be expressed as
$$ I_T(\omega) = \bigl( d_i(\omega)\, \overline{d_j(\omega)} \bigr)_{i,j=1,\ldots,p}, \tag{4} $$
and thus has almost surely rank 1. If the periodogram is smoothed over frequency, the resulting estimators are consistent under a classical asymptotic framework. We restrict ourselves to the simplest form of smoothing, the averaged periodogram with smoothing span $m_T$ (taken odd), where the conditions $m_T/T \to 0$ and $m_T \to \infty$ as $T \to \infty$ guarantee consistency and asymptotic unbiasedness:
$$ \hat f^0_T(\omega) = \frac{1}{m_T} \sum_{|k| \le (m_T - 1)/2} I_T(\omega + \omega_k), \tag{5} $$
where $\omega_k$ denotes the Fourier frequency $2\pi k/T$.
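A minimal NumPy sketch of the DFT, the periodogram and the averaged periodogram follows. It assumes the normalization $(2\pi T)^{-1/2}$ in the DFT and a boxcar over an odd number m of Fourier frequencies centred at $\omega_k$, with indices taken modulo T; these conventions are choices of the sketch. It also illustrates the rank statements: a single periodogram matrix has rank one, and an average of m of them has rank at most m, so the averaged periodogram stays singular whenever m < p.

```python
import numpy as np


def dft(x: np.ndarray) -> np.ndarray:
    """Row k holds d_T(omega_k) = (2 pi T)^(-1/2) sum_t X_t exp(-i omega_k t), omega_k = 2 pi k / T.

    x has shape (T, p). np.fft.fft sums over t = 0, ..., T-1 instead of 1, ..., T;
    this only multiplies d_T(omega_k) by a unit-modulus factor and leaves d d^* unchanged.
    """
    T = x.shape[0]
    return np.fft.fft(x, axis=0) / np.sqrt(2 * np.pi * T)


def periodogram(d_k: np.ndarray) -> np.ndarray:
    """p x p periodogram matrix I_T = d d^* (an outer product, hence of rank one)."""
    return np.outer(d_k, d_k.conj())


def averaged_periodogram(x: np.ndarray, k: int, m: int) -> np.ndarray:
    """Boxcar average of I_T over the m (odd) Fourier frequencies centred at omega_k."""
    T = x.shape[0]
    d = dft(x)
    half = (m - 1) // 2
    idx = [(k + j) % T for j in range(-half, half + 1)]  # wrap around: the spectrum is 2 pi periodic
    return sum(periodogram(d[i]) for i in idx) / m


def numerical_rank(h: np.ndarray, rel_tol: float = 1e-10) -> int:
    """Number of eigenvalues of a Hermitian PSD matrix above rel_tol times the largest one."""
    lam = np.linalg.eigvalsh(h)
    return int(np.sum(lam > rel_tol * lam[-1]))


rng = np.random.default_rng(0)
x = rng.standard_normal((256, 6))                 # T = 256, p = 6
I_10 = periodogram(dft(x)[10])
f_small = averaged_periodogram(x, k=10, m=5)      # m < p
f_large = averaged_periodogram(x, k=10, m=11)     # m > p
print(numerical_rank(I_10))                       # 1
print(numerical_rank(f_small), numerical_rank(f_large))  # 5 6
```

The second line shows why the smoothing span, and not T, plays the role of the sample size locally: with m = 5 ordinates the 6 x 6 averaged periodogram is singular even though T = 256.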

Setup and assumptions
Our aim is to estimate the spectrum $f(\omega)$ of a $p$-variate Gaussian time series. We assume that we have realizations $(X_{it})_{t=1,\ldots,T}$, $i = 1, \ldots, p$. Moreover, we assume that we have realizations from another, one-dimensional time series $(X_{0t})_{t \in \{1,\ldots,T\}} = X_{01}, \ldots, X_{0T}$, to which we refer as the market or exogenous time series. The market time series is thought of as a process that has a certain explanatory value for the other time series $(X_{it})$, $i = 1, \ldots, p$. One possible choice is the average over dimension, in the time domain, of the $(X_{it})_{i=1,\ldots,p}$,
$$ X_{0t} = \frac{1}{p} \sum_{i=1}^{p} X_{it}, \qquad t = 1, \ldots, T. $$
However, we make no special assumptions on the market time series. It would be just as possible to choose an external variable or the first principal component of the data. We make the following assumptions:
Assumption 3.1. All our time series, including the market time series, are centered, $\mathbb{E} X_{it} = 0$, $i = 0, \ldots, p$, and stationary.
In this paper, purely to simplify the presentation, we do not present our estimation results in terms of the spectrum directly, but rather choose the expected averaged periodogram as the estimation target. This is possible without loss of generality because the expected periodogram $f^0_T(\omega)$ approaches the true spectrum $f(\omega)$ at a rate of convergence sufficiently fast to enable us to carry over our proofs immediately to the estimation of $f(\omega)$. In order to do so, we make an additional assumption, Assumption 3.2. Then, we have the following well-known result from Brillinger (1975) or Shumway and Stoffer (2000):
Lemma 3.1. Under Assumption 3.2, $f(\omega)$ has (elementwise, and for real and imaginary parts separately) continuous derivatives of order one, and hence
The enhanced estimator we want to construct is obtained by linearly combining a standard nonparametric estimator, in our case the averaged periodogram, with a shrinkage target. The latter is obtained by fitting a one-factor model to the data, where the time series $X_{0t}$ is assumed to be the underlying factor.
We assume the dimension $p$ to be fixed while $T \to \infty$. We denote the $i$th component of the discrete Fourier transform of the data at frequency $\omega$ by $d_i(\omega)$.
We furthermore adopt the following notational convention: whenever we use vector- or matrix-valued terms, we mean the respective $p$-dimensional vector or $p \times p$ matrix unless we explicitly state otherwise. Thus, $f(\omega)$, $f^0_T(\omega)$ and $\hat f^0_T(\omega)$ refer to the spectrum, the expected averaged periodogram and the averaged periodogram, respectively, of the time series $(X_{it})_{i=1,\ldots,p}$. We also refer to the $p$-dimensional vector of the time series at time $t$ as $X_t$. However, when we look at components, we use the index value zero to refer to the market time series. E.g., we refer to the cross-spectrum between the market series and the first component of $X_t$ as $f_{01}(\omega)$.

One-factor model
The shrinkage target is given by fitting a one-factor model to the data $(X_{it})$, $i = 1, \ldots, p$, which we will define in this section. We will use a different notation for the random variables to emphasize that this model is not assumed to hold true for the data $X_{it}$. Rather, we use the model as a parsimonious tool to approximate the spectral structure of the process.
Let us assume that we have a univariate exogenous time series $\dot X_{0t}$, $t = 1, \ldots, T$, with spectrum $\dot f_0(\omega)$. By exogenous, we mean that the data $\dot X_{0t}$ can be used as a factor time series that has some explanatory value for the data in the sense of the following model:
$$ \dot X_{it} = \beta_i\, \dot X_{0t} + \epsilon_{it}, \qquad i = 1, \ldots, p, \quad t = 1, \ldots, T. \tag{7} $$
The weights $\beta_i \in \mathbb{R}$ are non-random. The idiosyncratic components $\epsilon_{it}$ are assumed to be normally distributed, independent over time and dimension, and independent of $(\dot X_{0t})$:
$$ \epsilon_{it} \sim \mathcal{N}\bigl(0, (\sigma^\epsilon_i)^2\bigr). \tag{8} $$
In this simple factor model, all serial correlation in the data $\dot X_{it}$ originates from serial correlation in the exogenous time series $\dot X_{0t}$, which is determined by its spectrum $\dot f_0(\omega)$.
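The fact that the idiosyncratic components are uncorrelated over time and dimension is important, as otherwise it would be impossible to identify the model under classical asymptotics (Forni et al., 2000). Together with the independence between the idiosyncratic components and the exogenous time series, this has two further advantages. First, it allows us to use linear regression to estimate the $\beta_i$ and the $(\sigma^\epsilon_i)^2$. Second, this model implies, simply by linearity, the following relationship for the DFTs of the data:
$$ \dot d_i(\omega) = \beta_i\, \dot d_0(\omega) + \dot d^\epsilon_i(\omega), \qquad i = 1, \ldots, p, \tag{9} $$
where $\dot d^\epsilon_i(\omega)$ is the DFT of the idiosyncratic components. Furthermore,
$$ \mathbb{E}\,\bigl|\dot d^\epsilon_i(\omega)\bigr|^2 = \frac{(\sigma^\epsilon_i)^2}{2\pi}. \tag{10} $$
We see from (10) that the variance of the idiosyncratic components in the frequency domain is independent of frequency. The weights $\beta = (\beta_1, \ldots, \beta_p)'$ are independent of frequency, too, due to (7). This means that the spectrum under the one-factor model (7) specified above is
$$ \dot f(\omega) = \beta\beta'\, \dot f_0(\omega) + \frac{1}{2\pi}\,\mathrm{diag}\bigl((\sigma^\epsilon_1)^2, \ldots, (\sigma^\epsilon_p)^2\bigr). \tag{11} $$
When it comes to estimation of the one-factor model, we will, as mentioned above, identify the spectrum with the expected averaged periodogram. Thus, instead of using the model (11), we use the slightly modified model
$$ \dot f^0_T(\omega) = \beta\beta'\, \dot f^0_0(\omega) + \frac{1}{2\pi}\,\mathrm{diag}\bigl((\sigma^\epsilon_1)^2, \ldots, (\sigma^\epsilon_p)^2\bigr), \tag{13} $$
where $\dot f^0_0(\omega)$ denotes the expected averaged periodogram of the factor time series $\dot X_{0t}$.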

Estimation of one-factor model
The model (13) is assumed not to hold true. Even under these circumstances, however, it is possible to fit this model to the time series $X_{it}$ by choosing the weights $\beta_i$ such that the $L^2$ distance between $f^0_T(\omega)$ and the one-factor spectrum of the form (13) becomes minimal.
We refer to this minimum-$L^2$-distance spectral density under the one-factor model as $f^1_T(\omega)$.
The fact that both the weights $\beta_i$ and the idiosyncratic variances $(\sigma^\epsilon_i)^2$ are independent of lag and frequency, respectively, enables us to estimate these parameters with standard methods. We use linear regression to obtain an estimator of the weights, which is just the standard estimator of the slope in linear regression. Next, we need to estimate the variances $(\sigma^\epsilon_i)^2$ of the idiosyncratic components. The standard way to do this is again to use the time domain estimator of the residual variance, which we normalize by $1/2\pi$. Furthermore, both estimators have the convenient property of being consistent, and the stochastic rate of convergence is in both cases $1/\sqrt{T}$ (Sachs and Hedderich, 2006). Plugging the estimators from (16) and (17) and the averaged periodogram of the market time series $X_{0t}$ into the definition of the one-factor model (13), we obtain an estimator $\hat f^1_T(\omega)$ of the multivariate spectrum that is based on a one-factor model. This estimator is our shrinkage target. By construction of the model, with equations (14) and (15), we observe that on the diagonal $\hat f^1_T(\omega) = \hat f^0_T(\omega)$.
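A minimal sketch of such a one-factor target at a single frequency is given below. It assumes ordinary least-squares slopes through the origin (the data are centred) for the weights and the time-domain residual variance scaled by $1/2\pi$, following the verbal description above; the displayed estimators are not reproduced here, so the exact formulas, the function name `one_factor_target` and its arguments are hypothetical, and the sketch need not reproduce every property of the paper's estimator. The averaged periodogram of the market series at the frequency of interest is passed in as `f00_hat`.

```python
import numpy as np


def one_factor_target(x: np.ndarray, x0: np.ndarray, f00_hat: float) -> np.ndarray:
    """Hypothetical one-factor spectral estimate beta beta' * f00_hat + diag(sigma2).

    x: (T, p) centred data; x0: (T,) centred market series;
    f00_hat: averaged periodogram of x0 at the frequency of interest (real, >= 0).
    """
    beta = x.T @ x0 / (x0 @ x0)                        # OLS slope of each X_i on X_0 (no intercept)
    resid = x - np.outer(x0, beta)                     # estimated idiosyncratic components, (T, p)
    sigma2 = (resid ** 2).mean(axis=0) / (2 * np.pi)   # residual variances, normalized by 1/(2 pi)
    return np.outer(beta, beta) * f00_hat + np.diag(sigma2)


rng = np.random.default_rng(0)
T = 4000
x0 = rng.standard_normal(T)
b = np.array([1.0, -0.5, 2.0])                         # hypothetical true weights
x = np.outer(x0, b) + 0.1 * rng.standard_normal((T, 3))
f1 = one_factor_target(x, x0, f00_hat=0.2)
print(np.round(f1, 3))
```

Off the diagonal the target is the rank-one matrix $\hat\beta\hat\beta'$ scaled by the market periodogram, which is what gives it low variance; all idiosyncratic information is confined to the diagonal.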

Optimal shrinkage intensity
Our aim is to improve upon the averaged periodogram by shrinking to a target matrix function that is more regular, at the price of possibly having larger bias.
Here, we assume that a one-factor model is not too crude an approximation. We do not, however, believe that it fully explains the underlying structure; we even make the opposite assumption, namely that the model is misspecified:
Assumption 3.3. There exists a $\delta > 0$ such that, uniformly over all frequencies $\omega \in [0, 2\pi]$ and all $i, j = 1, \ldots, p$, we have

H. Böhm, R. von Sachs/Structural shrinkage of nonparametric spectral estimators 705
Assumption 3.3 is made for technical reasons: the estimator of the shrinkage intensity that we are going to derive has an estimator of the difference $f^1_{ij}(\omega) - f^0_{ij}(\omega)$ in the denominator. Because of this, Assumption 3.3 is needed to keep this denominator bounded away from zero and thus to avoid problems of identifiability.
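We search for a linear combination
$$ \zeta_T(\omega)\, \hat f^1_T(\omega) + \bigl(1 - \zeta_T(\omega)\bigr)\, \hat f^0_T(\omega), \tag{19} $$
where $\zeta_T(\omega)$ is a data-driven estimator of an optimal, oracle shrinkage intensity $\zeta^*_T(\omega)$ that is the solution of the minimization problem
$$ \min_{z \in [0,1]} \; \mathbb{E} \sum_{i,j=1}^{p} \Bigl| z\, \hat f^1_{T,ij}(\omega) + (1 - z)\, \hat f^0_{T,ij}(\omega) - f^0_{T,ij}(\omega) \Bigr|^2, \tag{20} $$
that is, $\zeta^*_T(\omega)$ is the minimizing $z$. We will proceed in three steps. First, in subsection 3.5, we derive the optimal, oracle shrinkage intensity $\zeta^*_T(\omega)$, which depends on background knowledge of the underlying process.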
Second, in subsection 3.6, we derive the asymptotic behaviour of the oracle shrinkage intensity. We will see that the need to shrink vanishes asymptotically. This is because the averaged periodogram is a consistent estimator, whereas the shrinkage target is misspecified due to Assumption 3.3. As a consequence, the data-driven estimator of $f^0_T(\omega)$ asymptotically behaves like the averaged periodogram, as the data-driven estimator of the shrinkage intensity converges to zero. Finally, we construct a data-driven estimator in subsection 3.7.

Oracle shrinkage intensity
We derive the oracle shrinkage intensity by solving the minimization problem given in formula (20), which can be done simply by differentiation. Let $z \in [0, 1]$ denote a shrinkage intensity. The risk $R(z)$ associated with $z$ is derived in Appendix A.1, where we have used that $\mathbb{E}\hat f^0_T(\omega) = f^0_T(\omega)$ and, according to (13), $\mathbb{E}\hat f^1_T(\omega) = f^1_T(\omega)$.

The first derivative of $R(z)$ with respect to $z$ is: Moreover, the second derivative is where we use that $\hat f^1_T(\omega)$ and $\hat f^0_T(\omega)$ are Hermitian, so that the imaginary parts sum to zero.
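To make the differentiation step concrete, assume for this sketch that the risk is the expected squared modulus summed over all entries, and abbreviate $\Delta = \hat f^1_T(\omega) - \hat f^0_T(\omega)$ and $V = \hat f^0_T(\omega) - f^0_T(\omega)$, so that $z\hat f^1_T(\omega) + (1-z)\hat f^0_T(\omega) - f^0_T(\omega) = z\Delta + V$. The risk is then a quadratic in $z$ (the exact expression used in the paper is the one in Appendix A.1):

```latex
\begin{aligned}
R(z) &= \sum_{i,j=1}^{p} \mathbb{E}\,\bigl| z\,\Delta_{ij} + V_{ij} \bigr|^2
      = z^2 \sum_{i,j} \mathbb{E}\,|\Delta_{ij}|^2
        + 2z \sum_{i,j} \operatorname{Re}\,\mathbb{E}\bigl[\Delta_{ij}\,\overline{V_{ij}}\bigr]
        + \sum_{i,j} \mathbb{E}\,|V_{ij}|^2 ,\\
R'(z) &= 2z \sum_{i,j} \mathbb{E}\,|\Delta_{ij}|^2
        + 2 \sum_{i,j} \operatorname{Re}\,\mathbb{E}\bigl[\Delta_{ij}\,\overline{V_{ij}}\bigr],
\qquad
R''(z) = 2 \sum_{i,j} \mathbb{E}\,|\Delta_{ij}|^2 \;\ge\; 0 ,\\
z^{*} &= \frac{\sum_{i,j} \operatorname{Re}\,\mathbb{E}\bigl[(\hat f^0_{T,ij} - \hat f^1_{T,ij})\,\overline{(\hat f^0_{T,ij} - f^0_{T,ij})}\bigr]}
               {\sum_{i,j} \mathbb{E}\,\bigl|\hat f^1_{T,ij} - \hat f^0_{T,ij}\bigr|^2}
       = \frac{\sum_{i,j} \bigl[\operatorname{Var}(\hat f^0_{T,ij}) - \operatorname{Re}\operatorname{Cov}(\hat f^1_{T,ij}, \hat f^0_{T,ij})\bigr]}
               {\sum_{i,j} \bigl[\operatorname{Var}(\hat f^1_{T,ij} - \hat f^0_{T,ij}) + |f^1_{T,ij} - f^0_{T,ij}|^2\bigr]} .
\end{aligned}
```

The last equality uses $\mathbb{E}\hat f^0_T = f^0_T$ and $\mathbb{E}\hat f^1_T = f^1_T$, with $\operatorname{Var}(Y) = \mathbb{E}|Y - \mathbb{E}Y|^2$ and $\operatorname{Cov}(Y, W) = \mathbb{E}[(Y - \mathbb{E}Y)\overline{(W - \mathbb{E}W)}]$. For use as a shrinkage intensity, $z^*$ has to be restricted to $[0,1]$. The misspecification gap $f^1_T - f^0_T$ enters the denominator, consistent with the role of Assumption 3.3 described above.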
Since the second derivative is nonnegative, any local extremum is a minimum. Setting the first derivative equal to zero, we obtain the following theorem.
Theorem 3.3. The optimal shrinkage intensity is given by (23).
Proof. The proof is given in Appendix A.1.
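As a numerical sanity check of the differentiation argument, the sketch below (an illustration under assumptions, not the paper's estimator) takes paired Monte Carlo draws of an unbiased, noisy estimator and a biased, low-variance one, and measures the risk by the summed squared modulus over all entries. Because the empirical risk is then exactly quadratic in z, its minimiser over a grid must agree, up to the grid spacing, with the closed-form minimiser $-\sum_{i,j}\operatorname{Re}\,\mathbb{E}[\Delta_{ij}\overline{V_{ij}}] / \sum_{i,j}\mathbb{E}|\Delta_{ij}|^2$, where $\Delta = \hat f^1_T - \hat f^0_T$ and $V = \hat f^0_T - f^0_T$, clipped to $[0,1]$. The toy draws (dimension, bias 0.3 and noise levels) are hypothetical.

```python
import numpy as np


def mc_risk(z: float, f1: np.ndarray, f0: np.ndarray, target: np.ndarray) -> float:
    """Monte Carlo estimate of sum_ij E|z f1 + (1 - z) f0 - target|^2 from paired draws."""
    err = z * f1 + (1.0 - z) * f0 - target               # shape (n_rep, p, p)
    return float(np.mean(np.sum(np.abs(err) ** 2, axis=(1, 2))))


def oracle_intensity(f1: np.ndarray, f0: np.ndarray, target: np.ndarray) -> float:
    """Closed-form minimiser of mc_risk over z, clipped to [0, 1]."""
    delta = f1 - f0                                       # shrinkage target minus nonparametric estimator
    v = f0 - target                                       # error of the nonparametric estimator
    num = -np.mean(np.sum((delta * v.conj()).real, axis=(1, 2)))
    den = np.mean(np.sum(np.abs(delta) ** 2, axis=(1, 2)))
    return float(np.clip(num / den, 0.0, 1.0))


rng = np.random.default_rng(0)
p, n_rep = 3, 2000
target = np.eye(p)
f0 = target + rng.standard_normal((n_rep, p, p))               # unbiased, high variance
f1 = target + 0.3 + 0.1 * rng.standard_normal((n_rep, p, p))   # biased, low variance

z_star = oracle_intensity(f1, f0, target)
grid = np.linspace(0.0, 1.0, 1001)
z_grid = grid[np.argmin([mc_risk(z, f1, f0, target) for z in grid])]
print(z_star, z_grid)
print(mc_risk(z_star, f1, f0, target), mc_risk(0.0, f1, f0, target))
```

In this toy setting the biased but stable estimator receives most of the weight, and the combined risk is far below that of the unbiased estimator alone.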

Asymptotic behaviour of optimal shrinkage intensity
Now, we examine the asymptotic behaviour of the optimal shrinkage intensity (23). This will enable us to derive a data-driven estimator in the following subsection. We first define the parameters (24) to (26), whose subcomponents are defined, respectively, in (27) to (29), using the notation $\mathrm{AsyVar}(\cdot) := \lim_{T\to\infty} m_T \operatorname{Var}(\cdot)$ and the weights $\beta_i$ defined in equation (7). Now, we can express $\zeta^*_T(\omega)$ as a function of (24) to (26) plus a remainder term, which converges to zero sufficiently fast under the following additional assumption:
Assumption 3.4. The smoothing span $m_T$ is supposed to fulfill $m_T^2/T \to 0$ as $T \to \infty$.
Assumption 3.4 is made for the technical reason of proving the following theorem, which gives the exact expression of $\zeta^*_T(\omega)$:
Theorem 3.4. The optimal shrinkage intensity can be expressed as the function (30) of the parameters $\pi(\omega)$, $\rho(\omega)$ and $\gamma(\omega)$.
This means that the optimal shrinkage intensity converges to zero at a rate of $1/m_T$. At the same time, it can be approximated by the parameters (24) to (26) with an error that vanishes at the rate $T^{-1/2}$, which under Assumption 3.4 is faster, since $T^{-1/2}/m_T^{-1} = (m_T^2/T)^{1/2} \to 0$. This allows us to derive a data-driven estimator of the shrinkage intensity, and thus of $f^0_T(\omega)$, by plugging estimators of (24) to (26) into (30).

Data driven estimation
The final step in deriving a data-driven estimator of the spectrum that combines the averaged periodogram with a parsimonious, one-factor model based estimator is to derive estimators of the parameters $\pi(\omega)$, $\rho(\omega)$ and $\gamma(\omega)$. We start by estimating $\pi(\omega)$. According to (24), $\pi_{ij}(\omega)$ is the asymptotic variance of the $(i,j)$th component of the averaged periodogram, scaled by the smoothing span $m_T$. The following lemma gives a consistent estimator:
Lemma 3.5. $\pi(\omega)$ is estimated consistently by an estimator built from $p_{ij}(\omega)$, where $p_{ij}(\omega)$ is the standard estimator of the local variance of the $(i,j)$th component of the periodogram at frequency $\omega$.

Proof. The proof is given in Appendix A.3.
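The idea behind a local-variance estimator of $\pi(\omega)$ can be sketched as follows. Periodogram ordinates at distinct Fourier frequencies are approximately uncorrelated, so that $m_T \operatorname{Var}(\hat f^0_{T,ij}(\omega)) \approx \operatorname{Var}(I_{T,ij}(\omega))$, and the latter can be estimated by the sample variance of the $m_T$ ordinates inside the smoothing window. The sketch computes that local sample variance entrywise; the window convention (odd m, indices modulo T), the divisor m and the function names are assumptions, and it is not claimed to reproduce the exact estimator of Lemma 3.5.

```python
import numpy as np


def periodogram_matrices(x: np.ndarray) -> np.ndarray:
    """I_T(omega_k) for all k, shape (T, p, p), with the (2 pi T)^(-1/2) DFT normalization."""
    T = x.shape[0]
    d = np.fft.fft(x, axis=0) / np.sqrt(2 * np.pi * T)   # (T, p)
    return d[:, :, None] * d[:, None, :].conj()         # outer products d d^*


def local_periodogram_variance(pgram: np.ndarray, k: int, m: int) -> np.ndarray:
    """Entrywise sample variance of the m (odd) periodogram matrices centred at omega_k."""
    T = pgram.shape[0]
    half = (m - 1) // 2
    idx = [(k + j) % T for j in range(-half, half + 1)]
    window = pgram[idx]                                  # (m, p, p)
    centre = window.mean(axis=0)                         # the averaged periodogram at omega_k
    return np.mean(np.abs(window - centre) ** 2, axis=0)  # real, non-negative (p, p) array


rng = np.random.default_rng(0)
x = rng.standard_normal((512, 4))
v = local_periodogram_variance(periodogram_matrices(x), k=40, m=15)
print(np.round(v, 4))
```

By Hermitian symmetry of each periodogram matrix, the result is symmetric in (i, j), as a variance estimate for the (i, j)th and (j, i)th entries should be.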
The next step is to estimate $\rho(\omega)$. We estimate its components, distinguishing between those on the diagonal and those off the diagonal. As observed earlier, on the diagonal $\hat f^1_T(\omega) = \hat f^0_T(\omega)$, so we can use the estimator (32). Off the diagonal, we can use the estimator given by the following lemma:
Lemma 3.6. For $i \neq j$, a consistent estimator of $\rho_{ij}(\omega)$ is given by
Proof. The proof is given in Appendix A.4.
With the help of Lemmata 3.5 to 3.7, we can now construct the data-driven market shrinkage estimator of the spectrum, which is given by the following theorem:
Theorem 3.8. The estimator is a consistent estimator of
Proof. This is implied by Assumption 3.3 in conjunction with Lemmata 3.5, 3.6 and 3.7.
Thus, we have finally arrived at a shrinkage estimator that depends only on the data, not on background knowledge of the underlying process. We refer to this estimator as the DDMSE (data-driven market shrinkage estimator). The following theorem gives the asymptotic behaviour of the DDMSE $\hat f^+_T(\omega)$:

Theorem 3.9. Under Assumptions 3.1 to 3.4, $\hat f^+_T(\omega)$ is a consistent estimator of the spectrum.
The performance of the DDMSE in practice will be examined by extensive Monte Carlo simulations in section 4.

Monte Carlo studies for the DDMSE
In this section, we evaluate the performance of the data-driven market shrinkage estimator in practice by means of comprehensive Monte Carlo simulations. The DDMSE competes with three benchmark estimators:
1. the averaged periodogram;
2. the one-factor model that serves as the shrinkage target;
3. a competing shrinkage estimator, referred to as DDSSE, that uses the identity matrix as the shrinkage target, see Böhm and von Sachs (2007).
In a setting where it is reasonable to use the DDMSE, it should outperform all three benchmarks. Such a setting can be characterized as the frequently encountered situation where one may fit a factor model to the data but has no background knowledge on how many factors to actually choose. In a scree plot of the eigenvalues, one will typically encounter one or more prominent eigenvalues followed by a longer tail of small eigenvalues. The method we have developed allows us to avoid the problem of model choice.

Setup
For the simulations, we use a two-factor model as the true model. The first factor is an MA(2) process whose spectrum has a peak at $\pi/2$. The second factor driving the process is a Gaussian white noise time series; its variance is varied in a first simulation study, to examine the performance of the DDMSE on the 'scale' between an almost one-factor model and a true two-factor model. Figure 1 shows the spectra of the two factors underlying the simulations. These two factors are then projected onto a 5-dimensional time series according to the following model: Here, $\Upsilon$ is a $5 \times 2$ weight matrix that was chosen at random initially and then fixed for this section. The initial random distribution for the components of $\Upsilon$ was The market factor time series was obtained as the mean over dimension of the simulated data. All simulations presented in this section were repeated for new realizations of $\{\Upsilon, \mathrm{Cov}(\Omega)\}$ without any major changes in the results, which is why we omit these repetitive studies. A length of $T = 1024$ was chosen for the time series in this section.
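A generator in the spirit of this design is sketched below. Everything the text leaves unspecified is a hypothetical choice of the sketch: the MA(2) factor is taken as $x_t = e_t + \theta_2 e_{t-2}$ with $\theta_2 = -0.8$ (its spectral density $|1 + \theta_2 e^{-2\iota\omega}|^2/2\pi$ then peaks at $\pi/2$ on $[0, \pi]$), and the loadings $\Upsilon$ and the idiosyncratic noise are independent standard normal draws. The market series is the mean over dimension, as in the text. The function names `simulate_two_factor` and `ma2_spectrum` are hypothetical.

```python
import numpy as np


def ma2_spectrum(omega: np.ndarray, theta2: float = -0.8) -> np.ndarray:
    """Spectral density of x_t = e_t + theta2 * e_{t-2} with unit-variance white noise e_t."""
    return np.abs(1.0 + theta2 * np.exp(-2j * omega)) ** 2 / (2 * np.pi)


def simulate_two_factor(T: int = 1024, p: int = 5, noise_sd: float = 1.0,
                        theta2: float = -0.8, seed: int = 0):
    """Hypothetical two-factor design: returns the (T, p) data and the market series."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(T + 2)
    factor1 = e[2:] + theta2 * e[:-2]               # MA(2) factor with theta1 = 0
    factor2 = noise_sd * rng.standard_normal(T)     # white-noise factor with varied standard deviation
    factors = np.column_stack([factor1, factor2])   # (T, 2)
    upsilon = rng.standard_normal((p, 2))           # p x 2 loading matrix
    idio = rng.standard_normal((T, p))              # idiosyncratic noise, assumed N(0, 1)
    x = factors @ upsilon.T + idio                  # (T, p) observed series
    market = x.mean(axis=1)                         # market series: mean over dimension
    return x, market


x, market = simulate_two_factor(noise_sd=0.5)
grid = np.linspace(0.0, np.pi, 1001)
print(x.shape, market.shape)                        # (1024, 5) (1024,)
print(grid[np.argmax(ma2_spectrum(grid))])          # location of the spectral peak, close to pi/2
```

For a fixed seed, the number of random draws made before the loadings does not depend on `noise_sd`, so $\Upsilon$ stays the same while the variance of the second factor is varied, mirroring the fixed weight matrix of the design.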

Influence of the true model
The only formal prerequisite on the true model for the DDMSE to work is that its true spectrum is not that of a one-factor model such as the one specified in section 3.2. In this subsection, we examine the influence of the 'distance' from a one-factor model. This is accomplished by using the two-factor model (38) to generate the data and systematically varying the standard deviation of the second, flat-spectrum factor. For small standard deviation, the data are very close to a one-factor model; as the standard deviation of the second factor increases, so does its influence. The results, in terms of the mean integrated squared error (MISE), are given in figure 2. The effects we observe in the simulation study confirm our expectations on the respective behaviour of the averaged periodogram, one-factor model, DDSSE and DDMSE. First of all, the DDMSE performs best for all choices of the white noise variance in the simulations. The averaged periodogram, upon which we want to improve, exhibits the worst performance. It is outperformed not only by the DDSSE, which we would have expected based on the results of the preceding section, but also by the one-factor model. This shows that, in this context, the one-factor model is a useful model in itself, even though it is actually misspecified. It even outperforms the DDSSE for most choices of the white noise variance. Overall, the MISE increases with the variance of the second factor, and the MISE curves of the different estimators run roughly parallel.
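For reference, a Monte Carlo approximation of the MISE can be computed as below. The sketch assumes that the squared error at each frequency is the squared Frobenius norm of the matrix difference and replaces the integral over frequency by an average over a uniform frequency grid, which changes the value only by a constant factor; the function name `mise` is hypothetical.

```python
import numpy as np


def mise(estimates: np.ndarray, truth: np.ndarray) -> float:
    """Monte Carlo MISE approximation.

    estimates: (n_rep, n_freq, p, p) spectral estimates on a uniform frequency grid;
    truth: (n_freq, p, p) true spectral matrices on the same grid.
    """
    sq_err = np.sum(np.abs(estimates - truth[None]) ** 2, axis=(2, 3))  # (n_rep, n_freq)
    return float(sq_err.mean())   # average over replications and frequencies


truth = np.zeros((8, 2, 2), dtype=complex)
estimates = np.ones((10, 8, 2, 2), dtype=complex)   # every entry off by exactly 1
print(mise(estimates, truth))                        # 4.0: four entries, each with squared error 1
```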

Influence of the smoothing span
In the next Monte Carlo study, we varied the smoothing span and examined its influence on the MISE. The results are given in figure 3. Not surprisingly, the overall MISE decreases for all estimators as the smoothing span, i.e. the effective local sample size, increases. For small smoothing span, the averaged periodogram exhibits the worst performance. The DDSSE performs better than the averaged periodogram for small smoothing span, but is outperformed by the one-factor model and by the DDMSE. For the very small smoothing span m = 7, the DDMSE and the one-factor model have approximately the same MISE. For intermediate smoothing spans, we again observe the ranking (from worst to best) averaged periodogram, DDSSE, one-factor model, DDMSE, as in the preceding subsection. Finally, for a comparatively large smoothing span of m = 31 or larger, the DDMSE, DDSSE and averaged periodogram have approximately the same MISE. This is again not surprising, as for fixed dimension both data-driven estimators converge to the averaged periodogram. Moreover, for large smoothing span, the one-factor model performs worse than the averaged periodogram. This is, however, not due to a loss of performance of the one-factor model, which improves monotonically with m, but rather due to the faster improvement of the averaged periodogram in terms of MISE. Finally, the deterioration of the estimator based on the one-factor model relative to the averaged periodogram for large smoothing span does not make the DDMSE perform worse than the averaged periodogram. This can be explained by the fact that, for large m, the shrinkage intensity becomes negligibly small.

Conclusions
Our work deals with the concept of shrinkage in the frequency domain of multivariate time series. As in our companion paper Böhm and von Sachs (2007), it uses a new, localized concept of shrinkage that allows for the development of estimators that simultaneously overcome the problem of numerical instability due to high dimensionality or collinearity and have lower quadratic risk. In contrast to the time-domain developments of Ledoit and Wolf (2003), in the frequency domain of nonparametric estimation of the spectral density matrix by smoothing the periodogram matrix, all considerations have to be undertaken with respect to the (locally) effectively available sample size, which is governed by the smoothing parameter (and not by the sample size alone). In Böhm and von Sachs (2007), asymptotic theory has been derived for shrinkage towards a multiple of the identity matrix where both the dimensionality p = p_T and the smoothing span m = m_T tend to infinity as the length of the time series T → ∞. In this paper, we have contented ourselves with investigating the theoretical properties of our proposed estimator by classical asymptotics, noting that a transfer to the more complex situation of "Kolmogorov" or double asymptotics would be possible as well. However, with this work on structural shrinkage, we want to put emphasis on a different aspect of shrinkage, perhaps driven by a more applied interest. Using the identity matrix as a shrinkage target is reasonable if there is little or no knowledge about the underlying multidimensional structure of the data. However, in many situations, in particular in economic applications, it is more rewarding to incorporate potentially available background knowledge on the underlying cross-dimensional structure of the data into the shrinkage target. This opens up the way to designing 'custom made' shrinkage estimators that offer a new answer to problems of model choice.
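The numerical instability mentioned above has a simple source. Each periodogram matrix I(ω_k) = d(ω_k) d(ω_k)^* has rank one, so an average over m frequencies has rank at most m and is singular whenever p > m, however long the series is. A convex combination (1 - λ) Ī + λ μ Id with μ = tr(Ī) / p > 0, as in shrinkage towards a multiple of the identity, has all eigenvalues at least λμ and is therefore invertible. The sketch below checks this numerically; the white-noise data, the fixed intensity λ = 0.2 and the function names are illustrative assumptions.

```python
import numpy as np


def averaged_periodogram_at(x, k, m):
    """Average of the periodogram matrices over the m (odd) Fourier frequencies centred at index k."""
    T = x.shape[0]
    d = np.fft.fft(x, axis=0) / np.sqrt(2 * np.pi * T)
    idx = np.arange(k - m // 2, k + m // 2 + 1) % T
    D = d[idx]                          # shape (m, p): one rank-one term per frequency
    return D.T @ D.conj() / m           # (1/m) sum_k d_k d_k^*


def shrink_to_identity(I_bar, lam):
    """(1 - lam) * I_bar + lam * mu * Id with mu = tr(I_bar) / p."""
    p = I_bar.shape[0]
    mu = np.real(np.trace(I_bar)) / p
    return (1 - lam) * I_bar + lam * mu * np.eye(p)


rng = np.random.default_rng(2)
T, p, m = 1024, 20, 7                   # dimension larger than the effective sample size m
x = rng.standard_normal((T, p))
I_bar = averaged_periodogram_at(x, k=T // 8, m=m)
print(np.linalg.matrix_rank(I_bar))     # at most m
print(np.linalg.cond(I_bar))            # huge: I_bar is numerically singular
print(np.linalg.cond(shrink_to_identity(I_bar, lam=0.2)))
```

Here the condition number of the shrunk matrix is bounded by ((1 - λ) p + λ) / λ, since the largest eigenvalue of Ī cannot exceed its trace.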
In a given setting where a class of parametric or semi-parametric estimators is eligible and the order has to be chosen, the minimum order model can be used as a target towards which to shrink, instead of relying on criteria such as AIC or BIC. Instead of calling the method "shrinkage", we might as well describe it as stretching: an overly parsimonious model is fitted, and the estimate is then refined by adding the periodogram as a stretching target that has low bias and high variance.
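The stretching view can be written down directly. In the sketch below, the minimum order model is a one-factor fit obtained from the averaged periodogram itself: the leading eigenpair (σ₁, v₁) of Ī gives a rank-one common component, and the diagonal of the remainder Ī - σ₁ v₁ v₁^* gives an idiosyncratic part. This is a hypothetical construction for illustration, not the parametric one-factor estimator of section 3.2. Starting from the target F̂, the estimate is stretched towards Ī as f̂ = F̂ + (1 - λ)(Ī - F̂), which equals (1 - λ) Ī + λ F̂. The intensity λ is passed in by hand here rather than estimated, and the simulated Ī is hypothetical.

```python
import numpy as np


def one_factor_target(I_bar):
    """Hypothetical one-factor fit: leading eigen-component plus diagonal of the remainder."""
    eigvals, eigvecs = np.linalg.eigh(I_bar)        # eigenvalues in ascending order
    s1, v1 = eigvals[-1], eigvecs[:, -1]
    common = s1 * np.outer(v1, v1.conj())
    # the remainder I_bar - common is positive semi-definite, so its diagonal is non-negative
    idiosyncratic = np.diag(np.real(np.diag(I_bar - common)))
    return common + idiosyncratic


def stretch(I_bar, target, lam):
    """Start at the parsimonious target and move a fraction (1 - lam) towards I_bar."""
    return target + (1 - lam) * (I_bar - target)


rng = np.random.default_rng(3)
p, m = 6, 9
# hypothetical averaged periodogram: mean of m rank-one matrices d d^* with complex Gaussian d
d = rng.standard_normal((m, p)) + 1j * rng.standard_normal((m, p))
I_bar = d.T @ d.conj() / m
F = one_factor_target(I_bar)
for lam in (0.0, 0.5, 1.0):
    est = stretch(I_bar, F, lam)
    print(lam, np.linalg.eigvalsh(est).min())       # non-negative up to rounding
```

By construction the fit reproduces the diagonal of Ī, so stretching only changes the cross-spectra; for λ in [0, 1] the result is a convex combination of two positive semi-definite matrices and hence positive semi-definite.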
In addition to showing that a MISE-optimal "oracle" shrinkage intensity can be consistently estimated from the data, we have demonstrated by Monte Carlo simulations that, even for small sample sizes and when additional structure is available, our estimator achieves a large gain in L² risk over the following competitors: the classical averaged periodogram, the "shrinkage to identity" estimator of Böhm and von Sachs (2007) and an estimator based on a fully parametric factor model. Simulations not reported here also demonstrate that shrinkage can be applied to tapered data; since tapering improves the rate of the bias without changing the rate of consistency, the theory carries over easily. For similar reasons, the averaging of the periodogram can be replaced by kernel smoothing.
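Both variants only change how the periodogram matrices are formed or combined before shrinkage. In the sketch below, the data are multiplied by a split cosine bell taper h_t before the Fourier transform, with the normalization 2π Σ h_t² in place of 2πT so that the tapered periodogram keeps the scale of the untapered one, and the flat average over m frequencies is replaced by a kernel-weighted average with Epanechnikov-type weights. The taper proportion, the kernel, the bandwidth and the function names are illustrative assumptions, not the choices made in the paper.

```python
import numpy as np


def split_cosine_taper(T, proportion=0.1):
    """Split cosine bell: cosine ramps on the first and last proportion/2 of the sample, 1 in between."""
    h = np.ones(T)
    n = int(np.floor(proportion * T / 2))
    if n > 0:
        ramp = 0.5 * (1 - np.cos(np.pi * (np.arange(n) + 0.5) / n))
        h[:n] = ramp
        h[T - n:] = ramp[::-1]
    return h


def tapered_periodogram_matrices(x, h):
    """I^h(w_k) = d^h d^h* with d^h(w) = (2 pi sum_t h_t^2)^(-1/2) sum_t h_t x_t exp(-i t w)."""
    d = np.fft.fft(h[:, None] * x, axis=0) / np.sqrt(2 * np.pi * np.sum(h ** 2))
    return np.einsum("ki,kj->kij", d, d.conj())


def kernel_smoothed(I, bandwidth):
    """Kernel-weighted average of I over neighbouring Fourier frequencies (circular wrap).

    Weights proportional to 1 - (j / (bandwidth + 1))^2 for |j| <= bandwidth, normalized to sum to 1.
    """
    offsets = np.arange(-bandwidth, bandwidth + 1)
    w = 1 - (offsets / (bandwidth + 1)) ** 2
    w /= w.sum()
    return sum(wj * np.roll(I, j, axis=0) for wj, j in zip(w, offsets))


rng = np.random.default_rng(4)
T, p = 512, 3
x = rng.standard_normal((T, p))         # hypothetical white noise: true spectrum Id / (2 pi)
f_hat = kernel_smoothed(tapered_periodogram_matrices(x, split_cosine_taper(T)), bandwidth=7)
print(np.round(np.real(np.diag(f_hat[T // 4])) * 2 * np.pi, 2))   # should fluctuate around 1
```

Any of the shrinkage estimators can then be applied to `f_hat` in place of the averaged periodogram.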
An important field of application of our approach would be factor modelling of panels of economic time series of comparatively high dimensionality. We recall that "high dimension" needs to be understood as high compared to the "effective sample size" m_T. The results of this paper suggest that it could be possible to circumvent the problem of searching for an appropriate factor dimension, a problem still not satisfactorily solved in the literature, in particular for dynamic factor models. This application points to a direction for future theoretical research: the generalization of our approach to a dynamic (and latent) factor model setting that allows for lag effects.

Acknowledgements
We would like to thank Christian Hafner and Johan Segers (UC Louvain) as well as Hernando Ombao (Brown University) for helpful discussions and an anonymous referee for his comments on ridge regression. Further, we acknowledge financial support from the IAP research network grant P 5/24 of the Belgian government (Belgian Science Policy) as well as from the contract "Projet d'Actions de Recherche Concertées" n° 07/12-002 of the "Communauté française de Belgique", granted by the "Académie universitaire Louvain".