Data-driven shrinkage of the spectral density matrix of a high-dimensional time series

: Time series data obtained from neurophysiological signals is of- ten high-dimensional and the length of the time series is often short relative to the number of dimensions. Thus, it is diﬃcult or sometimes impossible to compute statistics that are based on the spectral density matrix because estimates of these matrices are often numerically unstable. In this work, we discuss the importance of regularization for spectral analysis of high-dimensional time series and propose shrinkage estimation for estimating high-dimensional spectral density matrices. We use and develop the multivariate Time-frequency Toggle (TFT) bootstrap procedure for multivariate time series to estimate the shrinkage parameters, and show that the multivariate TFT bootstrap is theoretically valid. We show via simulations and an fMRI data set that failure to regularize the estimates of the spectral density matrix can yield unstable statistics, and that this can be alleviated by shrinkage estimation.


Introduction
With the ubiquity of high-dimensional time series data, there is a need for developments of statistical methods for spectral analysis of time series data that are robust to the curse of high-dimensionality. Examples of high-dimensional time series include, but are not limited to, signals from electroencephalogram (EEG), data obtained from functional magnetic resonance imaging (fMRI) experiments, systems biology, and economic panel data. Böhm and von Sachs [4] developed a shrinkage procedure for spectral analysis of high-dimensional portfolio analysis and Fiecas et al. [11] used shrinkage estimation for spectral analysis of EEG signals. There have been many contributions to the literature on the estimation of high-dimensional covariance matrices (see Pourahmadi [25] for a thorough review), but only very few contributions to the literature in the estimation of high-dimensional spectral density matrices, which can be thought of as the covariance matrix of the time series data in the frequency domain. The works by Böhm and von Sachs [5] and Fiecas and Ombao [10] used the same shrinkage framework for spectral analysis of multivariate time series, but they had two different goals. The former was interested primarily in regularization, and so their shrinkage estimator introduced a substantial amount of bias to yield an estimator that is numerically stable. The latter, on the other hand, was interested primarily in obtaining a good fit to the spectral density matrix, and so their shrinkage estimator had good frequency resolution, allowing them to accurately capture the frequencies that drive the dynamics of the process. However, they had the luxury of large samples and multiple traces of the data. The primary goal of this work is to give further methodological developments to shrinkage estimation of spectral density matrices by balancing two extremes, namely, that of regularization per Böhm and von Sachs [5] and that of fit per Fiecas and Ombao [10], and in the setting where the length of the time series is small relative to its dimensionality. Specifically, we will use a diagonal shrinkage target in order to accurately capture the power for each dimension of the time series and simultaneously yield an estimate of the spectral density matrix that is regularized due to dimensionality. Furthermore, one important parameter in the shrinkage framework is the shrinkage weight, which is a function of population-level statistics. The strategy by Böhm and von Sachs [5] and Fiecas and Ombao [10] to estimate these statistics was to borrow information from neighboring frequencies, which decreased their frequency resolution. In the present work, to estimate these statistics, we have developed a bootstrap procedure for multivariate time series, and so we do not suffer from this loss in frequency resolution.
There has been very little theoretical and methodological developments on the bootstrap for multivariate time series. The classic work by Franke and Härdle [15] on bootstrapping univariate time series in the frequency domain was extended to the multivariate setting by Berkowitz and Diebold [2], though without proving theoretical validity. Dette and Paparoditis [9] gave theoretical developments on bootstrapping frequency domain statistics for hypothesis testing. Dai and Guo [8] and Guo and Dai [17] showed how to create bootstrap samples of multivariate time series given any valid spectral density matrix, and their ideas were recently used by Krafty and Collinge [20] for creating bootstrap confidence intervals for each element of the spectral density matrix. Jentsch and Kreiss [18] developed the multiple hybrid bootstrap for multivariate time series, which combines the time domain parametric bootstrap and the frequency domain nonparametric bootstrap, and showed its theoretical validity. In the present work, we have extended the ideas by Kirch and Politis [19] on the Time-Frequency Toggle (TFT) bootstrap for univariate time series to the multivariate setting. The method is called TFT because the original data is observed in the time domain, which is then mapped to the frequency domain where it is resampled, and then mapped back to the time domain. In our context, we can show that our extension of the TFT to the multivariate setting produces theoretically valid bootstrap samples.
This paper is organized as follows. In Section 2, we discuss and develop shrinkage estimation for the spectral density matrix. This section includes a brief review of smoothed periodogram matrices and also our algorithm for the multivariate TFT bootstrap. In Section 3, we illustrate the performance of shrinkage estimators on simulated high-dimensional time series data. In Section 4, we present results from the analysis of a high-dimensional resting-state fMRI data set. The theoretical validity of the multivariate TFT bootstrap is argued in Section 5. And finally, Section 6 is our discussion of this work.

The smoothed periodogram matrix
Let X(t), t = 1, . . . , T, be a discrete real-valued zero-mean weakly stationary time series with an absolutely summable autocovariance function. The P × P spectral density matrix f (ω) of X(t) is To estimate f (ω) nonparametrically, we first convert the data X(t) from the time domain to the frequency domain using the discrete Fourier transform: d X (ω) = T t=1 X(t) exp(−i2πωt). The periodogram matrix is I T (ω) = T −1 d X (ω)d X (ω) * , where ( * ) denotes the complex conjugate transpose. It is well-known that the periodogram matrix is an asymptotically unbiased but inconsistent estimator for f (ω) [6]. If we smooth each (j, k)-th element of I T (ω) using a smoothing kernel K (jk) T (·) whose smoothing span is M (jk) T , then this gives us each element of the smoothed periodogram matrix f T (ω), i.e., the (j, k)-th (ω−α)I jk,T (α)dα. Under regularity conditions and in an asymptotic framework where the smoothing spans increase at a rate slower than the sample size T , the smoothed periodogram matrix is a consistent estimator for f (ω) [6].
Even though the smoothed periodogram matrix is a consistent estimator, it can also be very unstable numerically, in particular for high dimensionality; at each frequency, the dispersion between its maximum eigenvalue and its minimum eigenvalue can be considerably larger than the corresponding quantity of its population analog, the true underlying spectral density matrix. This leads to an increased condition number (see, e.g., Böhm and von Sachs [5]), which is defined to be the ratio of the maximum eigenvalue to the minimum eigenvalue. As a result, statistics based on the inverse of the spectral density matrix are either impossible to compute because the smoothed periodogram matrix is not invertible or, if it is invertible, they will have high variance. This motivates to use, as an alternative, shrinkage estimators to obtain regularized estimates of the spectral density matrix, as shrinkage can considerably reduce the dispersion in the range of the empirical eigenvalues and, in particular, move the minimum eigenvalue further away from zero.

Shrinkage estimators
The class of estimators for the spectral density matrix we consider in this work will have the following form: where f T (ω) is the smoothed periodogram matrix, Ξ(ω) is what we call the shrinkage target, and W T (ω) is the shrinkage weight. This form, a convex combination between the shrinkage target and the data-driven smoothed periodogram matrix, was used in previous works for estimating spectral density matrices [5,4,10]. Specifically, Böhm and von Sachs [5] used is the mean eigenvalue of f T (ω) at frequency ω. This is the frequency domain analog of the estimator developed by Ledoit and Wolf [22] for estimating highdimensional covariance matrices. If the primary goal is to improve the condition number, then one should choose µ(ω)Id to be the shrinkage target because it guarantees an estimator with a considerable reduction in the dispersion of its eigenvalues, i.e., the distribution of the eigenvalues are shrunk towards their mean. Böhm and von Sachs [4] and Fiecas and Ombao [10] used Ξ(ω) = Ξ(ω; θ), i.e., the smoothed periodogram matrix was shrunk towards the spectral density matrix of a parametric model. Their idea was to fit a parametric model, which was likely to be misspecified, but correct the misspecification via the smoothed periodogram matrix which is completely data-driven. Specifically, Fiecas and Ombao [10] used the spectral density matrix obtained by fitting a vector autoregressive (VAR) model to the data. The dimension of the parameter space of a VAR model, however, is of the order P 2 , but in that work they had a large sample size and many traces of the time series that allowed them to efficiently estimate the large number of parameters. In the present work, we are more interested in the scenario where there is only one trace of the time series available and whose dimensionality P is large relative to its length T .

The diagonal shrinkage target
Our aim in this work is to use a shrinkage target that will balance regularization due to the high-dimensionality of the data and fit. To motivate this problem, consider Figure 1, which shows an estimate of the autospectra of a 90-dimensional time series, taken from a resting-state fMRI data set which we will see later. The autospectra are the diagonal elements of f (ω). In Figure 1, we see a considerable amount of heterogeneity among the autospectra, and so we would achieve better fit to the data if this heterogeneity is accounted for. Thus, in this work we will let the shrinkage target be a diagonal matrix, which will be a function of a vector of parameters θ T , i.e., Ξ(ω) = Ξ(ω, θ T ), estimated from the data; for further emphasis that the shrinkage target is a diagonal matrix, from here on we denote Ξ(ω, θ T ) = D T (ω), and we omit the dependency of D T (ω) on θ T for simplicity in notation. In order to construct the diagonal shrinkage target D T (ω), we proceed by treating each dimension of X(t) independently: for the j-th dimension of X(t), we will consider the class of autoregressive (AR) models, and the estimated parametric spectral density function of the AR model will be the (j, j)-th element of the estimated shrinkage target D T (ω), i.e., where (p j ) is the order of the AR model picked using, say, the Bayes Information Criterion (BIC), and ( σ 2 j,T , φ pj ,T ) ⊤ is the vector of estimated parameters for the AR(p j ) model. AR models have the desirable property that, under regularity conditions, the spectral density of a univariate time series can be approximated by that of an AR model [1]. Thus, asymptotically, the estimated shrinkage target will preserve the power of each dimension of X(t).

The shrinkage weight
From Equation (1), the shrinkage estimator is a weighted average, where the weight is given by W T (ω), between the shrinkage target and the smoothed periodogram matrix. We can define W T (ω) to be optimal in the sense that it minimizes the quadratic risk function for the shrinkage estimator. Following Böhm and von Sachs [5] and Fiecas and Ombao [10], the quadratic risk was constructed with respect to E( f T (ω)) as opposed to the true spectral density matrix f (ω). This is a purely theoretical device because, under mild regularity conditions on the smoothing span, f T (ω) is asymptotically unbiased and converges to the true spectral density matrix f (ω) sufficiently fast [6]. As shown by Fiecas and Ombao [10], this leads to a closed-form solution for the optimal shrinkage weight W T (ω), namely where var( f T (ω)) = P jk var( f jk,T (ω)), and ||A|| 2 = tr(AA * ) is the Hilbert-Schmidt norm. Note that the behavior of the shrinkage weight is a function of the relative performance of each of the smoothed periodogram matrix and the shrinkage target. Consider the three terms: i) the variance of the smoothed periodogram matrix, ii) the covariance between the smoothed periodogram matrix and the shrinkage target, iii) and the expected squared distance between the smoothed periodogram matrix and the shrinkage target. For i), if the variance of the smoothed periodogram matrix is large, which is the case when the dimensionality is large [5], then the shrinkage weight will be large and so shrinkage estimator will favor the shrinkage target. Term ii) corrects for the covariation coming from the fact that the smoothed periodogram matrix and the shrinkage target were estimated using the same data. Term iii) corrects for model misspecification. If the model misspecification of the shrinkage target is severe, then its distance from the smoothed periodogram matrix (which is a consistent estimator) will be large, and so the shrinkage weight will be small so that the shrinkage estimator will favor the smoothed periodogram matrix.
To estimate the shrinkage weight, Böhm and von Sachs [5] and Fiecas and Ombao [10] both used moment estimators that borrowed information from neighboring frequencies. However, with this approach, frequency resolution is lost. This would also apply to plug-in estimators which could be used alternatively, but which would be suboptimal as depending on asymptotic develop-ments. Instead, in this work, we propose to use the bootstrap to estimate the shrinkage weight, which we describe in the next section.

The multivariate time-frequency toggle bootstrap
Our aim with the bootstrap is to create replicates of the data in order to obtain a bootstrap sample of the estimated shrinkage target D T (ω) and of the smoothed periodogram matrix f T (ω). We can then easily use the bootstrap distribution to obtain moment estimators of the variance, covariance, and expected squared distance that are necessary to estimate W T (ω). We use a multivariate generalization of the Time-Frequency Toggle (TFT) bootstrap method given by Kirch and Politis [19] described in the following sections.

Estimating the shrinkage weight
Given the data X(t), t = 1, . . . , T , the bootstrap procedure for estimating W T (ω) is as follows: 1. Obtain the smoothed periodogram f T (ω) and shrinkage target D T (ω). 2. For each bootstrap sample and Fourier frequency k = 1, . . . , T , generate , and Z + (k) = Z + (T − k + 1) * , where N R and N C denote the real and complex P -variate normal distributions, respectively. 3. Use the resampled Fourier coefficients Z + (k) to generate a time domain sample: . This is motivated by the discretized version of the Cramér representation of the time series data [6]. Repeat this step B times to obtain B bootstrap samples. 4. For each bootstrap sample, obtain D + T (ω) using independent univariate model fits, where the model for dimension j is the same as that used to obtain D jj,T (ω). Obtain f + T (ω) using the same smoothing kernel and smoothing span as before. 5. Set var( f T (ω)) to be the empirical variance of f + T (ω). 6. Set cov( f T (ω), D T (ω)) to be the empirical covariance between f + T (ω) and Note that we generated data using the smoothed periodogram matrix f T (ω) so that the spectral density matrix of the bootstrapped data X + (t) is f T (ω). Now one may often encounter the scenario of f T (ω) having negative eigenvalues, especially when T is small relative to P . In this case, to ensure eigenvalues which are strictly positive, we suggest to instead generate the bootstrapped data from a pre-regularized estimator f T (ω) + ǫ(ω)Id, where ǫ(ω) is a scalar, which is potentially a function over frequencies, large enough at each frequency ω j so that f T (ω j ) + ǫ(ω j )Id has eigenvalues that are strictly positive. Also, we emphasize that, even though the data X(t) is observed in the time domain, the resampling takes place in the frequency domain, and the resampled data are then mapped back to the time domain to create the bootstrapped time domain data X + (t). In this work, we are not interested in reproducing (the whole distribution of) the time series X(t), but rather in estimating quantities which only depend on second-order characteristics of this time series. Hence, using Gaussian increments in the frequency domain to generate our time-domain data, in this specific context, does not appear to be restrictive. This is indeed similar to what Franke and Härdle [15] have suggested in their seminal work on (univariate) kernel spectral bootstrap as a valid alternative to their residualbased bootstrap, namely, bootstrapping from the asymptotic distribution of periodogram ordinates, which is χ 2 , and in fact, the square of the (complex) normals we use in our bootstrap.

Shrinkage towards an arbitrary target
The bootstrap algorithm proposed in Section 2.3.1 for estimating the shrinkage weights are not constrained to work only when the shrinkage target is a diagonal matrix. Consider now the general class of shrinkage estimators as given in Equation (1). The shrinkage target, Ξ(ω), is potentially a function of some vector of parameters θ, i.e., Ξ(ω) = Ξ(ω; θ). Using the data to estimate both the shrinkage target and the smoothed periodogram matrix with Ξ(ω; θ T ) and f T (ω), respectively, the convex combination in Equation (1) then becomes The shrinkage weight that minimizes quadratic risk is As before, in order to estimate the optimal shrinkage weight, we use the bootstrap to generate bootstrapped distributions of the statistics of interest. The multivariate TFT bootstrap algorithm we developed in Section 2.3.1 can be easily modified. In Step 4 of the algorithm, we use the bootstrapped data X + (t) to obtain a bootstrapped estimate of the parameters of the shrinkage target θ + T . Steps 6-8 are appropriately modified by replacing D T (ω) and D + T (ω) with Ξ(ω; θ T ) and Ξ(ω; θ + T ), respectively. Each of the shrinkage estimators proposed by Böhm and von Sachs [4], Böhm and von Sachs [5], and Fiecas and Ombao [10] is a special case of Equation (4) by letting the shrinkage target Ξ(ω; θ T ) be appropriately specified. Using the multivariate TFT bootstrap as we have described in this section for estimating the shrinkage weight will improve on their estimation procedures by maintaining frequency resolution since we do not rely on another layer of smoothing over frequencies to estimate the shrinkage weight, as was done in those works.

Simulation study
We assessed the performance via a Monte Carlo simulation study of shrinkage estimators by investigating how well they estimate the spectral density matrix and the partial cross-coherence (PCCoh) matrix. PCCoh is the frequency domain analog of partial cross-correlation. Calculating PCCoh is challenging for high-dimensional time series data because it is a function of the inverse of the spectral density matrix [7]. To evaluate the estimators of the spectral density matrix, we used the mean integrated squared error (MISE), defined by where M = 100 denotes the number of Monte Carlo samples in our simulation study. Similarly, to evaluate an estimator's performance in estimating PCCoh, we also used the MISE but in the above formulation replace f (ω k ) and f (ω k ) with the true and estimated values of PCCoh at frequency ω k , respectively. We compared the performance of the smoothed periodogram matrix to the shrinkage estimator using various shrinkage targets, namely, the diagonal target described in the present work, the VAR target obtained by fitting a vector autogregressive model as described by Fiecas and Ombao [10], and the scaled identity target as described by Böhm and von Sachs [5].
The simulated data were P -dimensional vectors drawn from two different time series models. The first model was the first-order vector moving average with innovations such that each dimension was first drawn from Unif(−3,3), and then jointly rotated to induce correlation between dimensions. The second model was the first-order vector autoregressive, with innovations also drawn from Unif(−3,3), and then jointly rotated to induce correlation between dimensions. The details of the simulation settings are given in the appendix. We considered the cases P = 12, 48, and 96 using sample sizes T = 256 and 512. These are challenging scenarios, and are the scenarios that one can often encounter when analyzing data such as, e.g., fMRI time courses, as we will see in Section 4. First, the effective sample sizes (the smoothing span of the smoothing kernel) are small relative to the dimension P because a small smoothing span (relative to the sample size T ) is needed in order to accurately capture the frequencies which drive the process; indeed, in each setting for T , the effective sample sizes ranged from as low as approximately .06 T to as high as approximately .17 T . Thus, the setting T = 256 and P = 96 is the most challenging scenario. Second, since both the diagonal and the VAR shrinkage targets are based on the univariate and vector autoregressive models, respectively, then these shrinkage targets used misspecified parametric models whenever the true process was a vector moving average. Finally, to confirm that the performance of our bootstrap estimators, based on Gaussian increments in the frequency domain, is not tied to Gaussianity of the underlying time series, we used a non-Gaussian time series in our simulation study.
To smooth the periodogram matrix, we used the algorithm by Ombao et al. [24] for obtaining an optimal smoothing span for each dimension of the time series, and then taking the maximum of these smoothing spans to smooth the off-diagonal elements because, in our experience with empirical data, the crossspectra tended to be smoother than the autospectra. For the VAR and scaled identity shrinkage targets, we used the multivariate TFT bootstrap outlined in Section 2.3.2 to estimate the shrinkage weights, and generated the bootstrap samples using a pre-regularized estimator that guarantees a minimum eigenvalue of 0.01 at the frequencies where the smoothed periodogram matrix had negative eigenvalues. For all shrinkage estimators, we used a pre-regularized smoothed periodogram matrix. The results for the smoothed periodogram matrix we report in this study is also for the pre-regularized smoothed periodgram matrix to make the comparisons fair. For all shrinkage estimators, we generated B = 200 bootstrap samples using the multivariate TFT to estimate the shrinkage weight. We used the BIC to pick the order of each of the univariate AR fits for the diagonal target and also for the order of the VAR model for the VAR target.
First, let us discuss the performance in estimating the spectral density matrix as shown in Table 1. In all cases, shrinkage towards the diagonal matrix improved on both the smoothed periodogram matrix and the diagonal target alone. Specifically, we clearly see that shrinking towards the diagonal target balances the low bias but high variance of the smoothed periodogram matrix with the high bias but low variance of the diagonal shrinkage target. Now let us compare the performance of the different shrinkage estimators. In Table 2, recall that the dimension of the parameter space of a VAR model is of the order P 2 . Consequently, the order of the VAR model for the shrinkage target was picked to be 1 in all cases because of the high number of parameters in the model, yielding a highly biased shrinkage target whenever the true process was VMA. However, whenever the true process was VAR, shrinkage towards the VAR yielded the smallest bias compared to the other estimators, but the MISEs were high because of the high variance in the parameter estimates due to the large dimensionality of the parameter space. On the other hand, the dimension of the parameter space for each of the diagonal and the scaled identity targets is of much smaller order, hence, their better performances at higher dimensions. Of these two estimators, shrinking towards the diagonal led to lower MISEs because it better captured the heterogeneity in the autospectra. We point out that the scale of the MISEs with respect to the true process being VAR is much larger than that of the VMA process is because the power integrated across all frequencies is much larger for the former than the latter. Now let us turn to the performance in estimating PCCoh, as shown in Tables 3 and 4. In Table 3, the variance for the diagonal only estimator is zero for all cases because its estimates of all pairwise PCCoh are all 0, hence no variance. The pre-regularized smoothed periodogram yielded estimates of PCCoh with high variance, and thus had high MISEs. We see that shrinking towards the diagonal target also yielded good estimates of PCCoh relative to the smoothed periodogram, particularly when P = 48 and P = 96. Now let us look at Table 4 that compares the different shrinkage estimators. We see that shrinkage towards the VAR has some merits, especially when T is large relative to P . By shrinking towards the VAR, the conditional dependencies between the dimensions of the time series were modeled and then adjusted nonparametrically via the smoothed periodogram matrix [10]. The merits of the diagonal and the scaled identity shrinkage targets include a reduction in the dimensionality of the parameter space, though this is at the expense of the estimated cross-dependencies being biased towards zero. The performance of shrinking towards the proposed diagonal matrix is comparable to shrinking towards the scaled identity matrix.
Altogether, it is clear that the smoothed periodogram matrix is not a good estimator for the spectral density matrix of high-dimensional time series data. Each of the three shrinkage estimators we have considered improved on the smoothed periodogram matrix. The choice of the shrinkage target, however, is not clear, and is likely to be dependent on the problem at hand. Shrinking towards a diagonal matrix leaves each autospectrum as a free parameter, in contrast to the scaled identity matrix which averages across all the autospectra; if there is heterogeneity in the shapes of the autospectra, as was the case in our simulated data, then it may be better to shrink towards a diagonal matrix. If matrix inversion is necessary for the statistics of interest, as is the case for computing PCCoh, we recommend shrinkage towards the diagonal or towards the scaled identity matrix because they both give estimates which are regularized over the dimensions. We only recommend to consider shrinking towards the VAR model if the length of the time series is long relative to its dimensionality.

Description of the data
Resting-state fMRI studies have provided evidence on using functional connectivity (FC), conceptually defined as the temporal dependencies across different regions of the brain [16], as a biomarker for various diseases [14,13]. Recently, test-retest analyses have been conducted to investigate the reliability of FC in resting-state fMRI studies [28,12]. Partial cross-coherence (PCCoh) has been successfully used in resting-state FC studies and there is evidence that the frequency band [0.01 0.10] Hertz carry the relevant signal [26,27]. Our interest in this study is to investigate the effects of regularization on the smoothed periodogram matrix used to obtain estimates of PCCoh in a test-retest analysis. In the following test-retest data set, the same subjects were scanned at different sessions, though without any changes to the scanning protocols, and so ideally, under the assumption that the brain dynamics do not change across sessions, the estimates of PCCoh are robust with respect to the sessions and to the noise.
We analyzed a resting-state fMRI data set of 25 participants (mean age 29.44 ± 8.64, 10 males) that is publicly available at NITRC (http://www.nitrc.org/ projects/nyu trt). A Siemens Allegra 3.0-Tesla scanner was used to obtain three resting-state scans for each participant. Each scan consisted of T = 197 contiguous EPI functional volumes with a time repetition (TR) = 2000 ms. Scans 2 and 3 were conducted in a single session 45 minutes apart and were 5-16 months (mean 11 ± 4 months) after scan 1. During each scan, each participant was asked to relax and remain still with eyes open during the scan. The raw images were preprocessed as follows: they were 1) motion corrected, 2) normalized into the Montreal Neurological Institute space, 3) removed of nuisance signals, namely the six motion parameters, signals from white matter and the cerebrospinal fluid, and the global signal, and then 4) spatially smoothed using a Gaussian kernel with full-width half-maximum 6mm. Because we will perform a test-retest analysis in the frequency domain on the signals, the signals were not passed through a band-pass filter which may introduce another source of variability. These are the same preprocessing procedures carried out by Fiecas et al. [12].
To obtain anatomically defined regions-of-interest, we used the Anatomical Automatic Labeling (AAL) atlas, which parcellates the whole brain into 90 different regions [30]. Each region's mean time course was obtained by averaging the fMRI time series over all of the voxels within the region. Each regional time course was then detrended and standardized to unit variance. Thus, the data in hand is a P = 90 dimensional fMRI time series of length T = 197 for each of the twenty-five subjects and in each of the three sessions.

Overview of the statistical procedure
We smoothed each element of the periodogram matrix with a M T = 31-point Hamming window. Thus, the effective sample size for estimating the spectral density matrix at each discrete frequency is smaller than the dimension P = 90 of the time series. We could not obtain estimates of PCCoh from the (unregularized) smoothed periodogram matrix because the smoothed periodogram matrix was not invertible and so we used our proposed shrinkage estimator to regularize the smoothed periodogram matrix. We generated B = 200 bootstrap samples of the time series data using the multivariate TFT using a pre-regularized estimator such that the minimum eigenvalue at each frequency is at least 0.01. Though the pre-regularization scalar varied across frequencies for each subject in each session, within the frequency band [0.01 0.10] Hertz the pre-regularization scalar was constant at 0.01 because the negative eigenvalues were small in magnitude. By comparison, if we were to shink towards the scaled identity target, within the frequency band [0.01 0.10] Hertz the scale, which also varied across frequencies for each subject in each session, ranged from 0.339 to 3.518, with a median value of 0.757. Thus, the pre-regulization scalar was relatively much smaller. For each subject and each of the three sessions, we computed the PCCoh between each dimension in our 90-dimensional time series, so that for each subject, we have 4005 many estimates of PCCoh. PCCoh estimates were averaged within the frequency band [0.01 0.10] Hertz.
We performed a test-retest analysis on each of the 4005 estimates in order to investigate the stability of partial cross-coherence as an estimate of the conditional dependencies between different regions. We calculated the intraclass correlation coefficient (ICC) to investigate how much each source of variability contributed to the overall variability in the estimates. To compute the ICC, consider first the following random-effects ANOVA model: ρ jk = ρ + α j + β k + ǫ jk , where ρ jk is the estimated PCCoh for subject j in session k, ρ is PCCoh at the population-level, α j ∼ N (0, σ 2 α ) is the subject-effect, β k ∼ N (0, σ 2 β ) is the session-effect, and ǫ jk ∼ N (0, σ 2 ǫ ) is noise [29]. Using this model, we can define the ICC to be ICC = σ 2 α /(σ 2 α + σ 2 β + σ 2 ǫ ), which is the proportion of variability in the estimates of PCCoh that is attributed to the subjects. Data from all three sessions were used to compute the "overall" ICC; only data from Sessions 1 and 2 were used to compute the "long-term" ICCs and only data from Sessions 2 and 3 were used to compute the "short-term" ICCs.

Results of the test-retest analysis
Overall, long-term, and short-term ICCs, along with the estimates of subject, session, and noise effects, of PCCoh averaged over the 4005 estimates are shown in Table 5. Though it may seem that the ICCs are low, this is the case with resting-state fMRI data, and our estimates of ICC are within the same range as those reported by Shehzad et al. [28] and Fiecas et al. [12]. As reported by

M. Fiecas and R. von Sachs
Fiecas et al. [12], there is a positive relationship between ICC and PCCoh, and because a substantial proportion of the 4005 PCCoh values were small, a large number of the 4005 PCCoh values also yielded small ICCs. We point out that the ICC only quantifies the effects of the elapsed time across scanning sessions, and so a higher ICC does not mean that the estimates are more accurate, but rather, they indicate that they are less variable across scanning sessions. This data set perfectly illustrates the need for regularization of estimates of the spectral density matrix. In particular, we emphasize that if we did not regularize the smoothed periodogram matrix, then matrix inversion would not have been possible, and thus, obtaining estimates of PCCoh would not have been possible. The results of our test-retest analysis are identical to the results obtained by Fiecas et al. [12], who used the scaled identity matrix as the shrinkage target. Using a diagonal shrinkage target can better capture the heterogeniety in the autospectra as suggested by Figure 1. Thus, we obtain better fit to the data while simulateneously achieving similar performance in test-retest reliability for the estimates of PCCoh as that when the shrinkage target is the scaled identity matrix.

Theoretical validity of the bootstrap estimates
Our estimate of the shrinkage weight uses statistics computed from the bootstrapped distribution. We show this strategy to be theoretically sound by showing that the bootstrapped distribution well-approximates the asymptotic distribution of the estimate of the parameter of interest, which we assess using Mallows' d 2 metric [23,3]. The d 2 -distance between distributions F 1 and F 2 is d 2 (F 1 , F 2 ) = inf(E|X 1 − X 2 | 2 ) 1/2 , where the infimum is taken over all random variables X 1 and X 2 with marginal distributions F 1 and F 2 , respectively. Formally, one has to consider the bootstrap distribution as a conditional distribution given the data X(1), . . . , X(T ) and show the convergence in probability of this distance to zero. In our case it will be sufficient to establish convergence of the distribution of the bootstrapped quantities, appropriately standardized, to the normal distribution (as in Franke and Härdle [15]), accompanied by the convergence of the first two moments of this distribution. In order to do so we will first derive the results for the convergence of the distribution of the considered estimators in the "real world" and then show that the same Central Limit Theorem holds in the "bootstrap world". Moreover, we treat the first and second moments in these central limits: thanks to a well-chosen rate of convergence of smoothing span and order of autoregressive fit, respectively, the squared bias term is asymptotically vanishing, hence asymptotic normality with closeness of the asymptotic variances serves us to show the required convergence in the Mallows' d 2 metric. With this, an empirical estimator of the variance in the bootstrap world (the "bootstrap variance") approaches the variance in the "real word". Applying this argument to the distributions of our two used estimators, the smoothed periodogram f T (ω) and the parametric fit D T (ω), we derive that our bootstrap consistently estimates the variance of these two estimators. A continuous mapping argument ("Slutsky's theorem") will finally be used to show that the bootstrapped optimal shrinkage weight W T (ω) is a con-sistent estimator of the true optimal one W T (ω), meaning that with T → ∞ their differences converges to zero in probability. The details of this argument are presented now along the following three theorems.
First, we give the theoretical validation to our bootstrap procedure for obtaining statistics about the smoothed periodogram matrix. For simplicity, we assume that the smoothing kernels and smoothing spans used to estimate each element of the smoothed periodogram matrix are the same span, i.e., K Theorem 5.1. Suppose that X(t) is a (P -dimensional) linear process with (element-wise) absolute summable coefficient matrix driven by independent and identically distributed innovations with finite fourth moments and non-singular variance-covariance matrix. Suppose further that the spectral density matrix of X(t) is (element-wise) two times differentiable on [−0.5, 0.5], and that it is estimated with the smoothed periodogram matrix f T (ω) using a kernel function of order 2 (such as a symmetric kernel) with smoothing span M T . Assume that M T → ∞ and M 5 T /T 4 → 0 as T → ∞. Suppose f T (ω) is used to generate the bootstrapped data X + (t). Then for any given frequency ω and j, k ∈ {1, . . . , P }, In this theorem the assumptions on the smoothing parameter M T are somewhat classical. Recalling that the kernel span M T is related to the kernel bandwidth b T (also often used in the literature) by M T = b T T , we observe that its asymptotic behavior is equivalent to assuming b 5 T T → 0 as T → ∞. Note that this condition leads to a slightly smaller than the usual MSE-optimal bandwidth for nonparametrically estimating a spectral density under the given conditions of regularity (which is b T ∼ T −1/5 ). As an advantage and in fact motivated from this, the aforementioned convergence in distribution of the appropriately scaled spectral estimates can directly be stated by centering about the population quantities without needing to consider a bias term; for details we refer to the proof of Theorem 5.1 in the Appendix. As a consequence, in the particular context of bootstrapping kernel spectral estimates and in contrast to Franke and Härdle [15], the smoothing parameters of the kernel estimators can be chosen to be the same in the bootstrap and in the "real" world.
Second, we address the asymptotic behaviour of the distribution of D T (ω). Under additional conditions on the underlying time series process, derived by [1], we will establish an asymptotic result for D T (ω) which is similar to the one of Theorem 5.1, established for the smoothed periodogram matrix f (ω). For this we have to suppose that p = p T = min 1≤j≤P p j tends asymptotically to infinity as T → ∞, meaning that for all elements of the diagonal matrix D T (ω) we assume an asymptotically growing order of the AR-fit. More precisely we use the conditions of [1], Theorem 6, applied to each dimension of the true underlying multivariate process separately, to show asymptotic normality of the univariate autoregressive fits and to control, in particular, the asymptotic behavior of both bias and variance of D T (ω) in comparison with the rate of convergence M −1/2 T of f T (ω). This gives us the following analog of Theorem 5.1.
Theorem 5.2. Suppose that each marginal X i (t), i = 1, . . . , P of the true underlying process X(t) can be represented as an invertible linear process driven by independent and identically distributed innovations with finite fourth moments, and denote the autoregressive coefficients of its AR representation by {a k } k≥1 . Suppose further that i) M T → ∞ and M 5 T /T 4 → 0 as T → ∞, that ii) p T → ∞ and p 3 T /T → 0 as T → ∞ and iii) p = p T is chosen such that T 1/2 ℓ≥1 |a p+ℓ | → 0 as T → ∞, and that finally iv) M T p T /T → 0 as T → ∞. Then for any j ∈ {1, . . . , P }, whereas its variance is of order o(M −1 T ). The proof of this Theorem 5.2 is a copy of the proof of Theorem 5.1, a direct consequence of the asymptotic normality of D jj,T (ω) stated in Theorem 6 of [1]. For some more details, we refer to the appendix. Note that, in particular, we obtain the asymptotic variance of each diagonal element of D T (ω), which turns out to only depend on the true underlying spectrum. With this result, the sample variance of the bootstrapped distribution of each diagonal element of the bootstrapped D + T (ω), is a valid estimator for the variance of D T (ω). Now we have prepared the ground to address the validity of our bootstrap estimator of the optimal shrinkage weights W T (ω) which we recall from equation (3) to be For details of the proof, we refer again to the Appendix section. However, for the reader's insight, we note already here that it is in fact not necessary to treat the covariance term arising both in numerator and denominator of Equation (3) (which we actually would not be able to do as we cannot control the joint distribution of ( f T (ω), D T (ω))): under the conditions on the rate of increase of p T given in Theorem 5.2, cov( f T (ω), D T (ω)) = o(M −1 T ), and hence converges to zero faster than does the dominating term var( f T (ω)) (which is of order O(M −1 T )). In total, we observe the validity of the multivariate TFT bootstrap, and consequently, the validity of our estimate of the optimal shrinkage weight.

Discussion
We further developed methodology on addressing the challenge between balancing spectral fit versus regularization of estimates of high-dimensional spectral matrices, two extremes that were developed by Böhm and von Sachs [5] and Fiecas and Ombao [10], and then proceeding with a general algorithm that contains those two works as special cases. As previously investigated in the cited literature and in this work, the smoothed periodogram matrix, which is the classical nonparametric spectral estimator, needs to be regularized in high dimensions. Hence, we chose a shrinkage target that sufficiently stabilizes the regularity of the smoothed periodogram matrix and serves as a reasonable, though deliberately misspecified, parametric fit. We chose a diagonal matrix, composed of a collection of univariate AR-fits to each autospectrum, hence, representing a good compromise between the highly regularizing fully misspecified multiple of the identity as in Böhm and von Sachs [5] and the fully parametric VAR fit of Fiecas and Ombao [10]; the diagonal structure regularizes over the dimensions and substantially reduces the number of parameters from that of a full VAR model, and simultaneously, modeling the diagonal elements results in better fit for the autospectra of the process.
One could, however, choose any valid shrinkage target, and use our procedure as outlined in Section 2.3.2. Another possible shrinkage target, for example, is a block-diagonal matrix. For instance, in the context of functional connectivity analyses for fMRI, one could arrange the blocks to correspond to known functional networks in order to obtain improved fit to the cross-dependencies between the dimensions within each known network. Moreover, the block diagonal structure will regularize the smoothed periodogram matrix, though only mildly so relative to our proposed diagonal shrinkage target. In the context of high-dimensional time series data, we recommend picking a shrinkage target which is highly regularized and has a low-dimensional parameter space.
Our second important contribution is the multivariate TFT bootstrap and its theoretical validity. Our multivariate TFT bootstrap can be considered as the multivariate generalization of an instance of the univariate TFT bootstrap of Kirch and Politis [19]. We showed the utility of the multivariate TFT bootstrap in estimating the optimal shrinkage weights of the shrinkage estimator. The multivariate TFT bootstrap samples from the multivariate complex normal distribution. A fully nonparametric bootstrap on the residuals would better capture the higher-order structure of the data in order to reproduce the whole distribution of the time series. However, for our purposes of using the multivariate TFT bootstrap to estimate the shrinkage weight, we are interested in reproducing the variance of estimates of the spectral density matrix, which asymptotically does not depend on higher-order moments of the data. In a simulation study not reported here, we investigated the impact of skewness in the underlying data on the bootstrap procedure for estimating the variance of the smoothed periodogram. We saw that, even though the data was skewed and the bootstrap samples were not skewed, the resulting estimates of the variance of the smoothed periodogram, which asymptotically does not depend on higher-order quantities, were still very satisfactory, even for small samples. Accurate estimates of the variance of the smoothed periodogram is sufficient for our purpose of estimating the shrinkage weights.
To our knowledge, the only other method for bootstrapping multivariate time series data that has been shown to give theoretically valid bootstrap samples in both the time and frequency domains is the multiple hybrid bootstrap proposed by Jentsch and Kreiss [18]. This bootstrap procedure, which is the multivariate generalization of the bootstrap procedure proposed by Kreiss and Paparoditis [21], first fits a VAR model to the data, resamples the residuals, and then applies a bias-correction in the frequency domain. One could also use this multiple hybrid bootstrap to estimate the shrinkage weight since it has the attractive feature that it can create bootstrap samples in the time domain even though the resampling takes place in the frequency domain. However, the multiple hybrid bootstrap requires one to fit a VAR model to the data. If the length of the time series is short relative to the dimensionality, fitting a VAR model may not be possible. Moreover, matrix inversion is necessary to do the bias-correction in the frequency domain in their procedure, and so the performance of this procedure may not be optimal in the context of high-dimensional time series. We speculate, however, that, in the same spirit as our diagonal shrinkage target, one could modify the multiple hybrid bootstrap whenever the dimensionality is large by instead fitting independent univariate AR models to each dimension of the data.
show the sufficient assertion For ease of notation, henceforth we denote L + (·), E + (·), and cov + (·, ·) to denote the law, expectation, and covariance, respectively, conditional on the data X(1), . . . , X(T ). According to Mallows [23] and Lemma 8.8 in Bickel and Freedman [3], we can split Equation (5) into two terms, namely a term each for variance V 2 T and squared bias B 2 T , given by )))} and [3], we have convergence in Mallow's d 2 metric if we also have convergence in the first two moments and convergence in distribution.

By Lemma 8.3 in Bickel and Freedman
First, we treat the variance term. Recall the following result, which can be found in Brillinger [6]: Moreover, the asymptotic distribution of the smoothed periodogram matrices is where the elements of the asymptotic variance-covariance matrix W are obtained from Equation (6) [6]. For the bootstrapped smoothed periodogram matrices, recall that the bootstrapped time series have f (ω) as the "true" spectral density matrix in the bootstrap world. Given the time series data, the asymptotics of the bootstrapped smoothed periodogram follow the above derivations of the kernel estimator in the "real world" as can be seen from, e.g. Franke and Härdle [15] (with the difference that in contrast to their approach we can choose the same M T in both worlds). In particular, we have that the difference between M T cov + ( f + jk,T (ω), f + lm,T (λ)) and      ( f jl,T (ω) f mk,T (ω) + f jm,T (ω) f lk,T (ω)) K 2 (u)du, ω = λ ∈ {0, ±0.5}, f jl,T (ω) f mk,T (ω) K 2 (u)du, ω = λ ∈ (−0.5, 0.5), 0, ω = λ, tends to zero in probability with T → ∞.
Furthermore, we have where the elements of the asymptotic variance-covariance matrix W are obtained from the above equation (7). Because each element of the smoothed periodogram matrix is consistent, then || W − W|| 2 converges in probability to 0, and consequently V 2 T P → 0. By Brillinger [6], our assumption on the rate of convergence of the smoothing span M T , and under the given conditions on the spectrum and the used kernel of second order, we have E( f jk,T (ω l ) − f jk (ω l )) = O((M T /T ) 2 ).
Since the asymptotics of the bootstrapped smoothed periodogram follow those from the "real world", we have It follows that B 2 T P → 0. Altogether, it follows that we have convergence in probability to zero of the considered Mallow's d 2 metric.
Proof of Theorem 5.2. We first prove the second part of the theorem. We compare the convergence of the bias and the variance of D jj,T (ω) with that of the nonparametric fit, for which we need condition iv). For this we first observe that condition i) simply retakes the condition M 5 T /T 4 → 0 of Theorem 5.1 coming from the control of the squared bias therein. Using the conditions ii) and iii) taken from Theorem 6 of Berk [1], the bias of D jj,T (ω) is of the order of o( p T /T ) which is, under condition iv), well of order o(M −1/2 T ) whereas its variance is of order O(p T /T ) which is, again under condition iv), of order o(M −1 T ). Now we prove the first part of the theorem. For the reader's convenience we state the Central Limit Theorem of Berk [1], Theorem 6, reformulated for the diagonal elements of D T (ω) (for 0 < ω < 0.5, to simplify): L T /p T ( D jj,T (ω) − f jj (ω)) ∼ AN (0, 2 f 2 jj (ω)), for j = 1, . . . , P . Based on this asymptotic normality, the proof of the convergence of the Mallows' metric follows the lines of the proof of Theorem 5.1, replacing the rate of convergence M −1 T of the variance by the appropriate rate p T /T coming from Berk [1], Theorem 6. In order to transfer the convergence to the asymptotic distribution from the "real world" to the bootstrap world, we use the following arguments of the (univariate) TFT-bootstrap of Kirch and Politis [19].
Although we have to go back into the time domain to obtain our parametric spectral estimator (via the estimates of the autoregressive parameters constructed in the time domain, such as the Yule-Walker estimators), the (univari-ate) TFT-bootstrap is valid for sample autocorrelations [19] and, in our asymptotic context of p T → ∞, p/T → 0 as used by Berk [1], the TFT-bootstrap is valid for sample autocovariances. This is because, in our asymptotic set-up, a possible contribution of the fourth-order cumulant of the considered time series in the time domain, as typically arising in the asymptotic normality of the timedomain estimator of the innovations variance of the linear process, drops out (as it is known to be the case for certain ratio statistics and in particular also for the nonparametrically smoothed periodograms). Thus, for the bootstrapped estimators we have L + T /p T ( D + jj,T (ω) − f jj,T (ω))|X(1), . . . , X(T ) ∼ AN (0, 2 f 2 jj,T (ω)), for j = 1, . . . , P . We decompose Mallow's d 2 metric into a squared bias and a variance term as before. We have the convergence in probability to zero of the bias term per the second part of the theorem. Per the consistency of the smoothed periodogram, we also have the convergence of the variance term, and hence, we have the convergence in probability to zero of the difference in the limiting variances. Thus, we have convergence in probability to zero of the considered Mallow's d 2 metric.
the P × P correlation matrix of the innovations is the block diagonal matrix R = diag(R 3 , R 3 , . . . , R 3 ), so that R is composed of P/3 many blocks. The coefficient matrix Φ is defined in a similar manner. For the first coefficient matrix, first define a 3 × 3 coefficient matrix Then the P × P coefficient matrix is Φ = diag(Φ 3 , Φ 3 , . . . , Φ 3 ). Just as with the correlation matrix, the coefficient matrix is composed of P/3 many blocks.
For the vector autoregressive model, we use the same procedure for generating the innovations. The first-order vector autoregressive model has the form where we used the same coefficient matrix Φ.