A nonparametrically corrected likelihood for Bayesian spectral analysis of multivariate time series

This paper presents a novel approach to Bayesian nonparametric spectral analysis of stationary multivariate time series. Starting with a parametric vector-autoregressive model, the parametric likelihood is nonparametrically adjusted in the frequency domain to account for potential deviations from parametric assumptions. We show mutual contiguity of the nonparametrically corrected likelihood, the multivariate Whittle likelihood approximation and the exact likelihood for Gaussian time series. A multivariate extension of the nonparametric Bernstein-Dirichlet process prior for univariate spectral densities to the space of Hermitian positive definite spectral density matrices is specified directly on the correction matrices. An infinite series representation of this prior is then used to develop a Markov chain Monte Carlo algorithm to sample from the posterior distribution. The code is made publicly available for ease of use and reproducibility. With this novel approach we provide a generalization of the multivariate Whittle-likelihood-based method of Meier et al. (2020) as well as an extension of the nonparametrically corrected likelihood for univariate stationary time series of Kirch et al. (2019) to the multivariate case. We demonstrate that the nonparametrically corrected likelihood combines the efficiency of a parametric model with the robustness of a nonparametric one. Its numerical accuracy is illustrated in a comprehensive simulation study. We illustrate its practical advantages by a spectral analysis of two environmental time series data sets: a bivariate time series of the Southern Oscillation Index and fish recruitment, and a time series of windspeed data at six locations in California.

Bayesian nonparametrics, completely random measures, Markov chain Monte Carlo

Introduction
Statistical models can be classified into parametric and nonparametric models. Parametric techniques assume a pre-specified model with a finite number of parameters from a particular family of distributions for the data. When the specification is correct, these techniques are powerful and efficient. However, the results of parametric approaches are easily affected by misspecification, and their advantages are significantly diminished in that case. In contrast, nonparametric techniques do not make stringent parametric assumptions about the data-generating process. As a result, nonparametric models, which can have an infinite number of parameters, are better at adapting to varying complexities of the data than parametric approaches.
Bayesian nonparametric time series analysis has drawn increasing attention over the last couple of decades. Many techniques applied in the time domain are based on hidden Markov models (Fox et al. (2011); Fox et al. (2014); Aicher et al. (2019)) and factor models (Carvalho et al. (2008); Rodríguez and Dunson (2011); Barigozzi and Hallin (2016); Kalli and Griffin (2018)). In the frequency domain, numerous methods have been developed that use the Whittle likelihood (Whittle (1957)) for univariate stationary Gaussian time series, such as Choudhuri et al. (2004a), Rousseau et al. (2012), Cadonna et al. (2017), Kirch et al. (2019), Edwards et al. (2019), Rao and Yang (2021) and Maturana-Russel and Meyer (2021). The Whittle likelihood approximates the Gaussian likelihood by a pseudo-likelihood, making use of the asymptotically independent discrete Fourier coefficients that have variances equal to the spectral density at the corresponding Fourier frequencies. The merits of using the Whittle likelihood are its direct dependence on the spectral density and reduced computational costs compared to the true Gaussian likelihood. The Whittle likelihood has also been extensively applied for the analysis of multivariate stationary time series by diverse approaches, such as Rosen and Stoffer (2007), Krafty and Collinge (2013), Cadonna et al. (2019), Meier et al. (2020) and Hu and Prado (2023).
As per the preceding discussion, parametric approaches are more effective than nonparametric ones when the specified model closely matches the true data-generating process. However, the performance of parametric methods degenerates significantly under misspecification. Therefore, we propose a novel method which combines the advantages of both parametric and nonparametric approaches for multivariate spectral density estimation of stationary Gaussian time series within the Bayesian framework. To elaborate, we specify a parametric model for the observations in the time domain (e.g. a VAR model) and propose a nonparametric correction of the parametric likelihood in the frequency domain. We show mutual contiguity of the multivariate Gaussian, the Whittle and the nonparametrically corrected likelihood. In particular, when the parametric working model is independent and identically distributed (i.i.d.) Gaussian white noise, the nonparametrically corrected likelihood reduces to the multivariate Whittle likelihood. Thus, with this new corrected likelihood, we effectively propose a novel generalisation of the multivariate Whittle likelihood. This idea has been applied in the univariate case by Kirch et al. (2019). Following the approach of Choudhuri et al. (2004a), they specified a Bernstein-Dirichlet process prior on the spectral density. However, no direct extension of the Dirichlet process to the multivariate scenario is available (Meier, 2018). Therefore, Meier et al. (2020) proposed a Gamma process prior on the space of Hermitian positive definite (Hpd) matrix-valued functions. This prior has been successfully implemented along with the Whittle likelihood in their work on nonparametric Bayesian spectral inference. Nonetheless, the case study in Meier et al. (2020) showed that spectral density estimates based on the multivariate Whittle likelihood and the Hpd Gamma process are generally too smooth to capture peaks and sharp features of the spectra. To address this shortcoming, we adapt the Hpd Gamma process prior and update it with our proposed generalised multivariate Whittle likelihood. This yields significant improvements. In particular, when the parametric likelihood specified in the time domain is sufficiently close to the true model, our procedure performs almost as efficiently as the purely parametric procedure. On the other hand, in the case of model misspecification, our approach is almost as robust as the purely nonparametric approach. More importantly, in the partially misspecified case, our approach outperforms both nonparametric and parametric models.
The remainder of this article is organised as follows. Section 2 proposes the nonparametrically corrected likelihood used for Bayesian spectral analysis of stationary multivariate time series and states its asymptotic properties. Section 3 specifies the prior and describes the posterior computation. Section 4 presents the results of the simulation study. Section 5 demonstrates the application of our approach to two real case studies. Section 6 gives conclusions and an outlook on future work. The supplementary material contains the proofs and additional results of the simulation study.

Extension of the Multivariate Whittle Likelihood
In this section, we introduce a likelihood for multivariate stationary time series which nonparametrically corrects a parametric likelihood. It can be regarded as an extension of the multivariate Whittle likelihood. To this end, we first revisit the multivariate version of the Whittle likelihood, which was initially proposed by Whittle (1957) in the univariate case.

Review of the Whittle likelihood
Let {Z_t = (Z_{t,1}, ..., Z_{t,d})^⊤, t = 0, 1, ...} be a Gaussian stationary time series in ℝ^d with zero mean and autocovariance matrix Γ(h). Under the assumption that all autocovariance functions are absolutely summable, i.e., Σ_{h=−∞}^{∞} |Γ_{jk}(h)| < ∞ for all j and k, the spectral density matrix of {Z_t} is given by the Fourier transform of the autocovariance matrix,

f(λ) = (2π)^{−1} Σ_{h=−∞}^{∞} Γ(h) e^{−ihλ},  for 0 ≤ λ ≤ 2π.

Thus, f(λ) is a Hermitian positive definite (Hpd) matrix and there is a one-to-one relationship between the spectral density and the autocovariance matrix. The diagonal elements of f(λ) are real-valued and equal to the marginal spectral densities of each of the d univariate component series. The off-diagonal elements f_{jk}(λ) for j ≠ k are complex-valued in general and referred to as the cross-spectra of {Z_{t,j}, t = 0, 1, ...} and {Z_{t,k}, t = 0, 1, ...}.
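The Fourier-transform formula above can be checked numerically for a simple process. The following sketch (illustrative, not from the paper) computes the spectral density matrix of a bivariate VMA(1) process from its autocovariances; the coefficient matrix Theta is an arbitrary choice of ours.

```python
import numpy as np

# Illustrative sketch: spectral density matrix of a bivariate VMA(1)
# process Z_t = e_t + Theta e_{t-1}, e_t ~ N(0, I_2). Its autocovariances
# are Gamma(0) = I + Theta Theta^T, Gamma(1) = Theta, Gamma(-1) = Theta^T,
# and all other lags vanish, so the Fourier series is a finite sum.
Theta = np.array([[0.5, 0.2],
                  [0.1, 0.4]])

def vma1_spectral_density(lam, Theta):
    """f(lam) = (2*pi)^{-1} * sum_h Gamma(h) * exp(-i*h*lam)."""
    G0 = np.eye(2) + Theta @ Theta.T
    f = G0 + Theta * np.exp(-1j * lam) + Theta.T * np.exp(1j * lam)
    return f / (2 * np.pi)
```

The result agrees with the transfer-function form (2π)^{−1}(I + Θe^{−iλ})(I + Θe^{−iλ})*, which also confirms that f(λ) is Hermitian.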
According to the well-known asymptotic properties of the Fourier coefficients (Brockwell and Davis, 1991), the periodogram matrices I_n(λ_j) at different Fourier frequencies λ_j = 2πj/n are asymptotically independent, their expectations converge to the corresponding spectral density matrices with increasing sample size, i.e. E[I_n(λ_j)] → f(λ_j), and the periodogram matrices asymptotically follow complex Wishart distributions with means f(λ_j).
Based on the asymptotic distribution of the Fourier coefficients, the Whittle likelihood P_W is defined (up to a multiplicative constant) as

P_W(Z_n | f) ∝ ∏_{j=1}^{N} det(f(λ_j))^{−1} exp(−tr(f(λ_j)^{−1} I_n(λ_j))),  (4)

with N = ⌈n/2⌉ − 1. For the purpose of spectral density estimation, the boundary Fourier frequencies λ_0 = 0 and λ_{n/2} = π (the latter for n even) are not considered in (4), since the corresponding Fourier coefficients are proportional to the sample mean and the alternating sample mean, respectively. Under certain assumptions on the time series, the Whittle likelihood provides an approximation to the true likelihood but is exact only in the case of Gaussian white noise (Shao and Wu, 2007). In contrast to the true Gaussian likelihood, the Whittle likelihood depends directly on the spectral density matrix. Furthermore, an evaluation of the Whittle likelihood only requires the inversion of N matrices of dimension d × d instead of the inversion of one dn × dn matrix for the true Gaussian likelihood. This requires O(nd³) floating point operations, a significant reduction in computational costs compared to the evaluation of the true Gaussian likelihood.
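A Whittle log-likelihood of the form (4) can be sketched in a few lines; the following is a minimal illustration assuming the periodogram convention I_n(λ_j) = d(λ_j)d(λ_j)* with d(λ_j) the Fourier coefficients normalised by (2πn)^{−1/2}. The function name and the normalisation convention are ours, not the paper's.

```python
import numpy as np

# Hedged sketch of a multivariate Whittle log-likelihood:
# log p_W = - sum_j [ log det f(lam_j) + tr(f(lam_j)^{-1} I_n(lam_j)) ]
# over the non-boundary Fourier frequencies lam_j = 2*pi*j/n.
def whittle_loglik(Z, f):
    """Z: (n, d) real time series; f: callable lam -> (d, d) Hpd matrix."""
    n, d = Z.shape
    # Fourier coefficients; the (2*pi*n)^{-1/2} scaling is one convention
    D = np.fft.fft(Z, axis=0) / np.sqrt(2 * np.pi * n)
    N = int(np.ceil(n / 2)) - 1          # boundary frequencies excluded
    ll = 0.0
    for j in range(1, N + 1):
        lam = 2 * np.pi * j / n
        fj = f(lam)
        dj = D[j]
        ll -= np.log(np.linalg.det(fj)).real
        # tr(f^{-1} d d^*) = d^* f^{-1} d, via a linear solve (no inverse)
        ll -= (dj.conj() @ np.linalg.solve(fj, dj)).real
    return ll
```

Only N systems of size d × d are solved, matching the O(nd³) cost noted above.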

Nonparametric likelihood correction
For univariate time series, a nonparametrically corrected likelihood has already been proposed by Kirch et al. (2019). The idea is to begin with a parametric working likelihood in the time domain and then apply a nonparametric correction in the frequency domain. This ensures that the likelihood has the correct second-order dependence structure even if the parametric model is misspecified. Here we propose a nonparametrically corrected likelihood for multivariate time series.
To illustrate that the idea stems from the multivariate Whittle likelihood and thus provides an extension and refinement thereof, we first consider multivariate Gaussian white noise, i.e. Z_t ∼ N_d(0, I_d), as the parametric working model. Note that the spectral density matrix of this parametric working model is (2π)^{−1} I_d. However, if the model is misspecified, i.e., in the case of coloured noise, the estimate of the spectral density matrix based on this model may not be efficient. Therefore, we nonparametrically correct for potential model misspecification in the frequency domain by multiplying the Fourier coefficients by the square root of a correction matrix and backtransforming to the time domain via an inverse discrete Fourier transform. In the case of a white noise working model, the correction matrix is the block diagonal matrix with blocks 2π f(λ_j). This construction can be summarised as follows, where F_n denotes the Fourier transform operator such that F_n Z_n = Z̃_n: starting from Z_n following the parametric working model in the time domain, the Fourier coefficients Z̃_n are rescaled by the correction in the frequency domain and transformed back. As is readily verified, the density induced by this linear transformation under the Gaussian white noise working model turns out to be the Whittle likelihood defined in (4), up to the boundary frequencies. This derivation of the Whittle likelihood as a nonparametrically corrected parametric likelihood provides a way towards a generalization by starting with a more adequate parametric working model than Gaussian white noise. We suggest using a VAR model, since the dependence structure of any causal Gaussian linear process can be captured by a VAR model of sufficiently large order. However, any other parametric time series model could be used. Denote by f_pa the spectral density matrix corresponding to the parametric model.
Then the general correction matrix C_n is defined analogously as the block diagonal matrix whose j-th diagonal block corrects the parametric spectral density f_pa(λ_j) towards the target spectral density f(λ_j). (5) Denoting the parametric likelihood of Z_n by p_pa, the corrected likelihood (6) is obtained by applying this frequency-domain correction to p_pa, with θ being the parameters of the parametric working model.
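One plausible convention for such a correction block rescales the Fourier coefficients from the parametric to the target spectrum via Hermitian matrix square roots, C_j = f(λ_j)^{1/2} f_pa(λ_j)^{−1/2}; the exact normalisation in (5) may differ, so the following is purely illustrative.

```python
import numpy as np

# Hedged sketch of one diagonal block of a frequency-domain correction
# matrix: C_j maps coefficients with spectrum f_pa(lam_j) to coefficients
# with spectrum f(lam_j), since C_j f_pa C_j^* = f. Names are ours.
def hpd_sqrt(A):
    """Hermitian square root of an Hpd matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(w)) @ V.conj().T

def correction_block(f_j, f_pa_j):
    """C_j = f_j^{1/2} f_pa_j^{-1/2} (illustrative convention)."""
    return hpd_sqrt(f_j) @ np.linalg.inv(hpd_sqrt(f_pa_j))
```

If the working model is correct (f_j = f_pa_j) the block reduces to the identity, which is consistent with the corrected likelihood coinciding with the parametric likelihood in that case.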
The following theorem states the properties of the corrected likelihood. Most importantly, the corrected likelihood is equivalent to the parametric likelihood if the parametric model is the true data-generating model, and the periodogram under the corrected likelihood is asymptotically unbiased for the true spectral density matrix regardless of whether the parametric model is correctly specified.
Theorem 1. Let Z_n ∈ ℝ^{dn} be a stationary time series with zero mean and spectral density f. Let Γ_{n,pa} and f_pa be the autocovariance matrix and the spectral density matrix of the parametric model used in the likelihood correction.
(a) If the parametric model coincides with the true data-generating model, then the corrected likelihood coincides with the parametric likelihood.
(b) Under the corrected likelihood, the periodogram is an asymptotically unbiased estimator of the true spectral density matrix at every Fourier frequency λ_j, j = 0, ..., N.
Please refer to Appendix B for the proof of Theorem 1.

Mutual contiguity
In this section, we derive an asymptotic property of the corrected likelihood for Gaussian time series, which is useful, e.g., for the purpose of studying posterior consistency as in Kirch et al. (2019) and Meier et al. (2020). More precisely, we prove mutual contiguity of the corrected likelihood, the Whittle likelihood and the full Gaussian likelihood. This is important since mutual contiguity of two probability measures defined on measurable spaces that change with the sample size n implies that those measures have the same asymptotic properties. The main usefulness of showing mutual contiguity of the corrected likelihood, the Whittle and the true Gaussian likelihood is that if one can prove consistency or find the rate of convergence of an estimator under the Whittle measure, then the estimator is also consistent and has the same rate of convergence under the true Gaussian measure. In the univariate case, mutual contiguity between the true Gaussian likelihood and the Whittle likelihood was proved by Choudhuri et al. (2004b), using Le Cam's first lemma (see Lemma 6.4 in van der Vaart (1998)). Meier et al. (2020) extended the univariate contiguity result to the multivariate case, so it remains to establish mutual contiguity between the corrected likelihood and the Whittle likelihood. To this end, we first state the definition of mutual contiguity and specify the following two assumptions, which generalise Assumption A.1 in Kirch et al. (2019) to the multivariate case.
Two probability measures P_n and Q_n on measurable spaces (Ω_n, 𝒜_n) are mutually contiguous if, for every sequence of measurable sets A_n, P_n(A_n) → 0 implies Q_n(A_n) → 0 and vice versa (cf. Definition 6.3 in van der Vaart (1998)).
Assumption 1. For the parametric working model, the eigenvalues of the spectral density matrix f_pa(·) are uniformly bounded and uniformly bounded away from 0. That is, there exist positive constants b_{0,pa} and b_{1,pa} such that b_{0,pa} ≤ λ_min(f_pa(ω)) ≤ λ_max(f_pa(ω)) ≤ b_{1,pa} for all ω. Moreover, the autocovariance matrix function Γ_pa(·) fulfills Σ_{h=−∞}^{∞} |h|^q ‖Γ_pa(h)‖_F < ∞ for some q > 1.

Assumption 2. The eigenvalues of the underlying true spectral density matrix f(·) are uniformly bounded and uniformly bounded away from 0. That is, there exist positive constants b_0 and b_1 such that b_0 ≤ λ_min(f(ω)) ≤ λ_max(f(ω)) ≤ b_1 for all ω. Moreover, the true autocovariance matrix function Γ(·) fulfills the analogous summability condition Σ_{h=−∞}^{∞} |h|^q ‖Γ(h)‖_F < ∞ for some q > 1.

Assumption 2 was used by Meier et al. (2020) to establish mutual contiguity between the exact Gaussian likelihood and the Whittle likelihood. The restriction on the Frobenius norm of Γ(·) ensures the continuous differentiability of f with derivative being Hölder of order q − 1 > 0. Assumption 1 places the same restrictions on the parametric working model. Under these two assumptions, we can establish mutual contiguity.
Theorem 2. Let Z_n ∈ ℝ^{dn} be a time series with zero mean, spectral density matrix f and autocovariance matrix Γ_n fulfilling Assumption 2. Let f_pa and Γ_{n,pa} be the spectral density matrix and autocovariance matrix of a Gaussian parametric model fulfilling Assumption 1. Then the true joint density of Z_n, the joint density under the Whittle likelihood and the joint density under the corrected likelihood equipped with the Gaussian parametric working model are mutually contiguous.
The proof of Theorem 2 is contained in Appendix B.

Prior specification
For a nonparametric Bayesian approach to spectral density estimation, a nonparametric prior on the space of Hpd matrix-valued functions is required. To this end, we employ the Bernstein-Hpd-Gamma prior introduced by Meier et al. (2020), which combines Bernstein polynomials with the Hpd-Gamma process. Let S̄+_d be the set of d × d Hermitian positive semidefinite (Hpsd) matrices with unit trace and S+_d the set of d × d Hpd matrices, so that S+_d can be identified with S̄+_d × (0, ∞). Let r = tr(U) ∈ (0, ∞) be the radial part and Ū = U/tr(U) ∈ S̄+_d be the spherical part, such that any U ∈ S+_d can be decomposed as U = rŪ. The Hpd-Gamma distribution U ∼ Ga_{d×d}(α, β) is defined via its Laplace transform E[etr(−ΘU)], evaluated at Θ ∈ S+_d, with measure α on S̄+_d and rate function β : S̄+_d → (0, ∞), where etr(A) := exp(tr(A)). When d = 1, the Hpd-Gamma distribution is the same as the univariate Gamma distribution, so α and β can be regarded as extensions of the shape and rate parameters. A random process Φ ∼ GP_{d×d}(α, β) is then called an Hpd-Gamma process on X = [0, π] if the masses it assigns to any finite collection of disjoint non-empty subsets of X are jointly Hpd-Gamma distributed. However, instead of specifying the prior on f directly, we specify the prior on the Hpd correction matrix function. For k ∈ ℕ, we partition [0, π] into k equidistant intervals I_{j,k} = ((j−1)π/k, jπ/k], j = 1, ..., k. The Bernstein-Hpd-Gamma prior is then defined via the mixture

Ψ(ω) = Σ_{j=1}^{k} Φ(I_{j,k}) b(ω/π | j, k − j + 1),

where b(·| j, k − j + 1) denotes the probability density function of a Beta(j, k − j + 1) distribution. The parameter k is a smoothness parameter: the smaller k, the smoother the resulting matrix-valued function Ψ. By putting a discrete prior probability function p(k) on the polynomial degree k, we achieve a data-driven choice of the degree of smoothness.
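The Bernstein mixture above can be sketched numerically. In the sketch below the matrix weights W_j are placeholder Hermitian matrices standing in for the Hpd-Gamma masses Φ(I_{j,k}); the function names are ours.

```python
import math
import numpy as np

# Illustrative sketch of a matrix-valued Bernstein polynomial mixture:
# Psi(omega) = sum_{j=1}^{k} W_j * b(omega/pi | j, k - j + 1) on [0, pi].
def beta_pdf(x, a, b):
    """Beta(a, b) density at x in (0, 1)."""
    c = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return c * x ** (a - 1) * (1 - x) ** (b - 1)

def bernstein_mixture(omega, weights):
    """weights: list of k (d, d) Hermitian matrices (placeholder masses)."""
    k = len(weights)
    x = omega / np.pi
    out = np.zeros_like(weights[0], dtype=complex)
    for j, W in enumerate(weights, start=1):
        out = out + W * beta_pdf(x, j, k - j + 1)
    return out
```

A useful sanity check is the Bernstein identity Σ_j b(x | j, k−j+1) = k: with equal weights I/k the mixture is constant and equal to the identity matrix.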
In the univariate case, a mixture of Bernstein polynomials with weights defined by a Dirichlet process was used by Choudhuri et al. (2004a) and Kirch et al. (2019). Such a Dirichlet process is a normalized Gamma process (see Section 1.2 in Meier (2018)), so the Bernstein-Hpd-Gamma prior can be regarded as a multivariate extension of the Bernstein-Dirichlet process prior from the space of univariate spectral densities to that of spectral density matrices. Analogous to the famous stick-breaking representation of the Dirichlet process, Theorem 4.4 in Meier et al. (2020) shows that, under weak assumptions, Φ can be represented by an almost surely convergent series involving i.i.d. components. A truncation of this infinite series allows the design of an MCMC algorithm to sample from the posterior distribution, as described in Section 3.2. By putting a prior on the parameters of the parametric working model, both parametric and nonparametric components of the model can be inferred simultaneously. This has been considered in the univariate corrected likelihood procedure by Kirch et al. (2019), in which the working model is an autoregressive (AR) time series model and the prior is put on the partial autocorrelations to ensure stationarity and causality. Similarly, in the multivariate case, we can use the VAR time series model as our parametric working model. This parametric model has also been used for the multiple hybrid bootstrap by Jentsch and Kreiss (2010). To elaborate, recall that a VAR(p) model with p ≥ 1 and Gaussian innovations is defined as

Z_t = B_1 Z_{t−1} + ... + B_p Z_{t−p} + e_t,  e_t ∼ N_d(0, Σ),  (11)

with innovation covariance Σ ∈ S+_d(ℝ) and coefficient matrices B_1, ..., B_p; let B = (B_1, ..., B_p) stack all the VAR coefficients of (11). In the implementation, the VAR order p is predetermined by some order selection criterion, such as Akaike's Information Criterion (AIC) by Akaike (1974) or the elbow criterion (see Section 6). For the coefficients we assume a multivariate normal prior, parametrised by its mean and covariance, and we make this prior noninformative by a suitable choice of these hyperparameters.
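For a VAR working model, the parametric spectral density f_pa follows from the standard VAR transfer-function formula; the following sketch assumes that textbook formula and is not taken from the paper's implementation.

```python
import numpy as np

# Sketch: parametric spectral density of a VAR(p) working model,
# f_pa(lam) = (2*pi)^{-1} A(lam)^{-1} Sigma A(lam)^{-*},
# where A(lam) = I - sum_{l=1}^{p} B_l exp(-i*l*lam).
def var_spectral_density(lam, B_list, Sigma):
    """B_list: list of (d, d) coefficient matrices B_1, ..., B_p."""
    d = Sigma.shape[0]
    A = np.eye(d, dtype=complex)
    for l, B in enumerate(B_list, start=1):
        A -= B * np.exp(-1j * l * lam)
    Ainv = np.linalg.inv(A)
    return Ainv @ Sigma @ Ainv.conj().T / (2 * np.pi)
```

With an empty coefficient list (p = 0) this reduces to the white noise spectrum Σ/(2π), matching the special case discussed in Section 2.2.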

Posterior computation
To implement an MCMC algorithm, we truncate the series representation of Φ in (10) at some large L. The value of L should be determined by considering the sample size and the error tolerance in practical computations. A conservative truncation choice is L = max{20, n^{1/3}}; in fact, this choice has been used with the truncated stick-breaking representation of the Dirichlet process by Muliere and Tardella (1998) and Choudhuri et al. (2004a). As a result, the posterior distribution follows by combining the prior with the corrected likelihood (6), where Θ_Φ denotes the 3L parameters of the truncated representation of Φ (the radial parts, support points and spherical parts of its L summands) and θ = (θ_1, ..., θ_p) contains the parameters of the VAR parametric working model with order p ≥ 1.
Remark 1. In the univariate case, for reasons of identifiability, it is important to use a parametric working model with standardised errors, i.e., with σ = 1, because all choices of σ lead to the same corrected likelihood. This eliminates the need of pre-estimating or putting a prior on the error variance. However, the situation is more involved in the multivariate case, as two different covariance matrices can lead to different corrected likelihoods. This is because two covariance matrices can differ by more than just a multiplicative constant and, indeed, two correlation matrices can have a very different structure. Based on some pilot simulation studies, putting a prior on the innovation covariance results in a high computational demand and can cause difficulty in posterior convergence (possibly due to some additional non-identifiability beyond multiplicative constants). Therefore, we pre-estimate the covariance of the innovation term and do not update it in the MCMC algorithm. The simulation results and the case studies in Sections 5 and 6 show that this implementation with a fixed (pre-estimated) innovation covariance performs well.

A Metropolis-within-Gibbs algorithm
We use Metropolis-Hastings (MH) steps (Hastings (1970)) to update the full conditionals in the Gibbs sampler, i.e. a Metropolis-within-Gibbs algorithm (Tierney (1994)). To elaborate, we use the same algorithm as Meier et al. (2020) to infer Φ and k, in which the spherical components are reparametrised in terms of hyperspherical coordinates. A multivariate normal random walk is used for the full conditional of the coefficients of the parametric working model. Algorithm 1 summarises the sampling from the posterior. For more details, please see https://github.com/easycure1/vnpctest.
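The structure of such a sampler can be sketched generically. The toy bivariate normal target and the fixed step size below are illustrative stand-ins for the full conditionals of Θ_Φ, k and θ; this is not the paper's implementation.

```python
import numpy as np

# Generic Metropolis-within-Gibbs sweep (illustrative): each parameter
# block is updated in turn with a random-walk Metropolis-Hastings step
# targeting its full conditional. Here the joint target is a toy
# correlated bivariate normal log-density.
rng = np.random.default_rng(0)

def log_target(x):
    return -0.5 * (x[0] ** 2 + x[1] ** 2 - x[0] * x[1])

def mwg_sweep(x, step=0.5):
    x = x.copy()
    for i in range(len(x)):                       # Gibbs loop over blocks
        prop = x.copy()
        prop[i] += step * rng.standard_normal()   # random-walk proposal
        # MH accept/reject on the log scale; the full conditional ratio
        # equals the joint ratio because only component i changed
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
    return x

x = np.zeros(2)
for _ in range(100):
    x = mwg_sweep(x)
```

In the actual algorithm the "blocks" are the truncated series components of Φ, the Bernstein degree k and the VAR coefficients, each with its own proposal.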

Simulation study
In the simulation study, we compare the corrected likelihood approach to the parametric VAR approach and the nonparametric Whittle likelihood approach proposed by Meier et al. (2020), both in the case of a correctly specified parametric model and of a misspecified one. To this end, we first introduce some criteria used for quantifying the estimation performance; the corresponding results for the posterior will be summarised in Section 4.3. The implementation of this approach is available at https://github.com/easycure1/vnpctest, while the other two approaches are included in the R package beyondWhittle on CRAN (Meier et al. (2022)).
Consider a posterior sample f^(1)(ω), ..., f^(N)(ω) of spectral density matrices at any given frequency ω. Let S+_d be the set of d × d Hpd matrices. Consider the mapping H : S+_d → ℝ^{d×d} such that the real parts above the diagonal of an Hpd matrix F remain above the diagonal of HF, the imaginary parts above the diagonal of F are assigned to the positions below the diagonal of HF, and the diagonal entries of F stay in the same position in HF. Denote the image of the sample f^(i)(ω) under H by h^(i) := H f^(i)(ω) for i = 1, ..., N, viewed as a vector with d² components. Let ĥ be the componentwise median of h^(1), ..., h^(N), i.e. ĥ_m is the median of {h^(1)_m, ..., h^(N)_m} for m = 1, ..., d². Then we call f̂(ω) := H^{−1} ĥ the pointwise median spectral density matrix of f^(1)(ω), ..., f^(N)(ω). In what follows, f̂ will regularly be used as a Bayes estimator for the true spectral density matrix f₀. To measure the goodness of f̂, we consider the corresponding L₁- and L₂-errors.
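The mapping H and the pointwise median can be sketched directly; the function names below are ours, not the paper's.

```python
import numpy as np

# Sketch of the pointwise median estimator: map each Hermitian posterior
# sample to a real matrix (diagonal kept; real parts of the upper triangle
# kept above; imaginary parts stored below), take the elementwise median,
# and invert the map.
def H(F):
    d = F.shape[0]
    out = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            if i <= j:
                out[i, j] = F[i, j].real   # diagonal and upper: real parts
            else:
                out[i, j] = F[j, i].imag   # lower: imaginary parts
    return out

def H_inv(M):
    d = M.shape[0]
    F = np.zeros((d, d), dtype=complex)
    for i in range(d):
        F[i, i] = M[i, i]
        for j in range(i + 1, d):
            F[i, j] = M[i, j] + 1j * M[j, i]
            F[j, i] = F[i, j].conjugate()   # restore Hermitian symmetry
    return F

def pointwise_median(samples):
    """samples: list of (d, d) Hermitian matrices at one frequency."""
    h = np.stack([H(F) for F in samples])
    return H_inv(np.median(h, axis=0))
```

Since the median is taken componentwise on the real coordinates, H⁻¹ always returns a Hermitian matrix, although positive definiteness is not automatically guaranteed.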

Prior choice
The prior probability function of k is chosen as p(k) ∝ exp(−0.01 k log k). For practical feasibility, we employ the Γ-process prior of Remark 4.1 in Meier et al. (2020) for Φ, with scale matrix Σ = 10⁴ I_d and the remaining parameters chosen as there. Furthermore, to improve the robustness at the boundary, we employ the truncated Bernstein prior with truncation bounds 0.1 and 0.9.
In the implementation, we run 80,000 iterations of the Markov chain and discard the first 30,000 iterations as burn-in. The remaining 50,000 samples are thinned by taking every 5th value in order to reduce the amount of memory used. We set k_max = 300 and L = 20.

Simulated data
The following two bivariate models, a VAR(2) model (14) and a VMA(1) model (15), are used to generate the simulated multivariate time series. These two models have also been considered by Meier et al. (2020) for the comparison between VNP and VAR. The VMA(1) model is a simple linear multivariate time series example, and it does not belong to the family of VAR models, which are used by the VAR procedure and as the parametric working model for our VNPC procedure. We use a fixed small order for the parametric working model of the VNPC procedure, while the order of the parametric VAR approach is selected by AIC. We consider three different sample sizes, n = 256, n = 512 and n = 1024. For each sample size, N = 500 independent realisations are generated from both the VAR(2) and VMA(1) models. One realisation of length n = 256 of each model (14) and (15) is given in Figure 1.

Results
The results of the simulation study are shown in Table 1. As to be expected, for the correctly specified VAR(2) model, the parametric VAR procedure has much smaller L₁- and L₂-errors than the other two nonparametric procedures. The VNP and the VNPC procedures show almost the same performance in this case. For the data generated from a VMA(1) model, a large order p was selected by AIC to fit a VAR(p) model in this misspecified case. The estimates based on the VNPC and the VNP model have smaller L₁- and L₂-errors than the VAR model. The VNPC model outperforms the nonparametric VNP estimate, as it can benefit from the partially correct VAR(1) working model.
Table 1 also shows the empirical coverage of uniform 90% credible regions and the median (among replications) of the median (over all frequencies) width of uniform 90% credible regions for all procedures. Not surprisingly, for the correctly specified case, the empirical coverage of the VAR credible intervals attains the nominal level. In the misspecified VMA(1) case, the coverage of the VAR procedure is even higher, owing to the increased width.
The VNP procedure produces similar coverages with smaller widths in either of the two cases. As discussed in Szabó et al. (2015), to produce honest credible sets in the nonparametric setting (in the sense that their coverage approximately matches the corresponding posterior probability mass of the regions, 90% in our case), the prior has to be selected to slightly undersmooth. However, the Bernstein polynomials used in the mixture prior of the VNP procedure tend to oversmooth the true spectral density, so they may not be appropriate for achieving an honest credible region. In comparison, the VNPC procedure raises the coverage to almost 90% in both cases. Spectral density estimates obtained by the three procedures for one random realisation of each of the two models are displayed in Appendix A. Moreover, an additional simulation study in Appendix A shows that the VNPC procedure using a Gaussian VAR working model performs well even in the misspecified case when the data-generating process is non-Gaussian.

Case studies
Here we demonstrate the proposed Bayesian nonparametric approach based on the corrected multivariate likelihood on a bivariate time series that has been extensively studied in the literature and on a six-dimensional time series of windspeed measurements. Section 5.1 first introduces a method to choose the order of the parametric VAR(p) working model of the VNPC approach. Both studies use the same prior settings as the simulation study (see Section 4.1).

Elbow criterion
Under some weak assumptions, a VAR(p) model with sufficiently large order p can capture the dependence structure of a Gaussian linear process to any degree of accuracy. Accordingly, using a VAR model with a sufficiently large order can be regarded as a nonparametric procedure. This result has been employed by the VAR-sieve-bootstrap method proposed by Meyer and Kreiss (2015). In addition, it also implies that a misspecified VAR model can still be used for spectral density matrix estimation if the order is chosen sufficiently large. This explains why standard model selection techniques such as AIC tend to return a large order when misspecification occurs. This can lead to increased computational time, while also potentially causing the procedure to exhibit excessive confidence in the parametric model. Hence, we want to use a relatively small order for the parametric working model in our corrected likelihood procedure, one that is capable of describing the main features but not the minute details of the data. With this objective in mind, we follow the order selection approach used by the univariate corrected likelihood procedure in Kirch et al. (2019), which applies a rule of thumb to check for a bend or elbow in a plot of model order versus negative maximum log-likelihood evaluated at the least squares estimates. To elaborate, in most cases we will see an obvious bend in this scree-like plot as the order increases. The penalty term of standard penalisation techniques is tuned to find the best fit, but it is too small to find the elbow at which only the most obvious features are explained. In analogy to Principal Components Analysis (PCA), which uses the scree plot to reduce the dimension, we can use the order at which we see a clear bend in the negative maximum log-likelihood plot. With the corresponding autoregressive order, the main features of the data are captured, and increasing the number of parameters further will contribute very little to the corrected likelihood procedure except computational costs.
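The elbow criterion can be sketched as follows, assuming least-squares VAR fitting and the Gaussian profile log-likelihood up to additive constants; this is an illustrative implementation of the rule of thumb, not the paper's code.

```python
import numpy as np

# Illustrative elbow-criterion sketch: fit VAR(p) by least squares for
# increasing p, record the negative maximum log-likelihood (up to
# constants, 0.5 * n_eff * log det of the residual covariance), and look
# for the order at which the curve bends.
def neg_loglik_var(Z, p):
    n, d = Z.shape
    if p == 0:
        resid = Z
    else:
        Y = Z[p:]
        # design matrix of lagged observations [Z_{t-1}, ..., Z_{t-p}]
        X = np.hstack([Z[p - l: n - l] for l in range(1, p + 1)])
        coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ coef
    S = resid.T @ resid / resid.shape[0]
    sign, logdet = np.linalg.slogdet(S)
    return 0.5 * resid.shape[0] * logdet

def elbow_curve(Z, p_max=10):
    """Negative maximum log-likelihood for orders p = 0, ..., p_max."""
    return [neg_loglik_var(Z, p) for p in range(p_max + 1)]
```

Plotting the returned curve against p, the order at the last pronounced bend is taken as the working-model order.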

Southern oscillation index
The Southern Oscillation Index (SOI) and the associated recruitment series are two simultaneously recorded series used to explore the El Niño cycle (Shumway and Stoffer (2011)). The SOI is a standardised index of sea level air pressure differences between Tahiti and Darwin, Australia. The index is related to the El Niño Southern Oscillation (ENSO), the movement of equatorial sea surface temperature across the Pacific Ocean and its atmospheric response. ENSO is known to occur periodically every 2-7 years and to last 6-18 months, and it comprises three phases: neutral, El Niño (the warm phase) and La Niña (the cool phase) (National Oceanic and Atmospheric Administration (2022); Statistics New Zealand (2020)). ENSO is an extremely important meteorological phenomenon in the Pacific Ocean, as it affects the weather through changes in air pressure, sea temperature and wind direction. Due to the periodic behaviour of ENSO, the observations in the SOI series are expected to have a time-dependent structure. For our analysis, we make the same simplifying assumption as Rosen and Stoffer (2007) that all effects (e.g. the seasonal effect) can be represented by the autocovariance structure. The recruitment series is composed of the number of newly spawned fish, re-scaled to the interval [0, 100]. It is believed that fish spawning is associated with the water temperature, so we can expect a cross-dependence between the SOI and the recruitment series. Both series consist of 453 monthly values over the years 1950-1987 and are shown in Figure 2.
The data is available in the R package astsa as datasets soi and rec (Stoffer (2022)). Here, the standardised bivariate time series is analysed. We employ the VNPC procedure with four orders (p = 0, 2, 5 and 15) for the parametric working model and the VAR procedure with three orders (p = 2, 5 and 15), and we compare the performance of the two procedures. When the order of the parametric working model is 0, the VNPC procedure coincides with the VNP procedure. The order 5 is chosen by the elbow criterion (in Figure 3, there is a clear bend at that order). The order 15 is chosen by minimising the AIC. The bivariate time series is arranged with the SOI as the first series and the recruitment as the second. To compare the procedures across the different orders, we present the results for the two series separately, the SOI in Figure 4 and the recruitment series in Figure 5. Since both series are monthly data, the frequency within [0, 2π] corresponding to a yearly period is ω_yearly := 2π/12 ≈ 0.52. The periodograms of the SOI in Figure 4 show an annual peak. The peak is detected by the VNP procedure, but its spectral density estimate is overly smooth. When p = 2, the VAR procedure performs even worse, as it is too smooth to pick up any spectral peaks; in this case, the VNPC procedure with p = 2 does not improve much over VNP, i.e., VNPC with p = 0. When the order is increased to 5, the relatively small order chosen by the elbow criterion, the VNPC procedure successfully estimates the main annual peak, in contrast to the VAR procedure. With the large order given by the AIC, i.e., p = 15, the VNPC and VAR procedures perform very similarly. Both estimators pick up the annual peak and its harmonics as well as a broader low-frequency peak that can be interpreted as the El Niño effect, which occurs irregularly every 2-7 years. The findings for the
recruitment series in Figure 5 are similar. The VNPC procedure can make use of the relatively small VAR order, p = 5, to capture the main annual peak in that series, while the estimates obtained by the VAR procedure with that order are too smooth. Figure 6 provides the 90% pointwise credible regions for VNPC with p = 5. The top-right plot displays the real part of the cross-spectrum, called the cospectrum, and the bottom-left plot displays the imaginary part of the cross-spectrum, called the quadrature spectrum.
Figure 6: Posterior credible regions for the SOI and recruitment series for the VNPC procedure with a parametric working model with order 5. The diagonal pair shows the logarithmic spectral estimates for the two series and the off-diagonal pair displays the real and imaginary parts of the cross-spectrum. Pointwise 90% regions are visualised in shaded pink and uniform 90% regions in shaded blue. The posterior median is given by the black solid line and the periodogram is shown in grey.
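The periodograms shown in grey throughout these figures can be obtained directly from the FFT of the series. The following is an illustrative Python sketch (the paper's implementation is in R); the function name and the omission of mean-centering are our own simplifications, and the normalisation I(ω) = J(ω)J(ω)^H with J(ω) = (2πn)^{-1/2} Σ_t x_t e^{-itω} follows the standard multivariate convention.

```python
import numpy as np

def mv_periodogram(x):
    """Multivariate periodogram I(ω_k) = J(ω_k) J(ω_k)^H at the Fourier
    frequencies ω_k = 2πk/n, where J(ω) = (2πn)^{-1/2} Σ_t x_t e^{-itω}."""
    n, d = x.shape
    J = np.fft.fft(x, axis=0) / np.sqrt(2 * np.pi * n)  # DFT of each component
    I = np.einsum('kj,kl->kjl', J, J.conj())            # rank-one Hermitian matrix per ω_k
    omegas = 2 * np.pi * np.arange(n) / n
    return omegas, I
```

The diagonal entries of I(ω_k) are the raw (unsmoothed) univariate periodograms; the off-diagonal entries form the cross-periodogram whose real and imaginary parts correspond to the cospectrum and quadrature spectrum discussed above.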
The squared coherency is more adequate for investigating the correlation between two series than the cospectrum and the quadrature spectrum. Recall that the coherency κ(ω) between the two series at frequency ω is defined as κ(ω) = f₁₂(ω)/√(f₁₁(ω) f₂₂(ω)), where f_jk denotes the (j, k) entry of the spectral density matrix. This statistic can be viewed as the frequency-domain counterpart of the cross-correlation function, and it holds that |κ(ω)|² ≤ 1 for all 0 ≤ ω ≤ π (see Section 11.6 in Brockwell and Davis (1991)). We calculate the squared coherency for all 5000 spectral density estimates and then consider the median and the 90% pointwise region. The result in Figure 7 shows an obvious peak at ω_yearly, which agrees with the findings for the cospectrum and the quadrature spectrum in Figure 6. There are also a few minor peaks at the higher frequencies, the harmonics of ω_yearly, at which the two series are correlated as well. In addition, the strong coherency at the low frequencies close to 0 might be due to the 2-7 year El Niño period mentioned previously. All the results in this section are in line with the findings in Rosen and Stoffer (2007).

California wind speed data
In this case study, we apply the VNPC method to the California wind speed data from the Iowa State University Environmental Mesonet (IEM) Automated Surface Observing System (ASOS) database (Iowa State University (2022); Mannarano (1998)). This example has been studied by Hu and Prado (2023) using a stochastic gradient variational Bayes approach for fast Bayesian inference. They consider the wind speed in knots at six airports in California from 01-06-2019 12:00 am to 31-08-2019 11:59 pm: EDU (Davis), SAC and SMF (Sacramento), MRY (Monterey), SNS (Salinas), and WVI (Watsonville). That particular time interval is used to avoid extreme values and non-stationarities caused by the rainfall and storms that occurred in other months. In particular, they apply their method to the standardised first difference of the median wind speed every two hours at each location. In our study, we use the same processed data, which is accessible from the supplementary material in Hu and Prado (2023). Figure 8 shows the geographical locations of the six airports and the standardised detrended data. All six series are treated as a six-dimensional time series and analysed simultaneously. The order for the parametric working model suggested by the elbow criterion is 10 (see Figure 9). The results in Figure 10 show that all six spectral densities have a prominent peak at around 2π/12 ≈ 0.524, which indicates a strong daily period in the wind profile patterns at all locations. This is consistent with the findings in Hu and Prado (2023). However, in contrast to the spectral estimates obtained by variational inference in Hu and Prado (2023), the peaks in the VNPC estimates are much more pronounced and sharper, for the main daily peak as well as its harmonics. The spectral estimates at 2π/12 ≈ 0.524 for the coastal locations WVI, SNS and MRY have slightly larger power than those for the inland airports, as also noted by Hu and Prado (2023). Furthermore, we are interested in the squared coherencies between the wind speeds at the various pairs of airports. The estimates are given in Figure 11. Not surprisingly, all pairs have a strong daily periodic coherency, in accordance with the daily period found in all series in Figure 10. Moreover, all pairs except those involving SMF have a strong second harmonic. In Hu and Prado (2023), correlations tend to appear locally, between locations close by, with no correlation between locations separated by 100-200 miles, i.e., between those pairs where one wind profile is taken from the group (EDU, SAC, SMF) and one from (WVI, SNS, MRY). In contrast, our results in Figure 11 show high squared coherencies also between locations in different groups, most pronounced at the daily frequency. Coherencies at higher frequencies drop off more quickly for between-group pairs than for within-group pairs, e.g. WVI vs. SNS as compared to WVI vs. EDU. Even over geographical distances of 100-200 miles, such as between the airports in the Monterey and Sacramento regions, one would expect a daily correlation between the wind profiles; however, hourly or minutely patterns of the wind profiles could vary greatly between distant airports while being similar for neighbouring airports. In fact, wind speed is affected by multiple factors including wind direction, topography and local climate. Wang et al. (2020) analysed large-scale wind patterns in California using model simulations from the variable-resolution Community Earth System Model (VR-CESM). They describe ten wind patterns by direction and strength in northern California, associated with geopotential height patterns, geography and climate. Wind patterns thus do not necessarily vary with distance, and wind speed correlations may not be strictly related to the distance between locations.
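The squared coherencies reported in both case studies can be computed from any estimate of the spectral density matrix. Below is an illustrative Python sketch (the paper's implementation is in R) that evaluates the spectral density matrix of a VAR model and the squared coherency |κ(ω)|² = |f₁₂(ω)|²/(f₁₁(ω) f₂₂(ω)); the function names are our own.

```python
import numpy as np

def var_spectral_matrix(Phi, Sigma, omegas):
    """Spectral density matrix f(ω) = (2π)^{-1} A(ω)^{-1} Σ A(ω)^{-H} of a VAR
    model, with A(ω) = I - Σ_j Φ_j e^{-ijω} and coefficients Phi = [Φ1, ..., Φp]."""
    d = Sigma.shape[0]
    f = np.empty((len(omegas), d, d), dtype=complex)
    for k, w in enumerate(omegas):
        A = np.eye(d, dtype=complex)
        for j, Phi_j in enumerate(Phi, start=1):
            A -= Phi_j * np.exp(-1j * j * w)
        Ainv = np.linalg.inv(A)
        f[k] = Ainv @ Sigma @ Ainv.conj().T / (2 * np.pi)
    return f

def squared_coherency(f):
    """|κ(ω)|² = |f12(ω)|² / (f11(ω) f22(ω)) for a bivariate spectral matrix."""
    return np.abs(f[:, 0, 1]) ** 2 / (f[:, 0, 0].real * f[:, 1, 1].real)
```

Since each f(ω) is Hermitian positive semi-definite, the Cauchy-Schwarz inequality guarantees that the squared coherency lies in [0, 1], matching the bound |κ(ω)|² ≤ 1 quoted above. Applying squared_coherency to each posterior spectral density sample and taking pointwise quantiles yields credible bands as in Figures 7 and 11.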

Conclusion
This paper presents a novel Bayesian approach for spectral density estimation of multivariate stationary Gaussian time series. A nonparametrically corrected likelihood is proposed which can be regarded as a generalisation of the Whittle likelihood. To elaborate, a parametric likelihood is specified for the data in the time domain, and then the likelihood is nonparametrically corrected in the frequency domain in order to mitigate effects of model misspecification. A multivariate extension of the nonparametric Bernstein-Dirichlet process prior for univariate spectral densities to the space of Hermitian positive definite spectral density matrices is specified directly on the correction matrices. We prove contiguity of the nonparametrically corrected likelihood and the true likelihood for Gaussian time series. The results of a simulation study, exploring scenarios of a correctly specified parametric and a misspecified model, show that spectral density estimates based on the nonparametrically corrected likelihood inherit the efficiencies of the parametric approach in the correctly specified case and the robustness of the nonparametric approach in the misspecified case. Moreover, the nonparametrically corrected Gaussian VAR model performs well even for non-Gaussian time series. Applications demonstrate that estimates based on the nonparametrically corrected likelihood are much better at estimating sharp peaks and abrupt changes in the spectrum than nonparametric methods. R-code that implements the MCMC algorithm to sample from the posterior distribution is made publicly available at https://github.com/easycure1/vnpctest.
The contiguity shown in this paper is useful for deriving the posterior consistency of the corrected likelihood approach for stationary multivariate Gaussian time series based on Schwartz's theorem (Schwartz (1965)) and its extensions. The posterior consistency of our approach for stationary multivariate non-Gaussian time series is also an exciting topic for future work. Moreover, our proposed approach holds broad applicability for analysing multivariate stationary time series and practical utility across various datasets and disciplines, such as ECG data in medicine and gravitational wave measurements in astrophysics.

The following simulation study considers the VAR(2) and VMA(1) models with the same coefficients and covariance matrix as (14) and (15) in the paper, while the innovation term follows a bivariate Laplace distribution rather than the Gaussian distribution. Simulation results are provided in Table A.2. These clearly demonstrate that the VNPC procedure also performs well in the non-Gaussian case. Figures A.15-A.17 display the estimates for one n = 256 realisation by the three approaches.

Figure 1: Realisation of (a) the VAR(2) model (14) and (b) the VMA(1) model (15) of length n = 256. The left panel displays the first component and the right panel the second component of the time series.

Figure 3 :
Figure 2: The SOI and recruitment series

Figure 5:
Figure 4: Logarithmic spectral estimates for the SOI series by the VNPC(p) procedure (red line) and the VAR(p) procedure (black dashed line). The periodogram is given in grey.

Figure 7: Estimated squared coherency for the SOI and recruitment series by the VNPC procedure with a parametric working model with order 5. The posterior median is given by the black line and the pointwise 90% credible region is in shaded pink.

Figure 8: The left graph shows the geographical locations of the six airports. The right graph shows the standardised first differences of median wind speed measurements every two hours from 01-06-2019 12:00 am to 31-08-2019 11:59 pm.

Figure 9: Negative maximum log-likelihood for different VAR(p) models applied to the standardised detrended wind speed data.

Figure 10: Logarithmic spectral estimates for the standardised detrended wind speed data by the VNPC procedure with a parametric working model with order 10. The periodogram is given by the grey line and the posterior median by the black line. The pointwise 90% posterior credible region is visualised in shaded pink and the uniform 90% region in shaded blue.

Figure 11: Estimated squared coherency for the standardised detrended wind speed data by the VNPC procedure with a parametric working model with order 10. The posterior median is given by the black line and the pointwise 90% credible region is in shaded pink.
Figure A.12: Spectral estimates for the realisation of (a) the VAR(2) model and (b) the VMA(1) model in Figure 1 in the article for the VNPC procedure with fixed order 1 for the parametric working model (red dashed), the VAR procedure with order selected by AIC (dotted) and the VNP procedure (dash-dotted); the true spectral density is given by the solid black line.
Figure A.13: Posterior credible regions for the realisation of the VAR(2) model from Figure 1a in the article for (a) the VNPC procedure with fixed order 1 for the parametric working model, (b) the VAR procedure with order selected by AIC and (c) the VNP procedure. Pointwise 90% regions are represented in shaded pink, uniform 90% regions are in shaded blue, the true spectral density is given by the black solid line and the periodogram is shown in grey.
Figure A.15: Spectral estimates for one realisation of (a) the VAR(2) model and (b) the VMA(1) model with Laplace innovations of size n = 256 for the VNPC procedure with fixed order 1 for the parametric working model (red dashed), the VAR procedure with order selected by AIC (dotted) and the VNP procedure (dash-dotted); the true spectral density is given by the solid black line.

Figure A.16: Posterior credible regions for the same realisation of the VAR(2) model as in Figure A.15 for (a) the VNPC procedure with fixed order 1 for the parametric working model, (b) the VAR procedure with order selected by AIC and (c) the VNP procedure. Pointwise 90% regions are visualised in shaded pink, uniform 90% regions are in shaded blue, the true spectral density is given by the black solid line and the periodogram is shown in grey.