Frequency Domain Estimation of Cointegrating Vectors with Mixed Frequency and Mixed Sample Data

This paper proposes a model suitable for exploiting fully the information contained in mixed frequency and mixed sample data in the estimation of cointegrating vectors. The asymptotic properties of easy-to-compute spectral regression estimators of the cointegrating vectors are derived and these estimators are shown to belong to the class of optimal cointegration estimators. Furthermore, Wald statistics based on these estimators have asymptotic chi-square distributions which enable inferences to be made straightforwardly. Simulation experiments suggest that the finite sample performance of a spectral regression estimator in an augmented mixed frequency model is particularly encouraging as it is capable of dramatically reducing the root mean squared error obtained in an entirely low frequency model to the levels comparable to an infeasible high frequency model. The finite sample size and power properties of the Wald statistic are also found to be good. An empirical example, to stock price and dividend data, is provided to demonstrate the methods in practice.


Introduction
The concept of cointegration plays a prominent role in the analysis of multivariate time series with unit roots, and a large variety of methods is available to the applied researcher for handling such data. Prominent among these methods is the vector error correction model (VECM) which is a convenient reparameterisation of a vector autoregressive (VAR) system that accounts for the cointegration between the variables. The popularity of the fully parametric VECM approach lies in its (relative) ease of estimation and its suitability for testing for the number of cointegrating vectors that exist. The VECM method is also implemented in many econometric software packages, is amenable to use as a forecasting tool and can be subjected to the usual battery of time series specification tests.
In some circumstances, however, a researcher may be unwilling to model the system dynamics in the form of a heavilyparameterised VAR but may still be interested in the cointegrating vectors themselves. In such cases alternative methods are available, including, but certainly not restricted to, the dynamic ordinary least squares (DOLS) method of Stock and Watson (1993), the fully modified (FM) least squares method of Phillips and Hansen (1990), and the spectral regression techniques proposed by Phillips (1991a). These approaches focus on the cointegrating vectors of interest and account for the system dynamics without needing to specify a parametric model. The DOLS approach adds leads and lags of first differences to the cointegrating regression; the FM least squares method employs nonparametric estimates of certain covariance matrices; and the spectral regression estimator is a type of feasible generalised least squares estimator in the frequency domain.
The vast majority of the contributions to the cointegration literature, both theoretical and applied, have focused on situations in which all the variables of interest are sampled at the same frequency. In cases where the variables are sampled at different frequencies this typically amounts to converting the higher frequency series into the lowest frequency. In recent years, however, there has been a growing interest in developing methods that are capable of exploiting all the mixed frequency data without converting the higher frequency data to the lowest frequency. Mixed frequency approaches applicable to testing for cointegration have been developed by Ghysels andMiller (2014, 2015) and Miller and Wang (2016), while estimation of the cointegrating vectors in regression models using mixed frequency data has been investigated by Miller (2010Miller ( , 2014Miller ( , 2016. It is also possible to extend the VECM approach for use with mixed frequency data; see Seong et al. (2013).
It might be tempting to argue that, because cointegration describes a set of long-run relationships between variables, the use of additional high frequency data alongside the low frequency data is unlikely to yield many benefits. Indeed, the use of very high frequency data, of the type available in finance, might introduce additional complications, such as microstructure noise. Moreover, as first emphasised by Shiller and Perron (1985), it is the span of the data, rather than the number of observations, that is important when modelling nonstationary data. This suggests that there are limitations as to how far the high frequency data should be exploited. But, used appropriately, it is possible that the additional information contained in the higher frequency data can be used to improve the properties of estimators of cointegration vectors in finite samples, even though the asymptotic properties are likely to be the same as those obtained using just the low frequency data.
In this paper we adapt the spectral regression approach of Phillips (1991a) to the estimation of cointegrating vectors using mixed frequency data. We treat the mixed frequency issue in the context of a discrete time temporal aggregation problem where the highest observed frequency is taken as the fundamental frequency; an alternative continuous time approach can be found in Chambers (2019). An advantage of the spectral regression estimators is that the model's disturbances are only required to be stationary and to satisfy a functional central limit theorem (FCLT), meaning that there is no need to assume any particular form of parametric dynamic model. By addressing the temporal aggregation directly we are able to show that the disturbances in the mixed frequency model are, indeed, stationary.
This paper makes three main contributions. The first, indicated above, is the derivation of a model that can be used with mixed frequency and mixed sample data for the estimation of cointegrating vectors. In this sense its motivation is very similar to that of Miller (2016), some of whose results are used in the proofs. The proposed method of dealing with the mixed frequency data turns out to be very straightforward -simply average the high frequency variables over the low frequency sampling interval. This, in fact, was proposed by Chambers (2003) in his study of the asymptotic efficiency of cointegration estimators under temporal aggregation. Although the high frequency observations are not used separately, the averaging nevertheless does use the information contained in all the high frequency observations.
The second main contribution is the derivation of the asymptotic properties of the spectral regression estimators of the cointegrating vectors. The estimators we consider are band limited around the zero frequency in view of cointegration being associated with this frequency. A large literature exists on the estimation of spectral density matrices but we focus on smoothed periodogram estimators in view of their relative ease of computation and analysis. It is shown that the resulting spectral regression estimators fall into the class of optimal cointegration estimators as defined by Phillips (1991c) and have the familiar mixed normal limiting distribution. We also consider a spectral estimator based on a regression that is augmented by an additional variable in first-difference form. This avoids the need for the estimation of a spectral matrix based on the residuals from an initial (consistent) estimation of the cointegrating vectors. This augmented spectral estimator possesses the same form of optimal limiting distribution. A useful feature of these limiting distributions is that Wald statistics, formed using the spectral regression estimators, have limiting chi-square distributions, thereby making inference a straightforward procedure.
The third contribution concerns some simulation evidence for the proposed methods of estimation and inference in finite samples. The simulation model involves a single cointegrating relationship between two high frequency variables. We consider the performance of the spectral regression estimator and its augmented version, based on smoothed periodogram estimators of the spectral density matrices, as well as the regression estimator based on an autoregressive spectral density estimator. We compare estimates obtained from the infeasible high frequency model (where both variables are sampled at the high frequency), a feasible low frequency model, and three mixed frequency representations. The spectral estimators are found to have good finite sample properties and compare very favourably with the time domain FM estimator. The performance of Wald statistics is also examined, with those based on the spectral regression estimator having good size and power properties.
The spectral methods proposed in this paper are related to those in Phillips (1991a,b) but are distinguished from them in the following ways. Phillips (1991a) proposed a class of spectral regression estimators for a single cointegrating vector and for time series sampled at a common frequency. Here we allow for multiple cointegrating vectors as well as data that may be sampled at different frequencies while simultaneously distinguishing between stock and flow sampling. Phillips (1991b) examined the properties of discrete time data generated from an underlying cointegrated continuous time system. His theorem and examples show conditions under which filtered series remain cointegrated, and similar arguments could be used in our discrete time setting to show that the observed mixed frequency and mixed sample data also remain cointegrated. But we go further to derive the form of a cointegrated representation for the mixed frequency observations that provides a basis for estimation and demonstrate that the disturbances in this system are stationary. Furthermore we also provide explicit proofs of the asymptotic properties of the spectral estimators.
The paper is organised as follows. Section 2 defines the triangular model of cointegration at the high frequency and provides the mixed frequency representation. Stationarity of the disturbance vector in this representation is also demonstrated. Issues concerning frequency domain estimation are addressed in Section 3, in which the estimators and test statistics are defined and their asymptotic properties derived. Section 4 defines the simulation experiments and reports the results obtained, while some concluding comments appear in Section 5. All proofs and supplementary results are presented in Appendix.
The following notation is used. The lag operator, L, is such that, for a variable x t , L h x t = x t−h for some real number h (not necessarily whole). Following Phillips (1991b) a variable, x t , is I(0) if it belongs to the class of covariance stationary processes that have a spectral density function, f (λ), that is bounded and continuous and for which f (0) is positive. A variable is I(1) if its first difference is I(0), and a vector of variables will be said to be I(0) or I(1) if all its elements are of the same order of integration. In the vector case it is possible that each element of the first difference is I(0) by this definition but the spectral density matrix is singular at the origin; in this case the vector of variables is said to be cointegrated. Convergence in distribution is denoted d →, convergence in probability by p → and almost sure convergence by a.s.
→. Finally, [x] denotes the smallest integer less than or equal to the scalar x, I n denotes an n × n identity matrix, 0 n×m is an n × m matrix of zeros, ⊗ denotes the Kronecker product operator, tr (A) denotes the trace of a square matrix A, ∥A∥ = √ tr (AA ′ ) denotes the Euclidean norm of A, B * is the transpose of the conjugate of a complex-valued matrix B and, for an n × m matrix C , vec (C ) denotes the nm × 1 column vector obtained by stacking the columns of C vertically on top of each other.

The model and a mixed frequency representation
The model concerns the cointegration properties of the elements of an I(1) vector of variables, y, of dimension n × 1. It is convenient to partition y as y = (y ′ 1 , y ′ 2 ) ′ where y 1 is n 1 ×1, y 2 is n 2 ×1 and n 1 +n 2 = n. The 1 ≤ n 1 ≤ n−1 cointegrating equations are normalised on the sub-vector y 1 and are expressed as linear combinations of y 2 so that y 1 −Cy 2 is stationary, the n 1 × n 2 matrix C containing the unknown cointegrating parameters of interest (the rows denoting the cointegration vectors). In the most general setting the elements of y 1 and y 2 are allowed to comprise both stock and flow variables and to be observed at both high and low frequencies (to be defined below). It is convenient to partition these vectors (without loss of generality) as where n S j + n F j = n j , n SH j + n SL j = n S j and n FH j + n FL j = n F j (j = 1, 2). In the above, the first superscript denotes the type of variable (stock, S, or flow, F ) while the second indicates the frequency with which the vector is sampled (high frequency, H, or low frequency, L).
We assume that the high frequency sampling corresponds to a sampling interval of length 0 < h H = h < 1, while the low frequency sampling interval is normalised to h L = 1. We also assume that k = h −1 is an integer so that there is a whole number of high frequency observations per low frequency observation. The variables are generated at the high frequency according to the triangular cointegrated system 1 y 1,τ h = Cy 2,τ h + u 1,τ h , τ = 1, . . . , N, where ∆ h = 1−L h denotes the high frequency first-difference operator, N denotes the number of high frequency sampling periods, and u 1,τ h (n 1 ×1) and u 2,τ h (n 2 ×1) are disturbance vectors whose properties are defined below. The cointegration in the system is depicted by (1), while (2) denotes the n 2 unit roots/stochastic trends. We shall refer to (1) and (2) as being the High Frequency Representation (HFR). With regard to the disturbance vector we make the following assumption. where: (b) ϵ τ h is a martingale difference sequence (mds), with natural filtration F τ h , satisfying: 1 Intercepts and deterministic trends in the model can be handled by the methods in this paper subject to suitable demeaning and detrending of the data prior to spectral regression. Further discussion of this issue is provided in Section 5.
Assumption 1 defines u τ h as a linear process driven by an mds disturbance vector; it therefore allows certain types of conditional heteroskedasticity (such as autoregressive conditional heteroskedasticity, or ARCH) provided condition (b)(ii) holds. In view of the constant unconditional variance it follows that the spectral density matrix of u τ h is given by where B u (r) is a Brownian motion process with covariance matrix Ω u = 2π f uu (0). The FCLT is used in the derivation of the asymptotic properties of the estimators.
The problem with the system (1) and (2) for the estimation of C is that the low frequency variables are not observed at the high frequency. To be precise, the high frequency observations are given by while the low frequency observations are of the form where T denotes the number of low frequency observations (and is also the time span that the data cover); in fact, T = Nh.
The low frequency flow variables are of the form i.e. the low frequency flows are averages of the (unobservable) high frequency flows y FL j,τ h over the low frequency However, cointegration is a property that persists at any sampling frequency, and so observations at the low frequency are also cointegrated. Re-writing (1) at the low frequency (essentially setting t = τ h and picking out the integer values for this index) yields The corresponding stochastic trends in (2) can be transformed to the low frequency by the application of the filter s(L h ) where s(z) = 1 + z + · · · + z k−1 ; noting that s(L h )∆ h y 2,τ h = y 2,τ h − y 2,τ h−kh = y 2t − y 2,t−1 we obtain However, the low frequency representation in (6) and (8) is also not directly amenable to the estimation of C because neither y FL 1t nor y FL 2t is observable (the aggregates, Y FL 1t and Y FL 1t , are observable instead) and hence it can be regarded as an Infeasible Low Frequency Representation. The challenge is to utilise the low and high frequency representations so that all of the information contained in the observed sample -at both the high and low frequencies -can be used to estimate C .
A simple, but effective, method of combining the mixed frequency data at the low frequency is to simply average the high frequency stock and flow variables. Such a procedure was suggested by Chambers (2003), albeit in a continuous time setting, for improving the asymptotic efficiency of cointegration estimators when observations on higher frequency stock 2 Note that the frequency range is (−π /h, π/h] because u τ h is defined at the high frequency and the frequency range for low frequency data is normalised to be (−π , π]. 3 Ghysels and Miller (2015) and Miller (2016) consider more general types of aggregation of the form ∑ k−1 l=0 ω l+1 y FL j,t−lh where the aggregation weights sum to unity and may be known or unknown; implicitly we are setting ω The assumption of known weights here avoids the need for employing a method such as MIDAS.
variables were available; similar efficiency arguments could be shown to hold in the discrete time setting employed here. We therefore construct low frequency averages of the form We shall refer to the resulting model as the Mixed Frequency Representation (MFR) and its form is presented below. Lemma 1. Let y 1 and y 2 satisfy the high frequency cointegrated system in (1) and (2), and define the observable averaged vectors Then the Mixed Frequency Representation is, for t = 2, . . . , T , The representation in Lemma 1 provides a basis for the estimation of the matrix of cointegration vectors, C . Its derivation is based on the infeasible low frequency representation in (6) and (8) in which deviations between observable (possibly averaged) and unobservable components, e.g. y SH quantities can be shown to be stationary; see Lemma A1 in Appendix. The MFR is, therefore, based on the entire sample of mixed frequency observable variables even though the high frequency variables are not included separately at each high frequency time point but are averaged to, in effect, mimic the observed flow variables. It is precisely this form of averaging of (stock) variables that was proposed by Chambers (2003) to improve the efficiency of the estimation of cointegration vectors when the stocks are available at a higher frequency than the flows; it is also nested within the aggregation schemes considered in Miller (2016). 4 It might be tempting to argue that the mixed frequency representation is discarding data by aggregating the high frequency stocks rather than including them separately. However, an important feature of the MFR in Lemma 1 to note is that it retains the n 1 cointegration equations and the n 2 stochastic trends of the underlying high frequency model. Alternative approaches that use the high frequency observations separately have been shown to be possible in some circumstances. For example, consider the vector each z jt contains n zj = k(n SH j + n FH j ) + n SL j + n FL j elements, so that z t is of dimension n z × 1, where n z = n z1 + n z2 . (2016) and Thornton (2019) examine continuous time systems and exploit the restrictions on the discrete time representation arising from temporal aggregation which results in a more parsimonious system than an unrestricted VAR in z t . It would, of course, also be possible to specify a similar VAR representation in the cointegrated system considered here but the resulting vector of disturbances -an expanded version of ξ t defined in Lemma 1 -would then have a singular spectral density matrix. The reason for this is that the expanded ξ t , sayξ t (which contains n z > n elements), is a function of only n underlying random variables contained in the vector u t . In other words, we can writeξ

Ghysels (2016) considers a VAR representation in the expanded vector z t while Chambers
is an n z × n matrix whose elements are polynomials that depict the way u t and its high frequency lags feed intoξ t .
The spectral density matrix ofξ t , given by H(e iλ )f uu (λ)H(e −iλ ) ′ , is then singular. In particular, the inverse of this matrix at the origin (λ = 0) characterises the limiting distribution of the spectral regression estimator, and therefore causes a degeneracy in this expanded system. We now turn to the analysis of a frequency domain-based estimator of C that rests only on the weak Assumption 1 concerning the disturbances in the HFR.

Estimation in the frequency domain
In view of the level of generality associated with the model of cointegration developed in the previous section, in which the disturbance vector, ξ t , is a linear process under Assumption 1 rather than having any specific (parametric) dynamic structure, a natural approach to estimating the matrix C of cointegrating vectors is to use spectral/frequency domain regression. Let x 0t = (x ′ 1t , ∆x ′ 2t ) ′ and define J = (I n 1 , 0 n 1 ×n 2 ) ′ . Then, based on the MFR in Lemma 1, we can write the system of interest as 5 Taking discrete Fourier transforms (dFts) in (11) and assuming T to be even 6 yields where {λ s = 2π s/T ; s = −T /2 + 1, . . . , T /2} denotes the set of Fourier frequencies and denote the dfTs of x 0t , x 2,t−1 and ξ t , respectively, at the Fourier frequencies. When C is unrestricted a simple spectral regression estimator can be obtained by minimising Λ denotes the set of frequencies over which the estimator is to be determined, and #(Λ) denotes the number of frequencies in Λ. In the most general case Λ consists of all the Fourier frequencies in the interval (−π , π]; however, if the model is believed to hold only over a subset of (−π , π] then Λ can be restricted accordingly, resulting in a band-limited estimator. In all situations we require both λ s and −λ s to belong to Λ. In the case of cointegration there are compelling reasons to limit Λ to a set of frequencies around the origin based on the theoretical arguments in Phillips (1991a,b) as well as the simulation results reported in Corbae et al. (1994). We therefore consider the symmetric set of frequencies Λ 0 = {λ s = 2π s/T ; s = −m, . . . , m} which contains the 2m + 1 Fourier frequencies around the origin for some integer m. As in Robinson (1972) we also generalise the objective function by incorporating a positive definite Hermitian weighting matrix, Φ(λ), resulting in an objective function of the form However, as argued by Phillips (1991a), the choice of the weighting matrix Φ(λ) is critical when spectral regression is applied using I(1) time series. For reasons of efficiency we require Φ(λ) to be proportional to f ξ ξ (λ) −1 , the inverse of the spectral density matrix of the unobservable disturbance vector ξ t . Although ξ t is unobserved a consistent estimator of f ξ ξ (λ) can be obtained by using the residuals,ξ t , from a least squares regression of (11).
Our estimator of f ξ ξ (0) is the smoothed periodogram estimator, defined bŷ where Iξξ (λ) = wξ (λ)wξ (λ) * and wξ (λ) is the dFt ofξ t ; it is therefore a straightforward symmetric average of 2m + 1 periodogram matrices around the zero frequency. More sophisticated estimates could be used but the smoothed periodogram has performed well in the simulations that are reported in the next section, despite its simplicity. With this choice of weighting matrix the objective function becomes Minimisation of (14) with respect to C results in the estimator where the spectral density estimatorsf 02 (0) andf 22 (0) are defined bŷ and where I 22 (λ j ) = w 2 (λ j )w 2 (λ j ) * and I 02 (λ j ) = w 0 (λ j )w 2 (λ j ) * .
An alternative, asymptotically equivalent, estimator of C , that avoids estimation of the spectral matrix f ξ ξ (0), can be obtained by augmenting (9) with ∆x 2t as an additional regressor, leading to consideration of where F = Ω 12 Ω −1 22 and ξ 1.2t = ξ 1t − F ξ 2t . In the frequency domain the relevant equation is where w 1 (λ s ), w ∆ 2 (λ s ) and w 1.2 (λ s ) are the dFts of x 1t , ∆x 2t and ξ 1.2t , respectively. The band-limited estimator of C based on the augmented equation is obtained by minimising The resulting estimator is given bŷ where thef ab (0) are the smoothed periodogram estimators using the relevant variables.
In order to derive the asymptotic properties of the estimators it is convenient to relate them directly to the matrix C .
By noting that w 0 (λ) = JC w 2 (λ) + w ξ (λ) it follows thatf 02 (0) = JCf 22 (0) +f ξ 2 (0), and making this substitution in (15) we Adopting a similar procedure forĈ A 0 , using w 1 (λ) = C w 2 (λ) +F w ∆ 2 (λ) +w 1.2 (λ) and making the appropriate substitutions, we find that where the subscript 1.2 denotes a quantity based on the dfT of ξ 1.2t . 7 The derivation of the asymptotic properties ofĈ 0 andĈ A 0 clearly relies on the asymptotic properties of various spectral density matrix estimators, which in turn will be driven by an FCLT for the normalised partial sums of ξ t . Based on (5) -the FCLT for partial sums of u τ h that holds under Assumption 1 -it is possible to derive an appropriate FCLT for the partial sums of ξ t , which are functions of the partial sums of u τ h . This is presented below.
Lemma 2. Under Assumption 1, as T → ∞, where B(r) is a Brownian motion process with covariance matrix Ω = h −1 GΩ u G ′ and .
The key to establishing Lemma 2 lies in utilising the precise relationship between ξ t and u τ h (that arises in the proof of Lemma 1) and then relating the two partial sum processes. The matrix G arises through use of a Beveridge-Nelson-type of decomposition of the matrix filter linking ξ t and u τ h ; details of this filter can be found in Lemma A2 in Appendix. In addition to Lemma 2 a result concerning the growth rate of autocovariances of ξ t is useful in establishing the asymptotic properties ofĈ 0 andĈ A 0 , and is given below. 7 Note that such dfT's are used purely as theoretical quantities in the proofs and are in no way used to compute the estimator itself.
Lemma 3 is used to establish a consistency result concerningfξξ (0) which is provided in Theorem 1(c). The derivation of the asymptotic properties ofĈ 0 andĈ A 0 also requires an assumption concerning the number, m, of frequencies employed in the estimation of the relevant spectral density matrices. To this end we make the following assumption.
Assumption 2 is common in the literature on spectral density estimation; see, for example, Brockwell and Davis (1991, p.351). The use of Assumptions 1 and 2 enables the following result concerning the asymptotic properties of the smoothed periodogram estimators of spectral density matrices to be established.
Then, under Assumptions 1 and 2, as T → ∞: It is convenient to partition Ω conformably with B 1 (r) and B 2 (r) in the form and to define Ω 11.2 = Ω 11 −Ω 12 Ω −1 22 Ω 21 . Note that the n ×n 2 matrix Ω 2 is the same matrix that appears in Theorem 1(b).
The asymptotic distributions ofĈ 0 andĈ A 0 can now be stated.
Theorem 2. Under Assumptions 1 and 2, as T → ∞, where B 1.2 (r) is a Brownian motion process with covariance matrix Ω 11.2 .
The form of limit distribution in Theorem 2 shows thatĈ 0 andĈ A 0 belong to the class of optimal estimators of Phillips (1991c) and that such optimality can be achieved either through system-wide estimation (yieldingĈ 0 ) or through augmentation of the cointegration equations (yieldingĈ A 0 ). A particular advantage of optimal estimators is that their mixed normal limiting distributions enable traditional asymptotic chi-square hypothesis testing in appropriate circumstances.
Suppose that interest centres on a set of q < n 1 n 2 possibly non-linear restrictions on the elements of C , represented by the null hypothesis H 0 : r(γ ) = 0, where γ = vec(C ) and r(·) is a q × 1 vector whose elements are twice continuously differentiable functions of γ . Let R(γ ) = ∂r(γ )/∂γ ′ be the q × n 1 n 2 matrix of first derivatives, assumed to be of rank q. Then a Wald statistic for testing H 0 based onĈ 0 against the alternative H 1 : r(γ ) ̸ = 0 is given by whereγ 0 = vec(Ĉ 0 ) and A Wald statistic can also be defined usingĈ A 0 ; it is given by whereγ A 0 = vec(Ĉ A 0 ) and ) ⊗f 11.2 (0) −1 .
ForV A 0 we requiref 11.2 (0) to be a consistent estimator of f 11.2 (0); this can be achieved using the smoothed periodogram estimator , which is consistent under Assumptions 1 and 2. 8 The limiting distributions of these Wald statistics are given below.

Theorem 3. Under Assumptions 1 and 2, as T
q under H 0 . Asymptotic chi-square inference can therefore be conducted based on both band-limited spectral regression estimators. A simulation analysis of the finite sample properties of the estimators and Wald tests is provided in Section 4.

Simulation results
In this section we explore the finite sample properties of the spectral regression estimators and Wald statistics. Our focus is on a bivariate model with cointegrating parameter C = 1 so that y 1 − y 2 is stationary. One advantage of a simulation exercise is that data can be generated at any chosen frequency and aggregated as required. We can therefore investigate the properties of the estimators and tests in the following five sampling schemes: Scheme 1 is the infeasible case where high frequency observations on both variables are available i.e. y 1,τ h and y 2,τ h for τ = 1, . . . , N. The variables could be stocks or flows but their nature is irrelevant in this case as they satisfy the HFR.
Scheme 2 uses only low frequency observations of the variables that are generated by the HFR; in terms of the MFR in Lemma 1 we have x 1t = y 1t and x 2t = y 2t for t = 1, . . . , T . This would correspond to a situation where only low frequency observations on two stock variables were available despite them actually being generated at the high frequency.
Schemes 3-5 use combinations of averaged and/or low frequency observations. In scheme 3 we have x 1t = y 1t and x 2t = Y 2t ; scheme 4 uses x 1t = Y 1t and x 2t = y 2t ; and scheme 5 has x 1t = Y 1t and x 2t = Y 2t . These can correspond to various types of stock and flow sampling; for example, scheme 5 represents averaging a high frequency stock combined with a low frequency flow, or various sampling of flow variables with the high frequency observations being averaged.
The simulations take the data span to be T = 100 and the high frequency sampling interval to be h = 1/3, which leads to N = 300 high frequency observations. Data are generated at the highest frequency (i.e. y 1,τ h and y 2,τ h for τ = 1, . . . , N) and then aggregated as required. The high frequency bivariate innovations satisfy a first-order vector autoregression of the form where ϵ τ h is a Gaussian white noise process with an identity covariance matrix and the following autoregressive matrices were used: Ψ 0 = 0 2×2 (so that u τ h is also Gaussian white noise), Ψ 1 = 0.4I 2 and Ψ 2 = 0.8I 2 , the latter two having repeated roots of 2.5 and 1.25, respectively. We also consider ARCH innovations when the VAR matrices are Ψ 0 and Ψ 2 , in which case each ϵ j,τ h (j = 1, 2) is Gaussian with variance σ 2 j,τ h and σ j,τ h = αϵ 2 j,τ h−h with α = 0.9. There are, therefore, five different generating schemes under consideration and five different sampling schemes for each one. In all cases u 0 = (0, 0) ′ .
A total of 10,000 replications of each model for u τ h were conducted and estimates under each of the five sampling schemes were computed. In addition to the ordinary least squares (OLS) estimator of C we also consider three different spectral regression estimators as well as the time domain fully-modified (FM) OLS estimator, each using three different values of the bandwidth parameter m, resulting in thirteen different estimates of C in each replication. 9 The first spectral regression estimator isĈ 0 , defined in (15), in which the smoothed periodogram estimatorfξξ (0) is based on a set of OLS residuals,ξ t ; this estimator is denoted FD in what follows. The second is the augmented estimatorĈ A 0 , defined in (19), and is denoted FDA. The third estimator isĈ 0 but is based on an autoregressive spectral density estimator (ASDE) of f ξ ξ (0) rather than a smoothed periodogram estimator; this is denoted ASD. 10 The ASDE of f ξ ξ (0) first fits a first-order VAR to the OLS residuals and then computes the estimator of the spectral density matrix at the origin using the expression whereK andΣ v denote the estimated autoregressive and covariance matrices in the VAR, respectively. The choice of m is required to satisfy Assumption 2 and so we take m = [T δ ] in schemes 2-5 with δ ∈ {0.3, 0.5, 0.7}; for T = 100 this results in m ∈ {3, 10, 25}. In the infeasible high frequency case (scheme 1) we scale these values by k leading to m ∈ {9, 30, 75}. The estimators based on each choice of δ are distinguished by appending 1, 2 or 3 to their abbreviated name, corresponding to the three values of δ in increasing order. Hence FD1 refers toĈ 0 using δ = 0.3, FD2 refers toĈ 0 based on δ = 0.5, and so on.
The simulation results concerning the performance of the estimators of C are presented in Table 1. In view of the well-known trade-off between bias and variance in spectral density estimation the Table reports 51 196.53 191.99 231.36 196.19 191.08 217.07 196.85 192.75 191.67 196.66  in the infeasible HFR (scheme 1). The effect of increasing dependence in the VAR -moving from Ψ 0 (white noise) to Ψ 1 to Ψ 2 -leads to small increases in RMSE in schemes 1 and 5 and reductions in RMSE in schemes 2-4. The addition of ARCH effects tends to increase the RMSEs across all five sampling schemes. Overall the results in Table 1 indicate that the spectral estimators typically perform at least as well as, and sometimes better than, the FM estimators in the majority of cases, subject to m being chosen large enough. Among the class of spectral estimators the ASD variants tend to have higher RMSE than the FD and FDA versions. 11 It is also of interest to investigate the finite sample properties of Wald statistics based on the spectral and FM estimators. To do this we examine the size properties of the tests under the null hypothesis H 0 : C = 1 and the power properties under the alternative H 1 : C ̸ = 1 using the four fixed alternatives C = {0.95, 0.99, 1.01, 1.05}. The results are presented in Table 2 for h = 1/3 and the VAR model using Ψ 2 as well as its ARCH variant. In addition to the OLS-based test we report results for the spectral regression estimators using the largest number of periodogram ordinates (FD3, FDA3 and ASD3) as well as the FM estimator based on the smallest number of autocovariances (FM1) -these correspond to the estimators that typically have the lowest RMSE. The entries in Table 2 are percentages and those for power are size-adjusted; the nominal size of the tests is 5 percent. It can be seen from Table 2 that the OLS-based Wald tests show the greatest size distortions closely followed by the FM-based tests. The largest size distortions for the spectral-based tests occur (perhaps surprisingly) in the infeasible high frequency sampling scheme 1 while the size is much closer to the nominal level in schemes 2-5 (although the ASD3-based tests are notably under-sized). The size-adjusted power is good for the spectral-and FM-based tests across all sampling schemes and dominates that of the OLS-based tests substantially in schemes 2-5. These features, combined with the good performance of the spectral estimators in terms of RMSE, suggest that spectral regression provides a good platform for estimation and inference in cointegrated models with mixed frequency and mixed sample data using the simple averaging of high frequency data.
11 Results obtained when changing the level of aggregation to h = 1/12 show reductions in RMSE in the majority of cases when compared to Table 1. This is presumably due to the corresponding increase in the number of high frequency observations that are feeding into the averaged data in the MFR.

Concluding comments
This paper has proposed a simple method capable of exploiting fully the information contained in mixed frequency and mixed sample data in the estimation of cointegrating vectors. The properties of easy-to-compute spectral regression estimators of the cointegrating parameters have been derived, these being in the form of theoretical asymptotic properties as well as simulated finite sample properties. The proposed estimators belong to the class of optimal cointegration estimators defined by Phillips (1991c) and Wald statistics based on these estimators have asymptotic chi-square distributions. The finite sample performance of spectral regression estimators have good RMSE properties which are at least as small as those of a time domain fully modified estimator. The size and size-adjusted power properties of associated Wald statistics are also good.
The model analysed contains no deterministic components but it is a straightforward matter to deal with an intercept and time trend, for example. Demeaning and detrending the data by regression methods prior to the application of the frequency domain regression to estimate the cointegration parameters is valid in this framework and the limiting distributions are defined in terms of demeaned and detrended Brownian motion processes. Such an approach is valid because the cointegration parameters are assumed to be fixed, thereby avoiding the problems highlighted by Corbae et al. (2002) in band limited spectral regression in models in which the parameters are frequency-dependent. Alternatively the intercept and trend could also be estimated as part of the spectral regression procedure albeit at the cost of an increase in the dimension of the parameter vector.

Appendix. Proofs of lemmas and theorems
Proof of Lemma 1. We begin by noting that we can relate x 1t and x 2t to y 1t and y 2t , respectively, using the matrix filter relationships where the matrix filter functions are defined as Re-writing (6) in the form y 1t = Cy 2,t−1 + v 1t , where v 1t = u 1t + C w 2t (a stationary vector), we can pre-multiply by S 1 (L h ) to obtain Inspection of the individual sub-matrices of S 1 (z)CS 2 (z) −1 shows that we can write . Hence and we need to demonstrate that ξ 1t is stationary. From the definitions of S 1 (z) and v 1t the first component is clearly stationary. In the second, note that φ 1 (L h ) operates on x SL 2,t−1 = y SL 2,t−1 ; we then have φ 1 (L h )y SL 2,t−1 = k −1 s(L h )y SL 2,t−1 − y SL 2,t−1 = −δ SL 2,t−1 , which is stationary by Lemma A1. Also, φ 2 (L h ) operates on the remaining (aggregated) elements; for example, which is also stationary by Lemma A1. Similar results apply to Y FH 2,t−1 and Y FL 2,t−1 , so that is a vector of stationary components, implying that ξ 1t is also stationary. Finally, (10) is obtained by applying the operator which is also clearly stationary. □ Proof of Lemma 2. From Lemma A2 we can relate the partial sum of interest to that of u t as follows: The task is then to relate the partial sums involving fractions of T to the FCLT in Assumption 1 which deals with the high frequency process and partial sums involving a fraction of N. Following the proof of Lemma 1 of Miller (2016)  from which we obtain the last quantity being asymptotically negligible owing to the summation being over a finite interval and hence will converge to zero, as shown in Miller (2016). Now, the elements of G(z) are polynomials of order no greater than k − 1 so we can use Lemma 2.1 of Phillips and Solo (1992) to write where the elements ofG(z) are polynomials of order no greater than k − 2. We can then write, using T = hN, It follows that, as T → ∞, is a Brownian motion process with covariance matrix Ω as defined in the Lemma. □

Proof of Lemma 3. Let
. Then, from Hamilton (1994, p.268), the AGF of ξ t , measured in high frequency time units, is given by The aim is to first relate Γ ξ ,lh to Γ u,lh . The product s(z)s(z −1 ) is a two-sided scalar polynomial of order k − 1: ) .
Then, from the form of G(z) in Lemma A2, we find that When multiplied by s(z)s(z −1 ) these matrices will have additional terms involving where the coefficients are again implicitly defined. Hence each summand of interest, Γ u,lh , is multiplied by a finite-order scalar polynomial in z h of order 2k − 2 at most. We therefore need to consider quantities of the form (with p = 2k − 2) which implies that Γ ξ ,lh = ∑ p m=−p a m Γ u,lh−mh . Taking integer values of lh we obtain But p is finite and independent of T , so that We are then led to consider using Lemma A3(a) and as the limit involving the sum of w k is equal to 2; see Lemma A4(a).
(b) Proceeding in a similar way as in part (a) we find that where w k is as previously defined and We are then led to consider S 2,k+1 w k using Lemma A3(b) and Lemma A4(a) and where S 2,k = ∑ ∞ l=k Γ ξ 2,l . Using summation-by-parts the second term can be because S 2,k+1 − S 2,k+2 = Γ ξ 2,k+1 . Now S 2,T → 0 as T → ∞ while, from Lemma A4(a), hence the first term converges to zero. As for the second term we have, from Lemma A4(b), for all k, and so we deduce that, under Assumption 2, as required.
(c) We begin by using the decomposition ) and then proceed to show that each of the two terms in parentheses is o p (1). Note that It then follows that and so the quantity of interest iŝ Using the stochastic orders of magnitude already established we obtain (1) under Assumption 2. The second term of interest is o p (1) because Lemma 3 controls the rate of growth of the autocovariances of ξ t . This second term is a consistency result for the infeasible estimator based on the unobservable ξ t and follows, for example, from results in Hannan (1970) and Fuller (1996). □ Proof of Theorem 2. From Theorem 1(c) we can replacefξξ (0) with f ξ ξ (0) and so Using the definitions 2π f ξ ξ (0) = Ω and Ω 11.2 = Ω 11 − Ω 12 Ω −1 22 Ω 21 it can be shown that J ′ f ξ ξ (0) −1 J = 2π Ω −1 11.2 and J ′ f ξ ξ (0) −1 = 2π ( Ω −1 11.2 : −Ω −1 11.2 Ω 12 Ω −1
Then, based on Theorem 1(c), Similarly we find that using Theorem 1(a). Combining these limits yields the stated distribution forĈ A 0 . □ Proof of Theorem 3. We begin with W 0 and note that the limiting distribution ofγ 0 has the representation Then, from the proof of Lemma 5.1 in Park and Phillips (1988), where the elements ofγ lie on the line segment betweenγ 0 and γ . Under H 0 , r(γ ) = 0 while the consistency ofγ 0 ensures that R(γ ) p → R(γ ) = R 0 . Then it follows that This limiting distribution, conditional on B 2 , is N(0, R 0 QR ′ 0 ) where Q = M −1 22 ⊗ Ω 11.2 . Theorem 1 implies that 2m + 1 2T 2V 0 d → Q −1 and hence we are led to consider Tr(γ 0 ).
The limiting distribution of this quantity, conditional on B 2 , involves a quadratic form in N(0, R 0 QR ′ 0 ) random variables weighted by the matrix (R 0 QR ′ 0 ) −1 , and hence is χ 2 q . But because this does not depend on B 2 it is also the unconditional distribution. Similar arguments apply to W A 0 . □

Supplementary Lemmas
The following Lemma is used in the proof of Lemma 1. It is more general than is actually required in the proof of Lemma 1 (which uses the result for j = 0) but the additional cost of showing that it holds for j = 1, . . . , k − 1 i.e. at any point in the interval over which the aggregation takes place, is minimal.
Proof of Lemma A1. We first write δ t−jh = s δ,j (L h )y t where s δ,j (z) = z j − k −1 s(z) and s(z) is defined following (6). The spectral density matrix of δ t−jh is then given by where f y (λ) is the pseudo-spectrum of y t satisfying f y (λ) = O(λ −2 ) as λ → 0. Now the leading term of which is ) .