The estimation of continuous time models with mixed frequency data

This paper derives exact representations for discrete time mixed frequency data generated by an underlying multivariate continuous time model. Allowance is made for different combinations of stock and flow variables as well as deterministic trends, and the variables themselves may be stationary or nonstationary (and possibly cointegrated). The resulting discrete time representations allow for the information contained in high frequency data to be utilised alongside the low frequency data in the estimation of the parameters of the continuous time model. Monte Carlo simulations explore the finite sample performance of the maximum likelihood estimator of the continuous time system parameters based on mixed frequency data, and a comparison with extant methods of using data only at the lowest frequency is provided. An empirical application demonstrates the methods developed in the paper and it concludes with a discussion of further ways in which the present analysis can be extended and refined. © 2016 The Author. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
In multivariate models it is not uncommon to find that the variables of interest are observed at different frequencies. A leading example where this arises is in models containing both financial and macroeconomic variables, the former being observable at high frequencies (e.g. daily, hourly or minute-by-minute) and the latter at lower frequencies (often monthly or quarterly). The default method in such cases is usually to express all data in terms of a common frequency, typically by aggregating the high frequency data to the lowest frequency, which inevitably results in a lot of information contained in the higher frequency data being discarded. If utilising this information can lead to better inferences about the relationships between variables, then it is important to develop methods that enable the information contained in the high frequency data to be retained alongside the low frequency data.
In recent years a variety of methods have been proposed to deal with mixed frequency data. Prominent among these in a regression framework has been the Mixed Data Sampling (MIDAS) approach proposed by Ghysels et al. (2002, 2006). A typical model in this framework contains a dependent variable that is observed less frequently than the explanatory variables, and the MIDAS method makes use of a polynomial weighting scheme to aggregate the high frequency regressors. A key feature of this approach is that the weighting scheme depends only on a small number of unknown parameters (possibly only one or two), which makes this a parsimonious method of aggregating the high frequency data. The basic MIDAS approach has been refined further by Ghysels et al. (2007), Andreou et al. (2010) and Foroni et al. (2015), among others.
* Tel.: +44 0 1206 872756; fax: +44 0 1206 872724. E-mail address: mchamb@essex.ac.uk.
Another area in which progress has been made recently in the analysis of mixed frequency data is that of cointegration. Miller (2010) considers a general cointegrating regression in which the integrated regressors may be mismeasured, missing, observed at different frequencies or have certain other types of irregularity, and derives the asymptotic properties of least squares and related inferential techniques. Issues of testing for the presence of cointegration amongst mixed frequency (and temporally aggregated) data series have been explored by Ghysels and Miller (2015, 2014), while Seong et al. (2013) consider estimation of vector error correction models with mixed frequency data via state space representations and the Kalman filter. A common implication of this body of work is that incorporating the additional information in high frequency data, rather than discarding it, can lead to better properties of estimators and inferential procedures and to improved forecasts.
To date, the analysis of mixed frequency data has, with one notable exception discussed below, been conducted firmly within the realm of discrete time models. A common approach is to consider a vector of variables observed at two different frequencies, one being high (e.g. weekly), say $y^H$, and one being low (e.g. quarterly), say $y^L$. Suppose the low frequency variables are observed once every $k$ (high frequency) time periods, where $k$ is an integer. Then the aim is to specify a model for the vector that stacks, for each of the $T_L$ low frequency periods, the $k$ high frequency observations on $y^H$ together with the corresponding observation on $y^L$, where $T_L$ denotes the number of low frequency observations. Recent work in this vein has focused on vector autoregressive (VAR) models, variants of which are in widespread use in empirical macroeconomics and finance where data are often available at different frequencies. Examples of this approach include Anderson et al. (2015), Foroni and Marcellino (2013, 2014a,b), and Ghysels (2014), which collectively cover issues of identification, estimation, inference, and impulse response analysis in the context of regular and structural VARs.¹ A common feature of this approach is that the high frequency time scale is, in effect, assumed to be the fundamental time scale determining the dynamic relationships between the variables and, hence, the model specification is tied to the (arbitrary) highest sampling frequency.
The notable exception referred to at the start of the last paragraph is Zadrozny (1988), who considers a continuous time autoregressive moving average (ARMA) model² with mixed quarterly and annual data for a set of stock and flow variables. The model is cast in state space form and Kalman filter recursions are used to compute the Gaussian likelihood function. An advantage of the continuous time specification is that the model of interest, whose parameters are estimated, is not tied to the (highest) sampling frequency. Instead, the dynamic process of interest operates more frequently (i.e. continuously) than the highest observation frequency.
This is a more realistic setting for many financial variables, such as financial assets, that are traded (nearly) continuously. It is also, arguably, more relevant for macroeconomic aggregates which, although only observed at low frequencies, may be subject to changes, in response to stimuli, at any point within the sampling interval.³
In this paper we consider the estimation of continuous time models formulated as a system of stochastic differential equations when the observed data are recorded at different frequencies.
The approach is structural in the sense that the parameters of interest are those of the continuous time model which govern the (unobservable) dynamics of the observed variables. The temporal aggregation of stock and flow variables is taken into account in the derivation of exact discrete models, which have the property that data generated by the continuous time system satisfy these discrete time representations exactly; there is no approximation error involved in these representations of the discrete time data. We treat the cases of common sampling (i.e. all stocks or all flows) and mixed sampling (a combination of stocks and flows), both with mixed frequency data. The discrete time representations have wide applicability because very few restrictions are placed on the underlying continuous time system, allowing for nonstationary (including cointegrated) series, as well as stationary series.
The paper is organised as follows. Section 2 introduces the mixed frequency sampling framework and defines the continuous time system under consideration. It also motivates the idea that discarding information by aggregating to a lower frequency has adverse impacts on the properties of estimators by reporting some simulation results based on a simple univariate continuous time model. Section 3 is concerned with the derivation of exact discrete time models in the case of common sampling with mixed frequencies, results being reported for the cases of stock sampling (Theorem 1) and flow sampling (Theorem 2). Section 4 considers the more complicated situation in which there is a mixed sample of stock and flow variables, the exact representation being reported in Theorem 3. Estimation, based on the Gaussian likelihood, is discussed in Section 5, which also covers some computational issues and reports the results of a simulation exercise involving stationary as well as cointegrated stock variables observed at mixed frequencies. Use of the exact discrete time model for mixed frequency data is shown to result in estimators with smaller bias and root mean square error (RMSE) than estimators based on extant methods of aggregating all data to the lowest frequency. An empirical application is provided in Section 6 in which monthly price indices for the UK and US are combined with daily, weekly and monthly exchange rate data in an investigation of long run purchasing power parity (PPP) relationships. Results based on the lower frequency exchange rate data do not reject the PPP restrictions, but this appears to be due to the estimated parameters having large standard errors; they are not sufficiently informative to be able to reject the restrictions. Use of the daily exchange rate data, however, results in parameter estimates with much smaller standard errors, which lead to a strong rejection of the PPP restrictions. Section 7 concludes the paper and points to some directions for further research, and proofs of the three theorems, as well as some additional results, are provided in the Appendix.
¹ Foroni and Marcellino (2014b) also consider DSGE models under mixed frequency sampling.
² Zadrozny (1988) also extends the ARMA model to allow for exogenous variables.
³ This argument has been made most eloquently by Bergstrom (1990).

Temporal aggregation and data sampling: the modelling framework
The general framework is concerned with the $n \times 1$ vector $y(t) = [y_1(t)', y_2(t)']'$, where $y_i(t)$ is $n_i \times 1$ ($i = 1, 2$) and $n_1 + n_2 = n$. Attention is given to the situation where the variables are observed at two different sampling intervals (or frequencies), although the methods can be generalised to account for more than two frequencies. The sub-vector $y_1$ contains the variables observed at the highest frequency while $y_2$ contains the variables observed at the lowest frequency; these shall be referred to as the high frequency variables and the low frequency variables, respectively. For convenience (and without loss of generality) the time index shall be normalised to unity to correspond to the low frequency sampling interval. The low frequency observations are therefore, for stock variables, of the form $\{y_{2t}\}_{t=0}^{T} = \{y_{20}, y_{21}, \ldots, y_{2T}\} = \{y_2(0), y_2(1), \ldots, y_2(T)\}$, while for flow variables the observed vector sequence is $\{Y_{2t}\}_{t=0}^{T} = \{Y_{20}, Y_{21}, \ldots, Y_{2T}\}$, where $Y_{2t} = \int_{t-1}^{t} y_2(r)\,dr$. In both cases $T$ is regarded as being the effective sample size in view of the dynamic models derived in subsequent sections being conditioned on an initial value (either $y_{20}$ or $Y_{20}$). In this framework the effective span is also equal to $T$ in view of the low frequency sampling interval being normalised to unity. The sampling interval for the high frequency variable will be denoted $h$, so that for stock variables the observations are of the form $\{y_{1,\tau h}\}_{\tau=0}^{N} = \{y_{10}, y_{1h}, \ldots, y_{1,Nh}\} = \{y_1(0), y_1(h), \ldots, y_1(Nh)\}$, while for flow variables the sequence is $\{Y_{1,\tau h}\}_{\tau=0}^{N}$, where $Y_{1,\tau h} = h^{-1}\int_{\tau h-h}^{\tau h} y_1(r)\,dr$. For the high frequency variables, therefore, $N$ denotes the (effective) number of observations, where $Nh = T$ (implying that $N = T/h$). It is also convenient to assume that the high sampling frequency (the inverse of the sampling interval) is an integer; this will be denoted $k$ and clearly $k = h^{-1}$.
For example, if the low frequency data are observed annually and the high frequency data monthly, then $h = 1/12$ and $k = 12$, reflecting the fact that the high frequency data are sampled twelve times more frequently than the low frequency data. Note, too, that the high frequency flow variables have been normalised by dividing by $h$, which expresses the observed flow in terms of the low frequency equivalent. For example, if $y_1$ represents the rate of flow of consumers' income and it is observed quarterly, so that $h = 1/4$, then dividing by $h$ expresses the observed quarterly flow as an annual equivalent. Although this normalisation is not essential in stationary models it does have some importance in nonstationary/cointegrated systems; see, for example, Chambers (2011, p. 160). Fig. 1 illustrates the relationship between the high and low frequency sampling schemes; note that $kh = 1$, and in effect $\tau = tk$ or $\tau h = tkh = t$.
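The sampling arithmetic above can be sketched in a few lines of Python. This is only an illustration of the relations $k = h^{-1}$, $Nh = T$ and the division of high frequency flows by $h$; the function names `sampling_setup` and `normalise_flow` are hypothetical and do not come from the paper.

```python
from fractions import Fraction

def sampling_setup(k, T):
    """Return (h, N) when the high frequency data are sampled k times per
    low frequency period over a span of T low frequency periods."""
    h = Fraction(1, k)      # high frequency sampling interval, h = 1/k
    N = int(T / h)          # number of high frequency observations, Nh = T
    return h, N

def normalise_flow(observed_flow, h):
    """Express a flow observed over an interval of length h as its
    low frequency equivalent by dividing by h."""
    return float(observed_flow / h)

# Annual low frequency, monthly high frequency, span of 25 years.
h, N = sampling_setup(k=12, T=25)

# A quarterly income flow of 100 expressed as an annual equivalent.
annual_equivalent = normalise_flow(100.0, Fraction(1, 4))
```

Exact fractions are used for $h$ so that $N = T/h$ is an integer without floating point rounding.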
At this stage it is, perhaps, useful to give some indication as to the importance of using higher frequency data when available, and to do this a simple univariate example is provided. Suppose $y(t)$ evolves according to the stochastic differential equation

$$dy(t) = a\,y(t)\,dt + \sigma\,dW(t), \quad t > 0, \tag{1}$$

where $W(t)$ denotes a Wiener process, $y(0) = 0$ and $a < 0$ to ensure stationarity. Furthermore, in accordance with the framework defined above, suppose that $y(t)$ is a stock variable observed at the points $h, 2h, \ldots, Nh = T$. Three sampling intervals are considered, $h = 1, 1/4, 1/12$, as well as two spans, $T = 25, 100$, and two values for the autoregressive parameter, $a = -5, -1$. One interpretation of these values is that if $h = 1$ corresponds to a year, then $h = 1/4$ and $h = 1/12$ correspond, respectively, to quarterly and monthly sampling intervals, while the two spans correspond to 25 and 100 years. The observed data can be shown to satisfy

$$y_{\tau h} = e^{ah}\,y_{\tau h-h} + \epsilon_{\tau h}, \quad \tau = 1, \ldots, N,$$

in which the disturbances $\epsilon_{\tau h}$ are serially uncorrelated with variance $\sigma^2(e^{2ah} - 1)/(2a)$. The results from 10,000 replications appear in Table 1, in which the bias and root mean square error (RMSE) of the maximum likelihood estimator of $a$ are reported, the likelihood function having been concentrated with respect to the parameter $\sigma^2$. For all four combinations of $a$ and $T$ it can be seen that increasing the sampling interval $h$ (i.e. decreasing the sampling frequency) leads to an increase in both the bias and RMSE of the estimator. Furthermore, for a given value of $a$, increasing the span $T$ for a given frequency (and thereby increasing sample size $N$) also reduces bias and RMSE. Also, for given span $T$, the bias and RMSE are smaller the closer is $a$ to zero. These results suggest that it is desirable to use the highest frequency data available if at all possible.⁴
⁴ Abstracting, of course, from other complications that may arise when sampling more frequently, such as seasonality and microstructure noise.
Returning to the more general framework it is assumed in what follows that the $n \times 1$ vector $y(t)$ satisfies the stochastic differential equation system

$$dy(t) = [\mu + \gamma t + A\,y(t)]\,dt + \zeta(dt), \quad t > 0, \tag{2}$$

where $\mu$ and $\gamma$ are $n \times 1$ vectors of intercept and trend parameters, $A$ is an $n \times n$ matrix of coefficients, and $\zeta(dt)$ is an $n \times 1$ vector of (white noise) random measures satisfying $E[\zeta(dt)] = 0$, $E[\zeta(dt)\zeta(dt)'] = \Sigma\,dt$ and $E[\zeta(\Delta_1)\zeta(\Delta_2)'] = 0$ for non-overlapping time intervals $\Delta_1$ and $\Delta_2$; see, for example, Bergstrom (1984) for details of random measures and their use in econometrics.
The aim is to estimate the vectors µ and γ and the matrices A and Σ. The elements of these vectors and matrices may be entirely unrestricted (apart from ensuring the symmetry and positive definiteness of Σ) or they may be known functions of an underlying parameter vector θ. Either way it is necessary to relate the parameters of the system (2) to the observed data.
The system of equations in (2) can be regarded as a continuous time VAR(1) with a deterministic trend, and the focus is on deriving exact discrete models that have the property that data generated by (2) satisfy these discrete time representations exactly and, moreover, are expressed in terms of both the high and low frequency variables. An alternative approach, adopted by Zadrozny (1988), is to write the system in state space form and to use the Kalman filter to derive the Gaussian likelihood function. Zadrozny (1988) deals with a more general continuous time dynamic system, namely a continuous time ARMA (p, q) (without deterministics), but the precise relationships between the discrete time observations are not needed in his approach. It can be of interest, however, to derive explicitly these exact discrete time models and to compare them with the existing mixed frequency VARs that have appeared in the literature, and this partly motivates the approach followed here. It is nevertheless possible, in principle, to derive discrete time models for more general continuous time ARMA (p, q) systems with mixed frequency data by extending the methods of Chambers and Thornton (2012), albeit at the cost of additional notational complexity; such extensions are left for future work with the current contribution focusing on the principles of dealing with mixed frequency data in the setting of (2).
It is perhaps worth reiterating that the usual approach to estimating the parameters of (2) with data observed at mixed frequencies is to aggregate the high frequency data to coincide with the lowest frequency, thereby discarding information contained in the high frequency data. The approach adopted here, however, deals with the solution to (2) in terms of the high frequency timescale and then manipulates the resulting expressions so that no data are discarded. The precise formulations are given in the following sections.
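The univariate example earlier in this section can be mimicked with a small simulation. The sketch below is not the paper's code: it assumes the exact AR(1) form of the stock observations, uses the fact that the conditional Gaussian ML estimator of $a$ is the log of the least squares AR(1) coefficient divided by $h$ (whenever that coefficient is positive), and runs far fewer replications than the 10,000 behind Table 1. All function names are hypothetical.

```python
import numpy as np

def simulate_stock_observations(a, sigma, h, T, rng):
    """Simulate y(0), y(h), ..., y(Nh) exactly from dy = a*y*dt + sigma*dW."""
    N = int(round(T / h))
    phi = np.exp(a * h)                                   # exact AR(1) coefficient
    sd = sigma * np.sqrt((np.exp(2 * a * h) - 1.0) / (2.0 * a))
    shocks = rng.normal(0.0, sd, size=N)
    y = np.zeros(N + 1)                                   # y(0) = 0
    for tau in range(1, N + 1):
        y[tau] = phi * y[tau - 1] + shocks[tau - 1]
    return y

def estimate_a(y, h):
    """Conditional ML estimate of a with sigma^2 concentrated out.
    Returns NaN when the estimated AR coefficient is not positive."""
    phi_hat = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])
    return np.log(phi_hat) / h if phi_hat > 0 else np.nan

def bias_and_rmse(a, h, T, reps=200, seed=0):
    """Monte Carlo bias and RMSE of the estimator (assumed sigma = 1)."""
    rng = np.random.default_rng(seed)
    est = np.array([estimate_a(simulate_stock_observations(a, 1.0, h, T, rng), h)
                    for _ in range(reps)])
    err = est[~np.isnan(est)] - a
    return err.mean(), np.sqrt((err ** 2).mean())

# Monthly versus annual sampling of the same 100-year span.
bias_hf, rmse_hf = bias_and_rmse(a=-1.0, h=1 / 12, T=100)
bias_lf, rmse_lf = bias_and_rmse(a=-1.0, h=1.0, T=100)
```

Comparing `rmse_hf` with `rmse_lf` gives a rough sense of the loss from discarding the intermediate observations, in the spirit of Table 1.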

Models with common data sampling
In this section discrete time models are derived for the situations in which the high and low frequency variables are either all stocks or all flows. The next section deals with the mixed sample case. The starting point is the solution to (2), which is given by

$$y(t) = e^{At}y(0) + \int_0^t e^{A(t-s)}(\mu + \gamma s)\,ds + \int_0^t e^{A(t-s)}\zeta(ds), \quad t > 0, \tag{3}$$

where the matrix exponential is defined as

$$e^{At} = I_n + \sum_{j=1}^{\infty} \frac{(At)^j}{j!}, \tag{4}$$

where $I_n$ denotes the $n \times n$ identity matrix. The above solution is unique in a mean square sense and the definition of the integral with respect to the random measure can be found in Bergstrom (1984).
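As a rough numerical companion to the matrix exponential above, the following sketch evaluates the power series directly and checks that the high frequency transition matrix $e^{Ah}$, raised to the power $k = h^{-1}$, reproduces the unit-interval matrix $e^{A}$. The matrix `A` is an arbitrary illustrative choice and `expm_series` is a hypothetical helper; a truncated series is adequate here only because the matrices have small norm.

```python
import numpy as np

def expm_series(M, terms=40):
    """Truncated power series I + M + M^2/2! + ... for the matrix exponential."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for j in range(1, terms):
        term = term @ M / j          # term now holds M^j / j!
        result = result + term
    return result

# Illustrative (assumed) continuous time coefficient matrix.
A = np.array([[-1.0, 0.5],
              [0.2, -0.8]])
k = 12                               # monthly observations within a year
h = 1.0 / k

F = expm_series(A * h)               # transition matrix over an interval h
F_unit = expm_series(A)              # transition matrix over the unit interval
```

No invertibility of `A` is needed for the series, which matches the later observation that the representations do not require `A` to be nonsingular.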

Stock variables
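When both $y_1$ and $y_2$ are stock variables the objective is to derive a discrete time model that simultaneously incorporates the low frequency observations $y_{2t} = y_2(t)$ ($t = 0, \ldots, T$) and the high frequency observations $y_{1,\tau h} = y_1(\tau h)$ ($\tau = 0, \ldots, N$). What this effectively reduces to is finding a representation that holds for the points $t = 1, \ldots, T$, given the value for $t = 0$, but which also contains the intermediate points $t - h, t - 2h, \ldots, t - 1 + h$ between each $t$ and $t - 1$; these intermediate points correspond to the observations on $y_1$. The solution (3) can be used to relate $y(t)$ to $y(t - h)$ in the form⁵

$$y(t) = c(t) + F\,y(t-h) + \epsilon(t), \tag{5}$$

where $F = e^{Ah}$ and the deterministic vector $c(t)$ and random disturbance vector $\epsilon(t)$ are defined by

$$c(t) = \int_{t-h}^{t} e^{A(t-s)}(\mu + \gamma s)\,ds, \tag{6}$$

$$\epsilon(t) = \int_{t-h}^{t} e^{A(t-s)}\zeta(ds). \tag{7}$$

For the purpose of deriving discrete time representations for the observed mixed frequency data, it is convenient to partition the system (5) in accordance with $y_1$ and $y_2$ as follows:

$$y_1(t) = c_1(t) + F_{11}\,y_1(t-h) + F_{12}\,y_2(t-h) + \epsilon_1(t), \tag{8}$$

$$y_2(t) = c_2(t) + F_{21}\,y_1(t-h) + F_{22}\,y_2(t-h) + \epsilon_2(t), \tag{9}$$

in which

$$c(t) = \begin{bmatrix} c_1(t) \\ c_2(t) \end{bmatrix}, \quad F = \begin{bmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{bmatrix}, \quad \epsilon(t) = \begin{bmatrix} \epsilon_1(t) \\ \epsilon_2(t) \end{bmatrix}.$$

This autoregressive representation depicts the law of motion for both the high and low frequency variables, but the problem in using it as a basis for estimation is that $y_2(t)$ is not observed at intervals of length $h$; this variable is only observed when $t$ is an integer. Nevertheless, (8) and (9) form the basis for a discrete time representation which is presented in Theorem 1.
Theorem 1. Let $y(t)$ be generated by (2) with $y_1$ and $y_2$ consisting of stock variables which are observed as $y_{1,\tau h} = y_1(\tau h)$ ($\tau = 0, \ldots, N$) and $y_{2t} = y_2(t)$ ($t = 0, \ldots, T$). Then the observations satisfy, for $t = 1, \ldots, T$, the discrete time system (10)-(13), in which the trend vectors $b_{jt}$ and the coefficient matrices $B_{ij,k}$ and $B_{22}$ are functions of $c(t)$ and $F$, and the disturbance vector $\eta_t$ is vector white noise with covariance matrix $\Omega_\eta$.
⁵ In (5), $t$ can take any value corresponding to $\tau h$.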

Remark 1. A key feature of the discrete time representation in Theorem 1 is that information contained in the high frequency series, $y_1$, is not discarded, because the intermediate observations on $y_1$ between each $t - 1$ and $t$, given by $y_{1,t-kh}, y_{1,t-1+h}, y_{1,t-1+2h}, \ldots, y_{1,t-h}$, are included on the right-hand sides of (10)-(13). This contrasts with extant methods based on aggregating the high frequency observations to the lowest frequency; such a system would then be given by (8) and (9) with $h = 1$.

Remark 2. It is, perhaps, of use to consider a particular example to illustrate the nature of the discrete time representation in Theorem 1. Setting $h = 1/3$ and taking time units to correspond to a quarter of a year implies that the high frequency variables are observed monthly and the low frequency variables quarterly.⁶ Then $k = h^{-1} = 3$ and the resulting system of equations is given in Box I.
Note that $y_{1,t-3h} = y_{1,t-1}$ and that, for each quarter, there is an equation for $y_2$ plus three equations for $y_1$, corresponding to each month in the quarter. None of these equations contains lags beyond those dated $t - 1$, and all of the monthly observations on $y_1$ are included in the low frequency equation for $y_2$, thereby incorporating all the available intermediate information on the high frequency variable.
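The mechanism behind the low frequency equation in the monthly/quarterly example can be checked numerically. The sketch below is an illustration under stated assumptions rather than the paper's Box I: it sets the deterministic term $c(t)$ to zero, draws an arbitrary transition matrix in place of $e^{Ah}$, and verifies the identity obtained by repeatedly substituting (9) into itself, namely $y_2(t) = F_{22}^k y_2(t-1) + \sum_{j=0}^{k-1} F_{22}^j [F_{21} y_1(t-(j+1)h) + \epsilon_2(t-jh)]$. The notation of Theorem 1 itself ($b_{jt}$, $B_{ij,k}$) is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, k = 2, 1, 3                       # two monthly stocks, one quarterly stock
n = n1 + n2

# Hypothetical transition matrix standing in for F = exp(A h); c(t) = 0 is assumed.
F = 0.5 * rng.standard_normal((n, n))
F21, F22 = F[n1:, :n1], F[n1:, n1:]

# Simulate one low frequency period: y[0] = y(t-1), y[i] = y(t-1+ih), y[k] = y(t).
y = [rng.standard_normal(n)]
eps = []
for _ in range(k):
    shock = rng.standard_normal(n)
    eps.append(shock)
    y.append(F @ y[-1] + shock)

def low_frequency_rhs(y, eps):
    """Right-hand side of the back-substituted equation for y_2(t)."""
    total = np.linalg.matrix_power(F22, k) @ y[0][n1:]
    for j in range(k):
        # y[k-1-j] is y(t-(j+1)h); eps[k-1-j] is the shock dated t-jh.
        total = total + np.linalg.matrix_power(F22, j) @ (
            F21 @ y[k - 1 - j][:n1] + eps[k - 1 - j][n1:])
    return total

y2_from_substitution = low_frequency_rhs(y, eps)
y2_simulated = y[k][n1:]
```

Only $y_2(t-1)$ and the monthly values of $y_1$ appear on the right-hand side, which is why no unobserved intermediate values of $y_2$ are needed.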
Remark 3. Following on from the example above it is worth commenting that the coefficient vectors and matrices are solely functions of the parameters of the continuous time system, even though the discrete time system is in the form of a mixed frequency VAR. To emphasise this feature, the continuous time system contains $2n$ trend parameters ($\mu$ and $\gamma$), $n^2$ autoregressive parameters ($A$), and $n(n + 1)/2$ covariance parameters ($\Sigma$). By way of contrast the unrestricted discrete time mixed frequency VAR contains $6n_1 + 2n_2$ trend parameters ($b_{jt}$), $6n_1^2 + 6n_1n_2 + n_2^2$ autoregressive parameters ($B_{ij,k}$ and $B_{22}$), and $(9n_1^2 + 6n_1n_2 + n_2^2 + 3n_1 + n_2)/2$ covariance parameters. A simple calculation reveals that the process of temporal aggregation imposes a total of $9n_1^2 + 6n_1n_2 + 5n_1$ restrictions. The parsimony achieved by taking the temporal aggregation into account is even greater with mixed frequency data than with data observed at a common frequency, and greater still with flow data, as will be seen below.
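The "simple calculation" in Remark 3 can be reproduced mechanically. The sketch below encodes the parameter counts stated in the remark for the $k = 3$ example and confirms that their difference equals the stated number of restrictions; the function names are hypothetical.

```python
def continuous_time_count(n1, n2):
    """Parameters in the continuous time system: mu, gamma, A and Sigma."""
    n = n1 + n2
    return 2 * n + n * n + n * (n + 1) // 2

def mixed_frequency_var_count(n1, n2):
    """Parameters in the unrestricted mixed frequency VAR of Remark 3 (k = 3)."""
    trend = 6 * n1 + 2 * n2
    autoregressive = 6 * n1 ** 2 + 6 * n1 * n2 + n2 ** 2
    covariance = (9 * n1 ** 2 + 6 * n1 * n2 + n2 ** 2 + 3 * n1 + n2) // 2
    return trend + autoregressive + covariance

def restrictions(n1, n2):
    """Number of restrictions stated in Remark 3."""
    return 9 * n1 ** 2 + 6 * n1 * n2 + 5 * n1

# One monthly and one quarterly variable: 31 discrete time parameters
# against 11 continuous time parameters, i.e. 20 restrictions.
example = (mixed_frequency_var_count(1, 1), continuous_time_count(1, 1), restrictions(1, 1))
```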
Remark 4. Even if the underlying continuous time system is ignored, and (8) and (9) are regarded as the VAR of interest that operates at the high frequency sampling interval $h$, Theorem 1 still provides the correct form of aggregated system for stock variables. This is because the $b_{jt}$, $B_{ij,k}$ and $B_{22}$ are expressed as functions of $c(t)$ and $F$, which again provides parsimony over a mixed frequency VAR written directly in terms of the discrete time observations, as in (10)-(13). Moreover, the representation takes into account the restrictions on the covariance matrix that arise from temporal aggregation, a feature that is often ignored in standard treatments of mixed frequency data.
Remark 5. The discrete time representation in (10)-(13) holds for both stationary and non-stationary (including cointegrated) continuous time systems in view of no restrictions having been placed on the continuous time system matrix $A$. For example, neither $A$, $F$ nor any of their sub-matrices are required to be nonsingular.
Remark 6. Theorem 1 shows, for stock variables, that a continuous time autoregressive model of order one translates into a discrete time autoregressive model of order one with mixed frequency data, as in the case where a common sampling frequency exists. The difference here is that accounting for mixed frequencies results in a more complicated pattern of restrictions on the discrete time data, and the covariance matrix $\Omega_\eta$ reflects the presence of the higher frequency components in its construction.

Flow variables
When both high and low frequency variables are observed as flows it is necessary to integrate the system to express it in terms of the observations. This could be achieved by integrating (5) directly or by integrating the discrete time representation in Theorem 1 itself, and it is the latter approach that is followed here. Although the low frequency variable, $y_2$, is observed as an integral over $(t - 1, t]$, it is actually convenient to integrate instead over the interval $(t - h, t]$ that corresponds to the high frequency observations, as in the case of stock variables above. In manipulating these systems to eliminate unobservables (in particular the integral of the low frequency variable over intervals of length $h$) frequent use is made of the filter function $s(L^h)$, where

$$s(z) = 1 + z + \cdots + z^{k-1} = \sum_{j=0}^{k-1} z^j$$

and $L$ denotes the lag operator.⁷ This is because the observable integral of $y_2$, denoted $Y_{2t}$, can be regarded as the sum of the unobservable integrals over intervals of length $h$, denoted $Y^u_{2t} = \int_{t-h}^{t} y_2(r)\,dr$, as follows:

$$Y_{2t} = \int_{t-1}^{t} y_2(r)\,dr = \sum_{j=0}^{k-1} Y^u_{2,t-jh} = s(L^h)\,Y^u_{2t}.$$

Some properties of a convolution of $s(z)$ with another matrix filter are given in Lemma 1 in the Appendix and are used in the proof of Theorem 2.

Theorem 2. Let $y(t)$ be generated by (2) with $y_1$ and $y_2$ consisting of flow variables which are observed as $Y_{1,\tau h} = h^{-1}\int_{\tau h-h}^{\tau h} y_1(r)\,dr$ ($\tau = 0, \ldots, N$) and $Y_{2t} = \int_{t-1}^{t} y_2(r)\,dr$ ($t = 0, \ldots, T$). Furthermore, define the aggregated high-frequency flow variables $Y^s_{1t} = s(L^h)\,Y_{1t}$. Then $Y^s_1$ and $Y_2$ satisfy, for $t = 1, \ldots, T$, the discrete time system (15)-(18), in which the disturbance vector $u_t$ is a vector MA(1) process with autocovariance matrices $\Omega_{u0}$ and $\Omega_{u1}$, and where the remaining components are defined in the proof of the theorem.
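The action of the filter $s(L^h)$ can be illustrated with a short sketch: applied to a sequence of integrals over intervals of length $h$, it returns rolling sums over $k$ consecutive sub-intervals, and sampling that output every $k$-th point recovers the low frequency flow. The function name `apply_s_filter` and the toy data are assumptions made for the illustration.

```python
import numpy as np

def apply_s_filter(x, k):
    """Apply s(L^h) = 1 + L^h + ... + L^{(k-1)h} to a sequence sampled at
    interval h. Entry i of the result is x[i] + x[i+1] + ... + x[i+k-1],
    i.e. the filtered value dated at the end of that window."""
    x = np.asarray(x, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    return csum[k:] - csum[:-k]

# Toy sub-interval integrals over 12 months (k = 3 months per quarter).
k = 3
sub_interval_flows = np.arange(1.0, 13.0)

rolling = apply_s_filter(sub_interval_flows, k)
quarterly_flows = rolling[::k]            # filter output at quarter ends
```

In the setting of this section the sub-interval integrals of $y_2$ are unobservable, so the filter is used algebraically to replace them by the observable $Y_{2t}$ rather than computed from data as above.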
Remark 7. The discrete time representation in Theorem 2 contains aggregated versions of the high frequency flows. It would be possible, if desired, to express the equations in terms of $Y_{1t}$ and its lags directly, although there is nothing to be gained from doing so for estimation purposes.⁸ One application where this would be necessary is forecasting, but all that is really needed is to express $Y^s_{1t}$ in the form

$$Y^s_{1t} = \sum_{j=0}^{k-1} Y_{1,t-jh}$$

and to incorporate the lags on the right-hand side of (15).
Remark 8. The discrete time representations in (15)-(18) contain additional lags of the high frequency variable $y_1$ owing to the fact that additional backward substitution is required to eliminate all unobservable components (which arise because of the nature of flow variables). The resulting discrete time equations implicitly contain all observations on the high frequency variable over the interval $(t - 2, t]$ that provide additional information regarding the dynamics of the system and which may potentially result in better estimates of the underlying continuous time parameters.
Remark 9. As in the case of stock variables no assumptions have been made concerning the continuous time parameter matrix A that governs stationarity. As a result the discrete time representation in Theorem 2 is also applicable to both stationary and non-stationary (including cointegrated) systems.

Remark 10. A key difference of the discrete model for flows when compared to that for stocks is the presence of a moving average component in the disturbance vector. This is a common feature when flow variables are concerned and arises due to the integration involved in determining the observations; see Working (1960).
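The moving average effect noted in Remark 10 can be seen in the simplest discrete setting, which is the one usually associated with Working (1960): first differences of non-overlapping averages of a random walk. The sketch below is only an analogy to the continuous time flows of this section, not the paper's system. It computes the first-order autocorrelation exactly from the triangular weights that the averaged differences place on the underlying increments; the function name is hypothetical.

```python
import numpy as np

def averaged_random_walk_autocorr(m):
    """First-order autocorrelation of first differences of averages taken
    over non-overlapping blocks of m points of a random walk with
    uncorrelated, unit-variance increments."""
    # Each difference loads on 2m-1 increments with weights 1,2,...,m,...,2,1 (over m).
    w = np.concatenate((np.arange(1, m + 1), np.arange(m - 1, 0, -1))) / m
    variance = w @ w
    # Consecutive differences share m-1 increments.
    covariance = w[m:] @ w[:m - 1]
    return covariance / variance

rho_2 = averaged_random_walk_autocorr(2)          # two points per block
rho_many = averaged_random_walk_autocorr(1000)    # close to the continuous limit
```

With one point per block there is no averaging and the autocorrelation is zero; as the number of points per block grows the value approaches one quarter, so the differenced averages behave like an MA(1) process rather than white noise.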
Remark 11. The covariance matrices, $\Omega_{\epsilon 0}$ and $\Omega_{\epsilon 1}$, appearing in the definitions of the covariance matrices of $u_t$, $\Omega_{u0}$ and $\Omega_{u1}$, ...
Remark 12. Remarks 3 and 4, relating to the parsimony of the continuous time approach, apply equally, if not more so, in the case of flow variables. In fact, the disturbance vector being MA(1) introduces an additional $(kn_1 + n_2)^2$ parameters into the discrete time system via the first-order autocovariance matrix, yet all of the discrete time parameters remain functions of the same number of parameters in the underlying continuous time system (2).
⁸ Such a representation would appear less parsimonious but would emphasise the fact that up to $2k - 1$ lags of the observed high frequency flow appear in the discrete time model.
Attention now turns to situations in which the variables are a mixture of stocks and flows as well as the observations being available at different sampling frequencies.

Models with mixed data sampling
In many applications the variables of interest are a mixture of stocks and flows, and it is therefore of practical importance to extend the discrete time representations in Section 3 to allow for such circumstances. In the most general scenario both the high and low frequency observations would consist of stocks and flows, but in order to avoid unnecessary complication the model considered is one where the high frequency variables are stocks and the low frequency variables are flows. While this distinction is somewhat arbitrary it does serve to highlight the issues involved in treating a mixed sample. It also has the advantage, however, of encapsulating situations where the high frequency variables are financial variables such as asset prices, exchange rates and interest rates (observed as stocks) and the low frequency variables are macroeconomic aggregates such as income/output, consumption, and investment expenditures (observed as flows).
In order to derive an exact discrete model, the following assumption is made concerning a sub-matrix of $A$ in (2).
Assumption 1. The $n_1 \times n_1$ matrix $A_{11}$, the leading sub-matrix of $A$ when partitioned in accordance with $y_1$ and $y_2$, is nonsingular.
Remark 13. The matrix $A_{11}$ governs the response of $dy_1(t)$ to the level of $y_1(t)$ in the continuous time system (2) and its invertibility enables the unobservable variable $\int_{t-h}^{t} y_1(r)\,dr$ to be expressed in terms of the observable $y_1(t) - y_1(t - h)$ and $\int_{t-h}^{t} y_2(r)\,dr$, the latter being observable once the operator $s(L^h)$ has been applied.
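The step described in Remark 13 can be written out as a short worked derivation. This is a sketch under assumed notation: $\mu_1$, $\gamma_1$ and $\zeta_1$ denote the first $n_1$ elements of $\mu$, $\gamma$ and $\zeta$, and $A_{12}$ the $n_1 \times n_2$ upper-right block of $A$; none of these partitions is spelled out in the text above. Integrating the first $n_1$ equations of (2) over $(t-h, t]$ and solving for the integral of $y_1$ gives:

```latex
\begin{align*}
y_1(t) - y_1(t-h)
  &= \mu_1 h + \gamma_1 h\left(t - \tfrac{h}{2}\right)
     + A_{11}\int_{t-h}^{t} y_1(r)\,dr
     + A_{12}\int_{t-h}^{t} y_2(r)\,dr
     + \int_{t-h}^{t}\zeta_1(dr), \\[4pt]
\int_{t-h}^{t} y_1(r)\,dr
  &= A_{11}^{-1}\left[\, y_1(t) - y_1(t-h)
     - \mu_1 h - \gamma_1 h\left(t - \tfrac{h}{2}\right)
     - A_{12}\int_{t-h}^{t} y_2(r)\,dr
     - \int_{t-h}^{t}\zeta_1(dr) \right].
\end{align*}
```

The trend term uses $\int_{t-h}^{t} r\,dr = h(t - h/2)$. The second line is where the nonsingularity of $A_{11}$ is needed; no condition on the full matrix $A$ enters this step.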
This type of assumption was also made concerning stock variables by Agbeyegbe (1987, 1988) (Assumption 2 in both cases) and Simos (1996) (Assumption 3), although both authors also made the assumption that the entire matrix $A$ was nonsingular; this additional assumption is not required here. The assumption does, however, rule out the possibility of cointegration between a set of nonstationary stock variables.⁹
In presenting the exact discrete model for the mixed sample case it is convenient to partition the matrix functions $F(r) = e^{Ar}$, $P_0(r)$ and $P_1(r)$ in accordance with $y_1$ and $y_2$. These functions are used as weights in integrals with respect to the random measure $\zeta(dt)$ that arise in deriving the exact discrete model. For example, $F(r)$ is the weight function in the definition of the random vector $\epsilon(t)$ in (7). The presentation of the results in Theorem 3 is also aided by grouping some of the definitions together in Table 2.
Theorem 3. Let $y(t)$ be generated by (2) with $y_1$ consisting of stock variables, which are observed as $y_{1,\tau h} = y_1(\tau h)$ ($\tau = 0, \ldots, N$), and $y_2$ consisting of flow variables, which are observed as $Y_{2t} = \int_{t-1}^{t} y_2(r)\,dr$ ($t = 0, \ldots, T$). Then, under Assumption 1, the observations satisfy, for $t = 1, \ldots, T$, an exact discrete time model in which the $(kn_1 + n_2) \times 1$ disturbance vector is $\xi_t$ and the remaining components are defined in Table 2. Furthermore, $\xi_t$ is a vector MA(1) process with autocovariance matrices $\Omega_{\xi 0}$ and $\Omega_{\xi 1}$, where the components determining these matrices are defined in Table 2.
⁹ Subsequent work will explicitly investigate the case of cointegrated continuous time systems with mixed frequency data in more detail, thereby extending existing results in Chambers (2009, 2011) to the mixed frequency setting.

Table 2
Definitions of quantities in Theorem 3.
NB: Additional quantities, e.g. the $R_j$, can be found in the proof of Theorem 3.
Remark 14. The form of the exact discrete model in the mixed sample case is more complicated than in the case of common data sampling owing to the fact that more operations are required in order to eliminate unobservable components from the system and replace them with observable variables. The key component in this process is the integration of the system (2) over the interval $(t - h, t]$ and then solving for the integral of $y_1$ in terms of its first difference, which is where Assumption 1 is utilised.
Remark 15. The $(kn_1 + n_2) \times 1$ disturbance vector $\xi_t$ is an MA(1) process, as is common in discrete time representations of first-order stochastic differential equations containing flow variables. The autocovariance matrices, $\Omega_{\xi 0}$ and $\Omega_{\xi 1}$, are more complicated than in the case of pure flow variables reported in Theorem 2 owing to the mixed sampling characteristics of $y_1$ (stocks) and $y_2$ (flows) in addition to the mixed observation frequencies.
Remark 16. Although Theorem 3 has focused on the case of high frequency stock variables and low frequency flow variables it would be possible, in principle, to derive a discrete time representation for the case in which the high frequency variables are flows and the low frequency variables are stocks. Should such a scenario arise then a discrete time model could be derived based on the methods utilised in the proof of Theorem 3.
Attention is now turned in the following sections to issues of estimation (including computational considerations) as well as simulation evidence and an empirical example.

Estimation
A natural approach to the estimation of the parameters of the continuous time system (2) is to maximise the likelihood function based on the exact discrete time representations presented in Theorems 1-3. Assuming that the random measure disturbance vector ζ(dt) in (2) is Gaussian (which would be equivalent to it being the increment of a Brownian motion process with covariance matrix Σdt) results in the discrete time disturbances also being Gaussian. Consider, first, the case of stock variables, in which the relevant vector of disturbances is $\eta_t$ defined in Theorem 1; this vector has dimension $n^* \times 1$, where $n^* = n_1 k + n_2$, and is known to be vector white noise with covariance matrix $\Omega_\eta$. Let θ denote the vector of unknown parameters in the model, which will consist of the elements of the vectors µ and γ, the matrix A and the unique elements of the covariance matrix Σ.¹⁰ In this case the Gaussian log-likelihood, based on a sample of size T, is given by

$$\ln L_\eta(\theta) = -\frac{n^* T}{2}\ln 2\pi - \frac{T}{2}\ln|\Omega_\eta| - \frac{1}{2}\sum_{t=1}^{T}\eta_t'\,\Omega_\eta^{-1}\eta_t,$$

where |·| denotes the determinant of a matrix. In cases where there are flow variables present the log-likelihood function is of a more complicated form, reflecting the MA(1) nature of the disturbances.
In the case of a mixed sample, the relevant vector of disturbances is $\xi_t$, defined in Theorem 3. Let $\xi = (\xi_1', \ldots, \xi_T')'$ denote the $n^* T \times 1$ vector of all sample disturbances stacked vertically. The covariance matrix of ξ, denoted $\Omega_\xi = E(\xi\xi')$, is of block Toeplitz form, with typical $n^* \times n^* T$ band given by

$$\begin{bmatrix} 0 & \cdots & 0 & \Omega_{\xi 1} & \Omega_{\xi 0} & \Omega_{\xi 1}' & 0 & \cdots & 0 \end{bmatrix},$$

where each 0 above is an $n^* \times n^*$ matrix of zeros. The log-likelihood function is then

$$\ln L_\xi(\theta) = -\frac{n^* T}{2}\ln 2\pi - \frac{1}{2}\ln|\Omega_\xi| - \frac{1}{2}\,\xi'\,\Omega_\xi^{-1}\xi. \qquad (24)$$

Clearly this function poses more computational challenges than does $\ln L_\eta$, due to the required calculation of the determinant and inverse of the $n^* T \times n^* T$ covariance matrix $\Omega_\xi$. The same would be true in the case of flow variables, where ξ and $\Omega_\xi$ would be replaced by u and $\Omega_u$, respectively, using the relevant information in Theorem 2. However, the computational difficulties associated with log-likelihood functions of the form in (24) arising from continuous time systems were addressed in many of the articles in Bergstrom (1990), whose proposed procedure is to implement a Cholesky factorisation of the entire matrix $\Omega_\xi$ which is also able to exploit its sparse nature.

Alternative approaches could also be used to compute the likelihood function. Robinson (1993) and Chambers (1998), for example, employ a frequency domain (Whittle) approximation to the likelihood and work directly from the continuous time system (2). The method of residues is used to evaluate the infinite summations that arise in moving from continuous time to discrete time in the frequency domain. Such methods could, in principle, be extended to the case of mixed frequency data, although such an investigation is beyond the scope of the present paper. Yet another alternative, and one that has already been mentioned earlier, is to employ the Kalman filter, as in Zadrozny (1988). However, it has been argued by Bergstrom (1990, pp. 112-113) that the time domain approach outlined above, based on Cholesky factorisation, is less costly in terms of computational burden than the Kalman filter.
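The way a Cholesky factorisation exploits the banded structure of the covariance matrix can be illustrated in the simplest possible setting. The following is a minimal sketch, assuming a scalar MA(1) disturbance (so that each block is 1 × 1) with variance `w0` and first-order autocovariance `w1`; the function name and arguments are hypothetical and are not taken from Bergstrom (1990). In the vector case the scalars would be replaced by the corresponding blocks.

```python
import math

def ma1_loglik(xi, w0, w1):
    """Gaussian log-likelihood of a scalar MA(1) sample xi whose covariance
    matrix is tridiagonal (w0 on the diagonal, w1 next to it). Only the two
    non-zero bands of the Cholesky factor L are ever computed."""
    T = len(xi)
    loglik = -0.5 * T * math.log(2.0 * math.pi)
    d_prev = None   # previous diagonal element of L
    z_prev = 0.0    # previous element of z, where L z = xi
    for t in range(T):
        if t == 0:
            sub = 0.0
            d = math.sqrt(w0)
        else:
            sub = w1 / d_prev              # sub-diagonal element L[t, t-1]
            d = math.sqrt(w0 - sub * sub)  # diagonal element L[t, t]
        z = (xi[t] - sub * z_prev) / d     # forward substitution
        # -0.5*ln|Omega| = -sum(ln d) and xi' Omega^{-1} xi = z'z
        loglik -= math.log(d) + 0.5 * z * z
        d_prev, z_prev = d, z
    return loglik
```

The cost of this recursion grows linearly in T, whereas a dense determinant and inverse would not exploit the zeros away from the band.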
The properties of the Gaussian estimator, $\hat\theta$, obtained by maximising either $\ln L_\eta(\theta)$ or $\ln L_\xi(\theta)$ with respect to θ would depend on the time series properties of the variables. In stationary systems, subject to regularity conditions, the Gaussian estimator would typically have an asymptotically normal distribution and would converge to the true value ($\theta_0$) at a rate equal to the square root of the sample size; see Bergstrom (1983). In cointegrated models, on the other hand, different rates of convergence are likely to apply to the parameters governing the short-run dynamics and the long-run cointegrating vectors: the former typically converge to limiting normal distributions at the rate $\sqrt{T}$, while the latter converge at the rate T to a limiting mixed normal distribution. Asymptotic results relating to the estimation of continuous time models of cointegration can be found in Phillips (1991), Chambers and McCrorie (2007) and Chambers (2011).

10 In the simulations reported below and in the empirical application in the next section the elements of the Cholesky factorisation of Σ, denoted M, rather than of Σ itself, are estimated in order to ensure that the covariance matrix is positive definite. Both M and Σ have the same number of unknown elements, n(n + 1)/2.
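The Cholesky parameterisation described in footnote 10 can be sketched as follows. This is an illustrative mapping only: the row-by-row ordering of the free elements and the function name are assumptions made for the example, not details given in the paper.

```python
def sigma_from_cholesky(m_elements, n):
    """Map n(n+1)/2 free parameters, filled row by row into the lower
    triangle of M, to the covariance matrix Sigma = M M'."""
    assert len(m_elements) == n * (n + 1) // 2
    m = [[0.0] * n for _ in range(n)]
    values = iter(m_elements)
    for i in range(n):
        for j in range(i + 1):
            m[i][j] = next(values)
    sigma = [[sum(m[i][k] * m[j][k] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return m, sigma

# Illustrative values: these give unit variances and a covariance of 0.5.
M_example, Sigma_example = sigma_from_cholesky([1.0, 0.5, 0.75 ** 0.5], 2)
```

Whatever real values the optimiser proposes for the elements of M, the implied Σ is symmetric and positive semi-definite, and it is positive definite whenever the diagonal elements of M are non-zero.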

Computation
Inspection of the discrete time representations in Theorems 1-3 reveals that the autoregressive matrices are related to the matrix exponential $F = e^{Ah}$ while the deterministic terms and autocovariance matrices depend on various integrals involving the function $F(r) = e^{Ar}$. A number of methods exist for the computation of matrix exponentials, the most straightforward involving truncation of the infinite series in (4) once a point is reached at which the change in successive values for each element is sufficiently small. Although this may not be the most efficient method, a study of comparative techniques by Jewitt and McCrorie (2005) in the context of continuous time systems concluded that truncation is valid for the types of (typically) well-conditioned problems to be found in models in economics, and is therefore the method adopted here.
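The truncation rule described above can be sketched in a few lines. The code below is a minimal illustration, assuming matrices stored as lists of rows and a hypothetical tolerance `tol`; the example matrix is chosen only because it has eigenvalues −0.5 and −1.5 and a simple closed form, and is not claimed to be one of the designs used in the paper.

```python
import math

def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def expm_series(a, h=1.0, tol=1e-14, max_terms=200):
    """Approximate exp(a*h) by summing (a*h)^j / j!, stopping once every
    element of the newest term is smaller than tol in absolute value."""
    n = len(a)
    ah = [[a[i][j] * h for j in range(n)] for i in range(n)]
    term = [[float(i == j) for j in range(n)] for i in range(n)]  # j = 0 term
    total = [row[:] for row in term]
    for j in range(1, max_terms):
        term = [[x / j for x in row] for row in mat_mul(term, ah)]
        total = [[total[r][c] + term[r][c] for c in range(n)]
                 for r in range(n)]
        if max(abs(x) for row in term for x in row) < tol:
            break
    return total

# Illustrative symmetric matrix with sampling interval h = 1/3.
A_example = [[-1.0, 0.5], [0.5, -1.0]]
F = expm_series(A_example, h=1.0 / 3.0)
# Closed form of the (1,1) element for this particular symmetric example.
closed_form_11 = 0.5 * (math.exp(-0.5 / 3.0) + math.exp(-1.5 / 3.0))
```

For badly conditioned matrices a scaling-and-squaring routine would usually be preferred, which is consistent with the caveat in the text that truncation is adequate for well-conditioned problems.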
The issues involved in computing the deterministic terms and autocovariance matrices are perhaps best illustrated with a concrete example, and for this purpose we focus on the discrete time representation in Theorem 1. The deterministic terms $b_{1t}$ and $b_{2t}$ can be seen to depend on c(t) defined in (6). It is possible to show (see Appendix) that

$$c(t) = (C_1\mu - C_2\gamma) + C_1\gamma t, \qquad C_1 = \int_0^h e^{Ar}\,dr, \qquad C_2 = \int_0^h r\,e^{Ar}\,dr.$$

If A is nonsingular these matrices have exact representations in the form

$$C_1 = A^{-1}\left(e^{Ah} - I\right), \qquad C_2 = A^{-1}\left(h\,e^{Ah} - C_1\right),$$

while if A is singular the following representations follow from integration of the infinite series in (4) term by term:

$$C_1 = \sum_{j=0}^{\infty}\frac{A^j h^{j+1}}{(j+1)!}, \qquad C_2 = \sum_{j=0}^{\infty}\frac{A^j h^{j+2}}{j!\,(j+2)}.$$

In this latter case the method of truncation can be used, as in the computation of $e^{A}$ itself. The covariance matrix $\Omega_\epsilon$ in Theorem 1 is of the form

$$\Omega_\epsilon = \int_0^h e^{Ar}\,\Sigma\,e^{A'r}\,dr.$$

Jewitt and McCrorie (2005), using results of van Loan (1978), show that this matrix (and $e^{Ah}$ as well) can be obtained with the computation of a single matrix exponential. Let

$$Q = \begin{bmatrix} -A & \Sigma \\ 0 & A' \end{bmatrix}, \qquad e^{Qh} = \begin{bmatrix} P_{11} & P_{12} \\ 0 & P_{22} \end{bmatrix}.$$

Then $\Omega_\epsilon = P_{22}' P_{12}$ and $e^{Ah} = P_{22}'$. The matrix exponential $e^{Qh}$ can also be computed by truncating its infinite series representation. Similar techniques can be applied to the deterministic terms and covariance matrices arising in the more complicated representations in Theorems 2 and 3.
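The single-exponential calculation can be sketched as follows. This is a minimal illustration, assuming the block matrix has −A and Σ in its first block row and zero and A′ in its second (the arrangement under which the identities $\Omega_\epsilon = P_{22}'P_{12}$ and $e^{Ah} = P_{22}'$ hold); the function names are hypothetical and the matrix exponential is obtained by a truncated series.

```python
import math

def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def expm_series(m, tol=1e-15, max_terms=200):
    """Truncated power series for exp(m)."""
    n = len(m)
    term = [[float(i == j) for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for j in range(1, max_terms):
        term = [[x / j for x in row] for row in mat_mul(term, m)]
        total = [[total[r][c] + term[r][c] for c in range(n)]
                 for r in range(n)]
        if max(abs(x) for row in term for x in row) < tol:
            break
    return total

def van_loan(a, sigma, h):
    """Return (exp(A h), Omega) where Omega is the integral over [0, h]
    of exp(A r) Sigma exp(A' r), using one exponential of a 2n x 2n matrix."""
    n = len(a)
    a_t = transpose(a)
    top = [[-a[i][j] * h for j in range(n)] + [sigma[i][j] * h for j in range(n)]
           for i in range(n)]
    bottom = [[0.0] * n + [a_t[i][j] * h for j in range(n)] for i in range(n)]
    p = expm_series(top + bottom)
    p12 = [row[n:] for row in p[:n]]   # upper-right block
    p22 = [row[n:] for row in p[n:]]   # lower-right block, equal to exp(A' h)
    f = transpose(p22)
    omega = mat_mul(f, p12)
    return f, omega

# Scalar check: A = -1, Sigma = 1, h = 0.5 gives Omega = (1 - exp(-1)) / 2.
f_scalar, omega_scalar = van_loan([[-1.0]], [[1.0]], 0.5)
omega_closed_form = (1.0 - math.exp(-1.0)) / 2.0
```

The scalar case is included because the integral can be evaluated by hand, which provides a direct check on the block identities.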

Simulation evidence
In order to assess the effects of explicitly incorporating mixed frequency data as opposed to aggregating to the lowest frequency, some simulations were carried out using a model containing two stock variables sampled at two different frequencies. Let $y(t) = [y_1(t), y_2(t)]'$, where $y_1(t)$ is sampled at the higher frequency with sampling interval h = 1/3 and $y_2(t)$ is the low frequency variable whose sampling frequency is normalised to unity. This scenario corresponds to $y_2$ being observed quarterly and $y_1$ being observed monthly. The model is given by

$$dy(t) = A\,y(t)\,dt + \zeta(dt),$$

where A is a 2 × 2 matrix with coefficients $a_{ij}$ (i, j = 1, 2) and Σ is a 2 × 2 symmetric positive definite matrix with elements $\sigma_{11}$, $\sigma_{21}$ and $\sigma_{22}$. This model nests a cointegrated system where $A = \alpha\beta'$ with $\alpha = (\alpha_1, \alpha_2)'$ and $\beta = (\beta_c, 1)'$; in this case α is the vector of adjustment coefficients and β is the cointegrating vector, normalised on $y_2$. Three different parametric designs are considered for the matrix A. Design 1 allows for positive feedback between $y_1$ and $y_2$, while in Design 2 this feedback is negative; the eigenvalues of A in both cases are −0.5 and −1.5 so the system is stationary.¹¹ In Design 3 the stationary cointegrating relationship is given by $y_2(t) - y_1(t)$; the eigenvalues of A are given by 0 and −1, the former indicating the zero root (corresponding to a unit root in discrete time) in the system. In all three experimental designs the covariance matrix was taken to be of the form

$$\Sigma = \begin{bmatrix} 1 & \sigma_{21} \\ \sigma_{21} & 1 \end{bmatrix}.^{12}$$

For all these designs the data are generated according to the exact discrete time representation in Theorem 1. The results of 10,000 simulations are contained in Table 3 for Designs 1 and 2 and in Table 4 for Design 3. The tables report the bias and root mean square error (RMSE) of three estimators of the model parameters. The column headed 'Low' contains the results when the data are aggregated to the lowest frequency, which would be the current default method for estimation. The estimated system in this case is given by (5) with h = 1. The next column, headed 'High', refers to the infeasible estimator obtained under the assumption that both variables can be observed at the highest frequency; the estimated system is again given by (5) but with h = 1/3.¹³ The third column, headed 'Mixed', is the estimator using both frequencies of data, i.e. the entire set of observations available to the researcher. In this case the model being estimated is given in Theorem 1 in which $k = h^{-1} = 3$; the vector of disturbances, $\eta_t$, is of dimension 4 × 1. In all the experimental designs the low frequency observations were obtained from the high frequency data by selecting every third observation. This is consistent with there being 25 years of quarterly data for $y_2$ and the same span of monthly data for $y_1$.

11 The eigenvalues of A are required to have negative real parts in order for the system to be stationary.
12 In view of $\sigma_{11} = \sigma_{22} = 1$ the parameter $\sigma_{21}$ is, in effect, measuring the correlation between $\zeta_1$ and $\zeta_2$ in the continuous time system.
13 Although $y_1$ is observed at the highest frequency, $y_2$ is not; this estimator is therefore infeasible because it is using observations that are not available to the researcher.
As can be seen from Table 3, the estimates obtained in the infeasible case (using high frequency observations on both variables) tend to produce the smallest bias and RMSE, the latter being approximately half the RMSE values for estimates based on the aggregated low frequency data. Estimates obtained using the mixed frequency data, however, show much lower bias and RMSE than the low frequency estimates even if they do not quite match the performance of the infeasible estimator (which is to be expected). A similar picture emerges from Table 4 in terms of the adjustment and covariance parameters, with all three methods producing very small bias and RMSE in estimating the cointegrating parameter β c ; this is presumably due to the superconsistency (faster convergence rate) of estimates of this parameter. The effect of the continuous time correlation parameter changing from σ 21 = 0 to σ 21 = 0.5 in Table 3 is to increase bias and RMSE in Design 1, particularly for the low frequency estimates, while the bias and RMSE remain broadly unaffected in Design 2. For Design 3 in Table 4 there is a tendency for both bias and RMSE to increase, though not uniformly.
In summary, the simulations suggest that neglecting high frequency data (when available) comes at a cost in terms of larger estimation bias and RMSE. Although the derivation of the exact discrete model is slightly more complicated when dealing with mixed frequencies than is the case when data are aggregated to the lowest frequency, the benefits of doing so would appear to be worthwhile. An empirical application is explored in the next section to ascertain the impact of using mixed frequency data in practice.
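The two bookkeeping steps behind the simulation tables, namely selecting every third high frequency observation and summarising the replications by bias and RMSE, can be sketched as follows. The function names are hypothetical, and the indexing assumes that the high frequency series starts at time h so that its third, sixth, ... entries fall on the low frequency dates.

```python
import math

def skip_sample(high_freq, k):
    """Keep every k-th observation of a stock variable, assuming the first
    entry corresponds to time h and the k-th entry to time 1."""
    return high_freq[k - 1::k]

def bias_and_rmse(estimates, true_value):
    """Bias and root mean square error of a list of replicated estimates."""
    r = len(estimates)
    bias = sum(estimates) / r - true_value
    rmse = math.sqrt(sum((est - true_value) ** 2 for est in estimates) / r)
    return bias, rmse
```

With k = 3, 300 monthly observations yield 100 quarterly ones, which matches the 25-year span mentioned in the text.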

An empirical example: purchasing power parity
One of the most widely researched areas in the international macroeconomics/finance arena concerns the stationarity (or otherwise) of the real exchange rate between two currencies. The notion of purchasing power parity (PPP), ignoring various nuances and subtleties, essentially suggests that, at least in the long run, the nominal exchange rate adjusts so that goods and services cost the same amount when prices in different currencies are expressed in a common currency; this, in turn, has implications for the real exchange rate. To focus ideas, let P denote the domestic price level, P* denote the foreign price level, and S denote the nominal exchange rate expressed as units of domestic currency per unit of foreign currency. Then PPP implies that S = P/P* or P = SP*, at least in the long run. Defining s = ln S, p = ln P and p* = ln P*, another way of writing this relationship is s = p − p*, a form that readily suggests testing PPP by estimating a simple regression of s on p and p* and then testing whether the coefficients on the two regressors are +1 and −1, respectively. However, given that exchange rates and price indices are often found to be characterised as containing unit roots (and possibly two in the case of price indices), cointegration techniques can be used to test the PPP restrictions on the cointegrating vector (if one is found), but the evidence is somewhat mixed. A comprehensive account of empirical research into PPP and the real exchange rate can be found in Sarno and Taylor (2002).
The price indices used in PPP research are typically observed on a monthly basis while exchange rates can be observed at much higher frequencies.¹⁴ The usual approach is to aggregate the exchange rate data to the monthly frequency by taking either a monthly average of daily closing prices or the value at a particular point in the month (such as a daily price in the middle or at the end of the month). This approach throws away a great deal of potentially useful information in the exchange rate data that may be pertinent to assessing the empirical validity of PPP. In what follows, the methods derived in this paper are utilised in an assessment of the implications for PPP of combining the high frequency exchange rate data with the lower frequency price index data.
In accordance with the notation used in previous sections, let $y_1(t) = s(t)$ denote the high frequency exchange rate variable, let $y_2(t) = [p(t), p^*(t)]'$ denote the vector of low frequency price index variables, let $y(t) = [y_1(t), y_2(t)']'$, and note that t is being treated as a continuous time parameter. The most general continuous time system under consideration has the representation

$$dy(t) = \left[\mu + \alpha\beta' y(t)\right]dt + \zeta(dt), \qquad (26)$$

where µ is a 3 × 1 vector of intercepts, α is a 3 × 1 vector of adjustment parameters, $\beta = [1, \beta_p, \beta_{p^*}]'$ is a 3 × 1 vector of cointegrating parameters normalised on s(t), and ζ(dt) is a 3 × 1 vector of random measures with mean vector zero and symmetric positive definite covariance matrix Σdt of dimension 3 × 3. The unrestricted intercept, µ, can be decomposed using the identity $\mu = \delta + \alpha\kappa$, giving

$$dy(t) = \left[\delta + \alpha\left(\kappa + \beta' y(t)\right)\right]dt + \zeta(dt), \qquad (27)$$

so that κ represents an intercept in the cointegrating relationship.¹⁵ In terms of the system (2), γ = 0, µ = δ + ακ and A = αβ′; the system (27) therefore represents a continuous time cointegrated system with cointegrating relationship of the form $s(t) + \kappa + \beta_p p(t) + \beta_{p^*} p^*(t)$. The long-run PPP hypothesis consists of the two restrictions $\beta_p = -\beta_{p^*} = -1$, but we shall also consider a weaker version in which there is a single restriction of the form $\beta_p = -\beta_{p^*}$ that does not constrain these parameters to a particular value.

14 Although we focus on the temporal aggregation aspects of price indices a referee has highlighted that there are also cross-sectional issues involved but which are beyond the scope of this example.
The data used in the empirical application are the monthly consumer price indices for the United States (P) and the United Kingdom (P*) for the period January 1996-March 2014 and the daily (closing) exchange rate (S), measured in US dollars per pound, from January 1, 1996 to March 28, 2014. The exchange rate data were also aggregated to weekly and monthly frequencies, there being four weekly observations and twenty daily observations corresponding to each month; further details concerning the data are provided in the Appendix. Table 5 contains summary results and test statistics relating to estimation of the unrestricted model (26) and various nested models using the monthly price indices and monthly, weekly and daily exchange rate data. The table contains the maximised log-likelihood for the unrestricted model as well as the Schwarz Information Criterion (SIC) for the unrestricted and restricted models; for the latter, the values of the likelihood ratio (LR) test statistics (and their associated p-values) are also reported.¹⁶ For the monthly exchange rate data none of the restrictions is rejected at the 5% significance level and so the preferred model is one in which the PPP restriction is imposed. This restricted model is also chosen by the SIC, and a similar picture emerges with the weekly exchange rate data. However, moving to the daily exchange rate data, entirely different conclusions are drawn. All of the restrictions tested are comprehensively rejected, the preferred model, chosen on the basis of the likelihood ratio tests and the SIC, being the unrestricted model.
To gain further insight into these results, Table 6 reports full estimates of the unrestricted models for all three exchange rate frequencies, as well as the restricted (preferred) models for the monthly and weekly exchange rate data (the preferred model for the daily data being the unrestricted model). In addition to α, β and µ, Table 6 reports the elements of the Cholesky matrix corresponding to Σ; this is a lower-triangular matrix M such that Σ = MM′ and is given by

$$M = \begin{bmatrix} m_{11} & 0 & 0 \\ m_{21} & m_{22} & 0 \\ m_{31} & m_{32} & m_{33} \end{bmatrix}.$$

The final row of Table 6 contains the implied values of the intercept in the cointegrating relationship, κ. As can be seen from Table 6, many of the estimated parameters in the models using monthly and weekly exchange rate data have large standard errors relative to the estimated parameter. It is, therefore, perhaps no great surprise that the restrictions are not rejected when imposed on the model, due to this level of parameter uncertainty. This is reflected by the small fall in the value of the log-likelihood reported in Table 5 when the PPP restrictions are imposed. Using daily exchange rate data, however, results in parameter estimates with much smaller standard errors, and hence there is much greater information available to reject the hypotheses being tested. For example, $\hat\beta_p = -3.0260$ with a standard error of 0.2601, while $\hat\beta_{p^*} = 3.2674$ with a standard error of 0.3223, so the rejection of the joint hypothesis $\beta_p = -\beta_{p^*} = -1$ should not be too surprising.

16 In all cases the p-values are obtained assuming that the $LR_q$ statistic has a chi-squared distribution with q degrees of freedom.
The move from monthly and weekly exchange rate data to daily data has resulted in more precise estimates of the parameters of interest and a rather different outcome to the statistical tests carried out.
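The mechanics of the likelihood ratio tests, and the informal comparison of the daily estimates with their PPP values, can be sketched as follows. This is an illustration only: the function names are hypothetical, the chi-squared tail probability is coded directly for integer degrees of freedom, and the ratios at the end are simple (estimate − hypothesised value)/standard error calculations using the figures quoted in the text rather than formal test statistics from the paper.

```python
import math

def chi2_sf(x, q):
    """P(chi-squared with q degrees of freedom > x) for integer q >= 1,
    built up with the recurrence Q(a+1, y) = Q(a, y) + y^a e^{-y} / Gamma(a+1)."""
    y = 0.5 * x
    if q % 2 == 0:
        a, sf = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, sf = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    while a < 0.5 * q - 1e-9:
        sf += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return sf

def lr_test(loglik_unrestricted, loglik_restricted, q):
    """LR statistic for q restrictions and its chi-squared p-value."""
    lr = 2.0 * (loglik_unrestricted - loglik_restricted)
    return lr, chi2_sf(lr, q)

def ratio_to_null(estimate, std_error, null_value):
    """Distance of an estimate from a hypothesised value in standard errors."""
    return (estimate - null_value) / std_error

# Daily-data estimates quoted in the text, compared with the PPP values.
ratio_beta_p = ratio_to_null(-3.0260, 0.2601, -1.0)
ratio_beta_p_star = ratio_to_null(3.2674, 0.3223, 1.0)
```

Both estimates lie roughly seven to eight standard errors from their PPP values, which is in line with the comprehensive rejection reported for the daily data.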

Conclusions
This paper has derived exact discrete time representations corresponding to a system of linear stochastic differential equations when the observed sample is available at different frequencies. The cases of common data sampling (all stock or all flow variables) and mixed data sampling (a combination of stocks and flows) have both been considered in this mixed frequency scenario. Simulations based on both stationary and cointegrated systems reveal that there are substantial gains to be made in estimation (smaller bias and RMSE) by utilising all the higher frequency data in conjunction with the low frequency data instead of aggregating all data to the lowest frequency. An empirical application using monthly price indices for the UK and US and monthly, weekly and daily exchange rate data reveals that substantially different inferences can be drawn when the highest frequency exchange rate data are used compared with the lower frequency data. An advantage of considering mixed frequency data analysis based on an underlying continuous time model is that the model of interest is not tied to any particular sampling frequency and the discrete time representations hold exactly at whatever frequency the data are sampled.
There are a number of directions for further research that emerge from the results reported here. One obvious extension is to consider exact discrete time representations for mixed sample and mixed frequency data generated by an ARMA(p, q) system in continuous time. This could be achieved by combining the techniques employed in the proofs of Theorems 1-3 with those used by Chambers and Thornton (2012), whose empirical examples demonstrated the benefits of the additional MA components in the continuous time model for improving model fit. In some situations it would also be beneficial to allow both high and low frequency variables to be mixtures of stocks and flows and also to allow for more than two sampling frequencies. A more realistic situation in some cases would be to allow for the sampling intervals to be of different lengths, which may be of some importance when combining monthly and daily data as the number of days in each month of the year is not the same. Another area in which continuous time models can be advantageous is in producing forecasts of variables at any future point in time as they are not tied to a particular sampling frequency. Again the techniques used in this paper form a basis for these, and other, extensions, and will be explored in future work.

Appendix A. Proofs of theorems
Proof of Theorem 1. The aim is to eliminate the unobservable y 2 (t −h) from the system (8) and (9) and, effectively, replace it with y 2 (t −kh) = y 2 (t −1). Lagging (9) by h and repeatedly substituting backwards enables y 2 (t − h) to be expressed in terms of y 2 (t − 1) as follows: Substituting (28) into (9) yields the equation for y 2 , (13), in which the disturbance term is More generally, lags of y 2 (t) can be expressed, for 1 ≤ l ≤ k − 1, as so that, for 0 ≤ l ≤ k − 1, lags of y 1 (t) can be expressed in terms of y 2 (t − 1) as as required, where (29) has been substituted for y 2 (t − lh − h) and if l = k − 1 in this last summation then it is to be taken as zero. From these representations it is possible to write η t in the form where the F η,j (j = 0, . . . , k − 1) are defined by The form of the covariance matrix Ω η then follows in view of ϵ(t) being vector white noise with covariance matrix Ω ϵ .
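The idea of lagging an equation and repeatedly substituting backwards can be illustrated in a scalar setting. The sketch below assumes a hypothetical first-order recursion x_j = c + f x_{j−1} + e_j, which is a simplification: the equations in the proof also involve the other block of variables and time-varying deterministic terms. It checks that m steps of the recursion agree with the closed form obtained by substitution.

```python
def back_substitute(f, c, shocks, x_start):
    """Iterate x_j = c + f * x_{j-1} + e_j over the given shocks and also
    evaluate the closed form reached by repeated backward substitution:
    x_m = f^m x_0 + sum_{j=0}^{m-1} f^j (c + e_{m-j})."""
    x = x_start
    for e in shocks:
        x = c + f * x + e
    m = len(shocks)
    closed = f ** m * x_start + sum(
        f ** j * (c + shocks[m - 1 - j]) for j in range(m))
    return x, closed

# Two substitutions (k - 1 = 2 when k = 3), with illustrative numbers.
iterated, closed_form = back_substitute(0.8, 0.1, [0.3, -0.2], 1.5)
```

The accumulated shock terms on the right-hand side are what make the resulting disturbance a weighted sum of several high frequency innovations, as in the representation of $\eta_t$.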
Proof of Theorem 2. Integrating (10)-(13) over the interval (t − h, t] and dividing the equations for $y_1$ by h yields, for 0 ≤ l ≤ k − 1, where $Y^u_{2t} = \int_{t-h}^{t} y_2(r)\,dr$, the $b^*_{jt}$ are defined in the theorem, and $e_{jt} = \int_{t-h}^{t} \eta_{jr}\,dr$ (j = 1, 2). The objective is to eliminate terms involving the unobservable $Y^u_{2t}$ and replace them with the Turning to $u_t$ it is possible to write where $F_\eta(z) = \sum_{j=0}^{k-1} F_{\eta,j} z^j$ from the representation for $\eta_t$ in (30) and The role of the symmetric $(kn_1 + n_2) \times (kn_1 + n_2)$ matrix $K_h$ is to account for the normalisation by h in the equation for the high frequency flows. From (7) the integral of ϵ(t) takes the form where $P_0(r) = \int_0^r e^{As}\,ds$ and $P_1(r) = \int_r^h e^{As}\,ds$; the splitting of the double integral to two single integrals with respect to ζ(dt) follows along the same lines as in the proof of Lemma B2 in Chambers (2011). From this representation it follows that $\epsilon_t$ is an MA(1) process with autocovariances Using Lemma 1 the convolution $H(z) = s(z)F_\eta(z)$ has the form The autocovariance properties of $u_t$ follow from this representation and the MA(1) properties of $\epsilon_t$; see Remark 11.
The autocovariance structure of ξ t can be obtained by noting that ξ t = s(L h )ξ t , and that, in turn, and the R j (j = 0, . . . , k − 1) are defined by The precise formulae come about by noting that v 1t = ∆ h ϵ 1 (t) The random vector v t is thus an MA(1) process with variance matrix Ω v0 and autocovariance matrix Ω v1 as defined in Table 2. This enables the autocovariances ofξ t and, hence, of ξ t itself to be derived.
The matrices in this expression can be represented as stated above.
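Derivation of expressions for c(t) in Section 5.2. From (6) we have

$$c(t) = \int_0^h e^{Ar}\left[\mu + \gamma(t - r)\right]dr = C_1\mu + C_1\gamma t - C_2\gamma, \qquad C_1 = \int_0^h e^{Ar}\,dr, \qquad C_2 = \int_0^h r\,e^{Ar}\,dr.$$

Hence $c(t) = (C_1\mu - C_2\gamma) + C_1\gamma t$. When A is nonsingular the expressions for $C_1$ and $C_2$ follow from matrix generalisations of standard integral formulae. For singular A the stated expressions follow from using the infinite series representation of $e^{Ar}$ and integrating term by term, yielding

$$C_1 = \sum_{j=0}^{\infty}\frac{A^j}{j!}\int_0^h r^j\,dr = \sum_{j=0}^{\infty}\frac{A^j h^{j+1}}{(j+1)!}, \qquad C_2 = \sum_{j=0}^{\infty}\frac{A^j}{j!}\int_0^h r^{j+1}\,dr = \sum_{j=0}^{\infty}\frac{A^j h^{j+2}}{j!\,(j+2)},$$

as required.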
Data used in Section 6. The monthly UK CPI data were obtained from the Office for National Statistics while the monthly US CPI data and daily exchange rate data were obtained from the FRED database provided by the Federal Reserve Bank of St. Louis. As mentioned in the text, the daily exchange rate observations were also aggregated to weekly and monthly frequencies. For the monthly series the last available daily observation was used. For the weekly series, the daily dates chosen depended on the number of days in the month. For months with 31 days, the four weekly values correspond to days 8, 16, 24 and 31; for months with 30 days, the observations correspond to days 8, 15, 23 and 30; while for February, days 7, 14, 21 and 28/29 were used. This ensures that the weekly data are in accordance with the monthly data i.e. the observation for the fourth week in the month corresponds to the last day of the month, as for the monthly data. In cases where the required day corresponds to a weekend or holiday, the value immediately prior to the required day was used.
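The rule for selecting the four weekly observations in each month can be written as a short routine. This is a sketch under stated assumptions: the function names are hypothetical, weekends are detected from the calendar, and public holidays are passed in as an optional set of dates because the list of days on which no exchange rate was recorded is not given in the text.

```python
import calendar
import datetime

def weekly_target_days(year, month):
    """Days of the month used for the four weekly observations."""
    n_days = calendar.monthrange(year, month)[1]
    if n_days == 31:
        return [8, 16, 24, 31]
    if n_days == 30:
        return [8, 15, 23, 30]
    return [7, 14, 21, n_days]  # February: 28 or 29 days

def weekly_dates(year, month, holidays=frozenset()):
    """Dates actually used: step back to the most recent earlier weekday
    whenever a target day falls on a weekend or in the holiday set."""
    dates = []
    for day in weekly_target_days(year, month):
        d = datetime.date(year, month, day)
        while d.weekday() >= 5 or d in holidays:
            d -= datetime.timedelta(days=1)
        dates.append(d)
    return dates
```

Because the fourth target day is always the last day of the month, the fourth weekly observation coincides with the monthly observation, as described above.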