MULTIVARIATE FRACTIONAL INTEGRATION TESTS ALLOWING FOR CONDITIONAL HETEROSKEDASTICITY WITH AN APPLICATION TO RETURN VOLATILITY AND TRADING

We introduce a new joint test for the order of fractional integration of a multivariate fractionally integrated vector autoregressive [FIVAR] time series based on applying the Lagrange multiplier principle to a feasible generalised least squares estimate of the FIVAR model obtained under the null hypothesis. A key feature of the test we propose is that it is constructed using a heteroskedasticity-robust estimate of the variance matrix. As a result, the test has a standard χ² limiting null distribution under considerably weaker conditions on the innovations than are permitted in the extant literature. Specifically, we allow the innovations driving the FIVAR model to follow a vector martingale difference sequence allowing for both serial and cross-sectional dependence in the conditional second-order moments. We also do not constrain the order of fractional integration of each element of the series to lie in a particular region, thereby allowing for both stationary and non-stationary dynamics, nor do we assume any particular distribution for the innovations. A Monte Carlo study demonstrates that our proposed tests avoid the large over-sizing problems seen with extant tests when conditional heteroskedasticity is present in the data. We report an empirical case study for a sample of major U.S. stocks investigating the order of fractional integration in trading volume and different measures of volatility in returns, including realised variance. Our results suggest that both return volatility and trading volume are fractionally integrated, but with the former generally found to be more persistent (having a higher fractional exponent) than the latter when more reliable proxies for volatility, such as the range or realised variance, are used. JEL: C12, C22.


Introduction
Long memory models have been used to model time series data in a wide range of fields of application. The class of (multivariate) fractionally integrated autoregressive moving average [ARFIMA] models provides a parsimonious means of simultaneously modelling the patterns of long and short range dependence typically seen in many macroeconomic and financial data sets; see, for example, the surveys in Baillie (1996) and Robinson (2003). In the context of the ARFIMA class of models the long memory parameter, or fractional exponent (vector of exponents in the multivariate case), is the key parameter driving the behaviour of the series. Where this is zero a weakly dependent (short memory) ARMA series obtains. If it is less than one-half the series is weakly stationary, otherwise it is nonstationary, the familiar autoregressive unit root case occurring where the exponent is unity. Consequently, considerable interest has been paid to developing methods of inference on the fractional exponent both as a parameter of interest in its own right and for preliminary data analysis. A leading example is a test of the null hypothesis of weak dependence (against fractional alternatives); here a non-rejection would allow for the use of standard methods for conducting, among other things, causality, structural vector autoregression, or impulse response analyses. More generally, such tests could usefully be employed to indicate what order of differencing of the data is required for such methods to be suitably employed.
In the univariate setting a number of hypothesis tests on the fractional exponent have been proposed; see, among others, Robinson (1994), Tanaka (1999), Breitung and Hassler (2002), Nielsen (2004b), Demetrescu et al. (2008), Hassler et al. (2009, 2016) and Cavaliere et al. (2017). In the context of a vector series one could perform such univariate fractional integration tests separately on each element of the vector. However, the overall size of such a testing procedure would be hard to control. Moreover, multivariate testing can improve efficiency relative to univariate testing because it explicitly acknowledges and exploits any endogenous cross-dependencies in the vector series, which can reduce the variability of the estimation errors and, hence, improve efficiency in estimation and testing. In this paper we develop multivariate fractional integration tests designed to test joint null hypotheses concerning the values of the long memory parameters of the elements of a fractionally integrated vector autoregressive [FIVAR] model. Specifically, we propose a parametric multivariate Lagrange multiplier [LM]-type test in the time domain which generalises the univariate regression-based LM-type test of Demetrescu et al. (2008) to the multivariate case. The method we propose can also be used to construct confidence sets, at a given asymptotic coverage level, for the true values of the long memory coefficients.
Our testing procedure is implemented in a regression-based context, based on feasible generalised least squares [FGLS] estimation of the multivariate FIVAR model under the null hypothesis, coupled with a heteroskedasticity-robust variance matrix estimate. A key advantage of, and motivation for, this approach is that it allows us to significantly weaken the technical conditions needed on the innovations, relative to existing multivariate fractional tests including, among others, Robinson (1995), Lobato and Robinson (1998), Lobato (1999), Lobato and Velasco (2000), Marinucci and Robinson (2001), Breitung and Hassler (2002), Shimotsu (2007), and Nielsen (2004b, 2005, 2011). In particular, we allow the driving innovations in the data generating process [DGP] to follow a vector martingale difference sequence [MDS] which is permitted to exhibit time-varying conditional heteroskedasticity. This therefore allows for both serial and cross-sectional dependence in the conditional second-order moments, which is of particular empirical relevance when modelling financial data and is not, to the best of our knowledge, allowed by any extant multivariate fractional integration test. Like Nielsen (2004a, 2005), we work within the context of a multivariate FIVAR model. This model allows each series within the vector process to have different fractional exponents irrespective of the parameters of the short-run component of the model. This property is not guaranteed when using the class of vector autoregressive fractionally integrated [VARFI] models, where the orders of integration of the elements of the vector series are not constant throughout the parameter space of the model; for further details see Nielsen (2005, pp.381-382).
This is important for the empirical case study we consider in this paper with respect to trading volume and return volatility where we aim to explicitly investigate whether the data support the hypothesis that these series admit a common fractional exponent or not. For a further recent empirical application using FIVAR models, investigating the effects of monetary policy on the economy, where it is important to allow the elements of the vector time series to have potentially different fractional exponents, see Lovcha and Perez-Laborda (2018). An implication of the FIVAR model, however, is that fractional cointegration is not possible between the elements of the vector time series. In common with the tests in Nielsen (2005) we do not restrict the fractional exponents to lie within a particular region, thereby allowing for both stationary and non-stationary dynamics. We also do not impose any particular distributional law on the innovations.
Under the conditionally heteroskedastic setting outlined above, our proposed test retains a standard χ² limiting null distribution (irrespective of the null values of the long memory parameters being tested) and exhibits non-trivial power against a sequence of local (Pitman drift) alternatives. Moreover, where the errors are independent and identically distributed [i.i.d.] our test statistic is asymptotically equivalent to the multivariate LM statistic discussed in Nielsen (2004a, 2005). As a consequence, the LM-type test we propose is asymptotically (locally) efficient when the errors are i.i.d. Gaussian. Monte Carlo simulation experiments show that our proposed multivariate fractional integration test displays good finite sample size control and power performance in the presence of empirically relevant data features, such as short-run dependence and time-varying GARCH-type conditional variances, for both Gaussian and non-Gaussian innovations. In contrast, extant tests are shown to display quite poor finite sample size control in the presence of such features.

Multivariate testing is naturally intended to address joint hypotheses involving the degree of persistence of a set of variables. This has important practical applications. As a leading example, there has been considerable interest in both the theoretical and empirical finance literatures in understanding the link between trading volume and return volatility. A number of papers have analysed whether the long-run dynamics of these variables share a common order of fractional integration, with mixed evidence; see, for example, Bollerslev and Jubinski (1999), Lobato and Velasco (2000), Luu and Martens (2003), Fleming and Kirby (2011) and Rossi and de Magistris (2013).
In our empirical analysis we apply our new approach to conduct joint inference on the order of fractional integration of trading volumes and different measures of return volatility focusing on 30 major U.S. stocks from the Dow-Jones Industrial Average Index [DJI]. We also investigate the existence of a common order of fractional integration between volume and these measures of return volatility. Because our tests do not require a particular distribution and, more importantly, allow for time-varying conditional second-order moments, our results are likely to be more robust than those reported in previous studies which are based on estimation techniques which neglect these empirically relevant data features (e.g. Bollerslev and Jubinski (1999), Lobato and Velasco (2000) and Fleming and Kirby (2011)).
Together with daily log-volume, we consider the log-transformations of three alternative measures of return volatility with increasing degrees of efficiency, namely: absolute-valued returns, the range-based estimator of Garman and Klass (1980), and a measure of realised variance constructed from 5-minute returns. An important aspect of this analysis is to investigate the influence that measurement errors have on the conclusions drawn from the data. Our empirical findings suggest that a common fractional exponent cannot in general be rejected when return volatility is proxied by absolute-valued returns, but can be rejected when it is proxied by more accurate estimates such as the range or realised variance. These findings are consistent with the previous literature and help us to understand the disparity between empirical results where different proxies for volatility are used. Our empirical results indicate that return volatility tends to exhibit a larger fractional integration exponent than trading volume, with long-term behaviour possibly driven by non-stationary dynamics. Heterogeneous degrees of fractional integration, such that return volatility tends to be more persistent than trading volume, could originate in certain types of trading strategies associated with imitation and herding in investors and market microstructure environmental conditions; see, e.g., LeBaron and Yamamoto (2008) and Yamamoto (2011).
The remainder of the paper is organised as follows. Section 2 introduces the DGP and the main assumptions underlying our theoretical analysis. In section 3 we detail our new LM-type multivariate fractional integration test and derive its asymptotic distribution under both the null hypothesis and a sequence of local alternatives. Section 4 discusses the results of our finite sample Monte Carlo study. Section 5 analyses the empirical relationship between trading volume and return volatility for stocks from the DJI. Section 6 concludes. An on-line supplementary appendix contains mathematical proofs of the large sample results given in section 3 together with additional material relating to the Monte Carlo analysis in section 4 and to the empirical application in section 5.
In what follows, ⇒ and →p denote weak convergence and convergence in probability, respectively, as T → ∞. I(·) is an indicator function that equals one if the condition in parentheses is fulfilled, and equals zero otherwise. The operators ⊗ and ⊙ correspond to the Kronecker and Hadamard products, respectively. The quantities I_n and 0_{n×m} denote an n-dimensional identity matrix and an n × m zero matrix, respectively. The notation A = {a_ij} denotes that the (i, j)th element of the matrix A is given by a_ij.

A FIVAR Model with Heteroskedasticity
We consider the observable k-dimensional time series vector {y_t}_{t=1}^T, where y_t ≡ (y_1,t, ..., y_k,t)′, generated according to the DGP

∆^{d+θ}(L) y_t = ε_t I(t ≥ 1), t = 1, ..., T, (1)

where ∆^{d+θ}(L) is a k × k diagonal matrix polynomial in the conventional lag operator, L, with characteristic element given by (1 − L)^{d_i+θ_i}, i ∈ {1, ..., k}. The real-valued fractional exponent, d_i + θ_i, is commonly referred to as the long memory or fractional integration parameter, such that d + θ ≡ (d_1 + θ_1, ..., d_k + θ_k)′. The k-dimensional vector ε_t ≡ (ε_1,t, ..., ε_k,t)′ is a weakly dependent (short memory, or I(0)) noise process with spectral density that is bounded and bounded away from zero at the origin. Our focus is on developing tests of the null hypothesis that d is the true order of integration of {y_t}; that is, H_0 : θ = 0, against the alternative hypothesis that at least one element of θ is non-zero. Assumption 1 details the formal properties which we will assume to hold on {ε_t} in (1).

Assumption 1. {ε_t} in (1) is generated as Π(L)ε_t = e_t ≡ (e_1,t, ..., e_k,t)′, with Π(L) := I_k − Σ_{j=1}^p Π_j L^j, where the Π_j are k × k parameter matrices such that Π(L) has all of its roots lying outside the unit circle, and {e_t}_{t=−∞}^{∞} is a strictly stationary and ergodic vector MDS with respect to the natural filtration F_t, the σ-field generated by {e_s : s ≤ t}.
Remark 1. Assumption 1 allows the short memory component of {y_t} to be driven by a stationary VAR(p) process. Accordingly, (1) is a FIVAR model in which each component {y_i,t}, i = 1, ..., k, follows a Type-II ARFIMA(p, d_i + θ_i, 0) process. The choice of Type-II fractional integration in our setting has the desirable feature of accommodating both stationary and non-stationary values of the fractional exponents within a unified framework. Because the fractional integration test we propose in section 3.2 is asymptotically equivalent to the multivariate LM fractional test in Nielsen (2005), for the same reasons as are discussed in Nielsen (2005, pp.378-379) the LM-type test, LM^FGLS_d, developed in section 3 is also implicitly a test of the null of no fractional cointegration (in the sense defined in Nielsen (2005, p.378)) and will diverge at rate O_p(T) under fractional cointegration.² It therefore seems advisable to consider the tests proposed in this paper alongside tests for fractional cointegration. We adopt this approach in the empirical application in section 5 by also considering the procedures developed in Nielsen and Shimotsu (2007).

Preliminaries
Given the observable time series vector {y_t} generated as in (1) and an arbitrary real-valued vector g ≡ (g_1, ..., g_k)′, define the k-dimensional stochastic processes

ε_{t,g} := (1 − L)^g_+ y_t, (2)

z*_{t−1,g} := Σ_{j=1}^{t−1} j^{−1} ε_{t−j,g}, (3)

where (1 − L)^g_+ := Σ_{j=0}^{t−1} Λ_j(g) L^j, with {Λ_j(g)}_{j=0}^{t−1} denoting a sequence of k × k diagonal matrices whose ith diagonal element is λ_j(g_i), where λ_0(g_i) := 1 and λ_j(g_i) := λ_{j−1}(g_i)(j − 1 − g_i)/j, corresponding to the truncated series of polynomial coefficients in the binomial expansion (1 − L)^{g_s} := Σ_{j=0}^{∞} λ_j(g_s) L^j. These variables are straightforward generalisations of the corresponding univariate processes in Breitung and Hassler (2002) to the multivariate context, with the characteristic harmonic weighting in (3) arising from the derivative of a (Gaussian) score function. Remark 10 below gives further insight into the key role played by these variables in the construction of our proposed LM-type test statistic.
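To fix ideas, the truncated fractional difference (1 − L)^g_+ and the harmonic-weighted regressor z*_{t−1,g} can be computed directly from the recursion for the binomial coefficients λ_j(g). The following is a minimal NumPy sketch (function names are ours, not the paper's):

```python
import numpy as np

def frac_diff_coeffs(g, n):
    """Truncated binomial coefficients lambda_j(g) of (1 - L)^g,
    via the recursion lambda_j(g) = lambda_{j-1}(g) * (j - 1 - g) / j."""
    lam = np.empty(n)
    lam[0] = 1.0
    for j in range(1, n):
        lam[j] = lam[j - 1] * (j - 1 - g) / j
    return lam

def frac_diff(x, g):
    """Type-II (truncated) fractional difference (1 - L)^g_+ x_t."""
    T = len(x)
    lam = frac_diff_coeffs(g, T)
    # at index t, sum_{j=0}^{t} lambda_j(g) * x_{t-j}
    return np.array([lam[:t + 1] @ x[t::-1] for t in range(T)])

def harmonic_regressor(eps):
    """z*_{t-1} = sum_{j=1}^{t-1} j^{-1} eps_{t-j} (zero for t = 1)."""
    T = len(eps)
    z = np.zeros(T)
    for t in range(1, T):
        j = np.arange(1, t + 1)
        z[t] = np.sum(eps[t - j] / j)
    return z
```

For example, frac_diff(x, 1.0) returns the truncated first difference of x, since λ_0(1) = 1, λ_1(1) = −1 and λ_j(1) = 0 for j ≥ 2.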
2. Numerical experiments investigating rejection rates of the LM F GLS d test and the tests of Nielsen (2005) and Breitung and Hassler (2002) in a fractionally cointegrated model are reported in the supplementary appendix. These show that, as expected, all three tests display empirical rejection frequencies in excess of the nominal level which are larger, other things equal, the larger is T or the strength of cointegration. Of the three tests, our FGLS test tends to reject with slightly lower frequency than the other two tests.

Let Φ denote a k × k diagonal matrix with ith diagonal element ϕ_ii, i = 1, ..., k. Under Assumption 1, testing the null hypothesis that d is the true order of integration of {y_t}, H_0 : θ = 0, is equivalent to testing H_0 : Φ = 0 in the multivariate linear regression model

ε_{t,d} = Φ z*_{t−1,d} + Σ_{j=1}^p Π_j ε_{t−j,d} + v_t, t = p* + 1, ..., T, (5)

where p* := max(1, p). This equivalence holds because, under H_0 : θ = 0, (5) and (1) are bijective with ϕ_ii = 0 for all i = 1, ..., k and v_t = e_t in (5); see also Breitung and Hassler (2002), Demetrescu et al. (2008), and Hassler et al. (2009). It will prove convenient to re-write (5) in matrix notation. First, corresponding to the time series of observations for each element of y_t, we have the equivalent representation

ε_{i,d} = X*_{−1,d,i} β_i + u_i, i = 1, ..., k, (6)

where β_i := (ϕ_ii, π′_i1, ..., π′_ip)′ is a k̄-dimensional parameter vector, with k̄ := pk + 1, and π_ij denotes the i-th row of Π_j, j = 1, ..., p. With the exception of the first regressor, all other right-hand side variables that characterise the i-th equation in (6) are the same across equations, since these always correspond to lagged values of ε_{t,d}. Then, given T̄ := k(T − p*), we can write the system of equations (6) compactly as Y_d = X*_{−1,d} β + u, where Y_d stacks the vectors ε_{i,d}, X*_{−1,d} is the block-diagonal regressor matrix with diagonal blocks X*_{−1,d,i}, β := (β′_1, ..., β′_k)′ and u is the corresponding stacked error vector.
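As an illustration of how the regressors of the augmented test regression (5) are assembled equation by equation, consider the following sketch; the helper name and array layout are our own assumptions, with eps_d holding ε_{t,d} (one column per series) and zstar_i the harmonic regressor for equation i:

```python
import numpy as np

def build_equation_regressors(eps_d, zstar_i, p):
    """Regressors of the augmented test regression (5) for equation i:
    first column z*_{t-1,d,i}, then the k elements of each of
    eps_{t-1,d}, ..., eps_{t-p,d}, giving kbar = p*k + 1 columns."""
    T, k = eps_d.shape
    p_star = max(1, p)
    cols = [zstar_i[p_star:T]]
    for j in range(1, p + 1):
        cols += [eps_d[p_star - j:T - j, m] for m in range(k)]
    X = np.column_stack(cols)
    y = eps_d[p_star:T]   # regressand: one column per equation
    return y, X

# equation-by-equation OLS under the null gives the consistent starting
# estimates used to form the preliminary Sigma-hat for FGLS, e.g.:
# beta_i = np.linalg.lstsq(X, y[:, i], rcond=None)[0]
```

Note that only the first column (the harmonic regressor) differs across the k equations; the lagged ε_{t,d} blocks are common to all of them, as stated in the text.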

A Heteroskedasticity-Robust LM Test
Under Assumption 1 and H_0 : θ = 0, it follows that E(uu′) = Σ ⊗ I_{T−p*}. Equation (5) defines a seemingly unrelated regression equations [SURE] system. Although equation-by-equation ordinary least squares [OLS] estimation will deliver consistent estimates of β, these estimates will not be efficient unless Σ is diagonal (recalling that the regressors differ across the equations in the system). We will therefore consider the FGLS estimator of β based on a preliminary consistent estimate of Σ, denoted Σ̂, obtained using OLS residuals estimated on an equation-by-equation basis:

β̂ := (X*′_{−1,d}(Σ̂^{−1} ⊗ I_{T−p*})X*_{−1,d})^{−1} X*′_{−1,d}(Σ̂^{−1} ⊗ I_{T−p*})Y_d. (7)
Theorem 1. Let y_t be generated according to (1) and let β̂ be the vector of FGLS estimates defined in (7). Under Assumption 1 and H_0 : θ = 0, √T(β̂ − β_0) ⇒ N(0, Ω_β), where β_0 := (β′_01, ..., β′_0k)′ with β_0s := (0, π′_s1, ..., π′_sp)′, s = 1, ..., k, and Ω_β denotes the asymptotic variance matrix of the FGLS estimator.

The dependence of the asymptotic variance of the FGLS estimator on nuisance parameters arising from any weak dependence and/or cross-sectional correlation in ε_t implies that asymptotically pivotal inference on the long memory parameters will need to be based on a heteroskedasticity-robust statistic formed using a consistent estimate of Ω_β. This can be achieved by using the familiar Eicker-Huber-White approach, building on the preliminary OLS estimate Σ̂ and the FGLS residuals û := Y_d − X*_{−1,d}β̂. In particular, a heteroskedasticity-robust estimate of the variance matrix Ω_β, denoted Ω̂_β, can be constructed in sandwich form from Σ̂, the regressors X*_{−1,d} and the residuals û. It is shown in the supplementary appendix that Ω̂_β is a consistent estimate of Ω_β under the conditions given in Assumption 1.
Based on the heteroskedasticity-robust estimate, Ω̂_β, it is then straightforward to construct a test statistic for the joint hypothesis H_0 : θ = 0 using the LM testing principle. Specifically, we can form a heteroskedasticity-robust LM-type test which rejects H_0 : θ = 0 for large values of the statistic

LM^FGLS_d := T β̂′R′(RΩ̂_βR′)^{−1}Rβ̂,

where R = {r_ij} is a k × kk̄ indicator matrix taking a value equal to one when j = (i − 1)k̄ + 1, i = 1, ..., k, and zero otherwise. In Theorem 2 we next derive the large sample behaviour of LM^FGLS_d under both the null hypothesis, H_0 : θ = 0, and under the sequence of local alternatives H_c : θ = c/√T, where c ≡ (c_1, ..., c_k)′ is a k-vector of finite constants (Pitman drifts), at least one of which is non-zero.
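A numerical sketch of a statistic of this form is given below; the T-scaling and the quadratic form are our own reading of the LM principle applied to the √T-consistent FGLS estimate, with R selecting the k long-memory coefficients from the stacked parameter vector:

```python
import numpy as np

def lm_fgls_stat(beta_hat, omega_hat, T, k, kbar):
    """Quadratic-form LM-type statistic (a sketch; the paper's exact
    expression may differ). beta_hat stacks the k equation-specific
    parameter vectors of length kbar = p*k + 1; R picks out the
    long-memory coefficient phi_ii (the first entry) of each."""
    R = np.zeros((k, k * kbar))
    for i in range(k):
        R[i, i * kbar] = 1.0   # position j = (i-1)*kbar + 1, 1-based
    r = R @ beta_hat
    V = R @ omega_hat @ R.T    # heteroskedasticity-robust k x k block
    return T * r @ np.linalg.solve(V, r)
```

The resulting value would be compared against a χ²(k) critical value (e.g. 5.991 at the 5% level for k = 2).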
Theorem 2. Let y_t be generated according to (1) under Assumption 1. Then: (i) under H_0 : θ = 0, LM^FGLS_d ⇒ χ²(k); and (ii) under H_c : θ = c/√T, LM^FGLS_d ⇒ χ²(k, ξ), where χ²(k) and χ²(k, ξ) denote a standard χ² distribution with k degrees of freedom and a non-central χ² distribution with k degrees of freedom and non-centrality parameter ξ := (L^{−1}c)′(L^{−1}c), respectively, with L denoting an upper triangular matrix such that L′L = RΩ_βR′.
Remark 6. The result in part (i) of Theorem 2 shows that the limiting null distribution of LM^FGLS_d is pivotal and that a test of H_0 : θ = 0 can be run using standard critical values from the χ²(k) distribution, where k is the dimension of y_t. Part (ii) of Theorem 2 establishes that the asymptotic distribution of LM^FGLS_d displays a non-trivial positive offset under the local alternative, H_c : θ = c/√T, vis-à-vis the null, H_0, but that its asymptotic local power function will, in general, depend on nuisance parameters arising from any weak dependence or cross-sectional correlation present in ε_t. The same is also true of the extant multivariate fractional integration tests discussed in section 1, except that these do not, in general, have pivotal limiting null distributions when conditional heteroskedasticity is present in e_t, as a consequence of the fact that they are not based around a heteroskedasticity-robust estimate of the variance matrix Ω_β.
Remark 7. Theorem 2 provides a theoretical basis for the construction of confidence sets. This can be achieved by inverting the non-rejection region of the test statistic; see Hassler et al. (2009). More specifically, let LM_g denote the value of the LM statistic when testing H_0 : θ = 0 with d = g, for an arbitrary g ∈ R^k, and let Ψ be an arbitrary compact set in R^k. Define D_λ := {g ∈ Ψ : Pr(χ²(k) ≤ LM_g) ≤ 1 − λ}, with λ ∈ (0, 1); i.e., the subset of Ψ for which H_0 cannot be rejected at the λ significance level. From Theorem 2, it follows that if Ψ is large enough so as to contain the true values of the long memory parameter vector, then the probability of the true order of integration lying within D_λ is at least (1 − λ).
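The inversion described in Remark 7 amounts to a grid search over candidate orders g, retaining those at which the test does not reject. A toy sketch (the quadratic lm_stat below is a placeholder, not a real test statistic; in practice lm_stat(g) would re-run the test with the data filtered by g):

```python
import numpy as np

CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815}   # chi-square(k) 5% critical values

def confidence_set(lm_stat, grid, k):
    """Keep every candidate order g at which H0 (true order = g)
    is not rejected at the 5% level."""
    crit = CHI2_CRIT_5PCT[k]
    return [g for g in grid if lm_stat(g) <= crit]

# toy illustration: a 'statistic' minimised at g = 0.4 (not real data)
grid = np.round(np.arange(0.0, 1.001, 0.05), 2)
cs = confidence_set(lambda g: 50.0 * (g - 0.4) ** 2, grid, k=1)
```

Here the retained grid points form an interval around 0.4, the 95% confidence set implied by this toy statistic.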
Remark 8. Our proposed test procedure can be generalised to account for non-zero means following the approach in Robinson (1994). To that end, consider the extended form of the DGP in (1) in which y_t is replaced by y_t − µ, where µ ≡ (µ_1, ..., µ_k)′ is a vector of unknown means estimated under the null hypothesis. Denote the resulting estimates as µ̂_i, i = 1, ..., k, and the corresponding residuals as ε̂_{it,d_i}. One then simply redefines the ith element of the vector ε_{t,d} from (2) to be ε̂_{it,d_i}, i = 1, ..., k, and proceeds as before. Let β̃ denote the FGLS estimator obtained in this way. Then, following the approach taken in Proposition 4 of Demetrescu et al. (2008), it can be shown that Theorem 1 holds with β̂ replaced by β̃, since β̃ − β̂ = o_p(T^{−1/2}) under the restrictions considered and the additional condition that d > 0. More generally, the results can be extended to account for, among other things, deterministic polynomial time trends and deterministic seasonal effects; see also Nielsen (2005) and Demetrescu et al. (2008). The large sample results given in this section are not affected by accounting for such deterministics.

Remark 9. In practical applications of the tests, the lag order p will typically be unknown and so could be selected using a standard consistent information criterion such as the Bayes information criterion [BIC]. However, Demetrescu et al. (2008) argue that such data-dependent criteria can lead to substantial finite-sample biases in the context of the tests considered here. As an alternative, Demetrescu et al. (2008) advocate the use of a deterministic lag selection rule, such as the popular Schwert (1989) rule, which sets p = ⌊K(T/100)^{1/4}⌋, where ⌊·⌋ denotes the integer part of its argument and K is a finite positive constant. Provided the true lag order p is finite, as we assume in this paper, the limiting distribution theory given in this section remains apposite for tests based on a lag length determined according to such deterministic rules. We implement Schwert's rule in the empirical application considered in section 5.
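The Schwert (1989) rule from Remark 9 is a one-liner; the default K = 4 here is one common choice, not necessarily the constant used in the paper's application:

```python
import math

def schwert_lag(T, K=4):
    """Deterministic lag selection rule p = floor(K * (T/100)^(1/4)),
    following Schwert (1989); K = 4 is an assumed default."""
    return math.floor(K * (T / 100.0) ** 0.25)
```

For example, schwert_lag(500) = 5 and schwert_lag(1000) = 7, so the rule grows the lag length slowly with the sample size.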
Remark 10. It is useful to compare the large sample properties of our proposed test with the Gaussian LM test of Nielsen (2005) in comparable settings. To this end, consider the case where, as required by the conditions imposed in Theorem 3 of Nielsen (2005, p.381), Assumption 1 is restricted such that p = 0 and e_t is an i.i.d. innovation sequence. Noting that z*_{t−1,d} := Σ_{j=1}^{t−1} j^{−1} e_{t−j,d} = −ln(1 − L)_+ e_{t,d} under the null hypothesis, it is straightforward to show that under these additional restrictions the vector X*′_{−1,d}(Σ̂^{−1} ⊗ I_{T−1})Y_d corresponds to the Gaussian score vector underlying the LM statistic of Nielsen (2005), so that the two tests are asymptotically equivalent in this setting. Rather than the lag augmentation used in (5), however, Nielsen's (2005) test relies on pre-whitening using the residuals from a VAR(p) model in a two-stage procedure. Augmentation and pre-whitening are asymptotically equivalent strategies but will differ in finite samples, as will be explored in the next section.
To simplify our discussion, we fix θ_2 = 0 in all of the reported simulations and vary θ_1 over {−0.3, −0.25, ..., 0, ..., 0.25, 0.3}. Consequently, while the true order of integration of {y_2t} is always one, the true order of integration of {y_1t} is 1 + θ_1. The case where θ_1 = 0 allows us to investigate the empirical size properties of LM^FGLS_d, while the cases where θ_1 ≠ 0 allow us to investigate its finite sample power against alternatives where one of the long memory parameters deviates from the null hypothesis. For each of the parameter configurations (α, β, ρ, π_1, π_2, θ_1), the two sample lengths, and the two conditional distributions, we compute LM^FGLS_d and determine the empirical rejection frequencies [ERFs] at the 5% nominal (asymptotic) level over 5,000 replications.
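The ERF computation described above can be sketched as follows; simulate_stat stands in for one full Monte Carlo replication (generate a sample, compute the statistic), and the chi-square draw below is only a sanity check of the mechanics, not the paper's DGP:

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_rejection_frequency(simulate_stat, crit, n_reps=5000):
    """Monte Carlo ERF: the share of replications in which the test
    statistic exceeds its asymptotic critical value."""
    return np.mean([simulate_stat() > crit for _ in range(n_reps)])

# sanity check: a chi-square(2) distributed 'statistic' compared against
# the 5% chi-square(2) critical value should reject about 5% of the time
erf = empirical_rejection_frequency(
    lambda: rng.chisquare(df=2), crit=5.991, n_reps=20000)
```

A correctly sized test produces an ERF close to the nominal level under the null; ERFs well above it, as reported below for the non-robust tests under GARCH, indicate over-sizing.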
We also benchmark the performance of LM^FGLS_d against two alternative (but related) tests. The first is the multivariate LM test of Nielsen (2004a, 2005) discussed in Remark 10 above, denoted LM^MLE_d in what follows, and the second is the multivariate trace test of Breitung and Hassler (2002), which we denote BH_d. While LM^FGLS_d corrects for stationary serial correlation in ε_t via lag augmentation in (5), LM^MLE_d does so via pre-whitening, and is therefore the most natural candidate to benchmark our test against. In contrast, the BH_d test is for the null hypothesis of a common order of integration between the elements of the vector time series. Our simulation DGP is such that this condition holds under the null hypothesis, but not under the alternative, so a comparison with this test is appropriate. Table 1 reports the empirical size properties (θ_1 = θ_2 = 0) of LM^FGLS_d, LM^MLE_d and BH_d where no short-run dynamics are present (π_1 = π_2 = 0) and where, accordingly, no lag augmentation or pre-whitening is needed. This allows us to first investigate the impact of GARCH effects, contemporaneous correlations, and the conditional distribution of the innovations on each test.

ERFs with no Augmentation/Pre-whitening
The results show that LM^FGLS_d displays ERFs close to the nominal asymptotic 5% level in almost all cases. Some mild over-sizing is seen for the smaller sample size considered when the innovations are conditionally Student-t distributed with relatively high GARCH persistence, α = 0.1 and β ≥ 0.80, and significant levels of endogeneity, ρ ≥ 0.4. These distortions are largely ameliorated as the sample size increases. Where the innovations are i.i.d. (α = β = 0), both LM^MLE_d and BH_d display good finite sample size control regardless of the conditional distribution or the degree of endogeneity. However, where the innovations exhibit conditional heteroskedasticity a very different pattern emerges for both LM^MLE_d and BH_d. These tests display a tendency to strong over-sizing, with these distortions being larger (other things equal): the stronger the persistence of the GARCH process; the larger the degree of endogenous correlation |ρ|; and for innovations drawn from a heavy-tailed distribution. Moreover, these size distortions are not ameliorated as the sample size increases. To illustrate, for T = 500 and ρ = 0.8, the ERFs of LM^MLE_d and BH_d with GARCH errors driven by (α, β) = (0.10, 0.85) and Student-t innovations are 36% and 38.8%, respectively. In contrast, the LM^FGLS_d test is only slightly oversized at 7.1%. For T = 1000, the corresponding ERFs of LM^MLE_d and BH_d increase significantly to 48.3% and 52.8%, respectively, while that of LM^FGLS_d reduces to 6.1%.

ERFs with Augmentation/Pre-whitening
We now analyse the finite sample size and power properties of LM^FGLS_d, LM^MLE_d and BH_d in the case where the errors, ε_t, can display first-order stationary VAR dynamics. Accordingly, we set p = 1 in (5) in relation to the LM^FGLS_d test, while analogously we use a VAR(1) for pre-whitening in connection with the LM^MLE_d and BH_d tests. For ε_t we consider: (i) π_1 = π_2 = 0, so that augmentation/pre-whitening is in fact unnecessary, and (ii) π_1 = π_2 = 0.4, so that the correct order of augmentation/pre-whitening is employed. Table 2 reports ERFs of the three tests in the Gaussian homoskedastic case (α = β = 0). Results for the Student-t case are not reported as these are almost identical to the results reported in Table 2. Also, to keep the size of the subsequent tables to manageable proportions, we only report results for two values of the correlation coefficient, namely ρ = 0 and ρ = 0.8. Corresponding results for ρ ∈ {0.2, 0.4, 0.6} can be obtained on request.
Consider first the results for the case where θ_1 = 0, so that the null hypothesis holds. Here we see that the ERFs of the augmented LM^FGLS_d test lie close to the nominal asymptotic level throughout, even where the lag augmentation is unnecessary. Pre-whitening also appears to be effective for the LM^MLE_d and BH_d tests, with the exception of the case where ρ = 0.8, where these tests are somewhat oversized for T = 1000.
Turning next to the empirical power results for θ_1 ≠ 0, we see that LM^FGLS_d displays good finite sample power properties with power increasing, other things equal, both as |θ_1| increases and as T increases, as would be expected. Power is also larger, other things equal, for ρ = 0.8 than for ρ = 0, illustrating the efficiency benefits gained from multivariate modelling when the variables are cross-correlated. In terms of a comparison between the three tests, overall the finite sample power properties of LM^FGLS_d and LM^MLE_d seen in Table 2 are very similar for alternatives where θ_1 < 0, as might be expected given the asymptotic equivalence of these tests when the innovations are i.i.d.; cf. Remark 10. For alternatives where θ_1 > 0 (i.e., when the process is more persistent than posited under the null), LM^FGLS_d can display somewhat higher power than LM^MLE_d, particularly in the case where the errors are first-order autocorrelated, π_1 = π_2 = 0.4; for example, for π_1 = π_2 = 0.4, ρ = 0, T = 500, and θ_1 = 0.3 the power of LM^FGLS_d is 57.4%. Both LM^FGLS_d and LM^MLE_d clearly dominate BH_d on power; in the previous example the power of BH_d is only 26.3%. The power functions of all of the tests are asymmetric in the sign of θ_1, for a given DGP, such that a false null hypothesis which leads to an over-differenced series (θ_1 < 0) is more easily rejected than an incorrect null which leads to an under-differenced series (θ_1 > 0) where the magnitude of the under/over-differencing is the same. To illustrate, for π_1 = π_2 = 0.4, ρ = 0 and T = 500, the power of LM^FGLS_d to detect θ_1 = 0.25 and θ_1 = −0.25 is 49.6% and 72.0%, respectively. Breitung and Hassler (2002) report a similar asymmetry in the power properties of their univariate tests.
Finally, we turn to the case where the innovations may display GARCH effects and excess kurtosis.
The results for θ_1 = 0 show that the empirical size properties of the tests in the presence of GARCH are similar to the corresponding results reported previously for the serially uncorrelated case with no augmentation/pre-whitening in Table 1. In particular, while the empirical size of LM^FGLS_d is reasonably close to the nominal asymptotic 5% level throughout (size departures are no greater than 1.6% for T = 500 and no greater than 0.8% for T = 1000), incorrectly assuming conditional homoskedasticity causes significant over-sizing in both LM^MLE_d and BH_d which is not ameliorated by increasing the sample size. To illustrate, for ρ = 0.8 and (α, β) = (0.10, 0.85), LM^MLE_d and BH_d display ERFs of 9.2% and 8.7%, respectively, for T = 500 and 10.4% and 9.9% for T = 1000 with Gaussian innovations, increasing to 28.2% and 32.4% for T = 500 and 40.4% and 46.2% for T = 1000 with Student-t innovations.
For non-zero values of θ_1, we observe qualitatively similar patterns in relation to the power properties of LM^FGLS_d as were reported in Table 2 in the homoskedastic case, albeit persistent GARCH-type behaviour in the innovations can be seen to clearly lower the finite sample power of LM^FGLS_d relative to the i.i.d. case, particularly so when the conditional distribution of the innovations is heavy-tailed. This is of course consistent with Theorem 2, where it was shown that the asymptotic local power function of the LM^FGLS_d test depends on any nuisance parameters arising from conditional heteroskedasticity in the innovations. To illustrate, from Table 2, for π_1 = π_2 = 0.4, ρ = 0, and T = 500, the power of LM^FGLS_d to detect θ_1 = 0.3 (θ_1 = −0.3) in the i.i.d. case is 57.4% (87.5%). However, from Table B.1, under GARCH dependence with (α, β) = (0.10, 0.85) the respective probabilities are 52.7% (75.7%) in the Gaussian case, and 37.3% (49.3%) in the Student-t case. Similarly, for T = 1000 in the previous example, power is seen from Table 3 to be 98.7% (99.9%) in the Gaussian case, and 71.6% (81.1%) in the Student-t case. A comparison between the finite sample power of LM^FGLS_d and that of LM^MLE_d and BH_d is somewhat uninformative here because of the poor size control of the latter two tests under conditional heteroskedasticity.

Long-run Dynamics in Volume and Volatility
Understanding the linkages between return volatility, liquidity and trading activity has been an area of considerable research interest in the finance literature. We apply the multivariate testing approach developed in this paper to perform joint inference on the order of fractional integration of trading volume and return volatility for a sample of major stocks traded in the U.S. market. As part of this, we also investigate the hypothesis that these variables exhibit the same order of fractional integration.
A number of previous studies have investigated this hypothesis in trading volume and return volatility within a multivariate ARFIMA framework. No strong consensus has emerged across these studies, which are based on a variety of methods of estimation and inference and employ a number of different observable variables to proxy the latent return volatility process. Bollerslev and Jubinski (1999) and Lobato and Velasco (2000) use semiparametric multivariate periodogram-based estimators in the frequency domain, proxying return volatility by absolute-valued returns. They conclude that, for most of the stocks analysed, the hypothesis that trading volume and return volatility share the same order of fractional integration cannot be rejected. However, Fleming and Kirby (2011) argue that the slow rate of convergence of periodogram-based estimators raises concerns about estimation efficiency. Consequently, they implement a parametric Gaussian quasi-maximum likelihood (QML) approach, as in Nielsen (2004a), to estimate a bivariate FIVAR model, allowing for short-run dependencies but under the assumption of conditional homoskedasticity. Moreover, Fleming and Kirby (2011) proxy return volatility using intra-day data with the aim of improving accuracy over the use of absolute-valued returns, and reject the hypothesis of a common long memory coefficient in most cases.
Our testing procedure is expected to be useful here for two key reasons. First, as shown in Theorem 1, the FGLS-based test achieves the usual √T rate of convergence in parametric testing, and is therefore expected to yield improved finite-sample power relative to periodogram-based estimators; see, for example, Tanaka (1999). This addresses the concerns surrounding efficiency raised by Fleming and Kirby (2011). Second, and arguably most importantly, our testing approach is valid in the presence of stationary conditionally time-varying second-order moments and heavy-tailed innovations, unlike the QML approach of Nielsen (2004a) used by Fleming and Kirby (2011).

Data
Our analysis focuses on 30 major U.S. stocks from the DJI. We analyse data sampled from 02/01/2003 to 31/12/2014. Unlike trading volume, return volatility cannot be directly observed. The literature has suggested a number of different estimation methods of increasing accuracy, which we implement. The simplest approach uses absolute-valued returns computed from close-to-close daily prices. Unfortunately, this measure is known to be highly inefficient and subject to large estimation errors. More accurate estimates can be constructed using intra-day information. Following Garman and Klass (1980), we also proxy daily return variability as u_t²/2 − (2 ln 2 − 1)c_t², where u_t and c_t are the differences in the natural logarithms of the high and low prices, and of the closing and opening prices, respectively. Such range-based estimators produce more efficient estimates than absolute-valued returns computed from close-to-close prices (Parkinson (1980)) and, as discussed in Andersen and Bollerslev (1998), can be as efficient a measure of return volatility as realised volatility computed on the basis of three- to four-hour returns. The last estimator we consider is a realised variance measure computed by aggregating 5-minute squared continuously compounded returns over the trading session. Daily share volumes and high, low, opening and closing prices are obtained from CRSP. The high-frequency prices necessary to compute realised variances are obtained from the NYSE Trade and Quote (TAQ) database. As is customary in this literature, we apply log-transforms to both the trading volume and return volatility variables. Standard descriptive statistics for the aforementioned variables, as well as a statistical analysis highlighting statistically significant evidence for the presence of time-varying second-order moments in the data, are presented in Tables C1 and C2 of the supplementary appendix.
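For concreteness, the two intra-day volatility proxies just described can be sketched as follows. This is a minimal illustration under our own naming conventions (`garman_klass_variance`, `realised_variance`); the paper itself does not supply code.

```python
import numpy as np

def garman_klass_variance(high, low, open_, close):
    """Range-based daily variance proxy: u^2/2 - (2 ln 2 - 1) c^2,
    where u = ln(high) - ln(low) and c = ln(close) - ln(open)."""
    u = np.log(high) - np.log(low)
    c = np.log(close) - np.log(open_)
    return 0.5 * u ** 2 - (2.0 * np.log(2.0) - 1.0) * c ** 2

def realised_variance(intraday_prices):
    """Realised variance: sum of squared 5-minute log returns
    aggregated over one trading session."""
    r = np.diff(np.log(np.asarray(intraday_prices, dtype=float)))
    return float(np.sum(r ** 2))
```

Both proxies, like trading volume, are subsequently log-transformed in the empirical analysis.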

Implementation Issues
In conducting our analysis of the long memory properties of log-trading volume and log-return volatility, hereafter denoted as (d(vlm), d(σ)), a number of key implementation issues arise which we now detail.
First, we construct 99%, 95%, and 90% confidence sets for (d(vlm), d(σ)) by inverting the non-rejection regions of the multivariate test in a discrete grid search over the support Ψ. More specifically, we evaluate LM^FGLS_d for every pair of values d_1 and d_2 in the grid sequence {−0.2, −0.1, ..., 1.1, 1.2}. Point estimates of the long memory parameter vector can also be obtained by minimising the value of LM^FGLS_d over Ψ; notice that this estimate does not depend on the confidence level used. This method of point estimation has been used in the univariate context; see, for example, Gil-Alaña and Robinson (1997). We denote the resulting point estimates of the long memory parameter for log-trading volume and log-volatility as d_min(vlm) and d_min(σ), respectively. 5
Second, to account for deterministic effects in the level of these series, we apply the OLS-based demeaning procedure described in Remark 8. While most papers do not consider deterministic trends to be a stylised feature of return volatility, trading volume is widely accepted to exhibit trending paths conformable with increasing growth in the number of traders and trading activity; see Fleming and Kirby (2011) and references therein. Consequently, for the log-volatility measures, our main analysis is carried out by including a constant to capture a non-zero drift, as in Hassler et al. (2016), while in the case of log-volume we allow for a quadratic time trend polynomial of the form µ_t = µ_0 + µ_1(t/T) + µ_2(t/T)², as advocated by, among others, Luu and Martens (2003) and Fleming and Kirby (2011). The parameters in these functions are estimated through univariate OLS (see Remark 8), with the multivariate fractional integration test then computed on the resulting residuals.
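A schematic of this test-inversion procedure is given below. It assumes a user-supplied function `lm_stat(d1, d2)` returning the value of the joint LM statistic under the null d = (d1, d2); that name is ours, and 5.991 is the standard 5% critical value of a χ²(2) distribution.

```python
import itertools

CRIT_5PCT_CHI2_2DF = 5.991  # 5% critical value of the chi-squared(2) distribution

def invert_lm_test(lm_stat, grid, crit=CRIT_5PCT_CHI2_2DF):
    """Invert the joint test: collect all (d1, d2) pairs that are not
    rejected (the confidence set), track the statistic-minimising pair
    (the point estimate), and record the non-rejected values on the
    45-degree line (a candidate common order of integration)."""
    conf_set, best, best_val = [], None, float("inf")
    for d1, d2 in itertools.product(grid, repeat=2):
        stat = lm_stat(d1, d2)
        if stat <= crit:
            conf_set.append((d1, d2))
        if stat < best_val:
            best, best_val = (d1, d2), stat
    common_d = [d1 for d1, d2 in conf_set if d1 == d2]
    return conf_set, best, common_d

# The grid sequence used in the paper: -0.2, -0.1, ..., 1.1, 1.2
grid = [round(-0.2 + 0.1 * i, 1) for i in range(15)]
```

Note that the point estimate is simply the grid minimiser of the statistic and, as remarked above, does not depend on the confidence level.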
Third, as discussed in Remark 9, we determine the lag length according to Schwert's rule, p = 4(T/100)^(1/4). Given the large sample size involved, Schwert's rule ensures a relatively long lag length, so that the short-run component of log-volume and log-realised variance should be well captured in the auxiliary regression. Andersen et al. (2003) also adopt a relatively long lag length in estimating their FIVAR model for the realised volatility of exchange rates in order to maintain a conservative approach. 6
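Schwert's rule is a deterministic function of T alone; a minimal sketch (truncating to an integer is our assumption about how fractional values are handled):

```python
import math

def schwert_lag(T):
    """Schwert's rule for the lag length: p = 4 * (T/100)^(1/4),
    truncated to an integer."""
    return math.floor(4 * (T / 100) ** 0.25)
```

With roughly twelve years of daily observations (T around 3,000), the rule selects p = 9, a comparatively long lag length as noted in the text.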

Main Results
For each stock and for each volatility measure, Table 5 reports the resulting point estimates d_min(vlm) and d_min(σ). Table 5 also gives the upper and lower bounds of the corresponding 95% confidence ellipsoids, formed as the vertical and horizontal projections of the confidence set onto the log-trading volume and log-volatility axes, respectively. 7 The columns headed "Common d" in Table 5 report the range of values of d for which the null hypothesis H_0: d(vlm) = d(σ) = d cannot be rejected at the asymptotic 5% nominal significance level. If this region is non-empty, it shows the set of values along the 45-degree line contained within the 95% confidence ellipsoid; that is, those values of a common order of fractional integration for which the null cannot be rejected. Notice that, by construction, the resulting interval contains the true value of a common long memory parameter with an (asymptotic) probability of not less than 95%. In addition, given d_min(vlm) and d_min(σ), we can compute the residuals from the multivariate FGLS regression and use these to estimate the contemporaneous correlation between the innovations to log-volume and a given return volatility measure; this estimate is denoted ρ_e in Table 5. Large values of ρ_e are supportive of the usefulness of the multivariate approach we advocate. And, indeed, we see from Table 5 that this estimated correlation is generally quite large and positive.
6. We also investigated the robustness of our main conclusions to the lag augmentation order used in the FGLS regression and to the inclusion of a deterministic time trend in connection with the return volatility measures. To that end, as in Fleming and Kirby (2011), we also looked at the case where a linear time trend was allowed for in return volatility and a low-order VAR(p) was fitted. Table C.3 in the supplementary appendix reports the main results from this analysis, focusing directly on log-volume and log-realised variance, with p = 2 and both with and without a linear time trend in return volatility. Here we also report the related results when p is chosen according to Schwert's rule and return volatility includes a deterministic time trend. While the results show some sensitivity to these variations in the estimated model, the main qualitative picture that emerges is essentially very similar to that discussed below. 7. These bounds (projections) define a rectangular approximation to the true confidence ellipsoid, whose area cannot be smaller than that of the ellipsoid itself. However, they have the advantage of providing a summary measure which can easily be tabulated. The full set of confidence ellipsoids for each stock considered can be found in sections C.2.2-C.2.4 of the supplementary appendix.
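The rectangular projections described in footnote 7 can be read directly off a gridded confidence set; a sketch, continuing to represent the set as a list of non-rejected (d1, d2) pairs (a representation of our own choosing):

```python
def projection_bounds(conf_set):
    """Project a gridded two-dimensional confidence set onto each axis,
    giving the tabulated lower/upper bounds for each parameter."""
    d1s = [d1 for d1, _ in conf_set]
    d2s = [d2 for _, d2 in conf_set]
    return (min(d1s), max(d1s)), (min(d2s), max(d2s))
```

As the footnote notes, the resulting rectangle contains the ellipsoidal set and so is a conservative but easily tabulated summary.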
Let us first discuss the results from the analysis of the joint dynamics of log-volume and log absolute returns. Consistent with the previous literature, we observe that for most of the stocks considered our multivariate test rejects both the null hypothesis that the order of integration of the bivariate series is I(0) (such that both variables are weakly dependent) and the null hypothesis that it is I(1) (such that both series admit an autoregressive unit root). The only exceptions are INTC (Intel) and MSFT (Microsoft), for which the multivariate test cannot reject the null hypothesis that log-volume is I(0) at the 5% level. Taking a simple average of the estimates d_min(vlm) and d_min(σ) across all stocks considered yields 0.41 and 0.39, respectively, essentially matching the "characteristic" value of 0.40 typically found in the literature; see Andersen et al. (2003). While for many of the stocks considered the point estimates of the vector of fractional exponents are below the non-stationarity threshold, we note that for most of the stocks the respective confidence sets cover both the stationary and non-stationary regions of the parameter space, preventing us from drawing clear conclusions on the stationarity of the underlying series. This is a common finding in the realised-volatility modelling literature; see, e.g., Kellard and Sarantis (2010).
Reflecting the strong similarities seen between the estimates of the two long memory parameters in the bivariate system, the hypothesis that trading volume and return volatility are driven by a FIVAR model with the same fractional exponent can be rejected at the 5% level for only five of the stocks considered, which constitutes about 20% of the stocks in our sample. This is, however, considerably higher than the corresponding frequency found by Bollerslev and Jubinski (1999), who only reject for 8% of the series they consider, but is the same as in Lobato and Velasco (2000), who also reject the null hypothesis of a common long memory parameter for 20% of the series they consider. 8 In their study of log-volume and log absolute returns, Fleming and Kirby (2011) reject the common long memory parameter null for 100% of the series they analyse. They attribute this to estimation bias in the QML-based inference they use yielding systematically larger parameter estimates for the long memory coefficient of trading volume, and conjecture that departures from normality in log absolute returns may be having a pervasive effect on QML estimation; see Fleming and Kirby (2011, pp. 1721-1722). Our Monte Carlo simulations in section 4 accord with this conjecture, suggesting that the combination of persistent time-varying volatility and non-Gaussian features in the data can introduce sizeable biases into the QML-based methods of Nielsen (2004a, 2005) used by Fleming and Kirby (2011).
We now move to a discussion of the results relating to the use of the log-range and log-realised variance proxies for return volatility. The overall picture that emerges here is remarkably similar in both cases. Multivariate estimation provides even stronger support for fractional integration in this context, with the I(0) and I(1) null hypotheses both being rejected at the 5% level for all of the stocks considered. For log trading volume, some changes are seen relative to the results discussed previously relating to the use of log absolute returns. 9 Overall, the average value of d_min(vlm) taken across all of the stocks considered for the log-range and log-realised variance estimators is 0.45, reasonably similar to the 0.41 value reported above in connection with the use of log absolute returns. In contrast, the estimates of the order of integration of return volatility based on either the log-range or log-realised variance measures show a marked increase compared to log absolute returns. In both cases the average value of d_min(σ) is 0.58, and the overall evidence strongly suggests that return volatility displays non-stationary dynamics over the period, because the lower bounds of the confidence ellipsoids are not smaller than the 0.5 threshold for many of the stocks considered. Evidence of non-stationary fractionally integrated dynamics in realised volatility over this period (which includes the financial crisis) is consistent with the results reported by Hassler et al. (2016); see also Bandi and Perron (2006). Consequently, and because the persistence of log-realised variance tends to be greater than that of log-volume, the hypothesis that both variables share a common fractional exponent is rejected at the 5% level for a significantly larger proportion of the stocks considered, namely 53.33% when using the log-range and 63.33% when using log-realised variance.
8. In making such comparisons it is important to note, however, that these authors use different sample data than we do, involving different stocks and different time periods. In particular, the sample lengths considered in Lobato and Velasco (2000) are more than double those we consider, and our finding of the same frequency of rejections of a common exponent as theirs may reflect the greater efficiency of the methods used here.
It is well understood in the financial econometrics literature that measurement errors in absolute returns can cause bias in (univariate) long memory parameter estimation. Essentially, log absolute returns are subject to noisy additive measurement errors with large variability, which make the underlying process appear less persistent than it really is, leading to downward-biased estimates of the true order of fractional integration; see, among others, Bollerslev and Wright (2000), Haldrup and Nielsen (2007), and Dalla (2015). This provides a straightforward and plausible explanation for the systematic differences seen in the long memory estimation results for the different return volatility measures reported in Table 5. 10 According to our results, the more efficient the estimate of return volatility used, the higher the percentage of the stocks for which the null hypothesis of a common order of integration can be rejected. Essentially, downward biases in the estimation of the long memory parameter for absolute returns bias the tests towards non-rejection of a common order of integration. Using more accurate return volatility measures reduces this estimation bias and leads to increased evidence that return volatility is more persistent than volume.
9. Because we conduct joint estimation, and the innovations to the short-term component of volume and return volatility are strongly positively correlated, as reported in the column ρ_e in Table 5, the estimates of the long memory parameter of log-volume would be expected to be somewhat sensitive to changes in the variable used as a proxy for return volatility.
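The attenuation mechanism is easy to reproduce in a small simulation: add white measurement noise to a fractionally integrated series and re-estimate d. The sketch below is our own illustration, not the paper's procedure; it uses a standard GPH log-periodogram regression with bandwidth m = T^0.6, and all function names are ours.

```python
import numpy as np

def frac_integrate(e, d):
    """Truncated fractional integration of innovations e:
    x_t = sum_{j=0}^{t-1} psi_j e_{t-j}, with psi_0 = 1 and
    psi_j = psi_{j-1} * (j - 1 + d) / j."""
    T = len(e)
    psi = np.empty(T)
    psi[0] = 1.0
    for j in range(1, T):
        psi[j] = psi[j - 1] * (j - 1 + d) / j
    return np.convolve(e, psi)[:T]

def gph_estimate(x, power=0.6):
    """Log-periodogram (GPH) regression estimate of d using the
    first m = T^power Fourier frequencies."""
    T = len(x)
    m = int(T ** power)
    freqs = 2.0 * np.pi * np.arange(1, m + 1) / T
    periodogram = np.abs(np.fft.fft(x - x.mean())[1:m + 1]) ** 2 / (2.0 * np.pi * T)
    regressor = -2.0 * np.log(2.0 * np.sin(freqs / 2.0))
    return np.polyfit(regressor, np.log(periodogram), 1)[0]  # slope estimates d

rng = np.random.default_rng(42)
x = frac_integrate(rng.standard_normal(2000), d=0.4)
noisy = x + 2.0 * rng.standard_normal(2000)  # heavy additive measurement noise
d_clean, d_noisy = gph_estimate(x), gph_estimate(noisy)
```

Because the white-noise component dominates the periodogram at higher frequencies, the estimated slope (and hence the estimated d) for the noisy series falls below that of the clean series, mimicking the downward bias discussed above.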

Fractional Cointegration
As noted in Remark 5, our FGLS-based LM test, like the LM test of Nielsen (2005), assumes the absence of fractional cointegration between the variables, and diverges if fractional cointegration is present. Given that we reject the null hypothesis of a common order of integration for trading volume and return volatility for most of the stocks considered, we now also investigate the order of fractional integration of the series using the semiparametric approach of Nielsen and Shimotsu (2007) (NS henceforth), detailed in the supplementary appendix. NS's procedure allows us (under certain regularity conditions) to consistently estimate the cointegration rank of the series and, using the approach of Robinson and Yajima (2002), to test the null hypothesis that the elements of the long memory vector, d, are equal (although it is important to note that this is not a multivariate test, as it is based on the univariate estimates of the fractional exponents). Denoting the statistic for the latter by T_0, NS show that T_0 →p 0 when the cointegration rank, r, is greater than zero (i.e., where the variables are cointegrated), whereas T_0 ⇒ χ²(1) when r = 0 (where the variables are not cointegrated) and the null of an equal order of integration holds on d. NS argue that large values of the test statistic provide evidence against the hypothesis of a common order of integration, regardless of whether the underlying series are fractionally cointegrated or not.
Given that the highest frequency of rejections of a common fractional exponent occurred when using log-realised variances, we only report that case here. Consistent with the analysis in Table 5, we account for a deterministic drift in log-volatility and a polynomial time trend in log-volume by prior detrending of the data, using the two-stage exact local Whittle estimator of Shimotsu (2010). Following the empirical analysis in NS, we estimate d by setting m_T = T^0.6 and compute T_0 with s_T = log T. Following NS, we also use m_1T = T^0.55 and v_T = m_1T^(−0.3) in the estimation of the cointegration rank. Table 6 reports the point estimates and 95% asymptotic confidence intervals for d, the T_0 test statistic and related p-values, the values of the function L(u) used in the model selection procedure, and the estimates of the cointegration rank, r_T. The column r*_T reports the conditional estimates of r for the cases in which the hypothesis of a common order of integration cannot be rejected at the 5% nominal significance level.
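These bandwidth and tuning choices are simple functions of T; a sketch (truncating the bandwidths to integers is our assumption, not stated in the text):

```python
import math

def ns_bandwidths(T):
    """Tuning choices quoted in the text for the NS procedure:
    m_T = T^0.6, s_T = log T, m_1T = T^0.55 and v_T = m_1T^(-0.3).
    Integer truncation of the bandwidths is our assumption."""
    m_T = int(T ** 0.6)
    s_T = math.log(T)
    m_1T = int(T ** 0.55)
    v_T = m_1T ** -0.3
    return m_T, s_T, m_1T, v_T
```

For example, a sample of T = 1000 observations gives m_T = 63 and m_1T = 44 Fourier frequencies under these settings.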
Three key features arise from this analysis. First, the results based on the NS test provide the same qualitative evidence as the tests based on FGLS estimation. There exists strong evidence of fractional integration in both series, which is again suggestive that realised volatility is more persistent than trading volume. Accordingly, the hypothesis of an equal order of integration is rejected at any of the usual significance levels for the majority of the stocks in our sample. Second, in most cases the FGLS and NS tests agree on whether or not to reject the null hypothesis of a common order of fractional integration. In particular, all of the cases in which the FGLS test rejects at the 5% level correspond to stocks for which the NS test also rejects at this level. There are, however, 5 stocks for which NS rejects the null but the FGLS test does not, so the average rejection rate of the NS test taken across all of the stocks considered is slightly higher, at 76.67%. Crucially, the p-values of the T_0 test in three of these 5 cases are only slightly below the 5% threshold, suggesting that the differences with the FGLS test are caused by only marginal differences in significance. Finally, the estimates of the cointegration rank suggest that volume and realised volatility are not in general cointegrated, supporting the suitability of FIVAR-type modelling. In particular, fractional cointegration, indicated by r_T = 1 or r*_T = 1, is found for only 13.33% of the stocks, but crucially these are all stocks for which neither the NS T_0 test nor our FGLS-based tests reject the null hypothesis of a common order of integration; that is, none of the rejections of a common order of integration seen with the FGLS-based test in Table 5 is associated with a non-zero estimate of the cointegration rank.
It is worth pointing out in conclusion that the assumptions on which the NS approach is based include the requirement of conditional homoskedasticity. To check how sensitive the NS test is to violations of this assumption, we conducted a small Monte Carlo experiment using the same simulation DGP as in section 4. For samples of size T = 500 and T = 1000, ρ = 0.8, and persistent GARCH processes with (α, β) = (0.1, 0.85), the ERFs of the T_0 test under Gaussian innovations at the 5% nominal asymptotic level were 3.10% and 4.30%, respectively, suggesting approximately correct size. However, under Student-t innovations with 5 degrees of freedom, the respective ERFs were 15.5% and 24.5%. Although clearly oversized, these distortions are considerably smaller than those seen in the corresponding results in section 4 for the QML-based test of Nielsen (2005). These simulation results might help explain the differences seen between the results for the FGLS and T_0 tests in our empirical study, whereby slightly more rejections of the null of a common order of integration are obtained when using the T_0 test, given that this test has a tendency to be somewhat oversized when the data display conditional heteroskedasticity and heavy-tailed behaviour, as is the case with the data in our empirical study.

Conclusions
We have proposed a new test for fractional integration in the context of a quite general FIVAR model which allows for conditional heteroskedasticity in the innovations, does not require the orders of integration of the elements of the vector time series to coincide or to lie in a certain region (thereby allowing for both stationary and non-stationary dynamics), and does not assume a particular distribution for the innovations. To the best of our knowledge, none of the methods in the extant literature has achieved this degree of flexibility. Our approach is based on an LM-type test statistic using a heteroskedasticity-robust estimate of the variance matrix, and can be readily implemented using FGLS estimation in a regression-based context. We have demonstrated that our proposed test statistic has a standard χ² asymptotic null distribution, that the test exhibits non-trivial power against a sequence of local alternatives, and that in the case of i.i.d. Gaussian errors the test is asymptotically locally efficient. Monte Carlo analysis was used to show that while our test is approximately correctly sized in finite samples of data exhibiting conditional heteroskedasticity and heavy-tailed features, extant tests in the literature which neglect conditional heteroskedasticity can be severely over-sized even for very large samples.
In an empirical case study we have used our proposed testing procedure to jointly infer the order of fractional integration of trading volume and return volatility in a sample of major stocks traded in the U.S. market. Return volatility was proxied by three different measures with increasing degrees of accuracy: absolute returns, a range-based estimator, and a realised variance computed over 5-minute returns. The evidence from the analysis based on the realised variance and range-based estimates delivered similar conclusions, namely, that for many stocks in the sample return volatility is more persistent than trading volume. On the other hand, the analysis based on log absolute returns showed that volume and return volatility share the same order of fractional integration. Because long memory estimation in absolute returns is known to be downward-biased, measurement errors in the data would seem to be a plausible explanation for the evidence from log absolute returns.
For applied work it is of interest that our conclusions based on the realised variance and range-based estimators of return volatility were very similar. While the former is a more efficient estimate of conditional variability, the latter seems to provide a sufficient level of accuracy that the conclusions drawn from the data are not markedly different. This may be a useful observation in practice because, for many applications, the high-frequency intra-day data needed to construct realised variance and related measures are often not available, for example when considering small or illiquid markets. In the absence of intra-day data, but when information on high and low prices is available, inference based on range-based volatility estimates may still lead to reliable conclusions.
We finish with a suggestion for further research. Here we have proposed parametric FGLS-based multivariate fractional integration tests which, unlike other extant parametric tests, allow for conditionally heteroskedastic innovations. There are relatively few semiparametric multivariate fractional integration tests in the literature, most notably Lobato and Robinson (1998), Lobato (1999), Marinucci and Robinson (2001) and Shimotsu (2007), all of which assume conditionally homoskedastic innovations. Investigating whether or not these tests remain asymptotically valid under conditional heteroskedasticity and, if so, comparing their finite sample performance with the tests developed in this paper is beyond the scope of the present paper but would constitute an interesting topic for further research.
Table notes (fragment): ... (α, β), and sample lengths T. The innovations are drawn from either a multivariate normal distribution or a multivariate Student-t distribution with 5 degrees of freedom.

[Table panel headings: Gaussian Innovations; Student-t Innovations.]
Table 2. Empirical rejection frequencies at the 5% nominal asymptotic level for θ_2 = 0 and the values θ_1 = 0 (empirical size) and |θ_1| > 0 (empirical power) of the LM^FGLS_d, LM^MLE_d and BH_d tests, for the correlation coefficient ρ and sample length T. The short-run errors in the DGP obey VAR(1) dynamics with on-diagonal coefficients π_1 and π_2 and off-diagonal coefficients π_12 = π_21 = 0. The LM^FGLS_d statistic is computed from an augmented auxiliary regression with one lag of the dependent variable. LM^MLE_d and BH_d are computed from VAR(1) residuals. The innovations are drawn from a multivariate Gaussian distribution.
Table 3. Empirical rejection frequencies at the 5% nominal asymptotic level for θ_2 = 0 and θ_1 = 0 (empirical size) and |θ_1| > 0 (empirical power) of the LM^FGLS_d test; Gaussian [caption truncated].
Table 4. Empirical rejection frequencies at the 5% nominal asymptotic level for θ_2 = 0 and θ_1 = 0 (empirical size) and |θ_1| > 0 (empirical power) of the LM^FGLS_d test; Gaussian [caption truncated].
Table 6 notes: 'T_0' and 'p-value' report the test statistic for the null of a common order of integration and the related p-values; 'L(u)', u = 0, 1, report the objective function used to infer the cointegration rank in a model selection procedure; 'r_T' reports the estimated cointegration rank resulting from this criterion; and 'r*_T' reports the conditional estimates of r for the cases in which the null hypothesis of a common order of integration cannot be rejected at the 5% level.

Summary of Contents
This supplement to our paper "Multivariate Fractional Integration Tests allowing for Conditional Heteroskedasticity with an Application to Return Volatility and Trading Volume" has three main parts. The first part,

A.1. Preliminary Results
Before presenting the proofs of the main results in the paper, we first need to state and prove some preparatory lemmas. To this end, consider the following additional notation. For an (n × 1) vector A, ||A|| denotes the Euclidean vector norm, such that ||A||² = A′A. For an (n × m) matrix A, ||A|| denotes the Euclidean matrix norm, ||A||² = tr(A′A). The constants K and C are used throughout the proofs to refer to generic strictly positive constants which do not depend on the sample size. The notation →a.s. denotes almost sure convergence of a random sequence as the sample length diverges to +∞. Finally, throughout the proofs we use the superscripts '*' and '**' to denote truncated processes and their non-truncated counterparts, respectively; for example, the truncated and non-truncated processes z*_{s,t−1,d_s} := Σ_{l=1}^{t−1} l^{−1} ε_{s,t−l,d_s} and z**_{s,t−1,d_s} := Σ_{l=1}^{∞} l^{−1} ε_{s,t−l,d_s}, respectively. This distinction is necessary because, while the statistics computed in the paper are constructed from the truncated variables, the asymptotic theory is developed with respect to the corresponding non-truncated processes. Lemmas A2 and A3 below show that the distinction between the two is asymptotically negligible so far as the limiting distribution theory for the statistics considered in this paper is concerned.
bounded, and bounded away from zero if ν_ij ≠ 0.

Under Assumption 1, $H_0: \theta = 0$, and for any $1 \le s \le k$, $X^{**}_{s,-1,d}$ is a measurable function of a strictly stationary and ergodic process and is therefore also a strictly stationary and ergodic process, and so is $X^{**\prime}_{s,-1,d} X^{**}_{s,-1,d}$. The required result then follows from the Ergodic Theorem because $x^{**}_{s,t-1,d}$ has finite absolute expected values. To see this, first note that the relevant bound exists for any $1 \le s \le k$. Clearly, the condition $\Sigma > 0$ rules out the degenerate case $E[x^{**}_{i,t-1,d}\, x^{**\prime}_{i,t-1,d}] = 0$, from which the required results follow. Furthermore, for $i = j$, $\Omega_{A_{ii}} = \nu_{ii} \sum_{l=1}^{\infty} \Gamma_{il}'\, \Sigma\, \Gamma_{il}$, and so $\Omega_{A_{ii}}$ is positive definite (Davidson, 2000, Corollary 14.2.10, p. 216). As in Demetrescu et al. (2008, Lemma B.1), we have from the Cauchy-Schwarz inequality that,

Lemma A2. Under Assumption 1 and $H_0: \theta = 0$,
Hence, the result follows by the Markov inequality. Next, write $z^{**}_{s,t-1,d_s} = z^{*}_{s,t-1,d_s} + b_{s,t-1}$, with $b_{s,t-1} := \sum_{l=t}^{\infty} \omega_{sl}\, e_{s,t-l}$. Because $\omega_{sl} = O(1/l)$ and $b_{s,t-1} = O_p(1/\sqrt{t})$, the required result holds from the Markov inequality. For part b), note the first element of the column vector $(X^{**}_{i,-1,d_i} - X^{*}_{i,-1,d_i})' u_j$, given that $E[e^2_{j,t}\, e^2_{i,t-l_1}] \le (E[e^4_{j,t}]\, E[e^4_{i,t}])^{1/2} < K$ for all $t$, and Assumption (A4), which implies that $E[e^2_{j,t}\, e_{i,t-l_1}\, e_{i,t-l_2}] \le E(|e_{j,t}\, e_{j,t}\, e_{i,t-l_1}\, e_{i,t-l_2}|) = o\!\left(\tfrac{1}{l_1 l_2}\right)$ for any $l_1, l_2 > 0$, $l_1 \neq l_2$, under absolute summability; see Lemmas B.1(i) and B.5 in Hassler et al. (2009). The required result then holds from the Markov inequality.
Proof of Lemma A4. The proof follows from the Cauchy-Schwarz inequality, given that $e_{i,t}\, x^{**}_{j,t-1,d}$ is (uniformly) $L_2$-bounded for any $1 \le i, j \le k$.
$\omega_{jl_1}\, \omega_{jl_2}\, E[e^2_{i,t}\, e_{j,t-l_1}\, e_{j,t-l_2}]$, where $E[e^2_{i,t}\, e_{j,t-l_1}\, e_{j,t-l_2}] \le E(e^4_{i,t})^{1/4} \times E(e^4_{j,t})^{3/4} < K$ and, as in Lemma A2,

A.2. Proofs of Main Results
Proof of Theorem 1. Under Assumption 1 and $H_0: \theta = 0$, the FGLS estimator of $\beta$ can be written as in (A.1). Using Lemmas A2 and A3, we therefore have the corresponding limits. Recall that $\nu_{ij}$ denotes the $(i,j)$-th element of $\Sigma^{-1}$, and define $A^{**}$ accordingly, where $A_\beta$ is a partitioned matrix with $ij$-th submatrix given by $\Omega_{A_{ij}}$. Noting that the columns of $A_\beta$ cannot be written as linear combinations of the other columns, $\det(A_\beta) > 0$, and the first term converges by Slutsky's Theorem. We now discuss the asymptotic behaviour of the second term in (A.1). To this end, define the column vector $w^{**}_{-1,d} := X^{**\prime}_{-1,d}(\Sigma^{-1} \otimes I_{T-p})\, u$, whose $k$-th block is $\sum_{s=1}^{k} \nu_{ks}\, X^{**\prime}_{k,-1,d}\, u_s$ with $u_s := (e_{s,p+1}, \ldots, e_{s,T})'$ under $H_0: \theta = 0$. Given that $E[w^{**}_{-1,d} \mid \mathcal{F}_{t-1}] = 0$ and $w^{**}_{-1,d}$ is a measurable function of $\{e_t\}$, $(w^{**}_{-1,d}, \mathcal{F}_t)$ is a strictly stationary and ergodic vector MDS. The covariance matrix of $w^{**}_{-1,d}$ is $B_\beta := E[w^{**}_{-1,d}\, w^{**\prime}_{-1,d}]$, which can be represented as a partitioned matrix with $ij$-th block $\Omega_{B_{ij}}$. From Lemma A4, $E\|e_{r,t}\, e_{s,t}\, x^{**}_{i,t-1,d}\, x^{**\prime}_{j,t-1,d}\| < \infty$, and consequently $\|\Omega_{B_{ij}}\| < \infty$ for all $1 \le i, j \le k$, so $\|B_\beta\| < \infty$. Furthermore, the condition that $\Sigma$ is positive definite trivially rules out the degenerate case $\|\Omega_{B_{ii}}\| = 0$, and so $B_\beta$ is bounded away from zero. Consequently, the Central Limit Theorem (CLT) for MDS (Davidson, 2000, Theorem 24.3) and the Cramér-Wold device deliver the asymptotic normality of the second term in (A.1), and so we may conclude that the stated result holds. Proof of Theorem 2. We first prove the stated convergence result under the null hypothesis, which follows directly from Theorem 1 and the consistency of the variance estimator: there exists an upper triangular matrix $L$ such that $R\, \Omega_\beta\, R' = L'L$. Consequently, $\sqrt{T}\, L^{-1} R \hat{\beta} \Rightarrow N(0, I_k)$, and hence the null limit follows. We now establish the corresponding asymptotic convergence result under the local alternative $H_c: \theta = c/\sqrt{T}$, where at least one element of $c$ is nonzero.
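The displayed equation for the FGLS estimator is not reproduced above; the following is our sketch of its standard generalised least squares form, reconstructed (as an assumption on our part, not a verbatim restoration) to be consistent with the notation $w^{**}_{-1,d} := X^{**\prime}_{-1,d}(\Sigma^{-1} \otimes I_{T-p})\, u$ used in the proof:

```latex
\hat{\beta}_{FGLS}
  = \Big[X^{*\prime}_{-1,d}\big(\hat{\Sigma}^{-1}\otimes I_{T-p}\big)X^{*}_{-1,d}\Big]^{-1}
    X^{*\prime}_{-1,d}\big(\hat{\Sigma}^{-1}\otimes I_{T-p}\big)Y_{d},
\qquad\text{so that, under } H_0,
\]
\[
\sqrt{T}\big(\hat{\beta}_{FGLS}-\beta\big)
  = \Big[T^{-1}X^{*\prime}_{-1,d}\big(\hat{\Sigma}^{-1}\otimes I_{T-p}\big)X^{*}_{-1,d}\Big]^{-1}
    T^{-1/2}X^{*\prime}_{-1,d}\big(\hat{\Sigma}^{-1}\otimes I_{T-p}\big)u .
```

The first factor is the term whose probability limit involves $A_\beta$, and the second factor is the MDS term to which the CLT is applied.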
Under Assumption 1 and $H_c$, we have that $Y_{t,d} = X^{*}_{-1,d}\, \beta_0 + u_\theta$, where $u_\theta \equiv (u_{\theta 1}, \ldots, u_{\theta k})'$, $u_{\theta s} := u_s + \frac{1}{\sqrt{T}}\, X^{*}_{s,-1,d+\theta}\, \psi_{cs}$, and $\psi_{cs} := (c_s, -\pi_{s1} c, \ldots, -\pi_{sp} c)'$ for $1 \le s \le k$; see Tanaka (1999). In this context, the FGLS estimator is given by the same expression as under the null. Lemma A3(i) applies under the alternative hypothesis because, although the OLS estimator is no longer consistent, it still follows that $\hat{u}_s = u_s + O_p(T^{-1/2})$. Define $\psi_c := (\psi_{c1}', \ldots, \psi_{ck}')'$. The remaining convergence results then follow from the Ergodic Theorem and the AEL, and from the CLT for MDS, as in the proof of Theorem 1.

Consequently, under Assumption 1 and $H_c$, combining this with the result for $A^{*}$, the stated limit holds with $\xi := (L^{-1}c)'(L^{-1}c)$, as required.
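Read together with the null result, the conclusion under $H_c$ is a noncentral limit; under our reading of the omitted displays (an assumption on our part, with $k$ the number of restrictions imposed via $R$), it can be stated as:

```latex
LM^{FGLS}_{d} \;\Rightarrow\; \chi^{2}_{k}(\xi),
\qquad \xi := (L^{-1}c)'(L^{-1}c),
```

which reduces to the central $\chi^{2}_{k}$ limit of Theorem 2 when $c = 0$.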


Appendix B - Additional Monte Carlo Results
To provide further insights into the finite sample performance of the tests, we again focus on the bivariate ($k = 2$) case where $y_t \equiv (y_{1t}, y_{2t})'$, and consider the simulation DGP, where $\eta_t \equiv (\eta_{1t}, \eta_{2t})'$ is an i.i.d. vector drawn from either a multivariate Gaussian distribution or a (heavy-tailed) multivariate Student-$t$ distribution with 5 degrees of freedom. The covariance matrix $\Omega_\rho$ depends on the contemporaneous correlation coefficient $\rho$, whose value we vary among $\rho \in \{0, 0.2, 0.4, 0.6, 0.8\}$. The conditional variances $\sigma^2_{it}$ are driven by (normalised) stationary GARCH(1,1) processes, $\sigma^2_{it} = (1 - \alpha - \beta) + \alpha\, e^2_{i,t-1} + \beta\, \sigma^2_{i,t-1}$, with $\alpha, \beta \ge 0$ and $\alpha + \beta < 1$, such that $E[e^2_{it}] = 1$. For simplicity, we impose the same GARCH dynamics on the two series, focusing on GARCH parameter configurations that allow for varying degrees of persistence in the conditional variances as measured by $\alpha + \beta$, namely,
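The innovation process of this DGP can be sketched as follows (function and parameter names are ours; the intercept $1-\alpha-\beta$ in the GARCH recursion is the normalisation implied by $E[e^2_{it}] = 1$):

```python
import numpy as np

def simulate_garch_innovations(T, alpha, beta, rho, dist="gaussian", seed=0):
    """Simulate T bivariate innovations e_t = sigma_t * eta_t with
    normalised GARCH(1,1) conditional variances and contemporaneous
    correlation rho in eta_t."""
    rng = np.random.default_rng(seed)
    omega = 1.0 - alpha - beta            # normalisation: E[e_it^2] = 1
    C = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
    if dist == "t":
        # Student-t with 5 df, rescaled to unit variance (var = 5/3)
        eta = rng.standard_t(5, size=(T, 2)) / np.sqrt(5.0 / 3.0)
    else:
        eta = rng.standard_normal((T, 2))
    eta = eta @ C.T                       # impose correlation rho
    e = np.empty((T, 2))
    sig2 = np.ones(2)                     # start at the unconditional variance
    for t in range(T):
        e[t] = np.sqrt(sig2) * eta[t]
        sig2 = omega + alpha * e[t] ** 2 + beta * sig2
    return e
```

The same recursion, with identical $(\alpha, \beta)$ for both series, covers all the conditional heteroskedasticity designs described above; setting $\alpha = \beta = 0$ recovers homoskedastic innovations.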

B.1. Unconditional Heteroskedasticity
In Table B.1 we report results for the case where the innovations are homoskedastic, DGP1: $\sigma^2_{1t} = \sigma^2_{2t} = 1$, and for the case where there is a contemporaneous one-time break of equal magnitude in the variances of $\varepsilon_t$. Regarding the latter, two heteroskedastic cases are considered: (i) an upward change in variance, and (ii) a downward change in variance, where in each case the break occurs at time $\lfloor \tau T \rfloor$, $I(\cdot)$ denotes the indicator function, taking the value one when its argument is true and zero otherwise, and $\tau \in \{1/3, 1/2, 2/3\}$ corresponds to the break fraction. DGP2 and DGP3 allow us to examine the impact of unconditional heteroskedasticity, both in isolation and in its interaction with $\rho$, on the finite sample size of the tests. In each of DGP2 and DGP3 a fourfold change in variance is seen, which is likely to be of considerably larger magnitude than we might expect to see in practice, but serves to illustrate how the tests behave in the presence of large changes in unconditional volatility.
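A minimal sketch of the one-time variance-break designs (the exact parameterisation below, a level shift between 1 and 4 at $\lfloor \tau T \rfloor$, is our reading of the "fourfold change" described above, and the function name is ours):

```python
import numpy as np

def break_variance_path(T, tau, upward=True):
    """Variance path with a single fourfold one-time break at t = floor(tau*T).
    upward=True: 1 -> 4 (upward shift); upward=False: 4 -> 1 (downward shift)."""
    t = np.arange(1, T + 1)
    before, after = (1.0, 4.0) if upward else (4.0, 1.0)
    return np.where(t <= np.floor(tau * T), before, after)
```

Multiplying i.i.d. innovations by the square root of this path reproduces the unconditional-heteroskedasticity designs for any break fraction $\tau \in \{1/3, 1/2, 2/3\}$.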

B.2. Estimation Accuracy
To illustrate the gains that can be obtained by using the FGLS approach, we also computed the empirical MSE of the parameter estimates of $\beta_i$, $i = 1, 2$, in (6), resulting from a multivariate linear regression model as in (5).

B.3. The Impact of Nonzero Means
To illustrate the impact of nonzero means on the finite sample size performance of the test procedure, we consider the following three cases: $\mu = 0, 5, 10$, which correspond to $2 \times 1$ vectors with common elements (0, 5 and 10, respectively), and we use three different demeaning approaches: i) no demeaning (which we denote as $\mu_0$); ii) demeaning only (which we denote as $\mu_1$); and iii) demeaning and detrending (which we denote as $\mu_2$).
Specifically, to account for a non-zero deterministic mean in the level of the series, we use the demeaning process described in Robinson (1994), Demetrescu et al. (2008) and Hassler et al. (2016); this accounts for the nonzero means in (4) of the paper. Denote the resulting estimates $\hat{\mu}_i$, $i = 1, 2$, and the corresponding residuals accordingly. One then redefines the $i$-th element of the vector $\varepsilon_{t,d}$ to be $\hat{\varepsilon}_{it,d_i}$, $i = 1, 2$, and proceeds as before to compute the respective test statistics; see Remark 8 for further details.

Note: $LM^{FGLS}_{d,\mu_i}$, $i = 0, 1, 2$, correspond to statistics computed from data which has not been demeaned ($\mu_0$), data that has been demeaned ($\mu_1$) and data which has been demeaned and detrended ($\mu_2$).

Table B.3. Impact of $\mu$ on finite sample size performance. Student-$t$ distributed innovations (5 degrees of freedom). $T = 500$. Power of the test when the data are not demeaned or detrended, when the data are demeaned, and when the data are demeaned and detrended using the approach described in Remark 8, for $\mu = \{0, 5, 10\}$ and sample size $T = 500$.
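The three treatments $\mu_0$, $\mu_1$ and $\mu_2$ amount to projecting each series on nothing, on a constant, or on a constant and a linear trend by OLS before the fractional filter is applied. A minimal sketch (the function name is ours):

```python
import numpy as np

def adjust_deterministics(y, mode):
    """mode 0: return y unchanged (mu_0); mode 1: OLS-demean (mu_1);
    mode 2: OLS-demean and -detrend (mu_2)."""
    y = np.asarray(y, dtype=float)
    if mode == 0:
        return y
    T = len(y)
    cols = [np.ones(T)] if mode == 1 else [np.ones(T), np.arange(1, T + 1)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef                  # OLS residuals
```

Applying the fractional differencing filter to these residuals, element by element, then yields the adjusted $\hat{\varepsilon}_{it,d_i}$ entering the test statistics.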


B.4. Performance Under Fractional Cointegration
The data generation process (DGP) considered for investigating the impact of fractional cointegration is the same as that used in Nielsen (2005). For $\rho = 0$, $y_{2t}$ is strictly exogenous, whereas for $\rho \neq 0$, $y_{2t}$ is endogenous. We consider $\rho \in \{0, 0.4, 0.8\}$.
is the periodogram of $u$ evaluated at the fundamental frequencies $\lambda_j := 2\pi j/T$, $m_{1T}$ is a bandwidth parameter, and $\Re[\cdot]$ denotes the real part of the argument. Given the eigenvalues $\delta^*_i$ and a suitable bandwidth parameter $v_T$, the cointegration rank can be determined as $\hat{r}_T = \arg\min$ of the associated model selection criterion.

Note: ***, ** and * indicate significance at the 1%, 5% and 10% significance levels, respectively; $LM^{(1)}$ and $LM^{(5)}$ correspond to Engle's LM test results for ARCH effects using 1 and 5 lags of the squared residuals of an ARFIMA model, respectively.
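The periodogram quantities entering this rank procedure can be sketched as follows: a minimal implementation of the average over the first $m_{1T}$ fundamental frequencies of the real part of the cross-periodogram (the function name is ours, and this is only one ingredient of the procedure, not the full Nielsen (2005) rank-selection rule):

```python
import numpy as np

def averaged_periodogram(u, m1):
    """Average over j = 1..m1 of Re[I_u(lambda_j)], where
    I_u(lambda_j) = w(lambda_j) w(lambda_j)^H and
    w(lambda_j) = (2*pi*T)^(-1/2) * sum_t u_t exp(i*t*lambda_j)."""
    T, k = u.shape
    F = np.fft.fft(u, axis=0)  # row j holds sum_t u_t exp(-2*pi*i*j*t/T)
    out = np.zeros((k, k))
    for j in range(1, m1 + 1):
        w = F[j] / np.sqrt(2.0 * np.pi * T)
        out += np.real(np.outer(w, w.conj()))
    return out / m1
```

For unit-variance white noise the diagonal of this average is close to $1/(2\pi)$, the flat spectral density; the eigenvalues $\delta^*_i$ are then obtained from a suitably normalised version of such a matrix.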