Cointegration Vector Estimation by DOLS for a Three-Dimensional Panel

This paper extends the results of the dynamic ordinary least squares cointegration vector estimator available in the literature to a three-dimensional panel. We use a balanced panel of N and M lengths observed over T periods. The cointegration vector is homogeneous across individuals but we allow for individual heterogeneity using different short-run dynamics, individual-specific fixed effects and individual-specific time trends. We also model cross-sectional dependence using time-specific effects. The estimator has a Gaussian sequential limit distribution that is obtained by first letting T → ∞ and then letting N → ∞, M → ∞. The Monte Carlo simulations show evidence that the finite sample properties of the estimator are closely related to the asymptotic ones.


Introduction
This paper proposes an extension of the dynamic ordinary least squares (DOLS) panel cointegration estimator of Mark & Sul (2003) to a three-dimensional panel. The single-equation DOLS method for estimating and testing the cointegration hypothesis was proposed by Phillips & Loretan (1991) and Saikkonen (1991), and generalized by Stock & Watson (1993).
DOLS is a single-equation cointegration technique that overcomes the common problems of static and modified OLS. In finite samples, static OLS estimates of long-run relationships are potentially biased, and inferences cannot be drawn using t-statistics (Banerjee, Hendry & Smith 1986, Kremers, Ericsson & Dolado 1992). The DOLS methodology is based on an equation that includes lags and leads of the first differences of the right-hand-side variables, which eliminates the effect of the endogeneity of these variables. Therefore, it is possible to construct asymptotically valid test statistics and also to estimate the long-run relationships.
Panel DOLS (PDOLS) has been analyzed by Kao & Chiang (2000) and Mark & Sul (2003). Kao & Chiang (2000) study the properties of panel DOLS when there are fixed effects in the cointegration regressions. Mark & Sul (2003) allow for individual heterogeneity through different short-run dynamics, individual-specific fixed effects and individual-specific time trends. They also permit a limited degree of cross-sectional dependence through the presence of time-specific effects.
Panel analysis usually employs two dimensions, one of which is time. However, given the wide availability of data nowadays, two dimensions are not always enough; in these cases, a three-dimensional panel is a relevant option. These methodologies are useful because they model the heterogeneity of the data more rigorously. Some empirical applications of three-dimensional panels can be found in Eilat & Einav (2004), Davies (2006) and Davies, Lahiri & Sheng (2011), among others.
To extend the results of Mark & Sul (2003) to a three-dimensional setup, we use a balanced panel with dimensions of lengths N, M and T. The cointegration vector is homogeneous across individuals, but we allow for individual heterogeneity using different short-run dynamics, individual-specific fixed effects and individual-specific time trends. Both individual effects are considered in the first two dimensions. As in Mark & Sul (2003), we also model some degree of cross-sectional dependence using time-specific effects. After obtaining the Panel

Representation of a Cointegrated Model in Panel Data in Three Dimensions
Consider the following triangular representation of a cointegrated system for a panel with individuals indexed by $i = 1, \ldots, N$ and $j = 1, \ldots, M$ over time periods $t = 1, \ldots, T$, where $\{y_{ijt}\}$ is the dependent variable, integrated of order one, $\{\mathbf{x}_{ijt}\}$ is a $k$-dimensional vector of series integrated of order one, and $\{u_{ijt}, \mathbf{v}_{ijt}\}$ is a covariance-stationary error process that is independent across $i$ and $j$ but possibly dependent across $t$. In this case, the variables are said to be cointegrated for each member of the panel, with cointegration vector $\boldsymbol{\gamma}$. Individual heterogeneity is considered through different short-run dynamics and individual-specific fixed effects of the first two dimensions, $\alpha_i^{(N)}$ and $\alpha_j^{(M)}$, together with individual-specific time trends. A limited degree of cross-sectional dependence is also permitted by the presence of time-specific effects, $\theta_t$. In this notation, the labels $(N)$ and $(M)$ indicate the first and second dimension, respectively, whereas $N$ and $M$ denote the number of individuals in the first and second dimension, respectively.
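A triangular system consistent with this description can be sketched as follows. This is an illustration modeled on the two-dimensional system of Mark & Sul (2003), not a reproduction of the original display; in particular, the trend coefficients $\lambda_i^{(N)}$ and $\lambda_j^{(M)}$ are notation assumed here.

```latex
\begin{align*}
y_{ijt} &= \alpha_i^{(N)} + \alpha_j^{(M)} + \lambda_i^{(N)} t + \lambda_j^{(M)} t
           + \theta_t + \boldsymbol{\gamma}' \mathbf{x}_{ijt} + u_{ijt}, \\
\Delta \mathbf{x}_{ijt} &= \mathbf{v}_{ijt}.
\end{align*}
```

The first line collects the deterministic components (fixed effects, trends and time effects) and the long-run relation; the second line states that the regressors are integrated of order one.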

Panel DOLS Estimator in Three Dimensions
The PDOLS methodology is based on the estimation of the following equation, where $\mathbf{z}_{ijt} = (\Delta\mathbf{x}_{ij,t-p}, \ldots, \Delta\mathbf{x}_{ijt}, \ldots, \Delta\mathbf{x}_{ij,t+p})$ is a $(2p+1)k$-dimensional vector of leads and lags of the first differences of the variables $\mathbf{x}_{ijt}$. The inclusion of lags and leads eliminates the effect of the endogeneity of these variables. To avoid perfect collinearity, a restriction is imposed on the individual-specific effects $\alpha$. Equation (2) can be expressed as follows, where $y^{\ddagger}_{ijt}$ and $\mathbf{x}^{\ddagger}_{ijt}$ represent the errors from the linear projection of the dependent variable and of the variables $\mathbf{x}_{ijt}$ on the short-run components, $\mathbf{z}_{ijt}$.
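The estimation steps described above (building leads and lags, projecting on the short-run components, and pooling) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one intercept per (i, j) unit in place of the additive effects, omits the individual-specific time trends, and removes time-specific effects by subtracting cross-sectional averages. The function names `leads_lags` and `pdols_3d` are hypothetical.

```python
import numpy as np

def leads_lags(dx, p):
    """Stack p lags, the current value and p leads of dx (shape (T1, k))
    into an array of shape (T1 - 2p, (2p + 1) * k)."""
    n_rows = dx.shape[0] - 2 * p
    return np.hstack([dx[s:s + n_rows] for s in range(2 * p + 1)])

def pdols_3d(y, x, p=2):
    """Pooled DOLS sketch. y: (N, M, T), x: (N, M, T, k). Returns gamma_hat (k,).

    Assumptions (not from the paper): unit-specific intercepts only, no trends.
    """
    N, M, T, k = x.shape
    # Remove time-specific effects by subtracting cross-sectional averages.
    y = y - y.mean(axis=(0, 1), keepdims=True)
    x = x - x.mean(axis=(0, 1), keepdims=True)
    sxx, sxy = np.zeros((k, k)), np.zeros(k)
    for i in range(N):
        for j in range(M):
            dx = np.diff(x[i, j], axis=0)        # dx[r] = x[r + 1] - x[r]
            z = leads_lags(dx, p)                # row n is centred on time n + p + 1
            w = np.column_stack([np.ones(len(z)), z])  # intercept + short-run terms
            target = np.column_stack([y[i, j, p + 1:T - p], x[i, j, p + 1:T - p]])
            coef, *_ = np.linalg.lstsq(w, target, rcond=None)
            resid = target - w @ coef            # projection errors of y and x
            y_r, x_r = resid[:, 0], resid[:, 1:]
            sxx += x_r.T @ x_r                   # pool moments over all units
            sxy += x_r.T @ y_r
    return np.linalg.solve(sxx, sxy)
```

Pooling the cross-products before solving imposes a common cointegration vector, while the unit-by-unit projection lets the short-run coefficients differ across units.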

Asymptotic Distribution of the PDOLS-3D Estimator
Taking into account that the elements of $\boldsymbol{\beta}_{NMT}$ have different rates of convergence, we can rewrite (10) as follows, where $(\mathbf{1})_{N \times M}$ is a matrix of ones.
The asymptotic distribution of the PDOLS-3D estimator is presented in Proposition 1, part (ii). The following lemmas are required to prove this proposition. The proofs of the lemmas follow from simple extensions of the results of Mark & Sul (2002); nevertheless, they are presented in Appendices A, B and C.²
Following the results of Mark & Sul (2003), a linear hypothesis of the form $\mathbf{R}\boldsymbol{\gamma} = \mathbf{r}$ can be tested using regular Wald statistics, where $\mathbf{R}$ is a known $r \times k$ matrix and $\mathbf{r}$ is a known $r \times 1$ vector.
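As an illustration of such a test, the sketch below computes the textbook Wald statistic $(\mathbf{R}\hat{\boldsymbol{\gamma}} - \mathbf{r})' [\mathbf{R}\hat{\mathbf{V}}\mathbf{R}']^{-1} (\mathbf{R}\hat{\boldsymbol{\gamma}} - \mathbf{r})$, which is compared with a chi-square distribution with $r$ degrees of freedom. The covariance estimate $\hat{\mathbf{V}}$ of the PDOLS-3D estimator is taken as given; its construction from long-run variances is not reproduced here. The function names are hypothetical.

```python
import math
import numpy as np

def wald_statistic(gamma_hat, cov_gamma, R, r):
    """Wald statistic for H0: R gamma = r, and its degrees of freedom.

    cov_gamma is an estimate of the covariance matrix of gamma_hat,
    assumed to be supplied by the caller.
    """
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    R = np.atleast_2d(np.asarray(R, dtype=float))
    r = np.atleast_1d(np.asarray(r, dtype=float))
    diff = R @ gamma_hat - r
    middle = R @ np.asarray(cov_gamma, dtype=float) @ R.T
    stat = float(diff @ np.linalg.solve(middle, diff))
    return stat, R.shape[0]

def chi2_pvalue_one_restriction(stat):
    """Upper-tail probability P(chi2_1 > stat), valid for a single restriction."""
    return math.erfc(math.sqrt(stat / 2.0))
```

With a single restriction the statistic is the square of the usual t-ratio, so a test of one coefficient of the cointegration vector can be read either way.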
² Following the results of Phillips & Moon (1999) and White (2001), and under the assumptions $N/T \to 0$, $M/T \to 0$ and $N/M \to 1$, we obtain similar asymptotic results when considering joint convergence in the three dimensions ($N \to \infty$, $M \to \infty$ and $T \to \infty$) instead of sequential convergence.
Lemma 1. For each $i$ and $j$, as $T \to \infty$,
This lemma demonstrates the equivalence in probability of the series projected on the space spanned by $\mathbf{z}_{ijt}$ and the series that are not projected. This gives an asymptotic justification for ignoring the fact that we are using projection errors instead of the original observations.
Lemma 2. This lemma shows the convergence of each element of the matrix $\mathbf{M}_{NMT}$.
Lemma 3. This lemma shows the convergence in distribution of $\mathbf{m}_{NM}$.
Proposition 1. For the PDOLS-3D estimator in (2), as $T \to \infty$ and then $N \to \infty$, $M \to \infty$,
This proposition presents the sequential limit distribution of the PDOLS-3D estimator. The proof of part (i) follows from Lemma 2 and Lemma 3(iii); the proof of part (ii) follows from Lemma 2 and Lemma 3(ii). The proof of part (iii) is straightforward.

Monte Carlo Experiments
This section summarizes the Monte Carlo experiments used to evaluate some small-sample properties of the PDOLS-3D estimator derived in Section 3. The design of the experiment follows the structure presented in Mark & Sul (2003). The data generating process (DGP) includes two regressors in the cointegration relation. The short-run dynamics are driven by the innovation vector $\mathbf{e}_{ijt} = (e_{1,ijt}, e_{2,ijt}, e_{3,ijt})$, which follows a trivariate normal distribution with zero mean. In this specification, $\operatorname{diag}(a_1, \ldots, a_n)$ denotes an $n \times n$ diagonal matrix whose diagonal elements are $a_1, \ldots, a_n$, and $A_{ij}$ is a $3 \times 3$ matrix whose $hr$-th element is $A_{hr,ij}$ for $h, r = 1, 2, 3$.
The design of the DGP is also related to the empirical work on Colombian factor productivity by Iregui, Melo & Ramírez (2007), where the regressors $x_{1,ijt}$ and $x_{2,ijt}$ represent the level of capital stock and labor force, respectively, for the $i$-th industrial sector and $j$-th metropolitan area in year $t$. As can be seen in the specification of the DGP, both regressors are driftless I(1) processes. The equilibrium error is modeled to allow for a general form of cross-sectional dependence (CSD), where the CSD is induced by $\theta_{1t}$, while φ controls the degree of cross-sectional dependence. The parameters $\theta_{2t}$ and $\theta_{3t}$ cause cross-sectional endogeneity between the regressors and the equilibrium error.
The cointegration vector is simulated as $(\gamma_1, \gamma_2) = (0.15, 0.85)$. The values $A_{11,ij}$ and $\sigma_{hij}$, for each $i$ and $j$, are drawn from a uniform distribution and are kept constant throughout the experiment. Different levels of persistence in the short-run dynamics are obtained by varying the limits of the uniform distribution from which the elements of $A_{ij}$ are drawn. Three levels of persistence are considered: low, $A_{11,ij} \sim U[0.3, 0.5]$; medium, $A_{11,ij} \sim U[0.5, 0.7]$; and high, $A_{11,ij} \sim U[0.7, 0.9]$. Additionally, different values of φ are used in the simulations: φ = 0 for no CSD, φ = 0.3 for low CSD, and φ = 0.7 for high CSD.
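The persistence and CSD settings above can be illustrated with a short Python sketch. It is a deliberate simplification rather than the paper's DGP: the equilibrium error is assumed here to follow a univariate AR(1) with coefficient $A_{11,ij}$ plus a common factor scaled by φ, instead of the full three-variable system. The function names, the `sigma_theta` default and the burn-in length are hypothetical.

```python
import numpy as np

# Limits of the uniform distributions and CSD degrees stated in the text.
PERSISTENCE = {"low": (0.3, 0.5), "medium": (0.5, 0.7), "high": (0.7, 0.9)}
CSD = {"none": 0.0, "low": 0.3, "high": 0.7}

def draw_persistence(level, N, M, rng):
    """Draw A_11,ij once per (i, j); the draws are then held fixed."""
    lo, hi = PERSISTENCE[level]
    return rng.uniform(lo, hi, size=(N, M))

def simulate_errors(a11, phi, T, rng, sigma_theta=1.0, burn=50):
    """Assumed simplification: u_ijt = a11_ij * u_ij,t-1 + e_ijt + phi * theta_t.

    theta_t is a common shock shared by all units, so phi > 0 induces
    cross-sectional dependence; phi = 0 leaves the units independent.
    """
    N, M = a11.shape
    theta = rng.normal(0.0, sigma_theta, size=T + burn)
    e = rng.normal(size=(N, M, T + burn))
    u = np.zeros((N, M, T + burn))
    for t in range(1, T + burn):
        u[:, :, t] = a11 * u[:, :, t - 1] + e[:, :, t] + phi * theta[t]
    return u[:, :, burn:]  # drop the burn-in periods
```

Holding the drawn coefficients fixed across replications, as the text describes, means that only the innovations change from one simulated sample to the next.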
Other parameters used in the simulation, for $i = 1, \ldots, N$ and $j = 1, \ldots, M$, include $\sigma^2_{\theta_1} = 1.8$, $\sigma^2_{\theta_2} = 0.645$ and $\sigma^2_{\theta_3} = 2.0$. The prewhitened quadratic spectral methodology proposed by Sul, Phillips & Choi (2005) was used for estimating the long-run variances $\Omega_{uu,ij}$. The values of the individual-specific fixed effects of the first two dimensions, $\alpha_i$ and $\alpha_j$, are taken from the PDOLS-3D estimation with the data in Iregui et al. (2007). The simulation structure includes 1,000 samples of size N = 9, M = 18, and T = 50, T = 100 or T = 150. The number of leads and lags of $\Delta\mathbf{x}_{ijt}$ included in the PDOLS-3D estimation is set to two. Then, three cases are evaluated: Case 1: No CSD (φ = 0), with low, medium and high persistence levels.
Tables 1 to 9 report the results of the simulation experiments described in Cases 1 to 3 for N = 18 and M = 9. Tables 1, 4 and 7 present the simulations with T = 50, Tables 2, 5 and 8 those with T = 100, and Tables 3, 6 and 9 those with T = 150. The effective size results for Case 1 with 5% and 10% nominal-sized tests and T = 50 are presented in Table 1. Under low levels of persistence, the tests' effective sizes are fairly accurate; nevertheless, the results for $H_0: \gamma_2 = 0.85$ are slightly better than those for $H_0: \gamma_1 = 0.15$. For medium levels of persistence, there is a loss of size accuracy relative to low persistence levels, at nominal sizes of both 10% and 5%. For high levels of persistence, the test sizes for both $\gamma_1$ and $\gamma_2$ are notably smaller than their nominal sizes.
Table 2 shows the results obtained for Case 1 with T = 100. Under medium and high levels of persistence, the effective sizes are closer to the nominal sizes than they were when T = 50. When T = 100, the tests for low levels of persistence are not as accurate as those for T = 50; however, the results are not very different. The results for Case 1 with T = 150, presented in Table 3, are very similar to the ones obtained for T = 100.
The size results of the PDOLS-3D tests for Case 2 with T = 50 are shown in Table 4. When the CSD degree is low, the test for $\gamma_2$ is accurate at low levels of persistence, and the effective size becomes smaller when persistence increases to medium and high levels. For $\gamma_1$, the test is mis-sized when the persistence is low, and its effective size decreases when the persistence reaches medium and high levels. In Case 2 under high CSD, for the simulations presented in Table 4, the tests are, in general, not as well sized as they were under low CSD. For example, for $\gamma_2$ the effective sizes at low and medium levels of persistence are not as accurate as under low CSD. Even though the tests are highly mis-sized in some cases, the test accuracy improves when the level of persistence increases.
Table 5 presents the test sizes for Case 2 with T = 100. Under low CSD, the test results generally improve with respect to those presented in Table 4, especially for $\gamma_1$. Under high CSD, the increase in the time dimension improves the accuracy of the tests under low, medium, and some cases of high persistence levels. The simulations for T = 150 produce even better results, as shown in Table 6.
The results for Case 3 with T = 50 are reported in Table 7. Size results are closer to the nominal levels in the presence of low CSD than in the presence of high CSD, for low and medium persistence levels. Additionally, as in the previous cases described in Tables 1 and 4, size decreases when persistence approaches higher levels. This leads to small and mis-sized tests for $\gamma_1$ at high persistence levels. It is also important to note that, in most of the cases, the results of Case 3 are not very different from those obtained in Case 2.
Size results for Case 3 and T = 100 are shown in Table 8. When the CSD degree is low, there are gains in accuracy for the $\gamma_1$ tests at all persistence levels compared with the simulations of Table 7. However, there are some cases with no improvement for $\gamma_2$. These gains are also obtained under high CSD for most of the cases. For T = 150 (Table 9), size distortions are, in general, smaller.
In conclusion, there are four relevant observations related to the simulation experiments. First, nominal and effective sizes are, in general, reasonably close. Second, persistence levels and size are negatively related: as the level of persistence increases, effective size systematically decreases. Third, the effective size results in Cases 2 and 3 are relatively similar, which indicates that subtracting the cross-sectional average is an effective way to control CSD, even in the presence of heterogeneous CSD. Finally, as expected, increasing the time dimension, in general, improves the accuracy of the tests.
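For reference, the effective size discussed in this section is the share of Monte Carlo replications in which a true null hypothesis is rejected. A minimal Python sketch follows, assuming a two-sided test based on a t-ratio and a critical value supplied by the caller (for example, 1.96 for a 5% nominal size under a standard normal limit); the function name is hypothetical.

```python
import numpy as np

def effective_size(t_stats, critical_value):
    """Share of replications whose |t| exceeds the critical value.

    t_stats holds one t-ratio per replication, each computed under a
    true null hypothesis (for example, gamma_1 = 0.15).
    """
    t_stats = np.asarray(t_stats, dtype=float)
    return float(np.mean(np.abs(t_stats) > critical_value))
```

A test is well sized when this share is close to the nominal level; values below it correspond to the undersized tests reported for high persistence.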

Empirical Application
As an empirical exercise, we estimate capital and labor elasticities associated with the total factor productivity for the Colombian industry. This exercise is useful, since productivity is a variable that reflects how efficiently an economy uses its resources to produce goods and services and helps to determine the distribution of value added between capital and labor.
Assuming a Cobb-Douglas production function, we obtain the following equation for value added:
\[ Y_{ijt} = A_{ijt} K_{ijt}^{\alpha} L_{ijt}^{\beta}, \tag{11} \]
where $Y$ is value added, $K$ is capital stock, $L$ is labor, $A$ corresponds to total factor productivity, $\alpha$ and $\beta$ are the elasticities of capital and labor, respectively, and $\alpha + \beta = 1$. The subscripts $i$, $j$ and $t$ represent metropolitan areas, industry sectors and time, respectively.
Taking logarithms on both sides of equation (11), we get:
\[ \ln Y_{ijt} = \ln A_{ijt} + \alpha \ln K_{ijt} + \beta \ln L_{ijt}. \tag{12} \]
To estimate equation (12), we employ the dataset used in Iregui et al. (2007). They use data from the annual manufacturing industry survey (EAM) of the national administrative department of statistics (DANE). Value added is calculated as the difference between gross output and intermediate inputs, where the latter corresponds to the value of domestic and foreign raw materials consumed and the value of purchased electricity (kWh). Labor is defined as the number of employees. Finally, capital stock is calculated using the perpetual inventory method for gross investment. The data include annual information for 9 metropolitan areas and 18 industrial sectors (three-digit CIIU) from 1975 to 2000. The metropolitan areas considered are Bogotá, Cali, Medellín, Manizales, Barranquilla, Bucaramanga, Pereira, Cartagena, and the rest of the country.
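The perpetual inventory method mentioned above accumulates investment net of depreciation. The sketch below uses the textbook recursion $K_t = (1 - \delta) K_{t-1} + I_t$; the depreciation rate, the initial stock and the function name are hypothetical and are not taken from the dataset described here.

```python
def perpetual_inventory(investment, k0, delta):
    """Capital stock path K_t = (1 - delta) * K_{t-1} + I_t.

    investment: sequence of gross investment flows I_1, ..., I_T
    k0:         assumed initial capital stock K_0
    delta:      assumed constant depreciation rate
    """
    capital, prev = [], k0
    for inv in investment:
        prev = (1.0 - delta) * prev + inv
        capital.append(prev)
    return capital
```

For example, with an initial stock of 100, a depreciation rate of 10% and investment of 10 per year, depreciation exactly offsets investment and the stock stays at 100.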
The estimated parameters of (12) are presented in Table 10. As indicated in Section 3, these estimates are obtained controlling for individual-specific fixed effects in the metropolitan area and industrial sector dimensions, individual-specific time trends in those dimensions, and cross-sectional dependence. These results show that the elasticities of capital and labor are 0.24 and 0.76, respectively. These elasticities are similar to those found in the Colombian literature. For example, a technical document from Secretaría de Hacienda Distrital (2003) estimated coefficients of 0.27 and 0.72 for capital and labor, respectively, and Eslava, Haltiwanger, Kugler & Kugler (2004) found elasticities of 0.32 for capital and 0.74 for labor for the period 1982-1998.

Final Remarks
This paper extends the asymptotic results of the dynamic ordinary least squares cointegration vector estimator of Mark & Sul (2003) to a three-dimensional panel (PDOLS-3D). This method allows for individual heterogeneity using different short-run dynamics, individual-specific fixed effects and individual-specific time trends. Some degree of cross-sectional dependence is also considered through the use of time-specific effects. A convenient feature of this method is that it permits the construction of asymptotically valid test statistics for hypothesis testing.
The proposed estimators also have acceptable finite sample properties. The Monte Carlo experiments show that the effective sizes of the PDOLS-3D t-tests are relatively close to the nominal sizes for different persistence levels of the series and for different forms and degrees of cross-sectional dependence.