1 Introduction

Panel data models have been increasingly popular in applied economics and finance, due to their ability to model various sources of heterogeneity. A standard practice is to impose strong restrictions on error cross-section dependence (CSD). This takes the form of independence across individual units under the fixed effects model whilst a common time effect severely restricts the nature of CSD under the random effects specification. However, most recently, a large number of studies have developed econometric methodologies for modelling CSD, mainly through the structure of interactive effects (hereafter IE) introducing heterogeneous unobserved factors into the error components and allowing for a richer cross-sectional covariance structure.

In this framework, conventional wisdom has been that the standard two-way fixed effects (FE) estimator would be inconsistent, because it ignores the potential endogeneity arising from the correlation between the regressors and factors and/or factor loadings (e.g. Bai 2009). Hence, two leading approaches have been proposed in the literature; see Chudik and Pesaran (2015) for a survey. The first, based on principal component (PC) estimation, estimates the factors jointly and iteratively with the main slope parameters; see Bai (2009), and Moon and Weidner (2015), Fernandez-Val and Weidner (2016) and Charbonneau (2017) for extensions. The second approach, advanced by Pesaran (2006), treats factors as nuisance terms and removes their effects by proxying them with the cross-section averages of the dependent and independent variables. This is referred to as the common correlated effects (CCE) estimator. A growing number of extensions have been developed by Kapetanios et al. (2011), Chudik and Pesaran (2015), Westerlund and Urbain (2015) and Petrova and Westerlund (2020).

In empirical work, the CCE estimator is the more widely used because it is easier to implement than the PC estimator. Indeed, a common practice is to apply the CCE estimator after detecting strong CSD with the Pesaran (2015) CD test; see Mastromarco et al. (2016), Holly et al. (2010) and Baltagi and Li (2014), among others.

This paper contributes to this literature by raising some important issues that are relevant for practitioners. We start by highlighting a simple fact: the FE estimator is not always inconsistent, even in the presence of IE. If the regressors are correlated with factors but uncorrelated with loadings, then the FE estimator is consistent, as noted earlier by Coakley et al. (2006), Sarafidis and Wansbeek (2012) and Westerlund (2019a). In such a case, we formally show that the FE estimator is consistent and asymptotically normally distributed. However, the variance estimator provided by standard FE estimation will be invalid due to the presence of remaining zero-mean IE in the error components. Hence, we provide two consistent nonparametric variance estimators that are robust to heteroscedastic and serially correlated disturbances as well as to slope parameter heterogeneity. Via Monte Carlo studies, we find that the FE and CCE estimators display similar and satisfactory performance when the regressors are correlated with factors but uncorrelated with loadings. Further, the coverage rate of the FE estimator evaluated using the nonparametric variance estimators reaches the nominal 95%. The performance of both the CCE and FE estimators worsens significantly if the regressors are correlated with loadings, which is in line with Westerlund and Urbain (2013). As expected, the performance of the PC estimator is not unduly affected by the presence of correlation between the regressors and loadings.

Furthermore, we point out that in the specification tests proposed in the literature for testing the presence of CSD or IE, e.g. Pesaran (2015), Sarafidis et al. (2009), Bai (2009) and Castagnetti et al. (2015), the rejection of the null hypothesis does not always imply that the FE estimator is inconsistent under the alternative model with IE. For instance, Sarafidis et al. (2009) maintain an assumption that factor loadings (between equations for the dependent variable and the regressors) are uncorrelated even under the alternative. More importantly, we show that the Hausman test for the null hypothesis of two-way additive fixed effects against the alternative hypothesis of IE, proposed by Bai (2009), would be inconsistent against the alternative, especially if the regressors are uncorrelated with loadings. This suggests that the absence of correlation between the regressors and loadings emerges as an influential but under-appreciated feature of the panel data model with IE. For large T, in order to avoid any potential omitted variables bias, it is natural to allow the regressors to be correlated with unobserved factors. However, it remains an important issue to test, in practice, whether the regressors are correlated with loadings.

Despite a growing number of studies on modelling CSD through IE, it is rather surprising that the literature has been silent on the important issue of testing for correlation between the regressors and factor loadings in panels with IE. This is an important hypothesis to test because, if the loadings are uncorrelated with the regressors, we can just use the simple but consistent FE estimator. In what follows we develop a Hausman-type test of whether the regressors are correlated with loadings. Both the FE and PC estimators are consistent under the null hypothesis of uncorrelated factor loadings, whilst only the latter is consistent under the alternative hypothesis. Our proposed test is different from the Hausman test developed by Bai (2009), because our null hypothesis is subsumed under his alternative model with IE. As a result, the FE estimator is not necessarily more efficient than the PC estimator under the null hypothesis. Based on this idea, we develop two nonparametric variance estimators for the difference between the FE and PC estimators, which are shown to be robust to the presence of heteroscedasticity, autocorrelation and slope heterogeneity. We show that the proposed test statistic follows the \(\chi ^{2}\) distribution asymptotically. Monte Carlo simulation results confirm that the size and power performance of the test are quite satisfactory even in small samples.

Finally, our most important contribution is the provision of extensive empirical evidence that regressors are uncorrelated with factor loadings in many panel datasets employed in the literature. We find that the null hypothesis of the regressors being uncorrelated with factor loadings is not rejected in thirteen out of fourteen datasets considered. Next, we find that Bai’s Hausman test rejects the null of the additive effects model against the alternative of IE only once, whilst the CD test by Pesaran (2015) strongly rejects the null of weak CSD for all the datasets. Such conflicting results could provide additional support for our main finding that the regressors are indeed uncorrelated with factor loadings even in cross-sectionally correlated panels with IE, in which case we show that Bai’s Hausman test is inconsistent. Furthermore, the FE estimator is unaffected by the complications that arise from selecting the number of unobserved factors incorrectly, which would significantly affect the performance of PC estimators (Moon and Weidner 2015), and from employing inconsistent initial estimates, which may not guarantee the convergence of the iterative PC estimator (Hsiao 2018). In this regard, FE estimation combined with nonparametric variance estimators will provide a simple and robust approach, avoiding uncertainty in specifying and estimating nuisance parameters for a potential efficiency gain. This suggests that the FE estimator can still be of considerable applicability in a wide variety of cross-sectionally correlated panel data, especially if the regressors are found to be uncorrelated with factor loadings, the validity of which can be easily verified by our proposed test.

The paper proceeds as follows. Section 2 describes the model and highlights that the FE estimator is still consistent in panels with IE, under the condition that the regressors are uncorrelated with factor loadings. Section 3 develops the Hausman-type test of whether the regressors are correlated with factor loadings. Section 4 employs a range of Monte Carlo simulations to investigate the finite sample performance of the alternative estimators and the proposed test statistic. Section 5 presents empirical evidence documenting that the null hypothesis of the regressors being uncorrelated with factor loadings is not rejected for thirteen out of fourteen datasets. Section 6 offers some concluding remarks. Mathematical proofs, the data descriptions and additional empirical results are relegated to the Appendices. Additional simulation results can be found in the Online Supplement.

2 The model

Consider the following heterogeneous panel data model with IE:

$$\begin{aligned} y_{it}=\varvec{\beta }_{i}^{\prime }\varvec{x}_{it}+\varvec{\gamma }_{i}^{\prime }\varvec{f}_{t}+\varepsilon _{it} \end{aligned}$$
(1)

where \(y_{it}\) is the dependent variable of the ith cross-sectional unit in period t, \(\varvec{x}_{it}\) is the \(k\times 1\) vector of covariates with \(\varvec{\beta }_{i}\) the \(k\times 1\) vector of parameters, and \(\varepsilon _{it}\)’s are idiosyncratic errors. \(\varvec{f}_{t}\) is an \(r\times 1\) vector of unobserved common factors while \(\varvec{\gamma }_{i}\) is an \(r\times 1\) vector of random heterogeneous loadings.

We make the following assumptions:

Assumption A

(i) \(\varepsilon _{it}\) is independently distributed across i with \(E\left( \varepsilon _{it}\right) =0\), \(E\left( \varepsilon _{it}^{2}\right) =\sigma _{\varepsilon _{i}}^{2}\) and \(E\left( \varepsilon _{it}^{8+\delta }\right) <\infty \) for some \(\delta >0\). Each \(\varepsilon _{it}\) follows a linear process with absolutely summable autocovariances such that \(\lim _{T\rightarrow \infty }T^{-1} {\textstyle \sum _{s=1}^{T}} {\textstyle \sum _{t=1}^{T}} \left| E\left( \varepsilon _{is}\varepsilon _{it}\right) \right| ^{1+\delta }<\infty \), \(E\left| N^{-1/2} {\textstyle \sum _{i=1}^{N}} \left[ \varepsilon _{is}\varepsilon _{it}-E\left( \varepsilon _{is} \varepsilon _{it}\right) \right] \right| ^{4}<\infty \) for all t, s, and \(\lim _{T,N\rightarrow \infty }T^{-2}N^{-1} {\textstyle \sum _{i=1}^{N}} {\textstyle \sum _{s=1}^{T}} {\textstyle \sum _{t=1}^{T}} {\textstyle \sum _{r=1}^{T}} {\textstyle \sum _{w=1}^{T}} \left| Cov\left( \varepsilon _{is}\varepsilon _{it},\varepsilon _{ir}\varepsilon _{iw}\right) \right| <\infty \). The largest eigenvalue of \(E\left( \varvec{\varepsilon }_{i}\varvec{\varepsilon }_{i}^{\prime }\right) \) is bounded uniformly in every i and t, where \(\varvec{\varepsilon }_{i}=\left( \varepsilon _{i1},\ldots ,\varepsilon _{iT}\right) ^{\prime }\). \(\varepsilon _{it}\) is independent of \(\varvec{x} _{js}\), \(\varvec{\gamma }_{j}\) and \(\varvec{f}_{s}\) for all i, j, s and t.

(ii) \(\varvec{f}_{t}\) is covariance stationary with finite mean and variance, \(\varvec{\Sigma }_{f}\) with \(E\left( \left\| \varvec{f} _{t}\right\| ^{4}\right) <\infty \) where \(\varvec{\Sigma }_{f}\) is an \(r\times r\) positive definite matrix.

(iii) \(\varvec{\gamma }_{i}\) is iid across i with finite mean and variance, \(\varvec{\Sigma }_{\gamma }\), where \(\varvec{\Sigma }_{\gamma }\) is an \(r\times r\) positive definite matrix. \(\varvec{\gamma }_{i}\) are independent of \(\varepsilon _{jt}\) and \(\varvec{f}_{t}\) for all i, j and t.

(iv) The \(k\times 1\) vector of \(\varvec{\beta }_{i}\) are generated as \(\varvec{\beta }_{i}=\varvec{\beta }+\varvec{\eta }_{i}\). \(\varvec{\eta }_{i}\) is independent across i with \(E\left( \varvec{\eta }_{i}\right) =0\) and \(E\left( \varvec{\eta } _{i}\varvec{\eta }_{i}^{\prime }\right) =\Omega _{\eta \eta ,i}\), where \(\Omega _{\eta \eta ,i}\) is a positive definite matrix uniformly for every i. \(E\left\| \varvec{\eta }_{i}\right\| ^{4}\le \Delta <\infty \) and \(\left\| \varvec{\beta }\right\| <\infty \). \(\varvec{\eta }_{i}\) is independent of \(\varepsilon _{it}\) and \(\varvec{\gamma }_{i}\).

Assumption A is standard in the literature, see Bai (2009), Karabiyik et al. (2017) and Cui et al. (2019) (CHNY, hereafter).

For consistent estimation of the parameters in (1), we need to first account for unobserved factors, and then estimate \(\varvec{\beta }\) by applying panel estimators to (1) with defactored variables. On the basis of this idea, two popular approaches have been proposed. The common correlated effects (CCE) estimator advanced by Pesaran (2006) imposes that \(\varvec{x}_{it}\) share the same factors, \(\varvec{f}_{t}\):

$$\begin{aligned} \varvec{x}_{it}=\varvec{\Gamma }_{i}^{\prime }\varvec{f} _{t}+\varvec{v}_{it} \end{aligned}$$
(2)

where \(\varvec{\Gamma }_{i}\) is an \(r\times k\) matrix of random heterogeneous loadings and \(\varvec{v}_{it}\) are idiosyncratic errors, and proposes to approximate \(\varvec{f}_{t}\) by the cross-section averages of the dependent and independent variables. Next, Bai (2009) allows \(\varvec{x}_{it}\) to be arbitrarily correlated with both \(\varvec{\gamma }_{i}\) and \(\varvec{f}_{t}\), and proposes the iterative principal component (PC) approach that estimates the factors jointly and iteratively with the slope parameters. The validity of the CCE approach depends crucially upon whether an appropriate rank condition, which has to be assumed, holds. Westerlund and Urbain (2015) argue that the issue of correctly selecting the number of factors, r, in PC estimation is essentially the same as the issue of satisfying the condition \(r\le k+1\) in CCE estimation. Further, it is shown that both estimators involve bias terms, which do not disappear unless \(N/T \rightarrow 0\). The finite sample performance of the two approaches has been intensively investigated. The earlier studies by Kapetanios and Pesaran (2005) and Chudik et al. (2011) provide Monte Carlo evidence in favour of the CCE estimator, which is partly due to uncertainty associated with estimating the true number of unobserved factors in PC estimation. Further, Westerlund and Urbain (2015) show that the performance of the PC estimator is sensitive to the value of \(\beta \). For \(\beta =0\), the PC estimator outperforms CCE, while for \(\beta \ne 0\), the CCE estimator tends to outperform the PC estimator.
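The idea of proxying \(\varvec{f}_{t}\) by cross-section averages can be illustrated with a minimal sketch of a pooled CCE estimator with equal weights. The function name `cce_pooled`, the array layout and the simulated design below are assumptions made for illustration only; they are not taken from Pesaran (2006) or from the simulations reported in this paper.

```python
import numpy as np

def cce_pooled(y, X):
    """Pooled CCE slope estimate with equal weights (illustrative sketch).

    y : (N, T) array, X : (N, T, k) array.
    """
    N, T, k = X.shape
    # Proxies for f_t: a constant plus cross-section averages of y and x
    H = np.column_stack([np.ones(T), y.mean(axis=0), X.mean(axis=0)])
    # Projection off the column space of H; pinv tolerates collinear columns
    M = np.eye(T) - H @ np.linalg.pinv(H, rcond=1e-10)
    A = sum(X[i].T @ M @ X[i] for i in range(N))
    b = sum(X[i].T @ M @ y[i] for i in range(N))
    return np.linalg.solve(A, b)

# Hypothetical simulated panel in which x_it and y_it share r = 1 factor
rng = np.random.default_rng(1)
N, T, k, r = 30, 40, 2, 1
beta = np.array([1.0, -0.5])
f = rng.normal(size=(T, r))
Gamma = rng.normal(loc=1.0, size=(N, r, k))   # loadings in the x equation (2)
gamma = rng.normal(loc=1.0, size=(N, r))      # loadings in the y equation (1)
X = np.einsum("tr,irk->itk", f, Gamma) + rng.normal(size=(N, T, k))
y = X @ beta + gamma @ f.T + rng.normal(size=(N, T))
beta_cce = cce_pooled(y, X)
```

The sketch uses k + 1 cross-section averages to span r factors, which is why the rank condition \(r\le k+1\) matters for this approach.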

However, we find that the performance of the two-way fixed effect (FE) estimator has not been widely investigated. Exceptions include the studies by Coakley et al. (2006), Sarafidis and Wansbeek (2012) and Westerlund (2019a). This simply reflects the conventional view that the FE estimator would be inconsistent in the presence of IE, because it ignores the endogeneity stemming from the correlation between regressors and factors/loadings. We aim to challenge this maintained view. For large T, suppose that \(\varvec{f}_{t}\) may represent the unobserved common policy or globalisation trend, and \(\varvec{\gamma }_{i}\) are the heterogeneous individual responses (parameters).

In practice, it is important to test whether \(\varvec{x}_{it}\) are correlated with \(\varvec{\gamma }_{i}\). Formally, we set the null and alternative hypotheses as follows:

$$\begin{aligned}{} & {} H_{0}:\varvec{x}_{it}\text { uncorrelated with }\varvec{\gamma }_{i} \end{aligned}$$
(3)
$$\begin{aligned}{} & {} H_{1}:\varvec{x}_{it}\text { correlated with }\varvec{\gamma }_{i} \end{aligned}$$
(4)

Under Assumptions A(ii) and (iii), we can express \(\varvec{\gamma }_{i}^{\prime }\varvec{f}_{t}\) in (1) as (Footnote 1):

$$\begin{aligned} \varvec{\gamma }_{i}^{\prime }\varvec{f}_{t}=\mu +\alpha _{i} + \theta _{t} +\varvec{\mathring{\gamma }}_{i}^{\prime } \varvec{\dot{f}}_{t} \end{aligned}$$
(5)

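where \(\mu =\varvec{\bar{\gamma }}^{\prime }\varvec{\bar{f}}\), \(\alpha _{i}=\varvec{\mathring{\gamma }}_{i}^{\prime }\varvec{\bar{f}}\), \(\theta _{t}=\varvec{\bar{\gamma }}^{\prime }\varvec{\dot{f}}_{t}\), \(\varvec{\mathring{\gamma }}_{i}=\varvec{\gamma }_{i}-\varvec{\bar{\gamma }}\) and \(\varvec{\dot{f}}_{t} = \varvec{f}_{t}-\varvec{\bar{f}}\) with \(\varvec{\bar{\gamma }}=N^{-1}\sum _{i=1}^{N}\varvec{\gamma }_{i}\) and \(\varvec{\bar{f}} = T^{-1}\sum _{t=1}^{T}\varvec{f}_{t}\). With these definitions (5) holds as an identity, because \(\varvec{\gamma }_{i}^{\prime }\varvec{f}_{t}=\left( \varvec{\bar{\gamma }}+\varvec{\mathring{\gamma }}_{i}\right) ^{\prime }\left( \varvec{\bar{f}}+\varvec{\dot{f}}_{t}\right) \). Using (5) in (1), we have: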

$$\begin{aligned} y_{it}=\varvec{\beta }_{i}^{\prime }\varvec{x}_{it}+\mu +\alpha _{i}+\theta _{t}+\varvec{\mathring{\gamma }}_{i}^{\prime }\varvec{\dot{f}}_{t}+\varepsilon _{it} \end{aligned}$$
(6)

This transformation clearly shows that the panel data model with nonzero-mean IE, \(\varvec{\gamma }_{i}^{\prime }\varvec{f}_{t}\) in (1), can be equivalently expressed as the 2-way fixed effects panel data model with zero-mean IE, \(\varvec{\mathring{\gamma }}_{i}^{\prime } \varvec{\dot{f}}_{t}\) in (6) (Footnote 2). Next, applying the 2-way within transformation to (6), we obtain (Footnote 3):

$$\begin{aligned} \ddot{y}_{it}=\varvec{\beta }_i^{\prime } \varvec{\ddot{x}}_{it} +\ddot{u}_{it},\ \ddot{u}_{it}=\varvec{\mathring{\gamma }}_{i}^{\prime }\varvec{\dot{f}}_{t}+\ddot{\varepsilon }_{it} \end{aligned}$$
(7)

where \(\ddot{y}_{it}=y_{it}-\bar{y}_{i.}-\bar{y}_{.t}+\bar{y}_{..}\) with \(\bar{y}_{i.}=T^{-1}\sum _{t=1}^{T}y_{it}\), \(\bar{y}_{.t}=N^{-1}\sum _{i=1}^{N}y_{it}\), \(\bar{y}_{..}=\left( NT\right) ^{-1}\sum _{i=1}^{N}\sum _{t=1}^{T}y_{it}\), and similarly for \(\varvec{\ddot{x}}_{it}\) and \(\ddot{\varepsilon }_{it}\).
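As a quick numerical check of why the composite error in (7) contains \(\varvec{\mathring{\gamma }}_{i}^{\prime }\varvec{\dot{f}}_{t}\), the following sketch applies the 2-way within transformation to a simulated \(N\times T\) array of interactive effects \(\varvec{\gamma }_{i}^{\prime }\varvec{f}_{t}\). The helper name `within_two_way` and the simulated values are assumptions for illustration.

```python
import numpy as np

def within_two_way(a):
    """2-way within transformation of an N x T array:
    a_it - (mean over t) - (mean over i) + (overall mean)."""
    return (a - a.mean(axis=1, keepdims=True)
              - a.mean(axis=0, keepdims=True)
              + a.mean())

rng = np.random.default_rng(0)
N, T, r = 6, 5, 2
gamma = rng.normal(loc=1.0, size=(N, r))   # loadings with nonzero mean
f = rng.normal(loc=0.5, size=(T, r))       # factors with nonzero mean

ie = gamma @ f.T                           # N x T array of gamma_i' f_t
gamma_ring = gamma - gamma.mean(axis=0)    # gamma_i minus its cross-section mean
f_dot = f - f.mean(axis=0)                 # f_t minus its time mean
zero_mean_ie = gamma_ring @ f_dot.T        # demeaned loadings times demeaned factors

identity_holds = np.allclose(within_two_way(ie), zero_mean_ie)
```

The transformation removes the additive unit and time components of the interactive effects exactly, and what survives is the product of the demeaned loadings and the demeaned factors.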

Under Assumption A and (3), it is easily seen by the independence of \(\varvec{\gamma }_{i}\) from all other random quantities in the model and \(E\left( \varvec{\mathring{\gamma }}_{i}\right) =E\left( \varvec{\gamma }_{i}-\varvec{\bar{\gamma }}\right) =0\) that \(\varvec{\ddot{x}}_{it}\) is uncorrelated with the composite error, \(\ddot{u}_{it}=\varvec{\mathring{\gamma }}_{i}^{\prime }\varvec{\dot{f}}_{t}+\ddot{\varepsilon }_{it}\) in (7), provided \(\varvec{x}_{it}\) are strictly exogenous with respect to \(\varepsilon _{it}\) because

$$\begin{aligned} E\left( \varvec{\ddot{x}}_{it}^{\prime }\varvec{\mathring{\gamma }}_{i}^{\prime }\varvec{\dot{f}}_{t}\right) =E\left\{ \varvec{\ddot{x}}_{it}^{\prime }\varvec{\dot{f}}_{t}^{\prime }E\left( \varvec{\mathring{\gamma }}_{i}|\varvec{\ddot{x}}_{it},\varvec{\dot{f}}_{t}\right) \right\} =0. \end{aligned}$$
(8)

See also Section 5 in Hsiao (2018). Therefore, under the null hypothesis, (3), we can apply the two-way FE estimation to (1) and obtain a consistent estimator of \(\varvec{\beta }\) from (7). Conversely, if \(\varvec{x}_{it}\) and \(\varvec{\gamma }_{i}\) are correlated, it is clear that \(E\left( \ddot{u}_{it}\varvec{x}_{it}\right) \ne 0\) so that the FE estimator is inconsistent. Notice that the consistency of the FE estimator requires only \(\varvec{\gamma }_{i}\) to be uncorrelated with \(\varvec{x}_{it}\), but this is implicitly a maintained assumption in the CCE literature (Footnote 4). A further possibility that we do not entertain is that \(\varvec{x}_{it}\) contains a different set of factors to that entering \(y_{it}\) directly and that the two sets of factors are uncorrelated. In that case, (8) may hold even if (3) does not, which points to the symmetry of the roles of loadings and factors in the IE setting. However, we view this setting as too unlikely to be of interest.

The two-way FE estimator of \(\varvec{\beta }\) is given by

$$\begin{aligned} \hat{\varvec{\beta }}_{FE}=\left( \sum _{i=1}^{N}\varvec{\ddot{X}} _{i}^{\prime }\varvec{\ddot{X}}_{i}\right) ^{-1}\sum _{i=1}^{N} \varvec{\ddot{X}}_{i}^{\prime }\varvec{\ddot{y}}_{i} \end{aligned}$$
(9)

where \(\varvec{\ddot{X}}_{i}=\left( \varvec{\ddot{x}}_{i1},\ldots ,\varvec{\ddot{x}}_{iT}\right) ^{\prime }\) and \(\varvec{\ddot{y} }_{i}=\left( \ddot{y}_{i1},\ldots ,\ddot{y}_{iT}\right) ^{\prime }\). As \(\ddot{u}_{it}\) in (7) still contains zero-mean IE, \(\varvec{\mathring{\gamma }}_{i}^{\prime }\varvec{\dot{f}}_{t}\), the standard variance estimator for \(\hat{\varvec{\beta }}_{FE}\) will be invalid. Thus, we propose two consistent variance estimators, which are also robust to heteroscedasticity and serial correlation as well as slope heterogeneity. The first is the nonparametric variance estimator, similar to that used in deriving the variance of the CCE estimator by Pesaran (2006):

$$\begin{aligned}&\hat{\varvec{V}}^{NON}\left( \hat{\varvec{\beta }}_{FE}\right) \nonumber \\ {}&\quad =\left( \sum _{i=1}^{N}\varvec{\ddot{X}}_{i}^{\prime }\varvec{\ddot{X}}_{i}\right) ^{-1} \left( \sum _{i=1}^{N}\left( \varvec{\ddot{X}} _{i}^{\prime }\varvec{\ddot{X}}_{i}\right) \left( \hat{\varvec{\beta }}_{FE,i}-\varvec{\bar{\beta }}_{FE}\right) \left( \varvec{\hat{\beta }}_{FE,i}-\varvec{\bar{\beta }}_{FE}\right) ^{\prime }\left( \varvec{\ddot{X}}_{i}^{\prime }\varvec{\ddot{X}}_{i}\right) \right) \left( \sum _{i=1}^{N}\varvec{\ddot{X}}_{i}^{\prime }\varvec{\ddot{X} }_{i}\right) ^{-1} \end{aligned}$$
(10)

where \(\hat{\varvec{\beta }}_{FE,i}=\left( \varvec{\ddot{X}} _{i}^{\prime }\varvec{\ddot{X}}_{i}\right) ^{-1}\varvec{\ddot{X}} _{i}^{\prime }\varvec{\ddot{y}}_{i}\) and \(\varvec{\bar{\beta }} _{FE}=\frac{1}{N}\sum _{i=1}^{N}\hat{\varvec{\beta }}_{FE,i}\). Next, we consider the following heteroscedasticity and autocorrelation robust variance estimator (see CHNY):

$$\begin{aligned} \hat{\varvec{V}}^{HAC}\left( \hat{\varvec{\beta }}_{FE}\right) =\left( \sum _{i=1}^{N}\varvec{\ddot{X}}_{i}^{\prime }\varvec{\ddot{X} }_{i}\right) ^{-1}\left( \sum _{i=1}^{N}\varvec{\ddot{X}}_{i}^{\prime }\hat{\varvec{u}}_{FE,i}\hat{\varvec{u}}_{FE,i}^{\prime } \varvec{\ddot{X}}_{i}\right) \left( \sum _{i=1}^{N}\varvec{\ddot{X} }_{i}^{\prime }\varvec{\ddot{X}}_{i}\right) ^{-1} \end{aligned}$$
(11)

where \(\hat{\varvec{u}}_{FE,i}=\varvec{\ddot{y}}_{i}-\varvec{\ddot{X}}_{i}\hat{\varvec{\beta }}_{FE}\).
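The estimator in (9) and the two variance estimators in (10) and (11) can be sketched as follows. This is a minimal illustration that follows the formulas as displayed above; the function names, the array layout and the simulated design (in which the loadings of \(\varvec{x}_{it}\) are drawn independently of \(\varvec{\gamma }_{i}\), so that (3) holds) are assumptions and do not reproduce the paper's Monte Carlo setup.

```python
import numpy as np

def within_two_way(a):
    # Unit index on axis 0, time index on axis 1; a trailing regressor axis is allowed
    return (a - a.mean(axis=1, keepdims=True)
              - a.mean(axis=0, keepdims=True)
              + a.mean(axis=(0, 1), keepdims=True))

def fe_with_robust_variances(y, X):
    """y: (N, T), X: (N, T, k). Returns beta_FE, V_NON, V_HAC as in (9)-(11)."""
    yd, Xd = within_two_way(y), within_two_way(X)
    N = Xd.shape[0]
    S = [Xd[i].T @ Xd[i] for i in range(N)]            # X_i' X_i for each unit
    A_inv = np.linalg.inv(sum(S))
    beta_fe = A_inv @ sum(Xd[i].T @ yd[i] for i in range(N))
    # (10): based on deviations of unit-specific FE slopes from their average
    beta_i = np.array([np.linalg.solve(S[i], Xd[i].T @ yd[i]) for i in range(N)])
    dev = beta_i - beta_i.mean(axis=0)
    meat_non = sum(S[i] @ np.outer(dev[i], dev[i]) @ S[i] for i in range(N))
    # (11): based on unit-level scores X_i' u_i built from pooled FE residuals
    u = yd - Xd @ beta_fe
    scores = [Xd[i].T @ u[i] for i in range(N)]
    meat_hac = sum(np.outer(s, s) for s in scores)
    return beta_fe, A_inv @ meat_non @ A_inv, A_inv @ meat_hac @ A_inv

# Hypothetical simulated panel under the null (3)
rng = np.random.default_rng(2)
N, T, k, r = 40, 30, 2, 1
beta = np.array([1.0, -0.5])
f = rng.normal(size=(T, r))
Gamma = rng.normal(loc=1.0, size=(N, r, k))   # drawn independently of gamma
gamma = rng.normal(loc=1.0, size=(N, r))
X = np.einsum("tr,irk->itk", f, Gamma) + rng.normal(size=(N, T, k))
y = X @ beta + gamma @ f.T + rng.normal(size=(N, T))

beta_fe, V_non, V_hac = fe_with_robust_variances(y, X)
se_non = np.sqrt(np.diag(V_non))
se_hac = np.sqrt(np.diag(V_hac))
```

Both variance estimators share the same outer matrix and differ only in the middle term, so they can be computed in one pass over the units.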

We show that \(\hat{\varvec{\beta }}_{FE}\) is consistent and follows the normal distribution asymptotically under the null, (3). The result holds for both homogeneous and heterogeneous \(\varvec{\beta }\).

Theorem 1

Under Assumption A and under (3), as \(N,T \rightarrow \infty \),

$$\begin{aligned} \sqrt{N}\left( \hat{\varvec{\beta }}_{FE}-\varvec{\beta }\right) \rightarrow _{d}N\left( 0_{k\times 1},\varvec{\Psi }_{FE}^{-1} \varvec{R}_{FE}\varvec{\Psi }_{FE}^{-1}\right) \end{aligned}$$
(12)

where \(\varvec{\Psi }_{FE}=\lim _{N,T\rightarrow \infty }\frac{1}{N}\sum _{i=1}^{N}E\left( \frac{\varvec{\ddot{X}}_{i}^{\prime }\varvec{\ddot{X}}_{i}}{T}\right) \). Considering \(\varvec{\beta }_{i}=\varvec{\beta }+\varvec{\eta }_{i}\), \(\varvec{R}_{FE}=\varvec{R}_{1,FE} +\varvec{R}_{2,FE}\) where

$$\begin{aligned} \varvec{R}_{1,FE}= & {} \lim _{N,T\rightarrow \infty }\frac{1}{N}\sum _{i=1} ^{N}E\left( \frac{\varvec{\ddot{X}}_{i}^{\prime }\varvec{\dot{F}}}{T}\varvec{\mathring{\gamma }}_{i}\varvec{\mathring{\gamma }} _{i}^{\prime }\frac{\varvec{\dot{F}}^{\prime }\varvec{\ddot{X}}_{i}}{T}\right) \end{aligned}$$
(13)
$$\begin{aligned} \varvec{R}_{2,FE}= & {} \lim _{N,T\rightarrow \infty }\frac{1}{N}\sum _{i=1}^{N} E\left( \frac{\varvec{\ddot{X}}_{i}^{\prime }\varvec{\ddot{X}}_{i}}{T} \varvec{\eta }_{i}\varvec{\eta }_{i}^{\prime } \frac{\varvec{\ddot{X}}_{i}^{\prime } \varvec{\ddot{X}}_{i}}{T}\right) , \end{aligned}$$
(14)

and \(\varvec{\dot{F}}=\left( \varvec{\dot{f}}_{1},\ldots ,\varvec{\dot{f}}_{T}\right) ^{\prime }\). Furthermore,

$$\begin{aligned}{} & {} \hat{\varvec{V}}^{NON}\left( \hat{\varvec{\beta }}_{FE}\right) ^{-1/2}\left( \hat{\varvec{\beta }}_{FE}-\varvec{\beta }\right) \rightarrow _{d}N\left( 0,\varvec{I}_{k}\right) \text { and } \hat{\varvec{V}}^{HAC}\left( \hat{\varvec{\beta }}_{FE}\right) ^{-1/2}\nonumber \\ {}{} & {} \quad \left( \hat{\varvec{\beta }}_{FE}-\varvec{\beta }\right) \rightarrow _{d}N\left( 0,\varvec{I}_{k}\right) . \end{aligned}$$
(15)

If \(\varvec{\eta }_{i}=0\), \(\forall i\), then (12) and (15) continue to hold with \(\varvec{R}_{FE}=\varvec{R}_{1,FE}\).

3 The Hausman-type test

A number of specification tests have been proposed to test for the presence of CSD or multiplicative IE in panels. The most popular is the cross-section dependence (CD) test statistic proposed by Pesaran (2015), which is increasingly applied to the residuals of regression models as an ex-post diagnostic tool. However, the CD test fails to reject the null hypothesis of no error CSD when the factor loadings have zero means, implying that the CD test will display very poor power when it is applied to cross-sectionally demeaned data. Furthermore, the residual-based CD test has been shown to often reject the null hypothesis of no remaining CSD in the case of the CCE estimator (e.g. Mastromarco et al. 2016). Juodis and Reese (2018) show that the application of the CD test to regression residuals obtained from IE models introduces a bias term of order \(\sqrt{T}\), leading to an erroneous rejection of the null (Footnote 5). Sarafidis et al. (2009) propose an alternative testing procedure for the null hypothesis of homogeneous factor loadings against the alternative of heterogeneous loadings after estimating a linear dynamic panel data model by GMM. This approach is valid only when N is large relative to T, but it can be applied to testing for any remaining error CSD after including time dummies. However, they maintain an assumption that the loadings between equations for y and \(\varvec{x}\) are uncorrelated (see their Assumption 5(b)).

The PC estimator is consistent both under models with two-way additive (fixed) effects and under models with IE, but it is less efficient than the FE estimator under the null model with additive effects only. By contrast, the FE estimator is inconsistent under the alternative model with IE. Following this idea, Bai (2009, Section 9) advances the following Hausman test for testing the null of additive effects, i.e. \(\varvec{\gamma }_{i}^{\prime }\varvec{f}_{t} = \alpha _{i} +\theta _{t}\), against the alternative of IE (Footnote 6):

$$\begin{aligned} H_{B}=\left( \hat{\varvec{\beta }}_{FE}-\hat{\varvec{\beta }}_{PC_B}\right) ^{\prime } \left( \varvec{V}_{B}\right) ^{-1} \left( \hat{\varvec{\beta }}_{FE}-\hat{\varvec{\beta }}_{PC_B}\right) , \end{aligned}$$
(16)

where \(\hat{\varvec{\beta }}_{PC_B}\) is the iterative PC estimator proposed by Bai (2009), \(\varvec{V}_{B}=\widetilde{Var}\left( \hat{\varvec{\beta }}_{PC_B}\right) -\widetilde{Var}\left( \hat{\varvec{\beta }}_{FE}\right) \), \(\widetilde{Var}\left( \hat{\varvec{\beta }}_{FE}\right) \) is the standard variance estimator provided by the two-way FE estimation, and \(\widetilde{Var}\left( \hat{\varvec{\beta }}_{PC_B}\right) \) is the analytic (sandwich-form) variance estimator, which takes into account unknown forms of heteroscedastic and autocorrelated errors. Bai (2009) derives that \(H_{B}\rightarrow _{d}\chi _{k}^{2}\) under the null (Footnote 7). Westerlund (2019b) proposes the alternative Hausman test statistic obtained by replacing the PC estimator with the CCE estimator, \(H_{W}\).

The conventional wisdom is that if the null hypothesis of no error CSD is rejected, the FE estimator would be biased due to the potential endogeneity arising from the correlation between the regressors and unobserved factors and/or loadings.

In empirical applications we apply the CD test and Bai’s Hausman test to a number of datasets that have been employed in the literature, and find that the CD test strongly rejects the null hypothesis of weak error CSD while the \(H_{B}\) test rarely rejects the null of additive effects. The CD test results confirm the presence of strong CSD, while the \(H_{B}\) test results indicate the absence of IE. Taken at face value, this suggests that the FE estimator is consistent (and potentially efficient). However, if the regressors are uncorrelated with loadings, the \(H_{B}\) test is inconsistent against the alternative model.

The results of the Monte Carlo simulation (in Section S1 in the Online Supplement) clearly demonstrate the limitation of applying the \(H_{B}\) test in practice, because it cannot distinguish between panels with the 2-way additive fixed effects and panels with IE where the regressors are uncorrelated with loadings (Footnote 8).

The above discussion suggests that the null hypothesis of the absence of correlation between the regressors and factor loadings emerges as an influential but underappreciated feature of the panel data model with IE.

We have shown that the presence of IE does not always imply that the FE estimator is inconsistent. In particular, the FE estimator is still consistent under the null (3), even though the regressors are correlated with factors. In this case we may prefer to use the simple FE estimator, which is unaffected by the complications that arise from selecting the number of unobserved factors incorrectly, which would significantly affect the performance of PC estimators (Moon and Weidner 2015), and from employing inconsistent initial estimates, which may not guarantee the convergence of the iterative PC estimator (Hsiao 2018).

In this regard, it is surprising that the literature has been silent on the important issue of testing whether the regressors are correlated with loadings in panels with IE. In a large-T context, it is natural to allow \(\varvec{x}_{it}\) to be correlated with \(\varvec{f}_{t}\) to avoid any omitted variables bias. It still remains an important issue to test whether \(\varvec{x}_{it}\) are correlated with \(\varvec{\gamma }_{i}\). Given the pervasive evidence of cross-sectionally dependent errors in panels (Pesaran 2015), as our main contribution we proceed to develop a novel Hausman-type test of the null hypothesis, (3). In the model (1), recall that the PC estimator is consistent under both the null, (3), and the alternative, (4), whereas the FE estimator is consistent only under the null, (3). Following this idea, we propose the Hausman-type test based on the difference between the FE and PC estimators as follows:

$$\begin{aligned} H=\left( \hat{\varvec{\beta }}_{FE}-\hat{\varvec{\beta }}_{PC}\right) ^{\prime }\varvec{V}^{-1}\left( \hat{\varvec{\beta }}_{FE} -\hat{\varvec{\beta }}_{PC}\right) \end{aligned}$$
(17)

where \(\hat{\varvec{\beta }}_{PC}\) is the bias corrected PC estimator to be defined in (18) below, and \(\varvec{V} = Var\left( \hat{\varvec{\beta }}_{FE} - \varvec{\hat{\beta }}_{PC}\right) = Var\left( \hat{\varvec{\beta }}_{FE}\right) + Var\left( \hat{\varvec{\beta }}_{PC}\right) - Cov\left( \hat{\varvec{\beta }}_{FE},\hat{\varvec{\beta }}_{PC}\right) - Cov\left( \hat{\varvec{\beta }}_{PC}, \hat{\varvec{\beta }}_{FE}\right) \). Notice that the FE estimator is not necessarily more efficient than the PC estimator under the null, which implies that

$$\begin{aligned} Var\left( \hat{\varvec{\beta }}_{FE}-\hat{\varvec{\beta }}_{PC}\right) \not =Var\left( \hat{\varvec{\beta }}_{PC}\right) -Var\left( \hat{\varvec{\beta }}_{FE}\right) \end{aligned}$$

in contrast to the well-established finding in Hausman (1978). Hence, our proposed test is not exactly the Hausman test. We interpret the Hausman-type test in (17) as a test for the null hypothesis, (3) in heterogeneous panels with IE, (1).
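To make the structure of (17) concrete, the following sketch computes the quadratic form from given estimates and a given variance of the difference. The function names and all numerical inputs are hypothetical placeholders chosen for illustration; in particular, the sketch takes the variance and covariance matrices as inputs and does not show how they are estimated.

```python
import numpy as np

def variance_of_difference(V_fe, V_pc, C_fe_pc):
    """Var(FE - PC) = Var(FE) + Var(PC) - Cov(FE, PC) - Cov(PC, FE)."""
    return V_fe + V_pc - C_fe_pc - C_fe_pc.T

def hausman_type_stat(beta_fe, beta_pc, V):
    """Quadratic form (beta_FE - beta_PC)' V^{-1} (beta_FE - beta_PC)."""
    d = np.asarray(beta_fe, dtype=float) - np.asarray(beta_pc, dtype=float)
    return float(d @ np.linalg.solve(V, d))

# Hypothetical inputs with k = 2 regressors
beta_fe = np.array([1.0, 2.0])
beta_pc = np.array([0.9, 2.1])
V_fe = 0.004 * np.eye(2)
V_pc = 0.006 * np.eye(2)
C = 0.001 * np.eye(2)          # assumed covariance between the two estimators

V = variance_of_difference(V_fe, V_pc, C)
H = hausman_type_stat(beta_fe, beta_pc, V)

# 5% critical value of the chi-square distribution with 2 degrees of freedom
reject_at_5pct = H > 5.991
```

Because the covariance terms enter V explicitly, the statistic does not rely on either estimator being efficient under the null.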

Before developing the asymptotic theory for the Hausman-type statistic, we describe the asymptotic distribution of the bias-corrected PC estimator given by

$$\begin{aligned} \hat{\varvec{\beta }}_{PC} = \varvec{\tilde{\beta }}_{PC} - \frac{1}{N}\hat{\varvec{B}}_{NT} - \frac{1}{T}\hat{\varvec{C}}_{NT} \end{aligned}$$
(18)

where \(\varvec{\tilde{\beta }}_{PC}\) is the PC estimator obtained by iteratively solving the set of nonlinear equations:

$$\begin{aligned} \varvec{\tilde{\beta }}_{PC}&=\left( \sum _{i=1}^{N}\varvec{X}_{i}^{\prime }\varvec{M}_{\hat{F}}\varvec{X}_{i}\right) ^{-1} \sum _{i=1}^{N}\varvec{X}_{i}^{\prime }\varvec{M}_{\hat{F}}\varvec{y}_{i}\text { and }\\ \left[ \frac{1}{NT}\sum _{i=1}^{N}\left( \varvec{y}_{i}-\varvec{X}_{i}\varvec{\tilde{\beta }}_{PC}\right) \left( \varvec{y}_{i}-\varvec{X}_{i}\varvec{\tilde{\beta }}_{PC}\right) ^{\prime }\right] \hat{\varvec{F}}&=\hat{\varvec{F}}\varvec{V}_{NT} \end{aligned}$$

where \(\varvec{M}_{\hat{F}}=\varvec{I}_{T}-\hat{\varvec{F}}\left( \hat{\varvec{F}}^{\prime }\hat{\varvec{F}}\right) ^{-1} \hat{\varvec{F}}^{\prime }\), \(\varvec{V}_{NT}\) is the diagonal matrix that consists of the r largest eigenvalues of the above matrix in the brackets arranged in a decreasing order, \(\hat{\varvec{F}}\) is \(\sqrt{T}\) times the corresponding eigenvectors, and \(\frac{1}{N}\hat{\varvec{B}}_{NT}\) and \(\frac{1}{T}\hat{\varvec{C}}_{NT}\) are the bias correction terms derived in CHNY (see Appendix 9 for details).
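The fixed-point iteration described above can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not the implementation used in the paper: the function name `iterative_pc`, the pooled OLS starting value, the stopping rule and the toy data are all hypothetical, the number of factors r is taken as known, and the bias-correction terms \(\hat{\varvec{B}}_{NT}\) and \(\hat{\varvec{C}}_{NT}\) are omitted.

```python
import numpy as np

def iterative_pc(y, X, r, max_iter=500, tol=1e-8):
    """Uncorrected iterative PC estimator (illustrative sketch).

    y: (N, T) array, X: (N, T, k) array, r: number of factors (assumed known).
    """
    N, T, k = X.shape
    # Assumption: start from pooled OLS; the paper does not prescribe this here.
    beta = np.linalg.lstsq(X.reshape(N * T, k), y.reshape(N * T), rcond=None)[0]
    for _ in range(max_iter):
        W = y - X @ beta                      # residuals y_i - X_i beta, (N, T)
        S = W.T @ W / (N * T)                 # T x T matrix in the brackets
        _, eigvec = np.linalg.eigh(S)         # eigenvalues in ascending order
        F = np.sqrt(T) * eigvec[:, ::-1][:, :r]   # sqrt(T) x top-r eigenvectors
        M = np.eye(T) - F @ F.T / T           # M_F, using F'F = T * I_r
        A = np.einsum("itk,ts,isl->kl", X, M, X)  # sum_i X_i' M_F X_i
        b = np.einsum("itk,ts,is->k", X, M, y)    # sum_i X_i' M_F y_i
        beta_new = np.linalg.solve(A, b)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, F

# Toy data (hypothetical): homogeneous slope of 1 and two factors.
rng = np.random.default_rng(0)
N, T, r = 60, 40, 2
f = rng.standard_normal((T, r))
gam = rng.standard_normal((N, r)) + np.array([1.0, 0.0])
Gam = rng.standard_normal((N, r)) + np.array([0.0, 1.0])
x = Gam @ f.T + rng.standard_normal((N, T))
y = 1.0 * x + gam @ f.T + rng.standard_normal((N, T))
beta_tilde, F_hat = iterative_pc(y, x[:, :, None], r)
```

Because \(\hat{\varvec{F}}\) is normalised so that \(\hat{\varvec{F}}^{\prime }\hat{\varvec{F}}/T=\varvec{I}_{r}\), the projection matrix in the sketch simplifies to \(\varvec{I}_{T}-\hat{\varvec{F}}\hat{\varvec{F}}^{\prime }/T\).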

Next, similar to the nonparametric and HAC variance estimators developed for the FE estimator, we propose two versions of the robust variance estimator for the PC estimator as follows (see Footnote 9):

$$\begin{aligned}&\hat{\varvec{V}}^{NON}\left( \hat{\varvec{\beta }}_{PC}\right) \nonumber \\&\quad =\left( \sum _{i=1}^{N}\varvec{X}_{i}^{\prime } \varvec{M}_{\hat{F}}\varvec{X}_{i}\right) ^{-1}\left( \sum _{i=1}^{N}\left( \varvec{X}_{i}^{\prime }\varvec{M}_{\hat{F} }\varvec{X}_{i}\right) \left( \varvec{\tilde{\beta }}_{PC,i} - \varvec{\tilde{\beta }}_{PC}\right) \left( \varvec{\tilde{\beta }}_{PC,i} - \varvec{\tilde{\beta }}_{PC}\right) ^{\prime }\left( \varvec{X}_{i}^{\prime }\varvec{M}_{\hat{F}}\varvec{X}_{i}\right) \right) \nonumber \\&\qquad \left( \sum _{i=1}^{N} \varvec{X}_{i}^{\prime }\varvec{M}_{\hat{F}} \varvec{X}_{i}\right) ^{-1} \end{aligned}$$
(19)

where \(\varvec{\tilde{\beta }}_{PC,i}=\left( \varvec{X}_{i}^{\prime } \varvec{M}_{\hat{F}}\varvec{X}_{i}\right) ^{-1} \varvec{X}_{i}^{\prime } \varvec{M}_{\hat{F}}\varvec{y}_{i}\), and

$$\begin{aligned} \hat{\varvec{V}}^{HAC}\left( \hat{\varvec{\beta }}_{PC}\right) =\left( \sum _{i=1}^{N}\varvec{X}_{i}^{\prime }\varvec{M}_{\hat{F}} \varvec{X}_{i}\right) ^{-1} \left( \sum _{i=1}^{N}\hat{\varvec{X}}_{i}^{\prime } \hat{\varvec{u}}_{PC,i}\hat{\varvec{u}}_{PC,i}^{\prime } \hat{\varvec{X}}_{i}\right) \left( \sum _{i=1}^{N}\varvec{X}_{i}^{\prime } \varvec{M}_{\hat{F}}\varvec{X}_{i}\right) ^{-1} \end{aligned}$$
(20)

where \(\hat{\varvec{u}}_{PC,i} = \varvec{y}_{i} - \hat{\varvec{X}}_{i} \hat{\varvec{\beta }}_{PC}\).
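To make the structure of the nonparametric estimator in (19) concrete, the following Python sketch computes the sandwich form for a given projection matrix. It is a hedged illustration only: the function name `v_non`, the array layout and the random inputs are assumptions, and the matrix `M` is treated as given rather than estimated.

```python
import numpy as np

def v_non(y, X, M):
    """Sandwich variance in the spirit of (19) (illustrative sketch).

    y: (N, T), X: (N, T, k), M: (T, T) symmetric projection matrix.
    Returns the variance estimate, the unit-specific slopes and the pooled slope.
    """
    A = np.einsum("itk,ts,isl->ikl", X, M, X)        # A_i = X_i' M X_i
    b = np.einsum("itk,ts,is->ik", X, M, y)          # X_i' M y_i
    beta_i = np.linalg.solve(A, b[..., None])[..., 0]  # unit-specific estimates
    A_sum_inv = np.linalg.inv(A.sum(axis=0))
    beta_pooled = A_sum_inv @ b.sum(axis=0)          # pooled estimate
    dev = beta_i - beta_pooled                       # deviations from pooled
    Ad = np.einsum("ikl,il->ik", A, dev)             # A_i (beta_i - beta)
    middle = Ad.T @ Ad                               # sum_i A_i d_i d_i' A_i
    return A_sum_inv @ middle @ A_sum_inv, beta_i, beta_pooled

# Hypothetical inputs: random regressors and a random two-factor projection.
rng = np.random.default_rng(1)
N, T, k, r = 30, 20, 2, 2
X = rng.standard_normal((N, T, k))
y = X @ np.array([1.0, -0.5]) + rng.standard_normal((N, T))
F = rng.standard_normal((T, r))
M = np.eye(T) - F @ np.linalg.solve(F.T @ F, F.T)
V_hat, beta_i, beta_pooled = v_non(y, X, M)
```

The middle term uses the symmetry of \(\varvec{X}_{i}^{\prime }\varvec{M}_{\hat{F}}\varvec{X}_{i}\), so the result is symmetric and positive semi-definite by construction.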

We provide the asymptotic distribution of the \(\hat{\varvec{\beta }}_{PC}\) estimator in Theorem 2.

Theorem 2

Suppose that Assumption A holds and that \(\varvec{\beta }_{i} = \varvec{\beta } + \varvec{\eta }_{i}\). Then, as \(N,T \rightarrow \infty \),

$$\begin{aligned} \sqrt{N}\left( \hat{\varvec{\beta }}_{PC}-\varvec{\beta }\right) \rightarrow _{d}N\left( 0_{k\times 1},\varvec{\Psi }_{PC}^{-1} \varvec{R}_{1,PC}\varvec{\Psi }_{PC}^{-1}\right) \end{aligned}$$
(21)

where \(\varvec{\Psi }_{PC}=\lim _{N,T\rightarrow \infty }\frac{1}{N}\sum _{i=1}^{N}E\left( \frac{\varvec{V}_{i}^{\prime }\varvec{V}_{i}}{T}\right) \) with \(\varvec{V}_{i}=(\varvec{v}_{i1},\ldots ,\varvec{v}_{iT})^{\prime }\) defined in (33) in Appendix 7, and

$$\begin{aligned} \varvec{R}_{1,PC}=\lim _{N,T\rightarrow \infty }N^{-1}\sum _{i=1}^{N}E\left( \frac{\varvec{V}_{i}^{\prime }\varvec{V}_{i}}{T}\varvec{\eta } _{i}\varvec{\eta }_{i}^{\prime }\frac{\varvec{V}_{i}^{\prime }\varvec{V}_{i}}{T}\right) \end{aligned}$$
(22)

Furthermore,

$$\begin{aligned}{} & {} \hat{\varvec{V}}^{NON}\left( \hat{\varvec{\beta }}_{PC}\right) ^{-1/2}\left( \hat{\varvec{\beta }}_{PC}-\varvec{\beta }\right) \rightarrow _{d}N\left( 0,\varvec{I}_{k}\right) \text { and }\nonumber \\{} & {} \quad \hat{\varvec{V}}^{HAC}\left( \hat{\varvec{\beta }}_{PC}\right) ^{-1/2}\left( \hat{\varvec{\beta }}_{PC}-\varvec{\beta }\right) \rightarrow _{d}N\left( 0,\varvec{I}_{k}\right) . \end{aligned}$$
(23)

It is worth noting that, in the homogeneous case with \(\varvec{\beta }_{i}=\varvec{\beta }\) for all i, the FE estimator is \(\sqrt{N}\)-consistent, whereas the PC estimator can achieve a faster rate of convergence because it completely removes the effect of the unobserved factors asymptotically. Further, the rate of convergence of the FE estimator is also shared by the CCE estimator if the rank condition in Pesaran (2006) does not hold. Such a condition cannot be ascertained but needs to be assumed, in which case the FE and CCE estimators have comparable theoretical properties. Nevertheless, the superiority of the PC estimator does not necessarily extend to its small sample properties, as we examine in the Monte Carlo study below.

Having established that the two versions of the robust estimator can consistently standardise the estimator, we propose to estimate \(Cov\left( \hat{\varvec{\beta }}_{FE}, \hat{\varvec{\beta }}_{PC}\right) \) by (see Footnote 10)

$$\begin{aligned}&\hat{\varvec{C}}^{NON}\left( \hat{\varvec{\beta }}_{FE} ,\hat{\varvec{\beta }}_{PC}\right) \\&\quad =\left( \sum _{i=1}^{N}\varvec{\ddot{X}}_{i}^{\prime }\varvec{\ddot{X}}_{i}\right) ^{-1}\left( \sum _{i=1}^{N}\left( \varvec{\ddot{X}}_{i}^{\prime }\varvec{\ddot{X}}_{i}\right) \left( \hat{\varvec{\beta }}_{FE,i}-\hat{\varvec{\beta }}_{FE}\right) \left( \varvec{\tilde{\beta }}_{PC,i}-\varvec{\tilde{\beta }}_{PC}\right) ^{\prime }\left( \varvec{X}_{i}^{\prime }\varvec{M}_{\hat{F} }\varvec{X}_{i}\right) \right) \\&\qquad \left( \sum _{i=1}^{N}\varvec{X} _{i}^{\prime }\varvec{M}_{\hat{F}}\varvec{X}_{i}\right) ^{-1},\\&\hat{\varvec{C}}^{HAC}\left( \hat{\varvec{\beta }}_{FE} ,\hat{\varvec{\beta }}_{PC}\right) \\&\quad =\left( \sum _{i=1}^{N} \varvec{\ddot{X}}_{i}^{\prime }\varvec{\ddot{X}}_{i}\right) ^{-1}\left( \sum _{i=1}^{N}\varvec{\ddot{X}}_{i}^{\prime }\varvec{\hat{u}}_{FE,i}\hat{\varvec{u}}_{PC,i}^{\prime }\hat{\varvec{X}} _{i}\right) \left( \sum _{i=1}^{N}\varvec{X}_{i}^{\prime }\varvec{M} _{\hat{F}}\varvec{X}_{i}\right) ^{-1}. \end{aligned}$$

Accordingly, we define two operating versions of the Hausman-type statistic by

$$\begin{aligned} H^{NON}= & {} \left( \hat{\varvec{\beta }}_{FE}-\hat{\varvec{\beta }} _{PC}\right) ^{\prime }\left( \hat{\varvec{V}}^{NON}\right) ^{-1}\left( \hat{\varvec{\beta }}_{FE}-\hat{\varvec{\beta }}_{PC}\right) \end{aligned}$$
(24)
$$\begin{aligned} H^{HAC}= & {} \left( \hat{\varvec{\beta }}_{FE}-\hat{\varvec{\beta }} _{PC}\right) ^{\prime }\left( \hat{\varvec{V}}^{HAC}\right) ^{-1}\left( \hat{\varvec{\beta }}_{FE}-\hat{\varvec{\beta }}_{PC}\right) \end{aligned}$$
(25)

where

$$\begin{aligned} \hat{\varvec{V}}^{NON}= & {} \hat{\varvec{V}}^{NON}\left( \varvec{\hat{\beta }}_{FE}\right) +\hat{\varvec{V}}^{NON}\left( \varvec{\hat{\beta }}_{PC}\right) -2\hat{\varvec{C}}^{NON}\left( \varvec{\hat{\beta }}_{FE},\hat{\varvec{\beta }}_{PC}\right) \end{aligned}$$
(26)
$$\begin{aligned} \hat{\varvec{V}}^{HAC}= & {} \hat{\varvec{V}}^{HAC}\left( \varvec{\hat{\beta }}_{FE}\right) +\hat{\varvec{V}}^{HAC}\left( \varvec{\hat{\beta }}_{PC}\right) -2\hat{\varvec{C}}^{HAC}\left( \varvec{\hat{\beta }}_{FE},\hat{\varvec{\beta }}_{PC}\right) \end{aligned}$$
(27)

We provide the main result in the following Theorem.

Theorem 3

Under Assumption A, as \(N,T \rightarrow \infty \),

$$\begin{aligned} H^{j}\rightarrow _{d}\chi _{k}^{2}\ \mathrm{for\ }j=NON,HAC \end{aligned}$$

\(H^{j}\) follows the \(\chi _{k}^{2}\) distribution asymptotically even though, in the homogeneous case, the rate of convergence of the PC estimator is \(\sqrt{NT}\) while the FE estimator is \(\sqrt{N}\)-consistent. This follows from the use of the robust covariance estimators that properly normalise the test statistic, as shown in Appendix 7.
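As a small numerical illustration of how the statistic in (17) is assembled from its components, consider the following Python sketch. The function name `hausman_type` and the input values are hypothetical; the variance of the difference is formed as in the definition of \(\varvec{V}\), subtracting the covariance and its transpose, which coincides with subtracting twice the covariance whenever that matrix is symmetric (in particular when k = 1). The 5% critical values of the \(\chi _{k}^{2}\) distribution are hard-coded for k = 1, 2, 3.

```python
import numpy as np

# 5% critical values of the chi-square distribution for k = 1, 2, 3.
CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815}

def hausman_type(beta_fe, beta_pc, V_fe, V_pc, C):
    """Quadratic form (b_FE - b_PC)' V^{-1} (b_FE - b_PC) (illustrative sketch)."""
    d = np.atleast_1d(np.asarray(beta_fe, dtype=float) - np.asarray(beta_pc, dtype=float))
    C = np.atleast_2d(C)
    # Var(FE - PC) = Var(FE) + Var(PC) - Cov(FE, PC) - Cov(PC, FE)
    V = np.atleast_2d(V_fe) + np.atleast_2d(V_pc) - C - C.T
    return float(d @ np.linalg.solve(V, d))

# Hypothetical scalar example (k = 1); the numbers are made up for illustration.
H = hausman_type(beta_fe=1.2, beta_pc=1.0, V_fe=0.02, V_pc=0.01, C=0.005)
reject = H > CHI2_CRIT_5PCT[1]
```

In this made-up example the variance of the difference is 0.02, so the statistic equals 0.04/0.02 = 2, which is below 3.841 and the null is not rejected at the 5% level.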

Next, notice that our proposed test, (17), is fundamentally different from Bai’s Hausman test, (16), because our null hypothesis, (3), is subsumed under his alternative model with IE, as is clearly demonstrated in (6). Furthermore, Bai’s test will be consistent only if the regressors are correlated with both factors and loadings. Importantly, Bai’s Hausman test will be inconsistent if \(\varvec{x}_{it}\) are uncorrelated with \(\gamma _{i}\) in (1), mainly because the FE estimator is still consistent under (3). This suggests that non-rejection of the null by Bai’s test is not informative, because it cannot distinguish between the panel data model with two-way additive fixed effects only and the model with IE where the regressors are uncorrelated with loadings. See the Online Supplement for the simulation evidence. In the empirical applications below we find that Bai’s Hausman test rarely rejects the null of the additive effects model against the alternative of IE, even though the CD test strongly rejects the null of weak CSD for all the datasets. Such conflicting results may suggest that the regressors are indeed uncorrelated with factor loadings even in panels with IE, which could provide support for the usefulness of our proposed test.

4 Monte Carlo simulations

4.1 Review of previous studies

Westerlund and Urbain (2013) find that the CCE estimator does not perform well in the presence of correlated factor loadings, especially if the full rank condition is not satisfied. Karabiyik et al. (2017) discuss the role of the rank condition in the CCE estimation, and show that the second moment matrix of the estimated factors becomes asymptotically singular if the number of factors is strictly less than the number of dependent and independent variables, invalidating the key arguments commonly applied to establish the asymptotic theory. Westerlund and Urbain (2015) provide a formal comparison between the CCE and PC estimators by employing the same data generating process (DGP; see Footnote 11) and show that the two estimators are asymptotically equivalent only if \(N/T \rightarrow 0\) whereas their asymptotic distributions are no longer equivalent if \(N/T\rightarrow \tau >0\), especially in terms of asymptotic biases.

Though a number of papers have examined the small sample performance of the CCE and PC estimators, we find that only two studies, by Sarafidis and Wansbeek (2012) and Westerlund (2019a), have explicitly analysed the performance of the FE estimator in the presence of CSD. Assuming homogeneous parameters with \(N=100\) and \(T=50\), Sarafidis and Wansbeek (2012) compare the performance of the FE, CCE and PC estimators. If the factor loadings in the equations for y and \(\varvec{x}\) are uncorrelated and the rank condition is satisfied, they find that all three estimators perform well in terms of bias and RMSE. If the factor loadings are correlated, however, the FE estimator is severely biased. The CCE estimator is substantially biased if the rank condition is violated. As expected, the performance of the PC estimator is not significantly affected by the presence of correlated factor loadings.

Recently, Westerlund (2019a) shows that the FE estimator can be consistent even in the presence of IE, because both FE and CCE estimators belong to a class of estimators that satisfy a zero sum restriction. But, he maintains the assumption that factor loadings are uncorrelated in which case he demonstrates that the performance of the FE and CCE estimators is satisfactory.

4.2 Monte Carlo design

We generate the data as follows:

$$\begin{aligned} y_{it}= & {} \beta _{i}x_{it}+\gamma _{1i}f_{1t}+\gamma _{2i}f_{2t}+\varepsilon _{it}, \end{aligned}$$
(28)
$$\begin{aligned} x_{it}= & {} \Gamma _{1i}f_{1t}+\Gamma _{2i}f_{2t}+u_{it}, \end{aligned}$$
(29)

where \(\left( f_{1t},f_{2t},\varepsilon _{it},u_{it}\right) ^{\prime }\) are drawn from the multivariate normal distribution with zero means and covariance matrix, \(\varvec{\Sigma }_{i} = diag \left( \sigma _{f1}^{2},\sigma _{f2}^{2},\sigma _{\varepsilon _{i}}^{2},\sigma _{u_{i}}^{2}\right) \) = \(\varvec{I}_4\). We follow Pesaran (2006) and Westerlund and Urbain (2013), and generate the factor loadings, \(\left( \gamma _{1i},\gamma _{2i}\right) \) and \(\left( \Gamma _{1i},\Gamma _{2i}\right) \) as follows:

  • Experiment 1 with uncorrelated factor loadings and the full rank in which case \(\gamma _{1i}\sim iidN(1,1)\), \(\gamma _{2i}\sim iidN(0,1)\), \(\Gamma _{1i}\sim iidN(0,1)\), \(\Gamma _{2i}\sim iidN(1,1)\) such that \(E\left( \begin{array}{cc} \gamma _{1i} &{}\quad \gamma _{2i}\\ \Gamma _{1i} &{}\quad \Gamma _{2i} \end{array} \right) =\left( \begin{array}{cc} 1 &{}\quad 0\\ 0 &{}\quad 1 \end{array} \right) .\)

  • Experiment 2 with uncorrelated factor loadings and the rank deficiency in which case \(\gamma _{1i}\sim iidN(1,1)\), \(\gamma _{2i} \sim iidN(0,1)\), \(\Gamma _{1i} \sim iidN(1,1)\), \(\Gamma _{2i}\sim iidN(0,1)\), such that \(E\left( \begin{array}{cc} \gamma _{1i} &{}\quad \gamma _{2i}\\ \Gamma _{1i} &{}\quad \Gamma _{2i} \end{array} \right) =\left( \begin{array}{cc} 1 &{}\quad 0\\ 1 &{}\quad 0 \end{array}\right) .\)

  • Experiment 3 with correlated factor loadings and the full rank in which case \(\gamma _{1i}=\gamma _{1}+\upsilon _{1i}\), \(\gamma _{2i}=\gamma _{2} +\upsilon _{2i}\), \(\Gamma _{1i}=\Gamma _{1}+\upsilon _{1i}\), and \(\Gamma _{2i}=\Gamma _{2}+\upsilon _{2i}\) with \(\gamma _{1}=1\), \(\gamma _{2}=0\), \(\Gamma _{1} =2\), \(\Gamma _{2}=1\) and \(\left( \upsilon _{1i},\upsilon _{2i}\right) \sim iidN(0,I_{2})\), such that \(E\left( \begin{array}{cc} \gamma _{1i} &{}\quad \gamma _{2i}\\ \Gamma _{1i} &{}\quad \Gamma _{2i} \end{array} \right) =\left( \begin{array}{cc} 1 &{}\quad 0\\ 2 &{}\quad 1 \end{array} \right) .\)

  • Experiment 4 with correlated factor loadings and the rank deficiency in which case \(\gamma _{1i}\sim iidN(1,1)\), \(\gamma _{2i}\sim iidN(0,1)\), \(\gamma _{1i}=\Gamma _{1i}\) and \(\gamma _{2i}=\Gamma _{2i}\) such that \(E\left( \begin{array}{cc} \gamma _{1i} &{}\quad \gamma _{2i}\\ \Gamma _{1i} &{}\quad \Gamma _{2i} \end{array} \right) =\left( \begin{array}{cc} 1 &{}\quad 0\\ 1 &{}\quad 0 \end{array} \right) \).

We specify the main slope parameter as \(\beta _{i} = 1 + \eta _{i}\), \(\eta _{i} \sim iidN \left( 0,0.04\right) \), and consider all combinations of \(N,T \in \{20,30,50,100,200\}\), setting the number of replications at \(R=1,000\) (see Footnote 12).
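The design in (28) and (29) with the four loading schemes can be sketched in Python as follows. This is an illustrative reading of the design rather than the authors' code: the function name `simulate_panel` and the seed are hypothetical, and \(N(0,0.04)\) is interpreted as a normal distribution with variance 0.04 (standard deviation 0.2), which is an assumption about the notation.

```python
import numpy as np

def simulate_panel(N, T, experiment, rng):
    """Draw one panel from (28)-(29) under Experiments 1-4 (illustrative sketch)."""
    f = rng.standard_normal((T, 2))          # factors f_1t, f_2t
    eps = rng.standard_normal((N, T))        # idiosyncratic errors in y
    u = rng.standard_normal((N, T))          # idiosyncratic errors in x
    if experiment == 1:                      # uncorrelated loadings, full rank
        gam = rng.standard_normal((N, 2)) + [1.0, 0.0]
        Gam = rng.standard_normal((N, 2)) + [0.0, 1.0]
    elif experiment == 2:                    # uncorrelated loadings, rank deficient
        gam = rng.standard_normal((N, 2)) + [1.0, 0.0]
        Gam = rng.standard_normal((N, 2)) + [1.0, 0.0]
    elif experiment == 3:                    # correlated loadings, full rank
        v = rng.standard_normal((N, 2))      # common component upsilon_i
        gam = v + [1.0, 0.0]
        Gam = v + [2.0, 1.0]
    elif experiment == 4:                    # correlated loadings, rank deficient
        gam = rng.standard_normal((N, 2)) + [1.0, 0.0]
        Gam = gam.copy()
    else:
        raise ValueError("experiment must be 1, 2, 3 or 4")
    # Assumption: N(0, 0.04) denotes variance 0.04, i.e. standard deviation 0.2.
    beta = 1.0 + 0.2 * rng.standard_normal(N)
    x = Gam @ f.T + u                        # equation (29)
    y = beta[:, None] * x + gam @ f.T + eps  # equation (28)
    return y, x, gam, Gam, beta

rng = np.random.default_rng(123)
y3, x3, gam3, Gam3, beta3 = simulate_panel(50, 30, experiment=3, rng=rng)
y4, x4, gam4, Gam4, beta4 = simulate_panel(50, 30, experiment=4, rng=rng)
```

In Experiment 3 the two sets of loadings share the same random component, so their difference is constant across units, while in Experiment 4 they coincide exactly.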

4.3 The small sample performance of FE, CCE and PC estimators

We examine the finite sample performance of the following estimators: the two-way fixed effect (FE) estimator, \(\hat{\beta }_{FE}\), the CCE estimator by Pesaran (2006), \(\hat{\beta }_{CCE}\), and the bias-corrected PC estimator proposed by CHNY, \(\hat{\beta }_{PC}\). We consider both pooled and mean group estimators except for \(\hat{\beta }_{PC}\) (see Appendix 7 for details). Notice that consistency of the PC estimator depends crucially upon correctly selecting the number of unobserved factors (Moon and Weidner 2015). To address the uncertainty associated with the selection criteria, we initially consider two information criteria, denoted \(IC_{p1}\) and \(AIC_{1}\), proposed by Bai and Ng (2002). Overall, we find that the PC estimator using \(IC_{p1}\) outperforms that with \(AIC_{1}\), and we only report the results based on \(IC_{p1}\).

We report the following summary statistics:

  • Bias: \(\hat{\beta }_{R}-\beta _{0}\), where \(\beta _{0}\) is a true parameter value and \(\hat{\beta }_{R}=R^{-1}\sum _{r=1}^{R}\hat{\beta }_{r}\) is the mean coefficient across R replications.

  • RMSE: the root mean square error estimated by \(\sqrt{R^{-1} \sum _{r=1}^{R} \left( \hat{\beta }_{r}-\beta _{0}\right) ^{2}}.\)
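The two summary statistics above translate directly into code. The following Python sketch is a minimal illustration; the function name `bias_and_rmse` and the four replication values are hypothetical and are not taken from the simulation tables.

```python
import numpy as np

def bias_and_rmse(estimates, beta0):
    """Bias and RMSE of replication estimates around the true value beta0."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - beta0                     # mean estimate minus truth
    rmse = np.sqrt(np.mean((est - beta0) ** 2))   # root mean square error
    return bias, rmse

# Hypothetical estimates from R = 4 replications with true value 1.
bias, rmse = bias_and_rmse([0.9, 1.1, 1.0, 1.2], beta0=1.0)
```

For these made-up values the mean estimate is 1.05, so the bias is 0.05, and the RMSE is the square root of 0.015.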

Table 1 shows the simulation results for Experiment 1 with the full rank and uncorrelated factor loadings. The biases of all estimators are mostly negligible even in small samples with the FE performing slightly worse than other estimators when \(N=20\). The results for RMSEs display qualitatively similar patterns. RMSEs of CCE and PC estimators are lower than those of the FE and decline as N or T grows. On the other hand, the RMSE of the FE estimator improves only with N. Finally, biases and RMSEs of the pooled and mean group estimators display almost identical patterns. The relative performance of FE, CCE and PC estimators is generally in line with the simulation results reported in Chudik et al. (2011), Sarafidis and Wansbeek (2012) and CHNY.

The important exception is the poor performance of the PC estimator using \(AIC_{1}\) (see Footnote 13). In this case the biases are substantial in small samples and decline only if both N and T become large. Further, the RMSEs are much larger than those of the other estimators and decrease only if N and T are large. This demonstrates how strongly the estimated number of factors influences the PC estimator. Given that the performance of information criteria is highly variable, this is a problematic issue for PC estimators, in which case the FE estimator can offer a practical alternative.

Table 1 Simulation results for Experiment 1 with uncorrelated loadings and the full rank

Table 2 presents simulation results for Experiment 2 where factor loadings are uncorrelated but the rank condition is violated. The performance of the CCE estimator tends to deteriorate slightly: both its bias and RMSE are higher than in the case with the full rank. The performance of the CCE estimator improves slowly and with N only, suggesting that the rank deficiency may slow down its convergence. On the other hand, the bias and the RMSE of the PC and FE estimators do not appear to be affected by the rank deficiency. Finally, we find that the mean group estimator performs slightly better than the pooled estimator in small samples.

Table 3 shows the results for Experiment 3 with correlated loadings and full rank. Now, only the FE estimator is severely biased. The biases of the CCE estimator are not negligible for small N, but its performance improves sharply with N, a finding consistent with Westerlund and Urbain (2013), who note that ‘the problem with correlated loadings goes away if the rank condition is satisfied’. The overall performance of the PC estimator is qualitatively similar to the previous cases, confirming that it remains consistent as both N and T grow.

Table 4 presents the simulation results for Experiment 4 with correlated loadings and the rank deficiency. Both CCE and FE estimators are severely biased, confirming our theoretical prediction that both estimators are inconsistent in the presence of correlated factor loadings, as also discussed in Sarafidis and Wansbeek (2012) and Westerlund and Urbain (2013). On the other hand, the performance of the PC estimator is qualitatively similar to that presented in Table 2.

Overall, our results show that, when the factor loadings are uncorrelated, all the estimators show a similar and satisfactory performance, suggesting that the FE estimator can produce reliable results even in the presence of IE. When factor loadings are correlated, however, the FE estimator becomes severely biased and the performance of the CCE estimator tends to worsen. Only under the full rank condition does the performance of the CCE estimator improve with N. The performance of the bias-corrected PC estimator is qualitatively similar across all four experiments.

Table 2 Simulation results for Experiment 2 with uncorrelated loadings and the rank deficiency
Table 3 Simulation results for Experiment 3 with correlated loadings and the full rank
Table 4 Simulation results for Experiment 4 with correlated loadings and the rank deficiency
Table 5 Size and power of the \(H^{\textrm{NON}}\) statistic and coverage rates at 95% level for heterogeneous \(\beta \)s, \(\beta _{i}=1+\eta _{i}\), \(\eta _{i}\sim iidN(0,0.04)\) and no serial correlation

4.4 The performance of the Hausman-type test statistic

We examine the small sample performance of the H test statistics under the above four experiments, considering all combinations of \(N,T \in \{50,100,150,200,500\}\). To construct the H statistic, we consider the difference between the FE estimator, \(\hat{\varvec{\beta }}_{FE}\), and the bias-corrected PC estimator, \(\hat{\varvec{\beta }}_{PC}\), standardised by each of the two versions of the robust variance estimator, denoted NON and HAC (see Footnote 14). We examine the size and power of the H statistic, and we also report the coverage rates for the three estimators. We consider slope heterogeneity such as \(\beta _{i} = \beta + \eta _{i}\), \(\eta _{i} \sim N(0,0.04)\) and serially correlated errors given by

$$\begin{aligned} \varepsilon _{it} = \rho _{\varepsilon } \varepsilon _{i,t-1} + v_{\varepsilon _{it}} \text { and } u_{it} = \rho _{u} u_{i,t-1} + v_{uit} \text { with } \rho _{\varepsilon } =\rho _{u}=0 \text { or } 0.5, \end{aligned}$$

where \(\left( v_{\varepsilon _{it}}, v_{uit} \right) ^{\prime }\) are drawn from the bivariate normal distribution with zero means and covariance matrix, \( diag \left( \sigma _{v_{\varepsilon i}}^{2}, \sigma _{v_{u i}}^{2}\right) \) = \(\varvec{I}_2\). Hence, we examine the following two cases:

Case 1: Heterogeneous \(\beta \)s and no serial correlation; see Tables 5 and 6.

Case 2: Heterogeneous \(\beta \)s and serial correlation; see Tables 7 and 8.
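The two cases differ only in the autoregressive coefficient of the error processes, which can be illustrated with a short Python sketch. The function name `ar1_errors`, the zero initialisation and the burn-in of 50 periods are assumptions made for illustration; the text above does not state how the processes are initialised.

```python
import numpy as np

def ar1_errors(N, T, rho, rng, burn=50):
    """Simulate e_it = rho * e_{i,t-1} + v_it with standard normal v_it.

    Assumption: the recursion starts from e = v in the first period and
    `burn` initial periods are discarded to reduce dependence on the start.
    Returns the errors and the matching innovations, each of shape (N, T).
    """
    v = rng.standard_normal((N, T + burn))
    e = np.zeros_like(v)
    e[:, 0] = v[:, 0]
    for t in range(1, T + burn):
        e[:, t] = rho * e[:, t - 1] + v[:, t]
    return e[:, burn:], v[:, burn:]

rng = np.random.default_rng(7)
eps_case1, v_case1 = ar1_errors(5, 10, rho=0.0, rng=rng)   # Case 1: no serial correlation
eps_case2, v_case2 = ar1_errors(5, 10, rho=0.5, rng=rng)   # Case 2: rho = 0.5
```

Setting `rho=0.0` reproduces the serially uncorrelated design, so the same function covers both cases.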

Overall, the test performance of the H statistics reported in Tables 5, 6, 7 and 8, is satisfactory and qualitatively similar in terms of the empirical size and power. This confirms that all the estimators are consistent under the null with and without serial correlation. Furthermore, the satisfactory coverage rates revealed by the three estimators demonstrate that both nonparametric and HAC variance estimators are also robust to serial correlation.

Table 6 Size and power of the \(H^{HAC}\) statistic and coverage rates at 95% level for heterogeneous \(\beta \)s, \(\beta _{i}=1+\eta _{i}\), \(\eta _{i}\sim iidN(0,0.04)\) and no serial correlation

In Experiments 1 and 2, the sizes of both \(H^{NON}\) and \(H^{HAC}\) tests approach the nominal level (0.05) in most cases as the sample size rises. The power of the H test is always one under Experiments 3 and 4. In particular, when the regressors are uncorrelated with factor loadings, \(\hat{\varvec{\beta }}_{FE}\) is shown to be consistent and its coverage rate reaches the nominal 95% in Experiments 1 and 2, irrespective of the rank condition. In Experiments 3 and 4, when loadings are correlated with the regressor, however, \(\hat{\varvec{\beta }}_{FE}\) is significantly biased and displays a zero coverage rate. The coverage rates of the bias-corrected PC estimator tend to 95% under all four experiments (see Footnote 15).

We have also considered the cases with homogeneous \(\beta \)’s and obtained qualitatively similar results, which are reported in the Online Supplement.

4.5 The pretest estimator

The estimated number of factors can influence the performance of the PC estimator considerably, and this issue needs to be handled carefully. The previous literature has not provided clear evidence on what is the best course of action to choose the number of factors. In this regard, we propose a pretest estimator which is constructed as follows. The pretest estimator, denoted \(\hat{\beta }_{pretest}\), selects either the FE or the PC estimator depending on the Hausman-type test results. To be more specific, we first evaluate the \(H^{NON}\) and \(H^{HAC}\) statistics. If the null hypothesis, (3) is not rejected, then we select \(\hat{\beta }_{pretest} =\hat{\beta }_{FE}\) while, if the null is rejected, we set \(\hat{\beta }_{pretest} =\hat{\beta }_{PC}\).
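The selection rule just described amounts to a simple decision function, sketched below in Python. The function name `pretest_estimator` and the numerical inputs are hypothetical, and the default critical value of 3.841 is the 5% critical value of the \(\chi _{1}^{2}\) distribution, which is an assumed choice of significance level for a single regressor.

```python
def pretest_estimator(beta_fe, beta_pc, h_stat, crit=3.841):
    """Return the FE estimate if the null is not rejected, else the PC estimate.

    Assumption: `crit` is the chi-square critical value at the chosen level;
    the default corresponds to 5% with one regressor (k = 1).
    """
    return beta_fe if h_stat <= crit else beta_pc

# Hypothetical values: a small statistic keeps FE, a large one switches to PC.
keep_fe = pretest_estimator(beta_fe=1.02, beta_pc=0.98, h_stat=1.3)
use_pc = pretest_estimator(beta_fe=1.60, beta_pc=1.01, h_stat=25.0)
```

The rule is applied separately for each version of the statistic, so \(H^{NON}\) and \(H^{HAC}\) can in principle select different estimators in a given sample.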

In the Online Supplement we have examined the finite sample performance of this pretest estimator under the same four experiments considered above. Its overall performance is satisfactory in terms of bias and RMSE, irrespective of whether factor loadings are correlated or not. This suggests that such an estimator has considerable potential as it alleviates the issue of selecting the number of factors, especially in the case where the regressors are found to be uncorrelated with factor loadings in practice.

5 Empirical applications

We investigate the empirical relevance of the null hypothesis of no correlation between the regressors and factor loadings by applying our proposed statistic \(H^{HAC}\), defined in (25), to fourteen datasets (see Footnote 16). The details of the data and the empirical specifications are provided in Appendix 8.

Table 7 Size and power of the \(H^{NON}\) statistic and coverage rates at 95% level for heterogeneous \(\beta \)s, \(\beta _{i}=1+\eta _{i}\), \(\eta _{i}\sim iidN(0,0.04)\) and serial correlation, \(\varepsilon _{it}=\rho _{\varepsilon }\varepsilon _{i,t-1}+v_{\varepsilon it}\), \(u_{it}=\rho _{u}u_{i,t-1}+v_{uit}\), \(\rho _{\varepsilon }=\rho _{u}=0.5\)

The Cobb–Douglas production function. The first application comprises five different cases: the OECD members (\(N=26\), \(T=41\), Mastromarco et al. 2016), the 20 Italian regions (\(N=20\), \(T=21\)), the 48 U.S. States (\(N=48\), \(T=17\)) and the aggregate sectorial data for manufacturing from developed and developing countries (\(N=25\), \(T=25\)). Following the economic growth literature, we estimate the Cobb–Douglas production function by the FE and PC estimators and then apply our proposed Hausman-type test. For the OECD, output is measured by per capita GDP while the regressor is the capital-labour ratio. For the Italian regions, output is the per capita value added, while for the U.S. application output is the per capita gross State product, with the same regressor. In the fourth application, output is measured as the aggregated manufacturing sector value added of OECD countries, see Eberhardt and Teal (2019). In the fifth application, the production function is augmented by the R&D stock expenditure, and output is the aggregate sectorial value added for manufacturing, see Eberhardt et al. (2013).

The gravity model of bilateral trade flows. Next, we consider the estimation of a gravity model of the bilateral trade flows for the EU14 countries, counting \(N=91\) pairs from 1960 to 2008 (\(T=49\)). Here, we follow Serlenga and Shin (2007) and estimate the gravity panel data regression, in which the bilateral trade flow is set as a function of GDP, countries’ similarity, relative factor endowment, the real exchange rate as well as the trade union and common currency dummies.

The gasoline demand function. This application aims at estimating the price and income elasticity of gasoline demand. In particular, we focus on estimating the demand function for gasoline using the data from Liu (2014), which contains quarterly data for the 50 States in the U.S. over the period 1994–2008 (\(N=50\), \(T=60\)).

Housing prices. We estimate the income elasticity of real housing prices from 1975 to 2010. We consider two datasets: the first, from Holly et al. (2010), covers the 49 U.S. States (\(N=49\), \(T=36\)) while the second covers the 384 Metropolitan Statistical Areas (\(N=384\), \(T=36\)) obtained from Baltagi and Li (2014).

Technological spillovers on productivity. We consider two applications. First, we estimate the effects of domestic and foreign R&D on TFP, controlling for human capital. We use a balanced panel of 24 OECD countries over the period 1971–2004 (\(N=24\) and \(T=34\)), see Coe et al. (2009) and Ertur and Musolesi (2017). In the second application we explore the channels through which technological investments affect the productivity performance of industrialised economies by estimating the productivity effects of R&D and Information and Communication Technologies (ICT), controlling for the accumulation of inputs such as labour and (non-ICT) capital for OECD industries. We use a balanced panel of 49 high-tech industries over the period 1977–2006 (\(N=53\) and \(T=30\)) from Pieri et al. (2018).

Health care expenditure and income. We estimate the relationship between healthcare expenditure and income after controlling for public expenditure over total health expenditure. We consider a panel of 167 countries covering the period 1995–2012 (\(N=167\) and \(T=18\)), see Baltagi et al. (2017).

Demographic and business cycle volatility. We estimate the impact of the age composition of the labor force on business cycle volatility. We employ a balanced panel dataset for 51 countries over the period 1957–2000 (\(N=51\) and \(T=44\)) provided by Everaert and Vierke (2016).

Carbon emissions and trade. We explore the nexus between carbon emissions and trade using a balanced panel of 32 OECD countries over the period 1990–2013 (\(N=32\) and \(T=24\)), see Liddle (2018).

In Table 9, we present the estimation and test results. First of all, the test results by \(H^{HAC}\) provide surprisingly convincing evidence that the null hypothesis of the regressors being uncorrelated with factor loadings is not rejected (even at the 1% significance level) in thirteen out of fourteen datasets considered (see Footnote 17). We also report the results for the CD test proposed by Pesaran (2015), which tests the null of no (weak) CSD against the alternative of strong CSD, together with the Hausman test proposed by Bai (2009), \(H_{B}\) in (16), and the Hausman test proposed by Westerlund (2019b), \(H_{W}\), both of which test the null of additive effects against the alternative of IE. The CD test strongly rejects the null hypothesis for all the datasets, whilst \(H_{B}\) rejects the null hypothesis of the additive-effects model only once, at the 10% significance level, and the \(H_{W}\) test rejects three times. These test results are rather in conflict, since the former suggests the presence of CSD while the latter suggest no IE. As highlighted in Sect. 2, however, the rejection of the CD test does not always imply that the FE estimator is biased in panels with IE. Further, in Sect. 3, we show that the \(H_{B}\) test has no power against the alternative model with IE, especially if the regressors are uncorrelated with factor loadings. Indeed, such conflicting results can provide support for our main test results that the regressors are indeed uncorrelated with factor loadings in the panels with IE.

Table 8 Size and power of the \(H^{HAC}\) statistic and coverage rates at 95% level for heterogeneous \(\beta \)s, \(\beta _{i}=1+\eta _{i}\), \(\eta _{i}\sim iidN(0,0.04)\) and serial correlation, \(\varepsilon _{it}=\rho _{\varepsilon }\varepsilon _{i,t-1}+v_{\varepsilon it}\), \(u_{it}=\rho _{u}u_{i,t-1}+v_{uit}\), \(\rho _{\varepsilon }=\rho _{u}=0.5\)

Next, we turn to the slope estimates provided by both FE and PC estimators, and find that they are mostly significant. Their magnitudes and signs are relatively similar to each other, and consistent with theoretical predictions. The only exception is reported in the gravity model of international trade (see Footnote 18).

Combining all the above test and estimation results, we conclude that the regressors are uncorrelated with factor loadings in many cross-sectionally correlated panels with IE in practice. In this situation, the FE estimator is consistent. We emphasise that the FE estimator is unaffected by the complications that arise from incorrectly selecting the number of unobserved factors, which would significantly affect the performance of PC estimators (Moon and Weidner 2015), and from employing inconsistent initial estimates, which may not guarantee the convergence of the iterative PC estimator (Hsiao 2018). This suggests that the FE estimator can still be of considerable applicability in a wide variety of cross-sectionally correlated panel data with IE, especially if the regressors are found to be uncorrelated with factor loadings, the validity of which can be easily verified by our proposed test.

6 Conclusions

A large strand of the literature on panel data has focused on analysing CSD, based on the error components model with IE, which is implicitly understood to bias the conventional two-way FE estimator, due to the potential endogeneity arising from the correlation between regressors and factors/loadings. Two main approaches have been advocated to deal with this issue: the CCE estimator by Pesaran (2006) and the PC estimator by Bai (2009).

Table 9 Empirical applications to fourteen different datasets

In this paper we have shown that the panel data model with IE can be encompassed by the standard two-way error components model if the regressors are correlated with factors but uncorrelated with the loadings. This suggests that the null hypothesis of no correlation between the regressors and factor loadings emerges as an influential but under-appreciated feature of the panel data model with IE. We propose the Hausman-type test, which follows the \(\chi ^{2}\) distribution asymptotically under the null hypothesis. Monte Carlo simulation results confirm that the size and the power of the proposed test are quite satisfactory even in small samples.

Finally, we apply the proposed tests to a number of existing panel datasets, and find strong evidence in favor of the regressors being uncorrelated with factor loadings in thirteen of fourteen datasets. In this situation, the FE estimator would provide a simple and robust estimation strategy in practice by avoiding nontrivial computational issues associated with the PC estimator, the performance of which relies crucially upon applying complex bias corrections and using reliable information criteria that correctly select the number of unobserved factors.

We conclude by noting a couple of avenues for future research. A natural but challenging extension is to develop an LM-type test which does not require us to compute the PC estimator at all. Next, it is worthwhile to develop the Hausman-type test in the dynamic heterogeneous panel data model with IE.