Testing for individual and time effects in unbalanced panel data models with time-invariant regressors

Abstract: In this paper, we use a moment-based method to test for the existence of individual and time effects in unbalanced panel data models with time-invariant regressors. Three test statistics are proposed, based on the difference between two variance estimators of the idiosyncratic errors. The test statistic for the individual (time) effect is robust to the presence of the time (individual) effect and to correlation between the explanatory variables and the individual or time effect. Additionally, the tests do not require prior distributional assumptions on the error term. The asymptotic properties of the estimators and the test statistics are given. Monte Carlo simulations show that the test statistics have good power in finite samples in various situations, and a real example is studied for illustration.


Introduction
Panel data models have been widely used in economics and finance, as they provide more information than time series or cross-section models [1][2][3][4]. They can also alleviate the omitted-variable problem by adding individual and time effects that represent the heterogeneity and commonality of units, respectively. It is therefore important to test whether individual and time effects are present in the model; see, e.g., Breusch and Pagan [5], Honda [6], Baltagi and Li [7], Bera et al. [8] and Baltagi et al. [9].
If the model is misspecified, for instance because it contains components that should not be present, the estimators will be inconsistent and statistical inference will be unreliable. For example, when the true model is a linear panel data model with an individual effect but the fitted linear panel data model is incorrectly specified, the biased estimates of the regression parameters will mislead policy makers. This motivates us to test whether individual and time effects exist in the model.
Many scholars have studied tests for the existence of random effects. Breusch and Pagan [5] proposed the Lagrange multiplier (LM) test under the assumption of normally distributed errors; Honda [6] found that the LM test is also robust to non-normality and proposed two one-way tests based on it. Baltagi and Li [7] extended the LM test to unbalanced panels. Bera et al. [8] used the LM approach to test the individual effect when the idiosyncratic errors are autocorrelated. However, the models in these studies contain only one effect in the error term, so that, for example, a test for the individual effect will be distorted if a time effect exists. Baltagi et al. [9] used the LM test in a model in which both effects may exist. But the LM test requires normality and no correlation among the explanatory variables, the random effects and the error term. When the individual or time effect is correlated with the explanatory variables, estimation by ordinary least squares (OLS) or generalized least squares (GLS) is biased, and such correlation is common in real econometric problems. Wu and Li [10] proposed a test for individual and time effects based on moment estimation that makes no assumptions about the distribution of the error terms or the independence between the random effects and the explanatory variables. The main idea is to construct the difference between two variance estimators of the error term. One of them is consistent regardless of whether there is an individual or time effect. The other is consistent only under the null hypothesis.
Therefore, the difference between them can be used to construct the test statistics: they are close to zero under the null hypothesis and large under the alternative. Moreover, the test for one effect is robust to the presence of the other effect.
In real economic applications, when factors such as age and race are considered, time-invariant variables should be included in the regression model. Pesaran and Zhou [11], Sebastian and Claudia [12] and many other scholars studied panel data models with time-invariant regressors. Building on Wu and Li [10], Chen et al. [13] proposed individual and time effect tests for the two-way error component panel model with time-invariant regressors, combined with the estimation method of Pesaran and Zhou [11] for the case in which the time-invariant variables are endogenous. However, previous literature has not studied these tests for an unbalanced panel with time-invariant variables.
Due to the complex external environment, panel data are often unbalanced or incomplete, and the data may be missing randomly or non-randomly. To reach precise conclusions, it is necessary to study estimation and statistical inference directly for the unbalanced panel. In unbalanced panel models, the estimation and test methods differ according to whether the model contains one or two effects and according to the type of missing data. For example, following the testing idea of Wu and Li [10], if the data are monotonically missing (the data of an individual are missing after a certain time point) and the model is a linear panel model with only an individual effect, the test of Wu and Li [10] can still be used in the unbalanced panel [14]. However, if a time effect also exists in the model, it is necessary to classify the cross-section individuals with the same missing type into one group, because the common time effect can only be removed within a group when OLS or other estimation methods are used to obtain accurate estimators. Besides, the interference of the individual (time) effect should be avoided when constructing the test statistics [15]. If the data are missing randomly, the cross-sectional units with the same missing type again need to be grouped together, so that individuals can be differenced in pairs to eliminate the time effect [16]. To sum up, many scholars have studied tests for parametric models in unbalanced panels, but the test for individual and time effects in an unbalanced panel with time-invariant regressors has not been studied. Therefore, we use a moment-based method to test for individual and time effects in the unbalanced panel data model with time-invariant regressors.
The contribution of this paper has two aspects. First, considering the difficulty of obtaining a complete balanced panel sample, we study a two-way error component model with time-invariant regressors for an unbalanced panel, and give an effective estimation method. Second, following the method of Wu and Li [10], we build moment-based test statistics for individual and time effects in the unbalanced panel data model by using the difference of two variance estimators of the error term. The test for the individual (time) effect is robust to the existence of the time (individual) effect and to correlation between the explanatory variables and the individual or time effect. Furthermore, no distributional assumption on the error is needed in advance. The simulations demonstrate that the power of the test statistics is high in various situations.
The rest of this paper is organized as follows. In Section 2, we introduce the basic model and the estimation process. Tests for individual and time effects are presented in Sections 3 and 4, respectively. Section 5 gives the joint test for the two effects in the basic model, and Section 6 contains Monte Carlo simulations and a real example. Section 7 concludes. The proofs of the relevant theorems are presented in the Appendix.

Model and estimation
Consider a two-way error component panel data model with time-invariant regressors,

$$y_{it} = \alpha + X_{it}'\beta + Z_i'\gamma + v_{it}, \qquad v_{it} = \mu_i + \eta_t + \varepsilon_{it},$$

where $y_{it}$ is the response variable for unit $i$ at time $t$, $X_{it} = (X_{it,1}, \cdots, X_{it,p})'$ is a $p \times 1$ vector of time-varying explanatory variables, $X_{it}'$ is the transpose of $X_{it}$, and $\beta$ is a $p \times 1$ parameter vector. $Z_i$ is an $m \times 1$ vector of time-invariant exogenous variables and $\gamma$ is the corresponding parameter vector. The error term $v_{it}$ contains the unobservable individual effect $\mu_i$ and time effect $\eta_t$, which are both random effects. $\mu_i$ is assumed to be independent and identically distributed with mean zero and finite variance $\sigma_\mu^2$. $\eta_t$, like $\mu_i$, has mean zero and finite variance $\sigma_\eta^2$. The idiosyncratic error $\varepsilon_{it}$ is assumed to be independent and identically distributed with mean zero and variance $\sigma^2$.
It is difficult to obtain complete sample data because of factors such as migration, so in applications we often have unbalanced panels. The traditional methods for balanced panels are not suitable for unbalanced panel data models, and other methods are needed for incomplete data. It is worth noting that, in an unbalanced panel, the usual centering transformation applied to all units does not remove the time effect completely, because the individuals are observed over different time lengths $T$. Following the grouping method of Wu et al. [15], we therefore treat each group of units with the same observed periods as a balanced panel and solve the problem by centering within each group.
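The grouping idea can be illustrated with a minimal Python sketch. The function name `group_by_pattern` and the toy dictionary `observed` are hypothetical and are not part of the paper; the sketch only assumes that each unit comes with the list of periods in which it is observed.

```python
from collections import defaultdict

def group_by_pattern(observed):
    """Group units whose observed time periods are identical.

    observed: dict mapping a unit id to an iterable of observed periods.
    Returns a dict mapping each observation pattern (a sorted tuple of
    periods) to the list of units sharing that pattern.
    """
    groups = defaultdict(list)
    for unit, periods in observed.items():
        groups[tuple(sorted(periods))].append(unit)
    return dict(groups)

# Toy example (hypothetical data): three units observed in periods 1-4,
# two units observed only in periods 1-2.
observed = {"a": [1, 2, 3, 4], "b": [1, 2, 3, 4], "c": [1, 2],
            "d": [2, 1], "e": [1, 2, 3, 4]}
groups = group_by_pattern(observed)

# Total number of observations: sum over groups of n_l * T_l.
N = sum(len(units) * len(pattern) for pattern, units in groups.items())
```

Within each resulting group the panel is balanced, so cross-sectional centering can be applied group by group.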
Divide the $n$ cross-section units into $L$ disjoint groups $N_1, \cdots, N_L$ such that the observed time periods are identical for each $i \in N_l$, $l = 1, 2, \cdots, L$. There are $n_l$ units with $T_l$ time periods in group $N_l$. The total sample size is $N = \sum_{l=1}^{L} n_l T_l$. For each group $N_l$,

$$y_{l_i} = \iota_{T_l}\alpha + X_{l_i}\beta + (\iota_{T_l} \otimes Z_i')\gamma + \iota_{T_l}\mu_{l_i} + \eta_l + \varepsilon_{l_i}, \qquad l_i \in N_l,\ i = 1, \cdots, n_l,$$

where $y_{l_i} = (y_{l_i,t_{l,1}}, y_{l_i,t_{l,2}}, \cdots, y_{l_i,t_{l,T_l}})'$, $\iota_{T_l}$ is a vector of ones with dimension $T_l$, and the other right-hand-side variables are stacked accordingly.
Next, we estimate the unbalanced panel data model. First, we estimate $\beta$ by making a centering transformation in each group to remove the time effect,

$$\tilde y_{l_i} = \tilde X_{l_i}\beta + (\iota_{T_l} \otimes \tilde Z_i')\gamma + \iota_{T_l}\tilde\mu_{l_i} + \tilde\varepsilon_{l_i}, \tag{2.3}$$

where $\tilde y_{l_i} = y_{l_i} - \frac{1}{n_l}\sum_{i=1}^{n_l} y_{l_i}$, and $\tilde X_{l_i}$, $\tilde Z_i$, $\tilde\mu_{l_i}$ and $\tilde\varepsilon_{l_i}$ are defined similarly. We can find a matrix $Q_{T_l}$ such that $(T_l^{-1/2}\iota_{T_l}, Q_{T_l})$ is a $T_l \times T_l$ orthogonal matrix (e.g., Wu and Li [10]). Denote the $j$th column vector of $Q_{T_l}$ by $q_{lj} = (q_{lj1}, \cdots, q_{ljT_l})'$, $l = 1, \cdots, L$, $j = 1, \cdots, T_l - 1$. Then, multiplying model (2.3) by $Q_{T_l}'$, we have

$$Q_{T_l}'\tilde y_{l_i} = Q_{T_l}'\tilde X_{l_i}\beta + Q_{T_l}'\tilde\varepsilon_{l_i}, \qquad l_i \in N_l,\ i = 1, \cdots, n_l, \tag{2.4}$$

where $Q_{T_l}'\iota_{T_l} = 0$, so the time-invariant regressors and the individual effect are also removed. We can use OLS to obtain the consistent estimator of $\beta$,

$$\hat\beta = \Big(\sum_{l=1}^{L}\sum_{i=1}^{n_l} \tilde X_{l_i}' Q_{T_l} Q_{T_l}' \tilde X_{l_i}\Big)^{-1} \sum_{l=1}^{L}\sum_{i=1}^{n_l} \tilde X_{l_i}' Q_{T_l} Q_{T_l}' \tilde y_{l_i}. \tag{2.5}$$

Under some mild conditions, including $|\Sigma_1| > 0$, and as $n \to \infty$ with the time periods fixed, $\hat\beta$ has an asymptotic normal distribution, in which $m_l = \lim_{n\to\infty} n_l/n$ denotes the limiting share of units in group $l$; see Shao et al. [25]. Note that the result is for a large number of individuals and a short time period.
The proof of the asymptotic distribution of $\hat\beta$ follows Wu and Li [10], since the time-invariant variables are removed by the transformation.
Second, we estimate $\gamma$. Pesaran and Zhou [11] used a filtered method to obtain the estimator of $\gamma$ by regressing $\hat u_{it}$ on $Z_i$, and noted that the estimator is still consistent in the unbalanced panel data model. The details of the estimating procedure are as follows. For each group $N_l$, the average of $u_{it}$ over time is $\bar u_{l_i} = T_l^{-1}\sum_{t=1}^{T_l} u_{it}$, and $u_{it}$ can be estimated by $\hat u_{it} = y_{it} - X_{it}'\hat\beta$. Next, we make the centering transformation over the cross-section units in each group. The estimator $\hat\gamma$ in Eq (2.8) is then obtained by the OLS regression of the centered $\hat{\bar u}_{l_i}$ on the centered $Z_i$. As the identifying assumption for $\gamma$, the time-invariant variables are not correlated with the individual effect in this paper; endogenous regressors can be considered in future work.
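The first estimation step (centering within each group, multiplying by $Q_{T_l}'$, then pooled OLS) can be sketched in Python with NumPy. This is a minimal illustration and not the authors' code: the Helmert-type construction of $Q_{T_l}$ is one convenient choice among many, and the function names, the array layout and the toy data are all assumptions made here. For simplicity the toy data draw separate time effects per group.

```python
import numpy as np

def helmert_q(T):
    """Return a T x (T-1) matrix Q with orthonormal columns and Q' iota = 0."""
    Q = np.zeros((T, T - 1))
    for j in range(1, T):
        Q[:j, j - 1] = 1.0
        Q[j, j - 1] = -j
        Q[:, j - 1] /= np.sqrt(j * (j + 1))
    return Q

def estimate_beta(groups):
    """Pooled OLS on the centered, Q-transformed data.

    groups: list of (y, X) with y of shape (n_l, T_l) and X of shape
    (n_l, T_l, p), one entry per group of units sharing the same periods.
    """
    A, b = 0.0, 0.0
    for y, X in groups:
        Q = helmert_q(y.shape[1])
        y_c = y - y.mean(axis=0, keepdims=True)   # centre over units: removes eta_t
        X_c = X - X.mean(axis=0, keepdims=True)
        y_q = y_c @ Q                             # row i holds (Q' y_i)'
        X_q = np.einsum("itp,tj->ijp", X_c, Q)    # Q' X_i for every unit
        Xm = X_q.reshape(-1, X_q.shape[2])
        A = A + Xm.T @ Xm
        b = b + Xm.T @ y_q.reshape(-1)
    return np.linalg.solve(A, b)

# Toy data (hypothetical): two groups with T_l = 4 and T_l = 8.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, 1.0])
groups = []
for n_l, T_l in [(30, 4), (30, 8)]:
    X = rng.normal(size=(n_l, T_l, 2))
    mu = rng.normal(size=(n_l, 1))       # individual effect
    eta = rng.normal(size=(1, T_l))      # time effect within the group
    eps = 0.1 * rng.normal(size=(n_l, T_l))
    groups.append((X @ beta_true + mu + eta + eps, X))
beta_hat = estimate_beta(groups)
```

Because the individual and time effects are removed exactly by the two transformations, `beta_hat` depends only on the idiosyncratic noise in this toy setting.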
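The second step can be sketched as follows. This is a hedged illustration of the filtered regression described above, assuming that both the time-averaged residuals and $Z_i$ are centered within each group before a pooled OLS regression; the function name `estimate_gamma`, the data layout and the toy data are hypothetical, and the true $\beta$ is plugged in for $\hat\beta$ only to keep the example short.

```python
import numpy as np

def estimate_gamma(groups, beta_hat):
    """Regress centered time-averaged residuals on centered Z, pooled over groups.

    groups: list of (y, X, Z) with y (n_l, T_l), X (n_l, T_l, p), Z (n_l, m).
    """
    A, b = 0.0, 0.0
    for y, X, Z in groups:
        u_bar = (y - X @ beta_hat).mean(axis=1)   # time average of u_hat_it
        u_c = u_bar - u_bar.mean()                # centre over units in the group
        Z_c = Z - Z.mean(axis=0, keepdims=True)
        A = A + Z_c.T @ Z_c
        b = b + Z_c.T @ u_c
    return np.linalg.solve(A, b)

# Toy data (hypothetical): Z is exogenous, i.e. independent of mu.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, 1.0])
gamma_true = np.array([1.0, 1.0])
groups = []
for n_l, T_l in [(200, 4), (200, 8)]:
    X = rng.normal(size=(n_l, T_l, 2))
    Z = rng.normal(size=(n_l, 2))
    mu = 0.5 * rng.normal(size=(n_l, 1))
    eta = rng.normal(size=(1, T_l))
    eps = rng.normal(size=(n_l, T_l))
    y = X @ beta_true + (Z @ gamma_true)[:, None] + mu + eta + eps
    groups.append((y, X, Z))
gamma_hat = estimate_gamma(groups, beta_true)
```

Centering within the group removes the intercept and the time average of the time effect, so only $\mu_{l_i}$ and the averaged idiosyncratic error remain as noise in the regression.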
Assumption 2.1. The error term $\varepsilon_{it}$ is cross-sectionally uncorrelated and uncorrelated with the explanatory variables $X_{it}$ for $i = 1, \cdots, n_l$ and $t = 1, \cdots, T_l$: $E(\varepsilon_{it}\varepsilon_{js} \mid X, Z) = 0$ for all $t, s$ and $i \neq j$, and $E(\varepsilon_{it} X_{it,p}) = 0$ for $i$, $j$ and $p$.
Assumption 2.2. The errors $\varepsilon_{it}$ can be heterogeneous, with $E(\varepsilon_{it}\varepsilon_{is}) = r_i(t, s)$, where $r_i(t, t) = r_i^2$ and $r_i(t, s) < K$ for a nonzero constant $K < \infty$.
Assumption 2.3. The time-invariant variable $Z_{l_i}$ is uncorrelated with the individual effect $\mu_{l_i}$ and the error term $\bar\varepsilon_{l_i}$, and $\mu_{l_i}$ and $\bar\varepsilon_{l_i}$ are independently distributed. The fourth moment of $Z_i$ is finite.
These assumptions are similar to those of Pesaran and Zhou [11].

Test for individual effect
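In this section, we construct a test statistic for the individual effect in model (2.2). The hypotheses are

$$H_0^\mu: \sigma_\mu^2 = 0, \qquad H_1^\mu: \sigma_\mu^2 > 0.$$

After the transformation of the original model, we obtain model (2.4) and the moment condition for $Q_{T_l}'\tilde\varepsilon_{l_i}$,

$$\sum_{l=1}^{L}\sum_{i=1}^{n_l} E\big(\tilde\varepsilon_{l_i}' Q_{T_l} Q_{T_l}' \tilde\varepsilon_{l_i}\big) = c_1\sigma^2,$$

where $c_1 = \sum_{l=1}^{L} c_{1l}$ with $c_{1l} = (n_l - 1)(T_l - 1)$. Then, the estimator of $\sigma^2$ is

$$\hat\sigma_0^2 = \frac{1}{c_1}\sum_{l=1}^{L}\sum_{i=1}^{n_l} \big(\tilde y_{l_i} - \tilde X_{l_i}\hat\beta\big)' Q_{T_l} Q_{T_l}' \big(\tilde y_{l_i} - \tilde X_{l_i}\hat\beta\big),$$

where $\hat\beta$ is obtained from Eq (2.5). $\hat\sigma_0^2$ is a consistent estimator of $\sigma_0^2$ under both the null and the alternative hypothesis, regardless of the existence of $\mu_i$ and $\eta_t$ (see Wu and Li [10]).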
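To see where the constant $c_{1l} = (n_l - 1)(T_l - 1)$ comes from, the following short derivation may help. It is a sketch under a simplifying assumption made only for this illustration, namely that the errors are homoskedastic and uncorrelated across units and time with variance $\sigma^2$; the paper allows more general errors.

```latex
% Centered error of unit i in group l (illustrative, homoskedastic case):
\tilde\varepsilon_{l_i} = \varepsilon_{l_i} - \frac{1}{n_l}\sum_{j=1}^{n_l}\varepsilon_{l_j},
\qquad
E\big(\tilde\varepsilon_{l_i}\tilde\varepsilon_{l_i}'\big)
  = \Big(1 - \frac{1}{n_l}\Big)\sigma^2 I_{T_l}.

% Q_{T_l} has T_l - 1 orthonormal columns, so tr(Q_{T_l}' Q_{T_l}) = T_l - 1:
E\big(\tilde\varepsilon_{l_i}' Q_{T_l} Q_{T_l}' \tilde\varepsilon_{l_i}\big)
  = \operatorname{tr}\!\big(Q_{T_l}' E(\tilde\varepsilon_{l_i}\tilde\varepsilon_{l_i}') Q_{T_l}\big)
  = \Big(1 - \frac{1}{n_l}\Big)(T_l - 1)\sigma^2.

% Summing over the n_l units of group l:
\sum_{i=1}^{n_l} E\big(\tilde\varepsilon_{l_i}' Q_{T_l} Q_{T_l}' \tilde\varepsilon_{l_i}\big)
  = (n_l - 1)(T_l - 1)\sigma^2 = c_{1l}\,\sigma^2.
```

The factor $n_l - 1$ reflects the degree of freedom lost by centering over units, and $T_l - 1$ the one lost by projecting out $\iota_{T_l}$.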
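We can construct an estimator $\hat\lambda_4$ of the fourth-order moment $\lambda_4$ of the error, following Wu and Zhu [17] and Wu et al. [15]. This estimator is consistent under some mild conditions. Next, we need a consistent estimator of $\sigma^2$ under the null hypothesis. Under $H_0^\mu$, the original model (2.2) becomes

$$y_{l_i} = \iota_{T_l}\alpha + X_{l_i}\beta + (\iota_{T_l} \otimes Z_i')\gamma + \eta_l + \varepsilon_{l_i}.$$

We only need to remove the time effect by centering in each group, and the variance estimator $\hat\sigma_1^2$ is computed from the centered residuals $\tilde y_{l_i} - \tilde X_{l_i}\hat\beta - (\iota_{T_l} \otimes \tilde Z_i')\hat\gamma$, where $\hat\gamma$ is obtained in Eq (2.8). However, $\hat\sigma_1^2$ is consistent only under $H_0^\mu$ (see Wu and Li [10]), while $\hat\sigma_0^2$ is consistent under both $H_0^\mu$ and $H_1^\mu$. Therefore, following Hausman [18], we can construct a test statistic $T_\mu$ based on the difference $\hat\sigma_1^2 - \hat\sigma_0^2$, where $\hat\omega_n = a_n\hat\lambda_4 + b_n(\hat\sigma_0^2)^2$ is used to standardize the test statistic. $T_\mu$ will be close to zero under $H_0^\mu$ but large under $H_1^\mu$.
The required assumptions (Assumption 3.1), which include zero mean and finite variance conditions, follow Wu et al. [15].
In the limiting distribution of Theorem 3.1, $\Phi_n = a_n\lambda_4 + b_n(\sigma_0^2)^2$, $a = \lim_{n\to\infty} a_n$ and $b = \lim_{n\to\infty} b_n$. The proof of the theorem is in the Appendix.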
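The behaviour of the two variance estimators can be checked numerically. The sketch below is illustrative only: it uses the identity $Q_{T}Q_{T}' = I_T - \iota_T\iota_T'/T$ (which follows from the orthogonality of $(T^{-1/2}\iota_T, Q_T)$), plugs in the true $\beta$ and $\gamma$ for brevity, and assumes the normalizer $\sum_l (n_l - 1)T_l$ for $\hat\sigma_1^2$, which is not stated in the text. The standardization by $\hat\omega_n$ is not reproduced, so the sketch shows only the raw difference, not $T_\mu$ itself. All names and the toy design are hypothetical.

```python
import numpy as np

def simulate(sigma_mu, rng, sizes=((200, 4), (200, 8))):
    """Toy unbalanced panel with beta = gamma = 1; returns (y, X, Z) per group."""
    groups = []
    for n_l, T_l in sizes:
        X = rng.normal(size=(n_l, T_l, 1))
        Z = rng.normal(size=(n_l, 1))
        mu = sigma_mu * rng.normal(size=(n_l, 1))
        eta = rng.normal(size=(1, T_l))
        eps = rng.normal(size=(n_l, T_l))
        groups.append((X[:, :, 0] + Z + mu + eta + eps, X, Z))
    return groups

def sigma0_sq(groups, beta):
    """Estimator that stays consistent whether or not mu and eta are present."""
    num, c1 = 0.0, 0
    for y, X, _ in groups:
        n, T = y.shape
        r = y - X @ beta
        r = r - r.mean(axis=0, keepdims=True)   # centre over units (removes eta)
        r = r - r.mean(axis=1, keepdims=True)   # apply Q Q' (removes mu and Z)
        num += (r ** 2).sum()
        c1 += (n - 1) * (T - 1)
    return num / c1

def sigma1_sq(groups, beta, gamma):
    """Estimator that is consistent only when there is no individual effect."""
    num, c4 = 0.0, 0
    for y, X, Z in groups:
        n, T = y.shape
        r = y - X @ beta - (Z @ gamma)[:, None]
        r = r - r.mean(axis=0, keepdims=True)   # centre over units only
        num += (r ** 2).sum()
        c4 += (n - 1) * T                       # assumed normalizer
    return num / c4

rng = np.random.default_rng(1)
beta, gamma = np.array([1.0]), np.array([1.0])
g0 = simulate(0.0, rng)   # no individual effect
g1 = simulate(1.0, rng)   # individual effect with sigma_mu = 1
diff_h0 = sigma1_sq(g0, beta, gamma) - sigma0_sq(g0, beta)
diff_h1 = sigma1_sq(g1, beta, gamma) - sigma0_sq(g1, beta)
```

In this design the difference is close to zero without an individual effect and close to $\sigma_\mu^2$ when one is present, even though a time effect exists in both cases.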

Test for time effect
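In this section, we test for the time effect in model (2.2). Allowing for heteroscedasticity of $\eta_t$, the hypotheses are

$$H_0^\eta: \operatorname{var}(\eta_1) = \cdots = \operatorname{var}(\eta_T) = 0, \qquad H_1^\eta: \text{at least one of them is nonzero},$$

where $T$ is the largest number of time periods among all groups. Similar to the test for the individual effect, under $H_0^\eta$ model (2.2) reduces to

$$y_{l_i} = \iota_{T_l}\alpha + X_{l_i}\beta + (\iota_{T_l} \otimes Z_i')\gamma + \iota_{T_l}\mu_{l_i} + \varepsilon_{l_i}.$$

We obtain the variance estimator of the error by eliminating the individual effect and the time-invariant variables with the same orthogonal transformation as in Section 2. We have $Q_{T_l}' y_{l_i} = Q_{T_l}' X_{l_i}\beta + Q_{T_l}'\varepsilon_{l_i}$, $l_i \in N_l$, $i = 1, \cdots, n_l$. It also holds that

$$\sum_{l=1}^{L}\sum_{i=1}^{n_l} E\big(\varepsilon_{l_i}' Q_{T_l} Q_{T_l}' \varepsilon_{l_i}\big) = c_5\sigma^2,$$

where $c_5 = \sum_{l=1}^{L} c_{5l}$ with $c_{5l} = (T_l - 1)n_l$. So,

$$\hat\sigma_2^2 = \frac{1}{c_5}\sum_{l=1}^{L}\sum_{i=1}^{n_l} \big(y_{l_i} - X_{l_i}\hat\beta\big)' Q_{T_l} Q_{T_l}' \big(y_{l_i} - X_{l_i}\hat\beta\big),$$

and the test statistic $T_\eta$ is constructed from the difference $\hat\sigma_2^2 - \hat\sigma_0^2$ in the same way as $T_\mu$. We then set some assumptions to study the properties of $T_\eta$ (refer to Wu et al. [15]).
In Theorem 4.1, $\eta = (\eta_{0t_{l,1}}, \eta_{0t_{l,2}}, \cdots, \eta_{0t_{l,T_l}})$; see Wu et al. [15] for a similar discussion. The proof is omitted in this paper. We obtain $\Sigma_{8l} = 0$ if $E X_{it}$ does not depend on the time period. In that case, the time effect test statistic $T_\eta$ follows a chi-square distribution under the null hypothesis, which is convenient in applications. In order to satisfy this condition, we consider the same two transformations as Wu et al. [15]. First, center $X_{it}$ in the original model by subtracting $\frac{1}{n_l}\sum_{i=1}^{n_l} X_{l_i}$ in each group. Second, transform the explained variable $y_{l_i,t}$ into $y_{l_i,t} - \bar X_{l\cdot,t}'\hat\beta$. Theorem 4.1 still holds after these transformations.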
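A numerical check of the time effect comparison is sketched below. It works directly on residual arrays $r_{l_i} = y_{l_i} - X_{l_i}\hat\beta$, which are taken as given, and again uses $Q_TQ_T' = I_T - \iota_T\iota_T'/T$. The function names and the toy residuals are hypothetical, a fixed alternating time effect is used instead of a random one purely so that the example is reproducible, and the standardization of $T_\eta$ is not reproduced.

```python
import numpy as np

def within_time_ss(r):
    """Sum over units of r_i' Q Q' r_i, using Q Q' = I - iota iota' / T."""
    d = r - r.mean(axis=1, keepdims=True)
    return float((d ** 2).sum())

def sigma0_sq(resid_groups):
    """Centre over units first: robust to both individual and time effects."""
    num = sum(within_time_ss(r - r.mean(axis=0, keepdims=True)) for r in resid_groups)
    c1 = sum((r.shape[0] - 1) * (r.shape[1] - 1) for r in resid_groups)
    return num / c1

def sigma2_sq(resid_groups):
    """No centring over units: consistent only when there is no time effect."""
    num = sum(within_time_ss(r) for r in resid_groups)
    c5 = sum(r.shape[0] * (r.shape[1] - 1) for r in resid_groups)
    return num / c5

def make_residuals(eta_scale, rng, sizes=((200, 4), (200, 8))):
    """Toy residuals mu_i + eta_t + eps_it with an alternating time effect."""
    out = []
    for n_l, T_l in sizes:
        mu = rng.normal(size=(n_l, 1))
        eta = eta_scale * (-1.0) ** np.arange(T_l)
        eps = rng.normal(size=(n_l, T_l))
        out.append(mu + eta[None, :] + eps)
    return out

rng = np.random.default_rng(2)
r0 = make_residuals(0.0, rng)   # no time effect
r1 = make_residuals(0.5, rng)   # time effect present
diff_h0 = sigma2_sq(r0) - sigma0_sq(r0)
diff_h1 = sigma2_sq(r1) - sigma0_sq(r1)
```

The individual effect is present in both scenarios and is removed by $Q_{T_l}'$, so the difference responds only to the time effect.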

Test jointly for two effects
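In this section, we consider the joint test for the two effects in model (2.2). The hypotheses are

$$H_0^{\mu\eta}: \sigma_\mu^2 = \operatorname{var}(\eta_1) = \cdots = \operatorname{var}(\eta_T) = 0, \qquad H_1^{\mu\eta}: \text{at least one of them is nonzero}.$$

Under the null hypothesis $H_0^{\mu\eta}$, the original model becomes

$$y_{l_i} = \iota_{T_l}\alpha + X_{l_i}\beta + (\iota_{T_l} \otimes Z_i')\gamma + \varepsilon_{l_i}.$$

The estimator $\hat\sigma_3^2$ of the variance of the error term is computed from the residuals $y_{l_i} - \iota_{T_l}\hat\alpha - X_{l_i}\hat\beta - (\iota_{T_l} \otimes Z_i')\hat\gamma$, where $\hat\beta$ and $\hat\gamma$ are obtained from (2.5) and (2.8). Referring to Pesaran and Zhou [11], the consistent estimator of $\alpha$ is

$$\hat\alpha = \frac{1}{N}\sum_{l=1}^{L}\sum_{i=1}^{n_l} \iota_{T_l}'\big(y_{l_i} - X_{l_i}\hat\beta - (\iota_{T_l} \otimes Z_i')\hat\gamma\big).$$

$\hat\sigma_3^2$ is consistent only under $H_0^{\mu\eta}$. The test statistic $T_{\mu\eta}$ can be constructed similarly, based on the difference between $\hat\sigma_3^2$ and $\hat\sigma_0^2$, where $\hat\omega_n$ is the same as in (3.12). In the corresponding theorem, $\Phi_n = a_n\lambda_4 + b_n(\sigma_0^2)^2$ and $\sigma_1$ is defined in Assumption 3.1. The proof is similar to that of Theorem 3.1 and is omitted in this paper; see also Wu et al. [15].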

Simulations
In order to evaluate the performance of the test method, Monte Carlo simulations are used to compute the empirical power of the test statistics based on 1000 replications.
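Empirical power is the share of replications in which the test rejects the null hypothesis. The sketch below shows this computation with the Python standard library only. The statistic used here is a simple stand-in (a squared standardized sample mean compared with the 5% critical value of the chi-square distribution with one degree of freedom), not one of the statistics proposed in this paper; the function names and the toy design are assumptions.

```python
import random

CHI2_1_CRIT_5PCT = 3.841  # 95% quantile of the chi-square(1) distribution

def toy_statistic(sample):
    """Stand-in statistic: n * mean^2 / variance (approx. chi-square(1) under H0)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return n * mean ** 2 / var

def empirical_power(shift, n=50, reps=1000, seed=0):
    """Fraction of replications in which the toy test rejects at the 5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(shift, 1.0) for _ in range(n)]
        if toy_statistic(sample) > CHI2_1_CRIT_5PCT:
            rejections += 1
    return rejections / reps

size = empirical_power(0.0)    # rejection rate under the null (empirical size)
power = empirical_power(0.5)   # rejection rate under an alternative
```

Replacing `toy_statistic` and the sampling line with a panel data generating process and one of the proposed statistics gives the kind of power tables reported in this section.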

Monte Carlo
We generate data from the two-way error component model of Section 2, where the explanatory variables and the time-invariant variables are two-dimensional, $X_{it} = (X_{it,1}, X_{it,2})'$, $Z_i = (Z_{i,1}, Z_{i,2})'$, $\beta = (\beta_1, \beta_2)'$ with $\beta_1 = 1$ and $\beta_2 = 1$, $\gamma = (\gamma_1, \gamma_2)'$ with $\gamma_1 = 1$ and $\gamma_2 = 1$, and $\{\mu_i\}$ and $\{\eta_t\}$ are i.i.d. with mean zero and finite variances $\operatorname{var}(\mu_i) = \sigma_\mu^2$ and $\operatorname{var}(\eta_t) = \sigma_\eta^2$. In this unbalanced panel data model, we assume three groups with different numbers of time periods $T_l$, namely $T = [4, 8, 12]$. The number of cross-section units in each group is random (see Wu et al. [15]). Referring to Chen et al. [13], the time-varying variables are generated so that the constants $k$ and $h$ control the correlations between $X_{it}$ and the individual effect $\mu_i$ and the time effect $\eta_t$, respectively. The form of the correlation has no effect on the results, so for simplicity we consider only this form. $g_{1t}$ and $g_{2t}$ are uniformly distributed $U(1, 2)$, and $w_{it}^{(1)}$ and $w_{it}^{(2)}$ are i.i.d. normal with mean zero and unit variance. In this paper, we assume that the time-invariant variable $Z_i$ is exogenous; it is generated from $\varepsilon_i^{(1)}$ and $\varepsilon_i^{(2)}$, which are i.i.d. $N(0, 1)$.
Next, we describe the estimators of the two parameters and the power of the test statistics in different situations. We choose $n = 50, 100, 150, 200$ to evaluate the performance of the test statistics as the number of cross-section units increases. First, we summarize the performance of the estimators in Table 1 when the individual and time effects exist simultaneously. The values of $\hat\beta_1$ and $\hat\beta_2$ ($\hat\gamma_1$ and $\hat\gamma_2$) are very close, so we only show the results for $\hat\beta_1$ and $\hat\gamma_1$. Both $\hat\gamma$ and $\hat\beta$ have small biases and variances, and the estimates are very close to the true values. The standard deviations of the two estimators also decrease as the sample size increases.
Second, we evaluate the empirical power of the individual effect test statistic $T_\mu$ under two different distributions of the error term, $\varepsilon_{it} \sim N(0, 1)$ and $\varepsilon_{it} \sim \frac{1}{2}\chi^2(1)$. Table 2 reports the power of $T_\mu$ when there is no time effect. The power remains high when the time effect $\eta$ exists, and we omit those results because they are the same as when the time effect does not exist. The individual effect test statistic is robust under both error distributions and is also robust to correlation between the explanatory variables and the individual effect. Besides, the empirical power of $T_\mu$ increases with the sample size.
Table 2. Empirical powers of the test $T_\mu$ in the cases $k = 0$ and $k = 1$, with $h = 0$ and $\sigma_\eta = 0$.
Next, we evaluate the time effect test statistic $T_\eta$. Table 3 shows that the power of $T_\eta$ is higher than that of the individual effect test statistic $T_\mu$. When the individual effect is present ($\sigma_\mu > 0$), the power of $T_\eta$ is the same as when it is absent, so we omit those results. Besides, the power of $T_\eta$ increases as the sample size increases. $T_\eta$ is also robust to correlation between the explanatory variables and the time effect. Table 4 displays the empirical power of $T_{\mu\eta}$ when the explanatory variables are not related to the time effect ($h = 0$). The test statistic becomes larger as the sample size increases. According to Tables 2 and 3, the power of $T_\eta$ is higher than that of $T_\mu$, so the value of $\sigma_\eta$ has a greater impact on $T_{\mu\eta}$ than $\sigma_\mu$. When $\sigma_\eta$ is large, such as $\sigma_\eta = 0.4$, the power of $T_{\mu\eta}$ is high for all values of $\sigma_\mu$. The power is already very high when the sample size is small, such as $n = 50$ or $100$, so we only simulate $n = 50$ and $100$. Table 5 displays the empirical power of $T_{\mu\eta}$ when the explanatory variables are not related to the individual effect ($k = 0$). $T_{\mu\eta}$ is robust under $h = 0$ and $h = 1$. When $\sigma_\mu = \sigma_\eta = 0$, the results for $T_{\mu\eta}$ are the same as in Table 4 regardless of whether $h = 0$ or $h = 1$ and $k = 0$ or $k = 1$, and $T_{\mu\eta}$ simply increases with the sample size, so we omit this case.
To sum up, the three test statistics proposed in this paper are robust and have good power in various situations. Furthermore, many studies have verified that moment-based tests of this type have higher power than the LM test of Breusch and Pagan [5], Honda's test [6] and the conditional LM tests of Baltagi et al. [9]; see, e.g., Wu and Li [10], Wu et al. [15] and Chen et al. [13] for more details.
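A data generating process of the kind described above can be sketched as follows. Several details are assumptions made here because the exact equations are not reproduced in the text: the additive form $X_{it,j} = g_{jt} + k\mu_i + h\eta_t + w_{it}^{(j)}$, drawing $Z_i$ directly as standard normal, normally distributed $\mu_i$ and $\eta_t$, assigning units to groups uniformly at random, and letting group $l$ observe the first $T_l$ periods. The function name `simulate_panel` is hypothetical.

```python
import numpy as np

def simulate_panel(n, sigma_mu, sigma_eta, k=0.0, h=0.0,
                   periods=(4, 8, 12), seed=0):
    """Simulate an unbalanced panel; returns a list of (y, X, Z), one per group."""
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 1.0])
    gamma = np.array([1.0, 1.0])
    # Random group sizes: each unit is assigned to one group at random (assumption).
    labels = rng.integers(0, len(periods), size=n)
    # Common time effects; group l observes the first T_l periods (assumption).
    eta_all = sigma_eta * rng.normal(size=max(periods))
    groups = []
    for l, T in enumerate(periods):
        n_l = int((labels == l).sum())
        mu = sigma_mu * rng.normal(size=(n_l, 1))
        eta = eta_all[:T][None, :]
        g = rng.uniform(1.0, 2.0, size=(1, T, 2))     # g_1t, g_2t ~ U(1, 2)
        w = rng.normal(size=(n_l, T, 2))              # w_it ~ N(0, 1)
        # Assumed additive form linking X to mu (via k) and eta (via h).
        X = g + k * mu[:, :, None] + h * eta[:, :, None] + w
        Z = rng.normal(size=(n_l, 2))                 # exogenous time-invariant Z
        eps = rng.normal(size=(n_l, T))
        y = X @ beta + (Z @ gamma)[:, None] + mu + eta + eps
        groups.append((y, X, Z))
    return groups

panel = simulate_panel(n=100, sigma_mu=0.5, sigma_eta=0.5, k=1.0, h=0.0)
```

Setting `k=1.0` makes the regressors correlated with the individual effect, which is the situation in which OLS or GLS on the untransformed model would be biased.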

A real example
Economic development is of great importance to countries and regions. Many factors affect economic growth; see, e.g., Oleg [19], Iuliana [20] and Carlsen [21]. The impact of foreign direct investment (FDI) on economic growth has been one of the focuses of economics, and its study can provide useful suggestions for policymakers. Richard et al. [22] studied nine OECD countries and seven industries using cross-section data and concluded that FDI promotes economic growth through technology spillover. Borensztein et al. [23] studied 69 developing countries and found that FDI promotes the economic growth of host countries only when they have sufficient capacity to absorb advanced technology, and that this impact depends on human capital. Besides, many other factors affect economic growth.
In order to study the relationship between FDI and economic growth, Kottaridi and Stengos [24] considered a traditional linear model for $y_{it}$ with regressors $(FDI/Y)_{it}$, $(DI/Y)_{it}$, $n_{it}$ and $h_{it}$, where $y_{it}$ is the per capita GDP growth rate of the $i$th province at time $t$, $(FDI/Y)_{it}$ is the ratio of foreign direct investment to total output, $(DI/Y)_{it}$ is the ratio of local investment to total output, $n_{it}$ is the natural population growth rate, $h_{it}$ is human capital and $\varepsilon_{it}$ is the idiosyncratic error. However, it is widely accepted that the effect of FDI on economic growth is heterogeneous across cross-sectional units because of the environment and the policies of different provinces and regions. Economic growth may also have a trend over time that cannot be captured by the other variables. So the model may contain individual and time effects ($\mu_i$ and $\eta_t$), and one may wonder whether these effects exist. Therefore, the possible model for this example is the linear model above with $\mu_i$ and $\eta_t$ added to the error term, referred to as model (6.5). In this section, model (6.5) is used to study economic growth in China. We use panel data on 30 provincial regions in China from 1992 to 2017 (excluding Tibet), taken from the China Statistical Yearbook and the statistical yearbooks of the regions.
Since the proposed test method is very general, it can also be used for the two-way error component panel data model without time-invariant variables. In order to illustrate the proposed tests, we chose three types of unbalanced panel data samples with $T = [10, 18, 26]$ from the original database, with a random number of cross-section units in each. Table 6 gives the results of the test statistics in three different cases. The p-values of $T_\mu$, $T_\eta$ and $T_{\mu\eta}$ are all less than 0.0001, so we reject the null hypotheses at the 0.05 significance level. The time effect test statistic $T_\eta$ is larger than $T_\mu$, which agrees with the simulation results. Thus, we conclude that the effect of FDI on economic growth is heterogeneous across provinces and that the time effect should be included in the model to obtain correct estimates when studying this problem.

Conclusions
In this paper, we construct three test statistics for the individual effect, the time effect and the joint effects in unbalanced panel data models that include time-invariant variables. The tests are based on the moment method and use the difference of two variance estimators of the error. The test for the individual effect is valid regardless of the existence of the time effect, and vice versa. Furthermore, the three test statistics are robust to correlation between the explanatory variables and the individual or time effect, and no distributional assumption on the error term is needed. The simulation results show that the proposed test statistics are robust in various situations and have good finite sample properties. We also studied the relationship between FDI and economic growth, and found that the effect of FDI on economic growth has heterogeneity and common time characteristics.
Traditional linear models can be too restrictive in applications. Nonparametric models have been widely used in recent years, and test methods for various nonparametric panel models can be considered in the future. In addition, the time-invariant variables could be endogenous. Estimation and test methods for this situation, for example by embedding the instrumental variable method or the system generalized method of moments, are left for future study.
The proofs of the theorems are as follows.
Proof of Theorem 2.2. We follow the proof of Pesaran and Zhou [11]. For each group $N_l$, we start from the estimated time averages $\hat{\bar u}_{l_i}$ and, noting that $\hat\beta$ is consistent for $\beta$, expand the estimator $\hat\gamma$. Here, for $i = 1, \cdots, n_l$, $Z$ and $X$ are the matrices that contain $Z_i$ and $X_{it}$, respectively, and $\mu$ is the vector that contains $\mu_{l_i}$. Under Assumption 2.3, $E((Z_{l_i} - \bar Z)\mu_{l_i}) = 0$. We have $E(\hat\gamma) = \gamma$, so $\hat\gamma$ is consistent for $\gamma$. Then, we prove that $\hat\gamma$ is a $\sqrt{n}$-consistent estimator of $\gamma$.
Under Assumptions 2.2 and 2.3, $\frac{1}{n}\sum_{l=1}^{L}\sum_{i=1}^{n_l} (Z_{l_i} - \bar Z)(\bar X_{l_i} - \bar X)'$ converges to a finite limit. In this paper, we assume that the individual effects in each group have the same distribution, so the variance of the effect is constant, where $\omega_{l_i T_l} = \sigma_\mu^2 + \Sigma r_i(s, t)$. Next, we obtain the asymptotic variance of the estimator $\hat\gamma$. It is derived from the first term on the right-hand side of Eq (A.2), where $m_l = \lim_{n\to\infty} n_l/n$; this setting is also used in Shao et al. [25].
The variance of the first term of the above formula involves $\Sigma_4 = \operatorname{var}\Big(\sum_{l=1}^{L} \sqrt{\frac{n_l}{n}}\, \frac{1}{\sqrt{n_l}} \sum_{i=1}^{n_l} (Z_{l_i} - \bar Z)\mu_{l_i}\Big)$, and the limit of the cross-product term is $\Sigma_5 = \sum_{l=1}^{L} m_l\,[E(Z_{l_i}\bar X_{l_i}') - E(Z_{l_i})E(\bar X_{l_i})']$. For the second term, note that $\sqrt{n}(\hat\beta - \beta) = O_p(1)$ and $\sqrt{n}(\hat\gamma - \gamma) = O_p(1)$. Besides, we use some limit theorems and the proof in the Appendix of Chen et al. [13], and we suppose that the explanatory variable $X_{it}$ and the time-invariant variable $Z_i$ are i.i.d. sequences with $E X_{it}^4 < \infty$ and $E Z_i^4 < \infty$. We have $\varsigma_{l_i} = \frac{n}{c_4}\varepsilon_{l_i}'\varepsilon_{l_i} - \frac{n}{c_1}\varepsilon_{l_i}' P_{T_l}\varepsilon_{l_i}$, and $\varsigma_{l_i}$ is independent across units. The limit result then holds, where $\omega_n = a_n\lambda_4 + b_n(\sigma_0^2)^2$ is used to standardize the test statistics. The $a = \lim_{n\to\infty} a_n$ = lim n→∞ 1 n L l=1 n l n 2 c 2